How to Use `sed` and `awk` for Text Processing on Arch Linux


Text processing is a fundamental part of Linux system administration, scripting, and data analysis. Two of the most powerful tools for handling text streams and files are sed (stream editor) and awk (pattern scanning and processing language). On Arch Linux, these utilities are available by default in the base system, and mastering them can significantly boost your productivity.

In this article, we’ll explore how to effectively use sed and awk for various text processing tasks on Arch Linux. Whether you’re editing configuration files, extracting data, or automating system reports, these tools will be your best allies.


Introduction to sed and awk

What is sed?

sed stands for Stream EDitor. It reads input line by line, applies the specified operation(s) to each line, and writes the result to standard output. It excels at substitutions, deletions, and insertions. It can also handle multi-line edits, although those scripts quickly become hard to read.

What is awk?

awk is both a command-line utility and a programming language designed for pattern scanning and text processing. It allows you to filter and format text using conditions and expressions, making it ideal for parsing structured data like CSV or logs.


Installing sed and awk on Arch Linux

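In most cases, both tools are already installed on a fresh Arch Linux system. You can confirm with:

sed --version
awk --version
  • sed is provided by its own sed package (GNU sed), not by coreutils.
  • awk is provided by the gawk package (GNU Awk); the awk command on Arch runs gawk.
  • Both packages are dependencies of the base meta-package, so any standard installation includes them.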

If for some reason they’re missing, you can install them with:

sudo pacman -S gawk
sudo pacman -S sed

Basic sed Usage

Syntax

sed [options] 'script' file

Common Examples

1. Replacing text

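To replace the first occurrence of “foo” with “bar” on each line (the result is printed to standard output; the file itself is not changed):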

sed 's/foo/bar/' file.txt

To replace every occurrence on each line, add the g (global) flag:

sed 's/foo/bar/g' file.txt
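The difference between the two forms is easiest to see on a line that contains “foo” twice. The sketch below pipes sample text into sed instead of reading a file. The last command shows that any character can replace / as the delimiter, which saves you from escaping slashes in paths. The /usr/local and /opt paths are only sample strings:

```shell
# One input line containing "foo" twice
first=$(printf 'foo and foo\n' | sed 's/foo/bar/')
all=$(printf 'foo and foo\n' | sed 's/foo/bar/g')

echo "$first"   # bar and foo  (only the first match on the line)
echo "$all"     # bar and bar  (every match, thanks to the g flag)

# Any character can serve as the delimiter; | avoids escaping the slashes
path=$(printf '/usr/local/bin\n' | sed 's|/usr/local|/opt|')
echo "$path"    # /opt/bin
```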

2. In-place editing

To modify the file directly:

sed -i 's/foo/bar/g' file.txt

To keep the original as file.txt.bak, attach a suffix to -i:

sed -i.bak 's/foo/bar/g' file.txt
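You can confirm what -i.bak does without risking a real file by trying it in a scratch directory created with mktemp. This sketch assumes GNU sed, as shipped on Arch, where the backup suffix must be attached directly to -i:

```shell
# Scratch directory so no real file is modified
tmp=$(mktemp -d)
printf 'foo\nfoo foo\n' > "$tmp/file.txt"

# GNU sed: write "-i.bak" with no space; a separate ".bak" would be read as the script
sed -i.bak 's/foo/bar/g' "$tmp/file.txt"

edited=$(cat "$tmp/file.txt")       # the edited file
backup=$(cat "$tmp/file.txt.bak")   # the untouched original
printf '%s\n---\n%s\n' "$edited" "$backup"
rm -rf "$tmp"
```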

3. Deleting lines

Delete line 5:

sed '5d' file.txt

Delete lines matching a pattern:

sed '/^#/d' file.txt   # Remove comments
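The ^# pattern only matches comments that start in the first column. If you want to view just the active settings of a config file, a slightly wider version of the command also removes indented comments and blank lines. In this sketch, the sample text stands in for a real file:

```shell
# Sample config: a comment, a setting, a blank line, an indented comment, a setting
cleaned=$(printf '# comment\nkey=value\n\n  # indented comment\nother=1\n' |
  sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d')

echo "$cleaned"
# key=value
# other=1
```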

4. Print specific lines

Print only line 3:

sed -n '3p' file.txt

Print lines 2 to 4:

sed -n '2,4p' file.txt
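Addresses are not limited to fixed numbers. In this sketch, seq 10 stands in for a ten-line file. The $ address selects the last line, and the q command stops reading early, which behaves much like head:

```shell
# seq 10 prints the numbers 1 to 10, one per line
middle=$(seq 10 | sed -n '2,4p')   # lines 2 to 4
last=$(seq 10 | sed -n '$p')       # $ addresses the last line
early=$(seq 10 | sed '3q')         # print up to line 3, then quit (like head -n 3)

echo "$middle"
echo "$last"
echo "$early"
```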

Basic awk Usage

Syntax

awk 'pattern { action }' file

Default Behavior

By default, awk splits each line into fields on runs of spaces and tabs, and lets you reference them as $1, $2, and so on; $0 holds the entire line.

1. Print specific columns

Print the first column:

awk '{ print $1 }' file.txt

Print first and third columns:

awk '{ print $1, $3 }' file.txt
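Two details are worth seeing in action. First, runs of spaces count as a single separator. Second, the built-in variable NF holds the number of fields, so $NF is always the last column. The input here is sample text:

```shell
# Runs of spaces count as a single separator by default
cols=$(printf 'alice   30  admin\nbob 25 user\n' | awk '{ print $1, $3 }')
echo "$cols"       # two lines: "alice admin" and "bob user"

# NF is the number of fields on the line, so $NF is the last one
lastcol=$(printf 'a b c\nd e\n' | awk '{ print NF, $NF }')
echo "$lastcol"    # two lines: "3 c" and "2 e"
```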

2. Use a custom delimiter

For comma-separated values:

awk -F',' '{ print $1, $2 }' data.csv
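Here is a small sketch on inline sample data. NR > 1 skips a header row, and setting OFS changes the separator awk places between printed fields (here a tab). Keep in mind that -F',' splits on every comma, so it does not handle quoted fields that themselves contain commas:

```shell
csv=$(printf 'name,age\nalice,30\nbob,25\n')

# NR > 1 skips the header row
names=$(printf '%s\n' "$csv" | awk -F',' 'NR > 1 { print $1 }')

# OFS is the separator awk inserts between fields joined by "," in print
tsv=$(printf '%s\n' "$csv" | awk -F',' -v OFS='\t' '{ print $1, $2 }')

echo "$names"
echo "$tsv"
```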

3. Filter with conditions

Print lines where the second column is greater than 100:

awk '$2 > 100' data.txt

Print lines where the first column matches “john”:

awk '$1 == "john"' data.txt
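You can combine conditions with && and ||, and the ~ operator tests a field against a regular expression. With the sample data below, awk keeps only the rows whose first column starts with “j” and whose second column exceeds 100:

```shell
data=$(printf 'john 150\nmary 300\njim 90\njoe 120\n')

# && combines tests; ~ matches a field against a regular expression
hits=$(printf '%s\n' "$data" | awk '$2 > 100 && $1 ~ /^j/ { print $1 }')
echo "$hits"   # two lines: "john" and "joe"
```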

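4. BEGIN and END blocks

The BEGIN block runs once before any input is read, and the END block runs once after the last line has been processed: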

awk 'BEGIN { print "Start" } { print $0 } END { print "End" }' file.txt
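A common use of END is reporting totals: NR still holds the number of lines read when END runs. BEGIN is a natural place to set variables such as FS before the first line is split. Here is a sketch on sample input:

```shell
# NR still holds the number of records read when END runs
summary=$(printf 'a\nb\nc\n' | awk 'BEGIN { print "Start" } END { print "Lines read:", NR }')
echo "$summary"   # two lines: "Start" and "Lines read: 3"

# Setting FS in BEGIN takes effect before the first line is split
second=$(printf 'x,y\n' | awk 'BEGIN { FS = "," } { print $2 }')
echo "$second"    # y
```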

Real-World Examples

Example 1: Extracting IP addresses from logs

awk '{ print $1 }' /var/log/nginx/access.log | sort | uniq -c | sort -nr

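This command:

  • Extracts the first field (the client IP address in nginx's default log format)
  • Sorts the addresses so identical ones sit next to each other, which uniq requires
  • Counts each distinct address with uniq -c
  • Sorts the result by count, highest first (sort -nr)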
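awk can also do the counting itself with an associative array, which removes the need for the first sort and uniq. This sketch writes three made-up lines in nginx's default combined format to a temporary file, in place of the real access.log:

```shell
# Hypothetical sample lines in nginx's default "combined" log format
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.5.0"
10.0.0.2 - - [01/Jan/2024:10:00:01 +0000] "GET /about HTTP/1.1" 200 512 "-" "curl/8.5.0"
10.0.0.1 - - [01/Jan/2024:10:00:02 +0000] "GET /favicon.ico HTTP/1.1" 404 153 "-" "curl/8.5.0"
EOF

# count[$1]++ tallies each client IP; END prints "count ip" pairs,
# and sort -nr puts the busiest address first
top=$(awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' "$log" | sort -nr)
echo "$top"
rm -f "$log"
```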

Example 2: Mass renaming using sed

If you have files like image01.jpg, image02.jpg, …, and want to rename them to pic01.jpg, etc.:

for f in image*.jpg; do
  mv "$f" "$(echo "$f" | sed 's/image/pic/')"
done
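Before you rename anything, it can help to print the mv commands first. This sketch works in a scratch directory with two hypothetical files. The first loop is a dry run that only echoes the commands, and the second performs the renames. The -- argument tells mv that no options follow, which protects against file names that begin with a dash:

```shell
# Scratch directory with two hypothetical files
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch image01.jpg image02.jpg

# Dry run: echo prints each mv command instead of running it
for f in image*.jpg; do
  echo mv -- "$f" "$(echo "$f" | sed 's/image/pic/')"
done

# Once the output looks right, drop the echo to perform the renames
for f in image*.jpg; do
  mv -- "$f" "$(echo "$f" | sed 's/image/pic/')"
done
ls
```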

Example 3: Summing values with awk

Assume data.txt contains:

Item1  25
Item2  40
Item3  35

To calculate the total:

awk '{ sum += $2 } END { print "Total:", sum }' data.txt
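Conditions and the END block combine naturally. The sketch below uses the same three sample rows and totals only the rows whose value is at least 35. It also covers the case where no line matches (here, empty input): sum is never assigned and prints as an empty string unless you add + 0:

```shell
data=$(mktemp)
printf 'Item1  25\nItem2  40\nItem3  35\n' > "$data"

# Only rows whose second field is at least 35 contribute to the total
big=$(awk '$2 >= 35 { sum += $2; n++ } END { print n, "items, total", sum + 0 }' "$data")
echo "$big"     # 2 items, total 75

# With no input, sum is never assigned; "+ 0" prints 0 instead of an empty string
empty=$(printf '' | awk '{ sum += $2 } END { print "Total:", sum + 0 }')
echo "$empty"   # Total: 0
rm -f "$data"
```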

Example 4: Find and replace in multiple files

find . -type f -name "*.conf" -exec sed -i 's/localhost/127.0.0.1/g' {} +

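This finds every .conf file under the current directory (recursively) and replaces each occurrence of localhost with 127.0.0.1 in place. The {} + form passes many file names to each sed invocation instead of starting one sed per file.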
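Because this command edits files in place, a preview is a sensible first step. With -n, the p flag on the substitution prints only the lines that actually changed. This sketch runs both steps in a scratch directory containing hypothetical a.conf and b.txt files:

```shell
tmp=$(mktemp -d)
printf 'server=localhost\nport=80\n' > "$tmp/a.conf"
printf 'note=unchanged\n' > "$tmp/b.txt"

# Preview: -n suppresses normal output, the p flag prints only substituted lines
preview=$(find "$tmp" -type f -name '*.conf' -exec sed -n 's/localhost/127.0.0.1/gp' {} +)
echo "$preview"   # server=127.0.0.1

# Apply the change for real (b.txt is skipped because it does not match *.conf)
find "$tmp" -type f -name '*.conf' -exec sed -i 's/localhost/127.0.0.1/g' {} +
```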


Combining sed and awk

The two tools complement each other well: sed makes the substitution, and awk then picks out columns. For example:

cat data.txt | sed 's/foo/bar/' | awk '{ print $1, $3 }'

Or in a more efficient way, without cat:

sed 's/foo/bar/' data.txt | awk '{ print $1, $3 }'
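For a simple substitution like this one, awk can also do the whole job by itself. When sub() is called without a target, it edits $0, and awk then re-splits the fields. Here is a sketch comparing both approaches on sample input:

```shell
input=$(printf 'foo alpha 10\nitem beta 20\n')

# Two processes: sed substitutes, awk selects columns
piped=$(printf '%s\n' "$input" | sed 's/foo/bar/' | awk '{ print $1, $3 }')

# One process: sub() edits $0, which makes awk re-split the fields
single=$(printf '%s\n' "$input" | awk '{ sub(/foo/, "bar"); print $1, $3 }')

echo "$piped"
echo "$single"
```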

Tips and Best Practices

1. Test before using -i

Always test your sed command before using the -i (in-place) option to avoid accidental data loss.

2. Use comments in awk scripts

When writing complex awk scripts, use comments and line breaks for clarity:

awk '
# Print rows where column 2 is > 100
$2 > 100 {
  print $1, $2
}
' file.txt

3. Use awk over cut or grep for complex tasks

While tools like cut, grep, and head are great for simple jobs, awk shines when you need conditional logic, math, or formatting.
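As an illustration, take a few sample lines in /etc/passwd format (not your real file). grep and cut can list the accounts whose shell is /bin/bash. awk can do the same in one step with an exact field comparison, and it can also add a numeric test on the UID that would be awkward to express with grep:

```shell
passwd=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1::/:/usr/bin/nologin' \
  'alice:x:1000:1000::/home/alice:/bin/bash')

# grep + cut: filter by text, then pick the first colon-separated column
via_cut=$(printf '%s\n' "$passwd" | grep '/bin/bash$' | cut -d: -f1)

# awk: one step, comparing field 7 (the shell) exactly
via_awk=$(printf '%s\n' "$passwd" | awk -F: '$7 == "/bin/bash" { print $1 }')

# awk: add a numeric condition on field 3 (the UID)
regular=$(printf '%s\n' "$passwd" | awk -F: '$3 >= 1000 && $7 == "/bin/bash" { print $1 }')

echo "$via_awk"
echo "$regular"
```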


Creating Reusable awk and sed Scripts

awk Script File

You can write an awk script in a file, e.g., script.awk:

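# Split input on ":" and join output fields with " | "
BEGIN { FS=":"; OFS=" | " }
# Field 3 is the UID; regular user accounts on Arch start at 1000
$3 >= 1000 { print $1, $3 }

This prints the login name and UID of every account whose UID is 1000 or higher. Run it with: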

awk -f script.awk /etc/passwd
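You can test the script against a couple of sample lines instead of your real /etc/passwd. This sketch uses a UID threshold of 1000 or higher so that the first regular user (UID 1000 on Arch) is included. On a real system, the nobody account (UID 65534) also passes that test:

```shell
tmp=$(mktemp -d)
cat > "$tmp/script.awk" <<'EOF'
#!/usr/bin/awk -f
# Split input on ":" and join output fields with " | "
BEGIN { FS = ":"; OFS = " | " }
# Field 3 is the UID
$3 >= 1000 { print $1, $3 }
EOF

# Hypothetical sample lines in /etc/passwd format
printf '%s\n' 'root:x:0:0::/root:/bin/bash' \
  'alice:x:1000:1000::/home/alice:/bin/bash' > "$tmp/passwd"

out=$(awk -f "$tmp/script.awk" "$tmp/passwd")
echo "$out"   # alice | 1000
```

To awk, the #!/usr/bin/awk -f first line is just a comment. If you make the file executable with chmod +x, that line also lets you run the script directly as ./script.awk /etc/passwd.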

sed Script File

You can also save multiple sed commands in a file:

s/foo/bar/g
s/baz/qux/g

Run it with:

sed -f script.sed file.txt
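A sed script file can contain comment lines that start with #. Its commands run in order on every line, so a later command sees the result of an earlier one, and the same is true of several -e options on the command line. Here is a sketch using a scratch file:

```shell
tmp=$(mktemp -d)
cat > "$tmp/script.sed" <<'EOF'
# Comment lines are allowed in sed script files
s/foo/bar/g
s/baz/qux/g
EOF

out=$(printf 'foo and baz\n' | sed -f "$tmp/script.sed")
echo "$out"     # bar and qux

# Commands run in order, so the second one sees the first one's output
order=$(printf 'foo\n' | sed -e 's/foo/bar/' -e 's/bar/baz/')
echo "$order"   # baz
rm -rf "$tmp"
```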

Advanced Examples

Replace only on specific lines using sed

Replace apple with orange only on line 2:

sed '2s/apple/orange/' file.txt
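Line numbers are only one kind of address. This sketch runs on three sample lines and restricts the substitution in three ways: with a range (2,3), a regular expression (/^banana/), and a negated address ($!, meaning every line except the last):

```shell
sample=$(mktemp)
printf 'apple pie\napple tart\nbanana apple\n' > "$sample"

# Range: only lines 2 through 3
by_range=$(sed '2,3s/apple/orange/' "$sample")
# Regex address: only lines that start with "banana"
by_pattern=$(sed '/^banana/s/apple/orange/' "$sample")
# Negation: every line except the last one
not_last=$(sed '$!s/apple/orange/' "$sample")

printf '%s\n\n' "$by_range" "$by_pattern" "$not_last"
rm -f "$sample"
```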
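
Calculate an average with awk

Compute the average of the second column; the if guard avoids a division by zero on empty input:

awk '{ sum += $2; count++ } END { if (count > 0) print "Average:", sum/count }' data.txt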
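awk can also track several statistics in a single pass. This sketch uses the Item1 to Item3 sample data and prints the minimum, maximum, and average of the second column. The NR > 0 test guards the division:

```shell
data=$(mktemp)
printf 'Item1  25\nItem2  40\nItem3  35\n' > "$data"

stats=$(awk '
  # The first line initializes min and max; later lines update them
  NR == 1 || $2 < min { min = $2 }
  NR == 1 || $2 > max { max = $2 }
  { sum += $2 }
  END { if (NR > 0) printf "min=%d max=%d avg=%.2f\n", min, max, sum / NR }
' "$data")

echo "$stats"   # min=25 max=40 avg=33.33
rm -f "$data"
```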

Update /etc/hosts programmatically

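Add an entry if it doesn’t exist. The entry is appended through sudo tee -a because a plain >> redirect would be opened by your unprivileged shell rather than by sudo:

grep -qF 'example.com' /etc/hosts || echo '127.0.0.1 example.com' | sudo tee -a /etc/hosts > /dev/null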

To change the address of an existing entry with sed, escape the dots so they match only a literal period:

sudo sed -i '/example\.com/ s/127\.0\.0\.1/127.0.1.1/' /etc/hosts
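You can rehearse both steps on a scratch copy before touching the real /etc/hosts. The add_host_entry function below is a hypothetical helper, not a standard command. It uses awk to check whether the name already appears as a hostname field on a non-comment line, which is stricter than a grep substring match, so calling it twice adds the entry only once:

```shell
# Work on a scratch copy instead of the real /etc/hosts
hosts=$(mktemp)
printf '# Static table lookup for hostnames.\n127.0.0.1 localhost\n' > "$hosts"

# Hypothetical helper: add_host_entry FILE IP NAME
add_host_entry() {
  local file=$1 ip=$2 name=$3
  # awk exits 0 if NAME is already a hostname field (2..NF) on a non-comment line
  if ! awk -v name="$name" '
      /^[ \t]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
      END { exit !found }
    ' "$file"; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

add_host_entry "$hosts" 127.0.0.1 example.com
add_host_entry "$hosts" 127.0.0.1 example.com   # second call changes nothing

# Escaped dots match only a literal "."
sed -i '/example\.com/ s/127\.0\.0\.1/127.0.1.1/' "$hosts"
cat "$hosts"
```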

Conclusion

Both sed and awk are indispensable tools for any Linux user, especially those managing systems or automating tasks. On Arch Linux, they are lightweight, fast, and available by default, making them perfect for quick fixes, data transformation, and script-based automation.

By learning to use sed for quick text substitution and editing, and awk for powerful data extraction and processing, you can handle nearly any text manipulation task from the command line.

Don’t be afraid to experiment — build small commands, chain them together, and start crafting your own command-line magic.


Further Reading:

  • man sed
  • man awk (or man gawk)
  • GNU awk manual
  • Arch Wiki pages on scripting and shell tools