About sed
What is sed?
— WikiWikiWeb: Sed Language
- sed - a Stream EDitor
- sed is a UNIX utility which processes a text file one line at a time. It has RegularExpression based string manipulation, a hold buffer, and some basic flow control. Amazing things can be done with these BearSkinsAndStoneKnives.
Yes, I guess sed is a stone knife or bear-skin, in that it's one of those ancient[1] Unix utilities with a reputation for being powerful in the right context, but a bit difficult to wield. Whether or not the reputation is justified, it's good to know a little about sed because it's everywhere in Unix/Linux, and it proves useful in many situations.
My main use case for sed is when I find myself thinking, "Gee, I just need to run a regular expression over this file or bunch of text."
In addition to replacing text, it can delete certain text or lines, or insert blank lines. The way sed changes files is called non-interactive editing: you define all of the change instructions up front, and then sed applies them to the input, line by line.
This makes it suited to be part of shell scripts or other automated workflows, and handy for one-time changes such as data cleanup.
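As a rough illustration of deleting lines and inserting blank ones, here is a minimal sketch. The file names (notes.txt and friends) and their contents are made up for this example; only the d and G commands come from sed itself.

```shell
# Scratch directory and sample file are hypothetical, just for illustration.
workdir="$(mktemp -d)"
printf '%s\n' '# draft' 'first line' 'second line' > "$workdir/notes.txt"

# d deletes every line that matches the address (here: lines starting with #).
sed '/^#/d' "$workdir/notes.txt" > "$workdir/no-comments.txt"

# G appends a newline plus the (empty) hold buffer to each line,
# which leaves a blank line after every line of input.
sed G "$workdir/no-comments.txt" > "$workdir/double-spaced.txt"

cat "$workdir/double-spaced.txt"
```

Both commands are written out in full before sed reads any input, which is the "defined up front" part of non-interactive editing.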
How to learn sed
Check out Sed Examples by Sasikala for a nice overview of sed features and uses. Each of the posts contains examples organized by feature and command, which makes it easy to find something specific to the task at hand.
There's a DigitalOcean tutorial: The Basics of Using the Sed Stream Editor to Manipulate Text in Linux. However, it's my opinion that some of the best sed info is found on websites of a certain vintage:
- Sed - An Introduction and Tutorial by Bruce Barnett for plenty more instructive examples.
- the sed $HOME is a cornucopia of examples and one-liners, but also has lots of awesome sed scripts, including tic-tac-toe and other games!
- man sed as well -- it's a good man page, sed.
Example usage
Here's a simple example for replacing a certain string value in some data. Given a text file:
> cat trends.txt
old and busted fashions
old and busted hats
music that is old and busted
all old and busted everything
Perform the substitution on all lines of the original file with the s command, and redirect the output to a new file:
> sed 's/old and busted/new hotness/' trends.txt > new-trends.txt
The new file looks exactly like the original file, but with our phrase replaced:
> cat new-trends.txt
new hotness fashions
new hotness hats
music that is new hotness
all new hotness everything
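One detail worth knowing: without a flag, s replaces only the first match on each line. That is enough for trends.txt, where the phrase appears once per line. The sketch below uses a made-up file (repeats.txt) where the phrase appears twice on a line, to show what the g flag changes.

```shell
# Scratch directory and repeats.txt are hypothetical, just for illustration.
workdir="$(mktemp -d)"
printf '%s\n' 'old and busted hats, old and busted shoes' > "$workdir/repeats.txt"

# Without a flag, only the first match on the line is replaced.
first_only="$(sed 's/old and busted/new hotness/' "$workdir/repeats.txt")"

# With the g flag, every match on the line is replaced.
all_matches="$(sed 's/old and busted/new hotness/g' "$workdir/repeats.txt")"

printf '%s\n' "$first_only" "$all_matches"
```

The first line printed still contains one "old and busted"; the second does not.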
Case study
I encountered some wild Pokemon data that needed a bit of cleanup:
{
  "id": "040",
  "name": "Wigglytuff",
  "img": "http://img.pokemondb.net/artwork/wigglytuff.jpg",
  "type": ["Normal"],
  "stats": {
    "hp": "140",
    "attack": "70",
    "defense": 45,
    "spattack": "75",
    "spdefense": "50",
    "speed": 45
  },
  "moves": {
    // ...
  }
  // ...
}
As the sample shows, some of the property values in stats are formatted as strings, and some are numbers. It's not only defense and speed: all of the stats are formatted inconsistently throughout the file, which makes the data harder to use. It's possible the database or import script could coerce the values for us, but let's make changes to the source file itself so that the data will be consistent no matter how we use it.
Here is the incantation to sed:
> sed -E '/[[:space:]]*("id"|"height")/!s/"([[:digit:]]+\.*[[:digit:]]*)"/\1/'
- The RegEx part before ! tells sed to skip the substitution on any line that contains "id" or "height" [the [[:space:]]* in front allows for whitespace before the attribute]
  - This is because ids and heights have numbers in their values, but the nature of the data indicates that they should remain formatted as strings.
- The substitution command s/"( ...symbols... )"/\1/ does this:
  - Match on patterns that look like numbers inside quotation marks
  - Group the part inside the quotation marks with parentheses
  - Replace with the matched group
  - Examples: "7.25" becomes 7.25, "10" becomes 10
- The flag -E is for modern/extended RegEx format
In practice, the full command would also indicate the original and output filenames, as seen in the simple replace example earlier in this post. The end result is that all number values are represented as Numbers, not Strings, except for those properties where string representation is appropriate for number-like values.
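For example, a full run might look like the sketch below. The file names pokedex.json and pokedex-clean.json are made up for illustration, and the input is only a few lines lifted from the sample above rather than the real data file.

```shell
# Scratch directory and file names are hypothetical, just for illustration.
workdir="$(mktemp -d)"

# A few lines from the sample record, written to a stand-in input file.
cat > "$workdir/pokedex.json" <<'EOF'
  "id": "040",
  "name": "Wigglytuff",
  "hp": "140",
  "defense": 45,
EOF

# Same incantation as above, now with input and output files.
sed -E '/[[:space:]]*("id"|"height")/!s/"([[:digit:]]+\.*[[:digit:]]*)"/\1/' \
  "$workdir/pokedex.json" > "$workdir/pokedex-clean.json"

cat "$workdir/pokedex-clean.json"
```

In this sketch the hp value loses its quotation marks, while the id line, the name, and the already-numeric defense value pass through unchanged.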
Summary
Bottom line: it's good to know about sed, what it does, and how it can be applied to everyday problems. It's a widely available utility, worth keeping in your Unix toolbox, even if it appears at first to have all the user-friendliness of an old flint blade.
Links
- WikiWikiWeb: Sed Language
- Sed Examples by Sasikala
- The Basics of Using the Sed Stream Editor to Manipulate Text in Linux
- Sed - An Introduction and Tutorial by Bruce Barnett
- the sed $HOME
[1] sed is older than I am, so that's prehistoric from my perspective. If you originated in the 80s, then it predates you, too:
"sed" stands for Stream EDitor. Sed is a non-interactive editor, written by the late Lee E. McMahon in 1973 or 1974. A brief history of sed's origins may be found in an early history of the Unix tools, at http://www.columbia.edu/~rh120/ch106.x09.