22/03/2006

shell idiom

this is a bit of unix shell technique that i haven't seen mentioned much. there are some really good lists of perl one-liners floating around, but there's also a lot you can do with the shell alone. this particular command solves a common problem: finding every file that contains a match for a regular expression, and displaying each file name along with the matching lines.

it uses find to select the files that match some criteria, then searches each one for the regular expression with grep. the intuitive approaches, piping each file's contents into grep regexp or passing grep a single file as its argument, won't work: when grep reads standard input or just one file, it prints only the matching lines, so we can't tell which file they came from.
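to see the problem concretely, here's a small sketch that makes two throwaway files (the file names and the pattern "hello" are made up for the example) and runs grep the two intuitive ways. both print "hello world" and "say hello" with no hint of which file each line came from:

```shell
#!/bin/sh
# scratch directory; the file names and the pattern "hello" are illustrative only
dir=${TMPDIR:-/tmp}/grepdemo.$$
mkdir "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT
printf 'hello world\n' > "$dir/a.txt"
printf 'say hello\n' > "$dir/b.txt"

# one file per grep, as "-exec grep regexp {} \;" would do: no file names
for f in "$dir/a.txt" "$dir/b.txt"; do
    grep 'hello' "$f"
done

# piping the contents in loses the names just the same
cat "$dir/a.txt" "$dir/b.txt" | grep 'hello'
```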

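one solution would be to use xargs, which reads arguments on stdin and runs a command with as many of them as will fit on one command line, running it again for the rest. xargs is a handy tool for many tasks, but by default it splits its input on whitespace and gives quotes a special meaning, so file names containing spaces or quotes get mangled, and if a batch happens to hold just one file, grep won't print its name. my preferred one-line command is this one, however: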

find path -type f -exec grep "regexp" {} /dev/null \;

this works because grep prefixes each matching line with the file name whenever it is given more than one file to search. /dev/null is always empty, so it never matches anything itself, but adding it as an extra argument means grep always sees at least two files. for example:

$ find ~/public_html/ -type f -exec grep "^<title" {} /dev/null \;
~/public_html/index.htm: <title>index page</title>
~/public_html/test.htm: <head><title>testing</title></head>
Binary file ~/public_html/scripts/statcgi matches
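here's a self-contained sketch to try the one-liner on. it builds a scratch directory with two made-up files, one of which has a space in its name, and runs three variants against it: the one-liner itself; find's "-exec ... {} +" form, which POSIX specifies and which, like xargs, hands many files to each grep (older versions of find may not support it); and an xargs version using -print0 and -0, which are GNU and BSD extensions rather than POSIX. all three should print the same single line: the scratch directory's path followed by "/my index.htm:<title>index page</title>".

```shell
#!/bin/sh
# scratch directory with two made-up html files; only one has a <title> line,
# and its name contains a space on purpose
dir=${TMPDIR:-/tmp}/finddemo.$$
mkdir "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT
printf '<title>index page</title>\n' > "$dir/my index.htm"
printf '<p>no title here</p>\n' > "$dir/other.htm"

# 1. the one-liner: one grep per file, /dev/null forces the file name
find "$dir" -type f -exec grep '^<title' {} /dev/null \;

# 2. POSIX batched form: many files per grep; {} must come just before +,
#    so /dev/null moves in front of it
find "$dir" -type f -exec grep '^<title' /dev/null {} +

# 3. xargs with NUL-separated names (-print0/-0 are GNU/BSD extensions),
#    so the space in "my index.htm" doesn't split it into two arguments
find "$dir" -type f -print0 | xargs -0 grep '^<title' /dev/null
```

the batched forms start far fewer grep processes, which matters on big trees; the "\;" form is the one most likely to work on any find you meet.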

3 comments:

Anonymous said...

I love the find command too!

To force filenames to be output, I'd personally add the -H switch to grep, which is more obvious than adding /dev/null as an argument.
Like so:


find path -type f -exec grep -H "regexp" {} /dev/null \;


The 'tr' command is quite cute sometimes.
Example:


cat someFile.txt | tr -d '\012' | tr -d '\015'


... to output a file with all its line breaks removed ('\012' is a line feed, '\015' a carriage return).

Or, to convert a Windows-format ASCII file (CR LF line endings) into *nix format (LF only):


cat someFile.txt | tr -d '\015'
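A quick way to see the difference between the two commands is to feed tr a made-up two-line string with Windows CR LF line endings (written as \015\012 in octal). The sample text and the helper name "sample" are just for illustration:

```shell
#!/bin/sh
# two lines of sample text, each ending in CR LF (octal \015 \012)
sample() { printf 'one\015\012two\015\012'; }

# strip only CRs: leaves ordinary LF line endings (prints "one", then "two")
sample | tr -d '\015'

# strip CRs and LFs: everything runs together (prints "onetwo", no newline)
sample | tr -d '\012' | tr -d '\015'
```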

Anonymous said...

Oops, sorry, that last comment was me. And my 'find' command snippet should have read:

find path -type f -exec grep -H "regexp" {} \;

(i.e. no /dev/null in it)

grkvlt said...

yes, but remember that the -H option is not always available. it isn't part of the XPG4 or POSIX grep standards, and solaris 10's grep, for instance, doesn't support it. when writing shell scripts, portability is usually a big requirement, and you can't be sure GNU grep will have been installed. that's also why /bin/sh should be the interpreter for your scripts, in case bash isn't there, and so on...
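if you'd still like to use -H where it's available, one option is to probe for it once near the top of a /bin/sh script and fall back to the /dev/null trick otherwise. this is only a sketch: grep_names is a made-up helper name, and since find -exec can't call a shell function, it only helps for grep calls made directly from the script.

```shell
#!/bin/sh
# probe: does this grep accept -H? an unsupported option makes grep fail
if echo probe | grep -H probe >/dev/null 2>&1; then
    # hypothetical helper: pattern first, then one or more files
    grep_names() { grep -H "$@"; }
else
    # portable fallback: /dev/null turns it into a multi-file search
    grep_names() { grep "$@" /dev/null; }
fi

# usage: grep_names 'regexp' somefile.txt
```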

btw, you can do that carriage-return-and-newline trick with a single tr:

cat file.txt | tr -d '\012\015'

tr takes a whole set of characters to delete or translate, and the set can be written as a range, so to change everything to lowercase, do:

cat upper.txt | tr '[A-Z]' '[a-z]'
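for what it's worth, POSIX tr also accepts character classes, which avoid depending on how a range like A-Z is interpreted in a given locale. a quick comparison (the sample text is made up):

```shell
#!/bin/sh
# bracketed ranges as above: '[' and ']' just map to themselves, and older
# System V versions of tr expect ranges to be written this way
printf 'Hello WORLD\n' | tr '[A-Z]' '[a-z]'          # prints "hello world"

# character classes: the same mapping, expressed by class instead of range
printf 'Hello WORLD\n' | tr '[:upper:]' '[:lower:]'  # prints "hello world"
```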

and there's also the rot13 one-liner, which i'm sure you can work out for yourself...
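in case you'd rather not work it out, here's one way to write it, as a sketch: the function name rot13 is just a made-up label, and the unbracketed ranges assume a POSIX-style tr (older System V tr would want them in brackets, like the lowercase example).

```shell
#!/bin/sh
# rot13: map A-M to N-Z and N-Z to A-M (same for lowercase); other characters pass through
rot13() { tr 'A-Za-z' 'N-ZA-Mn-za-m'; }

printf 'Hello, World\n' | rot13          # prints "Uryyb, Jbeyq"
printf 'Hello, World\n' | rot13 | rot13  # prints "Hello, World" again
```

because the alphabet has 26 letters, rotating by 13 twice gets you back where you started, so the same command both encodes and decodes.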