Monday, February 16, 2009

Bash Tricks I: (very) Repetitive tasks


I'm starting a series (I don't know yet how many posts it will have) where I share with my loyal readers (in mathematical terms, that's an empty set) some handy tricks I've found while working with bash. Some of the tricks probably aren't the most efficient way to get something done... but I can attest that, at the very least, they do work.

So, here we go.

Repetitive tasks
Sometimes you want to repeat a task a number of times. For example, right now I want to find out which is faster in PHP: using variables or defining constants.

I have two scripts; each one defines a constant or a variable (depending on the script) and writes its value to stdout. Let's say I want to run the variable.php script 1000 times. What I do is:

i=0; while [ $i -lt 1000 ]; do php variable.php > /dev/null; i=$(( $i + 1 )); done

But what does all that stuff mean? Let's decompose it:
i=0 We create a variable called i with an initial value of 0. Tip: when assigning to a variable, don't put a $ in front of its name and don't put spaces around the = sign.
while [ $i -lt 1000 ]; do This is familiar ground for a programmer. We are telling bash to repeat the commands that follow, up to the closing done. Before each cycle, while tests whether the condition between the [ ] is true. $i -lt 1000 compares the value in i with 1000; -lt means "less than". More operators are available (greater than, equal, less or equal, greater or equal, and so on); check the man page of test to see what you can use as the condition.
php variable.php > /dev/null We are executing the script I created and sending its output to /dev/null so that I don't get to see it (I couldn't care less about it, as I already know what will show up).
i=$(( $i + 1)) Here we increment the value of the variable i. $(( )) is the shell's arithmetic expansion: the expression inside is evaluated and replaced by its result. As with the initial assignment of 0 to i, remember not to leave spaces around the = and to skip the $ on the variable being assigned.
done We are telling bash to close the while.
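
The counting loop explained above can also be wrapped in a small function, so the bookkeeping is written only once. This is just a sketch: the function name repeat and the stand-in command bump are made up for illustration and are not part of bash; in my case the command would be php variable.php.

```bash
#!/usr/bin/env bash
# repeat N CMD [ARGS...]: run CMD N times.
# "repeat" is a hypothetical helper name, not a bash builtin.
repeat() {
    local n=$1 i=0
    shift                      # drop N, so "$@" is now the command to run
    while [ "$i" -lt "$n" ]; do
        "$@"
        i=$(( i + 1 ))
    done
}

# Stand-in command so the sketch runs anywhere (hypothetical):
count=0
bump() { count=$(( count + 1 )); }

repeat 1000 bump
echo "$count"                  # prints 1000

# With the scripts from this post it would be:
#   repeat 1000 php variable.php > /dev/null
```

Quoting "$i" and "$n" inside the [ ] keeps the test from breaking if one of them ever ends up empty.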

Now, let's time the execution of both scripts (variable vs constant):

echo Variable; time ( i=0; while [ $i -lt 1000 ]; do php variable.php > /dev/null; i=$(( $i + 1)); done ); echo Constant; time ( i=0; while [ $i -lt 1000 ]; do php constant.php > /dev/null; i=$(( $i + 1 )); done )

real 0m40.515s
user 0m22.217s
sys 0m13.109s

real 0m39.409s
user 0m22.557s
sys 0m13.277s

As you can see, it's almost the same (40.515 s of real time for the variable script vs 39.409 s for the constant one). I will do some more PHP tests that will lead to a spin-off of this article... but that will arrive tomorrow and it's not related to bash, so let's go on with another trick.
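
One aside before the next trick: the timing one-liner repeats the same loop twice, so it could be folded into a function. Again, this is only a sketch under my own assumptions: bench is a hypothetical name, and the stand-in command true is there only so the example runs on a machine without PHP.

```bash
#!/usr/bin/env bash
# bench LABEL N CMD [ARGS...]: print LABEL, then time N runs of CMD.
# "bench" is a hypothetical helper name, not part of bash.
bench() {
    local label=$1 n=$2 i
    shift 2                    # what remains in "$@" is the command
    echo "$label"
    # "time" is a bash keyword, so it can time a whole { ...; } group.
    # Its report is written to stderr.
    time {
        for (( i = 0; i < n; i++ )); do
            "$@" > /dev/null
        done
    }
}

# Stand-in run; with the scripts from this post it would be:
#   bench Variable 1000 php variable.php
#   bench Constant 1000 php constant.php
bench Demo 100 true
```

Keep in mind that each run starts a new php process, so a good part of the measured time is probably process startup rather than the variable or constant itself.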

Another kind of repetitive task you could find yourself doing (especially when programming) is replacing one string with another... and the substitution could span several files.

Say that you need to change the string "mysql_" to "mydb_" (if you are thinking that I did it to turn some mysql calls into database-agnostic calls in a PHP project, let me say that you might be right). Now, any IDE worth its salt would do it on the fly, but that doesn't mean that we can't do it with bash. I know that sed can replace patterns in a stream, so how can we apply it to several files? First, let's see how many lines contain the pattern in the files under this directory (note that -i makes grep ignore case, while the sed command further down is case-sensitive):
find ./ -type f -exec grep -Hni mysql_ {} ';' | wc -l

Now, let's run the substitution command:

find ./ -type f | while read filename; do sed 's/mysql_/mydb_/' $filename > tmp.php; mv tmp.php $filename; done

What did we tell bash to do there? Let's decompose it again:
find ./ -type f We are asking find to list only regular files (so that we don't get the ./ directory, or any other directory, in the list of files to work on).
Then we have a pipe that connects the stdout of find with the rest of the command.
while read filename; do Instead of evaluating a test, we are asking while to keep iterating until read can't read anything more from its standard input. read takes one line from its standard input (in other words, the file names that come from find, one at a time) and stores it in the variable filename.
sed 's/mysql_/mydb_/' $filename > tmp.php Here's the tricky part. Used this way, sed doesn't edit the file in place and save it like a normal editor would. Instead, we use it as a filter that reads from the file (using the variable as the file name) and writes its output to a temporary file (with a fixed name).
mv tmp.php $filename Here we overwrite the original file with the modified file.
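
One caveat about the loop just described: $filename is unquoted, so a file name containing spaces gets split into several words, and the fixed tmp.php lives in the same tree that find is walking. A more defensive variant could look like the sketch below. It assumes mktemp is available (it is on the usual Linux and BSD systems), and the directory and file name are made up for the demo.

```bash
#!/usr/bin/env bash
# Throwaway directory with one awkwardly named file (names are made up).
workdir=$(mktemp -d)
printf 'mysql_connect();\n' > "$workdir/my file.php"

# IFS= and -r make read keep leading spaces and backslashes untouched.
find "$workdir" -type f | while IFS= read -r filename; do
    tmpfile=$(mktemp)          # unique temp file, outside the searched tree
    # Overwrite the original only if sed succeeded.
    sed 's/mysql_/mydb_/' "$filename" > "$tmpfile" &&
        cat "$tmpfile" > "$filename"
    rm -f "$tmpfile"
done

result=$(cat "$workdir/my file.php")
echo "$result"                 # prints mydb_connect();
rm -rf "$workdir"
```

Writing back with cat instead of mv keeps the original file's permissions. File names that contain newlines would still confuse this loop.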
And that's the end of the trick. Let's see if we have left any string out:

find ./ -type f -exec grep -Hni mysql_ {} ';' | wc -l

Oops! It seems we made a mistake.
As a matter of fact, we didn't. It's just that sed, as we called it, only replaces the first occurrence of the pattern on each line. There was a line where the pattern appeared twice, so we could either go to that file and change it by hand, or simply run the one-liner once more and let it make the change for us.
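
The once-per-line behaviour is easy to see on a single line of text. The sketch below uses a made-up sample line and compares the plain s command with the same command plus the g flag, which replaces every occurrence on the line. It also shows that grep piped to wc -l counts matching lines, not occurrences; grep -o is assumed to be available (GNU and BSD grep both have it).

```bash
#!/usr/bin/env bash
line='mysql_connect(); mysql_query();'   # made-up sample line

first_only=$(printf '%s\n' "$line" | sed 's/mysql_/mydb_/')
all=$(printf '%s\n' "$line" | sed 's/mysql_/mydb_/g')

echo "$first_only"   # prints mydb_connect(); mysql_query();
echo "$all"          # prints mydb_connect(); mydb_query();

# grep -c counts matching lines; grep -o prints every match on its own
# line, so piping it to wc -l counts occurrences instead.
matching_lines=$(printf '%s\n' "$line" | grep -c 'mysql_')
occurrences=$(( $(printf '%s\n' "$line" | grep -o 'mysql_' | wc -l) ))
echo "matching lines: $matching_lines, occurrences: $occurrences"
# prints matching lines: 1, occurrences: 2
```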

Well, that's it for the first article in the series. I hope you find it useful. I won't tell you when the second article will be out, as I currently don't have the slightest clue what I will write about in it... but I know there will be more, so stay tuned!

4 comments:

  1. Consider using the "seq" command as a shortcut - this eliminates the dependency on PHP.

    for a in `seq 10`; do echo hey $a; done

  2. Replace

    do sed 's/mysql_/mydb_/' $filename > tmp.php

    with

    do sed 's/mysql_/mydb_/g' $filename > tmp.php

    and replacements will be applied globally.

  3. Oh, Ian, thanks so much for that detail.

    And Shantanu, thanks for your reply. I didn't know about "seq" (or did I? :-))... and it is certainly helpful. I'll keep it at hand for other situations. However, I used PHP as I was testing the performance of variables vs constants there. It's not because I didn't know I can do an echo of a bash variable. :-) Thanks anyway.

  4. Yet another way to do it without all the looping:
    find . -type f | xargs sed -i 's/mysql_/mydb_/g'
    And remember: "xargs" is your friend ;-)