jb… a weblog by Jonathan Buys

Parsing iostat Results

In the course of load testing a new system, we gathered the output from iostat from a group of servers. In addition to parsing through the device statistics, we thought it would be handy to graph the CPU stats as well. We set iostat to run every five seconds and captured the output in a text file, one per server. This gave me a sizable pool of data, but with everything I needed on separate lines.

Time: 18:00:01
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.79    0.03    0.48    0.06    0.00   98.64

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               2.22         0.20        38.52    1948442  373500872
sda1              0.00         0.00         0.00      16224       1848
sda2              2.22         0.20        38.52    1931938  373499024

I gathered all the text files into a single directory, and ran a loop in zsh to create a csv file, ready for easy manipulation in any number of programs.

for each in `ls`
egrep -A 1 '(Time)|(avg-cpu)' $each |sed s/Time\:\ //g | grep -v '\-\-\|avg-cpu' | paste -s -d '\ \n' - - | sed 's/\( \)\{1,\}/,/g' > $each.csv

I always start building long lines like this one section at a time. Each section is separated by a pipe, (|), so the first section in the loop calls egrep.

egrep -A 1 '(Time)|(avg-cpu)' $each

This command alone give us output that looks like this:

Time: 00:03:39
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.20    0.00    0.10    0.00    0.00   99.70
--
Time: 00:03:44
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    0.20    0.00    0.00   99.70

The “-A” flag on the egrep command tells egrep to get both the matching line in the text and the line directly below it. The section between the single tics, searches for either “Time” or “avg-cpu”. This gives me the time and the CPU statistics I’m interested in. Next, I pipe this output to the next section, a sed command that strips out the “Time” string, so our command now looks like:

egrep -A 1 '(Time)|(avg-cpu)' iostat.ba-rec1  |sed s/Time\:\ //g 

And gives us output like:

00:04:14
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.10    0.00    0.00   99.90
--
00:04:19
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.40    0.00    0.30    0.00    0.00   99.30

The next section uses grep with the “-v” flag, which tells grep to reverse it’s normal behavior and only return the strings that do not match the search text.

egrep -A 1 '(Time)|(avg-cpu)' iostat.ba-rec1  |sed s/Time\:\ //g | grep -v '\-\-\|avg-cpu'

The grep command is looking for either “avg-cpu” or two dashes and removing both lines, giving us:

00:04:14
           0.00    0.00    0.10    0.00    0.00   99.90
00:04:19
           0.40    0.00    0.30    0.00    0.00   99.30

This is looking much better, but the lines of text are offset from what I need them to be. A post I read recently by Dr. Drang reminded me of paste, which is perfect for this job:

egrep -A 1 '(Time)|(avg-cpu)' iostat.ba-rec1  |sed s/Time\:\ //g | grep -v '\-\-\|avg-cpu' | paste -s -d '\ \n' - - 

Which puts everything on the same line, nice and clean:

00:04:14            0.00    0.00    0.10    0.00    0.00   99.90
00:04:19            0.40    0.00    0.30    0.00    0.00   99.30

Only thing left now is to clean up the excess spaces a bit and separate each column by commas, another job for sed, which is only slightly modified from this post on StackOverflow, which leads us to the final command:

egrep -A 1 '(Time)|(avg-cpu)' iostat.ba-rec1  |sed s/Time\:\ //g | grep -v '\-\-\|avg-cpu' | paste -s -d '\ \n' - - | sed 's/\( \)\{1,\}/,/g'

Which I redirect the output of to a text file, one per server, just like the input from the loop. This, finally, gives us the csv output we were looking for:

00:04:14,0.00,0.00,0.10,0.00,0.00,99.90
00:04:19,0.40,0.00,0.30,0.00,0.00,99.30

Now it is ready.

linux sysadmin statistics