Perl script that can be used to calculate min, max, mean, mode, median and standard deviation for a set of log records
The best thing about this script is that it’s easy to customize. Right now it expects semicolon-delimited data with a timestamp, a page name and a runtime on each line.
    use strict;
    use warnings;

    # Import stddev, mean, median and other statistical functions.
    # stats.pl is a copy of http://search.cpan.org/~brianl/Statistics-Lite-3.2/Lite.pm
    # The explicit ./ is needed on Perl 5.26+, where "." is no longer in @INC.
    do('./stats.pl');

    my %page_runtimes;
    my $delimiter = ';';
    my @columns   = ("page", "samples", "min", "max", "mean", "mode", "median", "stddev");
    my ($first_timestamp, $last_timestamp);

    # ==========================================
    # Parse log file
    # ==========================================
    while (my $line = <>) {
        # remove the newline from $line, otherwise the report will be corrupted.
        chomp($line);

        my ($timestamp, $page_name, $page_runtime) = split(/\Q$delimiter\E/, $line);

        # skip blank or incomplete lines
        next unless defined($page_runtime);

        if(!defined($first_timestamp))
        {
            $first_timestamp = $timestamp;
        }

        # print what we find (defined(@array) is a fatal error in modern Perl,
        # so check whether the hash key exists instead)
        if(!exists($page_runtimes{$page_name}))
        {
            print "Found page '$page_name'\n";
        }

        # add page runtimes to one hash
        push(@{$page_runtimes{$page_name}}, $page_runtime);

        $last_timestamp = $timestamp;
    }

    # ==========================================
    # Calculate and print page statistics
    # ==========================================
    open(my $report, '>', 'report.csv') or die("Could not open report.csv: $!");

    print $report "First sample\n".$first_timestamp."\nLast sample\n".$last_timestamp."\n\n";
    print $report join($delimiter, @columns)."\n";

    for my $page_name (sort keys %page_runtimes)
    {
        my @runtimes = @{$page_runtimes{$page_name}};

        my $samples = @runtimes;
        my $min     = min(@runtimes);
        my $max     = max(@runtimes);
        my $mean    = mean(@runtimes);
        my $mode    = mode(@runtimes);
        my $median  = median(@runtimes);
        my $stddev  = stddev(@runtimes);

        my @data = ($samples, $min, $max, $mean, $mode, $median, $stddev);

        # Use a decimal comma instead of a decimal point, in the numbers only,
        # so that page names such as index.php are left alone
        s/\./,/ for @data;

        print $report join($delimiter, $page_name, @data)."\n";
    }

    close($report);
To use it, save it as analyze.pl and pipe some data into it like this:

    grep "2008-31-12" silly-data.log | perl analyze.pl
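The script leans on stats.pl for the actual statistics. As a rough sketch of what three of those functions compute, here is a plain-Perl version of mean, median and standard deviation. The sketch_* names are hypothetical and are not the Statistics::Lite functions themselves, and the standard deviation shown assumes the sample form that divides by n - 1.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper names; these are not Statistics::Lite's own functions.

# Arithmetic mean of a non-empty list.
sub sketch_mean {
    my @values = @_;
    return sum(@values) / @values;
}

# Median: the middle value of the sorted list, or the average of the two
# middle values when the list has an even number of elements.
sub sketch_median {
    my @sorted = sort { $a <=> $b } @_;
    my $mid    = int(@sorted / 2);
    return @sorted % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

# Sample standard deviation (divides by n - 1).
# Returns 0 when there are fewer than two values.
sub sketch_stddev {
    my @values = @_;
    return 0 if @values < 2;
    my $mean   = sketch_mean(@values);
    my $sum_sq = sum(map { ($_ - $mean) ** 2 } @values);
    return sqrt($sum_sq / (@values - 1));
}

# Made-up runtimes, for illustration only.
my @runtimes = (120, 80, 100, 140, 60);
printf "mean=%.1f median=%.1f stddev=%.2f\n",
    sketch_mean(@runtimes), sketch_median(@runtimes), sketch_stddev(@runtimes);
# prints: mean=100.0 median=100.0 stddev=31.62
```

Comparing the output of helpers like these against report.csv for a small, hand-checked data set is one way to convince yourself that the delimiter and column order are set up correctly.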
How to generate a histogram with Perl
I couldn’t find a histogram library for Perl, so I had to write my own.
Save the following code in histogram.pl:
    use POSIX qw(floor);

    # No bugs, please
    use strict;
    use warnings;

    # Perl doesn't have round, so let's implement it
    sub round
    {
        my ($number) = @_;
        return int($number + .5 * ($number <=> 0));
    }

    sub histogram
    {
        my ($bin_width, @list) = @_;

        # Nothing to draw for an empty data set
        return unless @list;

        # This calculates the frequencies for all available bins in the data set.
        # floor() gives the bin number, and also handles non-integer values.
        my %histogram;
        $histogram{floor($_ / $bin_width)}++ for @list;

        # Calculate the lowest and highest bin numbers
        my ($min, $max);
        for my $key (keys %histogram)
        {
            $max = $key if !defined($max) || $key > $max;
            $min = $key if !defined($min) || $key < $min;
        }

        # Print one row per bin, including empty bins between min and max
        for (my $i = $min; $i <= $max; $i++)
        {
            my $bin       = sprintf("% 10d", $i * $bin_width);
            my $frequency = $histogram{$i} || 0;

            $frequency = "#" x $frequency;

            print $bin." ".$frequency."\n";
        }

        print "===============================\n\n";
        print " Width: ".$bin_width."\n";
        print " Range: ".$min."-".$max."\n\n";
    }

    # Return a true value so that the file loads cleanly with do or require
    1;
To generate a histogram for a data set, load histogram.pl and call the histogram subroutine with the desired bin width followed by the list of values:

    do('./histogram.pl');

    histogram(10, (1,2,3,4,5,10,11,12,20,21,30));
The output of the above example is:
             0 #####
            10 ###
            20 ##
            30 #

    ===============================

     Width: 10
     Range: 0-3
The generated histogram tells us that there are 5 numbers between 0-9, 3 between 10-19, 2 between 20-29 and 1 between 30-39. Note that the range line shows bin numbers (0 to 3), not data values.
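If you want the counts as data rather than as printed bars, the same binning idea can be sketched as a subroutine that returns a hash. The name bin_counts is hypothetical and not part of histogram.pl; it keys each bin by its lower bound and uses floor to pick the bin.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: returns a hash that maps each bin's lower bound
# to the number of values that fall into that bin.
sub bin_counts {
    my ($bin_width, @list) = @_;
    my %counts;
    $counts{ floor($_ / $bin_width) * $bin_width }++ for @list;
    return %counts;
}

my $width  = 10;
my %counts = bin_counts($width, 1, 2, 3, 4, 5, 10, 11, 12, 20, 21, 30);

# Print the bins in ascending order, labelled for integer data.
for my $lower (sort { $a <=> $b } keys %counts) {
    printf "%d-%d: %d\n", $lower, $lower + $width - 1, $counts{$lower};
}
# prints:
# 0-9: 5
# 10-19: 3
# 20-29: 2
# 30-39: 1
```

Unlike the printing version, this sketch only reports bins that contain at least one value, so empty bins in the middle of the range are simply absent from the hash.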
How to pipe input to a Perl script
Let’s say you want to pipe some input to a Perl script. First, you create this Perl script (pipe_me.pl):
    while (<>)
    {
        print $_;
    }
Then you call the script like this:
    less access.log | perl pipe_me.pl
The script outputs the contents of access.log. To do some real work, extend it with your own code. You might, for example, want to analyze an Apache access log.
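You can also loop over the input with foreach and a named variable. Keep in mind that foreach reads all of the input into memory before the first iteration, whereas while (<>) reads one line at a time, so stick with while for large logs. The foreach version looks like this:

    foreach my $line (<>)
    {
        print $line;
    }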
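As one example of the kind of real work mentioned above, here is a minimal sketch that tallies HTTP status codes. It assumes the lines follow the Apache Common Log Format; the count_statuses name and the sample lines are made up for illustration, and in a piped script the lines would come from <> instead.

```perl
use strict;
use warnings;

# Hypothetical helper: tallies HTTP status codes from lines that are
# assumed to follow the Common Log Format, for example:
#   127.0.0.1 - - [31/Dec/2008:10:00:00 +0100] "GET /index.html HTTP/1.1" 200 2326
sub count_statuses {
    my @lines = @_;
    my %status_counts;
    for my $line (@lines) {
        # The status code is the three-digit number after the quoted request.
        # Lines that do not match are skipped.
        next unless $line =~ /"\s+(\d{3})\s/;
        $status_counts{$1}++;
    }
    return %status_counts;
}

# Made-up sample lines; a real script would read them from <>.
my @sample = (
    '127.0.0.1 - - [31/Dec/2008:10:00:00 +0100] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [31/Dec/2008:10:00:01 +0100] "GET /missing.html HTTP/1.1" 404 209',
    '127.0.0.1 - - [31/Dec/2008:10:00:02 +0100] "GET /about.html HTTP/1.1" 200 1500',
);

my %counts = count_statuses(@sample);
print "$_: $counts{$_}\n" for sort keys %counts;
# prints:
# 200: 2
# 404: 1
```

For very large logs it would be better to update the hash inside a while (<>) loop rather than collecting all lines first, for the memory reason described above.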