Perl script that can be used to calculate min, max, mean, mode, median and standard deviation for a set of log records

Perl posted 5 days ago by christian

The best thing about this script is that it is easy to customize. Right now it expects semicolon-delimited data with one record per line, in the order timestamp;page;runtime.

    use strict;
    use warnings;

    # Import min, max, mean, mode, median, stddev and other statistical functions.
    # stats.pl is a copy of http://search.cpan.org/~brianl/Statistics-Lite-3.2/Lite.pm
    # The explicit "./" matters: since Perl 5.26 the current directory is no
    # longer searched by default. If Statistics::Lite is installed from CPAN,
    # "use Statistics::Lite qw(:all);" can replace the do() call.
    do('./stats.pl');

    my %page_runtimes;
    my $delimiter = ';';
    my @header = ("page", "samples", "min", "max", "mean", "mode", "median", "stddev");
    my ($first_timestamp, $last_timestamp);

    # ==========================================
    # Parse log file
    # ==========================================
    # while reads one line at a time; foreach (<>) would load all input first.
    while (my $line = <>) {
      # remove the newline from $line, otherwise the report will be corrupted.
      chomp($line);

      # record layout: timestamp;page;runtime
      my ($timestamp, $page_name, $page_runtime) = split(/\Q$delimiter\E/, $line);

      # skip blank or incomplete lines
      next unless defined($page_runtime);

      if(!defined($first_timestamp))
      {
        $first_timestamp = $timestamp;
      }

      # print each page the first time we see it
      # (defined(@array) is a fatal error in modern Perl, so test the hash key)
      if(!exists($page_runtimes{$page_name}))
      {
        print "Found page '$page_name'\n";
      }

      # collect the runtimes of each page in one hash of arrays
      push(@{$page_runtimes{$page_name}}, $page_runtime);

      $last_timestamp = $timestamp;
    }

    die("No input records found.\n") unless defined($first_timestamp);

    # ==========================================
    # Calculate and print page statistics
    # ==========================================
    open(my $page_report, '>', 'report.csv') or die("Could not open report.csv: $!");

    print $page_report "First sample\n".$first_timestamp."\nLast sample\n".$last_timestamp."\n\n";
    print $page_report join($delimiter, @header)."\n";

    for my $page_name (keys %page_runtimes )
    {
      my @runtimes = @{$page_runtimes{$page_name}};

      my $samples = @runtimes;
      my $min     = min(@runtimes);
      my $max     = max(@runtimes);
      my $mean    = mean(@runtimes);
      my $mode    = mode(@runtimes);
      my $median  = median(@runtimes);
      my $stddev  = stddev(@runtimes);

      my @stats = ($samples, $min, $max, $mean, $mode, $median, $stddev);

      # Use a decimal comma instead of a decimal point, but only in the
      # numbers, so that dots in page names are left alone
      s/\./,/ for @stats;

      print $page_report join($delimiter, $page_name, @stats)."\n";
    }
    close($page_report);

To use it, simply pipe some data into it like this:

    grep "2008-31-12" silly-data.log | perl analyze.pl
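
To make the input layout and the reported numbers concrete, here is a minimal, self-contained sketch that does not depend on stats.pl. The sample records and the helper names median_of and stddev_of are made up for illustration, and the standard deviation shown is the sample form (dividing by n - 1), which is an assumption about what the report is meant to contain.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Hypothetical sample records in the layout timestamp;page;runtime
my @records = (
  '2008-31-12 10:00:01;index;120',
  '2008-31-12 10:00:02;search;300',
  '2008-31-12 10:00:03;index;80',
  '2008-31-12 10:00:04;index;100',
);

# Group the runtimes by page name
my %page_runtimes;
for my $record (@records) {
  my ($timestamp, $page_name, $page_runtime) = split(/;/, $record);
  push(@{$page_runtimes{$page_name}}, $page_runtime);
}

# Median: the middle value of the sorted list, or the mean of the two middle values
sub median_of {
  my @sorted = sort { $a <=> $b } @_;
  my $mid = int(@sorted / 2);
  return @sorted % 2 ? $sorted[$mid] : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

# Sample standard deviation (divides by n - 1); undef for fewer than two values
sub stddev_of {
  my @values = @_;
  return undef if @values < 2;
  my $mean = sum(@values) / @values;
  my $sum_sq = 0;
  $sum_sq += ($_ - $mean) ** 2 for @values;
  return sqrt($sum_sq / (@values - 1));
}

# One output line per page: page;samples;min;max;mean;median;stddev
for my $page_name (sort keys %page_runtimes) {
  my @runtimes = @{$page_runtimes{$page_name}};
  my $stddev   = stddev_of(@runtimes);
  print join(';',
    $page_name, scalar(@runtimes), min(@runtimes), max(@runtimes),
    sum(@runtimes) / @runtimes, median_of(@runtimes),
    defined($stddev) ? $stddev : 'n/a'), "\n";
}
```

For the sample data this prints `index;3;80;120;100;100;20` and `search;1;300;300;300;300;n/a`.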

Tagged csv, perl, min, max, mean, log, parser

How to generate a histogram with Perl

Perl posted about 1 year ago by christian

I couldn’t find a histogram library for Perl, so I had to write my own.

Save the following code in histogram.pl:

    use POSIX qw(ceil floor);

    # No bugs, please
    use strict;
    use warnings;

    # Perl has no built-in round, so let's implement it
    # (rounds half away from zero)
    sub round
    {
        my($number) = shift;
        return int($number + .5 * ($number <=> 0));
    }

    sub histogram
    {
      my ($bin_width, @list) = @_;

      # Nothing to draw for an empty data set
      return unless @list;

      # This calculates the frequencies for all available bins in the data set.
      # The hash key is the bin index; the formula assumes integer data.
      my %histogram;
      $histogram{ceil(($_ + 1) / $bin_width) - 1}++ for @list;

      my $max;
      my $min;

      # Find the lowest and highest bin index in use
      for my $key (keys %histogram)
      {
        $max = $key if !defined($max) || $key > $max;
        $min = $key if !defined($min) || $key < $min;
      }

      # Print one row per bin, including empty bins between min and max
      for (my $i = $min; $i <= $max; $i++)
      {
        my $bin       = sprintf("% 10d", ($i) * $bin_width);
        my $frequency = $histogram{$i} || 0;

        $frequency = "#" x $frequency;

        print $bin." ".$frequency."\n";
      }

      print "===============================\n\n";
      print "    Width: ".$bin_width."\n";
      print "    Range: ".$min."-".$max."\n\n";
    }

To generate a histogram for a set of data, load histogram.pl and call the histogram subroutine with the desired bin width followed by the data set as a list:

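    # The explicit "./" is needed on Perl 5.26 and later, where the current
    # directory is no longer searched by default
    do('./histogram.pl');

    histogram(10, (1,2,3,4,5,10,11,12,20,21,30));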

The output of the above example is:

             0 #####
            10 ###
            20 ##
            30 #
    ===============================

        Width: 10
        Range: 0-3

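The generated histogram tells us that there are 5 numbers between 0 and 9, 3 between 10 and 19, 2 between 20 and 29, and 1 between 30 and 39. Each label is the lower bound of its bin, and the Range line reports bin indices (0 to 3), not data values.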
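
How a value ends up in a bin is easier to see in isolation. The sketch below wraps the expression from histogram.pl in a helper and compares it with plain floor division. The names bin_index_ceil and bin_index_floor are hypothetical, and the floor variant is a suggested alternative, not part of the original script. For integer data the two agree; for fractional values only the floor form keeps a value such as 9.5 in the 0-9 bin.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

# Bin index as computed in histogram.pl (intended for integer data)
sub bin_index_ceil {
  my ($value, $bin_width) = @_;
  return ceil(($value + 1) / $bin_width) - 1;
}

# Alternative (an assumption, not the original formula): floor division,
# which also handles fractional values
sub bin_index_floor {
  my ($value, $bin_width) = @_;
  return floor($value / $bin_width);
}

# Compare both forms for a bin width of 10
for my $value (9, 10, 19, 20, 9.5) {
  printf("%4s -> ceil form: %d, floor form: %d\n",
    $value, bin_index_ceil($value, 10), bin_index_floor($value, 10));
}
```

The last row shows the difference: 9.5 lands in bin 1 with the ceil form and in bin 0 with the floor form.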

Tagged histogram, perl

How to pipe input to a Perl script

Perl posted about 1 year ago by christian

Let’s say you want to pipe some input to a Perl script. First, you create this Perl script (pipe_me.pl):

    while (<>)
    {
      print $_;
    }

Then you call the script like this:

    less access.log | perl pipe_me.pl

The script prints the contents of access.log. To do some real work, extend it with your own code; you might, for example, want to analyze an Apache access log.
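
As a rough idea of what such an extension could look like, the sketch below counts requests per HTTP status code. It assumes the log uses the Common Log Format, where the three-digit status follows the quoted request line; the subroutine name count_status_codes and the sample lines are made up for illustration. The sample is read through an in-memory filehandle so the sketch runs on its own.

```perl
use strict;
use warnings;

# Count requests per HTTP status code, reading one line at a time.
# Assumes Common Log Format: ... "GET /path HTTP/1.1" 200 2326
sub count_status_codes {
  my ($fh) = @_;
  my %count;
  while (my $line = <$fh>) {
    $count{$1}++ if $line =~ /" (\d{3}) /;
  }
  return %count;
}

# Hypothetical sample data so the sketch is self-contained
my $sample = <<'END_LOG';
127.0.0.1 - - [31/Dec/2008:10:00:01 +0100] "GET /index.html HTTP/1.1" 200 2326
127.0.0.1 - - [31/Dec/2008:10:00:02 +0100] "GET /missing HTTP/1.1" 404 209
127.0.0.1 - - [31/Dec/2008:10:00:03 +0100] "GET /search HTTP/1.1" 200 512
END_LOG

# Open the string as a filehandle and count the status codes in it
open(my $fh, '<', \$sample) or die("Could not open in-memory sample: $!");
my %count = count_status_codes($fh);
close($fh);

print "$_ $count{$_}\n" for sort keys %count;
```

For the sample lines this prints `200 2` and `404 1`. To use it on piped input, pass `\*STDIN` to count_status_codes instead of the in-memory filehandle.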

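You can also loop over the input with foreach and a named variable. Note that foreach reads all of the input into memory before the first iteration, whereas the while loop above reads one line at a time:

    foreach my $line (<>)
    {
      print $line;
    }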

Tagged pipe, perl