perl

How to generate a histogram with Perl

Tagged histogram, perl  Languages perl

I couldn't find a histogram library for Perl, so I had to write my own.

Save the following code in histogram.pl:

use POSIX qw(ceil floor);

# No bugs, please
use strict;
use warnings;

# Perl doesn't have round, so let's implement it
sub round
{
    my($number) = shift;
    return int($number + .5 * ($number <=> 0));
}

sub histogram
{
  my ($bin_width, @list) = @_;

  # This calculates the frequencies for all available bins in the data set
  my %histogram;
  $histogram{ceil(($_ + 1) / $bin_width) -1}++ for @list;

  my $max;
  my $min;

  # Calculate min and max
  while ( my ($key, $value) = each(%histogram) )
  {
    $max = $key if !defined($min) || $key > $max;
    $min = $key if !defined($min) || $key < $min;
  }


  for (my $i = $min; $i <= $max; $i++)
  {
    my $bin       = sprintf("% 10d", ($i) * $bin_width);
    my $frequency = $histogram{$i} || 0;

    $frequency = "#" x $frequency;

    print $bin." ".$frequency."\n";
  }

  print "===============================\n\n";
  print "    Width: ".$bin_width."\n";
  print "    Range: ".$min."-".$max."\n\n";
}

To generate a histogram for a set of data include the histogram subroutine and pass the desired width of the bins to the routine and the dataset as an array:

do('histogram.pl');

histogram(10, (1,2,3,4,5,10,11,12,20,21,30));

The output of the above example is:

0  #####
10 ###
20 ##
30 #

===============================

Width: 10
Range: 0-3

The generated histogram tells us that there are: 5 numbers between 0-9, 3 between 10-19, 2 between 20-29, 1 between 30-39

Perl script that can be used to calculate min, max, mean, mode, median and standard deviation for a set of log records

Tagged csv, perl, min, max, mean, log, parser  Languages perl

The best thing about this script is that it's easy to customize, right now it's optimized for comma delimited data.

use strict;
use warnings;

# Import stdev, average, mean and other statistical functions
# A copy of http://search.cpan.org/~brianl/Statistics-Lite-3.2/Lite.pm
do('stats.pl');

my %page_runtimes;
my $delimitor = ';';
my @columns = ("page", "samples", "min", "max", "mean", "mode", "median", "stddev\n");
my $line;
my $first_timestamp, my $last_timestamp;

# ==========================================
# Parse log file
# ==========================================

#
# Don't use foreach as it reads the whole file into memory: foreach $line (<>) { 
#
while ($line=<>) {
  # remove the newline from $line, otherwise the report will be corrupted.
  chomp($line);

  my @columns               = split(';', $line);
  my $timestamp             = $columns[0];
  my $page_name             = $columns[1];
  my $page_runtime          = $columns[2];

  if(!defined($first_timestamp))
  {
    $first_timestamp = $timestamp;
  }

  # print what we find
  if(!defined(@{$page_runtimes{$page_name}}))
  {
    print "Found page '$page_name'\n";
  }
 
  # add page runtimes to one hash
  push(@{$page_runtimes{$page_name}}, $page_runtime);
 
  $last_timestamp = $timestamp;
}

# ==========================================
# Calculate and print page statistics
# ==========================================
open(PAGE_REPORT, ">report.csv") or die("Could not open report.csv.");

print PAGE_REPORT "First sample\n".$first_timestamp."\nLast sample\n".$last_timestamp."\n\n";
print PAGE_REPORT join($delimitor, @columns);

for my $page_name (keys %page_runtimes )
{
  my @runtimes = @{$page_runtimes{$page_name}};
 
  my $samples = @runtimes;
  my $min     = min(@runtimes);
  my $max     = max(@runtimes);
  my $mean    = mean(@runtimes);
  my $mode    = mode(@runtimes);
  my $median  = median(@runtimes);
  my $stddev  = stddev(@runtimes);
 
  my @data = ($page_name, $samples, $min, $max, $mean, $mode, $median, $stddev);
 
  my $line = join($delimitor, @data);
 
  # Use comma instead of decimal
  $line =~ s/\./\,/g;
 
  print PAGE_REPORT "$line\n";
}
close(PAGE_REPORT);

To use it simply pipe some data into it like this:

grep "2008-31-12" silly-data.log | perl analyze.pl

How to use the Perl DBI module

Tagged perl, dbi, mysql, example  Languages perl

Basic usage

use strict;
use DBI;
use DBD::mysql;

my $host = 'localhost';
my $database = 'xxx';
my $user = 'xxx';
my $password = '';

my $dsn = "dbi:mysql:$database:$host:3306";
my $db = DBI->connect($dsn, $user, $password);
 
my $sql = q(
  INSERT INTO 
    what (name, instructions) 
  VALUES (?, ?)
);

my $p = $db->prepare($sql);

my $result = $p->execute($name, $instructions);

print $result;

my $id = $dbh->{'mysql_insertid'};

One-liner for selecting one row

my $c = 'Horse';
my ($id, $instructions) = $db->selectrow_array("select id, instructions from categories where name = ?", undef, $c);

How to pipe input to a Perl script

Tagged pipe, perl  Languages perl

Let's say you want to pipe some input to a Perl script. First, you create this Perl script (pipe_me.pl):

while (<>) 
{
  print $_;
}

Then you call the script like this:

less access.log | perl pipe_me.pl

The script outputs the contents of access.log. To do some real work extend it with your own code--you might want to, for example, analyze an Apache access log.

You can also read the input line by line like this:

foreach $line (<>) 
{
  print $line;
}