csv snippets

Perl script that can be used to calculate min, max, mean, mode, median and standard deviation for a set of log records

Tagged csv, perl, min, max, mean, log, parser  Languages perl

The best thing about this script is that it's easy to customize, right now it's optimized for comma delimited data.

use strict;
use warnings;

# Import stdev, average, mean and other statistical functions
# A copy of http://search.cpan.org/~brianl/Statistics-Lite-3.2/Lite.pm
do('stats.pl');

my %page_runtimes;
my $delimitor = ';';
my @columns = ("page", "samples", "min", "max", "mean", "mode", "median", "stddev\n");
my $line;
my $first_timestamp, my $last_timestamp;

# ==========================================
# Parse log file
# ==========================================

#
# Don't use foreach as it reads the whole file into memory: foreach $line (<>) { 
#
while ($line=<>) {
  # remove the newline from $line, otherwise the report will be corrupted.
  chomp($line);

  my @columns               = split(';', $line);
  my $timestamp             = $columns[0];
  my $page_name             = $columns[1];
  my $page_runtime          = $columns[2];

  if(!defined($first_timestamp))
  {
    $first_timestamp = $timestamp;
  }

  # print what we find
  if(!defined(@{$page_runtimes{$page_name}}))
  {
    print "Found page '$page_name'\n";
  }
 
  # add page runtimes to one hash
  push(@{$page_runtimes{$page_name}}, $page_runtime);
 
  $last_timestamp = $timestamp;
}

# ==========================================
# Calculate and print page statistics
# ==========================================
open(PAGE_REPORT, ">report.csv") or die("Could not open report.csv.");

print PAGE_REPORT "First sample\n".$first_timestamp."\nLast sample\n".$last_timestamp."\n\n";
print PAGE_REPORT join($delimitor, @columns);

for my $page_name (keys %page_runtimes )
{
  my @runtimes = @{$page_runtimes{$page_name}};
 
  my $samples = @runtimes;
  my $min     = min(@runtimes);
  my $max     = max(@runtimes);
  my $mean    = mean(@runtimes);
  my $mode    = mode(@runtimes);
  my $median  = median(@runtimes);
  my $stddev  = stddev(@runtimes);
 
  my @data = ($page_name, $samples, $min, $max, $mean, $mode, $median, $stddev);
 
  my $line = join($delimitor, @data);
 
  # Use comma instead of decimal
  $line =~ s/\./\,/g;
 
  print PAGE_REPORT "$line\n";
}
close(PAGE_REPORT);

To use it simply pipe some data into it like this:

grep "2008-31-12" silly-data.log | perl analyze.pl

How to parse CSV data with Ruby

Tagged csv, parse, ruby, fastercsv, ccsv, csvscan, excelsior  Languages ruby

Ruby alternatives for parsing CSV files

  • Ruby String#split (slow)
  • Built-in CSV (ok, recommended)
  • ccsv (fast & recommended if you have control over CSV format)
  • CSVScan (fast & recommended if you have control over CSV format)
  • Excelsior (fast & recommended if you have control over CSV format)

CSV library benchmarks can be found here and here

Parsing with plain Ruby

filename = 'data.csv'
file = File.new(filename, 'r')

file.each_line("\n") do |row|
  columns = row.split(",")
  
  break if file.lineno > 10
end

This option has several problems...

Parsing with the built-in CSV library

require 'csv'

CSV.open('data.csv', 'r', ';') do |row|
  puts row
end
require 'csv'

CSV.foreach("changes.csv", quote_char: '"', col_sep: ';', row_sep: :auto, headers: true) do |row|
  puts row[0]
  puts row['xxx']
end

Parsing with the ccsv library

ccsv is hosted on GitHub.

require 'rubygems'
require 'ccsv'

Ccsv.foreach(file) do |values|
  puts values[0]
end

Parsing with the CSVScan library

CSVScan can be downloaded from here.

require "csvscan"

open("data.csv") do |io|
  CSVScan.scan(io)  do|row|
    puts row
  end
end

How to export a Google Doc spreadsheet to CSV

Tagged csv, json, google, spreadsheet  Languages 

It's no longer possible to export Google Spreadsheet's to CSV, so use JSON instead:

https://spreadsheets.google.com/feeds/list/<SPREADSHEET_KEY>/od6/public/values?alt=json

You can find the SPREADSHEET_KEY in the Google Doc's original URL.

You can also use the Google Drive API.