csv snippets

Perl script that can be used to calculate min, max, mean, mode, median and standard deviation for a set of log records

Tagged csv, perl, min, max, mean, log, parser  Languages perl

The best thing about this script is that it's easy to customize, right now it's optimized for comma delimited data.

use strict;
use warnings;

# Import stdev, average, mean and other statistical functions
# A copy of http://search.cpan.org/~brianl/Statistics-Lite-3.2/Lite.pm

my %page_runtimes;
my $delimitor = ';';
my @columns = ("page", "samples", "min", "max", "mean", "mode", "median", "stddev\n");
my $line;
my $first_timestamp, my $last_timestamp;

# ==========================================
# Parse log file
# ==========================================

# Don't use foreach as it reads the whole file into memory: foreach $line (<>) { 
while ($line=<>) {
  # remove the newline from $line, otherwise the report will be corrupted.

  my @columns               = split(';', $line);
  my $timestamp             = $columns[0];
  my $page_name             = $columns[1];
  my $page_runtime          = $columns[2];

    $first_timestamp = $timestamp;

  # print what we find
    print "Found page '$page_name'\n";
  # add page runtimes to one hash
  push(@{$page_runtimes{$page_name}}, $page_runtime);
  $last_timestamp = $timestamp;

# ==========================================
# Calculate and print page statistics
# ==========================================
open(PAGE_REPORT, ">report.csv") or die("Could not open report.csv.");

print PAGE_REPORT "First sample\n".$first_timestamp."\nLast sample\n".$last_timestamp."\n\n";
print PAGE_REPORT join($delimitor, @columns);

for my $page_name (keys %page_runtimes )
  my @runtimes = @{$page_runtimes{$page_name}};
  my $samples = @runtimes;
  my $min     = min(@runtimes);
  my $max     = max(@runtimes);
  my $mean    = mean(@runtimes);
  my $mode    = mode(@runtimes);
  my $median  = median(@runtimes);
  my $stddev  = stddev(@runtimes);
  my @data = ($page_name, $samples, $min, $max, $mean, $mode, $median, $stddev);
  my $line = join($delimitor, @data);
  # Use comma instead of decimal
  $line =~ s/\./\,/g;
  print PAGE_REPORT "$line\n";

To use it simply pipe some data into it like this:

grep "2008-31-12" silly-data.log | perl analyze.pl

How to parse CSV data with Ruby

Tagged csv, parse, ruby, fastercsv, ccsv, csvscan, excelsior  Languages ruby

Ruby alternatives for parsing CSV files

  • Ruby String#split (slow)
  • Built-in CSV (ok, recommended)
  • ccsv (fast & recommended if you have control over CSV format)
  • CSVScan (fast & recommended if you have control over CSV format)
  • Excelsior (fast & recommended if you have control over CSV format)

CSV library benchmarks can be found here and here

Parsing with plain Ruby

filename = 'data.csv'
file = File.new(filename, 'r')

file.each_line("\n") do |row|
  columns = row.split(",")
  break if file.lineno > 10

This option has several problems...

Parsing with the built-in CSV library

require 'csv'

CSV.open('data.csv', 'r', ';') do |row|
  puts row
require 'csv'

CSV.foreach("changes.csv", quote_char: '"', col_sep: ';', row_sep: :auto, headers: true) do |row|
  puts row[0]
  puts row['xxx']

Parsing with the ccsv library

ccsv is hosted on GitHub.

require 'rubygems'
require 'ccsv'

Ccsv.foreach(file) do |values|
  puts values[0]

Parsing with the CSVScan library

CSVScan can be downloaded from here.

require "csvscan"

open("data.csv") do |io|
  CSVScan.scan(io)  do|row|
    puts row

How to fix "Illegal quoting in line x" when parsing CSV with Ruby

Tagged ruby, csv  Languages ruby

To fix the illegal quoting issue, use a quote_char that is not in your data:

CSV.open('data.csv', quote_char: "\x00", col_sep: "\t").each do |s|

Use double quotes around \x00, otherwise you’ll get:

:quote_char has to be a single character String

You can also try the “liberal_quoting” option available in Ruby 2.4, and specifying the correct quote_char…

How to export a Google Doc spreadsheet to CSV

Tagged csv, json, google, spreadsheet  Languages 

It's no longer possible to export Google Spreadsheet's to CSV, so use JSON instead:


You can find the SPREADSHEET_KEY in the Google Doc's original URL.

You can also use the Google Drive API.

How to convert a HTML table to CSV or JSON

Tagged csv, html, table, json  Languages ruby
require 'nokogiri'
require 'net/http'
require 'csv'
require 'json'

url = 'http://site/html.table'
uri = URI(url)
response = Net::HTTP.get(uri)

doc = Nokogiri::HTML(response)
table = doc.xpath('//table//tr').map do |row|
  row.xpath('td').map do |cell|
puts JSON.pretty_generate(table)