How to Classify Text with Bayesian and SVM Classifiers + Ruby

Tagged classifier, bayesian, hoatzin, ankusa, svm, support vector machines, eluka  Languages ruby

There are at least four Bayesian and Support Vector Machine (SVM) classifiers for Ruby that you can use for tasks such as sentiment analysis: Ankusa, Eluka, Classifier and Hoatzin. My favorite is Ankusa.


require 'ostruct'
require 'ankusa'
require 'ankusa/file_system_storage'

# Keep the model in a local file
file = 'training.txt'
storage = Ankusa::FileSystemStorage.new(file)
classifier = Ankusa::NaiveBayesClassifier.new(storage)

training = []
training << OpenStruct.new(:sentiment => :happy, :text => "I'm so happy")
training << OpenStruct.new(:sentiment => :sad, :text => "I'm so sad")

# Train on each labelled example
training.each do |tweet|
  classifier.train(tweet.sentiment, tweet.text)
end

# Most likely class, then the score for every class
puts classifier.classify "I'm sad"
puts classifier.classifications("I'm sad").inspect
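
To give a feel for what a naive Bayes classifier such as the one in the Ankusa example does with the training text, here is a minimal pure-Ruby sketch. It is not Ankusa's implementation: the class name TinyNaiveBayes, the tokenizer and the add-one smoothing are all assumptions made for illustration.

```ruby
# Minimal multinomial naive Bayes with add-one (Laplace) smoothing.
# Illustrative sketch only; TinyNaiveBayes is a hypothetical class,
# not part of Ankusa or any other gem.
class TinyNaiveBayes
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # label => word => count
    @doc_counts  = Hash.new(0)                            # label => documents seen
    @vocabulary  = {}                                     # every word seen in training
  end

  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocabulary[word] = true
    end
  end

  # Returns label => log score; a higher score means a more likely label.
  def log_scores(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @doc_counts.each_with_object({}) do |(label, docs), scores|
      total_words = @word_counts[label].values.sum
      score = Math.log(docs / total_docs) # class prior
      words.each do |word|
        count = @word_counts[label][word]
        # Smoothed per-word likelihood, so unseen words do not zero the score
        score += Math.log((count + 1.0) / (total_words + @vocabulary.size))
      end
      scores[label] = score
    end
  end

  def classify(text)
    log_scores(text).max_by { |_, score| score }.first
  end

  private

  # Naive tokenizer (an assumption): lowercase words, apostrophes kept
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

nb = TinyNaiveBayes.new
nb.train(:happy, "I'm so happy")
nb.train(:sad, "I'm so sad")
puts nb.classify("I'm sad")
```

With only two training sentences the word "sad" is what tips the result; real use needs far more labelled text per class.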


require 'hoatzin'

c = Hoatzin::Classifier.new
c.train(:positive, "Thats nice")
c.classify("Thats nice")


require 'classifier'

b = Classifier::Bayes.new 'Interesting', 'Uninteresting'
b.train_interesting "here are some good words. I hope you love them"
b.train_uninteresting "here are some bad words, I hate you"
b.classify "I hate bad words and you" # returns 'Uninteresting'


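classifier = Eluka::Model.new
# Each training item is assumed to respond to `features` as well as `sentiment`
training.each do |tweet|
  classifier.add(tweet.features, tweet.sentiment)
end
# `tweet` here stands for a new, unlabelled item to classify
sentiment = classifier.classify tweet.features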
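
The Eluka snippet calls tweet.features, which the training data defined earlier does not provide. One way to fill that gap is a simple bag-of-words count, sketched below. The helper name bag_of_words is hypothetical, and whether a word-count hash is the feature format Eluka expects is an assumption; check the Eluka documentation before relying on it.

```ruby
# Hypothetical helper: turn raw text into a word => count feature hash.
# The tokenization (lowercase words, apostrophes kept) is an assumption.
def bag_of_words(text)
  text.downcase.scan(/[a-z']+/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end

features = bag_of_words("I'm so so happy")
puts features.inspect
```

Each training item could then carry such a hash, for example OpenStruct.new(:sentiment => :happy, :features => bag_of_words("I'm so happy")).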

Hoatzin and Eluka use libsvm; Classifier provides Bayesian and LSI classifiers, and Ankusa provides naive Bayes.

Reference: Sentiment Analysis in Ruby, by Mateusz Drożdżyński.