Thursday, March 22, 2012

Injecting dependencies in Ruby

Although I have read various times that dependency injection is not needed in Ruby, it is a practice that I am very much used to thanks to the great Spring Framework and can't seem to make sense of any other way of doing things :).

In my current spare time application for example I needed to have a Recommender in my controller for recommending movies for the current user.

It works simply like this recommender.recommendations_for(user). I know that I could probably make a module and enrich my domain object with it, so I would say something like user.recommendations, and that would give me the recommendations for that user.

I will probably do that later on, but for now, I want to keep the concept of a separate recommendations engine or service.

The Recommender constructor looked something like this:


def initialize(extractor, similarity_data)



Where extractor and similarity_data are involved objects to build themselves.

I obviously didn't want to build it in the controller itself. Also I wanted a unique instance to be shared (singleton) everywhere I used the recommender, but I didn't want to use the class itself as the singleton. I wanted an actual instance of the class.

I could then create a factory method on the class to have the singleton like Recommender.getInstance, which is ok, or an independent factory, but I didn't want to do this for all the dependencies and control them individually. And it still doesn't feel as nice as the @Autowired feature in Spring.

So I decided to make a very simple way of injecting new properties in my objects. I would be depending on a central object container (Which I called ObjectContainer) that will have all the objects I intend to use for injection.

I created My ObjectContainer:

class ObjectContainer

  @objects = {}
  def self.init
    recommender = Recommender.new(FileRatingsExtractor.new, MongoBackedSimilarityData.new)

    @objects[:recommender]=recommender

    #..... more objects and stuff .......

  end


  def self.get(name)
    @objects[name]
  end

end



ObjectContainer.init




And then I created my inject support:



require_relative "object_container"

module Inject

  def method_missing(meth, *args)
    return super unless meth == :inject
    args.each do |to_inject|
       self.class_eval do
         define_method(to_inject) do
            ObjectContainer.get(to_inject.to_sym)
         end
       end
    end
  end

end



class Object
  extend Inject
end




I created a Spec like this:



require 'spec_helper'

describe "Inject Support" do

  it "should inject properties in my objects" do
    class AnyClass
      inject :recommender
    end
    a = AnyClass.new
    a.recommender.should_not be_nil
  end

end


This is in a Rails application I'm working on. The files init_object_container.rb and init_inject_support.rb are in the initializers folder. When I ran my test it was passing, showing that I have a value on the recommender property of an instane of my class AnyClass.

Now in my controller (haven't tested it yet :)) I should be able to do something like this:

class HomeController < ApplicationController
  inject :recommender

   .....

end

And doing that I would have the recommender accesible as a property in my controller.

I don't know if this is completely anti-Ruby, or anti-Rails. But it just feels very nice and unubtrusive to work like this. Not to say familiar to someone very used to DI and Spring.

Sunday, March 4, 2012

Adding recommendations in Ruby

I am developing a small movies web application, and I wanted to add a simple but functional recommendation engine to it. So I started to work on it.

Algorithms and formulas from this post are based on the great book Algorithms of the Intelligent Web


 


Recommendations in applications are based on similarities. Similarities between two or more users, or similarity between two or more items.

I wanted a simple Recommender that would look something like:


  1. class Recommender
  2.  def recommendations_for(user)
  3.  end
  4. end

the recommendations_for method would return the recommended movies for a user.

I wanted to start the recommendation first with the User similarity.
This means that we will compare each user with each other looking at the movies they like (and don’t like) based on the ratings they give to these movies. This way we establish similarities between users based on their common tastes.

Most Recommendation implementations are based on the concept of similarity. Mainly similarity between users and similarity between items.
So first let's see the concept of User Similarity.

User similarity simply means how much a user's rating patterns compare with another user's. So if for example two users rank the same 10 movies with the same rankings we can think that they movie tastes are similar so if we can recommend movies between this users.

For having the best values when looking at user similarity we compare every user with each other, and rank their mutual similarities in order so we have a full arrangements of users and their similar users.

Comparing each user with each other can be a long task in systems with many users and many reviews per user. So the best idea is probably to precalculate this values and not do it in real time. But for now let's just focus on the algorithm I need as a first step in creating recommendations for my site

So let's see how this is done with some code:

The first step is as said find a way to figure out if my users are similar between them.

My current structure is a MongoDB database with the user reviews embedded inside the Movies document, I would need to extract that, but I won't do it right now. Instead I assume that I already extracted it and ended up with the same structure as the file downloaded from movielens.

The structure of this file is:

user id | item id | rating | timestamp


So what exactly we need?

Our user similarity measure between two users will be based on the ranking they each give to a common movie, and the ratio of how many common movies they have watched. so for example

Mario | Titanic | 4
Luigi | Titanic | 4
Koopa | Titanic | 4
Mario | Braveheart | 5
Luigi | Braveheart | 5
Koopa | LOTR | 5

Mario and Luigi are more similar as they have more movies, from all their movies, in common with each other, even though in the common movie (Titanic) all three of the users have the same ranking.

We wan't to create a result of the form:

user_1_id | user_2_id | similarity

For now we will store this in a simple memory list

So we'll start with the most trivial but expensive algorithm and later see if we can improve it somehow:

For doing this for all the users and all the items we need the expensive n^2 * m^2 algorithm:

The following algortihm and formulas can be found in the great book Algorithms of the Intelligent Web


class Similarity

 attr_reader :user_similarity_list



 def initialize(extractor)

    @extractor = extractor

    @user_similarity_list = []

 end



 def generate_similarities

    users = @extractor.extract_users

    users.each do |user_1|

     user_1_rankings = @extractor.extract_movies_with_rankings(user_1)

     users.each do |user_2|

       common_items = 0

       similarity=0.0;

       user_2_rankings = @extractor.extract_movies_with_rankings(user_2)

       user_1_rankings.each do |ranking_1|

         user_2_rankings.each do |ranking_2|

           if ranking_1['movie'] == ranking_2['movie']

             common_items += 1

             similarity += (ranking_1['rating']-ranking_2['rating'])**2

           end

         end

       end

       if common_items>0

         similarity = Math.sqrt(similarity/common_items)

         similarity = 1.0 - Math.tan(similarity)

         max_common_items =  [user_1_rankings.size, user_2_rankings.size].min



         similarity = similarity * (common_items.to_f/max_common_items.to_f)

       end

       @user_similarity_list << UserSimilarityRelation.new(user_1, user_2, similarity)

     end

    end

 end

end



class UserSimilarityRelation

 def initialize(user_1, user_2, similarity)

    @user_1=user_1

    @user_2=user_2

    @similarity=similarity

 end

 def to_s

    "#{@user_1}, #{@user_2} : #{@similarity}"

 end

end


So let's test our algorithm with a mock extractor and see what results we get:


class MockExtractor

 def initialize

    @rankings = {'Koopa'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'LOTR', 'rating'=>5}], 'Mario'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'Braveheart', 'rating'=>5}], 'Luigi'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'Braveheart', 'rating'=>5}]}

 end



 def extract_users

    ['Mario', 'Luigi', 'Koopa']

 end



 def extract_movies_with_rankings(user)

    @rankings[user]

 end

end



similarity = Similarity.new(MockExtractor.new)

similarity.generate_similarities

puts similarity.user_similarity_list



we run that and we get the following result:

Mario, Mario : 1.0
Mario, Luigi : 1.0
Mario, Koopa : 0.5
Luigi, Mario : 1.0
Luigi, Luigi : 1.0
Luigi, Koopa : 0.5
Koopa, Mario : 0.5
Koopa, Luigi : 0.5
Koopa, Koopa : 1.0

We can see that Mario and Luigi are as similar as possible while Koopa is not as similar based on the different movie he reviewed.

So how to make recommendations based on this similarity information:

The recommendation algorithm:

We would want a Recommender of the form



First we create an implementation in our extractor to extract the movie names. In our MockExtractor we just do:


  1.   def extract_movies
  2.     ['Titanic', 'LOTR', 'Braveheart', 'Harry Potter']
  3.   end




We extend our data to have the following now:

Mario | Titanic | 4
Luigi | Titanic | 4
Koopa | Titanic | 4
Mario | Braveheart | 5
Luigi | Braveheart | 5
Koopa | LOTR | 5
Luigi | LOTR | 4
Luigi | Harry Potter | 4
Koopa | Rambo | 3
Mario | Rocky | 4

When we run the similarity we now get:

Mario, Mario : 1.0
Mario, Luigi : 0.6666666666666666
Mario, Koopa : 0.3333333333333333
Luigi, Mario : 0.6666666666666666
Luigi, Luigi : 1.0
Luigi, Koopa : 0.09699304532693201
Koopa, Mario : 0.3333333333333333
Koopa, Luigi : 0.09699304532693201
Koopa, Koopa : 1.0


We will see how the Users with more similarity to other users affect the recommendation more than users that are not that similar:
then our recommendations_for implementation is:



class Recommender



 def initialize(extractor, similarity_data)

    @extractor = extractor

    @data_set = extractor.rankings

    @similarity_data=similarity_data

 end





 def recommendations_for(user)

    movies = @extractor.extract_movies

    recommendations = []

    movies.each do |movie|

     recommendations << [movie, get_estimated_rating_for_movie(movie, user)]

    end

    recommendations.delete_if do |recommendation|

     recommendation[1].nil?

    end

    recommendations.sort! do |e1,e2|

     e1[1]<=>e2[1]

    end

    recommendations.reverse!

 end



 private



 def find_user_rating_for_movie(user, movie)

    the_movie_rating = @data_set[user].find { |movie_rating| movie_rating['movie']==movie }

    if the_movie_rating.nil?

     return nil

    end

    the_movie_rating['rating']

 end



 def get_estimated_rating_for_movie(movie, user)

    users = @extractor.extract_users

    similarity_sum = 0.0

    weighted_rating_sum = 0.0

    estimated_rating = 0.0

    if find_user_rating_for_movie(user, movie) == nil

     users.each do |other_user|

       next if other_user == user

       if find_user_rating_for_movie(other_user, movie) != nil

         similarity = @similarity_data.find { |similarity_relation| similarity_relation.user_1 == user and similarity_relation.user_2 == other_user }

         similarity = similarity.similarity

         other_user_rating = find_user_rating_for_movie(other_user, movie)

         weighted_rating = similarity * other_user_rating

         weighted_rating_sum += weighted_rating

         similarity_sum += similarity

       end

     end

     if (similarity_sum > 0.0)

       estimated_rating = weighted_rating_sum / similarity_sum

     end

    else

     estimated_rating = nil

    end

    estimated_rating

 end

end



Then we execute:

similarity = Similarity.new(MockExtractor.new)

similarity.generate_similarities

puts similarity.user_similarity_list



recommender = Recommender.new(MockExtractor.new,similarity.user_similarity_list)



recommendations = recommender.recommendations_for('Mario')



puts recommendations

And we get:

Harry Potter
5.0
LOTR
4.333333333333333



If we change now the rating Koopa gives to LOTR to 4 and the rating Luigi gives to LOTR to 5 we get:


Harry Potter
5.0
LOTR
4.666666666666666



We see that because Mario and Luigi are more similar than Mario and Koopa, Luigi's ratings have a bigger impact on the recommended movies to Mario.



This is it. We now have a recommendation engine based on user similarities and rating systems.



Again the performance of the algorithms presented are not good and not suitable for real time recommendation generation, and maybe not even for pre-calculated recommendations. However this is the system (with a couple of fast performance improvements like changing a couple of the iterations to Hash access) I will start to use in my own application and will evolve it if I see it is not a good fit.