So it turns out, reading data from another site is very easy with Nokogiri.
The source code is available here: https://github.com/abreckner/MyRecipeSavour
There is a lot I am going to cover in the next few posts based on this code base (like Devise and Heroku), but for now we are focussed on this file: https://github.com/abreckner/MyRecipeSavour/blob/master/app/models/site.rb
Specifically, we are going to look at the add_recipe method.
First we need to require a few packages:
require 'open-uri'
require 'rubygems'
require 'nokogiri'
Unfortunately, I haven't yet figured out a heuristic for separating a recipe web page into a recipe's components (Title, Ingredients, Instructions, Amounts, etc.), but as a workaround I maintain a catalogue of CSS selectors which define these elements per domain. When I read the page, I use Nokogiri to parse those elements for me using the CSS selectors.
For example:
html = Nokogiri::HTML(open(url).read) # open the page
title = html.css(site.title_selector).text.strip # read the title
I then populate a recipe object with these pieces:
recipe = Recipe.new
recipe.name = title
...
recipe.save
My code around the ingredients and instructions is a little more complex as my Recipe model has many Ingredients and Instructions (eventually I am going to allow users to manipulate them individually). Each ingredient/instruction is parsed based on a line break, so I need to pull in the ingredient array from Nokogiri and then merge it into a string separated by line breaks.
ingredients = html.css(site.ingredient_selector).children.inject(''){|sum, n| sum + n.text + "\n"}
...
Ingredient.multi_save(ingredients, recipe)
The reason I convert it to a string and then back into an array is so that the user can later edit the ingredients via a textarea. To be fair, I actually wrote the multi_save code for textarea input before I did the screen scrape, and I wanted to reuse it.
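The multi_save code itself is not shown here, so the following is only a sketch of the string-to-array round trip it relies on. The join_lines and split_lines helpers are hypothetical names, and dropping blank lines is an assumption about how textarea input would reasonably be cleaned up.

```ruby
# Join scraped text fragments into one newline-separated string,
# the same shape a textarea would submit.
def join_lines(fragments)
  fragments.inject('') { |sum, text| sum + text + "\n" }
end

# Split textarea-style input back into individual lines,
# trimming whitespace and dropping blank entries.
def split_lines(text)
  text.split("\n").map(&:strip).reject(&:empty?)
end

scraped = ['2 eggs', '1 cup flour', ' ']
as_text = join_lines(scraped)
lines = split_lines(as_text)
puts lines.inspect
```

Because both the scraper and the textarea produce the same newline-separated string, one parsing path can serve both inputs.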
The other interesting piece of this add_recipe method is that I store a new Site in case the user tries to add a recipe from an "uncatalogued" site. This automatically builds up a list of the sites people are interested in saving recipes from and allows me to catalogue them at a later date.
site_domain = URI.parse(url).host
site = Site.find_by_domain site_domain
if site.nil?
  site = Site.new
  site.domain = site_domain
  site.url = url
  site.user = current_user
  site.save!
  false
else
  ... #Nokogiri scraping code goes here
end
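The same lookup-or-record flow can be tried outside Rails with plain Ruby. This is a sketch under stated assumptions: a Hash stands in for the Site table, an Array collects the uncatalogued sites, and selectors_for is a hypothetical helper name, not something from the app.

```ruby
require 'uri'

# Returns the selectors for a catalogued domain. For an unknown domain,
# records it for later cataloguing and returns false, mirroring the
# early-exit branch of add_recipe.
def selectors_for(url, catalogue, uncatalogued)
  domain = URI.parse(url).host
  selectors = catalogue[domain]
  if selectors.nil?
    already_seen = uncatalogued.any? { |site| site[:domain] == domain }
    uncatalogued << { domain: domain, url: url } unless already_seen
    false
  else
    selectors
  end
end

catalogue = { 'www.example.com' => { title: 'h1' } }  # made-up entry
uncatalogued = []

puts selectors_for('https://www.example.com/r/1', catalogue, uncatalogued).inspect
puts selectors_for('http://unknown.test/r/2', catalogue, uncatalogued).inspect
puts uncatalogued.inspect
```

Unlike the original, this version skips recording a domain twice, which is a small design choice added here and not part of the app's code.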