So it turns out, reading data from another site is very easy with Nokogiri.
The source code is available here https://github.com/abreckner/MyRecipeSavour
There is a lot I am going to cover in the next few posts based on this code base (like Devise and Heroku), but for now we are focussed on this file https://github.com/abreckner/MyRecipeSavour/blob/master/app/models/site.rb
So we are going to look at the add_recipe method.
First we need to require a few packages
Unfortunately, I haven't yet figured out a heuristic for separating a recipe web page into a recipe's components (Title, Ingredients, Instructions, Amounts, etc...) but as a workaround, I maintain a catalogue of CSS selectors which define these elements per domain. When I read the page, I use NokoGiri to parse those elements for me using the CSS selectors
I then populate a recipe object with these pieces
My code around the ingredients and instructions is a little more complex as my Recipe model has many Ingredients and Instructions (eventually I am going to allow users to manipulate them individually). Each ingredient/instruction is parsed based on a line break, so I need to pull in the ingredient array from Nokogiri and then merge it into a string separated by line breaks.
The reason I convert it to a string and then back into an array is so that the user can later edit the ingredients via a textarea. It's fair to say that I actually write the multi_save code from a textarea for input before I did the screen scrape and I wanted to reuse it.
The other interesting piece of this add_recipe method is that I store a new Site in case the user tries to add a recipe from an "uncatalogued" site. This automatically builds up a list of the sites people are interested in saving recipes from and allows me to catalogue it at a later date