Skuunk Works

Posts

Showing posts from 2012

Screen scraping with Nokogiri

So a few months ago I put together a side project at http://www.myrecipesavour.com/. Basically, the site lets you enter the URL of a recipe page and then parses that recipe into your collection. It turns out that reading data from another site is very easy with Nokogiri. The source code is available here: https://github.com/abreckner/MyRecipeSavour There is a lot I am going to cover in the next few posts based on this code base (like Devise and Heroku), but for now we are focussed on this file: https://github.com/abreckner/MyRecipeSavour/blob/master/app/models/site.rb and specifically the add_recipe method. First we need to require a few packages:

require 'open-uri'
require 'rubygems'
require 'nokogiri'

Unfortunately, I haven't yet figured out a heuristic for separating a recipe web page into a recipe's components (Title, Ingredients, Instructions, Amounts, etc.), but as a workaround, I maintain a catalo...

How DRY is too DRY?

One of the earliest development principles we learn as programmers is Don't Repeat Yourself, or DRY for short. Copy and paste are supposedly your worst enemies. Rather than relying on copy and paste, you create functions and subroutines and call them from your code so you don't have to reimplement the same logic over and over. This also has the added advantage that if you need to change that subroutine, you only need to make the change once. (Note: I realize that functions and subroutines are different entities, but for the purpose of this article they are interchangeable.) Sometimes you come across two pieces of functionality which are very similar, so instead of copying and pasting, you call the same function/subroutine with different arguments to handle the differences, while the shared core functionality stays in one place. For example, you might have a Customer object and a Vendor object. Both customers and vendors have addresses and you need to send mail to them both...
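To make the Customer/Vendor example concrete, here is a minimal Ruby sketch, assuming both objects only need a printable mailing label. The Mailable module, the attribute names and the label format are all hypothetical.

```ruby
# Hypothetical shared behaviour: anything that responds to #name, #street,
# #city and #postcode can produce a mailing label.
module Mailable
  def mailing_label
    [name, street, "#{city} #{postcode}"].join("\n")
  end
end

# Hypothetical Customer: addressed by the person's name.
class Customer
  include Mailable
  attr_reader :name, :street, :city, :postcode

  def initialize(name:, street:, city:, postcode:)
    @name = name
    @street = street
    @city = city
    @postcode = postcode
  end
end

# Hypothetical Vendor: addressed by company name. Only this difference lives
# in the class; the label-building logic stays in Mailable.
class Vendor
  include Mailable
  attr_reader :company_name, :street, :city, :postcode

  def initialize(company_name:, street:, city:, postcode:)
    @company_name = company_name
    @street = street
    @city = city
    @postcode = postcode
  end

  def name
    company_name
  end
end

customer = Customer.new(name: 'Jane Doe', street: '1 Main St',
                        city: 'Springfield', postcode: '12345')
puts customer.mailing_label
```

A change to the label format is made once, in Mailable, and both classes pick it up.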

We know JavaScript is weird... enough already!

So it looks like JavaScript is close to being considered a "real" language nowadays. There are frameworks which allow you to do MVC (like Backbone.js), you can use it on the server (with Node.js), and you can even use it to interact with datastores (via MongoDB). So why is it that almost every job posting for a JavaScript gig, and every interview that has a JavaScript component, asks you to interpret or debug (without using a browser) some esoteric quirk of JavaScript that you probably wouldn't run into in a hundred years because you actually write decent JS? Like

var cities = ["NY", "SF"];
cities.length = 1;
console.log(cities); // outputs ["NY"]

or

var a = 1 + 1 + "1"; // equals "21"
var b = "1" + 1 + 1; // equals "111"

It's as if they are trying to say to you "Look at that piece of crap language you are programming in! You must be an idiot!" while at th...

HTML Canvas Libraries

So I was remarking to a coworker today that the HTML Canvas API is very low level and hard to use, and his reaction was actually positive (and he had a point). Because it is so low level, pretty much everything is exposed, and going forward you won't have to wait for browser vendors to update their libraries in order to get the latest features. Basically, the browser vendors are removing themselves from the equation. However, application developers are left with a bit of a dilemma. Do we really want to reinvent the wheel every time we build an app? Why should we have to redraw everything every frame? Wouldn't it be easier to work with objects rather than pushing pixels? Well, while the browser vendors might have removed themselves from the library equation, fortunately a number of other people are stepping in. There now seem to be a myriad of third-party libraries for manipulating the HTML5 Canvas, some of which are more sophisticated ...

Working on a side project

There are many pluses when working on a side project. You get to work with the latest technology. You get to decide which features go in and which don't. You can show it off to potential employers. The list is endless. However, there are also some things to keep in mind, which we will cover here.

Time management

Unless you are unemployed, your time is now a precious resource, which means that you are now a resource to your own project. Try to organize blocks of time during the week when you can work (e.g. a few hours on Saturday or an hour on Thursday night) and stick to them. Try to break up your work into small chunks and aim to have a feature done in each block. This will help keep you motivated.

Feature management

Related to time management, it's important to maintain a list of what you want to do and be able to check items off this list. Ask friends for feature ideas and add them to the list. Save anything that's a large feature for the weekend and try and do the sma...

For whom do you write code?

Ok, this was going to be called "Who do you write code for?" but I was told never to end a sentence with a preposition. Anyways, I was thinking the other day about who the audience is for the code I write, and I came up with the following. Bear in mind I live by the principles of KISS (Keep It Simple, Stupid) and YAGNI (You Ain't Gonna Need It). I developed these attitudes after reading tons of other people's code (as well as my own code 6-12 months down the line). So, in descending order of priority...

1. The User

Obviously you are writing code for someone to use (or a service to consume). I am not going to go into UX or HCI at this stage, except to say that as far as priority goes, this guy is at the top. Make sure the user gets good and timely feedback for everything he does and that the steps he has to take make sense to him.

2. The Compiler/Interpreter

The code you write has to be compiled or interpreted by a computer before the user can use it. Don't worry too ...

Speeding up RSpec

So today I have been looking into getting our enormous battery of tests to run faster. I have yet to find anything that works for Cucumber, but I did find an interesting way to speed up RSpec, which is detailed here: https://makandracards.com/makandra/950-speed-up-rspec-by-deferring-garbage-collection

Basically, it seems that by running garbage collection less frequently, you can make your tests run much faster (at the expense of higher memory usage, of course). We observed a 30% reduction in the time it takes to run our RSpec test suite. I did try to implement this on Cucumber; however, because we need to store much more in memory to set up and tear down our objects, I kept running out of memory when I wasn't using the default garbage collection, and the tests took even longer (so, buyer beware). I suppose if you had a small set of features, though, you might see some benefit.
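The linked card has its own implementation; below is a minimal sketch of the same idea written from scratch, assuming RSpec's before(:suite), after(:each) and after(:suite) hooks. The DeferredGarbageCollector class name and the 10-second threshold are hypothetical choices, not values taken from the card.

```ruby
# Hypothetical helper: keep Ruby's automatic GC switched off during specs and
# only collect garbage explicitly once a time threshold has passed.
class DeferredGarbageCollector
  def initialize(threshold_seconds: 10.0)
    @threshold = threshold_seconds
    @last_run = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Turn automatic collection off (e.g. before the suite starts).
  def start
    GC.disable
  end

  # If at least @threshold seconds have passed since the last collection,
  # run a full GC and switch automatic collection off again.
  # Returns true when a collection was performed.
  def reconsider
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    return false if now - @last_run < @threshold

    GC.enable
    GC.start
    GC.disable
    @last_run = now
    true
  end

  # Restore normal automatic collection (e.g. after the suite finishes).
  def stop
    GC.enable
  end
end

# Wire it into RSpec only when RSpec is loaded (e.g. in spec_helper.rb).
if defined?(RSpec)
  collector = DeferredGarbageCollector.new(threshold_seconds: 10.0)
  RSpec.configure do |config|
    config.before(:suite) { collector.start }
    config.after(:each) { collector.reconsider }
    config.after(:suite) { collector.stop }
  end
end
```

The threshold is the knob that trades speed for memory: a larger value means fewer collections but a bigger heap, which is exactly what ran out of memory in the Cucumber case.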

Cucumber - my perspective

I have been using Cucumber (http://cukes.info/) at my current gig for about a year now. My initial reaction was that I absolutely hated it. It didn't seem to make sense for a programmer to write out tests (features) in plain English and then write out a bunch of regular expressions to turn that plain English into runnable code. What a palaver! The other problem is that the Cucumber tests were extremely fragile. Even small text and/or HTML changes would break things in lots of random places. Anyways, as it turns out, I don't really hate Cucumber; I just hate the way it is implemented at my current gig. Here are some lessons I learnt along the way...

1) Features are not supposed to be written by programmers. You can write features as a programmer, but you are not the intended audience. The reason features are written in plain text is that they are supposed to be written by business owners. As a programmer though, you can use features to organize your thoughts in plain Engli...
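To show what "a bunch of regular expressions to turn that plain English into runnable code" means mechanically, here is a toy illustration in plain Ruby. It is not Cucumber's API; ToyStepRunner and the cucumber-basket steps are hypothetical, and the point is only the pairing of a regex with a block and the passing of captures as arguments.

```ruby
# Toy illustration only (not Cucumber's API): each step definition pairs a
# regular expression with a block, and a plain-English step line is run by
# finding the first pattern that matches it.
class ToyStepRunner
  KEYWORDS = /\A(Given|When|Then|And)\s+/

  def initialize
    @definitions = []
  end

  # Register a step definition.
  def define(pattern, &block)
    @definitions << [pattern, block]
  end

  # Strip the leading keyword, find a matching definition and call its block
  # with the regex captures (always strings) as arguments.
  def run(step)
    text = step.sub(KEYWORDS, '')
    @definitions.each do |pattern, block|
      match = pattern.match(text)
      return block.call(*match.captures) if match
    end
    raise ArgumentError, "Undefined step: #{step}"
  end
end

basket = []
runner = ToyStepRunner.new
runner.define(/\AI have (\d+) cucumbers\z/) { |n| basket.concat([:cucumber] * n.to_i) }
runner.define(/\AI eat (\d+) cucumbers\z/) { |n| basket.pop(n.to_i) }
runner.define(/\AI should have (\d+) cucumbers\z/) { |n| basket.size == n.to_i }

scenario = [
  'Given I have 5 cucumbers',
  'When I eat 2 cucumbers',
  'Then I should have 3 cucumbers'
]
results = scenario.map { |step| runner.run(step) }
puts results.last # prints true
```

In real Cucumber, the equivalent definition would live in a step definitions file as something like Given(/^I have (\d+) cucumbers$/) do |n| ... end, with the feature text kept in a separate .feature file.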

JSON caching with Rails

So the other day, I needed to cache an action which was basically a proxy action to return JSON. Basically, third-party company X has an XML API for its deals. Ideally we would use JavaScript to pull the feed, but unfortunately the feed is in XML, which means we need to use the server to pull this "deal" and then reformat it as JSON for our site to use. To render the JSON we were using a render call:

render :json => offer.to_json

It can get expensive to pull this deal over the wire every time, so we looked into using caches_action and set it to expire in 5 minutes (because that is how often the deal feed updates):

caches_action :index, :expires_in => 5.minutes

(Also, caching needs to be turned on in development to see this happening.)

config.action_controller.perform_caching = true

While this worked fine for the first call, I noticed that on subsequent calls the content type was being set to 'text/html' instead of 'application/json'. This was causing the ...
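Rails' caches_action does the caching for you, but the underlying idea (serve a stored response until it is 5 minutes old) and why the content type has to be stored along with the body can be sketched in plain Ruby. This is a toy illustration, not how caches_action works internally and not the fix described in the post; ExpiringResponseCache, the cache key and the injectable clock are hypothetical.

```ruby
require 'json'

# Toy illustration (plain Ruby, not Rails' caches_action): cache a rendered
# response for a fixed time, storing the content type alongside the body so
# that a cache hit is served with the same type as the original response.
class ExpiringResponseCache
  Entry = Struct.new(:body, :content_type, :expires_at)

  # clock is injectable so expiry can be tested without sleeping.
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Return the cached [body, content_type] for key; on a miss or after
  # expiry, compute a fresh pair with the block and store it.
  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    if entry.nil? || now >= entry.expires_at
      body, content_type = yield
      entry = Entry.new(body, content_type, now + @ttl)
      @entries[key] = entry
    end
    [entry.body, entry.content_type]
  end
end

# Usage: expensive_fetches counts how often the (hypothetical) feed is hit.
expensive_fetches = 0
cache = ExpiringResponseCache.new(ttl_seconds: 5 * 60)
2.times do
  body, type = cache.fetch('deals#index') do
    expensive_fetches += 1
    [{ 'title' => 'Hypothetical deal' }.to_json, 'application/json']
  end
  puts "#{type}: #{body}"
end
puts expensive_fetches
```

Within the 5-minute window the second call is a cache hit, so the feed is only fetched once and both responses carry 'application/json'.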