In the Tail

Exploring the niche consumer

Google Ruby Mechanize

I’m still slowly working on the jargonfly site, and I said I would post the source code as I went along, so this is my first installment. Once you register a domain and build a site, you need to get the word out about it, and one way to do this is search engine optimization (SEO). SEO is a way of tuning your site so that Google and other search engines associate it with the search terms that relate to it. For example, when someone searches for hiking trails in Georgia, Alabama, Tennessee, North Carolina or South Carolina, I want Secret Falls listed, since its target audience is hikers in those states.

Using these SEO instructions and the ones from Google, I found I was doing some things wrong. One of my biggest mistakes was that every page had the same title. This is a huge problem, but one that is easily corrected: I changed the title on each page to use either the name of the state or the name of the trail. It takes a couple of weeks for the change to show up in the Google search results, but slowly I started moving up from page 10 to page 3-4.
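As a rough illustration of the title fix, here is a minimal sketch in Ruby. The helper name `page_title`, the `SITE_NAME` constant and the " - " separator are assumptions made up for this example, not the code the site actually uses.

```ruby
# Assumed site name, taken from the post; adjust for your own site.
SITE_NAME = "Secret Falls"

# Hypothetical helper: returns a distinct title for a state or trail page,
# falling back to the bare site name for pages that have neither.
def page_title(name = nil)
  name = name.to_s.strip
  name.empty? ? SITE_NAME : "#{name} - #{SITE_NAME}"
end

puts page_title("Georgia")        # a state page
puts page_title("Example Trail")  # a trail page (placeholder trail name)
puts page_title                   # a page with no state or trail
```

The point is only that each page passes its own state or trail name, so no two pages end up with the same title.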

I wanted a way of tracking this progress, so I put together the following script. It takes an Array of search terms and uses Ruby, Mechanize and Hpricot to find the position at which your site appears in the results for each search query. I plan to make this a feature on the jargonfly site, where you can set it up, run it on a regular basis, and graph the results using ziya, which I’m using on another site. I have one for Yahoo search too, but I need to merge them into one script that works on all the big search engines.

require 'rubygems'
require 'mechanize'
require 'hpricot'
# Added a delay so I wouldn't get blocked by Google. Not sure this would really
# happen, but successive queries could trigger a bot alert.
delay=3
query=["Hiking Georgia","Hiking Alabama","Hiking Tennessee", "Hiking North Carolina", "Hiking South Carolina"]
site="www.secretfalls.com"
query.each do |q|
  position=0
  agent = WWW::Mechanize.new
  agent.user_agent_alias = 'Mac Safari'
  page = agent.get("http://www.google.com/")
  search_form = page.forms.with.name("f").first
  search_form.q = q
  search_results = agent.submit(search_form)
  #  puts search_results.body.class
  doc = Hpricot(search_results.body)
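  found = false
  # Check at most the first 200 results for each query
  while position < 200 && !found do
    (doc/"/html/body/div#res/div/div.g/h2.r/a.l").each do |link|
      position += 1
      url = link.attributes['href'].gsub(/'/,'')
      if url.include?(site)
        puts "#{position} #{url} #{q}"
        found = true
        break
      end
    end
    break if found
    # Stop if there is no link to a further page of results
    next_link = (doc/"/html/body/div#res/div#navbar.n/table/td.b/a").last
    break if next_link.nil?
    next_page = next_link.attributes['href'].gsub(/'/,'')
    sleep rand*delay
    page = agent.get(next_page)
    doc = Hpricot(page.body)
  end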
end
I added some random sleep before the script pulls the next page of results to avoid Google getting mad and locking out my IP. I really doubt this would happen, but it’s cheap insurance.
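One thing to note about `sleep rand*delay`: `rand` returns a value between 0 and 1, so the pause can come out close to zero. If that is a concern, a variation with a minimum wait might look like the sketch below. The helper name `jittered_delay` and the 1 to 3 second range are assumptions for illustration, not part of the original script.

```ruby
# Hypothetical helper: returns a pause length in seconds that is at least
# min_delay and less than max_delay, so the wait is random but never near zero.
def jittered_delay(min_delay, max_delay)
  raise ArgumentError, "max_delay must be >= min_delay" if max_delay < min_delay
  min_delay + rand * (max_delay - min_delay)
end

# Print a few sample pauses. In the script above, the call
# `sleep rand*delay` would become `sleep jittered_delay(1.0, 3.0)`.
3.times { puts "would pause %.2f seconds" % jittered_delay(1.0, 3.0) }
```

Whether a floor on the delay makes any difference to Google is a guess on my part. It just keeps successive requests from going out back to back.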