Hpricot can work directly with open-uri to load HTML from remote files
require 'rubygems' require 'hpricot' require 'open-uri' doc = Hpricot(open('http://www.rubyinside.com/test.html')) puts doc.search("h1").first.inner_html