Scraping data from websites is pretty cool. But what if we could not only grab and parse the markup from external websites, but also take full-blown screenshots, and even modify that markup before taking the screenshot? We can do all that with an excellent package called PhantomJS.
A Browser without the Browser.
The hardest part of working with PhantomJS is getting it installed. Because of its requirements, you are not going to be able to run it on a standard shared host, but an average VPS will handle it fine. The following instructions are for getting things up and running with Ubuntu 11.10 on Linode.
Step 1: Install PhantomJS Requirements
This step installs everything we need to compile PhantomJS, along with the ability to run it headless on a virtual display, with no window system.
Step 2: Install Browser Goodies
This step installs the Flash plugin and Windows fonts so that sites render more accurately.
Clone the repo and build from source. Once everything finishes, the program should be installed. Test it at the command line by typing “phantomjs”.
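For a slightly more telling check than launching the bare binary, a tiny script can ask PhantomJS to report its own version. This is only a sketch: the file name `hello.js` and the helper `formatVersion` are made up for illustration, while `phantom.version` and `phantom.exit()` are part of PhantomJS itself.

```javascript
// hello.js (hypothetical file name): smoke test for a fresh PhantomJS build.

// Illustrative helper: turns PhantomJS's version object
// ({ major, minor, patch }) into a dotted string.
function formatVersion(v) {
  return v.major + '.' + v.minor + '.' + v.patch;
}

// The global `phantom` object only exists when PhantomJS runs the script.
if (typeof phantom !== 'undefined') {
  console.log('PhantomJS ' + formatVersion(phantom.version) + ' is working');
  phantom.exit(); // without this, PhantomJS keeps running after the script ends
}
```

Run it with `phantomjs hello.js`. If the build succeeded, it should print one line and return you to the prompt.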
Muck’n with Markup
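Here is a minimal sketch of the idea from the introduction: load a page, change its markup, then take the screenshot. The URL, output file name, replacement text, and the helper `replaceHeading` are all assumptions picked for illustration. The `webpage` module, `page.open`, `page.evaluate`, and `page.render` are PhantomJS's own API.

```javascript
// Assumed values for illustration only.
var TARGET_URL = 'http://example.com/';
var OUTPUT_FILE = 'example.png';

// Illustrative helper. PhantomJS runs it inside the loaded page (via
// page.evaluate), so it sees that page's `document`, not this script's
// variables. It rewrites the first <h1>, if there is one, and returns the
// heading's previous text (or null when the page has no <h1>).
function replaceHeading() {
  var heading = document.querySelector('h1');
  if (!heading) {
    return null;
  }
  var oldText = heading.textContent;
  heading.textContent = 'Mucked with by PhantomJS';
  return oldText;
}

// The global `phantom` object only exists when PhantomJS runs the script.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.viewportSize = { width: 1024, height: 768 };

  page.open(TARGET_URL, function (status) {
    if (status !== 'success') {
      console.log('Failed to load ' + TARGET_URL);
      phantom.exit(1);
      return;
    }
    // Modify the markup first, then capture the page as it now looks.
    var oldText = page.evaluate(replaceHeading);
    console.log('Previous heading: ' + oldText);
    page.render(OUTPUT_FILE);
    phantom.exit();
  });
}
```

Because the screenshot is rendered after `page.evaluate` returns, the saved image shows the modified heading rather than the original one.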
I’m sure your gears are already turning on fun uses for this technology. In Part II of this series I will go over how to integrate with Node.js to create a Phantom-powered web app.