Dynamic Screenshots on the Server with PhantomJS

Scraping data from websites is pretty cool. But what if we could not only grab and parse the markup from external websites, but also take full-blown screenshots and even modify that markup before we take the screenshot? We can do all that with an excellent package called PhantomJS.

A Browser without the Browser.

PhantomJS describes itself as a “headless WebKit with JavaScript API”. For those who are not familiar, WebKit is the open-source web browsing engine that powers popular browsers including Chrome and Safari. “Headless” refers to the fact that the program can be run from the command line without a window system, making it perfect for the server environment. The JavaScript API means that we can easily write scripts that interact with PhantomJS in the language of the web, allowing us to modify the browser output on the fly, just like you can do with Firebug or Developer Tools locally!

Installation

The hardest part of working with PhantomJS is getting it installed. Because of the requirements, you are not going to be able to run this on a standard shared host, but an average VPS will handle it fine. The following instructions are for getting things up and running with Ubuntu 11.10 on Linode.

Step 1: Install PhantomJS Requirements

This will install everything we need to compile PhantomJS, along with the ability to run it virtually without a window system.

Step 2: Install Browser Goodies

This installs the Flash plugin and Windows fonts so sites render more accurately.

Step 3: Compile!

Clone the repo and build from source.
After everything finishes, we should have the program installed. Test it at the command line by typing “phantomjs”.

Taking Screenshots

Since PhantomJS scripts are simply JavaScript with some extra API calls baked in, the barrier to entry for web developers is zero! Create the following script and save it somewhere on your server as “shotty.js”. Check out the comments for an explanation of what is happening.
To run this script, navigate to where the script lives on your server and run the following commands. We run “Xvfb -screen 0 1024x768x24 &” first to set the parameters of our Xvfb screen buffer. This is basically a virtual screen that allows us to emulate a window environment. Then we call the script with “DISPLAY=:0 phantomjs --load-plugins=yes shotty.js” to ensure Phantom runs in the buffer. PhantomJS will execute our script and save our screenshot in the same folder the script is in. If everything went well, our screenshot of espn.com should look just like the real thing. How cool is that?

Muck’n with Markup

Because PhantomJS scripts are JavaScript, that means we can easily perform actions on the result of a page, and then take a screenshot of the results! Below we have created a script that grabs a page, embeds jQuery, and then performs some DOM manipulation before finally taking a screenshot. Priceless. The following code will allow us to edit the headline on espn.com to say whatever we want.

I’m sure by now your gears are already turning on fun uses of this technology. In Part II of this series I will go over how to integrate with Node.js to create a Phantom powered web app.

Skookum Digital Works has The Fire. We Will Share it with You.

While attending NodeSummit in San Francisco a couple weeks ago, I was approached by a very sweet older lady named Gretchen. She was strolling through the exhibition area for unrelated reasons and stopped in her tracks when she saw the word “Skookum” on our banner. Gretchen is a retired school teacher from the Pacific Northwest, and she told me a story about the “Skookums” that her students loved to hear and that she had told for many years.

At the beginning of the world, people had no fire. The only fire anywhere was on top of a high mountain, guarded by the Skookums. You see, these Skookums were not like the fine and friendly Skookums we have today; these Skookums were total hoarders. They didn’t want the people to have fire, because if they did, then maybe they would become as powerful as the Skookums.

A coyote thought he’d be sly and go steal a brand of fire and bring it to the people. After consulting his three sisters for advice (in an odd twist of events, the three sisters lived in the coyote’s stomach in the form of huckleberries…but I digress), the coyote lined up all the animals in strategic places along the mountainside in a line between the Skookums’ fire and the people.

The coyote stole some fire while the Skookums were chillaxing, then the animals basically relay raced the fire until eventually an antelope gave the fire to a frog, who swallowed it. The frog then spat out the fire onto a piece of wood. The Skookums, kinda mad at this point, couldn’t figure out how to undo the frog-fire-wood-spit, so they went back to the top of the mountain, presumably to resume chillaxing. The coyote then showed the people how to get the fire out of the wood by rubbing sticks together, and that’s why you can eat and be warm now.

So, why did I tell you this story? Because today’s Skookums aren’t hoarders. We don’t want to withhold our technology expertise, our “fire” if you will, from the people.

We don’t want coyotes up in our business either. So trust me, we’ll give you the fire. Just ask. Or come to a Friday Tech Talk.

*Illustrations by Rich!

Non-Programmer to Programmer: Introducing Case-Study Jason

Jason started working at Skookum about five months ago. Before that, he worked at a company that installed and serviced ShoreTel VoIP phone systems.

We asked Jason to introduce himself and talk about what he’s learned coming from a non-programming (though still technical) career into software development. Take it away, Jason.


At my last job, along with installing the phone systems, diagnostics and upgrades to the current network were almost always needed to get the required voice quality. The company was just five people including the owner. The relationships formed within a small team like that are awesome. So why move to NC and take a new job then? The weather! I lived in north central Ohio…and like a lot of other NC transplants, I’d have a hard time going back to grey skies and bad winters.

But from a technical angle, I also was excited about the challenge of doing and learning something new. I’ve had a desire to improve my knowledge and toolset for some time. Coming to Skookum Digital Works, I really didn’t know what I would be doing, but it was clear they thought I had the capacity to grow if surrounded with the right teammates.

SDW does software development. I had some experience…mostly with ASP Classic (no laughing). No PHP. No node. No JavaScript. No CSS. No HTML5. Basically, no experience. Would I measure up? Would I really pick up those languages with the right guides?

Well here I am, five months in. I have not done any ASP Classic programming, but that’s (more than) OK. What I have done is learned a whole lot. I’ve picked up more programming skills in the last five months than I had in the rest of my previous jobs combined. I have used PHP, JavaScript and jQuery, node, knockout, CSS, and HTML5. I’ve also picked up some design and layout tricks, and I’ve even learned how to stretch and use WordPress as way-more-than a simple CMS.

And, do I like it here? Well, the work environment has a very similar feel to the small company I was at before. Everyone at SDW is awesome, smart, and willing to help when asked. Everyone here likes to operate out of their comfort zone and continue learning new things. And when someone makes a breakthrough, there’s usually a tech talk to coincide.

Looking toward the future, I hope to learn as much about new technologies and languages as I can. I want to learn more node, and wouldn’t mind learning Ruby at some point. I also would like to build an iPhone, iPad, or Mac app. There are things I need to be better at, and I am trying hard. My CSS, JavaScript, and code testing all need improving. I am learning as fast as I possibly can and look forward to learning even more. And as I move forward, a goal of mine is to help keep people informed of the new awesomeness that I find, because we all know we will never know it all.

Scraping Poorly Formatted Data with cURL and phpQuery

cURL is a fantastic way to scrape data from websites. It’s pretty ubiquitous on LAMP servers nowadays, so you probably don’t even have to do anything to enable it and start using it. You can essentially get data that’s behind a login form by spoofing a browser logging into the site.

I’m not going to do a dissertation on the nuances of using cURL. Instead, I’d like to discuss how to process data after you’ve gotten the HTML string from a page using the curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); setting. The general process using cURL to get the data from a password protected site goes something like this:

  1. Make sure your “cookie jar” is set up and working so that your session can be saved while you’re scraping other pages after login.
  2. Get the HTML for the login page.
    1. Get the form post location.
    2. Get any special tokens or hidden variables.
  3. Post to the login script with your authorized username and password.
  4. Scrape any pages you need to get and process them.
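
The steps above translate to most HTTP libraries. As a rough sketch only, here is the same flow using Python's standard library rather than PHP's cURL bindings. The URLs, the `username` and `password` field names, and the helper names `make_session` and `build_login_request` are all assumptions for illustration; a real login form will use its own names.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Step 1: an opener that keeps cookies between requests (the "cookie jar")."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(post_url, username, password, hidden_fields):
    """Step 3: POST the hidden tokens from the login page plus the credentials."""
    form = dict(hidden_fields)
    form["username"] = username  # assumed field name
    form["password"] = password  # assumed field name
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(post_url, data=body, method="POST")


# Usage against a real site (not run here, since it needs a live server):
#   opener, jar = make_session()
#   login_html = opener.open("https://example.com/login").read().decode()
#   ...pull post_url and hidden_fields out of login_html (step 2)...
#   opener.open(build_login_request(post_url, "me", "secret", hidden_fields))
#   report_html = opener.open("https://example.com/report").read().decode()
```

Because every request goes through the same opener, the session cookie set at login is sent automatically on the later page requests.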

Getting auth tokens and stuff

Sometimes login pages make it difficult to figure out what to post because of “authentication tokens” or similar variables that change every time a page is hit and then have to be posted back along with the username and password. This is most common among .NET applications since it’s built right in to the .NET form creation classes. The process of finding these variables is made a lot easier if you use phpQuery. It’s a project that attempts to replicate what jQuery does but with PHP. You pass phpQuery a string with all the HTML content in it and then you can perform selectors and traverse the DOM just like with jQuery. What you do with the HTML after that is up to you.

In the above example, I already had the HTML string of the login page and simply used phpQuery to get all the inputs on the page with a very jQuery-like selector syntax and then looped through the results to get all my input fields for the form so I could use them in my next call to post the data back to the login script.
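
For readers not on PHP, the same idea can be sketched with Python's built-in `html.parser`. This is not phpQuery and not the code from the project; the sample markup, the `FormFieldParser` class, and the `extract_form_fields` helper are made up for illustration.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the form's post location and every named <input> value."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            # Inputs without a value attribute are kept as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""


def extract_form_fields(html):
    parser = FormFieldParser()
    parser.feed(html)
    return parser.action, parser.fields


# Hypothetical login page with a .NET-style hidden token.
sample = """
<form action="/login.aspx" method="post">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""
action, fields = extract_form_fields(sample)
print(action)
print(fields)
```

The hidden token comes back alongside the empty username and password fields, so you can fill in the credentials and post the whole dictionary back.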

Getting poorly formatted data and stuff

I consider “well formatted” data to be in formats like JSON, XML, CSV, etc. that are specifically meant for data transfer. But what happens when you need to scrape data from HTML tables or DIVs?

phpQuery to the rescue! You can do the same thing as above but parse the newly acquired HTML string of the data you’re looking for. Here’s an example of looking at a two column table of data to get all sorts of neat stuff and format it into an array. Then I check for an element outside of the table loop and set another variable.
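
To make the shape of that loop concrete, here is a minimal sketch in Python's standard library (again, not phpQuery). It assumes a hypothetical page with a two-column label/value table and a separate element with `id="status"` outside the table; the class name and the markup are invented for the example.

```python
from html.parser import HTMLParser


class TwoColumnTableParser(HTMLParser):
    """Turn <tr><td>label</td><td>value</td></tr> rows into a dict."""

    def __init__(self):
        super().__init__()
        self.data = {}
        self.status = None       # text of the element outside the table
        self._row = None         # cell texts collected for the current row
        self._in_cell = False
        self._in_status = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")
        elif attrs.get("id") == "status":
            self._in_status = True
            self.status = ""

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if len(self._row) == 2:
                label, value = (cell.strip() for cell in self._row)
                self.data[label.rstrip(":")] = value  # basic cleanup
            self._row = None
            self._in_cell = False
        elif self._in_status:
            self._in_status = False

    def handle_data(self, text):
        if self._in_cell:
            self._row[-1] += text
        elif self._in_status:
            self.status += text


sample = """
<table>
  <tr><td>Name:</td><td>Acme Corp</td></tr>
  <tr><td>Plan:</td><td> Gold </td></tr>
</table>
<div id="status">Active</div>
"""
parser = TwoColumnTableParser()
parser.feed(sample)
print(parser.data)
print(parser.status)
```

Rows that do not have exactly two cells are skipped, which is a cheap way to ignore header or spacer rows in messy markup.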

Using the data and stuff

After you’re done with that, you have all this useful data in an array and you can basically do whatever you want to it. Since this example I created originated from code used in a real life client project (with variable names changed to protect the innocent), I went on to take that array and save it out to a JSON file that I could easily read in and do what I wanted to with the data.
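
Saving the result out is only a couple of lines in most languages. A small Python sketch, assuming the scraped values are already in a plain dictionary (the data and the file name here are made up):

```python
import json
import os
import tempfile


def save_scrape(data, path):
    """Write the scraped values to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, sort_keys=True)


def load_scrape(path):
    """Read the values back in later without touching the remote site."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


scraped = {"Name": "Acme Corp", "Plan": "Gold", "status": "Active"}
path = os.path.join(tempfile.mkdtemp(), "account.json")
save_scrape(scraped, path)
restored = load_scrape(path)
```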

You may need to go through some tests to get the right kind of cleaning or data sanitization for your values, but phpQuery makes it really easy to scrape this kind of data if you’re used to jQuery’s selectors and traversing. Instead of scraping the site each time to get the data, I like saving out the HTML string to a file and then playing with selectors and traversing to get the data I want from a local file so I don’t put a lot of strain on the server.

And don’t forget stealth…

If you’re not sure how friendly the server (or serveradmin) is to you doing this, make sure you set the cURL user agent for each request. I like pretending I’m GoogleBot, but any common user agent string would suffice so that the server doesn’t explicitly know you’re PHP trying to log in.

By default, cURL spits out a user agent like this (changes based on operating system and version of cURL):

curl/7.15.5 (i686-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5

If a serveradmin sees that in their logs, they might freak out… so I like these user agent strings for sneaky data pulls:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7
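
In PHP's cURL bindings the user agent is set with the `CURLOPT_USERAGENT` option. For comparison, here is a minimal Python sketch of the same thing using the standard library; the URL and the helper name `request_with_user_agent` are placeholders.

```python
import urllib.request

# The Chrome user agent string listed above.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.7 "
    "(KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7"
)


def request_with_user_agent(url, user_agent=BROWSER_UA):
    """Build a request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = request_with_user_agent("https://example.com/report")
```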

Audience participation

How do you use cURL and phpQuery? Any useful scripts or resources to point people to other than their respective documentation sites?

Charlotte Programming Company Helps Business Clients Go Digital

Along with SDW being ranked one of the Best Places to Work in Charlotte, North Carolina, James Hartsell, SDW’s co-founder and CEO, was recently profiled by the Charlotte Business Journal. The Q&A is reprinted below.

Tell us a little about what Skookum Digital Works does and how it was founded?
Skookum Digital Works is a technical partner for anyone with a startup dream. We mainly work with entrepreneurs—building out their products or customizing software for businesses. 

My co-founder, Bryan Delaney, and I founded Skookum in 2005. We were roommates at UNC Charlotte, both graduated with computer science degrees, and both went to work for the Department of Defense for a few years before deciding we had a better mousetrap.

Bryan and I are Charlotte natives, and we’re happy to have located our office Uptown. 

Do you specialize in any particular types of apps or client base?
I don’t think it’s crass of me to say we like to work with funded startups. Our typical client is the non-technical entrepreneur: someone with business skills and ideas but with no programming background. We allow a non-technical entrepreneur to get started on their digital product without having to find a technical co-founder or trying to hire engineers they are not qualified to vet.

Our clients have often heard the word “No” elsewhere. We have expertise in the mobile web, complicated software integrations, and real-time web collaboration. 

What are some examples of apps Skookum has created?
A publishing company wanted a marketing tool to promote their books. SDW gave them a digital revenue stream.

Some D.C. folks knew independent voters were eager to take collective action. SDW built them a data mining and people matching system.

A group of investors-and-avid-golfers hated the 100+ scoring apps already available. We made players’ phones talk to each other. (!)

A neighborhood of New York businesses disliked Groupon keeping their margins. We created a localized model they collectively controlled.

What do you think will be the next innovation in smartphone applications?
Mobile apps are going to be easier to download and live outside the walls of the Apple iTunes store and the Android Marketplace. Companies can now place their applications on iPhones and iPads without Apple taking a 30% cut.

Mobile applications are also slowly making their way into retail. Most stores know they can use smartphones and tablet devices to enhance their store experience, but the smaller chains (and certainly the local guys) are waiting to see what the big guys do before making the investment.

Charlotte isn’t really seen as a tech town. Will that ever change?
It’s fundamentally a marketing problem and one that we and 70+ other Charlotte tech leaders have addressed face to face with Mayor Foxx. Aside from numerous local startups and technical partners like ourselves, all of the banks are essentially technology companies. 

For just one example, if you took all the programmers out of Bank of America, they would comprise the tenth largest tech company in the world. BoA has technology needs that make engineers at IBM cry.

We like to think we’re doing our part recruiting talent to the area and flying the flag in front of national entrepreneurs. We have clients all over the place happy to come see us and come visit Charlotte. 

Charlotte is already a tech hub, but the city definitely needs to get better about spreading that message. 

Debugging with the Weinre

weinre. A rather odd acronym indeed. It could be pronounced why • ner • ree.
Or wee • ner (which my training of the English language biases me towards).

weinre. Web Inspector Remote. This is what happens when developers make and market a product.

weinre.

Weinre is a tool for remotely debugging your web pages. Let’s say you’re building a responsive web site or a mobile web app. iOS’s Debug Console certainly can’t be the most helpful option for peeking behind the curtain. This is where weinre comes in.

Having recently been brought into the PhoneGap fold (and therefore Apache and the Apache Incubator), weinre is undergoing active development. You can learn about weinre at phonegap.github.com/weinre/Home.html or callback.github.com/callback-weiner (callback is the Apache Incubator GitHub account).

Watch a demo on YouTube.

Here, you’ll learn that if you’re on OSX there is an app to get you up and running fast. You’ll learn that it’s a JavaScript-only implementation of the WebKit Inspector. You’ll learn that there are three parts (each with the word “Debug” in its name). But what you won’t learn is how to open up your localhost development server to outside devices. For that you came here.

To begin using weinre to remotely debug your projects, simply follow this six step program.

  1. Download the Mac package from https://github.com/callback/callback-weinre/archives/master
  2. Create the directory and file
    ~/.weinre/server.properties

    with the following content:

    boundHost:  -all-
  3. Grab this bookmarklet: Weinrize it!
  4. Launch weinre
  5. Run local server and weinre on the same machine while opening the website on your remote device.
  6. Bask in the glory of WebKit Inspector and iOS Development.

Disclaimer: Opening up your computer with all boundHosts has security implications.

Happy debugging.

Automated WordPress Plugin Deployment

Have you ever found a tool that made things so much easier you had to tell as many people about it as soon as possible? Yeah, that just happened to me.

A little back story:

I wrote a pretty neat WordPress plugin and had it hosted on GitHub. Basically, it takes an iTunes App Store application ID and fetches all the data about it and displays it on a WordPress site (while caching it for you in the WordPress database).

So I wanted to release it to the world using the WordPress plugin directory… the go-to location for finding plugins. If it didn’t live there, it basically had zero chance of getting any usage whatsoever. Nobody goes searching GitHub for WordPress plugins.

There was one problem…

WordPress requires SVN as the source code management tool for publishing plugins. You check out the repository, make your changes, commit them, and then tag the new version. The WordPress plugin directory detects the change, and within a few minutes your updated plugin is released to the world. Every site that is using your plugin then gets a little message at the top of the admin bar asking them to update their version of the plugin.

Except I haven’t used SVN in about 4 years. Basically the whole world has moved to GIT. There’s been an exodus from SVN, CVS, Perforce, Mercurial, etc. because of GIT’s ease of use and distributed model.

GIT to the rescue

Recent versions of GIT have really focused on integrating SVN as much as possible. The git-svn command is powerful enough to clone an entire SVN repository and allow you to work on it like it’s a GIT repository, then commit your history back to the SVN server, and the SVN server is none the wiser.

So I knew there had to be an easy and/or automated way to publish and then submit subsequent updates to my application.

tl;dr bro

I know, I know…

Which brings me to my point:

With a little Google-fu, I found Brent Shepherd’s blog post about his Automated WordPress Plugin Deployment script. Basically, it’s a bash script that you download, put into each plugin repo, and change a few variables in. When you’re done working on some features, just run ./deploy.sh and it automatically checks out the SVN plugin and updates it with all your changes, making sure to ignore your GitHub “README.md” and the deploy script itself. Then it tags the new version and commits everything back to the WordPress server. It even aborts if it detects that your readme.txt file’s stable version number doesn’t match your plugin’s declared version.
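
That last safety check is easy to picture. The sketch below is not Brent’s bash script; it is a Python illustration of the same kind of comparison, assuming the standard `Stable tag:` line in readme.txt and the `Version:` line in the plugin’s PHP header. The plugin name and version numbers are invented.

```python
import re


def stable_tag(readme_text):
    """Return the value of the 'Stable tag:' line in readme.txt, if any."""
    match = re.search(r"^\s*Stable tag:\s*(\S+)", readme_text,
                      re.IGNORECASE | re.MULTILINE)
    return match.group(1) if match else None


def plugin_version(plugin_php_text):
    """Return the value of the 'Version:' line in the plugin header, if any."""
    match = re.search(r"^[\s*]*Version:\s*(\S+)", plugin_php_text,
                      re.IGNORECASE | re.MULTILINE)
    return match.group(1) if match else None


def versions_match(readme_text, plugin_php_text):
    """True only when both files declare the same version."""
    readme_version = stable_tag(readme_text)
    return (readme_version is not None
            and readme_version == plugin_version(plugin_php_text))


# Hypothetical plugin files.
readme = "=== App Display ===\nStable tag: 1.2.0\n"
plugin = "<?php\n/*\n * Plugin Name: App Display\n * Version: 1.2.0\n */\n"
ok = versions_match(readme, plugin)
```

A deploy script would stop before touching SVN whenever a check like this comes back false.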

Thanks, Brent, for the fantastic deploy script. It worked flawlessly on the first try and my new plugin has been published.

Octocat to the rescue.
Well, technically, the Octocat only has 5 arms in the image, but I'm assuming that the rest are perfectly in line with the front legs.

Open an iOS App from an Email

Apple has allowed iOS apps to register their own URL schemes on your devices for a while now, but I’ve never used this functionality in-depth until just recently.

Having your app register a URL scheme on a device means that you can open an installed application on a user’s iPhone or iPad in ONE CLICK from their email (or the web, or another application). There are all sorts of apps already using this functionality.

Use Case

We just used this for a project where the client would send confirmation emails to the user after signup. The email would contain a link to a URL. The URL opens in mobile Safari, then launches the app and seamlessly logs the user into the app.

That’s one use case; there are probably others (let us know yours). Another one could be a URL embedded in text: say someone is reading on their iPad and the content prompts them to hop over to the app.

Try It Out

If a user has your app installed on their device, all you have to do is send them to my-great-app://whatever/ and Safari will resign to the background and your app will open. Why don’t you try it? I know you’ve got the Facebook app installed on your iPhone. Just clear your address bar and type (or click the link): fb://profile

Go ahead, I’ll wait.

It opened the Facebook app and took you to your profile page (or gave you a big fat cryptic error message if you don’t have the Facebook app installed). Facebook allows a LOT of options that you can pass to the application to take you into various parts of the app.

Process

So enough about the Facebook app. What does a developer do if they want to pass a bunch of data to the app—without actually defining what needs to be passed using a bunch of if/else or switch statements?

It’s fairly trivial to add a URL scheme to your application:

  1. Register the URL scheme in your Info.plist. Let’s say: my-great-app://
  2. In the application delegate, implement this method and return YES:
    application:openURL:sourceApplication:annotation:
  3. That’s it!

I wrote a method you can add to your application that allows the passing of data through the URL as key/value pairs like so:

my-great-app://user_id/27/token/really9385long5892data/email/user@yourservice.com/

When the user clicks a link like that, or you auto-redirect them from your site to that URL, you can parse the URL into an NSDictionary with this code. Here’s the method and an example implementation along with some pretty badass comments so you know what the heck is going on:
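
The parsing itself is language-neutral: strip the scheme, split on slashes, and pair up the pieces. As an illustration only, here is that logic as a Python sketch. It is not the Objective-C method referred to above, and `parse_app_url` is a hypothetical name.

```python
from urllib.parse import unquote


def parse_app_url(url, scheme="my-great-app"):
    """Turn scheme://key1/value1/key2/value2/ into a dictionary."""
    prefix = scheme + "://"
    if not url.startswith(prefix):
        raise ValueError("unexpected URL scheme: " + url)
    # Drop empty pieces so a trailing slash does not create a blank key.
    parts = [unquote(piece) for piece in url[len(prefix):].split("/") if piece]
    if len(parts) % 2:
        raise ValueError("URL has a key without a value: " + url)
    # Even positions are keys, odd positions are their values.
    return dict(zip(parts[0::2], parts[1::2]))


params = parse_app_url(
    "my-great-app://user_id/27/token/really9385long5892data/email/user@yourservice.com/"
)
print(params)
```

Because the keys come from the URL, the app can accept new parameters without a growing pile of if/else or switch statements. The same pairing would map naturally onto an NSDictionary on the iOS side.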

Let me know in the comments if you find this useful at all or if you make changes to it.

Who Says Speed Kills?

Skookum Digital Works is wicked fast. Speed to market. Speed to prototype. Speed to minimum viable product.

Never is developmental speed more apparent than when I think back to my pre-SDW sales life.

I ran a software company.

Same industry.
Same clients.
Same challenges.
And two completely different approaches to solving those same business needs.

Perhaps my impressionable years of growing up in a parochial school operated by nuns armed with wooden rulers led to habits that played a role in my partiality for an inflexible, doctrinaire approach to application development.

I thought our 350+ page specifications documents were awesome. I thought our inflexible system produced predictable results. I thought our lumbering process equated to thoroughness.

That all changed when I fell in love with Skookum. SDW was the company I always envisioned building.

SDW was nimble.
SDW was able to change course and turn on a dime.
SDW was fast, unfussy, AND thorough.

Importantly, SDW also had a full toolbox. At my .NET shop, we had one hammer, so every problem looked like a nail.

Skookum has its own creative department. Instead of waiting for innovation to come from Microsoft, SDW makes its own. Gone are the square buttons and pull downs of the .NET environment (and expensive licenses), and instead, I can now give clients chic, state-of-the-art solutions emphasizing intuitive design and natural functionality.

I admit it, I’ve become an Agile convert and enjoy nothing more than winning over clients both former and new. I like showing how they too can be better, faster, and more nimble. Ahead of schedule, and under budget. No fuss, just shipped code.

I used to think complex web-based application development meant one thing—mass.

Now I know I was wrong. Whoever said “Speed Kills…” has obviously never worked with Skookum Digital Works before.

Don’t Let String Sanitization Slow You Down

If you’ve ever taken a gander at a string sanitization class or library, you’ve probably noticed the amount of code necessary to keep the script kiddies at bay. We’re talking about slow string manipulations, i.e. string replacements and regular expressions.

Most MVC frameworks today come with some form of sanitization or input filtering class built in. The problem with many of these libraries is that they fail to clean some of the more creative attack vectors. To combat this, some people use drop-in libraries like OWASP AntiSamy or HTML Purifier to ensure their data is getting scrubbed clean. HTML Purifier, for instance, uses a smoke test via the ha.ckers.org XSS attack list (the de facto standard for finding attack vectors to test, might I add) to ensure they’re cleaning anything and everything.
