I’ve been playing around with some data scraping, and Yelp has been my first test subject. Their DOM makes light (read: uninteresting) work of extracting most data. I did stumble across something pretty useful this evening though.
On every venue page, Yelp adds a property called json_biz to the window object. You can access it through your favorite JS console, which is handy if you're trying to manipulate data in your browser. The real win, though, is that the JSON representation of this object is inlined in a <script> tag near the bottom of the page, so you can get at it without running any JavaScript at all.
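Here's a minimal Python sketch of pulling that inlined object out of a page's HTML. It assumes the script assigns the object as window.json_biz = {...}; with plain, valid JSON on the right-hand side. Yelp's actual markup may differ (or may have changed), so treat the regex as a starting point. SAMPLE_HTML and extract_json_biz are hypothetical names used only for illustration.

```python
import json
import re

# Hypothetical sample page. The exact shape of Yelp's inline script is an
# assumption; the real page may format or escape it differently.
SAMPLE_HTML = """
<html><body>
<div class="biz">...</div>
<script>
  window.json_biz = {"id": "abc123", "name": "Example Cafe", "rating": 4.5,
                     "categories": [{"title": "Coffee"}]};
</script>
</body></html>
"""


def extract_json_biz(html: str) -> dict:
    """Return the json_biz object literal from the first <script> that assigns it."""
    # Grab each <script>...</script> body; DOTALL lets '.' match across newlines.
    scripts = re.findall(r"<script[^>]*>(.*?)</script>", html,
                         flags=re.DOTALL | re.IGNORECASE)
    for body in scripts:
        match = re.search(r"json_biz\s*=\s*", body)
        if match is None:
            continue
        # raw_decode parses exactly one JSON value from the start of the string
        # and ignores whatever follows (e.g. the trailing ';'), so nested
        # braces inside the object are handled by the JSON parser itself.
        obj, _consumed = json.JSONDecoder().raw_decode(body[match.end():])
        return obj
    raise ValueError("no json_biz assignment found in any <script> tag")
```

Usage: call extract_json_biz(SAMPLE_HTML)["name"] to get "Example Cafe". Letting the JSON decoder find the end of the object is sturdier than trying to match braces with a regex. For a live page you'd fetch the HTML first (for example with urllib.request) and pass the decoded text in.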
Obviously, YMMV with this technique, and please respect Yelp’s data, along with everyone else’s for that matter.