Cox Media Group
Kryptonite
Conference Schedule
Track one or two? Eeenie, meenie, miney, moe…
Y’all Wanna Scrape with Us? Content Ain’t a Thing: Web Scraping With Our Favorite Python Libraries
Time
Level
Description
Abstract
Outline
- lxml fu: etree vs. html
- lxml faves: iterlinks, prev/next, strip_tags, linepos
- incorporating XPath
- building your XML views/templates with lxml (this bullet is optional: may not have time, but would love to hear if folks might find this useful)
- learning how to build a good JSON API handler: what you can learn from some amazing API handlers when you have to build your own
- feedparser, HTMLParser, re: the quick & dirty ways to parse when lxml isn't fast enough
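
To give a flavor of the "quick & dirty" end of the outline, here is a minimal sketch using only the standard library. It is not taken from the talk itself: the `LinkCollector` class, the `extract_links` and `extract_hrefs` helpers, and the `SAMPLE` markup are all hypothetical names made up for illustration. It also assumes Python 3, where the `HTMLParser` class lives in `html.parser`.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags. Illustrative only."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical sample markup, invented for this sketch.
SAMPLE = """
<ul>
  <li><a href="/blog/1">First <b>post</b></a></li>
  <li><a href="/blog/2">Second post</a></li>
</ul>
"""


def extract_links(html):
    """Event-driven parse: tolerant of nested tags inside the link text."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Even quicker and dirtier: a regex that only understands double-quoted hrefs.
HREF_RE = re.compile(r'href="([^"]+)"')


def extract_hrefs(html):
    """Regex parse: fast to write, but brittle on unusual markup."""
    return HREF_RE.findall(html)


if __name__ == "__main__":
    print(extract_links(SAMPLE))
    print(extract_hrefs(SAMPLE))
```

The regex version is the shortest to write but breaks on single-quoted or unquoted attributes; the `HTMLParser` version costs a few more lines and copes with nested tags inside the link text.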
 
 
Why learn from me? I’ve used these libraries to help build high-scale Django applications for the Washington Post and USA TODAY, covering everything from neighborhood blog aggregators and election coverage to Katrina mapping and financial reporting.
