
Bookmarks tagged with “scraping”

  1. GitHub - kennethreitz/requests-html: HTML Parsing for Humans™

    A Python library for making web requests and scraping pages. Looks like it might be a bit easier than BeautifulSoup. (via @simonwillison)

  2. Parser API Docs — Readability

    “The web’s most powerful content parser.” Free for non-commercial use, up to an apparently unspecified request cap.

  3. fivefilters / php-readability — Bitbucket

    “A PHP port of Arc90’s original Javascript version of Readability.”

  4. Extract Data from Any Web Page - Diffbot

    Paid API that lets you “Get structured content from articles, products, discussions and other familiar page types.”

  5. Pattern, a Python module for mining web data

    Lovely-looking module for grabbing data from a variety of web sources, analysing it, and displaying results in different ways. (via Waxy)

  6. Philgyford’s mailman-archive-scraper at master - GitHub

    My first Python code and my first attempt at using GitHub. Suggestions for things I’ve done wrong are welcome, but please be gentle.

  7. Introducing templatemaker | Holovaty.com

    Python thing. Point it at some HTML files and it will make a template with holes for the unique strings in the pages. (via Daring Fireball)
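
The "template with holes" idea in the last bookmark can be roughed out in a few lines. This is only a sketch of the general technique, not templatemaker's own API or algorithm: it uses the standard library's `difflib` to keep whatever two pages have in common and drop a hole wherever they differ. The names `make_template`, `tokenize` and the `{{ }}` hole marker are made up for illustration.

```python
import re
from difflib import SequenceMatcher

HOLE = "{{ }}"  # hypothetical hole marker, chosen for this sketch


def tokenize(html: str) -> list[str]:
    # Split into tags, runs of non-space text, and runs of whitespace,
    # so that holes fall on whole words or tags rather than single characters.
    return re.findall(r"<[^>]*>|[^<\s]+|\s+|<", html)


def make_template(first: str, second: str, hole: str = HOLE) -> str:
    """Keep the tokens both pages share; mark each differing stretch with a hole."""
    a, b = tokenize(first), tokenize(second)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    end_a = end_b = 0
    for start_a, start_b, size in matcher.get_matching_blocks():
        # A gap in either page before this shared block means unique content.
        if start_a > end_a or start_b > end_b:
            parts.append(hole)
        parts.extend(a[start_a:start_a + size])
        end_a, end_b = start_a + size, start_b + size
    return "".join(parts)


page_a = "<h1>Hello</h1><p>Posted by Ann</p>"
page_b = "<h1>World</h1><p>Posted by Bob</p>"
print(make_template(page_a, page_b))
# → <h1>{{ }}</h1><p>Posted by {{ }}</p>
```

With more than two pages, one option would be to feed the resulting template back in against each further page, though the hole marker would then need to be treated as a token in its own right.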
