NOTE: This project is no longer maintained!

Scrapemark

Scrapemark is a super-convenient way to scrape webpages in Python.

It uses an HTML-like markup language to describe the data you need, and returns the results as plain old Python lists and dictionaries. Internally, Scrapemark uses regular expressions and is super-fast.

As an example, here is a way you could scrape all the links on the Digg homepage in one fell swoop:

import scrapemark

# {* ... *} marks a block that may repeat; each {{ [links].field }} capture
# fills one field of a dictionary in the 'links' list.
print(scrapemark.scrape("""
  {*
    <div class='news-summary'>
      <h3><a href='{{ [links].url }}'>{{ [links].title }}</a></h3>
      <p>{{ [links].description }}</p>
      <li class='digg-count'>
        <strong>{{ [links].diggs|int }}</strong>
      </li>
    </div>
  *}
  """,
  url='http://digg.com/'))
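
To give a feel for the shape of the result, here is a rough sketch that uses only the standard-library `re` module. It is not Scrapemark's implementation, just an illustration of the same idea: a pattern with named captures applied repeatedly, producing a dictionary with a `links` list of dictionaries. The sample HTML, the `ITEM_RE` pattern and the `scrape_links` helper are all made up for this example, and the exact structure Scrapemark returns for the pattern above is assumed rather than verified here.

```python
import re

# Hypothetical sample markup, invented for illustration; not real Digg HTML.
SAMPLE_HTML = """
<div class='news-summary'>
  <h3><a href='http://example.com/a'>First story</a></h3>
  <p>About the first story.</p>
  <li class='digg-count'><strong>120</strong></li>
</div>
<div class='news-summary'>
  <h3><a href='http://example.com/b'>Second story</a></h3>
  <p>About the second story.</p>
  <li class='digg-count'><strong>45</strong></li>
</div>
"""

# One regular expression for the repeated block, one named group per field.
ITEM_RE = re.compile(
    r"<div class='news-summary'>\s*"
    r"<h3><a href='(?P<url>[^']*)'>(?P<title>.*?)</a></h3>\s*"
    r"<p>(?P<description>.*?)</p>\s*"
    r"<li class='digg-count'>\s*<strong>(?P<diggs>\d+)</strong>\s*</li>\s*"
    r"</div>",
    re.DOTALL,
)


def scrape_links(html):
    """Return {'links': [...]} with one dictionary per matched block."""
    links = []
    for match in ITEM_RE.finditer(html):
        item = match.groupdict()
        item["diggs"] = int(item["diggs"])  # plays the role of the |int filter
        links.append(item)
    return {"links": links}


result = scrape_links(SAMPLE_HTML)
for link in result["links"]:
    print(link["title"], link["url"], link["diggs"])
```

Hand-written regular expressions like this get unwieldy quickly, which is the gap the HTML-like pattern syntax is meant to fill.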