Merge pull request #30 from datarefugephilly/seeder-updates
Minor formatting and copy updates to seeders
dcwalk authored Feb 3, 2017
2 parents 58d62a9 + f0081fc commit c9659f4
Showing 1 changed file, seednsort.md, with 7 additions and 7 deletions.
# Seeding and Sorting Overview

## What do Seeders/Sorters do?
Seeders and Sorters canvass the web resources of a given government agency, identifying important URLs and determining whether those URLs can be captured by the Internet Archive's webcrawler. If a URL is crawlable, the Seeders/Sorters nominate it to the End-of-Term (EOT) project; otherwise, they add it to the Uncrawlable spreadsheet using the project's Chrome Extension.

## Choosing the website
The Seeders/Sorters team will use the [EDGI subprimers](https://envirodatagov.org/agency-forecasts/), or a similar set of resources, to identify important/at-risk data. Talk to the DataRescue organizers to learn more.

## Canvassing the website and evaluating content
- Start exploring the assigned website, identifying important URLs.
- Decide whether the data on a page or website subsection can be [automatically captured by the Internet Archive webcrawler](./what-heritrix-does.md).
- The best source of information about the seeding and sorting process is [https://envirodatagov.org/](https://envirodatagov.org/); see:
- [Understanding What the Internet Archive Webcrawler Does](https://docs.google.com/document/d/1PeWefW2toThs-Pbw0CMv2us7wxQI0gRrP1LGuwMp_UQ/edit)
- [Seeding the Internet Archive’s Webcrawler](https://docs.google.com/document/d/1qpuNCmBmu4KcsS_hE2srewcCiP4f9P5cCyDfHmsSAVU/edit)
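
Whether a webcrawler can reach a page depends on several factors (dynamically generated content, forms, and databases are common blockers); one quick illustrative signal — not part of the official workflow — is whether the site's `robots.txt` disallows crawling. A minimal sketch using only Python's standard library, with a hypothetical hard-coded `robots.txt` (a real check would fetch the agency site's own file):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; a real check would
# fetch https://<agency-site>/robots.txt instead of a hard-coded string.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A generic crawler ("*") may fetch public pages but not /private/ ones.
print(parser.can_fetch("*", "https://www.example.gov/data/report.html"))  # True
print(parser.can_fetch("*", "https://www.example.gov/private/internal"))  # False
```

Note that a permissive `robots.txt` does not guarantee crawlability: content behind search forms or database queries can still be unreachable for the crawler.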

### Crawlable URLs
- URLs judged to be possibly crawlable are "nominated" (equivalently, "seeded") to the End-Of-Term project (EOT), using the [EDGI Nomination Chrome extension](https://chrome.google.com/webstore/detail/nominationtool/abjpihafglmijnkkoppbookfkkanklok?hl=en) or
[bookmarklet](http://digital2.library.unt.edu/nomination/eth2016/about/).

**Wherever possible, add in the Agency Office Code.** Talk to the DataRescue organizers to learn more.

### Uncrawlable URLs
- If a URL is judged not crawlable, add it to the "Uncrawlable" spreadsheet through the Chrome Extension.
- In the spreadsheet, each entry is automatically associated with a universally unique identifier (UUID) that was generated in advance.
- You can check whether the page or some of its files are already rendered using the Internet Archive's [Wayback Machine Chrome Extension](https://chrome.google.com/webstore/detail/wayback-machine/fpnmgdkabkmnadcjpehmlllkndpkmiak).
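
The pre-generated UUIDs attached to spreadsheet rows are produced by the project's own tooling; purely for illustration, identifiers of the same kind can be generated with Python's standard library (a sketch, not the project's actual generation code):

```python
import uuid

# Generate a random (version 4) universally unique identifier,
# the kind of value used to tag each uncrawlable-URL row in advance.
row_id = uuid.uuid4()

# UUIDs render as 36-character hyphenated hex strings,
# e.g. 9f1c2e4a-7b3d-4c5e-8f6a-1d2b3c4d5e6f (value differs every run).
print(row_id)
```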

### Not sure?
- This sorting is only provisional: when in doubt, seeders nominate the URL **and** mark it as possibly not crawlable.
