links-as-of-1-17.md

General

Seeding

How the Internet Archive Crawler Works

Seeding the Internet Archive’s Webcrawler

Archiving using the Google Chrome Extension & Sub-primers

Seeding and Sorting Example

Data Refuge Seed Progress DataRescue Philly DOE

Data Refuge Seed Progress DataRescue Philly NOAA

Seeded URLs DataRescue Philly

EDGI URL Tracker with Agency Coding (Responses) 13:40:06 DataRescue Philly

Seeding and Sorting DataRescue Philly Debrief Notes

https://docs.google.com/document/d/10o-MQ2gDK5KfeXkSc2mglZwmWH0CahKrgAGzldptfbA/edit

Baggers

Initial Bagger Workflow and Documentation DataRescue Philly

EDGI Uncrawlable Content

Uncrawlable Content Spreadsheet Template DataRescue Philly

Technical Metadata Bag DataRescue Philly

Scrape Methodology by Adam Labay

Scraping Code by Rob Emanuele

EDGI URL Priority List

Agency Office Code Lookup Table

NOAA Seeding and Sorting