# Open Source
Each of the following sections breaks down an area in this diagram, listing the purpose of each service in that area.




# Authentication
Services that manage users.

| Service | Description | Technologies | Status | Key Contributors |
|---------|-------------|--------------|--------|------------------|
| [**Identity**](https://github.com/qri-io/ident.archivers.space) | A central service for creating an account, logging in & out, managing user info, etc. Other services can talk to the identity service to get information about a user. It's hosted [here](https://ident.archivers.space). Normal humans can create & manage accounts using the [archivers 2.0 webapp](https://alpha.archivers.space). | Go | *alpha* | @b5 |
| [**IdentityDB**](https://github.com/qri-io/ident.archivers.space) | Database of all user identities. The only way to talk to it is through the identity service. | Postgres | *alpha* | @b5 |
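To illustrate the "other services can talk to the identity service" flow, here is a minimal Python sketch of a client. The `/users/<id>` route and the returned field names are assumptions for illustration, not documented API.

```python
import json
from urllib.request import urlopen

IDENT_URL = "https://ident.archivers.space"  # hosted identity service

def parse_user(raw):
    """Pull the fields other services typically need from an identity response."""
    user = json.loads(raw)
    return {"id": user.get("id"), "username": user.get("username")}

def fetch_user(user_id):
    # NOTE: '/users/<id>' is an assumed endpoint shape, not a documented route.
    with urlopen(f"{IDENT_URL}/users/{user_id}") as resp:
        return parse_user(resp.read())
```
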


# Guidance
Services that guide Data Rescue efforts.

| Service | Description | Technologies | Status | Key Contributors |
|---------|-------------|--------------|--------|------------------|
| [**Agency Primers**](https://envirodatagov.org/agencyprimers/) | Spreadsheet of agencies & sub-agencies for archiving. | Google Sheets, Airtable | *in use* | @mayaad, @trinberg, Andrew Bergman |
| [**Chrome Extension**](https://github.com/edgi-govdata-archiving/eot-nomination-tool) | Chrome extension to nominate government data that needs to be preserved. | Javascript, Chrome Extension | *in use* | @ates, @titaniumbones |
| **Uncrawlables Spreadsheet** | The Chrome extension dumps its output to a Google Sheet of uncrawlable content. | Google Sheets | *in use* | |
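For services that consume the uncrawlables sheet, a Google Sheet shared as CSV can be parsed with the standard library. The column names below are illustrative assumptions, not the sheet's actual headers.

```python
import csv
import io

def parse_nominations(csv_text):
    """Parse a CSV export of the uncrawlables sheet into a list of dicts.

    Column names ('url', 'agency', 'notes') are illustrative; the real
    sheet's columns may differ.
    """
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
```
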

# Reporting
Services for collecting & delivering platform-wide reports.

| Service | Description | Technologies | Status | Key Contributors |
|---------|-------------|--------------|--------|------------------|
| **Stats** | Server that periodically collects key stats platform-wide & reports them for easy public consumption. Ideally this service will just consume the JSON APIs of all the other services & output a dashboard; there are lots of frameworks out in the wild that do this, so we should research & pick one. | *planning to use an existing solution* | *not yet started* | |
| **Health** | Server that periodically polls all services on the platform & outputs a page showing when a service is down or malfunctioning. Status-check frameworks for this already exist; we should research & pick one. | *planning to use an existing solution* | *not yet started* | |
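Until an existing dashboard framework is picked, the core aggregation both Stats & Health need can be sketched like this. Service names and payload shapes are hypothetical.

```python
import json

def collect_stats(responses):
    """Merge per-service JSON stats into one platform-wide report.

    `responses` maps service name -> raw JSON string, or None when the
    service was unreachable (which Health would flag as down).
    """
    report = {"services": {}, "down": []}
    for name, raw in responses.items():
        if raw is None:
            report["down"].append(name)
            continue
        report["services"][name] = json.loads(raw)
    return report
```
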


# Archiving 1.0
Current services for downloading & storing content.

| Service | Description | Technologies | Status | Key Contributors |
|---------|-------------|--------------|--------|------------------|
| [**Archivers 1.0**](https://github.com/edgi-govdata-archiving/archivers.space) | App for volunteers to research URLs, add metadata, and upload archived .zips. | Javascript, MeteorJS, ReactJS | *in use* | @kmcculloch, @danielballan, @b5 |
| [**Archivers 1.0 DB**](https://github.com/edgi-govdata-archiving/archivers.space/imports/api) | Archivers backing database. | MongoDB | *in use* | @b5 |
| [**S3-Upload-Server**](https://github.com/edgi-govdata-archiving/s3-upload-server) | Server for making uploads to S3 buckets via browser or AWS CLI tokens. | Go | *in use* | @b5 |
| [**Zip-Starter**](https://github.com/edgi-govdata-archiving/zip-starter) | Server for generating base metadata zip archives. | Go | *in use* | @b5 |
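A rough idea of what Zip-Starter's "base metadata zip" could look like, sketched in Python for illustration (the actual service is written in Go, and the file name & fields here are assumptions):

```python
import io
import json
import zipfile

def make_base_zip(url, archive_id):
    """Build an in-memory zip seeded with a metadata file, in the spirit of
    Zip-Starter. 'metadata.json' and its fields are illustrative assumptions."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("metadata.json", json.dumps({"id": archive_id, "url": url}))
    return buf.getvalue()
```
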


# Archiving 2.0
New services for archiving & describing content.

| Service | Description | Technologies | Status | Key Contributors |
|---------|-------------|--------------|--------|------------------|
| [**Archivers 2.0**](https://github.com/qri-io/context) | Webapp for volunteers to add metadata to URLs & content. | Javascript, ReactJS, Redux | *alpha* | @b5 |
| [**Patchbay**](https://github.com/qri-io/patchbay) | Backing service for the archivers app; it coordinates realtime communication between users of the archivers 2.0 app and does on-the-fly archiving of undiscovered URLs. | Go | *alpha* | @b5 |
| [**Miru**](https://github.com/zsck/miru/) | A website monitoring tool that periodically checks a site for changes & runs custom scripts. Miru takes user-contributed scripts capable of extracting uncrawlable content & executes them, recording their results in a uniform format. | Go | *beta* | @zsck |
| [**Miru DB**](https://github.com/zsck/miru/blob/master/models/queries.go) | Miru's backing database. | SQLite | *beta* | @zsck |
| [**Recipes**](https://github.com/datarescue-boston/harvesting-tools) | A collection of strategies (recipes) for dealing with various types of uncrawlable content. | Various Languages | *under construction* | @jeffreyliu |
| ArchiveDB | Database of archived content & metadata. The schema is outlined [here](https://github.com/qri-io/patchbay/blob/master/sql/schema.sql). | Postgres | *alpha* | |
| ArchiveContent S3 Bucket | A big S3 bucket to save & read content to. | S3 | *alpha* | |
| [**Sentry**](https://github.com/qri-io/sentry) | A web crawler that continually scans for pages that haven't been checked in a while (or ever) and generates snapshots of what it finds. | Go | *planned* | @b5 |
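Miru-style change detection boils down to comparing fingerprints of successive fetches. A minimal sketch of that idea (not Miru's actual implementation, which is in Go):

```python
import hashlib

def page_fingerprint(body: bytes) -> str:
    """Hash a page body so successive checks can be compared cheaply,
    without storing the full previous copy."""
    return hashlib.sha256(body).hexdigest()

def has_changed(previous_fingerprint: str, body: bytes) -> bool:
    """True when a newly fetched body differs from the stored fingerprint."""
    return page_fingerprint(body) != previous_fingerprint
```
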


# Site Monitoring
Services for tracking changes to websites.

| Service | Description | Technologies | Status | Key Contributors |
|---------|-------------|--------------|--------|------------------|
| [**Web Monitoring DB**](https://github.com/edgi-govdata-archiving/web-monitoring-db) | Website Monitoring project: a more automated version of page monitoring with Versionista (proof of concept for now). | Ruby | *under construction* | @Mr0grog |
| [**Web Monitoring Differ**](https://github.com/edgi-govdata-archiving/web-monitoring-differ) | Diffing service for the website monitoring project. | Javascript | *under construction* | @WestleyArgentum |
| [**Web Monitoring Processing**](https://github.com/edgi-govdata-archiving/web-monitoring-processing) | Website Monitoring project: data processing, PageFreezer integration, and (eventually) diff filtering and processing. | Jupyter, Python | *under construction* | @danielballan |
| [**Web Monitoring UI**](https://github.com/edgi-govdata-archiving/web-monitoring-ui) | Website Monitoring project: enables analysts to quickly assess changes to monitored government websites. | Javascript | *under construction* | @lightandluck |
| [**PageFreezer**](http://pagefreezer.com) | Page freezing / archiving service. | *external service* | *integrating* | @danielballan |
| [**Versionista**](https://versionista.com) | Page freezing / archiving service. | *external service* | *in use* | @danielballan |
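The differ's job, at its simplest, is turning two snapshots of a page into a line-level diff. A stdlib sketch of that idea (the real service is Javascript, and its diff format may differ):

```python
import difflib

def snapshot_diff(old: str, new: str):
    """Unified diff between two page snapshots, one entry per diff line."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="before", tofile="after", lineterm=""))
```
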

# Distribution
Services for disseminating content & data to others.

| Service | Description | Technologies | Status | Key Contributors |
|---------|-------------|--------------|--------|------------------|
| **API** | JSON API to wrap & publish as many platform services as possible, including platform users, archived content, archived metadata, and web-monitoring diffs. | JSON | *planned* | |
| **Bag-Gen** | A server to generate bags for bag-oriented data hosting services (CKAN, Dataverse, etc.). This service is planned as a python wrapper around the [python bagIt lib](https://pypi.python.org/pypi/bagit/) that turns it into a server that can generate bags from archived content. | Python | *planned* | |
| **IPFS-Node** | A bundle of existing frameworks to publish & synchronize archived content with the [InterPlanetary File System](https://ipfs.io). | Go | *planned* | |
| **Dat-Gen** | Service for generating & hosting dat-data project packages. This is a planned lightweight node.js wrapper around the dat library, capable of translating archived content to dat projects. | Javascript, Node.js | *planned* | |
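To make the Bag-Gen plan concrete: a BagIt payload manifest is just checksums plus paths under `data/`. A hand-rolled sketch follows; the planned service would lean on the python bagit library rather than writing this itself.

```python
import hashlib

def make_manifest(payload):
    """Build the contents of a BagIt manifest-sha256.txt for a payload of
    {relative_path: bytes}: one 'checksum  data/path' line per file."""
    lines = []
    for path, data in sorted(payload.items()):
        digest = hashlib.sha256(data).hexdigest()
        lines.append(f"{digest}  data/{path}")
    return "\n".join(lines) + "\n"
```
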

# Coordination
Services for integrating with other services.

| Service | Description | Technologies | Status | Key Contributors |
|---------|-------------|--------------|--------|------------------|
| **Coordinator** | Service that talks to other archiving services about the content & metadata they have, using prewritten integrations to translate to each service. It functions very much like the current Miru implementation, and could be implemented either as another Miru instance or as a forked version (preference for an instance). | | *planned* | |
| **Coordinator DB** | Cache of data we've received from other services, in a format that matches ArchiveDB. | | *planned* | |
| **Integrations** | Series of recipe repos that map external data sources & destinations. | | *planned* | |
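The Coordinator's "prewritten integrations" idea can be sketched as a table of per-service translation functions targeting ArchiveDB's shape. Everything below (service name, field names) is hypothetical.

```python
# Each integration maps one external service's record into the shape the
# Coordinator DB caches; the field names here are invented for illustration.
INTEGRATIONS = {
    "miru": lambda r: {"url": r["site"], "metadata": {"checked": r["last_check"]}},
}

def to_archive_record(source, record):
    """Translate an external record via its integration; raises KeyError
    when no integration exists for `source`."""
    return INTEGRATIONS[source](record)
```
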