Commit 442f59e

committed
working on the stuff
1 parent 1dd477f commit 442f59e

6 files changed: +114 −6 lines changed

getting_started.md

-3
This file was deleted.

glossary.md

+7
@@ -1,2 +1,9 @@
 # Glossary

+* Primer
+* Source
+* Web Page
+* Content
+* Metadata
+* Uncrawlable
+* Collection

open_source.md

+90
@@ -0,0 +1,90 @@
+# Open Source
+
+Each of the following sections breaks down an area in this diagram, listing the purpose of each service in that area.
+
+![overview](diagrams/service-overview.png)
+
+# Authentication
+
+Services that manage users.
+
+| Service | Description | Technologies | Status | Key Contributors |
+|---------|-------------|--------------|--------|------------------|
+| [**Identity**](https://github.com/qri-io/ident.archivers.space) | A central service for creating an account, logging in and out, managing user info, etc. Other services can talk to the identity service to get information about a user. It's hosted [here](https://ident.archivers.space). Normal humans can create & manage accounts using the [archivers 2.0 webapp](https://alpha.archivers.space). | Go | *alpha* | @b5 |
+| [**IdentityDB**](https://github.com/qri-io/ident.archivers.space) | Database of all user identities. The only way to talk to it is through the identity service. | Postgres | *alpha* | @b5 |
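As a concrete illustration of the "other services can talk to the identity service" pattern described above, a client call might look like the Python sketch below. The `/users/{id}` endpoint path and the response fields are assumptions for illustration, not a documented API.

```python
# Hypothetical client for the identity service. The endpoint path and
# response shape are assumptions; only the base URL comes from the docs.
import json
from urllib.request import urlopen


def get_user(user_id, base="https://ident.archivers.space", fetch=urlopen):
    """Fetch a user's public info from the identity service as a dict."""
    with fetch(f"{base}/users/{user_id}") as resp:
        return json.load(resp)
```

The injectable `fetch` parameter keeps the sketch testable without hitting the real service.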
+
+# Guidance
+
+Services that guide Data Rescue efforts.
+
+| Service | Description | Technologies | Status | Key Contributors |
+|---------|-------------|--------------|--------|------------------|
+| [**Agency Primers**](https://envirodatagov.org/agencyprimers/) | Spreadsheet of agencies & sub-agencies for archiving. | Google Sheets, Airtable | *in use* | @mayaad, @trinberg, Andrew Bergman |
+| [**Chrome Extension**](https://github.com/edgi-govdata-archiving/eot-nomination-tool) | Chrome extension to nominate government data that needs to be preserved. | Javascript, Chrome Extension | *in use* | @ates, @titaniumbones |
+| **Uncrawlables Spreadsheet** | The Chrome extension dumps its output to a Google Sheet of uncrawlable content. | Google Sheets | *in use* | |
+
+# Reporting
+
+Services for collecting & delivering platform-wide reports.
+
+| Service | Description | Technologies | Status | Key Contributors |
+|---------|-------------|--------------|--------|------------------|
+| **Stats** | Server that periodically collects key stats platform-wide & reports them for easy public consumption. This service should ideally just consume the JSON APIs of all the other services & output a dashboard; there are lots of frameworks out in the wild that do this, so we should research & pick one. | *planning to use an existing solution* | *not yet started* | |
+| **Health** | Server that periodically polls all services on the platform & outputs a page that shows when a service is down or malfunctioning. Status-check frameworks for this already exist; we should research & pick one. | *planning to use an existing solution* | *not yet started* | |
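The core loop of the planned Health service could be sketched as below, assuming the plan is to adopt an existing status-check framework and this only illustrates the idea. The service URLs here are placeholders, not a real inventory.

```python
# Minimal sketch of a health poller: hit each service URL, report up/DOWN.
# Service names and URLs are hypothetical placeholders.
from urllib.error import URLError
from urllib.request import urlopen

SERVICES = {
    "identity": "https://ident.archivers.space",
    "archivers": "https://alpha.archivers.space",
}


def check(url, fetch=urlopen):
    """Return True if the service answers with a non-error HTTP status."""
    try:
        return fetch(url, timeout=5).status < 400
    except (URLError, OSError):
        return False


def status_page(services, fetch=urlopen):
    """Map each service name to 'up' or 'DOWN' for a simple status page."""
    return {name: ("up" if check(url, fetch) else "DOWN")
            for name, url in services.items()}
```

A real deployment would run `status_page` on a timer and render the dict as HTML; an off-the-shelf framework already does all of this.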
+
+# Archiving 1.0
+
+Current services for downloading & storing content.
+
+| Service | Description | Technologies | Status | Key Contributors |
+|---------|-------------|--------------|--------|------------------|
+| [**Archivers 1.0**](https://github.com/edgi-govdata-archiving/archivers.space) | App for volunteers to research urls, add metadata, and upload archived .zip's. | Javascript, MeteorJS, ReactJS | *in use* | @kmcculloch, @danielballan, @b5 |
+| [**Archivers 1.0 DB**](https://github.com/edgi-govdata-archiving/archivers.space/imports/api) | Archivers backing database. | MongoDB | *in use* | @b5 |
+| [**S3-Upload-Server**](https://github.com/edgi-govdata-archiving/s3-upload-server) | Server for making uploads to S3 buckets via browser or AWS CLI tokens. | Go | *in use* | @b5 |
+| [**Zip-Starter**](https://github.com/edgi-govdata-archiving/zip-starter) | Server for generating base metadata zip archives. | Go | *in use* | @b5 |
+
+# Archiving 2.0
+
+New services for archiving & describing content.
+
+| Service | Description | Technologies | Status | Key Contributors |
+|---------|-------------|--------------|--------|------------------|
+| [**Archivers 2.0**](https://github.com/qri-io/context) | Webapp for volunteers to add metadata to urls & content. | Javascript, ReactJS, Redux | *alpha* | @b5 |
+| [**Patchbay**](https://github.com/qri-io/patchbay) | Backing service for the archivers app. It coordinates realtime communication between users of the archivers 2.0 app, and does on-the-fly archiving of undiscovered urls. | Go | *alpha* | @b5 |
+| [**Miru**](https://github.com/zsck/miru/) | A website monitoring tool that periodically checks a site for changes & runs custom scripts. Miru takes user-contributed scripts capable of extracting uncrawlable content & executes them, recording their results in a uniform format. | Go | *beta* | @zsck |
+| [**Miru DB**](https://github.com/zsck/miru/blob/master/models/queries.go) | Miru's backing database. | SQLite | *beta* | @zsck |
+| [**Recipes**](https://github.com/datarescue-boston/harvesting-tools) | A collection of strategies (recipes) for dealing with various types of uncrawlable content. | Various languages | *under construction* | @jeffreyliu |
+| ArchiveDB | Database of archived content & metadata. The schema is outlined [here](https://github.com/qri-io/patchbay/blob/master/sql/schema.sql). | Postgres | *alpha* | |
+| ArchiveContent S3 Bucket | A big S3 bucket to save & read content to. | S3 | *alpha* | |
+| [**Sentry**](https://github.com/qri-io/sentry) | A web crawler that continually scans for pages that haven't been checked in a while (or ever), and generates snapshots of what it finds. | Go | *planned* | @b5 |
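Sentry's scheduling idea, "pages that haven't been checked in a while (or ever)", can be sketched as a priority ordering over page records. The `last_checked` field name and the one-week threshold are assumptions for illustration; Sentry itself is planned in Go.

```python
# Sketch of stale-page selection: never-checked pages first, then the
# pages whose last snapshot is oldest. Field names are assumptions.
from datetime import timedelta


def stale_pages(pages, now, max_age=timedelta(days=7)):
    """Return pages never checked, then pages checked longer ago than max_age."""
    never = [p for p in pages if p.get("last_checked") is None]
    old = [p for p in pages
           if p.get("last_checked") is not None
           and now - p["last_checked"] > max_age]
    old.sort(key=lambda p: p["last_checked"])  # oldest snapshot first
    return never + old
```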
+# Site Monitoring
+
+Services for tracking changes to websites.
+
+| Service | Description | Technologies | Status | Key Contributors |
+|---------|-------------|--------------|--------|------------------|
+| [**Web Monitoring DB**](https://github.com/edgi-govdata-archiving/web-monitoring-db) | Website Monitoring project: a more automated version of page monitoring with Versionista (proof of concept for now). | Ruby | *under construction* | @Mr0grog |
+| [**Web Monitoring Differ**](https://github.com/edgi-govdata-archiving/web-monitoring-differ) | Diffing service for the website monitoring project. | Javascript | *under construction* | @WestleyArgentum |
+| [**Web Monitoring Processing**](https://github.com/edgi-govdata-archiving/web-monitoring-processing) | Website Monitoring project: data processing, PageFreezer integration, and (eventually) diff filtering and processing. | Jupyter, Python | *under construction* | @danielballan |
+| [**Web Monitoring UI**](https://github.com/edgi-govdata-archiving/web-monitoring-ui) | Website Monitoring project: enables analysts to quickly assess changes to monitored government websites. | Javascript | *under construction* | @lightandluck |
+| [**PageFreezer**](http://pagefreezer.com) | Page freezing / archiving service. | *external service* | *integrating* | @danielballan |
+| [**Versionista**](https://versionista.com) | Page freezing / archiving service. | *external service* | *in use* | @danielballan |
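To illustrate the kind of output a diffing service produces, here is a rough stdlib-only sketch; the real Web Monitoring Differ is a separate Javascript service, so this is only the concept, not its implementation.

```python
# Concept sketch: a unified diff between two snapshots of a page,
# using Python's stdlib difflib.
import difflib


def page_diff(old_html, new_html):
    """Return a unified diff between two page snapshots as one string."""
    return "\n".join(difflib.unified_diff(
        old_html.splitlines(), new_html.splitlines(),
        fromfile="old", tofile="new", lineterm=""))
```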
+# Distribution
+
+Services for disseminating content & data to others.
+
+| Service | Description | Technologies | Status | Key Contributors |
+|---------|-------------|--------------|--------|------------------|
+| **API** | JSON API to wrap & publish as many platform services as possible. This would include platform users, archived content, archived metadata, and web-monitoring diffs. | JSON | *planned* | |
+| **Bag-Gen** | A server to generate bags for bag-oriented data hosting services (CKAN, Dataverse, etc.). This service is planned as a Python wrapper around the [python bagIt lib](https://pypi.python.org/pypi/bagit/) that turns it into a server that can generate bags from archived content. | Python | *planned* | |
+| **IPFS-Node** | A bundle of existing frameworks to publish & synchronize archived content with the [InterPlanetary File System](https://ipfs.io). | Go | *planned* | |
+| **Dat-Gen** | Service for generating & hosting dat-data project packages. This is a planned lightweight node.js wrapper around the dat library, capable of translating archived content to dat projects. | Javascript, Node.js | *planned* | |
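To make the Bag-Gen idea concrete: a BagIt bag is a payload directory plus a couple of tag files. The planned service would wrap the python bagit library; the stdlib-only sketch below just shows the minimal structure such a bag would have (a `bagit.txt` declaration, a `data/` payload directory, and a sha256 payload manifest).

```python
# Stdlib-only sketch of the minimal BagIt structure Bag-Gen would produce.
# The real service is planned as a wrapper around the python bagit library.
import hashlib
import shutil
from pathlib import Path


def make_bag(src_dir, bag_dir):
    """Copy src_dir's files into bag_dir as a minimal BagIt bag."""
    bag = Path(bag_dir)
    data = bag / "data"
    shutil.copytree(src_dir, data)  # payload goes under data/
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
    lines = []
    for f in sorted(data.rglob("*")):
        if f.is_file():
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            lines.append(f"{digest}  {f.relative_to(bag).as_posix()}")
    (bag / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")
    return bag
```

With the real bagit library this whole function collapses to a single `bagit.make_bag(...)` call; the sketch only shows what ends up on disk.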
+
+# Coordination
+
+Services for integrating with other services.
+
+| Service | Description | Technologies | Status | Key Contributors |
+|---------|-------------|--------------|--------|------------------|
+| **Coordinator** | Service that talks to other archiving services about the content & metadata they have, using prewritten integrations to translate to each service. It functions much like the current Miru implementation, and could be implemented either as another Miru instance or as a forked version (preference for implementing as an instance). | | *planned* | |
+| **Coordinator DB** | Cache of data we've received from other services, in a format that matches ArchiveDB. | | *planned* | |
+| **Integrations** | Series of recipe repos that map external data sources & destinations. | | *planned* | |
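The "prewritten integrations for translating to each service" idea amounts to one small adapter per external source, each mapping that source's records into the ArchiveDB shape. The sketch below shows the pattern; the `"wayback"` adapter, its field names, and the target shape are all hypothetical.

```python
# Sketch of the Coordinator's integration pattern: a registry of adapters,
# one per external service, each translating records to a common shape.
# The adapter below and its field names are hypothetical examples.
ADAPTERS = {}


def adapter(name):
    """Register a translation function for an external source."""
    def register(fn):
        ADAPTERS[name] = fn
        return fn
    return register


@adapter("wayback")
def wayback(record):
    # Map a hypothetical Wayback-style record to a common shape.
    return {"url": record["original"], "timestamp": record["timestamp"]}


def coordinate(source, records):
    """Translate a batch of records from the named source."""
    return [ADAPTERS[source](r) for r in records]
```

Adding support for a new service then means writing one more `@adapter(...)` function, which matches the "series of recipe repos" framing above.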
File renamed without changes.

quickstart.md

+8
@@ -0,0 +1,8 @@
+# Getting Started
+
+Let's get started.
+
+Adding metadata workflow:
+1. Pick Content
+2. Research
+3. Submit metadata

readme.md

+9-3
@@ -1,6 +1,12 @@
 # Archivers.space Docs

-Welcome! If you're new here, defintely check out the [getting started guide](getting_started.md).
-
-[api documentation](https://api.docs.archivers.space)
+Welcome! If you're new here, definitely check out the docs below:

+* [Quickstart](quickstart.md)
+* [Code of Conduct](code_of_conduct.md)
+* [Metadata](metadata.md)
+* [Glossary](glossary.md)
+* [Organizing an archiving event](organizing.md)
+* [Uncrawlables](uncrawlables.md)
+* [Open Source](open_source.md)
+* [API documentation](https://api.docs.archivers.space)
