Commit 5056dca

Merge pull request #4 from digitalevidencetoolkit/feat/move-docs
Feat/move docs
2 parents 0a83f96 + 434562d commit 5056dca

File tree

7 files changed

+294
-12
lines changed

config.yaml

+5-7
Original file line numberDiff line numberDiff line change
@@ -19,7 +19,7 @@ params:
1919
single:
2020
include_footer: true
2121
font:
22-
name: 'Roboto'
22+
name: "Roboto"
2323
sizes: [400, 600, 700]
2424
hero:
2525
title: The Digital Evidence Preservation Toolkit
@@ -58,7 +58,7 @@ params:
5858

5959
section1:
6060
title: Enter, the Toolkit
61-
subtitle: 'At its core: **a ledger** with best-in-class cryptographic properties. **Immutable, replayable, durable.**'
61+
subtitle: "At its core: **a ledger** with best-in-class cryptographic properties. **Immutable, replayable, durable.**"
6262
tiles:
6363
- title: One click to preserve
6464
icon: mouse-globe
@@ -129,16 +129,14 @@ params:
129129
bulmalogo: false
130130
quicklinks:
131131
column2:
132-
title: 'Docs'
132+
title: "Docs"
133133
links:
134134
- text: Contact and About us
135135
link: /about
136136
- text: Get started
137-
link: https://digitalevidencetoolkit.notion.site/Getting-started-15521f4125534f4aa758a2575c27ad5c
137+
link: /getting-started
138138
- text: Technical documentation
139-
link: https://digitalevidencetoolkit.notion.site/Technical-Journal-01ad0720aebc4f9c9a8036da0fd7426b
140-
- text: Help and contribute
141-
link: https://digitalevidencetoolkit.notion.site/How-you-can-help-and-contribute-00ab347fa0fc49fd9ed42dc982e5f344
139+
link: /docs
142140
- text: Roadmap
143141
link: https://github.com/orgs/digitalevidencetoolkit/projects/3
144142
- text: Changelog

content/docs.md

+148
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,148 @@
1+
---
2+
title: "Technical Journal"
3+
include_footer: true
4+
---
5+
6+
Welcome to the documentation of the Digital Evidence Preservation Toolkit, a one-click tool to archive and annotate webpages while demonstrating chain of custody throughout. The Toolkit is proof-of-concept software for researchers and small teams sifting through online material.
7+
8+
With only one click of the mouse, the material will be **archived in a framework demonstrating chain of custody** and **stored durably**. Once included in the growing database, users will be able to **go back to search through** and **annotate the material**, and to **export working copies** of said material for publication and dissemination.
9+
10+
A database built thusly can be handed to a prosecutor ten years down the line, and they will be able to say with mathematical certainty: **“the material in this archive is identical and contemporary to the one saved at the time, ten years ago.”**
11+
12+
---
13+
14+
# **The flow from 30,000ft:**
15+
16+
A **browser extension** is tasked with passing data from the user to the system.
17+
The system receives this data through HTTP requests and **records it into the ledger**.
18+
A **GUI of the library** is served by the system, and this can also add data to the ledger. **Annotations** can be added to the archive through the UI. **Working copies,** true to the originals, can be exported through the UI.
19+
20+
---
21+
22+
# 🤔 What is where?
23+
24+
**The browser extension** is currently written in **plain JS** (as well as some HTML/CSS). The JS assets are bundled and moved into place by Webpack, which also provides auto-reloading of the extension in-browser.
25+
26+
**The app and API** are currently written in (mostly) **Node & TypeScript**. It presently exposes REST endpoints (such as `/list-docs`, `/form`, etc.) and handles the interfacing with QLDB.
27+
28+
An **example UI** is included and built in **Svelte**, an amazing frontend framework. It demonstrates how some of the above API endpoints can be implemented and some of the capabilities of the tool.
29+
30+
All the above runs with `docker-compose`, as well as standalone `npm` scripts.
31+
32+
---
33+
34+
# 🥱 So, where are we ${today}?
35+
36+
## The API
37+
38+
Both the browser extension and the app/API are in a functioning state, though features need to be developed in sync to be considered complete.
39+
40+
Among other things, the browser extension is able to POST an object of the following shape to the API endpoint `/form` (ed: this name is terrible):
41+
42+
```tsx
43+
{ url: string,
44+
title: string,
45+
scr: Base64DataURI,
46+
one_file: HTMLCodeString }
47+
```
48+
49+
We're including:
50+
51+
- **a base64-encoded screenshot PNG** which, disappointingly, is only the visible part of the screen (see [`browser.tabs.captureVisibleTab`](https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/Tabs/captureVisibleTab)).
52+
Moving to a full-page screenshot will involve some fiddling with simulating a scroll while capturing with Screen Capture API, I'm told.
53+
These screenshots can be quite large (from a few hundred KB to a couple of MB), so on the app/API side we account for a chunked, streamed payload. All data from the browser is grouped in a `FormData()`.
54+
- **a long string of HTML code** which contains all HTML, inlined styles and JavaScript, as well as encoded images where possible.
55+
This is most definitely not quite `.mhtml`, which apparently is not supported on Firefox anymore since Quantum. Go figure!
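
As a rough illustration of the payload described above, the request body might be assembled along these lines. This is only a sketch under assumptions: `PagePayload` and `buildFormData` are hypothetical names that do not come from the codebase, the sample values are placeholders, and the code relies on the `FormData` global available in browsers and in Node 18+.

```typescript
// Hypothetical shape mirroring the four fields listed above.
type PagePayload = {
  url: string;
  title: string;
  scr: string; // base64 data URI of the visible-tab screenshot
  one_file: string; // single-file HTML snapshot of the page
};

// Groups every field into one FormData, as the extension is described as doing.
function buildFormData(page: PagePayload): FormData {
  const body = new FormData();
  body.append("url", page.url);
  body.append("title", page.title);
  body.append("scr", page.scr);
  body.append("one_file", page.one_file);
  return body;
}

// Placeholder values, for illustration only.
const example = buildFormData({
  url: "https://foo.bar.com",
  title: "page title",
  scr: "data:image/png;base64,iVBORw0KGgo=",
  one_file: "<!DOCTYPE html><html><body>hello</body></html>",
});
```

The resulting object could then be passed as the `body` of a `fetch` call with `method: "POST"` against the `/form` endpoint.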
56+
57+
---
58+
59+
A main `Record` type is defined as the central data structure flowing through the application.
60+
61+
```tsx
62+
type Record = {
63+
64+
// list of files preserved and hashed
65+
// type Bundle
66+
bundle: [
67+
{ typ: 'screenshot' | 'one_file' | 'screenshot_thumbnail',
68+
hash: 'aaaaaaa'
69+
}, {...}
70+
],
71+
72+
// user-created data about the record
73+
// most probably after the original archive
74+
// type Annotations
75+
annotations: {
76+
description: 'description',
77+
other_key: 'other val'
78+
},
79+
80+
// data points about the page saved
81+
// type Data
82+
data: {
83+
title: 'page title',
84+
url: 'https://foo.bar.com'
85+
}
86+
87+
}
88+
```
89+
90+
**Examples of this data flowing:**
91+
92+
- Upon receiving `POST /form`, the API wrangles the payload data into this shape, which can then be passed to `Ledger.insertDoc` to be added to the ledger.
93+
(this includes side effects such as the writing to disk of screenshots and of the one-file archive)
94+
- The frontend consumes the result of `GET /list-docs`, which fetches data from QLDB and then passes it through two successive formatting functions:
95+
- `Record.fromLedger`, which takes QLDB-shaped data and builds a nice `Record` as defined above,
96+
- then `Record.toFrontend`, which takes a `Record` and builds a simplified shape for the frontend.
97+
98+
**Central to this type definition is the _Bundle:_**
99+
100+
```tsx
101+
type Bundle = File.File[];
102+
103+
type File = {
104+
kind: "screenshot" | "one_file" | "screenshot_thumbnail";
105+
hash: "xxx";
106+
};
107+
```
108+
109+
A Bundle is a list of files, which can only be of some kinds. At the back of our mind, these are the three kinds of files we're interested in for now:
110+
111+
- a page screenshot,
112+
- and its thumbnail for rendering in the UI,
113+
- as well as a one-file download of all the HTML/CSS/JS assets
114+
115+
The QLDB logic can be found under the `QLDB.*` namespace.
116+
117+
## The UI
118+
119+
The webapp uses [SvelteKit](https://kit.svelte.dev), a JS framework. It implements two notable routes, matching the two main user stories:
120+
121+
- The Library: `src/routes/library.svelte`
122+
- The Verification: `src/routes/verify.svelte`
123+
124+
**Library** renders a list of ledger entries, with their accompanying metadata. It supports the querying of a record's history, as well as the addition of metadata (i.e. a "description" field).
125+
126+
- details about how each of these features is replicated through the API
127+
128+
**Verification** implements the lookup process and surfacing of information made possible by the tool.
129+
130+
---
131+
132+
# Miscellaneous
133+
134+
### On uniqueness
135+
136+
Each record in our database contains a list of files that make it up (as of Aug 10th: a screenshot, its thumbnail, and a one-file HTML archive). Each is represented by its _kind_ and its hash (sha256).
137+
138+
The ID of the record is the hash of the concatenated hashes of its files:
139+
140+
`Record.id = hash(Record.files.sort().map(File.id).join(''))`
141+
142+
With self-identifiable data, it is possible to associate files to their ledger entries, since the ID can be computed from the files.
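
The ID rule above can be sketched in TypeScript as follows. This is an illustration rather than the Toolkit's actual code: `BundleFile`, `sha256` and `recordId` are hypothetical names, and sorting the file hashes before concatenating them is an assumption about what the `.sort()` step orders.

```typescript
import { createHash } from "crypto";

// Hypothetical stand-in for the File type shown earlier.
type BundleFile = {
  kind: "screenshot" | "one_file" | "screenshot_thumbnail";
  hash: string; // sha256 of the file contents, hex-encoded
};

// Hex-encoded sha256 digest of a string.
const sha256 = (input: string): string =>
  createHash("sha256").update(input).digest("hex");

// Record ID = hash of the concatenated file hashes.
// Assumption: hashes are sorted first, so the ID does not depend on file order.
function recordId(files: BundleFile[]): string {
  const hashes = files.map((file) => file.hash).sort();
  return sha256(hashes.join(""));
}
```

Because the ID depends only on the file hashes, anyone holding the files can recompute it and look up the matching ledger entry.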
143+
144+
### On ledgers
145+
146+
"A non ledger database is table-centric. A ledger database is log-centric. **The log is the database.**" ([Ivan Moto](https://ivan.mw/2019-11-24/what-is-a-ledger-database))
147+
148+
"Standard databases track a sequence of transactions that add, delete, or change entries. Ledger databases add a layer of digital signatures for each transaction so anyone can audit the list and see that it was constructed correctly. More importantly, no one has gone back to adjust a previous transaction — to change history, so to speak." ([VentureBeat](https://venturebeat.com/2021/01/18/database-trends-why-you-need-a-ledger-database/))

content/getting-started.md

+138
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,138 @@
1+
---
2+
title: "Getting Started"
3+
include_footer: true
4+
---
5+
6+
## _“It's not you – it's me”_
7+
8+
If the instructions in this guide feel a bit much, it's likely because the Toolkit is still alpha-version software which assumes a certain technical knowledge. There are technical solutions to simplifying this setup, but these were not prioritised.
9+
10+
<aside>
11+
🙏 If this is something you have expertise with and are happy to help, do reach out at **[email protected]**
12+
13+
</aside>
14+
15+
---
16+
17+
## Setting up the ledger
18+
19+
The Toolkit requires a working connection with Amazon Web Services, and thus that you have some kind of well-permissioned account or IAM role.
20+
21+
In short, you will need:
22+
23+
1. the AWS CLI and an authorised profile in `~/.aws/credentials` (see [“Installing the AWS CLI v2”](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) - _docs.aws.amazon.com_)
24+
2. an existing QLDB ledger, with a blank table in it (see [“Creating a QLDB ledger from the AWS Console”](https://qldbguide.com/docs/guide/getting-started/#using-aws-console) - _qldbguide.com_)
25+
26+
Not required but recommended is an S3 bucket in which to store Toolkit data.
27+
28+
**Remember the names** of the ledger and of its table. You'll need them shortly (see "Environment" below).
29+
30+
---
31+
32+
## Environment
33+
34+
After cloning the repository, create an `.env` file at the root or copy `.env.example`. The job of this file is to contain variables you really don't want to share publicly, so keep this out of version control software.
35+
36+
This file **must contain:**
37+
38+
- AWS access credentials and preferred region
39+
- Details about the ledger and S3 bucket
40+
41+
```bash
42+
AWS_ACCESS_KEY_ID="your aws access key"
43+
AWS_SECRET_ACCESS_KEY="your aws secret key"
44+
AWS_REGION="eu-central-1 (or another region)"
45+
BUCKET_NAME="anS3BucketName"
46+
LEDGER_NAME="yourLedgerName"
47+
DOC_TABLE_NAME="yourTableName"
48+
```
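
A start-up check for these variables might look like the sketch below. It is not part of the Toolkit: `REQUIRED_VARS`, `Env` and `missingVars` are hypothetical names, and treating `BUCKET_NAME` as optional is an assumption based on the S3 bucket being described as recommended rather than required.

```typescript
// Variable names taken from the example .env above.
// Assumption: BUCKET_NAME is optional, so it is not listed here.
const REQUIRED_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "LEDGER_NAME",
  "DOC_TABLE_NAME",
] as const;

type Env = { [key: string]: string | undefined };

// Returns the names of required variables that are absent or empty.
function missingVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]);
}
```

Calling `missingVars(process.env)` when the app boots and stopping if the result is non-empty would surface a misconfigured `.env` early.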
49+
50+
---
51+
52+
## Recommended way of running the Toolkit
53+
54+
<aside>
55+
💡 At the very least, **you will need Docker installed on your system.**
56+
https://docs.docker.com/get-docker/
57+
58+
</aside>
59+
60+
The Docker Compose orchestration is composed of several services:
61+
62+
1. An Express/TypeScript API
63+
2. A plain JS browser extension
64+
3. And a frontend
65+
66+
To start the whole app:
67+
68+
```bash
69+
$ docker-compose up
70+
```
71+
72+
---
73+
74+
## Running without Docker
75+
76+
Ensure you're running `node > 10.0` — the recommended version is the LTS, i.e. `node v16`. If you are using `nvm`:
77+
78+
```bash
79+
$ nvm use --lts
80+
> Now using node v16.13.0
81+
```
82+
83+
Manually install dependencies for each service:
84+
85+
```bash
86+
$ (cd ui/ && npm install)
87+
$ (cd extension/ && npm install)
88+
$ npm install
89+
```
90+
91+
Then use the npm script including all services:
92+
93+
```bash
94+
$ npm run all
95+
```
96+
97+
---
98+
99+
## Storage options
100+
101+
By including a bucket in the `.env` config, you’re choosing to replicate your archival on S3.
102+
103+
Namely, the Store (`src/store/index.ts`) will:
104+
105+
- upon receiving an archive request, store the Bundle files both locally and on S3,
106+
- and upon receiving a file request (e.g. the UI fetching thumbnails), serve it from S3.
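
The behaviour described in the two points above can be modelled with the sketch below. It does not reproduce `src/store/index.ts`: `Backend`, `MemoryBackend` and `Store` are hypothetical names, both storage locations are simulated with in-memory maps, the methods are kept synchronous for brevity, and serving from local storage when no bucket is configured is an assumption.

```typescript
// Hypothetical minimal storage backend.
interface Backend {
  put(name: string, data: string): void;
  get(name: string): string | undefined;
}

// In-memory stand-in for both the local disk and the S3 bucket.
class MemoryBackend implements Backend {
  private files = new Map<string, string>();

  put(name: string, data: string): void {
    this.files.set(name, data);
  }

  get(name: string): string | undefined {
    return this.files.get(name);
  }
}

class Store {
  private local: Backend;
  private remote: Backend | undefined; // only set when a bucket is configured

  constructor(local: Backend, remote?: Backend) {
    this.local = local;
    this.remote = remote;
  }

  // Archive request: always write locally, and replicate remotely if configured.
  save(name: string, data: string): void {
    this.local.put(name, data);
    if (this.remote) this.remote.put(name, data);
  }

  // File request: serve the remote copy when there is one.
  // Assumption: without a bucket, files are served from local storage.
  read(name: string): string | undefined {
    return this.remote ? this.remote.get(name) : this.local.get(name);
  }
}
```

A real implementation would be asynchronous and would talk to the filesystem and to S3 instead of maps.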
107+
108+
---
109+
110+
## Is there anybody out there?
111+
112+
### API and frontend
113+
114+
The API should be available at [http://localhost:3000](http://localhost:3000); verify this by running:
115+
116+
```bash
117+
$ curl http://localhost:3000/list-docs
118+
> [ {...}, {...} ]
119+
```
120+
121+
The UI should be available at http://localhost:8000 in your web browser of choice. API requests are proxied through the UI. Thus, the following queries are equivalent:
122+
123+
```bash
124+
$ curl http://localhost:3000/list-docs # as before
125+
$ curl http://localhost:8000/api/list-docs
126+
```
127+
128+
### Browser extension
129+
130+
By now, the extension should have been bundled on your filesystem. Pop open your browser's extension runtime by pasting this in the URL bar:
131+
132+
`about:debugging#/runtime/this-firefox`
133+
134+
Click _“Load temporary Add-on...”_ and navigate to `extension/addon` to select `manifest.json`.
135+
136+
The extension should have been loaded in your extension bar, as shown below:
137+
138+
![Untitled](/static/images/dept-untitled.png)

layouts/_default/baseof.html

+1-2
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,6 @@
11
<!DOCTYPE html>
22
<html lang="{{ .Site.LanguageCode }}">
33
<head>
4-
{{ partial "meta.html" . }}
54
<title>{{ block "title" . }}{{ .Site.Title }}{{ end }}</title>
65
{{ partial "css.html" . }}
76
{{ $options := (dict "targetPath" "custom.css" "outputStyle" "compressed" "enableSourceMap" true) }}
@@ -23,5 +22,5 @@
2322
{{ partial "javascript.html" . }}
2423
</body>
2524

26-
{{ partial "analytics.html" }}
25+
<!-- {{ partial "analytics.html" }} -->
2726
</html>

layouts/_default/single.html

+1-2
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,6 @@
11
<!DOCTYPE html>
22
<html lang="{{ .Site.LanguageCode }}">
33
<head>
4-
{{ partial "meta.html" . }}
54
<title>{{ block "title" . }}{{ .Site.Title }}{{ end }}</title>
65
{{ partial "css.html" . }} {{ $options := (dict "targetPath" "custom.css"
76
"outputStyle" "compressed" "enableSourceMap" true) }} {{ $style :=
@@ -20,5 +19,5 @@
2019
{{ partial "javascript.html" . }}
2120
</body>
2221

23-
{{ partial "analytics.html" }}
22+
<!-- {{ partial "analytics.html" }} -->
2423
</html>

layouts/partials/css.html

+1-1
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
{{- $inServerMode := .Site.IsServer }}
1+
{{- $inServerMode := hugo.IsServer }}
22
{{- $sass := "style.sass" }}
33
{{- $cssTarget := "css/style.css" }}
44
{{- $cssOpts := cond ($inServerMode) (dict "targetPath" $cssTarget "enableSourceMap" true) (dict "targetPath" $cssTarget "outputStyle" "compressed") }}

static/images/dept-untitled.png

47.8 KB