| 1 | +--- |
| 2 | +title: "Technical Journal" |
| 3 | +include_footer: true |
| 4 | +--- |
| 5 | + |
| 6 | +Welcome to the documentation of the Digital Evidence Preservation Toolkit, a one-click tool to archive and annotate webpages while demonstrating chain of custody throughout. The Toolkit is proof-of-concept software for researchers and small teams sifting through online material. |
| 7 | + |
| 8 | +With only one click of the mouse, the material will be **archived in a framework demonstrating chain of custody** and **stored durably**. Once included in the growing database, users will be able to **go back to search through** and **annotate the material**, and to **export working copies** of said material for publication and dissemination. |
| 9 | + |
| 10 | +A database built this way can be handed to a prosecutor ten years down the line, and they will be able to say with mathematical certainty: **“the material in this archive is identical to, and contemporary with, what was saved at the time, ten years ago.”** |
| 11 | + |
| 12 | +--- |
| 13 | + |
| 14 | +# **The flow from 30,000ft:** |
| 15 | + |
| 16 | +A **browser extension** is tasked with passing data from the user to the system. |
| 17 | +The system receives this data through HTTP requests and **records it into the ledger**. |
| 18 | +A **GUI of the library** is served by the system, and this can also add data to the ledger. **Annotations** can be added to the archive through the UI. **Working copies,** true to the originals, can be exported through the UI. |
| 19 | + |
| 20 | +--- |
| 21 | + |
| 22 | +# 🤔 What is where? |
| 23 | + |
| 24 | +**The browser extension** is currently written in **plain JS** (as well as some HTML/CSS). The JS assets are bundled and moved in place by Webpack, which also provides auto-reloading of the extension in-browser. |
| 25 | + |
| 26 | +**The app and API** are currently written in (mostly) **Node & TypeScript**. They expose REST endpoints (such as `/list-docs`, `/form`, etc.) and handle the interfacing with QLDB. |
| 27 | + |
| 28 | +An **example UI** is included and built in **Svelte**, an amazing frontend framework. It demonstrates how some of the above API endpoints can be implemented and some of the capabilities of the tool. |
| 29 | + |
| 30 | +All the above runs with `docker-compose`, as well as standalone `npm` scripts. |
| 31 | + |
| 32 | +--- |
| 33 | + |
| 34 | +# 🥱 So, where are we ${today}? |
| 35 | + |
| 36 | +## The API |
| 37 | + |
| 38 | +Both the browser extension and the app/API are in a functioning state, though features need to be developed in sync to be considered complete. |
| 39 | + |
| 40 | +Among other things, the browser extension is able to POST an object of the following shape to the API endpoint `/form` (ed: this name is terrible): |
| 41 | + |
| 42 | +```tsx |
| 43 | +{ url: string, |
| 44 | + title: string, |
| 45 | + scr: Base64DataURI, |
| 46 | + one_file: HTMLCodeString } |
| 47 | +``` |
| 48 | + |
| 49 | +We're including: |
| 50 | + |
| 51 | +- **a base64-encoded screenshot PNG** which, disappointingly, is only the visible part of the screen (see [`browser.tabs.captureVisibleTab`](https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/Tabs/captureVisibleTab)). |
| 52 | + Moving to a full-page screenshot will involve some fiddling with simulating a scroll while capturing with the Screen Capture API, I'm told. |
| 53 | + These screenshots can be quite large (from a few hundred KB to a couple of MB), so on the app/API side we account for a chunked, streamed payload. All data from the browser is grouped in a `FormData()`. |
| 54 | +- **a long string of HTML code** which contains all HTML, inlined styles and JavaScript, as well as encoded images where possible. |
| 55 | + This is not quite `.mhtml`, which apparently is not supported on Firefox anymore since Quantum. Go figure! |
| 56 | + |
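As a rough illustration of what the API could check on arrival, here is a minimal sketch of a type guard for the `/form` payload. The field names come from the shape above; `FormPayload`, `isFormPayload` and the data-URI pattern are hypothetical names introduced for this example, not code from the Toolkit.

```typescript
// Hypothetical type: mirrors the payload shape POSTed to `/form`.
type FormPayload = {
  url: string;
  title: string;
  scr: string; // base64 data URI of the screenshot PNG
  one_file: string; // single-file HTML archive
};

// Assumption: the screenshot arrives as a PNG data URI.
const PNG_DATA_URI = /^data:image\/png;base64,[A-Za-z0-9+/]+={0,2}$/;

// Narrow an unknown, parsed request body to a FormPayload.
function isFormPayload(value: unknown): value is FormPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { [key: string]: unknown };
  return (
    typeof v.url === "string" &&
    typeof v.title === "string" &&
    typeof v.one_file === "string" &&
    typeof v.scr === "string" &&
    PNG_DATA_URI.test(v.scr)
  );
}
```

A guard like this only validates the shape of the fields. It says nothing about how the chunked, streamed upload itself is handled.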
| 57 | +--- |
| 58 | + |
| 59 | +A main `Record` type is defined as the central data structure flowing through the application. |
| 60 | + |
| 61 | +```tsx |
| 62 | +type Record = { |
| 63 | + |
| 64 | + // list of files preserved and hashed |
| 65 | + // type Bundle |
| 66 | + bundle: [ |
| 67 | + { typ: 'screenshot' | 'one_file' | 'screenshot_thumbnail', |
| 68 | + hash: 'aaaaaaa' |
| 69 | + }, {...} |
| 70 | + ], |
| 71 | + |
| 72 | + // user-created data about the record |
| 73 | + // most probably after the original archive |
| 74 | + // type Annotations |
| 75 | + annotations: { |
| 76 | + description: 'description', |
| 77 | + other_key: 'other val' |
| 78 | + }, |
| 79 | + |
| 80 | + // data points about the page saved |
| 81 | + // type Data |
| 82 | + data: { |
| 83 | + title: 'page title', |
| 84 | + url: 'https://foo.bar.com' |
| 85 | + } |
| 86 | + |
| 87 | +} |
| 88 | +``` |
| 89 | + |
| 90 | +**Examples of this data flowing:** |
| 91 | + |
| 92 | +- Upon receiving `POST /form`, the API wrangles the payload data into this shape, which can then be passed to `Ledger.insertDoc` to be added to the ledger. |
| 93 | + (This includes side effects such as writing the screenshots and the one-file archive to disk.) |
| 94 | +- The frontend consumes the result of `GET /list-docs`, which fetches data from QLDB and passes it through two successive formatting functions: |
| 95 | + - `Record.fromLedger`, which takes QLDB-shaped data and builds a nice `Record` as defined above, |
| 96 | + - then `Record.toFrontend`, which takes a `Record` and builds a simplified shape for the frontend. |
| 97 | + |
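The two-step formatting described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `LedgerRow` shape is invented (the real QLDB-shaped data is not shown here), `ArchiveRecord` stands in for `Record`, the `kind` field follows the Bundle definition, and the frontend shape is a guess at what "simplified" might mean.

```typescript
type FileKind = "screenshot" | "one_file" | "screenshot_thumbnail";
type Annotations = { [key: string]: string | undefined };

// Stand-in for the `Record` type (renamed to avoid TypeScript's built-in Record).
type ArchiveRecord = {
  bundle: { kind: FileKind; hash: string }[];
  annotations: Annotations;
  data: { title: string; url: string };
};

// Assumed, simplified row as it might come back from the ledger.
type LedgerRow = {
  bundle: { kind: string; hash: string }[];
  annotations?: Annotations;
  data: { title: string; url: string };
};

const KINDS: readonly string[] = ["screenshot", "one_file", "screenshot_thumbnail"];

function isFileKind(kind: string): kind is FileKind {
  return KINDS.indexOf(kind) !== -1;
}

// Step 1: validate ledger-shaped data and build a well-typed record.
function fromLedger(row: LedgerRow): ArchiveRecord {
  const bundle = row.bundle.map(({ kind, hash }) => {
    if (!isFileKind(kind)) throw new Error(`unknown file kind: ${kind}`);
    return { kind, hash };
  });
  const annotations: Annotations = row.annotations ? { ...row.annotations } : {};
  return { bundle, annotations, data: { ...row.data } };
}

// Hypothetical simplified shape for the UI.
type FrontendRecord = {
  title: string;
  url: string;
  description: string;
  thumbnailHash: string | null;
};

// Step 2: flatten the record into what the frontend needs.
function toFrontend(record: ArchiveRecord): FrontendRecord {
  const thumb = record.bundle.find((f) => f.kind === "screenshot_thumbnail");
  return {
    title: record.data.title,
    url: record.data.url,
    description: record.annotations["description"] ?? "",
    thumbnailHash: thumb ? thumb.hash : null,
  };
}
```

Keeping the two steps separate means the ledger's storage shape can change without touching the frontend, as long as the record in the middle stays stable.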
| 98 | +**Central to this type definition is the _Bundle:_** |
| 99 | + |
| 100 | +```tsx |
| 101 | +type Bundle = File.File[]; |
| 102 | + |
| 103 | +type File = { |
| 104 | + kind: "screenshot" | "one_file" | "screenshot_thumbnail"; |
| 105 | + hash: string; // sha256 hex digest, e.g. "xxx" |
| 106 | +}; |
| 107 | +``` |
| 108 | + |
| 109 | +A Bundle is a list of files, each of which must be one of a fixed set of kinds. For now, these are the three kinds of files we're interested in: |
| 110 | + |
| 111 | +- a page screenshot, |
| 112 | +- and its thumbnail for rendering in the UI, |
| 113 | +- as well as a one-file download of all the HTML/CSS/JS assets |
| 114 | + |
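To make the Bundle concrete, here is a minimal sketch of turning file contents into bundle entries. It assumes Node's built-in `crypto` module and sha256 (the hash named under "On uniqueness"); `BundleFile` and `hashFile` are hypothetical names for this example only.

```typescript
import { createHash } from "crypto";

type FileKind = "screenshot" | "one_file" | "screenshot_thumbnail";
type BundleFile = { kind: FileKind; hash: string };

// Hash the raw contents of one file and tag it with its kind.
function hashFile(kind: FileKind, contents: Buffer | string): BundleFile {
  const hash = createHash("sha256").update(contents).digest("hex");
  return { kind, hash };
}

// Example bundle built from placeholder contents.
const exampleBundle: BundleFile[] = [
  hashFile("screenshot", Buffer.from("fake png bytes")),
  hashFile("screenshot_thumbnail", Buffer.from("fake thumbnail bytes")),
  hashFile("one_file", "<html><body>archived page</body></html>"),
];
```

Each entry carries only a kind and a hash, so the same function can be rerun on a file later to check that it still matches its entry.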
| 115 | +The QLDB logic can be found under the `QLDB.*` namespace. |
| 116 | + |
| 117 | +## The UI |
| 118 | + |
| 119 | +The webapp uses [SvelteKit](https://kit.svelte.dev), a JS framework. It implements two notable routes, matching the two main user stories: |
| 120 | + |
| 121 | +- The Library: `src/routes/library.svelte` |
| 122 | +- The Verification: `src/routes/verify.svelte` |
| 123 | + |
| 124 | +**Library** renders a list of ledger entries, with their accompanying metadata. It supports the querying of a record's history, as well as the addition of metadata (i.e. a "description" field). |
| 125 | + |
| 126 | +- details about how each of these features is replicated through the API |
| 127 | + |
| 128 | +**Verification** implements the lookup process and surfacing of information made possible by the tool. |
| 129 | + |
| 130 | +--- |
| 131 | + |
| 132 | +# Miscellaneous |
| 133 | + |
| 134 | +### On uniqueness |
| 135 | + |
| 136 | +Each record in our database contains a list of files that make it up (as of Aug 10th: a screenshot, its thumbnail, and a one-file HTML archive). Each is represented by its _kind_ and its hash (sha256). |
| 137 | + |
| 138 | +The ID of the record is the hash of the concatenated hashes of its files: |
| 139 | + |
| 140 | +`Record.id = hash(Record.files.sort().map(File.id).join(''))` |
| 141 | + |
| 142 | +Because the data is self-identifying, files can be matched to their ledger entries: the ID can be recomputed from the files alone. |
| 143 | + |
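The ID formula above can be sketched as follows. This is one possible reading, not the Toolkit's code: it assumes `File.id` is the file's sha256 hex hash, that sorting means sorting those hashes lexicographically, and that the outer hash is also sha256. `ArchivedFile` and `recordId` are hypothetical names.

```typescript
import { createHash } from "crypto";

type ArchivedFile = {
  kind: "screenshot" | "one_file" | "screenshot_thumbnail";
  hash: string; // sha256 hex digest of the file contents
};

const sha256 = (input: string): string =>
  createHash("sha256").update(input).digest("hex");

// Record ID: hash of the sorted, concatenated file hashes.
function recordId(files: ArchivedFile[]): string {
  const sortedHashes = files.map((f) => f.hash).sort();
  return sha256(sortedHashes.join(""));
}
```

Sorting before concatenating makes the ID independent of the order in which the files are listed, so anyone holding the same set of files arrives at the same ID.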
| 144 | +### On ledgers |
| 145 | + |
| 146 | +"A non ledger database is table-centric. A ledger database is log-centric. **The log is the database.**" ([Ivan Moto](https://ivan.mw/2019-11-24/what-is-a-ledger-database)) |
| 147 | + |
| 148 | +"Standard databases track a sequence of transactions that add, delete, or change entries. Ledger databases add a layer of digital signatures for each transaction so anyone can audit the list and see that it was constructed correctly. More importantly, no one has gone back to adjust a previous transaction — to change history, so to speak." ([VentureBeat](https://venturebeat.com/2021/01/18/database-trends-why-you-need-a-ledger-database/)) |
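To give a feel for why history in a log-centric database is hard to rewrite, here is a toy hash-chained log. It is only an illustration of the general idea and makes no claim about how QLDB is implemented; `Entry`, `append` and `verify` are names invented for this sketch.

```typescript
import { createHash } from "crypto";

type Entry = { payload: string; prevHash: string; hash: string };

// Placeholder "previous hash" for the very first entry.
const GENESIS = "0".repeat(64);

// Each entry's hash covers its payload and the hash of the entry before it.
function entryHash(prevHash: string, payload: string): string {
  return createHash("sha256").update(prevHash).update(payload).digest("hex");
}

// Appending never mutates earlier entries; it returns a longer log.
function append(log: readonly Entry[], payload: string): Entry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS;
  return [...log, { payload, prevHash, hash: entryHash(prevHash, payload) }];
}

// Recompute the chain from the start; any edit to a past entry breaks it.
function verify(log: readonly Entry[]): boolean {
  let prevHash = GENESIS;
  for (const entry of log) {
    if (entry.prevHash !== prevHash) return false;
    if (entry.hash !== entryHash(prevHash, entry.payload)) return false;
    prevHash = entry.hash;
  }
  return true;
}
```

Changing the payload of an old entry changes its hash, which no longer matches the `prevHash` stored in the next entry, so an auditor replaying the log notices the edit.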