Data Rich Documents - a definition (draft, in progress) #1047
rufuspollock asked this question in General
Fully agree with the motivation! Having easy ways of publishing datasets with a nice UX would be awesome. Sharing two more resources that could also help as inspiration: They both usually include transformations inside the documents but support loading data declaratively and plotting it.
Data Rich Documents is a specification for authoring and rendering content with data.
Data Rich Documents are markdown files with defined extensions for embedding or linking data and for presenting data in tables, graphs and maps.
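To make the idea of "defined extensions" concrete, here is a minimal sketch of how a tool might find data components in a markdown (or MDX) file. The draft does not yet define the extension syntax, so the component names (`Table`, `LineChart`, `Map`), the `url`, `x` and `y` attributes, and the example URL are all assumptions made up for illustration.

```python
import re

# Hypothetical component names: the draft does not define the real syntax.
COMPONENTS = ("Table", "LineChart", "Map")

# Matches self-closing, MDX-style tags such as <Table url="..." />.
TAG_RE = re.compile(r"<(?P<name>[A-Z][A-Za-z]*)\s+(?P<attrs>[^<>]*?)\s*/>")
ATTR_RE = re.compile(r'(?P<key>[A-Za-z_][\w-]*)="(?P<value>[^"]*)"')


def find_data_components(markdown: str) -> list[dict]:
    """Return one dict per known data component found in the text."""
    found = []
    for tag in TAG_RE.finditer(markdown):
        if tag.group("name") not in COMPONENTS:
            continue  # ignore tags that are not data components
        attrs = {
            m.group("key"): m.group("value")
            for m in ATTR_RE.finditer(tag.group("attrs"))
        }
        found.append({"component": tag.group("name"), **attrs})
    return found


# Made-up sample document; example.com is a placeholder, not a real dataset.
doc = """# Gold prices

Some notes on the series.

<Table url="https://example.com/gold.csv" />
<LineChart url="https://example.com/gold.csv" x="date" y="price" />
"""

for component in find_data_components(doc):
    print(component)
```

The point of the sketch is only that the document names the data and the desired presentation; nothing in the file is executed.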
Demo
TODO ... (create a website using flowershow and sample content - an update of https://github.com/datopian/data-literate)
Motivation
A data-oriented markdown file is the kind of thing found in https://github.com/datasets/awesome-data or in one of its issues at https://github.com/datasets/awesome-data/issues
We want things like:
Data Rich Documents
We term this kind of markdown document, with additional data-oriented functionality, "data rich".
Our "data rich" document is a markdown (or MDX) file with the following additional features:
Use cases (in full)
Researching or writing up a (data-heavy) topic and sharing it
I want to jot down notes and links, preview data, display or graph data, etc. I want to do this in markdown as that's what I work in. I want to be able to preview this and then share it with others.
Curating data
I want to quickly turn some data I've found into a properly curated dataset. I might want to do this iteratively: starting with the minimum, e.g. just a link and a few notes, then moving to caching the data, then previewing the data, etc. This is the kind of thing you find in the issues at github.com/datasets/awesome-data/issues. I want to view this dataset as I'm developing it and publish it when it's done (or as I go along).
Content + Data are naturally co-occurring in many settings
In reality, data and content usually go together. For example, consider two main and common use cases:
Desired features of the tooling:
How is this different from ...
One big difference is that we have no interest in tackling interactivity like Jupyter or ObservableHQ. The aim is for the documents and the data themselves to be static. Any processing of them is done elsewhere, and any loading and processing of data is purely declarative (i.e. here's the URL of the data).
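As a rough illustration of what "static" could mean in practice, the sketch below turns CSV text into a plain markdown table at build time, with no code living in the document itself. The function name `csv_to_markdown_table`, the `max_rows` parameter and the sample values are assumptions for illustration, not part of any spec; a real tool would first fetch the CSV from the declared URL.

```python
import csv
import io


def csv_to_markdown_table(csv_text: str, max_rows: int = 5) -> str:
    """Render the header and first rows of CSV text as a markdown pipe table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1 : 1 + max_rows]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Made-up sample data standing in for the content behind a declared URL.
sample = "date,price\n2020-01-01,1520\n2020-02-01,1585\n"
print(csv_to_markdown_table(sample))
```

Because the output is just markdown, the rendered preview can be published as a static page alongside the original document.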
Inspiration
History
Personally, this work goes way back to efforts in the 2000s and early work on open economics, followed by work on ReclineJS in early versions of CKAN (c. 2010-2013).
Its recent form started in 2020/2021 with demo code in https://github.com/datopian/data-literate