Name: Software Heritage Graph Dataset
Description: |
  [Software Heritage](https://www.softwareheritage.org/) is the largest
  existing public archive of software source code and accompanying
  development history. The Software Heritage Graph Dataset is a fully
  deduplicated Merkle DAG representation of the Software Heritage archive.
  The dataset links together file content identifiers, source code
  directories, and Version Control System (VCS) commits tracking evolution
  over time, up to the full states of VCS repositories as observed by
  Software Heritage during periodic crawls. The dataset’s contents come from
  major development forges (including GitHub and GitLab), FOSS distributions
  (e.g., Debian), and language-specific package managers (e.g., PyPI).
  Crawling information is also included, providing timestamps of when and
  where each archived source code artifact was observed in the wild.
Documentation: https://wiki.softwareheritage.org/wiki/Graph_Dataset_on_Amazon_Athena
Contact: [email protected]
UpdateFrequency: Data is updated yearly
Tags:
- aws-pds
- source code
- open source software
- free software
- digital preservation
License: |
  Creative Commons Attribution 4.0 International.
  By accessing the dataset, you agree with the Software Heritage [Ethical
  Charter for using the archive
  data](https://www.softwareheritage.org/legal/users-ethical-charter/) and
  the [terms of use for bulk
  access](https://www.softwareheritage.org/legal/bulk-access-terms-of-use/).
Resources:
- Description: Software Heritage Graph Dataset
  ARN: arn:aws:s3:::softwareheritage
  Region: us-east-1
  Type: S3 Bucket
DataAtWork:
  Tutorials:
  Tools & Applications:
  Publications: