diff --git a/api/datasets/index.html b/api/datasets/index.html index 8b226cb..87b8367 100644 --- a/api/datasets/index.html +++ b/api/datasets/index.html @@ -416,6 +416,7 @@
squashfuse dataset.squ dataset/
pytact-check dataset/
pytact-visualize dataset/
+umount dataset/ # Unmount the dataset, optional
Finally, you can also unpack the image without mounting it (this is the only option for MacOS). On systems with limited memory or slow hard-disks this will diff --git a/index.json b/index.json index e510bdb..c203db9 100644 --- a/index.json +++ b/index.json @@ -75,7 +75,7 @@ "url": "https://coq-tactician.github.io/api/datasets/", "title": "Datasets", "description": "", - "content": "Online Data Explorer You can explore and visualize the latest dataset that has been extracted from Coq online. The visualization should give you some intuition about what kind of data is available and how it is encoded. If this data is interesting to you, you can find download and usage instructions below.\nYou can find the main entry-point for the visualization here. From there, you can explore the entire dataset. Examples of objects include:\nModule hierarchies showing dependencies between compilation units Global contexts of a compilation units Individual definitions within a global context Tactical proofs of theorems Individual Proof transformations of tactical proofs Downloads Datasets are published on Zenodo:\nData for API v15 on Coq v8.11 Accessing the data The data is encoded using the Cap\u0026rsquo;n Proto serialization protocol, allowing for fast random access to the graph and metadata from many programming languages. For Python, a library called PyTactician is provided that allows for easy and efficient access to the data. It includes software to visualize the dataset, a sanity checker for the dataset, an example prediction server that interfaces with Coq, an Oracle prediction server and an example proof exploration client. This library is a good starting point to explore the dataset. If you are looking to use another language to access the data, your starting point would be the graph_api.capnp schema file.\nThe data in the archive is represented as a SquashFS image dataset.squ. In order to access the data, you can either mount it or decompress it. Mounting is preferred on machines with limited memory. To mount, you need to install squashfuse through your favorite package manager.\nNote for MacOS users: Squashfuse is not available on MacOS. You can instead install squashfs from Brew and decompress the dataset using the unsquashfs instructions below.\nOnce squashfuse is installed, if you are using PyTactician, you can load the dataset by pointing directly to this image. This will automatically mount the image in directory dataset/. For example:\npytact-check dataset.squ pytact-visualize dataset.squ If you prefer to mount manually, or if you are not using the PyTactician library, you can mount the image using\nsquashfuse dataset.squ dataset/ pytact-check dataset/ pytact-visualize dataset/ Finally, you can also unpack the image without mounting it (this is the only option for MacOS). On systems with limited memory or slow hard-disks this will lead to some performance degradations while accessing the dataset. If you wish to unpack, you can run:\nunsquashfs -dest dataset/ dataset.squ Introspecting the dataset with PyTactician You can use PyTactician as a library to write scripts that extract information from the dataset. To get started, you can take a look at some simple example scripts. Documentation of the API is also available.\nRaw data inspection For each Coq compilation unit X, the dataset includes the original X.v source file. Alongside that file is a X.bin file with Cap\u0026rsquo;n Proto structure containing the data extracted during the compilation of X.\nIf you wish to inspect the raw data in the dataset manually, you can use capnp convert to decode an individual file in the dataset to JSON as follows:\ncat dataset/coq-tactician-stdlib.8.11.dev/theories/Init/Logic.bin | \\ capnp convert binary:json meta/graph_api.capnp Dataset You can process the resulting JSON further using, for example, the jq command.\n" + "content": "Online Data Explorer You can explore and visualize the latest dataset that has been extracted from Coq online. The visualization should give you some intuition about what kind of data is available and how it is encoded. If this data is interesting to you, you can find download and usage instructions below.\nYou can find the main entry-point for the visualization here. From there, you can explore the entire dataset. Examples of objects include:\nModule hierarchies showing dependencies between compilation units Global contexts of a compilation units Individual definitions within a global context Tactical proofs of theorems Individual Proof transformations of tactical proofs Downloads Datasets are published on Zenodo:\nData for API v15 on Coq v8.11 Accessing the data The data is encoded using the Cap\u0026rsquo;n Proto serialization protocol, allowing for fast random access to the graph and metadata from many programming languages. For Python, a library called PyTactician is provided that allows for easy and efficient access to the data. It includes software to visualize the dataset, a sanity checker for the dataset, an example prediction server that interfaces with Coq, an Oracle prediction server and an example proof exploration client. This library is a good starting point to explore the dataset. If you are looking to use another language to access the data, your starting point would be the graph_api.capnp schema file.\nThe data in the archive is represented as a SquashFS image dataset.squ. In order to access the data, you can either mount it or decompress it. Mounting is preferred on machines with limited memory. To mount, you need to install squashfuse through your favorite package manager.\nNote for MacOS users: Squashfuse is not available on MacOS. You can instead install squashfs from Brew and decompress the dataset using the unsquashfs instructions below.\nOnce squashfuse is installed, if you are using PyTactician, you can load the dataset by pointing directly to this image. This will automatically mount the image in directory dataset/. For example:\npytact-check dataset.squ pytact-visualize dataset.squ If you prefer to mount manually, or if you are not using the PyTactician library, you can mount the image using\nsquashfuse dataset.squ dataset/ pytact-check dataset/ pytact-visualize dataset/ umount dataset/ # Unmount the dataset, optional Finally, you can also unpack the image without mounting it (this is the only option for MacOS). On systems with limited memory or slow hard-disks this will lead to some performance degradations while accessing the dataset. If you wish to unpack, you can run:\nunsquashfs -dest dataset/ dataset.squ Introspecting the dataset with PyTactician You can use PyTactician as a library to write scripts that extract information from the dataset. To get started, you can take a look at some simple example scripts. Documentation of the API is also available.\nRaw data inspection For each Coq compilation unit X, the dataset includes the original X.v source file. Alongside that file is a X.bin file with Cap\u0026rsquo;n Proto structure containing the data extracted during the compilation of X.\nIf you wish to inspect the raw data in the dataset manually, you can use capnp convert to decode an individual file in the dataset to JSON as follows:\ncat dataset/coq-tactician-stdlib.8.11.dev/theories/Init/Logic.bin | \\ capnp convert binary:json meta/graph_api.capnp Dataset You can process the resulting JSON further using, for example, the jq command.\n" }, { "url": "https://coq-tactician.github.io/faq/",