make etl-env
: Create a template .env file, used to provide the ETL image with Neo4j write credentials.
make etl-sql
: Run the ETL pipeline to populate the knowledge graph from the data sources listed below.
make etl-clear-cache
: Clear the queries cached by the make etl-sql command.
make etl-projection
: Export the full knowledge graph to a PyTorch-friendly dataset on disk.
make etl
: Build the Docker image without running any commands.
make etl-connect
: Connect to the running Docker container.
- Palmprints are subsequences within a Sequence Read Archive (SRA) run
- Each unique palmprint is assigned a unique identifier, i.e. its palm_id
- Clustering (uclust) identifies palm_id centroids as representatives of a species-like operational taxonomic unit (sOTU). The sOTU is referred to by the existing palm_id of its centroid
usearch -calc_distmx otu_centroids.fa -tabbedout palmdb.40id_edge.txt \
-maxdist 0.6 -termdist 0.7
The input file contains the palmDB (https://github.com/rcedgar/palmdb) OTU centroids from the 03-14-21 snapshot.
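The distance table written by the command above can be turned into percent-identity edges with a short awk pass. This is a minimal sketch, not part of the pipeline: it assumes the -tabbedout file has three tab-separated columns (label1, label2, distance) and that distance is 1 minus fractional identity, so -maxdist 0.6 corresponds to at least 40% identity. The sample IDs and values are made up for illustration.

```shell
# Hypothetical sample mimicking the assumed three-column layout
# (label1 <TAB> label2 <TAB> distance); IDs and values are invented.
sample="$(mktemp)"
printf 'u1\tu2\t0.25\nu1\tu3\t0.6\nu2\tu3\t0.5\n' > "$sample"

# Convert each distance to a percent identity: (1 - distance) * 100.
# Rows that do not have exactly three fields are skipped.
edges="$(awk -F'\t' 'NF == 3 { printf "%s\t%s\t%.0f\n", $1, $2, (1 - $3) * 100 }' "$sample")"

printf '%s\n' "$edges"
rm -f "$sample"
```

To try this on real output, point awk at palmdb.40id_edge.txt instead of the sample file, after checking that the column layout matches the assumption.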
- Taxonomy data reference README