This Python script loads data from the ClinicalTrials.gov API into the Neo4j-based covidgraph. It retrieves the data through StudyFields queries, which are described on the API homepage.
Maintainer: Kirsten
Version: 0.1.1
Neo4j version: < 3.5.17
APOC version: < 3.5.0.11
Docker image location: covidgraph/data-clinical_trials_gov
docker run -it --rm --name data-cord19 -e GC_NEO4J_URL="bolt://${HOSTNAME}:7687" covidgraph/data-clinical_trials_gov
docker build -t data-clinical_trials_gov .
docker run -it --rm --name data-cord19 -e GC_NEO4J_URL='bolt://myneo4jhostname:7687' -e GC_NEO4J_USER=neo4j -e GC_NEO4J_PASSWORD=mysecret data-clinical_trials_gov
The most important environment variables are:
GC_NEO4J_URL
: The full Bolt URL, for example 'bolt://myneo4jhostname:7687'
GC_NEO4J_USER
: The Neo4j user
GC_NEO4J_PASSWORD
: The Neo4j password
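As a rough illustration of how a script might pick up these variables, here is a minimal sketch. The function name `read_neo4j_config` and the fallback values are assumptions made for this example; they are not taken from the actual script.

```python
import os


def read_neo4j_config(env=None):
    """Collect the three GC_NEO4J_* settings into a plain dict.

    `env` defaults to the process environment; passing a dict makes
    the function easy to try out without touching real variables.
    """
    env = os.environ if env is None else env
    return {
        # Fallback URL is an assumption for local testing only.
        "url": env.get("GC_NEO4J_URL", "bolt://localhost:7687"),
        # Fallback user is an assumption; "neo4j" is the usual default account.
        "user": env.get("GC_NEO4J_USER", "neo4j"),
        # No fallback for the password: None signals that it was not set.
        "password": env.get("GC_NEO4J_PASSWORD"),
    }


if __name__ == "__main__":
    config = read_neo4j_config()
    print("Connecting to", config["url"], "as", config["user"])
```

The resulting dict could then be handed to whichever Neo4j client library the loader uses.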
Because a single query returns at most 1000 studies (https://clinicaltrials.gov/api/gui/demo/simple_study_fields), the query has been split into 3 parts. Each part covers studies containing the word COVID (query syntax in parentheses):
- Observational studies (COVID AND AREA[StudyType]Observational)
- Interventional studies (COVID AND AREA[StudyType]Interventional)
- Studies that are neither Observational nor Interventional, e.g. expanded access (COVID AND NOT AREA[StudyType]Interventional AND NOT AREA[StudyType]Observational)
The following study fields are selected:
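The three search expressions listed above can be generated from the study types, which keeps the "neither" case in sync with the other two. This is only a sketch: the names `STUDY_TYPES` and `build_covid_expressions` are hypothetical and do not come from the script.

```python
# Study types that get their own query, in the order used in this README.
STUDY_TYPES = ("Observational", "Interventional")


def build_covid_expressions(term="COVID"):
    """Return the three search expressions that together cover all studies."""
    # One expression per explicitly handled study type.
    expressions = [f"{term} AND AREA[StudyType]{t}" for t in STUDY_TYPES]
    # The remainder: everything that matches none of the handled types.
    # Reversed so the output matches the order written in this README.
    exclusions = " AND ".join(
        f"NOT AREA[StudyType]{t}" for t in reversed(STUDY_TYPES)
    )
    expressions.append(f"{term} AND {exclusions}")
    return expressions


if __name__ == "__main__":
    for expression in build_covid_expressions():
        print(expression)
```

Note that the split only helps as long as each part stays under the 1000-study limit.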
NCTId,
StudyType,
Phase,
Condition,
BriefTitle,
LeadSponsorName,
LocationFacility,
LocationCity,
LocationState,
LocationCountry,
InterventionName,
CollaboratorName,
OverallStatus,
PrimaryOutcomeMeasure,
EligibilityCriteria,
StartDate,
StudyFirstSubmitDate,
PrimaryCompletionDate.
Descriptions of the fields can be found here: https://clinicaltrials.gov/api/gui/ref/crosswalks.
At this point no results information can be found for COVID studies. This will be added once results are available.
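To show how the field list and a search expression might be combined into one request, here is a sketch that only builds the URL and sends nothing. The endpoint path and the parameter names (`expr`, `fields`, `min_rnk`, `max_rnk`, `fmt`) are assumptions based on the Study Fields query interface and should be checked against the API documentation; `build_study_fields_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Study Fields query; verify against the API docs.
STUDY_FIELDS_URL = "https://clinicaltrials.gov/api/query/study_fields"

# The study fields selected in this README.
FIELDS = [
    "NCTId", "StudyType", "Phase", "Condition", "BriefTitle",
    "LeadSponsorName", "LocationFacility", "LocationCity",
    "LocationState", "LocationCountry", "InterventionName",
    "CollaboratorName", "OverallStatus", "PrimaryOutcomeMeasure",
    "EligibilityCriteria", "StartDate", "StudyFirstSubmitDate",
    "PrimaryCompletionDate",
]


def build_study_fields_url(expr, min_rnk=1, max_rnk=1000):
    """Build the request URL for one search expression (no network access)."""
    params = {
        "expr": expr,                 # search expression, e.g. one of the 3 parts
        "fields": ",".join(FIELDS),   # comma-separated list of study fields
        "min_rnk": min_rnk,           # first result rank to return (assumed name)
        "max_rnk": max_rnk,           # last result rank, capped by the 1000 limit
        "fmt": "json",                # response format (assumed name)
    }
    return f"{STUDY_FIELDS_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_study_fields_url("COVID AND AREA[StudyType]Observational"))
```

Building the URL separately from sending the request makes it easy to inspect the query before any data is loaded.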