June 26, 2018, Tuesday

Bugs
- Jbrowse_out
  - Maybe just keep it at CSHL? Otherwise we will need to copy and uncompress it
  - If so, need to remove both annotation and assembly workflows from de.sciapps.org
- FastQC_out
  - Folder cannot be visualized
  - For the visualization window, we might need to browse down to files for both 'link' and 'visualization'
- samtools index on TACC does not always work; not sure why we get the following error
  - http://datacommons.cyverse.org/browse/iplant/home/lwang/sci_data/results/STAR_align_Stampede2-2.5.3_7329c7f7-c480-49c9-a7c1-d33cd05236ca/job-for-star_align_stampede2-2-5-3-6785646041787526680-242ac113-0001-007.err
  - samtools: error while loading shared libraries: /usr/local/bin/../lib/./libssl.so.1.0.0: unsupported version 0 of Verneed record
  - https://github.com/openssl/openssl/issues/4170
To do
- Push to maizecode branch, and deploy on de2.sciapps.org
- Adjust history job name length so it won't wrap to two lines
- Modify link and visualize buttons
  - Can we let the link button copy the link to history (not clickable but downloadable)?
  - Visualize button will open the file (disable it for genome browser files, BAM, etc.)
Plan for ENCODE DCC
- iRODS web front end (might be tricky; we can rely on iCommands, DE, Cyberduck)
- Workflow API
  - Command line version of SciApps
    - Take RNA-Seq as an example: we have the workflow JSON for one replicate, and now we want to apply the workflow to another replicate (paired reads)
    - curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -F "[email protected]" https://public.tenants.agaveapi.co/jobs/v2?pretty=true
      - Workflow JSON
      - New set of reads (replace inside JSON)
      - Workflow name (replace inside JSON)
- Flow
  - Domain we will use: data.maizecode.org (brie)
    - We have a workflow JSON with inputs from CyVerse Data Store and outputs archived to there too
    - We have metadata attached to inputs (reads)
      - /iplant/home/shared/maizecode/B73v4/RNA-seq/B73LongRampage/BioProject_PRJNA438108/BioSample1/BioSample1Library1
      - Most metadata @BioSample1, some @BioProject_PRJNA438108, and @BioSample1Library1 (distinguish RNA-Seq from RAMPAGE)
      - Can we generate an 'experiment JSON'?
        
        Workflow JSON id
        
        Replicate JSON id for each input data: combines metadata for each input file
        
        Do we need to use the DS UUID for each file instead of the path? Maybe we will use the path for now
  - Search
    - Elasticsearch vs Solr vs MySQL (preferred)
    - Assume we have run two sets of RNA-Seq data (data_a (root) and data_b (shoot))
    - Search for 'root' should return the following page
      - Left: filter by organism (B73, W22, NC350, Til11) and Tissue
      - Right: List of workflow names (some metadata, workflow_id, summary or workflow description)
      - Example:
        
        /iplant/home/shared/maizecode/B73v4/RNA-seq/B73LongRampage/BioProject_PRJNA438108
        
        https://www.encodeproject.org/search/?searchTerm=h3k4me3
    - Clicking on a workflow will bring up the workflow page
      - Example: https://www.encodeproject.org/experiments/ENCSR285FZP/
      - Question: Do we use one workflow for one replicate or one workflow for all replicates?
        
        We will use one workflow for all replicates
        
        However, each replicate has one BioSample ID @ NCBI SRA
    - Page rendering
      - Example: https://www.encodeproject.org/experiments/ENCSR285FZP/?format=json
  - Download
    - For inputs, we can make them public and direct users to the Data Commons (DC) landing page
    - Outputs are already public through the workflow; direct users to the DC landing page as well
  - Visualize
    - Direct user to SciApps to visualize results
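
The "replace inside JSON" step of the Workflow API and the proposed experiment JSON could be sketched as follows, in Python. This is only a sketch: the workflow JSON schema is not described in these notes, so it assumes nothing beyond a top-level `name` field and swaps any string value that exactly matches an old read path. The function names (`retarget_workflow`, `build_replicate`, `build_experiment`), the read file names, the workflow id, and the metadata keys are all hypothetical.

```python
import json


def retarget_workflow(workflow, new_name, path_map):
    """Return a copy of `workflow` with a new name and swapped input paths.

    Assumption: the workflow JSON is an object with a top-level "name" key.
    Paths are found by walking the whole document, so no other schema
    knowledge is required.
    """
    def swap(value):
        if isinstance(value, str):
            return path_map.get(value, value)  # replace exact matches only
        if isinstance(value, list):
            return [swap(v) for v in value]
        if isinstance(value, dict):
            return {k: swap(v) for k, v in value.items()}
        return value

    updated = swap(workflow)  # builds new containers; the input is untouched
    updated["name"] = new_name
    return updated


def build_replicate(path, project_meta, sample_meta, library_meta):
    """Combine metadata from the three folder levels for one input file.

    More specific levels (BioSample, then Library) override the BioProject.
    """
    return {"path": path,
            "metadata": {**project_meta, **sample_meta, **library_meta}}


def build_experiment(workflow_id, replicates):
    """Hypothetical 'experiment JSON': one workflow id plus its replicates."""
    return {"workflow_id": workflow_id, "replicates": replicates}


if __name__ == "__main__":
    base = ("/iplant/home/shared/maizecode/B73v4/RNA-seq/B73LongRampage/"
            "BioProject_PRJNA438108")
    # File names below are made up for illustration.
    old_r1 = base + "/BioSample1/BioSample1Library1/rep1_R1.fastq.gz"
    new_r1 = base + "/BioSample2/BioSample2Library1/rep2_R1.fastq.gz"

    workflow = {"name": "RNA-Seq rep1", "steps": [{"inputs": {"reads": [old_r1]}}]}
    rep2 = retarget_workflow(workflow, "RNA-Seq rep2", {old_r1: new_r1})

    replicate = build_replicate(new_r1,
                                {"organism": "B73"},
                                {"tissue": "root"},
                                {"assay": "RNA-Seq"})
    print(json.dumps(rep2, indent=2))
    print(json.dumps(build_experiment("hypothetical-workflow-id", [replicate]), indent=2))
```

Matching on exact path strings keeps the sketch independent of the real workflow schema; if DS UUIDs are used later instead of paths, the same `path_map` idea would apply with UUIDs as keys.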
