-
-This tool will validate your sample metadata against one or more schemas. Drag and drop all parts of your PEP here; this means metadata only: the project config YAML file, plus any sample or subsample table CSV files. Then, click "Validate".
-
-
-
-
- No results to display.
-
-
-
-
-Want API access? This tool is a static, client-hosted form that accesses an API validator service based on peppy. You can also access this service programmatically if you want to validate sample metadata as part of a pipeline or other tool.
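-
-The underlying validation also ships in the `eido` Python package, so you can run the same check inside a pipeline (a minimal sketch; the config path is a placeholder):
-
-```python
-import peppy
-from eido import validate_project
-
-# Build the project from its config and validate it against a schema;
-# validate_project raises an exception if the PEP fails to validate.
-prj = peppy.Project("project_config.yaml")
-validate_project(prj, "http://schema.databio.org/pep/2.0.0.yaml")
-```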
-
-
-
-
-
-
diff --git a/docs/writing-a-filter.md b/docs/writing-a-filter.md
deleted file mode 100644
index 87f91180..00000000
--- a/docs/writing-a-filter.md
+++ /dev/null
@@ -1,55 +0,0 @@
-**Filters are an experimental feature and may change in future versions of `eido`**
-
-# How to write a custom eido filter
-
-One of `eido`'s tasks is to provide a CLI to convert a PEP into alternative formats. These include some built-in formats, like `csv` (which outputs a processed `csv` file, with project and sample modifications applied), `yaml`, and a few others. It also provides a plugin system so that you can write your own Python functions to produce custom output formats.
-
-## Custom filters
-
-To write a custom filter, start by writing a Python package. You will need to include a function that takes a `peppy.Project` object as input and prints out the custom file format. Filter functions can also require additional keyword arguments.
-
-### 1. Write functions to call
-
-The package contains one or more functions. Each filter function **must take a `peppy.Project` object and `**kwargs` as parameters**. Example:
-
-```python
-import peppy
-
-def my_custom_filter(p, **kwargs):
-    import re
-    import sys
-    import yaml
-
-    for s in p.samples:
-        sys.stdout.write("- ")
-        out = re.sub('\n', '\n  ', yaml.safe_dump(s.to_dict(), default_flow_style=False))
-        sys.stdout.write(out + "\n")
-```
-For reference, you can check the signatures of the built-in functions in the [Built-in `eido` Plugins Documentation](plugin_api_docs.md). Importantly, if the function *requires* any arguments (always provided via `**kwargs`), its creator should take care of handling missing or faulty input.
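-
-You can try the filter out before registering it by calling it directly on a project (a minimal sketch, assuming `my_custom_filter` from above is in scope; `project_config.yaml` is a placeholder path):
-
-```python
-import peppy
-
-# Build a project and run the filter; it writes each sample to
-# stdout as an item in a YAML list.
-p = peppy.Project("project_config.yaml")
-my_custom_filter(p)
-```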
-
-Next, we need to link that function into the `eido` filter plugin system.
-
-### 2. Add entry_points to setup.py
-
-The `setup.py` file uses `entry_points` to specify a mapping of eido hooks to functions to call.
-
-```python
-entry_points={
- "pep.filters": [
- "basic=eido.conversion_plugins:basic_pep_filter",
- "yaml=eido.conversion_plugins:yaml_pep_filter",
- "csv=eido.conversion_plugins:csv_pep_filter",
- "yaml-samples=eido.conversion_plugins:yaml_samples_pep_filter",
- ],
-},
-```
-
-The format is: `'pep.filters': 'FILTER_NAME=PLUGIN_PACKAGE_NAME:FUNCTION_NAME'`, where:
-
-- "FILTER_NAME" can be any unique identifier for your plugin
-- "PLUGIN_PACKAGE_NAME" must be the name of python package the holds your plugin.
-- "FUNCTION_NAME" must match the name of the function in your package
-
-### 3. Install package
-
-Once you install this package, any filters it provides will be available for use with eido; you can list the available filters with `eido filters`.
diff --git a/docs/writing-a-schema.md b/docs/writing-a-schema.md
deleted file mode 100644
index 1f713629..00000000
--- a/docs/writing-a-schema.md
+++ /dev/null
@@ -1,124 +0,0 @@
-# How to write a PEP schema
-
-If you are a tool developer, we recommend you write a PEP schema that describes what sample and project attributes are required for your tool to work. PEP schemas use the [JSON Schema](https://json-schema.org/) vocabulary, plus some additional features. This guide will walk you through everything you need to know to write your own schema. It assumes you already have a basic familiarity with JSON Schema.
-
-
-## Importing the base PEP schema
-
-One of the features added by `eido` is the `imports` attribute, which allows you to extend existing schemas. We recommend your new PEP schema start by importing the [base PEP schema](http://schema.databio.org/pep/2.0.0.yaml). This ensures that the putative PEP at least follows the basic PEP specification, which you then build on with your tool-specific requirements. Here's how to start, by importing the generic base PEP schema:
-
-```yaml
-description: An example schema for a pipeline.
-imports:
-  - http://schema.databio.org/pep/2.0.0.yaml
-```
-
-You can also use `imports` to build other schemas that subclass your own.
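-
-To check how the imports resolve, you can load a schema through eido's Python API (a minimal sketch; `my_schema.yaml` is a placeholder path):
-
-```python
-from eido import read_schema
-
-# read_schema resolves the `imports` list and returns a list of
-# schema dicts: imported schemas first, then the schema itself.
-schemas = read_schema("my_schema.yaml")
-print(len(schemas))  # 2 here: the base PEP schema plus this one
-```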
-
-## Project and sample sections
-
-Like the PEP itself, the schema is divided into two sections, one for the project config and one for the samples. So, the base PEP schema defines an object with two components: a `config` object and a `samples` array:
-
-
-```yaml
-description: An example schema for a pipeline.
-imports:
-  - http://schema.databio.org/pep/2.0.0.yaml
-properties:
-  config:
-    type: object
-  samples:
-    type: array
-required:
-  - samples
-  - config
-```
-
-
-## Required sample attributes
-
-Let's say you're writing a PEP-compatible tool that requires 3 arguments: `read1`, `read2`, and `genome`, and also offers an optional argument, `read_length`. Validating against the generic PEP schema will not confirm these tool-specific attributes, so you want to write an extended schema. Starting from the base above, we're not changing the `config` section, so we can drop it, and we add the required sample attributes like this:
-
-```yaml
-description: An example schema for a pipeline.
-imports:
-  - http://schema.databio.org/pep/2.0.0.yaml
-properties:
-  samples:
-    type: array
-    items:
-      type: object
-      properties:
-        read1:
-          type: string
-          description: "Fastq file for read 1"
-        read2:
-          type: string
-          description: "Fastq file for read 2"
-        genome:
-          type: string
-          description: "Refgenie genome registry identifier"
-        read_length:
-          type: integer
-          description: "Length of the sequencing reads"
-      required:
-        - read1
-        - read2
-        - genome
-required:
-  - samples
-```
-
-This document defines the required and optional sample attributes for this pipeline. That's all you need to do; your users can now validate an existing PEP to see if it meets the requirements of your tool, e.g. with `eido validate project_config.yaml -s pipeline_schema.yaml`.
-
-## Required input files
-
-In the above example, we listed the `read1` and `read2` attributes as *required*. This enforces that these attributes are defined on the samples, but for this example that is not enough -- they must also *point to files that exist*. Checking for files is outside the scope of JSON Schema, which only validates JSON documents, so eido extends JSON Schema with the ability to specify which attributes should point to files.
-
-Eido provides two ways to do this: `files` and `required_files`. The basic `files` list simply specifies which attributes point to files, without requiring that those files exist. This is useful, for example, for tools that want to calculate the total size of all provided inputs. The `required_files` list specifies attributes that point to files that *must exist*; otherwise, the PEP doesn't validate. Here's an example specifying an optional and a required input attribute:
-
-```yaml
-description: A PEP for ATAC-seq samples for the PEPATAC pipeline.
-imports:
-  - http://schema.databio.org/pep/2.0.0.yaml
-properties:
-  samples:
-    type: array
-    items:
-      type: object
-      properties:
-        sample_name:
-          type: string
-          description: "Name of the sample"
-        organism:
-          type: string
-          description: "Organism"
-        protocol:
-          type: string
-          description: "Must be an ATAC-seq or DNase-seq sample"
-        genome:
-          type: string
-          description: "Refgenie genome registry identifier"
-        read_type:
-          type: string
-          description: "Is this single or paired-end data?"
-          enum: ["SINGLE", "PAIRED"]
-        read1:
-          type: string
-          description: "Fastq file for read 1"
-        read2:
-          type: string
-          description: "Fastq file for read 2 (for paired-end experiments)"
-      required_files:
-        - read1
-      files:
-        - read1
-        - read2
-```
-
-This could be a valid example for a pipeline that accepts either single-end or paired-end data: `read1` must point to a file that exists, whereas `read2` isn't required; but if it does point to a file, that file is also considered an input file.
-
-
-## Example schemas
-
-For more ideas, take a look at some [example schemas](example-schemas.md).
diff --git a/docs_jupyter/build/.gitignore b/docs_jupyter/build/.gitignore
deleted file mode 100644
index d6b7ef32..00000000
--- a/docs_jupyter/build/.gitignore
+++ /dev/null
@@ -1,2 +0,0 @@
-*
-!.gitignore
diff --git a/docs_jupyter/cli.ipynb b/docs_jupyter/cli.ipynb
deleted file mode 100644
index 37377cbe..00000000
--- a/docs_jupyter/cli.ipynb
+++ /dev/null
@@ -1,407 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# `eido` command line usage"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "To use the command line application one just needs a path to a project configuration file. It is a positional argument in the `eido` command.\n",
- "\n",
- "For this tutorial, let's grab a PEP from a public example repository that describes a few PRO-seq test samples:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Cloning into 'ppqc'...\n",
- "remote: Enumerating objects: 154, done.\u001b[K\n",
- "remote: Counting objects: 100% (20/20), done.\u001b[K\n",
- "remote: Compressing objects: 100% (15/15), done.\u001b[K\n",
- "remote: Total 154 (delta 7), reused 17 (delta 5), pack-reused 134\u001b[K\n",
- "Receiving objects: 100% (154/154), 81.69 KiB | 3.27 MiB/s, done.\n",
- "Resolving deltas: 100% (82/82), done.\n"
- ]
- }
- ],
- "source": [
- "rm -rf ppqc\n",
- "git clone https://github.com/databio/ppqc.git --branch cfg2"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [],
- "source": [
- "cd ppqc\n",
- "export DATA=$HOME\n",
- "export SRAFQ=$HOME"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## PEP inspection\n",
- "\n",
- "First, let's use `eido inspect` to inspect a PEP. \n",
- "\n",
- " - To inspect the entire `Project` object just provide the path to the project configuration file."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Project 'PEPPRO' (peppro_paper.yaml)\n",
- "47 samples (showing first 20): K562_PRO-seq_02, K562_PRO-seq_04, K562_PRO-seq_06, K562_PRO-seq_08, K562_PRO-seq_10, K562_PRO-seq_20, K562_PRO-seq_30, K562_PRO-seq_40, K562_PRO-seq_50, K562_PRO-seq_60, K562_PRO-seq_70, K562_PRO-seq_80, K562_PRO-seq_90, K562_PRO-seq_100, K562_RNA-seq_0, K562_RNA-seq_10, K562_RNA-seq_20, K562_RNA-seq_30, K562_RNA-seq_40, K562_RNA-seq_50\n",
- "Sections: name, pep_version, sample_table, looper, sample_modifiers\n"
- ]
- }
- ],
- "source": [
- "eido inspect peppro_paper.yaml"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- " - To inspect a specific sample, one needs to provide the sample name (via `-n`/`--sample-name` oprional argument)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Sample 'K562_RNA-seq_10' in Project (peppro_paper.yaml)\n",
- "\n",
- "sample_name: K562_RNA-seq_10\n",
- "sample_desc: 90% K562 PRO-seq + 10% K562 RNA-seq\n",
- "treatment: 70M total reads\n",
- "protocol: PRO\n",
- "organism: human\n",
- "read_type: SINGLE\n",
- "umi_len: 0\n",
- "read1: /Users/mstolarczyk/K562_10pctRNA.fastq.gz\n",
- "srr: K562_10pctRNA\n",
- "pipeline_interfaces: $CODE/peppro/sample_pipeline_interface.yaml\n",
- "genome: hg38\n",
- "\n",
- "... (showing first 10)\n",
- "\n",
- "\n"
- ]
- }
- ],
- "source": [
- "eido inspect peppro_paper.yaml -n K562_PRO-seq K562_RNA-seq_10"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## PEP validation"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Next, let's use `eido` to validate this project against the generic PEP schema. You just need to provide a path to the project config file and schema as an input."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Validation successful\n"
- ]
- }
- ],
- "source": [
- "eido validate peppro_paper.yaml -s http://schema.databio.org/pep/2.0.0.yaml -e"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Any PEP should validate against that schema, which describes generic PEP format. We can go one step further and validate it against the PEPPRO schema, which describes Proseq projects specfically for this pipeline:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Validation successful\n"
- ]
- }
- ],
- "source": [
- "eido validate peppro_paper.yaml -s http://schema.databio.org/pipelines/ProseqPEP.yaml"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "This project would *not* validate against a different pipeline's schema.\n",
- "\n",
- "Following `jsonschema`, `eido` produces comprehensive error messages that include the objects that did not pass validation. When validating PEPs that include lots of samples one can use option `-e`/`--exclude-case` to limit the error output just to the human readable message. This is the option used in the example below:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Traceback (most recent call last):\n",
- " File \"/usr/local/bin/eido\", line 8, in