sync
meren committed Aug 1, 2023
1 parent e27a017 commit 12e2a3a
Showing 2 changed files with 39 additions and 27 deletions.
2 changes: 1 addition & 1 deletion help/main/index.md
@@ -17,7 +17,7 @@ If you need an introduction to the terminology used in 'omics research or in anv
<a href="/network/" target="_blank"><img src="/images/anvio-network.png" width="100%" /></a>

{:.notice}
The help contents were last updated on **27 Jul 23 20:31:04** for anvi'o version **7.1-dev (hope)**.
The help contents were last updated on **31 Jul 23 21:12:03** for anvi'o version **7.1-dev (hope)**.


{% include _project-anvio-version.html %}
64 changes: 38 additions & 26 deletions help/main/programs/anvi-report-inversions/index.md
@@ -47,40 +47,40 @@ Reports inversions.
## Usage


This program allows you to find genomic inversions using short-read mapping information.
This program allows you to find genomic inversions using metagenomic read recruitment results, and to characterize their activity patterns across samples.

An inversion is typically carried out by an invertase. This enzyme recognizes a pair of inverted repeats (IRs), which are a special case of palindromic sequences where the repeats face inward on different DNA strands. The IRs are distant from each other, and the invertase inverts the DNA fragment between them.
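
To make the definition concrete, here is a minimal Python sketch of what "inverted repeat" and "inverting the fragment between the IRs" mean at the sequence level. The function names are hypothetical and this is not anvi'o code; it only handles perfect repeats, with no mismatches.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def is_inverted_repeat_pair(left: str, right: str) -> bool:
    """True if `right` is the reverse complement of `left` (a perfect IR pair)."""
    return reverse_complement(left) == right.upper()


def invert_between_repeats(contig: str, left_end: int, right_start: int) -> str:
    """Reverse-complement contig[left_end:right_start], the fragment between two IRs."""
    fragment = contig[left_end:right_start]
    return contig[:left_end] + reverse_complement(fragment) + contig[right_start:]
```

For instance, in the toy contig `ATGC` + `AAAC` + `GCAT`, the flanking `ATGC` and `GCAT` form an IR pair, and inverting the fragment between them replaces `AAAC` with its reverse complement while leaving the repeats untouched.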

In brief, anvi'o leverages paired-read orientation to locate regions of interest in a set of contigs. It screens for IRs within these regions, and uses short reads to confirm which IRs correspond to real inversions. Eventually, anvi'o can compute the inversion activity: the relative proportion of an inversion's orientation in each sample.
In brief, anvi'o leverages paired-read orientation (through the `--fetch-filter` mechanism in <span class="artifact-p">[anvi-profile](/help/main/programs/anvi-profile)</span> explained below) to locate regions of interest in a set of contigs. It screens for IRs within regions that are enriched in read pairs with forward/forward or reverse/reverse orientations, and uses short reads to confirm which IRs correspond to real inversions. Anvi'o can also compute the 'inversion activity', i.e., the relative proportion of each orientation of an inversion in each sample.

### Anvi'o's philosophy to find inversions
### Anvi'o philosophy to find inversions

Much like a T-Rex, anvi'o's vision relies on movement and it cannot see an inversion if it does not move, or in this case, invert. So let's start with what you cannot do with this command: you cannot find inversions in a set of contigs alone.
Much like a T-Rex, the vision of anvi'o relies on movement and it cannot see an inversion if it does not move, or in this case, invert. So let's start with what you cannot do with this command: you cannot find inversions in a set of contigs alone.

To find an inversion, you need to have short reads from at least one sample. If there is even a small fraction of a microbial population that has an inverted sequence compared to your reference contigs, then anvi-report-inversions is for you!
To find an inversion, you need to have short reads from at least one sample. If even a small fraction of the members of a microbial population have an inverted sequence, then <span class="artifact-p">[anvi-report-inversions](/help/main/programs/anvi-report-inversions)</span> will very likely find it for you!

### Prerequisites to run this program
### Before you run this program

Anvi'o is able to locate inversions using paired-end read orientation. Regular paired-end reads face inward with a FWD/REV orientation, but when an inversion happens, some reads will map in the opposite orientation relative to the reference. As a consequence, some paired-end reads will have the same orientation: FWD/FWD or REV/REV.
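
As an illustration of the orientation logic, the sketch below classifies a paired, mapped read from its SAM FLAG value using the standard strand bits (`0x10` for the read, `0x20` for its mate). The function names are hypothetical, and this is not how the `--fetch-filter` mechanism is implemented in anvi'o; it only shows which flag combinations correspond to same-orientation pairs.

```python
READ_REVERSE = 0x10  # SAM FLAG bit: this read maps to the reverse strand
MATE_REVERSE = 0x20  # SAM FLAG bit: the mate maps to the reverse strand


def pair_orientation(flag: int) -> str:
    """Classify a paired, mapped read as FWD/FWD, REV/REV, or FWD/REV from its SAM FLAG."""
    read_reverse = bool(flag & READ_REVERSE)
    mate_reverse = bool(flag & MATE_REVERSE)
    if read_reverse and mate_reverse:
        return "REV/REV"
    if not read_reverse and not mate_reverse:
        return "FWD/FWD"
    # One read on each strand: the regular paired-end situation.
    return "FWD/REV"


def supports_inversion(flag: int) -> bool:
    """True when both reads of the pair map to the same strand."""
    return pair_orientation(flag) in ("FWD/FWD", "REV/REV")
```

With this sketch, the common FLAG values 99 and 147 are regular pairs, while 65 (both forward) and 113 (both reverse) are the kind of pairs that point to a possible inversion.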

To leverage that information, anvi'o can profile BAM files using only the FWD/FWD and REV/REV reads with <span class="artifact-p">[anvi-profile](/help/main/programs/anvi-profile)</span> to generate a special <span class="artifact-n">[single-profile-db](/help/main/artifacts/single-profile-db)</span>.

<div class="codeblock" markdown="1">
anvi&#45;profile &#45;i <span class="artifact&#45;n">[bam&#45;file](/help/main/artifacts/bam&#45;file)</span> \
&#45;c <span class="artifact&#45;n">[contigs&#45;db](/help/main/artifacts/contigs&#45;db)</span> \
&#45;&#45;fetch&#45;filter inversion
</div>

### Inputs
### Other essential inputs to run this program

The main input for this command is a <span class="artifact-n">[bams-and-profiles-txt](/help/main/artifacts/bams-and-profiles-txt)</span>, which is a TAB-delimited file composed of at least four columns:
The main input for <span class="artifact-p">[anvi-report-inversions](/help/main/programs/anvi-report-inversions)</span> is a <span class="artifact-n">[bams-and-profiles-txt](/help/main/artifacts/bams-and-profiles-txt)</span>, which is a TAB-delimited file composed of at least four columns:

* Sample name,
* <span class="artifact-n">[contigs-db](/help/main/artifacts/contigs-db)</span>,
* <span class="artifact-n">[single-profile-db](/help/main/artifacts/single-profile-db)</span> generated with the inversion fetch filter,
* <span class="artifact-n">[bam-file](/help/main/artifacts/bam-file)</span>.

You can also add two columns for the R1 and R2 FASTQ files so that anvi'o can compute the inversion's activity.
If you are also interested in characterizing inversion activity statistics across samples, you will need to add two more columns to the <span class="artifact-n">[bams-and-profiles-txt](/help/main/artifacts/bams-and-profiles-txt)</span> file that point to the paths of the R1 and R2 FASTQ files.
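
If you have many samples, you may prefer to generate this file with a script. The following Python sketch writes a TAB-delimited file with the four required columns, plus the two read columns when every sample provides them. The column header names (`name`, `contigs_db_path`, `profile_db_path`, `bam_file_path`, `r1`, `r2`) and the function name are assumptions made for illustration; please check the bams-and-profiles-txt artifact page for the exact headers anvi'o expects.

```python
import csv
import io

# Assumed header names; verify them against the bams-and-profiles-txt documentation.
REQUIRED_COLUMNS = ["name", "contigs_db_path", "profile_db_path", "bam_file_path"]
READ_COLUMNS = ["r1", "r2"]


def write_bams_and_profiles(handle, samples):
    """Write one TAB-delimited row per sample; add r1/r2 only if all samples have them."""
    with_reads = all("r1" in s and "r2" in s for s in samples)
    columns = REQUIRED_COLUMNS + (READ_COLUMNS if with_reads else [])
    writer = csv.DictWriter(handle, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for sample in samples:
        writer.writerow(sample)
    return columns
```

You can pass any writable text handle, e.g. `open("bams-and-profiles.txt", "w", newline="")`, along with a list of dictionaries describing your samples.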

Here is a standard run with default parameters:

@@ -91,7 +91,7 @@ anvi&#45;report&#45;inversion &#45;P <span class="artifact&#45;n">[bams&#45;and&

### Identifying regions of interest

While anvi'o could directly search for inverted repeats in all the contigs, it would be a waste of time as many IRs are not actually related to inversions. Instead, anvi'o uses the FWD/FWD and REV/REV reads to identify regions of interest and constrains the search for IRs to these regions.

For this step, you can set the minimum coverage of FWD/FWD and REV/REV reads to define 'stretches' with `--min-coverage-to-define-stretches`. Lower thresholds yield more stretches, but also more noise.
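
The effect of the coverage threshold can be illustrated with a small Python sketch that scans a per-nucleotide coverage vector and reports contiguous runs at or above a minimum value. The function name and the use of `>=` are assumptions for illustration only; this is not the anvi'o implementation, which takes more parameters into account.

```python
def find_stretches(coverage, min_coverage):
    """Return (start, end) half-open intervals where coverage >= min_coverage."""
    stretches = []
    start = None
    for position, value in enumerate(coverage):
        if value >= min_coverage and start is None:
            start = position  # a new stretch begins here
        elif value < min_coverage and start is not None:
            stretches.append((start, position))  # the stretch ended just before here
            start = None
    if start is not None:
        stretches.append((start, len(coverage)))  # stretch runs to the contig end
    return stretches
```

On a toy vector such as `[0, 5, 0, 12, 15, 11, 0, 10, 10, 1]`, a threshold of 10 yields two stretches, while lowering it to 5 adds a third, single-nucleotide stretch that is more likely to be noise.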

@@ -116,7 +116,7 @@ anvi&#45;report&#45;inversions &#45;P <span class="artifact&#45;n">[bams&#45;and

There are a few parameters to constrain the search for palindromic sequences of the IRs, such as a minimum length that can be set with `--min-palindrome-length`, and a maximum number of mismatches with `--max-num-mismatches`.

You can set the minimum distance between two palindromic sequences with `--min-distance`. A distance of 0 would correspond to an in-place palindrome, though such palindromes do not relate to genomic inversions.

When searching for palindromes with mismatches, the algorithm will extend the palindrome length as much as possible, often including mismatches which are outside of the true palindrome sequences. The flag `--min-mismatch-distance-to-first-base` allows you to trim the palindrome when one or more mismatches are n nucleotides away from a palindrome's start or stop. The default value is 1, meaning that a palindrome `MMMMMM(X)M`, where M denotes matching nucleotides and X a mismatch, will be trimmed to the first 6 matches `MMMMMM`.
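
The trimming rule can be sketched in Python on a match string made of `M` (match) and `X` (mismatch) characters. This is only one plausible reading of the rule described above, with a hypothetical function name; the actual anvi'o implementation may differ in its details.

```python
def trim_palindrome(matches: str, min_distance: int = 1) -> str:
    """Trim a match string until no mismatch lies within `min_distance` of either end."""
    trimmed = matches
    while True:
        last = len(trimmed) - 1
        # Mismatches that are too close to the start or to the end of the palindrome.
        near_start = [i for i, c in enumerate(trimmed) if c == "X" and i <= min_distance]
        near_end = [i for i, c in enumerate(trimmed) if c == "X" and last - i <= min_distance]
        if near_start:
            trimmed = trimmed[max(near_start) + 1:]  # drop everything up to the mismatch
        elif near_end:
            trimmed = trimmed[:min(near_end)]  # drop the mismatch and what follows
        else:
            return trimmed
```

With the default distance of 1, the example from the text, `MMMMMMXM`, is trimmed to `MMMMMM`, while a mismatch in the middle of the palindrome is left in place.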

@@ -134,21 +134,21 @@ anvi&#45;report&#45;inversions &#45;P <span class="artifact&#45;n">[bams&#45;and

### Confirming inversions

Multiple palindromes are usually reported for each stretch. To confirm which one actually relates to an inversion, anvi'o searches short reads in the BAM file for unique constructs that can only occur when a genomic region is inverted.

By default, anvi'o reports the first confirmed palindrome and moves to the next stretch. This process is very efficient as a stretch usually has only one inversion. But in rare cases, multiple inversions can occur in a single stretch. In that case, you can use the flag `--check-all-palindromes` and anvi'o will look for evidence of inversion in the short reads for every palindrome in a stretch.

Anvi'o looks for inversion evidence in the FWD/FWD and REV/REV reads first. If no evidence is found, it then searches the rest of the reads mapping to the region of interest. If you want to search only the FWD/FWD and REV/REV reads, you can use the flag `--process-only-inverted-reads`.

### Computing inversion activity

If you provide the R1 and R2 short reads in the <span class="artifact-n">[bams-and-profiles-txt](/help/main/artifacts/bams-and-profiles-txt)</span>, anvi'o can compute the proportion of each orientation of an inversion in each sample.
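
Conceptually, the activity of an inversion in a sample is a set of relative proportions. The Python sketch below turns read counts per oligo (for instance, one oligo matching the reference orientation and one matching the inverted orientation) into proportions. The function name and the dictionary keys are hypothetical, and the counts are made up for illustration.

```python
def inversion_activity(oligo_counts):
    """Convert per-oligo read counts into relative proportions for one sample."""
    total = sum(oligo_counts.values())
    if total == 0:
        # No read matched any oligo in this sample: report zeros rather than divide by zero.
        return {oligo: 0.0 for oligo in oligo_counts}
    return {oligo: count / total for oligo, count in oligo_counts.items()}
```

For example, 30 reads supporting the reference orientation and 10 reads supporting the inverted orientation give proportions of 0.75 and 0.25.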

This is a very time-consuming step, and if you have multiple samples, you can use the parameter `--num-threads` to set the maximum number of threads for multithreading when possible.

To compute the inversion's ratios, anvi'o designs in silico primers based on the palindrome sequence and the upstream/downstream genomic context to search short reads in the raw FASTQ files. The parameter `--oligo-primer-base-length` controls how much of the palindrome is used to design the primers. Longer primers are more specific, but if they are too long, fewer reads will match them.
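
To give an intuition for the primer-based search, here is a simplified Python sketch that looks for an exact primer match in each read and counts the oligo that immediately follows it. Everything here is an assumption made for illustration: the function name, the exact-match strategy, and the fixed oligo length. The actual search in anvi'o is more involved (for instance, this sketch ignores reads on the reverse strand).

```python
from collections import Counter


def count_oligos(reads, primer, oligo_length):
    """Count the oligos found right after an exact primer match in each read."""
    counts = Counter()
    for read in reads:
        hit = read.find(primer)
        if hit == -1:
            continue  # the primer does not occur in this read
        start = hit + len(primer)
        oligo = read[start:start + oligo_length]
        if len(oligo) == oligo_length:  # skip reads that end before a full oligo
            counts[oligo] += 1
    return counts
```

A longer primer makes a spurious match less likely, but it also leaves fewer reads that contain both the full primer and a complete oligo, which is the trade-off described above.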

This step is very computationally intensive, but you can test it with the parameter `--end-primer-search-after-x-hits`. Once the total number of reads reaches this value, anvi'o will stop searching and continue with the next sample. This flag is only good for testing.

If you want to skip this step, you can use the flag `--skip-compute-inversion-activity`.

@@ -161,23 +161,35 @@ anvi&#45;report&#45;inversions &#45;P <span class="artifact&#45;n">[bams&#45;and
&#45;&#45;oligo&#45;primer&#45;base&#45;length 12
</div>

### Computing inversion activity using previously computed inversions

It is possible to instruct anvi'o to use previously reported inversions to characterize their activity across a larger set of samples. To do so, pass <span class="artifact-p">[anvi-report-inversions](/help/main/programs/anvi-report-inversions)</span> the output file for consensus inversions (i.e., 'INVERSIONS-CONSENSUS.txt') or the output file for sample-specific inversions (i.e., 'INVERSIONS-IN-[SAMPLE-NAME].txt') from a previous run using the flag `--pre-computed-inversions`:

<div class="codeblock" markdown="1">
anvi&#45;report&#45;inversions &#45;P <span class="artifact&#45;n">[bams&#45;and&#45;profiles&#45;txt](/help/main/artifacts/bams&#45;and&#45;profiles&#45;txt)</span> \
&#45;&#45;pre&#45;computed&#45;inversions inversions_output/INVERSIONS&#45;CONSENSUS.txt \
&#45;o activity_calculations
</div>

In this mode, <span class="artifact-p">[anvi-report-inversions](/help/main/programs/anvi-report-inversions)</span> will not recalculate inversions, and will only report the activity of the inversions found in the input file across the samples listed in the <span class="artifact-n">[bams-and-profiles-txt](/help/main/artifacts/bams-and-profiles-txt)</span> file.

### Reporting genomic context around inversions

For every inversion, anvi'o can report the surrounding genes and their functions as additional files.

You can use the flag `--num-genes-to-consider-in-context` to choose how many genes to consider upstream/downstream of the inversion. By default, anvi'o reports three genes downstream and three genes upstream.

To select a specific gene caller, you can use `--gene-caller`. The default is prodigal.

If you want to skip this step, you can use the flag `--skip-recovering-genomic-context`.

### Targeted search

If you are interested in a given contig region you can use the following flags to limit the search:

* `--target-contig`: contig of interest,
* `--target-region-start`: the start position of the region of interest,
* `--target-region-end`: the end position of the region of interest.

### Output
<span class="artifact-p">[anvi-report-inversions](/help/main/programs/anvi-report-inversions)</span> searches for inversions one sample at a time and thus generates a TAB-delimited table for every sample: `INVERSIONS-IN-SAMPLE_01.txt`, `INVERSIONS-IN-SAMPLE_02.txt`, ...
@@ -198,15 +210,15 @@ These tables contains the following information:
* the in silico primers used to compute the inversion's activity, for the first and second palindrome,
* the oligo corresponding to the reference sequence.

Anvi'o eventually creates a consensus table with all the unique inversions found across all your samples in a file called `INVERSIONS-CONSENSUS.txt`. This table has the same format as the individual sample outputs, with the 'entry ID' replaced by a unique inversion ID.

Another default output table is named `ALL-STRETCHES-CONSIDERED.txt` and it reports every stretch that passed the ['Identifying regions of interest'](#identifying-regions-of-interest) parameters. It reports the maximum coverage of FWD/FWD and REV/REV reads in that stretch, per sample. It also reports the number of palindromes found and whether a true inversion was confirmed.

If the user enables the reporting of the genomic context, two additional TAB-delimited tables are generated: `INVERSIONS-CONSENSUS-SURROUNDING-GENES.txt` and `INVERSIONS-CONSENSUS-SURROUNDING-FUNCTIONS.txt`.
The first table reports the gene calls surrounding every inversion when possible (inversions_id, gene_caller_id, start and stop position, orientation, gene_caller and contig).
The second table reports the functions associated with every gene call reported in the first file (inversions_id, gene_caller_id, source, accession, function).

Finally, if the user provides R1 and R2 FASTQ files and enables the reporting of inversion activity, <span class="artifact-p">[anvi-report-inversions](/help/main/programs/anvi-report-inversions)</span> will generate a long-format file named `INVERSION-ACTIVITY.txt`. This file reports, for every inversion and sample, the relative proportion and read abundance of unique oligos, which correspond either to the reference contig (no inversion) or to an inversion sequence. The inversion's activity is computed and reported for both sides of each inversion.


