
zou-yawen/bioinformatics_pipeline

 
 


Bioinformatics pipeline

Introduction

An effective, reproducible, and reliable data analysis workflow depends on a state-of-the-art pipeline built with up-to-date methods. Building such a bioinformatics pipeline will facilitate my future work, but I am still working out how to construct it. I plan to implement it in a workflow language, either Nextflow or Snakemake. Before doing so, I need to describe the workflows in a mind map so that the design is clear to me.

Interests

  • amplicon sequencing analysis
  • metagenomics sequencing analysis
  • bulk-RNA sequencing analysis
  • DNA methylation by Illumina Array

Workflow

The workflow comprises two parts: the first goes from raw data to a profile, and the second is data analysis (statistical analysis).

  • the first part
    • demultiplex the sequences
    • check the quality of the reads
    • filter out low-quality reads and remove host DNA sequences
    • align the high-quality reads to the reference database
    • obtain the profile, an $M \times N$ matrix (M: feature names; N: sample IDs)
  • the second part
    • statistical analysis, such as the Wilcoxon rank-sum test, LDA, PCoA, linear regression analysis, and multivariable association analysis
    • machine learning
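
The last step of the first part, turning per-sample alignment results into a feature-by-sample profile, could look roughly like the sketch below. It is only an illustration under stated assumptions: the input is assumed to be one feature name per aligned read for each sample, and the function names (`build_profile`, `to_tsv`), sample IDs, and feature names are hypothetical and not taken from this repository.

```python
from collections import Counter


def build_profile(hits_per_sample):
    """Build an M x N count matrix (rows: features, columns: samples).

    hits_per_sample is an assumed input format: a dict mapping each
    sample ID to an iterable of feature names, one per aligned read.
    """
    # Count how many reads hit each feature within each sample.
    counts = {sample: Counter(hits) for sample, hits in hits_per_sample.items()}
    samples = sorted(counts)
    features = sorted({f for c in counts.values() for f in c})
    # A Counter returns 0 for features that a sample never hit.
    matrix = [[counts[s][f] for s in samples] for f in features]
    return features, samples, matrix


def to_tsv(features, samples, matrix):
    """Render the profile as tab-separated text with a header row."""
    lines = ["\t".join(["feature"] + samples)]
    for feature, row in zip(features, matrix):
        lines.append("\t".join([feature] + [str(v) for v in row]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical toy data: two samples, two features.
    example = {
        "S2": ["Bacteroides", "Prevotella", "Bacteroides"],
        "S1": ["Prevotella"],
    }
    print(to_tsv(*build_profile(example)))
```

Keeping features as rows and samples as columns matches the $M \times N$ layout described above, and a table in this shape can be passed to the statistical analysis of the second part.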

Notice

First, I will use Perl for some preliminary work, and eventually I will convert all the workflows into Snakemake or Nextflow.
