TrinhLab/GeCCo
Gene Coexpression Connectivity (GeCCo)

Note: GeCCo was previously named Gene Expression Classifier (GEC).

What is GeCCo

GeCCo analyzes omics data across two conditions or time points; such a comparison is referred to as case vs. control. GeCCo assigns each gene or protein to one of the following categories:

  • control_overexpressed (highly-expressed)
  • case_overexpressed (lowly-expressed)
  • control_upregulated (up-regulated)
  • case_upregulated (down-regulated)
  • changed_regulation
  • no_change

Additionally, GeCCo will perform network co-expression analysis and produce several files ready for analysis in Cytoscape.

Preliminary notes

GeCCo is a Python program with a command-line interface. The commands below are intended for a Unix-like OS (macOS, *BSD, GNU/Linux). On Windows, you can either reproduce a Linux command-line environment (e.g., with Windows Subsystem for Linux, Cygwin, or VirtualBox) or use the Windows command line directly, which may require some adjustments. If you use Windows directly, check how to run Python programs there; one option is the Anaconda Prompt.

Installation

  1. Clone the repository or download the zip file.
  2. Install the Python package: pip install -e GeCCo

Usage

  1. Create a problem directory with the following files in an input subdirectory:

    • tpm.csv (mandatory): Must contain the following headers: Gene|WT_t1_rep1|WT_t1_rep2|WT_t1_rep3|WT_t2_rep1|WT_t2_rep2|MT_t1_rep1|MT_t1_rep2|MT_t2_rep1|MT_t2_rep2. Any number of replicates is acceptable. Note that WT corresponds to case and MT to control.
    • gene_features.csv (optional): Headers are Gene|Feature. Any arbitrary feature name may be used, for example mutated_gene or transcription_factor.
    • coexpression_tpm.csv (optional): Used to construct the co-expression network. The first column must be labeled 'Gene'; every subsequent column is treated as a sample.
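The header requirements for tpm.csv can be checked before running GeCCo. The following is a minimal sketch, not part of GeCCo: the helper `check_tpm_header` and the column pattern `<WT|MT>_t<time point>_rep<replicate>` are assumptions inferred from the example header above, and the file is assumed to be comma-separated (the `|` above is read as column notation). It also assumes that both WT and MT need at least one replicate at t1 and at t2.

```python
import csv
import io
import re

# Assumed column pattern, inferred from the example header in this README.
# This helper is hypothetical and is not part of GeCCo itself.
SAMPLE_RE = re.compile(r"^(WT|MT)_t(\d+)_rep(\d+)$")


def check_tpm_header(header):
    """Return a list of problems found in a tpm.csv header row."""
    problems = []
    if not header or header[0] != "Gene":
        problems.append("first column must be 'Gene'")
    groups = set()
    for name in header[1:]:
        match = SAMPLE_RE.match(name)
        if match is None:
            problems.append(f"unexpected column name: {name!r}")
        else:
            # Remember which (strain, time point) pairs have a replicate.
            groups.add((match.group(1), match.group(2)))
    # Assumption: WT and MT each need at least one replicate at t1 and t2.
    for strain in ("WT", "MT"):
        for t in ("1", "2"):
            if (strain, t) not in groups:
                problems.append(f"no replicates found for {strain}_t{t}")
    return problems


# Header row copied from the README example, written as comma-separated text.
example = (
    "Gene,WT_t1_rep1,WT_t1_rep2,WT_t1_rep3,WT_t2_rep1,WT_t2_rep2,"
    "MT_t1_rep1,MT_t1_rep2,MT_t2_rep1,MT_t2_rep2\n"
)
header_row = next(csv.reader(io.StringIO(example)))
print(check_tpm_header(header_row))  # prints []
```

To check a real file, read its first row with `csv.reader(open(path, newline=""))` and pass it to the same function.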

    For example, a problem directory named "p1" should have the following structure:

    \---p1
        |   README.md
        \---input
                coexpression_tpm.csv
                gene_features.csv
                tpm.csv
    

    ⚠️ The transcripts per million (TPM) data provided in any of the input files must not be log-transformed. ⚠️
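A rough way to spot accidentally log-transformed input is sketched below. This is a heuristic, not something GeCCo does: the helper `looks_log_transformed` and its threshold are assumptions. It relies on two facts: raw TPM values are never negative, and a single TPM value cannot exceed one million, so a log2-transformed value cannot exceed about 19.93. A raw file covering only a few lowly expressed genes could trigger a false alarm.

```python
# Hypothetical threshold: just above log2(1,000,000), which is about 19.93.
LOG_SCALE_MAX = 20.0


def looks_log_transformed(values):
    """Heuristic: True if the values look log-transformed rather than raw TPM."""
    values = list(values)
    if not values:
        return False
    if any(v < 0 for v in values):
        # Raw TPM cannot be negative; logs of values below 1 are.
        return True
    # Raw TPM for a whole sample normally contains values far above 20.
    return max(values) <= LOG_SCALE_MAX


print(looks_log_transformed([0.0, 3.2, 11.7]))      # prints True
print(looks_log_transformed([0.0, 12.5, 4310.8]))   # prints False
```

Pass the numeric values of one sample column (or of the whole file) to the function and inspect any column it flags by hand.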

    The file header_map.csv keeps track of the original data headers, but it is currently not used by GeCCo.

  2. Open the command line (on Windows, the Anaconda Prompt is recommended) and execute:

    gecco <problem_directory>
    

    where <problem_directory> is the path to the problem you would like to run. For example, to run problem p1, execute:

    gecco gecco/problems/p1
    

    Your output figures and Cytoscape input files will then be saved in your problem directory.
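If you have several problem directories, the same command can be driven from a short Python script. This is a sketch under stated assumptions: `gecco_command` and `run_problems` are hypothetical helpers, not part of GeCCo, and they use only the documented `gecco <problem_directory>` form with no extra options.

```python
import shutil
import subprocess
from pathlib import Path


def gecco_command(problem_dir):
    """Build the documented `gecco <problem_directory>` invocation."""
    return ["gecco", str(Path(problem_dir))]


def run_problems(problem_dirs):
    """Run gecco once per problem directory, stopping at the first failure."""
    if shutil.which("gecco") is None:
        raise RuntimeError("gecco not found on PATH; is the package installed?")
    for problem_dir in problem_dirs:
        # check=True raises CalledProcessError if gecco exits with an error.
        subprocess.run(gecco_command(problem_dir), check=True)
```

For example, `run_problems(["gecco/problems/p1"])` runs the p1 problem shown above; add further directories to the list as needed.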

    To explore additional options, execute:

    gecco --help
    
