This project is a bioinformatics tool designed to automate the analysis of Succinate Dehydrogenase in nematodes. The tool leverages data from WormBase and the InterPro database to perform HMMer searches on the University of Leicester's HPC (ALICE). It streamlines the process of downloading relevant data, running computational searches, and generating informative visual outputs.
- Downloads and unzips FASTA files for three chosen nematode species from WormBase.
- Extracts PFAM identifiers related to Succinate Dehydrogenase from a TSV file.
- Downloads and unzips the corresponding HMM profiles.
- Generates a shell script to run HMMer searches on the ALICE HPC system.
- Parses HMMer outputs to generate:
  - A detailed summary table of hits.
  - A heatmap of hit scores.
  - A bar chart of the top 10 hits.
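
Required Python libraries:
- pandas
- subprocess (part of the Python standard library)
- urllib (part of the Python standard library)
- Biopython
- requests
- BeautifulSoup
- matplotlib
- seaborn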
- Internet connection: required for downloading data.
- Access to the ALICE HPC: required to run the generated HMMer search script.
- SearchResults-succinatedehydrogenase.tsv: This file should be in the working directory. A different file name or path can be given when the program prompts for it.
- The program will list all available species from WormBase.
- You will be prompted to select three species by their index numbers, separated by spaces.
- The selected FASTA files will be downloaded and unzipped automatically.
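
The species prompt can be illustrated with a small validation helper. This is only a sketch: the function name `parse_species_selection` is hypothetical, and it assumes zero-based index numbers and that the three species must be distinct, neither of which is confirmed by the program itself.

```python
def parse_species_selection(raw, n_available, n_required=3):
    """Turn input such as '0 4 7' into a list of valid, distinct indices.

    Assumptions (not taken from the program): indices are zero-based
    and the same species may not be chosen twice.
    """
    tokens = raw.split()  # split on any run of whitespace
    if len(tokens) != n_required:
        raise ValueError(f"expected {n_required} index numbers, got {len(tokens)}")
    try:
        indices = [int(token) for token in tokens]
    except ValueError:
        raise ValueError("index numbers must be whole numbers") from None
    if len(set(indices)) != n_required:
        raise ValueError("each species must be chosen only once")
    for index in indices:
        if not 0 <= index < n_available:
            raise ValueError(f"index {index} is outside 0..{n_available - 1}")
    return indices


if __name__ == "__main__":
    # With 10 listed species, '0 4 7' selects the 1st, 5th and 8th entries.
    print(parse_species_selection("0 4 7", n_available=10))
```

Rejecting bad input before any download starts avoids fetching the wrong FASTA files.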
- Ensure that the SearchResults-succinatedehydrogenase.tsv file is in your working directory.
- You will be prompted to type 'y' to confirm the file's presence, or type 'change' to specify a different file name or path.
- The program will extract PFAM identifiers (starting with 'PF') from the TSV file and download the corresponding HMM profiles.
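
The identifier extraction can be sketched as follows. The column layout of the TSV file is not described here, so this example scans every cell instead of assuming a column name. It also uses a slightly stricter rule than "starts with 'PF'" (the letters PF followed only by digits); that rule, the function name `extract_pfam_ids`, and the sample rows are assumptions made for illustration.

```python
import csv
import io
import re

# Assumed pattern: 'PF' followed only by digits, matched case-insensitively.
PFAM_ID = re.compile(r"^PF\d+$", re.IGNORECASE)


def extract_pfam_ids(tsv_handle):
    """Return unique PFAM identifiers in order of first appearance."""
    found = []
    for row in csv.reader(tsv_handle, delimiter="\t"):
        for cell in row:
            value = cell.strip().upper()
            if PFAM_ID.match(value) and value not in found:
                found.append(value)
    return found


if __name__ == "__main__":
    # Illustrative rows only; the real file's columns may differ.
    sample = (
        "Accession\tName\n"
        "PF00890\texample entry A\n"
        "IPR000000\texample entry B\n"
        "pf02910\texample entry C\n"
        "PF00890\tduplicate of entry A\n"
    )
    print(extract_pfam_ids(io.StringIO(sample)))
```

Removing duplicates at this stage means each HMM profile is downloaded only once.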
- You will be prompted to enter your email address.
- The program will generate a shell submission script (HMMsearch.sh) for the SLURM scheduler on ALICE. This script contains the HMMer commands to run the searches.
- Run the generated script on ALICE to obtain the HMMer output files.
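
As a rough illustration of what generating the submission script might involve, the sketch below builds one `hmmsearch` command per profile and FASTA file. The resource limits, the `module load hmmer` line, the output file naming, and the function name `build_slurm_script` are all assumptions; the script actually produced by the program may look different. `--tblout` is the HMMER option that writes a per-target tabular summary.

```python
import shlex
from pathlib import Path


def build_slurm_script(email, hmm_files, fasta_files, job_name="hmmsearch"):
    """Return the text of a SLURM batch script (illustrative values only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --time=01:00:00",        # assumed wall-time limit
        "#SBATCH --mem=4G",               # assumed memory request
        "#SBATCH --mail-type=END,FAIL",   # email when the job ends or fails
        f"#SBATCH --mail-user={email}",
        "",
        "module load hmmer",              # assumed module name on the cluster
        "",
    ]
    # One search per (profile, proteome) pair, each with its own table file.
    for hmm in hmm_files:
        for fasta in fasta_files:
            out = f"{Path(hmm).stem}_vs_{Path(fasta).stem}.tbl"
            lines.append(
                "hmmsearch --tblout "
                f"{shlex.quote(out)} {shlex.quote(hmm)} {shlex.quote(fasta)}"
            )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_slurm_script(
        "user@example.org", ["PF00890.hmm"], ["species_a.fa", "species_b.fa"]
    )
    Path("HMMsearch.sh").write_text(script)
```

On a SLURM system a script like this is typically submitted with `sbatch HMMsearch.sh`.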
- After generating the output files, you will be prompted to type 'y' to confirm their presence in the current directory.
- The program will parse the results and produce:
  - A detailed table of all hits (hmmer_output_summary.csv).
  - A heatmap of the scores (hmmer_output_heatmap.png).
  - A bar chart of the top 10 hits (hmmer_top_hits_bar_chart.png).
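
The parsing step can be sketched with the standard library alone. This example assumes the searches were run with `hmmsearch --tblout`, whose table has 18 whitespace-separated columns followed by a free-text description; the program may read a different HMMer output format. The function names and the sample rows are hypothetical.

```python
import csv
import io

FIXED_COLUMNS = 18  # columns before the free-text description in --tblout
SUMMARY_FIELDS = ["target", "query", "query_accession", "evalue", "score"]


def parse_tblout(handle):
    """Collect full-sequence E-values and scores from a --tblout table."""
    hits = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        parts = line.split(None, FIXED_COLUMNS)
        if len(parts) < FIXED_COLUMNS:
            continue  # not a complete data row
        hits.append({
            "target": parts[0],
            "query": parts[2],
            "query_accession": parts[3],
            "evalue": float(parts[4]),
            "score": float(parts[5]),
        })
    return hits


def top_hits(hits, n=10):
    """Return the n highest-scoring hits, best first."""
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)[:n]


def write_summary(hits, handle):
    """Write the hits as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=SUMMARY_FIELDS)
    writer.writeheader()
    writer.writerows(hits)


if __name__ == "__main__":
    # Made-up rows in --tblout layout, for illustration only.
    sample = io.StringIO(
        "# target name accession query name accession ...\n"
        "seqA - modelX PF00890.1 1e-50 170.5 0.1 2e-50 169.0 0.1 1.0 1 0 0 1 1 1 1 example protein\n"
        "seqB - modelX PF00890.1 3e-10 40.2 0.0 4e-10 39.8 0.0 1.1 1 0 0 1 1 1 1 another example\n"
    )
    for hit in top_hits(parse_tblout(sample)):
        print(hit["target"], hit["score"])
```

The list returned by `top_hits` holds the values a bar chart of the best hits would need, and the same hit records could be pivoted by query and target to form the heatmap.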
- hmmer_output_summary.csv: A detailed table of HMMer hits.
- hmmer_output_heatmap.png: A heatmap visualizing the scores.
- hmmer_top_hits_bar_chart.png: A bar chart showing the top 10 hits.
- Responses typed at the program's prompts are not case sensitive.
- Ensure that the program and the HPC script are run in the same directory.
- All generated files will be saved in the current directory.
- Ensure all required libraries are installed in your Python environment.
- The program requires a stable Internet connection.
Contributions are welcome! Please fork the repository and submit a pull request.
For any questions or issues, please contact Bismah Ghafoor at [email protected].