Pathway Figure OCR

The goal of this project is to extract identifiable genes, proteins and metabolites from published pathway figures. In addition to all the code for assembling and running the Pathway Figure OCR pipeline, this repo contains scripts specific to the QC, analysis and figure generation involved in our publications of the work. Here we document a few of the key files and folders relevant to each paper:

25 Years of Pathway Figures (BioRxiv 2020)
- Interactive search tool for 65k pathway figures and their gene content: shiny app and code
- NIH Figshare of identified pathway figures and OCR results as RDS datasets: collection
- UpSet plot of top text and figure genes: script
- Pie chart data for top disease terms for text and figure genes: script
- Overlap matrix for Hippo Signaling pathway figure genes: script
- Machine learning progression plots: script
- Local database name: pfocr20200131

Identifying Genes in Published Pathway Figure Images (BioRxiv 2018)
- Performance assessment figures: folder
- Local database name: pfocr2018121717

This work is supported by NIGMS grant R01GM100039.

Developers

The codebook is a good place to start to see how we assemble and run the PFOCR pipeline. Be forewarned, however: this project is still in development and is not ready for production or even dev releases, so don't expect things to work :) Contact us via Issues if you're interested in contributing to the development. All our code is open source.
