Welcome to my code repository! Here you'll find a collection of projects and learning materials I've developed:
- HMM profiler. It has two parts: the profiler, which estimates the state-transition and emission matrices, and the decoder, which infers the most probable sequence of hidden states.
- Nussinov algorithm. Given an RNA sequence, it predicts the molecule's secondary structure (how it folds into base pairs) using dynamic programming alone.
- FASTA extractor. A simple script to extract, organize and subsequently search information from a FASTA file.
- Clone trigger. Given a reference DNA sequence and the DNA fragment to be cloned, it suggests the top 20 forward and reverse primers that can be used, along with suitable restriction enzymes.
- Prot-Profiler. Given a set of protein sequences, it produces an HMM profile.
- Resume. My professional portfolio, detailing my experience, availability and contributions.
- BERT text classification for suicide-risk detection. Using the BERT architecture, TensorFlow and a Kaggle dataset, the model identifies suicide-related messages.
- DCGAN digit-image generator. Implementation of a Deep Convolutional Generative Adversarial Network (DCGAN) in PyTorch that generates realistic handwritten digit images, trained on the MNIST dataset.
- Text generator (pharmaceutical names). A neural network implemented with TensorFlow that takes a variable-length text input and completes it character by character to create a pharmaceutical product name.
- ML algorithms from scratch. As a learning exercise, I implement several machine learning algorithms (LR, NN, PCA, ...) without using dedicated packages for them.
- postGWAS comparison. Cleaning and standardization of multiple summary statistics, plus a comparison of genetic correlation and pathway analyses (in progress).
- GWAS-Catalog download. Download, cleaning and standardization of multiple GWAS Catalog summary statistics based on a search pattern and an LD reference panel.
- EM algorithm for allele frequency estimation. Estimates the allele frequencies in a haploid organism's genome using an EM algorithm based on genotype likelihoods.
- Ancestry Estimation using Genotype Likelihoods. Estimates the ancestral populations of African American individuals using a likelihood model based on genotype likelihoods and ancestral allele frequencies estimated with NGSadmix.
- Mendelian Randomization. Automated causal analyses of pairwise phenotype comparisons from summary statistics, using the IVW, MR-Egger and weighted median methods. Includes quality-control plots.
- DEA (Differential Expression Analysis). Automated search, cleaning, quality control and analysis of BioStudies reports on a given topic (in progress).
- LD-proxy algorithm. Given a PLINK reference panel and a list of reference alleles, it identifies potential proxies for those alleles in each integrated summary statistics file.
- Protein Variant Analysis. Analyses and figures that can be produced from a set of translated DNA coding sequences.
- Identify seasonality in anxiolytic sales using a hierarchical model. This project illustrates differences in US sales of these pharmaceuticals, comparing the start of winter with June.
- Predict doctor visits with Poisson regression. The posterior of a Poisson regression is used to predict a patient's number of doctor visits from their age and health status. The model was built with rjags.
- Fit AR, NDLM and mixAR models to Google searches. Analyzes the pattern of Google search hits for the term "cough" and identifies the best model to predict the next year's values.
- PokeGuess. A Pokemon game where you have to guess the Pokemon from its silhouette and data.
- Texas Cheater. A Texas hold'em poker simulator that predicts your probability of winning a game based on your current game situation.
- Pairwise Alignment. Assesses the alignment of two sequences through dynamic programming (Smith-Waterman algorithm).
- Pairwise genetic correlations. Uses AWK to compute genetic correlations with the LDSC program and builds a correlation matrix from the results.
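The decoder in the HMM profiler answers the question "which hidden-state path best explains these observations?". A common way to do this is the Viterbi algorithm; the sketch below is a generic log-space version, not the repository's code. The function name `viterbi` and the input layout (observations as integer indices, `trans_p[i][j]` = P(next state j | state i), `emit_p[s][o]` = P(observation o | state s)) are assumptions for illustration.

```python
import math


def _log(x):
    # Log that maps probability 0 to -inf instead of raising ValueError
    return math.log(x) if x > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Hypothetical sketch: most probable hidden-state path for `obs`.

    Works in log space to avoid underflow on long sequences.
    """
    n_states = len(start_p)
    # score[s] = best log-probability of any path ending in state s so far
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n_states)]
    backptrs = []  # backptrs[t-1][j] = best previous state for state j at time t
    for o in obs[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + _log(trans_p[i][j]))
            new_score.append(score[best_i] + _log(trans_p[best_i][j]) + _log(emit_p[j][o]))
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Two sticky states, each mostly emitting "its own" symbol
print(viterbi([0, 0, 1, 1], [0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]], [[0.9, 0.1], [0.1, 0.9]]))
```

With these toy matrices the decoded path switches from state 0 to state 1 exactly when the observations switch.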
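The Nussinov project predicts RNA folding by maximizing the number of nested base pairs with dynamic programming. Below is a minimal sketch of the textbook recurrence with a traceback to dot-bracket notation; it is not the repository's implementation. The allowed pairs (AU, CG and GU wobble) and the minimum hairpin loop of 3 unpaired bases (`min_loop`) are assumptions chosen for illustration.

```python
def nussinov(seq, min_loop=3):
    """Hypothetical sketch: return (max_pairs, dot_bracket) for an RNA sequence."""
    pairs = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max pairs in seq[i..j]
    # Fill by increasing span so every sub-interval is already solved
    for length in range(min_loop + 1, n):
        for i in range(n - length):
            j = i + length
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # bifurcation into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: rebuild one optimal structure
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in pairs and j - i > min_loop and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)


print(nussinov("GGGAAAUCC"))
```

For `GGGAAAUCC` the only 3-pair structure compatible with a 3-base hairpin is a single stem, `(((...)))`.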
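For the haploid EM allele-frequency project, the core update at a single biallelic site can be written in a few lines: the E-step computes each individual's posterior probability of carrying allele A given the current frequency, and the M-step sets the new frequency to the mean of those posteriors. This is a minimal sketch, not the repository's code; the function name `em_allele_freq` and the input format (one `(P(reads | A), P(reads | a))` pair per individual) are assumptions.

```python
def em_allele_freq(likelihoods, p0=0.5, tol=1e-10, max_iter=1000):
    """Hypothetical sketch: EM estimate of the frequency of allele A (haploid site).

    likelihoods: list of (L_A, L_a) genotype likelihoods, one pair per individual.
    Assumes every pair has at least one non-zero entry.
    """
    p = p0
    for _ in range(max_iter):
        # E-step: posterior probability that each individual carries A
        post = [p * la / (p * la + (1 - p) * lb) for la, lb in likelihoods]
        # M-step: new frequency is the average posterior
        p_new = sum(post) / len(post)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


# Three individuals certainly carry A, one certainly carries a
print(em_allele_freq([(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]))
```

With certain genotypes EM simply recovers the allele count ratio; its value shows up when the likelihoods are uncertain (for example, low-coverage sequencing).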
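Of the three Mendelian Randomization methods listed, the fixed-effect inverse-variance weighted (IVW) estimate is the simplest: it is a weighted regression of SNP-outcome effects on SNP-exposure effects through the origin. The sketch below shows only this first-order IVW formula, not the repository's pipeline; MR-Egger and the weighted median are not covered, and the function and argument names are hypothetical.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Hypothetical sketch: fixed-effect IVW causal estimate and its standard error.

    beta_x: SNP-exposure effects; beta_y: SNP-outcome effects;
    se_y: standard errors of the SNP-outcome effects.
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1 / den)


# Outcome effects are exactly half the exposure effects
print(ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.1, 0.15], [0.01, 0.02, 0.01]))
```

When every variant implies the same ratio (here 0.5), the IVW estimate recovers that ratio; differing per-variant ratios are where MR-Egger and the weighted median become useful as sensitivity checks.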
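The Pairwise Alignment project relies on local alignment by dynamic programming. As a rough illustration, here is a minimal Smith-Waterman sketch with a linear gap penalty; it is not the repository's implementation. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the function name are assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Hypothetical sketch: best local alignment score and aligned substrings."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j] = best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Floor at 0: a local alignment may start anywhere
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

Unlike a global alignment, the flanking mismatches are simply left out, so the shared core `ACGT` is reported on its own.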