Machine- and Deep Learning resources

License: MIT PR's Welcome

Machine learning, deep learning, and data analysis resources. Please contribute and get in touch! See MDmisc notes for other programming and genomics-related notes.

Table of contents

Cheatsheets

Awesome Deep Learning

Keras, Tensorflow

PyTorch

JAX

JAX combines automatic differentiation with XLA (Accelerated Linear Algebra), a compiler developed by Google that targets CPUs, GPUs, and TPUs. JAX exposes a NumPy-like API as its high-level abstraction, and the same code runs on CPU, GPU, and TPU (typically much faster on accelerators).

  • awesome-jax - JAX - A curated list of resources

  • JAX - Jupyter (Colab) notebooks introducing basic JAX concepts (jit, vmap, pmap, grad, and others) as well as advanced ones, by @yvrjsharma
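
A minimal sketch of the transformations named above (`grad`, `jit`, `vmap`), assuming JAX is installed. The linear model, the function names `loss` and `predict_one`, and the toy data are illustrative assumptions and are not taken from any of the linked resources.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Mean squared error of a linear model (hypothetical example)."""
    return jnp.mean((x @ w - y) ** 2)


def predict_one(w, x_row):
    """Prediction for a single example (hypothetical example)."""
    return jnp.dot(x_row, w)


# grad differentiates with respect to the first argument (w) by default.
grad_loss = jax.grad(loss)

# jit compiles the function with XLA for whichever backend is available.
fast_grad_loss = jax.jit(grad_loss)

# vmap maps predict_one over the leading axis of x while sharing w.
predict_batch = jax.vmap(predict_one, in_axes=(None, 0))

# Toy data, made up for illustration.
x = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
w = jnp.array([0.5, -1.0])
y = jnp.zeros(3)

g = fast_grad_loss(w, x, y)   # gradient of the loss, same shape as w
preds = predict_batch(w, x)   # one prediction per row of x
```

The same script should run unchanged on CPU, GPU, or TPU; JAX picks the available backend, and `jax.devices()` lists what it found.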

Graph Neural Networks

Transformers

DL Books

DL Courses & Tutorials

DL Videos

DL Papers

DL Papers Genomics

  • genomicsnotebook - Genomics Data Analysis with Jupyter Notebooks on Azure.

  • Machine Learning for Genomics - ML4GLand is a community that develops and maintains tools (primarily in Python) for genomic sequence-based machine learning.

  • SEQUOIA - a linearized transformer model for gene expression prediction from pathology slides. Uses UNI (a foundation model for slides), compared with ResNet50. Compared with tRNAsformer and HE2RNA. Trained on 7584 tumor samples across 16 cancer types (TCGA), validated on independent cohorts (CPTAC, Tempus). BRCA shows the best performance; the model predicts the risk of breast cancer recurrence. About 15K out of 20K genes can be predicted; well-known signature genes are predicted best. Detected 272 genes significantly associated with recurrence. Predicts spatially-specific gene expression. Python, GitHub.

    Paper Pizurica, Marija. “Digital Profiling of Gene Expression from Histology Images with Linearized Attention.” Nature Communications, 2024.

DL Tools

  • Interactive_Tools - Interactive Tools for Machine Learning, Deep Learning and Math. Play with deep neural networks in the browser

  • ivy - The Unified Machine Learning Framework supporting JAX, TensorFlow, PyTorch, MXNet, and NumPy. Python module. Documentation

  • keras - Deep Learning for humans http://keras.io/

  • MXNet-Gluon-Style-Transfer - neural artistic style transfer using MXNet. PyTorch and Torch implementations available

  • openai.com - GPT-3 Access Without the Wait (API access to GPT-3)

  • OpenCV - Open Source Computer Vision library. GitHub, opencv-python - CPU-only OpenCV packages for Python. Documentation. Video - 3h OpenCV crash course

  • pathology_learning - Using traditional machine learning and deep learning methods to predict stuff from TCGA pathology slides

  • ruta - Unsupervised Deep Architectures in R, autoencoders. Requires Keras and TensorFlow. Book

  • tensor2tensor - Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research

  • Janggu - deep learning interface to genomic data (FASTA, BAM, BigWig, BED, GFF). NumPy-like Bioseq and Cover objects accessible by Keras. Includes model evaluation and interpretation features. PyPI, Docs, Janggu - Deep learning for genomics

  • maui - Multi-omics Autoencoder Integration. Latent factors from different data types (stacked variational autoencoders), and their clustering, testing for association with survival. Tested vs. latent factors extracted using Multifactor Analysis (MFA) and iCluster+, on TCGA colorectal cancer RNA-seq, SNPs, CNVs. Evaluation of Colorectal Cancer Subtypes and Cell Lines Using Deep Learning

  • Ludwig - a toolbox built on top of TensorFlow for training and testing deep learning models without writing code. GitHub

  • Mask_RCNN - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

  • PennAI - AI-Driven Data Science, entry-level machine learning interface for non-experts. A System for Accessible Artificial Intelligence

Auto ML

DL models

DL projects

Language models

ChatGPT, Gemini, NotebookLM, Claude, OpenRouter, Groq, Storm

  • awesome-chatgpt - Curated list of awesome tools, demos, docs for ChatGPT and GPT-3

  • awesome-llm-courses - A curated list of awesome online courses about Large Language Models (LLMs)

  • chatbox - User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)

  • chatgpt-clone - Build Yo'own ChatGPT with OpenAI API & Gradio. A Python app providing a web browser interface to ChatGPT.

  • h2ogpt - open-source GPT with document and image Q&A, 100% private chat, no data leaks, Apache 2.0 https://arxiv.org/pdf/2306.08161.pdf Live Demo: https://gpt.h2o.ai/

  • Hands-On-Large-Language-Models - Official code repo for the O'Reilly Book - "Hands-On Large Language Models" by Jay Alammar and Maarten Grootendorst. PyTorch, Jupyter notebooks, can be opened in Google Colab.

  • llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

  • LLMs-from-scratch - Implementing a ChatGPT-like LLM from scratch, step by step, the Python code for coding, pretraining, and finetuning a GPT-like LLM. By Sebastian Raschka. For the book Build a Large Language Model (From Scratch).

  • LLMsPracticalGuide - A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)

  • lobe-chat - an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / Azure / DeepSeek), Knowledge Base (file upload / knowledge management / RAG), Multi-Modals (Vision / TTS / Plugins / Artifacts). One-click FREE deployment of your private ChatGPT/Claude application.

  • mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. Documentation

  • nanoGPT - The simplest, fastest repository for training/finetuning medium-sized GPTs.

  • ollama - Get up and running with Llama 2 and other large language models locally.

  • openai-cookbook - Examples and guides for using the OpenAI API. Rendered version

  • privateGPT - Interact privately with your documents using the power of GPT, 100% privately, no data leaks.

  • Storm - An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. GitHub

Music, voice, audio

  • buzz - Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

  • ebook2audiobook - Convert ebooks to audiobooks with chapters and metadata using dynamic AI models and voice cloning. Supports 1,107+ languages!

  • Jukebox - music generation neural network. Hierarchical Vector Quantised-Variational AutoEncoder (VQ-VAE) architecture, three separate temporal resolutions. Able to generate singing from lyrics, extend music examples. Dhariwal et al., “Jukebox: A Generative Model for Music.”, Blog post with examples of generated music

  • june - Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit

  • Magenta - Music and Art Generation with Machine Intelligence

  • OpenVoice - voice cloning tool, transfers voice tones to pronounce different words, even in a different language.

  • Real-Time-Voice-Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time. Learn voice characteristics from a short audio clip and perform text-to-speech conversion using this voice.

  • Project DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture. Transcribe audio data, English model available. Documentation

  • SpeechBrain - A PyTorch-based Speech Toolkit for speech/speaker recognition, speech enhancement, processing, and more. GitHub repo

  • vampnet - music generation with masked transformers. arXiv paper, supplementary page

Image, vision

DL Misc

  • geospy.ai - location identification from photos

  • app.wombo.art - deep generative model dreaming awesome images from text, Android and iOS apps available. Tweet describing the VQGAN+CLIP technology behind it

  • CSrankings - A web app for ranking computer science departments according to their research output in selective venues, and for finding active faculty across a wide range of areas. Website

  • ColossalAI - A Unified Deep Learning System for Big Model Era. Scaling deep learning models using data, pipeline, tensor, and sequence parallelism. 1D, 2D, 2.5D, 3D distributed operators. Examples of each. Written in PyTorch, needs a configuration file defining parallelism. Benchmarked against DeepSpeed, Megatron-LM.

    Paper Li, Shenggui, Jiarui Fang, Zhengda Bian, Hongxin Liu, Yuliang Liu, Haichen Huang, Boxiang Wang, and Yang You. “Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training,” n.d.

Awesome Machine Learning

ML Books

ML Courses & Tutorials

ML Videos

ML Papers

  • Whalen, Sean, Jacob Schreiber, William S. Noble, and Katherine S. Pollard. “Navigating the Pitfalls of Applying Machine Learning in Genomics.” Nature Reviews Genetics 23, no. 3 (March 2022): 169–81. https://doi.org/10.1038/s41576-021-00434-9. - Five machine learning problems in genomics: distributional differences, dependency structure, confounding variables, information leakage, and unbalanced data. Description, examples, solutions.

  • Domingos, Pedro. “A Few Useful Things to Know about Machine Learning.” Communications of the ACM 55, no. 10 (October 1, 2012): 78. https://doi.org/10.1145/2347736.2347755. Twelve lessons for machine learning. Overview of machine learning problems and algorithms, problem of overfitting, causes and solutions, curse of dimensionality, issues with high-dimensional data, feature engineering, bagging, boosting, stacking, model sparsity. Video lectures

ML Tools

  • mlr3 - Machine learning in R. An R package providing a unified interface to classification, regression, survival analysis, and other machine learning tasks. GitHub repo, mlr3gallery - Examples of problems and code solutions, mlr3 Manual - mlr3 bookdown. More on the mlr3 package site, including videos

ML Misc

Material in Chinese

  • Autopilot-Notes - Autonomous driving notes summarizing the basics, hardware, perception, position, planning, control, product, tools, and manufacturing plan topics.

Material in Russian

  • Scientific_graphics_in_python - matplotlib for scientific graphics. 3 parts, 13 chapters. By Pavel Shabanov

  • ml-course-hse - machine learning course at the Computer Sciences Department, Higher School of Economics. Multiple years, videos

  • mlcourse_open - OpenDataScience Machine Learning course (Both in English and Russian). Python-based ML course, with video lectures. Video

  • DL_CSHSE_spring2018 - Deep learning, Anton Osokin, Higher School of Economics, Computer Sciences Department (Russian), course material, and video lectures

  • Ordinary Differential Equations - an interactive textbook (in Russian) by Ilya Schurov (NRU HSE)

  • Calculus - lecture notes (in Russian) by Ilya Schurov (NRU HSE). Tweet

  • mathprofi.ru - Higher mathematics made simple and accessible (in Russian). Mirror