    Repositories list

    • datatrove

      Public
      Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
      Python
      Apache License 2.0
      144 forks, 0 stars, 0 issues, 0 pull requests
      Updated Nov 9, 2024
    • Container-based software environments used in the TrustLLM EU project.
      Shell
      Apache License 2.0
      2 forks, 1 star, 0 issues, 1 pull request
      Updated Nov 4, 2024
    • streaming

      Public
      A Data Streaming Library for Efficient Neural Network Training
      Python
      Apache License 2.0
      142 forks, 0 stars, 0 issues, 0 pull requests
      Updated Nov 4, 2024
    • Python
      Apache License 2.0
      174 forks, 0 stars, 0 issues, 0 pull requests
      Updated Oct 12, 2024
    • composer

      Public
      Supercharge Your Model Training
      Python
      Apache License 2.0
      419 forks, 0 stars, 0 issues, 0 pull requests
      Updated Oct 12, 2024
    • LLM training code for Databricks foundation models
      Python
      Apache License 2.0
      528 forks, 0 stars, 0 issues, 0 pull requests
      Updated Oct 12, 2024
    • Ongoing research training transformer models at scale
      Python
      Other
      2.4k forks, 0 stars, 0 issues, 0 pull requests
      Updated Aug 22, 2024
    • NeMo

      Public
      A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
      Python
      Apache License 2.0
      2.5k forks, 0 stars, 0 issues, 0 pull requests
      Updated Aug 21, 2024