Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …

Python 4,084 397 Updated Feb 16, 2025

Robust recipes to align language models with human and AI preferences

Python 4,995 429 Updated Nov 21, 2024

Unified automatic quality assessment for speech, music, and sound.

Python 316 15 Updated Feb 13, 2025

Fully open reproduction of DeepSeek-R1

Python 20,278 1,745 Updated Feb 17, 2025
Python 345 53 Updated Sep 3, 2024

LLaSE: Maximizing Acoustic Preservation for LLaMA based Speech Enhancement

Python 8 1 Updated Jan 28, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 327 25 Updated Feb 14, 2025

Realtime Video and Audio Streaming with WebRTC and Gradio

Python 205 27 Updated Feb 15, 2025

UTokyo-SaruLab MOS Prediction System

Python 145 14 Updated Dec 9, 2024

Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Python 48 4 Updated Jan 16, 2025

Recipes to scale inference-time compute of open models

Python 1,000 97 Updated Jan 16, 2025

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 165 18 Updated Feb 13, 2025

This repository contains demos I made with the Transformers library by HuggingFace.

Jupyter Notebook 9,978 1,517 Updated Jan 13, 2025

Reference-aware automatic speech evaluation toolkit

Python 142 11 Updated Dec 5, 2024

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…

Jupyter Notebook 16,218 2,332 Updated Feb 12, 2025

The official Meta Llama 3 GitHub site

Python 28,316 3,279 Updated Jan 26, 2025

Official repo for Images that sound: a special spectrogram that can be seen as images and played as sound generated by diffusions

Python 232 13 Updated Feb 4, 2025

[ACM MM24] Official implementation of paper "From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency Learning"

Python 25 1 Updated Jan 17, 2025

PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.

Python 226 15 Updated Oct 2, 2024
Python 119 8 Updated Jan 20, 2025

NeMo text processing for ASR and TTS

Python 305 99 Updated Jan 27, 2025

Text Normalization & Inverse Text Normalization

Python 530 75 Updated Nov 11, 2024

GLM-4-Voice | End-to-end Chinese-English speech dialogue model

Python 2,658 216 Updated Dec 5, 2024

A quick guide (especially) for trending instruction finetuning datasets

2,858 183 Updated Nov 28, 2023

first base model for full-duplex conversational audio

Python 1,707 112 Updated Jan 5, 2025

A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR

Python 939 161 Updated Jul 5, 2023