Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ストリーミングモードのdecodeを実装(precompute_renderとrender) #854

Merged
merged 23 commits into from
Oct 29, 2024
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions crates/voicevox_core/src/blocking.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@

pub use crate::{
engine::open_jtalk::blocking::OpenJtalk, infer::runtimes::onnxruntime::blocking::Onnxruntime,
synthesizer::blocking::Synthesizer, user_dict::dict::blocking::UserDict,
voice_model::blocking::VoiceModelFile,
synthesizer::blocking::Audio, synthesizer::blocking::Synthesizer,
user_dict::dict::blocking::UserDict, voice_model::blocking::VoiceModelFile,
};

pub mod onnxruntime {
Expand Down
34 changes: 34 additions & 0 deletions crates/voicevox_core/src/engine/audio_file.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
use std::io::{Cursor, Write as _};

/// 16bit PCMにヘッダを付加しWAVフォーマットのバイナリを生成する。
pub fn wav_from_s16le(pcm: &[u8], output_sampling_rate: u32, output_stereo: bool) -> Vec<u8> {
Yosshi999 marked this conversation as resolved.
Show resolved Hide resolved
// TODO: 44.1kHzなどの対応
Yosshi999 marked this conversation as resolved.
Show resolved Hide resolved

let num_channels: u16 = if output_stereo { 2 } else { 1 };
let bit_depth: u16 = 16;
let block_size: u16 = bit_depth * num_channels / 8;

let bytes_size = pcm.len() as u32;
let wave_size = bytes_size + 44;

let buf: Vec<u8> = Vec::with_capacity(wave_size as usize);
let mut cur = Cursor::new(buf);

cur.write_all("RIFF".as_bytes()).unwrap();
cur.write_all(&(wave_size - 8).to_le_bytes()).unwrap();
cur.write_all("WAVEfmt ".as_bytes()).unwrap();
cur.write_all(&16_u32.to_le_bytes()).unwrap(); // fmt header length
cur.write_all(&1_u16.to_le_bytes()).unwrap(); //linear PCM
Hiroshiba marked this conversation as resolved.
Show resolved Hide resolved
cur.write_all(&num_channels.to_le_bytes()).unwrap();
cur.write_all(&output_sampling_rate.to_le_bytes()).unwrap();

let block_rate = output_sampling_rate * block_size as u32;

cur.write_all(&block_rate.to_le_bytes()).unwrap();
cur.write_all(&block_size.to_le_bytes()).unwrap();
cur.write_all(&bit_depth.to_le_bytes()).unwrap();
cur.write_all("data".as_bytes()).unwrap();
cur.write_all(&bytes_size.to_le_bytes()).unwrap();
cur.write_all(pcm).unwrap();
cur.into_inner()
}
2 changes: 2 additions & 0 deletions crates/voicevox_core/src/engine/mod.rs
Original file line number Diff line number Diff line change
@@ -1,11 +1,13 @@
mod acoustic_feature_extractor;
mod audio_file;
mod full_context_label;
mod kana_parser;
mod model;
mod mora_list;
pub(crate) mod open_jtalk;

pub(crate) use self::acoustic_feature_extractor::OjtPhoneme;
pub use self::audio_file::wav_from_s16le;
pub(crate) use self::full_context_label::{
extract_full_context_label, mora_to_text, FullContextLabelError,
};
Expand Down
2 changes: 1 addition & 1 deletion crates/voicevox_core/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@ use rstest_reuse;

pub use self::{
devices::SupportedDevices,
engine::{AccentPhrase, AudioQuery, FullcontextExtractor, Mora},
engine::{wav_from_s16le, AccentPhrase, AudioQuery, FullcontextExtractor, Mora},
error::{Error, ErrorKind},
metas::{
RawStyleId, RawStyleVersion, SpeakerMeta, StyleId, StyleMeta, StyleType, StyleVersion,
Expand Down
Loading
Loading