voice_preset doesn't work #611

Open
etheryee opened this issue Nov 6, 2024 · 1 comment

etheryee commented Nov 6, 2024

from bark import SAMPLE_RATE, generate_audio, preload_models
import sounddevice
from transformers import BarkModel, BarkProcessor
import torch
import numpy as np
from optimum.bettertransformer import BetterTransformer
from scipy.io.wavfile import write as write_wav
import re

device = "cuda:0" if torch.cuda.is_available() else "cpu"
SPEAKER = "v2/en_speaker_6"

def barkspeed(text_prompt):
    processor = BarkProcessor.from_pretrained("suno/bark-small")
    model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
    model = BetterTransformer.transform(model, keep_original_model=False)
    model.enable_cpu_offload()
    sentences = re.split(r'[.?!]', text_prompt)
    pieces = []
    for sentence in sentences:
        inp = processor(sentence.strip(), voice_preset=SPEAKER).to(device)
        audio = model.generate(**inp, do_sample=True, fine_temperature=0.4, coarse_temperature=0.5)
        audio = ((audio / torch.max(torch.abs(audio))).numpy(force=True).squeeze() * pow(2, 15)).astype(np.int16)
        pieces.append(audio)
    write_wav("bark_generation.wav", SAMPLE_RATE, np.concatenate(pieces))
    sounddevice.play(np.concatenate(pieces), samplerate=24000)
    sounddevice.wait()

Error message:
F:\Program Files\anaconda3\envs\ollamaRAG\Lib\site-packages\transformers\models\encodec\modeling_encodec.py:124: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
self.register_buffer("padding_total", torch.tensor(kernel_size - stride, dtype=torch.int64), persistent=False)
The class optimum.bettertransformers.transformation.BetterTransformer is deprecated and will be removed in a future release.
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.

I used this code to generate some audio, but voice_preset doesn't seem to work: the audio consists of different voices. In the BarkProcessor call the voice preset looks fine, but inside generate the history_prompt is all 10000:
[Screenshot 2024-11-06 231231]
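
For reference, here is a minimal check to see whether the voice preset survives moving the processor output to the device (a diagnostic sketch only; the "history_prompt" / "semantic_prompt" key names follow the BarkProcessor output in recent transformers versions, so adjust if yours differ):

import torch
from transformers import BarkProcessor

device = "cuda:0" if torch.cuda.is_available() else "cpu"

processor = BarkProcessor.from_pretrained("suno/bark-small")
inputs = processor("Hello world.", voice_preset="v2/en_speaker_6")

# Preset tokens exactly as the processor returns them
print(inputs["history_prompt"]["semantic_prompt"])

# The same tensors after .to(device); they should be identical to the ones above
moved = inputs.to(device)
print(moved["history_prompt"]["semantic_prompt"])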

etheryee commented Nov 7, 2024

Environment Info:

  • transformers version: 4.46.1
  • Platform: Windows-11-10.0.22631-SP0
  • Python version: 3.12.7
  • Huggingface_hub version: 0.26.2
  • Safetensors version: 0.4.5
  • Accelerate version: 1.1.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA GeForce RTX 4080 SUPER
