voice_preset doesn't work #611

Open
etheryee opened this issue Nov 6, 2024 · 1 comment

etheryee commented Nov 6, 2024

from bark import SAMPLE_RATE, generate_audio, preload_models
import sounddevice
from transformers import BarkModel, BarkProcessor
import torch
import numpy as np
from optimum.bettertransformer import BetterTransformer
from scipy.io.wavfile import write as write_wav
import re

device = "cuda:0" if torch.cuda.is_available() else "cpu"
SPEAKER = "v2/en_speaker_6"

def barkspeed(text_prompt):
    processor = BarkProcessor.from_pretrained("suno/bark-small")
    model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
    model = BetterTransformer.transform(model, keep_original_model=False)
    model.enable_cpu_offload()
    sentences = re.split(r'[.?!]', text_prompt)
    pieces = []
    for sentence in sentences:
        inp = processor(sentence.strip(), voice_preset=SPEAKER).to(device)
        audio = model.generate(**inp, do_sample=True, fine_temperature=0.4, coarse_temperature=0.5)
        audio = ((audio / torch.max(torch.abs(audio))).numpy(force=True).squeeze() * pow(2, 15)).astype(np.int16)
        pieces.append(audio)
    write_wav("bark_generation.wav", SAMPLE_RATE, np.concatenate(pieces))
    sounddevice.play(np.concatenate(pieces), samplerate=24000)
    sounddevice.wait()

Error message:
F:\Program Files\anaconda3\envs\ollamaRAG\Lib\site-packages\transformers\models\encodec\modeling_encodec.py:124: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
self.register_buffer("padding_total", torch.tensor(kernel_size - stride, dtype=torch.int64), persistent=False)
The class optimum.bettertransformers.transformation.BetterTransformer is deprecated and will be removed in a future release.
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.

I used this code to generate some audio, but voice_preset doesn't seem to work: the audio consists of different voices. In the BarkProcessor call the voice preset looks fine, but inside generate the history_prompt is all 10000:
[Screenshot 2024-11-06 231231]
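
For reference, here is a minimal check to see whether the voice preset survives moving the processor output to the device (a diagnostic sketch only; the "history_prompt" / "semantic_prompt" key names follow the BarkProcessor output in recent transformers versions, so adjust if yours differ):

import torch
from transformers import BarkProcessor

device = "cuda:0" if torch.cuda.is_available() else "cpu"

processor = BarkProcessor.from_pretrained("suno/bark-small")
inputs = processor("Hello world.", voice_preset="v2/en_speaker_6")

# Preset tokens exactly as the processor returns them
print(inputs["history_prompt"]["semantic_prompt"])

# The same tensors after .to(device); they should be identical to the ones above
moved = inputs.to(device)
print(moved["history_prompt"]["semantic_prompt"])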

etheryee commented Nov 7, 2024

Environment Info:

  • transformers version: 4.46.1
  • Platform: Windows-11-10.0.22631-SP0
  • Python version: 3.12.7
  • Huggingface_hub version: 0.26.2
  • Safetensors version: 0.4.5
  • Accelerate version: 1.1.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA GeForce RTX 4080 SUPER
