Describe the bug
When fine-tuning the XTTS model on a Japanese dataset with num_workers > 0, a TypeError related to fugashi.Tagger is raised.
Specifically, the error `self.c_tagger cannot be converted to a Python object for pickling` occurs because fugashi.Tagger, which the cutlet library uses for Japanese text processing, cannot be pickled and therefore cannot be sent to DataLoader worker processes.
To Reproduce
Steps to Reproduce:
1. Load training and evaluation samples using load_tts_samples().
2. Initialize the Trainer object.
3. Create a training DataLoader using trainer.get_train_dataloader().
4. Wrap its dataset in a new DataLoader with num_workers=2 to enable multiprocessing.
5. Iterate through the DataLoader and observe the error.
```python
import pandas as pd
from torch.utils.data import DataLoader
from trainer import Trainer
from TTS.tts.datasets import load_tts_samples

train_samples, eval_samples = load_tts_samples(
    # Your loading code here...
)

# Initialize the trainer
trainer = Trainer(
    # Trainer initialization code here...
)

train_loader = trainer.get_train_dataloader(
    {},
    train_samples,
    True,
)
dataset = train_loader.dataset

# Create a DataLoader with num_workers > 0, which uses multiprocessing and may trigger the pickling issue
loader = DataLoader(
    dataset,
    batch_size=1,
    shuffle=False,
    collate_fn=dataset.collate_fn,
    drop_last=False,
    sampler=None,
    num_workers=2,  # Setting this to 2 uses multiple worker processes (multiprocessing)
    pin_memory=False,
)

try:
    # Creating the iterator starts the worker processes; with the "spawn" start method
    # this pickles the dataset, so it belongs inside the try block as well
    data_iter = iter(loader)
    # Fetch the first batch and show its contents
    first_batch = next(data_iter)
    print(pd.DataFrame(list(first_batch.items()), columns=["Key", "Value"]))
except Exception as e:
    print(f"Error: {e}")
```
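When it is not obvious which attribute of the dataset breaks pickling, a small helper can walk the object's attributes and report the deepest path that fails. `find_unpicklable` below is a hypothetical helper and is not part of TTS or PyTorch. It only inspects `__dict__` entries, list and tuple items, and dict values, so it is a rough diagnostic rather than a complete one. Run on `train_loader.dataset`, it would be expected to point at the attribute that holds the tagger. The `Tokenizer` and `Dataset` classes in the example are made up, with a `threading.Lock` standing in for `fugashi.Tagger`.

```python
import pickle
import threading
from typing import Any, Optional


def find_unpicklable(obj: Any, path: str = "obj", depth: int = 0, max_depth: int = 6) -> Optional[str]:
    """Return the attribute path of the deepest member that fails to pickle, or None."""
    try:
        pickle.dumps(obj)
        return None  # this object and everything under it pickles fine
    except Exception:
        pass
    if depth >= max_depth:
        return path
    if isinstance(obj, dict):
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{k}", v) for k, v in vars(obj).items()]
    else:
        children = []
    for child_path, child in children:
        found = find_unpicklable(child, child_path, depth + 1, max_depth)
        if found is not None:
            return found
    return path  # no inspected child is to blame, so report this object itself


class Tokenizer:  # hypothetical stand-in for the TTS tokenizer
    def __init__(self):
        self.vocab = {"a": 1}
        self.tagger = threading.Lock()  # stand-in for fugashi.Tagger


class Dataset:  # hypothetical stand-in for train_loader.dataset
    def __init__(self):
        self.samples = [{"text": "konnichiwa"}]
        self.tokenizer = Tokenizer()


print(find_unpicklable(Dataset(), "dataset"))  # → dataset.tokenizer.tagger
```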
Expected behavior
The data should be processed without any errors, even with num_workers > 0.
Additional context
This issue only occurs when processing Japanese text, because the tokenization step uses fugashi.Tagger, which cannot be pickled and therefore does not work with multiprocessing data loading.