Libriheavy train_bert_encoder.py incompatible with Lhotse 1.27.0 #1716

Closed
juliendespres opened this issue Aug 13, 2024 · 4 comments · Fixed by #1719

Comments

@juliendespres

Hi,
I'm trying to reproduce the PromptASR recipe, but the script ./zipformer_prompt_asr/train_bert_encoder.py seems incompatible with Lhotse 1.27.0.

I got the following error:
Traceback (most recent call last):
  File "/home/despres/source/git/k2/icefall/egs/libriheavy/ASR/./zipformer_prompt_asr/train_bert_encoder.py", line 1799, in <module>
    main()
  ...
  File "/home/despres/miniconda3/envs/k2_2408/lib/python3.12/site-packages/lhotse/cut/set.py", line 2581, in __iter__
    yield from self.cuts
  File "/home/despres/source/git/k2/icefall/egs/libriheavy/ASR/./zipformer_prompt_asr/train_bert_encoder.py", line 1597, in remove_short_and_long_utt
    tokens = sp.encode(c.supervisions[0].texts[0], out_type=str)
  File "/home/despres/miniconda3/envs/k2_2408/lib/python3.12/site-packages/lhotse/custom.py", line 64, in __getattr__
    raise AttributeError(f"No such attribute: {name}")
AttributeError: No such attribute: texts. Did you mean: 'text'?

I've just done a full install of K2(1.24.4)/Lhotse(1.27.0)/Icefall with the latest version and it doesn't change the problem.

Do I need to install a special version of Lhotse to use this recipe?

@marcoyang1998
Collaborator

@juliendespres Hi, I think this is not a lhotse version problem based on the error log. It seems that your manifest does not have a field texts in the supervision.

How did you generate your manifest? I tried this manifest (https://huggingface.co/datasets/pkufool/libriheavy/blob/main/libriheavy_cuts_small.jsonl.gz) and it seems to work fine. Could you please check whether the manifest you are using has the field texts in the supervision? You can verify this by opening it with vim xx.jsonl.gz (I attached a screenshot below) or by running the following:

from lhotse import load_manifest_lazy
cuts = load_manifest_lazy("path/your_manifest.jsonl.gz")
print(cuts[0].supervisions[0].texts)

image
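
The same check can be done without Lhotse by reading the raw JSON lines. The following is a minimal sketch, assuming the manifest is a gzip-compressed JSONL file with one cut per line and that custom supervision fields such as texts are serialized under a "custom" key (a top-level "texts" key is also accepted, in case the layout differs). The function names are hypothetical and not part of icefall or Lhotse.

```python
import gzip
import json


def _has_texts(supervision):
    # Assumption: custom fields live under "custom"; also accept a top-level key.
    custom = supervision.get("custom") or {}
    return "texts" in custom or "texts" in supervision


def count_cuts_with_texts(manifest_path, max_cuts=100):
    """Return (cuts_checked, cuts_whose_supervisions_all_have_texts)."""
    checked = 0
    with_texts = 0
    with gzip.open(manifest_path, "rt", encoding="utf-8") as f:
        for line in f:
            if checked >= max_cuts:
                break
            line = line.strip()
            if not line:
                continue
            cut = json.loads(line)
            checked += 1
            supervisions = cut.get("supervisions", [])
            if supervisions and all(_has_texts(s) for s in supervisions):
                with_texts += 1
    return checked, with_texts
```

Calling `count_cuts_with_texts("path/your_manifest.jsonl.gz")` returns two counts; if the second is lower than the first, some cuts lack the texts field and the training script will likely fail with the AttributeError shown above.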

@juliendespres
Author

OK, thank you very much, I understand the problem better now.
I generated the manifests with the script provided in the libriheavy recipe, icefall/egs/libriheavy/ASR/prepare.sh, and this script generates manifests that do not contain the field "texts".
I see in this script that the manifests containing the "texts" field are downloaded to download/libriheavy rather than generated by the script.
However, in the command given in the RESULTS.md file for this recipe, the manifests used appear to be those generated by the prepare.sh script in the data/fbank directory:
python ./zipformer_prompt_asr/train_bert_encoder.py \
  --world-size 4 \
  --start-epoch 1 \
  --num-epochs 60 \
  --exp-dir ./zipformer_prompt_asr/exp \
  --use-fp16 True \
  --memory-dropout-rate $memory_dropout_rate \
  --causal $causal \
  --subset $subset \
  --manifest-dir data/fbank \
  ...

It seems to me that the prepare.sh script is missing a step.
If I change the --manifest-dir option to download/libriheavy, I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'data/download/libriheavy/musan_cuts.jsonl.gz'
Should I replace the automatically generated manifests in data/fbank with those downloaded in download/libriheavy?
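
The FileNotFoundError above suggests that the training script looks for other manifests, such as musan_cuts.jsonl.gz, in the same directory passed to --manifest-dir. A small pre-flight check along the following lines can report what is missing before a long run starts. This is only a sketch: the helper name is hypothetical, and the list of required files is an assumption apart from musan_cuts.jsonl.gz, which is the only name confirmed by the error message.

```python
from pathlib import Path


def missing_manifests(manifest_dir, required_names):
    """Return the required file names that are absent from manifest_dir."""
    root = Path(manifest_dir)
    return [name for name in required_names if not (root / name).is_file()]
```

For example, `missing_manifests("data/fbank", ["musan_cuts.jsonl.gz"])` returns an empty list when the MUSAN manifest is present in that directory.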

Subsidiary question: is the script used to generate the downloaded manifests supplied anywhere?

I hope that's clear enough and thank you again for your help.

@marcoyang1998
Collaborator

Thanks for reporting this bug. The custom fields are deleted in local/prepare_manifest.py to reduce the size of the manifest for users who do not intend to use PromptASR. I just made the deletion optional in PR #1719. Please try running prepare.sh again.
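
To illustrate the idea of making the deletion optional, here is a minimal sketch operating on plain dictionaries. It is not the actual code from local/prepare_manifest.py or PR #1719: the function name and the keep_custom_fields parameter are hypothetical, and it assumes the custom fields sit under a "custom" key of each supervision.

```python
import copy


def prepare_supervision(supervision, keep_custom_fields=False):
    """Return a copy of a supervision dict, optionally dropping its custom fields."""
    result = copy.deepcopy(supervision)
    if not keep_custom_fields:
        # Dropping the custom fields keeps the manifest small,
        # but removes "texts", which PromptASR training needs.
        result.pop("custom", None)
    return result
```

With keep_custom_fields=True the supervision is returned unchanged, so fields such as texts remain available to train_bert_encoder.py.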

@juliendespres
Author

I regenerated the features and was able to start training the BERT model. Everything seems to be working fine.
Thanks again for your help.
