Question about training code

Thanks for your contributions to the opensource community. There is some confusion about the training code. In `anygpt/src/stage1_pretrain.py`, I only find that the image/speech/music data is loaded, but not tokenized by the corresponding tokenizers (like SEED or SpeechTokenizer). Where do you use them to tokenize these data in pretraining?