
transcribe method broken? #11516

Open
hammondm opened this issue Dec 9, 2024 · 0 comments
Labels
bug Something isn't working

Comments

hammondm commented Dec 9, 2024

Hi.

For some reason, the transcribe method causes a core dump with QuartzNet or Conformer. I'm running this in Docker. My code is tweaked from one of the tutorials.

Mike H

Here's my code:

import os,glob,subprocess,tarfile
import wget,nemo,librosa,json
from ruamel.yaml import YAML
import nemo.collections.asr as nemo_asr
import pytorch_lightning as pl
from omegaconf import DictConfig

data_dir = '/data/an4'
config_path = 'quartzconf.yaml'
epochs = 3

if not os.path.exists(data_dir):
	os.makedirs(data_dir)

#download data
if not os.path.exists(
		data_dir + '/an4_sphere.tar.gz'
	):
	an4_url = 'https://dldata-public.s3.us' + \
		'-east-2.amazonaws.com/an4_sphere.tar.gz'
	an4_path = wget.download(an4_url,data_dir)
else:
	an4_path = data_dir + '/an4_sphere.tar.gz'

#convert to wav files
if not os.path.exists(data_dir + '/an4/'):
	tar = tarfile.open(an4_path)
	tar.extractall(path=data_dir)
	sph_list = glob.glob(
		data_dir + '/an4/**/*.sph',
		recursive=True
	)
	for sph_path in sph_list:
		wav_path = sph_path[:-4] + '.wav'
		cmd = ["sox",sph_path,wav_path]
		subprocess.run(cmd)

#function to create manifest file
def build_manifest(
		transcripts_path,
		manifest_path,wav_path
	):
	with open(transcripts_path,'r') as fin:
		with open(manifest_path,'w') as fout:
			for line in fin:
				transcript = line[: \
					line.find('(')-1].lower()
				transcript = transcript.replace(
					'<s>',''
				).replace('</s>','')
				transcript = transcript.strip()
				file_id = line[line.find('(')+1 : -2]
				audio_path = os.path.join(
					data_dir,wav_path,
					file_id[file_id.find('-')+1 : \
						file_id.rfind('-')],
					file_id + '.wav')
				duration = librosa.core.get_duration(
					filename=audio_path
				)
				metadata = {
					"audio_filepath": audio_path,
					"duration": duration,
					"text": transcript
				}
				json.dump(metadata,fout)
				fout.write('\n')
				
#make manifest files
train_transcripts = data_dir + \
	'/an4/etc/an4_train.transcription'
train_manifest = data_dir + \
	'/an4/train_manifest.json'
if not os.path.isfile(train_manifest):
	build_manifest(
		train_transcripts,
		train_manifest,
		'an4/wav/an4_clstk'
	)
test_transcripts = data_dir + \
	'/an4/etc/an4_test.transcription'
test_manifest = data_dir + \
	'/an4/test_manifest.json'
if not os.path.isfile(test_manifest):
	build_manifest(
		test_transcripts,
		test_manifest,
		'an4/wav/an4test_clstk'
	)

#read config from yaml file
yaml = YAML(typ='safe')
with open(config_path) as f:
	params = yaml.load(f)

print(params)

#build trainer
trainer = pl.Trainer(
	devices=1,
	accelerator='gpu',
	max_epochs=epochs
)

#specify training and validation data
params['model']['train_ds']\
	['manifest_filepath'] = train_manifest
params['model']['validation_ds']\
	['manifest_filepath'] = test_manifest

#build model
first_asr_model = \
	nemo_asr.models.EncDecCTCModel(
		cfg=DictConfig(params['model']),
		trainer=trainer
)

#train
trainer.fit(first_asr_model)

#do some inference
paths2audio_files = [
		os.path.join(
			data_dir,
			'an4/wav/an4_clstk/mgah/cen2-mgah-b.wav'
		),
		os.path.join(
			data_dir,
			'an4/wav/an4_clstk/fmjd/cen7-fmjd-b.wav'
		),
		os.path.join(
			data_dir,
			'an4/wav/an4_clstk/fmjd/cen8-fmjd-b.wav'
		),
		os.path.join(
			data_dir,
			'an4/wav/an4_clstk/fkai/cen8-fkai-b.wav'
		)
	]
#print(first_asr_model.transcribe(
#	paths2audio_files=paths2audio_files,
#	batch_size=4
#))
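
Since the crash below produces no Python traceback, one thing I can add at the top of the script is the standard-library faulthandler module, which prints the active Python frames when the process receives a fatal signal (a debugging sketch, not part of my original code):

import faulthandler

# dump the Python traceback of every thread if the process receives a
# fatal signal (SIGSEGV, SIGFPE, SIGABRT, ...), so the crashing call
# is visible even when the interpreter itself goes down
faulthandler.enable()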

Here's the output:

{'name': 'QuartzNet15x5', 'sample_rate': 16000, 'repeat': 1, 'dropout': 0.0, 'separable': True, 'labels': [' ', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', "'"], 'model': {'train_ds': {'manifest_filepath': '???', 'sample_rate': 16000, 'labels': [' ', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', "'"], 'batch_size': 32, 'trim_silence': True, 'max_duration': 16.7, 'shuffle': True, 'num_workers': 4, 'pin_memory': True, 'is_tarred': False, 'tarred_audio_filepaths': None, 'shuffle_n': 2048, 'bucketing_strategy': 'synced_randomized', 'bucketing_batch_size': None}, 'validation_ds': {'manifest_filepath': '???', 'sample_rate': 16000, 'labels': [' ', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', "'"], 'batch_size': 32, 'shuffle': False, 'num_workers': 4, 'pin_memory': True}, 'preprocessor': {'_target_': 'nemo.collections.asr.modules.audio_preprocessing.AudioToMelSpectrogramPreprocessor', 'normalize': 'per_feature', 'window_size': 0.02, 'sample_rate': 16000, 'window_stride': 0.01, 'window': 'hann', 'features': 64, 'n_fft': 512, 'frame_splicing': 1, 'dither': 1e-05, 'stft_conv': False}, 'spec_augment': {'_target_': 'nemo.collections.asr.modules.SpectrogramAugmentation', 'rect_freq': 50, 'rect_masks': 5, 'rect_time': 120}, 'encoder': {'_target_': 'nemo.collections.asr.modules.ConvASREncoder', 'feat_in': 64, 'activation': 'relu', 'conv_mask': True, 'jasper': [{'filters': 128, 'repeat': 1, 'kernel': [11], 'stride': [1], 'dilation': [1], 'dropout': 0.0, 'residual': True, 'separable': True, 'se': True, 'se_context_size': -1}, {'filters': 256, 'repeat': 1, 'kernel': [13], 'stride': [1], 'dilation': [1], 'dropout': 0.0, 'residual': True, 'separable': True, 'se': True, 'se_context_size': -1}, {'filters': 256, 'repeat': 1, 'kernel': [15], 'stride': [1], 'dilation': [1], 'dropout': 0.0, 'residual': True, 'separable': True, 'se': True, 'se_context_size': -1}, {'filters': 256, 'repeat': 1, 'kernel': [17], 'stride': [1], 'dilation': [1], 'dropout': 0.0, 'residual': True, 'separable': True, 'se': True, 'se_context_size': -1}, {'filters': 256, 'repeat': 1, 'kernel': [19], 'stride': [1], 'dilation': [1], 'dropout': 0.0, 'residual': True, 'separable': True, 'se': True, 'se_context_size': -1}, {'filters': 256, 'repeat': 1, 'kernel': [21], 'stride': [1], 'dilation': [1], 'dropout': 0.0, 'residual': False, 'separable': True, 'se': True, 'se_context_size': -1}, {'filters': 1024, 'repeat': 1, 'kernel': [1], 'stride': [1], 'dilation': [1], 'dropout': 0.0, 'residual': False, 'separable': True, 'se': True, 'se_context_size': -1}]}, 'decoder': {'_target_': 'nemo.collections.asr.modules.ConvASRDecoder', 'feat_in': 1024, 'num_classes': 28, 'vocabulary': [' ', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', "'"]}, 'optim': {'name': 'novograd', 'lr': 0.01, 'betas': [0.8, 0.5], 'weight_decay': 0.001, 'sched': {'name': 'CosineAnnealing', 'monitor': 'val_loss', 'reduce_on_plateau': False, 'warmup_steps': None, 'warmup_ratio': None, 'min_lr': 0.0, 'last_epoch': -1}}}, 'trainer': {'devices': 1, 'max_epochs': 5, 'max_steps': -1, 'num_nodes': 1, 'accelerator': 'gpu', 'strategy': 'ddp', 'accumulate_grad_batches': 1, 'enable_checkpointing': False, 'logger': False, 'log_every_n_steps': 1, 'val_check_interval': 
1.0, 'benchmark': False}, 'exp_manager': {'exp_dir': None, 'name': 'QuartzNet15x5', 'create_tensorboard_logger': True, 'create_checkpoint_callback': True}}
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[NeMo I 2024-12-09 16:13:59 audio_to_text_dataset:49] Model level config does not contain `sample_rate`, please explicitly provide `sample_rate` to the dataloaders.
[NeMo I 2024-12-09 16:13:59 audio_to_text_dataset:49] Model level config does not contain `labels`, please explicitly provide `labels` to the dataloaders.
[NeMo I 2024-12-09 16:13:59 collections:196] Dataset loaded with 948 files totalling 0.71 hours
[NeMo I 2024-12-09 16:13:59 collections:197] 0 files were filtered totalling 0.00 hours
[NeMo I 2024-12-09 16:13:59 audio_to_text_dataset:49] Model level config does not contain `sample_rate`, please explicitly provide `sample_rate` to the dataloaders.
[NeMo I 2024-12-09 16:13:59 audio_to_text_dataset:49] Model level config does not contain `labels`, please explicitly provide `labels` to the dataloaders.
[NeMo I 2024-12-09 16:13:59 collections:196] Dataset loaded with 130 files totalling 0.10 hours
[NeMo I 2024-12-09 16:13:59 collections:197] 0 files were filtered totalling 0.00 hours
[NeMo I 2024-12-09 16:13:59 features:289] PADDING: 16
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
[NeMo I 2024-12-09 16:14:00 modelPT:728] Optimizer config = Novograd (
    Parameter Group 0
        amsgrad: False
        betas: [0.8, 0.5]
        eps: 1e-08
        grad_averaging: False
        lr: 0.01
        weight_decay: 0.001
    )
[NeMo I 2024-12-09 16:14:00 lr_scheduler:910] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7f8ff9db6da0>" 
    will be used during training (effective maximum steps = 90) - 
    Parameters : 
    (warmup_steps: null
    warmup_ratio: null
    min_lr: 0.0
    last_epoch: -1
    max_steps: 90
    )

  | Name              | Type                              | Params
------------------------------------------------------------------------
0 | preprocessor      | AudioToMelSpectrogramPreprocessor | 0     
1 | encoder           | ConvASREncoder                    | 1.2 M 
2 | decoder           | ConvASRDecoder                    | 29.7 K
3 | loss              | CTCLoss                           | 0     
4 | spec_augmentation | SpectrogramAugmentation           | 0     
5 | _wer              | WER                               | 0     
------------------------------------------------------------------------
1.2 M     Trainable params
0         Non-trainable params
1.2 M     Total params
4.836     Total estimated model params size (MB)
[NeMo W 2024-12-09 16:14:01 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/fit_loop.py:281: PossibleUserWarning: The number of training batches (30) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
      rank_zero_warn(
    
Epoch 2: 100%|████████████████████████| 30/30 [00:01<00:00, 15.07it/s, v_num=15]`Trainer.fit` stopped: `max_epochs=3` reached.                                  
Epoch 2: 100%|████████████████████████| 30/30 [00:02<00:00, 14.87it/s, v_num=15]
Transcribing:   0%|                                       | 0/1 [00:00<?, ?it/s]Segmentation fault (core dumped)
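
The segfault happens as soon as transcription starts, so isolating transcribe from the training run might narrow it down. A minimal sketch, assuming NeMo 23.10 still takes the paths2audio_files keyword (later releases renamed it to audio) and that the QuartzNet15x5Base-En pretrained checkpoint downloads from NGC:

import nemo.collections.asr as nemo_asr

# load a pretrained checkpoint instead of the freshly trained model,
# to rule out the new weights as the cause of the crash
model = nemo_asr.models.EncDecCTCModel.from_pretrained(
	'QuartzNet15x5Base-En'
)

# a single file with batch_size=1 keeps the failing case minimal
print(model.transcribe(
	paths2audio_files=[
		'/data/an4/an4/wav/an4_clstk/mgah/cen2-mgah-b.wav'
	],
	batch_size=1
))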

Here's how the container was run:

docker run -it \
  --gpus all \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -p 8888:8888 \
  --name ne \
  -v /data/:/mhdata \
  nvcr.io/nvidia/nemo:23.10
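
For the environment details requested below, the relevant versions can be printed from inside the container (a quick sketch using only standard version attributes):

import sys
import torch
import nemo

# versions that usually matter when reporting a NeMo segfault
print('Python :', sys.version.split()[0])
print('PyTorch:', torch.__version__)
print('CUDA   :', torch.version.cuda)
print('NeMo   :', nemo.__version__)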

Describe the bug

A clear and concise description of what the bug is.

Steps/Code to reproduce bug

Please list minimal steps or code snippet for us to be able to reproduce the bug.

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.

Expected behavior

A clear and concise description of what you expected to happen.

Environment overview (please complete the following information)

  • Environment location: [Bare-metal, Docker, Cloud (specify cloud provider - AWS, Azure, GCP, Colab)]
  • Method of NeMo install: [pip install or from source]. Please specify exact commands you used to install.
  • If method of install is [Docker], provide docker pull & docker run commands used

Environment details

If NVIDIA docker image is used you don't need to specify these.
Otherwise, please provide:

  • OS version
  • PyTorch version
  • Python version

Additional context

Add any other context about the problem here.
Example: GPU model

hammondm added the bug label Dec 9, 2024