Errors for processing waymo infos: #29

Open
Z-Lee-corder opened this issue Sep 7, 2023 · 13 comments
@Z-Lee-corder

Z-Lee-corder commented Sep 7, 2023

Hello, I would like to run the code on the Waymo dataset. However, when I run the following two commands:

  1. python -m al3d_det.datasets.waymo.waymo_preprocess --cfg_file tools/cfgs/det_dataset_cfgs/waymo_xxx_sweeps_mm.yaml --func create_waymo_infos
  2. python -m al3d_det.datasets.waymo.waymo_preprocess --cfg_file tools/cfgs/det_dataset_cfgs/waymo_xxxx_sweeps_mm.yaml --func create_waymo_database

The following error occurred:
2023-09-07 21:03:29.162028: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/media/lizheng/Samsung/codes/LoGoNet/detection/al3d_det/datasets/waymo/waymo_preprocess.py", line 355, in
create_waymo_database(
File "/media/lizheng/Samsung/codes/LoGoNet/detection/al3d_det/datasets/waymo/waymo_preprocess.py", line 304, in create_waymo_database
dataset = WaymoTrainingDataset(
File "/media/lizheng/Samsung/codes/LoGoNet/detection/al3d_det/datasets/waymo/waymo_dataset.py", line 51, in init
from petrel_client.client import Client
ModuleNotFoundError: No module named 'petrel_client'

When I remove "OSS_PATH: 'cluster2:s3://dataset/waymo" in "waymo_one_sweep_mm.yaml", a new error occurred:
Traceback (most recent call last):
File "/media/lizheng/Samsung/codes/LoGoNet/detection/al3d_det/datasets/waymo/waymo_preprocess.py", line 38, in get_infos_worker
sequence_infos = list(tqdm(executor.map(process_single_sequence, sample_sequence_file_list),
File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/tqdm/std.py", line 1182, in iter
for obj in iterable:
File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/media/lizheng/Samsung/codes/LoGoNet/detection/al3d_det/datasets/waymo/waymo_utils.py", line 218, in process_single_sequence_and_save
if pkl_file.exists():
AttributeError: 'str' object has no attribute 'exists'

May I ask what I should do?

@CSautier

Replace if pkl_file.exists(): with if os.path.exists(pkl_file).
Later on you might also have to comment out the line info_path = self.check_sequence_name_with_all_version(info_path) and replace
sequence_file_tfrecord = sequence_file[:-9] + '_with_camera_labels.tfrecord'
with
sequence_file_tfrecord = sequence_file[:-9] + '.tfrecord'.
These changes may only apply if you are not using Ceph.
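
For reference, a minimal sketch of those two edits, reusing the variable names from the traceback above (pkl_file, sequence_file); the exact locations may differ between checkouts, and the example values below are only placeholders:

import os

# Sketch only: pkl_file and sequence_file stand in for the variables already
# present in waymo_utils.py; the values here are placeholders.
pkl_file = 'example_sequence.pkl'
sequence_file = 'segment-0000000000000000000_0000_000_0000_000.tfrecord'

# use os.path.exists because pkl_file is a plain string when not reading from Ceph
if os.path.exists(pkl_file):
    pass  # keep the original reuse/early-return logic here

# local tfrecords may not carry the '_with_camera_labels' suffix
sequence_file_tfrecord = sequence_file[:-9] + '.tfrecord'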

@Z-Lee-corder

(quoting @CSautier's fix above)

Thank you for your reply. After making your modifications, the error is gone. But when I run the command "python -m al3d_det.datasets.waymo.waymo_preprocess --cfg_file tools/cfgs/det_dataset_cfgs/waymo_one_sweep_mm.yaml --func create_waymo_infos", my CPU memory (64 GB) is not enough.

May I know how to operate the codes properly?

@Z-Lee-corder

(quoting @CSautier's fix above)

Previously, I processed the Waymo data with the official OpenPCDet project. However, that processed data does not contain image information. Does this mean I cannot reuse those generated data files in this project (LoGoNet)?

@CSautier

Yes, I've also found that the Waymo preprocessing costs a lot of memory. A partial solution is to remove the multiprocessing by replacing

with futures.ThreadPoolExecutor(num_workers) as executor:
    sequence_infos = list(tqdm(executor.map(process_single_sequence, sample_sequence_file_list),
                               total=len(sample_sequence_file_list)))

with
sequence_infos = list([process_single_sequence(sample_sequence_file) for sample_sequence_file in tqdm(sample_sequence_file_list)])
but be aware that this makes the process even slower.
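
If you want to keep some parallelism while bounding memory, a possible middle ground (my own sketch, not code from the repository; chunk_size and the worker count are just tuning knobs) is to feed the executor small chunks of the sequence list:

from concurrent import futures
from tqdm import tqdm

chunk_size, num_workers = 8, 2   # tuning knobs, pick values that fit your RAM
sequence_infos = []
for i in tqdm(range(0, len(sample_sequence_file_list), chunk_size)):
    chunk = sample_sequence_file_list[i:i + chunk_size]
    with futures.ThreadPoolExecutor(num_workers) as executor:
        # only chunk_size sequences are in flight at any time
        sequence_infos.extend(executor.map(process_single_sequence, chunk))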

Also, if at some point you get it running, make absolutely sure it actually saves the png files, as for me it didn't at first. You can, for instance, replace in waymo_utils the line
cv2.imwrite(image_path, all_images[cam_i])
with

if not cv2.imwrite(image_path, all_images[cam_i]):
    os.makedirs(os.path.join(cur_save_dir, 'image_{}'.format(cam_i)), exist_ok=True)
    cv2.imwrite(image_path, all_images[cam_i])
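
(For context: cv2.imwrite returns False rather than raising when the output directory does not exist, which is why the failure is silent; the retry after os.makedirs covers exactly that case.)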

@CSautier

As for using the OpenPCDet preprocessing, I have no idea. I'm not affiliated with the authors of the code; I'm just trying to get it running as well.

@Z-Lee-corder

(quoting @CSautier's memory and cv2.imwrite suggestions above)

Thank you very much for your patient answers. With your help, I can now process the data normally. But I found that the processed data is very large. Is at least 5 TB of storage space necessary? Each frame now has its additional camera images saved alongside it, and my storage capacity is only 3 TB, which is probably not enough.

@CSautier

I can't tell for sure, but the waymo_one_sweep_mm.yaml config seems to use a bit less than 3 TB. Maybe start with KITTI, as it is much lighter and probably easier to set up.

@SISTMrL

SISTMrL commented Sep 15, 2023

(quoting @CSautier's reply above)

Hello, could you please tell me how long the processing of the Waymo infos takes? The program's log output has been stuck on this screen for a long time:

[screenshot: preprocessing log output]

and the GPU memory I am using is shown below:

[screenshot: GPU memory usage]

@CSautier

The pre-processing lasts about 150 hours on my hardware, with no multiprocessing. I'm not sure why it uses any GPU memory; as far as I can tell the pre-processing is CPU-only. It seems to open each sequence, parse it, convert the range view into point clouds, and save the point cloud, images and annotations for each frame individually.
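
In rough pseudocode, my understanding of the per-sequence work is something like the following; the two helpers (range_images_to_point_cloud, save_frame) are placeholders, not the repository's actual function names:

import tensorflow as tf
from waymo_open_dataset import dataset_pb2

def process_sequence(tfrecord_path, save_dir):
    # iterate over every frame stored in one Waymo tfrecord sequence
    dataset = tf.data.TFRecordDataset(tfrecord_path, compression_type='')
    for idx, data in enumerate(dataset):
        frame = dataset_pb2.Frame()
        frame.ParseFromString(bytearray(data.numpy()))
        points = range_images_to_point_cloud(frame)   # range view -> xyz point cloud (placeholder helper)
        save_frame(save_dir, idx, points, frame.images, frame.laser_labels)  # placeholder helper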

@reynerliu

@CSautier Thanks, I was stuck on this for a whole week. I appreciate your contribution!

@SiHengHeHSH

SiHengHeHSH commented Mar 15, 2024

assert img_file.exists()
AttributeError: 'NoneType' object has no attribute 'exists'

This happens when I run 'python -m al3d_det.datasets.waymo.waymo_preprocess --cfg_file tools/cfgs/det_dataset_cfgs/waymo_xxx_sweeps_mm.yaml --func create_waymo_infos' and 'python -m al3d_det.datasets.waymo.waymo_preprocess --cfg_file tools/cfgs/det_dataset_cfgs/waymo_xxxx_sweeps_mm.yaml --func create_waymo_database'. The paths '../data/waymo/waymo_processed_data_v4/segment-9509506420470671704_4049_100_4069_100_with_camera_labels/image_0/0034.png' and '../data/waymo/waymo_processed_data_v4/segment-9509506420470671704_4049_100_4069_100_with_camera_labels/image*' do not exist. What should I do? Thank you for your answer. @CSautier The image files of the processed Waymo data are missing.
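
A quick way to see which processed sequences are missing their camera image folders could be something like the following sketch, assuming the ../data/waymo/waymo_processed_data_v4 layout from the error above:

import glob
import os

# Sketch only: list processed sequences that have no image_* folders saved,
# assuming the directory layout shown in the error message above.
root = '../data/waymo/waymo_processed_data_v4'
for seq_dir in sorted(glob.glob(os.path.join(root, 'segment-*'))):
    image_dirs = glob.glob(os.path.join(seq_dir, 'image_*'))
    if not image_dirs:
        print('no image folders saved for', os.path.basename(seq_dir))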

@kikiki-cloud

(quoting @CSautier's memory and cv2.imwrite suggestions above)

Hello, I want to ask why my KITTI dataset reported the following error during training.
Traceback (most recent call last):
File "detection/tools/train.py", line 204, in
main()
File "detection/tools/train.py", line 153, in main
last_epoch=last_epoch, optim_cfg=cfg.OPTIMIZATION
File "/home/linux/guorong/qinhao/LoGoNet/utils/al3d_utils/optimize_utils/init.py", line 52, in build_scheduler
optimizer, total_steps, last_step, optim_cfg.LR, list(optim_cfg.MOMS), optim_cfg.DIV_FACTOR, optim_cfg.PCT_START
File "/home/linux/guorong/qinhao/LoGoNet/utils/al3d_utils/optimize_utils/learning_schedules_fastai.py", line 85, in init
super().init(fai_optimizer, total_step, last_step, lr_phases, mom_phases)
File "/home/linux/guorong/qinhao/LoGoNet/utils/al3d_utils/optimize_utils/learning_schedules_fastai.py", line 45, in init
self.step()
File "/home/linux/guorong/qinhao/LoGoNet/utils/al3d_utils/optimize_utils/learning_schedules_fastai.py", line 58, in step
self.update_lr()
File "/home/linux/guorong/qinhao/LoGoNet/utils/al3d_utils/optimize_utils/learning_schedules_fastai.py", line 51, in update_lr
self.optimizer.lr = func((step - start) / (end - start))
ZeroDivisionError: division by zero
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3213650) of binary: /home/linux/anaconda3/envs/logonet/bin/python
Traceback (most recent call last):
File "/home/linux/anaconda3/envs/logonet/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/linux/anaconda3/envs/logonet/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
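
From the last frame of the traceback, the failing expression is (step - start) / (end - start) in update_lr, so one of the learning-rate phases seems to have zero length. My guess is that total_steps comes out as 0 when the dataloader is empty, for example if the KITTI infos were never generated, roughly like this (an illustration of the guess only, not code from the repository):

# Illustration of the guess, not repository code: with an empty dataloader,
# total_steps is 0, every phase boundary collapses onto step 0, and
# (end - start) in update_lr becomes 0.
len_dataloader, total_epochs, pct_start = 0, 80, 0.4   # hypothetical values
total_steps = len_dataloader * total_epochs            # 0 when no samples were found
start, end = 0, int(total_steps * pct_start)           # both 0 -> ZeroDivisionError in update_lr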

@fangweicheng6

(quoting @CSautier's suggestions and @kikiki-cloud's question and traceback above)

Hello, have you solved this? I ran into the same problem.
