Hi, thanks for sharing code.
I am opening an issue because I have trouble running your code.
I ran the code without DDP: python ./tools/train.py ./configs/nusc/nuscenes_centerformer_separate_detection_head.py. sh setup.sh works fine, but running train.py produces the following error.
Traceback (most recent call last):
File "./tools/train.py", line 137, in <module>
main()
File "./tools/train.py", line 132, in main
logger=logger,
File "/workspace/det3d/torchie/apis/train.py", line 335, in train_detector
trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
File "/workspace/det3d/torchie/trainer/trainer.py", line 546, in run
epoch_runner(data_loaders[i], self.epoch, **kwargs)
File "/workspace/det3d/torchie/trainer/trainer.py", line 413, in train
self.model, data_batch, train_mode=True, **kwargs
File "/workspace/det3d/torchie/trainer/trainer.py", line 371, in batch_processor_inline
losses = model(example, return_loss=True)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/det3d/models/detectors/voxelnet_dynamic.py", line 52, in forward
x, _ = self.extract_feat(example)
File "/workspace/det3d/models/detectors/voxelnet_dynamic.py", line 38, in extract_feat
data['voxels'], data["coors"], data["batch_size"], data["input_shape"]
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/det3d/models/backbones/scn.py", line 156, in forward
x = self.conv_input(ret)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/spconv/modules.py", line 134, in forward
input = module(input)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/spconv/conv.py", line 181, in forward
use_hash=self.use_hash)
File "/opt/conda/lib/python3.7/site-packages/spconv/ops.py", line 95, in get_indice_pairs
int(use_hash))
ValueError: /workplace/spconv/src/spconv/spconv_ops.cc 87
unknown device type
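One way to narrow down an error like this is to record what the container actually sees, both when spconv is built and when training runs. The sketch below is only a diagnostic aid and not part of CenterFormer or spconv: `collect_cuda_facts` is a hypothetical helper name, and the idea that "unknown device type" may point to an spconv build compiled without CUDA support is an assumption to verify, not something the traceback confirms.

```python
import importlib.util
import os
import shutil


def collect_cuda_facts(env=None):
    """Gather a few facts relevant to building and running a CUDA extension.

    `env` is a mapping of environment variables; it defaults to os.environ.
    (Hypothetical helper for illustration, not an spconv or CenterFormer API.)
    """
    env = os.environ if env is None else env
    facts = {
        # Build scripts commonly read CUDA_HOME to locate the toolkit.
        "CUDA_HOME": env.get("CUDA_HOME"),
        # Is the CUDA compiler reachable through PATH?
        "nvcc_on_path": shutil.which("nvcc", path=env.get("PATH")) is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_build": None,
        "torch_cuda_available": None,
    }
    if facts["torch_installed"]:
        import torch

        # CUDA version PyTorch was compiled against (None for CPU-only builds).
        facts["torch_cuda_build"] = torch.version.cuda
        # Whether a GPU is usable in the current process.
        facts["torch_cuda_available"] = torch.cuda.is_available()
    return facts
```

Printing `collect_cuda_facts()` once in a `RUN` step of the image build and once inside the running container shows whether the compiler and GPU are visible in both places. If `nvcc_on_path` is false or `CUDA_HOME` is unset during the build, that would be consistent with (but does not prove) a wheel built without CUDA.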
I have tried hard to run your code on the nuScenes dataset. We also have 8 A100 GPUs, the same setup as yours.
One difference is that I use a Docker image.
Here is the Dockerfile.
FROM pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel
MAINTAINER Junho Cho <[email protected]>
RUN rm /etc/apt/sources.list.d/cuda.list
RUN rm /etc/apt/sources.list.d/nvidia-ml.list
RUN apt-get update
RUN apt-get install git -y
RUN git clone https://github.com/TuSimple/centerformer.git
RUN cd centerformer && pip install -r requirements.txt
RUN apt-get install wget libboost-all-dev libgl1 -y
# Install cmake v3.13.2
RUN apt-get purge -y cmake && \
mkdir /root/temp && \
cd /root/temp && \
wget https://github.com/Kitware/CMake/releases/download/v3.13.2/cmake-3.13.2.tar.gz && \
tar -xzvf cmake-3.13.2.tar.gz && \
cd cmake-3.13.2 && \
bash ./bootstrap && \
make && \
make install && \
cmake --version && \
rm -rf /root/temp
RUN git clone --branch v1.2.1 https://github.com/traveller59/spconv.git --recursive
RUN cd spconv && python setup.py bdist_wheel && cd ./dist && pip install *whl
WORKDIR /workspace
ENV PYTHONPATH="${PYTHONPATH}:/workspace"
With this Dockerfile, we build spconv v1.2.1 in a CUDA 11.1 and PyTorch 1.9.1 environment.
This matches the PyTorch and CUDA versions of your setup exactly. The only difference is the Python version, which I don't think matters much (I also tried Python 3.9.12, but no luck).
sh setup.sh always works fine.
It seems the following error
ValueError: /root/spconv/src/spconv/spconv_ops.cc 87
unknown device type
might be solved by using a different spconv version (according to traveller59/spconv#58), but I have not tried that because you specified that only spconv 1.2.1 works.
Do you have any idea how to resolve this issue?
Probably spconv 1.2.1 does not work in Docker according to this, but I confirmed that spconv 2.2 works in Docker.
If so, is there any chance this repo could support spconv 2.2? (I already tried spconv 2.2 with CenterFormer and it failed in many places.)
Actually, conda works fine for this, though you presumably use Docker for a reason, such as deploying projects or a shared server whose CUDA version can't be changed.
This project is based on CenterPoint, so you can set up the environment the same way. For spconv v1, I suggest the following:
cd spconv && python setup.py bdist_wheel
(You may hit "No CUDA compiler found" or a missing CMakeLists.txt in pybind11. In that case, run
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
and git clone --recurse-submodules https://github.com/pybind/pybind11.git in spconv/third_party/.)
cd ./dist
pip install xxx.whl
Then you will probably need to build multi_scale_deformable and iou_nms_cuda.
To do that, go to det3d/ops/ and set them up there.
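The exports suggested above can also be applied from Python when scripting the build. This is a minimal sketch under assumptions: `cuda_build_env` and `build_spconv_wheel` are hypothetical helper names, and `/usr/local/cuda` is simply the path used in the comment above and may differ on your machine.

```python
import os
import subprocess
import sys


def cuda_build_env(base_env=None, cuda_home="/usr/local/cuda"):
    """Return a copy of the environment with the CUDA variables from the
    suggested exports set (CUDA_HOME) or appended to (LD_LIBRARY_PATH, PATH).

    Hypothetical helper for illustration; cuda_home is an assumed location.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_HOME"] = cuda_home

    # Append the CUDA library directories, keeping any existing entries first.
    lib_dirs = [cuda_home + "/lib64", cuda_home + "/extras/CUPTI/lib64"]
    ld_parts = [env.get("LD_LIBRARY_PATH", "")] + lib_dirs
    env["LD_LIBRARY_PATH"] = ":".join(p for p in ld_parts if p)

    # Append the CUDA bin directory so that nvcc can be found.
    path_parts = [env.get("PATH", ""), cuda_home + "/bin"]
    env["PATH"] = ":".join(p for p in path_parts if p)
    return env


def build_spconv_wheel(spconv_dir="spconv"):
    """Run `python setup.py bdist_wheel` inside spconv_dir with that environment."""
    subprocess.run(
        [sys.executable, "setup.py", "bdist_wheel"],
        cwd=spconv_dir,
        env=cuda_build_env(),
        check=True,
    )
```

Calling `build_spconv_wheel("spconv")` from the directory that contains the spconv checkout would correspond to the first step above; the wheel in `spconv/dist` still has to be installed with pip afterwards.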