ValueError: /workplace/spconv/src/spconv/spconv_ops.cc 87 unknown device type error #18

Open
junhocho opened this issue Dec 19, 2022 · 1 comment

@junhocho

Hi, thanks for sharing the code.
I'm opening this issue because I have trouble running it.
I run training without DDP: python ./tools/train.py ./configs/nusc/nuscenes_centerformer_separate_detection_head.py
sh setup.sh runs without problems, but train.py fails with the following error:

Traceback (most recent call last):
  File "./tools/train.py", line 137, in <module>
    main()
  File "./tools/train.py", line 132, in main
    logger=logger,
  File "/workspace/det3d/torchie/apis/train.py", line 335, in train_detector
    trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
  File "/workspace/det3d/torchie/trainer/trainer.py", line 546, in run
    epoch_runner(data_loaders[i], self.epoch, **kwargs)
  File "/workspace/det3d/torchie/trainer/trainer.py", line 413, in train
    self.model, data_batch, train_mode=True, **kwargs
  File "/workspace/det3d/torchie/trainer/trainer.py", line 371, in batch_processor_inline
    losses = model(example, return_loss=True)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/det3d/models/detectors/voxelnet_dynamic.py", line 52, in forward
    x, _ = self.extract_feat(example)
  File "/workspace/det3d/models/detectors/voxelnet_dynamic.py", line 38, in extract_feat
    data['voxels'], data["coors"], data["batch_size"], data["input_shape"]
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/det3d/models/backbones/scn.py", line 156, in forward
    x = self.conv_input(ret)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/spconv/modules.py", line 134, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/spconv/conv.py", line 181, in forward
    use_hash=self.use_hash)
  File "/opt/conda/lib/python3.7/site-packages/spconv/ops.py", line 95, in get_indice_pairs
    int(use_hash))
ValueError: /workplace/spconv/src/spconv/spconv_ops.cc 87
unknown device type

I have tried hard to run your code on the nuScenes dataset. We also have a setup with 8 A100 GPUs, as you do.
One difference is that I use a Docker image.
Here is the Dockerfile:

FROM pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel
LABEL maintainer="Junho Cho"

RUN rm /etc/apt/sources.list.d/cuda.list
RUN rm /etc/apt/sources.list.d/nvidia-ml.list
RUN apt-get update

RUN apt-get install git -y
RUN git clone https://github.com/TuSimple/centerformer.git

RUN cd centerformer && pip install -r requirements.txt

RUN apt-get install wget libboost-all-dev libgl1 -y

# Install cmake v3.13.2
RUN apt-get purge -y cmake && \
    mkdir /root/temp && \
    cd /root/temp && \
    wget https://github.com/Kitware/CMake/releases/download/v3.13.2/cmake-3.13.2.tar.gz && \
    tar -xzvf cmake-3.13.2.tar.gz && \
    cd cmake-3.13.2 && \
    bash ./bootstrap && \
    make && \
    make install && \
    cmake --version && \
    rm -rf /root/temp

RUN git clone --branch v1.2.1 https://github.com/traveller59/spconv.git --recursive
RUN cd spconv && python setup.py bdist_wheel && cd ./dist && pip install *whl

WORKDIR /workspace
ENV PYTHONPATH="${PYTHONPATH}:/workspace"
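A detail that may be relevant here (an assumption, not confirmed in this thread): by default, docker build runs without access to the host's GPUs, so the RUN ... bdist_wheel step above compiles spconv in an environment where PyTorch sees no CUDA device. Whether that changes what spconv 1.2.1 builds is not established here, but it is cheap to check. The sketch below uses a hypothetical cuda_visible() helper; running the file with python right before the spconv build step, and again inside a container started with GPU access, shows whether the two environments differ.

```python
# Hypothetical probe (not part of CenterFormer or spconv): reports whether
# PyTorch can see a CUDA device in the current process.


def cuda_visible() -> bool:
    """Return True if PyTorch is importable and sees at least one CUDA device."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to compile or run against CUDA.
        return False
    return bool(torch.cuda.is_available() and torch.cuda.device_count() > 0)


if __name__ == "__main__":
    # Expected (an assumption) to print False during a default "docker build"
    # step and True inside a container started with e.g. docker run --gpus all.
    print(f"CUDA visible to PyTorch: {cuda_visible()}")
```

The docker run --gpus flag assumes the NVIDIA Container Toolkit is installed on the host.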

This Dockerfile builds spconv v1.2.1 in a CUDA 11.1 / PyTorch 1.9.1 environment, which matches your PyTorch and CUDA versions exactly.
The only difference is the Python version, which I don't think matters much (I also tried Python 3.9.12, with no luck).
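To double-check the version match from inside the running container, torch.__version__ and torch.version.cuda can be compared against the values in the base image tag (pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel). A minimal sketch; versions_match is a hypothetical helper, and it ignores local build suffixes such as +cu111 because builds of the same release may or may not carry one.

```python
from typing import Optional


def versions_match(torch_version: Optional[str], cuda_version: Optional[str],
                   want_torch: str = "1.9.1", want_cuda: str = "11.1") -> bool:
    """Compare PyTorch/CUDA versions, ignoring local tags like '+cu111'."""
    if not torch_version or not cuda_version:
        # torch.version.cuda is None for CPU-only PyTorch builds.
        return False
    return torch_version.split("+", 1)[0] == want_torch and cuda_version == want_cuda


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this interpreter")
    else:
        print(torch.__version__, torch.version.cuda,
              versions_match(torch.__version__, torch.version.cuda))
```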

sh setup.sh always runs without errors.

It seems that the following error

ValueError: /root/spconv/src/spconv/spconv_ops.cc 87
unknown device type

might be solved by using a different spconv version (according to traveller59/spconv#58), but I have not tried that because you specified that only spconv 1.2.1 works.

Do you have any idea how to resolve this issue?

According to this, spconv 1.2.1 probably does not work in Docker, but I confirmed that spconv 2.2 works in Docker.

If so, is there any chance this repo could support spconv 2.2? (I already tried spconv 2.2 with CenterFormer and ran into many failures.)
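On supporting spconv 2.x: one known difference is that spconv 2.x provides its PyTorch modules under spconv.pytorch (its own documentation uses import spconv.pytorch as spconv), whereas spconv 1.x provides them from the top-level spconv package. Code meant to run with either version often tries the 2.x import first. The sketch below covers only that import step, through a hypothetical load_first helper; it does not handle any other API or behaviour differences between the versions, so it is a starting point rather than a port.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence, Tuple


def load_first(candidates: Sequence[str]) -> Tuple[Optional[str], Optional[ModuleType]]:
    """Import and return the first module name in candidates that imports cleanly."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # Not installed (or a 1.x install without spconv.pytorch): try the next one.
            continue
    return None, None


# spconv 2.x first: with 2.x installed, a plain "import spconv" also succeeds,
# but the PyTorch layers live under spconv.pytorch.
SPCONV_CANDIDATES = ("spconv.pytorch", "spconv")

if __name__ == "__main__":
    name, spconv = load_first(SPCONV_CANDIDATES)
    print(f"using {name}" if name else "spconv is not installed")
```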

@Machine-NO-Learning

Actually, conda works fine for setting this up. I assume you are using Docker for a specific reason, such as deploying a project to production, or because the server's CUDA version is shared with colleagues and can't be changed.

This project is based on CenterPoint, so you can set up the environment following CenterPoint's instructions. For spconv v1, I suggest the following:

  1. git clone -b v1.2.1 https://github.com/traveller59/spconv.git --recursive
  2. cd spconv && python setup.py bdist_wheel
     (you may get an error that no CUDA compiler was found, or that pybind11 has no CMakeLists.txt; if so, run
       export CUDA_HOME=/usr/local/cuda
       export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
       export PATH=$PATH:$CUDA_HOME/bin
     and run git clone --recurse-submodules https://github.com/pybind/pybind11.git inside spconv/thirdparty/)
  3. cd ./dist
  4. pip install xxx.whl
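The export lines in step 2 append CUDA directories to the existing search paths. If the build is driven from a Python script instead of an interactive shell, the same environment can be assembled explicitly. A sketch, assuming CUDA lives at /usr/local/cuda as in the reply; cuda_build_env and build_spconv_wheel are hypothetical helpers, not part of spconv.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional


def _append_path(current: Optional[str], *extra: str) -> str:
    """Join non-empty entries with os.pathsep, keeping existing entries first."""
    parts = [p for p in (current or "").split(os.pathsep) if p]
    parts.extend(extra)
    return os.pathsep.join(parts)


def cuda_build_env(base: Mapping[str, str], cuda_home: str = "/usr/local/cuda") -> dict:
    """Return a copy of base with the same settings as the exports in step 2."""
    env = dict(base)  # copy, so the caller's mapping is left untouched
    env["CUDA_HOME"] = cuda_home
    env["LD_LIBRARY_PATH"] = _append_path(
        base.get("LD_LIBRARY_PATH"),
        f"{cuda_home}/lib64",
        f"{cuda_home}/extras/CUPTI/lib64",
    )
    env["PATH"] = _append_path(base.get("PATH"), f"{cuda_home}/bin")
    return env


def build_spconv_wheel(spconv_dir: str) -> None:
    """Run step 2 (python setup.py bdist_wheel) with the CUDA variables set."""
    subprocess.run(
        [sys.executable, "setup.py", "bdist_wheel"],
        cwd=spconv_dir,
        env=cuda_build_env(os.environ),
        check=True,
    )
```

One deliberate difference from the shell version: empty entries are dropped, so an unset LD_LIBRARY_PATH does not leave a leading ":" behind, which the dynamic loader may interpret as the current directory.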

Then you may also need to set up multi_scale_deformable and iou_nms_cuda:
go to det3d/ops/ and build them there.
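The reply does not spell out how to set the two ops up. A minimal sketch, assuming each op under det3d/ops/ is a directory with its own setup.py and that python setup.py build_ext --inplace is the intended command (both are assumptions; the repository's setup.sh is the authoritative reference). find_op_dirs and build_ops are hypothetical helpers.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def find_op_dirs(ops_root: str) -> List[Path]:
    """Return the directories under ops_root that contain a setup.py, sorted."""
    root = Path(ops_root)
    if not root.is_dir():
        return []
    return sorted(p.parent for p in root.rglob("setup.py"))


def build_ops(ops_root: str = "det3d/ops") -> None:
    """Compile each extension in place; the command itself is an assumption."""
    for op_dir in find_op_dirs(ops_root):
        print(f"building {op_dir}")
        subprocess.run(
            [sys.executable, "setup.py", "build_ext", "--inplace"],
            cwd=op_dir,
            check=True,  # stop at the first op that fails to compile
        )


if __name__ == "__main__":
    # Run from the repository root so the default det3d/ops path resolves.
    build_ops()
```

Searching for setup.py files avoids guessing the op directory names, at the cost of also picking up any nested setup.py.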

Hope this helps.
