DSVT-P training on KITTI Dataset #64

Closed
dinvincible98 opened this issue Oct 23, 2023 · 20 comments
Labels
good first issue Good for newcomers

Comments

@dinvincible98

dinvincible98 commented Oct 23, 2023

Hi,

I tried to train a DSVT-pillar model on the KITTI dataset. Below is my config:

CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']

DATA_CONFIG: 
    _BASE_CONFIG_: cfgs/dataset_configs/kitti_dataset.yaml
    POINT_CLOUD_RANGE: [0, -40, -3, 70.4, 40, 1]

    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder']
        AUG_CONFIG_LIST:
            - NAME: gt_sampling
              USE_ROAD_PLANE: True
              DB_INFO_PATH:
                  - kitti_dbinfos_train.pkl
              PREPARE: {
                 filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
                 filter_by_difficulty: [-1],
              }

              SAMPLE_GROUPS: ['Car:15','Pedestrian:15', 'Cyclist:15']
              NUM_POINT_FEATURES: 4
              REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
              LIMIT_WHOLE_SCENE: True

            - NAME: random_world_flip
              ALONG_AXIS_LIST: ['x','y']

            - NAME: random_world_rotation
              WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]

            - NAME: random_world_scaling
              WORLD_SCALE_RANGE: [0.95, 1.05]
            - NAME: random_world_translation
              NOISE_TRANSLATE_STD: [0.5, 0.5, 0.5]
    DATA_PROCESSOR:
      -   NAME: mask_points_and_boxes_outside_range
          REMOVE_OUTSIDE_BOXES: True
      -   NAME: shuffle_points
          SHUFFLE_ENABLED: {
            'train': True,
            'test': False
          }
      -   NAME: transform_points_to_voxels_placeholder
          VOXEL_SIZE: [ 0.1505, 0.1709, 4 ]

MODEL:
  NAME: CenterPoint

  VFE:
    NAME: DynPillarVFE3D
    WITH_DISTANCE: False
    USE_ABSLOTE_XYZ: True
    USE_NORM: True
    NUM_FILTERS: [192, 192]

  BACKBONE_3D:
    NAME: DSVT
    INPUT_LAYER:
      sparse_shape: [468, 468, 1]
      downsample_stride: []
      d_model: [192]
      set_info: [[36, 4]]
      window_shape: [[12, 12, 1]]
      hybrid_factor: [2, 2, 1] # x, y, z
      shifts_list: [[[0, 0, 0], [6, 6, 0]]]
      normalize_pos: False
    
    block_name: ['DSVTBlock']
    set_info: [[36, 4]]
    d_model: [192]
    nhead: [8]
    dim_feedforward: [384]
    
    
    dropout: 0.0 
    activation: gelu
    reduction_type: 'attention'
    output_shape: [468, 468]
    conv_out_channel: 192
    # ues_checkpoint: True

  MAP_TO_BEV:
    NAME: PointPillarScatter3d
    INPUT_SHAPE: [468, 468, 1]
    NUM_BEV_FEATURES: 192

  BACKBONE_2D:
    NAME: BaseBEVResBackbone
    LAYER_NUMS: [ 1, 2, 2 ]
    LAYER_STRIDES: [ 1, 2, 2 ]
    NUM_FILTERS: [ 128, 128, 256 ]
    UPSAMPLE_STRIDES: [ 1, 2, 4 ]
    NUM_UPSAMPLE_FILTERS: [ 128, 128, 128 ]

  DENSE_HEAD:
    NAME: CenterHead
    CLASS_AGNOSTIC: False

    CLASS_NAMES_EACH_HEAD: [
      ['Car', 'Pedestrian', 'Cyclist']
    ]

    SHARED_CONV_CHANNEL: 64
    USE_BIAS_BEFORE_NORM: False
    NUM_HM_CONV: 2

    BN_EPS: 0.001
    BN_MOM: 0.01
    SEPARATE_HEAD_CFG:
      HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
      HEAD_DICT: {
        'center': {'out_channels': 2, 'num_conv': 2},
        'center_z': {'out_channels': 1, 'num_conv': 2},
        'dim': {'out_channels': 3, 'num_conv': 2},
        'rot': {'out_channels': 2, 'num_conv': 2},
        'iou': {'out_channels': 1, 'num_conv': 2},
      }

    TARGET_ASSIGNER_CONFIG:
      FEATURE_MAP_STRIDE: 1
      NUM_MAX_OBJS: 500
      GAUSSIAN_OVERLAP: 0.1
      MIN_RADIUS: 2

    IOU_REG_LOSS: True

    LOSS_CONFIG:
      LOSS_WEIGHTS: {
        'cls_weight': 1.0,
        'loc_weight': 2.0,
        'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
      }

    POST_PROCESSING:
      SCORE_THRESH: 0.5
      POST_CENTER_LIMIT_RANGE: [-80, -80, -10.0, 80, 80, 10.0]
      MAX_OBJ_PER_SAMPLE: 500

      USE_IOU_TO_RECTIFY_SCORE: True
      IOU_RECTIFIER: [0.68, 0.71, 0.65]


      NMS_CONFIG:
        # NMS_TYPE: multi_class_nms  # only for centerhead, use mmdet3d version nms
        # NMS_THRESH: [0.7, 0.6, 0.55]
        # NMS_PRE_MAXSIZE: [4096, 4096, 4096]
        # NMS_POST_MAXSIZE: [500, 500, 500]
        
        NMS_TYPE: nms_gpu 
        NMS_THRESH: 0.1
        NMS_PRE_MAXSIZE: 4096
        NMS_POST_MAXSIZE: 500

  POST_PROCESSING:
    RECALL_THRESH_LIST: [0.3, 0.5, 0.7]

    EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 1
    NUM_EPOCHS: 20

    OPTIMIZER: adam_onecycle
    LR: 0.001
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1
    
    GRAD_NORM_CLIP: 10
    LOSS_SCALE_FP16: 32.0

HOOK:
  DisableAugmentationHook:
    DISABLE_AUG_LIST: ['gt_sampling','random_world_flip','random_world_rotation','random_world_scaling', 'random_world_translation']
    NUM_LAST_EPOCHS: 1

I only modified the point cloud range to match the KITTI settings, and the voxel size to match the default sparse shape [468, 468, 1], but I keep getting this error:

RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

I traced the error to the DynamicPillarVFE3D module, where batch_dict['points'] is often an empty tensor. However, when I use the default point cloud range from the Waymo settings, [-74.88, -74.88, -2, 74.88, 74.88, 4.0], the error disappears. Can you give me some guidance?
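
For readers checking their own ranges, the relationship between POINT_CLOUD_RANGE, VOXEL_SIZE, and sparse_shape can be sketched as follows. This is a minimal illustration, assuming the grid size per axis is the range extent divided by the voxel size, rounded to the nearest integer. The helper names `grid_size` and `exact_voxel_size` are hypothetical and are not part of the DSVT or OpenPCDet code.

```python
def grid_size(point_cloud_range, voxel_size):
    """Voxels per axis, assuming extent / voxel size rounded to the nearest int."""
    mins, maxs = point_cloud_range[:3], point_cloud_range[3:]
    return [round((hi - lo) / v) for lo, hi, v in zip(mins, maxs, voxel_size)]


def exact_voxel_size(point_cloud_range, sparse_shape):
    """Voxel size that splits the range into exactly sparse_shape cells."""
    mins, maxs = point_cloud_range[:3], point_cloud_range[3:]
    return [(hi - lo) / n for lo, hi, n in zip(mins, maxs, sparse_shape)]


# Values taken from the config above.
kitti_range = [0, -40, -3, 70.4, 40, 1]
print(grid_size(kitti_range, [0.1505, 0.1709, 4]))  # [468, 468, 1]

# Voxel size that would tile the same range with exactly 468 x 468 x 1 cells.
print(exact_voxel_size(kitti_range, [468, 468, 1]))
```

With the values from the config, the rounded grid matches sparse_shape, so this check only confirms that the shapes agree; it does not by itself explain the empty point tensors.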

Thank you!

@xifen523

Did you train successfully on KITTI? How did it turn out?

@dinvincible98
Author

> Did you train successfully on KITTI? How did it turn out?

No, I always get a NaN or Inf error during training. I guess there are some hyperparameter issues.

@xifen523

xifen523 commented Nov 1, 2023

> > Did you train successfully on KITTI? How did it turn out?
>
> No, I always get a NaN or Inf error during training. I guess there are some hyperparameter issues.

I will try to run this code on KITTI in the future when I have some free time.

@Haiyang-W
Owner

Very sorry for the late reply; I'm rushing to meet some deadlines. We haven't tried the KITTI dataset. You can check whether issue #59 is helpful.

@dinvincible98
Author

dinvincible98 commented Nov 1, 2023

> > > Did you train successfully on KITTI? How did it turn out?
> >
> > No, I always get a NaN or Inf error during training. I guess there are some hyperparameter issues.
>
> I will try to run this code on KITTI in the future when I have some free time.

There are some data augmentor issues. Here is the modified config; you can try it to see whether you still get the NaN or Inf error:

    CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']
    
    DATA_CONFIG: 
        _BASE_CONFIG_: cfgs/dataset_configs/kitti_dataset.yaml
        POINT_CLOUD_RANGE: [0, -39.68, -3, 69.12, 39.68, 1]
        DATA_PROCESSOR:
            - NAME: mask_points_and_boxes_outside_range
              REMOVE_OUTSIDE_BOXES: True
    
            - NAME: shuffle_points
              SHUFFLE_ENABLED: {
                'train': True,
                'test': False
              }
    
            - NAME: transform_points_to_voxels
              VOXEL_SIZE: [0.1477, 0.1696, 4]
              MAX_POINTS_PER_VOXEL: 32
              MAX_NUMBER_OF_VOXELS: {
                'train': 16000,
                'test': 40000
              }
        DATA_AUGMENTOR:
            DISABLE_AUG_LIST: ['placeholder']
            AUG_CONFIG_LIST:
                - NAME: gt_sampling
                  USE_ROAD_PLANE: True
                  DB_INFO_PATH:
                      - kitti_dbinfos_train.pkl
                  PREPARE: {
                     filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
                     filter_by_difficulty: [-1],
                  }
    
                  SAMPLE_GROUPS: ['Car:15','Pedestrian:15', 'Cyclist:15']
                  NUM_POINT_FEATURES: 4
                  DATABASE_WITH_FAKELIDAR: False
                  REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
                  LIMIT_WHOLE_SCENE: False
    
                - NAME: random_world_flip
                  ALONG_AXIS_LIST: ['x']
    
                - NAME: random_world_rotation
                  WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]
    
                - NAME: random_world_scaling
                  WORLD_SCALE_RANGE: [0.95, 1.05]
    MODEL:
      NAME: CenterPoint
    
      VFE:
        NAME: DynPillarVFE3D
        WITH_DISTANCE: False
        USE_ABSLOTE_XYZ: True
        USE_NORM: True
        NUM_FILTERS: [ 192, 192 ]
    
      BACKBONE_3D:
        NAME: DSVT
        INPUT_LAYER:
          sparse_shape: [468, 468, 1]
          downsample_stride: []
          d_model: [192]
          set_info: [[36, 4]]
          window_shape: [[12, 12, 1]]
          hybrid_factor: [2, 2, 1] # x, y, z
          shifts_list: [[[0, 0, 0], [6, 6, 0]]]
          normalize_pos: False
    
        block_name: ['DSVTBlock']
        set_info: [[36, 4]]
        d_model: [192]
        nhead: [8]
        dim_feedforward: [384]
        dropout: 0.0
        activation: gelu
        output_shape: [468, 468]
        conv_out_channel: 192
        # ues_checkpoint: True
    
      MAP_TO_BEV:
        NAME: PointPillarScatter3d
        INPUT_SHAPE: [468, 468, 1]
        NUM_BEV_FEATURES: 192
    
      BACKBONE_2D:
        NAME: BaseBEVResBackbone
        LAYER_NUMS: [ 1, 2, 2 ]
        LAYER_STRIDES: [ 1, 2, 2 ]
        NUM_FILTERS: [ 128, 128, 256 ]
        UPSAMPLE_STRIDES: [ 1, 2, 4 ]
        NUM_UPSAMPLE_FILTERS: [ 128, 128, 128 ]
    
      DENSE_HEAD:
        NAME: CenterHead
        CLASS_AGNOSTIC: False
    
        CLASS_NAMES_EACH_HEAD: [
          ['Car', 'Pedestrian', 'Cyclist']
        ]
    
        SHARED_CONV_CHANNEL: 64
        USE_BIAS_BEFORE_NORM: True
        NUM_HM_CONV: 2
    
        BN_EPS: 0.001
        BN_MOM: 0.01
        SEPARATE_HEAD_CFG:
          HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
          HEAD_DICT: {
            'center': {'out_channels': 2, 'num_conv': 2},
            'center_z': {'out_channels': 1, 'num_conv': 2},
            'dim': {'out_channels': 3, 'num_conv': 2},
            'rot': {'out_channels': 2, 'num_conv': 2},
            'iou': {'out_channels': 1, 'num_conv': 2},
          }
    
        TARGET_ASSIGNER_CONFIG:
          FEATURE_MAP_STRIDE: 1
          NUM_MAX_OBJS: 500
          GAUSSIAN_OVERLAP: 0.1
          MIN_RADIUS: 2
    
        IOU_REG_LOSS: True
    
        LOSS_CONFIG:
          LOSS_WEIGHTS: {
            'cls_weight': 1.0,
            'loc_weight': 2.0,
            'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
          }
    
    
    
        POST_PROCESSING:
            RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
            SCORE_THRESH: 0.1
            OUTPUT_RAW_SCORE: False
            POST_CENTER_LIMIT_RANGE: [0, -40, -3, 75, 40, 1]
            MAX_OBJ_PER_SAMPLE: 500
    
            EVAL_METRIC: kitti
    
            NMS_CONFIG:
                MULTI_CLASSES_NMS: False
                NMS_TYPE: nms_gpu
                NMS_THRESH: 0.01
                NMS_PRE_MAXSIZE: 4096
                NMS_POST_MAXSIZE: 500
    
    
    OPTIMIZATION:
        BATCH_SIZE_PER_GPU: 1
        NUM_EPOCHS: 40
    
        OPTIMIZER: adam_onecycle
        LR: 0.001
        WEIGHT_DECAY: 0.01
        MOMENTUM: 0.9
    
        MOMS: [0.95, 0.85]
        PCT_START: 0.4
        DIV_FACTOR: 10
        DECAY_STEP_LIST: [35, 45]
        LR_DECAY: 0.1
        LR_CLIP: 0.0000001
    
        LR_WARMUP: False
        WARMUP_EPOCH: 1
    
        GRAD_NORM_CLIP: 10

@dinvincible98
Author

dinvincible98 commented Nov 1, 2023

> Very sorry for the late reply; I'm rushing to meet some deadlines. We haven't tried the KITTI dataset. You can check whether issue #59 is helpful.

Yes, I checked that issue, so I recalculated the voxel size. The sparse shape now matches the pillar settings, but training throws a NaN or Inf error after multiple epochs.
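
One way to narrow down when the NaN or Inf first appears is to record the scalar loss at each iteration and look for the first non-finite value. The sketch below is a generic illustration and not code from the DSVT repository; `first_non_finite` and the sample values are made up for the example.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN/Inf loss value, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


# Hypothetical per-iteration loss values, e.g. collected with loss.item().
logged = [2.31, 1.87, 1.92, float("inf"), float("nan")]
print(first_non_finite(logged))  # 3
```

Knowing the first bad iteration makes it possible to inspect the batch and augmentation settings that produced it.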

@evil-master

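> > Very sorry for the late reply; I'm rushing to meet some deadlines. We haven't tried the KITTI dataset. You can check whether issue #59 is helpful.
>
> Yes, I checked that issue, so I recalculated the voxel size. The sparse shape now matches the pillar settings, but training throws a NaN or Inf error after multiple epochs.

May I ask whether you can adapt to the KITTI dataset by only modifying the config file, without modifying the network? Also, why doesn't your BACKBONE_3D need downsampling?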

@Haiyang-W
Owner

Haiyang-W commented Nov 2, 2023

If anyone has succeeded on KITTI by modifying the config, please share the corresponding config and experimental results in this issue. We will be very grateful for your contribution to the community. :)

After I finish the CVPR deadline, I'll take a look when I have time. I guess this shouldn't be a very difficult problem.

@Haiyang-W
Owner

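> > > Very sorry for the late reply; I'm rushing to meet some deadlines. We haven't tried the KITTI dataset. You can check whether issue #59 is helpful.
> >
> > Yes, I checked that issue, so I recalculated the voxel size. The sparse shape now matches the pillar settings, but training throws a NaN or Inf error after multiple epochs.
>
> May I ask whether you can adapt to the KITTI dataset by only modifying the config file, without modifying the network? Also, why doesn't your BACKBONE_3D need downsampling?

I guess he uses the DSVT-pillar version.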

Haiyang-W added the help wanted label Nov 2, 2023

@123susu

123susu commented Nov 7, 2023

> > Very sorry for the late reply; I'm rushing to meet some deadlines. We haven't tried the KITTI dataset. You can check whether issue #59 is helpful.
>
> Yes, I checked that issue, so I recalculated the voxel size. The sparse shape now matches the pillar settings, but training throws a NaN or Inf error after multiple epochs.

Did you complete the KITTI config? I really need this, thanks!

@Haiyang-W
Owner

Any update?

@dinvincible98
Author

dinvincible98 commented Dec 8, 2023

I have a working config for training on the KITTI dataset:

CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']

DATA_CONFIG:
    _BASE_CONFIG_: cfgs/dataset_configs/kitti_dataset.yaml
    POINT_CLOUD_RANGE: [0, -39.68, -3, 69.12, 39.68, 1]

    DATA_PROCESSOR:
        - NAME: mask_points_and_boxes_outside_range
          REMOVE_OUTSIDE_BOXES: True

        - NAME: shuffle_points
          SHUFFLE_ENABLED: {
            'train': True,
            'test': False
          }

        - NAME: transform_points_to_voxels_placeholder
          VOXEL_SIZE: [0.1477, 0.1696, 4]
          MAX_POINTS_PER_VOXEL: 32
          MAX_NUMBER_OF_VOXELS: {
            'train': 16000,
            'test': 40000
          }

    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder']
        AUG_CONFIG_LIST:
            - NAME: gt_sampling
              USE_ROAD_PLANE: True
              DB_INFO_PATH:
                  - kitti_dbinfos_train.pkl
              PREPARE: {
                 filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
                 filter_by_difficulty: [-1],
              }

              SAMPLE_GROUPS: ['Car:15','Pedestrian:15', 'Cyclist:15']
              NUM_POINT_FEATURES: 4
              DATABASE_WITH_FAKELIDAR: False
              REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
              LIMIT_WHOLE_SCENE: False

            - NAME: random_world_flip
              ALONG_AXIS_LIST: ['x']

            - NAME: random_world_rotation
              WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]

            - NAME: random_world_scaling
              WORLD_SCALE_RANGE: [0.95, 1.05]

            - NAME: random_local_pyramid_aug
              DROP_PROB: 0.25
              SPARSIFY_PROB: 0.05
              SPARSIFY_MAX_NUM: 50
              SWAP_PROB: 0.1
              SWAP_MAX_NUM: 50

MODEL:
  NAME: CenterPoint

  VFE:
    NAME: DynPillarVFE3D
    WITH_DISTANCE: False
    USE_ABSLOTE_XYZ: True
    USE_NORM: True
    NUM_FILTERS: [ 192, 192 ]

  BACKBONE_3D:
    NAME: DSVT
    INPUT_LAYER:
      sparse_shape: [468, 468, 1]
      downsample_stride: []
      d_model: [192]
      set_info: [[36, 4]]
      window_shape: [[12, 12, 1]]
      hybrid_factor: [2, 2, 1] # x, y, z
      shifts_list: [[[0, 0, 0], [6, 6, 0]]]
      normalize_pos: False

    block_name: ['DSVTBlock']
    set_info: [[36, 4]]
    d_model: [192]
    nhead: [8]
    dim_feedforward: [384]
    dropout: 0.0
    activation: gelu
    output_shape: [468, 468]
    conv_out_channel: 192
    ues_checkpoint: True

  MAP_TO_BEV:
    NAME: PointPillarScatter3d
    INPUT_SHAPE: [468, 468, 1]
    NUM_BEV_FEATURES: 192

  BACKBONE_2D:
    NAME: BaseBEVResBackbone
    LAYER_NUMS: [ 1, 2, 2 ]
    LAYER_STRIDES: [ 1, 2, 2 ]
    NUM_FILTERS: [ 128, 128, 256 ]
    UPSAMPLE_STRIDES: [ 1, 2, 4 ]
    NUM_UPSAMPLE_FILTERS: [ 128, 128, 128 ]

  DENSE_HEAD:
    NAME: CenterHead
    CLASS_AGNOSTIC: False

    CLASS_NAMES_EACH_HEAD: [
      ['Car', 'Pedestrian', 'Cyclist']
    ]

    SHARED_CONV_CHANNEL: 64
    USE_BIAS_BEFORE_NORM: False
    NUM_HM_CONV: 2

    BN_EPS: 0.001
    BN_MOM: 0.01
    SEPARATE_HEAD_CFG:
      HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
      HEAD_DICT: {
        'center': {'out_channels': 2, 'num_conv': 2},
        'center_z': {'out_channels': 1, 'num_conv': 2},
        'dim': {'out_channels': 3, 'num_conv': 2},
        'rot': {'out_channels': 2, 'num_conv': 2},
        'iou': {'out_channels': 1, 'num_conv': 2},
      }

    TARGET_ASSIGNER_CONFIG:
      FEATURE_MAP_STRIDE: 1
      NUM_MAX_OBJS: 500
      GAUSSIAN_OVERLAP: 0.1
      MIN_RADIUS: 2
      # BOX_CODER: ResidualCoder

    IOU_REG_LOSS: True

    LOSS_CONFIG:
      LOSS_WEIGHTS: {
        'cls_weight': 1.0,
        'loc_weight': 2.0,
        'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
      }

    POST_PROCESSING:
      # RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
      SCORE_THRESH: 0.1
      OUTPUT_RAW_SCORE: False
      POST_CENTER_LIMIT_RANGE: [0, -40, -3, 80, 40, 1]
      MAX_OBJ_PER_SAMPLE: 500

      # USE_IOU_TO_RECTIFY_SCORE: True
      # IOU_RECTIFIER: [0.5, 0.71, 0.65]

      NMS_CONFIG:
        MULTI_CLASSES_NMS: False
        NMS_TYPE: nms_gpu
        NMS_THRESH: 0.01
        NMS_PRE_MAXSIZE: 4096
        NMS_POST_MAXSIZE: 500

  POST_PROCESSING:
    RECALL_THRESH_LIST: [0.3, 0.5, 0.7]

    EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 2
    NUM_EPOCHS: 80

    OPTIMIZER: adam_onecycle
    LR: 0.001
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1

    GRAD_NORM_CLIP: 10
    LOSS_SCALE_FP16: 32.0

And I got the results below:

Generate label finished(sec_per_example: 0.0629 second).
recall_roi_0.3: 0.000000
recall_rcnn_0.3: 0.939800
recall_roi_0.5: 0.000000
recall_rcnn_0.5: 0.888598
recall_roi_0.7: 0.000000
recall_rcnn_0.7: 0.669268
Average predicted number of objects(3769 samples): 12.283

Car AP@0.70, 0.70, 0.70:
bbox AP:95.2198, 89.4526, 88.8744
bev  AP:89.3524, 87.4650, 86.4434
3d   AP:87.1649, 77.6970, 76.9884
aos  AP:95.20, 89.33, 88.69
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:97.0173, 94.1119, 91.8572
bev  AP:92.0481, 88.3356, 87.8141
3d   AP:87.8954, 80.9305, 78.6656
aos  AP:97.00, 93.96, 91.65
Car AP@0.70, 0.50, 0.50:
bbox AP:95.2198, 89.4526, 88.8744
bev  AP:95.2554, 89.6240, 89.1787
3d   AP:95.1996, 89.5775, 89.0910
aos  AP:95.20, 89.33, 88.69
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:97.0173, 94.1119, 91.8572
bev  AP:97.2790, 94.6162, 94.2047
3d   AP:97.2439, 94.5121, 94.0162
aos  AP:97.00, 93.96, 91.65
Pedestrian AP@0.50, 0.50, 0.50:
bbox AP:68.9796, 66.8907, 64.9734
bev  AP:58.0059, 55.0643, 52.5829
3d   AP:52.8691, 51.4204, 47.9159
aos  AP:64.75, 62.16, 59.99
Pedestrian AP_R40@0.50, 0.50, 0.50:
bbox AP:69.8867, 67.2526, 64.8893
bev  AP:56.3069, 53.6334, 50.8015
3d   AP:51.9532, 49.1637, 45.9747
aos  AP:65.05, 62.00, 59.43
Pedestrian AP@0.50, 0.25, 0.25:
bbox AP:68.9796, 66.8907, 64.9734
bev  AP:75.4927, 73.8959, 71.9835
3d   AP:74.6756, 72.9769, 71.1328
aos  AP:64.75, 62.16, 59.99
Pedestrian AP_R40@0.50, 0.25, 0.25:
bbox AP:69.8867, 67.2526, 64.8893
bev  AP:76.3137, 74.6665, 72.3919
3d   AP:75.3678, 73.5785, 71.5206
aos  AP:65.05, 62.00, 59.43
Cyclist AP@0.50, 0.50, 0.50:
bbox AP:88.9667, 77.5716, 74.3384
bev  AP:86.9765, 71.5262, 67.4459
3d   AP:85.9338, 69.3215, 66.2503
aos  AP:88.85, 77.08, 73.75
Cyclist AP_R40@0.50, 0.50, 0.50:
bbox AP:93.4071, 78.7487, 75.0949
bev  AP:91.3305, 71.9404, 67.9715
3d   AP:88.3232, 69.5222, 66.1419
aos  AP:93.27, 78.19, 74.49
Cyclist AP@0.50, 0.25, 0.25:
bbox AP:88.9667, 77.5716, 74.3384
bev  AP:87.2510, 74.5043, 70.9506
3d   AP:87.2510, 74.5037, 70.9506
aos  AP:88.85, 77.08, 73.75
Cyclist AP_R40@0.50, 0.25, 0.25:
bbox AP:93.4071, 78.7487, 75.0949
bev  AP:91.5098, 75.4892, 71.6936
3d   AP:91.5098, 75.4891, 71.6934
aos  AP:93.27, 78.19, 74.49

@Haiyang-W
Owner

Haiyang-W commented Dec 8, 2023

Thanks for your contribution! Very nice!

But I am not familiar with KITTI. May I ask whether this performance is acceptable?
Thanks! Looking forward to your reply.

@Haiyang-W
Owner

If this result turns out to be good, I will tag this issue to make it more accessible for those interested in running DSVT on KITTI.
Many thanks!

@dinvincible98
Author

I adopted the PointPillar settings, and it performs slightly better than PointPillar. I trained the model with a single GPU, so the performance might improve further with multi-GPU training, I guess.

@Haiyang-W
Owner

> I adopted the PointPillar settings, and it performs slightly better than PointPillar. I trained the model with a single GPU, so the performance might improve further with multi-GPU training, I guess.

Perhaps some further adjustments can be made; DSVT performs much better on Waymo and NuScenes compared to PointPillar. At least, its performance on KITTI should be close to that of MsSVT.

Haiyang-W added the good first issue label and removed the help wanted label Dec 9, 2023

@Haiyang-W
Owner

Thanks @dinvincible98, it seems that this issue has been resolved to some extent. The issue will be closed.

Thank you all for your contributions and discussions. :)

Haiyang-W pinned this issue Jan 8, 2024

@evil-master

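Hello author, I successfully set up the environment and trained on the KITTI data, but I ran into a problem when converting the model to ONNX. Should the path in deploy.py point to the pkl file I generated for the dataset? The file I generated is a pkl.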
####### read input #######
batch_dict = torch.load("path to batch_dict.pth", map_location="cuda")
inputs = batch_dict

@evil-master

I found the download link for inputdict.pth in the README and loaded the weights I trained on KITTI, but I get the following error:
  File "deploy.py", line 134, in <module>
    inputs = model.vfe(inputs)
  File "/home/user/anaconda3/envs/dsvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/cjg/DSVT/pcdet/models/backbones_3d/vfe/dynamic_pillar_vfe.py", line 219, in forward
    features = pfn(features, unq_inv)
  File "/home/user/anaconda3/envs/dsvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/cjg/DSVT/pcdet/models/backbones_3d/vfe/dynamic_pillar_vfe.py", line 37, in forward
    x = self.linear(inputs)
  File "/home/user/anaconda3/envs/dsvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/dsvt/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (61800x11 and 10x96)

@evil-master

I found the problem: the provided point cloud has 6 values per point, while the KITTI data has only 5, so removing the last dimension fixes it.
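
The shape mismatch in the traceback (11 input features against a weight expecting 10) is consistent with one extra value per point. A minimal sketch of the fix described above, assuming the points are stored as an (N, C) array whose first 5 columns are the ones a KITTI-trained model expects. NumPy is used here as a stand-in, and `drop_extra_point_features` is a hypothetical helper, not part of deploy.py.

```python
import numpy as np


def drop_extra_point_features(points, num_columns=5):
    """Keep only the first num_columns columns of an (N, C) point array."""
    if points.shape[1] < num_columns:
        raise ValueError(
            f"expected at least {num_columns} columns, got {points.shape[1]}"
        )
    return points[:, :num_columns]


# Hypothetical stand-in for the downloaded sample input: 6 values per point.
sample_points = np.zeros((61800, 6), dtype=np.float32)
kitti_like = drop_extra_point_features(sample_points)
print(kitti_like.shape)  # (61800, 5)
```

The same slicing syntax applies to a torch tensor such as batch_dict['points'], so the trimmed tensor can be written back before the input is passed to the model.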
