This repo is the official implementation for MetaPose: Multimodal Enhancement and Transformation Alignment for Accurate 3D Human Pose Estimation.
Abstract. We propose MetaPose, a novel framework that leverages Multimodal Enhancement and Transformation Alignment for accurate single-frame 3D human pose estimation. MetaPose integrates spatial, visual, and semantic information through two modules: Spatial-Driven Modality Enhancement and Semantic-Driven Modality Enhancement. The Spatial-Driven module uses 2D keypoints to guide deformable attention sampling on image features, effectively capturing positional and depth cues. The Semantic-Driven module encodes joint descriptions with a Text Encoder and applies iterative deformable attention to capture structural relationships between body joints through language guidance. To unify these enhanced features, we propose a Transformation Alignment module that aligns the different modalities into a unified distribution via learned normal distributions and cross-attention. The unified pose representation is then lifted to a 3D pose. Extensive experiments demonstrate that MetaPose achieves state-of-the-art performance on multiple benchmarks, validating the effectiveness of our approach.
Previous lifting models typically used only (a) 2D joint coordinates, with some approaches also integrating (b) image features. Our method combines (a) joint coordinates, (b) image features, and (c) joint text features to strengthen multimodal integration and improve pose estimation performance.
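For orientation, below is a minimal, hypothetical sketch of the forward pass described above, written with standard PyTorch modules. Module names, feature dimensions, and the use of plain cross-attention in place of the deformable-attention and distribution-alignment designs are illustrative assumptions, not the repository's actual API (see `models/metapose.py` for the real implementation).

```python
# Illustrative sketch only -- names and shapes are assumptions, not the repo's API.
import torch
import torch.nn as nn

class MetaPoseSketch(nn.Module):
    def __init__(self, num_joints=17, dim=256, text_dim=512):
        super().__init__()
        self.kpt_embed = nn.Linear(2, dim)         # (a) embed 2D joint coordinates
        self.img_proj = nn.Linear(dim, dim)        # (b) project sampled image features
        self.text_proj = nn.Linear(text_dim, dim)  # (c) project per-joint text embeddings
        # stand-in for Transformation Alignment: fuse modalities with cross-attention
        self.fuse = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.lift = nn.Linear(dim, 3)              # lift unified per-joint features to 3D

    def forward(self, kpts_2d, img_feats, text_feats):
        # kpts_2d: (B, J, 2); img_feats: (B, J, dim) sampled at the keypoints;
        # text_feats: (B, J, text_dim) joint-description embeddings from a text encoder
        q = self.kpt_embed(kpts_2d)                                  # spatial queries
        kv = self.img_proj(img_feats) + self.text_proj(text_feats)  # enhanced tokens
        fused, _ = self.fuse(q, kv, kv)                              # modality alignment
        return self.lift(fused)                                      # (B, J, 3) 3D pose

model = MetaPoseSketch()
pose_3d = model(torch.randn(2, 17, 2), torch.randn(2, 17, 256), torch.randn(2, 17, 512))
print(pose_3d.shape)  # torch.Size([2, 17, 3])
```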
The code is developed and tested under the following environment.
- Python 3.8.19
- PyTorch 1.12.1
- CUDA 11.3
```bash
cd metapose
conda create --name metapose --file conda_requirements.txt
conda activate metapose
pip install -r requirements.txt
```
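Optionally, a quick sanity check (standard PyTorch calls) that the installed environment matches the versions listed above:

```python
import torch

print(torch.__version__)          # expect 1.12.1
print(torch.version.cuda)         # expect 11.3
print(torch.cuda.is_available())  # should be True for GPU training
```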
- Please refer to H36M-Toolbox and CA-PF to prepare the dataset images, as we follow the same processing procedure. All images should be placed in the `images/` directory, while `h36m_train.pkl` and `h36m_validation.pkl` should be stored in the `data/` directory. Note: in our implementation we directly save the cropped video frames; if you wish to use the full video frames, certain modifications to the code will be required.
- Download the COCO-pretrained weights for HRNet-W32/W48 and place them under `data/pretrained/coco/`.
- After the above steps, your folder should look like this (an optional sanity-check sketch follows the layout):
```
root/
├── data/
│   ├── h36m_train.pkl
│   ├── h36m_validation.pkl
│   └── pretrained/
│       └── coco/
│           ├── pose_hrnet_w32_256x192.pth
│           ├── pose_hrnet_w48_256x192.pth
│           └── README.MD
├── images/
│   ├── s_01_act_02_subact_01_ca_01/
│   │   ├── s_01_act_02_subact_01_ca_01_000001.jpg
│   │   ├── ...
│   │   └── s_01_act_02_subact_01_ca_01_001384.jpg
│   ├── s_01_act_02_subact_01_ca_02/
│   ├── ...
│   └── s_11_act_16_subact_02_ca_04/
├── experiments/
├── datasets/
├── models/
│   ├── ...
│   └── metapose.py
├── utils/
├── logs/
├── figure/
├── ...
└── train.py
```
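Before training, an optional check like the following (not part of the repository; paths assume the layout above) can confirm the annotation files, pretrained weights, and image folders are in place:

```python
import os
import pickle
import torch

# Annotation files: confirm they exist and deserialize.
for name in ["h36m_train.pkl", "h36m_validation.pkl"]:
    with open(os.path.join("data", name), "rb") as f:
        anns = pickle.load(f)
    print(name, type(anns).__name__, len(anns) if hasattr(anns, "__len__") else "")

# COCO-pretrained HRNet checkpoints: confirm they load on CPU.
for ckpt in ["pose_hrnet_w32_256x192.pth", "pose_hrnet_w48_256x192.pth"]:
    state = torch.load(os.path.join("data", "pretrained", "coco", ckpt), map_location="cpu")
    print(ckpt, "loaded,", len(state), "entries")

# Image directory: confirm the per-sequence folders are present.
seqs = [d for d in os.listdir("images") if os.path.isdir(os.path.join("images", d))]
print("image sequences found:", len(seqs))
```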
You can train MetaPose with the following commands:
```bash
# Single GPU
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=2345 train.py --config experiments/human36m/human36m.yaml --image_encoder hrnet_32 --text_encoder ViT-B/32 --logdir ./logs

# Multiple GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=2345 train.py --config experiments/human36m/human36m.yaml --image_encoder hrnet_32 --text_encoder ViT-B/32 --logdir ./logs
```
Place the pre-trained model weights under `checkpoint/`, and run:
```bash
# Single GPU
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=2345 train.py --config experiments/human36m/human36m.yaml --image_encoder hrnet_32 --text_encoder ViT-B/32 --logdir ./logs --eval

# Multiple GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=2345 train.py --config experiments/human36m/human36m.yaml --image_encoder hrnet_32 --text_encoder ViT-B/32 --logdir ./logs --eval
```
Coming Soon...
Our code refers to the following repositories.
We thank the authors for releasing their code.