My research area is computer vision, with a focus on human-centric motion and video generation. My previous research interests were video understanding and action recognition. I am open to research discussions and collaborations; please feel free to email me.
• 09/2024 HumanVid and InterControl are accepted by NeurIPS 2024 D&B Track and Main Track, respectively.
• 07/2024 We release the paper and homepage of the HumanVid dataset for camera-controllable human image animation. Code and data are available in this repo.
• 12/2023 My friend and I are reproducing Animate Anyone in this GitHub repo.
• 11/2023 One paper on controllable human motion generation is on arXiv. Code is also available.
• 07/2023 One paper is accepted by ICCV 2023 (5th author).
• 06/2023 I was named a CVPR 2023 Outstanding Reviewer.
• 08/2022 Joined MMLab, CUHK.
• 11/2021 One paper is accepted by AAAI 2022 (1st author).
• 08/2021 One workshop paper is accepted by ACM MM 2021 (1st author).
• 07/2021 One paper is accepted by ICCV 2021 (4th author).
• 06/2021 We won 1st place in the HC-STVG track of the PIC Workshop at CVPR 2021.
• 04/2021 I am a co-organizer of the DeeperAction Workshop at ICCV 2021.
• 06/2020 One paper is accepted by ECCV 2020 (1st author).
We propose HumanVid, a dataset for the camera-controllable human image animation task, and a baseline method for generating cinematic-quality video clips. As a by-product, our approach enables reproducing existing methods like Animate Anyone, alleviating the difficulty of static-camera video collection.
A large-scale synthetic dataset rendered with Unreal Engine 5 for city-scale NeRF research.
Negative sample matters: A renaissance of metric learning for temporal grounding
Zhenzhi Wang, Limin Wang, Tao Wu, Tianhao Li, Gangshan Wu.
AAAI, 2022
arXiv /
code
Boosts temporal grounding performance with contrastive learning by leveraging more negative samples.
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
Yixuan Li, Lei Chen, Runyu He, Zhenzhi Wang, Gangshan Wu, Limin Wang.
ICCV, 2021
arXiv /
code
A fine-grained, large-scale spatio-temporal action detection dataset covering 4 sports and 66 action categories, with 3,200 video clips and 37,701 action instances annotated with 902k bounding boxes.
Boundary-aware cascade networks for temporal action segmentation
Zhenzhi Wang, Ziteng Gao, Limin Wang, Gangshan Wu.
ECCV, 2020
paper /
code
We leverage two complementary modules to boost action segmentation performance: (1) a stage cascade that improves segmentation accuracy on hard frames (e.g., near action boundaries); and (2) local barrier pooling, which exploits boundary information for smoother predictions and fewer over-segmentation errors.
Professional Services
• Conference reviewer for CVPR (2024, 2023 (outstanding reviewer), 2022), NeurIPS (2024), ECCV (2024, 2022), ICCV (2023), WACV (2023).
• Journal reviewer for Pattern Recognition, IEEE TNNLS, TCSVT, Neurocomputing.
• Co-organizer of the DeeperAction Workshop at ICCV 2021 and the ACM MM 2021 Grand Challenge on Multi-modal Ads Video Understanding.