Zhenzhi Wang (王臻郅)

I am currently a third-year Ph.D. candidate at MMLab in the Department of Information Engineering, CUHK, advised by Prof. Dahua Lin. Before that, I received my Master's degree from Nanjing University in 2022, supervised by Prof. Limin Wang, and my Bachelor's degree, also from Nanjing University, in 2019.

My research area is computer vision, especially human-centric motion and video generation. Previously, I worked on video understanding and action recognition. I am open to research discussions and collaborations; please feel free to email me.

Email  /  Google Scholar  /  Twitter  /  GitHub

profile photo

News

• 09/2024 HumanVid and InterControl are accepted to the NeurIPS 2024 D&B Track and Main Track, respectively.
• 07/2024 We release the paper and homepage of HumanVid, a dataset for camera-controllable human image animation. Code and data are in this repo.
• 12/2023 My friend and I are reproducing Animate Anyone in this GitHub repo.
• 11/2023 One paper on controllable human motion generation is on arXiv. Code is also available.
• 07/2023 One paper is accepted to ICCV 2023 (5th author).
• 06/2023 I was named a CVPR 2023 Outstanding Reviewer.
• 08/2022 Joined MMLab, CUHK.
• 11/2021 One paper is accepted to AAAI 2022 (1st author).
• 08/2021 One workshop paper is accepted to ACM MM 2021 (1st author).
• 07/2021 One paper is accepted to ICCV 2021 (4th author).
• 06/2021 We won 1st place in the HC-STVG track of the PIC Workshop at CVPR 2021.
• 04/2021 I am a co-organizer of the DeeperAction Workshop at ICCV 2021.
• 06/2020 One paper is accepted to ECCV 2020 (1st author).

Research

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Youqing Fang, Yuwei Guo, Wenran Liu, Jing Tan, Kai Chen, Tianfan Xue, Bo Dai, Dahua Lin.
NeurIPS D&B Track, 2024
arXiv / project page / code

We propose HumanVid, a dataset for camera-controllable human image animation, together with a baseline method that generates cinematic-quality video clips. As a by-product, our approach makes it possible to reproduce existing methods such as Animate Anyone, alleviating the difficulty of collecting static-camera videos.

InterControl: Generate Human Motion Interactions by Controlling Every Joint
Zhenzhi Wang, Jingbo Wang, Yixuan Li, Dahua Lin, Bo Dai.
NeurIPS, 2024
arXiv / code

We generate human motion interactions with a spatially controllable MDM trained only on single-person data.

MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond
Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhenzhi Wang, Dahua Lin, Bo Dai.
ICCV, 2023
arXiv / project page

A large-scale synthetic dataset built with Unreal Engine 5 for city-scale NeRF rendering.

Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
Zhenzhi Wang, Limin Wang, Tao Wu, Tianhao Li, Gangshan Wu.
AAAI, 2022
arXiv / code

We boost temporal grounding performance with contrastive learning by leveraging more negative samples.

MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
Yixuan Li, Lei Chen, Runyu He, Zhenzhi Wang, Gangshan Wu, Limin Wang.
ICCV, 2021
arXiv / code

A fine-grained, large-scale spatio-temporal action detection dataset covering 4 sports and 66 action categories, with 3,200 video clips and 37,701 action instances annotated with 902k bounding boxes.

Boundary-Aware Cascade Networks for Temporal Action Segmentation
Zhenzhi Wang, Ziteng Gao, Limin Wang, Gangshan Wu.
ECCV, 2020
paper / code

We leverage two complementary modules to improve action segmentation: (1) a stage cascade that boosts segmentation accuracy on hard frames (e.g., near action boundaries); and (2) local barrier pooling, which exploits boundary information for smoother predictions and fewer over-segmentation errors.

Professional Services

• Conference reviewer for CVPR (2024, 2023 [Outstanding Reviewer], 2022), NeurIPS (2024), ECCV (2024, 2022), ICCV (2023), and WACV (2023).
• Journal reviewer for Pattern Recognition, IEEE TNNLS, TCSVT, Neurocomputing.
• Co-organizer of the DeeperAction Workshop at ICCV 2021 and the ACM MM 2021 Grand Challenge on Multi-modal Ads Video Understanding.




Thanks to Jon Barron for sharing the source code of this website template.