Jiaben Chen

I am a first-year Ph.D. student in Computer Science at UMass Amherst, advised by Prof. Chuang Gan. My primary research interest lies in multi-modality learning and video synthesis.

I received my master's degree in Computer Science from the CSE department at the University of California, San Diego, where I was mentored by Prof. Xiaolong Wang. Starting in the summer of 2021, I had the privilege of working with Prof. Jianbo Shi at the University of Pennsylvania as a research intern.

Before graduate study, I received my bachelor's degree in Computer Science and Technology from SIST at ShanghaiTech University. Throughout my academic journey, I have been fortunate to collaborate with Prof. Huaizu Jiang, Prof. Laurent Kneip, Prof. Shenghua Gao, and Prof. Xia Hu.

Email  /  Google Scholar  /  Github  /  Twitter  /  LinkedIn

Publications

* indicates equal contributions.

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text
Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, and Chuang Gan
arXiv Preprint, 2024
project page / paper / code

In this paper, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs. To facilitate this, we first collect the RapVerse dataset, a large dataset containing synchronous rapping vocals, lyrics, and high-quality 3D holistic body meshes.

SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation
Jiaben Chen and Huaizu Jiang
Computer Vision and Pattern Recognition Conference (CVPR), 2024
project page / paper / code

In this paper, we introduce SportsSloMo, a benchmark consisting of more than 130K video clips and 1M video frames of high-resolution (≥720p) slow-motion sports videos, for human-centric video frame interpolation.

Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction
Xinhang Liu, Jiaben Chen, Shiu-Hong Kao, Yu-Wing Tai, and Chi-Keung Tang
European Conference on Computer Vision (ECCV), 2024
project page / paper

In this work, we enhance sparse-view reconstruction by leveraging a diffusion model pre-trained on multiview datasets to synthesize pseudo-observations.

RoboDreamer: Learning Compositional World Models for Robot Imagination
Siyuan Zhou, Yilun Du, Jiaben Chen, Yandong Li, Dit-Yan Yeung, and Chuang Gan
International Conference on Machine Learning (ICML), 2024
project page / paper / code

In this paper, we introduce RoboDreamer, an approach for learning a compositional world model by factorizing video generation.

UniMuMo: Unified Text, Music and Motion Generation
Han Yang, Kun Su, Yutong Zhang, Jiaben Chen, Kaizhi Qian, Gaowen Liu, and Chuang Gan
arXiv Preprint, 2024
project page / paper

In this paper, we introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities.

Revisiting Event-based Video Frame Interpolation
Jiaben Chen, Yichen Zhu, Dongze Lian, Jiaqi Yang, Yifu Wang, Renrui Zhang, Xinhang Liu, Shenhan Qian, Laurent Kneip, and Shenghua Gao
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
project page / paper / video

In this paper, we revisit event-based video frame interpolation with a proxy-guided synthesis strategy and an event-guided optical flow refinement strategy.

iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen, Renrui Zhang, Dongze Lian, Jiaqi Yang, Ziyao Zeng, and Jianbo Shi
Computer Vision and Pattern Recognition Conference (CVPR), 2023
project page / paper / arXiv / video / code

In this paper, we reformulate the visual-sound separation task and propose Instrument as Query (iQuery) with a flexible query expansion mechanism.

Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation
Xinhang Liu, Jiaben Chen, Huai Yu, Yu-Wing Tai, and Chi-Keung Tang
Neural Information Processing Systems (NeurIPS), 2022
project page / paper / code / data

In this paper, we propose radiance field propagation (RFP), a novel approach to segment objects in 3D during reconstruction given only unlabeled multi-view images of a scene.

DEVO: Visual Odometry in Challenging Conditions using a Stereo Event Depth Camera
Yi-Fan Zuo*, Jiaqi Yang*, Jiaben Chen, Xia Wang, Yifu Wang, and Laurent Kneip
International Conference on Robotics and Automation (ICRA), 2022
paper

In this paper, we propose a novel real-time visual odometry framework for a stereo setup of a high-resolution event and depth camera to deal with challenging conditions.

AutoVideo: An Automated Video Action Recognition System
Daochen Zha*, Zaid Pervaiz Bhat*, Yi-Wei Chen*, Yicheng Wang*, Sirui Ding*, Jiaben Chen*, Kwei-Herng Lai*, Mohammad Qazim Bhat*, Anmoll Kumar Jain, Alfredo Costilla Reyes, Na Zou, and Xia Hu
International Joint Conference on Artificial Intelligence (IJCAI), 2022
paper / video / code

In this paper, we present AutoVideo, a Python system for video action recognition based on Automated Machine Learning.

VECtor: A Versatile Event-Centric Benchmark for Multi-Sensor SLAM
Ling Gao*, Yuxuan Liang*, Jiaqi Yang*, Shaoxun Wu, Chenyu Wang, Jiaben Chen, and Laurent Kneip
Robotics and Automation Letters (RA-L), 2022
International Conference on Intelligent Robots and Systems (IROS), 2022
paper / benchmark

In this paper, we propose the first complete multi-sensor benchmark dataset containing an event-based stereo camera, a regular stereo camera, multiple depth sensors, and an inertial measurement unit.

Miscellanea

Conference Reviewer: ECCV 2022, IROS 2022/2023, AAAI 2024/2025, NeurIPS 2024, ICLR 2025, CVPR 2025.
Personal Interests:
  • I am a huge fan of Stephen Curry.
  • In my spare time, I enjoy playing basketball, FIFA and Valorant.

Last updated: Oct 2024


Design and source code from Jon Barron's website.