Publications
* indicates equal contributions.
|
|
RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text
Jiaben Chen,
Xin Yan,
Yihang Chen,
Siyuan Cen,
Qinwei Ma,
Haoyu Zhen,
Kaizhi Qian,
Lie Lu,
and Chuang Gan
arXiv Preprint, 2024
project page /
paper /
code
In this paper, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs. To facilitate this, we first collect the RapVerse dataset, a large dataset containing synchronous rapping vocals, lyrics, and high-quality 3D holistic body meshes.
|
|
SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation
Jiaben Chen, and Huaizu Jiang
Computer Vision and Pattern Recognition Conference (CVPR), 2024
project page /
paper /
code
In this paper, we introduce SportsSloMo, a benchmark consisting of more than 130K video clips and 1M video frames of high-resolution (≥720p) slow-motion sports videos, for human-centric video frame interpolation.
|
|
Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction
Xinhang Liu,
Jiaben Chen,
Shiu-Hong Kao,
Yu-Wing Tai,
and Chi-Keung Tang
European Conference on Computer Vision (ECCV), 2024
project page /
paper /
In this work, we enhance sparse-view reconstruction by leveraging a diffusion model pre-trained from multiview datasets to synthesize pseudo-observations.
|
|
RoboDreamer: Learning Compositional World Models for Robot Imagination
Siyuan Zhou,
Yilun Du,
Jiaben Chen,
Yandong Li,
Dit-Yan Yeung,
and Chuang Gan
International Conference on Machine Learning (ICML), 2024
project page /
paper /
code
In this paper, we introduce RoboDreamer, an innovative approach for learning a compositional world model by factorizing the video generation.
|
|
UniMuMo: Unified Text, Music and Motion Generation
Han Yang,
Kun Su,
Yutong Zhang,
Jiaben Chen,
Kaizhi Qian,
Gaowen Liu,
and Chuang Gan
Annual AAAI Conference on Artificial Intelligence (AAAI), 2025
project page /
paper /
In this paper, we introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities.
|
|
Revisiting Event-based Video Frame Interpolation
Jiaben Chen,
Yichen Zhu,
Dongze Lian,
Jiaqi Yang,
Yifu Wang,
Renrui Zhang,
Xinhang Liu,
Shenhan Qian,
Laurent Kneip,
and Shenghua Gao
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
project page /
paper /
video
In this paper, we revist event-based video frame interpolation with a proxy-guided synthesis strategy and a event-guided optical flow refinement strategy.
|
|
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen,
Renrui Zhang,
Dongze Lian,
Jiaqi Yang,
Ziyao Zeng,
and Jianbo Shi
Computer Vision and Pattern Recognition Conference (CVPR), 2023
project page /
paper /
arXiv /
video /
code
In this paper, we re-formulate visual-sound separation task and propose Instrument as Query (iQuery) with a flexible query expansion mechanism.
|
|
Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation
Xinhang Liu,
Jiaben Chen,
Huai Yu,
Yu-Wing Tai,
and Chi-Keung Tang
Neural Information Processing Systems (NeurIPS), 2022
project page /
paper /
code /
data
In this paper, we propose radiance field propagation (RFP), a novel approach to segment objects in 3D during reconstruction given only unlabeled multi-view images of a scene.
|
|
DEVO: Visual Odometry in Challenging Conditions using a Stereo Event Depth Camera
Yi-Fan Zuo*,
Jiaqi Yang*,
Jiaben Chen,
Xia Wang,
Yifu Wang,
and Laurent Kneip
International Conference on Robotics and Automation (ICRA), 2022
paper
In this paper, we proposed a novel real-time visual odometry framework for a stereo setup of a high-resolution event and depth camera to deal with challenging conditions.
|
|
AutoVideo: An Automated Video Action Recognition System
Daochen Zha*,
Zaid Pervaiz Bhat*,
Yi-Wei Chen*,
Yicheng Wang*,
Sirui Ding*,
Jiaben Chen*,
Kwei-Herng Lai*,
Mohammad Qazim Bhat*,
Anmoll Kumar Jain,
Alfredo Costilla Reyes,
Na Zou,
and Xia Hu
International Joint Conference on Artificial Intelligence (IJCAI), 2022
paper /
video /
code
In this paper, we presented AutoVideo, a Python system for video action recognition based on Automated Machine Learning.
|
|
VECtor: A Versatile Event-Centric Benchmark for Multi-Sensor SLAM
Ling Gao*,
Yuxuan Liang*,
Jiaqi Yang*,
Shaoxun Wu,
Chenyu Wang,
Jiaben Chen,
and Laurent Kneip
Robotics and Automation Letters (RA-L), 2022
International Conference on Intelligent Robots and Systems (IROS), 2022
paper /
benchmark
In this paper, we proposed the first complete multi-sensor benchmark dataset containing an event-based stereo camera, a regular stereo camera, multiple depth sensors, and an inertial measurement unit.
|
Miscellanea
|
Conference Reviewer: ECCV 2022, IROS 2022/2023, AAAI 2024/2025, NeurIPS 2024, ICLR 2025, CVPR 2025.
|
Personal Interests:
- I am a huge fan of Stephen Curry.
- In my spare time, I enjoy playing basketball, FIFA and Valorant.
|
|