Hi, thanks for stopping by! I am a second-year Ph.D. student at The University of North Carolina at Chapel Hill, advised by Prof. Mohit Bansal. Previously, I completed my undergraduate studies at Shanghai Jiao Tong University.
While at UNC, I have spent my summers at Adobe Research (2024) and Amazon Alexa (2023). Prior to UNC, I worked on research projects at SenseTime Research (2021) and with the MIT-IBM Watson AI Lab (2021).
I am interested in a wide range of topics in computer vision, especially video, including video+X (language, audio, robotics) understanding & generation, trustworthy video reasoning, and robust video representation learning.
Find me here: shoubin -atsign- cs . unc . edu
News
- 2024.07: One paper accepted to ACM MM 2024. Check IVA-0 for controllable image animation.
- 2024.06: Gave an invited talk at Google.
- 2024.05: Started a summer internship at Adobe as a Research Scientist.
- 2023.09: One paper accepted to NeurIPS 2023. Check SeViLA for Video Loc+QA.
- 2023.07: One paper accepted to IEEE TCSVT. Check MoPRL for skeletal anomaly detection.
- 2023.05: Started a summer internship at Amazon as a Research Scientist.
- 2022.09: Joined UNC-CH MURGe-Lab.
- 2022.06: Graduated from Shanghai Jiao Tong University (Excellent Graduate).
- 2021.10: One paper accepted to NeurIPS 2021. Check STAR for real-world situated reasoning.
Preprints (*: equal contribution/co-first author)
![sym](images/videotree.jpg)
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang*, Shoubin Yu*, Elias Stengel-Eskin*, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal
- We present VideoTree, an adaptive tree-based video representation/prompting method that uses simple visual clustering for long-video reasoning with LLMs.
![sym](images/raccoon.jpg)
RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives
Jaehong Yoon*, Shoubin Yu*, Mohit Bansal
- We present RACCooN, a versatile and user-friendly video-to-paragraph-to-video framework that enables users to remove, add, or change video content by updating auto-generated narratives.
![sym](images/crema.jpg)
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu*, Jaehong Yoon*, Mohit Bansal
- We present CREMA, an efficient & modular modality-fusion framework for injecting any new modality into video reasoning.
![sym](images/llovi.jpg)
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius
- We present LLoVi, a simple yet effective LLM-based framework for long-range video question-answering.
Publications
![sym](images/iva0.jpg)
Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition
Shoubin Yu, Jacob Zhiyuan Fang, Skyler Zheng, Gunnar Sigurdsson, Vicente Ordonez, Robinson Piramuthu, Mohit Bansal
- We present IVA-0, an image-to-video animator that enables precise user control through in-place and out-of-place motion decomposition.
![sym](images/sevila.jpg)
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal
- We propose SeViLA, which self-chains BLIP-2 for two-stage video question-answering (localization + QA) and refines localization with QA feedback.
![sym](images/moprl.jpg)
Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection
Shoubin Yu, Zhongyin Zhao, Haoshu Fang, Andong Deng, Haisheng Su, Dongliang Wang, Weihao Gan, Cewu Lu, Wei Wu
- We propose MoPRL, a transformer-based model that incorporates a skeletal motion prior for efficient video anomaly detection.
![sym](images/star.jpg)
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan
- We propose STAR, a benchmark for neural-symbolic video reasoning in real-world scenes.
Honors and Awards
- CN Patent CN114724062A, 2022
- The Hui-Chun Chin and Tsung Dao Lee Scholar, 2020
- CN Patent CN110969107A, 2019
- Meritorious Award in Mathematical Contest in Modeling, 2019
- Second Prize in Shanghai, China Undergraduate Mathematical Contest in Modeling, 2019
Service
- Conference reviewer: CVPR, ECCV, NeurIPS, ACL, EACL, CoNLL, AAAI
- Journal reviewer: IEEE Transactions on Circuits and Systems for Video Technology
Education
![sym](images/unc_logo.png)
- 2022.09 - Present
- The University of North Carolina at Chapel Hill
- Computer Science, Ph.D.
![sym](images/sjtu_logo.png)
- 2017.09 - 2022.06
- Shanghai Jiao Tong University
- Information Security, B.Eng.
Internships
![sym](images/amazon.png)
- 2023.05 - 2023.11, Research Scientist Intern
- Worked with Jacob Zhiyuan Fang, Robinson Piramuthu
![sym](images/sensetime.png)
- 2021.01 - 2022.04, Research Intern
- Worked with Haisheng Su, Wei Wu