Hi, thanks for stopping by! I am a second-year Ph.D. student at the University of North Carolina at Chapel Hill, advised by Prof. Mohit Bansal. Previously, I completed my undergraduate studies at Shanghai Jiao Tong University.
While at UNC, I spent a summer at Amazon Alexa (2023). Prior to UNC, I did research at SenseTime (2021) and the MIT-IBM Watson AI Lab (2021).
I am interested in a wide range of topics in computer vision, especially video, including video+X (language, audio, robotics), video understanding, generation, reasoning, and representation learning.
News
- 2024.01: I will intern at Adobe as a Research Intern in Summer 2024.
- 2023.09: We have one paper accepted to NeurIPS 2023. Check out SeViLA for video localization + QA.
- 2023.07: We have one paper accepted to IEEE TCSVT. Check out MoPRL for skeletal anomaly detection.
- 2023.05: I will intern at Amazon as a Research Scientist Intern in Summer 2023.
- 2022.06: Graduated from Shanghai Jiao Tong University (Excellent Graduate).
- 2022.04: I will join the UNC-CH MURGe-Lab in Fall 2022.
- 2021.10: We have one paper accepted to NeurIPS 2021. Check out STAR for real-world situated reasoning.
Pre-print
CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion
Shoubin Yu*, Jaehong Yoon*, Mohit Bansal
- We present CREMA, an efficient & modular modality-fusion framework for injecting any new modality into video reasoning.
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius
- We present LLoVi, a simple yet effective LLM-based framework for long-range video question answering.
Publications
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal
- We propose SeViLA, which self-chains BLIP-2 for two-stage video question answering (localization + QA) and refines localization with QA feedback.
Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection
Shoubin Yu, Zhongyin Zhao, Haoshu Fang, Andong Deng, Haisheng Su, Dongliang Wang, Weihao Gan, Cewu Lu, Wei Wu
- We propose MoPRL, a transformer-based model that incorporates a skeletal motion prior for efficient video anomaly detection.
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan
- We propose STAR, a benchmark for neural-symbolic video reasoning in real-world scenes.
Honors and Awards
- CN Patent CN114724062A, 2022
- The Hui-Chun Chin and Tsung Dao Lee Scholar, 2020
- CN Patent CN110969107A, 2019
- Meritorious Award in Mathematical Contest in Modeling, 2019
- Second Prize in Shanghai, China Undergraduate Mathematical Contest in Modeling, 2019
Service
- Conference reviewer: CVPR 2024, ACL 2023, EACL 2023, CoNLL 2023, CVPR 2023 Workshop, AAAI 2023 Workshop
- Journal reviewer: IEEE Transactions on Circuits and Systems for Video Technology
Education
- 2022.09 - Present
- The University of North Carolina at Chapel Hill
- Computer Science, Ph.D.
- 2017.09 - 2022.06
- Shanghai Jiao Tong University
- Information Security, B.Eng.
Internships
- 2023.05 - 2023.11, Research Scientist Intern
- work with Jacob Zhiyuan Fang, Robinson Piramuthu
- 2021.01 - 2022.04, Research Intern
- work with Haisheng Su, Wei Wu