🎓 About Me
I am Shidong Cao, a Ph.D. student in HKBUNLP at Hong Kong Baptist University, advised by Prof. Jing Ma. My research interests include multimodal learning, large language models, computer vision, and generative AI.
I have also worked closely with Hongzhan Lin and Wenhao Chai, and have learned a lot from our research discussions.
Education
- 2025.09 - Present, Ph.D. student, HKBUNLP, Hong Kong Baptist University. Advisor: Prof. Jing Ma.
- 2022.09 - 2025.03, M.S., CVNext Lab, Zhejiang University. Advisor: Prof. Gaoang Wang.
- 2018.09 - 2022.06, B.Eng., School of Computer Science, Beijing University of Posts and Telecommunications.
🔥 News
- 2026.02: DiffCoT was accepted to Findings of ACL 2026.
- 2025.10: Our survey of deep learning in sports applications appeared in TVCG 2025.
- 2025.04: MTransLLAMA was published in TMM 2025.
- 2024.09: STEVE appeared at ECCV 2024.
- 2024.02: UniAP appeared at AAAI 2024.
- 2024.02: DiffFashion was published in TMM 2024.
📝 Publications

DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs
Shidong Cao, Hongzhan Lin, Yuxuan Gu, Ziyang Luo, Jing Ma
Published in Findings of the Association for Computational Linguistics: ACL, 2026
[Paper] [Code]
- A diffusion-styled framework for improving chain-of-thought reasoning in large language models.

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision
Zhonghan Zhao, Wenhao Chai, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Mingli Song, Jenq-Neng Hwang, Gaoang Wang
Published in IEEE Transactions on Visualization and Computer Graphics, 2025
[Paper]
- A survey of deep learning applications in sports across perception, comprehension, and decision-making.

Efficient Transfer From Image-Based Large Multimodal Models to Video Tasks
Shidong Cao, Zhonghan Zhao, Shengyu Hao, Wenhao Chai, Jenq-Neng Hwang, Hongwei Wang, Gaoang Wang
Published in IEEE Transactions on Multimedia, 2025
[Paper]
- An efficient transfer approach for adapting image-based large multimodal models to fine-grained video tasks.

See and Think: Embodied Agent in Virtual Environment
Zhonghan Zhao, Wenhao Chai, Xuan Wang, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Gaoang Wang
Published in European Conference on Computer Vision, 2024
[Paper] [Code] [Project]
- An embodied agent framework that combines visual perception, language instruction, and executable code actions in Minecraft.

DiffFashion: Reference-Based Fashion Design With Structure-Aware Transfer by Diffusion Models
Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang
Published in IEEE Transactions on Multimedia, 2024
[Paper] [Code]
- A diffusion model-based method for reference-based fashion design with structure-aware transfer.

UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning
Meiqi Sun, Zhonghan Zhao, Wenhao Chai, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang
Published in AAAI Conference on Artificial Intelligence, 2024
[Paper] [Code] [Project]
- A universal animal perception model for few-shot pose estimation, segmentation, and classification.

AI Assisted Fashion Design: A Review
Ziyue Guo, Zongyang Zhu, Yizhi Li, Shidong Cao, Hangyue Chen, Gaoang Wang
Published in IEEE Access, 2023
[Paper]
- A review of artificial intelligence techniques for fashion detection, synthesis, and recommendation.

Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models
Shidong Cao, Wenhao Chai, Shengyu Hao, Gaoang Wang
Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023
[Paper] [Code]
- An unsupervised structure-aware diffusion framework for reference-guided fashion design.
Reviewers
I have served as a reviewer for CVIU, ICASSP, NeurIPS, TNNLS, and TMM.