🎓 About Me

I am Shidong Cao, a Ph.D. student at HKBUNLP, Hong Kong Baptist University, advised by Prof. Jing Ma. My research interests include multimodal learning, large language models, computer vision, and generative AI.

I have also worked closely with Hongzhan Lin and Wenhao Chai, and have learned a lot from our research discussions.

Education

  • 2025.09 - Present, Ph.D. student, HKBUNLP, Hong Kong Baptist University. Advisor: Prof. Jing Ma.
  • 2022.09 - 2025.03, M.S., CVNext Lab, Zhejiang University. Advisor: Prof. Gaoang Wang.
  • 2018.09 - 2022.06, B.Eng., School of Computer Science, Beijing University of Posts and Telecommunications.

🔥 News

  • 2026.02: DiffCoT was accepted to Findings of ACL 2026.
  • 2025.10: The sports deep learning survey appeared in TVCG 2025.
  • 2025.04: MTransLLAMA was published in TMM 2025.
  • 2024.09: STEVE appeared at ECCV 2024.
  • 2024.02: UniAP appeared at AAAI 2024.
  • 2024.02: DiffFashion was published in TMM 2024.

📝 Publications

ACL Findings 2026
DiffCoT

DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

Shidong Cao, Hongzhan Lin, Yuxuan Gu, Ziyang Luo, Jing Ma

Published in Findings of the Association for Computational Linguistics: ACL, 2026
[Paper] [Code]

  • A diffusion-styled framework for improving chain-of-thought reasoning in large language models.

TVCG 2025
Sports survey

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

Zhonghan Zhao, Wenhao Chai, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

Published in IEEE Transactions on Visualization and Computer Graphics, 2025
[Paper]

  • A survey of deep learning applications in sports across perception, comprehension, and decision-making.

TMM 2025
MTransLLAMA

Efficient Transfer From Image-Based Large Multimodal Models to Video Tasks

Shidong Cao, Zhonghan Zhao, Shengyu Hao, Wenhao Chai, Jenq-Neng Hwang, Hongwei Wang, Gaoang Wang

Published in IEEE Transactions on Multimedia, 2025
[Paper]

  • An efficient transfer approach for adapting image-based large multimodal models to fine-grained video tasks.

ECCV 2024
STEVE

See and Think: Embodied Agent in Virtual Environment

Zhonghan Zhao, Wenhao Chai, Xuan Wang, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Gaoang Wang

Published in European Conference on Computer Vision, 2024
[Paper] [Code] [Project]

  • An embodied agent framework that combines visual perception, language instruction, and executable code actions in Minecraft.

TMM 2024
DiffFashion TMM

DiffFashion: Reference-Based Fashion Design With Structure-Aware Transfer by Diffusion Models

Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang

Published in IEEE Transactions on Multimedia, 2024
[Paper] [Code]

  • A diffusion model-based method for reference-based fashion design with structure-aware transfer.

AAAI 2024
UniAP

UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning

Meiqi Sun, Zhonghan Zhao, Wenhao Chai, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang

Published in AAAI Conference on Artificial Intelligence, 2024
[Paper] [Code] [Project]

  • A universal animal perception model for few-shot pose estimation, segmentation, and classification.

IEEE Access 2023
AI Assisted Fashion Design

AI Assisted Fashion Design: A Review

Ziyue Guo, Zongyang Zhu, Yizhi Li, Shidong Cao, Hangyue Chen, Gaoang Wang

Published in IEEE Access, 2023
[Paper]

  • A review of artificial intelligence techniques for fashion detection, synthesis, and recommendation.

CVPRW 2023
DiffFashion CVPRW

Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models

Shidong Cao, Wenhao Chai, Shengyu Hao, Gaoang Wang

Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023
[Paper] [Code]

  • An unsupervised structure-aware diffusion framework for reference-guided fashion design.

Reviewer Service

I have served as a reviewer for CVIU, ICASSP, NeurIPS, TNNLS, and TMM.