Haorui He — Homepage

About Me 👀
I am a second-year Ph.D. student in Computer Science at The University of Hong Kong (HKU). I am also a Senior Research Assistant at Hong Kong Baptist University (HKBU), supervised by Prof. Francis C.M. Lau, Prof. Reynold C.K. Cheng, and Prof. Ivan Y.P. Li.
Previously, I worked as a Research Intern under Prof. Zhizheng Wu at the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen) and the Shanghai AI Laboratory, and as a MITACS Research Intern under Prof. Zhen Ming (Jack) Jiang at York University (YorkU). I hold a B.Eng. in Software Engineering from Nanjing University of Posts and Telecommunications (NJUPT), where I received the Outstanding Bachelor’s Thesis Award.
I am the creator of Emilia, a leading dataset for expressive and spontaneous text-to-speech (TTS) synthesis, and its preprocessing pipeline, Emilia-Pipe. Emilia has surpassed 1 million downloads from over 1k institutions and companies worldwide, including Stanford, CMU, OpenAI, Google, and NVIDIA. It is the “most liked dataset” in the audio category on HuggingFace and serves as a foundational training dataset for state-of-the-art TTS models like F5-TTS, MaskGCT, IndexTTS-2, ZipVoice, as well as Speech LLMs such as Kimi-Audio, VITA-Audio, and Ming-Omni.
🔬 Research Focus
My current research interests revolve around Social Computing and Large Language Models (LLMs), where I aim to leverage LLM agents to address critical societal challenges such as misinformation and fake news.
Milestones 🎉
- 2026/01: Our paper DebateCV was accepted by the Web4good special track of WWW 2026 (95/437, Top 21.7%)! 🎉
- 2026/01: A Chinese blog post by QingKe AI about Fact2Fiction reached 100k views on Weibo! 🔥
- 2025/12: Our paper DebateCV won the Best Poster Award at ICSC 2025! 🏆
- 2025/11: Our paper Fact2Fiction was accepted for an oral presentation at AAAI 2026 (1,105/23,680, Top 4.7%)! 🎉
- 2025/10: Our paper Noro was selected as a Best Paper Finalist at APSIPA (8/581, Top 1.4%)! 🏆
- 2025/09: Our paper Emilia was accepted by TASLP after a year! 🎉
- 2025/08: Our paper Fact2Fiction about attacking fact-checking systems is now available online! 🤗
- 2025/07: Our paper DebateCV about debate-driven claim verification is now available online! 🤗
- 2025/03: Emilia has become the “most liked dataset” in the audio category on HuggingFace. 🏆
- 2024/12: Our paper SpMis was selected as a Best Paper Finalist at SLT 2024 (9/373, Top 2.5%)! 🏆
- 2024/10: Thrilled to begin my CS Ph.D. journey at HKU! A real dream come true! 🎉
- 2024/09: Within two months, Emilia received more than 50k downloads from over 700 prestigious research institutions, including Stanford, CMU, OpenAI, Google, and NVIDIA. 🔥
- 2024/08: Our papers Amphion, Emilia, and SpMis were accepted by SLT 2024! 🎉
- 2024/07: We released Emilia, the first multilingual in-the-wild dataset for speech generation with over 101k hours of speech data! We also made the Emilia-Pipe preprocessing pipeline open-source! 🔥
- 2024/02: Our paper MCFEND about robust fake news detection was accepted for an oral presentation at WWW 2024! 🎉
- 2023/12: My first experience as a core member of a large-scale open-source project, Amphion. 🤗
- 2023/09: Our paper CTSDT about stance detection in conversational threads was accepted by ICDM 2023 as a regular long paper! 🎉
- 2023/09: Our paper Branch-BERT about conversational stance detection was accepted by TCSS! 🎉
Selected Publications 📖
Social Computing 🗞️
[2]
Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents
Haorui He, Yupeng Li, Dacheng Wen, Chen Yang, Reynold Cheng, Donglong Chen, and Francis C. M. Lau.
WWW 2026 (Web4good Special Track, Top 21.7%, ICSC 2025 Best Poster Award)
[3]
MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection
Yupeng Li, Haorui He†, Jin Bai, and Dacheng Wen.
WWW 2024 (Oral Presentation, Top 9.4%)
[4]
Contextual Target-Specific Stance Detection on Twitter: New Dataset and Method
Yupeng Li, Dacheng Wen, Haorui He, Jianxiong Guo, Xuan Ning, and Francis C. M. Lau.
ICDM 2023 (Regular Long Paper, Top 9.4%)
[5]
Improved Target-specific Stance Detection on Social Media Platform by Delving into Conversation Threads
Yupeng Li, Haorui He†, Shaonan Wang, Francis C.M. Lau, and Yunya Song.
IEEE Transactions on Computational Social Systems (TCSS) 2023
Speech Processing 🎤
[6]
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Speech Dataset for Speech Generation
Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, and Zhizheng Wu.
IEEE Transactions on Audio, Speech and Language Processing (TASLP) 2025
[7]
Over-the-Air Adversarial Attacks and Detection for Automatic Speaker Verification
Li Wang, Xiaoyan Lei, Haorui He, Lei Wang, Jie Shi, and Zhizheng Wu.
IEEE Transactions on Audio, Speech and Language Processing (TASLP) 2025
[8]
Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
Haorui He, Yuchen Song, Yuancheng Wang, Haoyang Li, Xueyao Zhang, Li Wang, Gongping Huang, Eng Siong Chng, and Zhizheng Wu.
APSIPA 2025 (Best Paper Finalist, Top 1.4%)
[9]
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, and Zhizheng Wu.
SLT 2024
[10]
SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Peizhuo Liu, Li Wang, Renqiang He, Haorui He, Lei Wang, Huadi Zheng, Jie Shi, Tong Xiao, and Zhizheng Wu.
SLT 2024 (Best Paper Finalist, Top 2.5%)
Research Experience 💼
Aug. 2023 -- Aug. 2024: Research Intern, supervised by Prof. Zhizheng Wu at CUHK-Shenzhen & Shanghai AI Laboratory.
Teaching 📚
- Teaching Assistant: FITE7409, Blockchain and Cryptocurrency, Postgraduate Course, School of Computing and Data Science, HKU. (Course Instructor: Prof. Tsz Hon Yuen)
Awards 🏆
- AAAI Student Scholarship, 2026 (USD 1,000)
- AAAI Oral Presentation, 2026 (1,105/23,680, Top 4.7%)
- ICSC Best Poster Award, 2025 (Top 2)
- APSIPA Best Paper Finalist, 2025 (8/581, Top 1.4%)
- SLT Best Paper Finalist, 2024 (9/373, Top 2.5%)
- WWW Oral Presentation, 2024 (188/2008, Top 9.4%)
- Outstanding Bachelor’s Thesis Award, 2023 (2/162, Top 1.2%)
- MITACS Globalink Research Internship Scholarship, 2022 (Top 200 in China)
Services 🌻
- Membership: IEEE Graduate Student Member, ACM Member, AAAI Student Member
- Invited Reviewer: WWW 2026, AAAI 2026, ARR (ACL/EMNLP/NAACL) 2025, IJCAI 2025-2026, ICASSP 2025-2026, ICME 2025, MM 2023-2024, IEEE Journal of Social Computing, etc.
- Student Volunteer: SLT 2024, ICDE 2025, AAAI 2026
"Facts do not cease to exist because they are ignored." -- Aldous Huxley
Last updated: Jan. 2026