Haorui He — Homepage

About Me 👀

I am a Ph.D. candidate in Computer Science at The University of Hong Kong (HKU) and a Senior Research Assistant at Hong Kong Baptist University (HKBU), supervised by Prof. Francis C.M. Lau, Prof. Reynold C.K. Cheng, and Prof. Ivan Y.P. Li. Previously, I worked as a Research Intern under Prof. Zhizheng Wu at The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen) and the Shanghai AI Laboratory, and as a MITACS Research Intern under Prof. Zhen Ming (Jack) Jiang at York University (YorkU). I hold a B.Eng. in Software Engineering from Nanjing University of Posts and Telecommunications (NJUPT), where I received the Outstanding Bachelor’s Thesis Award.

I am the creator of Emilia, a leading dataset for expressive and spontaneous text-to-speech (TTS) synthesis, and of its preprocessing pipeline, Emilia-Pipe. Emilia has surpassed 1 million downloads from over 1k institutions and companies worldwide, including Stanford, CMU, OpenAI, Google, and NVIDIA. It is the “most liked dataset” in the audio category on Hugging Face and serves as a foundational training dataset for state-of-the-art TTS models such as F5-TTS, MaskGCT, IndexTTS-2, and ZipVoice, and for speech LLMs including Kimi-Audio, VITA-Audio, and Ming-Omni.

🔬 Research Focus

My current research lies at the intersection of Social Computing and Large Language Models (LLMs): I aim to leverage LLM agents to address critical societal challenges such as misinformation and fake news.

Milestones 🎉

2026/03: Our AI-powered fact-checking-as-a-service project, for which I serve as Tech Lead, won the International Press Prize (Special Award, Top 1) at the Salon International des Inventions de Genève! See the official results and the HKSAR Government press release. 🏆
2026/01: Our paper DebateCV was accepted by the Web4Good special track of WWW 2026 (95/437, Top 21.7%)! 🎉
2026/01: A Chinese blog post by QingKe AI about Fact2Fiction reached 100k views on Weibo! 🔥
2025/12: Our paper DebateCV won the Best Poster Award at ICSC 2025! 🏆
2025/11: Our paper Fact2Fiction was accepted for an oral presentation at AAAI 2026 (1,105/23,680, Top 4.7%)! 🎉
2025/10: Our paper Noro was selected as a Best Paper Finalist at APSIPA (8/581, Top 1.4%)! 🏆
2025/09: Our paper Emilia was accepted by TASLP after a year! 🎉
2025/08: Our paper Fact2Fiction about attacking fact-checking systems is now available online! 🤗
2025/07: Our paper DebateCV about debate-driven claim verification is now available online! 🤗
2025/03: Emilia has become the “most liked dataset” in the audio category on Hugging Face. 🏆
2024/12: Our paper SpMis was selected as a Best Paper Finalist at SLT 2024 (9/373, Top 2.5%)! 🏆
2024/10: Thrilled to begin my CS Ph.D. journey at HKU! A real dream come true! 🎉
2024/09: Within two months, Emilia received more than 50k downloads from over 700 research institutions and companies, including Stanford, CMU, OpenAI, Google, and NVIDIA. 🔥
2024/08: Our papers Amphion, Emilia, and SpMis were accepted by SLT 2024! 🎉
2024/07: We released Emilia, the first multilingual in-the-wild dataset for speech generation with over 101k hours of speech data! We also made the Emilia-Pipe preprocessing pipeline open-source! 🔥
2024/02: Our paper MCFEND about robust fake news detection was accepted for an oral presentation at WWW 2024! 🎉
2023/12: My first experience as a core member of a large-scale open-source project: Amphion. 🤗
2023/09: Our paper CTSDT about stance detection in conversational threads was accepted by ICDM 2023 as a regular long paper! 🎉
2023/09: Our paper Branch-BERT about conversational stance detection was accepted by TCSS! 🎉

Selected Publications 📖

† denotes corresponding author.

📚 Full Publication List

[1] Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking Systems

Haorui He, Yupeng Li, Bin Benjamin Zhu, Dacheng Wen, Reynold Cheng, and Francis C. M. Lau.

AAAI 2026 (Oral Presentation, Top 4.7%)

[2] Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents

Haorui He, Yupeng Li, Dacheng Wen, Chen Yang, Reynold Cheng, Donglong Chen, and Francis C. M. Lau.

WWW 2026 (Web4Good Special Track, Top 21.7%, ICSC 2025 Best Poster Award)

[3] MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection

Yupeng Li, Haorui He†, Jin Bai, and Dacheng Wen.

WWW 2024 (Oral Presentation, Top 9.4%)

[4] Contextual Target-Specific Stance Detection on Twitter: New Dataset and Method

Yupeng Li, Dacheng Wen, Haorui He, Jianxiong Guo, Xuan Ning, and Francis C. M. Lau.

ICDM 2023 (Regular Long Paper, Top 9.4%)

[5] Improved Target-specific Stance Detection on Social Media Platform by Delving into Conversation Threads

Yupeng Li, Haorui He†, Shaonan Wang, Francis C.M. Lau, and Yunya Song.

IEEE Transactions on Computational Social Systems (TCSS) 2023

Speech Processing 🎤

[6] Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Speech Dataset for Speech Generation

Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, and Zhizheng Wu.

IEEE Transactions on Audio, Speech and Language Processing (TASLP) 2025

[7] Over-the-Air Adversarial Attacks and Detection for Automatic Speaker Verification

Li Wang, Xiaoyan Lei, Haorui He, Lei Wang, Jie Shi, and Zhizheng Wu.

IEEE Transactions on Audio, Speech and Language Processing (TASLP) 2025

[8] Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning

Haorui He, Yuchen Song, Yuancheng Wang, Haoyang Li, Xueyao Zhang, Li Wang, Gongping Huang, Eng Siong Chng, and Zhizheng Wu.

APSIPA 2025 (Best Paper Finalist, Top 1.4%)

[9] Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, and Zhizheng Wu.

SLT 2024

[10] SpMis: An Investigation of Synthetic Spoken Misinformation Detection

Peizhuo Liu, Li Wang, Renqiang He, Haorui He, Lei Wang, Huadi Zheng, Jie Shi, Tong Xiao, and Zhizheng Wu.

SLT 2024 (Best Paper Finalist, Top 2.5%)

Research Experience 💼

Aug. 2023 -- Aug. 2024: Research Intern, supervised by Prof. Zhizheng Wu at CUHK-Shenzhen & Shanghai AI Laboratory.

Sep. 2022 -- Dec. 2022: Research Intern, supervised by Prof. Zhen Ming (Jack) Jiang at YorkU.

June 2021 -- Aug. 2023: Research Intern, supervised by Prof. Francis C.M. Lau at HKU & Prof. Ivan Y.P. Li at HKBU.

Teaching 📚

Teaching Assistant: FITE7409, Blockchain and Cryptocurrency, Postgraduate Course, School of Computing and Data Science, HKU. (Course Instructor: Prof. Tsz Hon Yuen)

Awards 🏆

Salon International des Inventions de Genève - International Press Prize, 2026 (Special Award, Top 1)
AAAI - Student Scholarship, 2026 (US$1,000)
AAAI - Oral Presentation, 2026 (1,105/23,680, Top 4.7%)
ICSC - Best Poster Award, 2025 (Top 2)
APSIPA - Best Paper Finalist, 2025 (8/581, Top 1.4%)
SLT - Best Paper Finalist, 2024 (9/373, Top 2.5%)
WWW - Oral Presentation, 2024 (188/2008, Top 9.4%)
NJUPT - Outstanding Bachelor’s Thesis Award, 2023 (2/162, Top 1.2%)
MITACS - Globalink Research Internship Scholarship, 2022 (Top 200 in China)

Services 🌻

Membership: IEEE Graduate Student Member, ACM Member, AAAI Student Member
Invited Reviewer: WWW 2026, AAAI 2026, ARR (ACL/EMNLP/NAACL) 2025, IJCAI 2025-2026, ICASSP 2025-2026, ICME 2025, MM 2023-2024, IEEE Journal of Social Computing, etc.
Student Volunteer: SLT 2024, ICDE 2025, AAAI 2026

"Facts do not cease to exist because they are ignored." -- Aldous Huxley

Last updated: Mar. 2026