Haorui He
(何昊睿, Harry)

CS Ph.D. student at HKU

Haorui He’s Homepage

Email | GitHub | X | Google Scholar


About Me 👀

I am a second-year Ph.D. student in Computer Science at The University of Hong Kong (HKU) and a Senior Research Assistant at Hong Kong Baptist University (HKBU), jointly supervised by Prof. Francis C.M. Lau, Prof. Reynold C.K. Cheng, and Prof. Ivan Y.P. Li. Previously, I worked as a Research Assistant under Prof. Zhizheng Wu at the Chinese University of Hong Kong (CUHK), Shenzhen and the Shanghai AI Laboratory, and as a Mitacs Research Intern under Prof. Zhen Ming (Jack) Jiang at York University (YorkU). I hold a B.Eng. in Software Engineering from Nanjing University of Posts and Telecommunications (NJUPT), where I received the Outstanding Bachelor’s Thesis Award.

I am the creator of Emilia, a leading dataset for expressive and spontaneous text-to-speech (TTS) synthesis, and its preprocessing pipeline, Emilia-Pipe. Emilia has been downloaded more than 1 million times by over 1,000 institutions and companies worldwide, including Stanford, CMU, OpenAI, Google, and NVIDIA. It is the “most liked dataset” in the audio category on HuggingFace and serves as a foundational training dataset for state-of-the-art TTS models such as F5-TTS, MaskGCT, IndexTTS-2, and ZipVoice, as well as Speech LLMs such as Kimi-Audio, VITA-Audio, and Ming-Omni.

🔬 Research Focus

My current research focuses on Social Computing and Large Language Models (LLMs). I aim to use LLM agents to address critical societal challenges such as misinformation and fake news.

Milestones 🎉


Selected Publications 📖

* denotes equal contribution; † denotes corresponding author.
📚 Full Publication List

Social Computing 🗞️

[1] Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking Systems
Haorui He, Yupeng Li, Bin Benjamin Zhu, Dacheng Wen, Reynold Cheng, and Francis C. M. Lau.
AAAI 2026 (Oral Presentation, Top 4.67%) GitHub arXiv Website RedNote
[2] Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents
Haorui He, Yupeng Li, Dacheng Wen, Chen Yang, Reynold Cheng, Donglong Chen, and Francis C. M. Lau.
In Submission arXiv
[3] LightUL: An Efficient Recommendation Unlearning Framework
Wentao Ning, Haorui He, Reynold Cheng, Nur Al Hasan Haldar, Ben Kao, Nan Huo, Bo Tang, and Yupeng Li.
Guide-AI @ VLDB 2025 GitHub Paper
[4] MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection
Yupeng Li, Haorui He, Jin Bai, and Dacheng Wen.
WWW 2024 (Oral Presentation, Top 9.40%) GitHub Paper Website
[5] Contextual Target-Specific Stance Detection on Twitter: New Dataset and Method
Yupeng Li, Dacheng Wen, Haorui He, and Francis C. M. Lau.
ICDM 2023 (Regular Long Paper, Top 9.37%) GitHub Paper
[6] Improved Target-specific Stance Detection on Social Media Platform by Delving into Conversation Threads
Yupeng Li, Haorui He, Shaonan Wang, Francis C.M. Lau, and Yunya Song.
IEEE Transactions on Computational Social Systems (TCSS) 2023 GitHub Paper

Speech Processing 🎤

[7] Emilia: A Large-Scale Extensive, Multilingual, and Diverse Speech Dataset for Speech Generation
Haorui He*, Zengqiang Shang*, Chaoren Wang*, Xuyuan Li*, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, and Zhizheng Wu.
IEEE Transactions on Audio, Speech and Language Processing (TASLP) 2025 GitHub Paper HuggingFace Demo Tweet
[8] Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang*, Liumeng Xue*, Yicheng Gu*, Yuancheng Wang*, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, and Zhizheng Wu.
SLT 2024 GitHub Paper HuggingFace Tweet
[9] SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Peizhuo Liu, Li Wang, Renqiang He, Haorui He, Lei Wang, Huadi Zheng, Jie Shi, Tong Xiao, and Zhizheng Wu.
SLT 2024 (Best Paper Finalist, Top 2.5%) Paper
[10] Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
Haorui He, Yuchen Song, Yuancheng Wang, Haoyang Li, Xueyao Zhang, Li Wang, Gongping Huang, Eng Siong Chng, and Zhizheng Wu.
APSIPA ASC 2025 (Best Paper Finalist, Top 1.4%) arXiv

Previous Experience 💼

CUHK-Shenzhen & Shanghai AI Laboratory

Aug. 2023 -- Aug. 2024: Research Intern, supervised by Prof. Zhizheng Wu at CUHK-Shenzhen & Shanghai AI Laboratory.

York University

Sep. 2022 -- Dec. 2022: Mitacs-CSC Joint Globalink Research Intern, supervised by Prof. Zhen Ming (Jack) Jiang at YorkU. (Thanks to CSC and Mitacs for kindly sponsoring my internship.)

The University of Hong Kong & Hong Kong Baptist University

June 2021 -- Aug. 2023: Research Intern, supervised by Prof. Francis C.M. Lau at HKU & Prof. Ivan Y.P. Li at HKBU.

Nanjing University of Posts and Telecommunications

Sep. 2019 -- June 2023: Bachelor of Engineering in Software Engineering (GPA: 90.07/100), NJUPT.


Teaching 📚


Awards 🏆


Services 🌻


"Facts do not cease to exist because they are ignored." -- Aldous Huxley

Last updated: Nov. 2025