Haorui He
(何昊睿, Harry)

CS Ph.D. student at HKU

Haorui He’s Homepage


About Me 👀

I am a second-year Ph.D. student in Computer Science at The University of Hong Kong (HKU) and a Research Assistant at Hong Kong Baptist University (HKBU), jointly supervised by Prof. Francis C.M. Lau, Prof. Reynold C.K. Cheng, and Prof. Yupeng Li. Previously, I worked as a Research Assistant under Prof. Zhizheng Wu at the Chinese University of Hong Kong (CUHK), Shenzhen and the Shanghai AI Laboratory, and as a Mitacs Research Intern under Prof. Zhen Ming (Jack) Jiang at York University (YorkU). I hold a B.Eng. in Software Engineering from Nanjing University of Posts and Telecommunications (NJUPT), where I received the Outstanding Bachelor’s Thesis Award.

I am the creator of Emilia, a leading dataset for expressive and spontaneous text-to-speech (TTS) synthesis, and its preprocessing pipeline, Emilia-Pipe. As of May 2025, Emilia has been downloaded more than 500,000 times by over 1,000 institutions and companies, including Stanford, CMU, OpenAI, Google, and NVIDIA. It is the “most liked dataset” in the audio category on HuggingFace and serves as a foundational training dataset for state-of-the-art TTS models such as F5-TTS, MaskGCT, IndexTTS-2, and ZipVoice, as well as audio language models such as Kimi-Audio, VITA-Audio, and Ming-Omni.

🔬 Research Focus

My current research interests revolve around Social Computing and Large Language Models (LLMs), where I aim to leverage LLM agents to address critical societal challenges such as misinformation and fake news.

Milestones 🎉


Selected Publications 📖

* denotes equal contribution, † denotes corresponding author.
📚 Full Publication List

Social Computing 🗞️

[1] Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System
Haorui He, Yupeng Li, Bin Benjamin Zhu, Dacheng Wen, Reynold Cheng, and Francis C. M. Lau.
In Submission arXiv
[2] Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents
Haorui He, Yupeng Li, Dacheng Wen, Reynold Cheng, and Francis C. M. Lau.
In Submission arXiv
[3] LightUL: An Efficient Recommendation Unlearning Framework
Wentao Ning, Haorui He, Reynold Cheng, Nur Al Hasan Haldar, Ben Kao, Nan Huo, Bo Tang, and Yupeng Li.
Guide-AI Workshop @ VLDB 2025 GitHub Paper
[4] MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection
Yupeng Li, Haorui He, Jin Bai, and Dacheng Wen.
ACM WWW 2024 (Oral Presentation, Top 9.4%) GitHub Paper Website
[5] Contextual Target-Specific Stance Detection on Twitter: New Dataset and Method
Yupeng Li, Dacheng Wen, Haorui He, and Francis C. M. Lau.
IEEE ICDM 2023 (Regular Long Paper, Top 9.37%) GitHub Paper
[6] Improved Target-specific Stance Detection on Social Media Platform by Delving into Conversation Threads
Yupeng Li, Haorui He, Shaonan Wang, Francis C.M. Lau, and Yunya Song.
IEEE TCSS 2023 GitHub Paper

Speech Processing 🎤

[7] Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Speech Dataset for Speech Generation
Haorui He*, Zengqiang Shang*, Chaoren Wang*, Xuyuan Li*, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, and Zhizheng Wu.
IEEE SLT 2024 GitHub Paper HuggingFace Demo Tweet
Extended Version (In Submission) arXiv Tweet
[8] Overview of the Amphion Toolkit (v0.2)
Jiaqi Li*, Xueyao Zhang*, Yuancheng Wang*, Haorui He*, Chaoren Wang*, Li Wang*, Huan Liao*, Junyi Ao*, Zeyu Xie*, Yiqiao Huang*, Junan Zhang*, and Zhizheng Wu.
Technical Report arXiv Website
[9] Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang*, Liumeng Xue*, Yicheng Gu*, Yuancheng Wang*, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, and Zhizheng Wu.
IEEE SLT 2024 GitHub Paper HuggingFace Tweet
[10] SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Peizhuo Liu, Li Wang, Renqiang He, Haorui He, Lei Wang, Huadi Zheng, Jie Shi, Tong Xiao, and Zhizheng Wu.
IEEE SLT 2024 (Best Paper Finalist, Top 2.5%) Paper

Previous Experience 💼

CUHK-Shenzhen Shanghai AI Laboratory

Aug. 2023 -- Aug. 2024: Research Intern, supervised by Prof. Zhizheng Wu at CUHK-Shenzhen & Shanghai AI Laboratory.

York University

Sep. 2022 -- Dec. 2022: Mitacs-CSC Joint Globalink Research Intern, supervised by Prof. Zhen Ming (Jack) Jiang at YorkU. (Thanks to CSC and Mitacs for kindly sponsoring my internship.)

The University of Hong Kong Hong Kong Baptist University

June 2021 -- Aug. 2023: Research Intern, supervised by Prof. Francis C.M. Lau at HKU & Prof. Yupeng Li at HKBU.

Nanjing University of Posts and Telecommunications

Sep. 2019 -- June 2023: Bachelor of Engineering in Software Engineering (GPA: 90.07/100), NJUPT.


Teaching 📚


Awards 🏆


Services 🌻


"Facts do not cease to exist because they are ignored." -- Aldous Huxley

Last updated: August 2025