Haorui He’s Homepage

About Me 👀
I am a second-year Ph.D. student in Computer Science at The University of Hong Kong (HKU) and a Research Assistant at Hong Kong Baptist University (HKBU), jointly supervised by Prof. Francis C.M. Lau, Prof. Reynold C.K. Cheng, and Prof. Yupeng Li. Previously, I worked as a Research Assistant under Prof. Zhizheng Wu at the Chinese University of Hong Kong (CUHK), Shenzhen and the Shanghai AI Laboratory, and as a Mitacs Research Intern under Prof. Zhen Ming (Jack) Jiang at York University (YorkU). I hold a B.Eng. in Software Engineering from Nanjing University of Posts and Telecommunications (NJUPT), where I received the Outstanding Bachelor’s Thesis Award.
I am the creator of Emilia, a leading dataset for expressive and spontaneous text-to-speech (TTS) synthesis, and its preprocessing pipeline, Emilia-Pipe. As of May 2025, Emilia has surpassed 500,000 downloads from over 1,000 institutions and companies, including Stanford, CMU, OpenAI, Google, and NVIDIA. It is the “most liked dataset” in the audio category on HuggingFace and serves as a foundational training dataset for state-of-the-art TTS models such as F5-TTS, MaskGCT, IndexTTS-2, and ZipVoice, as well as audio LMs such as Kimi-Audio, VITA-Audio, and Ming-Omni.
🔬 Research Focus
My current research interests revolve around Social Computing and Large Language Models (LLMs), where I aim to leverage LLM agents to address critical societal challenges such as misinformation and fake news.
Milestones 🎉
- 2025/08: Our paper Fact2Fiction about attacking fact-checking systems is now available online! 🎯
- 2025/07: Our paper DebateCV about debate-driven claim verification is now available online! 🔍
- 2025/07: Our paper LightUL about efficient recommendation unlearning got accepted by Guide-AI Workshop @ VLDB 2025! 🚀
- 2025/03: Emilia just became the “most liked dataset” in the audio category on HuggingFace! 🏆
- 2025/01: The extended version of Emilia and the Amphion v0.2 Technical Report are now online! 🌻
- 2024/12: Our paper SpMis was selected as a Best Paper Finalist at IEEE SLT 2024 (9/373, Top 2.5%)! Congratulations to Peizhuo Liu and Li Wang! 🤗
- 2024/10: Thrilled to begin my CS Ph.D. journey at HKU! A real dream come true! 🎉
- 2024/09: Within two months, Emilia received more than 50k downloads from over 700 prestigious research institutions, including Stanford, CMU, OpenAI, Google, and NVIDIA. 🔥
- 2024/08: Our papers Amphion, Emilia, and SpMis got accepted by IEEE SLT 2024! 🤗
- 2024/07: We released Emilia, the first multilingual in-the-wild dataset for speech generation with over 101k hours of speech data! We also open-sourced the Emilia-Pipe preprocessing pipeline! 🔥
- 2024/02: Our paper MCFEND about robust fake news detection got accepted for an oral presentation at ACM WWW 2024! 🤗
- 2023/12: My first attempt at being a core member in a large-scale open-source project, Amphion. 🎤
- 2023/09: Our paper CTSDT about stance detection in conversational threads got accepted by IEEE ICDM 2023 as a regular long paper! 🤗
- 2023/09: Our paper Branch-BERT about conversational stance detection got accepted by IEEE TCSS! 🤗
Selected Publications 📖
Social Computing 🗞️
[1]
Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System
Haorui He, Yupeng Li, Bin Benjamin Zhu, Dacheng Wen, Reynold Cheng, and Francis C. M. Lau.
In Submission
[2]
Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents
Haorui He, Yupeng Li, Dacheng Wen, Reynold Cheng, and Francis C. M. Lau.
In Submission
[3]
LightUL: An Efficient Recommendation Unlearning Framework
Wentao Ning, Haorui He, Reynold Cheng, Nur Al Hasan Haldar, Ben Kao, Nan Huo, Bo Tang, and Yupeng Li.
Guide-AI Workshop @ VLDB 2025
[4]
MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection
Yupeng Li, Haorui He†, Jin Bai, and Dacheng Wen.
ACM WWW 2024 (Oral Presentation, Top 9.4%)
[5]
Contextual Target-Specific Stance Detection on Twitter: New Dataset and Method
Yupeng Li, Dacheng Wen, Haorui He, and Francis C. M. Lau.
IEEE ICDM 2023 (Regular Long Paper, Top 9.37%)
[6]
Improved Target-specific Stance Detection on Social Media Platform by Delving into Conversation Threads
Yupeng Li, Haorui He†, Shaonan Wang, Francis C.M. Lau, and Yunya Song.
IEEE TCSS 2023
Speech Processing 🎤
[7]
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Speech Dataset for Speech Generation
Haorui He*, Zengqiang Shang*, Chaoren Wang*, Xuyuan Li*, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, and Zhizheng Wu.
Extended Version (In Submission)
[8]
Overview of the Amphion Toolkit (v0.2)
Jiaqi Li*, Xueyao Zhang*, Yuancheng Wang*, Haorui He*, Chaoren Wang*, Li Wang*, Huan Liao*, Junyi Ao*, Zeyu Xie*, Yiqiao Huang*, Junan Zhang*, and Zhizheng Wu.
Technical Report
[9]
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang*, Liumeng Xue*, Yicheng Gu*, Yuancheng Wang*, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, and Zhizheng Wu.
IEEE SLT 2024
[10]
SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Peizhuo Liu, Li Wang, Renqiang He, Haorui He, Lei Wang, Huadi Zheng, Jie Shi, Tong Xiao, and Zhizheng Wu.
IEEE SLT 2024 (Best Paper Finalist, Top 2.5%)
Previous Experience 💼
Aug. 2023 -- Aug. 2024: Research Intern, supervised by Prof. Zhizheng Wu at CUHK-Shenzhen & Shanghai AI Laboratory.
Sep. 2019 -- June 2023: Bachelor of Engineering in Software Engineering (GPA: 90.07/100), NJUPT.
Teaching 📚
- Teaching Assistant: 2025 Spring, FITE7409, Blockchain and Cryptocurrency, Postgraduate Elective Course, Dept. of Computer Science, HKU (Course Instructor: Prof. Tsz Hon Yuen).
Awards 🏆
- IEEE SLT Best Paper Finalist, 2024. (Top 2.5%)
- Outstanding Bachelor’s Thesis Award, 2023. (Top 1%)
- Mitacs and China Scholarship Council (CSC) Joint Globalink Research Internship Scholarship, 2022. (Top 200 in China)
Services 🌻
- Membership: IEEE Graduate Student Member
- Invited Reviewer: ACM MM 2023-2024, IEEE ICASSP 2025, IEEE ICME 2025, IJCAI 2025, EMNLP 2025, AAAI 2026, IEEE Journal of Social Computing 2025.
- Student Volunteer: IEEE SLT 2024, IEEE ICDE 2025
"Facts do not cease to exist because they are ignored." -- Aldous Huxley
Last updated: August 2025