About Me

I am a Ph.D. candidate in Computer Science at The University of Hong Kong (HKU) and a Senior Research Assistant at Hong Kong Baptist University (HKBU), supervised by Prof. Francis C.M. Lau, Prof. Reynold C.K. Cheng, and Prof. Ivan Y.P. Li. Previously, I worked under Prof. Zhizheng Wu as a Research Assistant at the Chinese University of Hong Kong (CUHK), Shenzhen and as a Research Intern at the Shanghai AI Laboratory, and under Prof. Zhen Ming (Jack) Jiang as a Research Intern at York University (YorkU). I hold a B.Eng. in Software Engineering from Nanjing University of Posts and Telecommunications (NJUPT), where I received the Outstanding Bachelor’s Thesis Award.

I am the creator of Emilia, a leading dataset for expressive and spontaneous text-to-speech (TTS) synthesis, and its preprocessing pipeline, Emilia-Pipe. Emilia has surpassed 1 million downloads from over 1,000 institutions and companies worldwide, including Stanford, CMU, OpenAI, Google, and NVIDIA. It is the “most liked dataset” in the audio category on HuggingFace and serves as a foundational training dataset for state-of-the-art TTS models such as F5-TTS, MaskGCT, IndexTTS-2, and ZipVoice, as well as Speech LLMs such as Kimi-Audio, VITA-Audio, and OmniVoice.

Research Focus

My current research centers on Social Computing, where I use LLM agents to address critical societal challenges such as misinformation and fake news.

Milestones

  • 2026/03: Our AI-powered fact-checking-as-a-service project at HKBU won the International Press Prize at the International Exhibition of Inventions Geneva (Special Award, Top 1)! See the official results and the Hong Kong government press release. 🏆
  • 2026/01: Our paper DebateCV was accepted to WWW 2026 (Web4good special track) (95/437, Top 21.7%)! 🎉
  • 2026/01: A Chinese blog post by QingKe AI about Fact2Fiction reached 100K views on Weibo! 🔥
  • 2025/12: Our paper DebateCV won the Best Poster Award at ICSC 2025! 🏆
  • 2025/11: Our paper Fact2Fiction was accepted for an oral presentation at AAAI 2026 (1,105/23,680, Top 4.7%)! 🎉
  • 2025/10: Our paper Noro was selected as a Best Paper Finalist at APSIPA (8/581, Top 1.4%)! 🏆
  • 2025/09: Our paper Emilia was accepted by TASLP after a year! 🎉
  • 2025/08: Our paper Fact2Fiction about attacking fact-checking systems is now available online! 🤗
  • 2025/07: Our paper DebateCV about debate-driven claim verification is now available online! 🤗
  • 2025/03: Emilia has become the “most liked dataset” in the audio category on HuggingFace. 🏆
  • 2024/12: Our paper SpMis was selected as a Best Paper Finalist at SLT 2024 (9/373, Top 2.4%)! 🏆
  • 2024/10: Thrilled to begin my CS Ph.D. journey at HKU! A real dream come true! 🎉
  • 2024/09: Within two months, Emilia received more than 50k downloads from over 700 prestigious research institutions, including Stanford, CMU, OpenAI, Google, and NVIDIA. 🔥
  • 2024/08: Our papers Amphion, Emilia, and SpMis were accepted by SLT 2024! 🎉
  • 2024/07: We released Emilia, the first multilingual in-the-wild dataset for speech generation, with over 101k hours of speech data! We also open-sourced the preprocessing pipeline! 🔥
  • 2024/02: Our paper MCFEND about robust fake news detection was accepted for an oral presentation at WWW 2024! 🎉
  • 2023/12: Became a core member of Amphion, my first large-scale open-source project. 🤗
  • 2023/09: Our paper CTSDT about stance detection in conversational threads was accepted by ICDM 2023 as an oral presentation! 🎉
  • 2023/09: Our paper Branch-BERT about conversational stance detection was accepted by TCSS! 🎉

Selected Publications

† denotes corresponding author.
Full Publication List
  1. Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-Checking System
    Haorui He, Yupeng Li, Bin Benjamin Zhu, Dacheng Wen, Reynold Cheng, and Francis C. M. Lau.
    AAAI 2026 (Oral; Acceptance rate: 4.7%; 100K views on Weibo)
  2. Debating Truth: Debate-Driven Claim Verification with Multiple Large Language Model Agents
    Haorui He, Yupeng Li, Dacheng Wen, Chen Yang, Reynold Cheng, Donglong Chen, and Francis C. M. Lau.
    WWW 2026 (Web4good Special Track; Acceptance rate: 21.7%)
  3. Emilia: A Large-Scale Extensive, Multilingual, and Diverse Speech Dataset for Speech Generation
    Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, and Zhizheng Wu.
    IEEE Transactions on Audio, Speech and Language Processing (TASLP) 2025 (Ranked #1 most-liked/trending audio dataset on HuggingFace with over 1 million downloads)
  4. Noro: Noise-Robust One-Shot Voice Conversion with Hidden Speaker Representation Learning
    Haorui He, Yuchen Song, Yuancheng Wang, Haoyang Li, Xueyao Zhang, Li Wang, Gongping Huang, Eng Siong Chng, and Zhizheng Wu.
    APSIPA 2025 (Best Paper Finalist; Top 1.4%)
  5. Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
    Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, and Zhizheng Wu.
    SLT 2024 (Ranked #1 trending on GitHub)
  6. MCFEND: A Multi-Source Benchmark Dataset for Chinese Fake News Detection
    Yupeng Li, Haorui He, Jin Bai, and Dacheng Wen.
    WWW 2024 (Oral Presentation, Acceptance rate: 9.4%)
  7. Improved Target-Specific Stance Detection on Social Media Platform by Delving into Conversation Threads
    Yupeng Li, Haorui He, Shaonan Wang, Francis C.M. Lau, and Yunya Song.
    IEEE Transactions on Computational Social Systems (TCSS) 2023
  8. Contextual Target-Specific Stance Detection on Twitter: New Dataset and Method
    Yupeng Li, Dacheng Wen, Haorui He, Jianxiong Guo, Xuan Ning, and Francis C. M. Lau.
    ICDM 2023 (Oral Presentation, Acceptance rate: 9.4%)

Education

The University of Hong Kong (HKU)

Ph.D. Candidate, School of Computing and Data Science

Supervisors: Prof. Francis C.M. Lau, Prof. Reynold C.K. Cheng, Prof. Ivan Y.P. Li (HKBU)

Sep 2024 -- Present

Nanjing University of Posts and Telecommunications (NJUPT)

B.Eng. in Software Engineering (GPA: 90.07/100)

Sep 2019 -- Jun 2023

Experience

Hong Kong Baptist University

Senior Research Assistant @ Hong Kong

LLM Agents for Fact-checking, Supervisor: Prof. Ivan Y.P. Li

Nov 2024 -- Present

Shanghai AI Laboratory

Research Intern @ Shanghai

Speech LLM, Text-to-Speech, Supervisor: Prof. Zhizheng Wu

Feb 2024 -- Oct 2024

Chinese University of Hong Kong, Shenzhen

Research Assistant @ Shenzhen

Speech Deepfake Detection, Voice Conversion, Supervisor: Prof. Zhizheng Wu

Sep 2023 -- Feb 2024

York University

Research Intern @ Toronto

Active Machine Learning, Code Intelligence, Supervisor: Prof. Zhen Ming (Jack) Jiang

Sep 2022 -- Dec 2022

The University of Hong Kong

Research Intern @ Hong Kong

Stance Detection, Fake News Detection, Supervisors: Prof. Francis C.M. Lau, Prof. Ivan Y.P. Li

Jun 2021 -- Aug 2023

Awards

Services

Misc

"Facts do not cease to exist because they are ignored." -- Aldous Huxley

Last updated: Apr 2026