I received my Ph.D. from the National University of Singapore (NUS), where I was supervised by Professor Li Haizhou. I am currently a researcher at A*STAR, working on speech generation technologies for Southeast Asian languages.

My current research focuses on language-model-based speech generation, where large language models predict discrete acoustic tokens to synthesize speech. I work across the full system pipeline, including large-scale pretraining and post-training optimization using reinforcement learning methods such as GRPO to improve speech naturalness and robustness.

During my Ph.D., my research focused on speaker verification, particularly on improving robustness in noisy environments and developing explainable AI techniques to interpret speaker embedding models. My work has been published in major speech processing conferences such as ICASSP and Interspeech.

I obtained my B.Eng. from Sichuan University in 2017 and my M.Eng. from Shanghai Jiao Tong University in 2020. During my master’s studies, I worked on bio-acoustic signal processing with my supervisor, Professor Li Yongfu, developing a deep learning system for detecting abnormal biological sounds and deploying the model in an Android application.

My research interests broadly lie in speech processing, generative speech models, and speech foundation models, with the goal of bridging cutting-edge research and practical real-world speech applications.

📝 Publications

ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification, Yi Ma, Shuai Wang, Tianchi Liu and Haizhou Li, IEEE Signal Processing Letters (SPL), 2025. code

Gradient weighting for speaker verification in extreme low Signal-to-Noise Ratio, Yi Ma, Kong Aik Lee, Ville Hautamaki, Meng Ge, Haizhou Li, International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024. code

How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?, Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li, Interspeech, 2024.

PL-EESR: Perceptual loss based end-to-end robust speaker representation extraction, Yi Ma, Kong Aik Lee, Ville Hautamaki, Haizhou Li, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021. code

LungRN+ NL: An improved adventitious lung sound classification using non-local block resnet neural network with mixup data augmentation, Yi Ma, Xinzi Xu, Yongfu Li, Interspeech, 2020.

LungBRN: a Smart Digital Stethoscope for Detecting Respiratory Disease Using bi-ResNet Deep Learning Algorithm, Yi Ma, Xinzi Xu, Qing Yu, Yuhang Zhang, Yongfu Li, Jian Zhao, Guoxing Wang, IEEE Biomedical Circuits and Systems Conference (BioCAS), 2019. code

Live Demo: LungSys - Automatic Digital Stethoscope System For Adventitious Respiratory Sound Detection, Yi Ma, Xinzi Xu, Qing Yu, Yuhang Zhang, Yongfu Li, Jian Zhao, Guoxing Wang, IEEE Biomedical Circuits and Systems Conference (BioCAS), 2019. code

Enhancing speech recognition for Parkinson’s disease patient using transfer learning technique, Qing Yu, Yi Ma, Yongfu Li, Journal of Shanghai Jiaotong University (Science), 2022.

LungAttn: advanced lung sound classification using attention mechanism with dual TQWT and triple STFT spectrogram, Jizuo Li, Jiajun Yuan, Hansong Wang, Shijian Liu, Qianyu Guo, Yi Ma, Yongfu Li, Liebin Zhao and Guoxing Wang, Physiological Measurement, 2021.

📖 Education

  • 2020.08 - now, Ph.D. at National University of Singapore (NUS), Singapore.
  • 2017.09 - 2020.03, M.Sc. at Shanghai Jiao Tong University, Shanghai, China.
  • 2013.09 - 2017.06, B.Eng. in Electronic Information Engineering, Sichuan University, Sichuan, China.

👔 Internships

  • 2024.11 - 2025.05, Huawei, Singapore.
  • 2020.05 - 2020.08, Pingan Technology, Shanghai, China.

💻 Open Source Code