About me
Shu-wen Yang is a Ph.D. candidate (final year) in computer science at National Taiwan University (NTU), advised by Prof. Hung-yi Lee and Prof. Lin-shan Lee. He is looking for a full-time research scientist position starting in July 2026.
Shu-wen Yang’s research interests lie in speech representation (understanding/generation), speech-to-speech LLMs (training/post-training), and speech/audio generative models. He has published over 10 papers in top-tier speech, audio, and machine learning conferences and journals, including Interspeech, ICASSP, TASLP, ICML, and ICLR. His research has accumulated over 2,500 citations and an h-index of 14 on Google Scholar. He co-organized the SUPERB benchmark and challenge, which have been adopted by over 40 institutions. He also co-created the S3PRL speech toolkit, which has earned over 2,500 stars on GitHub and is used by more than 170 open-source projects. He gave tutorials on speech representations at NAACL 2022, ICASSP 2022, and Interspeech 2022. He co-organized the SUPERB Challenge @ IEEE SLT 2022 and the SPARKS Workshop @ IEEE ASRU 2023. He received the Google Ph.D. Fellowship in 2024.
(My Curriculum Vitae)
Selected Projects
As the research and engineering lead, I coordinated the initial version of SUPERB (Speech processing Universal PERformance Benchmark). The speech foundation model (SFM) paradigm it proposed has influenced numerous works, including follow-up benchmarks such as SUPERB-SG, SUPERB-prosody, ML-SUPERB, and Dynamic-SUPERB. Its influence extends to the development of SFMs, such as UniSpeech-SAT and WavLM, and to SFM compression, including DistilHuBERT, LightHuBERT, and ARMHuBERT.
I also co-founded the S3PRL Toolkit with Andy T. Liu (NTU) in 2019, with support and advice from Hung-yi Lee (NTU). Over several years, I have collaborated with over 40 contributors, to whom I extend my sincere thanks. The major contributors are highlighted in the Change Log. The toolkit supports pre-training for several classical self-supervised learning (SSL) methods and benchmarking on numerous downstream tasks, and it offers the most comprehensive collection of pre-trained SSL models, tracking the history of the research area. It is widely used in the community, including by toolkits such as ESPnet and S3PRL-VC and by numerous other open-source projects.
Selected Publications
SUPERB: Speech processing Universal PERformance Benchmark
Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee
in Interspeech, 2021
arxiv / video / website / code
A Large-Scale Evaluation of Speech Foundation Models
Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee
in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
arxiv (preferred) / ieee / code
Generative Audio Language Modeling with Continuous-valued Tokens and Masked Next-Token Prediction
Shu-wen Yang, Byeonggeun Kim, Kuan-Po Huang, Qingming Tang, Huy Phan, Bo-Ru Lu, Harsha Sundar, Shalini Ghosh, Hung-yi Lee, Chieh-Chi Kao, Chao Wang
in ICML, 2025
arxiv (coming soon)
