About me

My publication list

Shu-wen Yang is a final-year Ph.D. candidate in computer science at National Taiwan University (NTU), advised by Prof. Hung-yi Lee and Prof. Lin-shan Lee. He is looking for a full-time research scientist position starting in July 2026.

Shu-wen Yang’s research interests lie in speech representation (understanding/generation), speech-to-speech LLMs (training/post-training), and speech/audio generative models. He has published over 10 papers in top-tier speech/audio conferences and journals, including Interspeech, ICASSP, TASLP, ICML, and ICLR. His research has accumulated over 2,500 citations and an h-index of 14 on Google Scholar. He co-organized the SUPERB benchmark and challenge, which have been adopted by over 40 institutions. He also co-created the S3PRL speech toolkit, which has earned over 2,500 stars on GitHub and is used by more than 170 open-source projects. He gave tutorials on speech representations at NAACL 2022, ICASSP 2022, and Interspeech 2022. He co-organized the SUPERB Challenge @ IEEE SLT 2022 and the SPARKS Workshop @ IEEE ASRU 2023. He received the Google Ph.D. Fellowship in 2024.

(My Curriculum Vitae)

Selected Projects

I coordinated the initial version of SUPERB (Speech processing Universal PERformance Benchmark) as the research and engineering lead. The speech foundation model (SFM) paradigm it proposed has influenced numerous follow-up works: extended benchmarks such as SUPERB-SG, SUPERB-prosody, ML-SUPERB, and Dynamic-SUPERB; SFMs such as UniSpeech-SAT and WavLM; and SFM compression methods such as DistilHuBERT, LightHuBERT, and ARMHuBERT.

I also co-founded the S3PRL Toolkit with Andy T. Liu (NTU) in 2019, with support and advice from Hung-yi Lee (NTU). Over the years, I have collaborated with over 40 contributors, to whom I extend my sincere thanks; the major contributors are highlighted in the Change Log. The toolkit supports pre-training of several classical SSL methods, benchmarking on numerous downstream tasks, and one of the most comprehensive collections of pre-trained SSL models, preserving the field's research history. It is widely used by the community, including toolkits such as ESPnet and S3PRL-VC, along with numerous other open-source projects. A minimal usage sketch is shown below.
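To give a quick sense of the upstream interface, here is a minimal sketch following the hub-style API from the S3PRL README (it assumes `s3prl` is installed via pip; the choice of the `hubert` model and the random dummy waveforms are illustrative only):

```python
import torch
import s3prl.hub as hub

# Load a pre-trained SSL upstream by name (here HuBERT); other models
# registered in s3prl.hub follow the same getattr pattern.
model = getattr(hub, "hubert")()
model.eval()

# The upstream consumes a list of variable-length 16 kHz waveforms.
wavs = [torch.randn(16000 * 2), torch.randn(16000 * 3)]  # dummy audio

with torch.no_grad():
    # "hidden_states" holds each layer's frame-level representations,
    # which downstream tasks can probe or combine via a weighted sum.
    hidden_states = model(wavs)["hidden_states"]
```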

Selected Publications
