About me
Shu-wen Yang is a Ph.D. candidate in computer science at National Taiwan University (NTU), advised by Prof. Hung-yi Lee and Prof. Lin-shan Lee. His research interest lies in representation learning for general speech encoders. He has published over 10 papers in speech-related top conferences and journals. His research has accumulated over 2,100 citations and an h-index of 14 on Google Scholar. He co-organized the SUPERB benchmark and challenge, now adopted by over 40 institutions. He also co-created the S3PRL speech toolkit, which has earned over 2,300 stars on GitHub and is used by more than 150 open-source projects. He gave tutorials on speech representations at NAACL 2022, ICASSP 2022, and Interspeech 2022. He co-organized the SUPERB Challenge @ IEEE SLT 2022 and SPARKS Workshop @ IEEE ASRU 2023. He received the Google Ph.D. Fellowship in 2024. Finally, he enjoys playing the piano in his free time, under the guidance of Yiin Bin Yang. (See my hobbies)
(My Curriculum Vitae)
My primary research direction is speech representation learning, a field that has garnered various names recently. My recent efforts concentrate on self-supervised learning, representation generalizability, and efficient pre-training.
Self-Supervised Learning (SSL): Learning speech representations from unlabeled data. We discover that speech SSL techniques lead to representations with strong task generalizability beyond Automatic Speech Recognition (ASR). Additionally, we explore their use across a broad spectrum of real-life speech applications, which marks the beginning of the era of speech foundation models (SFM).
Representation Generalizability: Benchmarking the task and domain generalizability of SFMs. I think deeply about the purpose and methods of creating a correct and solidly grounded benchmark, especially regarding its important role in guiding future model development.
Efficient Pre-training: All existing SFMs require industrial-level computing, which leaves further research monopolized by large corporations. I am currently working on how to pre-train SFMs efficiently within academic resources.
Selected Projects
I coordinated (as the research and engineering lead) the initial version of SUPERB (Speech processing Universal PERformance Benchmark), where the proposed speech foundation model (SFM) paradigm has influenced numerous works, as seen in additional benchmarks like SUPERB-SG, SUPERB-prosody, ML-SUPERB, and Dynamic-SUPERB. This influence extends to the development of SFMs, such as Unispeech-SAT and WavLM, and to the compression of SFMs, including DistilHuBERT, LightHuBERT, and ARMHuBERT.
I also co-founded the S3PRL Toolkit with Andy T. Liu (NTU) in 2019, with support and advice from Hung-yi Lee (NTU). Over several years, I have collaborated with over 40 contributors, to whom I extend my sincere thanks. The major contributors are highlighted in the Change Log. The toolkit supports the pre-training of several classical SSL methods and the benchmarking of numerous downstream tasks, and offers the most comprehensive collection of pre-trained SSL models to track research history. It is widely used by the community, including in toolkits like ESPnet and S3PRL-VC and in numerous open-source projects.
I am always open to collaborations involving dense and deep discussions, where I can learn from new explorations and intense debates regardless of co-authorship. If you are interested in collaborating, please reach me at my email: leo19941227@gmail.com.
Selected Publications
SUPERB: Speech processing Universal PERformance Benchmark
Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee
in Interspeech, 2021
arxiv / video / website / code
A Large-Scale Evaluation of Speech Foundation Models
Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee
in IEEE/ACM Transactions on Audio Speech and Language Processing, 2024
arxiv (preferred) / ieee / code