Publications
* indicates equal contribution.
[ACL 2024] RepCodec: A Speech Representation Codec for Speech Tokenization. Zhichao Huang*, Chutong Meng*, Tom Ko. [code]
[TASLP 2024] WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research. Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang. [dataset]
[Interspeech 2023] CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning. Chutong Meng*, Junyi Ao*, Tom Ko, Mingxuan Wang, Haizhou Li. [code]
[Interspeech 2023] GigaST: A 10,000-hour Pseudo Speech Translation Corpus. Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao.