Anup Singh

I am currently an Applied Scientist at Amazon, where I work on Automatic Speech Recognition (ASR), leveraging large-scale models and speech LLMs to improve accuracy, robustness, and multilingual capabilities in real-world media applications. My broader interests include speech representation learning, audio understanding, and efficient large-scale systems.

I completed my Ph.D. in Computer Science at Ghent University, where I was part of the Speech and Audio Processing Group at IDLab. I was advised by Prof. Kris Demuynck and Prof. Vipul Arora. My doctoral research focused on self-supervised learning for speech and audio, with an emphasis on scalable audio indexing and retrieval. Building on this foundation, I explored speech tokenization techniques aimed at advancing textless NLP and speech-based language models.

I hold a BS–MS dual degree in Mathematics from the Indian Institute of Science Education and Research (IISER-Kolkata).

Outside of work, I enjoy learning about geopolitics, reading, and playing sports (mostly lawn tennis these days!).

news

Jan 22, 2026	Our paper titled “Harmonic Summation-Based Robust Pitch Estimation in Noisy and Reverberant Environments” has been accepted at NCC 2026. Check out the paper.
Jan 17, 2026	Our paper titled “BEST-STD2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection” has been accepted at ICASSP 2026. Check out the paper.
Jan 01, 2026	I have joined Amazon as an Applied Scientist II, working on Speech LLMs.

latest posts

Apr 12, 2026	RLHF: Reinforcement Learning from Human Feedback
Feb 21, 2026	Flow Matching
Dec 26, 2025	What are Diffusion Models?

selected publications

Interspeech

Language-Agnostic Speech Tokenizer for Spoken Term Detection with Efficient Retrieval

Anup Singh, Kris Demuynck, and Vipul Arora

In Proc. Interspeech 2025, 2025

@inproceedings{singh2025language,
  title = {Language-Agnostic Speech Tokenizer for Spoken Term Detection with Efficient Retrieval},
  author = {Singh, Anup and Demuynck, Kris and Arora, Vipul},
  booktitle = {Proc. Interspeech 2025},
  pages = {2630--2634},
  year = {2025},
}

ICASSP

BEST-STD2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection

Anup Singh, Vipul Arora, and Kris Demuynck

In ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026

@inproceedings{singh2026best2,
  title = {BEST-STD2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection},
  author = {Singh, Anup and Arora, Vipul and Demuynck, Kris},
  booktitle = {ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages = {17852--17856},
  year = {2026},
  organization = {IEEE},
}