EDGE AI TALKS: AUDIO APPLICATIONS IN THE TINYML ERA

The rapid evolution of TinyML has opened up new frontiers in the deployment of AI models on microcontroller units (MCUs), enabling sophisticated functionality in devices with extremely limited computational resources. This development is particularly transformative for audio-based applications, where balancing performance and efficiency is paramount.

In this presentation, we introduce two key audio use cases that leverage TinyML. First, we explore single-channel environmental noise cancellation (ENC), a critical technique that improves human-to-human communication by isolating speech from background noise and enhances human-to-machine interaction by reducing transcription errors. Second, we present open-vocabulary keyword spotting, referred to as text2model (T2M), an innovative approach that lets users define custom commands on the fly without extensive data collection or labeling.

We then discuss practical strategies for running these models efficiently on MCUs, including quantization-aware training (QAT), singular value decomposition (SVD) compression, and knowledge distillation. We also address real-world challenges, such as converting models from PyTorch to TensorFlow Lite (TFLite) and handling streaming inference in convolutional neural network (CNN) layers; illustrative sketches of two of these steps follow.
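As a taste of the SVD compression step, here is a minimal PyTorch sketch (PyTorch being the framework the models start in) that replaces one fully connected layer with two low-rank layers obtained by truncated SVD. The function name, the rank argument, and the layer shapes are illustrative assumptions, not the exact recipe presented in the talk.

    import torch
    import torch.nn as nn

    def svd_compress_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
        # Truncated SVD of the (out_features x in_features) weight matrix.
        W = layer.weight.data
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        U_k = (U[:, :rank] * S[:rank]).contiguous()  # fold singular values into U
        V_k = Vh[:rank, :].contiguous()
        # Two smaller layers: in_features -> rank, then rank -> out_features.
        first = nn.Linear(layer.in_features, rank, bias=False)
        second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
        first.weight.data = V_k
        second.weight.data = U_k
        if layer.bias is not None:
            second.bias.data = layer.bias.data.clone()
        return nn.Sequential(first, second)

The substitution saves parameters whenever rank * (in_features + out_features) < in_features * out_features; in practice the rank is tuned against a small accuracy budget.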
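The PyTorch-to-TFLite conversion itself usually passes through an intermediate TensorFlow SavedModel (for example, via ONNX). Assuming that re-export has already been done, a minimal full-integer quantization pass with the stock TFLite converter could look like the sketch below; "saved_model_dir" and the (1, 49, 40) feature shape are placeholders, and real audio features would replace the random calibration data.

    import numpy as np
    import tensorflow as tf

    def representative_dataset():
        # Calibration samples matching the model's input shape;
        # real audio features should be used instead of random data.
        for _ in range(100):
            yield [np.random.rand(1, 49, 40).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Force integer-only kernels so the model runs on int8 MCU runtimes.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    with open("model_int8.tflite", "wb") as f:
        f.write(converter.convert())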

By the end of this session, you will have a concise overview of how these audio applications can be implemented on resource-constrained devices, along with insight into the practical challenges of bridging the gap between research and real-world deployment.