Local AI model execution is essential for time-critical and data-sensitive edge AI applications. Modern edge AI chips improve performance by integrating heterogeneous compute cores, such as CPUs, GPUs, and NPUs, into a single system. However, current deployment frameworks fall short of exploiting such heterogeneous platforms effectively.
We present a flexible, lightweight SDK that addresses this gap. The SDK supports ahead-of-time compilation of models in a range of formats, including PyTorch (custom or Hugging Face), TensorFlow Lite, and ONNX. It abstracts away the complexity of heterogeneous systems, enabling seamless deployment across multiple compute targets without requiring specialized hardware knowledge.
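As a minimal sketch of what such an ahead-of-time compilation step could look like, the snippet below assumes a hypothetical `edgesdk` module with a `compile` entry point; the module name, function, parameters, and artifact format are illustrative assumptions, not the SDK's actual interface.

```python
# Hypothetical sketch of an ahead-of-time compilation flow. The `edgesdk`
# module, its `compile` function, and all parameter names are illustrative
# assumptions; they are not taken from the SDK described in the text.
import edgesdk

# Compile a Hugging Face PyTorch model once, before deployment, producing a
# single artifact with code generated for each requested compute core.
artifact = edgesdk.compile(
    model="google/mobilenet_v2_1.0_224",  # assumed: an HF model id or local path
    source_format="pytorch",              # assumed: also "tflite" or "onnx"
    targets=["cpu", "gpu", "npu"],        # heterogeneous cores to target
    output_path="mobilenet_v2.edge",      # assumed deployable artifact name
)
```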
Through a Python-friendly API, the SDK lets developers switch workloads dynamically between cores, for example balancing inference between the CPU and GPU on ARM-based SoCs. We demonstrate how this capability affects model throughput for both image classification and on-device generative AI workloads.
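The runtime side of such dynamic core switching might look like the sketch below. It assumes the artifact compiled above plus a hypothetical `edgesdk.runtime` module with `Session`, `set_device`, and `run`; all of these names are assumptions made for illustration.

```python
# Hypothetical runtime sketch: load a compiled artifact and move inference
# between compute cores at run time. `edgesdk.runtime`, `Session`,
# `set_device`, and `run` are illustrative assumptions, not a documented API.
import numpy as np
import edgesdk.runtime as rt

session = rt.Session("mobilenet_v2.edge")  # artifact from the compile step
frame = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input tensor

session.set_device("gpu")       # run on the GPU while it is idle
logits = session.run(frame)

session.set_device("cpu")       # shift the same workload to the CPU, e.g. when
logits = session.run(frame)     # the GPU is contended by another task
```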