TinyML: Machine Learning on Microcontrollers
Tanishq Kapse, Tanmay Kawal, Shrawan Khambekar, and Pranav Khatod
Guide: Prof. Dr. Nilesh Sable
Department of Computer Science and Engineering (AI)
Vishwakarma Institute of Technology
Pune, India
{tanishq.kapse24, tanmay.kawal24, shrawan.khambekar24, pranav.khatod24}@[Link]
Abstract: Machine Learning (ML) has become integral to modern computing systems, powering tasks such as image recognition, speech detection, and predictive analytics. However, traditional ML requires powerful servers or cloud infrastructure. TinyML bridges this gap by enabling ML on resource-constrained devices, providing real-time inference without relying on the cloud. This shift enhances privacy, reduces latency, and enables intelligent decision-making at the edge. Despite challenges such as memory and power limitations, TinyML is rapidly growing with the support of optimized frameworks and efficient model designs.

I. INTRODUCTION
A. Background and Motivation
In recent years, Machine Learning (ML) has changed the way machines interact with the environment, enabling them to identify patterns, predict, and learn from experience. Historically, ML models relied on considerable computational power, using powerful CPUs, GPUs, and large-scale cloud infrastructure for inference and training. However, centralized cloud computing has limitations: higher latency, higher energy use, potential data privacy risks, and dependence on an internet connection.
The essence of TinyML is to run ML at the edge, close to where data is produced. Devices can perform inference immediately, which allows on-device decision-making in real time without relying on a sustained internet connection. This is especially meaningful in applications such as wearable health monitors, predictive maintenance systems, agricultural sensor networks, and autonomous drones, which have tight constraints on response time and energy consumption. Additionally, running inference on the device reduces data transmission, which improves data privacy.

B. Historical Context
The development of TinyML can be traced through the development of embedded systems and machine learning. In the early 2000s, a typical embedded device was used for simple sensing or control tasks. Even into the 2010s, when ML algorithms became common, they were generally limited to high-performance computing because they required significant computational power.
In late 2019, Google introduced TensorFlow Lite for Microcontrollers (TFLM), a heavily optimized ML framework for low-power devices, signalling a change in the relationship between embedded systems and ML. TFLM showed the community that devices with just a few hundred kilobytes of memory could do meaningful inference. Shortly after TFLM, platforms such as Edge Impulse, [Link], and MicroTVM made model training, optimization, and deployment on embedded hardware considerably simpler.
Since late 2019, academic research and commercial adoption of TinyML have grown rapidly, particularly from 2020 to 2024, with companies such as ARM, Syntiant, and Sony producing hardware accelerators aimed specifically at low-power ML applications. TinyML benchmarks such as MLPerf Tiny have also established standard performance evaluations for their respective tasks. Today, TinyML is a convergence of embedded systems, artificial intelligence, and IoT, with innovative applications in diverse industries.

C. Report Organization
This document is arranged in a series of sections to provide an in-depth understanding of TinyML and the current state of the field:
- Abstract - Summarizes the goals, scope, and main findings of the report.
- Introduction - Provides background, motivation, and history of TinyML.
- Literature Review - Reviews past research contributions, including key studies that influenced the field.
- Current State - Describes the latest tools, frameworks, and hardware systems.
- Recent Advances - Discusses emerging trends in hardware, software, and hybrid edge-cloud intelligence.
- Comparative Analysis - Compares existing frameworks and platforms.
- Future Directions - Discusses current challenges and possible opportunities.
- Conclusion and References - Summarizes key
points and references used in the report.

II. LITERATURE REVIEW

| No. | Author(s) & Year | Title / Source | Main Focus | Key Contribution to TinyML |
|---|---|---|---|---|
| 1 | Pete Warden & Daniel Situnayake (2020) | TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers (O’Reilly Media) | Foundational book on TinyML | An early comprehensive guide to deploying ML models on microcontrollers; popularized TinyML education and demonstrated practical TinyML use cases. |
| 2 | Google TensorFlow Team (2019) | Visual Wake Words with TensorFlow Lite Micro | Framework introduction | Introduced TensorFlow Lite for Microcontrollers (TFLM), enabling ML inference directly on devices with <256KB of RAM. |
| 3 | Han, Mao & Dally (2015) | Deep Compression: Compressing Deep Neural Networks with Pruning, Quantization and Huffman Coding | Model compression | Proposed a pruning and quantization pipeline that reduces model size and computation; a foundation for TinyML optimization. |
| 4 | MLCommons (2021) | MLPerf Tiny Benchmark Suite | Standard benchmarking | Proposed benchmark metrics of accuracy, latency, and energy usage for evaluating TinyML frameworks and devices. |
| 5 | Hymel et al. (2022) | Edge Impulse: An MLOps Platform for Tiny Machine Learning (arXiv) | Toolchain & MLOps | Proposed a cloud platform that automates the data collection, training, and deployment phases of a TinyML project. |
| 6 | Lin et al. (2020) | MCUNet: Tiny Deep Learning on IoT Devices (arXiv) | Hardware-aware design | Proposed TinyNAS and TinyEngine for jointly designing neural architectures and the inference engine for microcontrollers; achieved ImageNet-scale inference on MCUs. |
| 7 | Liu et al. (2023) | Deploying Machine Learning Models to Ahead-of-Time C Source (MicroTVM) | Compiler-based deployment | Showed that MicroTVM can automate the conversion of ML models to C code for constrained microcontrollers. |
| 8 | Wu et al. (2020) | Integer Quantization for Deep Learning Inference (arXiv) | Quantization techniques | Analyzed post-training INT8 quantization, showing that it greatly reduces model footprint and latency. |
| 9 | Llisterri Giménez et al. (2022) | On-Device Training of ML Models on Microcontrollers: A Look at Federated Learning | Federated learning | Examined distributed, privacy-preserving learning that runs directly on MCUs; developed adaptive TinyML models. |
| 10 | Yang et al. (2023) | TinyFormer: Efficient Transformer Design and Deployment (arXiv) | Model architecture | Designed memory-efficient transformers suitable for constrained microcontrollers; extended TinyML beyond CNN-based models. |

III. CURRENT STATE
Overview of Current TinyML Landscape
TinyML was born as an idea in a research setting, but it has matured into a well-established interdisciplinary field covering embedded systems, low-power hardware design, and applied machine learning. In the current 2024-2025 landscape, the TinyML ecosystem has developed into diverse hardware platforms, software frameworks, and development toolchains that allow many types of deep learning models to run on microcontrollers with only a few hundred kilobytes of memory.
Present-day TinyML is heavily focused on practicality: model performance, development with energy as a key consideration, and interoperability with the devices. TinyML has become the dominant paradigm for embedding intelligence into the Internet of Things (IoT) and edge computing environments, with on-device intelligence across sensor networks, wearables, and autonomous systems.
It now has two main priorities for industry:
- Maximizing performance and inference accuracy, often in limited-resource environments.
- Energy-efficient and sustainable performance through technology optimized for the deployment device.

Hardware Platforms Supporting TinyML
The hardware environment for TinyML has broadened significantly. Newer microcontrollers are now available with NPUs and DSP extensions to accelerate the
inference of ML.

| Platform / Series | Manufacturer | Key Features |
|---|---|---|
| ARM Cortex-M Series | ARM | Ultra-low-power MCUs with CMSIS-NN library support |
| Arduino Nano 33 BLE Sense | Arduino | Onboard sensors, Bluetooth, TensorFlow Lite Micro support |
| STM32 Series | STMicroelectronics | [Link] for neural network deployment |
| ESP32/ESP32-S3 | Espressif Systems | Integrated Wi-Fi + Bluetooth, AI accelerator in S3 version |
| NXP [Link] RT | NXP Semiconductors | Crossover MCUs with high processing speed |
| Syntiant NDP120 | Syntiant Corp. | Dedicated NPU optimized for always-on deep learning |
| Sony IMX500 | Sony | Imaging sensor with built-in AI processor |

Many MCU-class devices can now run ML inference within a milliwatt-scale power budget, and dedicated parts such as the Syntiant NDP120 consume less than 1 mW, which makes them well suited to battery-powered and energy-harvesting devices. The hardware ecosystem continues to move towards domain-specific design, embedding AI accelerators in already low-power MCUs to enable inference.

Present-day Utilization of TinyML
TinyML has already been applied in a wide range of practical cases:
- Medical: Continuous monitoring of vital signs using sensors attached to the body (ECG, oxygen level, activity recognition).
- Agriculture: Smart-irrigation systems running on microcontrollers that analyze soil and weather data to operate irrigation efficiently.
- Industrial IoT: Predictive maintenance through detecting anomalies in temperature and vibration.
- Environmental Monitoring: Very low-power sensors that monitor air quality, noise levels, and wildlife activity.
- Smart Homes: Continuous voice detection, gesture recognition, and occupancy detection.
- Autonomous Systems: Very low-latency vision and navigation capability in micro-drones and robots.
These applications provide evidence of the practical feasibility of TinyML: making devices at the edge of the network smart without high-power processing or cloud-based computing.

IV. RECENT ADVANCES

4.1 Hardware Accelerators and Endpoint NPUs
A significant recent development has been the emergence of domain-specific micro-NPUs and other hardware accelerators on endpoint devices, allowing order-of-magnitude improvements in energy efficiency for always-on inference tasks (e.g., keyword spotting and sensor fusion). Commercial chips like the Syntiant NDP family show ultra-low-power operation for audio/sensor tasks and have led to TinyML benchmark submissions that demonstrate large energy savings compared to conventional MCUs. At the same time, mainstream MCU suppliers have released endpoint AI cores; one example is ARM's Cortex-M55 with Ethos-U microNPU technology. These endpoint AI cores feature hardware primitives and instruction sets optimized for quantized neural network operations, enabling faster on-device inference and greater energy efficiency across a wider range of workloads.

4.2 System-Algorithm Co-Design
Recent work is focused on co-design, which jointly searches for architectures and builds inference engines in consideration of constrained MCU memory and [Link] MCUNet framework uses constrained neural architecture search with a minimal runtime and has allowed surprisingly strong networks to be developed that fit within sub-MB memory budgets, in some cases even reaching ImageNet-level performance, using off-the-shelf MCU components. The co-design approach has been adopted as the primary strategy for research efforts in TinyML: instead of finding ways to squeeze existing models, new methods for searching and compiling models that natively

4.3 Streamlined Transformers and More Sophisticated Model Families (TinyFormer / MCUFormer)
Transformers, once far too large for embedded targets, have been streamlined for TinyML using specialized search and sparsity techniques. TinyFormer and other MCU-aware transformer work demonstrate the design of sparse, compact transformer blocks and engines that can execute within typical microcontroller constraints (sub-MB storage and a few hundred kB of RAM), opening up TinyML to sequence modeling and richer sensor-fusion tasks beyond simple CNN classification. Streamlining transformers expands the model space for tiny devices and can improve accuracy on temporal tasks (audio, biosignals) while remaining deployable.

4.4 Compiler Toolchains and Ahead-of-Time Code Generation (MicroTVM and Optimized Runtimes)
Supporting the outcomes above is tooling that has advanced from interpreter-style runtimes to ahead-of-time compilation and operator fusion targeted at microcontrollers. For example, projects that compile neural networks into stand-alone compact C source or tightly optimized binaries (MicroTVM and TinyEngine-style backends) are able to reduce runtime overhead and simplify deployment of deep learning on bare-metal systems which have limited operator support. Toolchains enable hardware-specific tuning
(memory layout, DMA-friendly operators) and are key to squeezing every bit of performance out of constrained hardware.

4.5 Advances in Security, Privacy, and Robustness
As TinyML transitions into domains where safety and privacy are paramount, recent research has shown progress on secure model updates, encrypted on-device inference, and resilience against adversarial inputs in constrained settings. Given that TinyML devices will be deployed widely and are often physically accessible, researchers have begun studying lightweight integrity checks, signed model artifacts, and privacy-preserving aggregation to enable federated updates, balancing security needs against the limitations of constrained resources.

4.6 Synthesis: Where These Developments Meet in Practice
The developments described above come together as an integrated stack: specialized devices (tiny NPUs) + co-designed compact architectures (TinyNAS/MCUNet/TinyFormer) + efficient compilation (MicroTVM/TFLM backends) + benchmarks (MLPerf Tiny) + energy harvesting. Together, these developments enable a new class of practical applications (always-on sensing, batteryless deployments, personalized wearables) that did not exist a few years ago. There are still challenges (especially on-device training, security standards, and long-term maintenance), but as discussed above, the rapid improvements are making new edge use cases accessible through TinyML.

V. COMPARATIVE ANALYSIS

5.1 Overview
The TinyML environment is a collection of multiple frameworks and hardware options designed to strike a balance between model accuracy, memory usage, energy usage, and ease of deployment.
In this section, we provide a comparison of the major TinyML frameworks and microcontroller platforms in common usage.

Comparison of Major TinyML Frameworks:

| Aspect | TensorFlow Lite for Microcontrollers (TFLM) | Edge Impulse | MicroTVM | CMSIS-NN | MCUNet (TinyNAS + TinyEngine) |
|---|---|---|---|---|---|
| Developer / Origin | Google | Edge Impulse Inc. | Apache TVM Project | ARM | MIT Research |
| Primary Focus | Lightweight runtime for on-device inference | End-to-end TinyML lifecycle (data → deployment) | Ahead-of-time compiler for optimized MCU code | Hand-optimized NN kernels for ARM CPUs | Hardware–algorithm co-design for microcontrollers |
| Hardware Support | Broad (Arduino, STM32, ESP32, etc.) | Broad (Arduino, Nordic, NXP, Sony, etc.) | Platform-independent | ARM Cortex-M family | STM32, ARM Cortex-M |
| Model Format | .tflite (quantized) | .tflite / ONNX / custom formats | ONNX, TFLite | Tensor kernels integrated via C code | NAS-generated architectures |
| Ease of Use | High (plug-and-play integration with Arduino IDE) | Very high (cloud GUI, auto-optimization) | Medium (requires compiler setup) | Moderate (requires C/C++ expertise) | Low–Moderate (research-level tools) |
| Optimization Methods | Quantization, pruning | Quantization, data augmentation | Operator fusion, ahead-of-time compilation | Low-level assembly optimization | Neural architecture search (TinyNAS) |
| Performance Efficiency | Moderate (general-purpose) | High (auto-optimized models) | High (hardware-specific tuning) | Very high (hand-optimized kernels) | Very high (co-designed model–hardware optimization) |
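
Quantization appears under Optimization Methods for several of the frameworks compared above. As a rough illustration only, the sketch below shows symmetric per-tensor INT8 quantization in plain Python. It is not the scheme used by any particular framework in the comparison: the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, and production toolchains typically add per-channel scales and calibration data.

```python
"""Symmetric per-tensor INT8 quantization: a minimal illustrative sketch."""


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.50, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding to the nearest step bounds the error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("codes:", q)
print("largest reconstruction error:", max_error)
```

Each weight is stored in one byte instead of four, and the reconstruction error stays within half a quantization step, which is the basic trade-off behind the reduced model footprint discussed in this report.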
Comparison of Hardware Platforms for TinyML:

| Platform | Processor Type | Clock Speed | Memory | TinyML Features | Power Consumption | Applications |
|---|---|---|---|---|---|---|
| Arduino Nano 33 BLE Sense | ARM Cortex-M4 | 64 MHz | 256 KB RAM | TFLM-compatible, sensors on board | ~5 mW (active) | Rapid prototyping, student projects |
| STM32H7 Series | ARM Cortex-M7 | Up to 480 MHz | Up to 1 MB RAM | [Link], CMSIS-NN optimized | ~10–20 mW | Industrial ML tasks |
| ESP32-S3 | Xtensa LX7 Dual-Core | 240 MHz | 512 KB – 1 MB RAM | AI accelerator, Wi-Fi, Bluetooth | ~15 mW | Audio, vision-based IoT devices |
| NXP [Link] RT1060 | ARM Cortex-M7 | 600 MHz | 1 MB RAM | Crossover MCU with ML support | ~20 mW | Vision, industrial control |
| Syntiant NDP120 | Custom Neural Decision Processor | 10-50 MHz | 128 KB SRAM | Dedicated NPU for always-on deep learning | <1 mW | Voice detection, continuous sensing |
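
To make the memory column above more concrete, the following sketch estimates how much storage a model's weights need at different precisions and checks that against the listed memory figures. It is a back-of-the-envelope calculation under loud assumptions: the 200,000-parameter model is hypothetical, the listed memory is treated as a single budget for weights only (activations and runtime overhead are ignored, and on real devices weights usually live in flash rather than RAM), and the lower bound is used where the table gives a range.

```python
"""Rough weight-storage estimate against the memory figures in the table."""

KB = 1024

# Memory figures copied from the hardware comparison table.
# ESP32-S3 uses the lower bound of its listed 512 KB to 1 MB range.
PLATFORM_MEMORY_KB = {
    "Arduino Nano 33 BLE Sense": 256,
    "STM32H7 Series": 1024,
    "ESP32-S3": 512,
    "NXP RT1060": 1024,
    "Syntiant NDP120": 128,
}


def weight_bytes(num_params, bits_per_weight):
    """Bytes needed to store the weights alone."""
    return num_params * bits_per_weight // 8


def platforms_that_fit(num_params, bits_per_weight):
    """Names of platforms whose listed memory can hold the weights."""
    needed = weight_bytes(num_params, bits_per_weight)
    return [
        name
        for name, memory_kb in PLATFORM_MEMORY_KB.items()
        if needed <= memory_kb * KB
    ]


NUM_PARAMS = 200_000  # hypothetical model size, for illustration only

for bits in (32, 8):
    print(f"{bits}-bit weights: {weight_bytes(NUM_PARAMS, bits)} bytes")
    print("  fits on:", platforms_that_fit(NUM_PARAMS, bits))
```

Under these assumptions, the float32 version of the hypothetical model fits only the two 1 MB parts, while the INT8 version also fits the Arduino Nano 33 BLE Sense and the ESP32-S3, which illustrates why quantization is so central to deployment on the smaller platforms.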
5.2 Analytical Discussion
The comparative analysis indicates that no single framework or hardware implementation stands out as superior across all the dimensions of TinyML performance.
Each implementation targets specific trade-offs:
TensorFlow Lite for Microcontrollers (TFLM) remains the predominant implementation because of its ease of use, portability, and active open-source community. TFLM is best suited for general-purpose applications and academic prototypes.
Edge Impulse ranks highest in usability and automation, offering a full pipeline from cloud to edge that hides much of the technical complexity and enables rapid prototyping and industry adoption.
MicroTVM and CMSIS-NN prioritize performance efficiency and low-level optimization. CMSIS-NN, in particular, demonstrates near-optimal efficiency on ARM Cortex-M processors thanks to hand-crafted kernel implementations.
MCUNet stands out in accuracy-efficiency research, demonstrating a strong trade-off between accuracy and efficiency through a neural architecture search (NAS) approach, but its use is largely limited to academic or high-end research and development contexts.
On the hardware front, Syntiant NDP and Sony IMX500 are at the forefront of dedicated TinyML hardware, using hardware specialization to achieve very high energy efficiency. Arduino and STM32, on the other hand, are the main platforms used for hands-on education, research, and preliminary prototyping due to their low cost, accessibility, and broad software ecosystem.
All in all, the existing state of TinyML reveals a set of design priorities:
- Ease of Use vs. Performance
- Adaptability vs. Specialization
- General-Purpose vs. Domain-Specific Frameworks
Looking forward, we envision a hybrid future that combines the usability of Edge Impulse-style frameworks with the low-level efficiency of MCUNet and CMSIS-NN, running on domain-specific NPUs for always-on intelligence.

VI. FUTURE DIRECTIONS

6.1 Shifting Towards Continuous Learning and Adaptation On Devices
At this point in the development of consumer-oriented TinyML systems, the primary task of devices is to perform the inference step of the underlying neural network. The next advance will depend on devices being able to learn, so that the system continuously adapts to changes in the environment without needing retraining in the cloud. By employing federated learning and incremental training techniques, the learning step of the ML pipeline could be carried out on a microcontroller. Early progress has already been reported using techniques including few-shot learning, quantisation-aware weight updates, and compressed gradient exchange. The shift from merely performing inference to continuous learning will enable TinyML devices to improve performance by personalising models based on the users' local data, in dynamic scenarios including
health monitoring, user-specific gesture recognition, or environmental sensing.

6.2 Co-Designing the Hardware and Algorithms, and Dedicated Accelerators
As noted previously, the trend towards co-designing the hardware and software will increase in the coming years.
Microcontrollers will incorporate AI accelerators designed specifically for low-latency execution of quantised or binary neural networks, which will vastly improve the speed and energy consumption of machine learning on microcontrollers. Microcontroller manufacturers are already integrating neural processing units (e.g., ARM Ethos-U, Syntiant NDP, Sony IMX series) within their devices.
The next generation of accelerators may take advantage of reconfigurable architectures or AI extensions to RISC-V, which offer potential for increased flexibility and domain-specific optimizations. These architectures will support larger and more accurate models while retaining the low power consumption that characterises TinyML.

6.3 Energy-Sensitive and Batteryless TinyML
Another exciting avenue of research is energy harvesting and intermittent computing. By combining ultra-low-power MCUs with solar, kinetic, or RF energy harvesters, researchers are developing TinyML nodes that can operate indefinitely without battery replacement. Checkpoint-based inference and energy-aware model execution will enable devices to provide robust functionality even with a fluctuating power supply. This area is important for long-term environmental monitoring, agriculture, and other remote-sensor applications where replacing the battery is impractical.

6.4 Secure and Trustworthy TinyML
With the increasing ubiquity of TinyML in critical applications and applications that require user privacy, security and robustness are becoming major concerns. Future systems will have to validate model integrity, perform reliable and secure over-the-air updates, and remain resilient against adversarial inputs while staying energy efficient. On the execution side, lightweight encryption schemes, trusted execution environments (TEEs), and secure model attestation protocols are all being pursued to protect the inference process from observation. Lastly, explainability and transparency in the decisions these models make will become critical for ethical and regulatory compliance and for user trust, especially in medical or safety-critical applications.

VII. CONCLUSION

7.1 Summary of Findings
This report provided an overview of the emerging field of Tiny Machine Learning (TinyML), which brings machine learning inference to extremely resource-constrained microcontrollers with less than 1 MB of memory. The report began with a discussion of the motivation for this field and its history. It then discussed how TinyML bridges the gap between embedded systems and artificial intelligence, allowing intelligence to be placed at the edge of the network.
The review of the literature showed a developing research trajectory in model compression, quantization, and energy-efficient frameworks such as TensorFlow Lite for Microcontrollers, Edge Impulse, and MCUNet. These works collectively built the groundwork for low-power, high-performance ML applications on microcontrollers.
The analysis of the current state showed that TinyML is now a practical reality and not only a theoretical concept, with an established ecosystem of hardware platforms (ARM Cortex-M, STM32, Syntiant NDP, Sony IMX500) and software toolchains (TFLM, MicroTVM, CMSIS-NN). Optimization methods have advanced rapidly over the past couple of years, using quantization, pruning, and architecture search to enable increasingly complex tasks to run on ultra-low-power devices.
The review of recent advances covered significant progress in hardware-software co-design, efficient transformers, compiler optimization, and on-device learning. The introduction of MLPerf Tiny benchmarks and energy-harvesting systems marks a major step toward standardization and practical application.
The comparative analysis of frameworks showed that each makes different trade-offs across areas of performance. TensorFlow Lite for Microcontrollers is the most accessible, Edge Impulse excels at automating the machine learning workflow, CMSIS-NN performs best in terms of efficiency, and MCUNet leads in research on the accuracy-efficiency trade-off. The hardware discussion points to the growing importance of application-specific NPUs and AI-enabled sensors for real-world applications.
In terms of future directions, the discussion emphasized the transition of TinyML towards being more autonomous, adaptable, and secure, incorporating on-device training, batteryless devices, and accelerators specialized for AI. TinyML is expected to spread across the expanding AIoT (Artificial Intelligence of Things) realm, enabling automated capabilities in healthcare, smart cities, industry, and environmental monitoring.
7.2 Final Thoughts
In summary, TinyML is the next evolutionary step for artificial intelligence, making it ever-present, affordable, and sustainable. With researchers continuing to push the boundaries of what can be achieved with only a few kilobytes of memory, TinyML enables billions of connected devices to operate intelligently, privately, and efficiently at the very edge.
Computing in the future will no longer rely solely on cloud servers or large GPUs, but will instead rely on distributed learning embedded into the objects that surround us daily, made possible by TinyML.
REFERENCES
1. P. Warden and D. Situnayake, TinyML: Machine Learning
with TensorFlow Lite on Arduino and Ultra-Low-Power
Microcontrollers, O’Reilly Media, 2020.
2. Google TensorFlow Team, “TensorFlow Lite for
Microcontrollers: Running ML models on devices with
<256KB memory,” Google AI Blog, 2019.
3. S. Han, H. Mao, and W. Dally, “Deep Compression:
Compressing Deep Neural Networks with Pruning,
Quantization, and Huffman Coding,” Proc. of ICLR, 2016.
4. MLCommons, “MLPerf Tiny Benchmark Suite,”
MLCommons Association, 2021.
5. C. Hymel et al., “Edge Impulse: An MLOps Platform for
Tiny Machine Learning,” arXiv preprint, 2022.
6. J. Lin et al., “MCUNet: Tiny Deep Learning on IoT
Devices,” arXiv preprint arXiv:2007.10319, 2020.
7. Z. Liu et al., “Deploying Machine Learning Models to
Ahead-of-Time C Source with MicroTVM,” Apache TVM
Project Documentation, 2023.
8. S. Wu et al., “Integer Quantization for Deep Learning
Inference,” arXiv preprint arXiv:2004.09602, 2020.
9. J. Llisterri Giménez et al., “On-Device Training of ML
Models on Microcontrollers: A Look at Federated
Learning,” Sensors, vol. 22, no. 19, 2022.
10. J. Yang et al., “TinyFormer: Efficient Transformer
Design and Deployment for TinyML,” arXiv preprint
arXiv:2302.03498, 2023.
11. M. Moosmann et al., “TinyissimoYOLO: Ultra-Efficient
On-Device Object Detection,” arXiv preprint
arXiv:2303.09845, 20