# Edge AI hardware list
I'm a researcher from Belgium working on artificial intelligence on the edge. I have a background in embedded systems and a specialization in machine learning, and our research group focuses on providing local companies with advice and guidance on adopting new technologies. To us, "edge AI" covers everything from microcontrollers to on-premise servers, so the definition is very broad. The companies in our research group are diverse: IT, industrial, medical, retail, and more.
We are currently building a list of relevant edge AI hardware, with the goal of purchasing a representative piece from each category. I'm posting this to Hacker News to get your feedback: are there any devices we are missing? Of course, we will never be able to buy all existing hardware, so our focus is on relevant and important hardware with robust, modern support for machine learning (inference).

Hacker News won't let me post this much text, so I'm using Pastebin to share it.
## Included hardware

List of hardware I think we should buy and evaluate.
- Hardware: NVIDIA GPU and Jetson
  - Motivation: NVIDIA is the market leader for model training and inference. Companies that want a powerful on-premise server will be interested in discrete GPUs. We will also study the NVIDIA Jetson, a system-on-module (SoM) aimed at industry and robotics.
  - Software: NVIDIA GPUs are widely supported in machine learning frameworks through the CUDA API. Of special interest is NVIDIA TensorRT, an SDK for high-performance deep learning inference. NVIDIA TAO provides pre-trained models for fine-tuning.
- Hardware: Intel CPU and GPU
  - Motivation: Intel CPUs are popular in consumer PCs and servers, and Intel integrated GPUs can be used to accelerate model inference. Intel CPUs have good software compatibility, providing a low barrier to entry. Companies looking for easy on-premise model inference might be interested in high-end Intel Core processors; we want to compare these with the low-end N100 as a budget alternative.
  - Software: Intel CPUs are supported by many machine learning frameworks. We will look at Intel OpenVINO, a toolkit for optimizing and deploying deep learning models on Intel CPUs and GPUs. Another interesting option is Intel Extension for PyTorch.
- Hardware: ARM Cortex-A CPU
  - Motivation: ARM Cortex-A processors are popular in embedded systems, mobile devices, and more. They are low-power and range from small, efficient cores to fairly powerful ones. Their NEON vector instructions can be used to accelerate neural network inference. Though not as powerful as other hardware on this list, they are very widespread, so they are worth investigating.
  - Software: ARM NN is an inference library that optimizes neural network execution on ARM Cortex-A CPUs. Many other tools support ARM Cortex-A, including TensorFlow Lite, PyTorch Mobile, ONNX Runtime, and ncnn.
- Hardware: ARM Cortex-M CPU
  - Motivation: ARM Cortex-M processors are used in microcontrollers, often running bare metal or with a real-time operating system (RTOS). They are chosen for their affordability and suitability for real-time control. Compared to the other processors on this list, they are far less powerful, which limits their ability to handle large machine learning models. Techniques like quantization, pruning, and distillation are used to reduce model size and complexity.
  - Software: The most popular option here is TensorFlow Lite for Microcontrollers, which integrates with CMSIS-NN. Some vendors offer their own tools, such as STM32Cube.AI from STMicroelectronics.
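To make the size-reduction argument concrete, here is a minimal, self-contained sketch of post-training int8 affine quantization, the kind of technique TensorFlow Lite for Microcontrollers relies on. This is an illustration under simplified assumptions (one scale/zero-point per tensor), not any vendor's actual implementation, and the weight values are made up.

```python
def quantize_int8(weights):
    """Map float weights to int8 using a single scale and zero point."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # guard against constant tensors
    zero_point = round(-lo / scale) - 128
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate float weights from the int8 representation."""
    return [(v - zero_point) * scale for v in q]

# Toy weights standing in for a real layer.
weights = [-0.8, -0.1, 0.0, 0.35, 0.9]
q, scale, zp = quantize_int8(weights)
restored = dequantize_int8(q, scale, zp)

# int8 storage is 4x smaller than float32, and the reconstruction
# error per weight is bounded by half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

On a Cortex-M, the smaller int8 tensors both fit in flash/SRAM more easily and map onto CMSIS-NN's integer kernels, which is why quantization is usually the first technique applied.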
- Hardware: ARM Mali GPU
  - Motivation: ARM Mali GPUs are found in embedded systems, mobile devices, and more. They are well-suited for accelerating model inference on devices where power consumption and thermal constraints are critical.
  - Software: ARM NN allows optimizing models for inference on Mali GPUs.
- Hardware: Texas Instruments NPU
  - Motivation: TI produces system-on-chips (SoCs) with dedicated deep learning accelerators; an example is the TDA4VM, used in the BeagleBone AI-64. These allow for efficient model inference with low latency and power usage. Companies that design their own PCBs will be interested in this SoC, especially those in the automotive industry.
  - Software: TI's Edge AI Studio converts models to run on TI systems.
- Hardware: NXP NPU
  - Motivation: Similar to TI, NXP produces SoCs that include a Neural Processing Unit (NPU). We are interested in the i.MX 8M Plus, which focuses on machine learning and vision, advanced multimedia, and industrial automation with high reliability.
  - Software: NXP eIQ software enables ML algorithms on NXP microprocessors, including i.MX processors.
- Hardware: Rockchip NPU
  - Motivation: Rockchip produces a range of SoCs with NPUs, which are popular in single-board computers (SBCs) and system-on-modules (SoMs). Of particular interest are the RK3566 and RK3588, which combine powerful ARM processors, a Mali GPU, and a dedicated NPU. Companies can integrate a SoM into their PCB design or build their product around an existing SBC.
  - Software: Rockchip provides RKNN, a software stack for deploying AI models to Rockchip chips.
- Hardware: Hailo-8
  - Motivation: The Hailo-8 is a powerful AI accelerator provided as an M.2 module, often sold in combination with a Raspberry Pi 5. Hailo is a relatively young company, but I'm very interested in testing this device.
  - Software: Hailo provides an AI Software Suite for compiling deep learning models. HailoRT is used for running inference on the Hailo-8 device.
- Hardware: Some FPGA boards
  - Motivation: I haven't done enough research to specify exact boards yet, but the parallel nature of FPGAs makes me suspect they will be great for model inference.
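Since the end goal is comparing all of these devices on inference, a uniform measurement harness helps keep results comparable across vendor runtimes. Below is a rough sketch of the kind of latency benchmark we could run on each board; `run_inference` is a hypothetical stand-in for whatever call each runtime exposes (TensorRT, OpenVINO, RKNN, HailoRT, ...), and the warmup/iteration counts are arbitrary choices, not recommendations.

```python
import statistics
import time

def benchmark(run_inference, warmup=10, iterations=100):
    """Time repeated calls to run_inference; return median and p95 latency in ms."""
    for _ in range(warmup):
        run_inference()  # let caches, clocks, and accelerator state settle
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * len(samples)) - 1],
    }

# Dummy CPU workload standing in for a real model invocation.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```

Reporting the median alongside a tail percentile matters on edge devices, where thermal throttling and DVFS can make the mean misleading.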
## Excluded hardware

List of hardware I think we should NOT evaluate, for various reasons.
- Hardware: Qualcomm Snapdragon (excluded)
  - Motivation: Qualcomm processors are powerful, and many of them have a GPU/NPU to accelerate model inference. Qualcomm publishes the AI Engine Direct SDK to optimize models for their hardware. However, they seem to be focused solely on the mobile market (smartphones/tablets); I cannot find any SBCs or SoMs with a Qualcomm Snapdragon.
- Hardware: Apple computers (excluded)
  - Motivation: Apple devices have powerful ARM processors, GPUs, and NPUs, and machine learning models can be integrated using Core ML. However, none of the companies we work with deploy their software on Apple devices.
- Hardware: AMD CPU and GPU (excluded)
  - Motivation: AMD's offering feels comprehensive, but also very confusing. They have ROCm for GPU acceleration, MIGraphX for model optimization, ZenDNN for CPU inference, and Vitis AI for FPGAs, but also for Ryzen processors? I'm not sure how to approach this. Any advice is welcome.
- Hardware: Google Coral (excluded)
  - Motivation: Coral devices are quite powerful, but I believe Google has abandoned the project. On top of that, they only support TensorFlow: no ONNX, no PyTorch. This is a showstopper for me.
- Hardware: Kendryte (excluded)
  - Motivation: I really wanted to include a RISC-V device in the list, but it's hard to find a good one. Kendryte seemed popular a few years ago, but I'm not sure if the project is still active. Feel free to suggest any alternatives.
## Summary

This is what I have so far. Does my list contain any mistakes? Is there any hardware you would add or remove? If so, why? Feel free to share any positive or negative experiences.