
In modern computing, there are key concepts that define how machines process information and solve problems: Large Language Models (LLMs), algorithms, and computer programs. Each plays a unique role in how tasks are performed and how intelligent systems operate.

LLMs, such as ChatGPT, are advanced artificial intelligence models trained on massive amounts of text data to understand and generate human-like language responses. They excel at language-based tasks but rely on patterns learned from data rather than true intelligence based on human reasoning.

Algorithms, on the other hand, are step-by-step instructions (typically following a mathematical recipe) designed to solve specific problems or perform defined tasks. The rules or mathematical recipes that the algorithm follows have been designed by humans using reasoning and strict logic. As such, the output of the algorithm is deterministic, and can be recreated and explained by anybody following the method’s mathematical recipe or set of rules using the same input data.

Computer programs are the broader collection of code that encompasses both algorithms and models like LLMs, orchestrating various tasks by following sets of instructions. While algorithms are the building blocks for problem-solving, LLMs are specialized tools for tasks involving natural language, and programs bring these elements together to create functional software.

Understanding the differences between these three components helps clarify the architecture of modern computational systems. In this article, we discuss the differences between these terms and technologies, and provide hints, tips and a few practical examples for developers working on AIoT applications.

Programs and Algorithms: a Basic Example

A program consists of a set of instructions, often built upon one or more algorithms, to perform specific tasks based on a given input. An algorithm is a step-by-step procedure or formula for solving a problem, while a program is the implementation of that algorithm in a specific programming language.

For instance, consider a simple sorting algorithm like Bubble Sort, which can be implemented in a program:

  • The algorithm defines how to repeatedly compare and swap adjacent elements in a list until the list is sorted.
  • The program written in a language like Python or C++ implements this algorithm to sort any given list of numbers.

The key point is that a traditional program does not learn from the input or adapt its behaviour. It just follows the instructions of the algorithm every time based on the specific problem it is designed to solve.
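
To make this concrete, here is a minimal Python sketch of the Bubble Sort algorithm described above (the function name and test data are illustrative only). Run with the same input, it produces the same output every time, which is exactly the deterministic behaviour discussed here.

```python
def bubble_sort(data):
    """Repeatedly compare and swap adjacent elements until the list is sorted."""
    items = list(data)              # work on a copy; the input list is left untouched
    n = len(items)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):  # the last i elements are already in their final place
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:             # early exit: no swaps means the list is already sorted
            break
    return items

print(bubble_sort([5, 1, 4, 2, 8]))   # [1, 2, 4, 5, 8] -- identical output on every run
```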

LLMs: A Learning-Based Approach

In contrast, a Large Language Model (LLM) does not rely on predefined algorithms for specific tasks. Instead, it is trained on vast amounts of data and uses this training to predict responses based on learned patterns. For example:

  • If you ask an LLM to generate a recipe for pizza, it predicts the next word or sentence based on patterns it has seen in training data.
  • The LLM does not follow a fixed algorithm for generating recipes, but instead uses its learned understanding to predict the best response.

Unlike traditional programs, LLMs do not rely on strict rules or algorithms. They are probabilistic models that learn from a wide range of data, and their output is based on prediction rather than direct instruction.

Key Differences Between Programs and LLMs

  • Algorithm vs Learning: A traditional program follows strict instructions based on algorithms. LLMs, on the other hand, learn from data and use this learning to generate responses.
  • Fixed Output vs Prediction: In a program, the output is fixed for a given input based on the algorithm. An LLM predicts responses based on patterns, so the output can vary even with similar inputs.
  • No Adaptation vs Adaptation: Programs do not adapt or change their behavior unless reprogrammed. LLMs are capable of generating responses based on what they have learned, adapting to new inputs within the scope of their training.

Misconceptions about Algorithms and DLN/ML Models

Many people frequently refer to an ML model as an algorithm. This is incorrect, although the two terms are very closely related. In this section we discriminate between the two, and provide some practical examples.

Is it correct to distinguish between an ‘algorithm’ and a Deep Learning Network (DLN)/ML model? Yes: although the terms are often used interchangeably, they have distinct meanings.

An algorithm is a step-by-step procedure or set of rules for solving a problem, while a machine learning (ML) model is the output generated after an algorithm is applied to data during the training process. Essentially, an ML model is the learned representation or a mathematical construct based on an algorithm that can make predictions or decisions on new data.

For example, when training a neural network (which uses an algorithm like backpropagation), the result is an ML model that can classify images or recognize patterns. The algorithm guides the learning process, but the model is what performs the task after training.
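
As a small illustration of this split (a sketch using scikit-learn and synthetic data, both assumptions rather than anything prescribed here), the code below trains a linear regression: the fitting algorithm (ordinary least squares) is the procedure, while the fitted model, i.e. the learned coefficients, is the artefact that performs predictions afterwards.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data: y = 2x + 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.1, size=50)

# The *algorithm* (ordinary least squares) runs here, inside fit()
model = LinearRegression().fit(X, y)

# The *model* is the learned artefact: its parameters...
print(model.coef_, model.intercept_)      # close to 2.0 and 1.0

# ...and it is the model, not the algorithm, that performs the task on new data
print(model.predict([[4.0]]))             # close to 9.0
```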

What is an Algorithm?

As mentioned earlier, an algorithm is a set of rules or a mathematical recipe used to perform a specific task or to solve a problem. In ML, an algorithm is the method used to train an ML model. Examples include linear regression, decision trees, k-nearest neighbors and gradient descent.

Algorithms are very well established in the IoT sensor world for a variety of tasks, such as instrumentation and measurement, cleaning sensor data, AR (augmented reality), predictive maintenance with MEMS sensors and navigation (drones, cars and robotics). The latter makes heavy use of Kalman filtering and sensor fusion, which have been used with great success for decades.

As a simple example of an algorithm, consider the task of calculating the mean or average of a set of numbers in the following dataset, \(z=[3,2,1,4,6]\). The mean can be calculated using the following mathematical recipe,

\(\displaystyle\mu = \frac{1}{5}\sum_{n=0}^{4}z(n) = 3.2\)

Note that this result is deterministic, in the sense that it can be recreated and, more importantly, explained by anybody following the function’s mathematical recipe using the same input data. This is very different from an ML model that would also reach the same result for the same input dataset, but, as discussed in the next section, explaining how the model reached the result remains an enigma.

What is a DLN/ML Model?

An ML model is the resulting output after training an algorithm on various datasets. Training typically uses various feature extraction algorithms (e.g. mean, standard deviation and correlation) to extract features of interest for the ML model. The resulting model represents the learned patterns, parameters, or rules that can be used to make predictions on new data.
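
As a hedged illustration of this feature-extraction step (NumPy only, with a made-up two-channel sensor window), the snippet below computes the kind of simple statistical features mentioned above that might be fed to an ML model during training.

```python
import numpy as np

def extract_features(frame_x, frame_y):
    """Simple statistical features for one window of two sensor channels."""
    return {
        "mean_x": float(np.mean(frame_x)),
        "std_x": float(np.std(frame_x)),
        "mean_y": float(np.mean(frame_y)),
        "std_y": float(np.std(frame_y)),
        "xy_corr": float(np.corrcoef(frame_x, frame_y)[0, 1]),   # correlation between channels
    }

# Hypothetical window of data from two sensor channels
x = np.array([3.0, 2.0, 1.0, 4.0, 6.0])
y = np.array([2.9, 2.2, 1.4, 3.8, 5.7])
print(extract_features(x, y))
```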

A key point to realise here is that, unlike algorithms based on predefined rules and mathematical concepts, how the ML model reaches its result remains an enigma, which is the primary reason why such models shouldn’t be allowed to operate without any scrutiny on critical processes. As such, AI systems are energy-constrained Boltzmann machine models, as the model is trained on data.

In many AIoT applications, Kalman-based sensor fusion is typically used to feed the ML model with high-quality features of the underlying process, thus significantly improving the accuracy of the AI system.

How Algorithms and DLN/ML Models Interact

A model provides the capability to make decisions based on input data. It can recognize patterns, make predictions, and adapt to new information. Essentially, a model simulates cognitive functions that are typically associated with human thinking, such as dealing with ambiguity and uncertainty. However, as discussed in a previous article, AI does not have any common sense, as it has no understanding of the underlying data or process that it is modelling.

On the other hand, an algorithm is a set of defined instructions or a mathematical recipe. It is a rules based step-by-step procedure used for calculations, data processing, and automated reasoning tasks. Algorithms are the backbone of software and can solve a wide range of problems by following their defined logic.

However, not all functions are computable. This means that there are certain problems for which no algorithm can be formulated to provide a solution. These are referred to as non-computable functions. In such cases, even the most advanced algorithms cannot determine an outcome, highlighting a fundamental limitation in computational theory.

Human Intelligence and Digital Intelligence

In the field of computation, it is essential to differentiate between traditional algorithms and machine learning models. An algorithm is a direct output of human intelligence, crafted through logical reasoning and problem-solving techniques. It represents a set of predefined instructions designed to solve specific problems. The human mind formulates these steps to ensure a consistent and accurate outcome.

In contrast, a trained machine learning (ML) model is the product of digital intelligence. While algorithms underpin the model’s structure, the true power of an ML model arises through its capacity to learn and adapt from new training data. This process involves iteratively adjusting parameters to optimize performance in tasks like prediction, classification, or decision-making. In this sense, the model evolves beyond its initial algorithmic foundation, generating insights and results that may not be directly encoded by human logic.

“An algorithm is a direct manifestation of human intelligence, designed through logic, reasoning, and problem-solving techniques. On the other hand, a trained machine learning model represents the outcome of digital intelligence, which evolves through the iterative processing of data.”

The convergence of these two forms of intelligence—human and digital—marks a significant shift in computational systems. Algorithms, though foundational, are static and require manual updates. Machine learning models, by contrast, learn from experience, dynamically evolving with each new piece of training data. This shift positions ML models as more flexible and adaptive tools for solving complex problems where human-defined rules may fall short.

The distinction between human-driven algorithms and data-driven machine learning models emphasizes the growing role of adaptive systems in areas such as autonomous driving, personalized medicine, and financial forecasting. As machine learning continues to evolve, the boundaries between explicit programming and emergent behavior will continue to blur, paving the way for systems capable of independent learning and decision-making.

Low-Pass Filter and CNN for Classifying Periodic Signals

Both a Low-Pass Filter (LPF) and a Convolutional Neural Network (CNN) can be employed to handle periodic signals, but their approaches and purposes differ fundamentally.

Low-Pass Filter (LPF)

A Low-Pass Filter is an algorithm designed to attenuate the high-frequency components of a signal while allowing the low-frequency components to pass. Its primary use is to filter or clean a signal rather than classify it. Applications of the LPF in AIoT include removing glitches from sensor data, or cleaning up noise on a measured periodic signal prior to feature extraction and subsequent ML classification, leading to higher accuracy.

A practical IIR (infinite impulse response) digital filter used in both AIoT and IoT may be defined in terms of a finite number of poles \(p\) and zeros \(q\), as defined by the linear constant coefficient difference equation,

\(\displaystyle y(n)=\sum_{k=0}^{q}b_k x(n-k)-\sum_{k=1}^{p}a_ky(n-k) \)

where \(a_k\) and \(b_k\) are the filter’s denominator and numerator polynomial coefficients, whose roots are equal to the filter’s poles and zeros respectively. An LPF can be used for all types of signals, not just periodic signals; however, for this article we limit the discussion to periodic signals.
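
For readers who prefer code to equations, the following is a minimal sketch of the difference equation above (plain NumPy, with hypothetical first-order low-pass coefficients chosen purely for illustration). It produces the same output as a library routine using the same coefficient convention, such as scipy.signal.lfilter(b, a, x).

```python
import numpy as np

def iir_filter(x, b, a):
    """Direct evaluation of y(n) = sum_k b_k*x(n-k) - sum_k a_k*y(n-k), with a[0] assumed to be 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):        # feed-forward terms (zeros)
            if n - k >= 0:
                acc += b[k] * x[n - k]
        for k in range(1, len(a)):     # feedback terms (poles)
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

# Hypothetical first-order low-pass: y(n) = 0.1*x(n) + 0.9*y(n-1)
b, a = [0.1], [1.0, -0.9]
x = np.sin(2 * np.pi * 2 * np.arange(500) / 500) + 0.2 * np.random.randn(500)  # 2 Hz tone + noise
y = iir_filter(x, b, a)                # smoothed (low-passed) version of x
```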

Limitations for Classification

While an LPF can enhance a periodic signal by reducing high-frequency noise, it does not classify the signal. It merely transforms the input based on fixed mathematical operations, with no ability to learn from data or adapt its behaviour.

Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a machine learning model designed to recognize patterns in data by learning from training examples. It can be trained to classify periodic signals by learning distinctive features in the signal’s structure.

Operation

The CNN applies a series of convolution operations:

\(\displaystyle S(i,j) = (I * K)(i,j) = \sum_m \sum_n I(m,n) K(i-m, j-n) \)

where \(I\) is the input signal, \(K\) is the kernel, and \(S(i,j)\) is the resulting feature map.
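
A brief NumPy sketch of this convolution sum is shown below (a direct, ‘valid-region’ evaluation with a tiny made-up kernel; in practice, deep-learning frameworks implement the closely related cross-correlation, which is equivalent for learned kernels).

```python
import numpy as np

def conv2d(I, K):
    """Direct evaluation of S(i,j) = sum_m sum_n I(m,n) K(i-m, j-n) over the 'valid' region."""
    kh, kw = K.shape
    ih, iw = I.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    Kf = K[::-1, ::-1]        # flipping the kernel turns the sum into a sliding dot product
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i + kh, j:j + kw] * Kf)
    return out

# Toy 'image' and a simple horizontal-difference kernel (illustrative only)
I = np.arange(25, dtype=float).reshape(5, 5)
K = np.array([[1.0, -1.0]])
print(conv2d(I, K))           # each output is the difference of neighbouring columns
```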

Classification

Unlike the LPF, the CNN is capable of learning to classify different periodic signals through training. The learned filters allow the network to distinguish between signals based on the periodic features it identifies.

Extracted vs Learned Features

  • Low-Pass Filter: Performs a deterministic operation that modifies the signal but cannot classify it.
  • CNN: Learns from data and can classify periodic signals by recognizing their features.

In conclusion, while a Low-Pass Filter may assist in signal preprocessing, a CNN is required for the task of classifying signals.

Adaptive Low-Pass Filters

An adaptive low-pass filter (LPF), such as those based on the Least Mean Squares (LMS) algorithm, introduces several key features and benefits compared to a traditional, static LPF:

  • Dynamic Adaptability: Adaptive LPFs adjust their characteristics in response to variations in the input signal, allowing for real-time filtering of noise or unwanted frequencies, especially in non-stationary signals.
  • Error Minimization: These filters utilize a feedback mechanism to minimize the difference (error) between the desired output and the actual output. The filter coefficients are continuously updated based on this error, enhancing the filter’s adaptability to changing signal conditions.
  • Improved Performance in Noisy Environments: Adaptive LPFs effectively reduce noise by optimizing signal quality, which is particularly valuable in applications like audio processing, telecommunications, and biomedical signal processing where signal characteristics can fluctuate.
  • Applications in Real-Time Systems: The adaptability of these filters makes them suitable for real-time systems, such as echo cancellation in telecommunication, where the noise characteristics may vary dynamically, ensuring consistent performance over time.
  • Computational Complexity: While adaptive filters provide significant advantages, they also come with increased computational complexity due to the need for constant updates to the filter coefficients, which can be a concern in systems with limited processing capabilities.

In summary, using an adaptive LPF enhances the filter’s ability to handle varying signal conditions effectively, making it particularly valuable in applications requiring real-time signal processing, thus improving overall performance and robustness against noise and interference.
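
A minimal LMS sketch is shown below (plain NumPy; the signal, step size and tap count are illustrative assumptions, and the ‘desired’ signal is taken to be the clean tone purely for demonstration, whereas a real application would obtain it from a reference input).

```python
import numpy as np

def lms_filter(x, d, num_taps=16, mu=0.005):
    """Basic LMS adaptive FIR filter: coefficients are updated from the error at every sample."""
    w = np.zeros(num_taps)                   # adaptive filter coefficients
    y = np.zeros(len(x))                     # filter output
    e = np.zeros(len(x))                     # error signal
    for n in range(num_taps, len(x)):
        x_window = x[n - num_taps:n][::-1]   # most recent samples first
        y[n] = np.dot(w, x_window)           # current filter output
        e[n] = d[n] - y[n]                   # error w.r.t. the desired signal
        w += 2 * mu * e[n] * x_window        # LMS coefficient update
    return y, e, w

# Illustrative use: recover a 5 Hz tone buried in noise (fs = 500 Hz)
fs = 500
t = np.arange(0, 2, 1 / fs)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.5 * np.random.randn(t.size)
y, e, w = lms_filter(noisy, clean)           # y converges towards the clean tone as w adapts
```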

An adaptive low-pass filter (LPF) differs significantly from a traditional LPF in terms of feature extraction and learning capabilities.

Feature Extraction vs. Feature Learning

  • Traditional LPF: This filter focuses on extracting specific frequency components from a signal by applying fixed coefficients determined by the filter design, which remain constant during operation. As a result, it extracts features based on pre-defined criteria.
  • Adaptive LPF: Utilizes algorithms like the Least Mean Squares (LMS) to adjust its filter coefficients in real-time based on the input signal characteristics. This enables the adaptive LPF to extract features that dynamically correspond to changing signal conditions, but it does not learn features in the same manner as a neural network.

Comparison with CNNs

  • Convolutional Neural Networks (CNNs): CNNs are designed to learn features from data through multiple layers, allowing them to automatically extract high-level features from raw inputs. Unlike traditional LPFs, CNNs perform feature learning, adapting to the input data through training on labeled datasets.
  • While adaptive LPFs adjust their response based on signal changes, they do not perform feature learning like CNNs. They can optimize their filter characteristics based on feedback but lack the hierarchical feature learning approach present in CNNs.

Adaptive LPFs can extract features based on the immediate conditions of the signal; however, they do not ‘learn’ features in the same way that CNNs do. Instead, adaptive LPFs optimize the extraction process in real-time, making them effective in environments where signal characteristics vary.

Comparison of Adaptive Low-Pass Filters and Convolutional Neural Networks

Adaptive low-pass filters (LPFs), such as those using the Least Mean Squares (LMS) algorithm, exhibit several similarities with convolutional neural networks (CNNs) regarding their operational principles and learning mechanisms.

  1. Adaptive Coefficients: Adaptive LPFs modify their coefficients based on the input signal, similar to how CNNs adjust their weights during training to minimize loss on a dataset.
  2. Supervised Learning: Both systems can be trained using labeled data to optimize performance. Adaptive filters adjust based on real-time feedback while CNNs learn complex patterns through multiple iterations.
  3. Feature Extraction: Adaptive LPFs extract relevant features dynamically, while CNNs automatically learn to identify hierarchical features through their architecture.
  4. Learning Methodology: Adaptive LPFs adjust their parameters based on incoming data but do not learn complex representations as CNNs do. CNNs can learn multiple levels of abstraction through backpropagation.
  5. Structure and Complexity: CNNs consist of multiple layers, allowing them to learn intricate patterns, whereas adaptive LPFs typically operate with a single, simpler structure focused on modifying coefficients.

Items 1, 2 and 3 highlight the similarities, whereas items 4 and 5 highlight the differences.

While adaptive LPFs and CNNs share similarities in their adaptive behaviors and feature extraction capabilities, they fundamentally differ in methodologies and complexities. Adaptive LPFs do not fully replicate the intricate learning capabilities of CNNs, though both aim to improve task performance through adaptation.

Comparison of Adaptive LPFs and CNNs

  • Order of the Filter: The order of an adaptive low-pass filter (LPF) determines its ability to capture and process complex signal characteristics. A higher-order filter can approximate a more complex frequency response, allowing it to better handle diverse signal patterns, similar to how a deeper convolutional neural network (CNN) can learn more complex representations.
  • Learning Capabilities: While both CNNs and adaptive LPFs adjust their parameters based on input, CNNs inherently possess a more advanced learning capability through multiple layers, each designed to extract different levels of abstraction from the data. This allows CNNs to learn hierarchical feature representations effectively. In contrast, increasing the order of an adaptive LPF can enhance its feature extraction capabilities, but it still lacks the sophisticated learning mechanisms that CNNs implement, such as backpropagation and convolutional operations.
  • Complex Features: CNNs excel in extracting spatial hierarchies in data (e.g., images) by applying filters across multiple layers, progressively identifying edges, shapes, and more abstract features. Adaptive LPFs, when designed with a higher order, can capture complex signal behaviours, but their ability to generalize or learn from large datasets is limited compared to CNNs.

While increasing the order of an adaptive LPF can enhance its performance in signal processing, it does not equate to the deep learning capabilities of CNNs. CNNs utilize their layered architecture to learn complex features in a more robust and generalized manner, making them more suitable for tasks like image recognition and classification.

Parameter Estimation

Parameter estimation plays a crucial role in both traditional algorithmic processes and machine learning. It involves determining the best parameters for a given model based on observed data.

Algorithmic Parameter Estimation

In traditional algorithmic contexts, parameter estimation involves using specific algorithms to find optimal parameters for mathematical models. Key methods include:

Least Squares Estimation (LSE)

This method minimizes the sum of the squared differences between observed and predicted values. The parameter estimation is given by:

\(\displaystyle\hat{\theta} = \arg \min_{\theta} \sum_{i=1}^n (y_i - f(x_i; \theta))^2 \)

where \(\hat{\theta}\) denotes the estimated parameters. This concept is central to Kalman filtering, whereby a state-space model of the process to be modelled uses the state estimates (i.e. the parameters of interest) to perform the prediction. The Kalman update equations attempt to minimise the error between the model output and the observed data in a least squares sense on a sample-by-sample basis.
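
As a short, hedged illustration of least squares estimation (NumPy, with a synthetic straight-line model standing in for \(f(x;\theta)\)):

```python
import numpy as np

# Observed data assumed to follow y = theta_0 + theta_1 * x + noise
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 1.5 + 0.8 * x + rng.normal(0, 0.2, size=x.size)

# Design matrix for the linear model f(x; theta) = theta_0 + theta_1 * x
A = np.column_stack([np.ones_like(x), x])

# theta_hat minimises sum_i (y_i - f(x_i; theta))^2
theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta_hat)     # close to [1.5, 0.8]
```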

Maximum Likelihood Estimation (MLE)

MLE estimates parameters by maximizing the likelihood function, which reflects the probability of the observed data under the model parameters:

\(\displaystyle\hat{\theta} = \arg \max_{\theta} L(\theta; \text{data}) \)

where \(L(\theta; \text{data})\) represents the likelihood function.
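
The sketch below shows MLE for the simplest possible case (fitting a Gaussian mean and standard deviation with SciPy; the data and parameterisation are illustrative assumptions). Maximising the likelihood is implemented, as is conventional, by minimising the negative log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Observed data assumed to be drawn from a Gaussian with unknown mean and std
rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=1.5, size=500)

def neg_log_likelihood(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)                 # log-parameterisation keeps sigma positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])   # maximise L(theta; data)
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)                      # close to 3.0 and 1.5
```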

Parameter Estimation in Machine Learning

In machine learning, parameter estimation is integral to model training and involves iterative optimization techniques. Examples include:

Training Neural Networks

Parameters such as weights and biases are estimated using gradient-based optimization methods, typically through:

\(\displaystyle\theta_{n+1} = \theta_n - \alpha \nabla_{\theta} L(\theta_n)\)

where \(\theta_n\) represents the parameters at iteration \(n\), \(\alpha\) is the learning rate, and \(L(\theta)\) is the loss function.
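
The update rule can be demonstrated with a deliberately tiny example (a one-parameter squared-error loss, chosen only to keep the gradient obvious):

```python
import numpy as np

# Toy data for a one-parameter model y = theta * x; the 'true' parameter is 2.0
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

def grad(theta):
    """Gradient of the mean-squared-error loss L(theta) = mean((y - theta*x)^2)."""
    return np.mean(-2 * x * (y - theta * x))

theta, alpha = 0.0, 0.01                      # initial guess and learning rate
for n in range(200):
    theta = theta - alpha * grad(theta)       # theta_{n+1} = theta_n - alpha * grad L(theta_n)

print(theta)                                  # converges towards 2.0
```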

Bayesian Parameter Estimation

In Bayesian methods, parameters are estimated based on posterior distributions that combine prior beliefs with observed data:

\(\displaystyle p(\theta | \text{data}) \propto p(\text{data} | \theta) \cdot p(\theta)\)

where \(p(\theta | \text{data})\) is the posterior distribution.
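
A compact example of this update is the conjugate Beta-Binomial case (SciPy; the prior and the observed counts are made-up numbers), where the posterior can be written down directly rather than computed numerically:

```python
from scipy.stats import beta

# Prior belief about an unknown probability theta, e.g. a device failure rate: Beta(2, 8)
a_prior, b_prior = 2, 8

# Observed data: 12 'successes' in 30 trials
successes, trials = 12, 30

# Conjugacy: p(theta | data) is Beta(a_prior + successes, b_prior + failures)
a_post = a_prior + successes
b_post = b_prior + (trials - successes)

posterior_mean = a_post / (a_post + b_post)
ci_low, ci_high = beta.ppf([0.025, 0.975], a_post, b_post)   # 95% credible interval
print(posterior_mean, (ci_low, ci_high))
```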

In both traditional algorithms and machine learning contexts, the aim is to find the optimal parameters that best fit the model to the observed data.

Key Takeaways

Many people frequently refer to an ML model as an algorithm. This is incorrect, although the two terms are very closely related. An algorithm is a direct output of human intelligence, crafted through logical reasoning and problem-solving techniques. It represents a set of predefined instructions or a mathematical recipe designed to solve specific problems. The human mind formulates these steps to ensure a consistent and accurate outcome. In contrast, a trained machine learning (ML) model is the product of digital intelligence, constructed by applying algorithms to training datasets.

A key takeaway is that algorithms are based on predefined rules and mathematical concepts, whereas AI systems are energy-constrained Boltzmann machine models, as the model is trained on data. As such, how an ML model reaches its result remains an enigma, which is the primary reason why such models shouldn’t be allowed to operate without any scrutiny on critical processes.

Authors

  • Dr. Jayakumar Singaram

    Jayakumar is a seasoned expert in semiconductor technology and AIoT. He advises companies such as Mistral Solutions, SunPlus Software, and Apollo Tyres at the strategic level on their AIoT solutions. He successfully founded Epigon Media Technologies, which focuses on Research and Development for the global market, and is also the co-author of the book "Deep Learning Networks: Design, Development, and Deployment."

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.


In the rapidly evolving landscape of digital transformation, organizations are increasingly leveraging Real-Time Edge Intelligence (RTEI) solutions to enhance operational efficiency and decision-making capabilities. RTEI refers to the deployment of advanced data processing and analytics at the edge of the network (i.e. closer to where data is generated) rather than relying solely on centralized cloud infrastructure. This approach successfully addresses the challenges posed by traditional data processing methods and offers significant benefits across multiple sectors, particularly when building solutions with Arm processor technology. 

Key concepts of Real-Time Edge Intelligence

  • Improved Response Times: RTEI enables immediate data processing and analysis, resulting in faster decision-making. For industries like healthcare, manufacturing, and transportation, this can mean the difference between success and failure in critical situations. Arm processors allow for high-performance computing in compact form factors, making them ideal for real-time applications.
  • DSP/ML at the Edge: Arm’s extensive ecosystem of partner solutions and in-built algorithmic accelerator technology makes deploying DSP algorithms and ML models on the edge very easy. This enables RTEI solutions to provide real-time insights and predictions of the process that they’re monitoring, empowering organizations to automate processes and respond dynamically to changing conditions.
  • Data cleaning and feature extraction: Arm-based devices can clean noisy sensor data and extract features of interest at the edge, sending only relevant data to the cloud. This minimizes bandwidth usage and optimizes network performance, ensuring that only critical data is transmitted. Arm’s low-power architecture is ideal for this task, allowing devices to perform complex computations in battery-powered applications.
  • Cost Efficiency: By reducing the amount of data sent to the cloud, organizations can lower bandwidth costs and cloud storage expenses. The efficient processing capabilities of Arm processors allow for more effective resource use, leading to operational cost savings. Their energy efficiency further contributes to reduced operational costs in large-scale deployments.
  • Increased Reliability: RTEI solutions can operate independently of cloud connectivity, ensuring that essential mission-critical applications continue to function even during a network outage. The robustness of Arm technology in various environmental conditions enhances system reliability and operational resilience, particularly in remote locations, typically encountered in many IoT applications.
  • Scalability: Arm-based solutions can be easily scaled to accommodate growing data volumes and an increasing number of connected devices. The modularity of Arm architecture supports the development of a diverse ecosystem of devices, making it easier for organizations to adapt to changing business needs.

Enhanced security and privacy with Arm TrustZone

Security is a critical concern for edge devices, particularly those handling sensitive data. Arm TrustZone (Cortex-M33, Cortex-M52 and Cortex-A) implements a security paradigm that separates untrusted applications running in a Rich Execution Environment (REE) from trusted applications (TAs) running in a secure Trusted Execution Environment (TEE). The basic idea behind a TEE is that all TAs and associated data are secure, as they are completely isolated from the REE and its applications. This security model provides a high level of protection against hacking, theft of encryption keys and counterfeiting, and as such offers an elegant way of protecting sensitive client information.

DSP support for Algorithms

DSP is critical for many RTEI applications, including audio and video processing, sensor signal processing and data analysis. Arm’s broad range of processors offers extensive DSP capabilities, allowing for the implementation of complex algorithms in floating-point. The Cortex-M family dominates the low-power microcontroller market, as described below, whereas the more powerful Application (Cortex-A) processors target mini-computers such as the Raspberry Pi, as well as smartphones. The Cortex-R family targets real-time safety-critical applications, such as automotive and radar.

All three types of processors offer algorithmic support, but the Cortex-M family is particularly interesting, as it adds DSP functionality to low-power microcontroller devices, making it highly desirable for the IoT market, as we now discuss.

Cortex-M processors

Although a few processor technologies exist for microcontrollers (e.g. RISC-V, Xtensa, MIPS), over 90% of the microcontrollers used in the smart product market are powered by so-called Arm Cortex-M processors that offer a combination of high algorithmic performance, low-power and security. The Arm Cortex-M4 is a very popular choice with several silicon vendors (including ST, TI, NXP, ADI, Nordic, Microchip, Renesas), as it offers DSP (digital signal processing) functionality traditionally found in more expensive devices and is low-power.

Acceleration of DSP calculations

The Armv7E-M architecture supports a DSP extension that implements a SIMD (single instruction, multiple data) architecture extension that can significantly improve the performance of an algorithm. The basic idea behind SIMD involves the parallel execution of an instruction (e.g. add, subtract, multiply, divide, abs) on multiple data elements via the use of 64 or 128-bit registers. These DSP extension intrinsics (SIMD-optimised instructions) support a variety of data types, such as integer, floating-point and fixed-point.

The high efficiency of the Arm compiler allows your C code to be automatically vectorised into SIMD intrinsics, so explicit use of DSP extension intrinsics in your code is usually unnecessary. The net result for your application is much faster code, leading to lower power consumption and, for wearables, better battery life.

What algorithmic operations would use this?

The following examples give an idea of operations that can be significantly sped up with SIMD intrinsics (a NumPy sketch of the equivalent scalar operations follows the list):

  • vadd can be used to expedite the calculation of a dataset’s mean. Typical applications include average temperature/humidity readings over a week, or even removing the DC offset from a dataset.
  • vsub can be used to expedite numerical differentiation in peak finding for a sinewave tracking application.
  • vabs can be used for expediting the calculation of an envelope of a fullwave rectified signal in EMG biomedical and smartgrid applications.
  • vmul can be used for windowing a frame of data prior to FFT analysis. This is also useful in audio applications using the overlap-and-add method.
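
The NumPy sketch below mirrors the four bullet points with their scalar equivalents (the real SIMD intrinsics operate on packed registers in C, but the arithmetic being accelerated is the same; the signal is a made-up example):

```python
import numpy as np

fs = 500
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(t.size) + 0.3   # tone + noise + DC offset

# vadd-style accumulation: dataset mean (average readings / DC offset estimation)
dc_offset = np.mean(x)

# vsub-style differencing: numerical differentiation used in simple peak finding
dx = np.diff(x - dc_offset)

# vabs: envelope of a full-wave rectified signal (EMG / smart-grid style processing)
envelope = np.abs(x - dc_offset)

# vmul: windowing a frame of data prior to FFT analysis (e.g. overlap-and-add)
frame = (x - dc_offset)[:256]
windowed = frame * np.hanning(frame.size)
```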

The hardware floating point unit is very good for expediting MAC (multiply and accumulate) operations used in digital filtering, requiring just three cycles to complete. Other DSP operations such as add, subtract, multiply and divide require just one cycle to complete.

Key takeaways

As organizations continue to embrace digital transformation, Real-Time Edge Intelligence (RTEI) solutions, particularly when integrated with Arm processor technology, stand out as key enablers of innovation and efficiency. By harnessing the power of edge computing and the performance advantages of Arm’s Cortex-A and Cortex-M architectures, the security benefits of Arm TrustZone, and the DSP capabilities for advanced algorithms, businesses can achieve rapid decision-making, enhance security, and optimize operational costs. The future of data processing lies at the edge, and those who adopt RTEI solutions powered by Arm technology will be well-positioned to thrive in an increasingly competitive landscape.

Author

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.


Over the last few years, there has been tremendous interest in the possibility of replacing humans at the workplace with AI. One obvious advantage is AI’s ability to process massive amounts of data and perform repetitive tasks (such as data entry) much more efficiently than humans, leading to higher productivity and reduced operational costs. AI can also work continuously without fatigue, enabling 24/7 customer service. However, can AI truly think and rationalise like a human?

Challenges in understanding the real world

Our experience with AI systems suggests categorically that AI’s current limitation is that it has no common sense, as there is no reasoning component in current-generation, model-based AI inference. As such, current AI does not truly understand the world the way humans do. AI models are trained on large datasets, which are essentially collections of text, images, sensor data etc., and generate responses based on statistical correlations in that data. While they can learn patterns and extract important features from this data, they don’t inherently understand its meaning. Understanding is a fundamental component of human intelligence and commonsense, allowing us to take actions and draw conclusions that may seem logical to some, but irrational to people with other experiences in life. In short, we can conclude that commonsense is built up from real-world experiences, social interactions, emotions and context, which is something that AI currently lacks.

The aforementioned acknowledges that current AI models lack commonsense due to the absence of reasoning components. Also, there is the potential for AI models to converge on solutions that may not adhere to Bayesian learning principles, which is an important consideration. We’ll now look at this aspect in depth with a few examples, but we’ll start off with examples of where having no common sense can be turned to an advantage.

Transparency and fake news

Having no common sense and no real understanding of the data can also be turned into an advantage. Consider the example of an AI sorting through CVs (resumes) of suitable candidates for a job. The model can be limited to just focus on the work experience and education, and ignore the name, gender and nationality, making the process much more transparent. Humans will generally try and form a picture in their heads about the candidate and may then appraise the CV with prejudice rather than merit.

Perhaps one of the most prominent examples of AI in the media is where a model is fed with a certain narrative (e.g. anti-abortion or pro-war) and then instructed to produce a new article, filling in the details with arbitrary photos and facts taken from other media sources and older publications. Many of these articles are not verified by an editorial team before publication, resulting in unconfirmed stories making it to news websites. This is not just limited to news: reviewers of several scientific publications have also reported fake articles sent for review, some of which have been published, which is an area of concern for the scientific community.

Is it a Dog or a Cat?

Consider an AI model trained to classify images of animals. The model can accurately classify images of cats and dogs when presented with typical images of these animals. However, when presented with an unusual image, the model might fail to classify it correctly due to the lack of reasoning and common sense.

For instance, if the AI model is given an image of a cat wearing a dog costume, it might classify the image as a dog because it lacks the reasoning component to understand that the core features of a cat are still present despite the costume. A human, using common sense, would easily identify the animal as a cat in a dog’s costume.

In this example, the AI model converges on a solution that classifies the image as a dog, which may disobey Bayesian learning principles that consider the prior probability of encountering a cat versus a dog in such a context.

This limitation highlights the importance of integrating reasoning components into AI models to enhance their common sense and improve their ability to handle unusual or unexpected situations effectively.

Bayesian learning enhances deep learning networks by providing uncertainty quantification, preventing overfitting, facilitating model comparison, enabling data-efficient learning, and improving interpretability. This makes Bayesian approaches highly valuable in critical applications where reliability, robustness, and transparency are paramount. More information can be found in the following video.

Data vs Science for IoT T&M applications

Many IoT test and measurement (T&M) and calibration methods use sinewaves to check compliance of the DUT (device under test) by measuring the sinewave’s amplitude, some examples include:

  • Measuring material fatigue/strain with a loadcell – in vehicle and bridge/building applications measuring material fatigue and strain is essential for safety. An AC sinusoidal excitation overcomes the difficulty of dealing with instrumentation electronics DC offsets.
  • Calibrating CT (current transformers) sensors channels – a sinusoid of known amplitude is applied to channel input and the output amplitude is measured.
  • Measuring gas concentration in infra-red gas sensors – the resulting sinusoid’s amplitude is used to provide an estimate of gas concentration.
  • Measuring harmonic amplitudes in power quality smart grids applications – in 50/60Hz power systems, certain harmonic amplitudes are of interest.
  • ECG biomedical compliance testing (IEC60601-2-47) – channel compliance with IEC regulations needed for FDA testing typically uses a set of sinewaves at known amplitudes, to ensure that the channel’s signal chain amplitude error is within specification.

The latter example is particularly interesting, as the basic idea is to measure the amplitude differences in the DUT’s signal chain for a set of sinewaves at 0.67, 1, 2, 10, 20 and 40Hz with respect to a 5Hz reference sinewave. It is assumed that the amplitude of all input sinewaves remains constant, and that the relative amplitude error must be within ±3dB for the signal chain to be classed as IEC compliant.

There are a number of signal processing methods that can be employed to perform the estimation, such as the FFT, AM modulation, the Hilbert transform and full-wave rectification. All of these methods require extra filtering operations; the FFT, for example, requires low-frequency trend removal (usually a DC offset) and windowing, so there are a number of factors to take into consideration, which complicates the challenge. The FFT is perhaps one of the most widely used methods but is limited by its frequency resolution, which leads to a bias on the amplitude estimate if the tone is not centred at a multiple of the ideal frequency bin resolution (\(F_s/N\)).

As most IoT devices use a low-cost oscillator, the sampling rate error can be as high as ±3%, leading to a significant bias in the amplitude estimation using the FFT method. Therefore, an important first step for establishing an estimate of the sinewave’s amplitude is to estimate the exact sampling frequency of the DUT.
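
The bias can be reproduced with a few lines of NumPy (the record length, tone frequency and 3% oscillator error below are illustrative assumptions): when the true sampling rate drifts, the tone no longer falls on a bin centre and the FFT amplitude estimate drops.

```python
import numpy as np

N, f_tone, A = 1000, 5.0, 1.0                # record length, test tone (Hz), true amplitude
fs_nominal, fs_actual = 200.0, 206.0         # assumed rate vs rate with a +3% oscillator error

def fft_amplitude(fs_sampling):
    n = np.arange(N)
    x = A * np.sin(2 * np.pi * f_tone * n / fs_sampling)   # what the ADC actually records
    X = np.fft.rfft(x)                                      # analysed as if sampled at fs_nominal
    return 2 * np.abs(X).max() / N                          # amplitude of the largest bin

print(fft_amplitude(fs_nominal))   # ~1.00: 5 Hz sits exactly on a bin (200/1000 = 0.2 Hz spacing)
print(fft_amplitude(fs_actual))    # noticeably below 1.0: the tone now falls between bins
```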

Another simpler method that we’ve seen on some IoT devices is to use high time-resolution timestamps from a higher-accuracy crystal oscillator, but for lower-cost IoT devices this may not be available, so it’s better to have a strategy that extracts the exact sampling rate from the dataset.

Sampling rate estimation using AI

Sampling rate estimation can be achieved using AI, whereby datasets of the known input test sinewaves are collected for subsets of the ideal sampling rate. For many IoT ECG devices, 200Hz is typically used. Therefore, assuming an ideal sampling rate of 200Hz, we can generate test sinewave data sampled at 199.5Hz, 199.6Hz, …, 200.4Hz, 200.5Hz. This collection of sinewave data can then be fed into an ML classifier for estimation of the true sampling rate. Assuming that the training dataset is large enough to cover all required scenarios, this method will work.
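
A deliberately minimal sketch of this idea is shown below (scikit-learn’s nearest-neighbour classifier on noise-free, fixed-phase records; a realistic training set would also have to cover phase, amplitude, noise and record-length variation). It also exposes the limitation discussed next: the classifier can only ever answer with one of the sampling rates it was trained on.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

f_test, n_samples = 5.0, 400                                   # 5 Hz test tone, ~2 s at a nominal 200 Hz
candidate_rates = np.round(np.arange(199.5, 200.51, 0.1), 1)   # the only labels the model knows about

def make_record(fs):
    n = np.arange(n_samples)
    return np.sin(2 * np.pi * f_test * n / fs)

X = np.array([make_record(fs) for fs in candidate_rates])      # one training record per candidate rate
y = candidate_rates

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# A device whose true rate is 199.54 Hz falls between two classes: the model is forced to pick one,
# biasing the estimate, because it cannot improvise outside its training labels.
print(clf.predict([make_record(199.54)]))                      # 199.5 (or 199.6), never 199.54
```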

However, it should be noted that this approach doesn’t have any commonsense, since it’s purely based on data and has no understanding of the physical process that it’s modelling. This becomes apparent if the true sampling rate is, say, 199.54Hz. As the model doesn’t have any data for this scenario, and has no commonsense with which to improvise, it must choose between 199.5Hz and 199.6Hz, which will lead to a bias in the true sampling rate estimate. Another problem appears if a different sampling rate or other test frequencies are used, as these were not taken into account during the training process.

Sampling rate estimation using a UKF

An alternative approach is to model the physical process using an Unscented Kalman Filter (UKF). The UKF’s flexibility allows for a more detailed mathematical model of the process to be implemented, leading to the possibility of estimating the sinewave’s amplitude, phase, DC offset as well as the true sampling rate.

Assuming stationarity, a mathematical model of the process can be described as,

\(\displaystyle y(n) = B+ A \sin(2\pi f\frac{n}{F_s} + \theta) + v(n)\)

where \(\theta\) is the initial phase offset, \(v(n)\) is the measurement noise, and \(A\) (the sinewave’s amplitude), \(B\) (the signal’s DC offset) and \(F_s\) (the sampling rate) are the parameters that we want to estimate.

This model can be broken down and the entities of interest (\(A, B\) and \(F_s\)) implemented as state variables in the Kalman update equations. Notice that although the phase of the sinewave is linear, the output of the \(\sin()\) function is non-linear, which means that the relationship between the observed signal and the entity of interest (\(F_s\) in our case) is non-linear. This is the main reason for choosing the UKF, as it is well suited to handling non-linear relationships. A description of the UKF equations is beyond the scope of this article, but the reader is referred to some of the excellent textbooks on the UKF for a complete description of the algorithm.
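
To make the state-space formulation more concrete, the fragment below sketches one possible set-up (using the UnscentedKalmanFilter from the filterpy package; the noise covariances and initial guesses are illustrative and would need tuning). The running phase is carried as an extra state so that the sample index does not need to be passed into the measurement function.

```python
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

F_TEST = 32.3                                   # known test-sinewave frequency (Hz)

# State vector x = [A, B, Fs, phi]: amplitude, DC offset, sampling rate and running phase
def fx(x, dt):
    A, B, Fs, phi = x
    return np.array([A, B, Fs, phi + 2 * np.pi * F_TEST / Fs])  # phase advances by one sample period

def hx(x):
    A, B, Fs, phi = x
    return np.array([B + A * np.sin(phi)])                      # y(n) = B + A*sin(phase)

# Synthetic DUT data: true rate 100.1 Hz (a 0.1% error on a nominal 100 Hz)
rng = np.random.default_rng(0)
n = np.arange(500)
z_meas = 0.2 + 1.0 * np.sin(2 * np.pi * F_TEST * n / 100.1) + 0.02 * rng.standard_normal(n.size)

points = MerweScaledSigmaPoints(n=4, alpha=0.1, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=4, dim_z=1, dt=1.0, fx=fx, hx=hx, points=points)
ukf.x = np.array([0.8, 0.0, 100.0, 0.0])        # initial guesses: A, B, nominal Fs, phase
ukf.P = np.diag([0.5, 0.5, 0.5, 0.5])           # initial state uncertainty
ukf.R = np.array([[0.02 ** 2]])                 # measurement noise variance
ukf.Q = np.eye(4) * 1e-8                        # small process noise (application dependent)

for z in z_meas:
    ukf.predict()
    ukf.update(np.array([z]))

print(ukf.x)    # [A, B, Fs, phi] estimates; Fs should move towards the true 100.1 Hz
```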

Assuming that the test sinewave is of high accuracy (a realistic assumption, since a modern calibrated signal generator has a frequency error in the \(\mu\)Hz region), we can use the Kalman filtering equations to estimate the true sampling rate over time. An important point to realise is that, like the AI method described above, the UKF also doesn’t have any commonsense, but it has the virtue of ‘understanding’ the process that it’s modelling by virtue of the mathematical model implemented in its update equations. This means that a sinewave of any frequency and sampling rate can be applied to the UKF, and assuming that the exact sinewave frequency is also entered into the state equations, the UKF method will always work.

However, one potential weakness of the method for this application is that the Kalman filter is a statistical state estimation method: its state estimates are optimal in a statistical sense, but there is no guarantee that they will be correct in a deterministic (absolute) sense.

An animation of the UKF estimation for a 32.3Hz sinewave, sampled at 100Hz with a 0.1% sampling rate error (100.1Hz), is shown below. As seen, the UKF correctly estimates the states of the test sinewave within 1 second.

AI in weapons technology

Recently, much emphasis has been placed on developing smart weapons using AI technology. Major weapons manufacturers from all over the world are currently experimenting with AI-based drone technology that can be used to attack enemy combatants in swarms, as well as deploying GPS-guided smart munitions and developing new EW (electronic warfare) jamming technology.

Many Western nations allocate substantial resources to defence spending, with significant portions of their budgets dedicated to military operations and technological advancements. However, modern conflict zones highlight the evolving challenges that defence systems face, particularly in terms of operational complexity and sustainability in international theatres of operation.

Looking further to the East, nations such as Russia and China allocate a much smaller budget to their military-industrial complex. Their approach focuses instead on utilizing AI in a more targeted manner, emphasizing established scientific and mathematical principles, such as control theory, with AI applied for classification purposes. Over recent years, as repeatedly demonstrated in various international conflict zones, this strategy has proven to be very effective. Technologies like hypersonic missiles and electronic warfare systems developed by Russia have managed to evade many Western air defense systems, altering the balance of power in several theatres of operation. This impressive performance challenges the notion that simply investing large sums of money into AI weapons technology guarantees superior results.

Returning to the subject at hand, all of these AI smart weapons still lack any common sense, as the AI cannot reason like a human. As such, it is dangerous to allow these systems to operate autonomously and to have high expectations of their performance in a combat situation. That being said, researchers working at Russia’s AIRI research institute claim to have taken significant steps forward in developing the world’s first self-learning AI system (Headless-AD) that can adapt to new situations/tasks without any human intervention by autoregressively predicting actions using the AI’s existing learning history model as context. If successful, Headless-AD would be a great leap forward in developing sentient AI technology for all walks of life.

Human Intelligence and Digital Intelligence

In the field of computation, it is essential to differentiate between traditional algorithms and machine learning models. An algorithm is a direct output of human intelligence, crafted through logical reasoning and problem-solving techniques. It represents a set of predefined instructions designed to solve specific problems. The human mind formulates these steps to ensure a consistent and accurate outcome.

In contrast, a trained machine learning (ML) model is the product of digital intelligence. While algorithms underpin the model’s structure, the true power of an ML model arises through its capacity to learn from large scale data. This process involves adjusting parameters during the training period (not during the inference time/runtime) to optimize performance in tasks like prediction, classification, or decision-making. In this sense, the model evolves beyond its initial algorithmic foundation, generating insights and results that may not be directly encoded by human logic.

An algorithm is a direct manifestation of human intelligence, designed through logic, reasoning, and problem-solving techniques. On the other hand, a trained machine learning model represents the outcome of digital intelligence, which evolves through the iterative processing of data.

The convergence of these two forms of intelligence—human and digital—marks a significant shift in computational systems. Algorithms, though foundational, are static and require manual updates. Machine learning models, by contrast, can be improved by providing them with more training data when available. This shift positions ML models as more flexible and adaptive tools for solving complex problems where human-defined rules may fall short.

The distinction between human-driven algorithms and data-driven machine learning models emphasizes the growing role of adaptive systems in areas such as autonomous driving, personalized medicine, and financial forecasting. As machine learning continues to evolve, the boundaries between explicit programming and emergent behavior will continue to blur, paving the way for systems capable of independent learning and decision-making.

Key takeaways

There has been considerable interest in the potential of replacing human roles in the workplace with AI. However, as discussed herein, AI fundamentally lacks an understanding of the meaning behind the data it processes for classification tasks. This understanding is a core component of human intelligence and common sense, which enables individuals to make decisions and draw conclusions that may appear logical to some but irrational to others based on varying life experiences. In essence, common sense is derived from real-world experiences, social interactions, emotions, and context: attributes that AI currently lacks and is unlikely to acquire in the foreseeable future.

Nevertheless, the absence of common sense and a deep understanding of data can also be leveraged to create a more transparent process for job applicants and application reviews. Conversely, AI can be utilized to generate misleading information shaped by influential entities to support specific narratives and sway public opinion.

Authors

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.

  • Dr. Jayakumar Singaram

    Jayakumar is a seasoned expert in semiconductor technology and AIoT. He advises companies such as Mistral Solutions, SunPlus Software, and Apollo Tyres at the strategic level on their AIoT solutions. He successfully founded Epigon Media Technologies, which focuses on Research and Development for the global market, and is also the co-author of the book "Deep Learning Networks: Design, Development, and Deployment."


AI (Artificial Intelligence) has its roots in the work of the famous mathematician Alan Turing, who was the first known person to conduct substantial research into the field that he referred to as machine intelligence. The field was later named Artificial Intelligence and was formally established as an academic discipline in 1956. In the years that followed, work undertaken at IBM by Arthur Samuel led to the term Machine Learning, and the field was born.

In terms of definitions: AI is an umbrella term, whereas ML (Machine Learning) is a more specific subset of AI focused on producing inference using trained networks. During training, the dataset plays a key role in determining the quality of the ML model’s inference. AI provides the scope for ML and Deep Learning; indeed, the current generation of AI is built on Deep Learning Networks that use Transformer models.

  • AI is the overarching field focused on creating intelligent systems, whereas ML is a subset of AI that involves creating models to learn from data and make decisions.
  • ML is crucial for IoT because it enables efficient data analysis, predictive maintenance, smart automation, anomaly detection, and personalized user experiences, all of which are essential for maximizing the value and effectiveness of IoT deployments.

The difference between AI and ML in a nutshell

  • Artificial Intelligence (AI):
    • Definition: AI is a broad field of computer science focused on creating systems capable of performing tasks that normally require human intelligence. These tasks include reasoning, learning, problem-solving, perception, and language understanding.
    • Scope: Encompasses a wide range of technologies and methodologies, including machine learning, robotics, natural language processing, and more.
    • Example Applications: Voice assistants (e.g., Siri, Alexa), autonomous vehicles, game-playing agents (e.g., AlphaGo), and expert systems.
  • Machine Learning (ML):
    • Definition: ML is a subset of AI that involves the development of algorithms and statistical models that enable computers to learn from and make predictions or decisions based on data.
    • Scope: Focused specifically on creating models that can identify patterns in data and improve their performance over time without being explicitly programmed for specific tasks.
    • Example Applications: Spam detection, image recognition, recommendation systems (e.g., Netflix, Amazon), predictive maintenance of critical machinery and identifying medical conditions, such as heart arrhythmias and tracking vital life signs – the so called IoMT (Internet of Medical Things).

Why We Need ML for IoT

  • Data Analysis:
    • Massive Data: IoT devices generate a vast amount of data. ML is essential for analyzing this data to extract meaningful insights, detect patterns, and make informed decisions.
    • Real-Time Processing: ML models can process and analyze data in real-time, enabling immediate responses to changes in the environment, which is crucial for applications like autonomous vehicles and smart grids. They are also an invaluable tool for monitoring human well-being, such as tracking vital life signs, and checking motion sensor data for falls and epileptic fits in elderly and vulnerable persons.
  • Automation:
    • Smart Automation: ML enables IoT devices to automate complex tasks that require decision-making capabilities, such as adjusting climate control systems in smart buildings based on occupancy patterns.
    • Adaptability: ML models can adapt to changing conditions and improve their performance over time, leading to more efficient and effective automation.
  • Personalization:
    • User Experience: ML can analyze user preferences and behaviors to personalize experiences, such as recommending products, adjusting device settings, or providing personalized health insights from wearable devices.
    • Enhanced Interaction: Improves the interaction between users and IoT devices by making them more intuitive and responsive to individual needs.

What is AIoT exactly?

IoT nodes or edge devices use convolutional neural networks (CNN) or neural networks (NN) to perform inference on data collected locally. These devices can include cameras, microphones, or UAV-based sensors. The ability to perform inference locally enables intelligent communication or interaction with these devices. The other parties involved in the interaction can be IoT devices, human users, or other AIoT devices. This creates opportunities for AIoT (Artificial Intelligence of Things) rather than just IoT, as it facilitates more advanced and intelligent interactions between devices and humans.

  1. AIoT interacting with another AIoT. In this case, there is a need for artificial intelligence on both sides to have a meaningful interaction.
  2. AIoT interacting with IoT. In this case, there is artificial intelligence on one side and no AI on the other side. Thus, it is not a good and safe configuration for deployment.
  3. AIoT interacting with a human. In this case, there is artificial intelligence on one side and a human on the other. This is a good configuration because the volume of data from the device to the human will be lower.

Human Also in the Loop is a Thing of the Past

Historically, humans have used sensor devices, now referred to as IoT edges or nodes, to perform measurements before making decisions based on a particular set of data. In this process, both humans and IoT edge devices participate. Interaction between one IoT edge and another is common, but typically within restricted applications or well-defined subsystems. With the rise of AI technologies, such as ChatGPT and Watsonx, AI-enabled IoT devices are increasingly interacting with other IoT devices that also incorporate AI. This interaction is prevalent in advanced driver-assistance systems (ADAS) with Level 5 autonomy in vehicles. In earlier terms, this concept was known as Self-Organizing Networks or Cognitive Systems.

The interaction between two AIoT systems introduces new challenges in sensor fusion. For instance, the classic Byzantine Generals Problem has evolved into the Brooks-Iyengar algorithm, which uses interval measurements instead of point measurements to address Byzantine faults. This sensor fusion problem is closely related to the collaborative filtering problem: the sensors must reach a consensus on the measured quantity from a group of sensors rather than relying on a single sensor. Traditionally, M measurements with N samples per measurement produce one outcome by averaging data over intervals and across sensor measurements.

Sensor fusion involves integrating data from multiple sensors to obtain more accurate and reliable information than what is possible with individual sensors. By rethinking this problem through the lens of collaborative filtering—an approach widely used in recommendation systems—we can uncover innovative solutions. In this analogy, sensors are akin to users, measurements are comparable to ratings, and the environmental parameters being measured are analogous to items. The goal is to achieve a consensus measurement, similar to how collaborative filtering aims to predict user preferences by aggregating various inputs. Applying collaborative filtering techniques to sensor fusion offers several advantages. Matrix factorization can reveal underlying patterns in the sensor data, handling noise and missing data effectively. Neighborhood-based methods leverage the similarity between sensors to weigh their contributions, enhancing measurement accuracy.

Probabilistic models, such as Bayesian approaches, provide a robust framework for managing uncertainty. By adopting these methods, we can improve the robustness, scalability, and flexibility of sensor fusion, paving the way for more precise and dependable applications in autonomous vehicles, smart cities, and environmental monitoring.
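To make the analogy concrete, the sketch below shows a minimal neighbourhood-style fusion step in Python (NumPy): each sensor is weighted by its average correlation with the other sensors, so an outlying ('Byzantine') sensor contributes less to the consensus. The function name, weighting scheme and test data are purely illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def consensus_estimate(measurements: np.ndarray) -> np.ndarray:
    """Neighbourhood-style fusion of an (M sensors x N samples) array.

    Each sensor is weighted by its mean correlation with the other sensors,
    so outlying (faulty or 'Byzantine') sensors contribute less to the
    consensus signal.
    """
    M = measurements.shape[0]
    sim = np.corrcoef(measurements)                  # M x M similarity matrix
    np.fill_diagonal(sim, 0.0)
    weights = np.clip(sim.mean(axis=1), 0.0, None)   # ignore anti-correlated sensors
    if weights.sum() == 0:
        weights = np.ones(M)
    weights /= weights.sum()
    return weights @ measurements                    # weighted consensus, length N

# Example: 5 sensors observing the same signal, one sensor faulty
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
truth = np.sin(2 * np.pi * 3 * t)
sensors = truth + 0.1 * rng.standard_normal((5, t.size))
sensors[4] += 2.0 * rng.standard_normal(t.size)      # the 'Byzantine' sensor
fused = consensus_estimate(sensors)
print(np.mean((fused - truth) ** 2))                  # lower MSE than a naive mean
```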

Kalman filtering and collaborative filtering represent two distinct approaches to processing sensor data, each with unique strengths and applications.

Kalman filtering is a recursive algorithm for estimating the state of a dynamic system from noisy observed measurement data. It excels in real-time applications, offering a mathematically rigorous way of predicting and updating the state estimates (i.e. the model’s parameters) using a known model of the system’s dynamics and its statistical noise characteristics. However, it is important to note that although the ‘Kalman solution’ is optimal in a statistical sense, it may still yield incorrect state estimates in an absolute deterministic sense.

In contrast, collaborative filtering, typically used in recommendation systems, aggregates data from multiple sensors (or users) to identify patterns and similarities. This approach doesn’t rely on a predefined model of system dynamics but instead leverages historical data to improve accuracy. Collaborative filtering is particularly effective when dealing with large datasets from multiple sensors, making it suitable for applications where the relationships between sensors can be learned and exploited.

Both methods can enhance sensor data reliability, but their effectiveness depends on the context: Kalman filtering suits dynamic, real-time systems with well-defined models, and collaborative filtering suits complex, multi-sensor environments where data-driven insights are crucial.
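For reference, a minimal scalar Kalman filter is sketched below in Python, assuming a simple random-walk state model with known process and measurement noise variances; the function name and parameter values are illustrative only.

```python
import numpy as np

def kalman_1d(z, q=1e-4, r=0.05, x0=0.0, p0=1.0):
    """Minimal scalar Kalman filter: random-walk state x_k = x_{k-1} + w_k,
    observation z_k = x_k + v_k, with process variance q and measurement variance r."""
    x, p = x0, p0
    estimates = []
    for zk in z:
        # Predict
        p = p + q
        # Update
        k = p / (p + r)          # Kalman gain
        x = x + k * (zk - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return np.array(estimates)

# Example: smooth a noisy, slowly varying temperature reading
rng = np.random.default_rng(1)
z = 21.0 + 0.2 * rng.standard_normal(300)
x_hat = kalman_1d(z, q=1e-5, r=0.04, x0=z[0])
print(x_hat[-1])   # close to 21.0
```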

In our AIoT work, we implement collaborative filtering across multiple (M) sensors or AIoT edges to achieve consensus on a measured value over a specified interval, using a Restricted Boltzmann Machine (RBM) model to perform the collaborative filtering (a toy sketch is given below). Additionally, we deploy and run these types of models within a network of IoT edge devices. This approach leverages the distributed computing capabilities of IoT edges to enhance the performance and scalability of our collaborative filtering solution.
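The sketch below illustrates the general idea of an RBM trained with one step of Contrastive Divergence (CD-1) on binarised sensor data. It is a toy NumPy implementation under assumed data shapes and hyper-parameters, not our production model.

```python
import numpy as np

class TinyRBM:
    """Bernoulli-Bernoulli RBM trained with one step of Contrastive Divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.lr = lr
        self.rng = rng

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def _sample(self, p):
        return (self.rng.random(p.shape) < p).astype(float)

    def cd1_step(self, v0):
        # Positive phase
        ph0 = self._sigmoid(v0 @ self.W + self.b_h)
        h0 = self._sample(ph0)
        # Negative phase (one Gibbs step)
        pv1 = self._sigmoid(h0 @ self.W.T + self.b_v)
        v1 = self._sample(pv1)
        ph1 = self._sigmoid(v1 @ self.W + self.b_h)
        # Parameter updates
        self.W += self.lr * (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

    def reconstruct(self, v):
        h = self._sigmoid(v @ self.W + self.b_h)
        return self._sigmoid(h @ self.W.T + self.b_v)

# Example: binarised sensor 'ratings' (sensors x quantised measurement bins)
rng = np.random.default_rng(2)
data = (rng.random((32, 12)) > 0.5).astype(float)
rbm = TinyRBM(n_visible=12, n_hidden=6)
for _ in range(200):
    rbm.cd1_step(data)
print(np.round(rbm.reconstruct(data[:1]), 2))
```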

The integration of Collaborative Filtering algorithms with CMSIS (Cortex Microcontroller Software Interface Standard) on Arm devices presents a significant advancement in leveraging edge computing for intelligent decision-making. Collaborative Filtering, commonly used in recommendation systems, can be enhanced on Arm Cortex-M processors by utilizing the CMSIS-DSP library. This combination allows for efficient signal processing and data analysis directly on microcontroller-based systems, enabling real-time and power-efficient computations. This approach can be particularly powerful in IoT applications, where Arm devices often operate. By implementing Restricted Boltzmann Machines (RBM) using CMSIS, devices can process and analyze sensor data locally, reducing latency and bandwidth usage. This local computation capability can lead to more responsive and intelligent IoT systems, paving the way for advanced applications in smart environments, healthcare, and personalized user experiences.

Signal Processing on the IoT edge

The objective is to measure the signal \(x_n\) for a duration of \(T\) seconds with a sampling rate \(F_s\). The samples collected during that duration \(T\) are \(x_n,\; n=1,2,\ldots,N\), where \(N = T F_s\). These measurements are performed \(M\) times repeatedly. Since there are \(M\) sets of \(x_n\) samples of the signal, the revised objective is to find a representative of these \(M\) sets of samples. Let \(\tilde{x}_n \) be the above-mentioned representative.

Let \(y_m(n) = x(n) + v_m(n) \), where \( v_m(n)\) is the measurement noise during the \(m\)-th measurement.

By performing \(M\) measurements, is it possible to

  • Improve the Signal to Noise Ratio (SNR)?
  • Estimate \(x_n\) using Maximum Likelihood and achieve better performance as per the Cramer-Rao bound?
  • Use a priori information about the source that created \(x_n\) and estimate \(x_n\) using a Bayesian network?

To reduce noise and obtain a more accurate representation of the output signal, multiple measurements of \(y(n)\) are taken over time: \(y_1(n), y_2(n), \ldots, y_M(n) \).

The averaged output signal \(\overline{y(n)} \) is calculated as the mean of these measurements:


\(\displaystyle\overline{y(n)} = \frac{1}{M} \sum_{i=0}^{M-1} y_i(n)\)
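As a quick numerical check of the averaging equation above, the hypothetical NumPy snippet below generates \(M\) noisy measurements of the same signal and shows the noise power of the mean dropping by roughly a factor of \(M\) (an SNR gain of \(10\log_{10}M\) dB); the signal, noise level and value of \(M\) are arbitrary.

```python
import numpy as np

# Average M repeated measurements y_m(n) = x(n) + v_m(n); the noise variance of the
# mean drops by a factor of M, i.e. an SNR improvement of 10*log10(M) dB.
rng = np.random.default_rng(3)
Fs, T, M = 500, 1.0, 16
n = np.arange(int(Fs * T))
x = np.sin(2 * np.pi * 5 * n / Fs)                    # true signal x(n)
y = x + 0.3 * rng.standard_normal((M, n.size))        # M noisy measurements
y_bar = y.mean(axis=0)                                # averaged output

print(np.var(y[0] - x))        # single-measurement noise power (~0.09)
print(np.var(y_bar - x))       # ~0.09 / 16 after averaging
```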

Consider a smart thermostat system in a home (part of an AIoT system). The thermostat measures the room temperature \(y(n)\) and adjusts the heating or cooling based on the desired setpoint \(u(n)\).

Averaging the following measurements will not necessarily yield results that beat the bounds defined by the Cramer-Rao bound:

\(y_m(n) = x(n) + v_m(n)\)

where \(v_m(n)\) is the measurement noise during the \(m\)-th measurement.

In this context, \(y_m(n) \) represents the noisy measurements of the signal \(x(n)\). Averaging these measurements can reduce the noise variance, but it does not necessarily surpass the theoretical lower bounds on the variance of unbiased estimators, as defined by the Cramer-Rao bound. The Cramer-Rao bound provides a fundamental limit on the precision with which a parameter can be estimated from noisy observations.

  • System Description: The thermostat system is represented by \(H(z)\), which controls the heating/cooling based on the input \(u(n)\). The output signal \(y(n)\) represents the measured room temperature.
  • Multiple Time Measurements: The thermostat takes temperature measurements every minute, producing a set of outputs \(y_1(n), y_2(n), \ldots, y_M(n)\).
  • Averaging: To get a more accurate representation of the room temperature and to filter out noise (e.g., transient changes due to opening a door), the thermostat averages these measurements: \(\overline{y(n)} = \frac{1}{M} \sum_{i=0}^{M-1} y_i(n)\). By averaging the noisy output values \(y_i(n)\), the thermostat system can make more stable and accurate adjustments, leading to a more comfortable and energy-efficient environment.
  • Latency: One drawback of the averaging operation is that it increases the system’s latency, i.e. the smoothed output temperature value lags the observed noisy temperature value taken at time n. This delay is referred to as latency or group delay in digital filters, and must also be taken into account when designing a closed-loop control system. The subject of minimising latency in digital filters can fill a whole book in itself, but suffice to say that IIR digital filters generally have lower latency than their FIR counterparts. The moving average filter described herein can be considered a special case of the FIR filter, as all of its filter coefficients are equal (a minimal sketch follows this list).
       In order to improve matters, minimum phase filters (also referred to as zero-latency filters) may be used to overcome the inherent \(N/2\) latency (group delay) of a linear phase FIR filter, by moving any zeros outside of the unit circle to their conjugate reciprocal locations inside the unit circle. The result of this ‘zero flipping’ operation is that the magnitude spectrum is identical to that of the original filter and the phase becomes nonlinear, but most importantly the latency is reduced from \(N/2\) to something much smaller (although non-constant), making it suitable for real-time control applications where IIR filters are typically employed.
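A minimal causal moving-average sketch in Python illustrating the \((N-1)/2\)-sample group delay mentioned above; the sampling rate, filter length and temperature data are illustrative assumptions.

```python
import numpy as np

def moving_average(x, N):
    # Causal N-point moving average; as a linear-phase FIR filter its group delay
    # (latency) is (N-1)/2 samples.
    return np.convolve(x, np.ones(N) / N)[: len(x)]

Fs, N = 500, 24
rng = np.random.default_rng(4)
temp = 21.0 + 0.1 * rng.standard_normal(Fs)        # one second of noisy readings
smoothed = moving_average(temp, N)

print((N - 1) / 2 / Fs)                            # latency in seconds (~23 ms at 500 Hz)
```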

AI Model in Signal Processing

In signal processing, where signals are sensed by sensors, statistical parameterized models, Bayesian networks, and energy models play crucial roles. Statistical parameterized models help in estimating signal parameters efficiently, providing a structured approach to model signal behavior. Bayesian networks offer a probabilistic framework to infer and predict signal characteristics, accommodating uncertainties inherent in sensor data. Energy models, such as those utilizing MCMC with Contrastive Divergence, optimize the representation of signal data by minimizing energy functions, leading to improved signal reconstruction. Similarly, energy models via Restricted Boltzmann Machines and Backpropagation facilitate learning complex signal patterns, enhancing the accuracy of signal interpretation and noise reduction. Together, these models enable robust analysis and processing of signals, crucial for applications like noise reduction, signal enhancement, and feature extraction.

The Cramer-Rao bound (CRB) provides a lower bound on the variance of unbiased estimators, indicating the best possible accuracy one can achieve when estimating parameters from noisy data. This bound applies to traditional estimation methods under certain assumptions, such as unbiasedness and a specific noise model.

Markov Chain Monte Carlo (MCMC) methods are used primarily for sampling from complex probability distributions and performing Bayesian inference. MCMC does not directly ‘overcome’ the Cramer-Rao bound; rather, it provides a framework for obtaining parameter estimates that can be more accurate and robust in practice, especially in complex and high-dimensional settings. This improved practical performance arises from the ability to incorporate prior information, handle complex models, and perform full Bayesian inference, and may be interpreted as achieving better performance than classical estimators under certain conditions.

Individual Models

Each model will have its own bias and variance characteristics. High-capacity models may fit the training data well (low bias) but perform poorly on new data (high variance). Low-capacity models may underfit the training data (high bias) but have more stable predictions (low variance).

Averaging Models (Ensembles)

By combining the outputs of multiple models, ensemble methods aim to reduce the overall variance. This results in more robust predictions compared to individual models, particularly when the individual models have high variance.

Combining many models seems promising for some applications. When the model capacity is low, it’s difficult to capture the regularities in the data. Conversely, if the model capacity is too large, it may overfit the training data. By using multiple models, such as in AIoT where models can be sensor-centric or device-centric, better results can be achieved compared to using a single huge model.

  • High-capacity models tend to have low bias but high variance.
  • Averaging models reduces variance, leading to more stable predictions.
  • Bias remains unchanged by averaging, so it’s essential to use models with appropriately low bias.
  • The ensemble approach can outperform individual models by leveraging the strengths of multiple models, especially in scenarios like AIoT, where combining sensor-centric or device-centric models can lead to improved results.

In some cases, an individual predictor may perform better than a combined predictor. However, if the individual predictors disagree significantly, the combined predictor can perform well, as the simple sketch below illustrates.
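The toy simulation below illustrates the variance-reduction argument: averaging K unbiased predictors with uncorrelated errors reduces the error variance by roughly a factor of K while leaving the bias unchanged. The numbers are arbitrary.

```python
import numpy as np

# Variance reduction by averaging K independent, unbiased predictors:
# Var(mean) ~= Var(single) / K when their errors are uncorrelated; bias is unchanged.
rng = np.random.default_rng(5)
truth = 10.0
K, trials = 8, 10_000
single = truth + rng.standard_normal(trials)                  # one high-variance model
ensemble = truth + rng.standard_normal((K, trials)).mean(axis=0)

print(np.var(single - truth))     # ~1.0
print(np.var(ensemble - truth))   # ~1.0 / 8
```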

AIoT system building blocks

An essential pre-building block in any AIoT system is the feature extraction algorithm. The challenge for any feature extraction algorithm is to extract and enhance any relevant sensor data features in noisy or undesirable circumstances and then pass them onto the ML model in order to provide an accurate classification. The concept is illustrated below:

As seen above, an AIoT system may actually contain multiple feature blocks per sensor and, in some cases, fuse the features locally before sending them on to the ML model for classification, such that the system may then draw a conclusion. The challenge is therefore how to capture sensor data for training and how to design suitable algorithms to extract the features of interest.

The challenge is actually twofold: how to capture the datasets for analysis, and which algorithms to use for feature engineering.

Although a few commercial solutions are available (e.g. Node-RED, LabVIEW, Mathworks Instrumentation toolbox), the latter two are expensive for most developers who just require simple data capture/logging via the UART. One possible solution is Arm’s SDS Framework, which provides developers with a set of tools for capturing and playing back real-world data using Arm Virtual Hardware. The captured SDS data files can subsequently be converted into a single CSV file for use in third-party applications for algorithm development. Unfortunately, the SDS framework is primarily aimed at Arm SoC developers and not particularly suitable for developers working with EVMs/kits.
Therefore, most developers use web tools based on AutoML (e.g. Qeexo) that assist with data capture from hardware (e.g. from an ST Nucleo board) and then try to automate the ML modelling process by choosing from a limited set of feature extraction algorithms (such as mean, median, standard deviation and kurtosis) to produce a suitable classification model. In theory, this sounds great, but there are a number of problems with this approach, as performance is dependent on the quality and relevance of the datasets. Our experience has shown that the best performance is obtained from knowledge of the physical process, and by designing feature extraction algorithms using scientific principles tailored to the process that you are trying to model.

Example: Feature Engineering for human fall detection

A common requirement of many IoMT biomedical wearable products is human fall detection with a smartwatch, using only accelerometer data. Traditional fall detection algorithms using MEMS sensors are based on the ‘falling’ concept, whereby all three axes fall close to zero for a second or so. Although this works well for falling objects, such as a cup or box falling from a table, it is not suitable for humans. The challenge is illustrated below:

As seen, a human’s fall is very different to a box or other object falling.

The challenge is discriminating between normal everyday activities and falls. Analysing datasets of net acceleration for typical everyday activities (someone walking, using their smartphone, brushing their teeth or doing some morning exercises) alongside genuine fall data shows that it is not always easy to discriminate between the two using ’standard’ statistical features.

Therefore, we need to apply some physics to the process that we’re trying to model in order to derive specific features from the sensor data, so that we can make a classification – i.e. is it a fall, or not.

Analysing the diagram, we see that there are actually four phases, from the person standing upright through to the person lying on the ground. So the big question is: how do we go about modelling these phases using only accelerometer data? This is best analysed by breaking the fall up into phases:

  • Happy: where the subject is upright and going about their daily business.
  • Falling: Depending on the subject, this period can be very short (around 100ms) and manifests itself very differently to an object falling directly down (i.e. freefall). The net acceleration will usually manifest itself as a negative gradient starting from about 1g tending towards zero, as the body’s centre of gravity changes. This usually lasts for about 60-100ms.
  • Impact: this is the primary event to detect, as any impact from a standing posture onto a hard surface will produce a large shock pulse, significantly greater than 1g, over a short period.
  • Inactivity: this usually follows impact with the ground, whereby the subject lies flat and motionless for several seconds. In the case of a collision with an object (e.g. a piece of furniture or a door), or as a result of a severe medical condition such as a stroke or heart attack, the subject may become unconscious. Here, the system should be able to discriminate inactivity from normal movements, such as slight limb movement and the light motion caused by breathing, and decide whether to alert the medical services. In the case that no movement at all is detected – i.e. the subject may have died as a result of the fall – swift medical assistance will sadly be of little benefit.

Armed with this knowledge, we can now design our features accordingly. This forms the essence of feature engineering: building features based on an understanding of the physical process. A rough sketch of such features is given below.
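As a rough illustration (not a production algorithm), the sketch below computes one candidate feature per fall phase from triaxial accelerometer data; all thresholds, window lengths and names are assumptions made for the example and would need tuning against real datasets.

```python
import numpy as np

def fall_features(ax, ay, az, fs):
    """Toy feature extractor for the four fall phases described above.
    Accelerations are assumed to be in g, sampled at fs Hz; thresholds are illustrative."""
    net = np.sqrt(ax**2 + ay**2 + az**2)                 # net acceleration magnitude

    # Falling phase: net acceleration dips well below 1 g for roughly 60-100 ms
    win = max(1, int(0.08 * fs))
    dips = np.convolve((net < 0.6).astype(float), np.ones(win) / win)[: len(net)]
    falling = dips > 0.8

    # Impact phase: large shock pulse, significantly greater than 1 g
    impact = net > 3.0

    # Inactivity: very low variance over a 2 s window following an impact
    var_win = int(2.0 * fs)
    inactivity = np.array([net[i:i + var_win].var() < 0.01
                           for i in range(max(0, len(net) - var_win))])

    return {"net": net, "falling": falling, "impact": impact, "inactivity": inactivity}

# Usage (hypothetical 50 Hz smartwatch data):
# feats = fall_features(ax, ay, az, fs=50)
# candidate_fall = feats["impact"].any() and feats["inactivity"].any()
```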

What tools and processor technology are available?

Although a few processor technologies exist for microcontrollers (e.g. RISC-V, Xtensa, MIPS), over 90% of the microcontrollers used in the smart product market are powered by so-called Arm Cortex-M processors. These are split up into various market segments, depending on energy requirements and algorithmic performance.

The low-end cores, such as the M0, M0+ and M3, are good for simpler algorithms, such as sensor clean-up filters and simple analytics, as they have limited memory and no hardware floating-point support. To give you an idea of performance: the Fitbit is based on the M3 processor.

However, the biggest plus (especially for the M0 family) is their very low power footprint, making them an ideal choice for coin-cell battery powered wearable applications, as devices can be made to run for months and, in some cases, up to a year.

For developers looking for decent computational performance, the M4F is an excellent choice as it has hardware FP support, which is ideal for rapid application development of algorithms. In fact, the Arm Cortex-M4 is a very popular choice with several silicon vendors (including ST, TI, NXP, ADI, Nordic, Microchip, Renesas), as it offers DSP (digital signal processing) functionality traditionally found in more expensive devices and is low-power.

If your application needs more computational performance, then the M7 is an excellent choice; some devices even offer hardware double-precision floating-point support, which is ideal for audio enhancement and biomedical algorithms.

For those of you looking for hardware security, then the M33 is a good choice, as it implements Arm TrustZone security architecture, as well as having the computational performance of the M4.

State-of-the art AIoT microcontrollers

Released in 2020, the Arm Cortex-M55 processor and its bigger brother, the Cortex-M85, are targeted at AIoT applications on microcontrollers. These processors use Arm’s powerful Armv8.1-M architecture, which implements the M-Profile Vector Extension (MVE) technology (nicknamed Helium), allowing for 128-bit vector mathematical operations (such as dot product operations) needed for ML and some DSP algorithms.

In November 2023, Arm announced the release of the Cortex-M52 processor for AIoT applications. This processor looks to replace the older M33 processor, as it combines Helium technology with Arm TrustZone technology. However, as only a few IC vendors (Alif, Ambiq, Samsung, Renesas, HiMax, Bestechnic, Qualcomm) have currently released or are planning to release any devices, Helium processors remain a gem for the future.

Toolchains

Arm provides developers with extensive, easy-to-use tooling and tried and tested software libraries. Arm’s CMSIS-DSP and CMSIS-NN frameworks for algorithm development and machine learning (ML) are two very popular examples that are open source and used internationally by tens of thousands of developers.

The Arm-CMSIS framework solutions are further strengthened by Arm partners ASN and Qeexo who provide developers with easy-to-use real-time filtering, feature extraction (ASN Filter Designer) and ML tooling (Qeexo AutoML) and reference designs, expediting the development of IoT applications, including industrial, audio and biomedical. These solutions have been optimised for Arm processors with the help of Arm’s architecture experts and insider knowledge of compiler workings.

Deployment of Deep Learning Networks to the IoT Edge

Deploying a trained model onto an Edge device requires meticulous attention and effort. Fortunately, there are many tools available to help developers achieve this, such as Qeexo AutoML and the DLtrain toolset. The latter offers robust support for developers working with Arm processor-based boards with Android platforms. DLtrain utilizes the Android NDK (native development kit) to deploy neural networks (NN) or convolutional neural networks (CNN) in the Linux kernel of the Android platform. The deployed components include JNI options to support applications developed in Java, bridging the gap between low-level implementation and high-level application development. Find out more here.

Deploying deep learning (DL) networks on Arm cores of Android platforms involves integrating these networks into the Linux kernel via the Android NDK. While application development is primarily done in Java, DL networks receive input from the Android layer (SDK) and efficiently perform inference. The results are then passed back to the Java side via the Java Native Interface (JNI). The following list describes the layers involved in performing inference on an Android device:

  1. Top Layer: User Interface
  2. Second Layer: Java
  3. Third Layer: Android SDK
  4. Fourth Layer: Arm
  5. Bottom Layer: GPU

This hierarchical structure ensures that the user interface seamlessly interacts with underlying DL networks, optimizing performance and maintaining an efficient workflow from input to inference to output.

Key takeaways

AI is an umbrella term focused on creating intelligent systems, whereas ML is a subset of AI that involves creating models to learn from data and make decisions. ML is crucial for IoT because it enables efficient data analysis, predictive maintenance, smart automation, anomaly detection, and personalized user experiences, all of which are essential for maximizing the value and effectiveness of IoT deployments.

Arm and its rich ecosystem of partners provide IoT developers with extensive, easy-to-use tooling and tried and tested software libraries for designing and implementing IoT algorithms for their smart products. Arm Cortex-MxF processors expedite RAD by virtue of their ease of use and hardware floating-point support, and modern semiconductor technology ensures low-power profiles, making the technology an excellent fit for IoT/AIoT mobile and wearable applications.

Authors

  • Dr. Jayakumar Singaram

    Jayakumar is a seasoned expert in semiconductor technology and AIoT. He advises companies such as Mistral Solutions, SunPlus Software, and Apollo Tyres at the strategic level on their AIoT solutions. He founded Epigon Media Technologies, which focuses on research and development for the global market, and is also the co-author of the book "Deep Learning Networks: Design, Development, and Deployment."

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.


Many IoT applications use a sinewave for estimating the amplitude of an entity of interest – some examples include:

  • Measuring material fatigue/strain with a loadcell – in vehicle and bridge/building applications measuring material fatigue and strain is essential for safety. An AC sinusoidal excitation overcomes the difficulty of dealing with instrumentation electronics DC offsets.
  • Calibrating CT (current transformers) sensors channels – a sinusoid of known amplitude is applied to channel input and the output amplitude is measured.
  • Measuring gas concentration in infra-red gas sensors – the resulting sinusoid’s amplitude is used to provide an estimate of gas concentration.
  • Measuring harmonic amplitudes in power quality smart grids applications – in 50/60Hz  power systems, certain harmonic amplitudes are of interest.
  • ECG biomedical compliance testing – channel compliance with IEC regulations needed for FDA testing typically uses a set of sinewaves at known amplitudes, to ensure that the channel amplitude error is within specification.

In a previous article, we discussed how differentiation can be used to find the peaks and troughs of a sinewave, i.e. by finding the zero-crossing points of the differentiated signal. However, a much more traditional approach has been to use fullwave rectification, whereby a non-linear operator and lowpass filtering are employed. The concept is described below:

  1. Remove any DC or low-frequency offsets via a highpass filter.
  2. Apply a non-linear operator, such as abs() or sqr().
  3. Lowpass filter the result to obtain an estimate of the sinusoid’s amplitude.
  4. Scale the amplitude.

Although this sounds easy, care should be taken to understand the effects of how the non-linear operator alters the waveform and affects the estimation of amplitude using lowpass filtering.

IoT application

A typical IoT application using a sinusoid is shown below:

As seen, the waveform can be modelled as:

\(x(t)=A\sin\left(2\pi f_o t\right)+B\)

where \(f_o\) is the frequency of oscillation and \(A\) is the amplitude of the sinusoid. Notice that the sinusoid is non-linear and symmetrical around the offset, \(B\), and that it has a peak-to-peak amplitude of \(2A\), since specifying an amplitude \(A\) results in a bipolar amplitude of \(\pm A\). As many microcontrollers employ low-cost unipolar ADCs, the bipolar sinusoid needs to be offset by a DC offset, \(B\) (usually achieved by a resistor network), to ensure that the signal remains within the common-mode range of the ADC input.

As mentioned above, before applying the non-linear operator any DC offsets need to be removed. This can easily be achieved with either an IIR or FIR highpass filter. If using an IIR filter, it should be noted that the filter’s phase and group delay (latency) will significantly increase at the cut-off frequency, so a degree of experimentation is required to find a good trade-off.

After highpass filtering the data, we can apply the non-linear operator. Two popular operators are abs() and sqr().

Using the abs() operator, the Fourier series of \(\left|A \,sin(2\pi f_ot)\right|\) is shown below:

\(\left|A\sin(2\pi f_o t)\right| = A\left[\dfrac{2}{\pi} - \dfrac{4}{\pi}\sum\limits_{k=1}^{\infty}\dfrac{\cos(4k\pi f_o t)}{4k^2-1}\right]\)

Analysing the equation, it can be seen that the abs() operation doubles the frequency and that the DC component is actually \(\frac{2A}{\pi}\), as illustrated below.

As seen, lowpass filtering this result in its current form will produce an amplitude estimate of \(\frac{2A}{\pi}\) (dashed red line), which is clearly incorrect for estimating the sinewave’s amplitude, \(A\). However, this can be simply remedied by scaling the amplitude estimates by \(\frac{\pi}{2}\), which removes the bias, leaving the sinewave’s amplitude, \(A\).

Likewise, for a sqr() operator, we can define the resulting waveform using trigonometrical identities, i.e.

\(\left(A\sin(2\pi f_o t)\right)^2 = A^2\left[\dfrac{1-\cos(4\pi f_o t)}{2}\right]\)

Lowpass filtering this signal requires a correction scaling factor of 2 (followed by a square root if the amplitude \(A\) itself, rather than \(A^2\), is required).

Lowpass filter

Although any lowpass filter will suffice, the moving average filter is used by most developers by virtue of its computational simplicity and noise reduction characteristics. A more detailed explanation of moving average filters can be found here.

A 24th order moving average filter with a post gain of \(\frac{\pi}{2}\) or 1.571 is shown below.

Applying this moving average filter to a sinewave (\(f_o=10\,Hz\), \(A=0.5\)) sampled at 500Hz and processed with the abs() operator, we obtain the following:

As seen, the amplitude estimation of the sinusoid using a lowpass filter and the \(\frac{\pi}{2}\) scaling factor is now correct. However, for real world applications that contain noise, it is considered to be more accurate to measure the RMS amplitude, in which case the scaling factor becomes \(\frac{\pi}{2\sqrt 2} \).
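The processing chain can be sketched in a few lines of Python (NumPy); the 24-point moving average, sampling rate and amplitude follow the example above, while the settling region chosen for the print-out is an arbitrary assumption.

```python
import numpy as np

Fs, fo, A = 500, 10, 0.5
n = np.arange(2 * Fs)                        # 2 seconds of data
x = A * np.sin(2 * np.pi * fo * n / Fs)

rectified = np.abs(x)                                        # abs() non-linear operator
ma = np.convolve(rectified, np.ones(24) / 24)[: len(x)]      # 24-point moving average
amplitude = ma * (np.pi / 2)                                 # remove the 2A/pi bias

print(amplitude[200:].mean())                # ~0.5 once the filter has settled
```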

Note that these scaling factors are only valid for sinusoidal waveforms. If your waveform is non-sinusoidal (e.g. triangular, square, or affected by harmonics), another scaling factor or method will be required, as discussed below.

True RMS

In practice, many sinusoidal waveforms will be affected by harmonics (e.g. smart grid power systems) which will alter the shape of the main sinusoid and offset the RMS estimate using the \(\frac{\pi}{2\sqrt 2} \) scaling factor concept.

A much better method is to calculate the True RMS, whereby the sqr() operator is used for the full wave rectification, but this time a sqrt() function is used for scaling after the lowpass operation. The results of the two methods are shown below, where it can be seen that the True RMS method correctly estimates the signal’s RMS amplitude.
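A minimal sketch of the True RMS approach is given below, again in Python; the harmonic level added to the test signal is arbitrary and the moving-average length follows the earlier example.

```python
import numpy as np

def true_rms(x, N=24):
    """True RMS via sqr() -> moving average -> sqrt(); valid for any waveform shape."""
    ma = np.convolve(x**2, np.ones(N) / N)[: len(x)]
    return np.sqrt(ma)

Fs, fo, A = 500, 10, 0.5
n = np.arange(2 * Fs)
fundamental = A * np.sin(2 * np.pi * fo * n / Fs)
distorted = fundamental + 0.1 * np.sin(2 * np.pi * 3 * fo * n / Fs)   # add a 3rd harmonic

print(true_rms(fundamental)[200:].mean())   # ~A/sqrt(2) = 0.354
print(true_rms(distorted)[200:].mean())     # ~0.36, correctly including the harmonic energy
```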

Author

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.


AIoT is an exciting new area that combines AI concepts (i.e. ML) with IoT in order to produce state-of-the-art smart embedded solutions. This augmentation of technologies requires a new set of tools to capture real-time IoT sensor data, analyse it, design suitable algorithms and then perform validation of the solution.  After completing validation of the algorithms on the test data, a final hurdle is then how to generate efficient C code of the developed algorithm(s) for an Arm Cortex-M microcontroller for use in an application. These concepts will be discussed herein.

Arm’s Synchronous Data Stream (SDS) Framework provides developers with an easy method of capturing and playing back real-time sensor data for embedded AIoT sensor applications on Arm Cortex-M processors, such as ST Microelectronics’ very popular STM32 family.

The SDS Framework provides embedded developers with a variety of essential tools, such as the ability to record real-world sensor data for analysis and development in tools such as ASN Filter Designer, Python and Matlab. A set of Python utility scripts are available for recording, playback, visualisation and data conversion, where the latter supports the conversion of captured SDS data files into a single CSV file – providing a simple bridge between the ASN Filter Designer and the SDS Framework.

The SDS framework also supports the possibility to playback real-world data for algorithm validation using Arm Virtual Hardware, allowing developers to verify execution of DSP algorithms on Cortex-M targets with off-line tools.

This application note provides AIoT developers with a complete reference guide on how to develop and deploy feature extraction algorithms for AIoT applications to STM32 Arm Cortex-M based microcontrollers, using STM32CubeIDE or Keil uVision with the Arm SDS framework and ASN Filter Designer. As mentioned above, AIoT system challenges and concepts will also be covered.

Building AIoT systems

Almost all IoT embedded sensor applications require some level of signal processing to enhance sensor data and extract features of interest. However, an obvious hurdle for many developers is how to design, test and deploy efficient algorithms for their application. This is easier said than done, as many software engineers are not well versed in the mathematical concepts needed to implement algorithms. This is further complicated by the challenge of implementing algorithms developed by researchers who are not interested or experienced in developing real-time embedded applications.

A possible solution offered by the Mathworks (Embedded Coder) automatically translates Matlab algorithms and functions into C for Arm processors, but its high price tag and steep learning curve make it unattractive for many.

That being said, Arm and its rich ecosystem of partners provide developers with extensive easy-to-use tooling and tried and tested software libraries. Arm’s CMSIS-DSP and CMSIS-NN frameworks for algorithm development and machine learning (ML) are two very popular examples that are open source and are used internationally by tens of thousands of developers.

The Arm CMSIS-DSP software framework is particularly interesting as it provides IoT developers with a rich collection of fast mathematical and vector functions, interpolation functions, digital filtering (FIR/IIR) and adaptive filtering (LMS) functions, motor control functions (e.g. PID controller), complex math functions and supports various data types, including fixed and floating point. The important point to make here is that all of these functions have been optimised for Arm Cortex-M processors, allowing you to focus on your application rather than worrying about optimisation.  

The Arm-CMSIS framework solutions are strengthened by Arm partners ASN and Qeexo who provide developers with easy-to-use real-time filtering, feature extraction (ASN Filter Designer) and ML tooling (AutoML) and reference designs, expediting the development of AIoT applications, including industrial, audio and biomedical. These solutions have been optimised for Arm processors with the help of Arm’s architecture experts and insider knowledge of compiler workings.

AIoT system building blocks

An essential pre-building block in any AIoT system is the feature extraction algorithm. The challenge for any feature extraction algorithm is to extract and enhance any relevant sensor data features in noisy or undesirable circumstances and then pass them onto the ML model in order to provide an accurate classification.  The concept is illustrated below:

As seen above, an AIoT system may actually contain multiple feature blocks per sensor and in some cases fuse the features locally before sending them onto the ML model for classification such that the system may then draw a conclusion. The challenge is therefore how to capture sensor data for training and design suitable algorithms to extract features of interest.

Feature extraction algorithms: challenges and solutions

The challenge for any feature extraction algorithm is to extract and enhance any relevant data features in noisy data or undesirable circumstances and then pass them on to the ML model in order to provide an accurate classification. Unfortunately, many ML models perform badly due to poor quality data and insufficient training data. An obvious challenge for AIoT is therefore how to obtain the training data in the first place. In many cases this is extremely challenging, as data pertaining to faults (such as those needed for preventive maintenance) is hard to come by, and many plant managers are reluctant to break their working production lines or processes to provide developers with training data.

In the absence of adequate training data, feature extraction based on science and mathematics is a prudent alternative, as less training data is required, and in general, the quality of the feature estimate is higher as knowledge of the underlying process is used. Examples include: obtaining accurate pulse and heart rate estimates from ECG and PPG sensors in smartwatch applications when a subject is moving.  For industrial sensors, such as loadcells, pressure, temperature, gas and accelerometer sensors the challenge is amplified, as harsh operating conditions and the sheer variety of the applications needed for I4.0 process control applications complicate the design significantly.

Example: Infrared gas sensor

Consider the following application for gas concentration measurement from an infrared gas sensor. The requirement is to determine the peak-to-peak amplitude of the sinusoid in order to estimate the gas concentration – the larger the amplitude, the higher the gas concentration.

Analysing the Figure, it can be seen that the sinusoid is corrupted with measurement noise (shown in blue), and any estimate based on the blue signal will have a high degree of uncertainty about it – which is not very useful for getting an accurate reading of gas concentration! After cleaning the sinusoid with a digital filter (red line), we obtain a much more accurate and usable signal for our gas concentration estimation challenge. But how do we obtain the amplitude?

Knowing that the gradient at the peaks is zero, a relatively easy and robust way of finding the peaks of the sinusoid is via numerical differentiation, i.e. computing the difference between sample values and then looking for the zero-crossing points in the differentiated data. Armed with the positions and amplitudes of the peaks, we can take the average and easily obtain the amplitude and frequency.  Notice that any DC offsets and low-frequency baseline wander will be removed via the differentiation operation.
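A minimal sketch of this differentiation-based peak finder is shown below (Python/NumPy), assuming the signal has already been cleaned by the digital filter; the test frequency, amplitude and offset are illustrative values only.

```python
import numpy as np

def peak_amplitude(x):
    """Estimate a sinusoid's amplitude from the zero crossings of its first difference.
    Differentiation removes DC offsets; sign changes in dx mark peaks and troughs."""
    dx = np.diff(x)
    sign_change = np.where(np.diff(np.sign(dx)) != 0)[0] + 1   # indices of extrema
    peaks = x[sign_change]
    return (peaks.max() - peaks.min()) / 2.0                   # amplitude A

Fs, fo, A, B = 500, 10, 1.2, 0.8
n = np.arange(Fs)
clean = A * np.sin(2 * np.pi * fo * n / Fs) + B    # filtered signal with a DC offset
print(peak_amplitude(clean))                       # ~1.2, independent of the offset B
```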

This is just a simple example of how to extract the properties of a sinusoid in real-time using science and mathematics and an understanding of the underlying process without the need for ML training data.

AIoT feature extraction smart sensor design workflow

Arm’s Synchronous Data Stream (SDS) Framework provides developers with an easy method of capturing and playing back real-time sensor data for embedded AIoT sensor applications on Arm Cortex-M processors. A set of Python utility scripts are available for recording, playback, visualisation and data conversion, where the latter supports the conversion of captured SDS data files into a single CSV file – providing a simple bridge between the ASN Filter Designer and the SDS Framework.

An AIoT smart sensor design workflow using the ASN Filter Designer and the SDS Framework is shown below.

As seen above, three major components constitute the AIoT design workflow.

  1. Arm SDS Framework: capturing IoT sensor data and converting it to CSV format.
  2. ASN Filter Designer: importing the CSV datafile and then analysing the data. Based on the data analysis, a suitable filter can be designed together with other filters and IP blocks in order to build feature extraction algorithms for ML applications.
  3. Application deployment: Generating optimised C code and combining the design with an application for use on an Arm microcontroller.

The SDS Framework can be used with all major demo boards, including ST’s Discovery kit and Nucleo boards. SDS Python utilities are used to convert the captured *.sds and *.yaml files into a CSV file for import into the ASN Filter Designer, as discussed in the following section.

Data Import Wizard

The ASN Filter Designer’s comprehensive data import wizard can delimit and import a variety of multi-column IoT datasets in CSV or TXT form.

As seen in the video, a generated CSV file can be dragged and dropped onto the signal analyser canvas, bringing up the data import wizard. The import wizard automatically checks the imported data for errors (such as NaNs, Infs etc.) and then orders the data into columns. Any header lines can be skipped by setting the Skip Headerlines value accordingly.

For the example considered herein, the data is actually triaxial accelerometer data (i.e. X, Y and Z axes) with an extra column for the timebase. Therefore, if we wish to import the X-axis data, we can simply click on the header of the second column (B). The tool will then ask you to recheck the data and, upon clicking ‘Save’, will save the selected data as a single-column CSV file (needed for the ASN Filter Designer). This new CSV file can then be streamed via the tool’s signal generator for algorithm development.

Deploying to Arm Cortex-M processors

After completing the design process, the designed filter(s) can be deployed to STM32CubeIDE or Arm/Keil uVision for integration into an application project. Depending on the functionality of the ASN Filter Designer’s signal chain, two software frameworks are available: Arm’s CMSIS-DSP and ASN’s ANSI C DSP.

The ANSI C DSP framework was developed in close collaboration with Arm’s architecture team, providing the outstanding computational performance required for real or complex coefficient floating-point designs that use multiple filters and mathematical functions in a signal chain. The Arm CMSIS-DSP framework, on the other hand, is an excellent choice for implementing both fixed-point and floating-point filters, but is limited to a single real-coefficient FIR filter or one IIR biquad cascade with no extra mathematical functions.

A benchmark comparison of both frameworks is shown below for an 8th order IIR filter running on three different Arm cores. As seen, the ASN framework is slightly faster (lower means better performance) than the Arm framework.

Framework Benchmarks: lower number of clock cycles means higher performance.

Arm CMSIS-DSP wizard

Professional licence users may expedite the deployment by using the Arm deployment wizard. The tool will automatically analyse your design and choose a suitable Framework. If the design cannot be exported via the Arm CMSIS-DSP framework, the tool will suggest that you use the ASN ANSI C framework and launch the code generation wizard.

Note that the built-in AI will automatically determine the best settings for your design based on the quantisation settings chosen.

Clicking on Deploy will automatically analyse your complete filter cascade and convert any extra filters in the cascade into an H1 (primary) for implementation. Upon completion, the tool will then launch the C code generation wizard.

CMSIS-DSP C code generation

Depending on which C framework is used, the C code generator wizard will automatically generate the C code needed for your design. For developers using the Arm CMSIS-DSP framework, a single C file is generated for use with an MDK5 software pack. The MDK5 pack is available from Arm Keil’s software pack repository, providing several complete filtering examples based on the ASN Filter Designer’s code generator using the Arm CMSIS-DSP library.

A detailed help tutorial is available by clicking on the Show me button. 

ASN ANSI C DSP code generation

For developers using the ASN ANSI C DSP framework, the code wizard should be used.

NB. The wizard will produce a CodeBlocks project in order to get you started. The following section describes in detail the steps needed for using the generated code in an STM32CubeIDE project. Please refer to the ANSI C SDK user guide for step-by-step instructions on how to use the generated code in other IDEs.

A PDF version of this article is available as an application note.

Author

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.


Advances in telemedicine healthcare products over the past decades have been truly miraculous, with ingenious little devices invented by start-ups as well as by larger corporations, e.g. Apple’s smart watch and the Fitbit. These advancements have been facilitated by the availability of low-cost microcontrollers offering algorithmic functionality, allowing developers to implement wearables with excellent battery life and edge-based real-time data analysis.

Over 90% of the microcontrollers used in the smart product market are powered by so-called Arm Cortex-M processors, which offer a combination of high algorithmic performance, low power and security. The Arm Cortex-M4 is a very popular choice with hundreds of silicon vendors (including ST, TI, NXP, ADI, Nordic, Microchip, Renesas), as it offers DSP (digital signal processing) functionality traditionally found in more expensive devices and is low power. Arm and its rich ecosystem of partners provide developers with easy-to-use tooling and tried and tested software libraries, such as the CMSIS-DSP and CMSIS-NN frameworks for algorithm development and machine learning.

The choice is vast, and can be very confusing. Therefore, here are some practical hints and tips for both managers and developers to help you decide which Arm Cortex-M processor is best for your biomedical product.

Which Arm Cortex-M processor do I choose for my biomedical application?

The Arm Cortex-M0+ processor is an ultra-low power 32-bit processor designed for very low-cost IoT applications, such as simple wearable devices. The low price point is comparable with equivalent 8-bit devices, but with 32-bit performance. Microcontrollers built around the M0+ processor provide developers with excellent battery life (months to years), a rich peripheral set and a basic amount of connectivity and computational performance. The latter means that only simple algorithms can be implemented, such as correcting baseline wander and minimising the effects of motion artefacts using accelerometer data via an adaptive filter, such as the NLMS algorithm (a sketch is given below). For PPG pulse rate measurement applications, the sampling rate is typically 50Hz, leaving the processor plenty of time to perform various simpler algorithmic operations, such as digital filtering and zero-crossing detection.
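For illustration, a minimal NLMS adaptive noise canceller is sketched below in Python; the filter length, step size and the synthetic PPG/accelerometer signals are assumptions made purely for the example, not a reference implementation.

```python
import numpy as np

def nlms_cancel(primary, reference, L=16, mu=0.5, eps=1e-6):
    """NLMS adaptive noise canceller: 'primary' is the PPG channel corrupted by motion,
    'reference' is the accelerometer-derived motion signal. Returns the error signal,
    i.e. the PPG with the correlated motion artefact removed."""
    w = np.zeros(L)
    out = np.zeros(len(primary))
    for n in range(L, len(primary)):
        x = reference[n - L:n][::-1]            # most recent L reference samples
        y = w @ x                               # estimate of the artefact
        e = primary[n] - y                      # cleaned sample
        w += (mu / (eps + x @ x)) * e * x       # normalised LMS update
        out[n] = e
    return out

# Example: synthetic PPG corrupted by a motion artefact correlated with the accelerometer
rng = np.random.default_rng(6)
fs = 50
t = np.arange(0, 30, 1 / fs)
ppg = np.sin(2 * np.pi * 1.2 * t)                                 # ~72 bpm pulse
motion = rng.standard_normal(t.size)
corrupted = ppg + np.convolve(motion, [0.5, 0.3, 0.2])[: t.size]  # causal artefact path
cleaned = nlms_cancel(corrupted, motion, L=8, mu=0.5)
```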

For high performance PPG applications, sampling rates in the order of 500Hz are typically used. These types of applications usually look at more biomedical features, such as identifying the systolic and diastolic phases and finding the dicrotic notch using feature extraction algorithms and ML models. These extra functionalities place a significant strain on the processor’s abilities, and as such are beyond the capabilities of the M0+.

The Cortex-M3 is a step up from the M0+, offering better computational performance but with less power efficiency. The extra processing power, rich hardware peripheral set for connecting other sensors, and connectivity options make the M3 a very good choice for developers looking to develop slightly more advanced wearable products, such as the Fitbit device, which is based on ST’s low-power STM32L series of microcontrollers.

High performance wearables and beyond

The Arm Cortex-M4 processor and its more powerful bigger brother, the Cortex-M7, are highly efficient embedded processors designed for IoT applications that require decent real-time signal processing performance and memory. Depending on the flavour of the processor, the M4F/M7F processors implement DSP hardware accelerated instructions, as well as hardware floating-point support. This lends itself to the efficient implementation of the much more computationally intensive biomedical DSP and ML algorithms needed for more advanced telemedicine products.

The hardware floating-point support unit expedites RAD (rapid application development), as algorithms and functions developed in Matlab or Python can be ported to C for implementation without the need for a lengthy data arithmetic quantisation analysis. Microcontrollers based on the M4F or M7F usually offer many of the hardware peripheral and connectivity advantages of the M3, providing developers with a very powerful, low-power development platform for their telemedicine application.

The Arm Cortex-M33 is a step up from the M4 focusing on algorithms and hardware security via Arm’s TrustZone technology and memory-protection units. The Cortex-M33 processor attempts to achieve an optimal blend between real-time algorithmic performance, energy efficiency and system security.

State-of-the art AI microcontrollers

Released in 2020, the Arm Cortex-M55 processor and its bigger brother, the Cortex-M85, are targeted at AI applications on microcontrollers. These processors feature Arm’s Helium vector processing technology, bringing energy-efficient digital signal processing (DSP) and machine learning (ML) capabilities to the Cortex-M family. In November 2023, Arm announced the release of the Cortex-M52 processor for IoT applications. This processor looks to replace the older M33 processor, as it combines Helium technology with Arm TrustZone technology.

Although the IP for these processors is available for licensing, only a few IC vendors have developed a microcontroller, e.g. Samsung’s Exynos W920 SoC, which has been specifically designed for the wearables market. The SoC packs two Arm Cortex-A55 processors and the Arm Mali-G68 GPU using state-of-the-art 5nm semiconductor technology. The chipset also features a dedicated low-power Cortex-M55 display processor for handling AoD (Always-on Display) tasks. Although a little over the top for simple wearable devices, the Exynos processor family certainly seems like an excellent choice for building the next generation of AI-capable, low-power wearable products.

So, which one do I choose?

The compromise for biomedical product developers when choosing an M4, M7 or M33 based microcontroller over an M3 device usually comes down to a trade-off between algorithmic performance, security requirements and battery life. If good battery life and simple algorithms are key, then M3 devices are a good choice. However, if more computationally intensive analysis algorithms are required (such as ML models), then the M4 or M7 should be used.

As mentioned earlier, the Armv7E-M architecture used in M4/M7 processors supports a DSP extension that implements an SIMD (single instruction, multiple data) architecture extension, which can significantly improve the performance of an algorithm. The hardware floating-point unit is very good for expediting MAC (multiply and accumulate) operations used in digital filtering, requiring just three cycles to complete, while other operations such as add, subtract and multiply require just one cycle to complete.

The M7 outperforms its M4 little brother by offering approximately twice the computational performance, and some devices even offer hardware double-precision floating-point support, which makes M4/M7 processors attractive for the high-accuracy algorithms needed for medical analysis.

If data security is paramount, for example protecting and securing the transfer of patient data to a cloud service, then the M33 or the M52 (when available) are good choices. These devices also offer a high level of protection against tampering and ensure that only authorised code can run, via TrustZone’s trusted execution environment.

Some IC vendors now offer hybrid microcontrollers that implement multiple processors on a single chip, such as ST’s STM32Wx family, which combines the M0+ and M4 in order to get the advantages of each processor and maximise battery life.

Finally, advances in semiconductor technology mean that a modern M4F processor produced with a 40nm process may match or even surpass the energy efficiency of an M3 produced with 90nm technology several years ago. As such, higher performance processors that were, until a few years ago, too costly and energy inefficient for low-cost wearable products are rapidly becoming a viable solution for this exciting marketplace.

Author

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.


“Improve your existing resources”

In a previous blog, we talked about how IoT can help you take control. There is a further step: optimizing your processes with AIoT.

Benefits

  • Better use of existing resources
  • Take the right decisions at the right time
  • Optimal circumstances

Better use of existing resources

Control means you have a clear overview of how assets are being used, such as:

  • How long does each step in a process take?
  • What are the whereabouts of my assets (trucks, cranes, forklifts, containers…)?
  • What is the state of maintenance?

The next step is, of course, to optimize your business.

First of all, many blogs write about completely new situations. In fact, most AIoT is needed in companies that are already established, with their existing inventory, processes, customers and all the responsibilities that come with them. Large investments have been made to build the business as it stands today, and although its processes may not be optimal, at least they work. So how can a company benefit from AIoT without throwing away all these investments? And how can it be sure that processes keep working at least as well as they do now? Smart sensors help bring the whole process up to today’s level without throwing away resources that are working fine. Moreover, companies can choose to implement AIoT piecemeal.

This is especially the case for highly essential functions such as infrastructure, sluices and installations. Here, the asset is not just an asset, but part of a wider infrastructure, and downtime of such an asset has large implications for society as a whole.

Many processes are still monitored piecemeal. A further optimization is to connect systems with each other: get one overview in one dashboard, learn how your processes are doing, and see where optimizations are required.

Take the right decisions at the right time

To measure is to know, to know is to be able to improve.

One of the most frequently mentioned benefits of AIoT is preventive maintenance. Preventive maintenance means that something is repaired or replaced before it breaks, or at least maintained while the damage is still small. Where a breakdown would normally cause unplanned downtime, repairs can now be scheduled, and if downtime is needed for repairs, it can be scheduled at the least inconvenient times.

As already said, being able to schedule repairs means taking the right decisions at the right time. Moreover, in the old situation a foreman has to do his rounds, giving each machine the same attention. With AIoT, the condition of the assets can be monitored with sensors, so on his rounds the foreman can give the most attention to the machines that need it most.

The same applies to a sector such as biomedical: prevention is better than cure. So, help your clients and/or yourself to stay healthy. Examples include fall detection and checking whether an elderly person has taken their medicine.

Help your patients with therapy by making use of knowledge from all previous patients: is the therapy on track? Also, give the patients who need it the right amount of attention, instead of seeing all your patients in a standard scheduled time slot and, as a consequence, not really giving any of them enough time. If therapy is lagging, you probably want to give those patients more attention. If therapy is progressing faster than expected: what are the reasons, and how can this knowledge be used to improve therapy in the future? Moreover, if people can do therapy and appointments at home, they do not have to spend precious time travelling and waiting, which often takes longer than the treatment itself.

Optimal circumstances

Sensors can ensure that products are made or kept under optimal circumstances: for example, whether the cutting parts of a machine are still sharp enough and within the right precision, whether cooling stays at the right temperature, or whether the indoor air quality is acceptable. This may also make guarantees possible, creating added value for your products or services.
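As a minimal sketch of such condition monitoring (the ranges and readings below are invented for illustration), each new reading is simply checked against an allowed range and an alert is raised when it drifts outside:

```python
# Hypothetical example: check sensor readings against allowed ranges
# and report which conditions are out of bounds.
ALLOWED_RANGES = {
    "cooling_temp_c": (2.0, 7.0),    # cold-chain temperature
    "co2_ppm": (0, 1000),            # indoor air quality
}

def within_range(sensor, value):
    low, high = ALLOWED_RANGES[sensor]
    return low <= value <= high

readings = {"cooling_temp_c": 8.4, "co2_ppm": 650}
for sensor, value in readings.items():
    status = "OK" if within_range(sensor, value) else "ALERT"
    print(f"{sensor}: {value} -> {status}")
```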

Since the end of July 2020, KPN has renewed its mobile network to enable 5G and is rapidly expanding coverage throughout the Netherlands. Business customers and entrepreneurs can already make use of special 5G services and see tomorrow's digital highway in action, since 5G is one of the enablers for smart industry.

3.5 GHz frequency auction

Starting in 2022, 5G frequencies will be auctioned. Meanwhile, together with customers and technology partners, telecom and ICT service provider KPN has launched 5G field labs to discover the value of 5G applications. Thanks in part to 5G technology, these types of applications will become a reality.

During the new 5G auction, frequencies in the 3.5 GHz band will be distributed. These will enable connections at much higher speeds. At the auction in the first quarter of 2022, at least three parties must obtain licenses for the frequencies, and no single party may acquire more than 40 percent of the available frequencies, according to the proposal for how the auction will run.

A total of 300 megahertz of bandwidth is to be distributed. This consists of three blocks of 60 megahertz and twelve blocks of 10 megahertz. The auction will be held in three phases. Prior to and during the auction the Ministry of Economic Affairs will not provide information about the total number of participants. At the end, the winning parties will be announced and the State Secretary will make the entire bidding process public.

Tomorrow’s digital highway

Thanks to the capacity and reliability of our network, new applications become possible, such as innovations in security, healthcare, mobility, logistics and the manufacturing industry. Unlike 4G, 5G is expected to become an ecosystem from which many business sectors, industries and areas can benefit: innovations from which the whole of society benefits. In addition to higher speed, 5G focuses explicitly on flexibility in the network to support very short response times and higher reliability. This will enable a wide range of new applications for customers and industries.

5G is expected to provide a huge boost in business for augmented and virtual reality, robotics, drones, intelligent assets, wearables, AI-based video analytics and Internet of Things (IoT), among others.

In addition to 5G, Internet of Things also requires edge computing. This involves placing a small cloud with computing power, storage and network capacity at the edge of the network, as it were, close to applications, devices and users. Because data no longer has to travel all the way up and down to the cloud or a data center, time-critical applications such as self-driving cars and augmented reality become possible.
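A minimal sketch of that idea, assuming a hypothetical edge gateway that pre-processes raw sensor samples locally and only forwards compact summaries to the cloud:

```python
# Hypothetical example: an edge gateway aggregates high-rate sensor samples
# locally and sends only a small summary upstream, reducing latency and traffic.
from statistics import mean

def summarize_window(samples):
    """Reduce a window of raw samples to a compact summary message."""
    return {
        "count": len(samples),
        "mean": round(mean(samples), 2),
        "max": max(samples),
        "alert": max(samples) > 90.0,   # illustrative local threshold check
    }

raw_window = [71.2, 73.5, 95.1, 72.8, 70.4]   # e.g. one second of readings
summary = summarize_window(raw_window)
print("send to cloud:", summary)              # only this summary leaves the edge
```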

5G: enabler for Industry 4.0

5G is also a key enabler for Industry 4.0. This involves using the Internet of Things, cloud computing and data integration, among other things, to make the production process fully computer-controlled and remotely operated; part or all of the human decision-making is taken over. Due to its high speed, reliability and low latency, 5G is essential within Industry 4.0 for, for example, controlling production lines, facilitating self-driving vehicles and connecting large numbers of IoT devices.

Field Labs

KPN Eindhoven 5G Fieldlab

The national rollout of 5G has only just started, but KPN has been testing 5G for useful applications in its Field Labs for some time. 

The 5G field lab for the manufacturing industry shows which 5G indoor use cases are possible in a factory environment. Besides speed, 5G also offers greater reliability and very low network latency. Large numbers of wireless sensors also play an important role in the further rollout of the IoT. Thus, the 5G Field Labs show that 5G can be used for very different applications simultaneously.

AIoT has many benefits, which can be summarized as: saving, controlling, optimizing and innovating. How does AIoT reduce costs and provide more efficiency?

How AIoT can help to save:

  • Preventive maintenance
  • Efficient use of time, equipment and money
  • Lower energy costs
  • Don't throw away infrastructure that is working fine

Preventive maintenance

Purchasing new machinery involves high costs, and public infrastructure consists of expensive equipment, so replacing failing equipment is costly. This is where preventive maintenance comes in: you can repair or replace parts that you know will stop working properly in the short term, or at the moment they stop working properly. With such a maintenance programme, you can act before an (expected) small failure has caused larger damage.

And in many cases, such as public infrastructure, a broken device isn't just a broken device! A failure of a sluice or railroad switch disrupts the infrastructure as a whole: ships and trains can no longer deliver their goods on time, and passengers are literally left standing in the cold because of failing rail infrastructure. With preventive maintenance you can spare them (or yourself) high costs and much annoyance.
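As a rough, entirely invented illustration of that cost argument: comparing the cost of a planned repair with the expected cost of waiting for an unplanned failure makes the trade-off explicit:

```python
# Hypothetical example: compare a planned preventive repair against waiting
# for an unplanned failure, using invented costs and failure probability.
planned_repair_cost = 5_000          # euros, repair during scheduled downtime
unplanned_repair_cost = 8_000        # euros, emergency repair
downtime_cost_per_hour = 2_000       # euros, e.g. a blocked sluice or rail switch
expected_downtime_hours = 12
failure_probability = 0.6            # chance the worn part fails this period

expected_cost_of_waiting = failure_probability * (
    unplanned_repair_cost + downtime_cost_per_hour * expected_downtime_hours
)

print(f"planned repair:      {planned_repair_cost} euro")
print(f"expected if waiting: {expected_cost_of_waiting:.0f} euro")
print("preventive repair pays off" if planned_repair_cost < expected_cost_of_waiting
      else "waiting is cheaper on average")
```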

Efficient use of time, equipment and money

Use your time, equipment and money as efficiently as possible. In a growing economy, employees are scarce and hard to find, so you want to use your employees' time as efficiently and effectively as possible. This means employees must be able to give their attention to the things that really need it. IoT makes this possible. Some examples:

  • For offices: cleaners only have to clean the parts of the office that have actually been used, instead of the whole building; unused offices can even be shut down (see the sketch after this list).
  • Logistics: more efficient planning of cranes and onward transport.
  • Already mentioned: the benefit of preventive maintenance
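A minimal sketch of the occupancy idea, with invented room names and sensor counts: rooms whose motion sensors registered no use simply drop off the cleaning list.

```python
# Hypothetical example: build today's cleaning list from occupancy-sensor counts,
# so only rooms that were actually used get cleaned.
occupancy_counts = {        # motion-sensor triggers per room today
    "meeting_room_1": 14,
    "meeting_room_2": 0,
    "open_office_a": 87,
    "open_office_b": 0,
}

cleaning_list = [room for room, count in occupancy_counts.items() if count > 0]
skippable = [room for room, count in occupancy_counts.items() if count == 0]

print("clean today:", cleaning_list)
print("skip (unused):", skippable)
```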

Lower energy costs

Another saving that IoT makes possible is on energy costs.

And of course, this benefits not only the user but also the planet as a whole! That makes your customers and employees even more satisfied, which means they will stay with you longer as customers or employees. Besides, if you rent out offices, they will be let more easily and for longer.

Don't throw away infrastructure that is working fine

In most buildings and logistics operations, the infrastructure was built years ago with great effort and cost. The infrastructure is mission critical, so owners often accept that it isn't the most efficient, as long as it works. This is where sensors come in: they add an extra layer on top of the existing devices, whether those are HVAC installations in office buildings or cranes in ports.
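As a minimal sketch of that retrofit layer (the device name and readings are invented): a small gateway script periodically reads an add-on sensor attached to an existing machine and forwards the reading as a timestamped message, which in practice would be published to a broker over a protocol such as MQTT rather than printed.

```python
# Hypothetical example: a retrofit sensor layer on top of an existing asset.
# Readings are packaged as timestamped JSON messages; in practice these would
# be sent to a broker or platform rather than printed.
import json
import random
import time

def read_addon_sensor():
    """Stand-in for reading the clamp-on sensor fitted to the existing crane."""
    return {"vibration_mm_s": round(random.uniform(1.0, 9.0), 2)}

def build_message(asset_id, reading):
    return json.dumps({"asset": asset_id, "ts": int(time.time()), **reading})

for _ in range(3):                       # three sample readings
    print(build_message("port_crane_07", read_addon_sensor()))
    time.sleep(1)
```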