Fill-in-the-blank training primes AI to interpret health data from smartwatches and fitness trackers

The human body constantly generates a variety of signals that can be measured from outside the body with wearable devices. These bio-signals – ranging from heart rate to sleep state and blood oxygen levels – can indicate whether someone is having mood swings or can be used to diagnose a variety of body or brain disorders.

It can be relatively cheap to gather a lot of bio-signal data. Researchers can organize a study and ask participants to use a wearable device akin to a smartwatch for a few days. However, to teach a machine learning algorithm to find a relationship between a specific bio-signal and a health disorder, you first need to teach the algorithm to recognize that disorder. That’s where computer engineers like myself come in.

Many commercial smartwatches, such as ones by Apple, AliveCor, Google and Samsung, currently support atrial fibrillation detection. Atrial fibrillation is a common type of irregular heart rhythm, and leaving it untreated can lead to a stroke. One way to automatically detect atrial fibrillation is to train a machine learning algorithm to recognize what atrial fibrillation looks like in the data.

This machine learning approach requires large bio-signal datasets in which instances of atrial fibrillation are labeled. The algorithm can use the labeled instances to learn to recognize a relationship between the bio-signal and atrial fibrillation.

The labeling process can be quite expensive because it requires experts, such as cardiologists, to go through millions of data points and label each instance of atrial fibrillation. The same problem extends to many other bio-signals and disorders.

To resolve this issue, researchers have been developing new ways to train machine learning algorithms with fewer labels. By first training a machine learning model to fill in the blanks of large-scale unlabeled bio-signal data, the machine learning model is primed to learn the relationship between a bio-signal and a disorder with fewer labels. This is called pretraining. Pretraining even helps a machine learning model learn a relationship between a bio-signal and a disorder when it is pretrained on a completely unrelated bio-signal.

[Figure: Bio-signals are found all over the body and provide information about different bodily functions. Each bio-signal shown measures a specific physiological signal in a noninvasive way. Credit: Eloy Geenjaar]

Challenges of working with bio-signals

Finding relationships between bio-signals and disorders can be difficult for three reasons: bio-signals contain noise, or irrelevant data; bio-signals differ from person to person; and the relationship between a bio-signal and a disorder may not be clear.

First, bio-signals contain a lot of noise. For example, when you’re wearing a smartwatch while running, the watch will move around. This causes the sensor for the bio-signal to record at different locations during the run. Since the locations vary across the run, swings in the bio-signal value may now be due to variations in the recording location instead of due to physiological processes.

Second, everyone’s bio-signals are unique. The location of veins, for example, often differs between people. This means that even if smartwatches are worn at exactly the same place on everyone’s wrists, the bio-signal related to those veins is recorded differently from one person to the next. The same underlying signal, such as someone’s heart rate, will lead to different bio-signal values.

The underlying signal itself can also be unique for people or groups of people. The resting heart rate of an average person is around 60-80 beats per minute, but athletes can have resting heart rates as low as 30-40 beats per minute.

Lastly, the relationship between a bio-signal and a disorder is often complex. This means that the disorder is not immediately obvious from looking at the bio-signal.

Machine learning algorithms allow researchers to learn from data and account for the complexity, noise and variability of people. By using large bio-signal datasets, machine learning algorithms are able to find clear relationships that apply to everyone.

Learning to fill in the blanks

Researchers can use unlabeled bio-signal data as a warmup for the machine learning algorithm. This warmup, or pretraining, primes the machine learning algorithm to find a relationship between the bio-signal and a disorder. This is a bit like walking around a park to get the lay of the land before working out a route to go running.

There are many ways to pretrain a machine learning algorithm. In my research with Dolby Laboratories researcher Lie Lu and previous research, the machine learning algorithm is taught to fill in the blanks.

To do this, we take a bio-signal and artificially create gaps of a certain length – for example, one second. We then teach the machine learning algorithm to fill in the missing piece of bio-signal. This is possible because the machine learning algorithm sees what the bio-signal looks like before and after the gap.

If the heart rate of a person is around 60 beats per minute before the gap, there will likely be a heartbeat in the one-second gap. In this case, we’re training the machine learning algorithm to predict when that heartbeat will occur.
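The gap-and-fill setup described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method used in the research: the signal is a synthetic pulse train rather than real sensor data, the 50 Hz sampling rate is an arbitrary choice, and the function names (`make_pulse_signal`, `mask_gap`, `predict_beat_in_gap`) are hypothetical. A simple hand-written rule stands in for the machine learning model, just to show what the input and the hidden target look like.

```python
# Toy sketch: synthetic heartbeat signal, not real wearable data.
SAMPLE_RATE_HZ = 50   # assumed sampling rate (arbitrary choice)
HEART_RATE_BPM = 60   # matches the example in the text
DURATION_S = 10


def make_pulse_signal(duration_s, sample_rate_hz, bpm):
    """Return a list of samples that is 1.0 at each heartbeat and 0.0 elsewhere."""
    n = duration_s * sample_rate_hz
    samples_per_beat = round(sample_rate_hz * 60 / bpm)
    return [1.0 if i % samples_per_beat == 0 else 0.0 for i in range(n)]


def mask_gap(signal, start, length):
    """Blank out `length` samples from `start`; return (masked input, hidden target)."""
    target = signal[start:start + length]
    masked = signal[:start] + [0.0] * length + signal[start + length:]
    return masked, target


def predict_beat_in_gap(masked, start, length):
    """Stand-in for a trained model: extend the rhythm seen before the gap."""
    beats_before = [i for i, v in enumerate(masked[:start]) if v == 1.0]
    intervals = [b - a for a, b in zip(beats_before, beats_before[1:])]
    mean_interval = sum(intervals) / len(intervals)
    predicted = round(beats_before[-1] + mean_interval)
    # Only report a beat if the extrapolated position falls inside the gap.
    return predicted if start <= predicted < start + length else None


signal = make_pulse_signal(DURATION_S, SAMPLE_RATE_HZ, HEART_RATE_BPM)
gap_start = 4 * SAMPLE_RATE_HZ + 10   # gap begins 4.2 seconds into the recording
gap_length = SAMPLE_RATE_HZ           # one-second gap
masked, target = mask_gap(signal, gap_start, gap_length)

predicted_index = predict_beat_in_gap(masked, gap_start, gap_length)
true_index = gap_start + target.index(1.0)
print("predicted beat at sample", predicted_index, "true beat at sample", true_index)
```

In actual pretraining, a model would be shown many such masked recordings and adjusted until its filled-in output matches the hidden target, instead of relying on a fixed rule like the one here.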

Once we have trained the machine learning algorithm to do this, it will have found a relationship between someone’s heart rate and when the next beat should occur. We can then train the algorithm on labeled data, starting with this relationship between a normal heart rate and the bio-signal already learned. This makes it easier for the algorithm to learn the relationship between heart rate and atrial fibrillation. Since atrial fibrillation is characterized by fast and irregular heartbeats, and the algorithm is now good at predicting when a heartbeat will happen, it can quickly learn to detect these irregularities.
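To see why being good at predicting the next beat helps with spotting irregular rhythms, consider the rough sketch below. It is an assumption-laden illustration, not a clinical detector and not the approach from the research: the beat times are simulated, the "predictor" simply assumes the next interval equals the previous one, and the helper names are hypothetical. The point is only that a next-beat predictor makes small errors on a steady rhythm and large errors on an erratic one.

```python
import random


def beats_from_intervals(intervals):
    """Turn a list of beat-to-beat intervals (seconds) into beat timestamps."""
    times, t = [], 0.0
    for interval in intervals:
        t += interval
        times.append(t)
    return times


def next_beat_errors(beat_times):
    """Predict each beat from the previous interval; return absolute errors in seconds."""
    errors = []
    for i in range(2, len(beat_times)):
        previous_interval = beat_times[i - 1] - beat_times[i - 2]
        predicted = beat_times[i - 1] + previous_interval
        errors.append(abs(beat_times[i] - predicted))
    return errors


def mean_error(beat_times):
    """Average next-beat prediction error for one recording."""
    errors = next_beat_errors(beat_times)
    return sum(errors) / len(errors)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated steady rhythm at 60 beats per minute.
regular = beats_from_intervals([1.0] * 30)
# Simulated fast, erratic rhythm (intervals chosen arbitrarily for illustration).
irregular = beats_from_intervals([rng.uniform(0.4, 0.9) for _ in range(30)])

regular_score = mean_error(regular)
irregular_score = mean_error(irregular)
print("steady rhythm error:", regular_score)
print("erratic rhythm error:", irregular_score)
```

A pretrained model plays the role of the predictor here, and the small set of expert labels is then used to teach it which patterns of irregularity correspond to atrial fibrillation.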

[Figure: Three rows of horizontal lines with regularly spaced vertical spikes, illustrating machine learning pretraining on filling in the blanks of a heart bio-signal. Credit: Eloy Geenjaar]

The idea of filling in the blanks can be generalized to other bio-signals as well. Previous research has shown, and our work reconfirmed, that pretraining a model on one bio-signal without any labels allows it to learn clinically useful relationships from other bio-signals with few labels. This shortcut means that researchers can pretrain on bio-signals that are easy to gather and use the machine learning model on ones that are hard to gather and label.

Faster disorder detection development

By improving pretraining, researchers can make machine learning algorithms better and more efficient at detecting diseases and disorders. Improvements in pretraining also reduce the cost and time experts spend labeling data.

A recent example of machine learning algorithms used for early detection is Google’s Loss of Pulse smartwatch feature. The emerging field of bio-signal pretraining can help enable faster development of similar features using a wider range of bio-signals and for a wider range of disorders.

With more types of bio-signals and more data, researchers may be able to discover relationships that dramatically improve early detection of diseases and disorders. The earlier many diseases and disorders are found, the better a treatment plan works for patients.

The post “Fill-in-the-blank training primes AI to interpret health data from smartwatches and fitness trackers” by Eloy Geenjaar, Ph.D. Student in Electrical Engineering & Computer Engineering, Georgia Institute of Technology was published on 04/10/2025 by theconversation.com
