Can Machine Learning Predict Coeliac Disease?
Researchers recently explored whether machine learning can predict the likelihood of coeliac disease seropositivity using routine biochemical blood tests.
The study’s findings indicate that while this approach shows some promise, its predictive power is currently too low for practical clinical application.
Research Approach and Findings
The study, conducted by a team in Copenhagen, analysed a large dataset of blood test results from over 54,000 patients who had been tested for coeliac disease. The goal was to identify patterns in their blood work from up to five years prior to their diagnosis. The researchers developed two distinct machine learning models:
- “Full” Model: This model was given a comprehensive set of 77 biochemical test parameters, allowing the algorithm to identify potential predictors without prior assumptions.
- “Curated” Model: This model was more focused, using a smaller, pre-selected set of 18 test parameters that are already considered clinically relevant for coeliac disease.
Both models, however, demonstrated limited discriminative ability, with an area under the receiver operating characteristic curve (AUC) of 0.68 for the “Full” model and 0.63 for the “Curated” model. An AUC of 1.0 represents perfect prediction, while an AUC of 0.5 is no better than a random guess, so both models performed only slightly better than chance.
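To make the AUC figures above concrete: AUC is the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below is purely illustrative, using made-up scores rather than anything from the study:

```python
# Minimal illustration of what an AUC value means (not the study's code).
# AUC = fraction of (positive, negative) pairs the model ranks correctly,
# counting ties as half-correct.

def auc(scores_pos, scores_neg):
    """Pairwise AUC over all positive/negative score pairs."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for seropositive vs seronegative patients.
positives = [0.9, 0.8, 0.7, 0.3]
negatives = [0.6, 0.4, 0.35, 0.1]
print(auc(positives, negatives))    # 0.8125: better than chance (0.5)
print(auc([1, 1], [0, 0]))          # 1.0: perfect separation
print(auc([0.5, 0.5], [0.5, 0.5]))  # 0.5: no better than random
```

On this scale, the study's values of 0.68 and 0.63 sit much closer to a coin flip than to a usable clinical test.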
The most influential predictors in the models were immunoglobulin A (IgA) and food allergen antibodies. This is not a surprising finding, as these tests are already part of the standard diagnostic protocol for coeliac disease. The models did not identify any novel or highly predictive biochemical markers.
Implications and Limitations
The study highlights the challenge of predicting coeliac disease using biochemical data alone. The researchers suggest that the subtle and often non-specific nature of biochemical changes in undiagnosed coeliac disease may not be distinct enough to be reliably detected by these models.
A significant limitation of this study is the large amount of missing data, as not all patients had every test performed. The models attempted to account for this, but the high degree of missingness likely impacted their performance.
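One common way tabular models cope with gaps like these is to impute a value (for example, the column mean) and add a flag recording that the test was missing, since the absence of a test can itself be informative. This is a generic technique, not necessarily what the study's models did; the lab values below are hypothetical:

```python
# Hedged sketch: mean imputation plus a "was missing" indicator column,
# a standard approach to missing lab values. Generic technique, not
# necessarily the one used in the study.

def impute_with_indicator(column):
    """Replace None with the column mean; return (imputed, missing_flags)."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    imputed = [v if v is not None else mean for v in column]
    flags = [0 if v is not None else 1 for v in column]
    return imputed, flags

# Hypothetical haemoglobin results (g/dL); None marks tests never ordered.
haemoglobin = [13.5, None, 11.2, None, 14.1]
values, missing = impute_with_indicator(haemoglobin)
print(missing)  # [0, 1, 0, 1, 0]
```

With 77 parameters and most patients tested for only a handful, a large share of each patient's row ends up imputed rather than observed, which helps explain why missingness weighed on performance.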
The findings indicate that machine learning can provide some insight, but a more robust predictive model would likely need a broader range of clinical data, such as patient-reported symptoms, genetic information, and family history, to meaningfully reduce the current diagnostic delay for coeliac disease.
Similar research: Promising Research in Coeliac Disease Detection.
The Challenge of “Silent” Symptoms
One of the key reasons the models performed weakly is that coeliac disease often presents with non-specific symptoms. Patients might feel tired, have headaches, or experience a range of other common ailments. Because these symptoms aren’t unique to coeliac disease, the blood tests ordered by doctors for these conditions might not show a clear pattern.
For example, a low iron count could be a sign of coeliac disease, but it’s also a common issue for many people. The machine learning model, without additional context, might struggle to differentiate between a patient with non-coeliac iron deficiency and one whose low iron is a result of their undiagnosed coeliac disease. This “noise” in the data makes it difficult to find a strong, clear signal.
Read the full research at Nature.com.
