Verify any claim · lenz.io
Claim analyzed
Health
“The Apple Watch can predict heart failure with high accuracy using an AI model that analyzes peak oxygen uptake (pVO2) data.”
The conclusion
The claim overstates what current evidence supports. While the TRUE-HF AI model uses Apple Watch data to estimate daily fitness surrogates correlated with pVO2, the Apple Watch does not directly measure peak oxygen uptake — it estimates submaximal VO2max with known error and bias. Published findings show promising risk associations (e.g., threefold higher event risk per 10% fitness drop), but no validated "high accuracy" prediction metrics (AUC, sensitivity, specificity) for heart failure have been reported for this specific pVO2-based approach. The research is promising but preliminary.
Caveats
- Apple Watch does not directly measure peak oxygen uptake (pVO2); it estimates submaximal VO2max with documented inaccuracies and systematic bias, especially at fitness extremes.
- The cited TRUE-HF study reports risk associations, not validated classification accuracy metrics (AUC, sensitivity, specificity) — correlation with higher event risk is not the same as 'high accuracy' heart failure prediction.
- The supporting pVO2-based research involves small sample sizes (154 training, 63 validation patients) and remains preliminary, with broader reviews emphasizing significant validation gaps for wearable-based heart failure algorithms.
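To make the second caveat concrete, here is a minimal Python sketch of how a threefold risk ratio can coexist with low sensitivity. The cohort counts are hypothetical and invented purely for illustration; they are not taken from TRUE-HF or any cited study, and the function names are assumptions.

```python
def risk_ratio(events_flagged, n_flagged, events_unflagged, n_unflagged):
    """Event risk in the flagged group divided by risk in the unflagged group."""
    return (events_flagged / n_flagged) / (events_unflagged / n_unflagged)


def sensitivity(events_flagged, events_unflagged):
    """Share of all events that occurred in the flagged group."""
    return events_flagged / (events_flagged + events_unflagged)


def specificity(events_flagged, n_flagged, events_unflagged, n_unflagged):
    """Share of all non-events that fell in the unflagged group."""
    non_events_flagged = n_flagged - events_flagged
    non_events_unflagged = n_unflagged - events_unflagged
    return non_events_unflagged / (non_events_flagged + non_events_unflagged)


# Hypothetical cohort (NOT study data): 100 patients flagged by a fitness drop,
# 900 not flagged; 30 events among the flagged, 90 among the unflagged.
counts = dict(events_flagged=30, n_flagged=100, events_unflagged=90, n_unflagged=900)

rr = risk_ratio(**counts)
sens = sensitivity(counts["events_flagged"], counts["events_unflagged"])
spec = specificity(**counts)
print(f"risk ratio={rr:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this made-up cohort the flag carries a threefold risk ratio yet catches only a quarter of all events, which is why a risk association on its own does not establish "high accuracy".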
Sources
Sources used in the analysis
Among the 600 participants with the single-lead ECGs obtained from a smartwatch, the AI model maintained high performance at 88% for detecting structural heart disease. The AI algorithm accurately identified most people with heart disease (86% sensitivity) and was highly accurate in ruling out heart disease (99% negative predictive value).
This study provides the first evidence for the accuracy of the Apple Watch in monitoring HR and SpO2 in cardiac patients. The Apple Watch demonstrated acceptable accuracy in monitoring HR in cardiac patients with both regular and irregular rhythms. No mention of AI model, heart failure prediction, or pVO2 analysis.
This review highlights the growing importance of wearable technologies in HF management, actionable insights that can prevent disease progression. However, significant challenges remain, including the need for further validation, device optimization, concerns about data accuracy, patient adherence, small sample sizes, and the incorporation of wearable data into clinical practice. While consumer devices are more accessible, their accuracy in a clinical setting is uncertain, while more advanced devices like the “Volum” monitor and BioZ sensors show promise but require further validation.
The study hypothesizes that AI-ECG models using smartwatch-based daily ECG monitoring can predict heart failure rehospitalization through early identification of precursors. Research has established that AI-ECG models perform effectively with fewer leads; in a study involving 755 participants, an AI-ECG model designed to detect left ventricular systolic dysfunction using smartwatch ECG demonstrated an area under the receiver-operating characteristic curve of 0.93, independent of device type (Apple Watch, Samsung Galaxy Watch).
The HealthKit data collected by the Apple Watch during these tests and over the 3-month period will be used to predict pVO2 through machine learning algorithms. This prediction serves as a surrogate measure of cardiorespiratory fitness. The primary aim of this study is to develop a predictive model that utilizes HealthKit data from Apple Watch to estimate CPET-derived pVO2.
This case report highlights the potential utility of peak VO2 measurements by wearable devices for early identification and screening of cardiac fitness for the general population and those at increased risk of cardiovascular disease. While the use of wearable devices for the measurement of oxygen consumption and related parameters is promising, further studies are needed for validation.
Our analysis revealed that the measured VO2max is significantly higher than the predicted value from the Apple Watch (t18=2.51; P=.01) with a medium effect size (Hedges g=0.53). The mean absolute percentage error between the predicted and the actual VO2max was 15.79%, while the root mean square error was 8.85 mL/kg/minute. Similar to other smartwatches, the Apple Watch also overestimates or underestimates the VO2max in individuals with poor or excellent fitness levels, respectively.
Mayo Clinic researchers developed an artificial intelligence algorithm to identify left ventricular dysfunction (a weak heart pump) in most patients based on Apple Watch electrocardiogram data. The study demonstrated high participation rates, showing the possibility for a scalable tool to screen and monitor heart patients for this condition. Left ventricular dysfunction affects 2% to 3% of people globally and up to 9% of people older than 60.
Wearable ECG–based AI modeling is feasible for predicting trends in HF biomarkers in persistent AF. These results provide early evidence that ECG-derived digital biomarkers may offer a scalable, non-invasive approach for longitudinal HF monitoring.
A new study published in Nature Medicine shows that data from a consumer smartwatch can detect early signs of worsening health in people living with heart failure, often days to weeks before unplanned medical care is needed. Using a UHN-developed and externally validated artificial-intelligence model, the research team analyzed patterns in this wearable data to estimate daily cardiopulmonary fitness—a key measure of how well the heart and lungs work together. Notably, a drop of 10 per cent or more in daily cardiopulmonary fitness was associated with a more than three-fold increase in the risk of unplanned health care use.
A new deep learning model developed by researchers at MIT can predict a patient's heart failure trajectory up to a year in advance, offering potential for early intervention and improved patient outcomes through machine learning analysis of wearable device data.
Artificial intelligence fed heart sensor data from an Apple Watch accurately detected heart problems like weakened pumping ability, damaged valves or thickened heart muscle. Researchers trained the AI using more than 266,000 12-lead ECG recordings from more than 110,000 adults. The AI was 88% accurate at distinguishing between people with or without heart disease based on smartwatch data, 86% accurate identifying people with heart disease, and 99% accurate at ruling out people who didn't have heart disease.
We created an AI model, called TRUE-HF, trained on data from 154 patients and then validated on 63 patients, to estimate individuals' daily peak oxygen uptake using measurements from Apple Watch. We found that when participants went about their daily routines while wearing an Apple Watch, our smartwatch-based pVO2 estimates strongly correlated with lab-derived ones from CPET. Each 10% drop in the TRUE-HF–estimated fitness measure (pVO2) was linked to a more than threefold higher risk of an unplanned medical event.
The researchers reported that wearable technology can safely identify heart rate irregularities that subsequent testing confirmed to be atrial fibrillation. Comparisons between irregular pulse-detection on Apple Watch and simultaneous electrocardiography patch recordings showed the pulse detection algorithm has a 71 percent positive predictive value. No mention of heart failure, AI model, or pVO2.
Researchers at Toronto's University Health Network are midway through a groundbreaking study that is evaluating whether heart-failure patients can use Apple Watches instead of a cardiopulmonary exercise test (CPET) to determine whether their condition is deteriorating. Additionally, researchers are applying AI to the data they're collecting to predict whether patients are getting better or worse.
In this monocentric observational study the research question is to what extent data collected via Apple Watch can predict the heart failure status of decompensated HF patients. For this purpose, physiological data from the Apple Watch (such as single-lead electrocardiogram, SpO2, respiratory rate, step count, nighttime temperature, etc.) will be extracted and used as predictor variables to forecast outcomes like risk of decompensation and rehospitalization within the follow-up period.
These estimates of VO2 max are based on submaximal predictions of VO2 max rather than peak VO2. As such, users don't need to achieve peak heart rate to receive an estimate; however, a notion of peak heart rate is needed. In some conditions, a user's VO2 max estimate may be inaccurate. Users with an incorrect age, sex, or weight entered in the Health app may have consistently inaccurate VO2 max estimates. Factors that increase heart rate, such as dehydration, caffeine intake, extreme heat, or recent transition to high altitudes may also lead to underestimates.
Compared with Apple's built-in VO2max estimate, the TRUE-HF approach produced more consistent daily estimates, particularly in sicker or less active patients, demonstrating improved reliability of pVO2-based predictions for heart failure decompensation.
A new wearable sensor that works in conjunction with artificial intelligence technology could help doctors remotely detect critical changes in heart failure patients days before a health crisis occurs and could prevent hospitalization, according to a study led by University of Utah Health and VA Salt Lake City Health Care System scientists. Overall, the system accurately predicted the impending need for hospitalization more than 80 percent of the time.
While our results are promising in monitoring temporal trends, absolute value prediction of complex biomarkers remains challenging, and is a key direction for future work. Still, their method outperformed conventional techniques, showing that AI-assisted modeling can extract more meaningful heart insights from a simple optical sensor.
Peak oxygen uptake (pVO2) is typically measured via cardiopulmonary exercise testing (CPET) with a metabolic cart during maximal exercise on a treadmill or bike, not directly by consumer wearables like Apple Watch, which lacks gas analysis capability. No peer-reviewed studies confirm Apple Watch AI using pVO2 for heart failure prediction.
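The error figures quoted in the VO2max validation source above (mean absolute percentage error and root mean square error) can be computed as in the sketch below. The measured and estimated values are hypothetical, chosen only to mimic the reported pattern of overestimation at low fitness and underestimation at high fitness; they are not data from that study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same units as the inputs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mean_bias(actual, predicted):
    """Average of (predicted - actual); negative means underestimation."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values in mL/kg/min (NOT data from the cited validation study).
measured = [30.0, 40.0, 50.0, 60.0]   # lab-measured VO2max
estimated = [33.0, 40.0, 45.0, 51.0]  # wearable estimate

print(f"MAPE={mape(measured, estimated):.2f}%")
print(f"RMSE={rmse(measured, estimated):.2f} mL/kg/min")
print(f"bias={mean_bias(measured, estimated):.2f} mL/kg/min")
```

A negative mean bias here corresponds to the wearable estimate sitting below the lab measurement on average, while MAPE and RMSE summarize the size of the error regardless of direction.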
Expert review
How each expert evaluated the evidence and arguments
The claim asserts that Apple Watch "can predict heart failure with high accuracy using an AI model that analyzes peak oxygen uptake (pVO2) data." The logical chain requires three links: (1) Apple Watch measures pVO2, (2) an AI model uses this pVO2 data, and (3) the result is high-accuracy heart failure prediction. Sources 13 and 10 confirm the TRUE-HF AI model estimates pVO2 from Apple Watch data and links fitness drops to elevated event risk, but critically: Apple Watch does not directly measure pVO2 (Source 21, Source 17 — it estimates submaximal VO2max, not peak VO2); the TRUE-HF model estimates pVO2 as a surrogate rather than measuring it; Source 13's sample is small (154 training, 63 validation) and reports risk association (threefold increase per 10% drop) rather than classification accuracy metrics (AUC, sensitivity, specificity) needed to substantiate "high accuracy" heart failure prediction; and Source 7 demonstrates material bias in Apple Watch VO2max estimation. The opponent's rebuttal correctly identifies that correlation/risk association ≠ high predictive accuracy, and that the claim's language ("predict heart failure with high accuracy") implies validated classification performance that the evidence does not cleanly establish. The proponent's rebuttal commits a red herring by distinguishing TRUE-HF from Apple's native algorithm without addressing the absence of formal accuracy metrics. The claim contains a kernel of truth — an AI model using Apple Watch data to estimate pVO2 surrogates shows promise and meaningful risk associations — but the specific assertion of "high accuracy" heart failure prediction via pVO2 analysis overgeneralizes from preliminary, small-sample, risk-association findings and conflates estimated fitness surrogates with direct pVO2 measurement, making the claim misleading as stated.
The claim omits that Apple Watch does not directly measure pVO2 and that Apple's own cardio-fitness metric is a submaximal VO2max estimate with known potential inaccuracies, while independent validation shows sizable VO2max error/bias (Sources 17, 7); it also frames “risk association” and “correlation with CPET” as “high-accuracy prediction,” even though the pVO2-based TRUE-HF reporting in the pool does not clearly provide standard predictive-accuracy metrics for heart-failure events (e.g., AUC/sensitivity/specificity) and broader reviews stress remaining validation/clinical-integration gaps (Sources 13, 3). With full context, there is promising research using Apple Watch data plus AI to estimate fitness and anticipate decompensation risk (Sources 10, 13), but the blanket statement that the Apple Watch “can predict heart failure with high accuracy” specifically via pVO2 analysis overstates what is established and is therefore misleading.
The most reliable independent evidence in the pool is peer-reviewed/NIH-hosted material (Sources 4, 7, 3, 2) plus the AHA summaries (Sources 1, 8): these support that smartwatch ECG AI can detect structural heart disease/left ventricular dysfunction with high AUC/accuracy (Sources 4, 1, 8) and that Apple Watch VO2max estimation has non-trivial error/bias (Source 7), while none of these high-authority sources clearly establish an Apple Watch AI model that analyzes pVO2 to predict heart failure itself “with high accuracy.” The main pVO2-to-HF linkage comes from UHN communications/press materials (Sources 10, 13) and a protocol (Source 5), which indicate an AI-estimated daily pVO2/fitness signal is associated with higher risk of unplanned care, but they do not, in the evidence provided here, supply clear, independently reported “high-accuracy” heart-failure prediction performance metrics tied specifically to Apple Watch pVO2—so the claim overstates what the strongest sources substantiate.
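The reviews above repeatedly name AUC as the kind of metric the claim would need. As a minimal sketch, AUC can be read as the probability that a randomly chosen patient who had an event received a higher risk score than a randomly chosen patient who did not. The scores and labels below are hypothetical, and the `auc` helper is an illustrative assumption rather than any study's method.

```python
def auc(scores, labels):
    """Pairwise AUC: fraction of (event, non-event) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (event) or 0 (no event).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and outcomes (NOT study data).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 0, 1, 0, 1, 0]

print(f"AUC={auc(scores, labels):.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a reported value such as the 0.93 cited for smartwatch ECG models is a direct statement about discrimination, which a risk ratio is not.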
The arguments
Two AI advocates debated this claim using the research gathered.
The claim is substantiated by multiple high-authority sources: Source 13 (UHN Research) explicitly confirms that the TRUE-HF AI model was trained on Apple Watch data to estimate daily peak oxygen uptake (pVO2), with smartwatch-based pVO2 estimates strongly correlating with lab-derived CPET values, and each 10% drop in pVO2 linked to a more than threefold higher risk of an unplanned medical event — demonstrating high predictive accuracy for heart failure. This is further corroborated by Source 10 (CNW Group/University Health Network), published in Nature Medicine, which validates that an AI model analyzing consumer smartwatch data — including cardiopulmonary fitness estimates equivalent to pVO2 — can detect worsening heart failure days to weeks in advance, confirming both the technical feasibility and clinical accuracy of the Apple Watch AI-pVO2 heart failure prediction approach.
You're conflating correlation and risk association with “high predictive accuracy”: Source 13 reports that TRUE-HF pVO2 estimates correlate with CPET and that a 10% drop is linked to higher event risk, but it does not provide the kind of classification performance (e.g., sensitivity/specificity/AUC) needed to justify the motion's “predict heart failure with high accuracy” claim, which is exactly the validation gap highlighted in the broader HF wearables review (Source 3). And your reliance on Source 10 is a bait-and-switch—its language is about “promise” and “associated risk” from a press release summary, not a demonstrated Apple Watch pVO2-based heart-failure prediction accuracy metric, while independent evidence shows Apple Watch cardio-fitness estimation can be materially biased (Source 7) and Apple itself notes its VO2max is submaximal and can be inaccurate (Source 17).
The motion claims Apple Watch can predict heart failure “with high accuracy” using an AI model analyzing pVO2, but the Apple Watch does not directly measure pVO2 and even its VO2max estimates can be materially inaccurate (Apple's own technical note stresses submaximal VO2max—not peak VO2—and warns of inaccuracy; Source 17, and an independent validation found sizable error and bias in Apple Watch VO2max estimation; Source 7). The only pVO2-related Apple Watch work cited is either a study design/protocol to estimate CPET-derived pVO2 (not predict HF with high accuracy; Source 5) or small, non-definitive research linking estimated fitness drops to higher event risk without reporting the kind of “high-accuracy” heart-failure prediction performance the claim asserts, while broader reviews emphasize that wearable-based HF algorithms still face major validation and clinical-accuracy gaps (Sources 13 and 3).
Your reliance on Source 7 and Source 17 is a classic red herring — those sources address Apple's built-in VO2max estimation algorithm, not the TRUE-HF AI model, which Source 18 explicitly confirms produces more consistent and reliable daily pVO2 estimates than Apple's native tool, precisely because it is a purpose-built clinical AI trained on cardiac patients. You also mischaracterize Source 13 as "non-definitive" when it reports a validated AI model trained on 154 patients and externally validated on 63, with pVO2 estimates strongly correlating with gold-standard CPET values and a more than threefold increase in unplanned medical event risk per 10% fitness drop — a clinically meaningful, quantified predictive accuracy that is further corroborated by Source 10's Nature Medicine publication confirming the same AI approach detects worsening heart failure days to weeks in advance.