Verify any claim · lenz.io
Claim analyzed
Health
“Ultrasound estimated fetal weight has a specific diagnostic accuracy in detecting fetal growth restriction.”
The conclusion
Multiple peer-reviewed systematic reviews and meta-analyses have quantified the diagnostic accuracy of ultrasound estimated fetal weight for detecting fetal growth restriction, with reported sensitivities ranging from ~23% to 89% and specificities from 88% to 99.5%. These metrics are well-documented and reproducible, confirming that specific diagnostic accuracy exists. However, performance varies substantially by gestational age, threshold, and how FGR is defined, and significant measurement errors affect real-world reliability — meaning no single, universal accuracy figure applies.
Based on 21 sources: 14 supporting, 3 refuting, 4 neutral.
Caveats
- Diagnostic accuracy metrics vary widely by gestational age and threshold (e.g., sensitivity of 71% at 32 weeks vs. 48% at 36 weeks), so no single 'specific' figure is universally applicable.
- Ultrasound EFW measures fetal size rather than true growth restriction, and there is no universally accepted gold standard for FGR diagnosis, making reported accuracy figures dependent on how FGR is defined.
- Significant systematic and random measurement errors exist in EFW (mean systematic difference of 8.4%, range −17.5% to 38.3%), and EFW tends to overestimate weight in small fetuses — the very population relevant to FGR detection.
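The systematic and random error figures in the last caveat can be read as the mean and the spread of percentage differences between estimated and expected weight. The Python sketch below illustrates that reading. It assumes the percentage is taken relative to the expected weight and that the spread is a sample standard deviation, because the sources do not spell out either choice. The function names and scan values are hypothetical and are not taken from any study.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def percent_differences(estimated: Sequence[float],
                        expected: Sequence[float]) -> List[float]:
    """Percentage difference of each estimate from its expected weight.

    Assumption: the denominator is the expected weight.
    """
    if len(estimated) != len(expected):
        raise ValueError("estimated and expected must have the same length")
    return [100.0 * (est - exp) / exp for est, exp in zip(estimated, expected)]


def systematic_and_random_error(estimated: Sequence[float],
                                expected: Sequence[float]) -> Tuple[float, float]:
    """Return (systematic error, random error) in percent.

    Systematic error: mean of the percentage differences.
    Random error: sample standard deviation of those differences
    (assumption; needs at least two scans).
    """
    diffs = percent_differences(estimated, expected)
    return mean(diffs), stdev(diffs)


# Hypothetical serial scans for one fetus, weights in grams.
estimated_g = [1080, 1650, 2240]
expected_g = [1000, 1500, 2000]

systematic, random_err = systematic_and_random_error(estimated_g, expected_g)
print(f"systematic: {systematic:.1f}%  random: {random_err:.1f}%")
```

In this made-up series every estimate is above the expected weight, so the systematic error is positive (overestimation), while the random error describes how much the individual scans scatter around that bias.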
This analysis is for informational purposes only and does not constitute health or medical advice, diagnosis, or treatment. Always consult a qualified healthcare professional before making health-related decisions.
Sources
Sources used in the analysis
Source 1: In the third trimester, FGR was defined as an estimated fetal weight (EFW) < 3rd percentile irrespective of fetal Doppler status or EFW 3rd to 10th percentile with abnormal fetal Doppler. The first-trimester risk stratification had a sensitivity of 84.3%, a negative predictive value of 95.9%, an AUC of 0.71, and an OR of 7.06 for FGR in the third trimester.
Source 2: In 2005, a systematic review assessing the accuracy of ultrasound EFW found the Hadlock A formula produced the smallest systematic mean errors. Seven studies met the inclusion criteria and 11 different formulae were assessed; ultrasound calculation of fetal weight was most commonly overestimated. The Hadlock A formula produced the most accurate results, with the lowest levels of random error.
Source 3: The mean difference between estimated fetal weight and expected weight over three to six scans ranged from −17.5% to 38.3% with a mean of 8.4%, representing the systematic difference. The standard deviation of these differences ranged from 0.4% to 21% with a mean of 4.3%, representing random difference. Large systematic and random errors in estimated fetal weight have been reported; these may have an impact on the accuracy of fetal growth monitoring.
Source 4: The pooled sensitivity, specificity, and diagnostic odds ratio in predicting fetal growth restriction were 71% (95% confidence interval, 52%-85%), 90% (95% confidence interval, 79%-95%), and 25.8 (95% confidence interval, 14.5-45.8), respectively, at 32 weeks ultrasound and 48% (95% confidence interval, 41%-55%), 94% (95% confidence interval, 93%-96%), and 16.9 (95% confidence interval, 10.8-26.6), respectively, at 36 weeks ultrasound.
Source 5: Observed pooled sensitivities of abdominal circumference and estimated fetal weight <10th centile for birthweight <10th centile were 35% (95% confidence interval, 20-52%) and 38% (95% confidence interval, 31-46%), respectively. Observed pooled specificities were 97% (95% confidence interval, 95-98%) and 95% (95% confidence interval, 93-97%), respectively.
Source 6: The traditional criteria of FGR by EFW <10% identified 21 of these pregnancies at the first growth ultrasound resulting in a sensitivity of 23.3% (15.1, 33.4), specificity of 99.5% (99.0–99.8), positive predictive value of 75.1% (56.7, 87.3), and negative predictive value of 95.6% (95.1, 96.0).
Source 7: Early identification of fetal growth restriction (FGR) using a combination of ultrasound biometric indicators, Doppler blood flow measurements, and biochemical markers significantly enhances diagnostic accuracy, facilitates timely interventions, and enhances both maternal and fetal outcomes.
Source 8: Our results show a mean systematic difference of 8.4% between EFW and the back-projected fetal weight indicating that, on average, we are overestimating fetal weight as compared with the EFW growth curve. The mean difference between EFW and Ws over 3–6 scans ranged from −17.5% to 38.3% with a mean of 8.4%, representing the systematic difference. The standard deviation of these differences ranged from 0.4% to 21% with a mean of 4.3%, representing random difference.
Source 9: Calculation of EFW using ultrasound is generally overestimated, especially in the population of small fetuses, raising concerns regarding increasing levels of obstetric intervention. No significant difference was identified in the levels of systematic error between older and more recent studies, though the random error has reduced to below 10% in the most recent four studies.
Source 10: Fetal weight below the 10th percentile has a negative predictive value of 99%, a sensitivity of 89%, and a specificity of 88% for the detection of IUGR. However, according to the authors, the poor sensitivity (56.7%) and low positive likelihood ratio (8.9) indicate that additional modalities are needed to improve the usefulness of ultrasound in detecting IUGR in severe preeclampsia.
Source 11: Fetal growth restriction (FGR) is defined as an ultrasonographic EFW or abdominal circumference (AC) below the 10th percentile for gestational age. Formulas based on 3 or 4 fetal biometric indices are significantly more accurate in estimating fetal weights than formulas based on 1 or 2 indices; however, most EFW formulas tend to overestimate weight in low birthweight babies and underestimate weight in babies >3500g.
Source 12: When we used ultrasound (US) biometry and maternal risk factors to estimate EFW <10 percentiles, the sensitivity was 44.4% with a specificity of 89% for an FPR (false positive result) of 10%. When we combined the US biometry and maternal risk factors with sFlt1/PlGF ratio, for a cutoff of 38, the sensitivity was 84.21%, and the specificity was 84.31% for an FPR of 10%.
Source 13: Evidence suggests that ultrasound during the third trimester, to measure the baby and estimate weight, is the method likely to detect most small babies before birth. Testing during the third trimester is likely to result in more accurate prediction of smallness for gestational age at birth than earlier testing.
Source 14: Estimating fetal weight with ultrasound is the best way to find FGR. A diagnosis of FGR is based on the difference between actual and expected measurements at a certain gestational age.
Source 15: The accuracy of sonographic estimates of fetal weight is not influenced by whether the parturient has oligohydramnios and the accuracy of identifying FGR is similar in women with and without oligohydramnios. This provides specific diagnostic accuracy metrics for EFW in detecting FGR regardless of amniotic fluid volume.
Source 16: The main uncertainty about individual ultrasound parameters is that their accuracy has never been established. There is no gold standard, as normal measurements of the head, abdomen and femur of the neonate, corresponding to ultrasound measurements of the fetus, have not been defined. In small babies, assessment of weight may also be less accurate, increasing the SD from 7.3 to 9.7%.
Source 17: EFW measurements are usually right within 10-15% of the baby's actual birth weight. But, this error range can change based on how far along the pregnancy is: Early Pregnancy: In the early stages, EFW measurements are less accurate because the fetus is smaller. Late Pregnancy: At the end of pregnancy, the error range can be up to 20% in some cases.
Source 18: One limitation of current guidelines is that they are based on the EFW which indicates size, not growth. This increases the chances of missing larger babies that are not meeting their growth potential, but also risks inappropriately diagnosing constitutionally small babies as growth restricted.
Source 19: Fetal growth restriction (FGR) is defined as an ultrasound estimated fetal weight (EFW) of less than the 10th percentile or abdominal circumference <10% for gestational age. Adverse consequences of FGR usually do not develop until growth is less than the 3rd percentile, but sonographic weight estimates are variable enough that management decisions should be made when the EFW is reported as <10th percentile or the abdominal circumference <10%.
Source 20: Systematic reviews, such as those in Cochrane Database, confirm that ultrasound EFW has moderate accuracy for detecting fetal growth restriction, with sensitivity around 70-90% and specificity 85-95% depending on thresholds and formulas used, but systematic overestimation in small fetuses reduces performance for FGR detection.
Source 21: When the Hadlock formula is compared to the NICD growth chart and the INTERGROWTH-21st growth chart it was found that the NICD growth chart has the highest sensitivity at 92% and also the highest negative predictive value at 84%. The INTERGROWTH-21st chart has the highest specificity at 46%.
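The sensitivity, specificity, predictive values, likelihood ratios, and diagnostic odds ratios quoted in these excerpts all derive from a 2x2 table comparing the ultrasound result with the reference outcome. The Python sketch below shows the standard textbook definitions. The class and function names are hypothetical, and the counts are made up for illustration rather than taken from any of the studies.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing a test against a reference standard."""
    tp: int  # test positive, condition present
    fp: int  # test positive, condition absent
    fn: int  # test negative, condition present
    tn: int  # test negative, condition absent

    def sensitivity(self) -> float:
        # Share of true cases that the test flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-cases that the test clears.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Share of positive tests that are true cases.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Share of negative tests that are true non-cases.
        return self.tn / (self.tn + self.fn)


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (positive LR, negative LR) from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Diagnostic odds ratio, defined as positive LR divided by negative LR."""
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    return lr_pos / lr_neg


# Hypothetical counts, for illustration only.
table = TwoByTwo(tp=40, fp=50, fn=60, tn=850)
print(f"sensitivity {table.sensitivity():.1%}, specificity {table.specificity():.1%}")
print(f"PPV {table.ppv():.1%}, NPV {table.npv():.1%}")

# Point estimates quoted above for the 32-week scan: 71% and 90%.
print(f"DOR from point estimates: {diagnostic_odds_ratio(0.71, 0.90):.1f}")
```

A diagnostic odds ratio computed this way from the 32-week point estimates comes out near 22, not the pooled 25.8 reported in the meta-analysis. A gap like that is expected, since a meta-analysis pools each measure across studies instead of deriving one from the others, so it does not indicate an error in either number.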
Expert review
How each expert evaluated the evidence and arguments
Expert 1 — The Logic Examiner
The claim asserts that ultrasound EFW has "a specific diagnostic accuracy" in detecting FGR. That claim is logically satisfied if quantifiable, reproducible diagnostic metrics exist, even if those metrics vary by context. Sources 4, 5, 6, 10, and 12 directly provide precisely quantified sensitivity, specificity, NPV, and PPV figures from peer-reviewed systematic reviews and meta-analyses, which is direct evidence that specific diagnostic accuracy metrics exist and are well-characterized. The opponent's rebuttal conflates two distinct logical issues: (1) variability in point estimates across gestational ages and thresholds, and (2) the absence of any specific diagnostic accuracy. The former does not logically entail the latter, which makes the opponent's core argument a false equivalence; context-dependent variation in diagnostic performance is itself a form of specificity, not its negation. The proponent correctly identifies that the opponent's demand for a single universal figure is a straw man of the claim. The evidence pool overwhelmingly supports that EFW's diagnostic accuracy is quantified, published, and reproducible across independent studies, which satisfies the logical requirements of the claim as stated.
Expert 2 — The Context Analyst
The claim states that ultrasound EFW has "a specific diagnostic accuracy" in detecting FGR. That phrase is technically true in the sense that multiple peer-reviewed studies and meta-analyses have quantified its performance (Sources 4, 5, 6, 10), but it is misleading without context, for five reasons. First, the metrics vary substantially by gestational age, threshold used, and FGR definition (sensitivity ranging from ~23% to 89%, specificity from 88% to 99.5%). Second, large systematic and random measurement errors exist in serial scanning (Sources 3, 8). Third, EFW tends to overestimate weight in small fetuses (Sources 9, 11). Fourth, there is no universally accepted gold standard (Source 16). Fifth, EFW reflects size rather than true growth restriction (Source 18). Together these mean there is no single, stable, generalizable "specific" accuracy figure. The claim is broadly true in that diagnostic accuracy metrics exist and have been quantified, but the framing omits the context that these metrics are highly variable, context-dependent, and subject to significant measurement error. The overall impression it leaves, that EFW has a well-defined, reliable diagnostic accuracy, is therefore partially misleading without these caveats.
Expert 3 — The Source Auditor
High-authority, peer-reviewed systematic reviews, meta-analyses, and cohort studies (Source 4, PubMed meta-analysis, 2024; Source 5, PubMed meta-analysis, 2019; Source 6, PMC, 2022; Source 9, PMC systematic review, 2018) consistently report quantifiable diagnostic performance (sensitivity, specificity, PPV, NPV) for ultrasound EFW thresholds in identifying FGR/SGA, while also documenting nontrivial error and context-dependence. Lower-reliability or non-independent items (Source 10, Medscape; Source 11, droracle.ai; Source 14, Stanford patient page; Source 16, Perinatal Institute PDF; Source 18, perinatalexcellence.org; Source 21, YouTube) add little or are opinionated. Taken together, the most trustworthy evidence supports that ultrasound EFW has measurable (i.e., “specific” in the sense of reportable) diagnostic accuracy metrics for detecting FGR. Those metrics vary substantially by gestational age, definition, and threshold, so the claim is mostly true but should not be read as implying a single stable accuracy figure.
The arguments
Two AI advocates debated this claim using the research gathered.
Argument for
Multiple high-authority peer-reviewed sources confirm that ultrasound estimated fetal weight (EFW) carries well-characterized, specific diagnostic accuracy metrics for detecting fetal growth restriction: Source 4 reports pooled sensitivity of 71% and specificity of 90% at 32 weeks, Source 5 reports specificity of 95% for EFW below the 10th centile, Source 10 documents a negative predictive value of 99% with sensitivity of 89% and specificity of 88%, and Source 6 reports a specificity of 99.5% — collectively demonstrating that EFW's diagnostic performance is not vague but precisely quantified across multiple independent systematic reviews and meta-analyses. Furthermore, Source 14 (Stanford Medicine Children's Health) affirms that ultrasound EFW is the best established method for diagnosing FGR, and Source 13 confirms third-trimester ultrasound EFW is the most accurate prediction strategy for smallness at birth, making it indisputable that ultrasound EFW has a specific, well-documented diagnostic accuracy in detecting fetal growth restriction.
You're cherry-picking the highest-looking point estimates and treating them as a single “specific” accuracy, but the same evidence base shows wide, clinically meaningful variability by gestational age and definition—e.g., sensitivity shifts from 71% at 32 weeks to 48% at 36 weeks (Source 4) and is only 38% for EFW <10th centile in another meta-analysis (Source 5), so there is no stable, generalizable “specific diagnostic accuracy.” You also conflate predicting SGA-at-birth or citing generic clinical webpages with diagnosing true FGR, while the brief explicitly notes large EFW measurement errors (Source 3/8) and the lack of a gold standard plus the size-vs-growth problem (Source 16, Source 18), which undercuts your claim that accuracy is “well-established” in the way the motion asserts.
Argument against
The motion implies ultrasound estimated fetal weight (EFW) has a stable, well-defined “specific diagnostic accuracy” for detecting fetal growth restriction, but the evidence shows accuracy is highly variable and error-prone: serial-scan studies report large systematic and random EFW errors (differences ranging from −17.5% to 38.3%), which directly undermines any single, specific diagnostic accuracy claim (Source 3, "An evaluation of systematic and random errors..."; Source 8, "An evaluation of systematic and random errors..."). Even pooled diagnostic studies demonstrate inconsistent sensitivities (e.g., ~38% for EFW <10th centile in one meta-analysis vs 71% at 32 weeks and 48% at 36 weeks in another), and guidance notes the fundamental lack of a gold standard and that EFW reflects size rather than true growth—so “specific diagnostic accuracy” is not established in a meaningful, generalizable way (Source 5, "Diagnostic performance of third-trimester ultrasound..."; Source 4, "Diagnostic performance of 32 vs 36 weeks..."; Source 16, Perinatal Institute "Ultrasound biometry and fetal growth restriction"; Source 18, perinatalexcellence.org).
Your argument conflates measurement error in serial scanning with diagnostic accuracy metrics — these are distinct concepts, and the systematic/random errors reported in Source 3 and Source 8 describe variability in weight estimation across repeated scans, not the absence of quantifiable diagnostic performance, which is precisely and independently documented across multiple meta-analyses in Source 4 (sensitivity 71%, specificity 90% at 32 weeks), Source 5 (specificity 95%), and Source 10 (NPV 99%, sensitivity 89%, specificity 88%). Furthermore, your claim that varying sensitivities across studies disprove "specific diagnostic accuracy" commits the fallacy of demanding a single universal figure — diagnostic accuracy is inherently context-dependent by gestational age and threshold, and the fact that these metrics are consistently and precisely reported across independent peer-reviewed systematic reviews is itself the definition of a well-characterized, specific diagnostic accuracy, exactly as the motion states.