Shortly after birth, infants are given an Apgar score, a measure of how well they are adjusting to life outside the womb. It is the first time a child is rated: the score runs from 0 to 10, with up to two points awarded for each of five measures.
One of the measures is skin color, which indicates how much oxygen the baby is getting. A newborn gets two points only if it is completely pink. Pale or blue fingers and toes earn one point; a baby who is completely white, gray, or blue gets none.
Some doctors consider the tool a quick and easy way to assess whether a newborn needs urgent care. But a growing body of research has shown that black and other babies of color don’t score as well on the Apgar as white babies.
Now, a growing number of experts are calling for changes, fearing that reliance on the widely used scale could make some newborns appear sicker than they are and expose them to unnecessary medical treatment.
Not only is color perception subjective, critics say, but skin tone is a confusing and discriminatory measure to include in a health screening tool.
“Skin color should no longer be considered as part of a health assessment,” said Dr. Amos Grünebaum, professor of obstetrics and gynecology at Hofstra University’s Zucker School of Medicine. “It should not be part of a medical exam.”
In addition to skin tone, the Apgar test evaluates infants’ heart rate, breathing, muscle tone, and reflexes. A perfect score is 10.
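The arithmetic of the score can be illustrated with a minimal Python sketch. It assumes only what is described above: five measures, each rated 0, 1, or 2, added together. The class and field names are hypothetical, chosen for illustration, and the sketch is not a clinical tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApgarComponents:
    """Hypothetical container for the five measures, each rated 0, 1, or 2."""
    skin_color: int
    heart_rate: int
    breathing: int
    muscle_tone: int
    reflexes: int

    def total(self) -> int:
        """Add the five ratings; the result falls between 0 and 10."""
        parts = (
            self.skin_color,
            self.heart_rate,
            self.breathing,
            self.muscle_tone,
            self.reflexes,
        )
        for value in parts:
            if value not in (0, 1, 2):
                raise ValueError("each measure must be rated 0, 1, or 2")
        return sum(parts)
```

Under this scheme, a newborn rated two on every measure except skin color, where it is rated one, ends up with a nine rather than a perfect 10.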
Dr. Grünebaum led a study that analyzed the Apgar scores of more than nine million American babies born between 2016 and 2019. Overall, he found that only 2.6 percent of newborns achieved a perfect score.
But a black baby’s chances of achieving that were less than half those of a white baby: 3.3 percent of white babies scored a perfect 10, compared with 1.4 percent of black newborns, the study found.
Only 1.2 percent of Chinese-American infants scored a 10, and less than 1 percent of Indian-American infants scored a 10. The study was published last year in the Journal of Perinatal Medicine.
The review of the universal use of the Apgar score comes as racial disparities and concerns about systemic bias in medicine are gaining renewed attention. Experts are reexamining many aspects of health care, including how race is used in clinical algorithms that determine treatment.
Steps have been taken in recent years to eliminate race and ethnicity from many health assessments. Many health systems, for example, have abandoned the use of a common measure of kidney function that adjusted for race, adopting instead a race-neutral measure.
Late last year, the American Heart Association removed race from the formulas used to calculate a patient’s risk of developing cardiovascular disease.
But experts approach changes to these algorithms with apprehension, fearing that careless revisions could be harmful. If skin color is removed from the Apgar score, for example, it won’t be easy to find another measure to replace it, and it’s not clear that removing the measure altogether would be enough.
The goal must be to ensure that people of color have equitable access to the full range of available medical treatments and resources, said Dr. David Jones, a Harvard professor who has been a leading advocate for removing race from clinical algorithms.
But critics of the Apgar score’s reliance on skin color worry that it could lead to additional medical treatment for healthy babies of color, potentially sending them to intensive care unnecessarily.
Admission to intensive care separates newborns from their mothers, disrupts bonding and breastfeeding, puts babies at risk of infection and can be traumatic for parents. “Neonatal intensive care units save lives, but babies only belong there if they need to be,” said Dr. Grünebaum.
Other research has also shown that Apgar scores are a less accurate indicator of health outcomes for black babies than for white babies. One study linked the disparities specifically to the skin color component of the scoring system.
The Apgar score was the first clinical method that drew attention to the specific medical needs of newborns and recognized them as patients. It has since been adopted worldwide to provide a rapid assessment of the infant’s transition to extrauterine life.
Yet the score has been validated only in predominantly white populations, and most of the world’s population is not white.
In a paper published in July 1953, Dr. Virginia Apgar, the anesthesiologist who devised the score, stated that a score of two for skin color “was given only when the child was entirely pink.” She had reservations about this criterion, however, stating that it was “by far the most unsatisfactory” of the five that made up the composite score.
“Foreign material often covering the infant’s skin at birth interferes with the interpretation of this sign, as does the inherited pigmentation of the skin of colored children and an occasional congenital defect,” she wrote.
The test is performed one minute after birth and repeated at five minutes. If the newborn still does not perform well, the Apgar is repeated every five minutes.
A score between seven and ten is considered normal, while a score of three or less is considered low and means the baby has a higher than average risk of dying before the age of one.
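As a rough illustration, those cutoffs can be written as a short Python function. This is a sketch based only on the thresholds given above; the function name and the label for scores of four to six, a band the description above does not name, are assumptions made for illustration.

```python
def classify_apgar(total: int) -> str:
    """Label a total Apgar score using the cutoffs described in the text."""
    if not 0 <= total <= 10:
        raise ValueError("an Apgar total must be between 0 and 10")
    if total >= 7:
        return "normal"
    if total <= 3:
        return "low"
    # Scores of 4 to 6 are not given a name in the text; this label is a placeholder.
    return "between low and normal"
```

A single point can move a newborn across a cutoff: a total of seven counts as normal, while a six does not.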
But the association between the scores and infant well-being varies considerably by race and ethnicity, according to a study by Dr. Emma Gillette of the Arnhold Institute for Global Health at the Icahn School of Medicine at Mount Sinai in New York City.
She and her colleagues found that low five-minute scores were far less precise predictors of poor outcomes for black babies than for other babies.
Asian infants with low Apgar scores were 100 times more likely to die than Asian babies with normal scores. White babies with low scores were 54 times more likely to die than white babies with normal scores.
But black babies with low scores were only 23 times more likely to die than those with normal scores. “That’s pretty shocking, because overall, black babies are almost twice as likely to die,” Dr. Gillette said.
Still, the scores are a useful tool in the first minutes of life and a “powerful predictor of mortality in the first year of life,” she added.
Only one study has incorporated the kind of data researchers need to determine whether skin color is responsible for black babies’ lower Apgar scores. Dr. Sara E. Edwards launched the study because she believed some black infants were being sent unnecessarily to the neonatal intensive care unit at her medical center, the University of Illinois Hospital and Health Sciences System.
“I felt like I was seeing babies who didn’t look like they needed to go to the NICU, but they did,” said Dr. Edwards, who is now a maternal-fetal medicine fellow at the Icahn School of Medicine at Mount Sinai in New York. “They looked clinically healthy and vigorous, but they ended up having lower Apgar scores.”
She and her colleagues analyzed data on babies from the Chicago hospital, which included scores on individual components. The study also compared Apgar scores to results from laboratory tests of cord blood gases, which are an objective measure of the newborn’s metabolic status and oxygen levels.
The researchers were surprised by the results of their study. Although black newborns had lower Apgar scores at one and five minutes than non-black babies and were admitted to the neonatal intensive care unit at a higher rate than non-black babies, they did not have more abnormal umbilical cord gas values.
“There was no difference in cord gas, but there was a difference in Apgar scores – that’s the key take-home message,” said Dr. Quetzal Class, lead author of the study, published last year in the American Journal of Obstetrics and Gynecology.
The study also found that skin color was behind racial differences in the first Apgar tests, those given to babies one minute after birth.
“The reason this is important is that the quick score given subjectively by the provider helps determine whether that newborn is sent to the NICU or not,” Dr. Class said.
She added: “There are a ton of other factors that can be considered, but the most important question is whether your Apgar score is 3 or 4. In that case, you’re going to go to the neonatal intensive care unit.”
Neonatal resuscitation experts acknowledge both the subjectivity and imprecision of some aspects of the Apgar test and say it is not meant to be used in isolation to guide clinical decisions.
“We don’t put any emphasis on the Apgar score at all,” said Dr. Vishal S. Kapadia, co-chair of the American Academy of Pediatrics’ Neonatal Resuscitation Program Steering Committee.
A 2015 policy statement from the AAP and the American College of Obstetricians and Gynecologists clearly acknowledges the limitations of the Apgar score, Dr. Kapadia noted.
“We emphasize rapid assessment of breathing and heart rate to guide resuscitation decisions, not color,” Dr. Kapadia said. “We know that Apgar scores can be subjective and imperfect.”
Dr. Eric Eichenwald, chief of neonatology at Children’s Hospital of Philadelphia, noted that since the test was developed, many other tools have been added to delivery rooms to help evaluate newborns.
If skin color is not considered as an indicator of oxygenation, it may be difficult to find a substitute. Cord gas analyses take time to perform in a laboratory and cannot provide immediate results.
Pulse oximetry — a noninvasive method that uses a clip-like device attached to a finger to measure oxygen saturation in the blood — has also been shown to be less accurate for black patients.
The devices work by shining light through the skin and detecting the color of the blood, which varies depending on the amount of oxygen. Scientists suspect that inaccurate measurements may be due to the way light is absorbed by the skin’s darker pigments.
Dr. Grünebaum believes that skin color should be removed from the Apgar test and an eight-point scale should be adopted. But he says this change will be difficult to implement because the test is so widely used around the world. “People don’t even think about the Apgar score anymore, it’s calculated automatically,” he said.
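What an eight-point scale might look like can be sketched in Python under a simple assumption: the four remaining measures (heart rate, breathing, muscle tone, and reflexes) keep their ratings of 0, 1, or 2 and are added together. The function name is hypothetical, and the sketch does not say how the cutoffs for normal and low scores would change, since no revised cutoffs have been described.

```python
def eight_point_score(heart_rate: int, breathing: int,
                      muscle_tone: int, reflexes: int) -> int:
    """Add the four measures that remain once skin color is dropped (0 to 8)."""
    parts = (heart_rate, breathing, muscle_tone, reflexes)
    for value in parts:
        if value not in (0, 1, 2):
            raise ValueError("each measure must be rated 0, 1, or 2")
    return sum(parts)
```

On such a scale, a newborn rated two on all four remaining measures would receive the maximum of eight, whatever its skin tone.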