A study published in JAMA Network Open demonstrated the feasibility of using advanced predictive algorithms to identify redundant tests in individual patient cases. This approach could take a bite out of a common problem in hospital settings: overutilization of low-value laboratory tests. “Escalating tensions over healthcare reform are really about tensions between cost, quality, and access. One of the only ways to reconcile all of the above is to systematically find and eliminate waste and inefficiencies,” study co-author Jonathan Chen, MD, PhD, an assistant professor at Stanford University’s Department of Medicine, told CLN Stat.
“This can be especially frustrating with estimates of up to one-third of medical expenditures being waste, with laboratory testing being the highest volume medical activity,” he added.
Machine learning prediction models are increasingly becoming a useful tool in medicine, Chen continued. “This project was my take on combining these threads to tackle a common and important issue in medicine using innovative techniques.”
Using multivariable prediction models, Chen and his colleagues conducted a retrospective diagnostic study of three inpatient cohorts: 116,637 at Stanford University Hospital over 10 years; 60,929 at University of Michigan from 2015 to 2018; and 13,940 at the University of California, San Francisco during calendar year 2018.
The investigators developed several prediction models based on established algorithms and used these models to synthesize large volumes of electronic medical record data to identify low-yield tests and quantify the predictability of results. “Machine learning algorithms largely just learn by example. When an algorithm encounters a new blood test being ordered, the algorithm makes a prediction of the result based on the results of prior tests in similar patient cases the algorithm has seen before,” explained Chen.
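The "learn by example" idea Chen describes can be illustrated with a toy nearest-neighbor predictor. This is only a minimal sketch, not the study's method: the article does not name the algorithms used, and the function name, the two features, and the sample history below are all hypothetical and made up for illustration.

```python
import math

def predict_normal_probability(new_case, prior_cases, k=3):
    """Estimate the chance a result is normal as the share of normal
    results among the k most similar prior cases (toy k-nearest-neighbors)."""
    # Rank prior cases by Euclidean distance to the new case's features.
    ranked = sorted(prior_cases, key=lambda case: math.dist(case[0], new_case))
    nearest = ranked[:k]
    return sum(1 for _, was_normal in nearest if was_normal) / len(nearest)

# Hypothetical history: ((age in decades, prior sodium in units of 10 mmol/L), was_normal).
# These values are invented and pre-scaled so both features are comparable.
PRIOR_CASES = [
    ((4.5, 14.0), True),
    ((5.0, 13.9), True),
    ((5.2, 14.1), True),
    ((7.8, 12.8), False),
    ((8.1, 12.6), False),
    ((7.5, 12.9), False),
]

# A new order for a patient who resembles the first group of prior cases.
likely_normal = predict_normal_probability((4.8, 14.0), PRIOR_CASES)
# A new order for a patient who resembles the second group.
likely_abnormal = predict_normal_probability((7.9, 12.7), PRIOR_CASES)
```

A high predicted probability of a normal result is what would flag an order as potentially low-yield. A real model would draw on many more inputs, such as the vital signs and other lab results the article mentions.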
Chen and his co-authors’ goal was to identify any test that was unlikely to be useful. This included repeat tests and even initial tests, using a patient’s age, vital signs, and other lab tests to make a prediction about a test’s usefulness, he explained.
Predicting in advance that a result will be “normal” could help reduce or eliminate redundancies in testing.
The investigators evaluated the models using negative and positive predictive values, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). Through these methods, Chen and his colleagues observed that they could easily identify overutilized, low-value tests and target them for elimination. As an example, among the Stanford patients, they identified 792,397 repeat orders placed within 24 hours among the top 20 highest volume tests. Some of these repeat tests, namely white blood cell differential, glycated hemoglobin, and serum albumin level, were unlikely to produce new information in such a short period of time.
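For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, predictive values, and AUROC can be computed for a model that predicts normal results. It is an illustration under stated assumptions, not the investigators' code: the function names, the 0.5 threshold, and the six sample predictions are hypothetical, and "positive" here is assumed to mean "result is normal."

```python
def _ratio(numerator, denominator):
    # Return NaN rather than raising when a category is empty.
    return numerator / denominator if denominator else float("nan")

def diagnostic_metrics(actual_normal, predicted_prob, threshold=0.5):
    """Treat 'predicted normal' as a positive call and tally the confusion matrix."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(actual_normal, predicted_prob):
        predicted = prob >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),  # normal results correctly predicted normal
        "specificity": _ratio(tn, tn + fp),  # abnormal results correctly not predicted normal
        "ppv": _ratio(tp, tp + fp),          # predicted-normal calls that were truly normal
        "npv": _ratio(tn, tn + fn),          # predicted-abnormal calls that were truly abnormal
    }

def auroc(actual_normal, predicted_prob):
    """Probability that a random normal case scores above a random abnormal one
    (ties count as half). Assumes both classes are present."""
    pos = [p for a, p in zip(actual_normal, predicted_prob) if a]
    neg = [p for a, p in zip(actual_normal, predicted_prob) if not a]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes and model scores, invented for illustration.
actual = [True, True, True, True, False, False]
scores = [0.95, 0.90, 0.80, 0.40, 0.60, 0.10]

metrics = diagnostic_metrics(actual, scores)
area = auroc(actual, scores)
```

Unlike the other four measures, AUROC does not depend on a chosen threshold, which is one reason it is commonly used to compare models across many different tests.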
“The best-performing machine learning models predicted normal results with an AUROC of 0.90 or greater for 12 stand-alone laboratory tests,” reported the investigators. These included sodium, troponin I, and lactate dehydrogenase tests. The models were also able to predict normal results for 10 common lab test components, including hemoglobin, creatinine, and urea nitrogen.
The study covered a broad sweep across a range of common lab tests. In future work, “we’ll be focusing in on some specific tests, such as blood cultures, for more nuanced decision support and working to implement such tools to start shifting clinical practice patterns,” Chen said.
An important takeaway, Chen noted, is that machine learning is just another tool that, like any other, can help or can hurt if misused.
“A good analogy is to think of machine learning prediction models as just another diagnostic test with its own sensitivity and specificity. If it’s used to predict events with no clear action to respond with, then it may be ‘informative’ but only adding complication to our lives,” Chen said. “Where it can greatly simplify and enhance our lives, is when it can synthesize large, complex data streams that are too much for human cognition to deal with.”