Summary: Researchers have developed a new approach to improving uncertainty estimates in machine learning models, making the confidence those models report more accurate and reliable. Their method, IF-COMP, uses the minimum description length principle to provide more trustworthy confidence measures for AI decisions, which is crucial in high-stakes environments like healthcare.
This scalable technique can be applied to large models, helping non-experts determine the reliability of AI predictions. The results could lead to better decision-making in real-world applications.
Highlights:
- Better calibration: IF-COMP produces more accurate uncertainty estimates for AI predictions.
- Scalability: Applicable to large and complex models in critical environments such as healthcare.
- User-friendly: Helps non-experts assess the reliability of AI decisions.
Source: MIT
Because machine learning models can make false predictions, researchers often equip them with the ability to tell a user how confident they are in a certain decision. This is especially important in high-stakes situations, such as when the models are used to help identify disease in medical images or filter job applications.
But model uncertainty quantifications are only useful if they are accurate. If a model says it is 49% confident that a medical image shows a pleural effusion, then the model should be right 49% of the time.
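To make that notion of calibration concrete, here is a minimal sketch, not taken from the MIT work, that bins a model's stated confidences and compares each bin's average confidence with its observed accuracy; the function name and the toy data are illustrative assumptions.

```python
import numpy as np

def calibration_report(confidences, correct, n_bins=10):
    """Group predictions by stated confidence and compare each group's
    average confidence with its observed accuracy. A well-calibrated
    model has the two numbers roughly equal in every bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences >= lo) & (confidences < hi)
        if mask.any():
            print(f"confidence {lo:.1f}-{hi:.1f}: "
                  f"stated {confidences[mask].mean():.2f}, "
                  f"observed accuracy {correct[mask].mean():.2f}")

# Toy usage with made-up numbers: in the bin where the model claims ~49%
# confidence, it should be right about 49% of the time.
rng = np.random.default_rng(0)
conf = rng.uniform(0.3, 0.7, size=1000)
correct = (rng.uniform(size=1000) < conf).astype(float)  # calibrated by construction
calibration_report(conf, correct)
```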
MIT researchers have developed a new approach to improve uncertainty estimates in machine learning models. Their method not only produces more accurate uncertainty estimates than other techniques, it also does so more efficiently.
Additionally, because the technique is scalable, it can be applied to massive deep learning models that are increasingly deployed in healthcare and other safety-critical situations.
This technique could provide end users, many of whom lack machine learning expertise, with better information they can use to determine whether they should trust a model’s predictions or whether the model should be deployed for a particular task.
“It’s easy to see that these models perform really well in scenarios where they are very good, and then assume that they will be just as good in other scenarios,” says lead author Nathan Ng, a graduate student at the University of Toronto and a visiting student at MIT. “That makes it particularly important to promote this kind of work, which aims to better calibrate the uncertainty of these models so that it matches human notions of uncertainty.”
Ng wrote the paper with Roger Grosse, an assistant professor of computer science at the University of Toronto, and senior author Marzyeh Ghassemi, an associate professor in the Department of Electrical Engineering and Computer Science and a member of the Institute for Medical Engineering and Science and the Laboratory for Information and Decision Systems. The research will be presented at the International Conference on Machine Learning.
Quantifying uncertainty
Uncertainty quantification methods often require complex statistical calculations that do not scale well to machine learning models with millions of parameters. These methods also require users to make assumptions about the model and the data used to train it.
The MIT researchers took a different approach. They use what’s called the minimum description length (MDL) principle, which doesn’t require assumptions that can hurt the accuracy of other methods. MDL is used to better quantify and calibrate the uncertainty of the test points that the model must label.
The technique the researchers developed, known as IF-COMP, makes MDL fast enough to be used with the types of large deep learning models deployed in many real-world settings.
MDL involves considering all possible labels that a model could assign to a test point. If there are many alternative labels for that point that fit well, its confidence in the label it chose should decrease accordingly.
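The paper formalizes this idea with the predictive normalized maximum likelihood (pNML) distribution. A common way to write pNML, in notation chosen here for illustration rather than quoted from the paper, is:

```latex
% For each candidate label y', refit (or fine-tune) the model on the training
% set D augmented with the test point x labeled y', then normalize over labels.
p_{\mathrm{pNML}}(y \mid x)
  = \frac{p_{\hat{\theta}_{y}}(y \mid x)}
         {\sum_{y'} p_{\hat{\theta}_{y'}}(y' \mid x)},
\qquad
\hat{\theta}_{y'} = \arg\max_{\theta} \; p_{\theta}\big(\mathcal{D} \cup \{(x, y')\}\big).
```

If many alternative labels can be fit almost as well as the chosen one, the normalizer in the denominator grows and the confidence assigned to any single label shrinks, which is exactly the behavior described above.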
“One way to understand how confident a model is would be to feed it counterfactual information and see how likely it is to believe you,” Ng says.
For example, consider a model that says a medical image shows a pleural effusion. If researchers tell the model that the image instead shows edema, and the model is willing to update its belief, then it should be less confident in its initial decision.
With MDL, if a model is confident in labeling a data point, it should use a very short code to describe that point. If it is unsure of its decision because the point could have many other labels, it uses a longer code to capture those possibilities.
The amount of code used to label a data point is called the stochastic data complexity. If researchers ask the model how willing it is to update its belief about a data point in the face of contrary evidence, the stochastic data complexity should decrease if the model is confident.
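Under MDL, assigning probability p to an outcome corresponds to a code of length -log p, so the code length described here can be read off from the pNML distribution sketched above. This is a standard identity rather than a formula quoted from the paper:

```latex
% The code length (stochastic data complexity) of labeling x as y splits into the
% model's own negative log-likelihood plus a normalizer (the "regret") that grows
% when many alternative labels also fit the model well.
L(y \mid x) = -\log p_{\mathrm{pNML}}(y \mid x)
            = -\log p_{\hat{\theta}_{y}}(y \mid x)
              + \log \sum_{y'} p_{\hat{\theta}_{y'}}(y' \mid x).
```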
But testing every data point using MDL would require a huge amount of computation.
Speeding up the process
With IF-COMP, the researchers developed an approximation technique that can accurately estimate the stochastic data complexity using a special function known as an influence function. They also used a statistical technique called temperature scaling, which improves the calibration of the model's outputs. Together, influence functions and temperature scaling enable high-quality approximations of the stochastic data complexity.
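Temperature scaling on its own is a standard post-hoc calibration step: it divides a model's logits by a single scalar learned on held-out data before the softmax. The sketch below shows only that step, in PyTorch; it is not the authors' IF-COMP code, and the influence-function approximation that is the paper's main contribution is omitted here.

```python
import torch

def fit_temperature(val_logits, val_labels, lr=0.01, steps=200):
    """Learn a single temperature T > 0 on held-out logits by minimizing
    negative log-likelihood; dividing logits by T rescales the model's
    confidence without changing which class it predicts."""
    log_t = torch.zeros(1, requires_grad=True)  # T = exp(log_t) stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Usage sketch (val_logits, val_labels come from a held-out set):
#   T = fit_temperature(val_logits, val_labels)
#   calibrated_probs = torch.softmax(test_logits / T, dim=-1)
```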
Ultimately, IF-COMP can efficiently produce well-calibrated uncertainty quantifications that reflect the true confidence of a model. The technique can also determine whether the model has mislabeled some data points or reveal which data points are outliers.
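As a purely hypothetical illustration of how such scores might feed an auditing workflow (this is not an interface described by the paper), one could rank points by their estimated complexity and flag the highest-scoring ones for human review:

```python
import numpy as np

def flag_suspicious_points(complexity_scores, fraction=0.05):
    """Rank examples by their stochastic-data-complexity score and return the
    indices of the top fraction, which are candidates for mislabeled or
    outlying data that a human auditor should inspect."""
    scores = np.asarray(complexity_scores)
    k = max(1, int(len(scores) * fraction))
    return np.argsort(scores)[-k:][::-1]  # highest-complexity points first
```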
The researchers tested their approach on all three of these tasks (uncertainty calibration, detecting mislabeled data, and detecting outliers) and found that it was faster and more accurate than other methods.
“It’s very important to have confidence that a model is well-calibrated, and it’s increasingly necessary to detect when a specific prediction doesn’t seem quite right. Auditing tools are becoming increasingly necessary in machine learning problems because we’re using large amounts of unexamined data to build models that will be applied to human problems,” Ghassemi says.
IF-COMP is model-independent and can therefore provide accurate uncertainty quantifications for many types of machine learning models. This could allow it to be deployed in a wider range of real-world situations, helping more practitioners make better decisions.
“People need to understand that these systems are very fallible and can make things up as they go along. A model can seem very confident, but it’s willing to believe a whole bunch of different things if there’s evidence to the contrary,” Ng says.
In the future, the researchers want to apply their approach to large language models and study other potential use cases of the minimum description length principle.
About this AI research news
Author: Melanie Grados
Source: MIT
Contact: Melanie Grados – MIT
Image: The image is credited to Neuroscience News
Original research: Closed access.
“Measuring Stochastic Data Complexity with Boltzmann Influence Functions” by Nathan Ng et al. arXiv
Abstract
Measuring stochastic data complexity with Boltzmann influence functions
Estimating the uncertainty of a model’s prediction at a test point is a crucial part of ensuring reliability and calibration under distribution shift.
A minimum description length approach for this problem uses the predictive normalized maximum likelihood (pNML) distribution, which considers every possible label for a data point and decreases confidence in a prediction if other labels are also consistent with the model and training data.
In this work, we propose IF-COMP, a scalable and efficient approximation of the pNML distribution that linearizes the model with a temperature-scaled Boltzmann influence function. IF-COMP can be used to produce well-calibrated predictions on test points as well as to measure complexity in both labeled and unlabeled settings.
We experimentally validate IF-COMP on the tasks of uncertainty calibration, mislabel detection, and OOD detection, where it consistently matches or outperforms strong baselines.