Machine-learning models can make mistakes and be difficult to use, so scientists have developed explanation methods to help users understand when and how they should trust a model’s predictions.
These explanations are often complex, however, perhaps containing information about hundreds of model features. And they are sometimes presented as multifaceted visualizations that can be difficult for users who lack machine-learning expertise to fully comprehend.
To help people make sense of AI explanations, MIT researchers used large language models (LLMs) to transform plot-based explanations into plain language.
They developed a two-part system that converts a machine-learning explanation into a paragraph of human-readable text and then automatically evaluates the quality of the narrative, so an end-user knows whether to trust it.
By prompting the system with a few example explanations, the researchers can customize its narrative descriptions to meet the preferences of users or the requirements of specific applications.
In the long run, the researchers hope to build upon this technique by enabling users to ask a model follow-up questions about how it came up with predictions in real-world settings.
“Our goal with this research was to take the first step toward allowing users to have full-blown conversations with machine-learning models about the reasons they made certain predictions, so they can make better decisions about whether to listen to the model,” says Alexandra Zytek, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.
She is joined on the paper by Sara Pido, an MIT postdoc; Sarah Alnegheimish, an EECS graduate student; Laure Berti-Équille, a research director at the French National Research Institute for Sustainable Development; and senior author Kalyan Veeramachaneni, a principal research scientist in the Laboratory for Information and Decision Systems. The research will be presented at the IEEE Big Data Conference.
Elucidating explanations
The researchers focused on a popular type of machine-learning explanation called SHAP. In a SHAP explanation, a value is assigned to every feature the model uses to make a prediction. For instance, if a model predicts house prices, one feature might be the location of the house. Location would be assigned a positive or negative value that represents how much that feature modified the model’s overall prediction.
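To make the idea concrete, here is a minimal Python sketch of SHAP-style values for a toy linear house-price model. It rests on a stated assumption: for a linear model with independent features, each feature's SHAP value equals its weight times the feature's deviation from the dataset average. The feature names, weights, and numbers are invented for illustration and do not come from the researchers' work.

```python
# Toy linear house-price model; all names and numbers are illustrative assumptions.
WEIGHTS = {"location_score": 30000.0, "square_feet": 150.0, "age_years": -1000.0}
INTERCEPT = 50000.0

# Average feature values over a hypothetical training set.
FEATURE_MEANS = {"location_score": 5.0, "square_feet": 1500.0, "age_years": 20.0}


def predict(house):
    """Linear prediction: intercept plus the weighted sum of the features."""
    return INTERCEPT + sum(WEIGHTS[name] * house[name] for name in WEIGHTS)


def shap_values_linear(house):
    """For a linear model with independent features, each SHAP value is
    weight * (feature value - average feature value)."""
    return {
        name: WEIGHTS[name] * (house[name] - FEATURE_MEANS[name])
        for name in WEIGHTS
    }


house = {"location_score": 8.0, "square_feet": 1200.0, "age_years": 35.0}
base_value = predict(FEATURE_MEANS)  # prediction for the "average" house
contributions = shap_values_linear(house)

for name, value in contributions.items():
    direction = "raised" if value > 0 else "lowered"
    print(f"{name} {direction} the prediction by {abs(value):,.0f}")

# The contributions add up to the gap between this prediction and the baseline.
assert abs(base_value + sum(contributions.values()) - predict(house)) < 1e-6
```

In this toy case the location score receives a positive value and the other two features receive negative ones, which is the kind of signed, per-feature breakdown a SHAP explanation provides.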
Often, SHAP explanations are presented as bar plots that show which features are most or least important. But for a model with more than 100 features, that bar plot quickly becomes unwieldy.
“As researchers, we have to make a lot of choices about what we are going to present visually. If we choose to show only the top 10, people might wonder what happened to another feature that isn’t in the plot. Using natural language unburdens us from having to make those choices,” Veeramachaneni says.
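The trade-off Veeramachaneni describes can be sketched in a few lines of Python. The helper below, a hypothetical `top_k_features` function written for this illustration, keeps the k features with the largest absolute SHAP values and reports how much of the total contribution is left out of the plot. The feature names and values are made up.

```python
def top_k_features(shap_values, k):
    """Return the k features with the largest absolute SHAP value, plus the
    share of total absolute contribution held by the features left out."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    shown, hidden = ranked[:k], ranked[k:]
    total = sum(abs(value) for _, value in ranked)
    hidden_share = sum(abs(value) for _, value in hidden) / total if total else 0.0
    return shown, hidden_share


# Illustrative SHAP values for five features of a single prediction.
example = {"location": 4.0, "size": -3.0, "age": 2.0, "garage": -0.5, "roof": 0.5}
shown, hidden_share = top_k_features(example, k=3)

print("Shown in the plot:", [name for name, _ in shown])
print(f"Share of contribution not shown: {hidden_share:.0%}")
```

A reader who cares about the garage would find nothing about it in the truncated plot, which is the gap a narrative description is meant to close.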
However, rather than using a large language model to generate an explanation in natural language, the researchers use the LLM to transform an existing SHAP explanation into a readable narrative.
Having the LLM handle only the natural-language part of the process limits the opportunity to introduce inaccuracies into the explanation, Zytek explains.
Their system, called EXPLINGO, is divided into two pieces that work together.
The first component, called NARRATOR, uses an LLM to create narrative descriptions of SHAP explanations that meet user preferences. Once NARRATOR has been given three to five written examples of narrative explanations, the LLM mimics their style when generating text.
“Rather than having the user try to define what type of explanation they are looking for, it is easier to just have them write what they want to see,” says Zytek.
This allows NARRATOR to be easily customized for new use cases by showing it a different set of manually written examples.
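The few-shot approach described above can be illustrated with a short Python sketch that assembles a prompt from hand-written examples. This is not the EXPLINGO code: the function names, the prompt wording, and the sample data are all assumptions made for illustration, and the call to an actual LLM is deliberately left out.

```python
def format_explanation(explanation):
    """Render (feature, value, contribution) triples as one line of plain text."""
    return "; ".join(
        f"{name}={value} (contribution {contribution:+.1f})"
        for name, value, contribution in explanation
    )


def build_narrator_prompt(examples, shap_explanation):
    """Assemble a few-shot prompt: each example pairs a SHAP explanation with a
    hand-written narrative, and the new explanation is left for the LLM to finish."""
    parts = ["Rewrite each SHAP explanation as a short narrative in the style of the examples."]
    for explanation, narrative in examples:
        parts.append(f"Explanation: {format_explanation(explanation)}\nNarrative: {narrative}")
    parts.append(f"Explanation: {format_explanation(shap_explanation)}\nNarrative:")
    return "\n\n".join(parts)


# Two hand-written examples (illustrative); the article mentions three to five.
examples = [
    ([("location", "downtown", 12.0), ("age", 40, -3.5)],
     "The downtown location pushed the price up, while the home's age pulled it down a little."),
    ([("size", 900, -8.0), ("garage", "yes", 2.0)],
     "The small size lowered the price, and having a garage raised it slightly."),
]
new_explanation = [("location", "suburb", -4.0), ("size", 2400, 15.0)]

prompt = build_narrator_prompt(examples, new_explanation)
print(prompt)
```

Swapping in a different set of example narratives changes the style the LLM is asked to imitate without touching any other part of the sketch.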
After NARRATOR creates a plain-language explanation, the second component, GRADER, uses an LLM to rate the narrative on four metrics: conciseness, accuracy, completeness, and fluency. GRADER automatically prompts the LLM with the text from NARRATOR and the SHAP explanation it describes.
“We find that, even when an LLM makes a mistake doing a task, it often won’t make a mistake when checking or validating that task,” she says.
Users can also customize GRADER to give different weights to each metric.
“You could imagine, in a high-stakes case, weighting accuracy and completeness much higher than fluency, for example,” she adds.
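One simple way to picture this weighting is a weighted average of the four metric scores. The Python sketch below assumes each metric is scored from 0 to 1 and uses a hypothetical `weighted_grade` helper; the article does not say how GRADER actually scales or combines its scores, so the scheme here is only an illustration.

```python
METRICS = ("accuracy", "completeness", "fluency", "conciseness")


def weighted_grade(scores, weights):
    """Weighted average of per-metric scores; the weights need not sum to 1."""
    total_weight = sum(weights[metric] for metric in METRICS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[metric] * weights[metric] for metric in METRICS) / total_weight


# Illustrative scores (assumed 0-to-1 scale) for a fluent but partly inaccurate narrative.
scores = {"accuracy": 0.5, "completeness": 0.5, "fluency": 1.0, "conciseness": 1.0}

equal_weights = {metric: 1.0 for metric in METRICS}
high_stakes_weights = {"accuracy": 4.0, "completeness": 4.0, "fluency": 1.0, "conciseness": 1.0}

print(f"Equal weights:       {weighted_grade(scores, equal_weights):.2f}")
print(f"High-stakes weights: {weighted_grade(scores, high_stakes_weights):.2f}")
```

With the high-stakes weights, the same narrative receives a lower overall grade because its weaker accuracy and completeness count for more than its fluency.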
Analyzing narratives
For Zytek and her colleagues, one of the biggest challenges was adjusting the LLM so it generated natural-sounding narratives. The more guidelines they added to control style, the more likely the LLM was to introduce errors into the explanation.
“A lot of prompt tuning went into finding and fixing each mistake one at a time,” she says.
To test their system, the researchers took nine machine-learning datasets with explanations and had different users write narratives for each dataset. This allowed them to evaluate the ability of NARRATOR to mimic unique styles. They used GRADER to score each narrative explanation on all four metrics.
In the end, the researchers found that their system could generate high-quality narrative explanations and effectively mimic different writing styles.
Their results show that providing a few manually written example explanations greatly improves the narrative style. However, those examples must be written carefully — including comparative words, like “larger,” can cause GRADER to mark accurate explanations as incorrect.
Building on these results, the researchers want to explore techniques that could help their system better handle comparative words. They also want to expand EXPLINGO by adding rationalization to the explanations.
In the long run, they hope to use this work as a stepping stone toward an interactive system where the user can ask a model follow-up questions about an explanation.
“That would help with decision-making in a lot of ways. If people disagree with a model’s prediction, we want them to be able to quickly figure out if their intuition is correct, or if the model’s intuition is correct, and where that difference is coming from,” Zytek says.


