To cement trust and drive business success, we must be able to explain how algorithms “think.”
Artificial intelligence (AI) promises to help companies accomplish business goals in the short and long term. But if companies can’t demonstrate that AI has been correctly programmed, people may be less likely to trust AI-powered systems. It’s natural for people to distrust what they don’t fully understand—and there’s a lot about AI that’s not immediately clear. For example, how does it know what action to complete or decision to make? How can companies prevent it from making a mistake?
If AI-powered software turns down a mortgage application or singles out certain individuals for additional screening at airport security, the bank and airport must be able to explain why. More important, if human safety is at risk, how can you guarantee that the reasoning behind the AI system’s decisions is clear, and that you have accounted for the possibility that the humans working alongside the system will override it?
At the moment, some machine learning models that underlie AI applications qualify as “black boxes.” That is, humans can’t always understand exactly how a given machine learning algorithm makes decisions.
This understanding is important in many AI use cases. Examples include robots that replace human workers on an assembly line and software that crunches huge amounts of medical data in a very short time to help physicians diagnose and treat their patients more accurately.
To reach the point where AI helps people work better and smarter, business leaders must take steps to help people understand how AI learns, and what lies behind its reasoning and decision-making once it has learned to perform its intended function. PwC is developing a framework to help companies that employ AI assess whether users and other stakeholders are likely to understand, and ultimately trust, the decisions the AI makes.
Expanding abilities drive the need to understand AI’s decisions
As AI gets smarter, the need grows for the humans who interact with or oversee the AI to understand why it does what it does. An AI system’s performance, that is, its ability to behave correctly in every situation it encounters, matters just as much. AI system architects must ensure that the data they use to “teach” the system is representative and free of bias.
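To make that concrete, here is a minimal sketch of one basic representativeness check an architect might run before training. It assumes the training data sits in a pandas DataFrame with hypothetical group and label columns; it is illustrative only and not part of PwC’s framework.

```python
# Illustrative only: flag groups that are under-represented in the training
# data relative to a reference population, and compare outcome rates by group.
# The column names ("group", "label") and reference shares are hypothetical.
import pandas as pd

def representativeness_report(df: pd.DataFrame, group_col: str, label_col: str,
                              reference_shares: dict) -> pd.DataFrame:
    observed = df[group_col].value_counts(normalize=True)    # share of each group in the data
    outcome_rate = df.groupby(group_col)[label_col].mean()   # positive-outcome rate per group
    report = pd.DataFrame({
        "share_in_data": observed,
        "share_in_population": pd.Series(reference_shares),
        "positive_outcome_rate": outcome_rate,
    })
    # Flag any group whose share in the data is well below its share in the population.
    report["under_represented"] = report["share_in_data"] < 0.8 * report["share_in_population"]
    return report

# Toy example: group B appears far less often in the data than in the population.
training_data = pd.DataFrame({"group": ["A", "A", "A", "A", "B", "A"],
                              "label": [1, 0, 1, 1, 0, 1]})
print(representativeness_report(training_data, "group", "label", {"A": 0.5, "B": 0.5}))
```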
The emerging field of explainable AI (XAI) aims to create new AI methods that are accountable to human reasoning. The framework PwC is developing will help companies build machine learning models that address these concerns about trustworthiness.
Companies should be able to account for three specific aspects of how an AI system decides what to do, behavior that is determined by the machine learning model used to train it. These traits are defined as follows:
- Explainability: the ability to understand the reasoning behind each individual prediction
- Transparency: the ability to fully understand the model upon which the AI decision-making is based
- Provability: the level of mathematical certainty behind predictions
For example, a bank might develop AI software that helps home loan officers evaluate applicants faster and gauge the chances of loan repayment more accurately. If the bank can ensure appropriate levels of explainability, transparency, and provability, it can help build confidence in the AI system among loan officers and consumers alike.
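As an illustration only, the sketch below trains a deliberately simple, and therefore transparent, logistic regression model on invented loan data and explains each individual prediction as a sum of per-feature contributions, with the predicted probability offering a quantifiable degree of certainty. The feature names and data are hypothetical assumptions, not a bank’s actual model or PwC’s method.

```python
# Hypothetical sketch: a transparent linear model for loan decisions, with a
# per-applicant explanation of the prediction. Feature names and data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

feature_names = ["income", "debt_to_income", "credit_history_years"]
X = np.array([[85_000, 0.25, 12],
              [42_000, 0.55,  3],
              [60_000, 0.40,  7],
              [30_000, 0.65,  1],
              [95_000, 0.20, 15],
              [50_000, 0.50,  4]], dtype=float)
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = repaid, 0 = defaulted (toy labels)

scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)  # transparency: the whole model is inspectable

def explain(applicant: np.ndarray) -> None:
    """Show the predicted repayment probability (provability) and each feature's
    contribution to the decision (explainability) for one applicant."""
    z = scaler.transform(applicant.reshape(1, -1))[0]
    contributions = model.coef_[0] * z                        # per-feature contribution to the log-odds
    probability = model.predict_proba(z.reshape(1, -1))[0, 1]
    print(f"Predicted repayment probability: {probability:.2f}")
    for name, c in sorted(zip(feature_names, contributions), key=lambda t: -abs(t[1])):
        direction = "toward approval" if c > 0 else "toward rejection"
        print(f"  {name:>22}: {c:+.2f} ({direction})")

explain(np.array([55_000, 0.45, 5], dtype=float))
```

A loan officer reading this output sees not only the score but which factors pushed it up or down, which is exactly the kind of per-prediction reasoning that builds confidence in the system.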
No two AI applications are the same
The importance of each of the three traits described above varies by use case, because every AI application will be slightly different and require its own level of rigor. The concept of rigor refers to whether the results of the AI’s decisions could harm humans, the company, or others. In some cases—such as an e-commerce recommendation engine that suggests products that shoppers might be interested in based on their browsing and purchase history—not knowing exactly what is behind an AI decision is not necessarily a problem. But in more critical use cases—self-driving cars or a situation in which AI-powered robots are working alongside humans in a factory—the risk level increases. If the AI makes an incorrect choice in these cases, the result could be human injury or even death.
When determining the rigor required in a given use case, you should look at two factors: criticality, the extent to which human safety or significant financial value is at risk, and vulnerability, the likelihood that human operators working with the AI will distrust its decisions and override it, defeating the purpose of having the AI in the first place. To assess criticality, ask questions such as the following: Will users’ safety be at risk? Is the safety risk significant? Are there other risks to end users? Is there financial risk? Is the financial risk significant?
The more yes answers you have, the higher the rigor required. A use case’s level of criticality is also related to its level of vulnerability. To assess vulnerability, ask questions such as the following: Are current users considered experts? Will users see the automated process as highly critical? Does the conventional technique require significant training? Will the automated process result in job displacement?
As with criticality, the more yes answers to the vulnerability assessment questions, the greater the vulnerability of a use case.
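As a rough illustration, the sketch below turns those yes/no answers into a simple rigor score. The question wording comes from the assessment above; the three-level thresholds and wording of the levels are illustrative assumptions, not a formal PwC scoring method.

```python
# Illustrative only: turn the yes/no answers above into a rough rigor score.
# The three-level thresholds are assumptions, not a formal scoring method.
CRITICALITY_QUESTIONS = [
    "Will users' safety be at risk?",
    "Is the safety risk significant?",
    "Are there other risks to end users?",
    "Is there financial risk?",
    "Is the financial risk significant?",
]
VULNERABILITY_QUESTIONS = [
    "Are current users considered experts?",
    "Will users see the automated process as highly critical?",
    "Does the conventional technique require significant training?",
    "Will the automated process result in job displacement?",
]

def rigor_level(criticality_answers: list, vulnerability_answers: list) -> str:
    """More 'yes' answers mean more rigor: stronger explainability, transparency, provability."""
    yes_count = sum(criticality_answers) + sum(vulnerability_answers)
    total = len(CRITICALITY_QUESTIONS) + len(VULNERABILITY_QUESTIONS)
    if yes_count >= 2 * total / 3:
        return "high rigor: full explainability, transparency, and provability required"
    if yes_count >= total / 3:
        return "moderate rigor: per-prediction explanations strongly recommended"
    return "lower rigor: standard model validation may be sufficient"

# A recommendation engine (few yes answers) vs. a factory robot working beside people.
print(rigor_level([False, False, False, True, False], [False, False, False, False]))
print(rigor_level([True, True, True, True, True],     [True, True, False, True]))
```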
Take, for example, the home loan application scenario. If the AI makes mistakes—either by approving too many applicants who can’t make their mortgage payments or by rejecting applications from qualified candidates due to biased data—there are real consequences for the bank in the form of financial losses or damage to the bank’s reputation. However, no one is physically injured or killed in this case.
The more complex the use case and the greater the level of rigor required, the greater the need for high levels of explainability, transparency, and provability.
Who does what—and why does each role matter?
As with any high-functioning team, everyone should have a clear understanding of their role in developing the machine learning models that train AI systems. In the context of PwC’s AI framework, executives are responsible for defining the scope and specifications of a given use case and making sure AI is the appropriate solution. Can AI solve the problem better than any alternative? Do the potential profits or cost savings justify the cost of the AI solution?
You also want to be aware of the other key roles: Developers build the machine learning model, select the algorithms used for the AI application, and verify that the AI was built correctly. Analysts validate that the model the developers created will meet the business need at hand. End users work with the AI day to day; they must trust the model, because they are the ultimate arbiters of its success.
When all team members are doing their jobs, you can achieve the necessary level of explainability for a given AI application. The AI will function properly and can be trained through an iterative process in which analysts provide feedback and developers fine-tune the models to help the machines get smarter.
Help users build trust in AI
AI is still a new enough technology that people do not implicitly trust it to work flawlessly. At least not yet. Companies can help users and consumers trust AI systems by ensuring their machine learning models are explainable, transparent, and provable.