Explainable Artificial Intelligence (XAI) aims to provide insights into the inner workings and the outputs of AI systems. Recently, there has been growing recognition that explainability is inherently human-centric, tied to how people perceive explanations. Despite this, there is no consensus in the research community on whether user evaluation is crucial in XAI, and if so, what exactly needs to be evaluated and how. This systematic literature review addresses this gap by providing a detailed overview of the current state of affairs in human-centered XAI evaluation. We reviewed 73 papers across various domains where XAI was evaluated with users. These studies assessed what makes an explanation “good” from a user’s perspective, i.e., what makes an explanation meaningful to a user of an AI system. We identified 30 components of meaningful explanations that were evaluated in the reviewed papers and categorized them into a taxonomy of human-centered XAI evaluation, based on: (a) the contextualized quality of the explanation, (b) the contribution of the explanation to human-AI interaction, and (c) the contribution of the explanation to human-AI performance. Our analysis also revealed a lack of standardization in the methodologies applied in XAI user studies, with only 19 of the 73 papers applying an evaluation framework used by at least one other study in the sample. These inconsistencies hinder cross-study comparisons and broader insights. Our findings contribute to understanding what makes explanations meaningful to users and how to measure this, guiding the XAI community toward a more unified approach in human-centered explainability.
One aspect of a responsible application of Artificial Intelligence (AI) is ensuring that the operation and outputs of an AI system are understandable for non-technical users, who need to consider its recommendations in their decision making. The importance of explainable AI (XAI) is widely acknowledged; however, its practical implementation is not straightforward. In particular, it is still unclear what non-technical users require from explanations, i.e., what makes an explanation meaningful. In this paper, we synthesize insights on meaningful explanations from a literature study and two use cases in the financial sector. We identified 30 components of meaningfulness in XAI literature. In addition, we report three themes associated with explanation needs that were central to the users in our use cases, but are not prominently described in literature: actionability, coherent narratives, and context. Our results highlight the importance of narrowing the gap between theoretical and applied responsible AI.
Whitepaper: The use of AI is on the rise in the financial sector. Utilizing machine learning algorithms to make decisions and predictions based on the available data can be highly valuable. AI offers benefits to both financial service providers and their customers by improving service and reducing costs. Examples of AI use cases in the financial sector include identity verification in client onboarding, transaction data analysis, fraud detection in claims management, anti-money laundering monitoring, price differentiation in car insurance, automated analysis of legal documents, and the processing of loan applications.
Artificial Intelligence (AI) offers organizations unprecedented opportunities. However, one of the risks of using AI is that its outcomes and inner workings are not intelligible. In industries where trust is critical, such as healthcare and finance, explainable AI (XAI) is a necessity. Yet the implementation of XAI is not straightforward, as it requires addressing both technical and social aspects. Previous studies on XAI primarily focused on either technical or social aspects and lacked a practical perspective. This study aims to empirically examine the XAI-related aspects faced by developers, users, and managers of AI systems during the development process of the AI system. To this end, a multiple case study was conducted in two Dutch financial services companies using four use cases. Our findings reveal a wide range of aspects that must be considered during XAI implementation, which we grouped and integrated into a conceptual model. This model helps practitioners to make informed decisions when developing XAI. We argue that the diversity of aspects to consider necessitates an XAI “by design” approach, especially in high-risk use cases in industries where the stakes are high, such as finance, public services, and healthcare. As such, the conceptual model offers a taxonomy for method engineering of XAI-related methods, techniques, and tools.
This white paper is the result of a research project by Hogeschool Utrecht, Floryn, Researchable, and De Volksbank, carried out from November 2021 to November 2022. The research project was a KIEM project granted by the Taskforce for Applied Research SIA. The goal of the research project was to identify the aspects that play a role in the implementation of the explainability of artificial intelligence (AI) systems in the Dutch financial sector. In this white paper, we present a checklist of the aspects that we derived from this research. The checklist contains checkpoints and related questions that need consideration to make explainability-related choices in different stages of the AI lifecycle. The goal of the checklist is to give designers and developers of AI systems a tool to ensure the AI system will give proper and meaningful explanations to each stakeholder.
The user experience of our daily interactions is increasingly shaped with the aid of AI, mostly as the output of recommendation engines. However, it is less common to present users with possibilities to navigate or adapt such output. In this paper we argue that adding such algorithmic controls can be a potent strategy to create explainable AI and to aid users in building adequate mental models of the system. We describe our efforts to create a pattern library for algorithmic controls: the algorithmic affordances pattern library. The library can help bridge research efforts to explore and evaluate algorithmic controls and emerging practices in commercial applications, thereby scaffolding a more evidence-based adoption of algorithmic controls in industry. A first version of the library suggested four distinct categories of algorithmic controls: feeding the algorithm, tuning algorithmic parameters, activating recommendation contexts, and navigating the recommendation space. In this paper we discuss these and reflect on how each of them could aid explainability. Based on this reflection, we sketch a future research agenda. The paper also serves as an open invitation to the XAI community to strengthen our approach with anything we have missed so far.
The user’s experience with a recommender system is significantly shaped by the dynamics of user-algorithm interactions. These interactions are often evaluated using interaction qualities, such as controllability, trust, and autonomy, to gauge their impact. As part of our effort to systematically categorize these evaluations, we explored the suitability of the interaction qualities framework as proposed by Lenz, Diefenbach and Hassenzahl. During this examination, we uncovered four challenges within the framework itself, and an additional external challenge. In studies examining the interaction between user control options and interaction qualities, interdependencies between concepts, inconsistent terminology, and the entity perspective (is it a user’s trust or a system’s trustworthiness?) often hinder a systematic inventory of the findings. Additionally, our discussion underscored the crucial role of the decision context in evaluating the relation between algorithmic affordances and interaction qualities. We propose dimensions of decision contexts (such as ‘reversibility of the decision’ or ‘time pressure’). They could aid in establishing a systematic three-way relationship between context attributes, attributes of user control mechanisms, and experiential goals, and as such they warrant further research. In sum, while the interaction qualities framework serves as a foundational structure for organizing research on evaluating the impact of algorithmic affordances, challenges related to interdependencies and context-specific influences remain. These challenges necessitate further investigation and subsequent refinement and expansion of the framework.
Algorithmic affordances are defined as user interaction mechanisms that allow users tangible control over AI algorithms, such as recommender systems. Designing such algorithmic affordances, including assessing their impact, is not straightforward, and practitioners state that they lack the resources to design interfaces of AI systems adequately. This could be remedied by creating a comprehensive pattern library of algorithmic affordances. This library should provide easy access to patterns, supported by live examples and research on their experiential impact and limitations of use. The Algorithmic Affordances in Recommender Interfaces workshop aimed to address key challenges related to building such a pattern library, including pattern identification features, a framework for systematic impact evaluation, and understanding the interaction between algorithmic affordances and their context of use, especially in education or with users who have low algorithmic literacy. Preliminary solutions were proposed for these challenges.
The healthcare sector has been confronted with rapidly rising healthcare costs and a shortage of medical staff. At the same time, the field of Artificial Intelligence (AI) has emerged as a promising area of research, offering potential benefits for healthcare. Despite this potential, the widespread implementation of AI in healthcare remains limited. One possible contributing factor is the lack of trust in AI algorithms among healthcare professionals. Previous studies have indicated that explainability plays a crucial role in establishing trust in AI systems. This study aims to explore trust in AI and its connection to explainability in a medical setting. A rapid review was conducted to provide an overview of the existing knowledge and research on trust and explainability. Building upon these insights, a dashboard interface was developed to present the output of an AI-based decision-support tool along with explanatory information, with the aim of enhancing explainability of the AI for healthcare professionals. To investigate the impact of the dashboard and its explanations on healthcare professionals, an exploratory case study was conducted. The study encompassed an assessment of participants’ trust in the AI system, their perception of its explainability, as well as their evaluations of perceived ease of use and perceived usefulness. The initial findings from the case study indicate a positive correlation between perceived explainability and trust in the AI system. These preliminary findings suggest that enhancing the explainability of AI systems could increase trust among healthcare professionals, which may in turn contribute to increased acceptance and adoption of AI in healthcare. However, a more elaborate experiment with the dashboard is essential.