Multilevel models (MLMs) are increasingly deployed in industry across different functions. Applications usually result in binary classification within groups or hierarchies based on a set of input features. For transparent and ethical applications of such models, sound audit frameworks need to be developed. In this paper, an audit framework for technical assessment of regression MLMs is proposed. The focus is on three aspects: model, discrimination, and transparency & explainability. These aspects are subsequently divided into sub-aspects. Contributors, such as inter MLM-group fairness, feature contribution order, and aggregated feature contribution, are identified for each of these sub-aspects. To measure the performance of the contributors, the framework proposes a shortlist of KPIs, among others, intergroup individual fairness (DiffInd_MLM) across MLM-groups, probability unexplained (PUX) and percentage of incorrect feature signs (POIFS). A traffic light risk assessment method is furthermore coupled to these KPIs. For assessing transparency & explainability, different explainability methods (SHAP and LIME) are used, which are compared with a model intrinsic method using quantitative methods and machine learning modelling.Using an open-source dataset, a model is trained and tested and the KPIs are computed. It is demonstrated that popular explainability methods, such as SHAP and LIME, underperform in accuracy when interpreting these models. They fail to predict the order of feature importance, the magnitudes, and occasionally even the nature of the feature contribution (negative versus positive contribution on the outcome). For other contributors, such as group fairness and their associated KPIs, similar analysis and calculations have been performed with the aim of adding profundity to the proposed audit framework. The framework is expected to assist regulatory bodies in performing conformity assessments of AI systems using multilevel binomial classification models at businesses. It will also benefit providers, users, and assessment bodies, as defined in the European Commission’s proposed Regulation on Artificial Intelligence, when deploying AI-systems such as MLMs, to be future-proof and aligned with the regulation.
DOCUMENT
The healthcare sector has been confronted with rapidly rising healthcare costs and a shortage of medical staff. At the same time, the field of Artificial Intelligence (AI) has emerged as a promising area of research, offering potential benefits for healthcare. Despite the potential of AI to support healthcare, its widespread implementation, especially in healthcare, remains limited. One possible factor contributing to that is the lack of trust in AI algorithms among healthcare professionals. Previous studies have indicated that explainability plays a crucial role in establishing trust in AI systems. This study aims to explore trust in AI and its connection to explainability in a medical setting. A rapid review was conducted to provide an overview of the existing knowledge and research on trust and explainability. Building upon these insights, a dashboard interface was developed to present the output of an AI-based decision-support tool along with explanatory information, with the aim of enhancing explainability of the AI for healthcare professionals. To investigate the impact of the dashboard and its explanations on healthcare professionals, an exploratory case study was conducted. The study encompassed an assessment of participants’ trust in the AI system, their perception of its explainability, as well as their evaluations of perceived ease of use and perceived usefulness. The initial findings from the case study indicate a positive correlation between perceived explainability and trust in the AI system. Our preliminary findings suggest that enhancing the explainability of AI systems could increase trust among healthcare professionals. This may contribute to an increased acceptance and adoption of AI in healthcare. However, a more elaborate experiment with the dashboard is essential.
LINK
Artificial Intelligence (AI) offers organizations unprecedented opportunities. However, one of the risks of using AI is that its outcomes and inner workings are not intelligible. In industries where trust is critical, such as healthcare and finance, explainable AI (XAI) is a necessity. However, the implementation of XAI is not straightforward, as it requires addressing both technical and social aspects. Previous studies on XAI primarily focused on either technical or social aspects and lacked a practical perspective. This study aims to empirically examine the XAI related aspects faced by developers, users, and managers of AI systems during the development process of the AI system. To this end, a multiple case study was conducted in two Dutch financial services companies using four use cases. Our findings reveal a wide range of aspects that must be considered during XAI implementation, which we grouped and integrated into a conceptual model. This model helps practitioners to make informed decisions when developing XAI. We argue that the diversity of aspects to consider necessitates an XAI “by design” approach, especially in high-risk use cases in industries where the stakes are high such as finance, public services, and healthcare. As such, the conceptual model offers a taxonomy for method engineering of XAI related methods, techniques, and tools.
MULTIFILE
This guide was developed for designers and developers of AI systems, with the goal of ensuring that these systems are sufficiently explainable. Sufficient here means that it meets the legal requirements from AI Act and GDPR and that users can use the system properly. Explainability of decisions is an important requirement in many systems and even an important principle for AI systems [HLEG19]. In many AI systems, explainability is not self-evident. AI researchers expect that the challenge of making AI explainable will only increase. For one thing, this comes from the applications: AI will be used more and more often, for larger and more sensitive decisions. On the other hand, organizations are making better and better models, for example, by using more different inputs. With more complex AI models, it is often less clear how a decision was made. Organizations that will deploy AI must take into account users' need for explanations. Systems that use AI should be designed to provide the user with appropriate explanations. In this guide, we first explain the legal requirements for explainability of AI systems. These come from the GDPR and the AI Act. Next, we explain how AI is used in the financial sector and elaborate on one problem in detail. For this problem, we then show how the user interface can be modified to make the AI explainable. These designs serve as prototypical examples that can be adapted to new problems. This guidance is based on explainability of AI systems for the financial sector. However, the advice can also be used in other sectors.
DOCUMENT
This white paper is the result of a research project by Hogeschool Utrecht, Floryn, Researchable, and De Volksbank in the period November 2021-November 2022. The research project was a KIEM project1 granted by the Taskforce for Applied Research SIA. The goal of the research project was to identify the aspects that play a role in the implementation of the explainability of artificial intelligence (AI) systems in the Dutch financial sector. In this white paper, we present a checklist of the aspects that we derived from this research. The checklist contains checkpoints and related questions that need consideration to make explainability-related choices in different stages of the AI lifecycle. The goal of the checklist is to give designers and developers of AI systems a tool to ensure the AI system will give proper and meaningful explanations to each stakeholder.
MULTIFILE
Multilevel models using logistic regression (MLogRM) and random forest models (RFM) are increasingly deployed in industry for the purpose of binary classification. The European Commission’s proposed Artificial Intelligence Act (AIA) necessitates, under certain conditions, that application of such models is fair, transparent, and ethical, which consequently implies technical assessment of these models. This paper proposes and demonstrates an audit framework for technical assessment of RFMs and MLogRMs by focussing on model-, discrimination-, and transparency & explainability-related aspects. To measure these aspects 20 KPIs are proposed, which are paired to a traffic light risk assessment method. An open-source dataset is used to train a RFM and a MLogRM model and these KPIs are computed and compared with the traffic lights. The performance of popular explainability methods such as kernel- and tree-SHAP are assessed. The framework is expected to assist regulatory bodies in performing conformity assessments of binary classifiers and also benefits providers and users deploying such AI-systems to comply with the AIA.
DOCUMENT
Explainable Artificial Intelligence (XAI) aims to provide insights into the inner workings and the outputs of AI systems. Recently, there’s been growing recognition that explainability is inherently human-centric, tied to how people perceive explanations. Despite this, there is no consensus in the research community on whether user evaluation is crucial in XAI, and if so, what exactly needs to be evaluated and how. This systematic literature review addresses this gap by providing a detailed overview of the current state of affairs in human-centered XAI evaluation. We reviewed 73 papers across various domains where XAI was evaluated with users. These studies assessed what makes an explanation “good” from a user’s perspective, i.e., what makes an explanation meaningful to a user of an AI system. We identified 30 components of meaningful explanations that were evaluated in the reviewed papers and categorized them into a taxonomy of human-centered XAI evaluation, based on: (a) the contextualized quality of the explanation, (b) the contribution of the explanation to human-AI interaction, and (c) the contribution of the explanation to human- AI performance. Our analysis also revealed a lack of standardization in the methodologies applied in XAI user studies, with only 19 of the 73 papers applying an evaluation framework used by at least one other study in the sample. These inconsistencies hinder cross-study comparisons and broader insights. Our findings contribute to understanding what makes explanations meaningful to users and how to measure this, guiding the XAI community toward a more unified approach in human-centered explainability.
MULTIFILE
Research into automatic text simplification aims to promote access to information for all members of society. To facilitate generalizability, simplification research often abstracts away from specific use cases, and targets a prototypical reader and an underspecified content creator. In this paper, we consider a real-world use case – simplification technology for use in Dutch municipalities – and identify the needs of the content creators and the target audiences in this scenario. The stakeholders envision a system that (a) assists the human writer without taking over the task; (b) provides diverse outputs, tailored for specific target audiences; and (c) explains the suggestions that it outputs. These requirements call for technology that is characterized by modularity, explainability, and variability. We argue that these are important research directions that require further exploration
MULTIFILE
As artificial intelligence (AI) reshapes hiring, organizations increasingly rely on AI-enhanced selection methods such as chatbot-led interviews and algorithmic resume screening. While AI offers efficiency and scalability, concerns persist regarding fairness, transparency, and trust. This qualitative study applies the Artificially Intelligent Device Use Acceptance (AIDUA) model to examine how job applicants perceive and respond to AI-driven hiring. Drawing on semi-structured interviews with 15 professionals, the study explores how social influence, anthropomorphism, and performance expectancy shape applicant acceptance, while concerns about transparency and fairness emerge as key barriers. Participants expressed a strong preference for hybrid AI-human hiring models, emphasizing the importance of explainability and human oversight. The study refines the AIDUA model in the recruitment context and offers practical recommendations for organizations seeking to implement AI ethically and effectively in selection processes.
MULTIFILE
The user experience of our daily interactions is increasingly shaped with the aid of AI, mostly as the output of recommendation engines. However, it is less common to present users with possibilities to navigate or adapt such output. In this paper we argue that adding such algorithmic controls can be a potent strategy to create explainable AI and to aid users in building adequate mental models of the system. We describe our efforts to create a pattern library for algorithmic controls: the algorithmic affordances pattern library. The library can aid in bridging research efforts to explore and evaluate algorithmic controls and emerging practices in commercial applications, therewith scaffolding a more evidence-based adoption of algorithmic controls in industry. A first version of the library suggested four distinct categories of algorithmic controls: feeding the algorithm, tuning algorithmic parameters, activating recommendation contexts, and navigating the recommendation space. In this paper we discuss these and reflect on how each of them could aid explainability. Based on this reflection, we unfold a sketch for a future research agenda. The paper also serves as an open invitation to the XAI community to strengthen our approach with things we missed so far.
MULTIFILE