Multilevel models (MLMs) are increasingly deployed in industry across different functions. Applications usually involve binary classification within groups or hierarchies based on a set of input features. For transparent and ethical applications of such models, sound audit frameworks need to be developed. In this paper, an audit framework for technical assessment of regression MLMs is proposed. The focus is on three aspects: model, discrimination, and transparency & explainability. These aspects are subsequently divided into sub-aspects. Contributors, such as inter-MLM-group fairness, feature contribution order, and aggregated feature contribution, are identified for each of these sub-aspects. To measure the performance of the contributors, the framework proposes a shortlist of KPIs, among others intergroup individual fairness (DiffInd_MLM) across MLM-groups, probability unexplained (PUX), and percentage of incorrect feature signs (POIFS). A traffic light risk assessment method is furthermore coupled to these KPIs. For assessing transparency & explainability, different explainability methods (SHAP and LIME) are used, which are compared with a model-intrinsic method using quantitative methods and machine learning modelling. Using an open-source dataset, a model is trained and tested and the KPIs are computed. It is demonstrated that popular explainability methods, such as SHAP and LIME, underperform in accuracy when interpreting these models. They fail to predict the order of feature importance, the magnitudes, and occasionally even the nature of the feature contribution (negative versus positive contribution to the outcome). For other contributors, such as group fairness and their associated KPIs, similar analyses and calculations have been performed with the aim of adding profundity to the proposed audit framework.
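To make the sign-comparison KPI concrete, the following is a minimal sketch of how POIFS, the percentage of incorrect feature signs, could be computed by comparing an explainer's aggregated per-feature contributions against the model's intrinsic coefficient signs. This is an illustrative reconstruction, not the paper's implementation; the function name and the toy numbers are assumptions.

```python
import numpy as np

def poifs(true_coefficients, explained_contributions):
    """Percentage Of Incorrect Feature Signs (POIFS): the share of
    features whose explained contribution has a different sign than
    the model's intrinsic coefficient (illustrative sketch)."""
    true_signs = np.sign(true_coefficients)
    explained_signs = np.sign(explained_contributions)
    incorrect = np.sum(true_signs != explained_signs)
    return 100.0 * incorrect / len(true_coefficients)

# Toy example: a logistic model's coefficients versus SHAP-style
# aggregated contributions produced by an explainer.
coefs = np.array([0.8, -1.2, 0.3, -0.5])
shap_agg = np.array([0.7, -1.0, -0.1, -0.4])  # third sign is flipped
print(poifs(coefs, shap_agg))  # one of four signs wrong -> 25.0
```

A sign flip, as in the third feature above, corresponds to the failure mode the abstract describes: the explainer reports a negative contribution for a feature the model actually treats as positive.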
The framework is expected to assist regulatory bodies in performing conformity assessments of AI systems using multilevel binomial classification models at businesses. It will also benefit providers, users, and assessment bodies, as defined in the European Commission’s proposed Regulation on Artificial Intelligence, in deploying AI systems such as MLMs that are future-proof and aligned with the regulation.
Algorithms that significantly impact individuals and society should be transparent, yet they often function as complex black boxes. Such high-risk AI systems necessitate explainability of their inner workings and decision-making processes, which is also crucial for fostering trust, understanding, and adoption of AI. Explainability is a major topic, not only in the literature (Maslej et al. 2024) but also in AI regulation. The EU AI Act imposes explainability requirements on providers and deployers of high-risk AI systems. Additionally, it grants a right to explanation to individuals affected by high-risk AI systems. However, legal literature illustrates a lack of clarity and consensus regarding the definition of explainability and the interpretation of the relevant obligations of the AI Act (see e.g. Bibal et al. 2021; Nannini 2024; Sovrano et al. 2022). The practical implementation also presents further challenges, calling for an interdisciplinary approach (Gyevnar, Ferguson, and Schafer 2023; Nahar et al. 2024, 2110). Explainability can be examined from various perspectives. One such perspective concerns a functional approach, where explanations serve specific functions (Hacker and Passoth 2022). Looking at this functional perspective of explanations, my previous work elaborates on the central functions of explanations interwoven in the AI Act.
Through comparative research on the evolution of the explainability provisions in soft and hard law on AI from the High-Level Expert Group on AI, the Council of Europe, and the OECD, my previous research establishes that explanations in the AI Act primarily serve to provide understanding of the inner workings and output of an AI system, to enable contestation of a decision, to increase usability, and to achieve legal compliance (Van Beem, ongoing work, paper presented at the Bileta 2025 conference; submission expected June 2025). Moreover, my previous work reveals that the AI lifecycle is an important concept in AI policy and legal documents. The AI lifecycle includes the phases that lead to the design, development, and deployment of an AI system (Silva and Alahakoon 2022). The AI Act requires various explanations in each phase. The provider and deployer shall observe an explainability-by-design-and-development approach throughout the entire AI lifecycle, adapting explanations as their AI system evolves. However, the practical side of balancing clear, meaningful, legally compliant explanations against technical explanations proves challenging. Assessing this practical side, my current research is a case study in the agricultural sector, where AI plays an increasing role and where explainability is a necessary ingredient for adoption (EPRS 2023). The case study aims to map which legal issues AI providers, deployers, and other AI experts in field crop farming encounter. Secondly, the study explores the role of explainability (and the field of eXplainable AI) in overcoming such legal challenges. The study is conducted through further doctrinal research, case law analysis, and empirical research using interviews, integrating the legal and technical perspectives. Aiming to enhance trustworthiness and adoption of AI in agriculture, this research seeks to contribute to an interdisciplinary debate regarding the practical application of the AI Act's explainability obligations.
The healthcare sector has been confronted with rapidly rising healthcare costs and a shortage of medical staff. At the same time, the field of Artificial Intelligence (AI) has emerged as a promising area of research, offering potential benefits for healthcare. Despite this potential, the widespread implementation of AI in healthcare remains limited. One possible contributing factor is the lack of trust in AI algorithms among healthcare professionals. Previous studies have indicated that explainability plays a crucial role in establishing trust in AI systems. This study aims to explore trust in AI and its connection to explainability in a medical setting. A rapid review was conducted to provide an overview of the existing knowledge and research on trust and explainability. Building upon these insights, a dashboard interface was developed to present the output of an AI-based decision-support tool along with explanatory information, with the aim of enhancing the explainability of the AI for healthcare professionals. To investigate the impact of the dashboard and its explanations on healthcare professionals, an exploratory case study was conducted. The study encompassed an assessment of participants’ trust in the AI system, their perception of its explainability, and their evaluations of perceived ease of use and perceived usefulness. The initial findings from the case study indicate a positive correlation between perceived explainability and trust in the AI system. Our preliminary findings suggest that enhancing the explainability of AI systems could increase trust among healthcare professionals, which may contribute to increased acceptance and adoption of AI in healthcare. However, a more elaborate experiment with the dashboard is essential.
This guide was developed for designers and developers of AI systems, with the goal of ensuring that these systems are sufficiently explainable. Sufficient here means that the system meets the legal requirements of the AI Act and the GDPR and that users can use the system properly. Explainability of decisions is an important requirement in many systems and even an important principle for AI systems [HLEG19]. In many AI systems, explainability is not self-evident. AI researchers expect that the challenge of making AI explainable will only increase. On the one hand, this stems from the applications: AI will be used more and more often, for larger and more sensitive decisions. On the other hand, organizations are building better and better models, for example by using more diverse inputs. With more complex AI models, it is often less clear how a decision was made. Organizations that deploy AI must take into account users' need for explanations. Systems that use AI should be designed to provide the user with appropriate explanations. In this guide, we first explain the legal requirements for explainability of AI systems, which come from the GDPR and the AI Act. Next, we explain how AI is used in the financial sector and elaborate on one problem in detail. For this problem, we then show how the user interface can be modified to make the AI explainable. These designs serve as prototypical examples that can be adapted to new problems. This guidance is based on the explainability of AI systems in the financial sector; however, the advice can also be used in other sectors.
This white paper is the result of a research project by Hogeschool Utrecht, Floryn, Researchable, and De Volksbank in the period November 2021-November 2022. The research project was a KIEM project granted by the Taskforce for Applied Research SIA. The goal of the research project was to identify the aspects that play a role in the implementation of the explainability of artificial intelligence (AI) systems in the Dutch financial sector. In this white paper, we present a checklist of the aspects that we derived from this research. The checklist contains checkpoints and related questions that need consideration to make explainability-related choices in different stages of the AI lifecycle. The goal of the checklist is to give designers and developers of AI systems a tool to ensure the AI system will give proper and meaningful explanations to each stakeholder.
Artificial Intelligence (AI) offers organizations unprecedented opportunities. However, one of the risks of using AI is that its outcomes and inner workings are not intelligible. In industries where trust is critical, such as healthcare and finance, explainable AI (XAI) is a necessity. However, the implementation of XAI is not straightforward, as it requires addressing both technical and social aspects. Previous studies on XAI primarily focused on either technical or social aspects and lacked a practical perspective. This study aims to empirically examine the XAI-related aspects faced by developers, users, and managers of AI systems during the development process of the AI system. To this end, a multiple case study was conducted in two Dutch financial services companies using four use cases. Our findings reveal a wide range of aspects that must be considered during XAI implementation, which we grouped and integrated into a conceptual model. This model helps practitioners to make informed decisions when developing XAI. We argue that the diversity of aspects to consider necessitates an XAI "by design" approach, especially for high-risk use cases in high-stakes industries such as finance, public services, and healthcare. As such, the conceptual model offers a taxonomy for method engineering of XAI-related methods, techniques, and tools.
Multilevel models using logistic regression (MLogRM) and random forest models (RFM) are increasingly deployed in industry for the purpose of binary classification. The European Commission’s proposed Artificial Intelligence Act (AIA) necessitates, under certain conditions, that the application of such models is fair, transparent, and ethical, which consequently implies technical assessment of these models. This paper proposes and demonstrates an audit framework for technical assessment of RFMs and MLogRMs by focussing on model-, discrimination-, and transparency & explainability-related aspects. To measure these aspects, 20 KPIs are proposed, which are paired with a traffic light risk assessment method. An open-source dataset is used to train an RFM and an MLogRM, and these KPIs are computed and compared with the traffic lights. The performance of popular explainability methods such as kernel- and tree-SHAP is assessed. The framework is expected to assist regulatory bodies in performing conformity assessments of binary classifiers and also benefits providers and users deploying such AI systems to comply with the AIA.
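The coupling of KPIs to traffic lights can be sketched as a simple threshold mapping. The function name and the cut-off values below are illustrative assumptions, not the framework's actual thresholds, which the audit framework itself would prescribe per KPI.

```python
def traffic_light(kpi_value, green_max, yellow_max):
    """Map a KPI value to a traffic light risk rating.
    Lower values are assumed to indicate lower risk; the
    thresholds are illustrative placeholders."""
    if kpi_value <= green_max:
        return "green"
    if kpi_value <= yellow_max:
        return "yellow"
    return "red"

# Example: rating a sign-error KPI (a percentage) with
# hypothetical cut-offs of 10% (green) and 25% (yellow).
for value in (5.0, 15.0, 40.0):
    print(value, traffic_light(value, green_max=10.0, yellow_max=25.0))
```

In an audit setting, each of the 20 KPIs would carry its own threshold pair, and the resulting colour grid gives assessors an at-a-glance conformity overview.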
Explainable Artificial Intelligence (XAI) aims to provide insights into the inner workings and the outputs of AI systems. Recently, there has been growing recognition that explainability is inherently human-centric, tied to how people perceive explanations. Despite this, there is no consensus in the research community on whether user evaluation is crucial in XAI, and if so, what exactly needs to be evaluated and how. This systematic literature review addresses this gap by providing a detailed overview of the current state of affairs in human-centered XAI evaluation. We reviewed 73 papers across various domains where XAI was evaluated with users. These studies assessed what makes an explanation “good” from a user’s perspective, i.e., what makes an explanation meaningful to a user of an AI system. We identified 30 components of meaningful explanations that were evaluated in the reviewed papers and categorized them into a taxonomy of human-centered XAI evaluation, based on: (a) the contextualized quality of the explanation, (b) the contribution of the explanation to human-AI interaction, and (c) the contribution of the explanation to human-AI performance. Our analysis also revealed a lack of standardization in the methodologies applied in XAI user studies, with only 19 of the 73 papers applying an evaluation framework used by at least one other study in the sample. These inconsistencies hinder cross-study comparisons and broader insights. Our findings contribute to understanding what makes explanations meaningful to users and how to measure this, guiding the XAI community toward a more unified approach in human-centered explainability.
Research into automatic text simplification aims to promote access to information for all members of society. To facilitate generalizability, simplification research often abstracts away from specific use cases, and targets a prototypical reader and an underspecified content creator. In this paper, we consider a real-world use case – simplification technology for use in Dutch municipalities – and identify the needs of the content creators and the target audiences in this scenario. The stakeholders envision a system that (a) assists the human writer without taking over the task; (b) provides diverse outputs, tailored for specific target audiences; and (c) explains the suggestions that it outputs. These requirements call for technology that is characterized by modularity, explainability, and variability. We argue that these are important research directions that require further exploration.