Explainable Artificial Intelligence (XAI) aims to provide insights into the inner workings and the outputs of AI systems. Recently, there’s been growing recognition that explainability is inherently human-centric, tied to how people perceive explanations. Despite this, there is no consensus in the research community on whether user evaluation is crucial in XAI, and if so, what exactly needs to be evaluated and how. This systematic literature review addresses this gap by providing a detailed overview of the current state of affairs in human-centered XAI evaluation. We reviewed 73 papers across various domains where XAI was evaluated with users. These studies assessed what makes an explanation “good” from a user’s perspective, i.e., what makes an explanation meaningful to a user of an AI system. We identified 30 components of meaningful explanations that were evaluated in the reviewed papers and categorized them into a taxonomy of human-centered XAI evaluation, based on: (a) the contextualized quality of the explanation, (b) the contribution of the explanation to human-AI interaction, and (c) the contribution of the explanation to human-AI performance. Our analysis also revealed a lack of standardization in the methodologies applied in XAI user studies, with only 19 of the 73 papers applying an evaluation framework used by at least one other study in the sample. These inconsistencies hinder cross-study comparisons and broader insights. Our findings contribute to understanding what makes explanations meaningful to users and how to measure this, guiding the XAI community toward a more unified approach in human-centered explainability.
As artificial intelligence (AI) reshapes hiring, organizations increasingly rely on AI-enhanced selection methods such as chatbot-led interviews and algorithmic resume screening. While AI offers efficiency and scalability, concerns persist regarding fairness, transparency, and trust. This qualitative study applies the Artificially Intelligent Device Use Acceptance (AIDUA) model to examine how job applicants perceive and respond to AI-driven hiring. Drawing on semi-structured interviews with 15 professionals, the study explores how social influence, anthropomorphism, and performance expectancy shape applicant acceptance, while concerns about transparency and fairness emerge as key barriers. Participants expressed a strong preference for hybrid AI-human hiring models, emphasizing the importance of explainability and human oversight. The study refines the AIDUA model in the recruitment context and offers practical recommendations for organizations seeking to implement AI ethically and effectively in selection processes.
This research investigates the potential and challenges of using artificial intelligence, specifically the ChatGPT-4 model developed by OpenAI, in grading and providing feedback in an educational setting. By comparing the grading of a human lecturer and ChatGPT-4 in an experiment with 105 students, our study found a strong positive correlation between the scores given by both, despite some mismatches. In addition, we observed that ChatGPT-4's feedback was effectively personalized and understandable for students, contributing to their learning experience. While our findings suggest that AI technologies like ChatGPT-4 can significantly speed up the grading process and enhance feedback provision, the implementation of these systems should be thoughtfully considered. With further research and development, AI can potentially become a valuable tool to support teaching and learning in education. https://saiconference.com/FICC
From the article: The ethics guidelines put forward by the AI High Level Expert Group (AI-HLEG) present a list of seven key requirements that human-centered, trustworthy AI systems should meet. These guidelines are useful for the evaluation of AI systems, but can be complemented by applied methods and tools for the development of trustworthy AI systems in practice. In this position paper we propose a framework for translating the AI-HLEG ethics guidelines into the specific context within which an AI system operates. This approach aligns well with a set of Agile principles commonly employed in software engineering. http://ceur-ws.org/Vol-2659/
Both because of the shortcomings of existing risk assessment methodologies and because of newly available tools to predict hazard and risk with machine learning approaches, there has been an emerging emphasis on probabilistic risk assessment. Increasingly sophisticated AI models can be applied to a plethora of exposure and hazard data to obtain not only predictions for particular endpoints but also estimates of the uncertainty of the risk assessment outcome. This provides the basis for a shift from deterministic to more probabilistic approaches, but comes at the cost of increased complexity, as the process requires more resources and human expertise. There are still challenges to overcome before a probabilistic paradigm is fully embraced by regulators. Based on an earlier white paper (Maertens et al., 2022), a workshop discussed the prospects, challenges and path forward for implementing such AI-based probabilistic hazard assessment. Moving forward, we will see the transition from categorical to probabilistic and dose-dependent hazard outcomes, the application of internal thresholds of toxicological concern for data-poor substances, the acknowledgement of user-friendly open-source software, a rise in the expertise of toxicologists required to understand and interpret artificial intelligence models, and the honest communication of uncertainty in risk assessment to the public.
Design schools in digital media and interaction design face the challenge of integrating recent artificial intelligence (AI) advancements into their curriculum. To address this, curricula must teach students to design both "with" and "for" AI. This paper addresses how designing for AI differs from designing for other novel technologies that have entered interaction design education. Future digital designers must develop new solution repertoires for intelligent systems. The paper discusses preparing students for these challenges, suggesting that design schools must choose between a lightweight and heavyweight approach toward the design of AI. The lightweight approach prioritises designing front-end AI applications, focusing on user interfaces, interactions, and immediate user experience impact. This requires adeptness in designing for evolving mental models and ethical considerations but is disconnected from a deep technological understanding of the inner workings of AI. The heavyweight approach emphasises conceptual AI application design, involving users, altering design processes, and fostering responsible practices. While it requires basic technological understanding, the specific knowledge needed for students remains uncertain. The paper compares these approaches, discussing their complementarity.
Recent years have seen a massive growth in ethical and legal frameworks to govern data science practices. Yet one of the core questions associated with such frameworks is the extent to which they are implemented in practice. A particularly interesting case in this context concerns public officials, for whom higher standards typically exist. We therefore seek to understand how ethical and legal frameworks influence the everyday data and algorithm practices of public sector data professionals. This paper looks at two cases: public sector data professionals (1) at municipalities in the Netherlands and (2) at the Netherlands Police. We compare these two cases based on an analytical research framework, developed in this article, that aids understanding of everyday professional practices. We conclude that there is a wide gap between legal and ethical governance rules and everyday practice.
In this book, 40 experts explain in clear language what AI is and what questions, challenges, and opportunities the technology brings.
Risk assessment instruments are widely used to predict risk of adverse outcomes, such as violence or victimization, and to allocate resources for managing these risks among individuals involved in criminal justice and forensic mental health services. For risk assessment instruments to reach their full potential, they must be implemented with fidelity. A lack of information on administration fidelity hinders transparency about implementation quality, as well as the interpretation of negative or inconclusive findings from predictive validity studies. The present study focuses on adherence, a dimension of fidelity. Adherence denotes the extent to which the risk assessment is completed according to the instrument’s guidelines. We developed an adherence measure, tailored to the Short-Term Assessment of Risk and Treatability: Adolescent Version (START:AV), an evidence-based risk assessment instrument for adolescents. With the START:AV Adherence Rating Scale, we explored the degree to which 11 key features of the instrument were adhered to in 306 START:AV forms, completed by 17 different evaluators in a Dutch residential youth care facility over a two-year period. Good to excellent interrater reliability was found for all adherence items. We identified differences in adherence scores on the various START:AV features, as well as significant improvement in adherence for those who attended a START:AV refresher workshop. Outcomes of risk assessment instruments potentially impact decision-making, for example, whether a youth’s secure placement should be extended. Therefore, we recommend fidelity monitoring to ensure the risk assessment practice was delivered as intended.
Research into automatic text simplification aims to promote access to information for all members of society. To facilitate generalizability, simplification research often abstracts away from specific use cases, targeting a prototypical reader and an underspecified content creator. In this paper, we consider a real-world use case – simplification technology for use in Dutch municipalities – and identify the needs of the content creators and the target audiences in this scenario. The stakeholders envision a system that (a) assists the human writer without taking over the task; (b) provides diverse outputs, tailored for specific target audiences; and (c) explains the suggestions that it outputs. These requirements call for technology that is characterized by modularity, explainability, and variability. We argue that these are important research directions that require further exploration.