Key to reinforcement learning in multi-agent systems is the ability to exploit the fact that agents directly influence only a small subset of the other agents. Such loose couplings are often modelled using a graphical model: a coordination graph. Finding an (approximately) optimal joint action for a given coordination graph is therefore a central subroutine in cooperative multi-agent reinforcement learning (MARL). Much research in MARL focuses on how to gradually update the parameters of the coordination graph, whilst leaving the solving of the coordination graph up to a known, typically exact and generic, subroutine. However, exact methods (e.g., Variable Elimination) do not scale well, and generic methods do not exploit the MARL setting of gradually updating a coordination graph and recomputing the joint action to select. In this paper, we examine what happens if we use a heuristic method, i.e., local search, to select joint actions in MARL, and whether we can use the outcome of this local search from a previous time-step to speed up and improve local search. We show empirically that by using local search we can scale up to many agents and complex coordination graphs, and that by reusing joint actions from the previous time-step to initialise local search, we can improve both the quality of the joint actions found and the speed with which they are found.
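As a concrete illustration of the kind of heuristic discussed above, the sketch below implements a minimal hill-climbing (iterated best-response) local search over a coordination graph with pairwise payoff functions, warm-started from a previous joint action. The payoff representation, function names, and the assumption of an identical action set per agent are illustrative choices for this sketch, not the paper's implementation.

```python
import random

def local_search(payoffs, n_actions, n_agents, init_joint=None, max_sweeps=50):
    """Hill-climbing over a coordination graph.

    payoffs: dict mapping an edge (i, j) with i < j to a function f(a_i, a_j) -> float.
    n_actions: number of actions per agent (assumed identical for simplicity).
    init_joint: optional joint action from the previous time-step used to warm-start.
    """
    joint = list(init_joint) if init_joint is not None else [
        random.randrange(n_actions) for _ in range(n_agents)
    ]

    def local_value(i, a_i):
        # Sum of payoffs on all edges incident to agent i, with agent i playing a_i.
        total = 0.0
        for (p, q), f in payoffs.items():
            if p == i:
                total += f(a_i, joint[q])
            elif q == i:
                total += f(joint[p], a_i)
        return total

    for _ in range(max_sweeps):
        improved = False
        for i in range(n_agents):
            best_a = max(range(n_actions), key=lambda a: local_value(i, a))
            if local_value(i, best_a) > local_value(i, joint[i]) + 1e-12:
                joint[i] = best_a
                improved = True
        if not improved:  # A local optimum has been reached.
            break
    return joint

# Example: three agents on a chain, reusing the previous step's joint action as the start.
payoffs = {(0, 1): lambda a, b: 1.0 if a == b else 0.0,
           (1, 2): lambda a, b: 1.0 if a != b else 0.0}
print(local_search(payoffs, n_actions=2, n_agents=3, init_joint=[0, 0, 1]))
```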
Industrial robot manipulators are widely used for repetitive applications that require high precision, such as pick-and-place. In many cases, the movements of industrial robot manipulators are hard-coded or manually defined, and need to be adjusted if the objects being manipulated change position. To increase flexibility, an industrial robot should be able to adjust its configuration in order to grasp objects in variable or unknown positions. This can be achieved with off-the-shelf vision-based solutions, but most require prior knowledge about each object to be manipulated. To address this issue, this work presents a ROS-based deep reinforcement learning solution to robotic grasping for a Collaborative Robot (Cobot) using a depth camera. The solution uses deep Q-learning to process the color and depth images and generate a greedy policy used to define the robot action. The Q-values are estimated using a Convolutional Neural Network (CNN) based on pre-trained models for feature extraction. Experiments were carried out in a simulated environment to compare the performance of four different pre-trained CNN models (ResNext, MobileNet, MNASNet and DenseNet). Results show that the best performance in our application was reached by MobileNet, with an average of 84% accuracy after training in the simulated environment.
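To illustrate the idea of using pre-trained CNNs as interchangeable feature extractors for Q-value estimation, the sketch below builds a small Q-value head on top of a frozen torchvision backbone. The backbone names map to torchvision constructors; the discrete action count, the head architecture, and the restriction to the color stream (depth fusion omitted) are assumptions made for this sketch, not the paper's exact network.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Illustrative mapping from the compared backbones to torchvision constructors
# (requires a reasonably recent torchvision for the string `weights` argument).
BACKBONES = {
    "resnext": models.resnext50_32x4d,
    "mobilenet": models.mobilenet_v2,
    "mnasnet": models.mnasnet1_0,
    "densenet": models.densenet121,
}

class QNetwork(nn.Module):
    """Q-value head on top of a frozen pre-trained backbone (illustrative only)."""

    def __init__(self, backbone_name: str, n_actions: int):
        super().__init__()
        backbone = BACKBONES[backbone_name](weights="DEFAULT")
        # Keep the convolutional feature extractor, drop the classification layer.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.features.parameters():
            p.requires_grad = False  # Backbone is used purely for feature extraction.
        self.features.eval()
        self.pool = nn.AdaptiveAvgPool2d(1)
        with torch.no_grad():  # Infer the feature dimension with a dummy forward pass.
            feat_dim = self.pool(self.features(torch.zeros(1, 3, 224, 224))).flatten(1).shape[1]
        self.head = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, n_actions))

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        feats = self.pool(self.features(rgb)).flatten(1)
        return self.head(feats)  # One Q-value per discrete grasp action.

q_net = QNetwork("mobilenet", n_actions=16)
q_values = q_net(torch.randn(1, 3, 224, 224))  # Shape: (1, 16)
```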
The number of applications in which industrial robots share their working environment with people is increasing. Robots appropriate for such applications are equipped with safety systems according to ISO/TS 15066:2016 and are often referred to as collaborative robots (cobots). Due to the nature of human-robot collaboration, the working environment of cobots is subjected to unforeseeable modifications caused by people. Vision systems are often used to increase the adaptability of cobots, but they usually require knowledge of the objects to be manipulated. The application of machine learning techniques can increase the flexibility by enabling the control system of a cobot to continuously learn and adapt to unexpected changes in the working environment. In this paper we address this issue by investigating the use of Reinforcement Learning (RL) to control a cobot to perform pick-and-place tasks. We present the implementation of a control system that can adapt to changes in position and enables a cobot to grasp objects which were not part of the training. Our proposed system uses deep Q-learning to process color and depth images and generates an ε-greedy policy to define robot actions. The Q-values are estimated using Convolutional Neural Networks (CNNs) based on pre-trained models for feature extraction. To reduce training time, we implement a simulation environment to first train the RL agent, then we apply the resulting system on a real cobot. System performance is compared when using the pre-trained CNN models ResNext, DenseNet, MobileNet, and MNASNet. Simulation and experimental results validate the proposed approach and show that our system reaches a grasping success rate of 89.9% when manipulating a previously unseen object while operating with the pre-trained CNN model MobileNet.
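To make the ε-greedy policy and the deep Q-learning update concrete, the sketch below shows a standard action-selection rule and a single DQN-style update with a target network. The replay-batch format, loss choice, and hyperparameters are generic DQN conventions assumed here, not necessarily the paper's exact training procedure.

```python
import random
import torch
import torch.nn.functional as F

def select_action(q_net, state, epsilon, n_actions):
    """ε-greedy selection: explore with probability ε, otherwise act greedily on Q-values."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One Q-learning step on a batch of transitions (states, actions, rewards, next_states, dones)."""
    states, actions, rewards, next_states, dones = batch  # actions: long tensor, dones: float tensor
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        best_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * best_next  # Bootstrap unless the episode ended.
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```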
Just-in-time adaptive intervention (JITAI) has gained attention recently, and previous studies have indicated that it is an effective strategy in the field of mobile healthcare intervention. Identifying the right moment for the intervention is a crucial component. In this paper, a reinforcement learning (RL) technique is used in a smartphone exercise application to promote physical activity. The RL model determines the ‘right’ time to deliver a restricted number of notifications adaptively, with respect to users’ contextual information (i.e., time and calendar). A four-week trial study was conducted to examine the feasibility of our model with real target users. JITAI reminders were sent by the RL model in the fourth week of the intervention, while the participants could only access the app’s other functionalities during the first 3 weeks. Eleven target users registered for this study, and the data from 7 participants who used the application for 4 weeks and received the intervention reminders were analyzed. Not only were the reaction behaviors of users after receiving the reminders analyzed from the application data, but the user experience with the reminders was also explored in a questionnaire and exit interviews. The results show that 83.3% of reminders sent at adaptive moments elicited a user reaction within 50 min, and 66.7% of physical activities in the intervention week were performed within 5 h of the delivery of a reminder. Our findings indicate the usability of the RL model, although the timing of reminder delivery can be further improved based on lessons learned.
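One way to realise such adaptive timing is a small tabular Q-learning agent over coarse context features with a daily notification budget, as sketched below. The context encoding, reward design, and budget are assumptions made for this sketch, not the study's actual model.

```python
import random
from collections import defaultdict

class ReminderAgent:
    """Tabular Q-learning over (hour_bucket, calendar_busy) contexts.

    Actions: 0 = stay silent, 1 = send a reminder. The reward is assumed to be
    +1 if the user reacts to a reminder, -1 if the reminder is ignored, and 0
    when no reminder is sent; this reward design is illustrative only.
    """

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, daily_budget=3):
        self.q = defaultdict(float)           # (context, action) -> Q-value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.daily_budget = daily_budget
        self.sent_today = 0

    def act(self, context):
        if self.sent_today >= self.daily_budget:
            return 0                          # Budget exhausted: never notify.
        if random.random() < self.epsilon:
            action = random.choice([0, 1])
        else:
            action = max((0, 1), key=lambda a: self.q[(context, a)])
        self.sent_today += action
        return action

    def update(self, context, action, reward, next_context):
        best_next = max(self.q[(next_context, a)] for a in (0, 1))
        td_target = reward + self.gamma * best_next
        self.q[(context, action)] += self.alpha * (td_target - self.q[(context, action)])

# Example context: (hour bucket 0-5, whether the calendar shows a busy slot).
agent = ReminderAgent()
a = agent.act((2, False))
agent.update((2, False), a, reward=1.0, next_context=(3, False))
```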
For people with early dementia (PwD), it can be challenging to remember to eat and drink regularly and to maintain healthy independent living. Existing intelligent home technologies primarily focus on activity recognition but lack adaptive support. This research addresses this gap by developing an AI system inspired by the Just-in-Time Adaptive Intervention (JITAI) concept. It adapts to individual behaviors and provides personalized interventions within the home environment, reminding and encouraging PwD to manage their eating and drinking routines. Considering the cognitive impairment of PwD, we design a human-centered AI system based on healthcare theories and caregivers’ insights. It employs reinforcement learning (RL) techniques to deliver personalized interventions. To avoid overwhelming interaction with PwD, we develop an RL-based simulation protocol. This allows us to evaluate different RL algorithms in various simulation scenarios, not only finding the most effective and efficient approach but also validating the robustness of our system before implementation in real-world human experiments. The simulation results demonstrate the promising potential of adaptive RL for building a human-centered AI system with perceived expressions of empathy to improve dementia care. To further evaluate the system, we plan to conduct real-world user studies.
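A simulation protocol of the kind described can be as simple as a loop in which candidate RL agents interact with a simulated user before any real-world deployment. The sketch below reuses the `ReminderAgent` interface from the earlier sketch; the simulated user's response model and the reward are placeholders, whereas a real protocol would be grounded in healthcare theory and caregiver input as the abstract describes.

```python
import random

def simulated_user_response(context, intervened):
    """Toy simulated user: more likely to eat or drink if reminded when not busy.

    This response model is a placeholder for a behaviour model built from
    healthcare theory and caregiver insights.
    """
    base = 0.3
    if intervened and not context["busy"]:
        base += 0.5
    return 1.0 if random.random() < base else 0.0

def evaluate(agent, episodes=1000, steps_per_day=6):
    """Run an agent against the simulated user and report the mean daily reward."""
    total = 0.0
    for _ in range(episodes):
        agent.sent_today = 0
        context = {"hour": 0, "busy": random.random() < 0.4}
        for step in range(steps_per_day):
            action = agent.act((context["hour"], context["busy"]))
            reward = simulated_user_response(context, intervened=bool(action))
            next_context = {"hour": step + 1, "busy": random.random() < 0.4}
            agent.update((context["hour"], context["busy"]), action, reward,
                         (next_context["hour"], next_context["busy"]))
            total += reward
            context = next_context
    return total / episodes

# e.g. print(evaluate(ReminderAgent()))  # swap in other RL agents with the same interface
```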
Background: Although physical activity (PA) has positive effects on health and well-being, physical inactivity is a worldwide problem. Mobile health interventions have been shown to be effective in promoting PA. Personalizing persuasive strategies improves intervention success and can be conducted using machine learning (ML). For PA, several studies have addressed personalized persuasive strategies without ML, whereas others have included personalization using ML without focusing on persuasive strategies. An overview of studies discussing ML to personalize persuasive strategies in PA-promoting interventions, together with corresponding categorizations, would be helpful for designing such interventions in the future but is still missing. Objective: First, we aimed to provide an overview of implemented ML techniques to personalize persuasive strategies in mobile health interventions promoting PA. Moreover, we aimed to present a categorization overview as a starting point for applying ML techniques in this field. Methods: A scoping review was conducted based on the framework by Arksey and O’Malley and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria. Scopus, Web of Science, and PubMed were searched for studies that included ML to personalize persuasive strategies in interventions promoting PA. Papers were screened using the ASReview software. From the included papers, categorized by the research project they belonged to, we extracted data regarding general study information, target group, PA intervention, implemented technology, and study details. On the basis of the analysis of these data, a categorization overview was given. Results: In total, 40 papers belonging to 27 different projects were included. These papers could be categorized into 4 groups based on their dimension of personalization. Then, for each dimension, 1 or 2 persuasive strategy categories were found together with a type of ML. The overview resulted in a categorization consisting of 3 levels: dimension of personalization, persuasive strategy, and type of ML. When personalizing the timing of the messages, most projects implemented reinforcement learning to personalize the timing of reminders and supervised learning (SL) to personalize the timing of feedback, monitoring, and goal-setting messages. Regarding the content of the messages, most projects implemented SL to personalize PA suggestions and feedback or educational messages. For personalizing PA suggestions, SL can be implemented either alone or combined with a recommender system. Finally, reinforcement learning was mostly used to personalize the type of feedback messages. Conclusions: The overview of all implemented persuasive strategies and their corresponding ML methods is insightful for this interdisciplinary field. Moreover, it led to a categorization overview that provides insights into the design and development of personalized persuasive strategies to promote PA. In future papers, the categorization overview might be expanded with additional layers to specify ML methods or additional dimensions of personalization and persuasive strategies.
Professional development of teacher educators is an important topic, because teacher educators need to maintain and enhance their expertise in order to educate our future teachers (Kools & Koster, n.d.; Dengerink, Lunenberg & Kools, 2015). How do teacher educators fulfil this task, especially within the hectic timeframe of everyday work? I asked four colleagues to participate in a group to share their experiences, actions and behaviour within the organisation concerning their development in the profession of teacher educator. My purpose is to bring awareness and movement into that group. My research focusses on teacher educators in a large teacher education department in the Netherlands and the opportunities for action available to them. During this study we are creating a learning environment in which mutual cooperation increases the learning potential of all participants. In this group, participants take or make time to learn, giving words to their scopes. Researcher and participants discuss and explore on the basis of equality, reciprocity and mutual understanding. By deploying methods borrowed from ‘Appreciative Inquiry’ (Massenlink et al., 2008), the enthusiasm of the study group is raised and the intrinsic motivation of the participants stimulated. Our study group will convene three times. Its goal is to stimulate cooperation among teacher educators through optimisation of existing qualities, a method that could be described as empowerment, or a process of collective reinforcement. ‘To learn’ involves experiencing that what one does really matters, as well as developing one’s own persona in the local community. Intervention, action, reflection and study group meetings alternate in the course of our research. In addition to audio and video recordings, data consists of reports drawn up on the basis of member checks. Data is analysed qualitatively by coding the interview texts and reports. After applying the codes, the researcher discusses the coding in a research group and with the participants of the study group (member check). Working collaboratively can offer learning challenges that catalyse professional growth; teacher educators become acquainted with one another and approach each other from the perspective of their respective professional and functional responsibilities. This study offers perspectives for other teacher educators to recognize these possibilities in their own situation. Moreover, the study offers a description of a way to organise collegial exchange. The research is related to the RDC professional development of teacher educators.
Processes of collective learning are expected to increase the professionalism of teachers and school leaders. Little is known about the processes of collective learning that take place in schools, or about the way in which those processes may be improved. This paper describes research into processes of collective learning at three primary schools. It describes the processes of collective learning that took place in small teams in these schools, and points out which attempts can be made to reinforce these processes in the schools concerned.
This paper presents a mixed methods study, in which 77 students and 3 teachers took part, that investigated the practice of Learning by Design (LBD). The study is part of a series of studies, funded by the Netherlands Organisation for Scientific Research (NWO), that aims to improve student learning, teaching skills and teacher training. LBD uses the context of design challenges to learn, among other things, science. Previous research showed that this approach to subject integration is quite successful but yields little gain in scientific concept learning. Perhaps, once the process of concept learning is better understood, LBD will prove a suitable method for integration. Through pre- and post-exams we measured, like others, a medium gain in the mastery of scientific concepts. Qualitative data revealed important focus-related issues that impede concept learning. As a result, mainly implicit learning of loose facts and incomplete concepts occurs. More transparency of the learning situation and a stronger focus on underlying concepts should make concept learning more explicit and coherent.
Artificially intelligent agents increasingly collaborate with humans in human-agent teams. Timely, proactive sharing of relevant information within the team contributes to overall team performance. This paper presents a machine learning approach to proactive communication in AI agents using contextual factors. Proactive communication was learned in two consecutive experimental steps: (a) multi-agent team simulations to learn effective communicative behaviors, and (b) human-agent team experiments to refine communication suitable for a human team member. The results consist of proactive communication policies for communicating both beliefs and goals within human-agent teams. Agents learned to use minimal communication to improve team performance in simulation, while they learned more specific, socially desirable behaviors in the human-agent team experiment.
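As a rough illustration of learning when to communicate proactively from contextual factors, the sketch below frames the decision as an ε-greedy contextual bandit over the actions stay silent, share a belief, or share a goal. The context features and the reward signal (team-performance gain minus a communication cost) are assumptions invented for this sketch, not the experimental setup of the paper.

```python
import random
from collections import defaultdict

ACTIONS = ("stay_silent", "share_belief", "share_goal")

class CommunicationPolicy:
    """ε-greedy contextual bandit over team contexts (illustrative)."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.value = defaultdict(float)   # (context, action) -> running mean reward
        self.count = defaultdict(int)

    def act(self, context):
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.value[(context, a)])

    def update(self, context, action, reward):
        # Incremental mean of the observed reward for this context-action pair.
        key = (context, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]

policy = CommunicationPolicy()
ctx = ("teammate_busy", "info_is_new")      # Hypothetical context encoding.
action = policy.act(ctx)
policy.update(ctx, action, reward=0.8)      # e.g., observed team-performance gain minus cost.
```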