Both Software Engineering and Machine Learning have become recognized disciplines. In this article I analyse the combination of the two: engineering of machine learning applications. I believe the systematic way of working for machine learning applications is at certain points different from traditional (rule-based) software engineering. The question I set out to investigate is “How does software engineering change when we develop machine learning applications”?. This question is not an easy to answer and turns out to be a rather new, with few publications. This article collects what I have found until now.
LINK
In this post I give an overview of the theory, tools, frameworks and best practices I have found until now around the testing (and debugging) of machine learning applications. I will start by giving an overview of the specificities of testing machine learning applications.
LINK
The past two years I have conducted an extensive literature and tool review to answer the question: “What should software engineers learn about building production-ready machine learning systems?”. During my research I noted that because the discipline of building production-ready machine learning systems is so new, it is not so easy to get the terminology straight. People write about it from different perspectives and backgrounds and have not yet found each other to join forces. At the same time the field is moving fast and far from mature. My focus on material that is ready to be used with our bachelor level students (applied software engineers, profession-oriented education), helped me to consolidate everything I have found into a body of knowledge for building production-ready machine learning (ML) systems. In this post I will first define the discipline and introduce the terminology for AI engineering and MLOps.
LINK
This video offers a concise exploration of the distinctions between Data Science, AI, Machine Learning, and Deep Learning. Starting with the foundational role of Data Science, it navigates through the various machine learning categories and touches upon the capabilities and constraints of Deep Learning. The discussion culminates in understanding the nuances of AI, differentiating between narrow and general AI. Through insightful examples, viewers are guided on selecting the right technique for specific projects, ensuring both clarity and cost-effectiveness in the realm of data science.
VIDEO
Context:Rapid developments and adoption of machine learning-based software solutions have enabled novel ways to tackle our societal problems. The ongoing digital transformation has led to the incorporation of these software solutions in just about every application domain. Software architecture for machine learning applications used during sustainable digital transformation can potentially aid the evolution of the underlying software system adding to its sustainability over time.Objective:Software architecture for machine learning applications in general is an open research area. When applying it to sustainable digital transformation it is not clear which of its considerations actually apply in this context. We therefore aim to understand how the topics of sustainable digital transformation, software architecture, and machine learning interact with each other.Methods:We perform a systematic mapping study to explore the scientific literature on the intersection of sustainable digital transformation, machine learning and software architecture.Results:We have found that the intersection of interest is small despite the amount of works on its individual aspects, and not all dimensions of sustainability are represented equally. We also found that application domains are diverse and include many important sectors and industry groups. At the same time, the perceived level of maturity of machine learning adoption by existing works seems to be quite low.Conclusion:Our findings show an opportunity for further software architecture research to aid sustainable digital transformation, especially by building on the emerging practice of machine learning operations.
DOCUMENT
The current set of research methods on ictresearchmethods.nl contains only one research method that refers to machine learning: the “Data analytics” method in the “Lab” strategy. This does not reflect the way of working in ML projects, where Data Analytics is not a method to answer one question but the main goal of the project. For ML projects, the Data Analytics method should be divided in several smaller steps, each becoming a method of its own. In other words, we should treat the Data Analytics (or more appropriate ML engineering) process in the same way the software engineering process is treated in the framework. In the remainder of this post I will briefly discuss each of the existing research methods and how they apply to ML projects. The methods are organized by strategy. In the discussion I will give pointers to relevant tools or literature for ML projects.
LINK
The use of machine learning in embedded systems is an interesting topic, especially with the growth in popularity of the Internet of Things (IoT). The capacity of a system, such as a robot, to self-localize, is a fundamental skill for its navigation and decision-making processes. This work focuses on the feasibility of using machine learning in a Raspberry Pi 4 Model B, solving the localization problem using images and fiducial markers (ArUco markers) in the context of the RobotAtFactory 4.0 competition. The approaches were validated using a realistically simulated scenario. Three algorithms were tested, and all were shown to be a good solution for a limited amount of data. Results also show that when the amount of data grows, only Multi-Layer Perception (MLP) is feasible for the embedded application due to the required training time and the resulting size of the model.
DOCUMENT
Graphs are ubiquitous. Many graphs, including histograms, bar charts, and stacked dotplots, have proven tricky to interpret. Students’ gaze data can indicate students’ interpretation strategies on these graphs. We therefore explore the question: In what way can machine learning quantify differences in students’ gaze data when interpreting two near-identical histograms with graph tasks in between? Our work provides evidence that using machine learning in conjunction with gaze data can provide insight into how students analyze and interpret graphs. This approach also sheds light on the ways in which students may better understand a graph after first being presented with other graph types, including dotplots. We conclude with a model that can accurately differentiate between the first and second time a student solved near-identical histogram tasks.
DOCUMENT
Machine learning models have proven to be reliable methods in classification tasks. However, little research has been done on classifying dwelling characteristics based on smart meter & weather data before. Gaining insights into dwelling characteristics can be helpful to create/improve the policies for creating new dwellings at NZEB standard. This paper compares the different machine learning algorithms and the methods used to correctly implement the models. These methods include the data pre-processing, model validation and evaluation. Smart meter data was provided by Groene Mient, which was used to train several machine learning algorithms. The models that were generated by the algorithms were compared on their performance. The results showed that Recurrent Neural Network (RNN) 2performed the best with 96% of accuracy. Cross Validation was used to validate the models, where 80% of the data was used for training purposes and 20% was used for testing purposes. Evaluation metrices were used to produce classification reports, which can indicate which of the models work the best for this specific problem. The models were programmed in Python.
DOCUMENT
Living a sedentary lifestyle is one of the major causes of numerous health problems. To encourage employees to lead a less sedentary life, the Hanze University started a health promotion program. One of the interventions in the program was the use of an activity tracker to record participants' daily step count. The daily step count served as input for a fortnightly coaching session. In this paper, we investigate the possibility of automating part of the coaching procedure on physical activity by providing personalized feedback throughout the day on a participant's progress in achieving a personal step goal. The gathered step count data was used to train eight different machine learning algorithms to make hourly estimations of the probability of achieving a personalized, daily steps threshold. In 80% of the individual cases, the Random Forest algorithm was the best performing algorithm (mean accuracy = 0.93, range = 0.88–0.99, and mean F1-score = 0.90, range = 0.87–0.94). To demonstrate the practical usefulness of these models, we developed a proof-of-concept Web application that provides personalized feedback about whether a participant is expected to reach his or her daily threshold. We argue that the use of machine learning could become an invaluable asset in the process of automated personalized coaching. The individualized algorithms allow for predicting physical activity during the day and provides the possibility to intervene in time.
DOCUMENT