Both Software Engineering and Machine Learning have become recognized disciplines. In this article I analyse the combination of the two: the engineering of machine learning applications. I believe the systematic way of working for machine learning applications differs at certain points from traditional (rule-based) software engineering. The question I set out to investigate is: “How does software engineering change when we develop machine learning applications?” This question is not easy to answer and turns out to be rather new, with few publications. This article collects what I have found so far.
LINK
Machine learning models have proven to be reliable methods in classification tasks. However, little research has been conducted on the classification of dwelling characteristics based on smart meter and weather data. Gaining insights into dwelling characteristics, which comprise the type of heating system used, the number of inhabitants, and the number of solar panels installed, can help create or improve policies for building new dwellings at the nearly zero-energy standard. This paper compares different supervised machine learning algorithms, namely logistic regression, support vector machine, k-nearest neighbors, and long short-term memory (LSTM), and the methods used to correctly implement these algorithms. These methods include data pre-processing, model validation, and evaluation. Smart meter data, which was used to train the machine learning algorithms, was provided by Groene Mient. The models generated by the algorithms were compared on their performance. The results showed that the LSTM performed best, with 96% accuracy. Cross-validation was used to validate the models, where 80% of the data was used for training purposes and 20% was used for testing purposes. Evaluation metrics were used to produce classification reports, which indicate that the LSTM outperforms the compared models on the evaluation metrics for this specific problem.
DOCUMENT
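The validation setup described in the abstract above (an 80/20 train/test split plus a classification report) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the Groene Mient data is not public, so synthetic features stand in for the smart meter and weather data, and a simple logistic regression stands in for the LSTM.

```python
# Sketch of an 80/20 split with a classification report, as described in
# the abstract. Synthetic data stands in for the (non-public) smart meter
# and weather features; logistic regression stands in for the LSTM.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))            # stand-in smart meter + weather features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in dwelling characteristic label

# 80% of the data for training, 20% for testing, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, pred):.2f}")
print(classification_report(y_test, pred))
```

The classification report adds per-class precision, recall, and F1, which is what lets the paper compare models on more than accuracy alone.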
In this post I give an overview of the theory, tools, frameworks, and best practices I have found so far around the testing (and debugging) of machine learning applications. I will start with an overview of the specifics of testing machine learning applications.
LINK
In this paper, we present a digital tool named Diversity Perspectives in Social Media (DivPSM) which conducts automated content analysis of strategic diversity communication in organizational social media posts, using supervised machine learning. DivPSM is trained to identify whether a post makes mention of diversity or a diversity-related issue, and to subsequently code for the presence of three diversity dimensions (cultural/ethnic/racial, gender, and LGBTQ+ diversity) and three diversity perspectives (the moral, market, and innovation perspectives). In Study 1, we describe the training and validation of the instrument, and examine how it performs compared to human coders. Our findings confirm that DivPSM is sufficiently reliable for use in future research. In Study 2, we illustrate the type of data that DivPSM generates, by analyzing the prevalence of strategic diversity communication in social media posts (n = 84,561) of large organizations in the Netherlands. Our results show that in this context gender diversity is most prevalent, followed by LGBTQ+ and cultural/ethnic/racial diversity. Furthermore, gender diversity is often associated with the innovation perspective, whereas LGBTQ+ diversity is more often associated with the moral perspective. Cultural/ethnic/racial diversity does not show strong associations with any of the perspectives. Theoretical implications and directions for future research are discussed at the end of the paper.
MULTIFILE
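The kind of supervised text classification DivPSM performs can be sketched with a generic scikit-learn pipeline. The abstract does not describe DivPSM's actual features or model, so the TF-IDF + logistic regression pipeline below, along with the toy posts and labels, are invented purely for illustration.

```python
# Generic supervised text-classification sketch of the kind of task DivPSM
# performs (its actual features and model are not given in the abstract).
# The toy posts and labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "We celebrate the diversity of our workforce",
    "Inclusive hiring reflects many cultures",
    "Quarterly earnings exceeded expectations",
    "New product launch scheduled for spring",
]
labels = [1, 1, 0, 0]  # 1 = mentions diversity, 0 = does not

# Turn each post into TF-IDF features, then fit a linear classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(posts, labels)
print(clf.predict(["diversity is part of our culture"]))
```

In a real instrument such as DivPSM, the training set would consist of thousands of human-coded posts, and validation against human coders (as in Study 1) establishes the reliability of the labels.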
Machine learning models have proven to be reliable methods in classification tasks. However, little research has been done on classifying dwelling characteristics based on smart meter and weather data before. Gaining insights into dwelling characteristics can be helpful to create or improve the policies for creating new dwellings at the NZEB standard. This paper compares different machine learning algorithms and the methods used to correctly implement the models. These methods include data pre-processing, model validation, and evaluation. Smart meter data was provided by Groene Mient, which was used to train several machine learning algorithms. The models that were generated by the algorithms were compared on their performance. The results showed that the recurrent neural network (RNN) performed best, with 96% accuracy. Cross-validation was used to validate the models, where 80% of the data was used for training purposes and 20% was used for testing purposes. Evaluation metrics were used to produce classification reports, which can indicate which of the models works best for this specific problem. The models were programmed in Python.
DOCUMENT
Graphs are ubiquitous. Many graphs, including histograms, bar charts, and stacked dotplots, have proven tricky to interpret. Students’ gaze data can indicate students’ interpretation strategies on these graphs. We therefore explore the question: In what way can machine learning quantify differences in students’ gaze data when interpreting two near-identical histograms with graph tasks in between? Our work provides evidence that using machine learning in conjunction with gaze data can provide insight into how students analyze and interpret graphs. This approach also sheds light on the ways in which students may better understand a graph after first being presented with other graph types, including dotplots. We conclude with a model that can accurately differentiate between the first and second time a student solved near-identical histogram tasks.
DOCUMENT
The current set of research methods on ictresearchmethods.nl contains only one research method that refers to machine learning: the “Data analytics” method in the “Lab” strategy. This does not reflect the way of working in ML projects, where data analytics is not a method to answer one question but the main goal of the project. For ML projects, the Data Analytics method should be divided into several smaller steps, each becoming a method of its own. In other words, we should treat the Data Analytics (or, more appropriately, ML engineering) process the same way the software engineering process is treated in the framework. In the remainder of this post I will briefly discuss each of the existing research methods and how they apply to ML projects. The methods are organized by strategy. In the discussion I will give pointers to relevant tools and literature for ML projects.
LINK
The prevention and diagnosis of frailty syndrome (FS) in cardiac patients requires innovative systems to support medical personnel, patient adherence, and self-care behavior. To do so, modern medicine uses a supervised machine learning (ML) approach to study the psychosocial domains of frailty in cardiac patients with heart failure (HF). This study aimed to determine the absolute and relative diagnostic importance of the individual components of the Tilburg Frailty Indicator (TFI) questionnaire in patients with HF. An exploratory analysis was performed using machine learning algorithms and the permutation method to determine the absolute importance of frailty components in HF. Based on the TFI data, which contain physical and psychosocial components, machine learning models were built on three algorithms: a decision tree, a random forest, and the AdaBoost classifier. The absolute weights were used to make pairwise comparisons between the variables and obtain relative diagnostic importance. The analysis of HF patients’ responses showed that the psychological variable TFI20, diagnosing low mood, was more diagnostically important than the variables from the physical domain: lack of strength in the hands and physical fatigue. The psychological variable TFI21, linked with agitation and irritability, was diagnostically more important than all three physical variables considered: walking difficulties, lack of hand strength, and physical fatigue. In the case of the two remaining variables from the psychological domain (TFI19, TFI22), and for all variables from the social domain, the results do not allow for the rejection of the null hypothesis. From a long-term perspective, the ML-based frailty approach can support healthcare professionals, including psychologists and social workers, in drawing their attention to the nonphysical origins of HF.
DOCUMENT
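The permutation method used in the TFI study can be sketched with scikit-learn's `permutation_importance`: a fitted model's score is re-measured after shuffling one feature at a time, and the score drop is that feature's importance. The synthetic binary "questionnaire items" below are invented stand-ins, not the actual TFI data.

```python
# Sketch of the permutation method for ranking feature importance, as used
# in the TFI study. The synthetic binary "items" are invented stand-ins;
# items 0 and 1 drive the outcome, the rest are noise.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 5))  # five binary questionnaire-style items
y = X[:, 0] | X[:, 1]                  # outcome driven only by items 0 and 1

model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each column 10 times and measure the drop in model score.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"item {i}: importance {imp:.3f}")
```

The pairwise comparisons in the study follow naturally from these absolute weights: one item is "more diagnostically important" than another when its permutation importance is significantly larger.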
Youyou et al. showed that from 70 likes the algorithm could predict the personality better than friends, from 150 likes better than family members, and from 300 likes even better than the test persons themselves. However, the machine learning algorithm does not know the person better than their colleagues, their friends, or the person themselves. The machine can "only", after sufficient "supervised learning" trials (iterations), determine the correlation between click behaviour on Facebook and the scored Big5 factors better than individuals can. Prediction replaces the Big5 questionnaire. But we are not getting closer to the personality of people than with the Big5 questionnaire. It is argued that, although data mining can help enormously, psychology ultimately remains a matter of narrative.
MULTIFILE
BACKGROUND: Approximately 5%-10% of elementary school children show delayed development of fine motor skills. To address these problems, detection is required. Current assessment tools are time-consuming, require a trained supervisor, and are not motivating for children. Sensor-augmented toys and machine learning have been presented as possible solutions to address this problem.OBJECTIVE: This study examines whether sensor-augmented toys can be used to assess children's fine motor skills. The objectives were to (1) predict the outcome of the fine motor skill part of the Movement Assessment Battery for Children Second Edition (fine MABC-2) and (2) study the influence of the classification model, game, type of data, and level of difficulty of the game on the prediction.METHODS: Children in elementary school (n=95, age 7.8 [SD 0.7] years) performed the fine MABC-2 and played 2 games with a sensor-augmented toy called "Futuro Cube." The game "roadrunner" focused on speed while the game "maze" focused on precision. Each game had several levels of difficulty. While playing, both sensor and game data were collected. Four supervised machine learning classifiers were trained with these data to predict the fine MABC-2 outcome: k-nearest neighbor (KNN), logistic regression (LR), decision tree (DT), and support vector machine (SVM). First, we compared the performances of the games and classifiers. Subsequently, we compared the levels of difficulty and types of data for the classifier and game that performed best on accuracy and F1 score. For all statistical tests, we used α=.05.RESULTS: The highest achieved mean accuracy (0.76) was achieved with the DT classifier that was trained on both sensor and game data obtained from playing the easiest and the hardest level of the roadrunner game. Significant differences in performance were found in the accuracy scores between data obtained from the roadrunner and maze games (DT, P=.03; KNN, P=.01; LR, P=.02; SVM, P=.04). 
No significant differences in performance were found in the accuracy scores between the best performing classifier and the other 3 classifiers for both the roadrunner game (DT vs KNN, P=.42; DT vs LR, P=.35; DT vs SVM, P=.08) and the maze game (DT vs KNN, P=.15; DT vs LR, P=.62; DT vs SVM, P=.26). The accuracy of only the best performing level of difficulty (combination of the easiest and hardest level) achieved with the DT classifier trained with sensor and game data obtained from the roadrunner game was significantly better than the combination of the easiest and middle level (P=.046).CONCLUSIONS: The results of our study show that sensor-augmented toys can efficiently predict the fine MABC-2 scores for children in elementary school. Selecting the game type (focusing on speed or precision) and data type (sensor or game data) is more important for determining the performance than selecting the machine learning classifier or level of difficulty.
DOCUMENT
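The four-classifier comparison in the study above (KNN, LR, DT, and SVM, scored on accuracy and F1 against a held-out split) can be sketched as follows. The Futuro Cube sensor and game data is not public, so synthetic features and labels stand in for it; the default scikit-learn hyperparameters are an assumption, not those of the study.

```python
# Sketch of comparing the study's four classifiers (KNN, LR, DT, SVM) on
# accuracy and F1. Synthetic data stands in for the (non-public) Futuro
# Cube sensor/game data; default hyperparameters are an assumption.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))            # stand-in sensor + game features
y = (X[:, 0] - X[:, 2] > 0).astype(int)  # stand-in fine MABC-2 outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

classifiers = {
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(),
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}
scores = {}
for name, clf in classifiers.items():
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    scores[name] = (accuracy_score(y_te, pred), f1_score(y_te, pred))
    print(f"{name}: accuracy={scores[name][0]:.2f}, F1={scores[name][1]:.2f}")
```

On top of such scores, the study applies significance tests (at α=.05) to decide whether one classifier or game type truly outperforms another, rather than relying on the raw accuracy differences alone.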