Analyzing historical decision-related data can help support actual operational decision-making processes. Decision mining can be employed for such analysis. This paper proposes the Decision Discovery Framework (DDF), designed to develop, adapt, or select a decision discovery algorithm by outlining specific guidelines for input data usage, classifier handling, and decision model representation. The framework incorporates the use of Decision Model and Notation (DMN) for enhanced comprehensibility, and normalization to simplify decision tables. The framework's efficacy was tested by adapting the C4.5 algorithm into the DM45 algorithm. The proposed adaptations include (1) the utilization of a decision log, (2) the generation of an unpruned decision tree, (3) the generation of a DMN model, and (4) the normalization of the decision table. Future research can focus on supporting practitioners in modeling decisions, ensuring their decision-making is compliant, and suggesting improvements to the modeled decisions. Another future research direction is to explore the ability to process unstructured data as input for the discovery of decisions.
MULTIFILE
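The abstract above leaves the tree-growing step abstract; below is a minimal sketch of adaptation (2), assuming scikit-learn (whose CART implementation stands in for C4.5, which scikit-learn does not ship) and a hypothetical two-column decision log.

```python
# Minimal sketch: grow an UNPRUNED decision tree over a decision log.
# scikit-learn's CART stands in for C4.5 here; the column names and
# values are hypothetical, not taken from the paper.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Decision log: one row per historical decision, inputs plus outcome.
log = pd.DataFrame({
    "claim_amount": [120, 4500, 300, 9800, 75],
    "customer_years": [1, 7, 3, 10, 0],
    "decision": ["approve", "review", "approve", "review", "reject"],
})

# Leaving max_depth and ccp_alpha at their defaults keeps the tree
# unpruned, so every rule observed in the log survives into the model.
tree = DecisionTreeClassifier(criterion="entropy")  # entropy-based splits, closest to C4.5
tree.fit(log[["claim_amount", "customer_years"]], log["decision"])

# Inspect the rules; each root-to-leaf path is one candidate rule.
print(export_text(tree, feature_names=["claim_amount", "customer_years"]))
```

In a DMN setting, each root-to-leaf path of the printed tree would become one row of a decision table, which normalization would then simplify; this mapping is an assumption of the sketch, not the paper's exact procedure.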
Recent years have seen a massive growth in ethical and legal frameworks to govern data science practices. Yet one of the core questions associated with such frameworks is the extent to which they are implemented in practice. A particularly interesting case in this context concerns public officials, for whom higher standards typically apply. We therefore aim to understand how ethical and legal frameworks influence the everyday practices on data and algorithms of public sector data professionals. This paper examines two cases: public sector data professionals (1) at municipalities in the Netherlands and (2) at the Netherlands Police. We compare these two cases based on an analytical research framework, developed in this article, that helps understand everyday professional practices. We conclude that there is a wide gap between legal and ethical governance rules and everyday practice.
MULTIFILE
Completeness of data is vital for decision making and forecasting in Building Management Systems (BMS), as missing data can result in biased decision making down the line. This study creates a guideline for imputing the gaps in BMS datasets by comparing four methods: the K Nearest Neighbour algorithm (KNN), Recurrent Neural Network (RNN), Hot Deck (HD) and Last Observation Carried Forward (LOCF). The guideline contains the best method per gap size and scale of measurement. The four selected methods are from various backgrounds and are tested on a real BMS and meteorological dataset. The focus of this paper is not to impute every cell as accurately as possible but to impute trends back into the missing data. The performance is characterised by a set of criteria that allow the user to choose the imputation method best suited to their needs. The criteria are Variance Error (VE) and Root Mean Squared Error (RMSE). VE has been given more weight, as it evaluates the imputed trend better than RMSE does. From preliminary results, it was concluded that the best K-values for KNN are 5 for the smallest gap and 100 for the larger gaps. Using a genetic algorithm, the best RNN architecture for the purpose of this paper was determined to be Gated Recurrent Units (GRU). The comparison was performed using a different training dataset than the imputation dataset. The results show no consistent link between the difference in kurtosis or skewness and imputation performance. The experiment concluded that RNN is best for interval data and HD is best for both nominal and ratio data. No single method was best for all gap sizes, as this depended on the data to be imputed.
MULTIFILE
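To make the comparison concrete, here is a minimal sketch of the evaluation loop the abstract above describes, assuming scikit-learn and pandas, a synthetic series in place of the BMS dataset, and VE taken as the absolute difference between the variance of imputed and true values over the gap (an assumed definition; the paper's exact VE formula is not given here).

```python
# Minimal sketch: impute an artificial gap with KNN and LOCF, then score
# both on RMSE and Variance Error (VE). Series and gap are synthetic.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
true = pd.Series(20 + np.sin(np.arange(200) / 10) + rng.normal(0, 0.1, 200))
gap = slice(80, 120)                      # one contiguous gap
observed = true.copy()
observed[gap] = np.nan

# LOCF: carry the last observation forward across the gap.
locf = observed.ffill()

# KNN: a time index gives the imputer something to measure distance on;
# k=5 was the paper's best value for the smallest gap.
frame = pd.DataFrame({"t": np.arange(len(observed)), "value": observed})
knn = pd.Series(KNNImputer(n_neighbors=5).fit_transform(frame)[:, 1])

for name, imputed in [("LOCF", locf), ("KNN", knn)]:
    err = imputed[gap] - true[gap]
    rmse = np.sqrt((err ** 2).mean())
    ve = abs(imputed[gap].var() - true[gap].var())  # trend-oriented criterion (assumed form)
    print(f"{name}: RMSE={rmse:.3f}  VE={ve:.3f}")
```

LOCF produces a flat segment with zero variance, so its VE is large even when its RMSE looks acceptable, which illustrates why the study weights VE more heavily.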
Abstract Background: COVID-19 was first identified in December 2019 in the city of Wuhan, China. The virus quickly spread and was declared a pandemic on March 11, 2020. After infection, symptoms such as fever, a (dry) cough, nasal congestion, and fatigue can develop. In some cases, the virus causes severe complications such as pneumonia and dyspnea and can result in death. The virus also spread rapidly in the Netherlands, a small and densely populated country with an aging population. Health care in the Netherlands is of a high standard, but there were nevertheless problems with hospital capacity, such as the number of available beds and staff. There were also regions and municipalities that were hit harder than others. In the Netherlands, important data sources are available for daily COVID-19 numbers and information about municipalities. Objective: We aimed to predict the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants per municipality in the Netherlands, using a data set with the properties of 355 municipalities in the Netherlands and advanced modeling techniques. Methods: We collected relevant static data per municipality from data sources that were available in the Dutch public domain and merged these data with the dynamic daily number of infections from January 1, 2020, to May 9, 2021, resulting in a data set with 355 municipalities in the Netherlands and variables grouped into 20 topics. The modeling techniques random forest and multiple fractional polynomials were used to construct a prediction model for the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants per municipality in the Netherlands. Results: The final prediction model had an R² of 0.63. Important properties for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants in a municipality in the Netherlands were exposure to particulate matter with diameters <10 μm (PM10) in the air, the percentage of Labour party voters, and the number of children in a household. Conclusions: Relating municipality properties to the cumulative number of confirmed infections can reveal which properties of a municipality matter most for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants. This insight can provide policy makers with tools to cope with COVID-19 and may also be of value in the event of a future pandemic, so that municipalities are better prepared.
LINK
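A minimal sketch of the random-forest part of the methods above, assuming scikit-learn; the column names echo the predictors the abstract highlights (PM10 exposure, Labour vote share, children per household) but are hypothetical, and the data are synthetic, not the 355-municipality dataset.

```python
# Minimal sketch: regress cumulative confirmed infections per 10,000
# inhabitants on static municipality properties and report R^2.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n = 355  # one row per municipality, as in the study
municipalities = pd.DataFrame({
    "pm10_exposure": rng.normal(18, 3, n),        # μg/m^3 PM10, synthetic
    "labour_vote_pct": rng.uniform(2, 20, n),
    "children_per_household": rng.uniform(0.5, 2.5, n),
})
# Synthetic stand-in for infections per 10,000 inhabitants.
target = (12 * municipalities["pm10_exposure"]
          + 5 * municipalities["children_per_household"]
          + rng.normal(0, 40, n))

X_train, X_test, y_train, y_test = train_test_split(
    municipalities, target, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_train, y_train)
print("R^2:", round(r2_score(y_test, model.predict(X_test)), 2))

# Feature importances indicate which municipality properties drive the
# prediction, analogous to the abstract's PM10 / voting / household findings.
for name, imp in zip(municipalities.columns, model.feature_importances_):
    print(f"{name}: {imp:.2f}")
```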
This report from Inholland University examines the impact of data-driven practices on non-journalistic media production and the creative industries. It explores trends, showcases advancements, and highlights opportunities and threats in this dynamic landscape. By examining various stakeholders' perspectives, it provides actionable insights for navigating challenges and leveraging opportunities. Through curated showcases and analyses, the report underscores the transformative potential of data-driven work while addressing concerns such as copyright issues and AI's role in replacing human artists. The findings culminate in a comprehensive overview that guides informed decision-making in the creative industry.
MULTIFILE
Through artistic interventions into the computational backbone of maternity services, the artists behind the Body Recovery Unit explore data production and its uses in healthcare governance. Taking their artwork The National Catalogue Of Savings Opportunities. Maternity, Volume 1: London (2017) as a case study, they explore how artists working with ‘live’ computational culture might draw from critical theory, Science and Technology Studies, as well as feminist strategies within arts-led enquiry. This paper examines the mechanisms through which maternal bodies are rendered visible or invisible to managerial scrutiny, by exploring the interlocking elements of commissioning structures, nationwide information standards and databases in tandem with everyday maternity healthcare practices on the wards in the UK. The work provides a new context for understanding how the re-prioritisation of ‘natural’ and ‘normal’ births, breastfeeding, skin-to-skin contact, age of conception and other factors is gaining momentum in sync with cost-reduction initiatives, funding cuts and privatisation of healthcare services.
MULTIFILE
The continuation of emotional abuse as a normalized practice in elite youth sport has received scholarly attention, often with the use of a Foucauldian framework. The use of sense-making, a theoretical framework that focuses on how meaning is created in ambiguous situations, may give additional insights into the continuation of emotionally abusive coaching practices. The purpose of this study was to apply the seven properties of sense-making to explore how athletes and parents made sense of coaching practices in elite women’s gymnastics. We interviewed 14 elite women gymnasts and their parents to examine how they made sense of what occurred during practices. The results show how the sense-making of athletes and parents was an ongoing activity that resulted in a code of silence and a normalization of abusive coaching practices.
MULTIFILE
Background: Adverse outcome pathway (AOP) networks are versatile tools in toxicology and risk assessment that capture and visualize mechanisms driving toxicity originating from various data sources. They share a common structure consisting of a set of molecular initiating events and key events, connected by key event relationships, leading to the actual adverse outcome. AOP networks are to be considered living documents that should be frequently updated by feeding in new data. Such iterative optimization exercises are typically done manually, which not only is a time-consuming effort but also bears the risk of overlooking critical data. The present study introduces a novel approach for AOP network optimization of a previously published AOP network on chemical-induced cholestasis, using artificial intelligence to facilitate automated data collection followed by subsequent quantitative confidence assessment of molecular initiating events, key events, and key event relationships. Methods: Artificial intelligence-assisted data collection was performed by means of the free web platform Sysrev. Confidence levels of the tailored Bradford-Hill criteria were quantified for the purpose of weight-of-evidence assessment of the optimized AOP network. Scores were calculated for biological plausibility, empirical evidence, and essentiality, and were integrated into a total key event relationship confidence value. The optimized AOP network was visualized using Cytoscape, with the node size representing the incidence of the key event and the edge size indicating the total confidence in the key event relationship. Results: This resulted in the identification of 38 unique key events and 135 unique key event relationships. Transporter changes was the key event with the highest incidence, and it formed the most confident key event relationship with the adverse outcome, cholestasis. Other important key events present in the AOP network include nuclear receptor changes, intracellular bile acid accumulation, bile acid synthesis changes, oxidative stress, inflammation, and apoptosis. Conclusions: This process led to the creation of an extensively informative AOP network focused on chemical-induced cholestasis. This optimized AOP network may serve as a mechanistic compass for the development of a battery of in vitro assays to reliably predict chemical-induced cholestatic injury.
DOCUMENT
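A minimal sketch of the weight-of-evidence bookkeeping described above, using networkx in place of Cytoscape; integrating the three Bradford-Hill scores as a plain mean is an assumption (the study defines its own tailored scheme), and all numeric values are illustrative.

```python
# Minimal sketch: an AOP network as a directed graph, with key-event
# incidence on nodes and total KER confidence on edges.
import networkx as nx

aop = nx.DiGraph()
# (key event, incidence across collected studies) -- illustrative values.
for ke, incidence in [("transporter changes", 40),
                      ("intracellular bile acid accumulation", 25),
                      ("cholestasis (adverse outcome)", 30)]:
    aop.add_node(ke, incidence=incidence)

def total_confidence(plausibility, evidence, essentiality):
    """Integrate the three tailored Bradford-Hill scores (each 0-1).
    A plain mean is an assumed integration rule, not the paper's."""
    return (plausibility + evidence + essentiality) / 3

aop.add_edge("transporter changes", "intracellular bile acid accumulation",
             confidence=total_confidence(0.9, 0.7, 0.8))
aop.add_edge("transporter changes", "cholestasis (adverse outcome)",
             confidence=total_confidence(1.0, 0.9, 0.9))
aop.add_edge("intracellular bile acid accumulation",
             "cholestasis (adverse outcome)",
             confidence=total_confidence(0.8, 0.6, 0.7))

# Cytoscape-style rendering would map node size to incidence and edge
# width to confidence; here we simply list the weighted edges.
for u, v, d in aop.edges(data=True):
    print(f"{u} -> {v}: total KER confidence {d['confidence']:.2f}")
```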
Green data centres are the topic of the day. But who is in fact involved in developing green data centres? What is their contribution? And what does this contribution constitute in practical terms? This article identifies which stakeholders are involved in green data centres in the Netherlands, what their involvement is, and what effect their involvement has. The article starts by defining sustainability and by determining the stakeholders and their possibilities in this field. Next, we examine the actual impact of each stakeholder in arriving at greener data centres. This leads to a number of conclusions for achieving a greater degree of sustainability.
DOCUMENT