Citizens regularly search the Web to make informed decisions on daily life questions, like online purchases, but how they reason with the results is unknown. This reasoning involves engaging with data in ways that require statistical literacy, which is crucial for navigating contemporary data. However, many adults struggle to critically evaluate and interpret such data and make data-informed decisions. Existing literature provides limited insight into how citizens engage with web-sourced information. We investigated: How do adults reason statistically with web-search results to answer daily life questions? In this case study, we observed and interviewed three vocationally educated adults searching for products or mortgages. Unlike data producers, consumers handle pre-existing, often ambiguous data with unclear populations and no single dataset. Participants encountered unstructured (web links) and structured data (prices). We analysed their reasoning and the process of preparing data, which is part of data-ing. Key data-ing actions included judging relevance and trustworthiness of the data and using proxy variables when relevant data were missing (e.g., price for product quality). Participants’ statistical reasoning was mainly informal. For example, they reasoned about association but did not calculate a measure of it, nor assess underlying distributions. This study theoretically contributes to understanding data-ing and why contemporary data may necessitate updating the investigative cycle. As current education focuses mainly on producers’ tasks, we advocate including consumers’ tasks by using authentic contexts (e.g., music, environment, deferred payment) to promote data exploration, informal statistical reasoning, and critical web-search skills—including selecting and filtering information, identifying bias, and evaluating sources.
LINK
Poster presented at the 14th Congress of the European Society for Research in Mathematics Education, Free University of Bozen-Bolzano, Italy.
DOCUMENT
Background: COVID-19 was first identified in December 2019 in the city of Wuhan, China. The virus quickly spread and was declared a pandemic on March 11, 2020. After infection, symptoms such as fever, a (dry) cough, nasal congestion, and fatigue can develop. In some cases, the virus causes severe complications such as pneumonia and dyspnea and could result in death. The virus also spread rapidly in the Netherlands, a small and densely populated country with an aging population. Health care in the Netherlands is of a high standard, but there were nevertheless problems with hospital capacity, such as the number of available beds and staff. There were also regions and municipalities that were hit harder than others. In the Netherlands, there are important data sources available for daily COVID-19 numbers and information about municipalities. Objective: We aimed to predict the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants per municipality in the Netherlands, using a data set with the properties of 355 municipalities in the Netherlands and advanced modeling techniques. Methods: We collected relevant static data per municipality from data sources that were available in the Dutch public domain and merged these data with the dynamic daily number of infections from January 1, 2020, to May 9, 2021, resulting in a data set with 355 municipalities in the Netherlands and variables grouped into 20 topics. The modeling techniques random forest and multiple fractional polynomials were used to construct a prediction model for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants per municipality in the Netherlands. Results: The final prediction model had an R² of 0.63. Important properties for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants in a municipality in the Netherlands were exposure to particulate matter with diameters <10 μm (PM10) in the air, the percentage of Labour party voters, and the number of children in a household. Conclusions: Data about municipality properties in relation to the cumulative number of confirmed infections in a municipality in the Netherlands can give insight into the most important properties of a municipality for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants in a municipality. This insight can provide policy makers with tools to cope with COVID-19 and may also be of value in the event of a future pandemic, so that municipalities are better prepared.
LINK
Graphs are ubiquitous. Many graphs, including histograms, bar charts, and stacked dotplots, have proven tricky to interpret. Students’ gaze data can indicate students’ interpretation strategies on these graphs. We therefore explore the question: In what way can machine learning quantify differences in students’ gaze data when interpreting two near-identical histograms with graph tasks in between? Our work provides evidence that using machine learning in conjunction with gaze data can provide insight into how students analyze and interpret graphs. This approach also sheds light on the ways in which students may better understand a graph after first being presented with other graph types, including dotplots. We conclude with a model that can accurately differentiate between the first and second time a student solved near-identical histogram tasks.
DOCUMENT
Despite the numerous business benefits of data science, the number of data science models in production is limited. Data science model deployment presents many challenges, and many organisations have little model deployment knowledge. This research studied five model deployments in a Dutch government organisation. The study revealed four effects of model deployment: a data science subprocess is added to the target business process, the model itself can be adapted, model maintenance is incorporated into the model development process, and a feedback loop is established between the target business process and the model development process. These model deployment effects and the related deployment challenges differ between strategic and operational target business processes. Based on these findings, guidelines are formulated that can form a basis for future principles on how to deploy data science models successfully. Organisations can use these guidelines as suggestions to solve their own model deployment challenges.
DOCUMENT
Gaze data are still uncommon in statistics education despite their promise. Gaze data provide teachers and researchers with a new window into complex cognitive processes. This article discusses how gaze data can inform and be used by teachers both for their own teaching practice and with students. With our own eye-tracking research as an example, background information on eye-tracking and possible applications of eye-tracking in statistics education is provided. Teachers indicated that our eye-tracking research created awareness of the difficulties students have when interpreting histograms. Gaze data showed details of students' strategies that neither teachers nor students were aware of. With this discussion paper, we hope to contribute to the future usage and implementation of gaze data in statistics education by teachers, researchers, educational and textbook designers, and students.
LINK
Terms like ‘big data’, ‘data science’, and ‘data visualisation’ have become buzzwords in recent years and are increasingly intertwined with journalism. Data visualisation may further blur the lines between science communication and graphic design. Our study is situated in these overlaps to compare the design of data visualisations in science news stories across four online news media platforms in South Africa and the United States. Our study contributes to an understanding of how well-considered data visualisations are tools for effective storytelling, and offers practical recommendations for using data visualisation in science communication efforts.
LINK
Big data analytics has received much attention in the last decade and is viewed as one of the next most important strategic resources for organizations. Yet, the role of employees' data literacy seems to be neglected in current literature. The aim of this study is twofold: (1) it develops data literacy as an organizational competency by identifying its dimensions and measurement, and (2) it examines the relationship between data literacy and governmental performance (internal and external). Using data from a survey of 120 Dutch governmental agencies, the proposed model was tested using PLS-SEM. The results empirically support the suggested theoretical framework and corresponding measurement instrument. The results partially support the relationship between data literacy and performance: data literacy has a significant effect on internal performance. However, counter-intuitively, this significant effect is not found in relation to external performance.
MULTIFILE
Although governments are investing heavily in big data analytics, reports show mixed results in terms of performance. Whilst big data analytics capability has provided a valuable lens in business and seems useful for the public sector, there is little knowledge of its relationship with governmental performance. This study aims to explain how big data analytics capability leads to governmental performance. Using a survey research methodology, an integrated conceptual model is proposed highlighting a comprehensive set of big data analytics resources influencing governmental performance. The conceptual model was developed based on prior literature. Using a PLS-SEM approach, the results strongly support the posited hypotheses. Big data analytics capability has a strong impact on governmental efficiency, effectiveness, and fairness. The findings of this paper confirmed the imperative role of big data analytics capability in governmental performance in the public sector, which earlier studies found in the private sector. This study also validated measures of governmental performance.
MULTIFILE
Many students persistently misinterpret histograms. This calls for closer inspection of students’ strategies when interpreting histograms and case-value plots (which look similar but are different). Using students’ gaze data, we ask: How and how well do upper secondary pre-university school students estimate and compare arithmetic means of histograms and case-value plots? We designed four item types: two requiring mean estimation and two requiring means comparison. Analysis of gaze data of 50 students (15–19 years old) solving these items was triangulated with data from cued recall. We found five strategies. The two strategies hypothesized to be most common for estimating means were confirmed: a strategy associated with horizontal gazes and a strategy associated with vertical gazes. A third, new, count-and-compute strategy was found. Two more strategies emerged for comparing means that take specific features of the distribution into account. In about half of the histogram tasks, students used correct strategies. Surprisingly, when comparing two case-value plots, some students used distribution features that are only relevant for histograms, such as symmetry. As several incorrect strategies related to how and where the data and the distribution of these data are depicted in histograms, future interventions should aim at supporting students in understanding these concepts in histograms. A methodological advantage of eye-tracking data collection is that it reveals more details about students’ problem-solving processes than thinking-aloud protocols. We speculate that spatial gaze data can be re-used to substantiate ideas about the sensorimotor origin of learning mathematics.
LINK