A small dataset containing 15 high-resolution point clouds of catenary arches is provided. The number of points per arch ranges from 1.6 M to 11 M. The point clouds have been manually labelled into 14 distinct classes.
DOCUMENT
In the rapidly evolving field of machine learning, selecting the most appropriate model for a given dataset is crucial. Understanding the characteristics of a dataset can significantly influence the outcomes of predictive modelling efforts, making the study of dataset properties an essential component of data science. This study investigates the possibilities of using simulated human data for personalized applications, specifically for testing clustering approaches. In particular, it focuses on the relationship between dataset characteristics and the selection of the optimal classification model for clusters of datasets. The results provide critical insights for researchers and practitioners in machine learning, emphasizing the importance of dataset characteristics and variability in building and selecting robust models for diverse data conditions. The use of simulated human data provides valuable insights but requires further refinement to capture the full variability of real-world conditions.
DOCUMENT
This applied research project aims to generate a better understanding of the effects of heatwaves on vulnerable population groups in the municipality of The Hague, and suggests ways in which the municipality can help such groups to cope with these heatwaves. The research was performed as a cooperation between The Hague University of Applied Sciences (THUAS), the International Institute of Social Studies (ISS, Erasmus University Rotterdam) and the International Centre for Frugal Innovation (ICFI, Leiden-Delft-Erasmus Universities). Heatwaves constitute an important yet often overlooked part of climate change, and their impacts qualify as disasters. According to the World Disasters Report 2020, the three heatwaves affecting Belgium, France, Germany, Italy, the Netherlands, Spain, Switzerland and the UK in the summer of 2019 caused 3,453 deaths. 2020 set a new record for the Netherlands: for the first time, a heatwave included five consecutive days on which the temperature reached 35 degrees or more. In addition, 40 degrees was measured for the first time, and periods of tropical days and nights are generally getting longer. Most importantly, this trend is accelerating faster than climate change models predict. The COVID-19 pandemic compounds the effect of heatwaves, as vulnerable individuals may be reluctant to seek cool spaces out of fear of infection. Already in 2006, the Netherlands ranked near the top of the global disaster index due to the number of excess deaths that could be attributed to that year's heatwave. In the same year, the EU published its first climate strategy in which heat is recognised as a priority. In 2008, the Netherlands developed its first national heat plan. The municipality of The Hague has a municipal climate adaptation strategy and developed a draft local heat plan in the summer of 2021, which was published in February 2022.
This research was not meant to be and was not set up as an evaluation of the current heat plan, which has not yet been activated. At the level of municipalities and cities, the concept of urban resilience is key. It refers to “the capacity of individuals, communities, institutions, businesses, and systems within a city to survive, adapt, and grow no matter what kinds of chronic stresses and acute shocks they experience”. Heatwaves clearly constitute acute shocks which are rapidly developing into chronic stresses. In turn, heatwaves also exacerbate the chronic stresses that are already there, i.e. existing chronic stresses also lead to greater impact of a heatwave. In other words, there are negative interaction effects. Addressing these effects requires overcoming the silo approach to urban governance, in which different municipal departments as well as other stakeholders (such as the Red Cross, housing corporations, tenants’ associations, care organisations, entrepreneurs etc.) each address different parts of the problem, rather than doing so in an integrated and inclusive manner. The dataset for this study is archived in DANS Easy: https://doi.org/10.17026/dans-xeb-h8uk
MULTIFILE
STUDY DESIGN: Prospective cohort study. OBJECTIVE: To analyze responsiveness and minimal clinically important change (MCIC) of the US National Institutes of Health (NIH) minimal dataset for chronic low back pain (CLBP). SUMMARY OF BACKGROUND DATA: The NIH minimal dataset is a 40-item questionnaire developed to increase the use of standardized definitions and measures for CLBP. The longitudinal validity of the total minimal dataset and of the Impact Stratification subscale is unknown. METHODS: Total outcome scores on the NIH minimal dataset, Dutch Language Version, were calculated on a 0-100 scale, with higher scores representing worse functioning. Responsiveness and MCIC were determined with an anchor-based method, calculating the area under the receiver operating characteristic (ROC) curve (AUC) and determining the optimal cut-off point. The smallest detectable change (SDC) was calculated as a parameter of measurement error. RESULTS: In total, 223 patients with CLBP were included. The mean total score on the NIH minimal dataset was 44 ± 14 points at baseline. The total outcome score was responsive to change, with an AUC of 0.84. The MCIC was 14 points, with a sensitivity of 72% and a specificity of 82%; the SDC was 23 points. The mean score on Impact Stratification (scale 8-50) was 34.4 ± 7.4 points at baseline, with an AUC of 0.91, an MCIC of 7.5 points with a sensitivity of 96% and a specificity of 78%, and an SDC of 14 points. CONCLUSION: The longitudinal validity of the NIH minimal dataset is adequate. An improvement of 14 points in the total outcome score and of 7.5 points in Impact Stratification can be interpreted as clinically important in individual patients. However, the MCIC depends on baseline values and on the method chosen to determine the optimal cut-off point. Furthermore, the measurement error is larger than the MCIC.
This means that individual change scores should be interpreted with caution. LEVEL OF EVIDENCE: 4. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC-BY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
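The anchor-based analysis described above can be sketched in a few lines: the AUC quantifies how well the change score discriminates improved from non-improved patients (per the anchor), the MCIC is the cut-off maximising sensitivity plus specificity, and the SDC follows from the standard error of measurement. This is a minimal illustration under those standard definitions, not the authors' code; the helper names are hypothetical.

```python
import numpy as np

def roc_auc(change, improved):
    """AUC via the Mann-Whitney rank statistic: the probability that a
    randomly chosen improved patient has a larger change score than a
    randomly chosen non-improved patient (ties count 0.5)."""
    change = np.asarray(change, float)
    improved = np.asarray(improved, bool)
    pos, neg = change[improved], change[~improved]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def mcic_cutoff(change, improved):
    """Optimal cut-off point: the threshold maximising
    sensitivity + specificity - 1 (Youden's index)."""
    change = np.asarray(change, float)
    improved = np.asarray(improved, bool)
    best, best_j = None, -1.0
    for t in np.unique(change):
        sens = np.mean(change[improved] >= t)
        spec = np.mean(change[~improved] < t)
        if sens + spec - 1 > best_j:
            best_j, best = sens + spec - 1, t
    return best

def sdc(baseline, followup_stable):
    """Smallest detectable change: 1.96 * sqrt(2) * SEM, with the SEM
    estimated from the SD of change scores in stable patients."""
    diff = np.asarray(followup_stable, float) - np.asarray(baseline, float)
    sem = diff.std(ddof=1) / np.sqrt(2)
    return 1.96 * np.sqrt(2) * sem
```

As the abstract notes, the resulting MCIC is sensitive to how the cut-off is chosen; replacing Youden's index with, say, the point closest to (0, 1) on the ROC curve can shift the threshold.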
MULTIFILE
With the proliferation of misinformation on the web, automatic methods for detecting misinformation are becoming an increasingly important subject of study. If automatic misinformation detection is applied in a real-world setting, it is necessary to validate the methods being used. Large language models (LLMs) have produced the best results among text-based methods. However, fine-tuning such a model requires a significant amount of training data, which has led to the automatic creation of large-scale misinformation detection datasets. In this paper, we explore the biases present in one such dataset for misinformation detection in English, NELA-GT-2019. We find that models are at least partly learning the stylistic and other features of different news sources rather than the features of unreliable news. Furthermore, we use SHAP to interpret the outputs of a fine-tuned LLM and validate the explanation method using our inherently interpretable baseline. We critically analyze the suitability of SHAP for text applications by comparing the outputs of SHAP to the most important features from our logistic regression models.
DOCUMENT
With the proliferation of misinformation on the web, automatic misinformation detection methods are becoming an increasingly important subject of study. Large language models have produced the best results among content-based methods, which rely on the text of the article rather than on metadata or network features. However, fine-tuning such a model requires significant training data, which has led to the automatic creation of large-scale misinformation detection datasets. In these datasets, articles are not labelled directly. Rather, each news site is labelled for reliability by an established fact-checking organisation, and every article is subsequently assigned the corresponding label based on the reliability score of the news source in question. A recent paper has explored the biases present in one such dataset, NELA-GT-2018, and shown that models are at least partly learning the stylistic and other features of different news sources rather than the features of unreliable news. We confirm a part of their findings. Apart from studying the characteristics and potential biases of the datasets, we also find it important to examine in what way the model architecture influences the results. We therefore explore which text features or combinations of features are learned by models based on contextual word embeddings as opposed to basic bag-of-words models. To elucidate this, we perform extensive error analysis aided by the SHAP post-hoc explanation technique on a debiased portion of the dataset. We validate the explanation technique on our inherently interpretable baseline model.
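The validation step mentioned above rests on a useful property: for a linear model such as logistic regression, the exact Shapley values have a closed form, so SHAP outputs can be checked against the model's own coefficients. A minimal sketch of that identity (assuming independent features; not the paper's code, and the names are illustrative):

```python
import numpy as np

def linear_shap(w, x, X_background):
    """Exact Shapley values of a linear model f(x) = w @ x + b under
    the assumption of independent features: phi_j = w_j * (x_j - E[x_j]).
    The bias b cancels out of the attributions."""
    mu = X_background.mean(axis=0)
    return w * (x - mu)

# Illustration: the attribution ranking can disagree with the coefficient
# ranking, because phi_j also depends on how far x_j sits from its mean.
w = np.array([3.0, -1.0])                   # |coef| ranks feature 0 first
X_bg = np.array([[1.0, 0.0], [1.0, 2.0]])   # background means: [1.0, 1.0]
x = np.array([1.1, 4.0])                    # feature 1 moved much further
phi = linear_shap(w, x, X_bg)               # -> [0.3, -3.0]
```

The efficiency property holds by construction: the attributions sum to f(x) - f(E[x]). This is why a bag-of-words logistic regression makes a good ground truth for judging whether a post-hoc explainer behaves sensibly before trusting it on an opaque fine-tuned LLM.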
DOCUMENT
The paper introduced an automatic score detection model using object detection techniques. The performance of seven models belonging to two different architectural setups was compared. YOLOv8n, YOLOv8s, YOLOv8m, RetinaNet-50, and RetinaNet-101 are single-stage detectors, while Faster R-CNN-50 and Faster R-CNN-101 belong to the two-stage detector category. The dataset was manually captured at the shooting range and expanded by generating more versatile data using Python code. Before the models were trained, the dataset was resized to 640×640 and augmented using the Roboflow API. The trained models were then assessed on the test dataset, and their performance was compared using metrics such as mAP50, mAP50-95, precision, and recall. The results showed that the YOLOv8 models can detect multiple objects with good confidence scores.
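The evaluation metrics named above can be illustrated with a minimal, framework-free sketch (not the paper's Roboflow/YOLO pipeline): predictions are greedily matched to ground-truth boxes by IoU at a threshold of 0.5, and AP is the area under the interpolated precision-recall curve. mAP50 is this AP averaged over classes; mAP50-95 additionally averages over IoU thresholds from 0.5 to 0.95. The snippet assumes a single image and class.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(preds, gts, iou_thr=0.5):
    """AP for one class: preds is a list of (confidence, box),
    gts a list of boxes. Each ground truth matches at most once."""
    preds = sorted(preds, key=lambda p: -p[0])   # highest confidence first
    matched = set()
    tp, fp = np.zeros(len(preds)), np.zeros(len(preds))
    for i, (_, box) in enumerate(preds):
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j not in matched and iou(box, gt) > best:
                best, best_j = iou(box, gt), j
        if best >= iou_thr:
            tp[i] = 1
            matched.add(best_j)
        else:
            fp[i] = 1
    tp_c, fp_c = np.cumsum(tp), np.cumsum(fp)
    recall = tp_c / max(len(gts), 1)
    precision = tp_c / (tp_c + fp_c)
    # all-point interpolation: monotone precision envelope, then integrate
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```

A detection matching one of two ground-truth boxes yields an AP of 0.5: recall saturates at 0.5 and the unreached half of the recall axis contributes zero precision.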
DOCUMENT
In this study we analyze a large dataset of Facebook activities of local restaurants in Amsterdam, Houston, London and New York. Doing so gives broad insights into their Facebook usage and the communication patterns between them and their customers. The dataset is quite rich, and the presented statistics are merely the tip of the iceberg.
DOCUMENT