Completeness of data is vital for decision making and forecasting in Building Management Systems (BMS), as missing data can result in biased decisions down the line. This study creates a guideline for imputing gaps in BMS datasets by comparing four methods: the K-Nearest Neighbour algorithm (KNN), Recurrent Neural Networks (RNN), Hot Deck (HD) and Last Observation Carried Forward (LOCF). The guideline recommends the best method per gap size and scale of measurement. The four selected methods come from various backgrounds and are tested on a real BMS and meteorological dataset. The focus of this paper is not to impute every cell as accurately as possible but to restore the trends in the missing data. Performance is characterised by a set of criteria that allow the user to choose the imputation method best suited to their needs. The criteria are Variance Error (VE) and Root Mean Squared Error (RMSE). VE is given more weight because it evaluates the imputed trend better than RMSE does. From preliminary results, it was concluded that the best K-values for KNN are 5 for the smallest gap and 100 for the larger gaps. Using a genetic algorithm, the best RNN architecture for the purpose of this paper was determined to be one based on Gated Recurrent Units (GRU). The comparison was performed using a training dataset different from the imputation dataset. The results show no consistent link between differences in kurtosis or skewness and imputation performance. The experiment concluded that RNN is best for interval data and HD is best for both nominal and ratio data. No single method was best for all gap sizes, as the best choice depends on the data to be imputed.
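The abstract does not spell out how VE is computed. The sketch below assumes, for illustration only, that VE is the absolute difference between the variance of the imputed gap and the variance of the withheld ground truth, and it uses LOCF on a synthetic BMS-like signal; neither the signal nor the assumed VE definition comes from the paper.

```python
# Minimal sketch of the two evaluation criteria on a synthetic gap.
# Assumptions (not from the paper): VE is taken as the absolute difference
# between the variance of the imputed segment and that of the withheld
# ground truth; LOCF is used as the illustrative imputation method.
import numpy as np

rng = np.random.default_rng(42)

# Synthetic BMS-like signal: a daily cycle plus noise, sampled every 15 minutes.
t = np.arange(0, 7 * 96)                     # one week of 15-minute samples
signal = 20 + 3 * np.sin(2 * np.pi * t / 96) + rng.normal(0, 0.3, t.size)

# Withhold a gap of known values to evaluate the imputation against.
gap = slice(300, 348)                        # a 12-hour gap
truth = signal[gap].copy()
observed = signal.copy()
observed[gap] = np.nan

# LOCF: carry the last observed value forward through the gap.
locf = observed.copy()
last = np.nan
for i in range(locf.size):
    if np.isnan(locf[i]):
        locf[i] = last
    else:
        last = locf[i]

imputed = locf[gap]
rmse = np.sqrt(np.mean((imputed - truth) ** 2))
variance_error = abs(np.var(imputed) - np.var(truth))   # assumed VE definition

print(f"RMSE: {rmse:.3f}")
print(f"Variance Error (assumed definition): {variance_error:.3f}")
```

Because LOCF holds a single value through the gap, the imputed segment has near-zero variance; that loss of trend is exactly what a variance-based criterion penalises more strongly than RMSE alone.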
With the proliferation of misinformation on the web, automatic misinformation detection methods are becoming an increasingly important subject of study. Large language models have produced the best results among content-based methods, which rely on the text of the article rather than on metadata or network features. However, fine-tuning such a model requires significant training data, which has led to the automatic creation of large-scale misinformation detection datasets. In these datasets, articles are not labelled directly. Rather, each news site is labelled for reliability by an established fact-checking organisation, and every article is subsequently assigned the corresponding label based on the reliability score of the news source in question. A recent paper has explored the biases present in one such dataset, NELA-GT-2018, and shown that the models are at least partly learning the stylistic and other features of different news sources rather than the features of unreliable news. We confirm part of their findings. Apart from studying the characteristics and potential biases of the datasets, we also find it important to examine how the model architecture influences the results. We therefore explore which text features or combinations of features are learned by models based on contextual word embeddings as opposed to basic bag-of-words models. To elucidate this, we perform extensive error analysis aided by the SHAP post-hoc explanation technique on a debiased portion of the dataset. We validate the explanation technique on our inherently interpretable baseline model.
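The abstract does not describe the baseline model or the exact SHAP configuration. The sketch below is a minimal, hypothetical illustration of applying SHAP to a bag-of-words classifier: it uses a tiny invented corpus, TF-IDF features and logistic regression, not the paper's actual pipeline or data.

```python
# Hypothetical illustration of SHAP on a bag-of-words baseline.
# The corpus, labels and model choice are invented for this sketch.
import numpy as np
import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "shocking miracle cure that doctors do not want you to know",
    "government report summarises quarterly economic indicators",
    "you will not believe what this celebrity said about vaccines",
    "city council approves budget for new public transport line",
]
labels = np.array([1, 0, 1, 0])  # 1 = unreliable source, 0 = reliable source

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs).toarray()

model = LogisticRegression().fit(X, labels)

# Explain the linear model's outputs with respect to the TF-IDF features.
explainer = shap.LinearExplainer(model, X)
shap_values = explainer.shap_values(X)

# Rank the tokens that most strongly push one document towards either label.
feature_names = vectorizer.get_feature_names_out()
doc_idx = 0
top = np.argsort(-np.abs(shap_values[doc_idx]))[:5]
for i in top:
    print(f"{feature_names[i]:>12s}  shap={shap_values[doc_idx][i]:+.3f}")
```

On a model like this, per-token SHAP values directly expose which surface features the classifier relies on, which is the kind of source-style signal the bias and error analysis described above is meant to surface.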
Receiving the first “Rijbewijs” (driving licence) is always an exciting moment for any teenager, but it also comes with considerable risks. In the Netherlands, the fatality rate of young novice drivers is five times higher than that of drivers between the ages of 30 and 59 years. These risks are mainly due to age-related factors and a lack of experience, which manifest in inadequate higher-order skills required for hazard perception and for successful interventions in response to risks on the road. Although risk assessment and driving attitude are included in the driver training and examination process, accident statistics show that this has only limited influence on developmental factors such as attitudes, motivations, lifestyles, self-assessment and risk acceptance, which play a significant role in post-licensing driving. This negatively impacts traffic safety. “How could novice drivers receive critical feedback on their driving behaviour and traffic safety?” is, therefore, an important question. Due to major advancements in domains such as ICT, sensors, big data, and Artificial Intelligence (AI), in-vehicle data is being extensively used for monitoring driver behaviour, driving style identification and driver modelling. However, the use of such techniques in pre-licence driver training and assessment has not been extensively explored. EIDETIC aims to develop a novel approach by fusing multiple data sources such as in-vehicle sensors/data (to trace the vehicle trajectory), eye-tracking glasses (to monitor viewing behaviour) and cameras (to monitor the surroundings) to provide quantifiable and understandable feedback to novice drivers. Furthermore, this new knowledge could also support driving instructors and examiners in ensuring safe drivers. This project will also generate the knowledge needed as a foundation for facilitating the transition to training and assessment for drivers of automated vehicles.
Moderating reader comments under news articles is very labour-intensive. With the help of artificial intelligence, moderation becomes feasible at a reasonable cost. Since every application of artificial intelligence must be fair and transparent, it is important to investigate how media organisations can meet these requirements.
Goal: This PhD project focuses on the fairness, accountability and transparency of algorithmic systems for moderating reader comments. It provides a theoretical framework and actionable measures that will support news organisations in complying with recent policy-making on a value-driven implementation of AI. Now that more and more news media are starting to use AI, they need to incorporate fairness, accountability and transparency in their use of algorithms into their working practices.

Results: Although moderation with AI is very attractive from an economic point of view, news media need to know how to reduce inaccuracy and bias (fairness), disclose how their AI works (accountability) and enable users to understand how decisions are made by AI (transparency). This dissertation advances knowledge on these topics.

Duration: 1 February 2022 - 1 February 2025

Approach: The central research question of this PhD project is: how can and should news media ensure fairness, accountability and transparency in their use of algorithms for comment moderation? To answer this question, the research is split into four sub-questions. How do news media use algorithms for moderating comments? What can news media do to reduce inaccuracy and bias in AI-based comment moderation? What should news media disclose about their use of AI-based moderation? What makes explanations of AI-based moderation understandable for users with different levels of digital competence?