Agent-based modeling (ABM) is a powerful tool for simulating building users’ dynamic behavior in demand response (DR) programs. However, ABM faces several challenges, particularly in encoding building users’ natural language features and common sense into rules or mathematical equations. To overcome these limitations, this paper proposes an agent framework based on large language models (LLMs) to simulate building users’ air-conditioning setpoint adjustment behavior under DR. This framework leverages LLMs’ natural language processing capabilities to replicate building users’ reasoning and decision-making processes. It consists of five modules: persona, perception, decision, reflection, and memory. Agents are assigned diverse personas through natural language descriptions based on empirical survey data. LLMs drive agents to reason and make decisions based on incentive prices and historical experiences. The results show that the LLM-based agent has common sense derived from natural language-defined personas and exhibits human-like irrational characteristics, demonstrating the feasibility of replacing rules with natural language in ABM. The LLM-based agent can more effectively model hard-to-parameterize human features and provide decision explanations through LLM outputs. The results also show that the inclusion of reflection and memory modules enables the agent to learn from previous decisions and reduce unreasonable choices.
MULTIFILE
DOCUMENT
With the proliferation of misinformation on the web, automatic misinformation detection methods are becoming an increasingly important subject of study. Large language models have produced the best results among content-based methods, which rely on the text of the article rather than the metadata or network features. However, finetuning such a model requires significant training data, which has led to the automatic creation of large-scale misinformation detection datasets. In these datasets, articles are not labelled directly. Rather, each news site is labelled for reliability by an established fact-checking organisation and every article is subsequently assigned the corresponding label based on the reliability score of the news source in question. A recent paper has explored the biases present in one such dataset, NELA-GT-2018, and shown that the models are at least partly learning the stylistic and other features of different news sources rather than the features of unreliable news. We confirm a part of their findings. Apart from studying the characteristics and potential biases of the datasets, we also find it important to examine in what way the model architecture influences the results. We therefore explore which text features or combinations of features are learned by models based on contextual word embeddings as opposed to basic bag-of-words models. To elucidate this, we perform extensive error analysis aided by the SHAP post-hoc explanation technique on a debiased portion of the dataset. We validate the explanation technique on our inherently interpretable baseline model.
DOCUMENT
This paper presents a generative large language model (LLM)-guided approach to detect discursive patterns in Dutch social media. Newsrooms of municipalities and public organizations follow public debate on social media to be aware of and prepare for local and global issues and rumours. The onset of these issues and rumours is currently detected by communication specialists in newsrooms. Using discourse analysis, we can ground their findings in theory. Devices from discursive psychology, such as emotional evaluations, are the lowest-level components that can help spot and understand issues [1]. Thus, a rule-based NLP approach was developed to highlight these devices in a learning environment. As a next step, we compare the rule-based approach to a large language modeling approach in order to assess the risks and benefits of both methods. We analyze the detection of two discursive patterns in Dutch tweets: magnifying (exaggerated) language use and assigning negative labels to persons or organizations. We compare the generated responses from two conversational LLMs finetuned on the Dutch language - Geitje-7B-Ultra and Fietje-2-Chat - and a rule-based NLP baseline using a two-fold evaluation process. The results show mixed performance, with the highest-performing LLM setups yielding an accuracy of 64% for the magnifying language category and 73% for the negative labels to organizations/persons category. In comparison, the rule-based algorithm achieves an accuracy of 68% for both categories. Although the LLMs perform well in precision, they frequently find patterns in examples where no discursive markers were annotated. Moreover, the rationale analysis shows relatively poor results, owing to multiple factors including model size and interpretation of instructions. The results indicate that although there is merit in conducting discursive analysis using generative language models, it comes with the above risks.
Recommendations for future work include combining language models with the rule-based setup for more robust detection, as well as further refinement of the guidelines to improve the reasoning process.
DOCUMENT
Post-training quantization reduces the computational demand of Large Language Models (LLMs) but can weaken some of their capabilities. Since LLM abilities emerge with scale, smaller LLMs are more sensitive to quantization. In this paper, we explore how quantization affects smaller LLMs’ ability to perform retrieval-augmented generation (RAG), specifically in longer contexts. We chose personalization for evaluation because it is a challenging domain to perform using RAG as it requires long-context reasoning over multiple documents. We compare the original FP16 and the quantized INT4 performance of multiple 7B and 8B LLMs on two tasks while progressively increasing the number of retrieved documents to test how quantized models fare against longer contexts. To better understand the effect of retrieval, we evaluate three retrieval models in our experiments. Our findings reveal that if a 7B LLM performs the task well, quantization does not impair its performance and long-context reasoning capabilities. We conclude that it is possible to utilize RAG with quantized smaller LLMs.
MULTIFILE
Psychologists, psycholinguists, and other researchers using language stimuli have been struggling for more than 30 years with the problem of how to analyze experimental data that contain two crossed random effects (items and participants). The classical analysis of variance does not apply; alternatives have been proposed but have failed to catch on, and a statistically unsatisfactory procedure of using two approximations (known as F1 and F2) has become the standard. A simple and elegant solution using mixed-model analysis has been available for 15 years, and recent improvements in statistical software have made mixed-model analysis widely available. The aim of this article is to increase the use of mixed models by giving a concise practical introduction and by giving clear directions for undertaking the analysis in the most popular statistical packages. The article also introduces the djmixed add-on package for SPSS, which makes entering the models and reporting their results as straightforward as possible.
MULTIFILE
This final installment in our e-learning series offers a comprehensive look at the current impact and future potential of data science across industries. Using real-world examples like medical image analysis and operational efficiencies at Rotterdam The Hague Airport, we showcase data science’s transformative capabilities. The video also introduces the promise of Large Language Models (LLMs) such as ChatGPT and the simplification brought by Automated Machine Learning (AutoML). Emphasizing the blend of technology and human insight, we explore the evolving landscape of AI and data science for businesses.
VIDEO