Among other things, learning to write entails learning how to use complex sentences effectively in discourse. Some research has therefore focused on relating measures of syntactic complexity to text quality. However, the findings of this research appear inconclusive, and most of it has been conducted in English L1 contexts. This is potentially problematic, since the syntactic indices that are relevant to text quality may differ across languages. The current study is the first to explore which syntactic features predict text quality in Dutch secondary school students’ argumentative writing. To this end, we rated the quality of 125 argumentative essays written by students and analyzed the syntactic features of the texts. We then used a multilevel regression analysis to investigate which features contribute to text quality. The resulting model, which explains 14.5% of the variance in text quality, shows that the relative number of finite clauses and the ratio of relative clauses to finite clauses both positively predict text quality. Discrepancies between our findings and those of previous studies indicate that the relations between syntactic features and text quality may vary with factors such as language and genre. Additional (cross-linguistic) research is needed to gain a more complete understanding of how syntactic constructions relate to text quality and of the potential moderating role of language and genre.
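The two predictors in the final model are simple ratios over clause counts. The sketch below shows how they could be computed once clauses have been counted for an essay. It is a minimal illustration, not the study's procedure. The `ClauseCounts` structure and both function names are hypothetical. The abstract also does not say what the "relative number of finite clauses" is normalised by, so the share of finite clauses among all clauses is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClauseCounts:
    """Clause counts for one essay (hypothetical structure; in practice the
    counts would come from manual annotation or a syntactic parser)."""
    finite: int      # clauses headed by a finite verb
    non_finite: int  # e.g. infinitival or participial clauses
    relative: int    # relative clauses, assumed here to be counted among the finite ones


def finite_clause_proportion(c: ClauseCounts) -> float:
    """Share of finite clauses among all clauses.

    ASSUMPTION: normalising by the total number of clauses; the study may
    normalise differently (e.g. per sentence or per word)."""
    total = c.finite + c.non_finite
    return c.finite / total if total else 0.0


def relative_to_finite_ratio(c: ClauseCounts) -> float:
    """Number of relative clauses divided by the number of finite clauses."""
    return c.relative / c.finite if c.finite else 0.0


essay = ClauseCounts(finite=20, non_finite=5, relative=4)
print(finite_clause_proportion(essay))  # 0.8
print(relative_to_finite_ratio(essay))  # 0.2
```

In a multilevel analysis, indices like these would be computed per essay and entered as predictors, with essays nested in higher-level units such as students or classes.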
With the proliferation of misinformation on the web, automatic misinformation detection methods are becoming an increasingly important subject of study. Among content-based methods, which rely on the text of an article rather than on metadata or network features, large language models have produced the best results. However, finetuning such a model requires substantial training data, which has led to the automatic creation of large-scale misinformation detection datasets. In these datasets, articles are not labelled directly. Rather, each news site is labelled for reliability by an established fact-checking organisation, and every article is then assigned the label corresponding to the reliability score of its source. A recent paper explored the biases present in one such dataset, NELA-GT-2018, and showed that models trained on it at least partly learn the stylistic and other features of individual news sources rather than the features of unreliable news. We confirm part of their findings. Beyond studying the characteristics and potential biases of the datasets, we consider it important to examine how the model architecture influences the results. We therefore explore which text features, or combinations of features, are learned by models based on contextual word embeddings as opposed to basic bag-of-words models. To this end, we perform an extensive error analysis, aided by the SHAP post-hoc explanation technique, on a debiased portion of the dataset. We validate the explanation technique on our inherently interpretable baseline model.
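The source-level labelling scheme can be illustrated with a short sketch. Each article inherits the reliability label of the site that published it. The site names, labels and the `propagate_labels` function are hypothetical placeholders. The step that maps a fact-checker's reliability score to a label is omitted, and labels are simply given per source.

```python
from typing import Dict, List, Tuple

# Hypothetical source-level labels, standing in for the reliability ratings
# a fact-checking organisation assigns to each news site.
SOURCE_LABELS: Dict[str, str] = {
    "site-a.example": "reliable",
    "site-b.example": "unreliable",
}


def propagate_labels(
    articles: List[Tuple[str, str]], source_labels: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Assign every article the label of its news source.

    articles: (source, text) pairs. Articles whose source has no label
    are skipped, since no article-level label is available for them.
    """
    labelled: List[Tuple[str, str]] = []
    for source, text in articles:
        label = source_labels.get(source)
        if label is not None:
            labelled.append((text, label))
    return labelled


articles = [
    ("site-a.example", "Text 1"),
    ("site-b.example", "Text 2"),
    ("site-c.example", "Text 3"),
]
print(propagate_labels(articles, SOURCE_LABELS))
# [('Text 1', 'reliable'), ('Text 2', 'unreliable')]
```

Under this scheme, all articles from one source share a single label. A classifier can therefore score well simply by recognising a source's style, which is the bias the abstract describes.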
Live programming is a style of development characterized by incremental change and immediate feedback. Instead of going through long edit-compile cycles, developers modify a running program by changing its source code and receive immediate feedback as the program adapts in response. In this paper, we propose an approach to bridge the gap between running programs and textual domain-specific languages (DSLs). The first step of our approach consists of applying a novel model differencing algorithm, tmdiff, to the textual DSL code. By leveraging ordinary text differencing and origin tracking, tmdiff produces deltas defined in terms of the language's metamodel. In the second step, the model deltas are applied at run time to update a running system without restarting it. Since the model deltas are derived from the static source code of the program, they are unaware of any run-time state maintained during model execution. We therefore propose a generic, dynamic patch architecture, rmpatch, which can be customized to cater for domain-specific state migration. We illustrate rmpatch with a case study: a live programming environment for a simple DSL, implemented in Rascal, for simultaneously defining and executing state machines.
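The two steps can be sketched for a toy state machine language. The sketch computes a model delta, applies it to a running machine, and then calls a state-migration hook, because the delta does not cover run-time state. It is only an illustration under simplifying assumptions. It is not the tmdiff or rmpatch API, and `diff`, `RunningMachine`, `patch` and `reset_if_orphaned` are hypothetical names. States are matched by name, which stands in for the identity that tmdiff derives from text differencing and origin tracking.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Optional, Tuple

# Model: state name -> {event: target state}; the first state is initial
# (a hypothetical convention for this sketch).
Model = Dict[str, Dict[str, str]]
Edit = Tuple  # ("create", name) | ("set", name, transitions) | ("delete", name)


def diff(old: Model, new: Model) -> List[Edit]:
    """Compute a model delta, matching states by name (a simplification)."""
    delta: List[Edit] = [("create", n) for n in new if n not in old]
    for name, transitions in new.items():
        if old.get(name) != transitions:
            delta.append(("set", name, dict(transitions)))
    delta += [("delete", n) for n in old if n not in new]
    return delta


class RunningMachine:
    """An executing state machine that can be patched without restarting."""

    def __init__(self, model: Model) -> None:
        self.model: Model = {k: dict(v) for k, v in model.items()}
        self.current: str = next(iter(self.model))

    def fire(self, event: str) -> None:
        # Unknown events leave the machine in its current state.
        self.current = self.model[self.current].get(event, self.current)

    def patch(self, delta: List[Edit],
              migrate: Optional[Callable[[RunningMachine], None]] = None) -> None:
        for edit in delta:
            if edit[0] == "create":
                self.model[edit[1]] = {}
            elif edit[0] == "set":
                self.model[edit[1]] = dict(edit[2])
            elif edit[0] == "delete":
                del self.model[edit[1]]
        # The delta says nothing about run-time state, so a domain-specific
        # hook decides how to migrate it.
        if migrate is not None:
            migrate(self)


def reset_if_orphaned(m: RunningMachine) -> None:
    """Example migration: if the current state was deleted, go to the first state."""
    if m.current not in m.model:
        m.current = next(iter(m.model))


old = {"closed": {"open": "opened"}, "opened": {"close": "closed"}}
new = {"closed": {"open": "opened", "lock": "locked"},
       "opened": {"close": "closed"},
       "locked": {"unlock": "closed"}}
m = RunningMachine(old)
m.fire("open")
m.patch(diff(old, new), reset_if_orphaned)  # still in "opened", no restart
m.fire("close")
m.fire("lock")
print(m.current)  # locked
```

Keeping the migration in a pluggable hook, rather than hard-coding it in `patch`, reflects the abstract's point that the patch architecture is generic while state migration is domain-specific.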