Among other things, learning to write entails learning how to use complex sentences effectively in discourse. Some research has therefore focused on relating measures of syntactic complexity to text quality. Not only does the existing research on this topic appear inconclusive, but most of it has also been conducted in English L1 contexts. This is potentially problematic, since the relevant syntactic indices may not be the same across languages. The current study is the first to explore which syntactic features predict text quality in Dutch secondary school students’ argumentative writing. To this end, the quality of 125 argumentative essays written by students was rated and the syntactic features of the texts were analyzed. A multilevel regression analysis was then used to investigate which features contribute to text quality. The resulting model (explaining 14.5% of the variance in text quality) shows that the relative number of finite clauses and the ratio of relative clauses to finite clauses positively predict text quality. Discrepancies between our findings and those of previous studies indicate that the relations between syntactic features and text quality may vary with factors such as language and genre. Additional (cross-linguistic) research is needed to gain a more complete understanding of the relationships between syntactic constructions and text quality and of the potential moderating role of language and genre.
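As a hedged illustration of the kind of analysis described here, such a model could be specified in R with the lme4 package; the variable names, the grouping factor, and the data file are assumptions for illustration, not the study's actual materials.

```r
# Minimal sketch of a multilevel regression of text quality on syntactic
# features, assuming one row per essay and essays nested within classes.
# All variable and file names are hypothetical.
library(lme4)

essays <- read.csv("essays.csv")

# quality:          holistic text quality rating per essay
# finite_per_100w:  finite clauses per 100 words (relative number of finite clauses)
# rel_per_finite:   relative clauses divided by finite clauses
# class:            grouping factor for the random intercept
model <- lmer(quality ~ finite_per_100w + rel_per_finite + (1 | class),
              data = essays)

summary(model)  # fixed-effect estimates for the two syntactic predictors
```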
We present a number of methodological recommendations concerning the online evaluation of avatars for text-to-sign translation, focusing on the structure, format and length of the questionnaire, as well as methods for eliciting and faithfully transcribing responses.
Research into automatic text simplification aims to promote access to information for all members of society. To facilitate generalizability, simplification research often abstracts away from specific use cases and targets a prototypical reader and an underspecified content creator. In this paper, we consider a real-world use case (simplification technology for use in Dutch municipalities) and identify the needs of the content creators and the target audiences in this scenario. The stakeholders envision a system that (a) assists the human writer without taking over the task; (b) provides diverse outputs, tailored for specific target audiences; and (c) explains the suggestions that it outputs. These requirements call for technology that is characterized by modularity, explainability, and variability. We argue that these are important research directions that require further exploration.
This method paper presents a template solution for text mining of scientific literature using the R tm package. The literature to be analyzed can be collected manually or automatically using the code provided with this paper. Once the literature is collected, text mining is conducted in three steps:
• loading and cleaning of the text from the articles,
• processing, statistical analysis, and clustering, and
• presentation of the results using generalized and tailor-made visualizations.
These text mining steps can be applied to a single group of documents, to multiple groups, or to a time series of document groups. References are provided to three published peer-reviewed articles that use the presented text mining methodology. The main advantages of our method are: (1) its suitability for both research and educational purposes, (2) its compliance with the Findable, Accessible, Interoperable and Reusable (FAIR) principles, and (3) the availability of the code and example data on GitHub under the open-source Apache V2 license.
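A minimal sketch of the three steps with the tm package might look as follows; the input directory, sparsity threshold, and number of clusters are assumptions for illustration, not the paper's published code.

```r
# Minimal sketch of the three text mining steps using the R tm package.
# The file paths and parameter values are hypothetical.
library(tm)

# Step 1: load and clean the text from the articles
corpus <- VCorpus(DirSource("articles/", pattern = "\\.txt$"))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Step 2: process, compute term statistics, and cluster the documents
dtm  <- DocumentTermMatrix(corpus)
dtm  <- removeSparseTerms(dtm, 0.90)                  # drop very sparse terms
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
clusters <- kmeans(as.matrix(dtm), centers = 3)

# Step 3: present the results, here as a simple term frequency plot
barplot(head(freq, 10), las = 2, main = "Most frequent terms")
```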
How do we bridge the gap between the accountant and the data specialist? Part 1 of a three-part series on data analysis. This first part highlights six types of data analysis.
With the proliferation of misinformation on the web, automatic misinformation detection methods are becoming an increasingly important subject of study. Large language models have produced the best results among content-based methods, which rely on the text of the article rather than on metadata or network features. However, fine-tuning such a model requires significant training data, which has led to the automatic creation of large-scale misinformation detection datasets. In these datasets, articles are not labelled directly. Rather, each news site is labelled for reliability by an established fact-checking organisation, and every article is subsequently assigned the corresponding label based on the reliability score of the news source in question. A recent paper has explored the biases present in one such dataset, NELA-GT-2018, and shown that the models are at least partly learning the stylistic and other features of different news sources rather than the features of unreliable news. We confirm part of their findings. Apart from studying the characteristics and potential biases of such datasets, we also consider it important to examine how the model architecture influences the results. We therefore explore which text features or combinations of features are learned by models based on contextual word embeddings, as opposed to basic bag-of-words models. To elucidate this, we perform extensive error analysis, aided by the SHAP post-hoc explanation technique, on a debiased portion of the dataset. We validate the explanation technique on our inherently interpretable baseline model.
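To illustrate what an inherently interpretable bag-of-words baseline looks like (this sketch is our illustration under assumed data and column names, not the paper's actual pipeline), a regularized logistic regression over TF-IDF features exposes per-term coefficients, and for a linear model the SHAP value of a term reduces to its coefficient times the term's deviation from its mean:

```r
# Minimal sketch of an interpretable bag-of-words baseline for source
# reliability classification. The data file and column names are hypothetical.
library(tm)
library(glmnet)

articles <- read.csv("articles.csv")      # columns: text, reliable (0/1)

corpus <- VCorpus(VectorSource(articles$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))
X <- as.matrix(dtm)

# L1-regularised logistic regression: the nonzero per-term coefficients
# are directly readable as evidence for or against reliability.
fit  <- cv.glmnet(X, articles$reliable, family = "binomial")
beta <- as.numeric(coef(fit, s = "lambda.min"))[-1]   # drop the intercept

# For a linear model (assuming independent features), the SHAP value of
# term j in document i is beta_j * (x_ij - mean(x_j)).
shap <- sweep(sweep(X, 2, colMeans(X)), 2, beta, `*`)
```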
How do we bridge the gap between the accountant and the data specialist? Part 2 of a three-part series on data analysis. An earlier article explained the outer ring of the 'VTA model'; this follow-up article discusses the two inner rings.
The main goal of this study was to investigate whether a computational analysis of text data from the National Student Survey (NSS) can add value to the existing, manual analysis. The results showed that the computational analysis of the texts from the open questions of the NSS contains information that enriches the results of the standard quantitative analysis of the NSS.
In this paper we describe our work in progress on the development of a set of criteria to predict text difficulty in Sign Language of the Netherlands (NGT). These texts are used in a four-year bachelor program, which is being brought in line with the Common European Framework of Reference for Languages (Council of Europe, 2001). Production and interaction proficiency are assessed through the NGT Functional Assessment instrument, adapted from the Sign Language Proficiency Interview (Caccamise & Samar, 2009). With this test we were able to determine that after one year of NGT study students produce NGT at CEFR level A2, after two years they sign at level B1, and after four years they are proficient in NGT at CEFR level B2. This allowed us to match NGT texts to the CEFR levels of students at specific stages of their studies. These texts were then analysed for sign familiarity, morpheme-sign rate, use of space, and use of non-manual signals. All of these elements appear to be relevant for a good alignment between the difficulty of signed NGT texts and the targeted CEFR level, although only the morpheme-sign rate appears to be a decisive indicator.
Although governments are investing heavily in big data analytics, reports show mixed results in terms of performance. Whilst big data analytics capability has provided a valuable lens in business and seems useful for the public sector, there is little knowledge of its relationship with governmental performance. This study aims to explain how big data analytics capability leads to governmental performance. Using a survey research methodology, an integrated conceptual model is proposed that highlights a comprehensive set of big data analytics resources influencing governmental performance. The conceptual model was developed based on prior literature. Using a PLS-SEM approach, the results strongly support the posited hypotheses. Big data analytics capability has a strong impact on governmental efficiency, effectiveness, and fairness. The findings of this paper confirm the imperative role of big data analytics capability in governmental performance in the public sector, which earlier studies found in the private sector. This study also validates measures of governmental performance.
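As a hedged sketch of how such a structural model could be estimated with PLS-SEM in R (the plspm package, the construct names, and the item columns are our assumptions, not the study's actual instrument):

```r
# Minimal PLS-SEM sketch: big data analytics capability (BDAC) predicting
# efficiency, effectiveness, and fairness. Data and item columns are hypothetical.
library(plspm)

survey_data <- read.csv("survey.csv")   # hypothetical survey responses

# Inner (structural) model as a lower-triangular path matrix:
# BDAC -> Efficiency, BDAC -> Effectiveness, BDAC -> Fairness
BDAC <- c(0, 0, 0, 0)
EFFI <- c(1, 0, 0, 0)
EFFE <- c(1, 0, 0, 0)
FAIR <- c(1, 0, 0, 0)
inner <- rbind(BDAC, EFFI, EFFE, FAIR)
colnames(inner) <- rownames(inner)

# Outer (measurement) model: survey item columns per construct, reflective mode
blocks <- list(1:5, 6:8, 9:11, 12:14)
modes  <- rep("A", 4)

fit <- plspm(survey_data, inner, blocks, modes = modes)
summary(fit)   # path coefficients and R-squared for the endogenous constructs
```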