The present study investigated whether text structure inference skill (i.e., the ability to infer overall text structure) has unique predictive value for expository text comprehension beyond the variance accounted for by sentence reading fluency, linguistic knowledge, and metacognitive knowledge. Furthermore, it was examined whether the unique predictive value of text structure inference skill differs between monolingual and bilingual Dutch students, or between students who vary in reading proficiency, reading fluency, or linguistic knowledge. One hundred fifty-one eighth graders took tests that tapped into their expository text comprehension, sentence reading fluency, linguistic knowledge, metacognitive knowledge, and text structure inference skill. Multilevel regression analyses revealed that text structure inference skill has no unique predictive value for eighth graders’ expository text comprehension after controlling for reading fluency, linguistic knowledge, and metacognitive knowledge. However, text structure inference skill does have unique predictive value for expository text comprehension in models that do not include both knowledge of connectives and metacognitive knowledge as control variables, underscoring the importance of these two types of knowledge for text structure inference skill. Moreover, the predictive value of text structure inference skill does not depend on readers’ language backgrounds or on their reading proficiency, reading fluency, or vocabulary knowledge levels. We conclude with the limitations of the study and its research and practical implications.
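As an illustration of the kind of hierarchical multilevel analysis described above, the sketch below compares a control-only model with a model that adds text structure inference skill. It is a minimal sketch only: the variable names, grouping variable, and data file are hypothetical assumptions, not the authors' materials or analysis code.

```python
# Sketch only: multilevel regression testing whether one predictor adds unique
# explanatory value beyond control variables (students nested within classes).
# Column names and the data file are hypothetical.
import pandas as pd
import scipy.stats as st
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # hypothetical data set

# Baseline model: control variables only.
baseline = smf.mixedlm(
    "comprehension ~ fluency + linguistic_knowledge + metacognition",
    data=df, groups=df["class_id"],
).fit(reml=False)

# Extended model: adds text structure inference skill as a predictor.
extended = smf.mixedlm(
    "comprehension ~ fluency + linguistic_knowledge + metacognition + structure_inference",
    data=df, groups=df["class_id"],
).fit(reml=False)

# Likelihood-ratio test for the unique contribution of the added predictor.
lr = 2 * (extended.llf - baseline.llf)
p_value = st.chi2.sf(lr, 1)  # one added parameter
print(f"LR = {lr:.2f}, p = {p_value:.4f}")
```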
This method paper presents a template solution for text mining of scientific literature using the R tm package. Literature to be analyzed can be collected manually or automatically using the code provided with this paper. Once the literature is collected, text mining proceeds in three steps:
• loading and cleaning of the text from the articles,
• processing, statistical analysis, and clustering, and
• presentation of the results using generalized and tailor-made visualizations.
These text mining steps can be applied to a single document, multiple documents, or a time series of document groups. References are provided to three published peer-reviewed articles that use the presented text mining methodology. The main advantages of our method are (1) its suitability for both research and educational purposes, (2) compliance with the Findable, Accessible, Interoperable and Reusable (FAIR) principles, and (3) the availability of the code and example data on GitHub under the open-source Apache V2 license.
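The method itself is built on the R tm package; purely as a rough Python analogue of the three steps (load and clean, analyse and cluster, visualise), the sketch below uses scikit-learn. The file paths, number of clusters, and other parameters are illustrative assumptions, not part of the published template.

```python
# Sketch only: a Python analogue of the three text-mining steps described above
# (the original template uses the R tm package). Paths and parameters are illustrative.
import glob

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Step 1: load and clean text from articles (one plain-text file per article).
docs = []
for path in glob.glob("articles/*.txt"):
    with open(path, encoding="utf-8") as fh:
        docs.append(fh.read().lower())

# Step 2: process (tokenise, weight, remove stop words) and cluster.
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
tfidf = vectorizer.fit_transform(docs)
clusters = KMeans(n_clusters=3, random_state=0).fit_predict(tfidf)

# Step 3: present the results, here as a simple 2-D projection of the clusters.
points = PCA(n_components=2).fit_transform(tfidf.toarray())
plt.scatter(points[:, 0], points[:, 1], c=clusters)
plt.title("Document clusters")
plt.show()
```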
We present a number of methodological recommendations concerning the online evaluation of avatars for text-to-sign translation, focusing on the structure, format and length of the questionnaire, as well as methods for eliciting and faithfully transcribing responses.
Among other things, learning to write entails learning how to use complex sentences effectively in discourse. Some research has therefore focused on relating measures of syntactic complexity to text quality. Apart from the fact that the existing research on this topic appears inconclusive, most of it has been conducted in English L1 contexts. This is potentially problematic, since relevant syntactic indices may not be the same across languages. The current study is the first to explore which syntactic features predict text quality in Dutch secondary school students’ argumentative writing. In order to do so, the quality of 125 argumentative essays written by students was rated and the syntactic features of the texts were analyzed. A multilevel regression analysis was then used to investigate which features contribute to text quality. The resulting model (explaining 14.5% of the variance in text quality) shows that the relative number of finite clauses and the ratio between the number of relative clauses and the number of finite clauses positively predict text quality. Discrepancies between our findings and those of previous studies indicate that the relations between syntactic features and text quality may vary based on factors such as language and genre. Additional (cross-linguistic) research is needed to gain a more complete understanding of the relationships between syntactic constructions and text quality and the potential moderating role of language and genre.
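As a purely illustrative sketch of how syntactic features such as the predictive ones named above (the relative number of finite clauses and the ratio of relative clauses to finite clauses) could be extracted automatically, the snippet below uses spaCy's Dutch pipeline. The model name, morphological attribute, and dependency label are assumptions about spaCy's Dutch annotation, not the instruments used in the study.

```python
# Sketch only: counting finite clauses and relative clauses in a Dutch essay
# with spaCy. The model name and annotation labels are assumptions; the study's
# own feature extraction may differ.
import spacy

nlp = spacy.load("nl_core_news_sm")  # small Dutch pipeline (assumed installed)

def syntactic_features(text: str) -> dict:
    doc = nlp(text)
    n_words = sum(1 for tok in doc if not tok.is_punct and not tok.is_space)
    # Finite verbs approximate the number of finite clauses.
    finite = sum(1 for tok in doc if "Fin" in tok.morph.get("VerbForm"))
    # Relative clauses via the Universal Dependencies clausal-modifier label.
    relative = sum(1 for tok in doc if tok.dep_ == "acl:relcl")
    return {
        "finite_per_100_words": 100 * finite / max(n_words, 1),
        "relative_per_finite": relative / max(finite, 1),
    }

print(syntactic_features("De leerling die dit schreef, gebruikt een bijzin."))
```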
Research into automatic text simplification aims to promote access to information for all members of society. To facilitate generalizability, simplification research often abstracts away from specific use cases, and targets a prototypical reader and an underspecified content creator. In this paper, we consider a real-world use case – simplification technology for use in Dutch municipalities – and identify the needs of the content creators and the target audiences in this scenario. The stakeholders envision a system that (a) assists the human writer without taking over the task; (b) provides diverse outputs, tailored for specific target audiences; and (c) explains the suggestions that it outputs. These requirements call for technology that is characterized by modularity, explainability, and variability. We argue that these are important research directions that require further exploration.
With the proliferation of misinformation on the web, automatic misinformation detection methods are becoming an increasingly important subject of study. Large language models have produced the best results among content-based methods, which rely on the text of the article rather than the metadata or network features. However, finetuning such a model requires significant training data, which has led to the automatic creation of large-scale misinformation detection datasets. In these datasets, articles are not labelled directly. Rather, each news site is labelled for reliability by an established fact-checking organisation and every article is subsequently assigned the corresponding label based on the reliability score of the news source in question. A recent paper has explored the biases present in one such dataset, NELA-GT-2018, and shown that the models are at least partly learning the stylistic and other features of different news sources rather than the features of unreliable news. We confirm a part of their findings. Apart from studying the characteristics and potential biases of the datasets, we also find it important to examine in what way the model architecture influences the results. We therefore explore which text features or combinations of features are learned by models based on contextual word embeddings as opposed to basic bag-of-words models. To elucidate this, we perform extensive error analysis aided by the SHAP post-hoc explanation technique on a debiased portion of the dataset. We validate the explanation technique on our inherently interpretable baseline model.
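As an illustration of the kind of inherently interpretable bag-of-words baseline referred to above, the sketch below trains a logistic regression on TF-IDF features and lists the most indicative terms per class; the data loading, column names, and label coding are hypothetical, and the same model could subsequently be passed to a post-hoc explainer such as SHAP for comparison.

```python
# Sketch only: an interpretable bag-of-words baseline for source-reliability
# labelling. Data file, column names and labels are hypothetical; this is not
# the authors' pipeline.
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("articles.csv")  # columns: "text", "label" (0 = reliable, 1 = unreliable)
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], random_state=0
)

vectorizer = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)
print("accuracy:", clf.score(vectorizer.transform(X_test), y_test))

# The model is directly interpretable: large positive coefficients mark terms
# pushing predictions towards "unreliable", large negative ones towards "reliable".
terms = np.array(vectorizer.get_feature_names_out())
order = np.argsort(clf.coef_[0])
print("most 'reliable' terms:  ", terms[order[:10]])
print("most 'unreliable' terms:", terms[order[-10:]])
```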
Over the last two decades, institutions for higher education such as universities and colleges have rapidly expanded and, as a result, have experienced profound changes in their research and organizational processes. This rapid expansion and change have fuelled concerns about issues such as educators' technology professional development. Despite the educational value of emerging technologies in schools, their introduction has not yet enjoyed much success. Effective use of information and communication technologies (ICTs) requires a substantial change in pedagogical practice. Traditional training and learning approaches cannot cope with the rising demand on educators to make use of innovative technologies in their teaching. As a result, educational institutions as well as the public are increasingly aware of the need for adequate technology professional development. The focus of this paper is on action research as a qualitative research methodology for studying technology professional development in higher education, with the aim of improving teaching and learning with ICTs at the tertiary level. The data discussed in this paper were drawn from a cross-institutional setting at Fontys University of Applied Sciences, The Netherlands, and were collected and analysed according to a qualitative approach.
Political Wordgame is a website featuring an interactive data visualisation of the speech of Dutch politicians as aired on the public broadcaster.
Although governments are investing heavily in big data analytics, reports show mixed results in terms of performance. Whilst big data analytics capability has provided a valuable lens in business and seems useful for the public sector, there is little knowledge of its relationship with governmental performance. This study aims to explain how big data analytics capability leads to governmental performance. Using a survey research methodology, an integrated conceptual model is proposed that highlights a comprehensive set of big data analytics resources influencing governmental performance. The conceptual model was developed based on prior literature. Using a PLS-SEM approach, the results strongly support the posited hypotheses: big data analytics capability has a strong impact on governmental efficiency, effectiveness, and fairness. The findings confirm the imperative role of big data analytics capability in governmental performance in the public sector, which earlier studies had found in the private sector. This study also validated measures of governmental performance.
The goal of this study was to test the idea that computationally analysing the open answers of the Fontys National Student Surveys (NSS) with a selection of standard text mining methods (Manning & Schütze, 1999) will increase the value of these answers for educational quality assurance. It is expected that the human effort and time needed for analysis will decrease significantly. The Dutch text data of several years of Fontys National Student Surveys (2013-2018) were provided to Fontys students of the minor Applied Data Science. The results of the analysis were to include topic and sentiment modelling across multiple years of survey data. Comparing multiple years was necessary to capture and visualize any trends that a human investigator may have missed while analysing the data by hand. During data cleaning, all stop words and punctuation were removed, all text was converted to lower case, and names and inappropriate language, such as swear words, were deleted. About 80% of the 24,000 records were manually labelled with sentiment; the remainder was used for validation of the algorithms. In the following step, the machine learning analysis steps (training, testing, analysis of outcomes, and visualisation) were executed to support better text comprehension. The students aimed to improve classification accuracy by applying multiple sentiment analysis algorithms and topic modelling methods. The models were chosen arbitrarily, with a preference for models of low complexity. For reproducibility of our study, open-source tooling was used. One of these tools was based on Latent Dirichlet Allocation (LDA), a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar (Blei, Ng & Jordan, 2003). For topic modelling, the Gensim toolkit (Řehůřek, 2011) was used; Gensim is an open-source vector space modelling and topic modelling toolkit implemented in Python. In addition, we noted the absence of pretrained models for the Dutch language. To complete our prototype, a simple user interface was created in Python, integrating the automated text analysis with visualisations of sentiments and topics. Remarkably, all extracted topics are related to themes defined by the NSS, which indicates that, in general, students' answers are related to topics of interest for educational institutions. The words extracted for each topic are also relevant to that topic. Although most of the results require further interpretation by human experts, they indicate that the computational analysis of the texts from the open questions of the NSS yields information that enriches the results of the standard quantitative analysis of the NSS.
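By way of illustration of the Gensim-based topic modelling described above, the sketch below fits a small LDA model on already cleaned and tokenised Dutch answers. The input data, number of topics, and preprocessing are hypothetical assumptions rather than the students' actual pipeline.

```python
# Sketch only: LDA topic modelling with Gensim on (already cleaned and
# tokenised) open survey answers. Input data and parameters are illustrative.
from gensim import corpora, models

tokenised_answers = [
    ["docent", "geeft", "goede", "feedback"],
    ["rooster", "verandert", "vaak", "laat"],
    ["stage", "sluit", "goed", "aan", "praktijk"],
]

dictionary = corpora.Dictionary(tokenised_answers)
bow_corpus = [dictionary.doc2bow(answer) for answer in tokenised_answers]

lda = models.LdaModel(
    corpus=bow_corpus,
    id2word=dictionary,
    num_topics=3,          # illustrative; tune per dataset
    passes=10,
    random_state=42,
)

# Show the most characteristic words per extracted topic.
for topic_id, topic in lda.print_topics(num_words=4):
    print(topic_id, topic)
```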