We propose a combined visual and text-based programming environment, based on the actor model, that is suitable for novice to expert programmers. The actor model consists of simple communicating entities that scale easily from threads within a single computer to massive distributed computer systems. To design the proposed environment, we classify the different levels of programming that users encounter when dealing with technology in creative scenarios, and we use this classification as a foundation for supporting (novice) users on their way to the next level. The resulting framework is intended not only to exploit modern computing power through a concurrent programming paradigm, but also to let users interact with it at the different classification levels.
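The abstract describes the actor model only in prose. The following minimal sketch, which is purely illustrative and not the proposed environment itself, shows the core idea of actors as simple entities that keep private state and communicate only through messages, here backed by one mailbox and one thread per actor using only the Python standard library; the `Actor` and `Counter` classes are hypothetical names for this sketch.

```python
import queue
import threading
import time

class Actor:
    """A minimal actor: a private mailbox plus one thread that handles messages sequentially."""

    def __init__(self):
        self.mailbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def send(self, message):
        """Deliver a message asynchronously; the sender never blocks on the receiver's work."""
        self.mailbox.put(message)

    def _loop(self):
        while True:
            self.receive(self.mailbox.get())

    def receive(self, message):
        """Subclasses define behaviour here; state is only touched from the actor's own thread."""
        raise NotImplementedError

class Counter(Actor):
    """Example actor that keeps private state and reports it on request."""

    def __init__(self):
        self.count = 0          # initialise state before the message loop starts
        super().__init__()

    def receive(self, message):
        if message == "report":
            print("count =", self.count)
        else:
            self.count += 1

counter = Counter()
for _ in range(3):
    counter.send("increment")
counter.send("report")          # prints: count = 3
time.sleep(0.2)                 # give the daemon thread time to drain the mailbox
```

Because each actor processes its mailbox sequentially, the same message-passing interface works whether actors run as threads in one process or as processes on different machines, which is the scaling property the abstract refers to.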
A common strategy for assigning keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for a word to be selected as a keyword is its relevance to the text. The tf.idf score of a term is a widely used relevance measure. While easy to compute and giving quite satisfactory results, this measure does not take (semantic) relations between words into account. In this paper we study alternative relevance measures that do use relations between words. They are computed by defining co-occurrence distributions for words and comparing these distributions with the document and corpus distributions. We then evaluate keyword extraction algorithms defined by selecting different relevance measures. For two corpora of abstracts with manually assigned keywords, we compare the manually assigned keywords with the automatically extracted ones. The results show that using word co-occurrence information can improve precision and recall over tf.idf.
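To illustrate the two kinds of relevance measures the abstract contrasts, the sketch below computes a standard tf.idf score and a generic distribution-comparison score (KL divergence between a term's co-occurrence distribution and the overall document distribution). The exact measures studied in the paper are defined there; the `cooccurrence_relevance` function and its window parameter are assumptions for illustration only.

```python
import math
from collections import Counter

def distribution(tokens):
    """Normalise raw token counts into a probability distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def tf_idf(term, doc_tokens, corpus_docs):
    """Classic tf.idf: term frequency in the document times (smoothed) inverse document frequency."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for doc in corpus_docs if term in doc)
    return tf * math.log((1 + len(corpus_docs)) / (1 + df))

def kl_divergence(p, q):
    """D(p || q), summed over terms that have non-zero probability in both distributions."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if q.get(w, 0) > 0)

def cooccurrence_relevance(term, doc_tokens, window=5):
    """Hypothetical relevance score: divergence between the term's co-occurrence
    distribution (tokens within +/- `window` positions) and the document distribution."""
    neighbours = []
    for i, tok in enumerate(doc_tokens):
        if tok == term:
            neighbours += doc_tokens[max(0, i - window):i] + doc_tokens[i + 1:i + 1 + window]
    if not neighbours:
        return 0.0
    return kl_divergence(distribution(neighbours), distribution(doc_tokens))
```

A term whose neighbourhood looks very different from the document as a whole gets a high co-occurrence score, which is the intuition behind comparing distributions rather than counting occurrences alone.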
Objective: Acknowledging study limitations in a scientific publication is a crucial element of scientific transparency and progress. However, limitation reporting is often inadequate. Natural language processing (NLP) methods could support automated reporting checks, improving research transparency. In this study, our objective was to develop a dataset and NLP methods to detect and categorize self-acknowledged limitations (e.g., sample size, blinding) reported in randomized controlled trial (RCT) publications. Methods: We created a data model of limitation types in RCT studies and annotated a corpus of 200 full-text RCT publications using this data model. We fine-tuned BERT-based sentence classification models to recognize limitation sentences and their types. To address the small size of the annotated corpus, we experimented with data augmentation approaches, including Easy Data Augmentation (EDA) and Prompt-Based Data Augmentation (PromDA). We applied the best-performing model to a set of about 12K RCT publications to characterize self-acknowledged limitations at a larger scale. Results: Our data model consists of 15 categories and 24 sub-categories (e.g., Population and its sub-category DiagnosticCriteria). We annotated 1090 instances of limitation types in 952 sentences (4.8 limitation sentences and 5.5 limitation types per article). A fine-tuned PubMedBERT model for limitation sentence classification improved upon our earlier model by about 1.5 absolute percentage points in F1 score (0.821 vs. 0.8), with statistical significance. Our best-performing limitation type classification model, PubMedBERT fine-tuning with PromDA (Output View), achieved an F1 score of 0.7, improving upon the vanilla PubMedBERT model by 2.7 percentage points, also with statistical significance. Conclusion: The model could support automated screening tools that journals can use to draw authors' attention to reporting issues. Automatic extraction of limitations from RCT publications could benefit peer review and evidence synthesis, and support advanced methods to search and aggregate the evidence from the clinical trial literature.
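The abstract names the modelling approach (BERT-based sentence classification fine-tuned from PubMedBERT) but not the code. A minimal fine-tuning setup with the Hugging Face transformers and datasets libraries might look like the sketch below; the checkpoint name, toy sentences, binary label scheme, and hyperparameters are assumptions for illustration, not the authors' actual configuration, and the data augmentation steps (EDA, PromDA) are omitted.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed public PubMedBERT checkpoint; the paper's exact checkpoint may differ.
MODEL_NAME = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"

# Toy stand-ins for the annotated corpus (952 limitation sentences in the real data).
train_data = Dataset.from_dict({
    "text": [
        "A limitation of this trial is the small sample size.",
        "Participants received the intervention for 12 weeks.",
    ],
    "label": [1, 0],  # 1 = self-acknowledged limitation sentence, 0 = other sentence
})

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

args = TrainingArguments(
    output_dir="limitation-sentence-clf",
    num_train_epochs=3,                # illustrative hyperparameters, not the authors' settings
    per_device_train_batch_size=8,
)

Trainer(model=model, args=args, train_dataset=train_data).train()
```

The limitation type classifier described in the Results would follow the same pattern with `num_labels` set to the number of categories in the data model.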
Organisations are increasingly embedding Artificial Intelligence (AI) techniques and tools in their processes. Typical examples are generative AI for images, videos, and text, and classification tasks commonly used in, for example, medical applications and industry. One danger of the proliferation of AI systems is a focus on the performance of AI models that neglects important aspects such as fairness and sustainability. For example, an organisation might be tempted to use a model with better overall performance, even if it works poorly for specific vulnerable groups. The same logic applies to high-performance models that require a significant amount of energy for training and usage. At the same time, many organisations recognise the need for responsible AI development that balances performance with fairness and sustainability. This KIEM project proposal aims to develop a tool for organisations that develop and implement AI systems and aim to do so more responsibly. Through visual aids and data visualisation, the tool facilitates making these trade-offs explicit. By showing what these values mean in practice, which choices could be made, and how they relate to performance, we aim to educate users on how the choice of metrics impacts the decisions made by the model and its wider consequences, such as energy consumption or fairness-related harms. The tool is meant to facilitate conversation between developers, product owners, and project leaders, helping them make their choices more explicit and responsible.
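To make the kind of trade-off the proposal describes concrete, the following small sketch (not the proposed tool) scores two hypothetical candidate models on both accuracy and a simple fairness proxy, the gap in positive-prediction rates between groups; all labels, group assignments, and predictions are made up for illustration.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return float(np.mean(y_true == y_pred))

def parity_gap(y_pred, groups):
    """Difference between the highest and lowest positive-prediction rate across groups
    (a simple stand-in for a fairness metric such as demographic parity difference)."""
    rates = [float(y_pred[groups == g].mean()) for g in np.unique(groups)]
    return max(rates) - min(rates)

# Made-up labels, group membership, and predictions from two candidate models.
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
groups  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
model_a = np.array([1, 0, 1, 1, 0, 0, 0, 0])  # more accurate, but favours group "a"
model_b = np.array([1, 0, 1, 0, 0, 0, 1, 1])  # less accurate, but equal rates across groups

for name, y_pred in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f}, "
          f"parity gap={parity_gap(y_pred, groups):.2f}")
# model_a: accuracy=0.88, parity gap=0.75
# model_b: accuracy=0.75, parity gap=0.00
```

Putting such numbers side by side is the kind of comparison the proposed visual tool is meant to make accessible to developers, product owners, and project leaders.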