We propose a combined visual and text-based programming environment, based on the actor model, that is suitable for novice to expert programmers. The actor model consists of simple communicating entities that scale easily from threads within a single computer to massive distributed computer systems. To design the proposed environment, we classify the different levels of programming that users encounter when dealing with technology in creative scenarios, and we use this classification as a foundation for supporting (novice) users on their way to the next level. The resulting framework is intended not only to exploit modern computing power through a concurrent programming paradigm, but also to let users interact with it at the different classification levels.
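The abstract describes the actor model only in prose. The following minimal sketch, which is purely illustrative and not the proposed environment itself, shows the core idea of actors as simple entities that keep private state and communicate only through messages, here backed by one mailbox and one thread per actor using only the Python standard library; the `Actor` and `Counter` classes are hypothetical names for this sketch.

```python
import queue
import threading
import time

class Actor:
    """A minimal actor: a private mailbox plus one thread that handles messages sequentially."""

    def __init__(self):
        self.mailbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def send(self, message):
        """Deliver a message asynchronously; the sender never blocks on the receiver's work."""
        self.mailbox.put(message)

    def _loop(self):
        while True:
            self.receive(self.mailbox.get())

    def receive(self, message):
        """Subclasses define behaviour here; state is only touched from the actor's own thread."""
        raise NotImplementedError

class Counter(Actor):
    """Example actor that keeps private state and reports it on request."""

    def __init__(self):
        self.count = 0          # initialise state before the message loop starts
        super().__init__()

    def receive(self, message):
        if message == "report":
            print("count =", self.count)
        else:
            self.count += 1

counter = Counter()
for _ in range(3):
    counter.send("increment")
counter.send("report")          # prints: count = 3
time.sleep(0.2)                 # give the daemon thread time to drain the mailbox
```

Because each actor processes its mailbox sequentially, the same message-passing interface works whether actors run as threads in one process or as processes on different machines, which is the scaling property the abstract refers to.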
A common strategy for assigning keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for a word to be selected as a keyword is its relevance to the text. The tf.idf score of a term is a widely used relevance measure. While easy to compute and giving quite satisfactory results, this measure does not take (semantic) relations between words into account. In this paper we study alternative relevance measures that do use relations between words. They are computed by defining co-occurrence distributions for words and comparing these distributions with the document and corpus distributions. We then evaluate keyword extraction algorithms defined by selecting different relevance measures. For two corpora of abstracts with manually assigned keywords, we compare the manually assigned keywords with the automatically extracted ones. The results show that using word co-occurrence information can improve precision and recall over tf.idf.
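To illustrate the two kinds of relevance measures the abstract contrasts, the sketch below computes a standard tf.idf score and a generic distribution-comparison score (KL divergence between a term's co-occurrence distribution and the overall document distribution). The exact measures studied in the paper are defined there; the `cooccurrence_relevance` function and its window parameter are assumptions for illustration only.

```python
import math
from collections import Counter

def distribution(tokens):
    """Normalise raw token counts into a probability distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def tf_idf(term, doc_tokens, corpus_docs):
    """Classic tf.idf: term frequency in the document times (smoothed) inverse document frequency."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for doc in corpus_docs if term in doc)
    return tf * math.log((1 + len(corpus_docs)) / (1 + df))

def kl_divergence(p, q):
    """D(p || q), summed over terms that have non-zero probability in both distributions."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if q.get(w, 0) > 0)

def cooccurrence_relevance(term, doc_tokens, window=5):
    """Hypothetical relevance score: divergence between the term's co-occurrence
    distribution (tokens within +/- `window` positions) and the document distribution."""
    neighbours = []
    for i, tok in enumerate(doc_tokens):
        if tok == term:
            neighbours += doc_tokens[max(0, i - window):i] + doc_tokens[i + 1:i + 1 + window]
    if not neighbours:
        return 0.0
    return kl_divergence(distribution(neighbours), distribution(doc_tokens))
```

A term whose neighbourhood looks very different from the document as a whole gets a high co-occurrence score, which is the intuition behind comparing distributions rather than counting occurrences alone.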
Objective: Acknowledging study limitations in a scientific publication is a crucial element of scientific transparency and progress. However, limitation reporting is often inadequate. Natural language processing (NLP) methods could support automated reporting checks, improving research transparency. In this study, our objective was to develop a dataset and NLP methods to detect and categorize self-acknowledged limitations (e.g., sample size, blinding) reported in randomized controlled trial (RCT) publications. Methods: We created a data model of limitation types in RCT studies and annotated a corpus of 200 full-text RCT publications using this data model. We fine-tuned BERT-based sentence classification models to recognize limitation sentences and their types. To address the small size of the annotated corpus, we experimented with data augmentation approaches, including Easy Data Augmentation (EDA) and Prompt-Based Data Augmentation (PromDA). We applied the best-performing model to a set of about 12K RCT publications to characterize self-acknowledged limitations at a larger scale. Results: Our data model consists of 15 categories and 24 sub-categories (e.g., Population and its sub-category DiagnosticCriteria). We annotated 1090 instances of limitation types in 952 sentences (4.8 limitation sentences and 5.5 limitation types per article). A fine-tuned PubMedBERT model for limitation sentence classification improved upon our earlier model by about 1.5 absolute percentage points in F1 score (0.821 vs. 0.8), with statistical significance. Our best-performing limitation type classification model, PubMedBERT fine-tuning with PromDA (Output View), achieved an F1 score of 0.7, improving upon the vanilla PubMedBERT model by 2.7 percentage points, also with statistical significance. Conclusion: The model could support automated screening tools that journals can use to draw authors' attention to reporting issues. Automatic extraction of limitations from RCT publications could benefit peer review and evidence synthesis, and support advanced methods to search and aggregate the evidence from the clinical trial literature.
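The abstract names the modelling approach (BERT-based sentence classification fine-tuned from PubMedBERT) but not the code. A minimal fine-tuning setup with the Hugging Face transformers and datasets libraries might look like the sketch below; the checkpoint name, toy sentences, binary label scheme, and hyperparameters are assumptions for illustration, not the authors' actual configuration, and the data augmentation steps (EDA, PromDA) are omitted.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed public PubMedBERT checkpoint; the paper's exact checkpoint may differ.
MODEL_NAME = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"

# Toy stand-ins for the annotated corpus (952 limitation sentences in the real data).
train_data = Dataset.from_dict({
    "text": [
        "A limitation of this trial is the small sample size.",
        "Participants received the intervention for 12 weeks.",
    ],
    "label": [1, 0],  # 1 = self-acknowledged limitation sentence, 0 = other sentence
})

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

args = TrainingArguments(
    output_dir="limitation-sentence-clf",
    num_train_epochs=3,                # illustrative hyperparameters, not the authors' settings
    per_device_train_batch_size=8,
)

Trainer(model=model, args=args, train_dataset=train_data).train()
```

The limitation type classifier described in the Results would follow the same pattern with `num_labels` set to the number of categories in the data model.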
Organisations are increasingly embedding Artificial Intelligence (AI) techniques and tools in their processes. Typical examples are generative AI for images, videos, and text, and classification tasks commonly used in, for example, medical applications and industry. One danger of the proliferation of AI systems is a focus on the performance of AI models that neglects important aspects such as fairness and sustainability. For example, an organisation might be tempted to use a model with better overall performance, even if it works poorly for specific vulnerable groups. The same logic applies to high-performance models that require a significant amount of energy for training and usage. At the same time, many organisations recognise the need for responsible AI development that balances performance with fairness and sustainability. This KIEM project proposal aims to develop a tool for organisations that develop and implement AI systems and aim to do so more responsibly. Through visual aids and data visualisation, the tool facilitates making these trade-offs explicit. By showing what these values mean in practice, which choices could be made, and how they relate to performance, we aim to educate users on how the choice of metrics impacts the decisions made by the model and its wider consequences, such as energy consumption or fairness-related harms. The tool is meant to facilitate conversation between developers, product owners, and project leaders, helping them make their choices more explicit and responsible.
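To make the kind of trade-off the proposal describes concrete, the following small sketch (not the proposed tool) scores two hypothetical candidate models on both accuracy and a simple fairness proxy, the gap in positive-prediction rates between groups; all labels, group assignments, and predictions are made up for illustration.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return float(np.mean(y_true == y_pred))

def parity_gap(y_pred, groups):
    """Difference between the highest and lowest positive-prediction rate across groups
    (a simple stand-in for a fairness metric such as demographic parity difference)."""
    rates = [float(y_pred[groups == g].mean()) for g in np.unique(groups)]
    return max(rates) - min(rates)

# Made-up labels, group membership, and predictions from two candidate models.
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
groups  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
model_a = np.array([1, 0, 1, 1, 0, 0, 0, 0])  # more accurate, but favours group "a"
model_b = np.array([1, 0, 1, 0, 0, 0, 1, 1])  # less accurate, but equal rates across groups

for name, y_pred in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f}, "
          f"parity gap={parity_gap(y_pred, groups):.2f}")
# model_a: accuracy=0.88, parity gap=0.75
# model_b: accuracy=0.75, parity gap=0.00
```

Putting such numbers side by side is the kind of comparison the proposed visual tool is meant to make accessible to developers, product owners, and project leaders.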