The platform for open and practice-oriented research

58 Results for 'Sentence classification'

Toward assessing clinical trial publications for reporting transparency

Objective: To annotate a corpus of randomized controlled trial (RCT) publications with the checklist items of the CONSORT reporting guidelines and to use the corpus to develop text mining methods for RCT appraisal. Methods: We annotated a corpus of 50 RCT articles at the sentence level using 37 fine-grained CONSORT checklist items. A subset (31 articles) was double-annotated and adjudicated, while 19 were annotated by a single annotator and reconciled by another. We calculated inter-annotator agreement at the article and section level using MASI (Measuring Agreement on Set-Valued Items) and at the CONSORT item level using Krippendorff's α. We experimented with two rule-based methods (phrase-based and section header-based) and two supervised learning approaches (support vector machine and BioBERT-based neural network classifiers) for recognizing 17 methodology-related items in the RCT Methods sections. Results: We created CONSORT-TM consisting of 10,709 sentences, 4,845 (45%) of which were annotated with 5,246 labels. A median of 28 CONSORT items (out of a possible 37) were annotated per article. Agreement was moderate at the article and section levels (average MASI: 0.60 and 0.64, respectively). Agreement varied considerably among individual checklist items (Krippendorff's α = 0.06–0.96). The model based on BioBERT performed best overall for recognizing methodology-related items (micro-precision: 0.82, micro-recall: 0.63, micro-F1: 0.71). Combining models using majority vote and label aggregation further improved precision and recall, respectively. Conclusion: Our annotated corpus, CONSORT-TM, contains more fine-grained information than earlier RCT corpora. Low frequency of some CONSORT items made it difficult to train effective text mining models to recognize them. For the items commonly reported, CONSORT-TM can serve as a testbed for text mining methods that assess RCT transparency, rigor, and reliability, and support methods for peer review and authoring assistance. Minor modifications to the annotation scheme and a larger corpus could facilitate improved text mining models. CONSORT-TM is publicly available at https://github.com/kilicogluh/CONSORT-TM.
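
The two model-combination strategies named in the abstract can be illustrated with a minimal sketch. It assumes each model outputs a set of CONSORT item labels per sentence; the function names and the example label sets ("3a", "4a", "8a") are hypothetical and not taken from the CONSORT-TM code. Majority vote keeps only labels most models agree on, which tends to favour precision, while label aggregation takes the union, which tends to favour recall.

```python
from collections import Counter


def majority_vote(predictions):
    """Keep a label only if more than half of the models predicted it."""
    counts = Counter(label for labels in predictions for label in labels)
    return {label for label, n in counts.items() if n > len(predictions) / 2}


def label_aggregation(predictions):
    """Keep every label that at least one model predicted (set union)."""
    return set().union(*predictions)


def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over per-sentence label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical label sets from three models for a single sentence.
models = [{"3a", "4a"}, {"4a"}, {"4a", "8a"}]
voted = majority_vote(models)        # only the label all three agree on
merged = label_aggregation(models)   # every label any model proposed

# Consistency check on the reported scores: F1 is the harmonic mean of
# precision 0.82 and recall 0.63.
reported_f1 = 2 * 0.82 * 0.63 / (0.82 + 0.63)
```

The harmonic mean of the reported micro-precision and micro-recall rounds to 0.71, matching the micro-F1 given in the abstract.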

Automatic categorization of self-acknowledged limitations in randomized controlled trial publications

Objective: Acknowledging study limitations in a scientific publication is a crucial element in scientific transparency and progress. However, limitation reporting is often inadequate. Natural language processing (NLP) methods could support automated reporting checks, improving research transparency. In this study, our objective was to develop a dataset and NLP methods to detect and categorize self-acknowledged limitations (e.g., sample size, blinding) reported in randomized controlled trial (RCT) publications. Methods: We created a data model of limitation types in RCT studies and annotated a corpus of 200 full-text RCT publications using this data model. We fine-tuned BERT-based sentence classification models to recognize the limitation sentences and their types. To address the small size of the annotated corpus, we experimented with data augmentation approaches, including Easy Data Augmentation (EDA) and Prompt-Based Data Augmentation (PromDA). We applied the best-performing model to a set of about 12K RCT publications to characterize self-acknowledged limitations at larger scale. Results: Our data model consists of 15 categories and 24 sub-categories (e.g., Population and its sub-category DiagnosticCriteria). We annotated 1090 instances of limitation types in 952 sentences (4.8 limitation sentences and 5.5 limitation types per article). A fine-tuned PubMedBERT model for limitation sentence classification improved upon our earlier model by about 1.5 absolute percentage points in F1 score (0.821 vs. 0.8) with statistical significance. Our best-performing limitation type classification model, PubMedBERT fine-tuning with PromDA (Output View), achieved an F1 score of 0.7, improving upon the vanilla PubMedBERT model by 2.7 percentage points, with statistical significance. Conclusion: The model could support automated screening tools which can be used by journals to draw the authors’ attention to reporting issues. Automatic extraction of limitations from RCT publications could benefit peer review and evidence synthesis, and support advanced methods to search and aggregate the evidence from the clinical trial literature.
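
As a rough illustration of the Easy Data Augmentation idea mentioned in the abstract, the sketch below implements two of EDA's four operations, random swap and random deletion, on whitespace-separated tokens. The other two operations, synonym replacement and random insertion, need a thesaurus and are omitted. The function names, parameter values, example sentence, and the "SampleSize" label are illustrative assumptions, not the authors' configuration.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen token positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, but never return an empty sentence."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, label, n_variants=4, seed=0):
    """Create noisy copies of a labelled sentence; the label is left unchanged."""
    rng = random.Random(seed)
    tokens = sentence.split()
    variants = []
    for k in range(n_variants):
        if k % 2 == 0:
            new_tokens = random_swap(tokens, n_swaps=1, rng=rng)
        else:
            new_tokens = random_deletion(tokens, p=0.1, rng=rng)
        variants.append((" ".join(new_tokens), label))
    return variants


# Hypothetical limitation sentence and label, for illustration only.
extra_training_pairs = augment(
    "The small sample size limits the generalizability of our findings",
    "SampleSize",
)
```

Each variant keeps the original label, so the augmented pairs can be appended to the training set before fine-tuning.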

Multi-dimensional deconstruction and theme evolution of China’s energy policy

Energy policies are vital tools used by countries to regulate economic and social development as well as guarantee national security. To address the problems of fragmented policy objectives, conflicting tools, and overlapping initiatives, the internal logic and evolutionary trends of energy policies must be explored using the policy content. This study uses 38,277 energy policies as a database and summarizes the four energy policy objectives: clean, low-carbon, safe, and efficient. Using the TextCNN model to classify and deconstruct policies, the LDA + Word2vec theme conceptualization and similarity calculations were compared with the EISMD evolution framework to determine the energy policy theme evolution path. Results indicate that the density of energy policies has increased. Policies have become more comprehensive, barriers between objectives have gradually been broken, and low-carbon objectives have been strengthened. The evolution types are more diversified, evolution paths are more complicated, and the evolution types are often related to technology, industry, and market maturity. Traditional energy themes evolve through inheritance and merger; emerging technology and industry themes evolve through innovation, inheritance, and splitting. Moreover, this study provides a replicable analytical framework for the study of policy evolution in other sectors and evidence for optimizing energy policy design.

Automatic recognition of self-acknowledged limitations in clinical research literature

Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency. Methods: To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM). Results: Annotators had good agreement in labeling limitation sentences (Krippendorff's α = 0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs 89.6%, 95% CI [88.1-91.1]). Conclusions: The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.
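
The self-training loop described in the abstract can be sketched generically: train on the labelled sentences, label the unlabelled pool, move only the confident predictions into the training set, and retrain. The sketch below is an assumption-laden toy. It swaps in a one-dimensional nearest-centroid classifier as a stand-in for the LR and SVM models, and the feature values, confidence threshold, and function names are all hypothetical.

```python
import math


def fit_centroids(xs, ys):
    """Toy 1-D classifier: store the mean feature value of each class."""
    neg = [x for x, y in zip(xs, ys) if y == 0]
    pos = [x for x, y in zip(xs, ys) if y == 1]
    return sum(neg) / len(neg), sum(pos) / len(pos)


def prob_positive(model, x):
    """Logistic score that grows as x gets closer to the positive centroid."""
    neg_c, pos_c = model
    return 1 / (1 + math.exp(abs(x - pos_c) - abs(x - neg_c)))


def self_train(xs, ys, unlabeled, threshold=0.9, max_rounds=5):
    """Expand the training set with confidently pseudo-labelled examples."""
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    model = fit_centroids(xs, ys)
    for _ in range(max_rounds):
        confident, rest = [], []
        for x in pool:
            p = prob_positive(model, x)
            if p >= threshold:
                confident.append((x, 1))
            elif p <= 1 - threshold:
                confident.append((x, 0))
            else:
                rest.append(x)  # too uncertain, keep for a later round
        if not confident:
            break  # nothing new to learn from
        for x, y in confident:
            xs.append(x)
            ys.append(y)
        pool = rest
        model = fit_centroids(xs, ys)
    return model, xs, ys


# Hypothetical feature values: 0 = not a limitation sentence, 1 = limitation.
model, train_x, train_y = self_train(
    xs=[0.0, 10.0], ys=[0, 1], unlabeled=[1.0, 9.0, 5.0]
)
```

In this toy run the two clear-cut examples are pseudo-labelled and added, while the ambiguous one stays unlabelled. The confidence threshold controls how much label noise the expanded training set is allowed to absorb.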

Children with language delay referred to Dutch speech and hearing centres: caseload characteristics

Background: Early detection and remediation of language disorders are important in helping children to establish appropriate communicative and social behaviour and acquire additional information about the world through the use of language. In the Netherlands, children with (a suspicion of) language disorders are referred to speech and hearing centres for multidisciplinary assessment. Reliable data are needed on the nature of language disorders, as well as the age and source of referral, and the effects of cultural and socioeconomic profiles of the population served in order to plan speech and language therapy service provision. Aims: To provide a detailed description of caseload characteristics of children referred with a possible language disorder by generating more understanding of factors that might influence early identification. Methods & Procedures: A database of 11,450 children was analysed consisting of data on children, aged 2–7 years (70% boys, 30% girls), visiting Dutch speech and hearing centres. The factors analysed were age of referral, ratio of boys to girls, mono‐ and bilingualism, nature of the language delay, and language profile of the children. Outcomes & Results: Results revealed an age bias in the referral of children with language disorders. On average, boys were referred 5 months earlier than girls, and monolingual children were referred 3 months earlier than bilingual children. In addition, bilingual children seemed to have more complex problems at referral than monolingual children. They more often had a disorder in both receptive and expressive language, and a language disorder with additional (developmental) problems. Conclusions & Implications: This study revealed a bias in age of referral of young children with language disorders. The results indicate the need for objective language screening instruments and the need to increase the awareness of staff in primary child healthcare of red flags in the language development of girls and multilingual children, with the aim of identifying language disorders in these children earlier.

Search results

Toward assessing clinical trial publications for reporting transparency

Automatic categorization of self-acknowledged limitations in randomized controlled trial publications

Multi-dimensional deconstruction and theme evolution of China’s energy policy

Automatic recognition of self-acknowledged limitations in clinical research literature

Cluttering identified: differential diagnostics between cluttering, stuttering and speech impairment related to learning difficulties

On becoming a good probation worker

What feedback do reviewers give when reviewing qualitative manuscripts? A focused mapping review and synthesis

Computerized assessment of the acoustics of progressive aphasia

Considering Human Interaction and Variability in Automatic Text Simplification

Children with language delay referred to Dutch speech and hearing centres: caseload characteristics
