Objective: To annotate a corpus of randomized controlled trial (RCT) publications with the checklist items of the CONSORT reporting guidelines and to use the corpus to develop text mining methods for RCT appraisal. Methods: We annotated a corpus of 50 RCT articles at the sentence level using 37 fine-grained CONSORT checklist items. A subset of 31 articles was double-annotated and adjudicated, while the remaining 19 were annotated by a single annotator and reconciled by another. We calculated inter-annotator agreement at the article and section levels using MASI (Measuring Agreement on Set-Valued Items) and at the CONSORT item level using Krippendorff's α. We experimented with two rule-based methods (phrase-based and section header-based) and two supervised learning approaches (support vector machine and BioBERT-based neural network classifiers) for recognizing 17 methodology-related items in the Methods sections of RCT articles. Results: We created CONSORT-TM, consisting of 10,709 sentences, 4,845 (45%) of which were annotated with 5,246 labels. A median of 28 CONSORT items (out of a possible 37) were annotated per article. Agreement was moderate at the article and section levels (average MASI: 0.60 and 0.64, respectively). Agreement varied considerably among individual checklist items (Krippendorff's α = 0.06–0.96). The BioBERT-based model performed best overall for recognizing methodology-related items (micro-precision: 0.82, micro-recall: 0.63, micro-F1: 0.71). Combining models using majority vote and label aggregation further improved precision and recall, respectively. Conclusion: Our annotated corpus, CONSORT-TM, contains more fine-grained information than earlier RCT corpora. The low frequency of some CONSORT items made it difficult to train effective text mining models to recognize them. For commonly reported items, CONSORT-TM can serve as a testbed for text mining methods that assess RCT transparency, rigor, and reliability, and can support methods for peer review and authoring assistance.
Minor modifications to the annotation scheme and a larger corpus could lead to better-performing text mining models. CONSORT-TM is publicly available at https://github.com/kilicogluh/CONSORT-TM.
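The agreement and evaluation measures named above can be illustrated with a short sketch. It assumes MASI is computed as the Jaccard index scaled by a monotonicity weight (1 for identical sets, 2/3 when one set contains the other, 1/3 for partial overlap, 0 for disjoint sets), as the measure is commonly defined. It also assumes that "label aggregation" means taking the union of the models' predicted labels, which the abstract does not spell out. The function names and the CONSORT item identifiers in the example are hypothetical and not taken from CONSORT-TM.

```python
from collections import Counter


def masi(a: set, b: set) -> float:
    """MASI similarity: Jaccard index scaled by a monotonicity weight."""
    if not a and not b:
        return 1.0  # two empty annotations agree trivially
    jaccard = len(a & b) / len(a | b)
    if a == b:
        weight = 1.0
    elif a <= b or b <= a:
        weight = 2 / 3  # one annotator's set contains the other's
    elif a & b:
        weight = 1 / 3  # partial overlap, neither set contains the other
    else:
        weight = 0.0  # disjoint sets
    return jaccard * weight


def majority_vote(predictions: list[set]) -> set:
    """Keep a label only if more than half of the models predicted it."""
    counts = Counter(label for labels in predictions for label in labels)
    return {label for label, n in counts.items() if n > len(predictions) / 2}


def union_aggregate(predictions: list[set]) -> set:
    """Keep a label if any model predicted it (assumed meaning of 'label aggregation')."""
    return set().union(*predictions)


def micro_prf(gold: list[set], pred: list[set]) -> tuple:
    """Micro-averaged precision, recall and F1 over per-sentence label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Article-level MASI for two annotators (hypothetical item sets).
    print(masi({"3a", "6a", "8a"}, {"3a", "6a"}))  # ≈ 0.444 (2/3 * 2/3)

    # Gold labels and three models' predictions for two sentences (hypothetical).
    gold = [{"8a"}, {"6a", "7a"}]
    models = [
        [{"8a"}, {"6a"}],
        [{"8a", "3a"}, {"7a"}],
        [set(), {"6a"}],
    ]
    per_sentence = list(zip(*models))  # regroup the predictions by sentence
    voted = [majority_vote(list(p)) for p in per_sentence]
    pooled = [union_aggregate(list(p)) for p in per_sentence]
    print(micro_prf(gold, voted))   # precision 1.0, recall ≈ 0.667, F1 ≈ 0.8
    print(micro_prf(gold, pooled))  # precision 0.75, recall 1.0, F1 ≈ 0.857
```

In this toy example the vote drops the spurious 3a label and raises precision, while the union recovers the missed 7a label and raises recall, which is the direction of the effects the abstract reports for combining models.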
E-discovery projects typically start with an assessment of the collected electronic data in order to estimate the risk of prosecuting or defending a legal case. This is not a review task; it is known as early case assessment, which corresponds to what the information retrieval community calls exploratory search. This paper first describes text mining methodologies that can be used to enhance exploratory search. Building on these ideas, we present a semantic search dashboard that includes entities relevant to investigators, such as who knew whom, what, where, and when. We describe how this dashboard can be powered by results from our ongoing research in the “Semantic Search for E-Discovery” project on topic detection and clustering, semantic enrichment of user profiles, email recipient recommendation, expert finding, and identity extraction from digital forensic evidence.
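One way to derive the "who knew whom, and when" view from collected email is to count sender-recipient pairs and record the date of their first contact. The sketch below is a hypothetical illustration, not the project's implementation: it assumes each message is available as a dict of raw `From`, `To`, and `Date` header strings whose dates carry explicit UTC offsets, and it uses only Python's standard `email.utils` parsing helpers. The function name `who_knew_whom` is made up for this example.

```python
from email.utils import getaddresses, parsedate_to_datetime


def who_knew_whom(messages):
    """Return {(sender, recipient): (message_count, first_contact)}.

    Each message is assumed (hypothetically) to be a dict with raw "From",
    "To" and "Date" header strings. Dates must include a UTC offset so that
    all parsed datetimes are timezone-aware and can be compared.
    """
    graph = {}
    for msg in messages:
        # Addresses are lowercased so variant spellings of one address match.
        sender = getaddresses([msg["From"]])[0][1].lower()
        sent = parsedate_to_datetime(msg["Date"])
        for _, rcpt in getaddresses([msg["To"]]):
            if not rcpt:
                continue  # skip unparseable or empty entries
            key = (sender, rcpt.lower())
            count, first = graph.get(key, (0, sent))
            graph[key] = (count + 1, min(first, sent))
    return graph


if __name__ == "__main__":
    mails = [  # hypothetical messages
        {"From": "Alice <alice@example.com>",
         "To": "bob@example.com, Carol <carol@example.com>",
         "Date": "Tue, 03 Mar 2015 09:00:00 +0000"},
        {"From": "alice@example.com",
         "To": "Bob <BOB@example.com>",
         "Date": "Mon, 02 Mar 2015 17:30:00 +0000"},
    ]
    for (sender, rcpt), (count, first) in sorted(who_knew_whom(mails).items()):
        print(f"{sender} -> {rcpt}: {count} message(s), first on {first:%Y-%m-%d %H:%M}")
```

Lowercasing addresses is a simplifying assumption; real identity extraction would also have to merge aliases and display names across accounts, which this sketch does not attempt.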
'The Data Tales' is a long-term collaboration between researchers and companies who jointly carry out projects and answer questions such as: how can data help us improve our relationships with customers, and how do we protect customers' privacy while also using data techniques to serve them better?

Goal
The Data Tales consortium aims to help companies interact better with their customers, because when companies listen to their customers, they strengthen their bond with them. Technology offers many options for responding to customers' needs more quickly, more precisely, and more meaningfully. In doing so, the tone, content, and presentation of the message must suit the recipient. The goal of The Data Tales is to work together with education, companies, and technology developers on techniques that give companies direct insight into how their customers experience their interactions with organizations. The work always follows the principle of 'ethics by design'.

Results
The Data Tales consortium formed the basis for the KIEM project VERBIND, which stands for "verantwoorde, belevingsgerichte interactie op basis van data-analyse" (responsible, experience-oriented interaction based on data analysis). VERBIND brings together several perspectives: we look not only at what is technically possible in data collection, but also at the ethical choices that companies make. More about the VERBIND project can be found at thedatatales.org.

Duration
1 January 2018 to 31 December 2020

Approach
The Data Tales consortium brings together the following fields: customer journey and marketing; data science, including process mining, text mining, and other forms of data mining; law and ethics, including the GDPR (AVG); behavioural sciences; and ICT.
Under the umbrella of artistic sustenance, I question the life of materials, subjective value structures, and the working conditions underlying exhibition making through three interconnected areas of inquiry. Material Life and Ecological Impact: how can the accumulation of physical materials and storage after exhibitions be avoided? I aim to highlight the provenance and afterlife of exhibition materials in my practice, seeking economic and ecological alternatives to traditional practices through sustainable solutions such as borrowing, reselling, and alternative storage methods that could transform how exhibition materials are handled and how their storage and circulation are conceived. Value Systems and Economic Conditions: what do we mean when we talk about 'value' in relation to art? By examining the flow of financial value in contemporary art and addressing the subjectivity of worth in art-making and in artists' livelihoods, I question traditional notions of sculptural skill while advocating for the recognition of conceptual labour. The research considers how artists might be compensated for the elegance of thought rather than only for material output. Text as Archive and Speculation: how can text store, speculate about, and circulate the invisible labour and layers of exhibition making? Through titles, material lists, and exhibition texts, I explore writing's potential to uncover latent structures and document invisible labour, considering text both as an archiving method and as a tool for speculating about future exhibitions. Using personal practice as a case study, ‘Conditions for Raw Materials’ seeks to question notions of value in contemporary art, develop alternative economic models, and make visible the material, financial, and relational flows within exhibitions. The research will manifest through international exhibitions, a book combining poetic auto-theoretical reflection with exhibition speculation, new teaching formats, and long-term investigations.
Following the “sticky relations” of intimacy, economy, and conditions, each exhibition serves as a case study exploring exhibition making from emotional, ecological, and economic perspectives.
In order to stay competitive and respond to the increasing demand for steady and predictable aircraft turnaround times, Maintenance, Repair and Overhaul (MRO) SMEs in the aviation industry have identified process optimization as their key area for innovation. MRO SMEs have long looked for ways to organize their work as efficiently as possible, which has often led them to adopt lean business organization solutions. However, their aircraft maintenance processes remain characterized by unpredictable process times and material requirements, and lean business methodologies cannot change this. The problem is often compensated for by large buffers of time, personnel, and parts, which makes the process relatively expensive and inefficient. To tackle this unpredictability, MRO SMEs want to explore the possibilities of data mining: the exploration and analysis of large quantities of their own historical maintenance data with the aim of discovering useful knowledge in seemingly unrelated data. Ideally, this will help predict failures in the maintenance process and thus better anticipate repair times and material requirements. In doing so, MRO SMEs face two challenges. First, the data they have available is often fragmented and non-transparent, whereas standardized data availability is a basic requirement for successful data analysis. Second, it is difficult to find meaningful patterns in these data sets because no operational data mining system exists in the industry. This RAAK MKB project was initiated by the Aviation Academy of the Amsterdam University of Applied Sciences (Hogeschool van Amsterdam, hereinafter: HvA), in direct cooperation with the industry, to help MRO SMEs improve their maintenance processes. Its main aim is to develop new knowledge of, and a method for, data mining. To do so, the current state of the data held by MRO SMEs will first be explored, mapped, categorized, cleaned, and prepared.
This will result in readable data sets with predictive value for key elements of the maintenance process. Second, analysis principles will be developed to interpret this data. These principles will be translated into an easy-to-use data mining (IT) tool that helps MRO SMEs predict their maintenance requirements in terms of costs and time, allowing them to adapt their maintenance processes accordingly. These products will be tested and further improved in several case studies. This is a resubmission of an earlier proposal dated October 2015 (3rd round) entitled ‘Data mining for MRO process optimization’ (number 2015-03-23M). We believe the merits of the proposal are substantial and sufficient for it to be awarded a grant. The text of this submission is essentially unchanged from the previous proposal. Text added for clarification has been marked in yellow; almost all of it is taken from our rebuttal (hoor en wederhoor), submitted in January 2016.
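The proposal does not specify its analysis principles, so the following is only a hypothetical sketch of the kind of pipeline it describes: fragmented historical records are cleaned and standardized, and each maintenance task type is then summarized by its median repair time and a high percentile that could inform buffer planning. The record fields (`task`, `hours`), the 80th-percentile default, and the function names are assumptions for illustration; a real tool would need richer features and actual predictive models.

```python
from collections import defaultdict
from statistics import median, quantiles


def clean(records):
    """Yield (task, hours) pairs, normalizing task codes and dropping unusable rows."""
    for record in records:
        task = (record.get("task") or "").strip().upper()  # unify code spelling
        hours = record.get("hours")
        if task and isinstance(hours, (int, float)) and hours > 0:
            yield task, float(hours)


def estimate_repair_times(records, percentile=80):
    """Return {task: (median_hours, percentile_hours)}; percentile must be 1..99."""
    by_task = defaultdict(list)
    for task, hours in clean(records):
        by_task[task].append(hours)
    estimates = {}
    for task, hours in by_task.items():
        if len(hours) < 2:
            # Too little history to estimate spread; fall back to the single value.
            estimates[task] = (hours[0], hours[0])
        else:
            cuts = quantiles(hours, n=100, method="inclusive")  # 99 percentile cut points
            estimates[task] = (median(hours), cuts[percentile - 1])
    return estimates


if __name__ == "__main__":
    history = [  # hypothetical, deliberately messy records
        {"task": " wheel change ", "hours": 2.0},
        {"task": "WHEEL CHANGE", "hours": 3.0},
        {"task": "wheel change", "hours": 2.5},
        {"task": "Wheel Change", "hours": 6.0},
        {"task": "", "hours": 1.0},                   # missing task code
        {"task": "brake inspection", "hours": None},  # missing duration
    ]
    print(estimate_repair_times(history))  # {'WHEEL CHANGE': (2.75, 4.2)}
```

The gap between the median and the chosen percentile gives a rough, data-driven size for the time buffers described above; with more history per task type, these estimates would become more stable.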