Report of the project 'FAIR: geen woorden maar data' on the FAIRification of research data (in Dutch). It describes a proof of concept for the implementation of the FAIR principles. The implementation is based on the Resource Description Framework (RDF) and semantic knowledge representation using ontologies.
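The report itself is in Dutch and is not reproduced here; as a minimal sketch of the RDF-based approach it describes, the Python snippet below uses rdflib to attach a few dataset-level metadata triples to a hypothetical dataset. The namespace, URIs, and property choices are illustrative assumptions, not the project's actual data model.

```python
# Minimal sketch (not the project's actual data model): describing a dataset
# with RDF triples so its metadata become machine-readable and queryable.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCAT, DCTERMS, RDF

EX = Namespace("https://example.org/fair-demo/")  # hypothetical namespace

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("dcat", DCAT)
g.bind("ex", EX)

dataset = EX["dataset/frailty-study-01"]  # hypothetical dataset URI
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Frailty study measurements (demo)")))
g.add((dataset, DCTERMS.creator, Literal("Research group X")))
g.add((dataset, DCTERMS.license,
       EX["licenses/cc-by-4.0"]))  # placeholder license resource

print(g.serialize(format="turtle"))
```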
DOCUMENT
The ultimate goal of FAIR is to optimize the Reuse of data. To achieve this, metadata and data should be well described and documented so that they can be replicated, understood, and/or combined in different settings. Think of variable labels, codebooks, protocols and instruments used, attaching a license, etc. This checklist details what to include in a data package besides the data itself. The data package can be deposited in a data repository such as UvA/HvA figshare.
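As a concrete, hypothetical illustration of such a data package, the sketch below writes a small machine-readable metadata sidecar next to the data file; the field names and file names are assumptions based on the checklist items mentioned above (variable labels, codebook, protocol, instruments, license), not a prescribed figshare format.

```python
# Hypothetical sketch of a minimal data package: the data file plus a
# machine-readable metadata sidecar covering the checklist items above.
import json
from pathlib import Path

package = Path("data_package_demo")
package.mkdir(exist_ok=True)

metadata = {
    "title": "Demo survey on vitality (fictitious)",
    "creators": ["J. Doe"],
    "license": "CC-BY-4.0",
    "instruments": ["Questionnaire v1 (fictitious)"],
    "codebook": "codebook.csv",        # variable labels and value codes
    "protocol": "protocol.pdf",        # how the data were collected
    "data_file": "survey_data.csv",
}

(package / "metadata.json").write_text(json.dumps(metadata, indent=2))
print(f"Package written to {package.resolve()}")
```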
DOCUMENT
A great deal of research is carried out at Fontys Hogescholen, in particular by researchers of the various research groups (lectoraten). Naturally, a lot of data is collected and processed within this research. Fontys endorses the importance of handling research data with care and therefore asks researchers to have their Research Data Management (RDM) in order. This includes secure storage and sustainable accessibility of data, but (open access) publishing and archiving of research data are also part of RDM. Working out how to put this into practice can sometimes be quite a search for researchers, partly because not everyone is equally familiar with the subject of RDM. With this book we hope to offer researchers within Fontys the most important information needed to give proper shape to Research Data Management, and to point out the support that is available in this area.
DOCUMENT
This method paper presents a template solution for text mining of scientific literature using the R tm package. Literature to be analyzed can be collected manually or automatically using the code provided with this paper. Once the literature is collected, the three steps of text mining can be performed as outlined below:
• loading and cleaning of text from articles,
• processing, statistical analysis, and clustering, and
• presentation of results using generalized and tailor-made visualizations.
The text mining steps can be applied to a single document, multiple documents, or time-series groups of documents. References are provided to three published peer-reviewed articles that use the presented text mining methodology. The main advantages of our method are: (1) its suitability for both research and educational purposes, (2) compliance with the Findable, Accessible, Interoperable, and Reusable (FAIR) principles, and (3) the availability of code and example data on GitHub under the open-source Apache V2 license.
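The paper's own code uses the R tm package; as a language-neutral sketch of the same three steps, the snippet below shows a Python equivalent with scikit-learn on a tiny in-memory corpus. It is a substitute illustration under assumed data, not the authors' code.

```python
# Sketch of the three text-mining steps in Python (scikit-learn); a
# substitute illustration, not the paper's R/tm code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# 1) Load and clean text (here: a tiny in-memory corpus).
documents = [
    "FAIR data principles improve reuse of research data.",
    "Text mining of scientific literature reveals topic trends.",
    "Clustering groups articles with similar vocabulary.",
]
documents = [d.lower().strip() for d in documents]

# 2) Process and cluster: TF-IDF weighting followed by k-means.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# 3) Present results: report the top terms per cluster.
terms = vectorizer.get_feature_names_out()
for cluster_id, center in enumerate(kmeans.cluster_centers_):
    top = [terms[i] for i in center.argsort()[::-1][:3]]
    print(f"cluster {cluster_id}: {', '.join(top)}")
```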
DOCUMENT
Academic cheating poses a significant challenge to conducting fair online assessments. One common form is collusion, where students unethically share answers during the assessment. While several researchers have proposed solutions, there is a lack of clarity about which of the different types of collusion they target. For collusion detection, researchers have used statistical techniques to analyze basic attributes collected by the platforms. Only a few works have used machine learning, and these consider only two or three attributes; such limited feature sets reduce accuracy and increase the risk of false accusations. In this work, we focus on In-Parallel Collusion, where students simultaneously work together on an assessment. For data collection, a quiz tool was adapted to capture clickstream data at a finer level of granularity. We use feature engineering to derive seven features and create a machine learning model for collusion detection. The results show that: (1) Random Forest achieves the best accuracy (98.8%), and (2) in contrast to the smaller feature sets used in earlier work, the full feature set provides the best result, showing that considering multiple facets of similarity enhances model accuracy. The findings provide platform designers and teachers with insights into optimizing quiz platforms and creating cheat-proof assessments.
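The paper's seven engineered features are not reproduced here; the sketch below illustrates the general approach with a few hypothetical pairwise similarity features derived from clickstream data and a Random Forest classifier trained on synthetic labels.

```python
# Hypothetical sketch of the approach: pairwise similarity features derived
# from clickstream data, fed to a Random Forest collusion classifier.
# Feature meanings and the toy data are illustrative, not the paper's dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pairs = 200

# Each row describes one pair of students taking the same quiz.
X = np.column_stack([
    rng.random(n_pairs),           # answer-sequence similarity
    rng.random(n_pairs),           # overlap of answer-submission times
    rng.integers(0, 30, n_pairs),  # number of near-simultaneous clicks
])
y = rng.integers(0, 2, n_pairs)    # 1 = colluding pair (synthetic labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(f"toy accuracy: {clf.score(X_test, y_test):.2f}")
```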
DOCUMENT
What you don’t know can’t hurt you: this seems to be the current approach for responding to disinformation by public regulators across the world. Nobody is able to say with any degree of certainty what is actually going on. This is in no small part because, at present, public regulators don’t have the slightest idea how disinformation actually works in practice. We believe that there are very good reasons for the current state of affairs, which stem from a lack of verifiable data available to public institutions. If an election board or a media regulator wants to know what types of digital content are being shared in their jurisdiction, they have no effective mechanisms for finding this data or ensuring its veracity. While there are many other reasons why governments would want access to this kind of data, the phenomenon of disinformation provides a particularly salient example of the consequences of a lack of access to this data for ensuring free and fair elections and informed democratic participation. This chapter will provide an overview of the main aspects of the problems associated with basing public regulatory decisions on unverified data, before sketching out some ideas of what a solution might look like. In order to do this, the chapter develops the concept of auditing intermediaries. After discussing which problems the concept of auditing intermediaries is designed to solve, it then discusses some of the main challenges associated with access to data, potential misuse of intermediaries, and the general lack of standards for the provision of data by large online platforms. In conclusion, the chapter suggests that there is an urgent need for an auditing mechanism to ensure the accuracy of transparency data provided by large online platform providers about the content on their services. Transparency data that have been audited would be considered verified data in this context. Without such a transparency verification mechanism, existing public debate is based merely on a whim, and digital dominance is likely to only become more pronounced.
MULTIFILE
Over the past few years, there has been an explosion of data science as a profession and an academic field. The increasing impact and societal relevance of data science are accompanied by important questions that reflect this development: how can data science become more responsible and accountable while also responding to key challenges such as bias, fairness, and transparency in a rigorous and systematic manner? This Patterns special collection has brought together research and perspectives from academia and the public and private sectors, showcasing original research articles and perspectives pertaining to responsible and accountable data science.
MULTIFILE
An example of the development of a potential Minimum Data Set (MDS) within the Urban Vitality (UV) themes ‘Gezond ouder worden / Mensen in Beweging’. The goal is to ensure more uniform collection of outcome measures, based on the FAIR principles (ref 1), and to facilitate reuse of data and analyses spanning multiple studies. This prototype MDS is based on The Older Persons and Informal Caregivers Survey Minimum DataSet (TOPICS-MDS) (ref 2), the project FAIR: geen woorden maar data (ref 3), in which we examined 14 UV studies on ageing and frailty in older adults, and the set of common data elements for rare disease registration (ref 4).
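As a hypothetical illustration of what such a minimum data set could look like in machine-readable form, the sketch below defines a few common data elements with units and controlled value sets. The element names, units, and codes are invented for the example and are not the actual UV MDS.

```python
# Hypothetical sketch of a machine-readable minimum data set definition.
# Element names, units, and value codes are invented for illustration.
from dataclasses import dataclass, field


@dataclass
class DataElement:
    name: str
    description: str
    unit: str | None = None
    allowed_values: dict[int, str] = field(default_factory=dict)


MDS_PROTOTYPE = [
    DataElement("age", "Age at inclusion", unit="years"),
    DataElement("gait_speed", "Comfortable gait speed", unit="m/s"),
    DataElement("frailty_status", "Frailty classification",
                allowed_values={0: "not frail", 1: "pre-frail", 2: "frail"}),
]

for element in MDS_PROTOTYPE:
    print(element.name, "-", element.description)
```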
DOCUMENT
Although governments are investing heavily in big data analytics, reports show mixed results in terms of performance. Whilst big data analytics capability has provided a valuable lens in business and seems useful for the public sector, there is little knowledge of its relationship with governmental performance. This study aims to explain how big data analytics capability leads to governmental performance. Using a survey research methodology, we propose an integrated conceptual model highlighting a comprehensive set of big data analytics resources that influence governmental performance. The conceptual model was developed based on prior literature. Using a PLS-SEM approach, the results strongly support the posited hypotheses. Big data analytics capability has a strong impact on governmental efficiency, effectiveness, and fairness. The findings confirm the imperative role of big data analytics capability in governmental performance in the public sector, which earlier studies found in the private sector. This study also validated measures of governmental performance.
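The study's analysis was carried out with PLS-SEM; as a very rough stand-in, the sketch below relates synthetic survey indicators for analytics capability to synthetic performance indicators using scikit-learn's PLS regression. This is plain PLS regression on invented data, not the authors' PLS-SEM model or measures.

```python
# Very rough stand-in for relating analytics-capability indicators to
# performance indicators: plain PLS regression on synthetic data,
# not the authors' PLS-SEM model.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
n = 150

# Synthetic survey indicators: a latent "capability" drives both blocks.
capability = rng.normal(size=(n, 1))
X = capability + 0.3 * rng.normal(size=(n, 4))        # four capability items
Y = 0.8 * capability + 0.3 * rng.normal(size=(n, 3))  # efficiency, effectiveness, fairness

pls = PLSRegression(n_components=1).fit(X, Y)
print("R^2 on the synthetic data:", round(pls.score(X, Y), 2))
```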
MULTIFILE
Analyzing historical decision-related data can help support actual operational decision-making processes. Decision mining can be employed for such analysis. This paper proposes the Decision Discovery Framework (DDF), designed to develop, adapt, or select a decision discovery algorithm by outlining specific guidelines for input data usage, classifier handling, and decision model representation. The framework incorporates the use of Decision Model and Notation (DMN) for enhanced comprehensibility and normalization to simplify decision tables. The framework’s efficacy was tested by adapting the C4.5 algorithm into the DM45 algorithm. The proposed adaptations include (1) the use of a decision log, (2) ensuring an unpruned decision tree, (3) the generation of DMN, and (4) normalization of the decision table. Future research can focus on supporting practitioners in modeling decisions, ensuring their decision-making is compliant, and suggesting improvements to the modeled decisions. Another future research direction is to explore the ability to process unstructured data as input for the discovery of decisions.
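DM45 itself is not reproduced here; as a rough sketch under assumed column names, the code below learns an unpruned decision tree from a small synthetic decision log and prints its rules, which could subsequently be normalized into a DMN-style decision table.

```python
# Rough sketch (not the DM45 implementation): learn an unpruned decision tree
# from a synthetic decision log and print the rules behind each decision.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical decision log: case attributes plus the recorded decision.
log = pd.DataFrame({
    "amount":        [120, 800, 50, 950, 300, 700],
    "customer_type": [0, 1, 0, 1, 1, 0],  # 0 = new, 1 = returning
    "decision":      ["approve", "review", "approve",
                      "review", "approve", "review"],
})

X = log[["amount", "customer_type"]]
y = log["decision"]

# No max_depth or ccp_alpha is set, so the tree is grown unpruned.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```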
MULTIFILE