In this post I give an overview of the theory, tools, frameworks and best practices I have found so far for testing (and debugging) machine learning applications. I will start with an overview of what is specific to testing machine learning applications.
LINK
A previous study found a variety of unusual sexual interests to cluster in a five-factor structure, namely submission/masochism, forbidden sexual activities, dominance/sadism, mysophilia, and fetishism (Schippers et al., 2021). The current study was an empirical replication to examine whether these findings generalized to a representative population sample. An online, anonymous sample (N = 256) representative of the Dutch adult male population rated 32 unusual sexual interests on a scale from 1 (very unappealing) to 7 (very appealing). An exploratory factor analysis assessed whether similar factors would emerge as in the original study. A subsequent confirmatory factor analysis served to confirm the factor structure. Four slightly different factors of sexual interest were found: extreme, illegal and mysophilic sexual activities; light BDSM without real pain or suffering; heavy BDSM that may include pain or suffering; and illegal but lower-sentenced and fetishistic sexual activities. The model fit was acceptable. The representative replication sample was more sexually conservative and showed less sexual engagement than the original convenience sample. On a fundamental level, sexual interest in light BDSM activities and extreme, forbidden, and mysophilic activities seem to be relatively separate constructs.
DOCUMENT
Purpose – the purpose of this article is to report on the progress of the development of a method that makes sense of knowledge productivity, in order to give direction to knowledge management initiatives. Methodology/approach – the development and testing of the method are based on the paradigm of the Design Sciences. In order to increase the objectivity of the research findings, and in order to test the transferability of the method, this article suggests a methodology for beta testing. Findings – based on the experiences within this research, the concept of beta testing seems to fit Design Science Research very well. Moreover, applying this concept within this research resulted in valuable findings for further development of the method. Research implications – this is the first article that explicitly applies the concept of beta testing to the process of developing solution concepts. Originality/value – this article contributes to the further operationalization of the relatively new concept of knowledge productivity. From a methodological point of view, this article aims to contribute to the paradigm of the Design Sciences in general, and the concept of beta testing in particular.
DOCUMENT
In this study, we investigated the effects of wearing a police uniform and gear on officers’ performance during the Physical Competence Test (PCT) of the Dutch National Police. In a counterbalanced within-subjects design, twenty-seven police officers performed the PCT twice, once wearing sportswear and once wearing a police uniform. The results showed clear indications that wearing a police uniform influenced the performance on the PCT. Participants were on average 14 seconds slower in a police uniform than in sportswear. Furthermore, performing the test in uniform was accompanied by higher RPE-scores and total physiological load. It seems that wearing a police uniform during the test diminishes the discrepancy between physical fitness needed to pass the simulated police tasks in the PCT and the job-specific physical fitness that is required during daily police work. This suggests that wearing a police uniform during the test will increase the representativeness of the testing environment for the work field.
DOCUMENT
While the original definition of replacement focuses on the replacement of the use of animals in science, a more contemporary definition focuses on accelerating the development and use of predictive and robust models, based on the latest science and technologies, to address scientific questions without the use of animals. The transition to animal free innovation is on the political agenda in and outside the European Union. The Beyond Animal Testing Index (BATI) is a benchmarking instrument designed to provide insight into the activities and contributions of research institutes to the transition to animal free innovation. The BATI allows participating organizations to learn from each other and stimulates continuous improvement. The BATI was modelled after the Access to Medicine Index, which benchmarks pharmaceutical companies on their efforts to make medicines widely available in developing countries. A prototype of the BATI was field-tested with three Dutch academic medical centers and two universities in 2020-2021. The field test demonstrated the usability and effectiveness of the BATI as a benchmarking tool. Analyses were performed across five different domains. The participating institutes concluded that the BATI served as an internal as well as an external stimulus to share, learn, and improve institutional strategies towards the transition to animal free innovation. The BATI also identified gaps in the development and implementation of 3R technologies. Hence, the BATI might be a suitable instrument for monitoring the effectiveness of policies. BATI version 1.0 is ready to be used for benchmarking at a larger scale.
DOCUMENT
This systematic review aims to get insight into the feasibility of cardiopulmonary exercise testing (CPET) in patients with cancer prior to a physical exercise programme. We will focus on quality (defined as the adherence to international guidelines for methods of CPET) and safety of CPET. Furthermore, we compare the peak oxygen uptake (V̇O2peak) values of patients with cancer with reference values for healthy persons to put these values into a clinical perspective. A computer aided search with ‘cardiopulmonary exercise testing’ and ‘cancer’ using MEDLINE, EMBASE, Pedro, CINAHL® and SPORTDiscus™ was carried out. We included studies in which CPET with continuous gas exchange analysis has been performed prior to a physical exercise programme in adults with cancer. Twenty studies describing 1158 patients were eligible. Reported adherence to international recommendations for CPET varied per item. In most studies, the methods of CPET were not reported in detail. Adverse events occurred in 1% of patients. The percentage V̇O2peak of reference values for healthy persons varied between 65% and 89% for tests before treatment, between 74% and 96% for tests during treatment and between 52% and 117% for tests after treatment. Our results suggest that CPET is feasible and seems to be safe for patients with cancer prior to a physical exercise programme. We recommend that standard reporting and quality guidelines should be followed for CPET methods. The decreased V̇O2peak values of patients with cancer indicate that physical exercise should be implemented in their standard care.
DOCUMENT
Implementation of reliable methodologies allowing Reduction, Refinement, and Replacement (3Rs) of animal testing is a process that takes several decades and is still not complete. Reliable methods are essential for regulatory hazard assessment of chemicals where differences in test protocol can influence the test outcomes and thus affect the confidence in the predictive value of the organisms used as an alternative for mammals. Although test guidelines are common for mammalian studies, they are scarce for non-vertebrate organisms that would allow for the 3Rs of animal testing. Here, we present a set of 30 reporting criteria as the basis for such a guideline for Developmental and Reproductive Toxicology (DART) testing in the nematode Caenorhabditis elegans. Small organisms like C. elegans are upcoming in new approach methodologies for hazard assessment; thus, reliable and robust test protocols are urgently needed. A literature assessment of the fulfilment of the reporting criteria demonstrates that although studies describe methodological details, essential information such as compound purity and lot/batch number or type of container is often not reported. The formulated set of reporting criteria for C. elegans testing can be used by (i) researchers to describe essential experimental details, (ii) data scientists who aggregate information to assess data quality and include data in aggregated databases, and (iii) regulators to assess study data for inclusion in regulatory hazard assessment of chemicals.
DOCUMENT
From an evidence-based perspective, cardiopulmonary exercise testing (CPX) is a well-supported assessment technique in both the United States (US) and Europe. The combination of standard exercise testing (ET) (ie, progressive exercise provocation in association with serial electrocardiograms [ECG], hemodynamics, oxygen saturation, and subjective symptoms) and measurement of ventilatory gas exchange amounts to a superior method to: 1) accurately quantify cardiorespiratory fitness (CRF), 2) delineate the physiologic system(s) underlying exercise responses, which can be applied as a means to identify the exercise-limiting pathophysiologic mechanism(s) and/or performance differences, and 3) formulate function-based prognostic stratification. Cardiopulmonary ET certainly carries an additional cost as well as competency requirements and is not an essential component of evaluation in all patient populations. However, there are several conditions of confirmed, suspected, or unknown etiology where the data gained from this form of ET is highly valuable in terms of clinical decision making.
DOCUMENT
Background: A patient decision aid (PtDA) can support shared decision making (SDM) in preference-sensitive care, with more than one clinically applicable treatment option. The development of a PtDA is a complex process, involving several steps, such as designing, developing and testing the draft with all the stakeholders, known as alpha testing. This is followed by testing in ‘real life’ situations, known as beta testing, and then finalising the definitive version. Our aim was to develop and alpha test a PtDA for primary treatment of early stage breast cancer, ensuring that the tool is considered relevant, valid and feasible by patients and professionals. Methods: Our qualitative descriptive study applied various methods including face-to-face think-aloud interviews, a focus group and semi-structured telephone interviews. The study population consisted of breast cancer patients facing the choice between breast-conserving therapy with or without preceding neo-adjuvant chemotherapy and mastectomy, and professionals involved in breast cancer care in dedicated multidisciplinary breast cancer teams. Results: A PtDA was developed in four iterative test rounds, taking nearly 2 years, involving 26 patients and 26 professionals. While the research group initially opted for simplicity for the sake of implementation, the clinicians objected that the complexity of the decision could not be ignored. Other topics of concern were the conflicting views of professionals and patients regarding side effects, the amount of information and how to present it. Conclusion: The development was an extensive process, because the professionals rejected the simplifications proposed by the research group. This resulted in the development of a completely new draft PtDA, which took double the expected time and resources. The final version of the PtDA appeared to be well-appreciated by professionals and patients, although its acceptability will only be proven in actual practice (beta testing).
DOCUMENT
From an evidence-based perspective, cardiopulmonary exercise testing (CPX) is a well-supported assessment technique in both the United States (US) and Europe. The combination of standard exercise testing (ET) [i.e. progressive exercise provocation in association with serial electrocardiograms (ECGs), haemodynamics, oxygen saturation, and subjective symptoms] and measurement of ventilatory gas exchange amounts to a superior method to: (i) accurately quantify cardiorespiratory fitness (CRF), (ii) delineate the physiologic system(s) underlying exercise responses, which can be applied as a means to identify the exercise-limiting pathophysiological mechanism(s) and/or performance differences, and (iii) formulate function-based prognostic stratification. Cardiopulmonary ET certainly carries an additional cost as well as competency requirements and is not an essential component of evaluation in all patient populations. However, there are several conditions of confirmed, suspected, or unknown aetiology where the data gained from this form of ET is highly valuable in terms of clinical decision making [1].
DOCUMENT