There is emerging evidence that the performance of risk assessment instruments is weaker when used for clinical decision‐making than for research purposes. For instance, research has found lower agreement between evaluators when the risk assessments are conducted during routine practice. We examined the field interrater reliability of the Short‐Term Assessment of Risk and Treatability: Adolescent Version (START:AV). Clinicians in a Dutch secure youth care facility completed START:AV assessments as part of the treatment routine. Consistent with previous literature, interrater reliability of the items and total scores was lower than previously reported in non‐field studies. Nevertheless, moderate to good interrater reliability was found for final risk judgments on most adverse outcomes. Field studies provide insights into the actual performance of structured risk assessment in real‐world settings, exposing factors that affect reliability. This information is relevant for those who wish to implement structured risk assessment with a level of reliability that is defensible considering the high stakes.
DOCUMENT
Caregivers of persons with profound intellectual and multiple disabilities (PIMD) often describe the quality of the daily movements of these persons in terms of flexibility or stiffness. Objective outcome measures for flexibility and stiffness are muscle tone or level of spasticity. Two instruments used to grade muscle tone and spasticity are the Modified Ashworth Scale (MAS) and the Modified Tardieu Scale (MTS). To date, however, no research has been performed to determine the psychometric properties of the MAS and MTS in persons with PIMD. Therefore, the purpose of this study was to determine the feasibility, test-retest reliability, and interrater reliability of the MAS and MTS in persons with PIMD. We assessed 35 participants on the MAS and MTS twice, first for the test and second a week later for the retest. Two observers performed the measurements. Feasibility was assessed based on the percentage of successful measurements. Test-retest and interrater reliability were determined by using the Wilcoxon signed rank test, intraclass correlation coefficients (ICC), Spearman's correlation, and either limits of agreement (LOA) or quadratically weighted kappa. The feasibility of the measurements was good, because an acceptable percentage of successful measurements were performed. MAS measurements had substantial to almost perfect quadratically weighted kappa (>0.8) and an acceptable ICC (>0.8) for both inter- and intrarater reliability. However, MTS measurements had insufficient ICCs, Spearman's correlations, and LOAs for both test-retest and interrater reliability. Our data indicated that the feasibility of the MAS and MTS for measuring muscle tone in persons with PIMD was good. The MAS had sufficient test-retest and interrater reliability; however, the MTS had insufficient test-retest and interrater reliability in persons with PIMD. Thus, the MAS may be a good method for evaluating the quality of daily movements in persons with PIMD.
Providing test administrators with training and clear instructions will improve test reliability.
DOCUMENT
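As an illustration of the quadratically weighted kappa named in the abstract above, the following is a minimal sketch in plain Python. It assumes two raters scoring the same subjects on one ordinal scale with equally spaced categories; the function name `quadratic_weighted_kappa` and its arguments are hypothetical and do not come from the study.

```python
def quadratic_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with quadratic disagreement weights for two raters.

    `categories` lists the ordinal scale levels in order (assumption:
    levels are equally spaced, so the distance is the index difference).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same subjects")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed contingency table of paired ratings.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: 0 on the diagonal, 1 at maximal distance.
            weight = ((i - j) ** 2) / ((k - 1) ** 2)
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when ratings do not vary")
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Hypothetical scores on a 0-4 ordinal scale, for illustration only.
    first = [0, 1, 2, 3, 4, 2, 1]
    second = [0, 1, 2, 4, 4, 1, 1]
    print(round(quadratic_weighted_kappa(first, second, [0, 1, 2, 3, 4]), 3))
```

Quadratic weights penalise a two-level disagreement four times as heavily as a one-level disagreement, which is why this variant is often chosen for ordinal clinical scales.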
The Nociception Coma Scale (NCS) is a pain observation tool, developed for patients with disorders of consciousness (DOC) due to acquired brain injury (ABI). The aim of this study was to assess the interrater reliability of the NCS and NCS-R among nurses for the assessment of pain in ABI patients with DOC. A secondary aim was further validation of both scales by assessing their discriminating abilities for the presence or absence of pain. Hospitalized patients with ABI (n = 10) were recorded on film during three conditions: baseline, after tactile stimulation, and after noxious stimulation. All stimulations were part of daily treatment for these patients. The 30 recordings were assessed with the NCS and NCS-R by 27 nurses from three university hospitals in the Netherlands. Each nurse viewed 9 to 12 recordings, totaling 270 assessments. Interrater reliability of the NCS/NCS-R items and total scores was estimated by intraclass correlations (ICC), which showed excellent and equal average measures reliability for the NCS and NCS-R total scores (ICC 0.95), and item scores (range 0.87-0.95). Secondary analysis was performed to assess differences in ICCs among nurses' education and experience and to assess the scales' discriminating properties for the presence of pain. The NCS and NCS-R are valid and reproducible scales that can be used by nurses with an associate (of science) in nursing degree or baccalaureate (of science) in nursing degree. It seems that more experience with ABI patients is not a predictor for good agreement in the assessment of the NCS(-R).
DOCUMENT
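The abstract above reports "average measures" ICCs without stating which ICC model was used. The sketch below therefore assumes a one-way random-effects model with a complete subjects-by-raters table, which is only one of several possible choices; the function name `one_way_icc` is hypothetical. It shows how single-measure and average-measures ICCs relate.

```python
def one_way_icc(scores):
    """One-way random-effects ICC for a complete table of ratings.

    `scores` is a list of rows, one row per rated subject, each row
    holding the k ratings that subject received.
    Returns (single_measure_icc, average_measures_icc).
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 ratings")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-subjects and within-subjects mean squares (one-way ANOVA).
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    )
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    if ms_between == 0:
        raise ValueError("ICC is undefined when subjects do not differ")

    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average


if __name__ == "__main__":
    # Hypothetical total scores: 4 recordings, each rated by 3 raters.
    table = [[2, 3, 2], [5, 5, 6], [8, 7, 8], [1, 1, 2]]
    single, average = one_way_icc(table)
    print(round(single, 3), round(average, 3))
```

The average-measures value describes the reliability of the mean of k ratings and is higher than the single-measure value whenever the latter is positive, so the two should not be compared directly across studies.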
Purpose: The main purpose of the research was to measure reliability and validity of the Scoring Rubric for Information Literacy (Van Helvoort, 2010). Design/methodology/approach: Percentages of agreement and Intraclass Correlation were used to describe interrater reliability. For the determination of construct validity, factor analysis and reliability analysis were used. Criterion validity was calculated with Pearson correlations. Findings: In the described case, the Scoring Rubric for Information Literacy appears to be a reliable and valid instrument for the assessment of information literate performance. Originality/value: Reliability and validity are prerequisites to recommend a rubric for application. The results confirm that this Scoring Rubric for Information Literacy can be used in courses in higher education, not only for assessment purposes but also to foster learning. The original article is available from Emerald at http://dx.doi.org/10.1108/JD-05-2016-0066
MULTIFILE
Poster presentation at a conference. Background: Assessments of functional communication skills of children with cerebral palsy (CP), classified with the Communication Function Classification System (CFCS), often differ between the child's school teacher and the speech language therapist (SLT). Assessment by the SLT is usually based on observations in a clinical setting, which may not be representative of the functional communication skills in daily life. This study evaluated the inter-rater agreement of the CFCS assessed by the school teacher and SLT before and after observation of a communicative situation in the classroom. Methods: Functional communication of 35 children with CP (4 to 18 years; 26 with Alternative and Augmentative Communication, AAC) was classified by the child's own SLT and teacher using the CFCS. SLTs performed two assessments: the first without additional instructions and the second after observation of the child during a communicative situation in the classroom. For both SLT assessments, inter-rater agreement on CFCS level between SLT and teacher was determined using Cohen's weighted kappa statistics. Results: For the whole group, inter-rater reliability was 0.6 before observation in the classroom and 0.7 after observation. In the group without AAC, weighted kappa was 0.67 for both assessments. In the group with AAC, weighted kappa increased from 0.2 to 0.61. Interpretation: The increased inter-rater agreement of CFCS classification between teacher and SLT after observation in the classroom, especially for children with AAC, emphasizes the need for professionals to base their CFCS assessment on observation of functional communication in everyday situations.
MULTIFILE
In research methodology, epistemology is concerned with the question of how humans generate knowledge. In facility management (FM) research, for instance, it deals with the evaluation criteria, such as validity and reliability, by which researchers discriminate good knowledge from bad. The objective of this paper is to add to the scholarly methodological aspects in FM research. The paper takes a postpositivist stance and presupposes that scholars are able to discover what happens in FM through the categorization and scientific measurement of affective responses. It applies a method by which scholars are able to develop good knowledge and by which talented bachelor students are involved in FM research. In this study, 26 semi-structured interviews were conducted at nine different organizations in the Netherlands. Interviews, which focused on office environments and productivity, were conducted in pairs by Honours students. This paper reports on methodological issues of this study. Data collection and analysis by different researchers revealed serious threats to validity and reliability. Consequently, an interrater agreement (IRA) measure, which quantifies the degree of agreement between raters, was introduced to reveal and overcome differences in interpretations. In this paper, the difficulties of achieving good agreement were considered. Adjustment between raters and clear demarcation of constructs are necessary. A synopsis of usage and reporting of qualitative interview approaches is shown.
DOCUMENT
Purpose: The aim of this study is to measure the concurrent validity of the Athletic Skills Track (AST) by examining whether its outcome score correlates with the holistic judgments of experts about the quality of movement. Method: Video recordings of children performing the AST were shown to physical education teachers who independently gave a holistic rating of the movement quality of each child. Results: Both intra- and interrater reliability of the teachers’ ratings were moderate to good. The holistic judgments on movement quality were significantly correlated with AST time, showing that higher ratings were associated with less time required to complete the track. Next, hierarchical stepwise regression indicated that, in addition to the holistic rating, age, but not gender, explained part of the variance in AST time. Conclusion: The findings show that the AST has good concurrent validity and provides a fast, indirect indication of quality of movement.
DOCUMENT
Objectives: To develop an instrument to measure adherence to frequency, intensity, and quality of performance of home-based exercise (HBE) programs recommended by a physical therapist and to evaluate its construct validity and reliability in patients with low back pain. Methods: The Exercise Adherence Scale (EXAS) was developed following a literature search, an expert panel review, and a pilot test. The construct validity of the EXAS was determined based on data from 27 participants through an investigation of the convergent validity between adherence, lack of time to exercise, and lack of motivation to exercise. Associations between adherence, pain, and disability were determined to test divergent validity. The reliability of the EXAS quality of performance score was assessed using video recordings from 50 participants performing four exercises. Results: Correlations between the EXAS and lack of time to exercise, lack of motivation to exercise, pain, and disability were rho = 0.47, rho = 0.48, rho = 0.005, and rho = 0.24, respectively. The intrarater reliability of the quality of performance score was Kappa quadratic weights (Kqw) = 0.87 (95%-CI 0.83–0.92). The interrater reliability was Kqw = 0.36 (95%-CI 0.27–0.45). Conclusions: The EXAS demonstrates acceptable construct validity for the measurement of adherence to HBE programs. Additionally, the EXAS shows excellent intrarater reliability and poor interrater reliability for the quality of performance score and is the first instrument to measure adherence to frequency, intensity, and quality of performance of HBE programs. The EXAS allows researchers and clinicians to better investigate the effects of adherence to HBE programs on the outcomes of interventions and treatments.
LINK
OBJECTIVE: PRECIS-2 is a tool that could improve design insight for trialists. Our aim was to validate PRECIS-2, unlike its predecessor, by testing its discriminant validity and inter-rater reliability. STUDY DESIGN AND SETTING: Over 80 international trialists, methodologists, clinicians and policymakers created PRECIS-2, helping to ensure face and content validity. The inter-rater reliability of PRECIS-2 was measured using 19 experienced trialists who used PRECIS-2 to score a diverse sample of 15 RCT protocols. Discriminant validity was tested with two raters who independently determined whether the trial protocols were more pragmatic or more explanatory, with scores from the 19 raters for the 15 trials as predictors of pragmatism. RESULTS: Inter-rater reliability was generally good, with seven out of nine domains having an ICC over 0.65. Flexibility (Adherence) and Recruitment had wide confidence intervals, but raters found these difficult to rate and wanted more information. Each of the nine PRECIS-2 domains could be used to differentiate between trials taking more pragmatic or more explanatory approaches, with better than chance discrimination for all domains. CONCLUSION: We have assessed the validity and reliability of PRECIS-2. An elaboration paper and website provide guidance to help future users of the tool, which is continuing to be tested by trial teams, systematic reviewers and funders.
DOCUMENT
Background: Patients undergoing total knee arthroplasty (TKA) often experience strength deficits both pre- and post-operatively. As these deficits may have a direct impact on functional recovery, strength assessment should be performed in this patient population. For these assessments, reliable measurements should be used. This study aimed to determine the inter- and intrarater reliability of hand-held dynamometry (HHD) in measuring isometric knee strength in patients awaiting TKA. Methods: To determine interrater reliability, 32 patients (81.3% female) were assessed by two examiners. Patients were assessed consecutively by both examiners on the same individual test dates. To determine intrarater reliability, a subgroup (n = 13) was again assessed by the examiners within four weeks of the initial testing procedure. Maximal isometric knee flexor and extensor strength were tested using a modified Citec hand-held dynamometer. Both the affected and unaffected knee were tested. Reliability was assessed using the Intraclass Correlation Coefficient (ICC). In addition, the Standard Error of Measurement (SEM) and the Smallest Detectable Difference (SDD) were used to determine reliability. Results: In both the affected and unaffected knee, the inter- and intrarater reliability were good for knee flexors (ICC range 0.76-0.94) and excellent for knee extensors (ICC range 0.92-0.97). However, measurement error was high, displaying SDD ranges between 21.7% and 36.2% for interrater reliability and between 19.0% and 57.5% for intrarater reliability. Overall, measurement error was higher for the knee flexors than for the knee extensors. Conclusions: Modified HHD appears to be a reliable strength measure, producing good to excellent ICC values for both inter- and intrarater reliability in a group of TKA patients. High SEM and SDD values, however, indicate high measurement error for individual measures.
This study demonstrates that a modified HHD is appropriate to evaluate knee strength changes in TKA patient groups. However, it also demonstrates that modified HHD is not suitable to measure individual strength changes. The use of modified HHD is, therefore, not advised for use in a clinical setting.
MULTIFILE
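The contrast in the abstract above between high ICCs and large SDD values can be illustrated with the commonly used textbook formulas SEM = SD × √(1 − ICC) and SDD = 1.96 × √2 × SEM. This is a sketch under the assumption that these formulas apply; the abstract does not state how SEM and SDD were computed, and the function names and example numbers are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and an ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def sdd_from_sem(sem, z=1.96):
    """Smallest detectable difference at the confidence level implied by z.

    The factor sqrt(2) reflects that a change score combines the
    measurement error of two measurements.
    """
    return z * math.sqrt(2.0) * sem


def sdd_percent(sdd, mean):
    """SDD expressed as a percentage of the group mean."""
    return 100.0 * sdd / mean


if __name__ == "__main__":
    # Hypothetical strength data in newtons, for illustration only.
    mean_strength = 150.0
    sd_strength = 45.0
    icc = 0.92
    sem = sem_from_icc(sd_strength, icc)
    sdd = sdd_from_sem(sem)
    print(round(sem, 1), round(sdd, 1), round(sdd_percent(sdd, mean_strength), 1))
```

Because the SDD scales with the between-subject SD, a heterogeneous group can show a high ICC and still require a large change in an individual before that change exceeds measurement error.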