The platform for open and practice-oriented research

product

Comparing student and expert-based tagging of recorded lectures

In this paper we analyse the way students tag recorded lectures. We compare their tagging strategy and the tags that they create with tagging done by an expert. We look at the quality of the tags students add, and we introduce a method of measuring how similar the tags are, using vector space modelling and cosine similarity. We show that the quality of tagging by students is high enough to be useful. We also show that there is no generic vocabulary gap between the expert and the students. Our study shows no statistically significant correlation between the tag similarity and the indicated interest in the course, the perceived importance of the course, the number of lectures attended, the indicated difficulty of the course, the number of recorded lectures viewed, the indicated ease of finding the needed parts of a recorded lecture, or the number of tags used by the student.
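The similarity measure named in this abstract can be illustrated with a minimal sketch. It assumes each tagger's tags are treated as a term-frequency vector. The tag lists and the function name `cosine_similarity` are hypothetical and not taken from the study.

```python
import math
from collections import Counter

def cosine_similarity(tags_a, tags_b):
    """Cosine similarity between two tag lists treated as term-frequency vectors."""
    vec_a, vec_b = Counter(tags_a), Counter(tags_b)
    # Only tags used by both taggers contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty tag list has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)

# Hypothetical tags for one recorded lecture (illustration only).
expert_tags = ["recursion", "stack", "base case", "complexity"]
student_tags = ["recursion", "stack", "examples", "exam"]
similarity = cosine_similarity(expert_tags, student_tags)
```

A value of 1 means both taggers used the same tags in the same proportions, and 0 means they share no tags.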

LINK

product

Prediction of Medical Outcomes with Modern Modelling Techniques

The aim of this research is to investigate under which circumstances and conditions relatively modern modelling techniques, such as support vector machines, neural networks and random forests, could have advantages in medical-scientific research and in medical practice compared with more traditional modelling techniques, such as linear regression, logistic regression and Cox regression.

MULTIFILE

product

A Comparison of Different Modeling Techniques in Predicting Mortality With the Tilburg Frailty Indicator

Background: Modern modeling techniques may potentially provide more accurate predictions of dichotomous outcomes than classical techniques. Objective: In this study, we aimed to examine the predictive performance of eight modeling techniques to predict mortality by frailty. Methods: We performed a longitudinal study with a 7-year follow-up. The sample consisted of 479 Dutch community-dwelling people, aged 75 years and older. Frailty was assessed with the Tilburg Frailty Indicator (TFI), a self-report questionnaire. This questionnaire consists of eight physical, four psychological, and three social frailty components. The municipality of Roosendaal, a city in the Netherlands, provided the mortality dates. We compared modeling techniques, such as support vector machine (SVM), neural network (NN), random forest, and least absolute shrinkage and selection operator, as well as classical techniques, such as logistic regression, two Bayesian networks, and recursive partitioning (RP). The area under the receiver operating characteristic curve (AUROC) indicated the performance of the models. The models were validated using bootstrapping. Results: We found that the NN model had the best validated performance (AUROC=0.812), followed by the SVM model (AUROC=0.705). The other models had validated AUROC values below 0.700. The RP model had the lowest validated AUROC (0.605). The NN model had the highest optimism (0.156). The predictor variable “difficulty in walking” was important for all models. Conclusions: Because of the high optimism of the NN model, we prefer the SVM model for predicting mortality among community-dwelling older people using the TFI, with the addition of “gender” and “age” variables. External validation is a necessary step before applying the prediction models in a new setting.
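The relation between apparent AUROC, optimism, and validated AUROC can be sketched as follows. This is a minimal illustration of the common optimism bootstrap, which may differ in detail from the procedure the authors used. The toy "model" (pick the single best feature), the synthetic data, and all function names are assumptions for illustration.

```python
import random

def auroc(scores, labels):
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit(rows, labels):
    """Toy model (assumption): select the feature with the highest apparent AUROC."""
    return max(range(len(rows[0])), key=lambda j: auroc([r[j] for r in rows], labels))

def predict(model, rows):
    return [r[model] for r in rows]

def optimism_corrected_auroc(rows, labels, n_boot=200, seed=0):
    rng = random.Random(seed)
    apparent = auroc(predict(fit(rows, labels), rows), labels)
    n, optimisms = len(rows), []
    while len(optimisms) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        b_rows = [rows[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        if len(set(b_labels)) < 2:
            continue  # resample lacks one class; draw again
        b_model = fit(b_rows, b_labels)
        boot_auc = auroc(predict(b_model, b_rows), b_labels)  # performance on the resample
        orig_auc = auroc(predict(b_model, rows), labels)      # same model on the original data
        optimisms.append(boot_auc - orig_auc)
    optimism = sum(optimisms) / n_boot
    return apparent, optimism, apparent - optimism

# Hypothetical synthetic data: 60 subjects, 5 binary items, outcome loosely tied to item 0.
rng = random.Random(42)
rows = [[rng.randint(0, 1) for _ in range(5)] for _ in range(60)]
labels = [1 if (r[0] and rng.random() < 0.8) or rng.random() < 0.2 else 0 for r in rows]
apparent, optimism, validated = optimism_corrected_auroc(rows, labels, n_boot=50)
```

A flexible model tends to show a larger gap between apparent and validated AUROC, which is the kind of optimism the abstract reports for the NN model.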

DOCUMENT

product

External Validation of Models for Predicting Disability in Community-Dwelling Older People in the Netherlands

Background: Advanced statistical modeling techniques may help predict health outcomes. However, it is not the case that these modeling techniques always outperform traditional techniques such as regression techniques. In this study, external validation was carried out for five modeling strategies for the prediction of the disability of community-dwelling older people in the Netherlands. Methods: We analyzed data from five studies consisting of community-dwelling older people in the Netherlands. For the prediction of the total disability score as measured with the Groningen Activity Restriction Scale (GARS), we used fourteen predictors as measured with the Tilburg Frailty Indicator (TFI). Both the TFI and the GARS are self-report questionnaires. For the modeling, five statistical modeling techniques were evaluated: general linear model (GLM), support vector machine (SVM), neural net (NN), recursive partitioning (RP), and random forest (RF). Each model was developed on one of the five data sets and then applied to each of the four remaining data sets. We assessed the performance of the models with calibration characteristics, the correlation coefficient, and the root of the mean squared error. Results: The models GLM, SVM, RP, and RF showed satisfactory performance characteristics when validated on the validation data sets. All models showed poor performance characteristics for the deviating data set both for development and validation due to the deviating baseline characteristics compared to those of the other data sets. Conclusion: The performance of four models (GLM, SVM, RP, RF) on the development data sets was satisfactory. This was also the case for the validation data sets, except when these models were developed on the deviating data set. The NN models showed a much worse performance on the validation data sets than on the development data sets.
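The three kinds of performance measures mentioned here can be sketched in a few lines. This sketch assumes that "calibration characteristics" means the intercept and slope of a linear regression of observed on predicted scores; the abstract does not spell this out. The scores below are hypothetical, not study data.

```python
import math

def validation_metrics(observed, predicted):
    """RMSE, Pearson correlation, and calibration intercept/slope (observed ~ predicted)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((o - mean_o) ** 2 for o in observed)
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    slope = sxy / sxx                      # 1.0 indicates ideal calibration
    intercept = mean_o - slope * mean_p    # 0.0 indicates ideal calibration
    r = sxy / math.sqrt(sxx * syy)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return {"rmse": rmse, "r": r, "slope": slope, "intercept": intercept}

# Hypothetical disability scores from an external validation set (illustration only).
observed = [20.0, 25.0, 31.0, 40.0, 52.0]
predicted = [22.0, 24.0, 33.0, 38.0, 49.0]
metrics = validation_metrics(observed, predicted)
```

In an external validation, these metrics would be computed on each data set that was not used to develop the model.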

DOCUMENT

product

Extendable linearised adjustment model for deformation analysis

Author supplied: "This paper gives a linearised adjustment model for the affine, similarity and congruence transformations in 3D that is easily extendable with other parameters to describe deformations. The model considers all coordinates stochastic. Full positive semi-definite covariance matrices and correlation between epochs can be handled. The determination of transformation parameters between two or more coordinate sets, determined by geodetic monitoring measurements, can be handled as a least squares adjustment problem. It can be solved without linearisation of the functional model, if it concerns an affine, similarity or congruence transformation in one-, two- or three-dimensional space. If the functional model describes more than such a transformation, it is hardly ever possible to find a direct solution for the transformation parameters. Linearisation of the functional model and applying least squares formulas is then an appropriate mode of working. The adjustment model is given as a model of observation equations with constraints on the parameters. The starting point is the affine transformation, whose parameters are constrained to get the parameters of the similarity or congruence transformation. In this way the use of Euler angles is avoided. Because the model is linearised, iteration is necessary to get the final solution. In each iteration step approximate coordinates are necessary that fulfil the constraints. For the affine transformation it is easy to get approximate coordinates. For the similarity and congruence transformation the approximate coordinates have to comply to constraints. To achieve this, use is made of the singular value decomposition of the rotation matrix. To show the effectiveness of the proposed adjustment model total station measurements in two epochs of monitored buildings are analysed. Coordinate sets with full, rank deficient covariance matrices are determined from the measurements and adjusted with the proposed model. Testing the adjustment for deformations results in detection of the simulated deformations."
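The idea of constraining affine parameters to obtain a similarity or congruence transformation can be illustrated with a small check. The sketch assumes the standard textbook conditions on the 3x3 matrix A of the affine map y = A x + t: A^T A = s^2 I with positive determinant for a similarity, and additionally s = 1 for a congruence. The paper's own constraint equations are not given in the abstract and may be formulated differently; the function names are hypothetical.

```python
import math

def gram(a):
    """Return A^T A for a 3x3 matrix given as nested lists."""
    return [[sum(a[k][i] * a[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def classify_transformation(a, tol=1e-9):
    """Name the most restrictive class whose constraints the matrix satisfies."""
    g = gram(a)
    scale_sq = (g[0][0] + g[1][1] + g[2][2]) / 3
    scaled_orthogonal = all(
        abs(g[i][j] - (scale_sq if i == j else 0.0)) <= tol
        for i in range(3) for j in range(3)
    )
    if not scaled_orthogonal or det3(a) <= 0:
        return "affine"  # constraints not met (or reflection / singular matrix)
    return "congruence" if abs(scale_sq - 1.0) <= tol else "similarity"

# Rotation about the z-axis by 30 degrees, optionally scaled (illustration only).
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
rotation = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
scaled_rotation = [[2 * v for v in row] for row in rotation]
kind = classify_transformation(rotation)
```

Expressing the constraints on the matrix elements directly is what makes a parameterisation by Euler angles unnecessary.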

MULTIFILE

product

Time Series Analysis of 3D Coordinates Using Nonstochastic Observations

From the article: Adjustment and testing of a combination of stochastic and nonstochastic observations is applied to the deformation analysis of a time series of 3D coordinates. Nonstochastic observations are constant values that are treated as if they were observations. They are used to formulate constraints on the unknown parameters of the adjustment problem. Thus they describe deformation patterns. If deformation is absent, the epochs of the time series are supposed to be related via affine, similarity or congruence transformations. S-basis invariant testing of deformation patterns is treated. The model is experimentally validated by showing the procedure for a point set of 3D coordinates, determined from total station measurements during five epochs. The modelling of two patterns, the movement of just one point in several epochs, and of several points, is shown. Full, rank deficient covariance matrices of the 3D coordinates, resulting from free network adjustments of the total station measurements of each epoch, are used in the analysis.

MULTIFILE

product

Improving Routine Immunization Coverage Through Optimally Designed Predictive Models

Routine immunization (RI) of children is the most effective and timely public health intervention for decreasing child mortality rates around the globe. Pakistan, a low- and middle-income country (LMIC), has one of the highest child mortality rates in the world, mainly due to vaccine-preventable diseases (VPDs). To improve RI coverage, it is critical to identify potential RI defaulters at an early stage, so that appropriate interventions can be targeted at those identified as being at risk of missing their scheduled vaccinations. In this paper, a machine learning (ML) based predictive model is proposed to predict defaulting and non-defaulting children on upcoming immunization visits and to examine the effect of the underlying contributing factors. The predictive model uses data from the Paigham-e-Sehat study, comprising immunization records of 3,113 children. The model is designed to obtain optimal results across accuracy, specificity, and sensitivity, so that its outcomes remain practically relevant to the problem addressed. The model is further optimized by selecting significant features and removing data bias. Nine machine learning algorithms were applied to predict which children would default on the next immunization visit. The results showed that the random forest model achieves the optimal accuracy of 81.9% with 83.6% sensitivity and 80.3% specificity. The main determinants of vaccination coverage were found to be vaccine coverage at birth, parental education, and the socio-economic conditions of the defaulting group. This information can help policy makers take proactive and effective measures to develop evidence-based, targeted and timely interventions for defaulting children.
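The three measures the model is optimized for can be computed from the confusion counts. The sketch below assumes a label of 1 marks a defaulting child; the labels are hypothetical and not taken from the Paigham-e-Sehat data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = defaulter, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of actual defaulters that were flagged
        "specificity": tn / (tn + fp),  # share of non-defaulters correctly left unflagged
    }

# Hypothetical true and predicted labels for eight children (illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters when one class is much rarer than the other, since accuracy alone can then look high for a model that misses most defaulters.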

MULTIFILE

product

Take out what you can

The goal of this study was to test the idea that computationally analysing the open answers of the Fontys National Student Surveys (NSS) with a selection of standard text mining methods (Manning & Schütze 1999) will increase the value of these answers for educational quality assurance. Human effort and analysis time are expected to decrease significantly. The text data (in Dutch) of several years of Fontys National Student Surveys (2013-2018) was provided to Fontys students of the minor Applied Data Science. The results of the analysis were to include topic and sentiment modelling across multiple years of survey data. Comparing multiple years was necessary to capture and visualise any trends that a human investigator may have missed while analysing the data by hand. During data cleaning all stop words and punctuation were removed, all text was converted to lower case, and names and inappropriate language, such as swear words, were deleted. About 80% of the 24,000 records were manually labelled with sentiment; the remainder was used to validate the algorithms. Next, the machine learning steps were executed: training, testing, analysis of outcomes, and visualisation for better text comprehension. Students aimed to improve classification accuracy by applying multiple sentiment analysis algorithms and topic modelling methods. The models were chosen arbitrarily, with a preference for models of low complexity. For reproducibility of our study, open-source tooling was used. One of these tools was based on Latent Dirichlet allocation (LDA). LDA is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar (Blei, Ng & Jordan, 2003). For topic modelling, Gensim (Řehůřek, 2011) was used. Gensim is an open-source vector space modelling and topic modelling toolkit implemented in Python. In addition, we recognized the absence of pretrained models for the Dutch language. To complete our prototype, a simple user interface was created in Python. This final step integrated our automated text analysis with visualisations of sentiments and topics. Remarkably, all extracted topics are related to themes defined by the NSS. This indicates that, in general, students' answers are related to topics of interest for educational institutions. The words extracted for each topic are also relevant to that topic. Although most of the results require further interpretation by a human expert, they indicate that computational analysis of the texts from the open questions of the NSS yields information that enriches the results of standard quantitative analysis of the NSS.
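The data cleaning step described in this abstract (lower-casing, removing punctuation and stop words) might look roughly like the sketch below. The stop word list is a tiny hypothetical sample, the example answer is invented, and the removal of names and inappropriate language is not shown.

```python
import string

# Hypothetical, deliberately small Dutch stop word list (assumption for illustration).
DUTCH_STOP_WORDS = {"de", "het", "een", "en", "is", "zijn", "maar", "erg", "van"}

def clean_answer(text, stop_words=DUTCH_STOP_WORDS):
    """Lower-case the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [token for token in no_punct.split() if token not in stop_words]

# Invented open answer (illustration only).
tokens = clean_answer("De docenten zijn erg goed, maar het rooster is slecht!")
```

Token lists of this kind are the usual input for building a dictionary and corpus before fitting a topic model such as LDA.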

DOCUMENT

product

Towards a Culture of Regional Design

There has probably never been such an intense debate about the layout of the countryside as the one that is currently raging. There are serious concerns about the landscape, which is being rapidly transformed by urbanization and everything associated with this process, and not only in the Netherlands but also far beyond its borders. Everyone has something to say in this society-wide debate, from local to national governments, from environmental factions to the road-user's lobby, and from those who are professionally involved to concerned private parties. In many cases it is a battle between idealized images and economic models, between agricultural reality and urban park landscapes, between ecological concerns and mobility. This issue of OASE explores the potential significance of architectonic design for transformation processes on the regional scale. Besides considering the instruments that are available to the designer to fulfil this task, the authors also consider how the design can exercise a 'positive' influence on such processes. The various contributions shed light on the potential significance of territory in contemporary design practice and offer critical reflection on the topical discourse that has evolved over recent years.

DOCUMENT