A common strategy for assigning keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for selecting a word as a keyword is its relevance to the text. The tf.idf score of a term is a widely used relevance measure. Although it is easy to compute and gives quite satisfactory results, this measure does not take (semantic) relations between words into account. In this paper we study some alternative relevance measures that do use relations between words. They are computed by defining co-occurrence distributions for words and comparing these distributions with the document distribution and the corpus distribution. We then evaluate keyword extraction algorithms that differ in the relevance measure they use. For two corpora of abstracts with manually assigned keywords, we compare the manually assigned keywords with the keywords extracted automatically by each algorithm. The results show that using word co-occurrence information can improve precision and recall over tf.idf.
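The relevance measures described above can be sketched on toy data. The abstract does not say how the distributions are defined or compared, so the following choices are all assumptions made for illustration:

- The co-occurrence distribution of a term t is defined as p_t(z) = Σ_d p(z | d) · p(d | t).
- Distributions are compared with the Jensen-Shannon divergence.
- The hypothetical score `cooccurrence_relevance` rewards terms whose co-occurrence distribution is far from the corpus distribution but close to the document distribution.

All function names are hypothetical. The tf.idf baseline uses the common tf × log(N / df) form.

```python
import math
from collections import Counter, defaultdict


def normalise(counts):
    """Turn raw counts into a probability distribution (dict word -> prob)."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jsd(p, q):
    """Jensen-Shannon divergence (natural log) between two sparse distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        # KL(a || m); m[k] > 0 wherever a[k] > 0, so the log is defined.
        return sum(v * math.log(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def tfidf(term, doc, corpus):
    """Baseline: relative term frequency in `doc` times log inverse document
    frequency. Assumes `term` occurs in at least one corpus document."""
    df = sum(1 for d in corpus if term in d)
    return doc.count(term) / len(doc) * math.log(len(corpus) / df)


def cooccurrence_distribution(term, corpus):
    """Assumed definition p_t(z) = sum_d p(z | d) * p(d | t): the word
    distributions of the documents containing `term`, weighted by how often
    `term` occurs in each. Assumes `term` occurs somewhere in the corpus."""
    total_t = sum(d.count(term) for d in corpus)
    dist = defaultdict(float)
    for d in corpus:
        n_dt = d.count(term)
        if n_dt == 0:
            continue
        p_d_given_t = n_dt / total_t
        for z, c in Counter(d).items():
            dist[z] += p_d_given_t * c / len(d)  # p(d | t) * p(z | d)
    return dict(dist)


def cooccurrence_relevance(term, doc, corpus):
    """Hypothetical score: high when the term's co-occurrence distribution is
    far from the corpus distribution but close to the document distribution."""
    co = cooccurrence_distribution(term, corpus)
    doc_dist = normalise(Counter(doc))
    corpus_dist = normalise(Counter(w for d in corpus for w in d))
    return jsd(co, corpus_dist) - jsd(co, doc_dist)


def top_keywords(doc, corpus, scorer, k=5):
    """Rank the distinct words of `doc` by `scorer`; ties broken alphabetically."""
    return sorted(set(doc), key=lambda t: (-scorer(t, doc, corpus), t))[:k]


if __name__ == "__main__":
    # Toy corpus of pre-tokenised documents (illustrative only).
    corpus = [
        ["keyword", "extraction", "keyword"],
        ["extraction", "corpus"],
        ["corpus", "sign"],
    ]
    print(top_keywords(corpus[0], corpus, tfidf, k=1))  # → ['keyword']
    print(top_keywords(corpus[0], corpus, cooccurrence_relevance, k=2))
```

Each relevance measure is passed to `top_keywords` as a plain function. Keyword extraction algorithms that differ only in their scorer can then be compared against manually assigned keywords in the same harness.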
In this paper we describe our work in progress on developing a set of criteria to predict text difficulty in Sign Language of the Netherlands (NGT). These texts are used in a four-year bachelor program, which is being brought in line with the Common European Framework of Reference for Languages (CEFR; Council of Europe, 2001). Production and interaction proficiency are assessed with the NGT Functional Assessment instrument, adapted from the Sign Language Proficiency Interview (Caccamise & Samar, 2009). With this test we determined that students produce NGT at CEFR level A2 after one year of study, at level B1 after two years, and at level B2 after four years. This allowed us to identify NGT texts matched to the CEFR level of students at particular stages of their studies. These texts were then analysed for sign familiarity, morpheme-sign rate, use of space and use of non-manual signals. All of these elements appear to be relevant to aligning the difficulty of signed NGT texts with the targeted CEFR level, although only the morpheme-sign rate appears to be a decisive indicator.
Background to the problem: Like many societies in the 21st century, Dutch society is becoming ethnically heterogeneous. As a result, children who are second-language speakers of Dutch are learning English, a core curriculum subject, through the medium of Dutch.
Research questions: What are the consequences of this for the individual learner and for the class as a whole? Is a bilingual background a help or a hindrance when acquiring further language competences? Does the home situation facilitate or impede the learner? Additionally, how should the TEFL professional respond to this situation in terms of methodology, use of the Dutch language, subject matter and assessment?
Method of approach: A group of ethnic minority students at Fontys University of Professional Education (F.U.P.E.) was interviewed, and the interviews were analysed qualitatively. To ensure triangulation, lecturers involved in teaching English at F.U.P.E. were asked to complete a questionnaire on their teaching approach to learners of English whose second language is Dutch. Their responses were analysed both quantitatively and qualitatively.
Findings and conclusions: The students encountered surprisingly few problems. Their bilingualism and home situation did not constrain their development in English. TEFL professionals should bear the heterogeneous classroom in mind when developing courses and lesson material. The introduction of English at primary school level and the assessment of DL2 learners (learners with Dutch as a second language) require further research.