This method paper presents a template solution for text mining of scientific literature using the R tm package. Literature to be analyzed can be collected manually or automatically using the code provided with this paper. Once the literature is collected, the three steps for conducting text mining can be performed as outlined below:• loading and cleaning of text from articles,• processing, statistical analysis, and clustering, and• presentation of results using generalized and tailor-made visualizations.The text mining steps can be applied to a single, multiple, or time series groups of documents.References are provided to three published peer reviewed articles that use the presented text mining methodology. The main advantages of our method are: (1) Its suitability for both research and educational purposes, (2) Compliance with the Findable Accessible Interoperable and Reproducible (FAIR) principles, and (3) code and example data are made available on GitHub under the open-source Apache V2 license.
Studying images in social media poses specific methodological challenges, which in turn have directed scholarly attention toward the computational interpretation of visual data. When analyzing large numbers of images, both traditional content analysis as well as cultural analytics have proven valuable. However, these techniques do not take into account the contextualization of images within a socio-technical environment. As the meaning of social media images is co-created by online publics, bound through networked practices, these visuals should be analyzed on the level of their networked contextualization. Although machine vision is increasingly adept at recognizing faces and features, its performance in grasping the meaning of social media images remains limited. Combining automated analyses of images with platform data opens up the possibility to study images in the context of their resonance within and across online discursive spaces. This article explores the capacities of hashtags and retweet counts to complement the automated assessment of social media images, doing justice to both the visual elements of an image and the contextual elements encoded through the hashtag practices of networked publics.
A common strategy to assign keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for a word to be selected as keyword is its relevance for the text. The tf.idf score of a term is a widely used relevance measure. While easy to compute and giving quite satisfactory results, this measure does not take (semantic) relations between words into account.