Preprint submitted to Information Processing & Management
Tags are a convenient way to label resources on the web. An interesting question is whether one can determine the semantic meaning of tags in the absence of a predefined formal structure such as a thesaurus. Many authors have used tag usage data to find the emergent semantics of tags. Here, we argue that the semantics of tags can be captured by comparing the contexts in which tags appear. We operationalize this idea by defining what we call paradigmatic similarity: computing co-occurrence distributions of tags with tags in the same context, and comparing tags using information-theoretic similarity measures of these distributions, chiefly the Jensen-Shannon divergence. In experiments with three different tagged data collections, we study its behavior and compare it to other distance measures. For some tasks, such as terminology mapping or clustering, paradigmatic similarity seems to give better results than similarity measures based on the co-occurrence of the documents or other resources that the tags are associated with. We argue that paradigmatic similarity is superior to other distance measures if agreement on topics (as opposed to style, register, language, etc.) is the most important criterion and the main differences between the tagged elements in the data set correspond to different topics.
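The idea of comparing tags through their co-occurrence distributions can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the context of a tag is simply the set of other tags assigned to the same resource, it applies no smoothing or weighting, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_distributions(tagged_items):
    """Map each tag to a probability distribution over the tags it co-occurs with.

    Assumption: the "context" of a tag is the set of other tags on the same item.
    """
    counts = defaultdict(Counter)
    for tags in tagged_items:
        unique = set(tags)
        for tag in unique:
            for other in unique:
                if other != tag:
                    counts[tag][other] += 1
    dists = {}
    for tag, counter in counts.items():
        total = sum(counter.values())
        dists[tag] = {other: n / total for other, n in counter.items()}
    return dists


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logarithms, so values lie in [0, 1]."""
    jsd = 0.0
    for key in set(p) | set(q):
        pk = p.get(key, 0.0)
        qk = q.get(key, 0.0)
        mk = 0.5 * (pk + qk)  # mixture distribution
        if pk > 0:
            jsd += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            jsd += 0.5 * qk * math.log2(qk / mk)
    return jsd


# Hypothetical toy data: "python" and "py" never appear on the same item,
# yet they appear in similar contexts.
items = [
    ["python", "programming", "code"],
    ["py", "programming", "code"],
    ["python", "programming", "tutorial"],
    ["py", "code", "tutorial"],
    ["recipe", "cooking", "food"],
    ["recipe", "food", "dinner"],
]

dists = cooccurrence_distributions(items)
near = jensen_shannon_divergence(dists["python"], dists["py"])
far = jensen_shannon_divergence(dists["python"], dists["recipe"])
print(f"python vs py: {near:.3f}; python vs recipe: {far:.3f}")
```

In this toy data the two programming tags share no item, so a measure based on shared resources would treat them as unrelated, while their context distributions are close. Tags whose contexts do not overlap at all reach the maximum divergence.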
DOCUMENT
Privacy, copyright, classified documents and state secrets, but also spontaneous network phenomena like flash mobs and hashtag revolutions, reveal one thing: we have lost control over the digital world. We experience a digital tailspin, or as Michael Seemann calls it in this essay, a loss of control or Kontrollverlust. Data we never knew existed is finding paths that were not intended and reveals information that we would never have thought of on our own. Traditional institutions and concepts of freedom are threatened by this digital tailspin. But that doesn’t mean we are lost. A new game emerges, where a different set of rules applies. To take part, we need to embrace a new way of thinking and a radical new ethics; we need to search for freedom in completely different places. While the Old Game depended upon top-down hierarchies and a trust in the protective power of state justice systems, the New Game asks you to let go of all these certainties. Strategies to play the game of digital tailspin rely on flexibility, openness, transparency and what is dubbed ‘antifragility’. In Digital Tailspin: Ten Rules for the Internet After Snowden, Michael Seemann examines which strategies are most appropriate in the New Game and why.
DOCUMENT