E-discovery projects typically start with an assessment of the collected electronic data in order to estimate the risk of prosecuting or defending a legal case. This is not a review task; it is more appropriately called early case assessment, which is better known as exploratory search in the information retrieval community. This paper first describes text mining methodologies that can be used to enhance exploratory search. Based on these ideas we present a semantic search dashboard that includes entities relevant to investigators, such as who knew who, what, where and when. We describe how this dashboard can be powered by results from our ongoing research in the “Semantic Search for E-Discovery” project on topic detection and clustering, semantic enrichment of user profiles, email recipient recommendation, expert finding and identity extraction from digital forensic evidence.
MULTIFILE
First paragraph of the column: The staggering amount of information and contact possibilities on the internet confronts us with the choice of who we want to be and which values we want to live by. Whereas Internet 1.0 could still mainly be seen as a large database with Google as its market hit, social interaction plays a major role in the semantic web. In the semantic web, all data, including for example all your messages, profile details, files and texts as well as those of others, can be distributed and combined even more easily, but also analyzed and presented in tailored form. Every unique question or search query thus directly receives a unique answer.
LINK
Corporate reputation is becoming increasingly important for firms, and social media platforms such as Twitter are used to convey their message. In this paper, corporate reputation is assessed from a sustainability perspective. Data on the top 100 brands of the Netherlands were scraped and analyzed using sentiment analysis. The companies were classified according to the Sustainable Industry Classification System (SICS) so that the analysis could be performed at industry level. A semantic search tool called Open Semantic Desktop Search was used to filter the data for keywords related to sustainability and corporate reputation. Findings show that companies that tweet more often about corporate reputation and sustainability receive, overall, a more positive sentiment from the public.
DOCUMENT
We present our ongoing work on upgrading the Amsterdam Public Library's book database search capabilities. So far, users have had to input the exact book title and/or author name without any typos or misspellings in order to retrieve any results. This is in sharp contrast with the manner in which users typically use the interface: they frequently search for books on a particular topic, input the names of the characters, or even ask fully-fledged questions. The aim of this project is therefore to enable smart search in natural language based on book content. The initial focus is on the Dutch language, with the possibility of including English and other languages later. In the first phase of the project, we built a proof-of-concept knowledge graph from a sample of the existing tabular database and enriched the data with named entities extracted from book summaries. Based on this first step, a user query like "Heeft u boeken over de Tweede Wereldoorlog in Amsterdam?" would yield all books that mention both WW2 and Amsterdam. We are currently working on augmenting the knowledge graph with embeddings, which will enable us to retrieve semantically similar results. The final step of the research involves integrating our knowledge graph with a pre-trained large language model.
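The entity-based retrieval step described above can be illustrated with a minimal sketch. It assumes that named entities have already been extracted from both the book summaries and the user query; the book identifiers, the entity sets and the function names below are hypothetical and are not taken from the project.

```python
from collections import defaultdict

# Hypothetical records: book id -> named entities extracted from its summary.
book_entities = {
    "book-1": {"Tweede Wereldoorlog", "Amsterdam"},
    "book-2": {"Tweede Wereldoorlog", "Rotterdam"},
    "book-3": {"Amsterdam", "Gouden Eeuw"},
}


def build_entity_index(records):
    """Invert the records into a mapping entity -> set of book ids."""
    index = defaultdict(set)
    for book_id, entities in records.items():
        for entity in entities:
            index[entity].add(book_id)
    return index


def books_mentioning_all(index, query_entities):
    """Return the ids of books whose summaries mention every query entity."""
    id_sets = [index.get(entity, set()) for entity in query_entities]
    return set.intersection(*id_sets) if id_sets else set()


index = build_entity_index(book_entities)
# Entities assumed to be recognised in the query
# "Heeft u boeken over de Tweede Wereldoorlog in Amsterdam?"
hits = books_mentioning_all(index, ["Tweede Wereldoorlog", "Amsterdam"])
print(hits)
```

An exact intersection like this only finds literal entity matches, which is the limitation that the embedding step mentioned in the abstract is meant to address.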
DOCUMENT
First paragraph of the column: The major shift, via chain reversal, toward customer self care and bottom-up self-assembled teaming is rapidly taking place. The customer takes the initiative and Toffler's prosumership becomes visible. The number of business examples is growing quickly, whether it concerns configuring your own car, ordering parts, 3D printing, self-rostering, civil journalism, customers reviewing restaurants, tracking and tracing of mail, or medical care. Take Qlinx as an open architecture in combination with, for example, Twitter: it clearly shows what this may come to mean for the dynamics of the labour market. Wolfram-alpha shows the potential of the semantic web. In, for example, Share2Start - power of the open mind we see the power of crowdfunding and the beginning of 'financials 2.0'. These sites clearly show the direction in which things are heading.
LINK
This chapter explores qualitative career assessment as an identity learning process in which meaning-oriented learning is essential and is distinguished from conditioned or semantic types of learning. In order to construct a career identity in the form of a future-oriented narrative, it is essential that learners are guided through cognitive learning stages by means of a dialogue about concrete experiences, one that pays attention to emotions and broadens and deepens what is expressed.
DOCUMENT
Metaphors are common phenomena in intellectual capital and knowledge management theories and practice. An important question to ask is: what are the 'best' metaphors we can use in our theorizing on intellectual capital and knowledge management? This paper addresses the question of the aptness of knowledge-related metaphors. It concludes that the aptness of metaphorical expressions depends on three factors: the richness of the semantic field of the source domain, the validity of the mapping, and the ideological implications of the mapping. This conclusion results in a research agenda on the aptness of metaphor in knowledge management and intellectual capital theory and practice.
DOCUMENT
Preprint submitted to Information Processing & Management. Tags are a convenient way to label resources on the web. An interesting question is whether one can determine the semantic meaning of tags in the absence of a predefined formal structure such as a thesaurus. Many authors have used the usage data for tags to find their emergent semantics. Here, we argue that the semantics of tags can be captured by comparing the contexts in which tags appear. We give an approach to operationalizing this idea by defining what we call paradigmatic similarity: computing co-occurrence distributions of tags with tags in the same context, and comparing tags using information-theoretic similarity measures of these distributions, mostly the Jensen-Shannon divergence. In experiments with three different tagged data collections we study its behavior and compare it to other distance measures. For some tasks, like terminology mapping or clustering, paradigmatic similarity seems to give better results than similarity measures based on the co-occurrence of the documents or other resources that the tags are associated with. We argue that paradigmatic similarity is superior to other distance measures if agreement on topics (as opposed to style, register, language, etc.) is the most important criterion and the main differences between the tagged elements in the data set correspond to different topics.
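The idea of comparing tags through their co-occurrence distributions can be sketched as follows. This is a simplified illustration, not the paper's exact definition: it assumes that the context of a tag is the set of other tags on the same resource, and it uses a plain first-order co-occurrence distribution. The sample data and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_distributions(tagged_items):
    """For each tag, estimate a distribution over the tags it co-occurs with."""
    counts = defaultdict(Counter)
    for tags in tagged_items:
        unique = set(tags)
        for tag in unique:
            for other in unique:
                if other != tag:
                    counts[tag][other] += 1
    dists = {}
    for tag, counter in counts.items():
        total = sum(counter.values())
        dists[tag] = {other: n / total for other, n in counter.items()}
    return dists


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical tagged resources; "nyc" and "newyork" never occur together
# but appear in the same contexts.
items = [
    ["nyc", "travel", "skyline"],
    ["newyork", "travel", "skyline"],
    ["python", "code", "tutorial"],
]
dists = cooccurrence_distributions(items)
print(js_divergence(dists["nyc"], dists["newyork"]))  # 0.0: identical contexts
print(js_divergence(dists["nyc"], dists["python"]))   # 1.0: disjoint contexts
```

With base-2 logarithms the divergence lies between 0 and 1, so a low value suggests that two tags are used in similar contexts even when they never label the same resource.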
DOCUMENT
A common strategy to assign keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for a word to be selected as keyword is its relevance for the text. The tf.idf score of a term is a widely used relevance measure. While easy to compute and giving quite satisfactory results, this measure does not take (semantic) relations between words into account. In this paper we study some alternative relevance measures that do use relations between words. They are computed by defining co-occurrence distributions for words and comparing these distributions with the document and the corpus distribution. We then evaluate keyword extraction algorithms defined by selecting different relevance measures. For two corpora of abstracts with manually assigned keywords, we compare manually extracted keywords with different automatically extracted ones. The results show that using word co-occurrence information can improve precision and recall over tf.idf.
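As a point of reference, the tf.idf baseline mentioned above can be sketched in a few lines. This sketch assumes one common variant (relative term frequency multiplied by the natural logarithm of the inverse document frequency) and pre-tokenized documents; the sample corpus and function names are hypothetical, and the co-occurrence-based measures studied in the paper are not implemented here.

```python
import math
from collections import Counter


def tfidf_scores(docs, doc_index):
    """Score every word of docs[doc_index] by tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    target = docs[doc_index]
    term_freq = Counter(target)
    return {
        word: (count / len(target)) * math.log(n_docs / doc_freq[word])
        for word, count in term_freq.items()
    }


def tfidf_keywords(docs, doc_index, top_k=3):
    """Return the top_k words by tf.idf score; ties are broken alphabetically."""
    scores = tfidf_scores(docs, doc_index)
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Hypothetical, already tokenized corpus.
docs = [
    "the library search uses a knowledge graph".split(),
    "the keyword search uses the tf idf score".split(),
    "the graph embeddings improve the search".split(),
]
print(tfidf_keywords(docs, 1))
```

Words that occur in every document, such as "the" in this corpus, receive a score of zero, which illustrates why the measure favours terms that are specific to a document while ignoring any relations between words.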
DOCUMENT