Odysseus 2008

For ODCSSS 2008 our theme was The Global Family; The Global Workplace: "Technologies for Social Connectedness". In 2008, 16 students from around the world worked under this theme.

Project 0406-ucd: Mining semantic relations from Wikipedia

Wikipedia is a remarkable new resource that has the potential to be exploited as a semi-structured knowledge source for research in Natural Language Processing (NLP). As an open-source encyclopaedia maintained by an army of dedicated volunteers, it represents a constantly growing repository of cultural knowledge, most of it authoritative. The cross-referential topology of inter-term connections in Wikipedia is a valuable knowledge source in itself. However, a lexico-semantic analysis of the textual connections between Wikipedia articles (and thus between concepts) would additionally allow rich knowledge structures to be extracted automatically.

Consider the concept-term anarchist: Wikipedia links this term to anarchy, anarchism, socialism, communism, capitalism and democracy. Each of these represents a different belief system, and of course anarchist has a different semantic relation to each (e.g., anarchists practice anarchism and nurture a state of anarchy; anarchism is in turn an extreme example of democracy, and so on). To understand these connections semantically, we need access to more than the topology of Wikipedia; we need to understand the semantics of the textual references themselves, in the context of particular sentences.

Work at DCU by Dr. Josef Van Genabith and his team on the annotation of functional structural roles in text will help greatly in this task. We intend to engage a research student to apply these techniques to Wikipedia; the application will not be trivial, as we hope to guide the process using knowledge of Wikipedia's link topology. At UCD we have several research students engaged in the analysis of Wikipedia for NLP ends; this research student will thus work as part of this team.
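The idea of pairing Wikipedia's link topology with the sentences in which each link occurs can be illustrated with a small sketch. This is not the project's method, only a minimal illustration: the sample text is invented rather than taken from Wikipedia, the function names are hypothetical, and the naive regex sentence splitter stands in for the functional-role annotation the project proposes to use. The only external convention assumed is the MediaWiki internal link syntax, [[Target]] or [[Target|label]].

```python
import re
from collections import defaultdict

# Matches MediaWiki-style internal links: [[Target]] or [[Target|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def split_sentences(text):
    """Naive splitter on sentence-final punctuation (an assumption;
    a real pipeline would use a proper sentence tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def link_contexts(wikitext):
    """Map each linked target to the plain-text sentences containing the link."""
    contexts = defaultdict(list)
    for sentence in split_sentences(wikitext):
        targets = [m.group(1).strip().lower() for m in WIKILINK.finditer(sentence)]
        # Replace each link with its visible label, or its target if unlabeled.
        plain = WIKILINK.sub(lambda m: m.group(2) or m.group(1), sentence)
        for target in targets:
            contexts[target].append(plain)
    return dict(contexts)


# Hypothetical article text for the concept "anarchist", invented for illustration.
SAMPLE = (
    "An anarchist practices [[anarchism]]. "
    "Anarchists seek a state of [[anarchy]]. "
    "Many anarchists reject [[capitalism]] and criticise [[communism|state communism]]."
)

contexts = link_contexts(SAMPLE)
for target, sentences in contexts.items():
    print(f"anarchist -> {target}: {sentences}")
```

The keys of the resulting dictionary correspond to the link topology, while the stored sentences are the textual evidence from which a semantic relation (such as "practices" or "rejects") would still have to be extracted by deeper analysis.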

Relevance of Project to the Host Laboratories:

In the UCD Creative Languages Systems Group we are currently exploiting the topology of cross-referencing in Wikipedia in a variety of ways. One such application is the semi-automated construction of ontologies for Natural Language Processing that are capable of supporting metaphor, analogy and metonymy (the cross-referential structure, through which related concepts are explicitly connected, is particularly relevant to metonymy).

Supervisors:

Dr. Tony Veale (Computer Science and Informatics, UCD)

Keywords:

Wikipedia; ontologies; WordNet; metaphor; analogy; text parsing