News

Thank you to the more than 140 undergraduates from around the world who applied to ODCSSS 2009.

ActiveGraph: accessing and manipulating large graphs in a database

Odysseus: 
2010

 

How do you define a “large” dataset? One with thousands of entities? Millions? Billions? Perhaps a large dataset is one that cannot comfortably be loaded into the main memory of your computer, but must instead reside in external memory (e.g. a hard drive) and can only be examined in parts. Social networks (e.g. friend links on Facebook), mobile phone networks, and the web graph certainly meet this criterion. The sheer size of these datasets makes them difficult to work with.

 

The ActiveRecord design pattern [1], first described by Martin Fowler [2], ties database tables to classes, so that each object wraps one row of a table. It allows a programmer to find, create, update, save, and destroy business objects such as "Customer" and "Product" in a database without writing SQL by hand. It is an important part of the popular open-source Ruby on Rails (RoR) web application framework [3] and greatly simplifies the development process.
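To make the pattern concrete, here is a minimal sketch in Python using the standard `sqlite3` module rather than Ruby on Rails. The `Customer` class, the `customers` table and the method names (`find`, `create`, `save`, `destroy`) are hypothetical illustrations that mirror the operations listed above; they are not the Rails API.

```python
import sqlite3

# Assumption: a single shared in-memory SQLite database for the example.
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


class Customer:
    """Hypothetical Active Record class: each instance wraps one row of
    the `customers` table and knows how to load, save and delete itself."""

    def __init__(self, name, customer_id=None):
        self.id = customer_id
        self.name = name

    @classmethod
    def find(cls, customer_id):
        # Translate a primary-key lookup into SQL and wrap the row in an object.
        row = DB.execute("SELECT id, name FROM customers WHERE id = ?",
                         (customer_id,)).fetchone()
        return cls(row[1], customer_id=row[0]) if row else None

    @classmethod
    def create(cls, name):
        customer = cls(name)
        customer.save()
        return customer

    def save(self):
        # INSERT objects that have no row yet, UPDATE the ones that do.
        if self.id is None:
            cur = DB.execute("INSERT INTO customers (name) VALUES (?)", (self.name,))
            self.id = cur.lastrowid
        else:
            DB.execute("UPDATE customers SET name = ? WHERE id = ?",
                       (self.name, self.id))
        DB.commit()

    def destroy(self):
        DB.execute("DELETE FROM customers WHERE id = ?", (self.id,))
        DB.commit()
        self.id = None


alice = Customer.create("Alice")
alice.name = "Alice Smith"
alice.save()
print(Customer.find(alice.id).name)  # → Alice Smith
alice.destroy()
print(Customer.find(1))              # → None
```

The calling code never builds a SQL string; the class hides the mapping between object fields and table columns, which is the property the project wants to carry over to graphs.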

This project will use the ActiveRecord design pattern to access and manipulate large graphs stored in a database without having to load them into main memory. By graph, we mean a set of nodes and edges that can represent a social network, a mobile phone network, a function call graph, etc.

This will allow any programmer to use the same interface to interact with a graph in external memory as he or she would use with a graph in main memory.
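One way to picture "the same interface" is a small sketch in which traversal code depends only on a `neighbours(node)` method. Below, that method is implemented once over an in-memory adjacency dictionary and once over an SQLite `edges` table. The class names, the table layout and the `reachable` breadth-first search are hypothetical choices made for illustration, not the project's actual design.

```python
import sqlite3
from collections import deque
from typing import Dict, Iterable, List, Protocol, Tuple


class Graph(Protocol):
    """The single interface traversal code is written against (hypothetical)."""
    def neighbours(self, node: int) -> Iterable[int]: ...


class InMemoryGraph:
    """Adjacency lists held entirely in main memory."""
    def __init__(self, edges: Iterable[Tuple[int, int]]):
        self.adj: Dict[int, List[int]] = {}
        for src, dst in edges:
            self.adj.setdefault(src, []).append(dst)

    def neighbours(self, node: int) -> Iterable[int]:
        return self.adj.get(node, [])


class DatabaseGraph:
    """Edges stay in an SQLite table; each call fetches one node's neighbours."""
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def neighbours(self, node: int) -> Iterable[int]:
        rows = self.conn.execute(
            "SELECT dst FROM edges WHERE src = ? ORDER BY rowid", (node,))
        return [dst for (dst,) in rows]


def reachable(graph: Graph, start: int) -> List[int]:
    """Breadth-first search that runs unchanged on either implementation."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        for nxt in graph.neighbours(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


edges = [(1, 2), (1, 3), (2, 4), (5, 6)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.execute("CREATE INDEX edges_src ON edges (src)")  # keeps per-node lookups cheap
conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)

print(reachable(InMemoryGraph(edges), 1))  # → [1, 2, 3, 4]
print(reachable(DatabaseGraph(conn), 1))   # → [1, 2, 3, 4]
```

The index on `src` matters here. Without it, every `neighbours` call would scan the whole edge table, which would defeat the purpose for graphs that only fit on disk.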

Outcomes:
• The student will use the ActiveRecord design pattern to create an "ActiveGraph": a model of a graph that is tied to a database schema. The advantage of this is that a programmer can access and manipulate a large graph stored in a database without having to worry about keeping the in-memory model and the database model synchronised. They can focus on the graph itself and not on the mappings between the two models.
• The student will also enhance the ActiveGraph model so that a programmer can access and manipulate a large graph stored in a database without having to load the entire graph into memory. In this case, there is no distinction between the in-memory model and the database model: everything is located in the database all of the time. Such an implementation would allow a programmer to handle graphs that are too large to fit in memory without making any special changes to their code.
• The student will present the software through a research presentation and/or poster.
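The second outcome, where everything stays in the database, can be sketched roughly as follows. Python `Node` objects hold only a node id. Reading or setting a label goes straight to SQLite, and iterating over nodes or neighbours streams rows from a cursor instead of building an in-memory copy, so there is no second model to synchronise. `ActiveGraph`, `Node`, the `nodes`/`edges` schema and every method name here are hypothetical, chosen only to illustrate the idea.

```python
import sqlite3
from typing import Iterator


class ActiveGraph:
    """Hypothetical sketch: nodes and edges live only in SQLite."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS nodes (id INTEGER PRIMARY KEY, label TEXT);
            CREATE TABLE IF NOT EXISTS edges (src INTEGER REFERENCES nodes(id),
                                              dst INTEGER REFERENCES nodes(id));
            CREATE INDEX IF NOT EXISTS edges_src ON edges (src);
        """)

    def add_node(self, label: str) -> "Node":
        cur = self.conn.execute("INSERT INTO nodes (label) VALUES (?)", (label,))
        self.conn.commit()
        return Node(self, cur.lastrowid)

    def add_edge(self, src: "Node", dst: "Node") -> None:
        self.conn.execute("INSERT INTO edges VALUES (?, ?)", (src.id, dst.id))
        self.conn.commit()

    def nodes(self) -> Iterator["Node"]:
        # The cursor yields one row at a time rather than loading the table.
        for (node_id,) in self.conn.execute("SELECT id FROM nodes ORDER BY id"):
            yield Node(self, node_id)


class Node:
    """A lightweight handle: only the id is kept in Python."""

    def __init__(self, graph: ActiveGraph, node_id: int):
        self.graph = graph
        self.id = node_id

    @property
    def label(self) -> str:
        # Read through to the database on every access.
        return self.graph.conn.execute(
            "SELECT label FROM nodes WHERE id = ?", (self.id,)).fetchone()[0]

    @label.setter
    def label(self, value: str) -> None:
        # Write through immediately; there is no separate save or sync step.
        self.graph.conn.execute(
            "UPDATE nodes SET label = ? WHERE id = ?", (value, self.id))
        self.graph.conn.commit()

    def neighbours(self) -> Iterator["Node"]:
        for (dst,) in self.graph.conn.execute(
                "SELECT dst FROM edges WHERE src = ? ORDER BY rowid", (self.id,)):
            yield Node(self.graph, dst)

    def out_degree(self) -> int:
        # Aggregate inside the database instead of counting in Python.
        return self.graph.conn.execute(
            "SELECT COUNT(*) FROM edges WHERE src = ?", (self.id,)).fetchone()[0]


g = ActiveGraph()
a, b, c = g.add_node("a"), g.add_node("b"), g.add_node("c")
g.add_edge(a, b)
g.add_edge(a, c)
b.label = "bob"                            # stored in the database at once
print(Node(g, b.id).label)                 # → bob
print([n.label for n in a.neighbours()])   # → ['bob', 'c']
print(a.out_degree())                      # → 2
```

A fresh `Node(g, b.id)` sees the new label straight away because no copy of the data exists outside the database. The trade-off is one query per attribute access. A fuller implementation would probably batch or cache reads, which would bring back some of the synchronisation questions the first outcome addresses.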

Relevance of the Project to Current Research Theme:
The Clique Research Cluster studies many extremely large datasets. This project will provide an interface for accessing these datasets without loading them entirely into main memory. The student will collaborate with postdocs and PhD students working in this area.


 

Supervisors and Mentors: 
Prof. Pádraig Cunningham
Martin Harrigan, Daniel Archambault
Derek Greene
Host: 
UCD