Thank you to the more than 140 undergraduates from around the world who applied to ODCSSS 2009.
In late 2001, the Enron Corporation filed for bankruptcy protection in the Southern District of New York, launching one of the most complex bankruptcy cases in U.S. history. During these proceedings, it was revealed that the company had been largely sustained by systematic accounting fraud. As part of its investigation of the company, the Federal Energy Regulatory Commission released a large collection of emails sent by 150 users in the company, most of whom were in senior management at the time of the case.
The Enron email corpus has frequently been analysed as a social network, where links represent messages sent from one employee to another. In this project, we would like to extend this analysis by applying both text mining and information visualisation techniques to the actual email message content. The primary objective is to gain greater insight into the major themes discussed by Enron employees before and during the company's collapse.
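To make the two views mentioned above concrete, the following is a minimal sketch, assuming messages are available as raw RFC 822 style text. It counts directed sender-to-recipient links from the headers (the social network view) and separately extracts the message body (the starting point for content analysis). The sample messages, addresses, and the helper names `sender_recipient_links` and `body_text` are made up for illustration and are not taken from the corpus or from any existing tool.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Two invented messages in RFC 822 style (not taken from the Enron corpus).
RAW_MESSAGES = [
    "From: alice@example.com\nTo: bob@example.com, carol@example.com\n"
    "Subject: Q3 numbers\n\nPlease review the trading figures.",
    "From: bob@example.com\nTo: alice@example.com\n"
    "Subject: Re: Q3 numbers\n\nThe trading figures look fine.",
]


def sender_recipient_links(raw_messages):
    """Count directed (sender, recipient) links across all messages."""
    links = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # getaddresses returns (display name, address) pairs; keep the address.
        senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
        recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
        for sender in senders:
            for recipient in recipients:
                links[(sender, recipient)] += 1
    return links


def body_text(raw):
    """Return the body of a simple, non-multipart message."""
    return message_from_string(raw).get_payload()


links = sender_recipient_links(RAW_MESSAGES)
bodies = [body_text(raw) for raw in RAW_MESSAGES]
```

The `links` counter could feed a graph library for the network view, while `bodies` would be the input to any text mining step. Real messages in the collection may be multipart or contain Cc and Bcc headers, which this sketch ignores.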
Outcomes:
• The development of a software tool to allow for the exploration of large collections of email from the perspective of message content. While the project will focus on the Enron corpus, ideally this tool could also be applied to explore themes in other email collections.
• The application of the software tool to investigate and interpret the relationships between message themes, with possible links to the major players in the Enron case as documented by the media at the time.
• The student will present the software tool and the results of the data investigation phase as a research presentation and/or poster at a relevant venue.
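As one possible starting point for exploring message content, the sketch below ranks the terms in each message by TF-IDF weight, so that words frequent in one message but rare across the collection rise to the top. TF-IDF is an assumption chosen here for illustration; the project description does not prescribe a particular text mining technique. The function names `tokenize` and `tfidf_top_terms` and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_top_terms(documents, k=3):
    """Return the k highest-weighted TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results


# Invented sample texts, not taken from the Enron corpus.
SAMPLE_BODIES = [
    "gas pipeline capacity and gas prices",
    "meeting schedule for the board meeting",
    "gas prices and the board",
]
top_terms = tfidf_top_terms(SAMPLE_BODIES, k=2)
```

A real tool would likely add stop-word removal and group messages into themes, for example by clustering the weighted term vectors, before any visualisation.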
The project will touch on a number of core areas examined by the Clique Research Cluster: finding structure in social networks, visualising large networks, and integrating information from different data sources. The student will collaborate with postdocs and PhD students working in these areas.
Additional Information:
Enron collection
http://www.cs.cmu.edu/~enron/
B. Klimt, Y. Yang. “The Enron corpus: A new dataset for email classification research”. ECML 2004.
http://nyc.lti.cs.cmu.edu/yiming/Publications/klimt-ecml04.pdf