(Research project, suitable for PhD and MPhil or Honours)
Increasingly, people communicate with each other electronically,
through e-mail, SMS, discussion forums, bulletin boards, or chat
rooms. So far, only limited research has been done on finding
patterns in such online discussions.
This project aims to investigate data mining techniques in order
to find patterns in publicly available online discussion forums
(for example forums that discuss new movies, DVDs, games, music,
or electronic products). The challenges with such data are that
participants often use nicknames, that typos and abbreviations
are common, and that slang expressions and emoticons such as ;-)
or ;-( are widespread.
Questions that this research could address are: Who are the
participants in an online discussion? What topic are they
discussing? Can we find new trends being discussed? Who is
starting new trends? How are the participants interacting? When
and for how long are participants online?
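As a rough illustration of how two of these questions (who starts
a trend, and when participants are online) might be approached,
the sketch below works on a handful of made-up posts. The record
layout (nickname, ISO timestamp, text), the sample data, and the
function names first_mentions and active_hours are all assumptions
for illustration, not part of the project description.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample posts: (nickname, ISO timestamp, text). Not real data.
POSTS = [
    ("movie_fan", "2024-01-05T20:15:00", "Anyone seen the new trailer?"),
    ("DVDking", "2024-01-05T20:40:00", "yes!! the trailer looks gr8 ;-)"),
    ("movie_fan", "2024-01-06T09:05:00", "soundtrack is out too"),
    ("gamer42", "2024-01-06T21:30:00", "the soundtrack is amazing"),
]


def first_mentions(posts, terms):
    """Map each term to the (author, timestamp) of its earliest mention."""
    first = {}
    # ISO timestamps in the same format sort chronologically as strings.
    for author, ts, text in sorted(posts, key=lambda p: p[1]):
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        for term in terms:
            if term in words and term not in first:
                first[term] = (author, ts)
    return first


def active_hours(posts):
    """Count, per author, how many posts fall in each hour of the day."""
    hours = defaultdict(Counter)
    for author, ts, _text in posts:
        hours[author][datetime.fromisoformat(ts).hour] += 1
    return hours


if __name__ == "__main__":
    print(first_mentions(POSTS, ["trailer", "soundtrack"]))
    print(dict(active_hours(POSTS)))
```

A real forum would need far more than exact word matching (typos,
slang and abbreviations would all be missed here), so this is only
a starting point for the kind of analysis the questions suggest.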
This research is fairly open and involves many challenges,
including: extracting the participants and what they write;
extracting the conversations in a discussion forum (there might
be several discussions going on at the same time); and finding the
topics of a discussion by using external data (e.g. by querying a
search engine). The research therefore involves techniques such
as entity resolution, link analysis, time series analysis, text
mining, and information retrieval.
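To make the entity resolution step more concrete, the following is
a minimal sketch of how nickname variants might be grouped, assuming
that variants of one nickname differ mainly in case, punctuation, or
a character or two. The similarity threshold of 0.85, the greedy
grouping strategy, and the function names are illustrative
assumptions; a real study would likely need richer evidence such as
writing style or posting times.

```python
import re
from difflib import SequenceMatcher


def normalize(nick):
    """Lowercase a nickname and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", nick.lower())


def same_participant(a, b, threshold=0.85):
    """Guess whether two nicknames refer to the same participant.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def group_nicknames(nicks, threshold=0.85):
    """Greedily assign each nickname to the first group whose
    representative (its first member) it resembles."""
    groups = []
    for nick in nicks:
        for group in groups:
            if same_participant(nick, group[0], threshold):
                group.append(nick)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([nick])
    return groups


if __name__ == "__main__":
    # Hypothetical nicknames, invented for this example.
    print(group_nicknames(["Movie_Fan", "moviefan", "gamer42", "movie-fan"]))
```

Greedy grouping depends on the order in which nicknames are seen
and compares only against the first member of each group, which
keeps the sketch short but can split or merge groups incorrectly on
real data.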