Knowledge Discovery

About Knowledge Discovery

Nearly every action that we take results in digital data being generated, and as a result, each of us gives rise to gigabytes of digital trail every year. For example, when we purchase music on-line, our genre preferences are registered; when we use our credit card, the stores we frequent are identified and the items we purchase are tabulated; when we post a message on a blog or to Twitter, our language and sentiment and time-of-day preferences are recorded; and when we walk around, our mobile phone tracks our location and traces our movements.

There is enormously useful information that can be extracted from this ‘big data’ mountain, to make use of a phrase that is becoming increasingly prevalent as more and more electronic records are accumulated. Our overarching purpose in this research group is to develop efficient and effective techniques for distilling information from data, with the goal of improving future interactions and anticipating societal needs. For example, by identifying trends and relationships in medical data it will be possible to improve the accuracy with which medical diagnoses are made and provide earlier and less invasive interventions; by monitoring vehicle movement and speeds, future traffic and road needs can be better predicted, and better use made of existing resources; and by developing better techniques for processing human language, we will be able to provide more intuitive interfaces to software, and more precise on-line search services.

The themes listed below encompass these ideas, plus many more. We invite you to read these descriptions, and if you are interested, come and join us.

Research Themes

Algorithms for Collaborative Micro-Navigation Based on Spatio-Temporal Data Management and Data Mining

Contact

Prof Rao Kotagiri

Project Description

Traffic congestion is a key challenge in modern society, with an estimated cost of $20 billion for Australia’s economy by 2020. This proposal aims to develop collaborative data management and mining algorithms that will utilize real-time data from connected cars and traffic data centers to continuously optimize a road user's route in terms of travel time and fuel consumption. The outcome will be effective micro-navigation techniques that integrate local information from neighboring cars to provide directions at a much finer level of precision. Our spatio-temporal algorithms will be able to advise which lanes, positions and actions are optimal to reach a destination safely in minimal time, and will pave the way for future automated transport.

Effective and Easy-to-Implement Methods for Supporting Efficient Querying and Mining of Spatio-Temporal Data

Contact

Dr Rui Zhang

Project Description

Querying and mining large scale spatio-temporal datasets is critical for the success of many important new generation applications including digital maps, location based social networks and services, and 3D molecular-shape-comparison based drug discovery. However, the lack of an efficient and easily implementable spatial data retrieval method has seriously hindered the ability of businesses and medical research institutions to efficiently browse, explore and mine their large scale spatial datasets. This project will develop novel spatial data retrieval methods that are both highly efficient and easy to implement so that even small/medium sized businesses and research institutions can rapidly adopt them at low cost.

Efficient String Processing With Succinct Data Structures

Contact

Prof Alistair Moffat

Project Description

Pattern matching, string search, and bag-of-words and phrase-based information retrieval remain critically important computational tasks, with applications ranging from whole-of-web search through to manipulation of genomic data. In this project we are investigating approaches to these problems that make use of compressed and succinct data structures, with particular attention to memory hierarchies, and the balance between sequential and random access computations. Our goal is to devise improved approaches that provide better tradeoffs between speed, space, and answer quality, and hence allow broader implementation choices in particular application domains.
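As a rough illustration of the kind of building block that succinct data structures rely on, the sketch below answers rank queries (how many 1-bits precede a position) on a bit vector by storing one cumulative count per block and scanning only within a block. It is a minimal sketch and not the project's own method: the class name RankBitVector, the method rank1 and the block_size parameter are hypothetical, and a Python list is used for readability, so the sketch shows the time/space tradeoff idea rather than a genuinely compact layout.

```python
class RankBitVector:
    """Bit vector with sampled prefix counts for rank queries (illustrative only)."""

    def __init__(self, bits, block_size=64):
        self.bits = list(bits)
        self.block_size = block_size
        # counts[k] holds the number of 1-bits in bits[0 : k * block_size].
        self.counts = [0]
        for start in range(0, len(self.bits), block_size):
            block = self.bits[start:start + block_size]
            self.counts.append(self.counts[-1] + sum(block))

    def rank1(self, i):
        """Return the number of 1-bits in bits[0:i]."""
        if not 0 <= i <= len(self.bits):
            raise IndexError("position out of range")
        block = i // self.block_size
        start = block * self.block_size
        # One table lookup plus a scan of at most block_size bits.
        return self.counts[block] + sum(self.bits[start:i])


bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block_size=4)
print(bv.rank1(7))
```

A larger block_size stores fewer counts but scans more bits per query; a smaller one does the opposite, which is one simple instance of the speed/space balance described above.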

Efficient Synchronization of Large Repositories

Contact

Dr Tony Wirth

Project Description

Repositories are often replicated, or partially replicated, at multiple sites. Collections of source code and executables, or genetic databases, are ready examples. In this project, we aim to explore collection-based compression, exploiting similarities between new files and possibly several existing files. We expect to discover new search techniques to find similar files; new compression techniques, based on file differences; and new codings to represent compressed

Electronic Voting for Australian Elections

Contact

Dr Vanessa Teague

Project Description

Voting by computer sounds like a great step forward for convenience and accessibility. Everyone knows that computers are good at adding up lots of numbers quickly and accurately, so what could possibly go wrong? In fact, achieving adequate security for voting is extremely challenging, because voting has two special security requirements: verifiability (everyone should get good evidence that the system got the right answer) and privacy (it should be very difficult to find out how someone voted, or even to prove how you voted). This project uses advanced cryptography to design extremely secure computerized voting systems that achieve good privacy and transparency, even compared to traditional pencil-and-paper systems. The aim is to find the best combination of the convenience of modern ICT and the verifiability of a human-readable permanent record of the vote.

Information Processing in Wireless Sensor Networks

Contact

Prof Chris Leckie

Project Description

Our group is developing energy-efficient algorithms for in-network processing of sensor measurements in heterogeneous sensor networks. In particular, we are interested in the challenges of detecting unusual or abnormal events from sensor data. Areas of interest include distributed event detection, spatio-temporal inference, and privacy-preserving aggregation. Application areas include environmental monitoring, air quality monitoring, and biomedical monitoring.

Language Technologies

Contact

Prof Tim Baldwin or Assoc Prof Steven Bird

Most human knowledge and human communication is represented and expressed using language, both in written and spoken forms. Language technologies permit computers to process human language, providing more natural human-machine interfaces, and more sophisticated access to stored information. Language technologies play a central role in the multilingual information society of the future. The language technology group conducts research in statistical language modelling, language understanding, knowledge discovery, linguistic annotation, high performance computing, and digital language archiving. The LT Group has a regular seminar series, group lunch, a reading group and a writing group.

For more information see http://www.csse.unimelb.edu.au/research/lt/

Location Privacy

Contact

Assoc Prof Lars Kulik

Project Description

The collection of location-based data about individuals has undergone a revolution, since its smart use offers enormous benefits to society. Real-time traffic management and personalized location-based services are prime applications. Although collecting a person's location data holds tremendous potential, there are also significant risks to privacy. Our goal is to develop methods that balance application needs for location accuracy against a user's need for location privacy.
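One simple way to picture the accuracy/privacy balance is spatial cloaking, where a device reports only the grid cell that contains it. The sketch below is an assumption-laden illustration and not the group's method: the function name cloak and the cell_deg parameter are hypothetical, and snapping to a fixed grid is only one of many possible obfuscation schemes.

```python
import math


def cloak(lat, lon, cell_deg):
    """Snap a coordinate to the centre of its grid cell (side cell_deg degrees)."""
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")

    def snap(value):
        # Index of the containing cell, then the midpoint of that cell.
        return (math.floor(value / cell_deg) + 0.5) * cell_deg

    return snap(lat), snap(lon)


# Two nearby points fall into the same 0.01-degree cell and become indistinguishable.
a = cloak(-37.7963, 144.9614, 0.01)
b = cloak(-37.7991, 144.9688, 0.01)
print(a == b)
```

A larger cell_deg hides the user among more possible positions but gives the application a coarser location; the reported point is never more than half a cell away from the true one on each axis.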

Management and Mining of Trajectory Data for Travel and Transportation Decision Support

Contact

Dr Rui Zhang

Project Description

The availability of large collections of trajectories of humans and vehicles (via GPS and GSM) has enabled interesting applications such as recommending tourist locations and finding people with similar life patterns, but these applications are based on mining trajectories in a static manner. This project addresses the challenge of mining large numbers of trajectories as they are being continually generated, in order to support a suite of novel applications that require almost real-time response and cannot be accommodated by existing approaches. Examples of such applications include traffic overload prediction, real-time event detection, and route recommendation, which bring a wide range of social, economic and environmental benefits to Australia.

Measurement in Experimental Computer Science

Contact

Prof Justin Zobel

Project Description

To be useful, measurement must be of some attribute or facet of behavior that is both predictive of future behavior on unseen data, and also perceivable to system users. In computing research we measure many things, such as precision and recall in information retrieval systems, or fluency and adequacy in machine translation systems, or throughput and latency of distributed algorithms. In this project we are examining a range of such experimental methodologies, seeking to ensure that what we measure, and what we compare against when doing so, are well founded; and seeking to ensure that research results can be properly interpreted.
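For readers unfamiliar with the retrieval measures mentioned above, precision is the fraction of returned items that are relevant, and recall is the fraction of relevant items that were returned. The sketch below computes both under the standard set-based definitions; the function name precision_recall and the document identifiers are made up for illustration, and returning 0.0 for an empty set is a convention chosen here, not something the project prescribes.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    # Convention (an assumption): an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical run: four documents returned, three documents judged relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p, r)
```

Here two of the four returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall 2/3), which shows how the two measures can diverge for the same result list.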

Mining Foundations and Applications

Contact

Prof James Bailey

Project Description

Our group is developing innovative algorithms for a range of challenging tasks in data mining, and applying and evaluating these algorithms in real-world settings. Areas of interest include anomaly detection, block modelling, data clustering, data classification, graph mining, heterogeneous data mining, pattern mining, spatio-temporal data mining, time series data mining and uncertain data mining. Application areas include biomedical data, medical imaging, road traffic networks, social networks and video analysis.

Network Intrusion Detection and Defense

Contact

Prof Chris Leckie

Project Description

Our group is developing models of different types of attacks and their impact in a variety of network environments. We are also developing accurate and robust probabilistic algorithms for detecting and filtering these attacks in high-speed networks. In particular, we are developing distributed algorithms that provide highly scalable defense platforms in broadband networks. Areas of interest include attack modelling, detection algorithms, defense mechanisms, event correlation and distributed techniques. Application areas include distributed denial-of-service attacks, anomaly detection and defense in cloud computing environments, and defense of cyber-physical systems.

Understanding Individual and Collective Behavior through Large Scale Social Network Information Mining

Contact

Assoc Prof Shanika Karunasekera

Project Description

The goal of the project is to use social media interactions to understand individual and collective behavior. Such analysis can be beneficial in predicting market trends, political and social opinions, information dissemination and disaster management. The project will develop scalable information network (graph) analysis and mining techniques, including large-scale graph mining, statistical methods such as PCA, and tensor analysis, to identify and analyze group behavior in social networks.

Research Projects

Computational problems in glaucoma diagnosis, monitoring and understanding

Researchers: Andrew Turpin, Jonathan Denniss

Collaborators: Allison McKendrick (Optometry and Vision Science), Jonathan Crowston (Ophthalmology), Gerhard Zinser (Heidelberg Engineering), David Crabb (City University, London), Ted Garway-Heath (Moorfields Eye Hospital, London), Chota Matsumoto (Kinki University, Japan)

Sponsors: ARC, Heidelberg Engineering

Glaucoma is the second leading cause of blindness in Australia, and is typified by death of the optic nerve, and associated loss in visual field. This project is working on the following computational aspects of diagnosis, monitoring and modelling …

Data retrieval from massive information structures

Researchers: Alistair Moffat, Anthony Wirth

Sponsors: Australian Research Council

Information search is an essential tool.  But most current services regard the data as unstructured collections of independent documents, free of context.  Next-generation search applications, such as over social networks, or corporate …

Efficient and Effective Algorithms for Searching Strings in Secondary Storage

Researchers: Alistair Moffat, Anthony Wirth, Andrew Turpin, Simon Gog

Collaborators: Shane Culpepper (RMIT University)

Sponsors: Australian Research Council

Compressed data structures are surprisingly versatile, and often allow fundamental querying operations to be carried out just as quickly as is possible in more expansive arrangements. They also offer operations that might not be possible with conventional …

Language preservation 2.0: Crowdsourcing oral language documentation using mobile devices

Researchers: Steven Bird

Collaborators: Mark Liberman (U Penn), David Chiang (USC)

Sponsors: NSF, SNSF

The purpose of this project is to demonstrate the feasibility of a new approach to documenting endangered languages. To allow wide-ranging investigation of a language even after it is no longer spoken, we need the equivalent of the million words …

Principles, practice and pragmatics of measurement in experimental computer science

Researchers: Justin Zobel, Alistair Moffat, Timothy Baldwin, Yvette Graham

Sponsors: Australian Research Council

We aim to develop and empirically demonstrate principles and processes for measuring the outcomes of Computer Science experiments. Measurement is well-understood in the traditional physical sciences, and our work is intended to extend that strong …

Prosodic systems in New Guinea: Integrating computational and typological approaches to linguistic analysis

Researchers: Steven Bird

Collaborators: Mark Liberman (U Penn), Larry Hyman (UCB), Mark Donohue (ANU)

Sponsors: NSF

The world's languages make heavy use of prosody – tone, stress, intonation, and length – to communicate meaning. Tone is the most complex of these elements. Although non-tone languages typically exploit pitch for intonational purposes, …

What If?

Researchers: Richard Sinnott, Chris Duran

To fully assess the effects of environmental change and urban regeneration choices, it is essential to understand land use scenarios. Frequently, research tools that attempt to support projections of land use …

Further Information

Prof Alistair Moffat

Student Supervision

We are keen to discuss supervision opportunities with talented students considering a research higher degree (Masters or PhD).

View the Department of Computing and Information Systems’ Graduate Research page for individual staff contact information and topic areas, or contact staff to arrange a discussion.

Graduate Research

Personnel

Academic Staff

Prof James Bailey
Prof Tim Baldwin
Dr Aaron Harwood
Assoc Prof Shanika Karunasekera
Prof Rao Kotagiri
Assoc Prof Lars Kulik
Prof Chris Leckie
Prof Alistair Moffat
Dr Lee Naish
Dr Udaya Parampalli
Assoc Prof Egemen Tanin
Assoc Prof Andrew Turpin
Dr Tony Wirth
Dr Rui Zhang
Prof Justin Zobel

Researchers

Dr Jeffrey Chan
Dr Simon Gog
Dr Yvette Graham
Dr Angelos Molfetas
Dr Vanessa Teague

Collaborators

Dr Josh Benaloh (Microsoft)
Prof James Bezdek (University of West Florida)
Assoc Prof Richard Buckland (University of NSW)
Mr Craig Burton (Victorian Electoral Commission)
Dr Chris Culnane (University of Surrey)
Dr Shane Culpepper (RMIT University)
Dr Lucia Falzon (DSTO, Adelaide)
Tanzima Hashem (BUET, Bangladesh)
Dr Conor Hayes (DERI, Ireland)
Dr James Heather (University of Surrey)
Dr Kerri Morgan (Monash University)
Prof Philippa Pattison (University of Melbourne)
Dr Sutharshan Rajasegarar (University of Melbourne)
Prof Garry Robins (University of Melbourne)
Prof Peter Y A Ryan (University of Luxembourg)
Prof Hanan Samet (University of Maryland, College Park)
Prof Steve Schneider (University of Surrey)
Prof Yufei Tao (Chinese University of Hong Kong)
Dr Roland Wen (University of NSW)
Dr Xing Xie (Microsoft Research Asia)
Dr Nicholas Yuan (Microsoft Research Asia)
Dr Yu Zheng (Microsoft Research Asia)