About Knowledge Discovery
Nearly every action that we take results in digital data being generated, and as a result, each of us leaves behind gigabytes of digital trail every year. For example, when we purchase music online, our genre preferences are registered; when we use our credit card, the stores we frequent are identified and the items we purchase are tabulated; when we post a message on a blog or to Twitter, our language, sentiment and time-of-day preferences are recorded; and when we walk around, our mobile phone tracks our location and traces our movements.
Enormously useful information can be extracted from this mountain of ‘big data’, a phrase that is becoming increasingly prevalent as more and more electronic records accumulate. Our overarching purpose in this research group is to develop efficient and effective techniques for distilling information from data, with the goal of improving future interactions and anticipating societal needs. For example, identifying trends and relationships in medical data will make it possible to improve the accuracy of medical diagnoses and to provide earlier and less invasive interventions; monitoring vehicle movements and speeds will allow future traffic and road needs to be better predicted and existing resources to be used more effectively; and better techniques for processing human language will provide more intuitive interfaces to software and more precise online search services.
The themes listed below encompass these ideas, plus many more. We invite you to read these descriptions, and if you are interested, come and join us.
Algorithms for Collaborative Micro-Navigation Based on Spatio-Temporal Data Management and Data Mining
Traffic congestion is a key challenge in modern society, with an estimated cost to Australia’s economy of $20 billion by 2020. This proposal aims to develop collaborative data management and mining algorithms that will use real-time data from connected cars and traffic data centers to continuously optimize a road user's route in terms of travel time and fuel consumption. The outcome will be effective micro-navigation techniques that integrate local information from neighboring cars to provide directions at a much finer level of precision. Our spatio-temporal algorithms will be able to advise which lanes, positions and actions are optimal for reaching a destination safely in minimal time, and will pave the way for future automated transport.
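As a minimal illustration of optimizing a route for both travel time and fuel consumption, the Python sketch below runs Dijkstra's algorithm over a toy road graph whose edges carry a travel time and a fuel estimate. The graph, its units and the `fuel_weight` trade-off parameter are hypothetical; the collaborative, lane-level micro-navigation algorithms described above would work with far richer, continuously updated data.

```python
import heapq


def best_route(graph, source, target, fuel_weight=1.0):
    """Dijkstra's shortest path where each edge costs
    travel_time + fuel_weight * fuel (hypothetical units)."""
    # graph: {node: [(neighbor, travel_time, fuel), ...]}
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, time, fuel in graph.get(node, []):
            cost = d + time + fuel_weight * fuel
            if cost < dist.get(nbr, float("inf")):
                dist[nbr] = cost
                prev[nbr] = node
                heapq.heappush(heap, (cost, nbr))
    if target not in dist:
        return None, float("inf")
    # Follow predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Toy network: a fast but fuel-hungry road via "B", or a slower, thriftier one via "C".
roads = {
    "A": [("B", 5.0, 4.0), ("C", 7.0, 1.0)],
    "B": [("D", 5.0, 4.0)],
    "C": [("D", 6.0, 1.0)],
}
print(best_route(roads, "A", "D", fuel_weight=0.0))  # (['A', 'B', 'D'], 10.0)
print(best_route(roads, "A", "D", fuel_weight=1.0))  # (['A', 'C', 'D'], 15.0)
```

Changing `fuel_weight` shifts the preferred route, which is the kind of trade-off a navigation system optimizing both objectives has to expose or tune.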
Effective and Easy-to-Implement Methods for Supporting Efficient Querying and Mining of Spatio-Temporal Data
Querying and mining large-scale spatio-temporal datasets is critical to the success of many important new-generation applications, including digital maps, location-based social networks and services, and drug discovery based on 3D molecular shape comparison. However, the lack of an efficient and easily implementable spatial data retrieval method has seriously hindered the ability of businesses and medical research institutions to browse, explore and mine their large-scale spatial datasets efficiently. This project will develop novel spatial data retrieval methods that are both highly efficient and easy to implement, so that even small and medium-sized businesses and research institutions can adopt them rapidly and at low cost.
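One well-known way to make spatial retrieval easy to implement is to map two-dimensional points onto one-dimensional keys with a space-filling curve, so that an ordinary sorted structure such as a B-tree can index them. The sketch below assumes the Z-order (Morton) curve and non-negative integer coordinates; it illustrates the general idea and is not necessarily the method this project develops. A sorted Python list stands in for the B-tree.

```python
import bisect


def morton_key(x, y, bits=16):
    """Interleave the bits of non-negative integers x and y
    (x in even bit positions, y in odd) to form a Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


class ZOrderIndex:
    """Points kept sorted by Morton key; a stand-in for a B-tree."""

    def __init__(self, points, bits=16):
        self.bits = bits
        self.entries = sorted((morton_key(x, y, bits), x, y) for x, y in points)
        self.keys = [e[0] for e in self.entries]

    def range_query(self, xmin, ymin, xmax, ymax):
        # Every point in the rectangle has a key between the keys of the
        # lower-left and upper-right corners, so scan that key interval
        # and filter out the points the curve lets in from outside.
        lo = morton_key(xmin, ymin, self.bits)
        hi = morton_key(xmax, ymax, self.bits)
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_right(self.keys, hi)
        return [(x, y) for _, x, y in self.entries[start:end]
                if xmin <= x <= xmax and ymin <= y <= ymax]


idx = ZOrderIndex([(1, 1), (2, 3), (5, 5), (3, 0), (6, 2)])
print(sorted(idx.range_query(1, 0, 3, 3)))  # [(1, 1), (2, 3), (3, 0)]
```

Because interleaving bits preserves order in each coordinate, the key interval between the two corners is guaranteed to contain every matching point; the filter step only removes false positives.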
Efficient String Processing With Succinct Data Structures
Pattern matching, string search, and bag-of-words and phrase-based information retrieval remain critically important computational tasks, with applications ranging from whole-of-web search through to manipulation of genomic data. In this project we are investigating approaches to these problems that make use of compressed and succinct data structures, with particular attention to memory hierarchies and to the balance between sequential and random-access computation. Our goal is to devise improved approaches that provide better tradeoffs between speed, space and answer quality, and hence allow broader implementation choices in particular application domains.
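Rank queries on bit vectors are a basic building block of many succinct data structures. As a rough sketch, the class below stores cumulative counts of 1 bits only at block boundaries, so a rank query costs one table lookup plus a scan within a single block. The block size is an arbitrary assumption, and production libraries pack bits into machine words and use several levels of sampling; this version only shows the space/time trade-off in miniature.

```python
class RankBitVector:
    """Bit vector supporting rank1(i) = number of 1 bits in positions [0, i)."""

    def __init__(self, bits, block=64):
        self.bits = list(bits)
        self.block = block
        # samples[k] = number of 1 bits in bits[0 : k * block]
        self.samples = [0]
        for start in range(0, len(self.bits), block):
            self.samples.append(self.samples[-1] + sum(self.bits[start:start + block]))

    def rank1(self, i):
        if not 0 <= i <= len(self.bits):
            raise IndexError(i)
        k = i // self.block
        # Sampled count for the complete blocks, then scan the partial block.
        return self.samples[k] + sum(self.bits[k * self.block:i])

    def rank0(self, i):
        return i - self.rank1(i)


bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block=4)
print(bv.rank1(5), bv.rank0(5))  # 3 2
```

Larger blocks store fewer samples (less space) but scan more bits per query (more time), which is the kind of speed/space balance the project description refers to.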
Efficient Synchronization of Large Repositories
Repositories are often replicated, or partially replicated, at multiple sites. Collections of source code and executables, or genetic databases, are ready examples. In this project, we aim to explore collection-based compression, exploiting similarities between new files and possibly several existing files. We expect to discover new search techniques to find similar files; new compression techniques based on file differences; and new codings to represent compressed
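To illustrate compression based on file differences, the sketch below encodes a new file as a list of COPY instructions (ranges already present in a reference file held at the remote site) and INSERT instructions (literal bytes that must be sent). It uses Python's `difflib.SequenceMatcher` to find matching blocks against a single reference file; that choice is an assumption for illustration, and the sketch does not address finding similar files or matching against several existing files, as the project proposes.

```python
from difflib import SequenceMatcher


def delta_encode(reference: bytes, new: bytes):
    """Describe `new` as COPY ranges from `reference` plus literal INSERTs."""
    ops = []
    matcher = SequenceMatcher(None, reference, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))   # bytes the remote site already has
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # bytes that must actually be sent
        # "delete": reference bytes absent from `new` need no instruction
    return ops


def delta_decode(reference: bytes, ops) -> bytes:
    """Rebuild the new file from the reference and the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += reference[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


ref = b"the quick brown fox jumps over the lazy dog"
new = b"the quick red fox jumps over the lazy cat"
ops = delta_encode(ref, new)
assert delta_decode(ref, ops) == new
literal = sum(len(op[1]) for op in ops if op[0] == "insert")
print(f"{literal} literal bytes out of {len(new)}")
```

Only the literal bytes and the compact COPY descriptors need to cross the network; a real system would also entropy-code the instruction stream.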
Electronic Voting for Australian Elections
Voting by computer sounds like a great step forward for convenience and accessibility. Everyone knows that computers are good at adding up lots of numbers quickly and accurately, so what could possibly go wrong? In fact, achieving adequate security for voting is extremely challenging, because voting has two special security requirements: verifiability (everyone should get good evidence that the system got the right answer) and privacy (it should be very difficult to find out how someone voted, or even to prove how you voted). This project uses advanced cryptography to design extremely secure computerized voting systems that achieve good privacy and transparency, even compared to traditional pencil-and-paper systems. The aim is to find the best combination of the convenience of modern ICT and the verifiability of a human-readable permanent record of the vote.
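Homomorphic tallying is one well-known cryptographic technique for computing an election result without decrypting individual ballots; the project description does not say which primitives it uses. The toy sketch below uses the Paillier cryptosystem, in which multiplying ciphertexts adds the underlying votes. The primes are deliberately tiny and insecure, and a real scheme would also need zero-knowledge proofs that each ballot encrypts 0 or 1, plus threshold decryption so that no single party holds the key.

```python
import math
import secrets

# Toy Paillier parameters: tiny primes for illustration only. Real systems
# use primes of 1024 bits or more and a vetted cryptographic library.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def encrypt(m):
    # Fresh randomness r (coprime to N) makes equal votes encrypt differently.
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    u = pow(c, LAM, N2)
    return ((u - 1) // N) * MU % N


def tally(ciphertexts):
    # Multiplying ciphertexts modulo N^2 adds the plaintext votes.
    total = 1
    for c in ciphertexts:
        total = (total * c) % N2
    return decrypt(total)


votes = [1, 0, 1, 1, 0]  # 1 = "yes", 0 = "no"
ballots = [encrypt(v) for v in votes]
print(tally(ballots))  # 3
```

Only the combined ciphertext is ever decrypted, so the count is revealed without exposing any individual ballot.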
Information Processing in Wireless Sensor Networks
Our group is developing energy-efficient algorithms for in-network processing of sensor measurements in heterogeneous sensor networks. In particular, we are interested in the challenges of detecting unusual or abnormal events from sensor data. Areas of interest include distributed event detection, spatio-temporal inference and privacy-preserving aggregation. Application areas include environmental monitoring, air quality monitoring and biomedical monitoring.
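One simple way to process measurements in-network is to have each node send a small mergeable summary up the routing tree instead of its raw readings. The sketch below assumes a summary of count, mean and sum of squared deviations (merged with the standard pairwise formula of Chan et al.), and a hypothetical rule that flags readings more than `k` standard deviations from the network-wide mean. It is an illustrative assumption, not the group's detection algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    """Mergeable (count, mean, M2) summary; M2 is the sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x):
        # Welford's single-pass update for one local reading.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        # Pairwise combination: a parent node merges its children's
        # summaries instead of forwarding every raw measurement.
        if other.n == 0:
            return Summary(self.n, self.mean, self.m2)
        if self.n == 0:
            return Summary(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    def std(self):
        # Population standard deviation of everything summarized so far.
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


def is_anomalous(x, summary, k=3.0):
    """Flag a reading more than k standard deviations from the mean."""
    return abs(x - summary.mean) > k * summary.std()


node_a, node_b = Summary(), Summary()
for x in [20.1, 20.4, 19.8, 20.0]:
    node_a.add(x)
for x in [20.2, 19.9, 20.3]:
    node_b.add(x)
network = node_a.merge(node_b)
print(network.n, round(network.mean, 2))  # 7 20.1
print(is_anomalous(35.0, network))  # True
```

Each hop forwards three numbers regardless of how many readings lie below it, which is where the energy saving over shipping raw data comes from.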
Most human knowledge and human communication is represented and expressed using language, both in written and spoken forms. Language technologies permit computers to process human language, providing more natural human-machine interfaces, and more sophisticated access to stored information. Language technologies play a central role in the multilingual information society of the future. The language technology group conducts research in statistical language modelling, language understanding, knowledge discovery, linguistic annotation, high performance computing, and digital language archiving. The LT Group has a regular seminar series, group lunch, a reading group and a writing group.
For more information see http://www.csse.unimelb.edu.au/research/lt/
The collection of location-based data about individuals has undergone a revolution, because its smart use offers enormous benefits to society. Real-time traffic management and personalized location-based services are prime applications. Although collecting a person's location data holds tremendous potential, it also carries significant risks to privacy. Our goal is to develop methods that balance an application's need for location accuracy against a user's need for location privacy.
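A common baseline for trading location accuracy against privacy is spatial cloaking: instead of an exact position, the user reports a region that also contains at least `k - 1` other users. The sketch below assumes a grid whose cell size doubles until that condition holds; the function name, base cell size and level limit are hypothetical, and the group's own methods are not described here.

```python
import math


def cloak(user, others, k, base=1.0, max_levels=20):
    """Return a square cell (x0, y0, size) containing `user` and at least k
    users in total (the user included), or None if no cell qualifies.

    Cells come from a grid anchored at the origin whose cell size doubles at
    each level, so successive cells nest like the levels of a quadtree."""
    ux, uy = user
    size = base
    for _ in range(max_levels):
        # Snap the user's position to the grid cell of the current size.
        x0 = math.floor(ux / size) * size
        y0 = math.floor(uy / size) * size
        inside = 1 + sum(1 for (x, y) in others
                         if x0 <= x < x0 + size and y0 <= y < y0 + size)
        if inside >= k:
            return (x0, y0, size)
        size *= 2
    return None  # too few users nearby; the caller might withhold the location


others = [(3.9, 4.1), (2.5, 5.5), (10.0, 10.0), (0.5, 7.9)]
print(cloak((3.2, 4.7), others, k=3))  # (2.0, 4.0, 2.0)
```

Raising `k` gives stronger anonymity but a larger, less accurate region, which is exactly the accuracy/privacy balance described above.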
Management and Mining of Trajectory Data for Travel and Transportation Decision Support
The availability of large collections of trajectories of people and vehicles (via GPS and GSM) has enabled interesting applications such as recommending tourist locations and finding people with similar life patterns, but these applications mine trajectories in a static manner. This project addresses the challenge of mining large numbers of trajectories as they are continually generated, to support a suite of novel applications that require almost real-time responses, which existing approaches cannot accommodate. Examples of such applications include traffic overload prediction, real-time event detection and route recommendation, which bring a wide range of social, economic and environmental benefits to Australia.
Measurement in Experimental Computer Science
To be useful, measurement must be of some attribute or facet of behavior that is both predictive of future behavior on unseen data and perceivable to system users. In computing research we measure many things, such as precision and recall in information retrieval systems, fluency and adequacy in machine translation systems, or response time, throughput and latency of distributed algorithms. In this project we are examining a range of such experimental methodologies, seeking to ensure that what we measure, and what we compare against when doing so, are well founded, and that research results can be properly interpreted.
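As a concrete reference point for the measures mentioned above, the sketch below computes set-based precision and recall, and precision at a rank cutoff `k`, for a single query. Even this small example embeds choices (the cutoff, and returning 0.0 when a set is empty) that change the reported numbers, which is the kind of methodological detail the project examines. The document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query.
    Returning 0.0 for an empty set is a convention, not a law."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    relevant = set(relevant)
    return sum(1 for d in ranking[:k] if d in relevant) / k


ranking = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d8"}
print(precision_recall(ranking, relevant))  # (0.4, 0.6666666666666666)
print(precision_at_k(ranking, relevant, 2))  # 0.5
```

The same ranking scores 0.4 or 0.5 depending on whether the whole list or only the top two results are evaluated, so the cutoff has to be reported alongside the number.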
Mining Foundations and Applications
Our group is developing innovative algorithms for a range of challenging tasks in data mining, and applying and evaluating these algorithms in real-world settings. Areas of interest include anomaly detection, block modelling, data clustering, data classification, graph mining, heterogeneous data mining, pattern mining, spatio-temporal data mining, time series data mining and uncertain data mining. Application areas include biomedical data, medical imaging, road traffic networks, social networks and video analysis.
Network Intrusion Detection and Defense
Our group is developing models of different types of attacks and their impact in a variety of network environments. We are also developing accurate and robust probabilistic algorithms for detecting and filtering these attacks in high-speed networks. In particular, we are developing distributed algorithms that provide highly scalable defense platforms in broadband networks. Areas of interest include attack modelling, detection algorithms, defense mechanisms, event correlation and distributed techniques. Application areas include distributed denial-of-service attacks, anomaly detection and defense in cloud computing environments, and defense of cyber-physical systems.
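A Count-Min sketch is one well-known probabilistic structure for spotting heavy traffic sources in high-speed streams using fixed memory; it is offered here as an illustration, not as the group's detection algorithm. The sketch hashes each source address into one counter per row and estimates a source's count as the minimum across rows, which can overestimate but never underestimates. The table size, the alert threshold, the traffic mix and the use of `hashlib.blake2b` (a cryptographic hash, slower than what a line-rate implementation would use) are all assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counters in fixed memory; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # Derive one hash per row by prefixing the row number to the key.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key):
        # Collisions only ever add to a counter, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._cells(key))


# Hypothetical traffic: one source floods, 200 others send one packet each.
cms = CountMinSketch(width=256, depth=4)
packets = ["10.0.0.5"] * 500 + [f"192.168.1.{i}" for i in range(200)]
for src in packets:
    cms.add(src)

threshold = 100  # hypothetical alerting threshold (packets per window)
suspects = {src for src in set(packets) if cms.estimate(src) > threshold}
print(suspects)
```

Memory stays at `width * depth` counters no matter how many distinct sources appear, which is what makes this family of structures attractive for broadband links.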
Understanding Individual and Collective Behavior through Large Scale Social Network Information Mining
The goal of the project is to use social media interactions to understand individual and collective behavior. Such analysis can benefit areas such as market trend prediction, political and social opinion analysis, information dissemination and disaster management. The project will develop scalable information network (graph) analysis and mining techniques, including large-scale graph mining, statistical methods such as principal component analysis (PCA), and tensor analysis, to identify and analyze group behavior in social networks.
Computational problems in glaucoma diagnosis, monitoring and understanding
Researchers: Andrew Turpin, Jonathan Denniss
Collaborators: Allison McKendrick (Optometry and Vision Science), Jonathan Crowston (Ophthalmology), Gerhard Zinser (Heidelberg Engineering), David Crabb (City University, London), Ted Garway-Heath (Moorfields Eye Hospital, London), Chota Matsumoto (Kinki University, Japan)
Sponsors: ARC, Heidelberg Engineering
Glaucoma is the second leading cause of blindness in Australia, and is characterized by death of the optic nerve and associated loss of visual field. This project is working on the following computational aspects of the diagnosis, monitoring and modelling of the disease:
- Improved algorithms for testing the visual field.
- Constructing models of retinal nerve pathways.
- Inventing algorithms for linking medical images of the optic nerve with measures of visual function.
Data retrieval from massive information structures
Researchers: Alistair Moffat, Anthony Wirth
Sponsors: Australian Research Council
Efficient and Effective Algorithms for Searching Strings in Secondary Storage
Researchers: Alistair Moffat, Anthony Wirth, Andrew Turpin, Simon Gog
Collaborators: Shane Culpepper (RMIT University)
Sponsors: Australian Research Council
Compressed data structures are surprisingly versatile, and often allow fundamental querying operations to be carried out just as quickly as is possible in more expansive arrangements. They also offer operations that might not be possible with conventional structures. In this project we are examining how to carry out pattern search over texts that are so large that only a small index, and not the string itself, can be held in main memory. We have already developed a range of such techniques, and expect to enhance them further in the remaining term of the project.
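The FM-index is one well-known compressed self-index: it can count the occurrences of a pattern using only the Burrows-Wheeler transform of the text and a few counts, without consulting the original string. The sketch below shows its backward-search query. Construction is naive, the `occ` function scans instead of using compressed rank structures, and nothing here models secondary storage, so treat it as an illustration of the query logic rather than of the project's techniques.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (quadratic; illustration only).
    Assumes the text does not already contain the NUL character."""
    text += "\0"  # unique terminator that sorts before every other character
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    def __init__(self, text):
        self.L = bwt(text)
        # C[c] = number of characters in L that are strictly smaller than c.
        counts = {}
        for ch in self.L:
            counts[ch] = counts.get(ch, 0) + 1
        self.C = {}
        total = 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]

    def occ(self, ch, i):
        # Occurrences of ch in L[0:i]. A real FM-index answers this with
        # rank structures over a compressed L rather than by scanning.
        return self.L[:i].count(ch)

    def count(self, pattern):
        """Backward search: number of occurrences of a non-empty pattern."""
        lo, hi = 0, len(self.L)  # current range of sorted rotations
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self.occ(ch, lo)
            hi = self.C[ch] + self.occ(ch, hi)
            if lo >= hi:
                return 0
        return hi - lo


fm = FMIndex("banana")
print(fm.count("ana"), fm.count("nab"))  # 2 0
```

Each pattern character costs two `occ` (rank) queries, which is why fast rank over a compressed sequence is the core operation such indexes depend on.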
Language preservation 2.0: Crowdsourcing oral language documentation using mobile devices
Researchers: Steven Bird
Collaborators: Mark Liberman (U Penn), David Chiang (USC)
Sponsors: NSF, SNSF
The purpose of this project is to demonstrate the feasibility of a new approach to documenting endangered languages. To allow wide-ranging investigation of a language even after it is no longer spoken, we need the equivalent of the million words of extant biblical Hebrew texts, or the five million words of extant classical Latin. But for endangered languages without a significant culture of literacy, diverse text collections on this scale seem out of reach. Given typical speaking rates of about 10,000 word-equivalents per hour, a hundred hours of recorded speech – conversations, narratives, or oral histories – would give us the equivalent of a million words of text. With community involvement, hundreds of hours of such recordings are easily within reach. However, transcribing such large audio collections is a daunting task, given the small number of literate native speakers and the time-consuming nature of such transcription, which can take 200 hours of work for every hour of audio. We propose to solve this problem by substituting re-speaking and verbal translation for transcription: one or more native speakers repeat each phrase of a recording, speaking slowly and carefully, and then translate it into a better-documented language. The utility of translated passages as a way to analyze otherwise-unknown languages has been demonstrated many times, starting with the Rosetta Stone. Our goal in this project is to demonstrate the utility of re-speaking. We believe that linguists, starting out with relatively little knowledge of a language, can produce phonetic transcriptions that will be good enough to support subsequent analysis resulting in coherent texts, in a process analogous to (but easier than) the process that allowed previous generations of scholars to learn to read ancient Egyptian or Sumerian.
Principles, practice and pragmatics of measurement in experimental computer science
Researchers: Justin Zobel, Alistair Moffat, Timothy Baldwin, Yvette Graham
Sponsors: Australian Research Council
We aim to develop and empirically demonstrate principles and processes for measuring the outcomes of Computer Science experiments. Measurement is well understood in the traditional physical sciences, and our work is intended to extend that strong foundation and robustness to the computational sciences. In particular, we aim to examine and critique current methodologies for measuring outcomes in a number of research-based computational tasks, and to develop improved systems of measurement for use when accurate knowledge of the performance of computational methods is of critical importance, for example, when those methods will be embedded in widely used or mission-critical software tools.
Prosodic systems in New Guinea: Integrating computational and typological approaches to linguistic analysis
Researchers: Steven Bird
Collaborators: Mark Liberman (U Penn), Larry Hyman (UCB), Mark Donohue (ANU)
The world's languages make heavy use of prosody – tone, stress, intonation, and length – to communicate meaning. Tone is the most complex of these elements. Although non-tone languages typically exploit pitch for intonational purposes, the more sophisticated use of pitch in tone languages means that speakers of such languages will have quite different mental representations of pitch from speakers of English and better-known European non-tone languages. This project will investigate the tone and reduced-tone languages of New Guinea, a linguistically under-investigated area of the world which is home to a sixth of the world's languages. The project will collect substantial new bodies of recorded and transcribed language data from several undescribed tone languages. It will then use computational and theoretical methods to analyse the geographical distribution of tonal properties and the interaction of tone and other prosodic features. The project will incorporate technology into linguistic field work and develop an exemplary model of prosodic description. Language consultants will be trained in the model's use, leading to more accessible primary data and more accountable descriptions. The data will be made available in a form that can be readily used by scholars, language teachers, and communities of speakers and will support the development of writing systems and literacy programs for these languages.
Researchers: Richard Sinnott, Chris Duran
To fully assess the effects of environmental change and urban regeneration choices, it is essential to understand land use scenarios. Frequently, research tools that attempt to support projections of land use allocations are built upon frameworks and programming languages that are tailor-made for a particular purpose, and are not easily extended to support wider sharing of resources and collaborative work. The AURIN project has enhanced one leading scenario-optimization-based tool, What If?™, and made it a core part of its e-Infrastructure. This one-year project, which began in February 2012, will deliver a web-based version of What If? and allow a wide variety of land use scenarios to be explored. A demonstration of What If? is available through https://portal.aurin.org.au
Prof Alistair Moffat
We are keen to discuss supervision opportunities with talented students considering a research higher degree, Masters or PhD.
View the Department of Computing and Information Systems’ Graduate Research page for individual staff contact information and topic areas, or contact staff to arrange a discussion.
Prof James Bailey
Prof Tim Baldwin
Dr Aaron Harwood
Assoc Prof Shanika Karunasekera
Prof Rao Kotagiri
Assoc Prof Lars Kulik
Prof Chris Leckie
Prof Alistair Moffat
Dr Lee Naish
Dr Udaya Parampalli
Assoc Prof Egemen Tanin
Assoc Prof Andrew Turpin
Dr Tony Wirth
Dr Rui Zhang
Prof Justin Zobel
Dr Josh Benaloh (Microsoft)
Prof James Bezdek (University of West Florida)
Assoc Prof Richard Buckland (University of NSW)
Mr Craig Burton (Victorian Electoral Commission)
Dr Chris Culnane (University of Surrey)
Dr Shane Culpepper (RMIT University)
Dr Lucia Falzon (DSTO, Adelaide)
Tanzima Hashem (BUET, Bangladesh)
Dr Conor Hayes (DERI, Ireland)
Dr James Heather (University of Surrey)
Dr Kerri Morgan (Monash University)
Prof Philippa Pattison (University of Melbourne)
Dr Sutharshan Rajasegarar (University of Melbourne)
Prof Garry Robins (University of Melbourne)
Prof Peter Y A Ryan (University of Luxembourg)
Prof Hanan Samet (University of Maryland, College Park)
Prof Steve Schneider (University of Surrey)
Prof Yufei Tao (Chinese University of Hong Kong)
Dr Roland Wen (University of NSW)
Dr Xing Xie (Microsoft Research Asia)
Dr Nicholas Yuan (Microsoft Research Asia)
Dr Yu Zheng (Microsoft Research Asia)