Publications:
(see also my Google Scholar profile).
2024:
- Privately evaluating sensitive population record linkage without ground truth data
Jie Song, Charini Nanayakkara, and Peter Christen.
In the journal International Journal of Data Science and Analytics (Springer),
October 2024.
See here for open access article (pdf, 651 KB)
- Selecting a classification
performance measure: matching the measure to the problem
David J. Hand, Peter Christen, and Sumayya Ziyad.
Preprint (September 2024) available from
arXiv.org.
- A Critical Re-evaluation of
Benchmark Datasets for (Deep) Learning-Based Matching Algorithms
George Papadakis, Nishadi Kirielle, Peter Christen, and Themis Palpanas.
Proceedings of the IEEE International
Conference on Data Engineering (ICDE), Utrecht, May 2024.
Article available from
IEEE Xplore,
preprint (November 2023) available from
arXiv.org.
- Class ratio and its implications
for reproducibility and performance in record linkage
Jeremy Foxcroft, Peter Christen, and Luiza Antonie.
Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD), Taipei, May 2024.
Camera-ready paper
(pdf, 493 KB)
- (Privately)
Estimating Linkage Quality for Record Linkage
Martin Franke, Victor Christen, Peter Christen, Florens Rohde, and Erhard Rahm.
Proceedings of the
International Conference
on Extending Database Technology (EDBT), March 2024.
Accepted paper
(pdf, 926 KB)
- Pattern Masking
for Dictionary Matching: Theory and Practice
Panagiotis Charalampopoulos, Huiping Chen, Peter Christen, Grigorios Loukides, Nadia Pisanti,
Solon P Pissis, and Jakub Radoszewski.
In the journal Algorithmica, March 2024.
See here for open
online access.
- When Data Science Goes Wrong: How
Misconceptions About Data Capture and Processing Causes Wrong Conclusions
Peter Christen and Rainer Schnell.
In the Harvard Data Science Review (HDSR), February 2024.
See here for open online access.
- Encryption-based sub-string matching for privacy-preserving record linkage
Sirintra Vaiwsri, Thilina Ranbaduge, and Peter Christen.
In the Journal of Information Security and Applications (Elsevier), January 2024.
Camera-ready paper
(pdf, 543 KB)
2023:
- A review of the F-measure:
Its History, Properties, Criticism, and Alternatives
Peter Christen, David J. Hand, and Nishadi Kirielle.
In the journal ACM Computing Surveys,
June 2023.
Accepted paper
(pdf, 723 KB)
- An
Analysis of One-to-One Matching Algorithms for Entity Resolution
George Papadakis, Vasilis Efthymiou, Emmanouil Thanos, Oktie Hassanzadeh, and
Peter Christen.
In the VLDB Journal, April 2023.
- A Vulnerability
Assessment Framework for Privacy-Preserving Record Linkage
Anushka Vidanage, Peter Christen, Thilina Ranbaduge, and Rainer Schnell.
In ACM Transactions on Privacy and
Security, April 2023.
- Tuning
the Utility-Privacy Trade-Off in Trajectory Data
Maja Schneider, Jonathan Schneider, Lea Löffelmann, Peter
Christen, and Erhard Rahm.
Proceedings of the International Conference on Extending Database Technology
(EDBT), Ioannina, Greece, March 2023.
- Rule-Based
Knowledge Discovery via Anomaly Detection in Tabular Data
Asara Senaratne, Peter Christen, Graham Williams, and Pouya Ghiasnezhad Omran.
Proceedings of the AAAI Spring
Symposium on Challenges Requiring the Combination of Machine Learning and
Knowledge Engineering (AAAI-MAKE 2023), San Francisco, March 2023.
- Evolution of Degree Metrics in Large Temporal Graphs
Christopher Rost, Kevin Gomez, Peter Christen, and Erhard Rahm.
Proceedings of the 20th Conference on Database
Systems for Business, Technology and Web (BTW-2023), Dresden, March 2023.
- Thirty-three
Myths and Misconceptions about Population Data: from Data
Capture and Processing to Linkage
Peter Christen and Rainer Schnell.
In the International Journal of
Population Data Science (IJPDS), vol 8, number 1, January 2023.
See here for a brief news article:
Why misconceptions about population data can lead to bad
outcomes.
2022:
- Privacy-Preserving Record Linkage using Autoencoders
Victor Christen, Tim Häntschel, Peter Christen, and
Erhard Rahm.
In the journal International Journal of Data Science and Analytics (Springer),
November 2022.
Camera-ready paper
(pdf, 466 KB)
- Big Data is not the New Oil: Common
Misconceptions about Population Data
Peter Christen and Rainer Schnell.
Article available from
arXiv.org,
September 2022 (first published December 2021).
- Locality Sensitive
Hashing with Temporal and Spatial Constraints for Efficient Population Record Linkage
Charini Nanayakkara and Peter Christen.
Proceedings of the
International Conference on Information
and Knowledge Management (CIKM), Atlanta, October 2022.
Camera-ready paper
(pdf, 709 KB)
- D-TOUR: Detour-based point of interest detection in privacy-sensitive trajectories
Maja Schneider, Lukas Gehrke, Peter Christen, and Erhard Rahm.
Privacy and Security at Large Workshop, Bonn, September 2022.
- A Taxonomy of Attacks on Privacy-Preserving Record Linkage
Anushka Vidanage, Thilina Ranbaduge, Peter Christen, and Rainer Schnell.
In the Journal of Privacy and Confidentiality (JPC), July 2022.
- Unsupervised Identification of Abnormal Nodes and Edges in Graphs
Asara Senaratne, Peter Christen, Graham Williams, and Pouya G. Omran.
In the ACM Journal of
Data and Information Quality (JDIQ), July 2022.
- Unsupervised Graph-Based
Entity Resolution for Complex Entities
Nishadi Kirielle, Peter Christen, and Thilina Ranbaduge.
Accepted by the
ACM Transactions on Knowledge Discovery
from Data (TKDD), May 2022.
Accepted paper (pdf, 484 KB)
- Accurate and Efficient Privacy-Preserving String Matching
Sirintra Vaiwsri, Thilina Ranbaduge, and Peter Christen.
In the International Journal of Data Science and Analytics (Springer), March 2022.
Camera-ready paper
(pdf, 1.4 MB)
- TransER: Homogeneous Transfer Learning for Entity
Resolution
Nishadi Kirielle, Peter Christen, and Thilina Ranbaduge.
Accepted by the
International Conference on Extending Database Technology
(EDBT), March 2022.
Camera-ready paper
(pdf, 910 KB)
- Unsupervised Graph-based Entity Resolution for Accurate
and Efficient Family Pedigree Search
Nishadi Kirielle, Charini Nanayakkara, Peter Christen,
Chris Dibben, Lee Williamson, Eilidh Garrett, and Claire
Manson.
Accepted by the
International Conference on Extending Database Technology
(EDBT), March 2022.
Camera-ready paper
(pdf, 864 KB)
- Accurate Privacy-preserving Record Linkage for Databases
with Missing Values
Sirintra Vaiwsri, Thilina Ranbaduge, Peter Christen, and
Rainer Schnell.
In Information Systems (Elsevier), January 2022.
Camera-ready paper
(pdf, 668 KB)
2021:
- Unsupervised Anomaly Detection in Knowledge Graphs
Asara Senaratne, Pouya G. Omran, Graham Williams, and
Peter Christen.
Proceedings of the International Joint Conference on Knowledge Graphs (IJCKG'21),
December 2021.
Camera-ready paper
(pdf, 693 KB)
- A Critique and Attack on: Blockchain-based Privacy-preserving Record Linkage
Peter Christen, Rainer Schnell, Thilina Ranbaduge, and Anushka Vidanage.
In Information Systems (Elsevier), October 2021.
- Data Science for Society: Challenges, Developments and Applications
Pia Hardelid, Peter Christen, Elizabeth Williamson, Katie Harron, and Bianca L De Stavola.
Journal of the Royal Statistical Society: Series A (Statistics in Society), October 2021.
- Active Learning based Similarity Filtering for Efficient and Effective Record Linkage
Charini Nanayakkara, Peter Christen, and Thilina Ranbaduge.
Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'21), Delhi, May 2021.
Camera-ready paper
(pdf, 374 KB)
- Large Scale Record Linkage in the Presence of Missing Data
Thilina Ranbaduge, Peter Christen, and Rainer Schnell.
Article available from arXiv.org, April 2021.
- Accurate and Efficient Suffix Tree Based Privacy-Preserving String Matching
Sirintra Vaiwsri, Thilina Ranbaduge, Peter Christen, and Kee Siong Ng.
Article available from arXiv.org, April 2021.
- F*: An
Interpretable Transformation of the F-measure
David J. Hand, Peter Christen, and Nishadi Kirielle.
In the journal Machine Learning, March 2021.
Article available online from
SpringerLink.
2020:
- Linking
Sensitive Data - Methods and Techniques for Practical
Privacy-Preserving Information Sharing
Peter Christen, Thilina Ranbaduge, and Rainer Schnell.
Springer, November 2020.
- Estimating Maternal Mortality Rates during the 1918 Flu using
Birth to Death Linkage
Peter Christen, Eilidh Garrett, Beata Nowok, Alice Reid, Lee
Williamson, and Chris Dibben.
International Population
Data Linkage Conference (IPDLN), Adelaide (virtual),
November 2020.
- Evaluating Binary Encoding Techniques in the Presence of
Missing Values in Privacy-Preserving Record Linkage
Thilina Ranbaduge and Peter Christen.
International Population
Data Linkage Conference (IPDLN), Adelaide (virtual),
November 2020.
- Linking Sensitive Data
Peter Christen, Thilina Ranbaduge, and Rainer Schnell.
International Population
Data Linkage Conference (IPDLN), Adelaide (virtual),
November 2020.
- A Graph Matching Attack on Privacy-Preserving Record
Linkage
Anushka Vidanage, Peter Christen, Thilina Ranbaduge, and Rainer
Schnell.
ACM International
Conference on Information and Knowledge Management
(CIKM 2020), Galway (virtual), October 2020.
Camera-ready paper
(pdf, 1.5 MB)
- An Anonymiser
Tool for Sensitive Graph Data
Charini Nanayakkara, Peter Christen, and Thilina Ranbaduge.
Workshop on
EntitY Retrieval and lEarning (EYRE 2020), held at the
ACM International
Conference on Information and Knowledge Management
(CIKM 2020), Galway (virtual), October 2020.
- Quality assessment in data linkage
James Doidge, Peter Christen, and Katie Harron.
In Guidance:
Joined up data in government: the future of data linking
methods, UK
Office for National
Statistics, August 2020.
- A Privacy
Attack on Multiple Dynamic Match-key based Privacy-Preserving
Record Linkage
Anushka Vidanage, Thilina Ranbaduge, Peter Christen, and Sean
Randall.
In the International Journal of
Population Data Science (IJPDS), vol 5, number 1, August
2020.
- Secure and Accurate Two-step Hash Encoding for Privacy-Preserving
Record Linkage
Thilina Ranbaduge, Peter Christen, and Rainer Schnell.
Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'20), Singapore, May 2020.
Camera-ready paper
(pdf, 1.1 MB)
- Secure
Multi-party Summation Protocols: Are They Secure Enough Under
Collusion?
Thilina Ranbaduge, Dinusha Vatsalan, and Peter Christen.
Transactions on Data Privacy,
April 2020.
Paper
(pdf, 528 KB)
- Incremental Clustering Techniques for Multi-Party
Privacy-Preserving Record Linkage
Dinusha Vatsalan, Peter Christen, and Erhard Rahm.
In the journal
Data and Knowledge Engineering, March 2020.
Camera-ready paper
(pdf, 510 KB)
2019:
- Transforming Pairwise Duplicates to Entity Clusters for
High-quality Duplicate Detection
Uwe Draisbach, Peter Christen, and Felix Naumann.
In the ACM Journal of
Data and Information Quality (JDIQ), vol. 12, issue 1, 2019.
Article available online from the
ACM Digital Library.
- Evaluation
Measure for Group-Based Record Linkage
Charini Nanayakkara, Peter Christen, Thilina Ranbaduge, and
Eilidh Garrett.
In the International Journal of
Population Data Science (IJPDS), vol 4, number 1, November
2019.
- Outlier Detection Based Accurate Geocoding of Historical
Addresses
Nishadi Kirielle, Peter Christen, and Thilina Ranbaduge.
Proceedings of the Australasian Data Mining Conference (AusDM), Adelaide,
December 2019.
Camera-ready paper (12 pages, pdf, 600 KB)
- Data Linkage: The Big Picture
Peter Christen.
Invited
Diving
into Data article in the
Harvard Data Science Review, issue 1.2, November 2019.
- Informativeness-Based Active Learning for Entity
Resolution
Victor Christen, Peter Christen, and Erhard Rahm.
International workshop on
Data
Integration and Applications, held at the ECML/PKDD
Conference, Würzburg, Germany, September 2019.
The article is
available online (PDF, 3.1 MBytes).
- A Scalable Privacy-Preserving Framework for Temporal Record
Linkage
Thilina Ranbaduge and Peter Christen.
In the journal Knowledge and Information Systems, June 2019.
Camera-ready paper
(pdf, 909 KB)
- Robust Temporal Graph Clustering for Group Record Linkage
Charini Nanayakkara, Peter Christen, and Thilina Ranbaduge.
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'19), Macau, April 2019.
Camera-ready paper (pdf, 429 KB)
- Efficient Pattern Mining Based Cryptanalysis for Privacy-Preserving
Record Linkage
Anushka Vidanage, Thilina Ranbaduge, Peter Christen, and Rainer
Schnell.
Proceedings of the IEEE International Conference on Data Engineering
(ICDE'19), Macau, April 2019.
Camera-ready paper
(pdf, 337 KB)
- Linking Scottish Vital Event Records using Family Groups
Özgür Akgün, Alan Dearle, Graham Kirby, Eilidh
Garrett, Tom Dalton, Peter Christen, Chris Dibben, and Lee Williamson.
In Historical Methods: A Journal of Quantitative and Interdisciplinary
History, 2019.
2018:
- Reference Values Based Hardening for Bloom Filters Based
Privacy-Preserving Record Linkage
Sirintra Vaiwsri, Thilina Ranbaduge, and Peter Christen.
Proceedings of the Australasian Data Mining Conference (AusDM), Bathurst,
November 2018.
Camera-ready paper (12 pages, pdf, 290 KB)
- Privacy-Preserving Temporal Record Linkage
Thilina Ranbaduge and Peter Christen.
Proceedings of the IEEE International
Conference on Data Mining (ICDM'18), Singapore, November 2018.
Camera-ready paper
(pdf, 591 KB)
- Towards a `Smart' Cost-Benefit Tool: Using Machine Learning
to Predict the Costs of Criminal Justice Policy Interventions
Matthew Manning, Gabriel Wong, Timothy Graham, Thilina Ranbaduge,
Peter Christen, Kerry Taylor, Richard Wortley, Toni Makkai, and
Pierre Skorich.
In Crime Science, October 2018.
Preprint available at
Springer Link
- Precise and Fast Cryptanalysis for Bloom Filter Based
Privacy-Preserving Record Linkage
Peter Christen, Thilina Ranbaduge, Dinusha Vatsalan,
and Rainer Schnell.
In the IEEE Transactions on Knowledge and Data Engineering,
October 2018.
Final
submitted paper (14 pages, pdf, 562 KB)
- Evaluating
Hardening Techniques Against Cryptanalysis Attacks on Bloom
Filter
Thilina Ranbaduge, Anushka Vidanage, Sirintra Vaiwsri, Rainer Schnell,
and Peter Christen.
In the International Journal of
Population Data Science (IJPDS), vol 3, number 4, August
2018.
- Temporal Graph-Based Clustering for Historical Record Linkage
Charini Nanayakkara, Peter Christen, and Thilina Ranbaduge.
International workshop on
Mining and Learning from
Graphs (MLG 2018),
held at ACM
SIGKDD 2018,
London, August 2018.
Submitted paper is available online from
arXiv.org
- Developing a Temporal Bibliographic Data Set for Entity
Resolution
Yichen Hu, Qing Wang, and Peter Christen.
Workshop
BigScholar 2018,
held at ACM
SIGKDD 2018, London, August 2018.
Submitted paper is available online from
arXiv.org
- Distributed
Privacy-Preserving Record Linkage Using Pivot-Based Filter Techniques
Marcel Gladbach, Ziad Sehili, Thomas Kudrass, Peter Christen, and Erhard Rahm.
In IEEE 34th International Conference on Data Engineering Workshops, Paris, July 2018.
- Pattern-Mining based Cryptanalysis of Bloom Filters for
Privacy-Preserving Record Linkage
Peter Christen, Anushka Vidanage, Thilina Ranbaduge, and
Rainer Schnell.
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'18), Melbourne, Australia, June 2018.
Camera-ready paper
(pdf, 462 KB)
- A Scalable and Efficient Subgroup Blocking Scheme for
Multidatabase Record Linkage
Thilina Ranbaduge, Dinusha Vatsalan, and Peter Christen.
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'18), Melbourne, Australia, June 2018.
Camera-ready paper
(pdf, 695 KB)
- Using Metric Space Indexing for Complete and Efficient Record
Linkage
Özgür Akgün, Alan Dearle, Graham Kirby, and Peter Christen.
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'18), Melbourne, Australia, June 2018.
- A Decision Tree Approach to Predicting Recidivism in Domestic
Violence (BDASC best paper award)
Senuri Wijenayake, Timothy Graham, and Peter Christen.
Proceedings of the Big Data
Analytics for Social Computing (BDASC) workshop, held at the
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'18), Melbourne, Australia, June 2018.
Submitted paper is available online from
arXiv.org
- DLforum - A
Multidisciplinary Online Discussion Forum for Data Linkage
Researchers and Practitioners
See also: https://dmm.anu.edu.au/DLforum/
Peter Christen, Thilina Ranbaduge, and Dinusha Vatsalan.
In the International Journal of
Population Data Science (IJPDS), vol 3, number 1, February
2018.
2017:
- Scalable Entity
Resolution Using Probabilistic Signatures on Parallel
Databases
Yuhang Zhang, Kee Siong Ng, Michael Walker, Pauline Chou, Tania
Churchill, and Peter Christen.
arXiv.org, December 2017.
Paper (pdf, 536 KB)
- Efficient Cryptanalysis of Bloom Filters for Privacy-Preserving
Record Linkage
Peter Christen, Rainer Schnell, Dinusha Vatsalan, and
Thilina Ranbaduge.
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'17), Jeju Island, South Korea, May 2017.
Submitted paper is available online from
INI DLA Preprints:
Paper (pdf, 588 KB)
- Improving Temporal Record Linkage using Regression
Classification
Yichen Hu, Qing Wang, Dinusha Vatsalan, and Peter Christen.
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'17), Jeju Island, South Korea, May 2017.
Camera-ready paper
(pdf, 360 KB)
- Advanced Methods for
Linking Complex Historical Birth, Death, Marriage and Census
Data
Peter Christen.
In the International Journal of
Population Data Science (IJPDS), issue 1, vol 1,
April 2017.
- Evaluation of
Advanced Techniques for Multi-Party Privacy-Preserving Record
Linkage on Real-World Health Databases
Thilina Ranbaduge, Dinusha Vatsalan, Sean Randall, and
Peter Christen.
In the International Journal of
Population Data Science (IJPDS), issue 1, vol 1,
April 2017.
- A Note on Using the F-measure for Evaluating Record Linkage
Algorithms
David Hand and Peter Christen.
In the journal Statistics and Computing, Online first, April 2017.
Article available online from
SpringerLink.
- Data Scrubbing
Peter Christen.
Chapter in the
Encyclopedia of Database Systems, Springer, April 2017.
Article available online from
SpringerLink.
- Temporal Group Linkage and Evolution Analysis for Census Data
Victor Christen, Anika Gross, Jeffrey Fisher, Qing Wang, Peter
Christen, and Erhard Rahm.
Proceedings of the
International Conference
on Extending Database Technology (EDBT'17), Venice, March 2017.
- Privacy-Preserving Record Linkage for Big Data: Current
Approaches and Research Challenges
Dinusha Vatsalan, Ziad Sehili, Peter Christen, and Erhard Rahm.
Handbook of Big Data Technologies, Springer, 2017.
Preprint (pdf, 531 KB)
- Scalable
Multi-Database Privacy-Preserving Record Linkage using
Counting Bloom Filters
Dinusha Vatsalan, Peter Christen, and Erhard Rahm.
arXiv.org, January 2017.
Paper (pdf, 422 KB)
2016:
- Multi-Party
Privacy-Preserving Record Linkage using Bloom Filters
Dinusha Vatsalan and Peter Christen.
arXiv.org, December 2016.
Paper (pdf, 348 KB)
- Scalable Block Scheduling for Efficient Multi-Database Record
Linkage
Thilina Ranbaduge, Dinusha Vatsalan, and Peter Christen.
Proceedings of the IEEE International Conference on Data
Mining (ICDM'16), Barcelona, December 2016.
- Scalable Privacy-Preserving Linking of Multiple Databases
Using Counting Bloom filter
Dinusha Vatsalan, Peter Christen, and Erhard Rahm.
Proceedings of the workshop Privacy and Discrimination in Data Mining
(PDDM), held at the IEEE International Conference on Data
Mining (ICDM'16), Barcelona, December 2016.
Camera-ready paper
(pdf, 555 KB)
- Application of
Advanced Record Linkage Techniques for Complex Population
Reconstruction
Peter Christen.
arXiv.org, December 2016.
Paper (pdf, 1.4 MB)
- Regression Classifier
for Improved Temporal Record Linkage
Yichen Hu, Qing Wang, Dinusha Vatsalan, and Peter Christen.
Proceedings of the Fourteenth Australasian Data Mining Conference
(AusDM'16), Canberra, December 2016.
Paper (pdf, 636 KB)
- A Note on using the F-measure for Evaluating Data Linkage
Algorithms
David Hand and Peter Christen.
Preprint, Isaac Newton Institute
for Mathematical Sciences (INI), Cambridge, November 2016.
Article available online from
INI DLA Preprints:
Paper (pdf, 684 KB)
- Record Linkage
Peter Christen and William Winkler.
Chapter in the
Encyclopedia of Machine Learning and Data Mining.
Claude Sammut and Geoff Webb (editors),
Springer, June 2015.
Article available online from
SpringerLink.
- Hashing-based Distributed Multi-party Blocking for
Privacy-preserving Record Linkage
Thilina Ranbaduge, Dinusha Vatsalan, Peter Christen, and
Vassilios Verykios.
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'16), Auckland, New Zealand, April 2016.
Article available online from
SpringerLink.
- A Clustering-Based Framework for Incrementally
Repairing Entity Resolution
Qing Wang, Jingyi Gao, and Peter Christen.
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'16), Auckland, New Zealand, April 2016.
Article available online from
SpringerLink.
- Active Learning Based Entity Resolution Using Markov
Logic
Jeffrey Fisher, Peter Christen, and Qing Wang.
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'16), Auckland, New Zealand, April 2016.
Article available online from
SpringerLink.
- Efficient Record Linkage Using a Compact Hamming Space
Dimitrios Karapiperis, Dinusha Vatsalan, Vassilios Verykios,
and Peter Christen.
Proceedings of the International Conference on Extending Database Technology
(EDBT'16), Bordeaux, France, March 2016.
Article available online from
OpenProceedings.
- Automatic Discovery of Abnormal Values in Large Textual
Databases
Peter Christen, Ross Gayler, Khoi-Nguyen Tran, Jeffrey Fisher
and Dinusha Vatsalan.
In the ACM Journal of
Data and Information Quality (JDIQ), vol. 7, issues 1-2, 2016.
Article available online from the
ACM Digital Library.
- Privacy-Preserving Matching of Similar Patients
Dinusha Vatsalan and Peter Christen.
In Journal of Biomedical Informatics (JBI), vol. 59, pages 285-298,
2016.
Article available online from
Science Direct.
- Macro-Level Information Transfer in Social Media:
Reflections of Crowd Phenomena
Minkyoung Kim, David Newth, and Peter Christen.
In Elsevier Neurocomputing, vol. 172, pages 84-99, 2016.
Article available online from
Science Direct.
2015:
- Efficient Entity Resolution with Adaptive and
Interactive Training Data Selection
Peter Christen, Dinusha Vatsalan, and Qing Wang.
Proceedings of the IEEE International Conference on Data
Mining (ICDM'15), Atlantic City, November 2015.
Article available online from
IEEE Xplore.
- MERLIN - A Tool for Multi-party Privacy-preserving
Record Linkage
Thilina Ranbaduge, Dinusha Vatsalan, and Peter Christen.
Demo paper. Proceedings of the IEEE International Conference on Data
Mining (ICDM'15), Atlantic City, November 2015.
Article available online from
IEEE Xplore.
- Context-aware Approximate String Matching for
Large-scale Real-time Entity Resolution
Peter Christen and Ross Gayler.
Proceedings of the workshop Data Integration and Applications
(DINA), held at the IEEE International Conference
on Data Mining (ICDM'15), Atlantic City, November 2015.
Article available online from
IEEE Xplore.
Camera-ready paper
(pdf, 383 KB)
- Dynamic Sorted Neighborhood Indexing for Real-Time
Entity Resolution
Banda Ramadan, Peter Christen, Huizhi Liang, and Ross Gayler.
In ACM Journal of Data and Information
Quality (JDIQ), vol. 6, issue 4, 2015.
Article available online from the
ACM Digital Library.
- Population Reconstruction
Gerrit Bloothooft, Peter Christen, Kees Mandemakers,
and Marijn Schraagen (editors).
Springer, August 2015.
- Advanced Record Linkage Methods and Privacy Aspects for
Population Reconstruction - A Survey and Case Studies.
Peter Christen, Dinusha Vatsalan, and Zhichun Fu.
Invited book chapter in
Population Reconstruction.
Gerrit Bloothooft, Peter Christen, Kees Mandemakers,
and Marijn Schraagen (editors).
Springer, August 2015.
- Clustering-Based Framework to Control Block Sizes for
Entity Resolution
Jeffrey Fisher, Peter Christen, Qing Wang, and Erhard
Rahm.
Proceedings of the ACM International Conference on
Knowledge Discovery and Data Mining (KDD'15), Sydney,
August 2015.
Article available online from the
ACM Digital Library.
- Efficient Interactive Training Selection for Large-scale
Entity Resolution
Qing Wang, Dinusha Vatsalan, and Peter Christen.
Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'15), Ho Chi Minh City, Vietnam, May 2015.
Article available online from
Springer.
- Clustering-based Scalable Indexing for Multi-party
Privacy-preserving Record Linkage
Thilina Ranbaduge, Peter Christen, and Dinusha Vatsalan.
Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'15), Ho Chi Minh City, Vietnam, May 2015.
Article available online from
Springer.
Camera-ready paper
(pdf, 591 KB)
- Unsupervised Blocking Key Selection for Real-Time
Entity Resolution
Banda Ramadan and Peter Christen.
Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'15), Ho Chi Minh City, Vietnam, May 2015.
Article available online from
Springer.
Camera-ready paper
(pdf, 393 KB)
- Context-Aware Detection of Sneaky Vandalism on Wikipedia
across Multiple Languages
(Best Student Paper Award)
Khoi-Nguyen Tran and Peter Christen.
Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'15), Ho Chi Minh City, Vietnam, May 2015.
Article available online from
Springer.
- Large-Scale Multi-Party Counting Set Intersection
Using a Space Efficient Global Synopsis
Dimitrios Karapiperis, Dinusha Vatsalan, Vassilios
Verykios, and Peter Christen.
Proceedings of the International
Conference on Database Systems for Advanced Applications
(DASFAA), Hanoi, Vietnam, April 2015.
Article available online from
Springer.
Camera-ready paper
(pdf, 386 KB)
- Cross Language Learning from Bots and Users to detect
Vandalism on Wikipedia
Khoi-Nguyen Tran and Peter Christen.
In IEEE
Transactions on Knowledge and Data Engineering (TKDE),
vol. 27, no 3, March 2015.
Article available online from
IEEE Xplore.
2014:
- Uncovering Diffusion in Academic Publications using
Model-Driven and Model-Free Approaches
Minkyoung Kim, David Newth, and Peter Christen.
Proceedings of the IEEE Conference on Social Computing and Networking
(SocialCom 2014), Sydney, December 2014.
- Tree Based Scalable Indexing for Multi-Party
Privacy-Preserving Record Linkage
Thilina Ranbaduge, Peter Christen, and Dinusha Vatsalan.
Proceedings of the Twelfth Australasian Data Mining Conference
(AusDM'14), Brisbane, November 2014.
Paper
(pdf, 754 KB)
- Privacy Aspects in Big Data Integration: Challenges and
Opportunities
Peter Christen.
Invited keynote at the
1st International Workshop on Privacy and Security of
Big Data (PSBD 2014),
held at the
ACM International
Conference on Information and Knowledge Management
(CIKM 2014), Shanghai, November 2014.
Article available online from the
ACM
Digital Library
- Scalable Privacy-Preserving Record Linkage for Multiple
Databases
Dinusha Vatsalan and Peter Christen.
Poster paper at the
ACM International
Conference on Information and Knowledge Management
(CIKM 2014), Shanghai, November 2014.
Article available online from the
ACM
Digital Library
Camera-ready paper
(pdf, 240 KB)
- Forest-Based Dynamic Sorted Neighborhood Indexing
for Real-Time Entity Resolution
Banda Ramadan and Peter Christen.
Poster paper at the
ACM International
Conference on Information and Knowledge Management
(CIKM 2014), Shanghai, November 2014.
Article available online from the
ACM
Digital Library
- Automatic Record Linkage of Individuals and Households in
Historical Census Data
Zhichun Fu, Mac Boot, Peter Christen, and Jun Zhou.
In International Journal of Humanities and Arts Computing,
October 2014.
- Dynamic Sorted Neighborhood Indexing for Real-Time Entity
Resolution
Banda Ramadan, Peter Christen and Huizhi Liang.
Proceedings of the
Australasian
Database Conference (ADC'14), Brisbane, July 2014.
Article available online from
Springer Link.
- An Evaluation Framework for Privacy-Preserving Record
Linkage
Dinusha Vatsalan, Peter Christen, Christine M. O'Keefe, and
Vassilios Verykios.
Journal of Privacy
and Confidentiality (CMU), 2014.
Article available online from the
Journal of Privacy and Confidentiality Web site.
- Challenges for Privacy Preservation in Data Integration
Peter Christen, Dinusha Vatsalan, and Vassilios Verykios.
In ACM Journal of
Data and Information Quality (JDIQ), vol. 5, issue 1-2, 2014.
Article available online from the
ACM Digital Library.
Camera-ready paper
(pdf, 41 KB)
- Preparation of a real temporal voter data set for record
linkage and duplicate detection research
Peter Christen.
Research School of Computer Science, The Australian National
University.
Technical Report, June 2014.
Paper
(pdf, 190 KB)
- A Graph Matching Method for Historical Census Household
Linkage
Zhichun Fu, Peter Christen, and Jun Zhou.
Proceedings of the
Eighteenth
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'14), Tainan, Taiwan, May 2014.
Article available online from
Springer Link.
- Noise-Tolerant Approximate Blocking for Dynamic Real-Time
Entity Resolution
Huizhi Liang, Yanzhe Wang, Peter Christen, and Ross Gayler.
Proceedings of the
Eighteenth
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'14), Tainan, Taiwan, May 2014.
Article available online from
Springer Link.
- Trends of news diffusion in social media based on crowd
phenomena
Minkyoung Kim, David Newth, and Peter Christen.
Workshop on Social News on
the Web (SNOW), held at the
World Wide Web conference (WWW'14),
Seoul, April 2014.
Article available online from the
ACM Digital
Library.
- Macro-level information transfer across social networks
Minkyoung Kim, David Newth, and Peter Christen.
World Wide Web conference (WWW'14),
Seoul, April 2014.
Article available online from the
ACM Digital
Library.
- Sensor discovery and configuration framework for the Internet
of Things paradigm.
Charith Perera, Prem Prakash Jayaraman, Arkady Zaslavsky, Dimitrios
Georgakopoulos, and Peter Christen.
IEEE World Forum on Internet of Things (WF-IoT), Seoul, March
2014.
Available online
- Advanced record linkage methods and privacy aspects for
population reconstruction
Peter Christen.
Keynote paper at the workshop
Population Reconstruction, Amsterdam, February 2014.
Article available online from the
workshop programme as
PDF document (105 KB).
- Context-aware Dynamic Discovery and Configuration of `Things'
in Smart Environments
Charith Perera, Prem Jayaraman, Arkady Zaslavsky, Peter Christen,
and Dimitrios Georgakopoulos.
Chapter in the book
Big Data and Internet of Things: A Roadmap for Smart
Environments, Studies in Computational Intelligence.
Springer Berlin Heidelberg, 2014.
- Sensor Search Techniques for Sensing as a Service Architecture
for The Internet of Things
Charith Perera, Arkady Zaslavsky, Chi Harold Liu, Michael Compton,
Peter Christen, and Dimitrios Georgakopoulos.
In
IEEE Sensors Journal, 2014.
- Sensing as a Service Model for Smart Cities Supported by Internet
of Things
Charith Perera, Arkady Zaslavsky, Peter Christen, and Dimitrios
Georgakopoulos.
In Transactions on Emerging Telecommunications Technologies,
2014.
- MOSDEN: An Internet of Things Middleware for Resource
Constrained Mobile Devices
Charith Perera, Prem Prakash Jayaraman, Arkady Zaslavsky, Peter
Christen, and Dimitrios Georgakopoulos.
Proceedings of the
47th Hawaii International Conference on System Sciences
(HICSS), Kona, Hawaii, January 2014.
2013:
- Data Cleaning and Matching of Institutions in
Bibliographic Databases
Jeffrey Fisher, Qing Wang, Paul Wong and Peter Christen.
Proceedings of the Eleventh Australasian Data Mining Conference
(AusDM'13), Canberra, November 2013.
Paper
(pdf, 171 KB)
- Efficient two-party private blocking based on sorted
nearest neighborhood clustering
Dinusha Vatsalan, Peter Christen, and Vassilios Verykios.
Proceedings of the
ACM International Conference on
Information and Knowledge Management (CIKM 2013),
San Francisco, October 2013.
Article available online from the
ACM
Digital Library
- GeCo: an online personal data generator and corruptor
Khoi-Nguyen Tran, Dinusha Vatsalan, and Peter Christen.
Proceedings of the
ACM International Conference on
Information and Knowledge Management (CIKM 2013),
San Francisco, October 2013.
Article available online from the
ACM
Digital Library
- Flexible and extensible generation and corruption of
personal data
Peter Christen and Dinusha Vatsalan.
Proceedings of the
ACM International Conference on
Information and Knowledge Management (CIKM 2013),
San Francisco, October 2013.
Article available online from the
ACM
Digital Library
Camera-ready paper
(pdf, 100 KB)
- Modeling dynamics of meta-populations with a probabilistic
approach: global diffusion in social media
Minkyoung Kim, David Newth, and Peter Christen.
Proceedings of the
ACM International Conference on
Information and Knowledge Management (CIKM 2013),
San Francisco, October 2013.
Article available online from the
ACM
Digital Library
- Identifying multilingual Wikipedia articles based on
cross language similarity and activity
Khoi-Nguyen Tran and Peter Christen.
Proceedings of the
ACM International Conference on
Information and Knowledge Management (CIKM 2013),
San Francisco, October 2013.
Article available online from the
ACM
Digital Library
- Social affinity filtering: recommendation through
fine-grained analysis of user interactions and activities
Suvash Sedhain, Scott Sanner, Lexing Xie, Riley Kidd,
Khoi-Nguyen Tran, and Peter Christen.
Proceedings of the
Conference on Online
Social Networks (COSN 2013), Boston, October 2013.
Article available online from the
ACM
Digital Library
- Semantic-driven Configuration of Internet of Things
Middleware
Charith Perera, Arkady Zaslavsky, Michael Compton, Peter
Christen, and Dimitrios Georgakopoulos.
Proceedings of the
International Conference on Semantics, Knowledge and Grids
(SKG 2013), Beijing, October 2013.
Article available online from
arXiv.org
- Modeling Dynamics of Diffusion Across Heterogeneous Social
Networks: News Diffusion in Social Media
Minkyoung Kim, David Newth, and Peter Christen.
In
Entropy,
volume 15, number 10, October 2013, Pages 4215-4242.
Article available online at
http://dx.doi.org/10.3390/e15104215
- Context Aware Sensor Configuration Model for Internet of
Things
Charith Perera, Arkady Zaslavsky, Michael Compton, Peter
Christen, and Dimitrios Georgakopoulos.
Proceedings of the
International
Semantic Web Conference (ISWC), Posters and Demos,
Sydney, Australia, October 2013.
Article available online at
arXiv.org
- Privacy-preserving record linkage
Vassilios Verykios and Peter Christen.
In Wiley Interdisciplinary Reviews: Data Mining and
Knowledge Discovery, volume 3, issue 5, pages 321-332,
September/October 2013.
Article available online at
Wiley Online Library
- A Taxonomy of Privacy-Preserving Record Linkage Techniques
Dinusha Vatsalan, Peter Christen, and Vassilios Verykios.
In
Information Systems (Elsevier), volume 38, issue 6,
pages 946-969, September 2013.
Article available online at
http://dx.doi.org/10.1016/j.is.2012.11.005.
- Modeling direct and indirect influence across heterogeneous
social networks
Minkyoung Kim, David Newth, and Peter Christen.
Proceedings of the
Workshop on Social Network Mining and Analysis (SNAKDD 2013),
held at ACM SIGKDD 2013,
Chicago, August 2013.
Article available online from the
ACM
Digital Library
- Context-aware Sensor Search, Selection and Ranking Model
for Internet of Things Middleware
Charith Perera, Arkady Zaslavsky, Peter Christen, Michael
Compton, and Dimitrios Georgakopoulos.
Proceedings of the
International
Conference on Mobile Data Management (MDM 2013),
Milan, Italy, June 2013.
Article available online from
IEEE Xplore
- Dynamic Configuration of Sensors Using Mobile Sensor Hub
in Internet of Things Paradigm
Charith Perera, Prem Jayaraman, Arkady Zaslavsky, Peter Christen,
and Dimitrios Georgakopoulos.
Proceedings of the
International
Conference on Intelligent Sensors, Sensor Networks and
Information Processing (ISSNIP 2013), Melbourne, April 2013.
Article available online from
IEEE Xplore
- Adaptive Temporal Entity Resolution on Dynamic Databases
Peter Christen and Ross Gayler.
Proceedings of the
Seventeenth
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'13), Gold Coast, Australia, April 2013.
Article available online from
Springer Link.
- Sorted Nearest Neighborhood Clustering for Efficient Private
Blocking
Dinusha Vatsalan and Peter Christen.
Proceedings of the
Seventeenth
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'13), Gold Coast, Australia, April 2013.
Article available online from
Springer Link.
- Cross Language Prediction of Vandalism on Wikipedia using
Article Views and Revisions
Khoi-Nguyen Tran and Peter Christen.
Proceedings of the
Seventeenth
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'13), Gold Coast, Australia, April 2013.
Article available online from
Springer Link.
- Dynamic Similarity-Aware Inverted Indexing for Real-Time Entity
Resolution
Banda Ramadan, Peter Christen, Huizhi Liang, Ross Gayler, and David
Hawking.
In proceedings of the
International Workshop
on Data Mining Applications in Industry and Government
(DMApps 2013),
held at the Seventeenth Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'13), Gold Coast, Australia, April 2013.
Paper
(pdf, 272 KB)
- Predicting High Impact Academic Papers using Citation Network
Features
Daniel McNamara, Paul Wong, Peter Christen and Kee Siong Ng.
In proceedings of the
International Workshop
on Data Mining Applications in Industry and Government
(DMApps 2013),
held at the Seventeenth Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'13), Gold Coast, Australia, April 2013.
Paper
(pdf, 328 KB)
- Context Aware Computing for The Internet of Things: A
Survey
Charith Perera, Arkady Zaslavsky, Peter Christen, and Dimitrios
Georgakopoulos.
In
IEEE Communications Surveys and Tutorials, 2013.
Article available online from
IEEE Xplore
2012:
- A Bag Reconstruction Method for Multiple
Instance Classification and Group Record Linkage
Zhichun Fu, Jun Zhou, Furong Peng, and Peter Christen.
Proceedings of the
Eighth International
Conference on Advanced Data Mining and Applications
(ADMA'12), Nanjing, China, December 2012.
Article available online from
Springer Link.
- An Iterative Two-Party Protocol for Scalable
Privacy-Preserving Record Linkage
Dinusha Vatsalan and Peter Christen.
Proceedings of the Tenth Australasian Data Mining Conference
(AusDM'12), Sydney, December 2012.
Paper
(pdf, 279 KB)
- CA4IOT: Context Awareness for Internet of Things
Charith Perera, Arkady Zaslavsky, Peter Christen, and Dimitrios
Georgakopoulos.
Proceedings of the
IEEE International
Conference on Green Computing and Communications, Conference
on Internet of Things, and Conference on Cyber, Physical and
Social Computing (GreenCom/iThings/CPSCom'12), Besancon,
France, November 2012.
- Time-aware Topic Recommendation Based on Micro-blogs
Huizhi Liang, Yue Xu, Dian Tjondronegoro, and Peter Christen.
Proceedings of the
ACM Conference on Information
and Knowledge Management (CIKM'12), Hawaii, October 2012.
- Capturing Sensor Data from Mobile Phones using Global
Sensor Network Middleware
Charith Perera, Arkady Zaslavsky, Peter Christen, Ali Salehi,
and Dimitrios Georgakopoulos.
Proceedings of the
IEEE International
Workshop on Internet-of-Things Communications and Networking
(IoT-CN),
held at the 23rd IEEE International Symposium on
Personal, Indoor and Mobile Radio Communications (PIMRC),
Sydney, September 2012.
- A Survey of Indexing Techniques for Scalable Record
Linkage and Deduplication
Peter Christen
In
IEEE
Transactions on Knowledge and Data Engineering (TKDE),
vol. 24, no. 9, September 2012.
Article available online from
Computer.org digital library.
- Data Matching -
Concepts and Techniques for Record Linkage, Entity Resolution,
and Duplicate Detection
Peter Christen.
Springer, Data-Centric Systems and Applications, August 2012.
Preface, table of
contents, and references are available for download.
- Event Diffusion Patterns in Social Media
Minkyoung Kim, Lexing Xie, and Peter Christen.
International
AAAI Conference on Weblogs and Social Media,
Dublin, June 2012.
Paper (pdf, 1.3 MB)
- Multiple Instance Learning for Group Record Linkage
Zhichun Fu, Jun Zhou, Peter Christen and Mac Boot.
Proceedings of the
Sixteenth
Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD'12), Kuala Lumpur, May-June 2012.
Article available online from
Springer Link.
- Connecting Mobile Things to Global Sensor Network
Middleware using System-generated Wrappers
Charith Perera, Arkady Zaslavsky, Peter Christen, Ali Salehi,
and Dimitrios Georgakopoulos.
Proceedings of the ACM International Workshop on Data
Engineering for Wireless and Mobile Access (MobiDE),
ACM Special Interest Group on Management of Data and
Principles of Database Systems (SIGMOD/PODS),
Scottsdale, Arizona, May 2012.
Paper available online from
ACM Digital Library.
- New Objective Functions for Social Collaborative Filtering.
Joseph Noel, Scott Sanner, Khoi-Nguyen Tran, Peter Christen,
Lexing Xie, Edwin Bonilla, Ehsan Abbasnejad, and
Nicolas Della Penna.
World Wide Web
conference (WWW'12), Lyon, April 2012.
Paper available online from
WWW'12 Proceedings.
2011:
- Automatic Cleaning and Linking of Historical Census Data
using Household Information
Zhichun Fu, Peter Christen and Mac Boot.
Proceedings of the
Fifth
International Workshop on Domain Driven Data Mining
(DDDM'11), held at
IEEE ICDM,
Vancouver, December 2011.
- Proceedings of the Ninth Australasian Data Mining
Conference (AusDM'11)
Peter Vamplew, Andrew Stranieri, Kok-Leong Ong, Peter Christen
and Paul Kennedy (editors).
Proceedings of the
Ninth Australasian Data Mining Conference,
Ballarat, December 2011.
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 121.
- An Efficient Two-Party Protocol for Approximate Matching
in Private Record Linkage
Dinusha Vatsalan, Peter Christen and Vassilios Verykios.
Proceedings of the Ninth Australasian Data Mining Conference
(AusDM'11), Ballarat, December 2011.
Paper (pdf, 880 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 121.
- A Supervised Learning and Group Linking Method
for Historical Census Household Linkage
Zhichun Fu, Peter Christen and Mac Boot.
Proceedings of the Ninth Australasian Data Mining Conference
(AusDM'11), Ballarat, December 2011.
Paper (pdf, 860 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 121.
- Fake Injection Strategies for Private Phonetic Matching
Alexandros Karakasidis, Vassilios Verykios and Peter Christen.
Proceedings of the
International Workshop on
Data Privacy Management (DPM2011), Leuven, Belgium, September 2011.
- Analysis of Cluster Migrations using Self-Organizing Maps
Denny, Peter Christen and Graham Williams.
Proceedings of the
International Workshop on Behavior Informatics (BI2011), 15th
Pacific-Asia Conference on Knowledge Discovery and Data Mining
(PAKDD2011), Shenzhen, China, May 2011.
- Robust Record Linkage Blocking using Suffix Arrays and Bloom
Filters
Timothy de Vries, Hui Ke, Sanjay Chawla and Peter Christen.
In ACM Transactions
on Knowledge Discovery from Data, vol. 5, no. 2, February 2011.
Available online.
2010:
- Visualizing Temporal Cluster Changes using Relative Density
Self-Organizing Maps
Denny, Graham Williams and Peter Christen
In Knowledge and Information Systems (Springer), vol. 25, no. 2,
November 2010.
Paper
available online.
- New Frontiers in Applied Data Mining
T. Theeramunkong, C. Nattee, P.J.L. Adeodato, N. Chawla, Peter Christen,
P. Lenca, J. Poon and G. Williams (editors).
Revised Selected Papers from the Pacific-Asia Conference on Knowledge
Discovery and Data Mining (PAKDD) workshops, Bangkok, Thailand,
April 2009.
2009:
- Data Mining and Analytics 2009
Paul Kennedy, Kok-Leong Ong and Peter Christen (editors).
Proceedings of the
Eighth Australasian Data Mining Conference
(AusDM 2009), Melbourne, December 2009.
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 101.
- Robust Record Linkage Blocking using Suffix Arrays
Timothy de Vries, Hui Ke, Sanjay Chawla and Peter Christen.
Proceedings of the
ACM Conference on Information and Knowledge
Management (CIKM), Hong Kong, November 2009.
Paper
available online.
- Similarity-Aware Indexing for Real-Time Entity
Resolution
Peter Christen, Ross Gayler and David Hawking.
Proceedings of the
ACM Conference on Information and Knowledge
Management (CIKM), Hong Kong, November 2009.
Paper
available online.
The full paper (10 pages) is published as an
ANU Computer Science
technical report.
Report (pdf, 273 KB)
Report (ps.gz, 285 KB)
- Development and User Experiences of an Open Source
Data Cleaning, Deduplication and Record Linkage System
Peter Christen
In SIGKDD Explorations, Volume 11, Issue 1,
July 2009.
Available online:
Paper (pdf, 778 KB)
- Accurate Synthetic Generation of Realistic Personal
Information
Peter Christen and Agus Pudjijono
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD), Bangkok, Thailand, April 2009.
Paper available online.
Submitted paper
(12 pages, pdf, 645 KB)
- Geocode Matching and Privacy Preservation
Peter Christen
Invited Presentation at the
PinKDD 2008 workshop held at the
ACM SIGKDD 2008 conference, Las Vegas,
August 2008.
In Revised, Selected Papers, F. Bonchi, E. Ferrari,
W. Jiang and B. Malin (editors).
Springer Lecture Notes in Computer Science (LNCS), vol. 5456, 2009.
Paper available online.
2008:
- Visualization of Temporal Changes in Cluster Structures
using Self-Organizing Maps
Denny, Graham Williams, and Peter Christen
In proceedings of the
IEEE
International Conference on Data Mining (ICDM), Pisa, Italy,
December 2008.
Please contact
Denny if you are interested in this paper.
- Data Mining and Analytics 2008
John Roddick, Jiuyong Li, Peter Christen and Paul Kennedy
(editors).
Proceedings of the
Seventh Australasian Data Mining Conference
(AusDM 2008), Glenelg, Adelaide, November
2008.
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 87.
- Towards Scalable Real-Time Entity Resolution using a
Similarity-Aware Inverted Index Approach
Peter Christen and Ross Gayler
In proceedings of the Seventh Australasian Data Mining
Conference (AusDM 2008), Glenelg, Adelaide, November
2008.
Paper
(pdf, 218 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 87.
- Automatic Record Linkage using Seeded Nearest Neighbour
and Support Vector Machine Classification
Peter Christen
Proceedings of the ACM SIGKDD 2008 conference, Las Vegas,
August 2008.
Paper available online.
- Febrl - An Open Source Data Cleaning, Deduplication and
Record Linkage System with a Graphical User Interface
Peter Christen
Proceedings of the
demo session
at the ACM SIGKDD 2008
conference, Las Vegas, August 2008.
Paper available online.
- Automatic Training Example Selection for Scalable
Unsupervised Record Linkage
Peter Christen
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD), Osaka, Japan, May 2008.
Paper available online.
Submitted paper
(12 pages, pdf, 146 KB)
Submitted
paper (12 pages, ps.gz, 142 KB)
- Exploratory Hot Spot Profile Analysis using Interactive
Visual Drill-Down Self-Organizing Maps
Denny, Graham J. Williams and Peter Christen.
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD), Osaka, Japan, May 2008.
Paper available online.
- Febrl - A Freely Available Record Linkage System
with a Graphical User Interface
Peter Christen
Proceedings of the
Australasian Workshop on Health Data and
Knowledge Management (HDKM), Wollongong, January 2008.
Paper
(pdf, 748 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 80.
2007:
- Data Mining and Analytics 2007
Peter Christen, Paul J. Kennedy, Jiuyong Li, Inna Kolyshkina
and Graham J. Williams (editors).
Proceedings of the
Sixth Australasian Data Mining Conference
(AusDM 2007), Gold Coast, Australia, December 2007.
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 70.
- A Two-Step Classification Approach to Unsupervised Record
Linkage
Peter Christen
In proceedings of the Sixth Australasian Data Mining Conference
(AusDM 2007), Gold Coast, December 2007.
Paper
(pdf, 440 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 70.
- Exploratory Multilevel Hot Spot Analysis: Australian
Taxation Office Case Study
Denny, Graham J. Williams, and Peter Christen
In proceedings of the Sixth Australasian Data Mining Conference
(AusDM 2007), Gold Coast, December 2007.
Paper
(pdf, 759 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 70.
- Evaluation of a Graduate Level Data Mining Course
with Industry Participants
Peter Christen
In proceedings of the Sixth Australasian Data Mining Conference
(AusDM 2007), Gold Coast, December 2007.
Paper
(pdf, 436 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 70.
- Towards parameter-free blocking for scalable record
linkage
Peter Christen
Technical Report TR-CS-07-03
ANU Joint Computer Science Technical Report
Series, August 2007.
Report
(pdf, 201 KB)
Report
(ps.gz, 199 KB)
- Quality and Complexity Measures for Data Linkage and
Deduplication
Peter Christen and Karl Goiser
Chapter in the book
Quality
Measures in Data Mining, vol. 43,
Studies in Computational Intelligence.
F. Guillet and H. Hamilton (eds), Springer, March 2007.
Available online at
SpringerLink.
2006:
- Privacy-Preserving Data Linkage and Geocoding: Current
Approaches and Research Directions
Peter Christen
In proceedings of the Workshop on Privacy Aspects of Data Mining (PADM)
held at the IEEE International Conference on Data
Mining (ICDM), Hong Kong, December 2006.
Final 5-page version:
Paper
(pdf, 53 KB)
Paper
(ps.gz, 35 KB)
Submitted 11-page version:
Paper
(pdf, 118 KB)
Paper
(ps.gz, 74 KB)
- A Comparison of Personal Name Matching: Techniques and
Practical Issues
Peter Christen
In proceedings of the Workshop on Mining Complex Data (MCD)
held at the IEEE International Conference on Data
Mining (ICDM), Hong Kong, December 2006.
Final 5-page version:
Paper
(pdf, 57 KB)
Paper
(ps.gz, 40 KB)
Submitted 12-page version available as:
Technical Report TR-CS-06-02
ANU Joint Computer Science Technical Report
Series, September 2006.
Report
(pdf, 248 KB)
Report
(ps.gz, 236 KB)
- Dynamic Algorithm Selection Using Reinforcement Learning
Warren Armstrong, Peter Christen, Eric McCreath and Alistair
Rendell
Proceedings of the
Workshop on Integrating AI and Data Mining,
Hobart, Australia, December 2006.
Paper
(pdf, 254 KB)
- Data Mining and Analytics 2006
Peter Christen, Paul J. Kennedy, Jiuyong Li, Simeon J. Simoff
and Graham J. Williams (editors).
Proceedings of the Fifth Australasian Data Mining Conference
(AusDM 2006), Sydney, November 2006.
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 61.
- Towards Automated Record Linkage
Karl Goiser and Peter Christen
In proceedings of the Fifth Australasian Data Mining Conference
(AusDM 2006), Sydney, November 2006.
Paper
(pdf, 513 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 61.
- Secure Health Data Linkage and Geocoding: Current
Approaches and Research Directions
Peter Christen and Tim Churches
Proceedings of the
National e-Health Privacy and Security
Symposium (ehPASS), Brisbane, October 2006.
Paper
(pdf, 139 KB)
Paper
(ps.gz, 127 KB)
- Automated Geocoding of Routinely Collected Health Data
in New South Wales
Richard Summerhayes, Paul Holder, John Beard, Peter
Christen, Alan Willmore and Tim Churches
The NSW Public Health Bulletin,
volume 17, number 3-4, March-April 2006.
Online version available
here.
- A Probabilistic Geocoding System Utilising a Parcel Based
Address File
Peter Christen, Alan Willmore and Tim Churches
In Advances in Data Mining: Theory, Methodology,
Techniques, and Applications. Simeon Simoff and Graham
Williams (editors). State-of-the-Art Lecture Notes in
Artificial Intelligence, Volume 3755, Springer-Verlag,
2006.
Available online at
SpringerLink, LNCS 3755.
Copyright for this publication is held by Springer-Verlag.
2005:
- Automated Probabilistic Address Standardisation and
Verification
Peter Christen and Daniel Belacic
Proceedings of the
Fourth Australasian
Data Mining Conference (AusDM 2005), Sydney, December 2005.
Paper
(pdf, 146 KB)
Paper
(ps.gz, 204 KB)
- Assessing Deduplication and Data Linkage Quality: What to
Measure?
Peter Christen and Karl Goiser
Proceedings of the
Fourth Australasian
Data Mining Conference (AusDM 2005), Sydney, December 2005.
Paper
(pdf, 178 KB)
Paper
(ps.gz, 163 KB)
- Probabilistic Data Generation for Deduplication and
Data Linkage
Peter Christen
Proceedings of the
Sixth
International Conference on Intelligent Data Engineering
and Automated Learning (IDEAL'05), Brisbane, July 2005.
Copyright for this publication is held by Springer-Verlag.
Available online at
SpringerLink,
LNCS 3578.
Paper
(pdf, 124 KB)
Paper
(ps.gz, 135 KB)
- Febrl - Freely extensible biomedical record linkage
(Manual, release 0.3)
Peter Christen and Tim Churches
Available online from
SourceForge.Net, April
2005.
Manual
(pdf, 960 KB)
Manual
(pdf, 282 KB)
- A Probabilistic Deduplication, Record Linkage
and Geocoding System
Peter Christen and Tim Churches
Proceedings of the
ARC Health Data Mining workshop,
University of South Australia, April 2005.
Paper
(pdf, 136 KB)
Paper
(ps.gz, 134 KB)
2004:
- A Probabilistic Geocoding System based on
a National Address File
Peter Christen, Tim Churches and Alan Willmore
Proceedings of the
Australasian Data Mining Conference,
Cairns, December 2004.
Paper
(pdf, 120 KB)
Paper
(ps.gz, 128 KB)
- Some Methods for Blindfolded Record Linkage
Tim Churches and Peter Christen
Published online at
BioMed Central
Medical Informatics and Decision Making,
June 2004.
For abstract and downloadable PDF file see
here.
- Febrl - A Parallel Open Source Data Linkage System
Peter Christen, Tim Churches and Markus Hegland
Proceedings of the 8th
PAKDD'04
(Pacific-Asia Conference on Knowledge Discovery and Data
Mining), Sydney, May 2004.
Springer Lecture Notes in Artificial Intelligence,
(3056), available online at
SpringerLink.
Copyright for this publication is held by Springer-Verlag.
Paper
(pdf, 202 KB)
Paper
(ps.gz, 81 KB)
- Blind Data Linkage using n-gram Similarity
Comparisons
Tim Churches and Peter Christen
Proceedings of the 8th
PAKDD'04
(Pacific-Asia Conference on Knowledge Discovery and Data
Mining), Sydney, May 2004.
Springer Lecture Notes in Artificial Intelligence,
(3056), available online
here.
Copyright for this publication is held by Springer-Verlag.
Paper
(pdf, 176 KB)
Paper
(ps.gz, 68 KB)
2003:
- A Comparison of Fast Blocking Methods for Record
Linkage
Rohan Baxter, Peter Christen and Tim Churches
Proceedings of the Workshop on Data Cleaning, Record
Linkage and Object Consolidation at the
Ninth ACM
SIGKDD International Conference on Knowledge Discovery
and Data Mining, Washington DC, August 2003.
Paper 3 pages
(pdf, 87 KB)
Paper 6 pages
(pdf, 138 KB)
2002:
- Preparation of name and address data for record linkage
using hidden Markov models
Tim Churches, Peter Christen, Kim Lim and Justin X Zhu
Published online at BioMed Central
Medical Informatics and Decision Making,
December 2002.
For abstract and downloadable PDF file see
here.
Also available locally:
Paper
(pdf, 353 KB)
- Probabilistic Name and Address Cleaning and
Standardisation
Peter Christen, Tim Churches and Justin Xi Zhu
Proceedings of the
Australasian
Data Mining Workshop, Canberra, December 2002.
Paper
(ps.gz, 74 KB)
Paper
(pdf, 158 KB)
- How Fast is '-fast'? Performance Analysis of KDD
Applications using Hardware Performance Counters on
UltraSPARC-III
Adam Czezowski and Peter Christen
Proceedings of the
Australasian
Data Mining Workshop, Canberra, December 2002.
Paper
(ps.gz, 82 KB)
Paper
(pdf, 174 KB)
- High-Performance Computing Techniques for
Record Linkage
Peter Christen, Justin Xi Zhu, Markus Hegland, Stephen Roberts,
Ole M. Nielsen, Tim Churches and Kim Lim
Proceedings of the Australian Health Outcomes
Conference (AHOC-2002), Canberra, July 2002.
Paper
(ps.gz, 95 KB)
Paper
(pdf, 233 KB)
- Parallel Computing Techniques for
High-Performance Probabilistic Record Linkage
Peter Christen, Markus Hegland, Stephen Roberts,
Ole M. Nielsen, Tim Churches and Kim Lim
Proceedings of the Symposium on Health Data
Linkage, Sydney, March 2002.
Paper
(ps.gz, 107 KB)
Paper
(pdf, 228 KB)
- Performance Analysis of KDD Applications using
Hardware Event Counters
Peter Christen and Adam Czezowski
Technical Report TR-CS-02-01, ANU Joint Computer
Science Technical Report Series, February 2002.
Report
(ps.gz, 131 KB)
Report
(pdf, 238 KB)
2001:
- DMtools - Open Source Software for Database Mining
Peter Christen, Ole M. Nielsen and Markus Hegland
In proceedings of the
Workshop on Database Support for KDD (at the
PKDD'2001
Conference), Freiburg, Germany, September 2001.
Paper (ps.gz, 81 KB)
- Parallel Data Mining on a Beowulf Cluster
Peter Christen, Ole M. Nielsen, Markus Hegland and
Peter E. Strazdins
Proceedings of the HPC Asia 2001
Conference, Gold Coast, Queensland, Australia,
September 2001.
Paper (ps.gz, 264 KB)
Paper (pdf.gz, 311 KB)
- A Scalable Parallel FEM Surface Fitting Algorithm for Data
Mining
Peter Christen, Markus Hegland, Stephen Roberts, Ole M.
Nielsen and Irfan Altas
Proceedings of the International Workshop on Mining Spatial
and Temporal Data
(at the
PAKDD-2001
Conference), Hong Kong, April 2001.
Paper (ps.gz, 229 KB)
- A Toolbox Approach to Flexible and Efficient
Data Mining
Ole M. Nielsen, Peter Christen, Markus Hegland,
Tatiana Semenova and Timothy Hancock
Proceedings of the
PAKDD-2001
Conference, Hong Kong, April 2001.
Published in the
Springer
Lecture Notes in Computer Science, Artificial
Intelligence series, LNAI2035.
Copyright for this publication is held by Springer-Verlag.
Paper (ps.gz, 143 KB)
Paper (pdf, 183 KB)
- Towards a Parallel Data Mining Toolbox
Peter Christen, Markus Hegland, Ole M. Nielsen, Stephen
Roberts, Peter E. Strazdins, Irfan Altas, Tatiana Semenova and
Timothy Hancock
Proceedings of the 15th International Parallel and Distributed
Processing Symposium (IPDPS-2001), San Francisco,
April 2001.
Workshop Parallel
and Distributed Data Mining.
Copyright 2001 Institute of Electrical and Electronic
Engineers (IEEE). Reprinted for the Proceedings of the
IPDPS-2001.
Paper (ps.gz, 139 KB)
- Data Mining with Python
Ole M. Nielsen, Peter Christen, Markus Hegland and Tatiana
Semenova
Proceedings of the
9th International Python
Conference, Long Beach, California, March 2001.
Paper available upon request from:
Ole Nielsen.
- A Scalable Parallel FEM Surface Fitting Algorithm for Data
Mining
Peter Christen, Markus Hegland, Stephen Roberts and Irfan Altas
Technical Report TR-CS-01-01, ANU Joint Computer Science
Technical Report Series, January 2001.
Report (ps.gz, 255 KB)
Report (pdf, 377 KB)
2000:
- Scalable Parallel Algorithms for Surface Fitting and Data
Mining
Peter Christen, Markus Hegland, Ole M. Nielsen, Stephen
Roberts, Peter E. Strazdins and Irfan Altas
In Elsevier Journal of
Parallel Computing,
special issue on Aspects of Parallel Computing for Linear
Systems and Associated Problems, September 2000.
- Data Mining of Administrative Claims Data of Pathology
Services
Simon Hawkins, Graham Williams, Rohan Baxter, Peter Christen,
Michael Fett, Markus Hegland, Fuchun Huang, Ole Nielsen, Tatiana
Semenova and Andrew Smith
Proceedings of the Thirty-Fourth Hawaii International Conference on
System Sciences (HICSS-34), January 2001.
Available upon request from:
Rohan Baxter,
CSIRO CMIS.
- Scalable Parallel Algorithms for Predictive Modelling
Peter Christen, Markus Hegland, Ole Møller Nielsen,
Stephen Roberts and Irfan Altas
Proceedings of the Data Mining 2000 Conference, Cambridge, UK,
N. Ebecken and C.A. Brebbia, editors, in Data Mining II,
WIT Press, Southampton, Boston, 2000.
Paper
(ps.gz, 606 KB)
1999:
- The Integrated Delivery of Large-Scale Data Mining:
The ACSYS Data Mining Project
Graham Williams, Irfan Altas, Sergey Barkin, Peter Christen,
Markus Hegland, Alonso Marquez, Peter Milne, Rajehndra Nagappan
and Stephen Roberts
KDD-99 Workshop on Large-Scale Parallel KDD Systems. San Diego,
August 1999,
Springer Lecture Notes in Artificial Intelligence 1759.
- Parallelization of a Finite Element Surface Fitting Algorithm
for Data Mining
Peter Christen, Irfan Altas, Markus Hegland, Stephen Roberts,
Kevin Burrage and Roger Sidje.
Proceedings of the CTAC-99 Conference. Canberra, 20-24 September
1999.
Paper
(ps.gz, 552 KB)
Slides
(ps.gz, 614 KB)
- A Parallel Iterative Linear System Solver
with Dynamic Load Balancing
Peter Christen
Proceedings of the CTAC-99 Conference. Canberra, 20-24 September
1999.
Paper
(ps.gz, 467 KB)
Slides (ps.gz, 218 KB)
- A Parallel Finite Element Surface Fitting Algorithm for
Data Mining
Peter Christen, Irfan Altas, Markus Hegland, Stephen Roberts,
Kevin Burrage and Roger Sidje
Proceedings of the ParCo-99 Conference, Delft,
17-20 August 1999.
- A Parallel Iterative Linear System Solver with
Dynamic Load Balancing
Peter Christen
Dissertation (PhD thesis), Institut für Informatik,
Universität Basel. February 1999.
Available upon
request.
1998:
- PAISS - Design and Implementation of a Parallel
Iterative Linear System Solver with Dynamic Load
Balancing
Peter Christen
Technischer Bericht 98-5, October 1998.
Report (ps.gz, 193 KB)
- A Parallel Iterative Linear System Solver with
Dynamic Load Balancing
Peter Christen
Proceedings of the ACM International Conference of
Supercomputing (ICS) 1998. Melbourne, 13-17 July 1998.
- Dynamic Load Balancing within a Parallel Iterative
Linear System Solver
Peter Christen.
Proceedings of the High-Performance Computing and Networking
(HPCN) Conference 1998. Amsterdam, 21-23 April 1998,
Springer Lecture Notes in Computer Science 1401.
1996:
- Speicher-Schemata für spärlich besetzte
Matrizen (German)
Peter Christen
Institut für Informatik, Universität Basel.
Technischer Bericht 96-4, September 1996.
Report (ps.gz, 203 KB)
1995:
- Test- und Diagnosesoftware für Alpha7
(German)
Peter Christen
Diplomarbeit (MS thesis), Institut für Elektronik,
ETH Zürich. Prof. Dr. A. Gunzinger, July 1995.
Selected Presentations:
2019:
- Robust temporal graph clustering and cluster evaluation
measure for group record linkage
Charini Nanayakkara, Peter Christen, and Thilina Ranbaduge.
Invited presentation at the
Spanish National Research
Council (CSIC),
Department
of Population, Seville, Spain, July 2019.
Slides (pdf, 1.7 MBytes)
- Record Linkage: Introduction, Recent Advances, and Privacy
Issues
Peter Christen.
Invited tutorial at the
Spanish National Research
Council (CSIC),
Department
of Population, Seville, Spain, July 2019.
Slides (pdf, 15 MBytes)
- Attack methods on privacy-preserving record linkage
Peter Christen, Rainer Schnell, Dinusha Vatsalan, Thilina
Ranbaduge, and Anushka Vidanage
Invited presentation at the
Data61 Privacy Preserving Record Linkage Workshop,
Sydney, February 2019.
Slides (pdf, 530 KBytes)
2016:
- Privacy-preserving Record Linkage
Peter Christen
Invited presentation at the
International ScaDS Summer School on Big Data,
Leipzig, Germany, July 2016.
Slides (pdf, 1.7 MBytes)
- Recent developments and research challenges in data linkage
Peter Christen
Invited presentation at the opening workshop
Data Linkage and
Anonymisation: Setting the Agenda at the
Isaac Newton Institute,
Cambridge, UK, July 2016.
Slides (pdf, 2 MBytes)
- Data Linkage - Introduction, Recent Advances, and Privacy Issues
Peter Christen
Invited tutorial at the opening workshop
Data Linkage and
Anonymisation: Setting the Agenda at the
Isaac Newton Institute,
Cambridge, UK, July 2016.
Slides (pdf, 15 MBytes)
- A Tutorial on Population Informatics using Big Data
Peter Christen, Hye-Chung Kum, Qing Wang, and Dinusha Vatsalan
Tutorial at the
Pacific
Asia Conference on Knowledge Discovery and Data Mining (PAKDD),
Auckland, April 2016.
Slides (pdf, 15 MBytes)
2014:
- Privacy Aspects in Big Data Integration: Challenges and
Opportunities
Peter Christen.
Invited keynote at the
1st International Workshop on Privacy and Security of
Big Data (PSBD 2014),
held at the
ACM International
Conference on Information and Knowledge Management
(CIKM 2014), Shanghai, November 2014.
Slides (pdf, 954 KBytes)
- Advanced record linkage methods and privacy aspects for
population reconstruction
Peter Christen
Keynote paper at the workshop
Population Reconstruction, Amsterdam, February 2014.
Slides (pdf, 1.6 MBytes)
- Data Matching Research at the Australian National
University
Peter Christen
Departmental seminar presentations at
Database group at
the University of Leipzig (Germany),
Hasso-Plattner-Institute, University of Potsdam
(Germany),
School of Computer
Science at the University of St Andrews (Scotland), and
Department of
Computer and Information Sciences at the University of
Strathclyde (Scotland), all in February 2014.
Slides (pdf, 3.6 MBytes)
2013:
- Techniques for Scalable Privacy-preserving Record
Linkage
Peter Christen, Vassilios Verykios, and Dinusha Vatsalan
Tutorial at the
22nd ACM International
Conference on Information and Knowledge Management
(CIKM 2013), San Francisco, October 2013.
Slides (pdf, 2.6 MBytes)
- Overview and taxonomy of techniques for privacy-preserving
record linkage
Peter Christen
Invited presentation at the
Joint Statistical Meetings (JSM 2013),
Montreal, August 2013.
Slides (pdf,
718 KBytes)
Extended abstract,
(pdf, 122 KBytes)
2011:
- Privacy-Preserving Data Matching
Peter Christen
Invited presentation to the Data Matching Working Group
Australian Government Attorney-General's Department,
Canberra, July 2011.
Slides
8up (pdf, 2.0 MB)
- Scalable Privacy-Preserving Record Linkage using
Similarity-Based Indexing
Peter Christen
Invited presentation at
Fujitsu Laboratories, Kawasaki, Japan, June 2011.
Slides available upon
request.
2009:
- Privacy-preserving Data Sharing and Matching
Peter Christen
Departmental seminar at the
ANU Computer
Sciences Lab, Canberra, May 2009.
Slides
4up (pdf, 2.1 MB)
Slides
1up (pdf, 634 KB)
- Accurate Synthetic Generation of Realistic Personal
Information
Peter Christen and Agus Pudjijono
Presentation at the
Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD 2009), Bangkok, Thailand, April 2009.
Slides
6up (pdf, 1.5 MB)
Slides
1up (pdf, 792 KB)
- Data Linkage - An Overview and Research at the ANU
Peter Christen
Invited Presentation at the
ANU Supercomputer Facility, Canberra, March 2009.
Slides
6up (pdf, 2.2 MB)
Slides
1up (pdf, 721 KB)
2008:
- Towards Scalable Real-Time Entity Resolution using a
Similarity-Aware Inverted Index Approach
Peter Christen and Ross Gayler
Presentation at the
Seventh Australasian Data Mining Conference
(AusDM 2008), Glenelg, Adelaide, November 2008.
Slides
4up (pdf, 1.4 MB)
Slides
1up (pdf, 491 KB)
- Privacy-Preserving Data Linkage
Peter Christen
Part of the tutorial on Privacy preserving data sharing
and mining held at the Seventh Australasian Data Mining Conference
(AusDM 2008), Glenelg, Adelaide, November 2008.
Slides
4up (pdf, 3.2 MB)
Slides
1up (pdf, 872 KB)
- Data Matching of Bibliographic Data: Recent Advances
and an Open Source Matching System
Peter Christen
Presentation at the
2008 Annual Forum of the
Australasian
Association for Institutional Research (AAIR), Canberra,
November 2008.
Slides
6up (pdf, 1.9 MB)
Slides
1up (pdf, 839 KB)
- Automatic Record Linkage using Seeded Nearest Neighbour
and Support Vector Machine Classification
Peter Christen
Presentation at the ACM SIGKDD 2008 conference, Las Vegas,
August 2008.
Slides
6up (pdf, 813 KB)
Slides
1up (pdf, 293 KB)
- Geocode Matching and Privacy Preservation
Invited Presentation at the
PinKDD 2008 workshop held at the
ACM SIGKDD 2008 conference, Las Vegas,
August 2008.
Slides
9up (pdf, 1.5 MB)
Slides
1up (pdf, 513 KB)
- Automatic Training Example Selection for Scalable
Unsupervised Record Linkage
Peter Christen
Presentation at the
Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD 2008), Osaka, Japan, May 2008.
Slides
4up (pdf, 1.2 MB)
Slides
1up (pdf, 391 KB)
2007:
- A Two-Step Classification Approach to Unsupervised Record
Linkage
Peter Christen
Presentation at the
Sixth Australasian Data Mining Conference
(AusDM 2007), Gold Coast, Australia, December 2007.
Slides
4up (pdf, 1.3 MB)
- Evaluation of a Graduate Level Data Mining Course
with Industry Participants
Peter Christen
Presentation at the
Sixth Australasian Data Mining Conference
(AusDM 2007), Gold Coast, Australia, December 2007.
Slides
4up (pdf, 1.2 MB)
- Data Linkage Research at the ANU
Peter Christen
Invited talk at The Distillery, Canberra, July 2007.
Slides
8up (pdf.gz, 1.7 MB)
Slides
8up (ps.gz, 1.4 MB)
2006:
- Privacy-Preserving Data Linkage and Geocoding: Current
Approaches and Research Directions
Peter Christen
Presentation at the Workshop on Privacy Aspects of Data Mining (PADM)
held at the IEEE International Conference on Data
Mining (ICDM), Hong Kong, December 2006.
Also presented as a
Departmental
seminar at the ANU
Department of Computer Science, Canberra, December 2006.
Slides
(pdf, 859 KB)
Slides
8up (ps.gz, 1.4 MB)
- A Comparison of Personal Name Matching: Techniques and
Practical Issues
Peter Christen
Presentation at the Workshop on Mining Complex Data (MCD)
held at the IEEE International Conference on Data
Mining (ICDM), Hong Kong, December 2006.
Also presented as a
Departmental
seminar at the ANU
Department of Computer Science, Canberra, December 2006.
Slides
(pdf, 830 KB)
Slides
8up (ps.gz, 1.3 MB)
- Recent Developments in Data Linkage and Research at the
ANU
Peter Christen
Invited talk at the
Australian
Taxation Office, data matching personnel, Canberra,
December 2006.
Slides
10up (ps.gz, 1.6 MB)
- Data Quality Aspects in Data Mining, Data Linkage and
Geocoding
Peter Christen
Invited talk at
Geoscience
Australia, Canberra, November 2006.
Slides
(pdf, 1.7 MB)
Slides
(ps.gz, 1.2 MB)
Slides
9up (ps.gz, 1.2 MB)
- Secure Health Data Linkage and Geocoding: Current
Approaches and Research Directions
Peter Christen and Tim Churches
Presentation at the
National e-Health Privacy and Security
Symposium (ehPASS), Brisbane, October 2006.
Slides
(pdf, 779 KB)
Slides
9up (ps.gz, 363 KB)
- Data Linkage Techniques: Past, Present and Future
Peter Christen
Invited talk at the
Australian
Taxation Office, Canberra, October 2006.
(same set of slides as used for the Analytics Practise
Group presentation, see below).
- Data Linkage Techniques: Past, Present and Future
Peter Christen
Invited talk at the
Canberra
Analytics Practise Group, Canberra, August 2006.
Slides 8up
(pdf, 1.5 MB)
Slides 8up
(ps.gz, 630 KB)
Slides
(pdf, 1.5 MB)
2005:
- Automated Probabilistic Address Standardisation and
Verification
Peter Christen
Presentation at the
Fourth Australasian
Data Mining Conference (AusDM 2005), Sydney, December 2005.
Slides
8up (ps.gz, 428 KB)
Slides
(pdf, 892 KB)
Slides
(ps.gz, 425 KB)
- Reflections on COMP2720 (Automating Tools for New Media)
Peter Christen
Departmental
seminar at the ANU
Department of Computer Science, Canberra, November 2005.
Slides
6up (ps.gz, 3.0 MB)
Slides
6up (pdf, 523 KB)
- Recent Developments in Data Linkage Technologies
Peter Christen
Invited talk at the
Data
Linkage Symposium of the Canberra Branch of the
Statistical Society of
Australia, Canberra, September 2005.
Slides
(pdf, 1.7 MB)
Slides 8up
(ps.gz, 690 KB)
- Probabilistic Data Generation for Deduplication and
Data Linkage
Peter Christen
Presentation at the
Sixth
International Conference on Intelligent Data Engineering
and Automated Learning (IDEAL'05), Brisbane, July 2005.
Slides
(pdf, 704 KB)
Slides
(ps.gz, 269 KB)  
Slides 8up
(pdf, 677 KB)
Slides 8up
(ps.gz, 272 KB)
- Probabilistic Deduplication, Data Linkage and Geocoding
Peter Christen
Presentation at the
DAMA Canberra Chapter,
June 2005.
Slides 8up
(pdf, 2.7 MB)
Slides 8up
(ps.gz, 1.3 MB)
- Probabilistic Deduplication, Record Linkage
and Geocoding
Peter Christen
Guest lecture for
MATH1500: ANU Computational Science Undergraduate Seminar,
ANU, May 2005.
Slides 8up
(pdf, 2.1 MB)
Slides 8up
(ps.gz, 989 KB)
- A very short Introduction to Data Mining
Peter Christen
Guest lecture for
COMP3420:
Database Systems, ANU, May 2005.
Slides 8up
(pdf, 74 KB)
Slides 8up
(ps, 136 KB)
Slides 1up
(pdf, 88 KB)
- Febrl - A parallel open source record linkage and geocoding
system
Peter Christen
Presentation at the Data Linkage Workshop,
Australian Bureau of Statistics,
Canberra, April 2005.
Slides 8up
(pdf, 2.4 MB)
Slides 8up
(ps.gz, 1.2 MB)
- A Probabilistic Deduplication, Record Linkage
and Geocoding System
Peter Christen and Tim Churches
Presentation at the
ARC Health Data Mining workshop, University of South
Australia, April 2005.
Slides 4up
(pdf, 854 KB)
Slides 4up
(ps.gz, 389 KB)
Slides
(pdf, 885 KB)
Slides
(ps.gz, 386 KB)
2004:
- A Probabilistic Geocoding System based on
a National Address File
Peter Christen, Tim Churches and Alan Willmore
Presentation at the
Australasian Data Mining Conference,
Cairns, December 2004.
Slides 4up
(pdf, 1.6 MB)
Slides 4up
(ps.gz, 752 KB)
Slides
(pdf, 1.6 MB)
Slides
(ps.gz, 751 KB)
- Febrl - A parallel open source data linkage and geocoding
system
Peter Christen
Presentation at the Open Source Workshop,
Australian Bureau of Statistics,
Canberra, July 2004.
Slides 4up
(pdf, 1.3 MB)
Slides 4up
(ps.gz, 595 KB)
- Febrl - A parallel open source data linkage system
Peter Christen
Presentation at the
PAKDD 2004,
Sydney, May 2004.
Slides
(pdf, 655 KB)
Slides 4up
(pdf, 637 KB)
- Blind data linkage using n-gram similarity comparisons
Peter Christen
Short presentation at the
PAKDD 2004,
Sydney, May 2004.
Slides
(pdf, 510 KB)
Slides 4up
(pdf, 500 KB)
2002:
- Probabilistic Name and Address Cleaning and
Standardisation
Peter Christen, Tim Churches and Justin Xi Zhu
Presentation at the
Australasian
Data Mining Workshop, Canberra, December 2002.
Slides
4up (ps.gz, 345 KB)
Slides
4up (pdf, 764 KB)
- How Fast is '-fast'? Performance Analysis of KDD
Applications using Hardware Performance Counters on
UltraSPARC-III
Adam Czezowski and Peter Christen
Presentation at the
Australasian
Data Mining Workshop, Canberra, December 2002.
Slides
4up (ps.gz, 335 KB)
Slides
4up (pdf, 745 KB)
- High-Performance Computing Techniques for
Record Linkage
Peter Christen, Tim Churches, Markus Hegland, Kim Lim, Ole M.
Nielsen, Stephen Roberts and Justin Xi Zhu
Presentation at the Australian Health Outcomes
Conference (AHOC-2002), Canberra, July 2002.
Slides 4up
(ps.gz, 1.6 MB)
Slides 4up
(pdf, 1.5 MB)
- Parallel Techniques for High-Performance
Record Linkage (Data Matching)
Peter Christen
Seminar at the ANU Department of Computer Science,
Canberra, June 2002.
Slides 4up
(ps.gz, 535 KB)
Slides 4up
(pdf, 1.1 MB)
- Parallel Computing Techniques for High-Performance
Probabilistic Record Linkage
Peter Christen, Tim Churches, Markus Hegland, Kim Lim, Ole M.
Nielsen and Stephen Roberts
Presentation at the Symposium on Health Data
Linkage, Sydney, March 2002.
Slides
(ps.gz, 1.5 MB)
Slides
(pdf, 654 KB)
- Performance Analysis of KDD Applications using Hardware
Event Counters
Peter Christen and Adam Czezowski
Presentation at the
CAP Workshop 2002, Fujitsu, Kawasaki, Japan,
6 February 2002.
Slides
(ps.gz, 57 KB)
Slides
(pdf, 77 KB)
2001:
- High Performance Computing and Data Mining
Peter Christen
Presentation at the AEA Data Mining
Workshop, Australasian Epidemiological Association, 10th Annual
Scientific Meeting, Sydney, 28 September, 2001.
Slides (ps.gz, 963 KB)
Slides (pdf, 947 KB)
- Data Mining at the Australian National University
Peter Christen
Presentation at the
Department of Computer
Science,
University of Basel,
Switzerland, September 2001.
- DMtools - Open Source Software for Database Mining
Peter Christen
Presentation at the
Workshop on Database Support for KDD (at the
PKDD'2001
Conference), Freiburg, Germany, September 2001.
Slides (ps.gz, 283 KB)
Slides (pdf, 1.4 MB)
- High Performance Computing and Data Mining
Peter Christen
Presentation at the EPI-SIG Health
Data Mining Seminar, Australian Museum, Sydney, 25 May,
2001.
Slides (ps.gz, 1.0 MB)
Slides (pdf, 1.1 MB)
2000:
- Application of Parallel Computing in Data Mining
Peter Christen
Seminar at the Suranaree
University of Technology, December 2000.
- Parallel Computing and Message Passing
Peter Christen.
Two-day course at the Suranaree
University of Technology, October 2000.
- Data Mining at the ANU
Peter Christen
Presentation at the ADFA/ANU Machine Learning meeting,
ANU, Canberra, September 2000.
- ACSys CRC - Data Mining Tools
Peter Christen and Ole Nielsen
Presentation and Demonstration for the ACSys CRC
Data Mining research group, ANU, Canberra, August 2000.
- Parallel Algorithms in Data Mining - The
ANU CSL Data Mining Approach
Peter Christen
Seminar at the Department of Computer Science, Australian
National University, Canberra, July 2000.
- Parallel Algorithms for Data Mining
Peter Christen
Seminar at the School for Information Studies, Charles Sturt
University, Wagga Wagga, May 2000.
Slides
(pdf, 1.7 MB)
Slides
(ps.gz, 1.5 MB)
1999:
- Parallelization of a Finite Element Surface Fitting
Algorithm for Data Mining
Peter Christen, Irfan Altas, Markus Hegland, Stephen Roberts,
Kevin Burrage and Roger Sidje
CTAC-99 Conference, Canberra, September 1999.
Slides
(ps.gz, 614 KB)
- A Parallel Iterative Linear System Solver
with Dynamic Load Balancing
Peter Christen
CTAC-99 Conference, Canberra, September 1999.
Slides (ps.gz, 218 KB)