G.B. Bell and A. Sethi, Matching Records in a National Medical Patient Index, Communications of the ACM, Vol. 44 No. 9, September 2001.

I. Bartolini, P. Ciaccia and M. Patella, String Matching with Metric Trees Using an Approximate Distance, SPIRE 2002: Proceedings of the 9th International Symposium on String Processing and Information Retrieval, pp. 273-283, 2002.

D.P. Bertsekas, Auction Algorithms for Network Flow Problems: A Tutorial Introduction, Computational Optimization and Applications, Vol. 1, pp. 7-66, 1992.

V. Borkar, K. Deshmukh and S. Sarawagi, Automatic segmentation of text into structured records, in Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, California, 2001.

M.N.K. Boulos, Towards evidence-based, GIS-driven national spatial health information infrastructure and surveillance services in the United Kingdom, International Journal of Health Geographics, 2004, 3:1. Available online at:

M.R. Cayo and T.O. Talbot, Positional error in automated geocoding of residential addresses, International Journal of Health Geographics, 2003, 2:10. Available online at:

P. Christen, T. Churches and J.X. Zhu, Probabilistic Name and Address Cleaning and Standardisation, Proceedings of the Australasian Data Mining Workshop, Canberra, December 2002.

T. Churches, P. Christen, K. Lim and J.X. Zhu, Preparation of name and address data for record linkage using hidden Markov models, BioMed Central Medical Informatics and Decision Making, 2002, 2:9.

R. Cilibrasi and P. Vitanyi, Clustering by compression, submitted to IEEE Transactions on Information Theory, 2004. See

W.W. Cohen, The WHIRL Approach to Integration: An Overview, in Proceedings of the AAAI-98 Workshop on AI and Information Integration. AAAI Press, 1998.

M.G. Elfeky, V.S. Verykios and A.K. Elmagarmid, TAILOR: A Record Linkage Toolbox, in Proceedings of ICDE 2002, San Jose, USA, 2002.

M. Ester, H.-P. Kriegel and J. Sander, Spatial Data Mining: A Database Approach, Fifth Symposium on Large Spatial Databases (SSD'97), Springer LNCS 1262, pp. 48-66, 1997.

I. Fellegi and A. Sunter, A Theory for Record Linkage, Journal of the American Statistical Association, Vol. 64 No. 328, pp. 1183-1210, 1969.

H. Galhardas, D. Florescu, D. Shasha and E. Simon, An Extensible Framework for Data Cleaning, Technical Report 3742, INRIA, 1999.

L. Gill, Methods for Automatic Record Matching and Linking and their use in National Statistics, National Statistics Methodology Series No. 25, London, 2001.

J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.

M.A. Hernandez and S.J. Stolfo, The Merge/Purge Problem for Large Databases, in Proceedings of the SIGMOD Conference, San Jose, 1995.

E. Keogh, S. Lonardi and C.A. Ratanamahatana, Towards parameter-free data mining, in Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206-215, Seattle, 2004.

C.W. Kelman, Monitoring Health Care Using National Administrative Data Collections, PhD thesis, Australian National University, Canberra, May 2000.

K. Kukich, Techniques for automatically correcting words in text, ACM Computing Surveys, Vol. 24 No. 4, pp. 377-439, December 1992.

A.J. Lait and B. Randell, An Assessment of Name Matching Algorithms, Technical Report, Department of Computing Science, University of Newcastle upon Tyne, UK, 1993.

J.I. Maletic and A. Marcus, Data Cleansing: Beyond Integrity Analysis, in Proceedings of the Conference on Information Quality (IQ2000), Boston, October 2000.

AutoStan and AutoMatch, User's Manuals, MatchWare Technologies, Kennebunk, Maine, 1998. See also:

A. McCallum, K. Nigam and L.H. Ungar, Efficient clustering of high-dimensional data sets with application to reference matching, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 169-178, 2000.

U.Y. Nahm, M. Bilenko and R.J. Mooney, Two Approaches to Handling Noisy Variation in Text Mining, in Proceedings of the ICML-2002 Workshop on Text Learning (TextML'2002), pp. 18-27, Sydney, Australia, July 2002.

H.B. Newcombe and J.M. Kennedy, Record Linkage: Making Maximum Use of the Discriminating Power of Identifying Information, Communications of the ACM, Vol. 5 No. 11, 1962.

D.L. Paull, A geocoded National Address File for Australia: The G-NAF What, Why, Who and When? PSMA Australia Limited, Griffith, ACT, Australia, 2003. Available online at:

L. Philips, The Double Metaphone Search Algorithm, C/C++ Users Journal, Vol. 18 No. 6, June 2000.

J.J. Pollock and A. Zamora, Automatic spelling correction in scientific and scholarly text, Communications of the ACM, Vol. 27 No. 4, pp. 358-368, 1984.

E.H. Porter and W.E. Winkler, Approximate String Comparison and its Effect on an Advanced Record Linkage System, Research Report RR97/02, US Bureau of the Census, 1997.

L.R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77 No. 2, pp. 257-286, February 1989.

E. Rahm and H.H. Do, Data Cleaning: Problems and Current Approaches, IEEE Bulletin of the Technical Committee on Data Engineering, Vol. 23 No. 4, December 2000.

K. Seymore, A. McCallum and R. Rosenfeld, Learning Hidden Markov Model Structure for Information Extraction, in Proceedings of the AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.

US Federal Geographic Data Committee. Homeland Security and Geographic Information Systems - How GIS and mapping technology can save lives and protect property in post-September 11th America. Public Health GIS News and Information, no. 52, pp. 21-23, May 2003.

V.S. Verykios, A.K. Elmagarmid and E.N. Houstis, Automating the Approximate Record-Matching Process, Information Sciences, Vol. 126, July 2000.

V.S. Verykios, A.K. Elmagarmid, M.G. Elfeky, M. Cochinwala and S. Dalal, On the Completeness and Accuracy of the Record Matching Process, in Proceedings of the MIT Conference on Information Quality, Boston, MA, October 2000.

W.E. Winkler and Y. Thibaudeau, An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census, Research Report RR91/09, US Bureau of the Census, 1991.

W.E. Winkler, Quality of Very Large Databases, Research Report RR2001/04, US Bureau of the Census, 2001.

W.E. Yancey, Frequency-Dependent Probability Measures for Record Linkage, Research Report RR00/07, Statistical Research Division, US Bureau of the Census, July 2000.

W.E. Yancey, BigMatch: A Program for Extracting Probable Matches from a Large File for Record Linkage, Research Report Series Statistics #2002-01, Statistical Research Division, US Bureau of the Census, March 2002.