AddressClean – Duplicate Postal Address Detection & Removal Services

“We use what is quite simply the most intelligent and accurate name, address & identity matching software around”

We provide duplicate postal address detection & removal services across scripts & countries.


Some common reasons to detect & optionally remove duplicate addresses are:

  • Establish unique customer identities to enable a single, unified customer view for Master Data Management (MDM)
  • Consolidate information before data warehousing
  • Reduce costs by eliminating duplicates before any downstream activity
  • Sharpen and focus marketing effort by consolidating internal and externally sourced lists to reduce clutter
  • Fraud detection, e.g. ensure that the same claim is not processed twice
  • Criminal detection, e.g. ensure that the individual or organisation is not on a banned list
  • Compliance with legal and Know Your Customer (KYC) norms

Cleaning & validation of addresses is a basic business best practice.

Advantages of using our services

  • 10 years' experience in enterprise-level solution provision & consultancy in this area
  • Satisfied customers
  • Good credentials: we use a proven, world-class, reliable & robust software tool, already tried and tested in real time under high stress and extreme volumes, and backed by an organisation with a 35-year track record
  • Used by security agencies of some of the world's major powers

Script flexibility & country specific optimisation

We support all countries that use the Latin script (Latin-1), which includes most European, North American, South American, Asian, African and Australasian countries.

In the near future we will support Arabic, Chinese, Greek, Japanese, Korean and Russian/Cyrillic scripts, as well as mixed input scripts of Arabic & English, Chinese & English, Japanese & English, and Korean & English.

While we provide services for all countries, we use special optimisation rules for the larger and commercially important countries. See the related section on our FAQ page. Additionally, there are special rules for Anti Money Laundering (AML) & OFAC (Office of Foreign Assets Control – US Department of the Treasury).

Impeccable customer references

  1. Tata Share Registry (now known as TSR Darashaw)
  2. Life Insurance Corporation (LIC) of India – Data Warehousing project



Input can be in tab-separated values (TSV), comma-separated values (CSV), spreadsheet, XML, JSON, YAML or your own format. Different files may have different layouts with different presentation of fields, e.g. the entire address in one field, the address spread over a fixed number of label lines, or the address separated into fields such as post code, province/prefecture/district, etc.

Best results are obtained if the same structure is used across files and information is separated into specific fields, i.e. person name in one or more fields, designation, organisation, address part 1 (building, street, sub area, area), address part 2 (city, district, state, country) and post code. The field structure and order are to be agreed before the project starts.
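As an illustration of how files with different layouts can be mapped onto one agreed field structure, here is a minimal Python sketch using the standard `csv` module. The master field names (`MASTER_FIELDS`), the function `read_into_master` and the per-file `column_map` are hypothetical choices made for this example; this is an illustrative sketch, not the production software.

```python
import csv
import io
from typing import Dict, Iterator

# HYPOTHETICAL master format: field names chosen for illustration only.
MASTER_FIELDS = ["identity", "person_name", "designation", "organisation",
                 "address_part1", "address_part2", "post_code"]


def read_into_master(text: str, delimiter: str,
                     column_map: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Parse delimited text and rename its columns to master-format fields.

    column_map maps a source column header to a MASTER_FIELDS name; master
    fields with no source column are left as empty strings.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        record = {name: "" for name in MASTER_FIELDS}
        for source_column, master_field in column_map.items():
            # row.get(...) is None if the column is absent or the row is short.
            record[master_field] = row.get(source_column) or ""
        yield record


# A TSV file and a CSV file with different headers end up in the same layout.
tsv_text = "CustID\tName\tStreet\tCity\tPIN\nC1\tA Kumar\t12 MG Road\tPune, India\t411001\n"
csv_text = 'Ref,Company,Address,Town,Postcode\nR9,Acme Ltd,"Unit 4, Mill Lane",Leeds UK,LS1 4AP\n'
for rec in read_into_master(tsv_text, "\t", {"CustID": "identity", "Name": "person_name",
                                             "Street": "address_part1", "City": "address_part2",
                                             "PIN": "post_code"}):
    print(rec)
for rec in read_into_master(csv_text, ",", {"Ref": "identity", "Company": "organisation",
                                            "Address": "address_part1", "Town": "address_part2",
                                            "Postcode": "post_code"}):
    print(rec)
```

Keeping the mapping per file, rather than per record, reflects the point above: each file may differ, but within a file the layout is expected to be consistent.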

The country must always be known: either it is specified in every record, or an entire file pertains to a single country.

Data (in ASCII or Unicode UTF-8) will be cleaned by us before processing to remove control characters (e.g. tab, line feed) and extra, leading or trailing spaces.
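The cleaning step can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the function name `clean_field` is hypothetical, and "control character" is taken to mean Unicode general category Cc. It is not the production software.

```python
import re
import unicodedata


def clean_field(value: str) -> str:
    """Replace control characters with spaces, then collapse and trim whitespace."""
    # Control characters (Unicode category "Cc"), such as tab, line feed or NUL,
    # become a single space so that adjacent words are not glued together.
    no_controls = "".join(
        " " if unicodedata.category(ch) == "Cc" else ch for ch in value
    )
    # Collapse runs of whitespace to one space and drop leading/trailing spaces.
    return re.sub(r"\s+", " ", no_controls).strip()


print(clean_field("  12\tHigh  Street\n London "))  # prints: 12 High Street London
```

Replacing control characters with a space (rather than deleting them) is a deliberate choice: a tab between two address fields should still separate the words once the fields are joined.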

The following information is used for duplicate detection:

  • Identity, e.g. internal record number, national or passport ID – MANDATORY
  • Person name – necessary if searching for individuals
  • Designation
  • Organisation
  • Address part 1 (building & street details, sub area, area) – MANDATORY
  • Address part 2 (city, district, county, prefecture, state, country) – MANDATORY
  • Post code – Recommended
  • Phone(s)
  • Date, e.g. birth, incorporation, marriage
  • Any two additional attribute fields, up to 255 bytes each
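The field list above can be expressed as a record type with a simple check for the mandatory and recommended fields. The sketch below is hypothetical: the class `InputRecord`, the function `validate`, the field names and the error/warning wording are our own illustration, not an actual input specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_ATTRIBUTE_BYTES = 255  # limit stated for the two free attribute fields


@dataclass
class InputRecord:
    # Mandatory fields
    identity: str          # e.g. internal record number, national or passport ID
    address_part1: str     # building & street details, sub area, area
    address_part2: str     # city, district, county, prefecture, state, country
    # Optional fields
    person_name: Optional[str] = None
    designation: Optional[str] = None
    organisation: Optional[str] = None
    post_code: Optional[str] = None  # recommended
    phones: List[str] = field(default_factory=list)
    date: Optional[str] = None       # e.g. birth, incorporation, marriage
    attribute1: Optional[str] = None
    attribute2: Optional[str] = None


def validate(record: InputRecord, searching_individuals: bool = False) -> List[str]:
    """Return error/warning messages; an empty list means the record is usable."""
    problems: List[str] = []
    for name in ("identity", "address_part1", "address_part2"):
        if not getattr(record, name).strip():
            problems.append(f"error: {name} is mandatory")
    if searching_individuals and not record.person_name:
        problems.append("error: person_name is needed when searching for individuals")
    if not record.post_code:
        problems.append("warning: post_code is recommended")
    for name in ("attribute1", "attribute2"):
        value = getattr(record, name)
        # The limit is in bytes, so measure the UTF-8 encoding, not characters.
        if value is not None and len(value.encode("utf-8")) > MAX_ATTRIBUTE_BYTES:
            problems.append(f"error: {name} exceeds {MAX_ATTRIBUTE_BYTES} bytes")
    return problems


print(validate(InputRecord("C001", "12 High Street", "London, UK", post_code="SW1A 1AA")))
```

Measuring the attribute limit in UTF-8 bytes matters for non-ASCII data: 200 accented characters fit in 200 characters but take 400 bytes.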

A person can have multiple addresses, e.g. a person can have a residence and work address, or more.

We prefer to receive data before any address cleaning has been done; otherwise, we recommend sending the data both before and after cleaning.


Before processing, the client should approve a master format that encompasses all input formats. Each group of records marked as similar will be given a unique cluster number. We provide the output in a variety of formats convenient for you. Please see the FAQ.
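One common way to turn pairwise "these two records are similar" decisions into cluster numbers is a union-find (disjoint-set) structure. The Python sketch below assumes the matching step has already produced a list of similar pairs; the function name `assign_cluster_numbers` is hypothetical, and whether the production software groups records transitively in this way is an assumption, not something stated here.

```python
from typing import Dict, Iterable, List, Tuple


def assign_cluster_numbers(record_ids: List[str],
                           similar_pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Give every record a cluster number; records linked by similarity share one."""
    parent = {rid: rid for rid in record_ids}

    def find(x: str) -> str:
        # Walk to the root of x's set, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union step: merge the sets of every pair flagged as similar.
    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number clusters 1, 2, 3, ... in order of first appearance in record_ids.
    numbers: Dict[str, int] = {}
    clusters: Dict[str, int] = {}
    for rid in record_ids:
        root = find(rid)
        if root not in numbers:
            numbers[root] = len(numbers) + 1
        clusters[rid] = numbers[root]
    return clusters


print(assign_cluster_numbers(["A", "B", "C", "D", "E"], [("A", "C"), ("C", "E")]))
# prints: {'A': 1, 'B': 2, 'C': 1, 'D': 3, 'E': 1}
```

Note the design choice: because A matches C and C matches E, A and E land in the same cluster even though they were never matched directly. Records with no match get a cluster of their own.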

Ongoing processing

Data can be processed in instalments: after the initial processing, additional data can be sent, and only the duplicates involving the new data are reported back.
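Instalment processing can be sketched as an index that persists between batches: each new record is compared with everything seen so far, and only pairs involving a new record are reported. In this hypothetical Python sketch, `match_key` (an exact match on normalised address part 1 and post code) stands in for the far richer fuzzy matching the real software performs, and the class `IncrementalDeduplicator` is our own illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def match_key(record: Dict[str, str]) -> Tuple[str, str]:
    # HYPOTHETICAL stand-in for real fuzzy matching: exact match on
    # case-folded, space-normalised address part 1 and post code.
    return (" ".join(record["address_part1"].casefold().split()),
            record["post_code"].replace(" ", "").casefold())


class IncrementalDeduplicator:
    def __init__(self) -> None:
        # Maps a match key to the ids of all records seen so far with that key.
        self._index: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def process_batch(self, batch: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
        """Add a batch; return only duplicate pairs that involve a record from this batch."""
        reported: List[Tuple[str, str]] = []
        for record_id, record in batch.items():
            key = match_key(record)
            # Compare the new record with everything seen so far, old or new;
            # old-versus-old pairs were reported earlier and are not repeated.
            for earlier_id in self._index[key]:
                reported.append((earlier_id, record_id))
            self._index[key].append(record_id)
        return reported


dedup = IncrementalDeduplicator()
print(dedup.process_batch({
    "1": {"address_part1": "12 High Street", "post_code": "SW1A 1AA"},
    "2": {"address_part1": "12  high street", "post_code": "sw1a1aa"},
}))  # prints: [('1', '2')]
print(dedup.process_batch({
    "3": {"address_part1": "14 High Street", "post_code": "SW1A 1AA"},
    "4": {"address_part1": "12 High Street", "post_code": "SW1A 1AA"},
}))  # prints: [('1', '4'), ('2', '4')]
```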

Manual examination

In sensitive cases, and for higher quality, we recommend manual examination (eyeballing) to further improve results. This can cover the entire data set or be applied selectively according to pre-agreed criteria.

Next steps

Try us out. We will do a free demonstration with basic processing of up to 1 million records. If you already have a de-duplicated file of any size, we will detect further duplicates in it. We do expect to be paid if you are happy with our efforts.

Request a quotation



TATA Share Registry (now known as TSR Darashaw)

This refers to the Identity de-duplication work done for us by IdentLogic Systems. We would like to record our appreciation of the work done on the same. Even though we knew that there was much duplication, you were able to throw up many more matches. Your team also deserves kudos for their co-operative and problem solving attitude. – Sandeep Murdeshwar, Chief Technology Officer


Life Insurance Corporation of India

IdentLogic Systems provides the only possible solution in the world!

BACKGROUND: While inaugurating the Life Insurance Corporation of India's hi-tech data centre, Chairman Mr. T.S. Vijayan remarked that the data warehousing project is the largest life insurance data warehouse in the world and would benefit consumers by consolidating policies so that payments can be made on a single day, ensuring that corrections are applied uniformly across all policies, etc. The project cost is Rs. 35 crores. The 280 million policies loaded use 8 terabytes of storage, and this is expected to grow to 56 terabytes. Source: The Economic Times

All this was possible only because of the duplicate detection software sold & supported by IdentLogic Systems. Initially, the project was planned around other duplicate detection software sometimes used in large data warehousing projects. However, it soon became apparent that none of them could handle the vast Indian data set, with its significant problems of data accuracy, transposed words, distortions, abbreviations & missing data elements. The solution provided by IdentLogic Systems was tested & adopted well after the project had started.

… product performed well – Venkatachalam Balivada, Wipro (LIC Data Warehousing Project Manager)
