15 May

Reformatting addresses before or after duplicate removal?

Most solutions correct, improve or standardise addresses before duplicate detection, on the grounds that this makes duplicates easier to detect.

This is a mistake: corrections may introduce errors of their own (noise), and those errors then carry through into duplicate detection. If a solution is able to correct addresses, it should do so within the duplicate detection process itself rather than as a separate step beforehand.

Also, corrections are irreversible. The original data is mangled as well as enhanced, which makes it harder to determine whether two records really match.
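
As a rough illustration of the idea, here is a minimal Python sketch in which standardisation happens only inside the comparison step and the stored address is never overwritten. Everything in it is an assumption made for the example: the Record type, the match_key and find_duplicates functions, and the tiny abbreviation table are hypothetical, and exact key equality stands in for whatever matching logic a real solution would use.

```python
import re
from dataclasses import dataclass

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


@dataclass(frozen=True)
class Record:
    record_id: int
    address: str  # kept exactly as supplied, never overwritten


def match_key(address: str) -> str:
    """Build a throwaway comparison key; the stored address is untouched."""
    # Lower-case, replace punctuation with spaces, then split into words.
    words = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    # Expand known abbreviations for comparison purposes only.
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def find_duplicates(records):
    """Group records whose comparison keys are equal, returning the originals."""
    groups = {}
    for record in records:
        groups.setdefault(match_key(record.address), []).append(record)
    return [group for group in groups.values() if len(group) > 1]
```

Given records with the addresses "12 High St.", "12 high street" and "7 Mill Rd", find_duplicates would group the first two together while each record still holds its address as originally entered. If the standardisation rule turns out to be wrong, only the key is affected, and the match can be re-examined against the untouched data.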