Beware diacritic characters when integrating with SQL

I’m certain you all know the above and practice it regularly. First a little background…

In Dynamics GP we wrote a very basic “CRM-like” system as a .NET GP add-in that sits on top of the SOP module. It introduces the concept of contact records, with a many-to-many relationship to customers/debtors in GP. The list of contacts associated with an account can be viewed from a sales order and from the debtor card. The contacts are synchronised to MailChimp (SaaS email marketing), and marketing click-throughs and email opens are synced back to be shown next to the contact record. The contacts are also synchronised with the various ecommerce websites that feed GP, with contacts soft-linked to website users.

The website integration means a merge is required to accommodate new and updated records when users update their details on the websites. This is where my oversight came to light: duplicate records were being created, and it turned out to be due to diacritics. Below is an example of a duplicate record.

FirstName
Kristján
Kristjan

We know these are the same person, but the SQL MERGE statement, due to the default collation on the database, does not see them as the same. Instead it sees two distinctly different names and thus creates a new contact record for the second instance, where it should (in our case) be merging changes into the first instance. This is an oversimplified version of what happened, as there are other keys involved and a lot of business rules. Obviously SQL is not doing anything wrong, but it is not our desired behaviour for this particular task.

It is easy to resolve: when comparing records, for our purpose, we override the default collation and use an Accent Insensitive (AI) version instead, for example:

COLLATE Latin1_General_CI_AI

where “AI” at the end of the collation name is the key to the insensitive comparison.
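To see the difference for yourself, a quick check along these lines can be run in a query window. It is only a sketch: Latin1_General_CI_AS is used here as a stand-in for an accent-sensitive default, and your database's actual default collation may differ.

```sql
-- Compare the two spellings under an accent-sensitive (AS) and an
-- accent-insensitive (AI) collation. The AS collation is an assumed
-- stand-in for the database default.
SELECT
    CASE WHEN N'Kristján' = N'Kristjan' COLLATE Latin1_General_CI_AS
         THEN 'match' ELSE 'no match' END AS AccentSensitive,
    CASE WHEN N'Kristján' = N'Kristjan' COLLATE Latin1_General_CI_AI
         THEN 'match' ELSE 'no match' END AS AccentInsensitive;
```

The first column should come back as 'no match' and the second as 'match'.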

WHERE
    t2.FirstName = t1.FirstName COLLATE Latin1_General_CI_AI
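For a MERGE, the same override goes into the ON clause. The following is a minimal sketch only, not our production code: the temp tables, column names and the Email key are all made up for illustration, and the real merge has more keys and business rules.

```sql
-- Hypothetical target (existing contacts) and source (website feed).
-- The columns are given an accent-sensitive collation explicitly so the
-- example behaves the same regardless of the server default.
CREATE TABLE #Contact (
    ContactId int IDENTITY(1,1) PRIMARY KEY,
    Email     nvarchar(100) COLLATE Latin1_General_CI_AS,
    FirstName nvarchar(50)  COLLATE Latin1_General_CI_AS,
    Phone     nvarchar(30)
);
CREATE TABLE #WebContact (
    Email     nvarchar(100) COLLATE Latin1_General_CI_AS,
    FirstName nvarchar(50)  COLLATE Latin1_General_CI_AS,
    Phone     nvarchar(30)
);

INSERT INTO #Contact (Email, FirstName, Phone)
VALUES (N'k@example.com', N'Kristján', N'555-0100');

INSERT INTO #WebContact (Email, FirstName, Phone)
VALUES (N'k@example.com', N'Kristjan', N'555-0199');

-- The COLLATE override makes the name comparison accent-insensitive,
-- so the website record updates the existing contact instead of
-- inserting a duplicate.
MERGE #Contact AS t1
USING #WebContact AS t2
    ON  t2.Email     = t1.Email
    AND t2.FirstName = t1.FirstName COLLATE Latin1_General_CI_AI
WHEN MATCHED THEN
    UPDATE SET t1.Phone = t2.Phone
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Email, FirstName, Phone)
    VALUES (t2.Email, t2.FirstName, t2.Phone);

SELECT ContactId, Email, FirstName, Phone FROM #Contact;

DROP TABLE #WebContact;
DROP TABLE #Contact;
```

With the override in place the final SELECT should return a single contact with the updated phone number; remove the COLLATE clause and you should get two rows instead. One thing to keep in mind is that applying a collation to a column in a comparison may stop SQL Server from using an index on that column, so it is worth checking the query plan on larger tables.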

The implementation depends on your own needs. My point for this post is to not forget about this issue when merging data from different sources where there may be a mixture of diacritic and non-diacritic text entered. Integration of data continues to have its challenges…
