Saturday, October 6, 2012

Winter '13 introduces State drop-downs but needs a data cleanse

It's been a while since I've posted anything new, but that doesn't mean I haven't found the need to fix issues with Salesforce. Here is an example that I built about six months ago and then forgot about until I heard about the upcoming Winter '13 release and its new drop-down lists for States and Provinces. This is a Pilot feature that you need to sign up for and, I believe, it will only be offered for Canada and the USA.

While I applaud SF's effort, I believe this approach is flawed. First, the issue goes beyond just the USA and Canada; second, data in these address fields can come from sources other than the Salesforce user interface! This is especially true if you import records from other systems, or from web forms or marketing tools such as Marketo, Pardot, or Hubspot, to name a few. How will this new Pilot ensure that all inserted values follow the rules set out in the picklist? I think this feature will not address that problem. Therefore, you still can't run reports based on the country or state fields! How broken is that?

A while ago I found a way to fix this problem. My fix seems versatile enough that it could be extended to more geographic regions, and to other data fields that need to accept a range of values yet be stored in a canonical form.

The technique makes use of a couple of SF Custom Settings Objects and a Trigger on the Account and Contact objects.


First let's look at the trigger, and then we will look at the data set. I only show the trigger on the Account object, but the one on the Contact object is very similar and easy to build from this code.

trigger AccountBeforeUpsert on Account (before insert, before update) {

    // Cache all DataMatch__c rows once; the keys are the lower-case variants.
    Map<String, DataMatch__c> dataMatches = DataMatch__c.getAll();
    for (Account theAccount : Trigger.new) {
        // Canonicalize the country field.
        String target = theAccount.BillingCountry;
        if (target != null) {
            target = target.toLowerCase();
            DataMatch__c match = dataMatches.get(target);
            if (match != null) {
                DataIntegrity__c value = DataIntegrity__c.getValues(match.DataIntegrityKey__c);
                if (value != null) {
                    theAccount.BillingCountry = value.Value__c;
                }
            }
        }
        // Canonicalize the state field the same way.
        target = theAccount.BillingState;
        if (target != null) {
            target = target.toLowerCase();
            DataMatch__c match = dataMatches.get(target);
            if (match != null) {
                DataIntegrity__c value = DataIntegrity__c.getValues(match.DataIntegrityKey__c);
                if (value != null) {
                    theAccount.BillingState = value.Value__c;
                }
            }
        }
    }
}
As you can see the pattern is quite simple.
  1. Before the record is inserted or updated check the field you want to canonicalize.
  2. Convert the string to all lower-case characters. Note that this method probably will not work in some languages. I know that you must be careful with the locale when working in Java, but I have not tried this method in Salesforce. Here is a good test: switch to the locale for Turkey, convert any word with an upper-case I to lower-case i, and then back to upper case. If you get the same upper-case I back, then it works.
  3. Look up the value in a set of data that has every possible combination of your field (I'll explain more in a minute).
  4. If found then use an associated key to look up the canonical (standard) form to insert into the record.
For example, if the incoming country field contains any of the following variations, the trigger will insert "U.S.A":
  • U.S.
  • United States
  • United States of America
  • US
  • USA
  • U.S.A
  • and more
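The four steps above can be sketched in plain Java, the same HashMap analogy I use for the Custom Settings below. This is just an illustrative model, not the Apex itself: the two maps stand in for DataMatch__c and DataIntegrity__c, and the "United States" key is a value I made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class Canonicalizer {
    // Stand-ins for the two Custom Settings: variant -> key, key -> canonical value.
    private final Map<String, String> dataMatch = new HashMap<>();
    private final Map<String, String> dataIntegrity = new HashMap<>();

    public Canonicalizer() {
        // A few of the country variants listed above; all stored lower case.
        for (String variant : new String[] {"u.s.", "united states",
                "united states of america", "us", "usa", "u.s.a"}) {
            dataMatch.put(variant, "United States");   // hypothetical key
        }
        dataIntegrity.put("United States", "U.S.A");   // the canonical form
    }

    // Steps 1-4: lower-case the input, look up the variant, then map the
    // shared key to its canonical form. Unrecognized input passes through.
    public String canonicalize(String raw) {
        if (raw == null) return null;
        String key = dataMatch.get(raw.toLowerCase());
        if (key == null) return raw;
        String value = dataIntegrity.get(key);
        return value != null ? value : raw;
    }
}
```

Anything not found in the first map is left untouched, which matches the trigger: it only rewrites fields it recognizes.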

Data Tables / Custom Settings

The trigger depends on two custom settings: DataMatch__c and DataIntegrity__c. Create a custom setting via Setup... Develop... Custom Settings. These are similar to Salesforce Objects but not quite the same. The way I'm using these two objects, they act like a Java HashMap or C# Dictionary. Once you create a Custom Setting, the trick is to press the Manage button to actually create the list. But once I got the basic structure the way I wanted it, I used a data loader to bulk load the data from a CSV file (which I built by hand! Time consuming... tedious... but necessary).
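For what it's worth, the hand-built loader file for DataMatch__c might look something like this. The Name column holds the lower-case variant and DataIntegrityKey__c the shared key; the "United States" key rows are my own illustration, only the D.C. rows match the sample data shown below.

```csv
Name,DataIntegrityKey__c
"u.s.","United States"
"usa","United States"
"united states","United States"
"d.c.","District of Columbia"
"dc","District of Columbia"
```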

   select Name, DataIntegrityKey__c from DataMatch__c

Here is some sample content of this object:

  "d.c.","District of Columbia"
  "dc","District of Columbia"
Now, we could just insert the value in the second column, but suppose someone later decides they want to use abbreviations only? You'd then have a major task on your hands! So I introduced a second Object (table) that maps the key in the first table to the value we want in the record.

   select Name, Value__c from DataIntegrity__c

Here is some sample data from this second table:
  "American Samoa","American Samoa"
  "British Columbia","British Columbia"
  "District of Columbia","District of Columbia"
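To see why the indirection pays off, here is a minimal Java sketch (plain HashMaps again standing in for the two Custom Settings). Switching the whole org from full names to abbreviations means editing one row in the second table; the many variant rows in the first table never change.

```java
import java.util.HashMap;
import java.util.Map;

public class CanonicalPolicy {
    // variant -> key (DataMatch__c): many spellings, one shared key
    static final Map<String, String> dataMatch = new HashMap<>();
    // key -> stored value (DataIntegrity__c): the only row you ever edit
    static final Map<String, String> dataIntegrity = new HashMap<>();

    static {
        dataMatch.put("d.c.", "District of Columbia");
        dataMatch.put("dc", "District of Columbia");
        dataMatch.put("district of columbia", "District of Columbia");
        dataIntegrity.put("District of Columbia", "District of Columbia");
    }

    static String resolve(String raw) {
        String key = dataMatch.get(raw.toLowerCase());
        return key == null ? raw : dataIntegrity.getOrDefault(key, raw);
    }
}
```

After a one-row change, say dataIntegrity.put("District of Columbia", "DC"), every variant ("d.c.", "dc", "District of Columbia") resolves to the abbreviation.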

First Time Preload

This method works great once it is installed. But what about all the existing data? No problem! Just do a data export of all accounts and then follow with a data load. (I believe you can import the data as-is without changing anything.) On load the trigger will fire for each record and check the fields against the data tables. Voilà, you have data integrity, and users can enter the data in a way that suits them. You can also include common misspellings in the Data Match table.