Syncing Google Analytics Data with a CRM using Email Hashes (While Complying with GDPR)

Tags: Google Analytics, Salesforce, Javascript, Compliance
Type: Tutorials
First Published: March 14, 2022

While setting up a basic email marketing flow for a client, I hit a few snags trying to come up with an easy way to keep website activity in sync between the CRM & Google Analytics using the email address. There were issues related to both compliance and technology. In the end, the solution was pretty simple but it took a lot of searching around, so here’s a summary.

(Scroll all the way down to the bottom if you just want the javascript that makes it work.)

The Goal

In Google Analytics you can use a feature called User-ID to associate a unique identifier with your website visitors. This has tons of great benefits, mostly related to being able to unify your analytics for a particular user across multiple sessions and devices. For me, though, the big goal is to eventually be able to link website activity data from Google Analytics back to specific lead and contact records in Salesforce. In order to accomplish that goal at some point in the future, there are some pieces which need to be put in place right away.

The Strategy

In order to make this work I need a “user id” that can tie everything together. Salesforce is the system of record. But rather than use the Salesforce lead or contact id (which can change if duplicate records are merged), I want to use the email address.

The email address is a natural key rather than a synthetic key, which makes it well suited to use cases where duplicate records are a persistent issue. CRM contacts often end up with dupes due to common things like filling out website forms multiple times and uploading lead lists from marketing events. By using the email address as the Google Analytics “user id” I can reduce the risk that GA will get out of sync with the CRM.

Of course, “natural keys” that relate to humans go by another name: Personally Identifiable Information (PII). Using PII as our user id has some downsides, but we’ll get to those in a minute.

The Components

Here’s a breakdown of the client’s setup.

  • CRM: Salesforce is the primary system of record for leads and contacts.
  • Email Marketing Tool: Mailchimp is used for marketing automation. It’s not the most sophisticated tool but it’s cheap, reliable and fairly simple for end users to use on a day-to-day basis. Also, the Mailchimp Salesforce integration is surprisingly full featured (and works even on their cheapest plans) and lets you build queries of contacts & leads which are automatically pulled from Salesforce into Mailchimp in a nightly batch job.
  • Website: The website is running on WordPress. 🤷🏼‍♂️
  • Tracking: Google Analytics, naturally. Eventually I’m planning on setting up a system to automatically ingest website activity back into Salesforce. But for now my goal is just to make sure that GA is tracking activity in a way that at least sets me up to link it back to Salesforce in the future.

The Flow

  1. Salesforce pushes contacts and leads to Mailchimp. The email address (obviously) comes along for the ride.
  2. Mailchimp sends emails to the users. The emails contain CTA links. We’ve appended “?email=*|EMAIL|*” to the end of each link URL which causes Mailchimp to include the user’s email as a query parameter.
  3. When the user hits the website, some javascript reads the email address from the URL and passes it to Google Analytics as the User ID. (This is the problematic step; details below.)
  4. That’s it! Once we tell GA who the user is, it will cookie them and ensure that all future (and past) data tracked against the current browser session is linked to their email address. This will ensure that we can eventually link that activity back to specific Salesforce records.
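
Step 2 can be done by hand in the Mailchimp editor, but if you generate CTA links in code, a small helper keeps the query string well formed. This is only a sketch: `addEmailMergeTag` is a hypothetical name and the example URLs are placeholders. It uses plain string handling because `URLSearchParams` would percent-encode the `|` characters and Mailchimp would no longer see a merge tag.

```javascript
// Hypothetical helper: appends Mailchimp's *|EMAIL|* merge tag to a CTA link.
function addEmailMergeTag(url, paramName = 'email') {
  // Split off any #fragment so the parameter lands in the query string
  const hashIndex = url.indexOf('#');
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? '' : url.slice(hashIndex);

  // Use "?" for the first parameter and "&" for any later one
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + paramName + '=*|EMAIL|*' + fragment;
}

console.log(addEmailMergeTag('https://example.com/offer'));
// https://example.com/offer?email=*|EMAIL|*
```

Mailchimp swaps the merge tag for the recipient’s address when it sends the email, so the link that reaches the inbox carries the real email.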

The Problem

So what’s the issue? Well, you’re not supposed to send personally identifiable information to Google Analytics: Google’s own policies forbid it, and the EU’s GDPR puts strict rules around how personal data gets shared. Not only do we want to be in compliance with Google’s policies (so they don’t shut down our GA account) but we also really need to comply with GDPR (so European regulators don’t track us down and fine us). And frankly, we also need to be good stewards of our users’ valuable data if we really want to forge tenable business paths into the bold, data-driven future.

Ok, so we need to be careful with PII. And email addresses definitely count as PII, so what do we do? We hash them! (Google’s documentation calls this encryption, but what it describes is hashing.) Per Google’s documentation:

You can send Google Analytics an encrypted identifier or custom dimension that is based on PII, as long as you use the proper encryption level. Google has a minimum hashing requirement of SHA256 and strongly recommends the use of a salt, with a minimum of 8 characters.

A hashing algorithm is a process of turning a secret string (like an email) into a jumbled version of itself (a hash) via a process which is easy to do in one direction (email → hash) and difficult to do in the other (hash → email). Hashing allows us to handle the email address in a way that lessens the risk of exposure to third parties like, in this case, Google Analytics. MD5 is one of the most common ways to hash an email address, but it’s not very secure and Google requires something better. SHA-256 is better, and it’s widely supported in most programming languages, including Salesforce’s Apex language. So I’m going to use SHA-256.
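
To make the “same input, same hash” property concrete, here is a minimal sketch in Node.js (not browser code) using Node’s built-in `crypto` module. The `hashEmail` name, the placeholder salt, and the example addresses are all assumptions for illustration, as is lowercasing the email before hashing. Any other system that needs matching hashes would have to apply the same normalization and the same salt.

```javascript
// Node.js sketch: compute the salted SHA-256 hash of an email address
const { createHash } = require('node:crypto');

function hashEmail(email, salt) {
  // Lowercase first so "Jane@Example.com" and "jane@example.com" hash identically
  return createHash('sha256')
    .update(email.toLowerCase() + salt, 'utf8')
    .digest('hex');
}

const salt = 'make_this_unique'; // placeholder salt, pick your own
const a = hashEmail('Jane@Example.com', salt);
const b = hashEmail('jane@example.com', salt);
const c = hashEmail('jane@example.org', salt);

console.log(a === b);  // true: same email (ignoring case) gives the same hash
console.log(a === c);  // false: a different email gives a different hash
console.log(a.length); // 64: SHA-256 is 32 bytes, written as 64 hex characters
```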

The Solution — Part 1

Somewhere in our data flow we’re going to need to hash the email using the SHA-256 algorithm. Mailchimp can’t do it. Salesforce can definitely do it (and in fact that would be a better solution) but for this project I’m trying to avoid adding custom server-side code to Salesforce. So that only leaves one place: javascript running on the website itself.

I’m already using a free plugin to insert the Google Analytics tracking code javascript into the header of the WordPress website, so that part is easy. I’ll just add a bit more code to the bottom of that script tag and be good to go.

The first tricky bit is figuring out how to do SHA-256 hashing in vanilla javascript in a web browser. The code to do this is below. Note: It relies on the Web Crypto API, which is supported in all current browsers, but only on pages served over HTTPS (crypto.subtle is restricted to secure contexts). IE 11 only ships an older, prefixed version of the API (window.msCrypto) that doesn’t return promises, so this code won’t run there. If maximum browser support were critical I could import a third party crypto library, but that would slow down page loads and add complexity so I’m skipping it since IE support isn’t critical for my use case.

// This code assumes that the email address is in a URL query param called `email`.
let emailParamMatch = window.location.href
  .replace(/%40/gi, '@') // change "%40" to "@"
  .match(/[?&]email=([^&#]*)/i); // extract the "email" query param (stopping at "&" or "#"), the value is in match group 1
let emailParamValue = emailParamMatch ? emailParamMatch[1] : null;

// It's good to add a random "salt" string to prevent attackers from using a hash dictionary.
// This can be literally anything, like the name of your company, but you'll need to use the same 
// exact salt elsewhere if you want the hashes to match up with another system.
let hashSalt = 'make_this_unique';

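// Make sure that the browser has the crypto.subtle library first (might not work in IE).
// Checking window.crypto avoids a ReferenceError in browsers that lack the API entirely.
if (emailParamValue && emailParamValue.length && window.crypto && window.crypto.subtle) {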
  // Use the SubtleCrypto library to create a SHA-256 digest from the email + salt
  crypto.subtle.digest(
    'SHA-256',
    new TextEncoder().encode(emailParamValue.toLowerCase() + hashSalt)
  ).then(hashed => {
    // The digest is returned in a wonky format so we need to convert it to a hex string
    let encryptedEmail = [].map.call(
      new Uint8Array(hashed),
      b => ('00' + b.toString(16)).slice(-2)
    ).join('');

    // Done!
    console.log('Hashed and salted version of the email parameter:', encryptedEmail);
  });
}

This is only part of the solution. We still need to send this info to Google Analytics.
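
The least obvious line in the code above is the hex conversion, so here it is pulled out as a standalone sketch. `bufferToHex` is a hypothetical helper name, not part of the Web Crypto API. It takes the `ArrayBuffer` that `crypto.subtle.digest()` resolves with and turns each byte into two hex characters.

```javascript
// Hypothetical helper: convert an ArrayBuffer into a lowercase hex string,
// two characters per byte (so a 32-byte SHA-256 digest becomes 64 characters).
function bufferToHex(buffer) {
  return Array.from(new Uint8Array(buffer))
    .map(byte => byte.toString(16).padStart(2, '0'))
    .join('');
}

console.log(bufferToHex(new Uint8Array([0, 15, 255]).buffer)); // "000fff"
```

The zero padding matters: without it the byte 15 would be written as "f" instead of "0f", and the resulting string would no longer match hashes produced by other systems.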

The Solution — Part 2

We’re almost there but we have one last problem: the URL of the page. We just used our email marketing tool to put the user’s email address in a query parameter in the URL. This is what allows our javascript to parse it and hash it. But when Google Analytics tracks the page view for this visit, the email address will get tracked along with it. In plain text. 🤦🏼‍♂️

That’s not good! In fact, if we go live with this solution we could technically still be causing a GDPR violation. So we need to fix that. If you google around, you’ll come across Simo Ahava’s solution which very elegantly redacts all sorts of PII from every single hit we send to Google Analytics. This approach is very sophisticated. It’s also pretty complex to implement, especially if you’re not already using GTM. And for my use case it feels like overkill. I’m really just trying to remove one specific url parameter.

Luckily there is a simpler solution. We just need to make a little change to the GA tracking code. This is the default header tracking code which is automatically generated for you in the Google Analytics admin UI:

// This is the default, it tracks a page view with the current URL
gtag('config', 'GA_MEASUREMENT_ID');

The documentation for the gtag() command explains that the command also accepts an object containing overrides for the page view parameters. Using that syntax, here’s what it would look like to erase the word “something” from our URLs before tracking them in GA:

// We can modify the URL if desired
gtag('config', 'GA_MEASUREMENT_ID', {
  'page_location': document.location.href.replace(/something/gi, '')
});
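
If you would rather not rely on a regular expression for the removal, the same idea can be written as a small function that drops one named parameter and leaves the rest of the URL alone. This is a sketch under my own assumptions: `stripQueryParam` is a hypothetical name, and it matches the parameter name case-insensitively.

```javascript
// Hypothetical helper: remove a single query parameter from a URL string.
function stripQueryParam(url, paramName = 'email') {
  // Keep any #fragment aside so it can be re-attached unchanged
  const hashIndex = url.indexOf('#');
  const beforeHash = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? '' : url.slice(hashIndex);

  const queryIndex = beforeHash.indexOf('?');
  if (queryIndex === -1) return url; // no query string, nothing to remove

  const path = beforeHash.slice(0, queryIndex);
  // Keep every "name=value" pair whose name is not the one being removed
  const kept = beforeHash
    .slice(queryIndex + 1)
    .split('&')
    .filter(pair => pair.split('=')[0].toLowerCase() !== paramName.toLowerCase());

  return path + (kept.length ? '?' + kept.join('&') : '') + fragment;
}

console.log(stripQueryParam('https://example.com/page?utm_source=mc&email=a@b.com&x=1#top'));
// https://example.com/page?utm_source=mc&x=1#top
```

The result can be passed straight to gtag as the 'page_location' value, for example stripQueryParam(document.location.href).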

So we just need to erase our email parameter before passing it to GA. Putting it all together looks like this:

<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=GA_MEASUREMENT_ID"></script>
<script>

  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  // Remove email param from URL before tracking if present.
  // Matching the leading "?" or "&" keeps parameters like "useremail" intact.
  gtag('config', 'GA_MEASUREMENT_ID', {
    'page_location': document.location.href.replace(/([?&])email=[^&#]*&?/i, '$1')
  });
  
  // Extract and hash the email parameter if present
  let emailParamMatch = window.location.href.replace(/%40/gi, '@').match(/[?&]email=([^&#]*)/i);
  let emailParamValue = emailParamMatch ? emailParamMatch[1] : null;
  let hashSalt = 'make_this_unique';

  // window.crypto avoids a ReferenceError in browsers that lack the Web Crypto API
  if (emailParamValue && emailParamValue.length && window.crypto && window.crypto.subtle) {
    crypto.subtle.digest(
      'SHA-256',
      new TextEncoder().encode(emailParamValue.toLowerCase() + hashSalt)
    ).then(hashed => {
      let encryptedEmail = [].map.call(
        new Uint8Array(hashed),
        b => ('00' + b.toString(16)).slice(-2)
      ).join('');
      gtag('set', {'user_id': encryptedEmail});
    });
  }

</script>

This is the final solution. This code needs to be updated with your own Google Analytics tracking id and a unique hash salt string (like the name of your company). And if you go with a URL query parameter named something other than “email” you’ll need to change that too.

Conclusion

Now all of the Google Analytics website data will automatically be linked with the SHA-256 hashed version of the user’s email address when they click a CTA link in a marketing email. This will make it possible, in a later project, to link that activity back with the original contact records in our CRM.

In addition to being a co-founder of Quorum1, Hank is a serial entrepreneur and long-time Salesforce consultant. You can learn more about his work on his personal website.