PII find & fix – TagManager – clickety-clack.click

RESOURCES

MORE INFO
https://clickety-clack.click/pii-removal-from-analytics/
ARTICLES + TOOLS
NUKE CURRENT: https://www.practicalecommerce.com/removing-personal-data-from-google-analytic
https://www.simoahava.com/gtm-tips/remove-pii-google-analytics-hits/
NOT IN GA4: https://brianclifton.com/blog/2017/09/07/remove-pii-from-google-analytics/#comment-160455
DATA STUDIO REPORT: https://datastudio.google.com/u/0/reporting/1MI0l7m79xrEo6HnSiVtvsX9yx_q9fE_w/page/Reo0
- CURRENT JS CODE
- https://brianclifton.com/blog/2017/09/07/remove-pii-from-google-analytics/

How to Find and Fix PII in Google Analytics Data

Find PII in analytics

https://www.getelevar.com/guides/google-analytics/how-to-find-and-fix-pii-in-google-analytics-data/

Receiving a notice like the one below from Google can be a jarring experience.

The threat of being cut off from a significant revenue-driving marketing channel is nothing to take lightly:

Here are more specifics from Google around this policy:

To protect user privacy, Google policies mandate that no data be passed to Google that Google could use or recognize as personally identifiable information (PII). PII includes, but is not limited to, information such as email addresses, personal mobile numbers, and social security numbers.

Many contracts, terms of service, and policies for Google’s advertising and measurement products refer to “Personally Identifiable Information” (PII). You may find in such contracts, terms of service, and policies a prohibition against passing information to Google that Google could use or recognize as PII.

What Google Considers PII

Google interprets PII as information that could be used on its own to directly identify, contact, or precisely locate an individual. This includes:

email addresses
mailing addresses
phone numbers
full names or usernames

How to Look for PII in Your Google Analytics

There are a few different methods to accomplish this.

The easiest way is to go to:

Google Analytics > Behavior > Site Content > All Pages

And then filter with @ so it looks something like this:

This will bring up any pageviews that have common emails in them.

Another option is to use the GA Debugger Google Chrome extension.

Look for email addresses

If you need a more robust method to ensure you are looking for data like: email@domain.com (instead of just the @ symbol) then insert this regex into the filter field:

([a-zA-Z0-9_\.-]+)@([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})

This is a bit more strict in looking for the full email format.
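
As a quick sanity check outside of Google Analytics, the stricter pattern can be tried against a few page paths in JavaScript. This is only a minimal sketch: the sample paths and the `findEmails` helper are made up for illustration, and it assumes JavaScript's regex engine treats this pattern the same way the GA filter field does.

```javascript
// Email pattern from the text, compiled as a JavaScript regex.
// The global flag makes String.match return every match.
var EMAIL_RE = /([a-zA-Z0-9_\.-]+)@([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})/g;

// Hypothetical helper: returns every email-like substring in a page path.
function findEmails(pagePath) {
  return pagePath.match(EMAIL_RE) || [];
}

// Made-up sample paths.
var samplePaths = [
  '/thank-you?email=jane.doe@example.org&plan=pro',
  '/maps/@37.79,-122.39,17z',
  '/products/blue-widget'
];

samplePaths.forEach(function (path) {
  console.log(path, '->', findEmails(path));
});
```

Only the first path produces a match. The second path contains an @ but no email address, so a plain @ filter would flag it while the stricter pattern does not.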

Look for social security #’s

This regex looks for the common social security number format of 111-11-1111:

(\d{3}-?\d{2}-?\d{4})
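
Because both hyphens in this pattern are optional, it also matches any run of nine digits. The sketch below illustrates that with made-up paths; `looksLikeSsn` is a hypothetical helper name, not part of any tool mentioned here.

```javascript
// Social security pattern from the text (no global flag, so test() is stateless).
var SSN_RE = /(\d{3}-?\d{2}-?\d{4})/;

// Hypothetical helper: true when the value contains an SSN-like digit group.
function looksLikeSsn(value) {
  return SSN_RE.test(value);
}

console.log(looksLikeSsn('/apply?ssn=111-11-1111'));      // true
console.log(looksLikeSsn('/order/confirm?id=123456789')); // true: a false positive on a 9-digit ID
console.log(looksLikeSsn('/blog/2017/09/07/post'));       // false
```

Treat matches as candidates to review by hand rather than confirmed PII.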

Look for addresses

This regex looks for common address inclusions but is very subjective so it will need to be adapted to your own needs. The pipe symbol | is an OR condition.

(drive|street|road|dr\.|po box|rd\.)
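
In regex syntax an unescaped `.` matches any character, so `dr.` also matches `dry`. The following sketch compares a variant with unescaped dots against one with escaped dots; the sample paths are invented, and the case-insensitive flag is an assumption about how you would run the search.

```javascript
// Variant with unescaped dots: "dr." matches "dr" followed by ANY character.
var LOOSE_RE = /(drive|street|road|dr.|po box|rd.)/i;
// Variant with escaped dots: "dr\." matches only a literal "dr.".
var STRICT_RE = /(drive|street|road|dr\.|po box|rd\.)/i;

var productPath = '/products/hairdryer';          // no address here
var checkoutPath = '/checkout?addr=12+Main+Dr.';  // abbreviated "Drive"

console.log(LOOSE_RE.test(productPath));   // true: false positive on "dry"
console.log(STRICT_RE.test(productPath));  // false
console.log(STRICT_RE.test(checkoutPath)); // true
```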

Look for phone numbers

This is very similar to the social security regex but can be modified:

(\d{3}-?\d{3}-?\d{4})

This matches the format 800-867-5309. Because each hyphen is optional (-?), it also matches 8008675309. To match only the unhyphenated form, use:

(\d{3}\d{3}\d{4})
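
Phone numbers are also written with dots or spaces as separators. One possible adaptation, shown here as a sketch rather than a recommended pattern, is to allow a small set of separator characters; `PHONE_FLEX_RE` is my own variant and does not come from the original article.

```javascript
// Pattern from the text: optional hyphens only.
var PHONE_RE = /(\d{3}-?\d{3}-?\d{4})/;
// Assumed variant: also accept a dot or whitespace between the digit groups.
var PHONE_FLEX_RE = /(\d{3}[-.\s]?\d{3}[-.\s]?\d{4})/;

['800-867-5309', '8008675309', '800.867.5309', '800 867 5309'].forEach(function (sample) {
  console.log(sample, PHONE_RE.test(sample), PHONE_FLEX_RE.test(sample));
});
```

The original pattern matches the first two samples only, while the variant matches all four. In real URLs a space usually arrives encoded as `%20` or `+`, so the pattern may need further adjustment for your data.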

Look for names

This one is a bit more difficult to nail down but you can start with a regex like this that looks for names that are labeled:

(fn|ln|lastname|firstname|name|fullname)
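
Used as written, this pattern also matches longer words that merely contain one of the labels, such as `filename`. A sketch of one way to narrow it is to test only the query parameter names and require a whole-name match. The `^...$` anchoring and the `findNameParams` helper are my own assumptions, not part of the original guide.

```javascript
// Label pattern from the text, anchored so the whole parameter name must match.
var NAME_KEY_RE = /^(fn|ln|lastname|firstname|name|fullname)$/i;

// Hypothetical helper: lists query parameter names that look like name fields.
function findNameParams(pageUrl) {
  var query = pageUrl.split('?')[1] || '';
  return query.split('&')
    .map(function (pair) { return pair.split('=')[0]; })
    .filter(function (key) { return NAME_KEY_RE.test(key); });
}

console.log(findNameParams('/signup?firstname=Ada&ln=Lovelace&filename=cv.pdf'));
// -> [ 'firstname', 'ln' ]
```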

How to Remove PII from Pageview Hits

The only real way to remove PII from your own Google Analytics pageview hits is preventing this PII data from being sent to GA in the first place.

And the only way to fully protect yourself is by putting a safeguard in place that strips out this data from your hits being sent to GA via Google Tag Manager.

NOTE: Filters do not constitute removing this data. Do not put filters in place and think this fixes your issue.

If you are on Shopify then you can use our Google Tag Manager Suite App which has this PII redaction tag already in place.

This redaction was made possible by GTM guru Simo Ahava, whose method uses the customTask function via a custom HTML tag to redact this data within the pageview hit sent to Google Analytics.

Once you’ve implemented one of these methods:

Installing GTM Suite App and migrating Google Analytics hit data to GTM
Implementing Simo’s method of sitewide GA tracking via GTM

Then it’s time to test.

It’s pretty simple to test this. All you have to do is go to your website and put an email into your URL like this:

Then you should start seeing the REDACTED EMAIL within your pageview hits like this:

Next Steps

Once you’ve implemented this PII restriction then it’s time to move on to mitigating bounce rate issues.

what to do with current pii + nuclear option

YOU FOUND PII

https://www.clickinsight.ca/blog/infringing-google-analytics-pii-terms-service-find-out

YOU FOUND PII IN YOUR DATA. WHAT TO DO NOW?

So you checked your data, ran some reports, and discovered that PII is being collected in your GA account. What do you do now? Here are some suggestions:

The first step is to create filters, when possible, to ensure that personal information is no longer captured from this point forward.
Create brand new views/profiles to start collecting data without the identified personal information. They can be copies of your old profiles. In case your old profiles need to be deleted, you can use these new ones that are free from PII.
Assess the extent of your PII issue by determining for how long your GA account has been collecting personal information. This will provide you with a better idea of the impact and possibly provide insight to address the source of the problem.
Back up the data you may need for your reports and analysis from your old profiles. Then you must delete your old views/profiles.
Address the issues that are causing PII to be sent to GA. Put checks in place to prevent this from happening in the future.
You may need to consult the legal counsel in your organization in order to make the right decision. You should also be aware that there are privacy implications derived from the information being exposed in such an unsecured way in the page URL (yes, a complete new post could be written on that topic alone…).

If you are not sure how to proceed or you need some advice to address this issue, you can always contact us, or any Google Analytics Certified Partner, to guide and assist you through the process.

BUT, WHY SHOULD I LOSE MY DATA?

If you have read this post up to this point, you may be wondering: If in so many situations PII can be captured by accident, shouldn’t Google provide an alternative to solve this problem without having to delete all your data?

We agree. So let us ask you the following question:

What features would you like to see to prevent capturing PII and to remedy situations in which data collection got out of control?

https://www.practicalecommerce.com/removing-personal-data-from-google-analytic

DATA STUDIO REPORT:
https://datastudio.google.com/u/0/reporting/1MI0l7m79xrEo6HnSiVtvsX9yx_q9fE_w/page/Reo0

Removing Personal Data from Google Analytics

JANUARY 10, 2020 • MORGAN JONES

Google Analytics prohibits the collection of personally identifiable information. If detected, Google could delete PII from your reports. Users of Google Analytics should therefore be proactive to detect and then delete the PII if necessary.

I’ll explain how to do that in this post.

The best way to remove PII is not to send it to Analytics to begin with. For more, see “Best practices to avoid sending Personally Identifiable Information (PII),” a Google post.

Detecting PII

In the article above, Google explains where to search for PII in your reports. The main areas are:

User ID settings,
Content settings (pages and page titles),
Event settings (category, action, label),
Ecommerce settings (credit card, customer name, shipping and billing address, phone number),
Campaign dimensions (source, medium, campaign, ad content, term fields),
Site-search settings (search term or category),
Custom dimensions.

The screenshot below is an example of PII data. The Source field contains an email address, and the Medium field contains a phone number, which I’ve partially obscured.

I’ve created a Google Data Studio report to detect PII. It should not be your only method of detecting, but it should work in most cases. Monitor this report regularly. Modify the filters to fit your setup and take action if PII is detected.

Deleting PII

Google provides a process for removing PII. Navigate to Admin > Property > Data Deletion Requests > Create Data Deletion Request.

Enter the “Start Date,” “End Date,” and “Fields to Delete.” In the below example, I am deleting “All” fields because my campaign URL parameters contained PII, and “All” is required to remove this data.

After clicking “Submit,” the status is “In Grace Period.” It takes at least seven days for Google to remove the data. Check after a week or so to confirm.

You can cancel the deletion request in the interim. To do this, click on “In grace period” in the table and reach the “Data Deletion Request Review” page (below), which includes the status, details, and option to cancel.

When it completes the Data Deletion Request, Google will notify you in an email. Also, the status will update to “Completed.”

Finally, confirm that Google has removed the PII in your reports. The example Acquisition report, above, included my email address and phone number, respectively, in the Source/Medium fields. Note, below, that both are now gone.

REDACT PII in Analytics

https://brianclifton.com/blog/2017/09/07/remove-pii-from-google-analytics/
This is my PII extension to the initial post by the excellent Simo Ahava (his post: Remove PII From Google Analytics Hits).

Essentially, I had been looking for a way to block Personally Identifiable Information (PII) hits at the collection level i.e. using GTM, before the hit is sent to Google Analytics.

Why do this?

Putting the obvious requirement to not gather personal data to one side, if you are adding filters to your analytics views to delete PII, it is simply too late – the problem has already occurred and GDPR compliance has been broken! See my related post on why filters are not sufficient.

Previously, by using GTM I would simply drop any hits containing page URLs with an @ symbol i.e. in case the URL contained an email address. Apart from being quite blunt (not all URLs with an @ symbol contain an email address), this approach would not tackle email addresses being present in other hit types e.g. events, e-commerce data etc. It also did not tackle other PII types – such as telephone numbers, zip codes, usernames etc. Hence, the much better approach of Simo’s method – using GTM’s new customTask feature – was very interesting to me!
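
To illustrate why the earlier approach was blunt, the check it relies on can be sketched as a single test for an @ symbol. `containsAtSymbol` is a hypothetical helper; how such a check would be wired into a GTM trigger or variable is not shown and would depend on your container setup.

```javascript
// Hypothetical helper: true when the URL contains a literal or percent-encoded "@".
function containsAtSymbol(url) {
  return /@|%40/i.test(url);
}

console.log(containsAtSymbol('/thank-you?email=jane.doe@example.org'));    // true
console.log(containsAtSymbol('/maps/place/@37.7908871,-122.3925594,17z')); // true, although no email is present
console.log(containsAtSymbol('/products/blue-widget'));                    // false
```

Dropping every hit that passes this test would discard the map link along with the genuine email address, and it would not look at events or e-commerce fields at all.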

In this post, I extend his method by building out the regex more – for a more sophisticated email detection, and to capture other PII types…

Redact, rather than remove PII

The important thing here is to remember we are redacting the PII – not blocking or removing it. This is an important distinction. If PII is present, it is almost certain that the same PII is being logged elsewhere on your network – your web server logfile at the very least. Reporting this in your Google Analytics in redacted form means you have a monitoring system to flag to your web dev/IT team in order to fix and keep on top of. Essentially, to be compliant, PII issues need to be fixed at their source by your organisation. Alternatively, if you deleted the PII data from your reports or simply stopped collecting it in GA, you would metaphorically be sweeping the problem under the carpet.

Here is my adjusted code for your Custom JavaScript variable.

IMPORTANT: This is a straight replacement to Simo’s code. Replace example\.com with the domain of your website (lines 7 and 11). More on what this is for later. Thank you to the excellent David Vallejo for his JavaScript help – my skills are simply too rusty nowadays! As always, when working with code it’s up to you to test it and ensure it works correctly. No liability accepted!

UPDATE: This code was rewritten 29-Aug-2018 for better handling of the GA hit. In particular, it now works with GTM’s native YouTube trigger. Simply swap out the original code for this new one.


function() {
  return function(model) {
    try {
      // Add the PII patterns into this array as objects
      var piiRegex = [{
        name: 'EMAIL',
        regex: /[^\/]{4}(@|%40)(?!example\.com)[^\/]{4}/gi,
        group: ''
      },{
        name: 'SELF-EMAIL',
        regex: /[^\/]{4}(@|%40)(?=example\.com)[^\/]{4}/gi,
        group: ''
      },{
        name: 'TEL',
        regex: /((tel=)|(telephone=)|(phone=)|(mobile=)|(mob=))[\d\+\s][^&\/\?]+/gi,
        group: '$1'
      },{
        name: 'NAME',
        regex: /((firstname=)|(lastname=)|(surname=))[^&\/\?]+/gi,
        group: '$1'
      },{
        name: 'PASSWORD',
        regex: /((password=)|(passwd=)|(pass=))[^&\/\?]+/gi,
        group: '$1'
      },{
        name: 'ZIP',
        regex: /((postcode=)|(zipcode=)|(zip=))[^&\/\?]+/gi,
        group: '$1'
      }];

      // Fetch reference to the original sendHitTask
      var originalSendTask = model.get('sendHitTask');

      model.set('sendHitTask', function(sendModel) {
        // Read the payload from the model passed to this task
        var hitPayload = sendModel.get('hitPayload');
        // Convert the payload querystring into a key/value object
        var data = {};
        hitPayload.replace(/^\?/, '').split('&').forEach(function(pair) {
          var parts = pair.split('=');
          data[parts[0]] = parts[1];
        });
        // Loop through all keys and values
        Object.keys(data).forEach(function(key) {
          piiRegex.forEach(function(pii) {
            // Decode the value before matching it against the regex
            var val = decodeURIComponent(data[key]);
            if (val.match(pii.regex)) {
              // Redact the match and re-encode the value
              data[key] = encodeURIComponent(val.replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']'));
            }
          });
        });
        // Convert the data object back into a querystring
        sendModel.set('hitPayload', Object.keys(data).map(function(key) {
          return key + '=' + data[key];
        }).join('&'), true);
        // Send the hit with the redacted payload
        originalSendTask(sendModel);
      });
    } catch(e) {}
  };
}

Edit Your Tags

In order to function as intended, the customTask field needs to be added to ALL Google Analytics tags. That of course is cumbersome and does not scale with the volume of tags used. Therefore it is much better to apply this as a one-time fix in a Google Analytics settings variable. You can read more about the power of the Universal Analytics settings variable approach from Simo.

Now any hits sent by these tags will be parsed by this variable, which replaces the instances of PII with the string [REDACTED pii_type]. For example, a URL with path:

/test?tel=+44012345678&email=brian@me.com&other=bclifton@DOMAIN.com&firstName=brian&password=hello

would be replaced with:

/test?tel=[REDACTED TEL]&email=b[REDACTED EMAIL]om&other=bcli[REDACTED SELF-EMAIL]IN.com&firstName=[REDACTED NAME]&password=[REDACTED PASSWORD]
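
The replacement step can be reproduced outside of GTM by running the same patterns over the example path. This is a sketch under a few assumptions: the site domain is written as `domain\.com` to match the example, the ZIP pattern is left out because the path has no zip parameter, and `redact` is a hypothetical helper that applies the patterns to one already-decoded string rather than to a full hit payload.

```javascript
// Patterns copied from the variable code, with the site domain assumed to be domain.com.
var piiRegex = [
  { name: 'EMAIL',      regex: /[^\/]{4}(@|%40)(?!domain\.com)[^\/]{4}/gi, group: '' },
  { name: 'SELF-EMAIL', regex: /[^\/]{4}(@|%40)(?=domain\.com)[^\/]{4}/gi, group: '' },
  { name: 'TEL',        regex: /((tel=)|(telephone=)|(phone=)|(mobile=)|(mob=))[\d\+\s][^&\/\?]+/gi, group: '$1' },
  { name: 'NAME',       regex: /((firstname=)|(lastname=)|(surname=))[^&\/\?]+/gi, group: '$1' },
  { name: 'PASSWORD',   regex: /((password=)|(passwd=)|(pass=))[^&\/\?]+/gi, group: '$1' }
];

// Hypothetical helper: applies every pattern in order to a decoded string.
function redact(value) {
  return piiRegex.reduce(function (out, pii) {
    return out.replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']');
  }, value);
}

var path = '/test?tel=+44012345678&email=brian@me.com&other=bclifton@DOMAIN.com&firstName=brian&password=hello';
console.log(redact(path));
// /test?tel=[REDACTED TEL]&email=b[REDACTED EMAIL]om&other=bcli[REDACTED SELF-EMAIL]IN.com&firstName=[REDACTED NAME]&password=[REDACTED PASSWORD]
```

The label inside the brackets is the `name` value of the matching pattern, and the email patterns consume only the four characters on each side of the @, which is why fragments of the address remain visible.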

The Regex Changes Explained

-Extending the Email regex

For the EMAIL check, I make two changes to Simo’s original regex:

regex: /[^\/]{4}@(?!domain\.com)[^\/]{4}/gi,

Firstly, this matches four consecutive characters that are not a forward slash /, followed by @. Then, so long as this is not followed by domain.com, it matches the next 4 characters which are not a forward slash.

So apart from looking for an email address, I am doing two extra things:

1. I exclude any “innocent” links that may be captured as outbound links containing an @. Common examples are Google Maps and Flickr links, which contain a forward slash – the [^\/] part. Example links:

www.google.com/maps/place/University+of+San+Francisco+-+Folger+Bldg,+101+Howard+St,+San+Francisco,+CA+94105/@37.7908871,-122.3925594,17z/data=!3m1!
www.flickr.com/photos/123456@N06/sets/721576344/

2. I exclude the domain of the website itself from this check using a negative look ahead – the (?!….) part. Remember to replace domain\.com with your own domain e.g. brianclifton\.com in my case. I match for this separately next.
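
The effect of the [^\/] part can be checked against the two example links. In this sketch the regex is used without the global flag so that `test()` stays stateless, and the `mailto:` value is an invented example of a genuine email address.

```javascript
// EMAIL pattern as described in the text (site domain written as domain.com).
var EMAIL_RE = /[^\/]{4}@(?!domain\.com)[^\/]{4}/i;

var mapsLink = 'www.google.com/maps/place/University+of+San+Francisco+-+Folger+Bldg,+101+Howard+St,+San+Francisco,+CA+94105/@37.7908871,-122.3925594,17z/data=!3m1!';
var flickrLink = 'www.flickr.com/photos/123456@N06/sets/721576344/';
var mailtoLink = 'mailto:simo@hissite.com';

console.log(EMAIL_RE.test(mapsLink));   // false: a slash sits directly before the @
console.log(EMAIL_RE.test(flickrLink)); // false: a slash falls within the 4 characters after the @
console.log(EMAIL_RE.test(mailtoLink)); // true
```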

My suggestion for a separate regex is to catch and redact any payloads containing the SAME email domain as the site itself, with a different “name” value to the regular email redaction. That way such emails will be reported differently in Google Analytics, allowing the site owner to ignore these and monitor real PII infringements.

For example:

If a visitor comes to my site and I capture their email address as simo@hissite.com, that is redaction_message [REDACTED EMAIL]
If a visitor comes to my site and I capture my own email address as an outbound click-through to the site owner e.g. mysite@brianclifton.com, that is redaction_message [REDACTED SELF-EMAIL]

As the site owner, the first message is the one I should be paying attention to. The second message (not really PII as it belongs to the site owner) keeps me compliant with Google’s terms of service.

For the SELF-EMAIL check, the regex is almost identical:

regex: /[^\/]{4}@(?=domain\.com)[^\/]{4}/gi,

The difference now is that I do wish to include my own domain in the match and this is achieved via a positive look ahead – the (?=….) part.
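
A small sketch can show how the two look-aheads split emails into the two reporting labels. `classify` is a hypothetical helper and the addresses are made up; the site domain is assumed to be domain.com.

```javascript
// Negative look-ahead: any email domain EXCEPT the site's own.
var EMAIL_RE = /[^\/]{4}@(?!domain\.com)[^\/]{4}/i;
// Positive look-ahead: ONLY the site's own email domain.
var SELF_EMAIL_RE = /[^\/]{4}@(?=domain\.com)[^\/]{4}/i;

// Hypothetical helper: returns the redaction label a value would receive, or null.
function classify(value) {
  if (EMAIL_RE.test(value)) { return 'EMAIL'; }
  if (SELF_EMAIL_RE.test(value)) { return 'SELF-EMAIL'; }
  return null;
}

console.log(classify('simo@hissite.com'));  // EMAIL
console.log(classify('mysite@domain.com')); // SELF-EMAIL
console.log(classify('/contact'));          // null
```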

-Extending the regex to capture other PII

The original post by Simo was a simple pattern match – easy to use and maintain when you know the structure of the match you are looking for e.g. an @ symbol to match email addresses, or a well structured set of characters and numbers for strings like personal ID and social security numbers. However, I want to extend this to match less structured PII, for example people’s names, addresses, telephone numbers, zip codes etc.

To do this, we need a regex anchor. That is, a common string likely to contain such PII. I am assuming all such matches are contained within URL strings as query parameters (though name=value pairs in the URL path are also matched) e.g.

/test?tel=+46(0)12398765&firstname=Brian&zip=abc123

The anchor is the query name and we match for common PII culprits – these are tel, firstname and zip in my example. Of course these should be adjusted for your particular language. Anchors are the reason why the group key is required:

name: 'ZIP', regex: /((postcode=)|(zipcode=)|(zip=))[^\/\?&]+/gi, group: '$1'

In this case, $1 is the value of the string (our anchor) just before and including the = sign. We keep this in place for the data hit, and redact what follows. Without applying the grouping, the entire name=value pair would be redacted making troubleshooting difficult. I use [^&\/\?] in order to conclude the match within paths, or query parameters…
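
The role of the group can be seen by running the same replacement with and without `$1` on the example path. This is a minimal sketch using plain `String.replace`, outside of any GTM context.

```javascript
// ZIP pattern from the text; the outer capture group holds the anchor, e.g. "zip=".
var ZIP_RE = /((postcode=)|(zipcode=)|(zip=))[^\/\?&]+/gi;
var url = '/test?tel=+46(0)12398765&firstname=Brian&zip=abc123';

// With the group: the parameter name is kept and only the value is redacted.
var withGroup = url.replace(ZIP_RE, '$1[REDACTED ZIP]');
// Without the group: the whole name=value pair is replaced.
var withoutGroup = url.replace(ZIP_RE, '[REDACTED ZIP]');

console.log(withGroup);    // /test?tel=+46(0)12398765&firstname=Brian&zip=[REDACTED ZIP]
console.log(withoutGroup); // /test?tel=+46(0)12398765&firstname=Brian&[REDACTED ZIP]
```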

Happy compliance testing

BTW, you do know I am building a data auditing and compliance tool to measure and monitor Google Analytics data quality, right?

https://brianclifton.com/blog/2017/09/07/remove-pii-from-google-analytics/#comment-160455
May 4, 2021 at 6:55 am
Hi Brian,
How do I do it for GA4 property? Should I just replace the page_location and page_referrer in the “Fields to Set” section in the GTM configuration tag?

http://www.advanced-web-metrics.com/ – Brian Clifton
May 4, 2021 at 8:05 am
Hello Prabhu – note that customTask is not available in GA4 and I suspect is unlikely to ever be available. Essentially, the customTask method was an unsupported and undocumented feature of Universal Analytics – it was a hack, albeit a very powerful one.
