Privacy
Feb 12, 2025
Ludde

PII Leaks in GTM: How to Find and Fix Them Before Google Does

Email addresses in your URLs? Phone numbers in event parameters? Here's how to detect and block PII from reaching Google.

Sending Personally Identifiable Information (PII) to Google Analytics is a violation of Google's Terms of Service — and it can lead to your GA4 property being permanently deleted with no recovery option. This isn't a theoretical risk: Google actively scans for PII patterns in analytics data, and we've seen properties suspended without warning.

The problem is that PII leaks are almost never intentional. They happen because of URL structures that include email addresses, form tracking that accidentally captures personal data, or backend developers pushing user objects to the data layer without understanding what GA4 collects. This guide explains exactly where PII hides, how to detect it, and how to build systematic prevention into your implementation.

[Figure: GTM triggers list showing custom event triggers, page view triggers, and click triggers used for tracking]

What Counts as PII?

Google defines PII as any data that could be used to directly identify, contact, or precisely locate a specific individual. This includes:

  • Email addresses — The most common PII leak in analytics implementations.
  • Phone numbers — Often captured via click-to-call tracking or form fields.
  • Full names — First + last name combinations, especially from form submissions.
  • Physical addresses — Street addresses, postal codes in some jurisdictions.
  • Social security / national ID numbers — Extremely sensitive; any exposure is a serious incident.
  • Credit card numbers — Should never appear in analytics under any circumstances.
  • Login credentials — Usernames that are email addresses, passwords in error messages.

Note that hashed email addresses are generally acceptable (and are even recommended for features like Enhanced Conversions). The issue is with unhashed, plaintext PII appearing in your analytics data.

Where PII Hides in Your Implementation

1. URL Parameters

This is the most common source of PII leaks. GA4 automatically collects the full page URL with every page_view event. If your URLs contain PII in query parameters, it goes straight to Google:

// These URLs send PII to GA4 automatically:
example.com/login?email=john@example.com
example.com/password-reset?user=jane.doe@company.com
example.com/profile?name=John+Smith&phone=555-123-4567
example.com/order-confirmation?email=customer@email.com&order=12345

Common culprits include login redirect URLs, password reset links, form submission confirmation pages, and email marketing links that pass user identifiers.

2. Form Submissions and Click Tracking

If you're using Custom HTML tags or Auto-Event Variables to capture form field values, you may be sending PII without realizing it. For example, tracking a "form_submit" event that captures the form field values will include whatever the user typed — including their email, name, and phone number.
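One way to keep form tracking useful without copying what the user typed is to report only metadata about the submission. The sketch below is a minimal illustration, not a drop-in GTM tag: the helper name `buildFormEvent`, the parameter names `form_field_names` and `form_fields_filled`, and the shape of the `fields` array are all assumptions made for this example.

```javascript
// Hypothetical helper: describe a form submission without copying field values.
// `fields` is assumed to be an array of { name, value } objects read from the form.
function buildFormEvent(formId, fields) {
  var fieldNames = [];
  var filledCount = 0;
  for (var i = 0; i < fields.length; i++) {
    fieldNames.push(fields[i].name);
    // Count the field as filled, but never store its value.
    if (fields[i].value !== '' && fields[i].value != null) {
      filledCount++;
    }
  }
  return {
    event: 'form_submit',
    form_id: formId,
    form_field_names: fieldNames.join(','),
    form_fields_filled: filledCount
    // Deliberately no field values: those are what the user typed.
  };
}

var payload = buildFormEvent('newsletter-signup', [
  { name: 'email', value: 'john@example.com' },
  { name: 'first_name', value: 'John' },
  { name: 'company', value: '' }
]);
// payload.form_fields_filled is 2, and no typed value appears anywhere in payload
```

The resulting object can be passed to `dataLayer.push`. Field names are assumed to be developer-defined labels; if your forms generate field names from user input, leave those out too.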

3. DataLayer Pushes from Backend

Backend developers often push user objects to the data layer that include full user profiles. If these objects contain email, name, or phone fields, and your GTM tags are reading from the data layer, PII ends up in GA4:

// BAD: This pushes PII to the data layer
dataLayer.push({
  user_email: 'john@example.com',  // PII!
  user_name: 'John Smith',          // PII!
  user_phone: '+1-555-123-4567',    // PII!
  user_type: 'premium'              // This is fine
});

// GOOD: Use hashed or anonymized identifiers
dataLayer.push({
  user_id: 'abc123def456',           // Opaque ID
  user_type: 'premium',
  hashed_email: 'sha256hash...'      // Hashed (for Enhanced Conversions)
});
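If the backend runs on Node.js, the "good" payload above could be produced by something like the following sketch. It assumes that trimming and lowercasing is the right normalization before hashing (check Google's current Enhanced Conversions requirements before relying on it), and the names `hashEmail`, `toSafeDataLayerPayload`, and `ALLOWED_KEYS` are hypothetical.

```javascript
const crypto = require('crypto');

// Normalize, then SHA-256 hash. Trimming and lowercasing is an assumed
// normalization; verify it against Google's current requirements.
function hashEmail(email) {
  const normalized = String(email).trim().toLowerCase();
  return crypto.createHash('sha256').update(normalized, 'utf8').digest('hex');
}

// Hypothetical allowlist: only these keys are copied through unchanged.
const ALLOWED_KEYS = ['user_id', 'user_type'];

// Build a data layer payload from a full user object, dropping everything
// that is not explicitly allowed and hashing the email instead of copying it.
function toSafeDataLayerPayload(user) {
  const payload = {};
  ALLOWED_KEYS.forEach((key) => {
    if (user[key] !== undefined) {
      payload[key] = user[key];
    }
  });
  if (user.user_email) {
    payload.hashed_email = hashEmail(user.user_email);
  }
  return payload;
}

const safePayload = toSafeDataLayerPayload({
  user_id: 'abc123def456',
  user_email: ' John@Example.com ',
  user_name: 'John Smith',
  user_phone: '+1-555-123-4567',
  user_type: 'premium'
});
// safePayload has user_id, user_type and hashed_email only
```

An allowlist fails safe: a field added to the user object later stays out of the data layer until someone deliberately adds it to `ALLOWED_KEYS`.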

4. User-ID and Custom Dimensions

Using an email address as the GA4 User-ID is a direct PII violation. The User-ID must be an opaque, non-reversible identifier — not an email, phone number, or any other PII. Similarly, custom dimensions should never contain plaintext PII values.
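A cheap guard is to check a candidate User-ID before it is ever sent. The sketch below is a heuristic only, with a hypothetical function name `isSafeUserId`; it catches values that look like emails or phone numbers, and it cannot prove that an ID is truly opaque.

```javascript
// Heuristic patterns, not exhaustive. "%40" covers a URL-encoded "@".
var EMAIL_LIKE = /[^\s@]+(?:@|%40)[^\s@]+\.[^\s@]+/;
// Caveat: this also rejects purely numeric IDs of 7 or more digits,
// because they are indistinguishable from phone numbers.
var PHONE_LIKE = /^\+?[\d\s().-]{7,}$/;

// Returns true only if the candidate does not look like an email or phone number.
function isSafeUserId(candidate) {
  if (typeof candidate !== 'string' || candidate === '') {
    return false;
  }
  if (EMAIL_LIKE.test(candidate)) {
    return false;
  }
  if (PHONE_LIKE.test(candidate)) {
    return false;
  }
  return true;
}

isSafeUserId('abc123def456');      // true
isSafeUserId('john@example.com');  // false
isSafeUserId('+1-555-123-4567');   // false
```

In GTM, the same check could sit inside a Custom JavaScript variable that returns the ID when it passes and `undefined` otherwise, so the tag sends no User-ID rather than a risky one.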

5. Page Titles and Site Search

Some CMS platforms include user information in page titles (e.g., "Welcome, John Smith - Dashboard"). GA4 collects page titles automatically. Site search terms are also collected by Enhanced Measurement — if users search for email addresses or personal information, that data goes to GA4.

Prevention Strategies

  1. URL Sanitization via Custom JavaScript Variable:

    Create a Custom JavaScript variable in GTM that strips PII patterns from URLs before they're sent to GA4. Use regex to detect and redact email patterns, phone numbers, and other PII from the page location and page referrer values.

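    // Example: Redact emails from URLs
    function() {
      var url = {{Page URL}};
      // Matches a raw "@" or its URL-encoded form "%40"; the dot before the TLD is escaped
      // so it matches a literal "." rather than any character.
      return url.replace(/[a-zA-Z0-9._%+-]+(?:@|%40)[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/gi, '[REDACTED]');
    }

  2. GA4 Data Redaction: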

    GA4 has a built-in "Redact data" setting in Admin → Data Streams → (your web stream) → Redact data. Enable "Email redaction" to automatically redact email-like patterns from event parameters. However, do not rely on this alone: it doesn't catch all PII types.

  3. Server-Side GTM Filtering:

    If you're using server-side GTM, add a transformation that strips PII patterns from all incoming event data before forwarding to GA4. This creates a centralized PII filter that protects against leaks regardless of what the client-side sends.

  4. Backend Data Layer Guidelines:

    Create a data layer specification document for your development team that explicitly lists which fields are safe to push and which are prohibited. Review every new data layer push before it goes live.

  5. Regular Audits:

    PII leaks often come from new feature deployments, not from the initial setup. A CMS update, a new form, or a marketing team adding query parameters to email links can introduce PII leaks at any time. Audit quarterly at minimum.
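The centralized filtering idea from strategy 3 can be sketched as a function that walks every parameter of an event and redacts PII-like patterns. This is plain JavaScript that illustrates the logic only; it does not use the server-side GTM template APIs, and the function name, patterns, and placeholder strings are assumptions for this example.

```javascript
// Heuristic patterns. "%40" covers a URL-encoded "@".
var EMAIL_PATTERN = /[a-zA-Z0-9._%+-]+(?:@|%40)[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
// Caveat: long digit runs such as order numbers or timestamps also match this
// pattern, so exclude or special-case parameters that legitimately carry them.
var PHONE_PATTERN = /\+?\d[\d\s().-]{7,}\d/g;

// Recursively redact email-like and phone-like substrings in any value.
function scrubValue(value) {
  if (typeof value === 'string') {
    return value
      .replace(EMAIL_PATTERN, '[REDACTED_EMAIL]')
      .replace(PHONE_PATTERN, '[REDACTED_PHONE]');
  }
  if (Array.isArray(value)) {
    return value.map(scrubValue);
  }
  if (value !== null && typeof value === 'object') {
    var out = {};
    Object.keys(value).forEach(function (key) {
      out[key] = scrubValue(value[key]);
    });
    return out;
  }
  // Numbers, booleans, null and undefined pass through unchanged.
  return value;
}

var scrubbed = scrubValue({
  page_location: 'https://example.com/login?email=john@example.com',
  phone: '+1-555-123-4567',
  value: 42,
  items: [{ note: 'call 555-123-4567' }]
});
// scrubbed.page_location → 'https://example.com/login?email=[REDACTED_EMAIL]'
// scrubbed.phone         → '[REDACTED_PHONE]'
// scrubbed.items[0].note → 'call [REDACTED_PHONE]'
```

Pattern-based scrubbing trades precision for coverage: it will occasionally redact harmless values, which is usually the safer failure mode for analytics data.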

How to Check If You're Already Leaking PII

  1. BigQuery Export: If you have BigQuery export enabled, run a query searching for email patterns in page_location, page_referrer, and event parameters. Use regex like r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'.
  2. GA4 Explorations: Create a free-form exploration with "Page location" as a dimension. Scan through the URLs for query parameters containing email addresses or names.
  3. GTM Custom HTML Audit: Review every Custom HTML tag in your container. These are unrestricted and can capture anything from the page DOM.
  4. Network Tab Analysis: Use Chrome DevTools to monitor GA4 collection requests and inspect the payload for PII patterns.
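For the Network tab check, a copied GA4 collection request URL can be inspected with a few lines of JavaScript instead of by eye. The sketch below assumes the commonly observed query parameter names in GA4 web hits (`dl` for page location, `dr` for referrer, `dt` for title, `uid` for User-ID, and the `ep.` / `up.` prefixes for event and user parameters); the function name `findPiiInHit` is hypothetical, and hits that carry events in a POST body would need that body checked separately.

```javascript
// Heuristic email pattern; "%40" covers a double-encoded "@".
var EMAIL_IN_HIT = /[a-zA-Z0-9._%+-]+(?:@|%40)[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;

// Returns the names of hit parameters whose decoded value contains an email-like string.
function findPiiInHit(requestUrl) {
  var params = new URL(requestUrl).searchParams;
  var findings = [];
  params.forEach(function (value, key) {
    var isWatched =
      key === 'dl' || key === 'dr' || key === 'dt' || key === 'uid' ||
      key.indexOf('ep.') === 0 || key.indexOf('up.') === 0;
    if (isWatched && EMAIL_IN_HIT.test(value)) {
      findings.push(key);
    }
  });
  return findings;
}

var hit = 'https://www.google-analytics.com/g/collect?v=2&tid=G-XXXXXXX&en=page_view' +
  '&dl=https%3A%2F%2Fexample.com%2Flogin%3Femail%3Djohn%40example.com' +
  '&dt=Login&ep.form_id=signup';

findPiiInHit(hit); // → ['dl']
```

To use it, right-click a collect request in DevTools, copy its URL, and pass it to the function in the console.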

Automated PII Scanning

NiceLookingData scans your GTM container for Custom HTML tags, variable definitions, URL patterns, and trigger configurations that commonly lead to PII in analytics hits. We flag high-risk configurations and provide specific remediation steps for each finding.

Key Takeaways

  • PII in GA4 violates Google's Terms of Service and can result in permanent property deletion.
  • URL parameters are the #1 source of PII leaks — GA4 automatically collects the full page URL with every page_view.
  • Never use email addresses as User-IDs — use opaque, non-reversible identifiers.
  • Enable GA4's built-in email redaction, but don't rely on it exclusively — implement URL sanitization in GTM as well.
  • Audit for PII quarterly, especially after new feature deployments, CMS updates, or marketing campaigns that modify URL parameters.