Monitoring & Logging
An sGTM server that goes dark does not generate browser console errors. There is no user-visible error page. Tags silently stop firing, ad conversions stop flowing, and analytics data gaps widen — all without any direct signal. Monitoring and logging are not optional for production sGTM deployments. They are how you find out something went wrong before your stakeholders do.
Cloud Logging
Cloud Logging automatically captures all output from your sGTM container. Every logToConsole() call in your templates, every HTTP request your server handles, and system events from Cloud Run all appear in Cloud Logging.
Accessing logs
```bash
# View recent sGTM logs
gcloud logging read \
  'resource.type=cloud_run_revision AND resource.labels.service_name=sgtm-production' \
  --limit 100 \
  --freshness 1h \
  --format 'table(timestamp,textPayload)'
```
```bash
# Filter for errors only
gcloud logging read \
  'resource.type=cloud_run_revision AND resource.labels.service_name=sgtm-production AND severity>=ERROR' \
  --limit 50 \
  --freshness 24h
```

Or use the GCP Console: Logging → Logs Explorer. Filter by service name and severity.
Log structure
sGTM generates several log types:
Request logs: One log entry per incoming HTTP request. Shows:
- Timestamp
- Request path and method
- Response status code
- Response time (milliseconds)
- Request size / response size
Tag logs: When tags call logToConsole(), entries appear in Cloud Logging. This is how you get visibility into what tags are processing.
Error logs: Container errors, template exceptions, and system errors appear with severity: ERROR.
Custom log statements
Add logToConsole() calls to your tag templates for operational visibility:
```javascript
const logToConsole = require('logToConsole');
const JSON = require('JSON');
const getEventData = require('getEventData');
const getTimestampMillis = require('getTimestampMillis');

// Log at tag execution start
logToConsole(JSON.stringify({
  level: 'info',
  tag: 'meta_capi',
  event: getEventData('event_name'),
  client_id: getEventData('client_id'),
  has_email: !!getEventData('user_email'),
  ts: getTimestampMillis(),
}));

// After successful API call
logToConsole(JSON.stringify({
  level: 'info',
  tag: 'meta_capi',
  status: 'success',
  event: getEventData('event_name'),
}));

// On failure
logToConsole(JSON.stringify({
  level: 'error',
  tag: 'meta_capi',
  status: 'failed',
  http_status: statusCode,
  event: getEventData('event_name'),
}));
```

Structured JSON logs (rather than string messages) are queryable in Cloud Logging and exportable to BigQuery for analysis.
Cloud Monitoring
Cloud Monitoring provides time-series metrics from your Cloud Run service.
Key metrics to track
Request count: Total inbound requests per minute. Monitor for unexpected drops (server down) or unexpected spikes (runaway client or traffic spike).
Request latency (P50, P95, P99): P50 should be under 200ms. P99 should be under 1 second. A spike in P99 indicates cold starts or slow outbound API calls.
Container instance count: How many Cloud Run instances are running. Unexpectedly high instance counts indicate either a traffic spike or misconfigured concurrency settings.
CPU utilization: Should stay below 70% average. Consistent high CPU may require upgrading to 2 vCPU or reducing concurrency.
Memory utilization: Should stay below 80%. Approaching 100% causes OOM errors and container restarts.
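The latency targets above are percentile thresholds, so it helps to be precise about what P50/P99 mean. A minimal Python sketch of a nearest-rank percentile over latency samples (the sample values are made up for illustration):

```python
def percentile(samples, p):
    """Return the p-th percentile (0-100) of samples using nearest-rank."""
    ordered = sorted(samples)
    # Index of the sample at or above the p-th percentile.
    k = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[k]

# Hypothetical request latencies in milliseconds from one minute of traffic.
latencies = [42, 55, 61, 70, 88, 95, 110, 140, 180, 950]

print(percentile(latencies, 50))  # typical request
print(percentile(latencies, 99))  # dominated by the single slow request
```

Note how one slow request (950 ms here) barely moves P50 but defines P99 — which is exactly why a P99 spike is the early signal for cold starts or slow outbound API calls.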
Setting up a monitoring dashboard
1. GCP Console → Cloud Monitoring → Dashboards → Create Custom Dashboard
2. Add these chart widgets:
   - Request count (grouped by response code)
   - Request latency percentiles (P50, P95, P99)
   - Container instance count
   - CPU utilization
   - Memory utilization
3. Set the dashboard as your default view for the project
Alerting policies
Set up alerts for the conditions that matter most:
High error rate alert:
```bash
gcloud alpha monitoring policies create \
  --notification-channels="projects/PROJECT/notificationChannels/CHANNEL_ID" \
  --display-name="sGTM high error rate" \
  --condition-filter='resource.type="cloud_run_revision" AND metric.type="run.googleapis.com/request_count" AND metric.labels.response_code_class="5xx"' \
  --condition-threshold-value=10 \
  --condition-threshold-comparison=COMPARISON_GT \
  --condition-duration=60s
```

High P99 latency alert (cold start indicator): Trigger when P99 request latency exceeds 2000 ms for more than 2 minutes.
Low request count alert (server down): Trigger when request count drops below 20% of the rolling 7-day average for more than 5 minutes.
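The "below 20% of the rolling 7-day average" condition is just a baseline comparison; a Python sketch of the logic, assuming you have per-minute request counts for the trailing week:

```python
def low_traffic_alert(current_rpm, trailing_week_rpm, fraction=0.2):
    """Fire when current requests/min drop below `fraction` of the 7-day average."""
    baseline = sum(trailing_week_rpm) / len(trailing_week_rpm)
    return current_rpm < fraction * baseline

# Hypothetical steady baseline of ~500 requests/min over the week.
week = [500] * (7 * 24 * 60)
print(low_traffic_alert(90, week))   # far below 20% of baseline -> should alert
print(low_traffic_alert(400, week))  # normal traffic -> no alert
```

Using a relative baseline rather than a fixed number means the alert keeps working as your traffic grows, and the "for more than 5 minutes" duration filters out single-minute dips.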
Cost spike alert: In GCP Billing → Budget Alerts → Create Budget. Set a monthly budget and alert at 80% and 100%.
Uptime monitoring
Cloud Monitoring’s Uptime Checks provide external health monitoring independent of Cloud Logging:
1. Cloud Monitoring → Uptime Checks → Create Check
2. Settings:
   - Protocol: HTTPS
   - Host: collect.yoursite.com
   - Path: /healthz
   - Check frequency: 1 minute
   - Expected response: 200
3. Create an alert policy that fires when the check fails for 2+ consecutive periods (2 minutes)
4. Add an email or PagerDuty notification channel
An external uptime check is your fastest signal for server outages. Cloud Logging often lags by 1–2 minutes during incidents; the uptime check alerts within 2 minutes of the first failed response.
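The "2+ consecutive failed checks" condition from step 3 can be sketched as a tiny state machine (an illustrative Python helper, not a Cloud Monitoring API):

```python
class UptimeAlert:
    """Fire after `threshold` consecutive failed checks; reset on any success."""
    def __init__(self, threshold=2):
        self.threshold = threshold
        self.failures = 0

    def record(self, status_code):
        # Any non-200 response counts as a failed check.
        self.failures = self.failures + 1 if status_code != 200 else 0
        return self.failures >= self.threshold

alert = UptimeAlert(threshold=2)
print(alert.record(200))  # healthy check
print(alert.record(503))  # first failure -- no alert yet
print(alert.record(503))  # second consecutive failure -- alert fires
```

Requiring two consecutive failures is what keeps a single dropped packet or transient 503 from paging anyone, while still catching a real outage within two check intervals.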
sGTM Preview as a debugging tool
For non-production debugging, sGTM Preview is more useful than logs. It shows every step of request processing in real time:
- Open your sGTM container in GTM editor
- Click Preview — a debug session starts
- Navigate to your site with client-side GTM Preview also active
- In the sGTM Preview panel, click any request to see:
- Which client claimed it
- The full Event Model built from the request
- Which triggers fired
- Which tags executed, with their output payloads
- Console output from logToConsole() calls
For tags that make outbound HTTP requests, the Preview panel shows the request URL, request body, and response status code. This is where you diagnose tag failures before they become production issues.
Log-based alerting for tag failures
Create a log-based metric that counts error-level tag log entries:
```bash
# Create a log-based metric for tag errors
gcloud logging metrics create sgtm-tag-errors \
  --description="sGTM tag execution errors" \
  --log-filter='resource.type="cloud_run_revision" AND textPayload=~"\"level\":\"error\""'
```

Then create a Cloud Monitoring alert on this metric: fire when the error count exceeds 10 per minute.
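The log filter above matches the literal substring `"level":"error"` inside the text payload. A quick Python sketch to check that your log format would actually be caught (the payloads are hypothetical, in the shape the logToConsole() templates emit):

```python
import re

# Same substring the Cloud Logging filter matches: textPayload=~"\"level\":\"error\""
FILTER = re.compile(r'"level":"error"')

# Hypothetical payloads shaped like the JSON.stringify output from tag templates.
payloads = [
    '{"level":"info","tag":"meta_capi","status":"success"}',
    '{"level":"error","tag":"meta_capi","status":"failed","http_status":500}',
]

matches = [p for p in payloads if FILTER.search(p)]
print(len(matches))  # only the error entry matches
```

Note that JSON.stringify in sandboxed templates emits compact JSON with no whitespace around colons, so the substring match works; a pretty-printed payload with spaces after colons would slip past this filter.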
Request tracing
For debugging specific request flows, use Cloud Trace:
- GCP Console → Cloud Trace → Trace List
- Find requests by URL path or time range
- Click a trace to see the full request timeline: which function calls took how long, where latency occurred
This is particularly useful for diagnosing slow Firestore lookups or slow outbound API calls in enrichment tags.
Log export to BigQuery
For long-term analysis, export Cloud Logging to BigQuery:
- Cloud Logging → Log Router → Create Sink
- Destination: BigQuery dataset (create one)
- Filter: your sGTM service logs
- Save — logs stream to BigQuery in near-real-time
Once in BigQuery, query for trends:
```sql
-- Top error-producing events over the last 7 days
SELECT
  JSON_VALUE(textPayload, '$.event') AS event_name,
  COUNT(*) AS error_count
FROM `your_project.sgtm_logs.cloud_run_*`
WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
  AND JSON_VALUE(textPayload, '$.level') = 'error'
GROUP BY event_name
ORDER BY error_count DESC
LIMIT 20
```

Common mistakes
No minimum instances + no monitoring. With minimum instances at 0 and no cold start monitoring, you silently drop tracking events every morning when the first request of the day hits a cold container. The event data gap is invisible without log-based monitoring.
Not structuring log output as JSON. Unstructured log messages are not queryable. If your logToConsole() calls emit plain strings, you cannot build alerts or BigQuery queries on them.
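The difference is easy to see side by side; a small Python sketch (field names mirror the tag-template examples earlier, values are made up):

```python
import json

# Unstructured: only fragile substring searches are possible.
plain = "meta_capi failed with 500 for purchase"

# Structured: every field can be queried, grouped, and alerted on.
structured = json.dumps(
    {"level": "error", "tag": "meta_capi", "http_status": 500, "event": "purchase"},
    separators=(",", ":"),  # compact output, like JSON.stringify in templates
)

entry = json.loads(structured)
print(entry["http_status"])  # the status is a real field, not a parsed substring
```

With the structured form, a log-based metric can count `"level":"error"` entries and a BigQuery query can group by `event`; with the plain string, neither is reliable.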
Monitoring only the health check, not tag success rates. The /healthz endpoint returning 200 means the server is running — it does not mean your tags are firing correctly. Monitor tag-level success rates separately.
Alert fatigue from overly sensitive thresholds. An alert that fires 20 times per week for normal traffic spikes trains your team to ignore alerts. Tune alert thresholds after observing normal traffic patterns for 2+ weeks.