Your Prometheus scrapes Kubernetes pods using service discovery, but the default label names are verbose (__meta_kubernetes_pod_label_app, __meta_kubernetes_pod_namespace, etc.). Dashboards need cleaner label names. Also, some pods have a 'scrape: false' annotation that should prevent scraping. How do you use relabeling to clean up labels and filter targets?
Use relabel_configs to rename, filter, and transform labels before the scrape. Example: relabel_configs: [ { source_labels: [__meta_kubernetes_pod_label_app], target_label: app, action: 'replace' }, { source_labels: [__meta_kubernetes_pod_namespace], target_label: namespace, action: 'replace' }, { source_labels: [__meta_kubernetes_pod_annotation_scrape], action: 'drop', regex: 'false' } ]. The first two rules copy verbose SD labels to clean names ('replace' is the default action, so it can be omitted). The third rule uses the 'drop' action: if the annotation value matches 'false', the entire target is dropped and never scraped. There is no need for an intermediate rule that copies the annotation into a 'scrape' label first; that would leak a 'scrape' label into every stored series, because after target relabeling only labels beginning with __ are removed, while any other label you create persists. Relabeling happens *before* the scrape, so dropped targets never trigger a network request. Always write source_labels as a list, even for one label. The regex and replacement parameters support capture groups: regex: '([a-z]+)-([0-9]+)' captures groups referenced as $1, $2 (or ${1}, ${2}) in replacement.
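The rules above, written out as a YAML scrape job (the job name and the 'scrape' annotation key are illustrative):

```yaml
scrape_configs:
  - job_name: kubernetes-pods        # illustrative job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Drop any pod annotated scrape: "false" before a connection is made.
      - source_labels: [__meta_kubernetes_pod_annotation_scrape]
        regex: "false"
        action: drop
      # Copy verbose SD labels to clean, dashboard-friendly names.
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      - source_labels: [__meta_kubernetes_pod_namespace]
        target_label: namespace
```

Putting the drop rule first means pods excluded from scraping never pass through the rename rules at all.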
Follow-up: If an annotation value changes at runtime (pod updated to scrape: "false"), how quickly does Prometheus pick up the change? Is there a SD cache interval?
Your Prometheus scrapes 50 different job types (databases, caches, load balancers, etc.), each with different label naming conventions. Database jobs expose 'db_cluster' label, cache jobs expose 'cache_shard' label. When writing alerts and dashboards, you need a unified label naming scheme (e.g., all called 'instance_group'). How do you normalize labels via relabeling?
Use relabel_configs to rename and normalize labels across different jobs. Example: relabel_configs: [ { source_labels: [job, db_cluster], target_label: instance_group, regex: 'database;([^;]+)', replacement: '$1', action: 'replace' }, { source_labels: [job, cache_shard], target_label: instance_group, regex: 'cache;([^;]+)', replacement: '$1', action: 'replace' } ]. This creates an 'instance_group' label derived from job-specific labels. For simpler cases, use label_replace in PromQL: label_replace(metric, "normalized_label", "$1", "original_label", "(.+)"). However, relabeling (in config) is preferred because it's evaluated once at scrape time, not at query time. For complex label mappings, use multiple relabel_configs rules executed sequentially. Each rule transforms the label set, so later rules see the output of earlier rules. Order matters: more specific rules first, then catch-alls.
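The same normalization rules as a YAML sketch (the job names 'database' and 'cache' are assumptions from the scenario):

```yaml
relabel_configs:
  # Multiple source_labels are joined with ';' before regex matching, so a
  # database target yields "database;<cluster>" and only the first rule fires.
  - source_labels: [job, db_cluster]
    regex: "database;([^;]+)"
    target_label: instance_group
    replacement: "$1"
  - source_labels: [job, cache_shard]
    regex: "cache;([^;]+)"
    target_label: instance_group
    replacement: "$1"
```

Anchoring each rule on the job value is what keeps the rules from clobbering each other: a rule whose regex does not match leaves the label set untouched.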
Follow-up: If two relabel rules conflict (both trying to set the same target_label), which one wins? Does the last rule win or first?
Your team is migrating from EC2 to Kubernetes. Old EC2 instances are scraped via EC2 service discovery with tags (env, service, team). New Kubernetes pods are scraped with annotations (same info). However, metrics from EC2 and Kubernetes have different label names. You need a unified label set for cross-infrastructure dashboards. How do you align labels from different service discoveries?
Add an explicit 'infrastructure' label during relabeling to distinguish sources, then unify upstream labels: scrape_configs: [ { job_name: 'ec2', ec2_sd_configs: [...], relabel_configs: [ { source_labels: [__meta_ec2_tag_service], target_label: service, action: 'replace' }, { source_labels: [__meta_ec2_tag_env], target_label: env, action: 'replace' }, { source_labels: [], target_label: infrastructure, replacement: 'ec2', action: 'replace' } ] }, { job_name: 'kubernetes', kubernetes_sd_configs: [...], relabel_configs: [ { source_labels: [__meta_kubernetes_pod_label_service], target_label: service, action: 'replace' }, { source_labels: [__meta_kubernetes_pod_label_env], target_label: env, action: 'replace' }, { source_labels: [], target_label: infrastructure, replacement: 'kubernetes', action: 'replace' } ] } ]. Now all metrics have 'service', 'env', 'infrastructure' labels. Query 'up{service="api"}' returns both EC2 and Kubernetes pods. For environment-specific queries, filter by infrastructure: 'up{service="api", infrastructure="kubernetes"}'.
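Laid out as YAML (SD details such as region and credentials are illustrative placeholders):

```yaml
scrape_configs:
  - job_name: ec2
    ec2_sd_configs:
      - region: us-east-1                # illustrative
    relabel_configs:
      - source_labels: [__meta_ec2_tag_service]
        target_label: service
      - source_labels: [__meta_ec2_tag_env]
        target_label: env
      - target_label: infrastructure     # static label: no source_labels needed
        replacement: ec2
  - job_name: kubernetes
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_service]
        target_label: service
      - source_labels: [__meta_kubernetes_pod_label_env]
        target_label: env
      - target_label: infrastructure
        replacement: kubernetes
```

For a static label, source_labels can simply be omitted; the replacement is applied unconditionally.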
Follow-up: If you have 20+ service discovery methods (EC2, Azure, Kubernetes, Consul, DNS), does maintaining relabel_configs for each become unmaintainable?
You're scraping a multi-tenant application where each tenant's metrics should be tagged with tenant_id. The application exposes metrics but doesn't include tenant_id labels (it's a generic exporter). You need to inject tenant_id from the scrape target metadata (e.g., pod annotation 'tenant_id=CUST_123'). How do you add external context to metrics via relabeling?
Extract tenant_id from the target metadata and promote it to a metric label using relabel_configs: relabel_configs: [ { source_labels: [__meta_kubernetes_pod_annotation_tenant_id], target_label: tenant_id, action: 'replace' }, { source_labels: [__meta_kubernetes_pod_label_team], target_label: team, action: 'replace' } ]. After relabeling, all metrics scraped from that pod include 'tenant_id' and 'team' labels. This is powerful for multi-tenancy: query 'http_requests_total{tenant_id="CUST_123"}' isolates that tenant's data. Important: these are target labels, attached before the scrape, so they appear on every time series from the target. However, if the exporter itself already exposes a 'tenant_id' label with a different value, there is a conflict. With honor_labels: false (the default), the target label set by relabeling wins and the exporter's value is preserved under the renamed label exported_tenant_id; with honor_labels: true, the exporter's value wins instead. If you don't want the exported_ copy stored, drop it explicitly with metric_relabel_configs.
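A YAML sketch of the injection plus the optional conflict cleanup (annotation and label names as in the scenario):

```yaml
relabel_configs:
  # Promote target metadata to metric labels (attached before the scrape).
  - source_labels: [__meta_kubernetes_pod_annotation_tenant_id]
    target_label: tenant_id
  - source_labels: [__meta_kubernetes_pod_label_team]
    target_label: team
metric_relabel_configs:
  # With the default honor_labels: false, a conflicting exporter-side
  # tenant_id is kept as exported_tenant_id; drop it if it is unwanted.
  - regex: exported_tenant_id
    action: labeldrop
```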
Follow-up: If tenant_id annotation is missing from a pod, what happens to the metric label? Is it dropped or labeled with an empty string?
You have 10,000 exporters across EC2, and their IP addresses are constantly changing due to auto-scaling. Prometheus uses EC2 service discovery to find them, but sometimes scrapes fail because the IP changed between SD refresh and scrape attempt. How do you handle stale targets and prevent scrape failures from IP churn?
Service discovery caches target lists and refreshes periodically (EC2 SD defaults to a 60s refresh_interval). Between refreshes, an instance can be terminated or replaced, causing scrape failures. Mitigate with: (1) Reduce the refresh interval: set refresh_interval inside ec2_sd_configs to 10-15s for faster target updates, at the cost of more EC2 API calls (watch for API throttling). (2) Fail fast: set scrape_timeout to 5-10s so scrapes of dead IPs don't hang; the target is retried on the next scrape_interval anyway. (3) Use relabel_configs to drop instances that aren't actually running, e.g. a 'keep' rule on __meta_ec2_instance_state matching 'running'. (4) Monitor SD health: prometheus_sd_refresh_failures_total counts failed refresh attempts; if it grows, investigate the EC2 API (throttling, credentials). (5) Consider DNS-based SD (slower but more stable), or a custom SD source (file_sd or HTTP SD) fed by an authoritative system such as your auto-scaling controller, which can remove terminating instances faster than polling the EC2 API.
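Points (1)-(3) as a YAML job (job name and region are illustrative):

```yaml
scrape_configs:
  - job_name: ec2-exporters          # illustrative
    scrape_interval: 15s
    scrape_timeout: 10s              # fail fast on churned IPs
    ec2_sd_configs:
      - region: us-east-1            # illustrative
        refresh_interval: 15s        # default is 60s; lower = more API calls
    relabel_configs:
      # Only keep instances the EC2 API reports as running; terminating
      # instances are filtered out before any scrape is attempted.
      - source_labels: [__meta_ec2_instance_state]
        regex: running
        action: keep
```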
Follow-up: If an EC2 instance is launched, discovered by SD, scraped once, then immediately terminated, what traces are left in Prometheus? Can this cause cardinality explosion?
You're implementing automatic service discovery relabeling for a new data center. You have 1,000 relabel rules spread across 20 scrape jobs. A junior engineer makes a typo in a regex (e.g., '([^:]+)' instead of '([^:]+)?'), causing all targets for one job to be dropped silently. How do you prevent relabeling errors in production?
A syntactically invalid regex is rejected at config load, so it's caught early; the dangerous case is a regex that is valid but wrong, which fails silently: Prometheus drops the targets without alerting. Prevent errors by: (1) Testing relabeling before deployment with promtool check service-discovery prometheus.yml <job_name>, which runs the job's SD mechanism and prints each target's labels before and after relabeling without starting Prometheus. (2) Adding pre-deployment checks: run promtool check config prometheus.yml plus a YAML linter in CI/CD. (3) Implementing canary deployment: roll new relabel rules out to one Prometheus replica (or a small shard of targets) first, verify targets are still scraped, then roll out everywhere. (4) Monitoring discovered vs. active targets: compare prometheus_sd_discovered_targets against the active target count (e.g. prometheus_target_scrape_pool_targets, or count(up) per job) and alert if the ratio changes dramatically. (5) Inspecting the /service-discovery page in the Prometheus UI, which lists dropped targets along with their pre-relabeling labels and is the fastest way to see *why* a target was dropped. (6) For complex relabeling, split into multiple rules: test each rule independently before combining. (7) Document relabel rules with comments explaining regex groups and expected matches.
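One way to implement point (4) is an alerting rule on the discovered-vs-active ratio. This is a coarse global sketch (the two metrics carry different label sets, so a per-job join would need more care; the 0.5 threshold is illustrative):

```yaml
groups:
  - name: sd-health
    rules:
      - alert: TargetsDroppedByRelabeling
        # Fires if fewer than half of all discovered targets survive
        # relabeling into active scrape pools for 10 minutes.
        expr: |
          sum(prometheus_target_scrape_pool_targets)
            / sum(prometheus_sd_discovered_targets) < 0.5
        for: 10m
        labels:
          severity: warning
```

A tighter variant would pin the expected target count per job and alert on deviation, which also catches a rule that drops *everything*.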
Follow-up: If a regex group reference is out of bounds (e.g., replacement: '$3' but the regex only has 2 groups), what happens?
You're using Kubernetes service discovery with relabeling to extract pod labels dynamically. However, Prometheus is discovering 50,000 pods across many namespaces, and relabeling is taking 500ms per SD refresh cycle. This blocks scrape scheduling and causes gaps in metrics. How do you optimize relabeling performance at scale?
Relabeling is sequential; each rule processes the full label set, so 50k targets with 10 rules means 500k label transformations per cycle. Optimize by: (1) Ordering rules efficiently: put drop/keep rules first so fewer targets reach the remaining rules. (2) Using regex sparingly: regex matching is expensive; prefer exact matches or anchored prefix/suffix patterns. (3) Consolidating rules: combine several replace rules into one where a single regex can do the work. (4) Reducing cardinality at the SD source: filter by namespace (or label/field selectors) before relabeling ever runs: kubernetes_sd_configs: [ { role: 'pod', namespaces: { names: ['prod'] } } ]. Cutting targets from 50k to 5k makes relabeling roughly 10x faster and also shrinks the load on the Kubernetes API. (5) Upgrading: recent Prometheus releases have included repeated relabeling performance optimizations. (6) Sharding targets: run multiple Prometheus instances, each responsible for a subset of targets (e.g. via hashmod relabeling). (7) At extreme scale, use agent mode or a custom SD source (file_sd/HTTP SD) that pre-filters the target list instead of relying on full SD refresh cycles.
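Points (1) and (4) as a YAML sketch. The 'prometheus.io/scrape' pod label used in the selector is a common convention, not a Prometheus built-in:

```yaml
kubernetes_sd_configs:
  - role: pod
    # Filter at the SD source: fewer targets ever reach relabeling.
    namespaces:
      names: [prod]
    # Selectors push filtering into the Kubernetes API watch itself.
    selectors:
      - role: pod
        label: "prometheus.io/scrape=true"   # illustrative convention
relabel_configs:
  # Cheapest, most selective keep/drop rules first.
  - source_labels: [__meta_kubernetes_pod_phase]
    regex: Running
    action: keep
```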
Follow-up: If a relabeling operation is very complex (20+ rules), does splitting into multiple jobs help? Or does Prometheus apply all relabel_configs per job anyway?
You've configured relabeling to drop metrics containing sensitive labels (e.g., passwords, API keys) before they're stored. However, you want to retain the labels internally for routing purposes (e.g., filter by env=secret), but drop them from the time-series stored in TSDB. Can relabeling achieve this, or is it too late in the pipeline?
Target relabeling (relabel_configs) runs before the scrape and shapes the label set for every metric from that target, so it cannot keep a label for one purpose while hiding it for another. What you can do: (1) Use metric_relabel_configs, which runs after the scrape but before TSDB storage: the scrape sees all labels, then sensitive ones are removed before anything is persisted. Example: metric_relabel_configs: [ { source_labels: [__name__], regex: 'internal_.*', action: 'drop' }, { regex: 'password|api_key', action: 'labeldrop' } ]. Note the trade-off: anything removed here never reaches storage, so it is also unavailable to recording rules, alerts, and therefore Alertmanager routing. (2) If routing must distinguish the values without exposing them, substitute an opaque token before storage, e.g. via an operator-maintained mapping in relabeling. Be aware that a rule like { source_labels: [password], target_label: password_hash, regex: '(.+)', replacement: 'REDACTED' } collapses every value to the same token; relabeling cannot compute a real hash. (3) For true security, don't put secrets in labels at all: any label value is visible in the UI, the HTTP API, and federation. Keep secrets in an external metadata service and join on an opaque ID instead.
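The metric_relabel_configs example in job context (job name and target are illustrative):

```yaml
scrape_configs:
  - job_name: app                    # illustrative
    static_configs:
      - targets: ["app:9100"]        # illustrative
    metric_relabel_configs:
      # Runs after the scrape, before storage: drop whole series by name...
      - source_labels: [__name__]
        regex: internal_.*
        action: drop
      # ...and strip sensitive label *names* from everything that remains.
      - regex: password|api_key
        action: labeldrop
```

labeldrop matches label names (not values), so this removes the 'password' and 'api_key' labels from every surviving series.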
Follow-up: Is there a way to encrypt labels in transit to prevent them from being readable in Prometheus logs or API responses?