Jenkins Interview Questions

Webhook Triggers and SCM Polling


Your Jenkins uses SCM polling (every 5 minutes) to trigger builds. With 200+ jobs, polling consumes 80% of controller CPU. Network bandwidth to Git server is saturated. Migrate to webhook-based triggering.

Migrate to webhooks: (1) Configure a webhook on the Git server (GitHub/GitLab): a push event notifies Jenkins immediately. (2) Jenkins > Configure System > GitHub Plugin: set the GitHub Server and add a Personal Access Token. (3) For each job: change the trigger from "Poll SCM" to "GitHub hook trigger for GITScm polling". (4) Jenkins exposes the `/github-webhook/` endpoint for GitHub. (5) In the GitHub repo > Settings > Webhooks: add `http://jenkins/github-webhook/` as the payload URL. (6) Set webhook delivery to trigger on Push and Pull request events. (7) Implement webhook authentication: configure an HMAC secret for signature verification. (8) Monitor webhook deliveries: GitHub > Webhook > Recent Deliveries shows delivery status. (9) Plan for missed deliveries: GitHub does not automatically retry failed webhook deliveries; failed ones must be redelivered from Recent Deliveries or via the REST API. (10) Switch off polling: disable the "Poll SCM" triggers. Expected: controller CPU drops to ~5%, and trigger latency improves (builds start in <10 sec vs. up to 5 min with polling). Monitor: webhook delivery latency and redelivery rate.
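Steps (5)–(7) can also be automated instead of clicked through: GitHub's repository-webhooks endpoint (`POST /repos/{owner}/{repo}/hooks`) accepts a JSON body like the one built below. A minimal sketch; the helper name is invented, and you would still send the payload with an authenticated API call:

```python
def github_hook_payload(jenkins_url: str, secret: str) -> dict:
    """Build the JSON body for GitHub's create-webhook API call.

    Registers Jenkins' /github-webhook/ endpoint for push and
    pull_request events, with an HMAC secret for signature checks.
    """
    return {
        "name": "web",          # repository webhooks are always named "web"
        "active": True,
        "events": ["push", "pull_request"],
        "config": {
            "url": jenkins_url.rstrip("/") + "/github-webhook/",
            "content_type": "json",
            "secret": secret,   # shared secret for X-Hub-Signature-256
        },
    }
```

Applying this payload per repository from a script makes the 200+ job migration repeatable and auditable.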

Follow-up: Jenkins is temporarily down. Git webhooks queued. How do you handle pending webhooks without duplication?

You're running webhooks at scale: 1000+ commits/hour. Jenkins webhook endpoint is receiving 200+ webhook requests/sec. Webhook processing is slow, queued requests back up. Implement resilient webhook ingestion.

Implement scalable webhooks: (1) Use a message queue: webhooks post to Kafka/RabbitMQ instead of invoking Jenkins directly. (2) A Jenkins consumer pulls from the queue asynchronously. (3) Implement rate limiting: the queue accepts at most 1000/sec and throttles the excess. (4) Use webhook multiplexing: distribute webhooks across multiple Jenkins instances. (5) Implement async processing: the webhook handler returns 200 immediately and processes later. (6) Enable webhook batching: if multiple webhooks arrive for the same branch within 10 sec, coalesce them into one build. (7) Use job throttling: limit concurrent builds per branch and queue the excess. (8) Implement webhook deduplication: detect and drop duplicate deliveries (e.g., sender retries). (9) Monitor queue depth: alert if the queue exceeds 5000 items. (10) Use webhook filtering: disable webhooks for non-essential events (comments, reviews). Architecture: Git -> Kafka -> Jenkins workers (N instances). Each worker pulls from the queue and processes independently. Kafka provides durability and per-partition ordering (key messages by repo/branch so per-branch order is preserved). Expected: handle 1000 webhooks/sec without backlog. Monitor: queue depth and latency from Git event to build start (target <30 sec).
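The coalescing in steps (6)–(8) can be sketched in Python. This is an in-memory illustration only (class and method names are invented); a production system would back it with Kafka/RabbitMQ rather than a dict:

```python
import time
from collections import OrderedDict

class WebhookCoalescer:
    """Coalesce webhook events per branch within a time window.

    Later pushes to an already-queued branch replace the pending
    commit, so only the newest one is built; duplicate deliveries
    of the same commit are absorbed the same way.
    """

    def __init__(self, window_secs=10):
        self.window = window_secs
        self.pending = OrderedDict()  # branch -> (latest_commit, first_seen)

    def ingest(self, branch, commit, now=None):
        """Record an event; the handler returns 200 right after this."""
        now = time.time() if now is None else now
        if branch in self.pending:
            _, first_seen = self.pending[branch]
            self.pending[branch] = (commit, first_seen)  # coalesce
        else:
            self.pending[branch] = (commit, now)

    def drain(self, now=None):
        """Return (branch, commit) pairs whose window has expired."""
        now = time.time() if now is None else now
        ready = [(b, c) for b, (c, t) in self.pending.items()
                 if now - t >= self.window]
        for b, _ in ready:
            del self.pending[b]
        return ready
```

With a 10-second window, two pushes to `main` five seconds apart produce a single build of the newer commit instead of two builds.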

Follow-up: Kafka broker crashes. Pending webhooks lost. Recovery strategy?

Your GitHub webhook is sending push events to Jenkins. An attacker repeatedly pushes commits to trigger expensive builds, exhausting resources. Implement security controls on webhook triggers.

Implement webhook security: (1) Enable webhook signature verification: GitHub signs each delivery with HMAC-SHA256; Jenkins validates the signature using the shared secret. (2) Whitelist webhook IPs: GitHub publishes its IP ranges; only allow traffic from those ranges. (3) Implement rate limiting: limit builds per user/branch (e.g., max 10 builds/min per user). (4) Use authentication: require a token for each webhook. (5) Implement build throttling: max concurrent builds per user (e.g., 2 builds). (6) Use a quota system: teams are allocated N builds/hour; excess is rejected. (7) Require approval for suspicious commits: automated detection flags unusual patterns. (8) Monitor webhook patterns: alert if a user triggers 100+ builds/hour. (9) Use branch filtering: only trigger for whitelisted branches (e.g., main, develop). (10) Reject unsigned payloads: any delivery without a valid signature is dropped. Example: with webhook secret "my-secret", GitHub sends the header `X-Hub-Signature-256: sha256=<hex>`. Jenkins recomputes `HMAC-SHA256(payload, secret)` and compares it to the header using a constant-time comparison; on mismatch, reject. For abuse: implement per-user rate limits via a Jenkins plugin that tracks build submissions per user and rejects those exceeding the limit.
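The signature check in step (1) needs nothing beyond the standard library. A minimal sketch (`verify_github_signature` is an invented name; the `sha256=<hex>` header format is what GitHub sends):

```python
import hashlib
import hmac

def verify_github_signature(payload: bytes, secret: str,
                            signature_header: str) -> bool:
    """Validate GitHub's X-Hub-Signature-256 header.

    GitHub computes HMAC-SHA256 over the raw request body with the
    shared webhook secret and sends it as 'sha256=<hex digest>'.
    compare_digest gives a constant-time comparison, which avoids
    leaking the expected signature through timing differences.
    """
    expected = "sha256=" + hmac.new(
        secret.encode(), payload, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

Note the comparison must run over the raw request body exactly as received; re-serializing the parsed JSON before hashing will produce a different digest and spurious rejections.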

Follow-up: An attacker compromises a developer's GitHub token. Malicious webhooks trigger builds. Detection and response?

You're triggering builds via webhooks. Git server sends webhook, but Jenkins is temporarily unavailable (update in progress). Webhook sent to dead endpoint. Build never triggered. Implement webhook reliability.

Implement reliable webhooks: (1) Use a message queue: the webhook posts to a durable queue, and Jenkins consumes when available. (2) Understand sender retry behavior: GitLab automatically retries failed deliveries, but GitHub does not; failed GitHub deliveries must be redelivered from Recent Deliveries or via the REST API. (3) Respond quickly: GitHub times out a delivery after 10 seconds, so acknowledge immediately and process asynchronously. (4) Use health checks: the Git server (or an intermediary) checks Jenkins health before forwarding webhooks. (5) Implement a circuit breaker: if Jenkins is unavailable, buffer instead of delivering. (6) Use DNS failover: point the webhook at a VIP, with a load balancer routing to a healthy Jenkins. (7) Implement webhook buffering: an intermediate queue holds webhooks during Jenkins downtime. (8) Use a persistent queue: webhooks are persisted before processing, ensuring durability. (9) Monitor webhook delivery: track failed webhooks and alert on retry storms. (10) Implement receipt verification: Jenkins acknowledges each webhook to the sender. For implementation: (1) GitHub webhook > Recent Deliveries shows delivery history and offers redelivery. (2) Automate redelivery of failed deliveries via the REST API. (3) Configure a periodic branch-indexing scan in the GitHub Branch Source plugin as a fallback that picks up missed events. Expected: webhooks are never lost, and builds trigger reliably even during maintenance.
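The forwarding logic an intermediate buffer (steps (5) and (7)) would need might look like this — a Python sketch with exponential backoff; `deliver_with_retry` and `send` are invented names, not a real plugin API:

```python
import time

def deliver_with_retry(send, payload, max_attempts=3,
                       base_delay=1.0, sleep=time.sleep):
    """Forward a buffered webhook to Jenkins, retrying on failure.

    `send` is any callable taking the payload and returning an HTTP
    status code. Delays double between attempts (1s, 2s, 4s, ...);
    returns True on the first 2xx response, False after exhausting
    all attempts (at which point the payload stays in the queue).
    """
    for attempt in range(max_attempts):
        status = send(payload)
        if 200 <= status < 300:
            return True
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return False

```

Keeping the payload in the durable queue on failure (rather than dropping it) is what turns "Jenkins was down for 10 minutes" into delayed builds instead of lost builds.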

Follow-up: Webhook delivery took 15 minutes due to network delay. Build triggered, but developer pushed again 2 minutes ago. Order of execution?

You run a hybrid setup: some Git repos use webhooks, others use polling (no webhook support). Managing mixed trigger strategies is complex. Unify trigger patterns.

Implement unified triggers: (1) Use Jenkins Job DSL to define all triggers in code. (2) Create a trigger abstraction: `trigger.push()` abstracts webhook vs. polling. (3) For webhook-capable repos: use the GitHub Plugin trigger. (4) For non-webhook repos: use Poll SCM with an optimized schedule. (5) Optimize polling schedules: poll less frequently (every 15 min instead of 5). (6) Use sparse polling: a Multibranch pipeline discovers only changed branches and skips unchanged ones. (7) Implement hybrid triggering: webhook for the fast path, polling as a fallback. (8) Use repository-level config: a `.jenkins/config.yaml` in the repo specifies the trigger type. (9) Implement a trigger audit: log which trigger fired each build. (10) Communicate the migration plan so teams understand the webhook benefits, and incentivize migration. For mixed environments: webhook repos get immediate builds (<10 sec latency); polling repos see up to a 15-min delay. Document which repos support webhooks. Automate validation: a CI job verifies that every webhook-capable repo has its webhook configured. For legacy repos: set up an intermediate webhook receiver that polls the on-prem Git server and forwards events to Jenkins.
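Step (8)'s repository-level config could map onto a trigger like this. A Python sketch under stated assumptions: the `.jenkins/config.yaml` schema (`trigger`, `poll_interval` keys) and the function name are hypothetical, invented for illustration:

```python
def choose_trigger(repo_config: dict) -> dict:
    """Map a repo's parsed .jenkins/config.yaml to a trigger spec.

    Hypothetical schema:
        trigger: "webhook" | "poll"   (defaults to "webhook")
        poll_interval: cron string    (only used for polling repos)

    Webhook-capable repos get the GitHub hook trigger; others fall
    back to an optimized Poll SCM schedule (every 15 min, hashed).
    """
    kind = repo_config.get("trigger", "webhook")
    if kind == "webhook":
        return {"type": "githubPush"}
    return {
        "type": "pollSCM",
        "schedule": repo_config.get("poll_interval", "H/15 * * * *"),
    }
```

A Job DSL or seed-job script can call this per repository, so the webhook-vs-polling decision lives in each repo rather than in 200 hand-edited job configs.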

Follow-up: On-prem Git server doesn't support webhooks. How do you trigger builds efficiently without polling?

Your Jenkins pipeline receives webhook from GitHub, starts build. GitHub's webhook delivery metadata shows a timestamp 5 minutes before Jenkins received it (network delay). Your build uses `env.GIT_COMMIT` which is now 5 minutes stale. Implement timestamp-accurate builds.

Implement precise commit timing: (1) Extract the webhook timestamp: parse the GitHub webhook payload for the `pushed_at` timestamp. (2) Store it in an environment variable: `env.WEBHOOK_TIMESTAMP = webhook.pushedAt`. (3) Use the Git commit timestamp: `git log -1 --format=%ci` prints the commit's committer date (use `%ai` for the author date). (4) Calculate latency: the difference between the webhook receipt time and the Git commit timestamp. (5) Implement latency alerts: if latency exceeds 2 min, investigate the network. (6) Use Git metadata: Jenkins' scm.lastChanges provides accurate commit info. (7) Implement webhook deduplication: if the same commit's webhook arrives twice, use the first. (8) Track build start time: `BUILD_START_TIME = System.currentTimeMillis()`. (9) Store timing metadata: archive a build-metadata JSON with timestamps. (10) Monitor timing accuracy: a Prometheus histogram tracks webhook latency. For implementation: parse the webhook in the Jenkinsfile: `def pushTime = params.payload.pushed_at ?: env.BUILD_TIMESTAMP`. Use it for artifact naming and log-analysis timestamps. Example: `def buildTimestamp = sh(script: 'git log -1 --format=%ci', returnStdout: true).trim()`. Store it in artifact metadata to enable accurate post-hoc analysis of build history.
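The latency calculation in steps (4)–(5) is simple once both sides are ISO 8601 timestamps. A Python sketch; `webhook_latency_secs` is an invented helper, and real `pushed_at` values may need normalization first (GitHub returns epoch seconds in some payloads and ISO strings in others):

```python
from datetime import datetime

def webhook_latency_secs(pushed_at: str, received_at: str) -> float:
    """Seconds between a push event and Jenkins receiving its webhook.

    Assumes both timestamps are ISO 8601 strings; trailing 'Z' is
    rewritten to '+00:00' so datetime.fromisoformat accepts it.
    A result over ~120s would fire the latency alert from step (5).
    """
    pushed = datetime.fromisoformat(pushed_at.replace("Z", "+00:00"))
    received = datetime.fromisoformat(received_at.replace("Z", "+00:00"))
    return (received - pushed).total_seconds()
```

Feeding this value into a Prometheus histogram (step (10)) makes the "5 minutes of network delay" scenario visible as a tail-latency spike instead of a mystery.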

Follow-up: Multiple commits pushed simultaneously. GitHub sends N webhooks. Build order might differ from push order. Ordering guarantee?

You're implementing cross-repository triggering: build in repo A triggers build in repo B after success. Repo A uses webhook (fast), repo B uses polling (slow). Synchronization is unreliable. Implement reliable cross-repo triggering.

Implement reliable downstream triggers: (1) Use a build-trigger step: repo A's post-build step invokes repo B's build. (2) Pass parameters: repo A passes the commit SHA and branch to repo B. (3) Implement queuing: if a repo B build is already running, queue instead of duplicating. (4) Use artifact passing: repo A archives artifacts; repo B pulls them for downstream use. (5) Implement build linking: Jenkins links parent/child builds for an audit trail. (6) Use a shared pipeline library: define the cross-repo coordination logic once in the library. (7) Implement status webhooks: the repo B pull request is updated with the status of repo A's build. (8) Use an external orchestrator: GitLab CI/GitHub Actions coordinates both repos. (9) Implement messaging: repo A pushes an event to a message queue; repo B consumes it. (10) Monitor the trigger chain: alert if it breaks (repo A succeeds but repo B never triggers). Example: Jenkinsfile in repo A: `post { success { build job: 'repo-b-build', parameters: [string(name: 'GIT_COMMIT', value: env.GIT_COMMIT)] } }`. On success, this triggers repo B with the same commit SHA; repo B checks out that commit and inherits the configuration. For reliability: implement timeout + retry; if repo B doesn't start within 5 min, alert.
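The duplicate-suppressing queue from step (3) can be sketched as follows. An in-memory Python illustration with invented names; in a real deployment this state lives in Jenkins' build queue (or the message queue from step (9)), not application memory:

```python
class DownstreamTrigger:
    """Queue downstream builds so repo B never runs duplicates.

    A trigger for a commit that is already queued or running is
    suppressed; each commit produces at most one downstream build.
    """

    def __init__(self):
        self.queued = []      # commits waiting to build, FIFO
        self.running = set()  # commits currently building

    def trigger(self, commit):
        """Enqueue a downstream build; False if it's a duplicate."""
        if commit in self.running or commit in self.queued:
            return False
        self.queued.append(commit)
        return True

    def start_next(self):
        """Pop the oldest queued commit and mark it running."""
        if not self.queued:
            return None
        commit = self.queued.pop(0)
        self.running.add(commit)
        return commit
```

Keying deduplication on the commit SHA (the same parameter repo A passes in the Jenkinsfile example above) means a retried upstream trigger cannot double-build repo B.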

Follow-up: Repo B build fails due to flaky dependency. Should repo A's build be marked as failure? Failure propagation?
