Jenkins Interview Questions

Artifact Management and Archiving


Your Jenkins disk storage is 95% full. Artifacts consume 80% of disk. Build logs are 15% and growing. Jenkins is approaching failure. Implement artifact lifecycle management.

Implement artifact lifecycle management: (1) Archive artifacts to external storage: Jenkins should archive to S3/GCS, not local disk. Configure the artifact manager under Manage Jenkins > System (e.g., via the Artifact Manager on S3 plugin). (2) Apply a retention policy: keep the last 30 builds' artifacts locally, archive older builds to S3 Glacier (cheaper). (3) Use the build discarder: `properties([buildDiscarder(logRotator(daysToKeepStr: '30', numToKeepStr: '100', artifactDaysToKeepStr: '7', artifactNumToKeepStr: '5'))])`. (4) Compress artifacts: gzip before storage often reduces size by 60%+. (5) Prune artifacts: a scheduled job cleans up unreferenced artifacts. (6) Deduplicate: store fingerprints and reuse artifacts when content matches. (7) Use S3 lifecycle policies: transition artifacts older than 90 days to Glacier. (8) Monitor disk usage: alert when the Jenkins disk exceeds 80%. (9) Verify archives: periodically check that archived artifacts are readable. (10) Enable S3 versioning: track artifact versions and allow rollback if needed. Expected: disk usage drops to ~40% within 24h. Monitor the disk-usage trend via Prometheus/Grafana.
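Step (7) can be expressed as an S3 lifecycle configuration. A minimal Python sketch that builds the JSON you would pass to `aws s3api put-bucket-lifecycle-configuration` — the `jenkins-artifacts/` prefix and the day counts are illustrative, not prescriptive:

```python
import json

def glacier_transition_rule(prefix, transition_days=90, expire_days=None):
    """One S3 lifecycle rule: move objects under `prefix` to Glacier
    after `transition_days`, optionally expire them later."""
    rule = {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": transition_days, "StorageClass": "GLACIER"}],
    }
    if expire_days is not None:
        rule["Expiration"] = {"Days": expire_days}
    return rule

# Full lifecycle configuration document for the bucket
policy = {"Rules": [glacier_transition_rule("jenkins-artifacts/", 90)]}
print(json.dumps(policy, indent=2))
```

Generating the document in code (rather than hand-editing JSON) makes it easy to stamp out one rule per job prefix with different retention windows.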

Follow-up: A developer needs an artifact from a 200-day-old build. It's in Glacier. Restoration takes 4 hours. How do you minimize impact?

Your Jenkins stores artifacts locally on attached disk. Disk fails, all artifacts lost. No backups. Production deployment delayed because build artifacts are gone. Implement durable artifact storage.

Implement durable artifact storage: (1) Use S3 as the primary artifact store: configure the Jenkins artifact manager to write to S3 directly. (2) Enable S3 versioning: `aws s3api put-bucket-versioning --bucket my-artifacts --versioning-configuration Status=Enabled`. (3) Use S3 cross-region replication: replicate to a backup region automatically. (4) Enable S3 MFA delete: require MFA to delete artifact versions (prevents accidental loss). (5) Use S3 lifecycle policies: keep a 30-day cache locally, archive older artifacts to Glacier. (6) Back up regularly: daily sync of the S3 bucket to a secondary bucket. (7) Verify integrity: compare checksums on download; S3 ETags work for single-part uploads, but multipart-upload ETags are not plain MD5 hashes. (8) Maintain an artifact index: a separate metadata database tracks artifact locations. (9) For on-prem storage, use Ceph/GlusterFS with replication. (10) Monitor replication lag: alert if replication falls more than 10 minutes behind. Illustrative config (the Artifact Manager on S3 plugin is normally configured via the UI or JCasC): `artifacts { s3 { bucket = "jenkins-artifacts", region = "us-east-1", credentialsId = "aws-creds" } }`. For compliance: audit-log all artifact access (S3 server access logs or CloudTrail). Target: RPO near zero — replication is asynchronous, so a strict RPO of 0 cannot be guaranteed.
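The integrity check in step (7) can be sketched as follows. This assumes a single-part upload, where the S3 ETag equals the object's MD5 hex digest; multipart ETags carry a `-partcount` suffix and are skipped here:

```python
import hashlib

def md5_etag(data: bytes) -> str:
    """MD5 hex digest; for single-part S3 uploads this equals the ETag."""
    return hashlib.md5(data).hexdigest()

def verify_download(data: bytes, s3_etag: str) -> bool:
    """Compare downloaded bytes against the object's ETag.
    Multipart ETags (containing '-') are not plain MD5, so we
    skip the check rather than report a false mismatch."""
    etag = s3_etag.strip('"')     # ETags arrive quoted in HTTP headers
    if "-" in etag:
        return True
    return md5_etag(data) == etag
```

For multipart uploads you would instead store your own SHA-256 alongside the artifact (e.g., in object metadata) and compare against that.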

Follow-up: A production artifact is corrupted on S3. S3 versioning shows it was corrupted 3 hours ago. Recovery window?

Your build pipeline generates 500+ artifacts per build (binaries, logs, test reports, coverage). Managing, indexing, and retrieving specific artifacts is difficult. Implement searchable artifact management.

Implement a searchable artifact system: (1) Use Elasticsearch: index all artifact metadata (name, size, sha256, build ID, timestamp). (2) Store an artifact manifest: a JSON file listing all artifacts per build. (3) Tag metadata: label artifacts by type (binary, log, report). (4) Use the Jenkins artifact API: expose artifact search via REST. (5) Build an artifact browser UI: search/filter artifacts via a web interface. (6) Use fingerprinting: Jenkins tracks content hashes and identifies duplicate artifacts. (7) Record provenance: track which build generated each artifact, enabling audits. (8) Link related artifacts: a binary, its symbols, its tests. (9) Use a CDN for frequently fetched artifacts: CloudFront caches popular ones. (10) Monitor artifact growth: alert if total artifacts exceed 1 TB. Example Groovy: `archiveArtifacts(artifacts: 'dist/**/*', fingerprint: true)`. For retrieval: an ES query finds artifacts by name/type/date range and returns download URLs. Expose a REST API such as `/api/artifacts/search?name=*.jar&after=2024-01-01` that returns matching artifacts with download links.
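The per-build manifest from step (2) can be generated like this. The field names (`build_id`, `artifacts`, `sha256`) are illustrative — align them with whatever your Elasticsearch index mapping expects:

```python
import hashlib
import json
import os

def build_manifest(build_id: str, paths: list) -> dict:
    """Build a per-build artifact manifest: name, size, sha256 per file."""
    entries = []
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        entries.append({
            "name": os.path.basename(path),
            "size": os.path.getsize(path),
            "sha256": digest,
        })
    return {"build_id": build_id, "artifacts": entries}

# The manifest itself is archived with the build, then bulk-indexed into ES.
# print(json.dumps(build_manifest("build-42", ["dist/app.jar"]), indent=2))
```

Indexing the manifest rather than the artifacts themselves keeps Elasticsearch small: metadata for 500 artifacts is a few kilobytes per build.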

Follow-up: Elasticsearch cluster fills up with old metadata. How do you prune efficiently?

Your Jenkins stores build artifacts. A compliance audit requires all artifacts >30 days old be immutable (WORM - write-once-read-many). Current S3 bucket allows deletion. Implement immutability enforcement.

Implement artifact immutability: (1) Enable S3 Object Lock (requires versioning and must be enabled on the bucket): place a legal hold with `aws s3api put-object-legal-hold --bucket my-artifacts --key build-123/app.jar --legal-hold Status=ON`. (2) Set an Object Lock retention period (e.g., 7 years): the object version cannot be deleted or overwritten until the period expires. (3) Choose the lock mode deliberately: compliance mode blocks deletion by everyone (including root) for true WORM; governance mode allows override by users with `s3:BypassGovernanceRetention`. (4) Use lifecycle transitions: artifacts move to Glacier after 30 days and remain locked there. (5) Harden with bucket policies: deny `s3:DeleteObject` and `s3:DeleteObjectVersion` — note that bucket policies cannot evaluate an object's age, so age-based immutability must come from Object Lock retention, not policy conditions. (6) Audit-log: CloudTrail records all access attempts, including deletion attempts. (7) Use legal holds for sensitive artifacts (production releases): undeletable until the hold is lifted. (8) Report compliance: generate a monthly report of immutability status. (9) Enable MFA delete: require MFA for any version deletion. (10) Monitor retention compliance: alert on any object lacking a retention setting. For compliance: run an automated checker that validates all production artifacts are locked.
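The deny-delete policy from step (5) can be sketched as a policy-document builder — the bucket name is a placeholder, and remember this blocks deletes for everyone, so pair it with a break-glass process:

```python
import json

def deny_delete_policy(bucket: str) -> dict:
    """Bucket policy denying object deletion outright.
    Bucket policies cannot condition on object age, so time-based
    WORM comes from Object Lock retention, not from this policy."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyArtifactDeletion",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }

print(json.dumps(deny_delete_policy("my-artifacts"), indent=2))
```

An explicit `Deny` in a bucket policy overrides any `Allow` granted through IAM, which is what makes it a useful backstop behind Object Lock.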

Follow-up: An artifact is marked for legal hold but needs to be deleted for GDPR right-to-be-forgotten. Resolution?

Your team stores build artifacts in S3. Cost is $2000/month because old artifacts are retained indefinitely. Implement intelligent artifact expiration to reduce cost.

Implement cost-optimized artifact management: (1) Use S3 Intelligent-Tiering: automatically moves artifacts between storage tiers based on access patterns. (2) Apply a retention policy: keep the last 30 builds, delete the rest. (3) Use S3 lifecycle policies: transition to Glacier after 30 days, delete after 180 days. (4) Archive by type: binaries kept longer (180 days), test logs shorter (30 days). (5) Compress before storage: gzip often cuts size 60-70%, reducing storage cost. (6) Deduplicate: identify duplicate artifacts, store them once, and reference by content hash (S3 has no symlinks). (7) Use Glacier for compliance retention: roughly $4/TB/month vs. ~$23/TB/month for S3 Standard. (8) Set expiration dates: tag artifacts with an expiry and auto-delete via lifecycle rules or Lambda. (9) Monitor cost per project: a billing dashboard shows S3 cost by build job. (10) Set cost alerts: alert if the monthly bill exceeds budget. Example policy: artifacts older than 30 days -> Glacier, older than 1 year -> Deep Archive, older than 7 years -> delete. Expected savings: 60-70% reduction. Monitor the storage-tier distribution to ensure frequently accessed artifacts stay in S3 Standard.
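The example policy above, combined with the per-type rule from step (4), reduces to a small decision function. A sketch — the thresholds and the `test-log` type name are examples, tune them to your retention requirements:

```python
def storage_tier(age_days: int, artifact_type: str) -> str:
    """Pick a storage action from the policy in the text:
    >30 days -> Glacier, >1 year -> Deep Archive, >7 years -> delete;
    test logs get a shorter life (deleted after 30 days)."""
    if artifact_type == "test-log":
        return "DELETE" if age_days > 30 else "STANDARD"
    if age_days > 7 * 365:
        return "DELETE"
    if age_days > 365:
        return "DEEP_ARCHIVE"
    if age_days > 30:
        return "GLACIER"
    return "STANDARD"
```

In practice you would encode this as S3 lifecycle rules keyed on prefix or tag; a function like this is mainly useful for the Lambda-based cleanup in step (8) and for unit-testing the policy before deploying it.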

Follow-up: A critical build artifact is accidentally deleted. It's in Glacier with 3-hour retrieval time. Emergency access?

Your Jenkins artifacts are stored in S3 US-East. Build agents are globally distributed (US, EU, APAC). Artifact download latency is 500+ms from EU/APAC, slowing deployments. Implement geo-distributed artifact delivery.

Implement geo-distributed artifacts: (1) Use CloudFront: front S3 with a CDN that caches artifacts at edge locations. (2) Enable S3 Transfer Acceleration: faster uploads/downloads over the CloudFront edge network. (3) Deploy regional artifact caches: a Nexus/Artifactory instance per region mirroring S3. (4) Use S3 cross-region replication: replicate artifacts to regional buckets in EU and APAC. (5) Route by location: Route53 latency-based routing sends requests to the nearest bucket. (6) Use S3 Intelligent-Tiering: automatically optimizes the storage tier by access pattern. (7) Compress artifacts: gzip shrinks transfers and speeds up downloads. (8) Download in parallel: agents fetch artifact chunks concurrently via HTTP Range requests. (9) Consider BitTorrent for very large artifacts: distribute via DHT instead of a central server. (10) Monitor latency per region: CloudWatch metrics track download speed by geography. Example: a CloudFront distribution fronts S3 with a 7-day cache TTL and a US-East origin. For implementation: Terraform creates the multi-region S3 + CloudFront + Route53 setup; agents resolve regional endpoints via DNS routing.
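The parallel-download idea in step (8) hinges on splitting an object into byte ranges for concurrent HTTP Range requests. A minimal sketch of the range math (the downloader issuing the requests is left out):

```python
def chunk_ranges(total_size: int, chunks: int) -> list:
    """Split an object of `total_size` bytes into inclusive byte ranges
    for `chunks` parallel Range requests, e.g. 'Range: bytes=0-249'
    for the first quarter of a 1000-byte object."""
    base, extra = divmod(total_size, chunks)
    ranges, start = [], 0
    for i in range(chunks):
        size = base + (1 if i < extra else 0)  # spread the remainder
        if size == 0:
            break                              # more chunks than bytes
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(chunk_ranges(1000, 4))  # [(0, 249), (250, 499), (500, 749), (750, 999)]
```

Each range maps directly to an S3 `GetObject` call with a `Range` header; both S3 and CloudFront serve ranged requests, so the same logic works against the CDN.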

Follow-up: The CloudFront cache goes stale after a deployment, and a critical hotfix ships an old artifact. How do you invalidate the cache?

Your Jenkins stores 10 years of build artifacts for compliance (financial audit trail). Storage cost is prohibitive. Implement tiered archival strategy balancing cost and compliance.

Implement tiered archival: (1) Tier 0 (hot): last 30 days in S3 Standard (fast access). (2) Tier 1 (warm): 30-180 days in S3 Intelligent-Tiering (auto-tiering). (3) Tier 2 (cold): 180-730 days in Glacier (standard retrieval in 3-5 hours). (4) Tier 3 (frozen): >730 days in Deep Archive (~12-hour retrieval). (5) Keep a metadata index: Elasticsearch indexes all tiers and provides unified search. (6) Use an artifact manifest: a JSON file tracks which tier each artifact is in. (7) Define retrieval SLAs: hot <1 s, warm <1 min, cold <30 min (requires expedited Glacier retrieval), frozen <24 hours. (8) Report compliance: generate an audit trail showing each artifact's location, integrity, and access history. (9) Replicate: each tier has a geographic backup. (10) Set retention per artifact type: production releases retained 10 years, dev builds 90 days. Cost impact: total cost drops roughly 80% through archiving. Example: `build-12345-app.jar` in Deep Archive under a 7-year compliance retention costs about $0.001/GB/month (roughly $1/TB/month). For retrieval: an on-demand tier upgrade via Lambda restores the artifact to the hot tier when requested.
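The four tiers above can be expressed as a small lookup that tooling (the manifest writer, the retrieval Lambda, the compliance report) shares. A sketch; the age boundaries and SLA strings come straight from the tier definitions in the text:

```python
# (max age in days or None for unbounded, tier, storage class, retrieval SLA)
TIERS = [
    (30,   "hot",    "S3 Standard",            "< 1 second"),
    (180,  "warm",   "S3 Intelligent-Tiering", "< 1 minute"),
    (730,  "cold",   "Glacier",                "< 30 minutes (expedited)"),
    (None, "frozen", "Deep Archive",           "< 24 hours"),
]

def classify(age_days: int):
    """Return (tier, storage class, retrieval SLA) for an artifact age."""
    for max_age, tier, storage, sla in TIERS:
        if max_age is None or age_days <= max_age:
            return tier, storage, sla
```

Keeping the table in one place means the metadata index and the retrieval path can never disagree about which tier an artifact of a given age should be in.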

Follow-up: An audit requests verification that a frozen artifact from 5 years ago is intact. Chain of custody proof?
