Schema Registry & Avro Compatibility
You need to add a required field "email" to your user schema. 50 consumers are reading from this topic with schema_registry_client.deserialize(). You deploy the producer with the new schema. What breaks?
Several things can break, and where depends on the registry's compatibility mode. With the default BACKWARD mode, the producer fails first: registering a required "email" field with no default is rejected by the registry (HTTP 409), so the producer can't serialize and the deploy fails. With compatibility checking disabled (NONE), the schema registers and the failure moves to the consumers: any consumer that adopts the new schema as its reader schema can no longer deserialize old messages, because Avro has no default to fill in the missing email field. Either way, production suffers. The fix: 1) Make the field optional with a default: change "email" from {"type": "string"} to {"type": ["null", "string"], "default": null}. A field added with a default is fully compatible: old consumers ignore it, new consumers read it (or null for old messages). 2) Register the new schema (version 2) and deploy the producer. 3) Roll consumers forward at their own pace; they can read both old and new messages throughout. Schema evolution rules: 1) Adding a field with a default = backward and forward compatible. 2) Removing a field that had a default = backward and forward compatible. 3) Adding a required field without a default = breaks backward compatibility (a new reader can't fill it in for old messages). 4) Changing a field's type = incompatible, unless Avro's promotion rules apply (e.g. int to long). Use the Schema Registry to enforce this. The modes: BACKWARD means consumers on the new schema can read data produced with the previous schema; FORWARD means data produced with the new schema can be read by consumers on the previous schema; the _TRANSITIVE variants check against all registered versions, not just the latest. Set it globally: curl -X PUT http://localhost:8081/config --data '{"compatibility": "BACKWARD_TRANSITIVE"}'. Then try registering the incompatible schema: curl -X POST http://localhost:8081/subjects/user-value/versions --data '..new schema..'. The registry rejects it.
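The backward-compatibility rule above can be sketched as a minimal check: every field added in the new schema must carry a default. This is a pure-Python illustration, not the real Avro resolver (no type promotion, aliases, or nested records); the schema literals are illustrative.

```python
import json

# Simplified sketch of the backward-compatibility rule: a field added in
# the new schema is only safe for old data if it carries a default.

OLD_SCHEMA = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "id", "type": "long"},
  {"name": "name", "type": "string"}
]}
""")

NEW_SCHEMA = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "id", "type": "long"},
  {"name": "name", "type": "string"},
  {"name": "email", "type": ["null", "string"], "default": null}
]}
""")

def added_fields_without_defaults(old: dict, new: dict) -> list:
    """Names of fields added in `new` that lack a default, i.e. the ones
    that break backward compatibility."""
    old_names = {f["name"] for f in old["fields"]}
    return [
        f["name"]
        for f in new["fields"]
        if f["name"] not in old_names and "default" not in f
    ]

print(added_fields_without_defaults(OLD_SCHEMA, NEW_SCHEMA))  # []

# A required email with no default would be flagged:
bad = dict(NEW_SCHEMA)
bad["fields"] = NEW_SCHEMA["fields"][:2] + [{"name": "email", "type": "string"}]
print(added_fields_without_defaults(OLD_SCHEMA, bad))         # ['email']
```

The registry's /compatibility endpoint performs the full version of this check, so in practice you query it rather than reimplementing the rules.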
Follow-up: If you make email field optional, can old consumers still read messages produced by new producer?
You have 3 schema versions in Schema Registry for user topic: v1, v2, v3. Producers use v3, consumers use v2. Who wins when deserializing?
Neither side simply "wins": Avro resolves the two schemas against each other. When a consumer reads a message produced with v3: 1) The message carries the writer's schema ID (e.g., schema_id=123 for v3). 2) The consumer fetches v3 from the registry: GET /schemas/ids/123. 3) The consumer's reader schema is v2, so Avro resolves writer v3 against reader v2. 4) If they are compatible (v3 only adds fields with defaults), deserialization succeeds: v2's fields are extracted, v3's new fields are ignored. 5) If they are incompatible (v3 removed a field v2 requires and v2 has no default for it), deserialization fails and the consumer crashes. This is why compatibility modes matter. Set the registry to schema compatibility BACKWARD: a consumer using the new schema can read data produced with the previous schema. Note that plain BACKWARD checks a new schema only against the latest version (v2); BACKWARD_TRANSITIVE checks against all earlier versions (v1 and v2). When a producer tries to register v3, the registry runs that check and rejects the registration if it fails. To verify compatibility yourself: curl -X POST http://localhost:8081/compatibility/subjects/user-value/versions/2 with the v3 schema as the request body; the registry returns {"is_compatible": true} or false. Safe practice under BACKWARD: upgrade consumers first, then producers, since by definition the new reader schema can handle the old data. If the change is fully compatible (only optional fields with defaults added), either order works. Risk: if a producer registers a v3 incompatible with v2 (possible when compatibility checks are disabled) and consumers haven't upgraded, they fail to deserialize. This is why an enforced compatibility mode is critical for zero-downtime deployments.
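Step 1 above relies on the Confluent wire format: every Avro payload is prefixed with a magic byte (0x00) and a 4-byte big-endian schema ID, which is how the deserializer knows which writer schema to fetch. A minimal sketch (the payload bytes are a dummy stand-in, not real Avro):

```python
import struct

# Confluent wire format: 1 magic byte (0x00) + 4-byte big-endian schema ID
# + the Avro-encoded payload. The deserializer reads the ID, fetches that
# schema from the registry, then resolves it against its reader schema.

def encode(schema_id: int, avro_payload: bytes) -> bytes:
    return struct.pack(">bI", 0, schema_id) + avro_payload

def decode(message: bytes) -> tuple:
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError("not Confluent wire format")
    return schema_id, message[5:]

msg = encode(123, b"\x02\x06foo")   # pretend v3 is registry ID 123
schema_id, payload = decode(msg)
print(schema_id)                    # 123
```

This is why "who wins" is never a coin flip: the writer schema travels (by ID) with every message, and resolution is deterministic.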
Follow-up: If Schema Registry has BACKWARD mode and producer registers incompatible schema, does it reject or allow and warn?
You have a large enum field with 100 possible values. You add a new enum value at the end. Is this compatible?
It depends on the reader. Avro schema: {"type": "enum", "name": "Status", "symbols": ["A", "B", ..., "Z"]}. The producer adds a new symbol "NEW_STATUS" (now 101 values) and serializes a message with it. A consumer on the old schema (100 symbols) hits a symbol its reader schema doesn't define, and by default deserialization fails with an unknown-symbol error. Solutions: 1) Use a string instead of an enum: {"type": "string"}. Producers send "NEW_STATUS"; old consumers deserialize it as an opaque string. 2) Use an enum default (Avro 1.9+): declare the enum with "default": "UNKNOWN" and an "UNKNOWN" symbol; symbols the reader doesn't know resolve to the default instead of failing. Older Avro runtimes don't support this, so check your consumers' library versions. 3) Pre-add an "UNKNOWN" placeholder symbol in v1 and have application code map anything unexpected to it. 4) Wrap in a union: {"type": ["null", "Status"]}; note this alone does not save an old reader from an unknown symbol, since the writer still encodes the Status branch. In practice: avoid adding enum symbols while older consumers exist, or stage the rollout: register the new symbol as schema v2 but don't produce it yet; wait for all consumers to upgrade (restart with v2); then let producers emit the new symbol. Slower, but safe. To test compatibility: curl -X POST http://localhost:8081/compatibility/subjects/status-value/versions/1 --data '{"schema": "..."}' with the new schema in the request body; the version in the URL path is the registered version to check it against.
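Option 2 above (the Avro 1.9+ enum default) can be mimicked in plain Python to show the resolution rule; this is a sketch of the behavior, not the real codec, and the Status symbols are illustrative.

```python
# Avro 1.9+ enum resolution rule: if the writer's symbol is not in the
# reader's symbol list, fall back to the enum-level "default" symbol;
# without a default, resolution fails.

READER_ENUM = {
    "type": "enum",
    "name": "Status",
    "symbols": ["ACTIVE", "DISABLED", "UNKNOWN"],
    "default": "UNKNOWN",
}

def resolve_symbol(writer_symbol: str, reader_enum: dict) -> str:
    """Map the writer's symbol into the reader's enum, using the
    enum-level default for symbols the reader doesn't know."""
    if writer_symbol in reader_enum["symbols"]:
        return writer_symbol
    if "default" in reader_enum:
        return reader_enum["default"]
    raise ValueError(f"unknown enum symbol: {writer_symbol}")

print(resolve_symbol("ACTIVE", READER_ENUM))      # ACTIVE
print(resolve_symbol("NEW_STATUS", READER_ENUM))  # UNKNOWN
```

With the default in place, producers can add symbols freely and old readers degrade gracefully to "UNKNOWN" instead of crashing.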
Follow-up: If you have union types, can you add a new type to the union?
You accidentally registered an incompatible schema and 30 producers immediately started using it. Consumers are crashing. How do you roll back?
Immediate action: stop producers from sending the incompatible schema. Kill the producer fleet or roll back the deployment so bad messages stop flowing. Then restore the consumers. The cleanest registry rollback is to delete the bad version: first check which version is latest with curl http://localhost:8081/subjects/user-value/versions, then soft-delete it: curl -X DELETE http://localhost:8081/subjects/user-value/versions/3. With the bad version gone, the previous compatible version is the latest again. Note that simply re-POSTing the old schema text usually won't help: registering a schema identical to an existing version of the subject returns that existing version rather than creating a new latest. Also lock down evolution so this can't recur mid-incident: curl -X PUT http://localhost:8081/config/user-value --data '{"compatibility": "FULL"}'. Consumers restart, resolve against the restored latest, and can deserialize old messages again. Messages already written by the 30 producers are still problematic for old readers; they carry the schema ID of the incompatible version. You can: 1) Let Kafka retention expire them. 2) Skip them with a consumer group offset reset: kafka-consumer-groups --bootstrap-server localhost:9092 --group consumers-group --topic user --reset-offsets --to-latest --execute. Prevention: 1) Set the subject or global compatibility level to BACKWARD_TRANSITIVE so incompatible registrations are rejected. 2) Test schema changes before deploying: curl -X POST http://localhost:8081/compatibility/subjects/user-value/versions/latest --data '{"schema": "...new schema..."}'. 3) Disable auto-registration in producers (the Confluent serializer's auto.register.schemas=false) so a code deploy can't silently register a new schema.
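Prevention step 2, the pre-flight compatibility check, has a common gotcha worth showing: the registry REST API expects the Avro schema JSON-encoded as a string inside the request body, i.e. double-encoded. A small sketch that builds the request (URL, subject, and schema are illustrative):

```python
import json

# Build the pre-flight compatibility request for the Schema Registry.
# Gotcha: the schema goes in the body as a *string* field, so it is
# JSON-encoded twice.

def compat_check_request(base_url: str, subject: str, schema: dict) -> tuple:
    url = f"{base_url}/compatibility/subjects/{subject}/versions/latest"
    body = json.dumps({"schema": json.dumps(schema)})
    return url, body

url, body = compat_check_request(
    "http://localhost:8081",
    "user-value",
    {"type": "record", "name": "User",
     "fields": [{"name": "id", "type": "long"}]},
)
print(url)
# POST `body` to `url` with
# Content-Type: application/vnd.schemaregistry.v1+json
# and check "is_compatible" in the response.
```

Wiring this into CI before every producer deploy turns the rollback scenario above into a rejected build instead of an outage.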
Follow-up: If you deleted incompatible messages using retention, can consumers still deserialize old messages from before the incompatible push?
You have 5 different topics, each with their own schema. You want unified schema across all. How do you migrate without breaking consumers?
Unified schema migration is complex. Options: 1) Create a new unified topic with the unified schema. Produce to both old and new topics (dual-write) while consumers gradually migrate to the new topic; the old topics stay alive for backward compatibility and are eventually sunset. 2) Keep the old topics and transform on the consumer side: consumers read an old topic, map records to the unified schema locally, and continue. This forfeits the registry's benefit of one centralized schema but avoids duplicating data. 3) Transform in the pipeline: if the topics flow through Kafka Connect, Single Message Transforms (SMTs) can remap fields in flight, e.g. old user schema {name, id} to unified {firstName, lastName, userId} by mapping name→firstName, id→userId. (The Schema Registry itself stores and checks schemas; it does not transform messages.) To implement option 1: 1) Design a unified schema covering all 5 topics. Resist the overly generic envelope: {type: "record", fields: [{name: "source_topic", type: "string"}, {name: "payload", type: "string"}]} defeats the purpose of a schema. Instead, analyze the 5 schemas, find the common fields, and model them explicitly, keeping a source_topic field to distinguish origin. 2) Register the unified schema under a new subject: curl -X POST http://localhost:8081/subjects/unified-user-value/versions --data '{"schema": "..."}'. 3) Build a bridge consumer: read from the 5 old topics, deserialize with the old schemas, remap, and produce to the unified topic with the new schema. 4) Test the bridge: verify message counts and that remapped records validate against the unified schema. 5) Cut over: producers switch to the unified topic. 6) Clean up: archive or delete the old topics after retention. Risk: if the bridge consumer crashes, the unified topic falls behind. Mitigation: monitor the bridge's consumer lag and alert when it exceeds a threshold (say, 1M messages).
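The core of the bridge consumer (step 3 above) is the per-topic remapping into the unified schema. A sketch of that mapping step; the topic names, field names, and per-topic mappings are illustrative assumptions, and the Kafka consume/produce loop around it is omitted.

```python
# Per-topic field remapping for the bridge consumer: take a deserialized
# record from one of the old topics and emit a unified-schema dict.
# FIELD_MAP keys and field names are hypothetical examples.

FIELD_MAP = {
    "users":    {"name": "full_name", "id": "user_id"},
    "accounts": {"owner": "full_name", "acct_id": "user_id"},
}

def to_unified(source_topic: str, record: dict) -> dict:
    mapping = FIELD_MAP[source_topic]
    unified = {"source_topic": source_topic,
               "full_name": None, "user_id": None}
    for old_field, new_field in mapping.items():
        unified[new_field] = record.get(old_field)
    return unified

print(to_unified("users", {"name": "Ada", "id": 7}))
# {'source_topic': 'users', 'full_name': 'Ada', 'user_id': 7}
```

Keeping the mapping in one data structure per topic makes step 4 (testing the bridge) a matter of table-driven unit tests rather than Kafka integration runs.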
Follow-up: If you use a bridge consumer and it crashes halfway, how do you resume without reprocessing all 5 topics?
Your Avro schema has a nested record: {type: "record", fields: [{name: "address", type: {type: "record", fields: [{name: "zip", type: "int"}]}}]}. You add new field to nested record. Is this compatible?
Yes, if you make it optional. Adding a field with a default to a nested record is fully compatible: old consumers ignore the new nested field, new consumers fill in the default when it's absent. But a required field without a default breaks compatibility. Example: v1 nested address has {zip}; v2 has {zip, country} (required, no default). A consumer that upgrades its reader schema to v2 and then reads a v1 message fails: the nested address has no "country" and no default to fall back on. Fix: make country optional: {name: "country", type: ["null", "string"], "default": null}. To test nested compatibility: both the registry and Avro's own SchemaCompatibility utility check recursively, so use the registry's checker: curl -X POST http://localhost:8081/compatibility/subjects/user-value/versions/1 --data '{"schema": "...v2 with new nested field..."}'. It walks all nested records; if a nested field is required without a default, it reports incompatible. Best practice: when evolving nested records, only add fields with defaults. If the field genuinely must always be populated, give it a default anyway (a sentinel value rather than null works) and enforce "must be set" in application code. There is no safe online path for a defaultless required field: new readers cannot deserialize the old messages still sitting in the topic, so such a field can only be introduced on a fresh topic or after all old messages have expired from retention.
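The recursive check described above can be sketched as a walk over the record tree that flags any added field lacking a default. Heavily simplified (no unions of records, arrays, maps, or type promotion); the registry's real checker covers all of that.

```python
# Walk old vs new record schemas, recursing into nested records, and
# collect the paths of fields added without a default -- the additions
# that break backward compatibility.

def defaultless_added_fields(old: dict, new: dict, path: str = "") -> list:
    problems = []
    old_fields = {f["name"]: f for f in old["fields"]}
    for f in new["fields"]:
        here = f"{path}.{f['name']}".lstrip(".")
        if f["name"] not in old_fields:
            if "default" not in f:
                problems.append(here)
        elif isinstance(f["type"], dict) and f["type"].get("type") == "record":
            problems += defaultless_added_fields(
                old_fields[f["name"]]["type"], f["type"], here)
    return problems

V1 = {"type": "record", "name": "User", "fields": [
    {"name": "address", "type": {"type": "record", "name": "Address",
        "fields": [{"name": "zip", "type": "int"}]}}]}
V2 = {"type": "record", "name": "User", "fields": [
    {"name": "address", "type": {"type": "record", "name": "Address",
        "fields": [{"name": "zip", "type": "int"},
                   {"name": "country", "type": "string"}]}}]}

print(defaultless_added_fields(V1, V2))   # ['address.country']
```

Reporting the full path ("address.country" rather than just "country") is what makes nested incompatibilities easy to locate in large schemas.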
Follow-up: If nested record schema uses union types, can you change union order?
You're managing schemas for a data lake. 100 topics, new fields added daily. How do you ensure backward compatibility at scale?
At scale, manual testing is infeasible. Automate: 1) Set a global compatibility level in the registry: curl -X PUT http://localhost:8081/config --data '{"compatibility": "BACKWARD_TRANSITIVE"}'. Every new schema must then be compatible with all previously registered versions, and any incompatible registration is rejected automatically. 2) Validate schemas in CI/CD: before deploying producer code, check the schema in the repo against the registry's /compatibility endpoint (Confluent's kafka-schema-registry-maven-plugin has a test-compatibility goal for exactly this; a small script works too). If incompatible, the build fails. 3) Test in staging: producers send to staging Kafka; consumers in staging deserialize with the old schema. If deserialization succeeds, the change is compatible in practice, not just on paper. 4) Pin producer schemas: disable auto-registration (auto.register.schemas=false) so only reviewed, pre-registered schemas are ever used, and a code deploy can't silently introduce a new version. 5) Monitor: track schema registration attempts and alert when the registry rejects one (likely an accident about to become an incident). 6) Gradual rollout: deploy producers with the new schema, run for an hour, and watch consumer deserialization error rates; if they exceed 0.1%, roll back immediately. 7) Schema governance: assign an owner per topic who reviews schema changes before merge to main; catch incompatibilities in PR review. 8) Documentation: maintain a schema changelog. For each version, record whether the change is breaking or compatible, e.g. "v3: added optional field 'email' (fully compatible). v4: removed optional field 'phone' (compatible; readers fall back to its default)." At 100 topics, automation is the rule and manual review the exception; anything else becomes the bottleneck.
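The CI gate's decision logic (step 2 above) boils down to the evolution rules from earlier answers. A sketch that classifies a top-level field change; simplified to added/removed fields only, with no type changes, nesting, or aliases, so real pipelines should call the registry's /compatibility endpoint rather than reimplementing this.

```python
# Classify a schema change using the field rules: additions need a default
# for backward compatibility (new reader, old data); removals need the old
# field to have had a default for forward compatibility (old reader, new
# data).

def classify(old: dict, new: dict) -> str:
    old_f = {f["name"]: f for f in old["fields"]}
    new_f = {f["name"]: f for f in new["fields"]}
    added   = [new_f[n] for n in new_f if n not in old_f]
    removed = [old_f[n] for n in old_f if n not in new_f]
    backward_ok = all("default" in f for f in added)
    forward_ok  = all("default" in f for f in removed)
    if backward_ok and forward_ok:
        return "FULL"
    if backward_ok:
        return "BACKWARD"
    if forward_ok:
        return "FORWARD"
    return "BREAKING"

v1 = {"fields": [{"name": "id", "type": "long"}]}
v2 = {"fields": [{"name": "id", "type": "long"},
                 {"name": "email", "type": ["null", "string"],
                  "default": None}]}
print(classify(v1, v2))   # FULL
```

A CI job can fail the build whenever the classification is weaker than the subject's configured compatibility level.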
Follow-up: If BACKWARD_TRANSITIVE rejects your schema, how do you debug which version it's incompatible with?