Introduction

ZooKeeper has been Kafka’s metadata management backbone for over a decade, but it comes with operational complexity: separate processes to maintain, additional monitoring, and another failure point. Apache Kafka 3.3+ introduced KRaft mode (Kafka Raft metadata mode) as production-ready, allowing you to run Kafka without ZooKeeper.

Why migrate to KRaft?

  • Eliminate ZooKeeper dependency - One less system to manage and monitor
  • Simpler architecture - Kafka handles its own metadata via the Raft consensus protocol
  • Better scalability - Supports millions of partitions (ZooKeeper struggles beyond 200K)
  • Faster recovery - Controller failover happens in milliseconds instead of seconds

This guide walks you through migrating a Kafka 3.9.0 cluster from ZooKeeper mode to KRaft mode with zero downtime. The migration happens in phases, allowing you to roll back at each checkpoint.

Starting point: Need a Kafka 3.9.0 cluster with ZooKeeper? See this upgrade guide for setup instructions.


Migration Overview

The migration happens in 4 distinct phases, each moving the cluster closer to pure KRaft mode:

[Diagram: the four migration phases - Phase 1: KRaft controllers deployed and forming a quorum while ZooKeeper stays active and brokers remain in ZK mode; Phase 2 (rolling restart): brokers dual-write metadata to ZooKeeper and the syncing KRaft controllers; Phase 3 (config update): brokers become KRaft observers reading metadata from the controllers, ZooKeeper running but unused; Phase 4 (decommission ZooKeeper): controllers and brokers run pure KRaft]

Key phases:

  • Phase 1: Deploy KRaft controllers alongside ZooKeeper
  • Phase 2: Enable migration - brokers dual-write to both systems
  • Phase 3: Convert brokers to KRaft mode as observers
  • Phase 4: Finalize - remove ZooKeeper dependency completely

Architecture Overview

Our migration transforms the cluster architecture from ZooKeeper-based coordination to KRaft-based self-management.

Before Migration: ZooKeeper Mode

[Diagram: ZooKeeper mode - three Kafka 3.9.0 brokers coordinating through a three-node ZooKeeper ensemble]

Current state:

  • 3 ZooKeeper nodes forming an ensemble with quorum-based consensus
  • 3 Kafka brokers (version 3.9.0) connecting to the ZooKeeper ensemble
  • All coordination, leader elections, and configuration stored in ZooKeeper

After Migration: KRaft Mode

[Diagram: KRaft mode - a three-controller quorum (one leader, two followers) connected by the Raft protocol, pushing metadata updates to three broker observers]

Target state:

  • 3 KRaft controllers (voters) - Form Raft quorum with 1 leader, 2 followers
  • 3 Kafka brokers (observers) - Handle client requests, observe metadata
  • No ZooKeeper - Metadata managed by KRaft protocol internally
  • Event-driven - Metadata changes propagate via the __cluster_metadata topic (see the inspection example below)
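Once the migration is finished you can inspect that metadata log directly, since it is stored as an ordinary Kafka log on disk. A minimal sketch, assuming the metadata log lives under /var/lib/kafka/data (substitute whatever log.dirs points to in your broker configuration):

# Hypothetical path - check log.dirs in server.properties first
docker exec kafka1 /opt/kafka/bin/kafka-dump-log.sh \
  --cluster-metadata-decoder \
  --files /var/lib/kafka/data/__cluster_metadata-0/00000000000000000000.log | head -n 20

This prints decoded metadata records (broker registrations, topic changes, configs) instead of raw bytes.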

Initial Setup

This section covers setting up a Kafka 3.9.0 cluster with ZooKeeper from scratch. If you already have a running cluster, skip to Phase 1.

Build Docker Images

What happens: Build the base image with Kafka 3.9.0 binaries, then create specialized images for ZooKeeper, Kafka brokers, and KRaft controllers.

cd kafka3.9.0
./docker-image-build.sh

Expected: Four images created:

  • kafka-lab-base - Base image with Kafka 3.9.0 binaries
  • kafka-lab-zk - ZooKeeper image
  • kafka-lab-kafka - Kafka broker image
  • kafka-lab-kraft - KRaft controller image

Start Containers

What happens: Launch all 6 containers (3 ZooKeeper + 3 Kafka brokers) using Docker Compose with persistent volumes and custom network.

docker-compose up -d
sleep 10
docker ps

Expected: All 6 containers running:

CONTAINER ID   IMAGE                  STATUS         PORTS                    NAMES
abc123...      kafka-lab-zk           Up 10 seconds  2181/tcp                 zk1
def456...      kafka-lab-zk           Up 10 seconds  2181/tcp                 zk2
ghi789...      kafka-lab-zk           Up 10 seconds  2181/tcp                 zk3
jkl012...      kafka-lab-kafka        Up 10 seconds  9092/tcp                 kafka1
mno345...      kafka-lab-kafka        Up 10 seconds  9092/tcp                 kafka2
pqr678...      kafka-lab-kafka        Up 10 seconds  9092/tcp                 kafka3

Initialize ZooKeeper Ensemble

What happens: Configure each ZooKeeper node with its unique ID and server list, then start the ZooKeeper service on all nodes.

for i in 1 2 3; do 
    echo "Starting zookeeper node zk$i"
    docker exec zk$i bash -c 'echo $ZOO_MY_ID > /zookeeper/data/myid' 
    docker exec zk$i bash -c 'echo $ZOO_SERVERS | tr " " "\n" >> /opt/kafka/config/zoo.cfg'
    docker exec zk$i systemctl start zookeeper
done

sleep 5

Verify ZooKeeper is running:

docker exec zk1 bash -c 'echo ruok | nc localhost 2181'

Expected:

imok

docker exec zk1 systemctl status zookeeper | grep "Active:"

Expected:

Active: active (running) since...

Initialize Kafka Brokers

What happens: Generate configuration for each broker using environment variables, then start the Kafka service on all brokers.

for i in 1 2 3; do
    echo "Starting kafka broker node kafka$i" &&
    docker exec kafka$i bash -c 'sh /var/tmp/kafka_config_generator.sh' &&
    docker exec kafka$i systemctl start kafka
done

sleep 10

Verify all brokers registered with ZooKeeper:

docker exec zk1 /opt/kafka/bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids

Expected:

[1, 2, 3]

Verify Kafka Version

What happens: Confirm all brokers are running Kafka 3.9.0 before starting the migration.

docker exec kafka1 ls /opt/kafka/libs | grep kafka_

Expected:

kafka_2.13-3.9.0.jar

Create Test Topic

What happens: Create a test topic with 3 partitions and replication factor 3 to verify cluster health throughout the migration.

docker exec kafka1 /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server kafka1.local:9092 \
  --create --topic test \
  --partitions 3 \
  --replication-factor 3

Describe the test topic:

docker exec kafka1 /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server kafka1.local:9092 \
  --describe --topic test

Expected:

Topic: test     TopicId: xyz123     PartitionCount: 3       ReplicationFactor: 3
Topic: test     Partition: 0    Leader: 1       Replicas: 1,2,3 Isr: 1,2,3
Topic: test     Partition: 1    Leader: 2       Replicas: 2,3,1 Isr: 2,3,1
Topic: test     Partition: 2    Leader: 3       Replicas: 3,1,2 Isr: 3,1,2

What to verify:

  • ✅ 3 partitions created
  • ✅ All replicas in sync (ISR matches Replicas)
  • ✅ Leaders distributed across brokers
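Optionally, run a quick produce/consume round trip on the test topic so you have a known-good baseline before the migration starts. A minimal check using the console tools that ship with Kafka:

docker exec kafka1 bash -c 'echo "baseline-check" | /opt/kafka/bin/kafka-console-producer.sh \
  --bootstrap-server kafka1.local:9092 --topic test'

docker exec kafka1 /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server kafka1.local:9092 --topic test \
  --from-beginning --max-messages 1

Expected: the consumer prints baseline-check and exits after one message.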

Phase 1: Deploy KRaft Controllers

What happens: Start 3 KRaft controller nodes that will form a Raft quorum. These controllers will eventually replace ZooKeeper for metadata management. The cluster continues running on ZooKeeper during this phase.

Get Cluster ID from ZooKeeper

What happens: Extract the existing cluster UUID from ZooKeeper. KRaft controllers must use the same cluster ID to ensure continuity.

CLUSTER_ID=$(docker exec zk1 /opt/kafka/bin/zookeeper-shell.sh localhost:2181 \
  get /cluster/id 2>&1 | grep '"id"' | sed 's/.*"id":"\([^"]*\)".*/\1/')

echo "Cluster ID: $CLUSTER_ID"

Expected:

Cluster ID: SnA2-fqPTM-_QGpKAiA8oA

Caution: This cluster ID must match exactly. If empty or malformed, check ZooKeeper connectivity.

Start KRaft Controllers

What happens: Launch 3 KRaft controller containers using the cluster ID from ZooKeeper. Each controller starts with the /var/tmp/start-kraft.sh script.

for i in 1 2 3; do 
  echo "Starting kraft$i..."
  docker exec -d -e CLUSTER_UUID=$CLUSTER_ID kraft$i /var/tmp/start-kraft.sh
  sleep 5
done

sleep 20

Why the wait? Controllers need time to:

  1. Format storage directories (5 seconds)
  2. Start Kafka process (5 seconds)
  3. Form quorum and elect leader (10 seconds)
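The lab image provides /var/tmp/start-kraft.sh, so you do not need to write it yourself, but conceptually it has to do two things: format the controller's metadata directory with the shared cluster ID, then start Kafka with the controller configuration. A rough sketch of that idea (file paths are assumptions, not the actual script):

#!/bin/bash
# Hypothetical sketch of what a controller start script does - not the lab's script itself.
# 1. Format the metadata log directory with the cluster ID taken from ZooKeeper.
/opt/kafka/bin/kafka-storage.sh format \
  --cluster-id "$CLUSTER_UUID" \
  --config /opt/kafka/config/kraft/controller.properties

# 2. Start the controller process with the same configuration.
/opt/kafka/bin/kafka-server-start.sh -daemon \
  /opt/kafka/config/kraft/controller.properties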

Verify Quorum Formation

What happens: Check that the 3 controllers have formed a Raft quorum with 1 leader and 2 followers.

docker exec kraft1 /opt/kafka/bin/kafka-metadata-quorum.sh \
  --bootstrap-controller kraft1:9093 describe --replication --human-readable

Expected:

NodeId  DirectoryId             LogEndOffset    Lag     LastFetchTimestamp      LastCaughtUpTimestamp   Status
101     KfL1aMqPSuGz1bX2cY3dEQ  156             0       5 ms ago                5 ms ago                Leader
102     GhI2bNrQTvHz2cY3dZ4eFg  156             0       215 ms ago              215 ms ago              Follower
103     JkL3c0sRUwIa3dZ4eA5fGg  156             0       215 ms ago              215 ms ago              Follower

What to verify:

  • ✅ 3 controllers present (IDs: 101, 102, 103)
  • ✅ 1 Leader, 2 Followers
  • ✅ LogEndOffset matches across all nodes
  • ✅ Lag is 0 (all controllers caught up)

Current state: ZooKeeper still active, brokers still using ZooKeeper, KRaft controllers running in parallel (not yet managing brokers).


Phase 2: Enable Migration on Brokers

What happens: Configure brokers to start dual-writing metadata to both ZooKeeper (old) and KRaft controllers (new). This phase synchronizes metadata between the two systems.

Add Migration Configuration

What happens: Add migration-specific settings to each broker’s configuration file, enabling them to communicate with the KRaft controller quorum.

MIGRATION_CONFIG='listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
zookeeper.metadata.migration.enable=true
controller.quorum.bootstrap.servers=kraft1:9093,kraft2:9093,kraft3:9093
controller.listener.names=CONTROLLER'

for i in 1 2 3; do
  docker exec kafka$i bash -c "echo '$MIGRATION_CONFIG' >> /opt/kafka/config/server.properties"
  echo "Added migration config to kafka$i"
done

Configuration explained:

  • listener.security.protocol.map - Defines CONTROLLER listener protocol
  • zookeeper.metadata.migration.enable=true - Activates dual-write mode
  • controller.quorum.bootstrap.servers - KRaft controller addresses
  • controller.listener.names=CONTROLLER - Listener name for controller communication
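Before restarting anything, confirm the new settings actually landed at the end of each broker's properties file:

for i in 1 2 3; do
  echo "--- kafka$i ---"
  docker exec kafka$i tail -n 4 /opt/kafka/config/server.properties
done

Expected: the four migration lines appear on every broker.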

Rolling Restart Brokers

What happens: Restart each broker one by one to apply the migration configuration. During restart, metadata starts syncing to KRaft controllers.

for i in 1 2 3; do
  echo "Restarting kafka$i..."
  docker exec kafka$i systemctl restart kafka
  sleep 20
  echo "kafka$i restarted."
done

sleep 10

Why 20 seconds between restarts? Allows broker to:

  1. Shut down gracefully (5 seconds)
  2. Start up and rejoin cluster (10 seconds)
  3. Begin metadata sync to KRaft (5 seconds)
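If you prefer not to depend on fixed sleeps, a variant of the loop polls each broker until it answers API requests before moving on. A sketch, using the same hostnames as the rest of this guide:

for i in 1 2 3; do
  echo "Restarting kafka$i..."
  docker exec kafka$i systemctl restart kafka
  # Wait until the broker responds to ApiVersions requests before touching the next one
  until docker exec kafka$i /opt/kafka/bin/kafka-broker-api-versions.sh \
      --bootstrap-server kafka$i.local:9092 > /dev/null 2>&1; do
    sleep 5
  done
  echo "kafka$i restarted."
done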

Verify Migration Completed

What happens: Check the controller logs to confirm metadata has been fully synchronized from ZooKeeper to the KRaft controllers.

docker exec kraft1 grep "Completed migration" /opt/kafka/logs/server.log

Expected:

[2025-12-29 10:15:23,456] INFO Completed migration of metadata from ZooKeeper to KRaft (kafka.migration.ZkMigrationClient)

What to verify:

  • ✅ “Completed migration” log entry appears
  • ✅ No error messages in controller logs
  • ✅ All brokers successfully restarted

Current state: Brokers dual-writing to both ZooKeeper and KRaft controllers. Metadata synchronized. Still using ZooKeeper as primary coordination system.


Phase 3: Move Brokers to KRaft Mode

What happens: Convert brokers from ZooKeeper-mode to KRaft-mode by updating their configuration to read metadata exclusively from KRaft controllers. This removes ZooKeeper dependency from brokers.

Update Broker Configurations

What happens: Change broker configurations to operate in KRaft mode as “brokers” (not voters in the quorum, just observers).

for i in 1 2 3; do
  docker exec kafka$i bash -c "
    sed -i 's/broker.id/node.id/g' /opt/kafka/config/server.properties
    echo 'process.roles=broker' >> /opt/kafka/config/server.properties
    sed -i '/inter.broker.protocol.version/d' /opt/kafka/config/server.properties
    sed -i '/zookeeper.connect/d' /opt/kafka/config/server.properties
    sed -i '/zookeeper.metadata.migration.enable/d' /opt/kafka/config/server.properties
    sed -i '/zookeeper.session.timeout.ms/d' /opt/kafka/config/server.properties
  "
  echo "Updated kafka$i configuration"
done

Configuration changes:

  • broker.id → node.id - KRaft uses node.id for all nodes
  • Add process.roles=broker - Declares this node as broker (not controller)
  • Remove inter.broker.protocol.version - Not needed in KRaft mode
  • Remove zookeeper.connect - No longer connecting to ZooKeeper
  • Remove zookeeper.metadata.migration.enable - Migration complete
  • Remove zookeeper.session.timeout.ms - No ZooKeeper sessions
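It is also worth checking what, if anything, still references ZooKeeper in each broker's configuration after these edits:

for i in 1 2 3; do
  echo "--- kafka$i ---"
  docker exec kafka$i grep '^zookeeper' /opt/kafka/config/server.properties || echo "no zookeeper.* settings left"
done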

Rolling Restart Brokers

What happens: Restart brokers one by one to activate KRaft-mode configuration. Brokers now read metadata from KRaft controllers as observers.

for i in 1 2 3; do
  echo "Restarting kafka$i in KRaft mode..."
  docker exec kafka$i systemctl restart kafka
  sleep 20
  echo "kafka$i now in KRaft mode."
done

Verify Brokers as Observers

What happens: Confirm brokers appear in the KRaft quorum as observers (non-voting members that read metadata).

docker exec kraft1 /opt/kafka/bin/kafka-metadata-quorum.sh \
  --bootstrap-controller kraft1:9093 describe --status

Expected:

ClusterId:              SnA2-fqPTM-_QGpKAiA8oA
LeaderId:               101
LeaderEpoch:            3
HighWatermark:          1853
MaxFollowerLag:         0
MaxFollowerLagTimeMs:   0
CurrentVoters:          [{"id": 101, ...}, {"id": 102, ...}, {"id": 103, ...}]
CurrentObservers:       [{"id": 1, ...}, {"id": 2, ...}, {"id": 3, ...}]

What to verify:

  • ✅ CurrentVoters shows 3 controllers (IDs: 101, 102, 103)
  • ✅ CurrentObservers shows 3 brokers (IDs: 1, 2, 3)
  • ✅ MaxFollowerLag is 0

Verify No Under-Replicated Partitions

What happens: Ensure all topic partitions are healthy and fully replicated after the configuration change.

docker exec kafka1 /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server kafka1.local:9092 \
  --describe --under-replicated-partitions

Expected: Empty output (no under-replicated partitions)

Current state: Brokers operating in KRaft mode as observers, reading metadata from controllers. ZooKeeper still running but no longer used by brokers.


Phase 4: Finalize Migration

What happens: Remove ZooKeeper dependency from KRaft controllers and decommission ZooKeeper entirely. This completes the migration to pure KRaft mode.

Remove ZooKeeper Config from Controllers

What happens: Clean up ZooKeeper-related configuration from controller properties files.

for i in 1 2 3; do
  docker exec kraft$i bash -c "
    sed -i '/zookeeper.connect/d' /opt/kafka/config/kraft/controller.properties
    sed -i '/zookeeper.metadata.migration.enable/d' /opt/kafka/config/kraft/controller.properties
  "
  echo "Removed ZooKeeper config from kraft$i"
done

Rolling Restart Controllers

What happens: Restart controllers one by one to apply the configuration changes. Controllers now operate in pure KRaft mode.

for i in 1 2 3; do
  docker exec kraft$i pkill -f kafka
  sleep 3
  docker exec -d -e CLUSTER_UUID=$CLUSTER_ID kraft$i /var/tmp/start-kraft.sh
  sleep 15
  echo "kraft$i restarted in pure KRaft mode."
done

Why 15 seconds? Allows controller to:

  1. Shut down gracefully (3 seconds)
  2. Start up (5 seconds)
  3. Rejoin quorum (7 seconds)
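Between restarts you can also confirm the quorum still has a leader before taking down the next controller, pointing the same tool at a controller that is still running:

docker exec kraft2 /opt/kafka/bin/kafka-metadata-quorum.sh \
  --bootstrap-controller kraft2:9093 describe --status | grep LeaderId

Expected: a valid LeaderId (for example LeaderId: 101), not -1.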

Decommission ZooKeeper

What happens: Stop all ZooKeeper services as they are no longer needed.

for i in 1 2 3; do
  echo "Stopping zookeeper node zk$i"
  docker exec zk$i systemctl stop zookeeper 2>/dev/null || docker exec zk$i pkill -f zookeeper
  sleep 2
done

Verify ZooKeeper is stopped:

for i in 1 2 3; do
  STATUS=$(docker exec zk$i ps aux | grep -E "zookeeper|QuorumPeerMain" | grep -v grep || echo "No ZooKeeper process")
  echo "zk$i: $STATUS"
done

Expected:

zk1: No ZooKeeper process
zk2: No ZooKeeper process
zk3: No ZooKeeper process

Verify Final Quorum Status

What happens: Confirm the KRaft quorum is healthy with all controllers and broker observers active.

docker exec kraft1 /opt/kafka/bin/kafka-metadata-quorum.sh \
  --bootstrap-controller kraft1:9093 describe --replication --human-readable

Expected:

NodeId  DirectoryId             LogEndOffset    Lag     LastFetchTimestamp      LastCaughtUpTimestamp   Status
101     KfL1aMqPSuGz1bX2cY3dEQ  2718            0       6 ms ago                6 ms ago                Leader
102     GhI2bNrQTvHz2cY3dZ4eFg  2718            0       226 ms ago              226 ms ago              Follower
103     JkL3c0sRUwIa3dZ4eA5fGg  2718            0       226 ms ago              226 ms ago              Follower
1       S6hsGkwXxgwPaz8JQTzFCQ  2718            0       226 ms ago              226 ms ago              Observer
2       IEpkgBT6ucGSzwaZm53U-g  2718            0       226 ms ago              226 ms ago              Observer
3       UlEhjluFhWjj5EnHeMD4-g  2718            0       226 ms ago              226 ms ago              Observer

What to verify:

  • ✅ 3 voters (controllers 101, 102, 103)
  • ✅ 3 observers (brokers 1, 2, 3)
  • ✅ All nodes at same LogEndOffset
  • ✅ Lag is 0

Test Producer/Consumer

What happens: End-to-end test to verify the cluster functions correctly in pure KRaft mode.

Create a new test topic:

docker exec kafka1 /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server kafka1.local:9092 \
  --create --topic migration-test --partitions 1 --replication-factor 3

sleep 2

Produce a message:

docker exec kafka1 bash -c 'echo "hello-kraft" | /opt/kafka/bin/kafka-console-producer.sh \
  --bootstrap-server kafka1.local:9092 --topic migration-test'

echo "Message produced"

Consume the message:

docker exec kafka1 /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server kafka1.local:9092 --topic migration-test \
  --from-beginning --max-messages 1 --property print.timestamp=true

Expected:

CreateTime:1735450523456    hello-kraft
Processed a total of 1 messages

Verification Summary

Phase | Check | Command | Expected
Phase 1 | Quorum formed | kafka-metadata-quorum.sh describe --replication --human-readable | 1 Leader, 2 Followers
Phase 2 | Migration completed | grep "Completed migration" /opt/kafka/logs/server.log | Log entry found
Phase 2 | No under-replicated partitions | kafka-topics.sh --describe --under-replicated-partitions | Empty output
Phase 3 | Brokers as observers | kafka-metadata-quorum.sh describe --status | 3 Voters, 3 Observers
Phase 4 | Producer/consumer | Test message flow | Message received

🎉 Migration Complete! Your Kafka cluster now runs in pure KRaft mode with no ZooKeeper dependency.


Rollback Procedures

Important: Rollback becomes increasingly difficult as you progress through phases. Test thoroughly at each phase before proceeding.

Rollback from Phase 1

When: You’ve deployed KRaft controllers but haven’t enabled migration on brokers yet.

Risk level: Low - brokers still fully on ZooKeeper

# Simply stop the KRaft controllers
for i in 1 2 3; do 
  docker exec kraft$i pkill -f kafka
done

Result: Cluster continues operating normally on ZooKeeper.

Rollback from Phase 2

When: You’ve enabled migration on brokers but haven’t converted them to KRaft mode yet.

Risk level: Medium - requires cleaning up migration state

# Stop KRaft controllers
for i in 1 2 3; do 
  docker exec kraft$i pkill -f kafka
done

# Remove ZK migration znodes
docker exec zk1 /opt/kafka/bin/zookeeper-shell.sh localhost:2181 deleteall /controller
docker exec zk1 /opt/kafka/bin/zookeeper-shell.sh localhost:2181 deleteall /migration

# Remove migration config from brokers
for i in 1 2 3; do
  docker exec kafka$i bash -c "
    sed -i '/listener.security.protocol.map/d' /opt/kafka/config/server.properties
    sed -i '/zookeeper.metadata.migration.enable/d' /opt/kafka/config/server.properties
    sed -i '/controller.quorum.bootstrap.servers/d' /opt/kafka/config/server.properties
    sed -i '/controller.listener.names/d' /opt/kafka/config/server.properties
  "
done

# Rolling restart brokers
for i in 1 2 3; do 
  docker exec kafka$i systemctl restart kafka
  sleep 20
done

Result: Cluster back to pure ZooKeeper mode.

Rollback from Phase 3

When: You’ve converted brokers to KRaft mode but haven’t finalized controllers yet.

Risk level: High - complex rollback, potential for data issues

# Revert broker configs to ZK mode
for i in 1 2 3; do
  docker exec kafka$i bash -c "
    sed -i 's/node.id/broker.id/g' /opt/kafka/config/server.properties
    sed -i '/process.roles/d' /opt/kafka/config/server.properties
    echo 'zookeeper.connect=zk1.local:2181,zk2.local:2181,zk3.local:2181' >> /opt/kafka/config/server.properties
    echo 'zookeeper.metadata.migration.enable=true' >> /opt/kafka/config/server.properties
  "
done

# Rolling restart brokers
for i in 1 2 3; do 
  docker exec kafka$i systemctl restart kafka
  sleep 20
done

# Then follow Phase 2 rollback to fully return to ZooKeeper

Result: Brokers back in migration mode. Follow Phase 2 rollback to complete.

⚠️ Phase 4: No Easy Rollback

Once you’ve finalized migration (Phase 4), rolling back requires:

  1. Restoring ZooKeeper data from backups (see the snapshot sketch below)
  2. Rebuilding broker configurations
  3. Potential data loss if metadata diverged

Recommendation: Run Phase 4 only after thoroughly testing Phases 1-3 in production for several days.
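If you want that last-resort restore to even be possible, snapshot the ZooKeeper data directories before starting Phase 4. A minimal sketch, assuming the data lives under /zookeeper/data as configured during the initial setup:

for i in 1 2 3; do
  # Archive the ZooKeeper data directory and copy it out of the container
  docker exec zk$i tar czf /var/tmp/zk$i-data-backup.tgz /zookeeper/data
  docker cp zk$i:/var/tmp/zk$i-data-backup.tgz ./zk$i-data-backup.tgz
done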


Production Mitigations

Apply these settings before starting the migration to prevent common issues:

Disable Automatic Leader Rebalancing

Why: Preferred Leader Election (PLE) during rolling restarts can cause timeouts and high load.

# Add to all brokers BEFORE Phase 1
for i in 1 2 3; do
  docker exec kafka$i bash -c "echo 'auto.leader.rebalance.enable=false' >> /opt/kafka/config/server.properties"
done

# Restart brokers to apply
for i in 1 2 3; do
  docker exec kafka$i systemctl restart kafka
  sleep 20
done

Disable Unclean Leader Election

Why: Prevents data loss if replicas fall out of sync during migration.

# Should already be set, but verify
for i in 1 2 3; do
  docker exec kafka$i bash -c "grep 'unclean.leader.election.enable' /opt/kafka/config/server.properties || echo 'unclean.leader.election.enable=false' >> /opt/kafka/config/server.properties"
done

Common Production Issues

Issue | Symptom | Mitigation
Application timeouts during Phase 2/3 | Producer/consumer timeout errors | Set auto.leader.rebalance.enable=false
OutOfOrderSequenceException | Producer errors during migration | Use default producer retry settings (see below)
Under-replicated partitions | ISR shrinks during restarts | Wait longer between broker restarts (30-60s)
Controller failover | Slow controller election | Ensure 3 controllers, check network latency
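For reference, the Kafka 3.x producer defaults that matter for the OutOfOrderSequenceException case are the idempotence and retry settings; if your applications override them, consider restoring values close to these for the migration window (client defaults shown, not settings specific to this guide):

# Kafka 3.x producer client defaults relevant during the migration
enable.idempotence=true
acks=all
retries=2147483647
delivery.timeout.ms=120000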

Cleanup

After migration is complete and verified, you can optionally remove ZooKeeper containers:

# Stop all containers and remove their volumes
docker-compose down -v

# Remove all images
docker rmi kafka-lab-base kafka-lab-zk kafka-lab-kafka kafka-lab-kraft

Conclusion

You’ve successfully migrated your Kafka 3.9.0 cluster from ZooKeeper to KRaft mode! Your cluster now:

  • Runs without ZooKeeper dependency
  • Uses Raft consensus for metadata management
  • Has simpler architecture with fewer moving parts
  • Supports better scalability for future growth

Next steps:

  • Monitor cluster health for 24-48 hours
  • Re-enable auto.leader.rebalance.enable=true if desired (see the sketch below)
  • Update monitoring dashboards to track KRaft metrics
  • Plan ZooKeeper infrastructure decommissioning
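If you do re-enable automatic leader rebalancing, the simplest approach in this lab is the same pattern used to disable it: flip the value in server.properties and perform one more rolling restart. A sketch:

for i in 1 2 3; do
  docker exec kafka$i bash -c \
    "sed -i 's/auto.leader.rebalance.enable=false/auto.leader.rebalance.enable=true/' /opt/kafka/config/server.properties"
  docker exec kafka$i systemctl restart kafka
  sleep 20
done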


