Deploy Kafka and Zookeeper on Docker
In this tutorial, we’ll learn a few basics of Kafka and deploy it as a set of containers on Docker
What is Event Streaming?
- Event Streaming is the capture and processing of data in real time from various sources
- These sources might be databases, sensors, mobile devices, applications, and so on
- The data obtained from these sources needs to be stored reliably and retrieved on demand so it can be manipulated or processed as required
- On the whole, Event Streaming can be defined as the continuous flow and interpretation of data from a source
Where is Event Streaming Used?
- Any application that involves real-time data can make use of Event Streaming
- Some examples: Stock Exchanges, Payment Systems, Banks, Shipment Tracking, Logistics, IoT applications, and the list goes on
What is Kafka?
- Kafka is a distributed system consisting of servers and clients, which can be deployed on VMs, containers, or cloud environments
- Servers:
- Kafka runs as a cluster of one or more servers which can be deployed on multiple regions and data centres
- Some of these servers form a layer for storage called Brokers
- Others run Kafka Connect which integrates with the data source and constantly updates changes
- Kafka Servers are highly scalable and fault-tolerant
- Kafka ensures no data is lost when a server goes down: data is replicated across brokers, so another broker can immediately take over
- Clients:
- Clients read, write and process streams of events in parallel, at scale, and in a fault-tolerant manner
- Kafka client libraries are available for a wide variety of programming languages; the Kafka Streams library builds higher-level stream processing on top of the Java client
Kafka Terminologies
- Producer: It is a component/application which is used to send/publish data to the Kafka Server
- Consumer: It is a component/application which is used to receive/consume data from the Kafka Server
- Broker: A broker is a server which facilitates the communication between a Producer and a Consumer
- Kafka Cluster: A group of Kafka Servers which share the load among themselves along with ensuring zero/minimal data loss
- Topic: A Topic is a unique name given to a stream of data by which it can be identified
- Partitions: The data inside a topic can be distributed among partitions which ensures the load is divided among the brokers
- Offsets: An offset is a sequential ID assigned to each record within a partition. It identifies a record’s position, so a consumer can resume reading from a specific point
- Consumer Groups: A group of consumer applications which share the load of consuming and processing the data as required
- Zookeeper: A component/application that manages the Kafka cluster’s metadata and coordination, ensuring a highly reliable and fault-tolerant setup (a short CLI sketch of these terms follows this list)
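To make these terms concrete, here is a sketch of how they map onto Kafka’s bundled CLI tools. The topic name orders, the consumer group billing, and the broker address localhost:9091 are illustrative assumptions based on the cluster we deploy later in this tutorial.
# Create a topic named "orders" with 3 partitions, replicated across 3 brokers
kafka-topics --bootstrap-server localhost:9091 --create --topic orders --partitions 3 --replication-factor 3
# A producer publishes an event to the topic
echo "order-1" | kafka-console-producer --bootstrap-server localhost:9091 --topic orders
# A consumer in the group "billing" reads the events; offsets track its position in each partition
kafka-console-consumer --bootstrap-server localhost:9091 --topic orders --group billing --from-beginning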
- For a detailed understanding of the Kafka architecture, refer to the diagram below
Kafka APIs
- Admin API: Manage and inspect topics, brokers, and other Kafka components (see the sketch after this list)
- Producer API: Publish a stream of events to brokers via topics
- Consumer API: Subscribe and read data from topics
- Kafka Streams API: Higher level functions to process, manage and manipulate a stream
- Kafka Connect API: Integrate Kafka with different types of data sources and ensure sync between streams and data produced
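The CLI tools bundled with Kafka wrap several of these APIs. As a quick illustration of the Admin and Consumer APIs, here is a sketch that reuses the hypothetical orders topic and billing group from earlier and assumes the brokers from the deployment below:
# List all topics in the cluster (Admin API)
kafka-topics --bootstrap-server localhost:9091 --list
# Inspect the partitions and replicas of a topic (Admin API)
kafka-topics --bootstrap-server localhost:9091 --describe --topic orders
# Inspect a consumer group's offsets and lag (Admin/Consumer APIs)
kafka-consumer-groups --bootstrap-server localhost:9091 --describe --group billing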
In this tutorial, we will also deploy another image, sheepkiller/kafka-manager, which provides a UI to manage the Kafka Cluster
Prerequisites
- Docker and Docker Compose should be installed and ready to use
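A quick sanity check that both are available (the exact version numbers will differ on your machine):
# Verify docker is installed
docker --version
# Verify docker-compose is installed
docker-compose --version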
Deploy Kafka and Zookeeper on Docker
1. Create a directory to manage all files related to the Containers
# Create a directory
mkdir kafka
# Navigate into the directory
cd kafka
# Create a docker compose file
touch docker-compose.yaml
2. Configure the containers using the docker-compose file
# Open the docker-compose file with an editor of your choice
nano docker-compose.yaml
# Paste the following contents into the file
version: "3"
# These are the services we will be deploying, i.e., zookeeper, 3 instances of kafka, and sheepkiller
services:
  # Configuration for the zookeeper image
  zookeeper:
    # The name and version of the image to use
    image: zookeeper:3.4.9
    # Hostname for the container
    hostname: zk
    # Port mapping for the container
    ports:
      # Map the host port 2181 to the container port 2181
      - "2181:2181"
    # Environment variables
    environment:
      # An ID is required in case multiple zookeeper containers are deployed
      ZOO_MY_ID: 1
      # Port on which zookeeper will run in the container
      ZOO_PORT: 2181
      ZOO_SERVERS: server.1=zk:2888:3888
    # Volume mounts to store the container data
    volumes:
      - ./kafka-data/zookeeper/data:/data
      - ./kafka-data/zookeeper/datalog:/datalog
  # Configuration for the kafka image
  kafka1:
    # The name and version of the image to use
    image: confluentinc/cp-kafka:latest
    # Hostname for the container
    hostname: kafka1
    # Port mapping for the container
    ports:
      # Map the host port 9091 to the container port 9091
      - "9091:9091"
    # Environment variables
    environment:
      # Configure the kafka listeners with the hostnames and ports that clients should use
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka1:19091,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9091
      # Security protocol configuration
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      # Listener used for inter-broker communication
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      # Zookeeper hostname and port
      KAFKA_ZOOKEEPER_CONNECT: "zk:2181"
      # Broker ID in case of multiple brokers
      KAFKA_BROKER_ID: 1
      # Logging configuration
      KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
    # Volume mounts to store the container data
    volumes:
      - ./kafka-data/kafka1/data:/var/lib/kafka/data
    # Ensures zookeeper is running before starting this container
    depends_on:
      - zookeeper
  # Similar to the kafka1 configuration except for the ports and broker ID
  kafka2:
    image: confluentinc/cp-kafka:latest
    hostname: kafka2
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka2:19092,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zk:2181"
      KAFKA_BROKER_ID: 2
      KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
    volumes:
      - ./kafka-data/kafka2/data:/var/lib/kafka/data
    depends_on:
      - zookeeper
  # Similar to the kafka1 configuration except for the ports and broker ID
  kafka3:
    image: confluentinc/cp-kafka:latest
    hostname: kafka3
    ports:
      - "9093:9093"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka3:19093,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zk:2181"
      KAFKA_BROKER_ID: 3
      KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
    volumes:
      - ./kafka-data/kafka3/data:/var/lib/kafka/data
    depends_on:
      - zookeeper
  # Configuration for the sheepkiller image (the container is named manager)
  manager:
    # Image and version for the sheepkiller image
    image: sheepkiller/kafka-manager:latest
    # Port mapping for the sheepkiller container
    ports:
      # Map the host port 9000 to the container port 9000
      - "9000:9000"
    # Environment variables for the container
    environment:
      # Provide a list of the zookeeper hosts; in our case we have only one
      - ZK_HOSTS=zk:2181
    # Ensures zookeeper is running before starting this container
    depends_on:
      - zookeeper
3. Run the containers using Docker Compose
# -d runs the containers in detached mode
docker-compose up -d
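Before moving on, it is worth confirming that all five containers came up. A couple of optional checks:
# List the containers managed by this compose file and their state
docker-compose ps
# Follow the logs of one broker to confirm it registered with zookeeper (Ctrl+C to exit)
docker-compose logs -f kafka1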
4. Open sheepkiller in your browser by navigating to http://<ip>:<port>, where <ip> is the IP of your machine (localhost if you are running it locally) and <port> is the port on which sheepkiller is deployed (9000 in this case)
5. Click on Add Cluster to add a cluster
6. Now enter the details required to create the cluster. For the sake of simplicity, we will only add the cluster name and the zookeeper hosts, keeping all other properties at their defaults
7. Scroll down and click on “Save” and you should now see the following screen stating the operation was successful
8. You should now be able to see your cluster listed
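If you want to smoke-test the cluster from the command line as well, here is a sketch. It assumes a recent cp-kafka image whose CLI tools accept --bootstrap-server; the topic name test is arbitrary, and kafka1:19091 is the internal listener we configured in the compose file:
# Create a replicated test topic from inside the kafka1 container
docker-compose exec kafka1 kafka-topics --bootstrap-server kafka1:19091 --create --topic test --partitions 3 --replication-factor 3
# Produce a single message
docker-compose exec kafka1 bash -c 'echo "hello kafka" | kafka-console-producer --bootstrap-server kafka1:19091 --topic test'
# Consume it back, exiting after one message
docker-compose exec kafka1 kafka-console-consumer --bootstrap-server kafka1:19091 --topic test --from-beginning --max-messages 1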
Congratulations! You have successfully deployed and created a Kafka Cluster. Stay tuned to learn more about how to use Kafka.