Deploy Kafka and Zookeeper on Docker

In this tutorial, we’ll learn a few basics of Kafka and deploy it as a container on Docker.

What is Event Streaming?

  • Event Streaming is the capturing and processing of data in real time from various sources
  • These sources might be databases, sensors, mobile devices, applications, etc.
  • The data obtained from these sources needs to be stored reliably and retrieved on demand so it can be manipulated or processed as required
  • On the whole, Event Streaming can be defined as the continuous flow and interpretation of data from a source

Where is Event Streaming Used?

  • Any application that involves real-time data can make use of Event Streaming
  • Some examples: Stock Exchanges, Payment Systems, Banks, Shipment Tracking, Logistics, IoT applications, and the list goes on

What is Kafka?

  • Kafka is a distributed system consisting of servers and clients, which can be deployed on VMs, containers or cloud environments
  • Servers:
    • Kafka runs as a cluster of one or more servers, which can be deployed across multiple regions and data centres
    • Some of these servers form the storage layer and are called Brokers
    • Others run Kafka Connect, which integrates with external data sources and continuously streams in changes
    • Kafka servers are highly scalable and fault-tolerant
    • Kafka replicates data across servers, so when a server goes down another can immediately take over and no data is lost
  • Clients:
    • Clients read, write and process streams of events in parallel, at scale, and in a fault-tolerant manner
    • Kafka client libraries are available for a wide variety of programming languages

Kafka Terminologies

  • Producer: A component/application that sends/publishes data to the Kafka server
  • Consumer: A component/application that receives/consumes data from the Kafka server
  • Broker: A server that facilitates the communication between a Producer and a Consumer
  • Kafka Cluster: A group of Kafka servers that share the load among themselves while ensuring zero/minimal data loss
  • Topic: A unique name given to a stream of data by which it can be identified
  • Partitions: The data inside a topic can be distributed among partitions, which divides the load among the brokers
  • Offsets: A sequential id given to each data unit within a partition; it lets a consumer read from a specific position as required
  • Consumer Groups: A group of consumer applications that share the load of consuming and processing the data as required
  • Zookeeper: A component/application that manages the Kafka cluster and keeps the setup highly reliable and fault-tolerant
  • For a detailed understanding of the Kafka architecture, refer to the diagram below

[Kafka architecture diagram]
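
Once the cluster from the steps below is running, several of these terms can be seen in action with the CLI tools bundled in the broker image. A sketch, not part of the deployment steps: the topic name demo and the internal listener port 19091 are assumptions matching the compose file used later in this tutorial.

```shell
# Create a topic named "demo" with 3 partitions, replicated across all 3 brokers
docker-compose exec kafka1 kafka-topics --bootstrap-server kafka1:19091 \
  --create --topic demo --partitions 3 --replication-factor 3

# Describe it: the output lists each partition with its leader broker,
# its replicas, and the in-sync replicas (ISR)
docker-compose exec kafka1 kafka-topics --bootstrap-server kafka1:19091 \
  --describe --topic demo
```

With a replication factor of 3, each partition's data lives on all three brokers, which is what lets the cluster survive the loss of a server.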

Kafka APIs

  • Admin API: Manage and inspect topics, brokers and other Kafka components
  • Producer API: Publish a stream of events to brokers via topics
  • Consumer API: Subscribe to and read data from topics
  • Kafka Streams API: Higher-level functions to process, manage and manipulate a stream
  • Kafka Connect API: Integrate Kafka with different types of data sources and keep streams in sync with the data produced

In this tutorial, we will also deploy the sheepkiller/kafka-manager image, which provides a UI to manage the Kafka cluster

Prerequisites

  • Docker installed
  • Docker Compose installed

Deploy Kafka and Zookeeper on Docker

1. Create a directory to manage all files related to the Containers

# Create a directory
mkdir kafka
# Navigate into the directory
cd kafka
# Create a docker compose file
touch docker-compose.yaml

2. Configure the containers using the docker-compose file

# Open the docker-compose file with an editor of your choice and add in the below contents
nano docker-compose.yaml
# Paste the following into the docker-compose file

version: "3"
# These are the services we will be deploying, i.e., zookeeper, 3 instances of kafka, sheepkiller
services:
    # Configuration for the zookeeper image
    zookeeper:
        # The name and version of the image to use
        image: zookeeper:3.4.9
        # Hostname for the container
        hostname: zk
        # Port Mapping for the container
        ports:
            # Map the host port 2181 to the container port 2181
            - "2181:2181"
        # Environment Variables
        environment:
            # ID is required in case multiple zookeeper containers are deployed
            ZOO_MY_ID: 1
            # Port on which zookeeper will run in the container
            ZOO_PORT: 2181
            ZOO_SERVERS: server.1=zk:2888:3888
        # Volume mounts to store the container data
        volumes:
            - ./kafka-data/zookeeper/data:/data
            - ./kafka-data/zookeeper/datalog:/datalog
    # Configuration for the kafka image
    kafka1:
        # The name and version of the image to use
        image: confluentinc/cp-kafka:latest
        # Hostname for the container
        hostname: kafka1
        # Port Mapping for the container
        ports:
            # Map the host port 9091 to the container port 9091
            - "9091:9091"
        # Environment Variables
        environment:
            # Configure kafka listeners with the information of the ports on which it needs to run
            KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka1:19091,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9091
            # Security protocol configuration
            KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
            # Kafka Listener name configuration
            KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
            # Zookeeper hostname and port
            KAFKA_ZOOKEEPER_CONNECT: "zk:2181"
            # Broker ID in case of multiple brokers
            KAFKA_BROKER_ID: 1
            # Logging configuration
            KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
        # Volume mounts to store the container data
        volumes:
            - ./kafka-data/kafka1/data:/var/lib/kafka/data
        # Ensures zookeeper is running before running this container
        depends_on:
            - zookeeper
    # Similar to configuration of kafka1 container except for the port and broker ID
    kafka2:
        image: confluentinc/cp-kafka:latest
        hostname: kafka2
        ports:
            - "9092:9092"
        environment:
            KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka2:19092,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092
            KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
            KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
            KAFKA_ZOOKEEPER_CONNECT: "zk:2181"
            KAFKA_BROKER_ID: 2
            KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
        volumes:
            - ./kafka-data/kafka2/data:/var/lib/kafka/data
        depends_on:
            - zookeeper
    # Similar to configuration of kafka1 container except for the port and broker ID
    kafka3:
        image: confluentinc/cp-kafka:latest
        hostname: kafka3
        ports:
            - "9093:9093"
        environment:
            KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka3:19093,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9093
            KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
            KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
            KAFKA_ZOOKEEPER_CONNECT: "zk:2181"
            KAFKA_BROKER_ID: 3
            KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
        volumes:
            - ./kafka-data/kafka3/data:/var/lib/kafka/data
        depends_on:
            - zookeeper
    # Configuration for the sheepkiller image (The container is named manager)
    manager:
        # Image and version for the sheepkiller image
        image: sheepkiller/kafka-manager:latest
        # Port mapping for the Sheepkiller container
        ports:
            # Map the host port 9000 to the container port 9000
            - 9000:9000
        # Environment variables for the container
        environment:
            # Provide a list of the zookeeper hosts; in our case we have only one zookeeper
            - ZK_HOSTS=zk:2181
        # Ensures zookeeper is running before running this container
        depends_on:
            - zookeeper
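
Before starting anything, the compose file above can be sanity-checked: docker-compose parses it and prints the fully resolved configuration, or reports the first syntax error it finds.

```shell
# Validate the docker-compose file and print the resolved configuration
docker-compose config
```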

3. Run the containers through docker-compose

# -d will run the containers in detached state
docker-compose up -d
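
To confirm everything came up, a couple of quick checks (the service names match the compose file above):

```shell
# List the services and their state; all five should show "Up"
docker-compose ps

# Tail one broker's logs to confirm it connected to Zookeeper
# and started without errors
docker-compose logs --tail=20 kafka1
```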

4. Open sheepkiller in your browser by navigating to http://<ip>:<port>, where <ip> is the IP of your machine (localhost if you are running it locally) and <port> is the port on which sheepkiller is deployed (9000 in this case)


5. Click on Add Cluster to add a cluster


6. Now enter the details required to create the cluster. For the sake of simplicity, we will only add the cluster name and the Zookeeper hosts (zk:2181), keeping all other properties at their defaults


7. Scroll down and click “Save”; you should now see a screen stating the operation was successful


8. You should now be able to see your cluster listed


Congratulations! You have successfully deployed and created a Kafka cluster. Stay tuned to learn more about how to use Kafka.
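
As a quick smoke test of the new cluster, a message can be produced and read back with the console clients shipped in the cp-kafka image. The topic name smoke-test is an arbitrary choice; this sketch assumes topic auto-creation, which is Kafka's default.

```shell
# Produce one message (-T disables the pseudo-TTY so stdin can be piped)
echo "hello kafka" | docker-compose exec -T kafka1 \
  kafka-console-producer --bootstrap-server kafka1:19091 --topic smoke-test

# Read it back; --max-messages 1 makes the consumer exit after one record
docker-compose exec -T kafka1 \
  kafka-console-consumer --bootstrap-server kafka1:19091 \
  --topic smoke-test --from-beginning --max-messages 1
```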

