Deploy Kafka and Zookeeper on Docker

In this tutorial, we’ll learn a few basics of Kafka and deploy it as a container on Docker.

What is Event Streaming?

  • Event Streaming is the capturing and processing of data in real time from various sources
  • These sources might be databases, sensors, mobile devices, applications, etc.
  • The data obtained from these sources needs to be stored reliably and retrieved on demand so it can be manipulated or processed as required
  • On the whole, Event Streaming can be defined as the continuous flow and interpretation of data from a source

Where is Event Streaming Used?

  • Any application that involves real-time data can make use of Event Streaming
  • Some examples: Stock Exchanges, Payment Systems, Banks, Shipment Tracking, Logistics, IoT applications, and the list goes on

What is Kafka?

  • Kafka is a distributed system consisting of servers and clients, which can be deployed on VMs, containers or cloud environments
  • Servers:
    • Kafka runs as a cluster of one or more servers, which can be deployed across multiple regions and data centres
    • Some of these servers form the storage layer and are called Brokers
    • Others run Kafka Connect, which integrates with external data sources and continuously streams in changes
    • Kafka servers are highly scalable and fault-tolerant
    • Kafka replicates data across servers, so when a server goes down another can immediately take over and no data is lost
  • Clients:
    • Clients read, write and process streams of events in parallel, at scale, and in a fault-tolerant manner
    • Kafka client libraries are available for a wide variety of programming languages

Kafka Terminologies

  • Producer: A component/application that sends/publishes data to the Kafka server
  • Consumer: A component/application that receives/consumes data from the Kafka server
  • Broker: A server that facilitates the communication between a Producer and a Consumer
  • Kafka Cluster: A group of Kafka servers that share the load among themselves while ensuring zero/minimal data loss
  • Topic: A unique name given to a stream of data by which it can be identified
  • Partitions: The data inside a topic can be distributed among partitions, which divides the load among the brokers
  • Offsets: A sequential id given to each data unit within a partition; it lets a consumer read from a specific position as required
  • Consumer Groups: A group of consumer applications that share the load of consuming and processing the data as required
  • Zookeeper: A component/application that manages the Kafka cluster and keeps the setup highly reliable and fault-tolerant
  • For a detailed understanding of the Kafka architecture, refer to the diagram below

[Kafka architecture diagram]
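
Once the cluster from the steps below is running, several of these terms can be seen in action with the CLI tools bundled in the broker image. A sketch, not part of the deployment steps: the topic name demo and the internal listener port 19091 are assumptions matching the compose file used later in this tutorial.

```shell
# Create a topic named "demo" with 3 partitions, replicated across all 3 brokers
docker-compose exec kafka1 kafka-topics --bootstrap-server kafka1:19091 \
  --create --topic demo --partitions 3 --replication-factor 3

# Describe it: the output lists each partition with its leader broker,
# its replicas, and the in-sync replicas (ISR)
docker-compose exec kafka1 kafka-topics --bootstrap-server kafka1:19091 \
  --describe --topic demo
```

With a replication factor of 3, each partition's data lives on all three brokers, which is what lets the cluster survive the loss of a server.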

Kafka APIs

  • Admin API: Manage and inspect topics, brokers and other Kafka components
  • Producer API: Publish a stream of events to brokers via topics
  • Consumer API: Subscribe to and read data from topics
  • Kafka Streams API: Higher-level functions to process, manage and manipulate a stream
  • Kafka Connect API: Integrate Kafka with different types of data sources and keep streams in sync with the data produced

In this tutorial, we will also deploy the sheepkiller/kafka-manager image, which provides a UI to manage the Kafka cluster

Prerequisites

  • Docker installed
  • Docker Compose installed

Deploy Kafka and Zookeeper on Docker

1. Create a directory to manage all files related to the Containers

# Create a directory
mkdir kafka
# Navigate into the directory
cd kafka
# Create a docker compose file
touch docker-compose.yaml

2. Configure the containers using the docker-compose file

# Open the docker-compose file with an editor of your choice and add in the below contents
nano docker-compose.yaml
# Paste the following into the docker-compose file

version: "3"
# These are the services we will be deploying, i.e., zookeeper, 3 instances of kafka, sheepkiller
services:
    # Configuration for the zookeeper image
    zookeeper:
        # The name and version of the image to use
        image: zookeeper:3.4.9
        # Hostname for the container
        hostname: zk
        # Port Mapping for the container
        ports:
            # Map the host port 2181 to the container port 2181
            - "2181:2181"
        # Environment Variables
        environment:
            # ID is required in case multiple zookeeper containers are deployed
            ZOO_MY_ID: 1
            # Port on which zookeeper will run in the container
            ZOO_PORT: 2181
            ZOO_SERVERS: server.1=zk:2888:3888
        # Volume mounts to store the container data
        volumes:
            - ./kafka-data/zookeeper/data:/data
            - ./kafka-data/zookeeper/datalog:/datalog
    # Configuration for the kafka image
    kafka1:
        # The name and version of the image to use
        image: confluentinc/cp-kafka:latest
        # Hostname for the container
        hostname: kafka1
        # Port Mapping for the container
        ports:
            # Map the host port 9091 to the container port 9091
            - "9091:9091"
        # Environment Variables
        environment:
            # Configure kafka listeners with the information of the ports on which it needs to run
            KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka1:19091,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9091
            # Security protocol configuration
            KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
            # Kafka Listener name configuration
            KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
            # Zookeeper hostname and port
            KAFKA_ZOOKEEPER_CONNECT: "zk:2181"
            # Broker ID in case of multiple brokers
            KAFKA_BROKER_ID: 1
            # Logging configuration
            KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
        # Volume mounts to store the container data
        volumes:
            - ./kafka-data/kafka1/data:/var/lib/kafka/data
        # Ensures zookeeper is running before running this container
        depends_on:
            - zookeeper
    # Similar to configuration of kafka1 container except for the port and broker ID
    kafka2:
        image: confluentinc/cp-kafka:latest
        hostname: kafka2
        ports:
            - "9092:9092"
        environment:
            KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka2:19092,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092
            KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
            KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
            KAFKA_ZOOKEEPER_CONNECT: "zk:2181"
            KAFKA_BROKER_ID: 2
            KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
        volumes:
            - ./kafka-data/kafka2/data:/var/lib/kafka/data
        depends_on:
            - zookeeper
    # Similar to configuration of kafka1 container except for the port and broker ID
    kafka3:
        image: confluentinc/cp-kafka:latest
        hostname: kafka3
        ports:
            - "9093:9093"
        environment:
            KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka3:19093,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9093
            KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
            KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
            KAFKA_ZOOKEEPER_CONNECT: "zk:2181"
            KAFKA_BROKER_ID: 3
            KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
        volumes:
            - ./kafka-data/kafka3/data:/var/lib/kafka/data
        depends_on:
            - zookeeper
    # Configuration for the sheepkiller image (The container is named manager)
    manager:
        # Image and version for the sheepkiller image
        image: sheepkiller/kafka-manager:latest
        # Port mapping for the Sheepkiller container
        ports:
            # Map the host port 9000 to the container port 9000
            - 9000:9000
        # Environment variables for the container
        environment:
            # Provide a list of the zookeeper hosts; in our case we have only one zookeeper
            - ZK_HOSTS=zk:2181
        # Ensures zookeeper is running before running this container
        depends_on:
            - zookeeper
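
Before starting anything, the compose file above can be sanity-checked: docker-compose parses it and prints the fully resolved configuration, or reports the first syntax error it finds.

```shell
# Validate the docker-compose file and print the resolved configuration
docker-compose config
```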

3. Run the containers through docker-compose

# -d will run the containers in detached state
docker-compose up -d
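
To confirm everything came up, a couple of quick checks (the service names match the compose file above):

```shell
# List the services and their state; all five should show "Up"
docker-compose ps

# Tail one broker's logs to confirm it connected to Zookeeper
# and started without errors
docker-compose logs --tail=20 kafka1
```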

4. Open sheepkiller in your browser by navigating to http://<ip>:<port>, where <ip> is the IP of your machine (localhost if you are running it locally) and <port> is the port on which sheepkiller is deployed (9000 in this case)


5. Click on Add Cluster to add a cluster


6. Now enter the details required to create the cluster. For the sake of simplicity, we will only add the cluster name and the Zookeeper hosts (zk:2181), keeping all other properties at their defaults


7. Scroll down and click “Save”; you should now see a screen stating the operation was successful


8. You should now be able to see your cluster listed


Congratulations! You have successfully deployed and created a Kafka cluster. Stay tuned to learn more about how to use Kafka.
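
As a quick smoke test of the new cluster, a message can be produced and read back with the console clients shipped in the cp-kafka image. The topic name smoke-test is an arbitrary choice; this sketch assumes topic auto-creation, which is Kafka's default.

```shell
# Produce one message (-T disables the pseudo-TTY so stdin can be piped)
echo "hello kafka" | docker-compose exec -T kafka1 \
  kafka-console-producer --bootstrap-server kafka1:19091 --topic smoke-test

# Read it back; --max-messages 1 makes the consumer exit after one record
docker-compose exec -T kafka1 \
  kafka-console-consumer --bootstrap-server kafka1:19091 \
  --topic smoke-test --from-beginning --max-messages 1
```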

