Introduction
Before we begin talking about how to install Apache Kafka on Ubuntu 20.04, let’s briefly understand – What is Apache Kafka?
Apache Kafka is a well-known message broker that can handle large volumes of real-time data. Compared with ActiveMQ and RabbitMQ, a Kafka cluster offers higher throughput along with high scalability and fault tolerance. It is generally used as a publish/subscribe messaging system. Some organizations also use it for log aggregation, because it offers persistent storage for published messages.
In this tutorial, you will install Apache Kafka on Ubuntu 20.04.
Prerequisites
- An Ubuntu 20.04 server and a non-root user with sudo privileges
- At least 4GB of RAM on the server. With less memory, the Kafka service may fail to start, with the Java virtual machine (JVM) throwing an “Out Of Memory” exception during startup.
- OpenJDK 17 installed on the server. Kafka runs on the JVM, so it requires a Java runtime.
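Before continuing, it can help to confirm the two machine-level prerequisites. The following is a minimal sketch, assuming a Linux host that exposes /proc/meminfo. The 3500 MiB threshold is an illustrative assumption (the kernel reports slightly less than the installed RAM), not a figure taken from Kafka's documentation.

```shell
# Hypothetical pre-flight check; the threshold below is an assumption.
min_mb=3500

# MemTotal is reported in kB; convert it to MiB.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
total_mb=$((total_kb / 1024))

if [ "$total_mb" -lt "$min_mb" ]; then
    echo "Warning: only ${total_mb} MiB of RAM detected; Kafka may fail to start."
else
    echo "Memory check passed: ${total_mb} MiB of RAM detected."
fi

# Kafka needs a Java runtime on the PATH.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
else
    echo "Warning: java not found; install OpenJDK before continuing."
fi
```

Both checks only print warnings, so the script is safe to run repeatedly.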
Step 1 – Creating a user for Kafka
1) A dedicated user for Kafka is recommended because Kafka handles requests over a network. Running it under its own account limits the damage to the rest of the server if the Kafka service is compromised. You will create the dedicated kafka user in this step. Once the Kafka setup is done, create a different non-root user to perform other tasks on the server.
Create a user named kafka with the useradd command: sudo useradd kafka -m
2) The -m flag ensures that a home directory is created for the user. This home directory, /home/kafka, will act as the workspace directory for executing commands.
3) Then, set a password with the passwd command: sudo passwd kafka
4) After that, add the kafka user to the sudo group with the adduser command, so that it has the privileges needed to install Kafka’s dependencies: sudo adduser kafka sudo
5) Next, log in to the kafka account with su: su -l kafka
Step 2 – Download and Extract Kafka Binaries
1) First, create a directory called Downloads in /home/kafka to store the downloads: mkdir ~/Downloads
2) Download the Kafka binaries into that directory with the curl command.
3) Then create a directory called kafka, which will be the base directory of the Kafka installation: mkdir ~/kafka
4) Now, extract the downloaded archive into the kafka directory using the tar command with the --strip 1 flag.
5) The --strip 1 flag ensures that the archive’s contents are extracted directly into ~/kafka/ and not into a subdirectory such as ~/kafka/kafka_2.12-3.4.0/.
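To see what the flag does without downloading anything, the sketch below builds a small stand-in archive with the same top-level directory name as the Kafka release and extracts it with --strip-components=1, the full spelling of the option that --strip 1 abbreviates in GNU tar. The empty files and the temporary directory are placeholders for illustration only.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Build a stand-in archive with the same top-level layout as the Kafka release.
mkdir -p kafka_2.12-3.4.0/bin kafka_2.12-3.4.0/config
touch kafka_2.12-3.4.0/bin/kafka-server-start.sh
touch kafka_2.12-3.4.0/config/server.properties
tar -czf kafka.tgz kafka_2.12-3.4.0

# Extract into ./kafka, dropping the leading kafka_2.12-3.4.0/ path component.
mkdir kafka
tar -xzf kafka.tgz -C kafka --strip-components=1

# bin/ and config/ now sit directly under kafka/.
ls kafka
```

For the real installation, the same tar options apply to the archive you downloaded, with ~/kafka as the target directory passed to -C.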
Step 3 – Configuring the Kafka Server
1) A Kafka topic is the category, group, or feed name to which messages can be published. Older Kafka releases did not allow topics to be deleted by default (recent releases default delete.topic.enable to true), so this step sets the option explicitly in the configuration file.
2) Open the server.properties file in a text editor, for example: nano ~/kafka/config/server.properties
3) Add the setting delete.topic.enable = true to the bottom of the file so that Kafka topics can be deleted.
4) Save and exit the text editor.
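The same change can be made without opening an editor. This is a minimal sketch, assuming Kafka was extracted to ~/kafka as in Step 2. KAFKA_HOME is a hypothetical variable introduced here so the path can be overridden.

```shell
# Assumes Kafka was extracted to ~/kafka as in Step 2; set KAFKA_HOME to override.
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka}"
CONFIG_FILE="$KAFKA_HOME/config/server.properties"

# No-op on a real installation, where the config directory already exists.
mkdir -p "$KAFKA_HOME/config"

# Append the setting only if the file does not already define it,
# so that running this twice does not create a duplicate entry.
if ! grep -q '^delete\.topic\.enable' "$CONFIG_FILE" 2>/dev/null; then
    echo 'delete.topic.enable = true' >> "$CONFIG_FILE"
fi

# Show the resulting line for confirmation.
grep '^delete\.topic\.enable' "$CONFIG_FILE"
```

If the file already contains a delete.topic.enable line with a different value, the script leaves it untouched, so check the printed line.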
Step 4 – Creating Systemd Unit Files and Starting the Kafka Server
Systemd unit files let you perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.
ZooKeeper is a service that Kafka uses to manage its cluster state and configuration. It is an integral component of many distributed systems.
1) Now, create a unit file for zookeeper, for example: sudo nano /etc/systemd/system/zookeeper.service
2) After that, enter the following unit definition into the file:
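One possible unit definition is sketched below. It is written to a staging file in the current directory so it can be reviewed before it is installed. The paths assume the /home/kafka/kafka layout from Steps 1 and 2, and the Type and [Install] settings are common choices rather than values confirmed by this tutorial.

```shell
# Write the unit definition to a staging file in the current directory.
# Paths assume Kafka lives in /home/kafka/kafka, as set up in Steps 1 and 2.
cat > zookeeper.service <<'EOF'
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
EOF

# Install it afterwards with (requires sudo):
#   sudo cp zookeeper.service /etc/systemd/system/zookeeper.service
```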
3) The [Unit] section specifies that ZooKeeper requires networking and the file system to be ready before it can start.
The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell scripts to start and stop the service. It also specifies that ZooKeeper should be restarted automatically if it exits abnormally.
4) Next, create the systemd service file for kafka, for example: sudo nano /etc/systemd/system/kafka.service
5) Then enter the following unit definition into the kafka.service file:
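A possible definition for this unit is sketched below, again written to a staging file for review. The paths assume the /home/kafka/kafka layout. Redirecting the broker's output to kafka.log inside the installation directory is an illustrative choice, not something this tutorial prescribes.

```shell
# Write the unit definition to a staging file in the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > kafka.service <<'EOF'
[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
EOF

# Install it afterwards with (requires sudo):
#   sudo cp kafka.service /etc/systemd/system/kafka.service
```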
6) The [Unit] section specifies that this unit file depends on zookeeper.service, which ensures that zookeeper is started automatically when the kafka service starts.
The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell scripts to start and stop the service. It also specifies that Kafka should be restarted if it exits abnormally.
7) Finally, start Kafka with the following command: sudo systemctl start kafka
8) Next, check the journal logs for the kafka unit to ensure that the server has started successfully: sudo journalctl -u kafka
9) You will see output similar to the following:
Output
Jul 17 18:38:59 kafka-ubuntu systemd[1]: Started kafka.service.
The Kafka server is now listening on port 9092.
10) The kafka service won’t start automatically when the server reboots. To enable it at boot, run: sudo systemctl enable kafka
Step 5 – Testing the Installation
You can publish and consume a “Hello, World” message to make sure the Kafka server is behaving correctly. This requires:
- A producer, which publishes records and data to topics.
- A consumer, which reads messages and data from topics.
1) First, create a topic named TutorialTopic.
You can create a producer from the command line with the kafka-console-producer.sh script. It expects the Kafka server’s hostname, port, and a topic name as arguments.
2) Then, publish the string "Hello, World" to the TutorialTopic topic.
3) After that, create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server’s hostname and port, along with a topic name, as arguments.
Run it against TutorialTopic to consume the messages published there.
4) You will see Hello, World as the output:
Output
Hello, World
The script will keep running, waiting for more messages to be published. You can open a new terminal and start a producer to publish a few more messages. They should all appear in the consumer’s output.
When you are done, press CTRL+C to stop the consumer script.
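The round trip described in this step can be sketched as a single script. It assumes the ~/kafka layout from Step 2 and a broker listening on localhost:9092. KAFKA_HOME, BOOTSTRAP, and TOPIC are hypothetical variable names introduced for this example. Unlike the interactive run above, the consumer here uses --max-messages 1 so that it exits after the first message.

```shell
# Assumes the ~/kafka layout from Step 2 and a broker on localhost:9092.
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka}"
BOOTSTRAP="localhost:9092"
TOPIC="TutorialTopic"

if [ -x "$KAFKA_HOME/bin/kafka-topics.sh" ]; then
    # Create the topic with a single partition and a single replica;
    # --if-not-exists makes the command safe to repeat.
    "$KAFKA_HOME/bin/kafka-topics.sh" --create --if-not-exists \
        --bootstrap-server "$BOOTSTRAP" \
        --replication-factor 1 --partitions 1 --topic "$TOPIC"

    # Publish one message; the producer reads it from standard input.
    echo "Hello, World" | "$KAFKA_HOME/bin/kafka-console-producer.sh" \
        --bootstrap-server "$BOOTSTRAP" --topic "$TOPIC" > /dev/null

    # Read the topic from the start and exit after one message
    # instead of waiting for CTRL+C.
    "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
        --bootstrap-server "$BOOTSTRAP" \
        --topic "$TOPIC" --from-beginning --max-messages 1
else
    echo "Kafka scripts not found under $KAFKA_HOME/bin" >&2
fi
```

The guard around the commands only checks that the scripts exist. It does not verify that the broker is actually running.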
Step 6 – Install KafkaT (Optional)
1) KafkaT is a tool from Airbnb that makes it easier to view details about your Kafka cluster and perform certain administrative tasks from the command line. Because it is a Ruby gem, you need Ruby to use it. You also need the build-essential package in order to build the other gems it depends on. You can install them with apt: sudo apt install ruby ruby-dev build-essential
2) You can then install KafkaT with the gem command: sudo gem install kafkat
KafkaT uses .kafkatcfg as its configuration file to determine the installation and log directories of your Kafka server. The file should also contain an entry pointing KafkaT to your ZooKeeper instance.
3) Create a new file called .kafkatcfg in your home directory, for example: nano ~/.kafkatcfg
4) Next, add the lines below to specify the required information about your Kafka server and ZooKeeper instance:
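A minimal version of this file can be generated as follows. The values are assumptions based on this tutorial's setup: Kafka extracted to ~/kafka, the broker's log directory left at /tmp/kafka-logs, and ZooKeeper listening on its usual port 2181 on the same machine. Adjust them if your installation differs.

```shell
# kafka_path and log_path assume the layout used in this tutorial:
# Kafka extracted to ~/kafka and log.dirs left at /tmp/kafka-logs.
# The unquoted EOF lets the shell expand $HOME to an absolute path.
cat > "$HOME/.kafkatcfg" <<EOF
{
  "kafka_path": "$HOME/kafka",
  "log_path": "/tmp/kafka-logs",
  "zk_path": "localhost:2181"
}
EOF

# Print the file for review.
cat "$HOME/.kafkatcfg"
```

Note that this overwrites any existing ~/.kafkatcfg.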
5) You can now view details about all Kafka partitions with the kafkat partitions command, which produces output like the following:
Output
Topic Partition Leader Replicas ISRs
TutorialTopic 0 0 [0] [0]
__consumer_offsets 0 0 [0] [0]
...
...
You should see TutorialTopic as well as __consumer_offsets, an internal topic that Kafka uses to store client-related information. Lines starting with __consumer_offsets can be safely ignored.
Step 7 – Setting Up a Multi-Node Cluster (Optional)
If you want to create a multi-broker cluster using more Ubuntu 20.04 machines, repeat Steps 1, 4 & 5 on each of the new machines. Also, make the following changes to the server.properties file on each machine:
- Change the value of the broker.id property so that it is unique throughout the cluster. This property uniquely identifies each server in the cluster and must be an integer, for instance 1, 2, and so on.
- Change the value of the zookeeper.connect property so that all nodes point to the same ZooKeeper instance. This property specifies the ZooKeeper instance’s address and follows the <HOSTNAME/IP_ADDRESS>:<PORT> format, for instance 203.0.113.0:2181.
If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.
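As an illustration of the two properties, the sketch below prints the lines that would differ between two brokers. node_overrides is a hypothetical helper written for this example, and the addresses are the example values used above, not real hosts.

```shell
# Print the server.properties overrides for one broker.
#   $1: broker id (a unique integer per node)
#   $2: comma-separated ZooKeeper addresses, identical on every node
node_overrides() {
    printf 'broker.id=%s\n' "$1"
    printf 'zookeeper.connect=%s\n' "$2"
}

# Example ZooKeeper addresses; replace them with your own instances.
ZK_ENSEMBLE="203.0.113.0:2181,203.0.113.1:2181"

# Only broker.id changes from node to node.
node_overrides 1 "$ZK_ENSEMBLE"
node_overrides 2 "$ZK_ENSEMBLE"
```

Each pair of printed lines replaces the corresponding entries in that node's server.properties. Every other setting can stay the same across brokers.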
Step 8 – Restricting the Kafka User
1) You can now remove all admin privileges from the kafka user. Before you begin, log out and log back in as any other non-root sudo user. If you are still in the same shell session, simply type exit.
Then, remove the kafka user from the sudo group: sudo deluser kafka sudo
2) You can also lock the kafka user’s password with the passwd command, which ensures that nobody can log in to the server directly using this account: sudo passwd kafka -l
3) At this point, only root or a sudo user can log in as kafka, with the following command: sudo su - kafka
If you later want to unlock the account, use passwd with the -u option: sudo passwd kafka -u
FAQs to Install Apache Kafka on Ubuntu 20.04
What is Apache Kafka, and what is it used for?
Apache Kafka is an open-source distributed streaming platform that is used to publish and subscribe to streams of records in real time, effectively handling large amounts of data.
What are the system requirements for installing Apache Kafka on Ubuntu 20.04?
To install Apache Kafka on Ubuntu 20.04 as described in this tutorial, you need a 64-bit server with a Java runtime (OpenJDK 17 here) and at least 4 GB of RAM, as listed in the prerequisites.
What is the default directory for installing Apache Kafka on Ubuntu 20.04?
Kafka is distributed as an archive and has no fixed installation directory. In this tutorial it is extracted to /home/kafka/kafka.
What is the configuration file for Apache Kafka on Ubuntu 20.04?
The broker configuration file is config/server.properties inside the installation directory. In this tutorial that is /home/kafka/kafka/config/server.properties.
How do I start the Apache Kafka server on Ubuntu 20.04?
To start the Apache Kafka server on Ubuntu 20.04, use the command: sudo systemctl start kafka
How do I check if Apache Kafka is running on Ubuntu 20.04?
To check if Apache Kafka is running on Ubuntu 20.04, use the command: sudo systemctl status kafka
How do I stop the Apache Kafka server on Ubuntu 20.04?
To stop the Apache Kafka server on Ubuntu 20.04, use the command: sudo systemctl stop kafka
How do I uninstall Apache Kafka from Ubuntu 20.04?
Kafka was installed from an archive in this tutorial, not with apt, so there is no package to remove. To uninstall it, stop and disable the kafka and zookeeper services, delete their unit files, and remove the /home/kafka/kafka directory.