If you’re working with machine learning, deep learning, or other GPU-intensive workloads, setting up Docker with NVIDIA GPU support is essential. This guide will walk you through the process of installing and configuring Docker, NVIDIA drivers, and the NVIDIA Container Toolkit on an AWS G4dn instance running Ubuntu.
Prerequisites
Ensure you have:
- An AWS G4dn instance with Ubuntu installed.
- sudo privileges to install and configure packages.
Step 1: Update System and Install Docker
First, update the system and install Docker:
sudo apt update && sudo apt upgrade -y && sudo apt install docker.io -y
This ensures you have the latest system updates and installs Docker.
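Optionally, enable the Docker service at boot and run a quick sanity check. Adding your user to the docker group lets you run docker without sudo after you log out and back in:
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
sudo docker run --rm hello-world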
Step 2: Install Docker Compose
Download and set up Docker Compose:
wget https://github.com/docker/compose/releases/download/v2.32.4/docker-compose-linux-x86_64
sudo mv docker-compose-linux-x86_64 /usr/bin/docker-compose
sudo chmod +x /usr/bin/docker-compose
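You can confirm the binary is on your PATH and executable:
docker-compose version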
Step 3: Install NVIDIA Drivers
Install the NVIDIA utilities package, which provides nvidia-smi (adjust the 535 branch if the driver you install below is newer):
sudo apt install nvidia-utils-535 -y
Next, determine the latest NVIDIA driver version packaged for AWS instances and install it:
NVIDIA_DRIVER_VERSION=$(sudo apt-cache search 'linux-modules-nvidia-[0-9]+-aws$' | awk '{print $1}' | sort -V | tail -n 1 | awk -F"-" '{print $4}')
sudo apt install linux-modules-nvidia-${NVIDIA_DRIVER_VERSION}-aws nvidia-driver-${NVIDIA_DRIVER_VERSION} -y
Check the installation:
sudo nvidia-smi
If the drivers are installed correctly, you should see your GPU details in the output. If nvidia-smi cannot find the device, reboot the instance so the new kernel modules are loaded, then run it again.
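For a more compact check, nvidia-smi can also report just the fields you care about; on a G4dn instance this should list an NVIDIA T4:
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv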
Step 4: Install NVIDIA Container Toolkit
To enable GPU support for Docker containers, install the NVIDIA Container Toolkit:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Optionally, enable the experimental packages in the repository list:
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
Update and install:
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
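You can confirm the toolkit is installed before wiring it into Docker:
nvidia-ctk --version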
Configure Docker to use the NVIDIA runtime. The first command updates the system-wide configuration in /etc/docker/daemon.json; the second is only needed if you run Docker in rootless mode and should be run as your own user:
sudo nvidia-ctk runtime configure --runtime=docker
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
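To see what was written for the system-wide daemon, inspect the file; it should now contain a runtimes entry pointing at nvidia-container-runtime:
cat /etc/docker/daemon.json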
Step 5: Restart Docker and Verify
Restart Docker to apply the changes:
sudo systemctl restart docker
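Before running a container, you can confirm Docker picked up the new runtime; the output should list nvidia alongside the default runc:
sudo docker info | grep -i runtimes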
Now, verify that Docker can see the GPU by running nvidia-smi inside a CUDA base image (the old nvidia/cuda:11.0-base tag is deprecated and may no longer be available, so use a current one):
sudo docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
If everything is set up correctly, you should see the NVIDIA GPU details in the output.
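Since Docker Compose is installed, here is a minimal, illustrative compose file that requests all GPUs through Compose's device reservation syntax. The service name gpu-test is a placeholder and the image tag is only an example; swap in your own workload:
cat <<'EOF' > docker-compose.yml
services:
  gpu-test:
    image: nvidia/cuda:12.3.1-base-ubuntu22.04
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
sudo docker-compose up
If the runtime is wired up correctly, the container prints the same nvidia-smi table and then exits.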
Conclusion
You have now successfully installed and configured Docker with NVIDIA GPU support on an AWS G4dn instance. This setup allows you to run GPU-accelerated applications inside Docker containers, making it easier to deploy machine learning and AI workloads.
Stay tuned for more guides on optimizing GPU performance for your workloads!