
End-to-end Stable Diffusion test on Azure NC A100/H100 MIG



Please follow and star my GitHub repository: https://github.com/xinyuwei-david/david-share.git

E2E Stable Diffusion on A100 MIG

The A100 and H100 are high-end training GPUs that can also serve inference. To save compute power and GPU memory, you can partition the card with NVIDIA Multi-Instance GPU (MIG) and run Stable Diffusion on the resulting MIG instances.
We test on an Azure NC A100 VM.

Configuring MIG

Enable MIG on the first physical GPU.

root@david1a100:~# nvidia-smi -i 0 -mig 1

After rebooting the VM, MIG was enabled.

(Screenshot: nvidia-smi showing MIG mode enabled on GPU 0.)
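You can also confirm MIG mode from the command line (a quick check; mig.mode.current is a standard nvidia-smi query field):

nvidia-smi --query-gpu=mig.mode.current --format=csv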

List all available MIG profiles:

#nvidia-smi mig -lgip

(Screenshot: available MIG profiles from nvidia-smi mig -lgip.)

At this point, we need to work out how to best carve up the GPU to meet Stable Diffusion's compute and memory requirements. An A100 80GB exposes 7 compute slices and 80 GB of memory, so three 2g.20gb instances (profile ID 14) plus one 1g.10gb+me instance (profile ID 20) consume 3×2+1 = 7 slices and 70 GB.

Divide the A100 into four instances: three of profile ID 14 and one of profile ID 20.

root@david1a100:~# sudo nvidia-smi mig -cgi 14,14,14,20 -C
Successfully created GPU instance ID  5 on GPU  0 using profile MIG 2g.20gb (ID 14)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  5 using profile MIG 2g.20gb (ID  1)
Successfully created GPU instance ID  3 on GPU  0 using profile MIG 2g.20gb (ID 14)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  3 using profile MIG 2g.20gb (ID  1)
Successfully created GPU instance ID  4 on GPU  0 using profile MIG 2g.20gb (ID 14)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  4 using profile MIG 2g.20gb (ID  1)
Successfully created GPU instance ID 13 on GPU  0 using profile MIG 1g.10gb+me (ID 20)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID 13 using profile MIG 1g.10gb (ID  0)
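To double-check the layout before going further, you can list the GPU instances and compute instances with the standard MIG subcommands:

nvidia-smi mig -lgi   # list GPU instances
nvidia-smi mig -lci   # list compute instances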

(Screenshot: the four MIG instances after creation.)

Maintaining the MIG configuration

You need to set up a bash script because the GPU's MIG configuration is lost when the VM reboots.

#vi /usr/local/bin/setup_mig.sh

#!/bin/bash
nvidia-smi -i 0 -mig 1                     # ensure MIG mode is on
sudo nvidia-smi mig -dgi                   # destroy any existing GPU instances
sudo nvidia-smi mig -cgi 14,14,14,20 -C    # recreate the 3x 2g.20gb + 1x 1g.10gb+me layout

Grant execute permission:

chmod +x /usr/local/bin/setup_mig.sh

Create a systemd service:

vi /etc/systemd/system/setup_mig.service

[Unit]  
Description=Setup NVIDIA MIG Instances  
After=default.target  

[Service]  
Type=oneshot  
ExecStart=/usr/local/bin/setup_mig.sh  

[Install]  
WantedBy=default.target  

Reload systemd, then enable and start setup_mig.service:

sudo systemctl daemon-reload 
sudo systemctl enable setup_mig.service
sudo systemctl start setup_mig.service
sudo systemctl status setup_mig.service
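After the next reboot, a quick way to verify that the instances were recreated is to list the MIG devices (this also prints the UUIDs used for Docker later):

nvidia-smi -L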

Preparing the MIG container environment

Installing Docker and the NVIDIA Container Toolkit on the VM

sudo apt-get update  
sudo apt-get install -y docker.io  
sudo apt-get install -y aptitude  
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)  
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -  
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list  
sudo apt-get update  
sudo aptitude install -y nvidia-docker2  
sudo systemctl restart docker  
sudo aptitude install -y nvidia-container-toolkit  
sudo systemctl restart docker  
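Before wiring containers to individual MIG devices, it is worth a quick smoke test that Docker can see the GPU at all. A minimal sketch; the CUDA image tag is an assumption, and any recent tag should work:

sudo docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi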

Configuring the container creation script on the VM

#vi createcontainer.sh

#!/bin/bash

# Container name array
CONTAINER_NAMES=("mig1_tensorrt_container" "mig2_tensorrt_container" "mig3_tensorrt_container" "mig4_tensorrt_container")

# Remove existing containers
for CONTAINER in "${CONTAINER_NAMES[@]}"; do
  if [ "$(sudo docker ps -a -q -f name=$CONTAINER)" ]; then
    echo "Stopping and removing container: $CONTAINER"
    sudo docker stop $CONTAINER
    sudo docker rm $CONTAINER
  fi
done

# Get the UUIDs of the MIG devices
MIG_UUIDS=$(nvidia-smi -L | grep 'MIG' | awk -F 'UUID: ' '{print $2}' | awk -F ')' '{print $1}')
UUID_ARRAY=($MIG_UUIDS)

# Check that enough MIG device UUIDs were found
if [ ${#UUID_ARRAY[@]} -lt 4 ]; then
  echo "Error: Not enough MIG devices found."
  exit 1
fi

# Start one container per MIG device, pinned by UUID
sudo docker run --gpus "device=${UUID_ARRAY[0]}" -v /mig1:/mnt/mig1 -p 8081:80 -d --name mig1_tensorrt_container nvcr.io/nvidia/pytorch:24.05-py3 tail -f /dev/null
sudo docker run --gpus "device=${UUID_ARRAY[1]}" -v /mig2:/mnt/mig2 -p 8082:80 -d --name mig2_tensorrt_container nvcr.io/nvidia/pytorch:24.05-py3 tail -f /dev/null
sudo docker run --gpus "device=${UUID_ARRAY[2]}" -v /mig3:/mnt/mig3 -p 8083:80 -d --name mig3_tensorrt_container nvcr.io/nvidia/pytorch:24.05-py3 tail -f /dev/null
sudo docker run --gpus "device=${UUID_ARRAY[3]}" -v /mig4:/mnt/mig4 -p 8084:80 -d --name mig4_tensorrt_container nvcr.io/nvidia/pytorch:24.05-py3 tail -f /dev/null

# Print container status and open the mapped ports
sudo docker ps
sudo ufw allow 8081
sudo ufw allow 8082
sudo ufw allow 8083
sudo ufw allow 8084
sudo ufw reload
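After the script runs, you can confirm that each container sees only its own MIG slice, e.g. for the first container created above:

sudo docker exec mig1_tensorrt_container nvidia-smi -L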

Make sure the containers are accessible from outside.

Start a listener on port 80 inside the container:

root@david1a100:~# sudo docker exec -it mig1_tensorrt_container /bin/bash
root@b6abf5bf48ae:/workspace# python3 -m http.server 80
Serving HTTP on 0.0.0.0 port 80 (http://0.0.0.0:80/) ...
167.220.233.184 - - [23/Aug/2024 10:54:47] "GET / HTTP/1.1" 200 -

Run curl from my laptop:

(base) PS C:\Users\xinyuwei> curl http://20.5.**.**:8081

StatusCode        : 200
StatusDescription : OK
Content           : http://www.w3.org/TR/html4/strict.dtd">

                    Directory listing fo...
RawContent        : HTTP/1.0 200 OK
                    Content-Length: 594
                    Content-Type: text/html; charset=utf-8
                    Date: Fri, 23 Aug 2024 10:54:47 GMT
                    Server: SimpleHTTP/0.6 Python/3.10.12

Ping google.com from the container.

root@david1a100:~# sudo docker exec -it mig1_tensorrt_container /bin/bash
root@b6abf5bf48ae:/workspace# pip install ping3
root@b6abf5bf48ae:/workspace# ping3 www.google.com
ping 'www.google.com' ... 2ms
ping 'www.google.com' ... 1ms
ping 'www.google.com' ... 1ms
ping 'www.google.com' ... 1ms

Perform SD inference tests in containers.

Check the TensorRT version in the container:

root@david1a100:/workspace# pip show tensorrt
Name: tensorrt
Version: 10.2.0
Summary: A high performance deep learning inference library
Home-page: https://developer.nvidia.com/tensorrt
Author: NVIDIA Corporation
Author-email:
License: Proprietary
Location: /usr/local/lib/python3.10/dist-packages
Requires:
Required-by:

Run the Stable Diffusion test via the GitHub example in the container:

git clone --branch release/10.2 --single-branch https://github.com/NVIDIA/TensorRT.git 
cd TensorRT/demo/Diffusion
pip3 install -r requirements.txt
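The demo pulls model weights from Hugging Face, so the $HF_TOKEN referenced below must be exported first (replace the placeholder with your own access token):

export HF_TOKEN=<your_huggingface_token>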

The test generates an image of size 1024×1024:

python3 demo_txt2img.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN

We can check the image generation speed in several ways, for example by timing the run directly (sketched below) and by comparing runs across MIG instances:
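A rough wall-clock measurement inside a container (a minimal sketch; the demo itself also prints timing information):

time python3 demo_txt2img.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN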

In the mig1 container (2 GPCs, 20 GB of memory):

(Screenshot: generation speed in the mig1 container.)

In the mig4 container (the 1g.10gb+me instance: 1 GPC, 10 GB of memory):

(Screenshot: generation speed in the mig4 container.)

The output image is as follows. Copy it to the mounted volume so it lands on the VM, then download it.

#cp ./output/* /mnt/mig1
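Since /mig1 on the VM is mounted into the container at /mnt/mig1, the file then appears under /mig1 on the VM and can be pulled down, for example with scp (user and address are placeholders):

scp azureuser@<vm-public-ip>:/mig1/*.png .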

(Screenshot: the generated image.)

Comparing INT8 inference speed and quality on an H100 GPU

To see the effectiveness of INT8, we tested Stable Diffusion XL 1.0 on a single H100. NVIDIA claims that INT8 on H100 is better optimized than on A100.

#python3 demo_txt2img_xl.py "a photo of an astronaut riding a horse on mars" --hf-token=$HF_TOKEN --version=xl-1.0

(Screenshot: FP16 SDXL run output on H100.)

The generated image:

(Screenshot: the generated image.)

Using SDXL and INT8 AMMO quantization:

python3 demo_txt2img_xl.py "a photo of an astronaut riding a horse on mars" --version xl-1.0 --onnx-dir onnx-sdxl --engine-dir engine-sdxl --int8

When you run the above command, 8-bit quantization of the model is performed first.

Building TensorRT engine for onnx/unetxl-int8.l2.5.bs2.s30.c32.p1.0.a0.8.opt/model.onnx: engine/unetxl-int8.l2.5.bs2.s30.c32.p1.0.a0.8.trt10.0.1.plan

Then inference runs:

(Screenshot: INT8 inference run.)

Check the generated image:

(Screenshot: the INT8-generated image.)

You can see that the quality of the generated images is comparable and the file sizes are almost identical.

(Screenshot: inference speed comparison.)

We can see that INT8 inference is about 20% faster than FP16.




