End-to-end Stable Diffusion test on Azure NC A100/H100 MIG

by info.odysseyx@gmail.com, August 24, 2024

Please follow and rate my GitHub repository: https://github.com/xinyuwei-david/david-share.git

E2E Stable Diffusion on A100 MIG

The A100 and H100 are high-end training GPUs that can also be used for inference. To save compute and GPU memory, you can partition the GPU with NVIDIA Multi-Instance GPU (MIG) and run Stable Diffusion on the MIG instances. We tested on an Azure NC A100 VM.

Configure MIG

Enable MIG on the first physical GPU:

    root@david1a100:~# nvidia-smi -i 0 -mig 1

After rebooting the VM, MIG was enabled. List all available GPU instance profiles:

    # nvidia-smi mig -lgip

At this point, we need to work out how to split the GPU so that each instance has enough compute and GPU memory for SD. We divide the A100 into four instances: three with profile ID 14 (2g.20gb) and one with profile ID 20 (1g.10gb+me).

    root@david1a100:~# sudo nvidia-smi mig -cgi 14,14,14,20 -C
    Successfully created GPU instance ID 5 on GPU 0 using profile MIG 2g.20gb (ID 14)
    Successfully created compute instance ID 0 on GPU 0 GPU instance ID 5 using profile MIG 2g.20gb (ID 1)
    Successfully created GPU instance ID 3 on GPU 0 using profile MIG 2g.20gb (ID 14)
    Successfully created compute instance ID 0 on GPU 0 GPU instance ID 3 using profile MIG 2g.20gb (ID 1)
    Successfully created GPU instance ID 4 on GPU 0 using profile MIG 2g.20gb (ID 14)
    Successfully created compute instance ID 0 on GPU 0 GPU instance ID 4 using profile MIG 2g.20gb (ID 1)
    Successfully created GPU instance ID 13 on GPU 0 using profile MIG 1g.10gb+me (ID 20)
    Successfully created compute instance ID 0 on GPU 0 GPU instance ID 13 using profile MIG 1g.10gb (ID 0)

Persist the MIG configuration

The MIG configuration is lost when you reboot the VM, so you need to set up a bash script that recreates it:
    # vi /usr/local/bin/setup_mig.sh

    #!/bin/bash
    nvidia-smi -i 0 -mig 1
    sudo nvidia-smi mig -dgi
    sudo nvidia-smi mig -cgi 14,14,14,20 -C

Grant execute permission:

    chmod +x /usr/local/bin/setup_mig.sh

Create a system service:

    vi /etc/systemd/system/setup_mig.service

    [Unit]
    Description=Setup NVIDIA MIG Instances
    After=default.target

    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/setup_mig.sh

    [Install]
    WantedBy=default.target

Enable setup_mig.service and check its status:

    sudo systemctl daemon-reload
    sudo systemctl enable setup_mig.service
    sudo systemctl status setup_mig.service

Preparing the MIG container environment

Install Docker and the NVIDIA Container Toolkit on the VM:

    sudo apt-get update
    sudo apt-get install -y docker.io
    sudo apt-get install -y aptitude
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update
    sudo aptitude install -y nvidia-docker2
    sudo systemctl restart docker
    sudo aptitude install -y nvidia-container-toolkit
    sudo systemctl restart docker

Configure the container creation script on the VM:

    # vi createcontainer.sh

    #!/bin/bash

    # Container names
    CONTAINER_NAMES=("mig1_tensorrt_container" "mig2_tensorrt_container" "mig3_tensorrt_container" "mig4_tensorrt_container")

    # Remove existing containers
    for CONTAINER in "${CONTAINER_NAMES[@]}"; do
        if [ "$(sudo docker ps -a -q -f name=$CONTAINER)" ]; then
            echo "Stopping and removing container: $CONTAINER"
            sudo docker stop "$CONTAINER"
            sudo docker rm "$CONTAINER"
        fi
    done

    # Get the UUIDs of the MIG devices
    MIG_UUIDS=$(nvidia-smi -L | grep 'MIG' | awk -F 'UUID: ' '{print $2}' | awk -F ')' '{print $1}')
    UUID_ARRAY=($MIG_UUIDS)

    # Check that enough MIG device UUIDs were found
    if [ ${#UUID_ARRAY[@]} -lt 4 ]; then
        echo "Error: Not enough MIG devices found."
        exit 1
    fi

    # Start the containers, one per MIG device
    sudo docker run --gpus "device=${UUID_ARRAY[0]}" -v /mig1:/mnt/mig1 -p 8081:80 -d --name mig1_tensorrt_container nvcr.io/nvidia/pytorch:24.05-py3 tail -f /dev/null
    sudo docker run --gpus "device=${UUID_ARRAY[1]}" -v /mig2:/mnt/mig2 -p 8082:80 -d --name mig2_tensorrt_container nvcr.io/nvidia/pytorch:24.05-py3 tail -f /dev/null
    sudo docker run --gpus "device=${UUID_ARRAY[2]}" -v /mig3:/mnt/mig3 -p 8083:80 -d --name mig3_tensorrt_container nvcr.io/nvidia/pytorch:24.05-py3 tail -f /dev/null
    sudo docker run --gpus "device=${UUID_ARRAY[3]}" -v /mig4:/mnt/mig4 -p 8084:80 -d --name mig4_tensorrt_container nvcr.io/nvidia/pytorch:24.05-py3 tail -f /dev/null

    # Print container status
    sudo docker ps

    # Open the mapped ports in the firewall
    sudo ufw allow 8081
    sudo ufw allow 8082
    sudo ufw allow 8083
    sudo ufw allow 8084
    sudo ufw reload

Make sure the container is accessible from outside. Start a listener on port 80 in the container:

    root@david1a100:~# sudo docker exec -it mig1_tensorrt_container /bin/bash
    root@b6abf5bf48ae:/workspace# python3 -m http.server 80
    Serving HTTP on 0.0.0.0 port 80 (http://0.0.0.0:80/) ...
    167.220.233.184 - - [23/Aug/2024 10:54:47] "GET / HTTP/1.1" 200 -

Run curl from my laptop:

    (base) PS C:\Users\xinyuwei> curl http://20.5.**.**:8081

    StatusCode        : 200
    StatusDescription : OK
    Content           : http://www.w3.org/TR/html4/strict.dtd">Directory listing fo...
    RawContent        : HTTP/1.0 200 OK
                        Content-Length: 594
                        Content-Type: text/html; charset=utf-8
                        Date: Fri, 23 Aug 2024 10:54:47 GMT
                        Server: SimpleHTTP/0.6 Python/3.10.12

Ping google.com from the container:

    root@david1a100:~# sudo docker exec -it mig1_tensorrt_container /bin/bash
    root@b6abf5bf48ae:/workspace# pip install ping3
    root@b6abf5bf48ae:/workspace# ping3 www.google.com
    ping 'www.google.com' ... 2ms
    ping 'www.google.com' ... 1ms
    ping 'www.google.com' ... 1ms
    ping 'www.google.com' ... 1ms

Related useful commands.

Perform SD inference tests in containers
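Before running inference, it may be worth confirming that the UUID extraction used in createcontainer.sh yields one entry per MIG instance. The sketch below is only an illustration: the function names `extract_mig_uuids` and `sample_listing` are hypothetical, the listing is shaped like typical `nvidia-smi -L` output, and the UUIDs are made up. On the VM you would pipe the real `nvidia-smi -L` output into the function instead.

```shell
#!/bin/sh
# Print every MIG device UUID found in `nvidia-smi -L`-style text on stdin.
# Lines for the physical GPU (UUID: GPU-...) are skipped on purpose.
extract_mig_uuids() {
    sed -n 's/.*(UUID: \(MIG-[^)]*\)).*/\1/p'
}

# Hypothetical sample listing; the UUIDs below are placeholders, not real devices.
sample_listing() {
    cat <<'EOF'
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-00000000-0000-0000-0000-000000000000)
  MIG 2g.20gb     Device  0: (UUID: MIG-11111111-1111-1111-1111-111111111111)
  MIG 2g.20gb     Device  1: (UUID: MIG-22222222-2222-2222-2222-222222222222)
  MIG 2g.20gb     Device  2: (UUID: MIG-33333333-3333-3333-3333-333333333333)
  MIG 1g.10gb     Device  3: (UUID: MIG-44444444-4444-4444-4444-444444444444)
EOF
}

# Show which container name would be paired with which MIG device.
i=1
for uuid in $(sample_listing | extract_mig_uuids); do
    echo "mig${i}_tensorrt_container -> ${uuid}"
    i=$((i + 1))
done
echo "Found $((i - 1)) MIG devices"
```

To check a running container, `sudo docker exec mig1_tensorrt_container nvidia-smi -L` should list only the single MIG device that was passed to it.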
Check the TensorRT version in the container:

    root@david1a100:/workspace# pip show tensorrt
    Name: tensorrt
    Version: 10.2.0
    Summary: A high performance deep learning inference library
    Home-page: https://developer.nvidia.com/tensorrt
    Author: NVIDIA Corporation
    Author-email:
    License: Proprietary
    Location: /usr/local/lib/python3.10/dist-packages
    Requires:
    Required-by:

Run the SD test in the container using the example from GitHub:

    git clone --branch release/10.2 --single-branch https://github.com/NVIDIA/TensorRT.git
    cd TensorRT/demo/Diffusion
    pip3 install -r requirements.txt

The test generates a 1024*1024 image:

    python3 demo_txt2img.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN

We can check the image generation speed in several ways:

In the mig1 container with 2 GPCs and 20G memory:

In the mig4 container with 2 GPCs and 20G memory:

The output image is as follows. Copy it to the VM and download it. The host directory /mig1 is mounted at /mnt/mig1 inside the container:

    # cp ./output/* /mnt/mig1

Comparing INT8 inference speed and quality on H100 GPU

To see the effectiveness of INT8, we tested Stable Diffusion XL 1.0 on a single H100. NVIDIA claims that INT8 is better optimized on H100 than on A100.

    # python3 demo_txt2img_xl.py "a photo of an astronaut riding a horse on mars" --hf-token=$HF_TOKEN --version=xl-1.0

Generated image:

Using SDXL with INT8 AMMO quantization:

    python3 demo_txt2img_xl.py "a photo of an astronaut riding a horse on mars" --version xl-1.0 --onnx-dir onnx-sdxl --engine-dir engine-sdxl --int8

When you run the above command, 8-bit quantization of the model is performed first:

    Building TensorRT engine for onnx/unetxl-int8.l2.5.bs2.s30.c32.p1.0.a0.8.opt/model.onnx: engine/unetxl-int8.l2.5.bs2.s30.c32.p1.0.a0.8.trt10.0.1.plan

Then inference runs.

Check the generated image:

The quality of the generated images is the same and the file sizes are almost the same. INT8 inference is 20% faster than FP16.
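For anyone repeating the FP16 versus INT8 comparison, the speedup can be derived from the two measured latencies. The following is a minimal sketch: `speedup_pct` is a hypothetical helper, and the latency values are placeholders chosen for illustration, not the measurements from this test.

```shell
#!/bin/sh
# Percentage speed increase of an optimized run over a baseline run.
# $1 = baseline latency in ms (e.g. FP16), $2 = optimized latency in ms (e.g. INT8)
speedup_pct() {
    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1f\n", (base / opt - 1) * 100 }'
}

# Placeholder latencies, assumed for illustration only.
FP16_MS=1200
INT8_MS=1000
echo "INT8 speed increase over FP16: $(speedup_pct "$FP16_MS" "$INT8_MS")%"
```

The formula treats speed as the inverse of latency, so a run that takes 1000 ms instead of 1200 ms counts as a 20% speed increase.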