Tencent-Hunyuan/HYDiT-ControlNet-v1.2

Zhiminli committed on Jul 9, 2024

Commit b156b54 (verified)

1 Parent(s): 256e41e

Update README.md

README.md +29 -31

@@ -10,7 +10,8 @@ language:
 ### Instructions
- The dependencies and installation are basically the same as the [**base model**](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.1).
  We provide three types of ControlNet weights for you to test: canny, depth and pose ControlNet.
@@ -24,7 +25,7 @@ huggingface-cli download Tencent-Hunyuan/HYDiT-ControlNet-v1.2 --local-dir ./ckp
 huggingface-cli download Tencent-Hunyuan/Distillation-v1.2 ./pytorch_model_distill.pt --local-dir ./ckpts/t2i/model
 # Quick start
-python3 sample_controlnet.py  --no-enhance --load-key distill --infer-steps 50 --control-type canny --prompt "在夜晚的酒店门前，一座古老的中国风格的狮子雕像矗立着，它的眼睛闪烁着光芒，仿佛在守护着这座建筑。背景是夜晚的酒店前，构图方式是特写，平视，居中构图。这张照片呈现了真实摄影风格，蕴含了中国雕塑文化，同时展现了神秘氛围" --condition-image-path controlnet/asset/input/canny.jpg --control-weight 1.0 --infer-mode fa
 ```
 Examples of condition input and ControlNet results are as follows:
@@ -86,33 +87,29 @@ We provide three types of weights for ControlNet training, `ema`, `module` and `
 Here is an example: we load the `distill` weights into the main model and conduct ControlNet training.
-If you want to load the `module` weights into the main model, just remove the `--ema-to-module` parameter.
 To apply multi-resolution training, add the `--multireso` and `--reso-step 64` parameters.
 ```bash
-task_flag="canny_controlnet"                                # task flag is used to identify folders.
 control_type=canny
-resume=./ckpts/t2i/model/                                    # checkpoint root for resume
-index_file=path/to/your/index_file
-results_dir=./log_EXP                                        # save root for results
-batch_size=1                                                 # training batch size
-image_size=1024                                              # training image resolution
-grad_accu_steps=2                                            # gradient accumulation
-warmup_num_steps=0                                           # warm-up steps
-lr=0.0001                                                    # learning rate
-ckpt_every=10000                                             # save a checkpoint at this step interval.
-ckpt_latest_every=5000                                       # save a checkpoint named `latest.pt` at this step interval.
 sh $(dirname "$0")/run_g_controlnet.sh \
     --task-flag ${task_flag} \
     --control-type ${control_type} \
-    --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.03 \
     --predict-type v_prediction \
-    --multireso \
-    --reso-step 64 \
-    --ema-to-module \
     --uncond-p 0.44 \
     --uncond-p-t5 0.44 \
     --index-file ${index_file} \
@@ -125,18 +122,19 @@ sh $(dirname "$0")/run_g_controlnet.sh \
     --warmup-num-steps ${warmup_num_steps} \
     --use-flash-attn \
     --use-fp16 \
-    --use-ema \
-    --ema-dtype fp32 \
     --results-dir ${results_dir} \
-    --resume-split \
-    --resume ${resume} \
     --ckpt-every ${ckpt_every} \
     --ckpt-latest-every ${ckpt_latest_every} \
     --log-every 10 \
     --deepspeed \
     --deepspeed-optimizer \
     --use-zero-stage 2 \
     "$@"
 ```
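The `batch_size`, `grad_accu_steps`, `ckpt_every` and `ckpt_latest_every` variables in the script above interact with the number of GPUs and the length of the run. The following is a minimal sketch of that arithmetic, not code from the repository. The GPU count and total step count are hypothetical values chosen for illustration, and it assumes the checkpoint intervals are counted in the same steps as the training loop.

```python
def effective_batch_size(batch_size: int, grad_accu_steps: int, num_gpus: int) -> int:
    """Samples contributing to one optimizer update under data parallelism."""
    return batch_size * grad_accu_steps * num_gpus


def checkpoint_steps(total_steps: int, every: int) -> list[int]:
    """Steps at which a checkpoint would be written for a given interval."""
    return list(range(every, total_steps + 1, every))


# Values taken from the training script above.
batch_size = 1
grad_accu_steps = 2
ckpt_every = 10000
ckpt_latest_every = 5000

# Hypothetical values, not stated in the README.
num_gpus = 8
total_steps = 20000

global_batch = effective_batch_size(batch_size, grad_accu_steps, num_gpus)
numbered_ckpts = checkpoint_steps(total_steps, ckpt_every)
latest_ckpts = checkpoint_steps(total_steps, ckpt_latest_every)

print("effective batch size:", global_batch)
print("numbered checkpoints at steps:", numbered_ckpts)
print("latest.pt refreshed at steps:", latest_ckpts)
```

With `ckpt_latest_every` at half of `ckpt_every`, `latest.pt` is refreshed twice as often as a numbered checkpoint is written, which limits lost work after an interruption without doubling the number of files kept.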
 Recommended parameter settings
@@ -154,26 +152,26 @@ You can use the following command line for inference.
 a. You can use a float to specify the weight for all layers, **or use a list to separately specify the weight for each layer**, for example, '[1.0 * (0.825 ** float(19 - i)) for i in range(19)]'
 ```bash
-python3 sample_controlnet.py  --control-weight [1.0 * (0.825 ** float(19 - i)) for i in range(19)] --no-enhance --load-key distill --infer-steps 50 --control-type canny --prompt "在夜晚的酒店门前，一座古老的中国风格的狮子雕像矗立着，它的眼睛闪烁着光芒，仿佛在守护着这座建筑。背景是夜晚的酒店前，构图方式是特写，平视，居中构图。这张照片呈现了真实摄影风格，蕴含了中国雕塑文化，同时展现了神秘氛围" --condition-image-path controlnet/asset/input/canny.jpg --infer-mode fa
 ```
 b. Using canny ControlNet during inference
 ```bash
-python3 sample_controlnet.py  --no-enhance --load-key distill --infer-steps 50 --control-type canny --prompt "在夜晚的酒店门前，一座古老的中国风格的狮子雕像矗立着，它的眼睛闪烁着光芒，仿佛在守护着这座建筑。背景是夜晚的酒店前，构图方式是特写，平视，居中构图。这张照片呈现了真实摄影风格，蕴含了中国雕塑文化，同时展现了神秘氛围" --condition-image-path controlnet/asset/input/canny.jpg --control-weight 1.0 --infer-mode fa
 ```
 c. Using depth ControlNet during inference
 ```bash
-python3 sample_controlnet.py  --no-enhance --load-key distill --infer-steps 50 --control-type depth --prompt "在茂密的森林中，一只黑白相间的熊猫静静地坐在绿树红花中，周围是山川和海洋。背景是白天的森林，光线充足。照片采用特写、平视和居中构图的方式，呈现出写实的效果" --condition-image-path controlnet/asset/input/depth.jpg --control-weight 1.0 --infer-mode fa
 ```
 d. Using pose ControlNet during inference
 ```bash
-python3 sample_controlnet.py  --no-enhance --load-key distill --infer-steps 50 --control-type pose --prompt "在白天的森林中，一位穿着绿色上衣的亚洲女性站在大象旁边。照片采用了中景、平视和居中构图的方式，呈现出写实的效果。这张照片蕴含了人物摄影文化，并展现了宁静的氛围" --condition-image-path controlnet/asset/input/pose.jpg --control-weight 1.0 --infer-mode fa
 ```
 ## HunyuanDiT ControlNet v1.1
@@ -193,7 +191,7 @@ huggingface-cli download Tencent-Hunyuan/Distillation-v1.1 ./pytorch_model_disti
 ```bash
 task_flag="canny_controlnet"                                # the task flag is used to identify folders.
 control_type=canny
-resume=./HunyuanDiT-v1.1/t2i/model/                          # checkpoint root for resume
 index_file=/path/to/your/indexfile                           # index file for dataloader
 results_dir=./log_EXP                                        # save root for results
 batch_size=1                                                 # training batch size
@@ -213,7 +211,6 @@ sh $(dirname "$0")/run_g_controlnet.sh \
     --predict-type v_prediction \
     --multireso \
     --reso-step 64 \
-    --ema-to-module \
     --uncond-p 0.44 \
     --uncond-p-t5 0.44 \
     --index-file ${index_file} \
@@ -227,8 +224,8 @@ sh $(dirname "$0")/run_g_controlnet.sh \
     --use-flash-attn \
     --use-fp16 \
     --results-dir ${results_dir} \
-    --resume-split \
-    --resume ${resume} \
     --epochs ${epochs} \
     --ckpt-every ${ckpt_every} \
     --ckpt-latest-every ${ckpt_latest_every} \
@@ -261,3 +258,4 @@ c. Using pose ControlNet during inference
 ```bash
 python3 sample_controlnet.py  --no-enhance --load-key distill --infer-steps 50 --control-type pose --prompt "一位亚洲女性，身穿绿色上衣，戴着紫色头巾和紫色围巾，站在黑板前。背景是黑板。照片采用近景、平视和居中构图的方式呈现真实摄影风格" --condition-image-path controlnet/asset/input/pose.jpg --control-weight 1.0 --use-style-cond --size-cond 1024 1024 --beta-end 0.03
 ```

 ### Instructions
+ The dependencies and installation are basically the same as the [**base model**](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2).
  We provide three types of ControlNet weights for you to test: canny, depth and pose ControlNet.
 huggingface-cli download Tencent-Hunyuan/Distillation-v1.2 ./pytorch_model_distill.pt --local-dir ./ckpts/t2i/model
 # Quick start
+python sample_controlnet.py --infer-mode fa --no-enhance --load-key distill --infer-steps 50 --control-type canny --prompt "在夜晚的酒店门前，一座古老的中国风格的狮子雕像矗立着，它的眼睛闪烁着光芒，仿佛在守护着这座建筑。背景是夜晚的酒店前，构图方式是特写，平视，居中构图。这张照片呈现了真实摄影风格，蕴含了中国雕塑文化，同时展现了神秘氛围" --condition-image-path controlnet/asset/input/canny.jpg --control-weight 1.0
 ```
 Examples of condition input and ControlNet results are as follows:
 Here is an example: we load the `distill` weights into the main model and conduct ControlNet training.
 To apply multi-resolution training, add the `--multireso` and `--reso-step 64` parameters.
 ```bash
+task_flag="canny_controlnet"                                   # the task flag is used to identify folders.
 control_type=canny
+resume_module_root=./ckpts/t2i/model/pytorch_model_distill.pt  # checkpoint root for resume
+index_file=/path/to/your/indexfile                             # index file for dataloader
+results_dir=./log_EXP                                          # save root for results
+batch_size=1                                                   # training batch size
+image_size=1024                                                # training image resolution
+grad_accu_steps=2                                              # gradient accumulation
+warmup_num_steps=0                                             # warm-up steps
+lr=0.0001                                                      # learning rate
+ckpt_every=10000                                               # save a checkpoint at this step interval.
+ckpt_latest_every=5000                                         # save a checkpoint named `latest.pt` at this step interval.
+epochs=100                                                     # total training epochs
 sh $(dirname "$0")/run_g_controlnet.sh \
     --task-flag ${task_flag} \
     --control-type ${control_type} \
+    --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.018 \
     --predict-type v_prediction \
     --uncond-p 0.44 \
     --uncond-p-t5 0.44 \
     --index-file ${index_file} \
     --warmup-num-steps ${warmup_num_steps} \
     --use-flash-attn \
     --use-fp16 \
     --results-dir ${results_dir} \
+    --resume \
+    --resume-module-root ${resume_module_root} \
+    --epochs ${epochs} \
     --ckpt-every ${ckpt_every} \
     --ckpt-latest-every ${ckpt_latest_every} \
     --log-every 10 \
     --deepspeed \
     --deepspeed-optimizer \
     --use-zero-stage 2 \
+    --gradient-checkpointing \
     "$@"
 ```
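The `--noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.018` flags above set the endpoints of the diffusion beta schedule. As a rough illustration of what the lower `--beta-end` changes, the sketch below uses the `scaled_linear` definition found in Hugging Face diffusers (linear in the square root of beta, then squared). Two things are assumptions here: that the training script uses the same formula, and the 1000-step schedule length.

```python
import math


def scaled_linear_betas(beta_start: float, beta_end: float, num_steps: int) -> list[float]:
    """Betas spaced linearly in sqrt(beta), then squared (diffusers-style scaled_linear)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * t / (num_steps - 1)) ** 2 for t in range(num_steps)]


def final_alpha_bar(betas: list[float]) -> float:
    """Cumulative product of (1 - beta): the signal fraction left at the last step."""
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
    return alpha_bar


NUM_STEPS = 1000  # assumed schedule length, not stated in the README

betas_v12 = scaled_linear_betas(0.00085, 0.018, NUM_STEPS)  # value used for v1.2
betas_old = scaled_linear_betas(0.00085, 0.03, NUM_STEPS)   # value used for v1.1

print("final alpha_bar with beta_end=0.018:", final_alpha_bar(betas_v12))
print("final alpha_bar with beta_end=0.03: ", final_alpha_bar(betas_old))
```

Under these assumptions, a smaller `--beta-end` adds less noise per step, so more of the original signal survives at the final timestep. Inference should use the same endpoints as training.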
 Recommended parameter settings
 a. You can use a float to specify the weight for all layers, **or use a list to separately specify the weight for each layer**, for example, '[1.0 * (0.825 ** float(19 - i)) for i in range(19)]'
 ```bash
+python sample_controlnet.py --infer-mode fa --control-weight "[1.0 * (0.825 ** float(19 - i)) for i in range(19)]" --no-enhance --load-key distill --infer-steps 50 --control-type canny --prompt "在夜晚的酒店门前，一座古老的中国风格的狮子雕像矗立着，它的眼睛闪烁着光芒，仿佛在守护着这座建筑。背景是夜晚的酒店前，构图方式是特写，平视，居中构图。这张照片呈现了真实摄影风格，蕴含了中国雕塑文化，同时展现了神秘氛围" --condition-image-path controlnet/asset/input/canny.jpg
 ```
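The per-layer expression in item a is easier to reason about once expanded. The sketch below evaluates the same list comprehension in plain Python to show the resulting weights. It is illustrative only and makes no claim about how `sample_controlnet.py` parses the `--control-weight` string.

```python
# Expand the per-layer expression from the example above.
num_layers = 19  # taken from range(19) in the expression
weights = [1.0 * (0.825 ** float(num_layers - i)) for i in range(num_layers)]

for layer, weight in enumerate(weights):
    print(f"layer {layer:2d}: {weight:.4f}")
```

The exponent runs from 19 down to 1, so the weights grow geometrically from about 0.026 for the first entry to 0.825 for the last, and none reaches 1.0. On the command line the expression must be quoted, as in the command above. Otherwise the shell splits it on spaces and interprets the parentheses and `*` itself.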
 b. Using canny ControlNet during inference
 ```bash
+python sample_controlnet.py --infer-mode fa --control-weight 1.0 --no-enhance --load-key distill --infer-steps 50 --control-type canny --prompt "在夜晚的酒店门前，一座古老的中国风格的狮子雕像矗立着，它的眼睛闪烁着光芒，仿佛在守护着这座建筑。背景是夜晚的酒店前，构图方式是特写，平视，居中构图。这张照片呈现了真实摄影风格，蕴含了中国雕塑文化，同时展现了神秘氛围" --condition-image-path controlnet/asset/input/canny.jpg
 ```
 c. Using depth ControlNet during inference
 ```bash
+python sample_controlnet.py --infer-mode fa --control-weight 1.0 --no-enhance --load-key distill --infer-steps 50 --control-type depth --prompt "在茂密的森林中，一只黑白相间的熊猫静静地坐在绿树红花中，周围是山川和海洋。背景是白天的森林，光线充足。照片采用特写、平视和居中构图的方式，呈现出写实的效果" --condition-image-path controlnet/asset/input/depth.jpg
 ```
 d. Using pose ControlNet during inference
 ```bash
+python3 sample_controlnet.py --infer-mode fa --control-weight 1.0 --no-enhance --load-key distill --infer-steps 50 --control-type pose --prompt "在白天的森林中，一位穿着绿色上衣的亚洲女性站在大象旁边。照片采用了中景、平视和居中构图的方式，呈现出写实的效果。这张照片蕴含了人物摄影文化，并展现了宁静的氛围" --condition-image-path controlnet/asset/input/pose.jpg
 ```
 ## HunyuanDiT ControlNet v1.1
 ```bash
 task_flag="canny_controlnet"                                # the task flag is used to identify folders.
 control_type=canny
+resume_module_root=./ckpts/t2i/model/pytorch_model_distill.pt  # checkpoint root for resume
 index_file=/path/to/your/indexfile                           # index file for dataloader
 results_dir=./log_EXP                                        # save root for results
 batch_size=1                                                 # training batch size
     --predict-type v_prediction \
     --multireso \
     --reso-step 64 \
     --uncond-p 0.44 \
     --uncond-p-t5 0.44 \
     --index-file ${index_file} \
     --use-flash-attn \
     --use-fp16 \
     --results-dir ${results_dir} \
+    --resume \
+    --resume-module-root ${resume_module_root} \
     --epochs ${epochs} \
     --ckpt-every ${ckpt_every} \
     --ckpt-latest-every ${ckpt_latest_every} \
 ```bash
 python3 sample_controlnet.py  --no-enhance --load-key distill --infer-steps 50 --control-type pose --prompt "一位亚洲女性，身穿绿色上衣，戴着紫色头巾和紫色围巾，站在黑板前。背景是黑板。照片采用近景、平视和居中构图的方式呈现真实摄影风格" --condition-image-path controlnet/asset/input/pose.jpg --control-weight 1.0 --use-style-cond --size-cond 1024 1024 --beta-end 0.03
 ```