1. Install dependency libraries and prepare data
Before starting training, install the required dependencies, such as PyTorch, CUDA, and cuDNN, and prepare the training data. For details on these steps, see the official DDRNet documentation.
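Once the dependencies are installed, it can help to confirm that Python actually sees them before launching a long training run. The following is a minimal sketch, not part of DDRNet: the helper name check_environment is hypothetical, and it only reports what PyTorch itself exposes.

```python
import importlib.util


def check_environment() -> dict:
    """Report whether PyTorch, CUDA and cuDNN are visible from Python."""
    report = {
        "torch": None,             # PyTorch version string, or None if missing
        "cuda_available": False,   # True if a usable CUDA device was found
        "cudnn_available": False,  # True if cuDNN can be loaded
        "gpu_count": 0,            # number of visible GPUs
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed; nothing else to check

    import torch

    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["cudnn_available"] = torch.backends.cudnn.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

For single-card training, gpu_count should be at least 1 and cuda_available should be True.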
2. Modify the configuration file
Specify the training parameters in the configuration file, such as the dataset path, learning rate, batch size, and number of training iterations. For single-card training, set the batch size (BATCH_SIZE_PER_GPU in the sample below) to a value small enough to fit in the memory of one GPU, and set the GPU count (NUM_GPUS) to 1 so that only one card is used. A sample configuration file follows; adjust parameters such as the dataset path to match your setup:
# train config
DATASET:
  ROOT: './data'
  NAME: 'cityscapes'
  TRAIN_SET: 'train'
  TEST_SET: 'val'
  INPUT_SIZE: [769, 769]
  BASE_SIZE: 769
  SCALE_FACTOR: [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
  CROP_SIZE: [769, 769]
  IGNORE_LABEL: 255
  MEAN: [0.485, 0.456, 0.406]
  STD: [0.229, 0.224, 0.225]
  NUM_CLASSES: 19
TRAIN:
  MULTI_SCALE: True
  FLIP: True
  IGNORE_LABEL: 255
  BASE_SIZE: 769
  CROP_SIZE: [769, 769]
  BATCH_SIZE_PER_GPU: 2
  NUM_WORKERS: 4
  LEARNING_RATE: 0.001
  MOMENTUM: 0.9
  WEIGHT_DECAY: 0.0001
  POWER: 0.9
  MAX_ITER: 80000
  WARMUP_STEPS: 1000
  WARMUP_FACTOR: 1.0 / 3.0
  SAVE_PRED_EVERY: 10000
  SNAPSHOT_DIR: './snapshots/'
  LOG_DIR: './logs/'
  DISPLAY_INTERVAL: 20
  GPU_ID: 0
  NUM_GPUS: 1
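To get a feel for what the optimizer-related fields imply, the sketch below computes the effective batch size and a learning-rate curve from the sample values. It assumes that LEARNING_RATE, POWER, MAX_ITER, WARMUP_STEPS, and WARMUP_FACTOR drive a polynomial ("poly") decay with linear warmup, which is a common convention for segmentation training but is not confirmed by the source. The function names are hypothetical. Note also that a plain YAML loader would read `1.0 / 3.0` as a string, so the value is written as a Python expression here.

```python
def effective_batch_size(batch_size_per_gpu: int = 2, num_gpus: int = 1) -> int:
    """Total number of samples per optimizer step across all GPUs."""
    return batch_size_per_gpu * num_gpus


def poly_lr(step: int,
            base_lr: float = 0.001,
            max_iter: int = 80000,
            power: float = 0.9,
            warmup_steps: int = 1000,
            warmup_factor: float = 1.0 / 3.0) -> float:
    """Learning rate at `step`: linear warmup followed by poly decay (assumed)."""
    # Poly decay: the scale goes from 1.0 at step 0 to 0.0 at max_iter.
    scale = (1.0 - step / max_iter) ** power
    if step < warmup_steps:
        # Ramp linearly from warmup_factor up to 1.0 during warmup.
        alpha = step / warmup_steps
        scale *= warmup_factor * (1.0 - alpha) + alpha
    return base_lr * scale


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 500, 1000, 40000, 80000):
        print(step, poly_lr(step))
```

With one GPU and BATCH_SIZE_PER_GPU set to 2, each step sees only two images, so a learning rate tuned for a larger multi-GPU batch may need to be lowered.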
3. Modify the train.py file
The DDRNet training script is train.py. Add the following code to train.py:
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

# Initialize the process group. With init_method='env://', MASTER_ADDR,
# MASTER_PORT, RANK and WORLD_SIZE are read from environment variables.
dist.init_process_group('nccl', init_method='env://')

# Instantiate the model, move it to the GPU, and wrap it in DistributedDataParallel
model = DDRNet(num_classes=args.num_classes, pretrained=True)
model = DistributedDataParallel(model.cuda())
These changes initialize the process group and wrap the model in DistributedDataParallel so that it is trained in data-parallel mode.
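Because the process group is initialized with init_method='env://', the rendezvous settings must be present in the environment even when only one process is used. A launcher such as torchrun normally sets them. The sketch below shows one way to supply them by hand for a single process, assuming a free local port; the helper names and the default port 29500 are illustrative, not part of DDRNet.

```python
import os


def single_process_env(master_port: int = 29500) -> dict:
    """Environment variables that env:// initialization expects for one process."""
    return {
        "MASTER_ADDR": "127.0.0.1",       # rendezvous host: this machine
        "MASTER_PORT": str(master_port),  # assumed to be a free local port
        "RANK": "0",                      # the only process has rank 0
        "WORLD_SIZE": "1",                # one process in total
    }


def init_single_process_group(backend: str = "nccl") -> None:
    """Set any missing variables, then initialize the process group."""
    import torch.distributed as dist  # imported lazily; requires PyTorch

    for key, value in single_process_env().items():
        os.environ.setdefault(key, value)  # keep values a launcher already set
    dist.init_process_group(backend, init_method="env://")


if __name__ == "__main__":
    for key, value in single_process_env().items():
        print(f"{key}={value}")
```

Calling init_single_process_group() in place of the bare dist.init_process_group call lets train.py be started with a plain python command on one card, while still respecting variables set by a launcher.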