From 113dd2f048cff5d740c7abdd7294ae94ac289e17 Mon Sep 17 00:00:00 2001
From: majorli
Date: Fri, 2 Dec 2022 02:33:32 +0000
Subject: [PATCH] fix ncf and dlrm model readme issue link #I63WEH

Signed-off-by: majorli
---
 .../ncf/pytorch/README.md                 |  21 ++--
 recommendation/ctr/dlrm/pytorch/README.md | 108 +++++++++---------
 2 files changed, 66 insertions(+), 63 deletions(-)

diff --git a/recommendation/collaborative_filtering/ncf/pytorch/README.md b/recommendation/collaborative_filtering/ncf/pytorch/README.md
index 5d405b80b..2edb294c5 100644
--- a/recommendation/collaborative_filtering/ncf/pytorch/README.md
+++ b/recommendation/collaborative_filtering/ncf/pytorch/README.md
@@ -7,20 +7,24 @@ By replacing the inner product with a neural architecture that can learn an arbi
 
 ## Step 1: Installing packages
 
-```
+```shell
 pip3 install -r requirements.txt
 ```
 
 
 ## Step 2: Preparing datasets
 
-dataset is movielens
+Dataset is movielens
 
-cd /modelzoo/recommendation/collaborative_filtering/ncf/pytorch/
-```
-bash download_dataset.sh
+```shell
+# Download dataset
+mkdir -p data/
+wget http://files.grouplens.org/datasets/movielens/ml-20m.zip -P data/
+
+# Unzip
+unzip data/ml-20m.zip -d data/
 
-# convert
+# Convert
 python3 convert.py --path ./data/ml-20m/ratings.csv --output ./data/ml-20m
 ```
 
@@ -29,13 +33,14 @@ python3 convert.py --path ./data/ml-20m/ratings.csv --output ./data/ml-20m
 
 ### Multiple GPUs on one machine
 
-```
+```shell
 export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 bash run_train_fp32.sh
 ```
 
 ### Multiple GPUs on one machine (AMP)
-```
+
+```shell
 export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 # fp16 train
 bash run_train_fp16.sh
diff --git a/recommendation/ctr/dlrm/pytorch/README.md b/recommendation/ctr/dlrm/pytorch/README.md
index 9e73d923d..0fe8df516 100644
--- a/recommendation/ctr/dlrm/pytorch/README.md
+++ b/recommendation/ctr/dlrm/pytorch/README.md
@@ -1,55 +1,53 @@
-# DLRM
-
-## Model description
-
-With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale-out compute from the fully-connected layers. We compare DLRM against existing recommendation models and characterize its performance on the Big Basin AI platform, demonstrating its usefulness as a benchmark for future algorithmic experimentation and system co-design.
-
-## Step 1: Installing packages
-
-```shell
-$ cd ../../
-$ pip3 install -r requirements.txt && python3 ./setup.py install
-```
-
-
-## Step 2: Preparing datasets
-
-Criteo_Terabyte consists of 23 days data, as it is very large, here only take 3 days data for an example.
-
-```shell
-$ cd modelzoo/recommendation/ctr/dlrm/pytorch/dlrm/data
-$ bash download_and_preprocess.sh
-```
-
-After above steps, can get files: terabyte_processed_test.bin, terabyte_processed_train.bin, terabyte_processed_val.bin .
-
-
-
-## Step 3: Training
-
-### On single GPU
-
-```shell
-$ python3 -u scripts/train.py --model_config dlrm/config/official_config.json --dataset /home/datasets/recommendation/Criteo_Terabyte --lr 0.1 --warmup_steps 2750 --decay_end_lr 0 --decay_steps 27772 --decay_start_step 49315 --batch_size 2048 --epochs 5 |& tee 1card.txt
-```
-
-### Multiple GPUs on one machine
-
-```shell
-$ python3 -u -m torch.distributed.launch --nproc_per_node=8 --use_env scripts/dist_train.py --model_config dlrm/config/official_config.json --dataset /home/datasets/recommendation/Criteo_Terabyte --lr 0.1 --warmup_steps 2750 --decay_end_lr 0 --decay_steps 27772 --decay_start_step 49315 --batch_size 2048 --epochs 5 |& tee 8cards.txt
-```
-
-## Results on BI-V100
-
-| GPUs | FPS | AUC |
-| ---- | ------ | ---- |
-| 1x1 | 196958 | N/A |
-| 1x8 | 346555 | 0.75 |
-
-| Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability |
-| -------------------- | ---------------------------------------- | ----------- | -------- | ---------- | ----------- | ----------------------- | --------- |
-| AUC:0.75 | SDK V2.2,bs:2048,8x,AMP | 793486 | 0.75 | 60\*8 | 0.97 | 3.7\*8 | 1 |
-
-
-## Reference
-https://github.com/mlcommons/training_results_v0.7/tree/master/NVIDIA/benchmarks/dlrm/implementations/pytorch
+# DLRM
+
+## Model description
+
+With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale-out compute from the fully-connected layers. We compare DLRM against existing recommendation models and characterize its performance on the Big Basin AI platform, demonstrating its usefulness as a benchmark for future algorithmic experimentation and system co-design.
+
+## Step 1: Installing packages
+
+```shell
+pip3 install -r requirements.txt && python3 ./setup.py install
+```
+
+## Step 2: Preparing datasets
+
+Criteo_Terabyte consists of 23 days data, as it is very large, here only take 3 days data for an example.
+
+```shell
+# download data
+cd dlrm/data/
+bash download_and_preprocess.sh
+```
+
+After above steps, can get files: terabyte_processed_test.bin, terabyte_processed_train.bin, terabyte_processed_val.bin.
+
+
+## Step 3: Training
+
+### On single GPU
+
+```shell
+python3 -u scripts/train.py --model_config dlrm/config/official_config.json --dataset /home/datasets/recommendation/Criteo_Terabyte --lr 0.1 --warmup_steps 2750 --decay_end_lr 0 --decay_steps 27772 --decay_start_step 49315 --batch_size 2048 --epochs 5 |& tee 1card.txt
+```
+
+### Multiple GPUs on one machine
+
+```shell
+python3 -u -m torch.distributed.launch --nproc_per_node=8 --use_env scripts/dist_train.py --model_config dlrm/config/official_config.json --dataset /home/datasets/recommendation/Criteo_Terabyte --lr 0.1 --warmup_steps 2750 --decay_end_lr 0 --decay_steps 27772 --decay_start_step 49315 --batch_size 2048 --epochs 5 |& tee 8cards.txt
+```
+
+## Results on BI-V100
+
+| GPUs | FPS | AUC |
+| ---- | ------ | ---- |
+| 1x1 | 196958 | N/A |
+| 1x8 | 346555 | 0.75 |
+
+| Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability |
+| -------------------- | ---------------------------------------- | ----------- | -------- | ---------- | ----------- | ----------------------- | --------- |
+| AUC:0.75 | SDK V2.2,bs:2048,8x,AMP | 793486 | 0.75 | 60\*8 | 0.97 | 3.7\*8 | 1 |
+
+
+## Reference
+https://github.com/mlcommons/training_results_v0.7/tree/master/NVIDIA/benchmarks/dlrm/implementations/pytorch
--
Gitee