# deep-visual-geo-localization-benchmark

**Repository Path**: codepool_admin/deep-visual-geo-localization-benchmark

## Basic Information

- **Project Name**: deep-visual-geo-localization-benchmark
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-08-07
- **Last Updated**: 2024-08-07

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Deep Visual Geo-localization Benchmark

This is the official repository for the CVPR 2022 Oral paper [Deep Visual Geo-localization Benchmark](https://arxiv.org/abs/2204.03444).
It can be used to reproduce results from the paper and to run a wide range of experiments by changing the components of a Visual Geo-localization pipeline.

## Setup

Before you begin experimenting with this toolbox, your dataset should be organized in a directory tree like this:

```
.
├── benchmarking_vg
└── datasets_vg
    └── datasets
        └── pitts30k
            └── images
                ├── train
                │   ├── database
                │   └── queries
                ├── val
                │   ├── database
                │   └── queries
                └── test
                    ├── database
                    └── queries
```

The [datasets_vg](https://github.com/gmberton/datasets_vg) repo can be used to download a number of datasets. Detailed instructions on how to download datasets are in the repo. Note that many datasets are available, and _pitts30k_ is just an example.

## Running experiments

### Basic experiment

For a basic experiment, run

`$ python3 train.py --dataset_name=pitts30k`

This trains a ResNet-18 + NetVLAD on Pitts30k.
The experiment creates a folder named `./logs/default/YYYY-MM-DD_HH-mm-ss`, where checkpoints are saved, along with an `info.log` file containing training logs and other information, such as model size, FLOPs and descriptor dimensionality.
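
Before launching a run, it can help to confirm that the dataset folders match the layout described in Setup. The following is a minimal sketch, not part of the toolbox: the helper name `missing_dataset_dirs` is hypothetical, and the relative path `../datasets_vg/datasets` assumes the script is run from inside `benchmarking_vg` with `datasets_vg` as a sibling folder, as in the tree above.

```python
from pathlib import Path

# Splits and subsets taken from the directory tree shown in Setup.
SPLITS = ("train", "val", "test")
SUBSETS = ("database", "queries")


def missing_dataset_dirs(datasets_root, dataset_name):
    """Return the expected image folders that do not exist (hypothetical helper)."""
    images_dir = Path(datasets_root) / dataset_name / "images"
    expected = [images_dir / split / subset
                for split in SPLITS
                for subset in SUBSETS]
    return [str(path) for path in expected if not path.is_dir()]


if __name__ == "__main__":
    # Assumed location of the datasets relative to benchmarking_vg.
    missing = missing_dataset_dirs("../datasets_vg/datasets", "pitts30k")
    if missing:
        print("Missing folders:")
        for folder in missing:
            print("  " + folder)
    else:
        print("Dataset layout looks complete.")
```

An empty list means all six `database` and `queries` folders are present; the check does not look at the images themselves.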

### Architectures and mining

You can replace the backbone and the aggregation like this:

`$ python3 train.py --dataset_name=pitts30k --backbone=resnet50conv4 --aggregation=gem`

You can easily use ResNets cropped at conv4 or conv5.

#### Add a fully connected layer

To add a fully connected layer of dimension 2048 to GeM pooling:

`$ python3 train.py --dataset_name=pitts30k --backbone=resnet50conv4 --aggregation=gem --fc_output_dim=2048`

#### Add PCA

To add PCA to a NetVLAD layer, run:

`$ python3 eval.py --dataset_name=pitts30k --backbone=resnet50conv4 --aggregation=netvlad --pca_dim=2048 --pca_dataset_folder=pitts30k/images/train`

where _pca_dataset_folder_ points to the folder with the images used to compute PCA. In the paper we compute PCA's principal components on the train set, as it showed the best results. PCA is used only at test time.

#### Evaluate trained models

To evaluate the trained model on other datasets (this example uses the St Lucia dataset), run

`$ python3 eval.py --backbone=resnet50conv4 --aggregation=gem --resume=logs/default/YYYY-MM-DD_HH-mm-ss/best_model.pth --dataset_name=st_lucia`

#### Reproduce the results

Finally, to reproduce our results, use the appropriate mining method, _full_ for _pitts30k_ and _partial_ for _msls_:

`$ python3 train.py --dataset_name=pitts30k --mining=full`

With commands like these you can replicate all results from tables 3, 4, 5 of the main paper, as well as tables 2, 3, 4 of the supplementary.

### Resize

To resize the images, pass the _resize_ parameter with the target resolution. For example, resizing the full _pitts30k_ images (480, 640) to 80% of their resolution gives 384, 512:

`$ python3 train.py --dataset_name=pitts30k --resize=384 512`

### Query pre/post-processing and predictions refinement

We gather all such methods under the _test_method_ parameter. The available methods are _hard_resize_, _single_query_, _central_crop_, _five_crops_mean_, _nearest_crop_ and _majority_voting_.
Although _hard_resize_ is the default, in most datasets it doesn't apply any transformation at all (see the paper for more information), because all images have the same resolution.

`$ python3 eval.py --resume=logs/default/YYYY-MM-DD_HH-mm-ss/best_model.pth --dataset_name=tokyo247 --test_method=nearest_crop`

### Data augmentation

You can reproduce all data augmentation techniques from the paper with simple commands, for example:

`$ python3 train.py --dataset_name=pitts30k --horizontal_flipping --saturation 2 --brightness 1`

### Off-the-shelf models trained on Landmark Recognition datasets

The code can automatically download and use models trained on Landmark Recognition datasets from popular repositories: [radenovic](https://github.com/filipradenovic/cnnimageretrieval-pytorch) and [naver](https://github.com/naver/deep-image-retrieval).
These repos offer ResNet-50/101 models with GeM and FC 2048 trained on such datasets, and can be used like this:

`$ python eval.py --off_the_shelf=radenovic_gldv1 --l2=after_pool --backbone=r101l4 --aggregation=gem --fc_output_dim=2048`

`$ python eval.py --dataset_name=pitts30k --off_the_shelf=naver --l2=none --backbone=r101l4 --aggregation=gem --fc_output_dim=2048`

### Using pretrained networks on other datasets

Check out our [pretrain_vg](https://github.com/rm-wu/pretrain_vg) repo which we use to train such models.
You can automatically download those models and train on them like this:

`$ python train.py --dataset_name=pitts30k --pretrained=places`

### Changing the threshold distance

You can use a threshold distance other than the default 25 meters (for example, 100 meters):

`$ python3 eval.py --resume=logs/default/YYYY-MM-DD_HH-mm-ss/best_model.pth --val_positive_dist_threshold=100`

### Changing the recall values (R@N)

By default the toolbox computes recall@1, 5, 10 and 20, but you can compute other recalls like this:

`$ python3 eval.py --resume=logs/default/YYYY-MM-DD_HH-mm-ss/best_model.pth --recall_values 1 5 10 15 20 50 100`

### Model Zoo

We are currently exploring hosting options, so this is a partial list of models. More models will be added soon!

#### Pretrained models with different backbones

Pretrained networks employing different backbones.

| Model | Trained on Pitts30k: Pitts30k (R@1) | Trained on Pitts30k: MSLS (R@1) | Trained on Pitts30k: Download | Trained on MSLS: Pitts30k (R@1) | Trained on MSLS: MSLS (R@1) | Trained on MSLS: Download |
|---|---|---|---|---|---|---|
| vgg16-gem | 78.5 | 43.4 | [Link] | 70.2 | 66.7 | [Link] |
| resnet18-gem | 77.8 | 35.3 | [Link] | 71.6 | 65.3 | [Link] |
| resnet50-gem | 82.0 | 38.0 | [Link] | 77.4 | 72.0 | [Link] |
| resnet101-gem | 82.4 | 39.6 | [Link] | 77.2 | 72.5 | [Link] |
| ViT(224)-CLS | - | - | - | 80.4 | 69.3 | [Link] |
| vgg16-netvlad | 83.2 | 50.9 | [Link] | 79.0 | 74.6 | [Link] |
| resnet18-netvlad | 86.4 | 47.4 | [Link] | 81.6 | 75.8 | [Link] |
| resnet50-netvlad | 86.0 | 50.7 | [Link] | 80.9 | 76.9 | [Link] |
| resnet101-netvlad | 86.5 | 51.8 | [Link] | 80.8 | 77.7 | [Link] |
| cct384-netvlad | 85.0 | 52.5 | [Link] | 80.3 | 85.1 | [Link] |

#### Pretrained models with different aggregation methods

Pretrained networks trained using different aggregation methods.

| Model | Trained on Pitts30k: Pitts30k (R@1) | Trained on Pitts30k: MSLS (R@1) | Trained on Pitts30k: Download | Trained on MSLS: Pitts30k (R@1) | Trained on MSLS: MSLS (R@1) | Trained on MSLS: Download |
|---|---|---|---|---|---|---|
| resnet50-gem | 82.0 | 38.0 | [Link] | 77.4 | 72.0 | [Link] |
| resnet50-gem-fc2048 | 80.1 | 33.7 | [Link] | 79.2 | 73.5 | [Link] |
| resnet50-gem-fc65536 | 80.8 | 35.8 | [Link] | 79.0 | 74.4 | [Link] |
| resnet50-netvlad | 86.0 | 50.7 | [Link] | 80.9 | 76.9 | [Link] |
| resnet50-crn | 85.8 | 54.0 | [Link] | 80.8 | 77.8 | [Link] |

#### Pretrained models with different mining methods

Pretrained networks trained using three different mining methods (random, full database mining and partial database mining):

| Model | Trained on Pitts30k: Pitts30k (R@1) | Trained on Pitts30k: MSLS (R@1) | Trained on Pitts30k: Download | Trained on MSLS: Pitts30k (R@1) | Trained on MSLS: MSLS (R@1) | Trained on MSLS: Download |
|---|---|---|---|---|---|---|
| resnet18-gem-random | 73.7 | 30.5 | [Link] | 62.2 | 50.6 | [Link] |
| resnet18-gem-full | 77.8 | 35.3 | [Link] | 70.1 | 61.8 | [Link] |
| resnet18-gem-partial | 76.5 | 34.2 | [Link] | 71.6 | 65.3 | [Link] |
| resnet18-netvlad-random | 83.9 | 43.6 | [Link] | 73.3 | 61.5 | [Link] |
| resnet18-netvlad-full | 86.4 | 47.4 | [Link] | - | - | - |
| resnet18-netvlad-partial | 86.2 | 47.3 | [Link] | 81.6 | 75.8 | [Link] |
| resnet50-gem-random | 77.9 | 34.3 | [Link] | 69.5 | 57.4 | [Link] |
| resnet50-gem-full | 82.0 | 38.0 | [Link] | 77.3 | 69.7 | [Link] |
| resnet50-gem-partial | 82.3 | 39.0 | [Link] | 77.4 | 72.0 | [Link] |
| resnet50-netvlad-random | 83.4 | 45.0 | [Link] | 74.9 | 63.6 | [Link] |
| resnet50-netvlad-full | 86.0 | 50.7 | [Link] | - | - | - |
| resnet50-netvlad-partial | 85.5 | 48.6 | [Link] | 80.9 | 76.9 | [Link] |
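
The R@1 columns above are recall@N values with N = 1, computed with the default 25 meter threshold. As a rough illustration of what such a number measures, here is a minimal sketch, not the toolbox's own evaluation code. It assumes that positions are planar coordinates in meters, that predictions are already ranked best-first, and that a query counts as correct when at least one of its top-N predictions lies within the threshold (inclusive). The function name `recall_at_n` and its arguments are hypothetical.

```python
import math


def recall_at_n(predictions, query_positions, database_positions,
                recall_values=(1, 5, 10, 20), threshold_m=25.0):
    """Percentage of queries with a correct match among the top-N predictions.

    predictions[q] lists database indices for query q, ranked best-first.
    Positions are (x, y) pairs in meters (an assumption of this sketch).
    """
    hits = {n: 0 for n in recall_values}
    for q, ranked in enumerate(predictions):
        qx, qy = query_positions[q]
        for n in recall_values:
            # The query is a hit at N if any of its first N predictions
            # is within threshold_m meters of the query position.
            if any(math.hypot(qx - database_positions[i][0],
                              qy - database_positions[i][1]) <= threshold_m
                   for i in ranked[:n]):
                hits[n] += 1
    return {n: 100.0 * hits[n] / len(predictions) for n in recall_values}


if __name__ == "__main__":
    # Toy data: three database images and two queries.
    database = [(0.0, 0.0), (100.0, 0.0), (10.0, 0.0)]
    queries = [(5.0, 0.0), (200.0, 0.0)]
    ranked = [[1, 0, 2], [0, 1, 2]]
    print(recall_at_n(ranked, queries, database, recall_values=(1, 2)))
```

Raising the threshold, as `--val_positive_dist_threshold=100` does in the toolbox, can only keep or increase these percentages, because every prediction that was a hit at 25 meters is still a hit at 100 meters.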

If you find our work useful in your research, please consider citing our paper:

```
@inProceedings{Berton_CVPR_2022_benchmark,
    author    = {Berton, Gabriele and Mereu, Riccardo and Trivigno, Gabriele and Masone, Carlo and Csurka, Gabriela and Sattler, Torsten and Caputo, Barbara},
    title     = {Deep Visual Geo-localization Benchmark},
    booktitle = {CVPR},
    month     = {June},
    year      = {2022},
}
```

## Acknowledgements

Parts of this repo are inspired by the following great repositories:

- [NetVLAD's original code](https://github.com/Relja/netvlad) (in MATLAB)
- [NetVLAD layer in PyTorch](https://github.com/lyakaap/NetVLAD-pytorch)
- [NetVLAD training in PyTorch](https://github.com/Nanne/pytorch-NetVlad/)
- [GeM layer](https://github.com/filipradenovic/cnnimageretrieval-pytorch)
- [Deep Image Retrieval](https://github.com/naver/deep-image-retrieval)
- [Mapillary Street-level Sequences](https://github.com/mapillary/mapillary_sls)
- [Compact Convolutional Transformers](https://github.com/SHI-Labs/Compact-Transformers)

Also check out our other repo [_CosPlace_](https://github.com/gmberton/CosPlace), from the CVPR 2022 paper "Rethinking Visual Geo-localization for Large-Scale Applications", which provides a new SOTA in visual geo-localization / visual place recognition.