From 28ce806f55fd6ebfdaf42a6da6e01b4443c01bbe Mon Sep 17 00:00:00 2001
From: Victor Escorcia
Date: Fri, 12 Jan 2024 10:27:15 +0000
Subject: [PATCH] Update README.md

Community-driven contribution: SwiftFormer meets Android, running on
Qualcomm S8G2 (Snapdragon 8 Gen 2) DSP/HTP hardware via Qualcomm tooling (QNN).
Details in #14. Work done by @escorciav; refer to his fork for details.
---
 README.md | 40 +++++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index d8f5688..8d04ed9 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # SwiftFormer
 ### **SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications**
-
+
 ![](https://i.imgur.com/waxVImv.png)
 
 [Abdelrahman Shaker](https://scholar.google.com/citations?hl=en&user=eEz4Wu4AAAAJ)*1, [Muhammad Maaz](https://scholar.google.com/citations?user=vTy9Te8AAAAJ&hl=en&authuser=1&oi=sra)1, [Hanoona Rasheed](https://scholar.google.com/citations?user=yhDdEuEAAAAJ&hl=en&authuser=1&oi=sra)1, [Salman Khan](https://salman-h-khan.github.io/)1, [Ming-Hsuan Yang](https://scholar.google.com/citations?user=p9-ohHsAAAAJ&hl=en)2,3 and [Fahad Shahbaz Khan](https://scholar.google.es/citations?user=zvaeYnUAAAAJ&hl=en)1,4
@@ -60,14 +60,36 @@ Self-attention has become a defacto choice for capturing global context in vari
 
 
-## Latency Measurement
+## Latency Measurement
 
 The latency reported in SwiftFormer for iPhone 14 (iOS 16) uses the benchmark tool from [XCode 14](https://developer.apple.com/videos/play/wwdc2022/10027/).
 
-## ImageNet
+### SwiftFormer meets Android
+
+Community-driven results with the [Samsung Galaxy S23 Ultra, with Qualcomm Snapdragon 8 Gen 2](https://www.qualcomm.com/snapdragon/device-finder/samsung-galaxy-s23-ultra):
+
+1. [Export](https://github.com/escorciav/SwiftFormer/blob/main-v/export.py) & profiler results of [`SwiftFormer_L1`](./models/swiftformer.py):
+
+   | QNN            | 2.16 | 2.17 | 2.18 |
+   | -------------- | ---- | ---- | ---- |
+   | Latency (msec) | 2.63 | 2.26 | 2.43 |
+
+2. [Export](https://github.com/escorciav/SwiftFormer/blob/main-v/export_block.py) & profiler results of the SwiftFormerEncoder block:
+
+   | QNN            | 2.16 | 2.17 | 2.18 |
+   | -------------- | ---- | ---- | ---- |
+   | Latency (msec) | 2.17 | 1.69 | 1.70 |
+
+   Refer to the scripts above for details of the input & block parameters.
+
+❓ _Interested in reproducing the results above?_
+
+Refer to [Issue #14](https://github.com/Amshaker/SwiftFormer/issues/14) for details about [exporting & profiling](https://github.com/Amshaker/SwiftFormer/issues/14#issuecomment-1883351728).
+
+## ImageNet
 
 ### Prerequisites
 
-`conda` virtual environment is recommended.
+A `conda` virtual environment is recommended.
 
 ```shell
 conda create --name=swiftformer python=3.9
@@ -89,7 +111,7 @@ Download and extract ImageNet train and val images from http://image-net.org. Th
 
 ### Single machine multi-GPU training
 
-We provide training script for all models in `dist_train.sh` using PyTorch distributed data parallel (DDP).
+We provide a training script for all models in `dist_train.sh` using PyTorch distributed data parallel (DDP).
 
 To train SwiftFormer models on an 8-GPU machine:
 
@@ -97,7 +119,7 @@ To train SwiftFormer models on an 8-GPU machine:
 sh dist_train.sh /path/to/imagenet 8
 ```
 
-Note: specify which model command you want to run in the script. To reproduce the results of the paper, use 16-GPU machine with batch-size of 128 or 8-GPU machine with batch size of 256. Auto Augmentation, CutMix, MixUp are disabled for SwiftFormer-XS, and CutMix, MixUp are disabled for SwiftFormer-S.
+Note: specify which model command to run in the script. To reproduce the results of the paper, use a 16-GPU machine with a batch size of 128 or an 8-GPU machine with a batch size of 256. Auto Augmentation, CutMix, and MixUp are disabled for SwiftFormer-XS; CutMix and MixUp are disabled for SwiftFormer-S.
 
 ### Multi-node training
 
@@ -107,11 +129,11 @@ On a Slurm-managed cluster, multi-node training can be launched as
 sbatch slurm_train.sh /path/to/imagenet SwiftFormer_XS
 ```
 
-Note: specify slurm specific paramters in `slurm_train.sh` script.
+Note: specify Slurm-specific parameters in the `slurm_train.sh` script.
 
-### Testing
+### Testing
 
-We provide an example test script `dist_test.sh` using PyTorch distributed data parallel (DDP).
+We provide an example test script `dist_test.sh` using PyTorch distributed data parallel (DDP).
 For example, to test SwiftFormer-XS on an 8-GPU machine:
 ```