PyTorch Image Models (timm) is a collection of image models, layers, utilities, optimizers, schedulers, data loaders / augmentations, and reference training / validation scripts. Its goal is to pull together a wide variety of SOTA models in one place, with the ability to reproduce ImageNet training results.
Author: Ross Wightman, based in Vancouver, Canada.
Repository: https://github.com/rwightman/pytorch-image-models#introduction
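These pieces are designed to compose: the model factory, each model's default data configuration, and the matching input transform can be wired together in a few lines. A minimal sketch, assuming a recent timm release (the choice of 'resnet50' is purely illustrative):

import timm
from timm.data import resolve_data_config, create_transform

# Build a model from the registry; pretrained=True downloads ImageNet weights.
model = timm.create_model('resnet50', pretrained=True)

# Each model carries a default data config (input size, mean/std, interpolation);
# create_transform turns it into the matching preprocessing pipeline.
config = resolve_data_config({}, model=model)
transform = create_transform(**config)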
Models covered:
- Aggregating Nested Transformers - https://arxiv.org/abs/2105.12723
- BEiT - https://arxiv.org/abs/2106.08254
- Big Transfer ResNetV2 (BiT) - https://arxiv.org/abs/1912.11370
- Bottleneck Transformers - https://arxiv.org/abs/2101.11605
- CaiT (Class-Attention in Image Transformers) - https://arxiv.org/abs/2103.17239
- CoaT (Co-Scale Conv-Attentional Image Transformers) - https://arxiv.org/abs/2104.06399
- CoAtNet (Convolution and Attention) - https://arxiv.org/abs/2106.04803
- ConvNeXt - https://arxiv.org/abs/2201.03545
- ConViT (Soft Convolutional Inductive Biases Vision Transformers) - https://arxiv.org/abs/2103.10697
- CspNet (Cross-Stage Partial Networks) - https://arxiv.org/abs/1911.11929
- DeiT - https://arxiv.org/abs/2012.12877
- DeiT-III - https://arxiv.org/pdf/2204.07118.pdf
- DenseNet - https://arxiv.org/abs/1608.06993
- DLA - https://arxiv.org/abs/1707.06484
- DPN (Dual-Path Network) - https://arxiv.org/abs/1707.01629
- EdgeNeXt - https://arxiv.org/abs/2206.10589
- EfficientFormer - https://arxiv.org/abs/2206.01191
- EfficientNet (MBConvNet Family)
  - EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
  - EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
  - EfficientNet (B0-B7) - https://arxiv.org/abs/1905.11946
  - EfficientNet-EdgeTPU (S, M, L) - https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
  - EfficientNet V2 - https://arxiv.org/abs/2104.00298
  - FBNet-C - https://arxiv.org/abs/1812.03443
  - MixNet - https://arxiv.org/abs/1907.09595
  - MNASNet B1, A1 (Squeeze-Excite), and Small - https://arxiv.org/abs/1807.11626
  - MobileNet-V2 - https://arxiv.org/abs/1801.04381
  - Single-Path NAS - https://arxiv.org/abs/1904.02877
  - TinyNet - https://arxiv.org/abs/2010.14819
- EVA - https://arxiv.org/abs/2211.07636
- GCViT (Global Context Vision Transformer) - https://arxiv.org/abs/2206.09959
- GhostNet - https://arxiv.org/abs/1911.11907
- gMLP - https://arxiv.org/abs/2105.08050
- GPU-Efficient Networks - https://arxiv.org/abs/2006.14090
- Halo Nets - https://arxiv.org/abs/2103.12731
- HRNet - https://arxiv.org/abs/1908.07919
- Inception-V3 - https://arxiv.org/abs/1512.00567
- Inception-ResNet-V2 and Inception-V4 - https://arxiv.org/abs/1602.07261
- Lambda Networks - https://arxiv.org/abs/2102.08602
- LeViT (Vision Transformer in ConvNet’s Clothing) - https://arxiv.org/abs/2104.01136
- MaxViT (Multi-Axis Vision Transformer) - https://arxiv.org/abs/2204.01697
- MLP-Mixer - https://arxiv.org/abs/2105.01601
- MobileNet-V3 (MBConvNet w/ Efficient Head) - https://arxiv.org/abs/1905.02244
  - FBNet-V3 - https://arxiv.org/abs/2006.02049
  - HardCoRe-NAS - https://arxiv.org/abs/2102.11646
  - LCNet - https://arxiv.org/abs/2109.15099
- MobileViT - https://arxiv.org/abs/2110.02178
- MobileViT-V2 - https://arxiv.org/abs/2206.02680
- MViT-V2 (Improved Multiscale Vision Transformer) - https://arxiv.org/abs/2112.01526
- NASNet-A - https://arxiv.org/abs/1707.07012
- NesT - https://arxiv.org/abs/2105.12723
- NFNet-F - https://arxiv.org/abs/2102.06171
- NF-RegNet / NF-ResNet - https://arxiv.org/abs/2101.08692
- PNasNet - https://arxiv.org/abs/1712.00559
- PoolFormer (MetaFormer) - https://arxiv.org/abs/2111.11418
- Pooling-based Vision Transformer (PiT) - https://arxiv.org/abs/2103.16302
- PVT-V2 (Improved Pyramid Vision Transformer) - https://arxiv.org/abs/2106.13797
- RegNet - https://arxiv.org/abs/2003.13678
- RegNetZ - https://arxiv.org/abs/2103.06877
- RepVGG - https://arxiv.org/abs/2101.03697
- ResMLP - https://arxiv.org/abs/2105.03404
- ResNet/ResNeXt
  - ResNet (v1b/v1.5) - https://arxiv.org/abs/1512.03385
  - ResNeXt - https://arxiv.org/abs/1611.05431
  - ‘Bag of Tricks’ / Gluon C, D, E, S variations - https://arxiv.org/abs/1812.01187
  - Weakly-supervised (WSL) Instagram pretrained / ImageNet tuned ResNeXt101 - https://arxiv.org/abs/1805.00932
  - Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet/ResNeXts - https://arxiv.org/abs/1905.00546
  - ECA-Net (ECAResNet) - https://arxiv.org/abs/1910.03151v4
  - Squeeze-and-Excitation Networks (SEResNet) - https://arxiv.org/abs/1709.01507
  - ResNet-RS - https://arxiv.org/abs/2103.07579
- Res2Net - https://arxiv.org/abs/1904.01169
- ResNeSt - https://arxiv.org/abs/2004.08955
- ReXNet - https://arxiv.org/abs/2007.00992
- SelecSLS - https://arxiv.org/abs/1907.00837
- Selective Kernel Networks - https://arxiv.org/abs/1903.06586
- Sequencer2D - https://arxiv.org/abs/2205.01972
- Swin S3 (AutoFormerV2) - https://arxiv.org/abs/2111.14725
- Swin Transformer - https://arxiv.org/abs/2103.14030
- Swin Transformer V2 - https://arxiv.org/abs/2111.09883
- Transformer-iN-Transformer (TNT) - https://arxiv.org/abs/2103.00112
- TResNet - https://arxiv.org/abs/2003.13630
- Twins (Spatial Attention in Vision Transformers) - https://arxiv.org/pdf/2104.13840.pdf
- Visformer - https://arxiv.org/abs/2104.12533
- Vision Transformer - https://arxiv.org/abs/2010.11929
- VOLO (Vision Outlooker) - https://arxiv.org/abs/2106.13112
- VovNet V2 and V1 - https://arxiv.org/abs/1911.06667
- Xception - https://arxiv.org/abs/1610.02357
- Xception (Modified Aligned, Gluon) - https://arxiv.org/abs/1802.02611
- Xception (Modified Aligned, TF) - https://arxiv.org/abs/1802.02611
- XCiT (Cross-Covariance Image Transformers) - https://arxiv.org/abs/2106.09681
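Each family above is exposed through one or more concrete names in timm's model registry, which timm.list_models can look up by wildcard. A small sketch (the 'swin*' pattern is just an example):

import timm

# All registered variants of a family from the list, e.g. the Swin Transformers.
print(timm.list_models('swin*'))

# Restrict to variants that ship with pretrained weights.
print(timm.list_models('swin*', pretrained=True))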
Usage example:
import timm
import torch

# ResNet-50 backbone with a 13-channel input and a 500-class head, randomly initialized.
backbone = timm.create_model('resnet50', pretrained=False, num_classes=500, in_chans=13)
out = backbone(torch.randn(1, 13, 224, 224))  # logits of shape (1, 500)
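The same factory arguments also cover the common encoder use case. Below is a minimal sketch using timm's features_only mode, which returns intermediate feature maps instead of classification logits:

import timm
import torch

# features_only=True yields a pure feature extractor: the forward pass returns a
# list of feature maps (for resnet50, five maps from stride 2 down to stride 32),
# which is handy when the network backs a detection or segmentation head.
backbone = timm.create_model('resnet50', pretrained=False, in_chans=13, features_only=True)
feats = backbone(torch.randn(1, 13, 224, 224))
print([f.shape for f in feats])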