Image Classification


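MXNet

The figure below plots inference throughput against validation accuracy for the ImageNet pre-trained models. Throughput was measured on a single V100 GPU with a batch size of 64.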

[Figure: inference throughput vs. validation accuracy of ImageNet pre-trained models]
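
As a rough illustration of what "throughput at batch size 64" means, the sketch below times repeated calls to a stand-in forward function and reports images per second. It is not the benchmark script behind the figure: `measure_throughput`, `dummy_forward`, and the warm-up and iteration counts are all assumptions made for this example.

```python
import time

def measure_throughput(forward, batch_size=64, warmup=2, iters=10,
                       clock=time.perf_counter):
    """Return images per second for a callable that processes one batch per call."""
    for _ in range(warmup):
        forward()                      # warm-up calls are excluded from timing
    start = clock()
    for _ in range(iters):
        forward()
    elapsed = clock() - start
    return batch_size * iters / elapsed

def dummy_forward():
    # Hypothetical stand-in for net(batch). A real GPU benchmark would also
    # have to wait for asynchronous execution to finish before reading the clock.
    sum(i * i for i in range(10000))

print('%.1f images/sec' % measure_throughput(dummy_forward))
```

The `clock` argument is only there so the timing source can be swapped out, for example in a unit test.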

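How To Use Pretrained Models

The following example requires GluonCV>=0.4 and MXNet>=1.4.0. Please follow our installation guide to install or upgrade GluonCV and MXNet if necessary.
Prepare an image yourself or use our sample image. Save it as classification-demo.png in your working directory, or change the filename in the source code if you use a different name.
Use a pre-trained model. A model is specified by its name.

Let's try it out!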

import mxnet as mx
import gluoncv

# you can change it to your image filename
filename = 'classification-demo.png'
# you may modify it to switch to another model. The name is case-insensitive
model_name = 'ResNet50_v1d'
# download and load the pre-trained model
net = gluoncv.model_zoo.get_model(model_name, pretrained=True)
# load image
img = mx.image.imread(filename)
# apply default data preprocessing
transformed_img = gluoncv.data.transforms.presets.imagenet.transform_eval(img)
# run forward pass to obtain the predicted score for each class
pred = net(transformed_img)
# map predicted values to probability by softmax
prob = mx.nd.softmax(pred)[0].asnumpy()
# find the 5 class indices with the highest score
ind = mx.nd.topk(pred, k=5)[0].astype('int').asnumpy().tolist()
# print the class name and predicted probability
print('The input picture is classified to be')
for i in range(5):
    print('- [%s], with probability %.3f.'%(net.classes[ind[i]], prob[ind[i]]))

With our sample image, the expected output is

The input picture is classified to be
- [Welsh springer spaniel], with probability 0.899.
- [Irish setter], with probability 0.005.
- [Brittany spaniel], with probability 0.003.
- [cocker spaniel], with probability 0.002.
- [Blenheim spaniel], with probability 0.002.
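
To make the softmax and top-k steps of the example concrete, here is a small pure-Python sketch that does the same two operations on a handful of made-up scores. The scores and the helper names `softmax` and `topk_indices` are assumptions for illustration; the example above uses `mx.nd.softmax` and `mx.nd.topk` instead.

```python
import math

def softmax(scores):
    """Map raw scores to probabilities that sum to 1."""
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topk_indices(values, k):
    """Return the indices of the k largest values, largest first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]

scores = [1.0, 3.0, 0.5, 2.0]    # made-up raw scores for 4 hypothetical classes
prob = softmax(scores)
ind = topk_indices(scores, k=2)
for i in ind:
    print('class %d, with probability %.3f' % (i, prob[i]))
```

Because softmax is monotonic, ranking by raw score and ranking by probability give the same order, which is why the example can call top-k on `pred` and then index into `prob`.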

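Remember, you can try different models by replacing the value of model_name. Read on for the model names and their performance in the tables below.

ImageNet

Hint

Training commands work with this script:

Download train_imagenet.py

The same model can have several sets of trained parameters, each identified by a hashtag. Parameters with a grey name can be downloaded by passing the corresponding hashtag.

Download default pretrained weights: net = get_model('ResNet50_v1d', pretrained=True)
Download weights given a hashtag: net = get_model('ResNet50_v1d', pretrained='117a384e')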
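
The sketch below shows one way to choose between the two ResNet50_v1d weight sets by Top-1 accuracy and then pass the chosen hashtag to `get_model`. The accuracy figures are copied from the ResNet table on this page; `best_hashtag` and `load_variant` are hypothetical helpers written for this example, not part of GluonCV.

```python
# Top-1 accuracy per hashtag for ResNet50_v1d, copied from the ResNet table.
RESNET50_V1D_VARIANTS = {'117a384e': 79.15, '00319ddc': 78.48}

def best_hashtag(variants):
    """Return the hashtag whose weights have the highest Top-1 accuracy."""
    return max(variants, key=variants.get)

def load_variant(model_name, hashtag=None):
    """Load the default weights when hashtag is None, else the requested set."""
    # Imported lazily so the selection logic runs without GluonCV installed.
    from gluoncv.model_zoo import get_model
    return get_model(model_name, pretrained=hashtag if hashtag else True)

tag = best_hashtag(RESNET50_V1D_VARIANTS)
print(tag)
# net = load_variant('ResNet50_v1d', tag)   # downloads the weights on first use
```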

ResNet50_v1_int8 and MobileNet1.0_int8 are quantized models calibrated on the ImageNet dataset.

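ResNet

Hint

ResNet50_v1_int8 is a quantized model for ResNet50_v1.
ResNet_v1b modifies ResNet_v1 by setting the stride at the 3x3 layer of a bottleneck block.
ResNet_v1c modifies ResNet_v1b by replacing the 7x7 conv layer with three 3x3 conv layers.
ResNet_v1d modifies ResNet_v1c by adding a 2x2 avgpool layer with stride 2 on the residual path to downsample the feature map, which preserves more information.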
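
One way to read the "preserves more information" remark for ResNet_v1d: a stride-2 layer that looks at a single position per 2x2 window ignores three quarters of the activations, while a 2x2 average pool lets every activation contribute. The pure-Python sketch below contrasts the two on a tiny feature map. It is only an illustration of that idea, not the GluonCV layer code, and both function names are made up for this example.

```python
def avg_pool_2x2(fmap):
    """2x2 average pooling with stride 2 on a 2-D list with even height and width."""
    h, w = len(fmap), len(fmap[0])
    return [[(fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

def subsample_stride2(fmap):
    """Keep only the top-left value of each 2x2 window (what a stride-2 1x1 layer sees)."""
    return [row[::2] for row in fmap[::2]]

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(avg_pool_2x2(fmap))       # [[3.5, 5.5], [11.5, 13.5]]
print(subsample_stride2(fmap))  # [[1, 3], [9, 11]]
```

Both outputs are 2x2, but only the pooled one depends on all sixteen input values.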

Model	Top-1	Top-5	Hashtag	Training Command	Training Log
ResNet18_v1 [1]	70.93	89.92	a0666292	shell script	log
ResNet34_v1 [1]	74.37	91.87	48216ba9	shell script	log
ResNet50_v1 [1]	77.36	93.57	cc729d95	shell script	log
ResNet50_v1_int8 [1]	76.86	93.46	cc729d95
ResNet101_v1 [1]	78.34	94.01	d988c13d	shell script	log
ResNet152_v1 [1]	79.22	94.64	acfd0970	shell script	log
ResNet18_v1b [1]	70.94	89.83	2d9d980c	shell script	log
ResNet34_v1b [1]	74.65	92.08	8e16b848	shell script	log
ResNet50_v1b [1]	77.67	93.82	0ecdba34	shell script	log
ResNet50_v1b_gn [1]	77.36	93.59	0ecdba34	shell script	log
ResNet101_v1b [1]	79.20	94.61	a455932a	shell script	log
ResNet152_v1b [1]	79.69	94.74	a5a61ee1	shell script	log
ResNet50_v1c [1]	78.03	94.09	2a4e0708	shell script	log
ResNet101_v1c [1]	79.60	94.75	064858f2	shell script	log
ResNet152_v1c [1]	80.01	94.96	75babab6	shell script	log
ResNet50_v1d [1]	79.15	94.58	117a384e	shell script	log
ResNet50_v1d [1]	78.48	94.20	00319ddc	shell script	log
ResNet101_v1d [1]	80.51	95.12	1b2b825f	shell script	log
ResNet101_v1d [1]	79.78	94.80	8659a9d6	shell script	log
ResNet152_v1d [1]	80.61	95.34	cddbc86f	shell script	log
ResNet152_v1d [1]	80.26	95.00	cfe0220d	shell script	log
ResNet18_v2 [2]	71.00	89.92	a81db45f	shell script	log
ResNet34_v2 [2]	74.40	92.08	9d6b80bb	shell script	log
ResNet50_v2 [2]	77.11	93.43	ecdde353	shell script	log
ResNet101_v2 [2]	78.53	94.17	18e93e4f	shell script	log
ResNet152_v2 [2]	79.21	94.31	f2695542	shell script	log

ResNext

Model	Top-1	Top-5	Hashtag	Training Command	Training Log
ResNext50_32x4d [12]	79.32	94.53	4ecf62e2	shell script	log
ResNext101_32x4d [12]	80.37	95.06	8654ca5d	shell script	log
ResNext101_64x4d_v1 [12]	80.69	95.17	2f0d1c9d	shell script	log
SE_ResNext50_32x4d [12][14]	79.95	94.93	7906e0e1	shell script	log
SE_ResNext101_32x4d [12][14]	80.91	95.39	688e2389	shell script	log
SE_ResNext101_64x4d [12][14]	81.01	95.32	11c50114	shell script	log

ResNeSt

Model	Top-1	Top-5	Hashtag	Training Command	Training Log
ResNeSt14 [17]	75.75	92.70	7e0b0cae	shell script	log
ResNeSt26 [17]	78.68	94.38	36459074	shell script
ResNeSt50 [17]	81.04	95.42	bcfefe1d	shell script	log
ResNeSt101 [17]	82.83	96.42	5da943b3	shell script	log
ResNeSt200 [17]	83.86	96.86	0c5d117d	shell script	log
ResNeSt269 [17]	84.53	96.98	11ae7f5d	shell script	log

MobileNet

Hint

MobileNet1.0_int8 is a quantized model for MobileNet1.0.

Model	Top-1	Top-5	Hashtag	Training Command	Training Log
MobileNet1.0 [4]	73.28	91.30	efbb2ca3	shell script	log
MobileNet1.0_int8 [4]	72.85	90.99	efbb2ca3
MobileNet1.0 [4]	72.93	91.14	cce75496	shell script	log
MobileNet0.75 [4]	70.25	89.49	84c801e2	shell script	log
MobileNet0.5 [4]	65.20	86.34	0130d2aa	shell script	log
MobileNet0.25 [4]	52.91	76.94	f0046a3d	shell script	log
MobileNetV2_1.0 [5]	72.04	90.57	f9952bcd	shell script	log
MobileNetV2_0.75 [5]	69.36	88.50	b56e3d1c	shell script	log
MobileNetV2_0.5 [5]	64.43	85.31	08038185	shell script	log
MobileNetV2_0.25 [5]	51.76	74.89	9b1d2cc3	shell script	log
MobileNetV3_Large [15]	75.32	92.30	eaa44578	shell script	log
MobileNetV3_Small [15]	67.72	87.51	33c100a7	shell script	log
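
To put a number on what quantization costs, the short sketch below subtracts the int8 accuracies from the accuracies of the full-precision weights that share the same hashtag. All figures are copied from the ResNet and MobileNet tables on this page; the dictionary layout and the `accuracy_drop` helper are assumptions for this example.

```python
# (Top-1, Top-5) from the tables on this page. The full-precision rows are the
# ones that share a hashtag with the int8 rows (cc729d95 and efbb2ca3).
FP32 = {'ResNet50_v1': (77.36, 93.57), 'MobileNet1.0': (73.28, 91.30)}
INT8 = {'ResNet50_v1': (76.86, 93.46), 'MobileNet1.0': (72.85, 90.99)}

def accuracy_drop(name):
    """Return the (Top-1, Top-5) accuracy lost by the int8 model, in percentage points."""
    return tuple(round(f - q, 2) for f, q in zip(FP32[name], INT8[name]))

for name in FP32:
    print(name, accuracy_drop(name))
```

For both models the loss is at most half a percentage point on either metric.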

VGG

Model	Top-1	Top-5	Hashtag	Training Command	Training Log
VGG11 [9]	66.62	87.34	dd221b16
VGG13 [9]	67.74	88.11	6bc5de58
VGG16 [9]	73.23	91.31	e660d456	shell script	log
VGG19 [9]	74.11	91.35	ad2f660d	shell script	log
VGG11_bn [9]	68.59	88.72	ee79a809
VGG13_bn [9]	68.84	88.82	7d97a06c
VGG16_bn [9]	73.10	91.76	7f01cf05	shell script	log
VGG19_bn [9]	74.33	91.85	f360b758	shell script	log

SqueezeNet

Model	Top-1	Top-5	Hashtag	Training Command	Training Log
SqueezeNet1.0 [10]	56.11	79.09	264ba497
SqueezeNet1.1 [10]	54.96	78.17	33ba0f93

DenseNet

Model	Top-1	Top-5	Hashtag
DenseNet121 [7]	74.97	92.25	f27dbf2d
DenseNet161 [7]	77.70	93.80	b6c8a957
DenseNet169 [7]	76.17	93.17	2603f878
DenseNet201 [7]	77.32	93.62	1cdbc116

Pruned ResNet

Model	Top-1	Top-5	Hashtag	Speedup (relative to the original ResNet)
resnet18_v1b_0.89	67.2	87.45	54f7742b	2x
resnet50_v1d_0.86	78.02	93.82	a230c33f	1.68x
resnet50_v1d_0.48	74.66	92.34	0d3e69bb	3.3x
resnet50_v1d_0.37	70.71	89.74	9982ae49	5.01x
resnet50_v1d_0.11	63.22	84.79	6a25eece	8.78x
resnet101_v1d_0.76	79.46	94.69	a872796b	1.8x
resnet101_v1d_0.73	78.89	94.48	712fccb1	2.02x
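
The pruned models trade accuracy for speed, so a common question is which one is fastest while still meeting an accuracy target. The sketch below answers that from the table above. The `PRUNED` list simply restates the table, and `fastest_with_top1` is a hypothetical helper written for this example.

```python
# (model name, Top-1, speedup) copied from the pruned ResNet table.
PRUNED = [
    ('resnet18_v1b_0.89', 67.2, 2.0),
    ('resnet50_v1d_0.86', 78.02, 1.68),
    ('resnet50_v1d_0.48', 74.66, 3.3),
    ('resnet50_v1d_0.37', 70.71, 5.01),
    ('resnet50_v1d_0.11', 63.22, 8.78),
    ('resnet101_v1d_0.76', 79.46, 1.8),
    ('resnet101_v1d_0.73', 78.89, 2.02),
]

def fastest_with_top1(min_top1):
    """Return the name of the largest-speedup model with Top-1 >= min_top1, or None."""
    candidates = [row for row in PRUNED if row[1] >= min_top1]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[2])[0]

print(fastest_with_top1(74.0))   # resnet50_v1d_0.48
print(fastest_with_top1(78.5))   # resnet101_v1d_0.73
```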

Others

Hint

InceptionV3 is evaluated with an input size of 299x299.

Model	Top-1	Top-5	Hashtag	Training Command	Training Log
AlexNet [6]	54.92	78.03	44335d1f
darknet53 [3]	78.56	94.43	2189ea49	shell script	log
darknet53 [3]	78.13	93.86	95975047	shell script	log
InceptionV3 [8]	78.77	94.39	a5050dbc	shell script	log
GoogLeNet [16]	72.87	91.17	c7c89366	shell script	log
Xception [8]	79.56	94.77	37c1c90b	shell script	log
InceptionV3 [8]	78.41	94.13	e132adf2	shell script	log
SENet_154 [14]	81.26	95.51	b5538ef1
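
Because InceptionV3 is evaluated at 299x299 rather than 224x224, the resize-then-center-crop preprocessing has to scale with it. The sketch below computes a matching short-side resize, assuming the common convention that the crop covers 0.875 of the resized short side (which gives 256 for a 224 crop). That ratio and the `eval_sizes` helper are assumptions for this example, not something this page states.

```python
import math

def eval_sizes(input_size, crop_ratio=0.875):
    """Return (resize_short, crop_size) for resize-then-center-crop evaluation."""
    # crop_ratio = 0.875 is an assumed convention: 224 / 0.875 == 256.
    resize_short = int(math.ceil(input_size / crop_ratio))
    return resize_short, input_size

print(eval_sizes(224))   # (256, 224)
print(eval_sizes(299))   # (342, 299)
```

If your GluonCV version's `transform_eval` accepts `resize_short` and `crop_size` arguments, these two values are what you would pass for a 299x299 model. Check the signature in your installed version before relying on it.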

CIFAR10

The following table lists pre-trained models trained on CIFAR10.

Hint

Our pre-trained models reproduce the results from "Mix-Up" [13]. Please check the reference paper for further information.

The training commands in the table work with the following scripts:

For vanilla training: Download train_cifar10.py
For mix-up training: Download train_mixup_cifar10.py

Model	Accuracy (Vanilla / Mix-Up [13])	Training Command	Training Log
CIFAR_ResNet20_v1 [1]	92.1 / 92.9	Vanilla / Mix-Up	Vanilla / Mix-Up
CIFAR_ResNet56_v1 [1]	93.6 / 94.2	Vanilla / Mix-Up	Vanilla / Mix-Up
CIFAR_ResNet110_v1 [1]	93.0 / 95.2	Vanilla / Mix-Up	Vanilla / Mix-Up
CIFAR_ResNet20_v2 [2]	92.1 / 92.7	Vanilla / Mix-Up	Vanilla / Mix-Up
CIFAR_ResNet56_v2 [2]	93.7 / 94.6	Vanilla / Mix-Up	Vanilla / Mix-Up
CIFAR_ResNet110_v2 [2]	94.3 / 95.5	Vanilla / Mix-Up	Vanilla / Mix-Up
CIFAR_WideResNet16_10 [11]	95.1 / 96.7	Vanilla / Mix-Up	Vanilla / Mix-Up
CIFAR_WideResNet28_10 [11]	95.6 / 97.2	Vanilla / Mix-Up	Vanilla / Mix-Up
CIFAR_WideResNet40_8 [11]	95.9 / 97.3	Vanilla / Mix-Up	Vanilla / Mix-Up
CIFAR_ResNeXt29_16x64d [12]	96.3 / 97.3	Vanilla / Mix-Up	Vanilla / Mix-Up
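
For readers unfamiliar with mix-up, the core idea from the reference paper is to train on convex combinations of two samples and of their one-hot labels, with the mixing weight drawn from a Beta(alpha, alpha) distribution. The pure-Python sketch below shows that blending step only. It is not taken from train_mixup_cifar10.py; the function names and the `alpha=1.0` default are assumptions for this example.

```python
import random

def sample_lambda(alpha=1.0, rng=random):
    """Draw the mixing weight from Beta(alpha, alpha); the result lies in [0, 1]."""
    return rng.betavariate(alpha, alpha)

def mixup_pair(x1, y1, x2, y2, lam):
    """Blend two samples and their one-hot labels with weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y

# Two toy "images" with two pixels each, belonging to class 0 and class 1.
x, y = mixup_pair([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], lam=0.25)
print(x, y)   # [0.75, 0.25] [0.25, 0.75]
```

In training, `lam` would come from `sample_lambda()` for each batch, and the loss would be computed against the blended label.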

PyTorch

Models implemented using PyTorch will be added later. Please see our MXNet implementation instead.

References

[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.
[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Identity mappings in deep residual networks.” In European Conference on Computer Vision, pp. 630-645. Springer, Cham, 2016.
[3] Redmon, Joseph, and Ali Farhadi. “Yolov3: An incremental improvement.” arXiv preprint arXiv:1804.02767 (2018).
[4] Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. “Mobilenets: Efficient convolutional neural networks for mobile vision applications.” arXiv preprint arXiv:1704.04861 (2017).
[5] Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation.” arXiv preprint arXiv:1801.04381 (2018).
[6] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” In Advances in neural information processing systems, pp. 1097-1105. 2012.
[7] Huang, Gao, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. “Densely Connected Convolutional Networks.” In CVPR, vol. 1, no. 2, p. 3. 2017.
[8] Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. “Rethinking the inception architecture for computer vision.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818-2826. 2016.
[9] Simonyan, Karen, and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” arXiv technical report arXiv:1409.1556 (2014).
[10] Iandola, Forrest N., Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. “Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size.” arXiv preprint arXiv:1602.07360 (2016).
[11] Zagoruyko, Sergey, and Nikos Komodakis. “Wide residual networks.” arXiv preprint arXiv:1605.07146 (2016).
[12] Xie, Saining, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. “Aggregated residual transformations for deep neural networks.” In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pp. 5987-5995. IEEE, 2017.
[13] Zhang, Hongyi, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. “mixup: Beyond empirical risk minimization.” arXiv preprint arXiv:1710.09412 (2017).
[14] Hu, Jie, Li Shen, and Gang Sun. “Squeeze-and-excitation networks.” arXiv preprint arXiv:1709.01507 (2017).
[15] Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. “Searching for mobilenetv3.” arXiv preprint arXiv:1905.02244 (2019).
[16] Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. “Going Deeper with Convolutions.” arXiv preprint arXiv:1409.4842 (2014).
[17] Zhang, Hang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Muller, R. Manmatha, Mu Li, and Alex Smola. “ResNeSt: Split-Attention Network.” arXiv preprint (2020).