2. Predict with pre-trained AlphaPose Estimation models

This tutorial shows how to use a pre-trained Alpha Pose model with only a few lines of code.

First, let's import the necessary libraries:

from matplotlib import pyplot as plt
from gluoncv import model_zoo, data, utils
from gluoncv.data.transforms.pose import detector_to_alpha_pose, heatmap_to_coord_alpha_pose

Load a pretrained model

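Let's get an Alpha Pose model trained on the MS COCO dataset with an input image size of 256x192. We pick the one that uses ResNet-101 V1b as the base model. With pretrained=True, the weights are downloaded automatically from the model zoo if necessary. For more pretrained models, please refer to the Model Zoo.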

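Note that an Alpha Pose model takes a top-down strategy: it estimates human pose inside the bounding boxes produced by an object detection model, which is why a detector is loaded alongside the pose network.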

detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
pose_net = model_zoo.get_model('alpha_pose_resnet101_v1b_coco', pretrained=True)

# Note that we can reset the classes of the detector to only include
# human, so that the NMS process is faster.

detector.reset_class(["person"], reuse_weights=['person'])

Output

Downloading /root/.mxnet/models/alpha_pose_resnet101_v1b_coco-de56b871.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/alpha_pose_resnet101_v1b_coco-de56b871.zip...

  0%|          | 0/216178 [00:00<?, ?KB/s]
216179KB [00:03, 70046.85KB/s]

Pre-process an image for the detector and run inference

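Next we download an image and pre-process it with the preset data transforms. Here we resize the image so that its short side is 512 px, but you can feed an arbitrarily sized image.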

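This function returns two results. The first is an NDArray with shape (batch_size, RGB_channels, height, width), which can be fed into the model directly. The second is the image in NumPy format, which is convenient for plotting. Since we loaded only a single image, the first dimension of x is 1.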

im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                          'gluoncv/pose/soccer.png?raw=true',
                          path='soccer.png')
x, img = data.transforms.presets.yolo.load_test(im_fname, short=512)
print('Shape of pre-processed image:', x.shape)

class_IDs, scores, bounding_boxs = detector(x)

Output

Shape of pre-processed image: (1, 3, 512, 605)
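The printed height of 512 comes from the short-side resize described above. As a rough illustration of that arithmetic only, the following sketch computes the output shape for a given input size. The helper name `short_side_resize_shape` and the 1080x1920 frame are hypothetical, and the library's own preset may round differently or additionally cap the longer side, so treat this as an approximation rather than the actual implementation.

```python
def short_side_resize_shape(height, width, short=512):
    """Return (new_height, new_width) after scaling so the shorter side equals `short`.

    Hypothetical helper: keeps the aspect ratio and rounds to the nearest pixel.
    """
    scale = short / min(height, width)
    return int(round(height * scale)), int(round(width * scale))


# Hypothetical 1080x1920 (height x width) frame, not the tutorial image.
new_h, new_w = short_side_resize_shape(1080, 1920, short=512)

# A single image becomes a batch of one: (batch_size, RGB_channels, height, width).
batch_shape = (1, 3, new_h, new_w)
print(batch_shape)
```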

Process tensors from the detector for the keypoint network

Next we process the output from the detector.

The Alpha Pose network expects a 256x192 input with the person centered. We crop the bounding-box region of each person, resize it to 256x192, and then normalize it.

To make sure the bounding box contains the whole person, we usually enlarge the box slightly.

pose_input, upscale_bbox = detector_to_alpha_pose(img, class_IDs, scores, bounding_boxs)
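To make the box-enlarging step concrete, here is a minimal NumPy sketch of one way it could work: keep detections above a score threshold, scale each box about its center, and clip it to the image. The function names, the 1.25 scale factor, and the 0.5 threshold are assumptions chosen for illustration; they are not taken from detector_to_alpha_pose, which also handles cropping, resizing, and normalization.

```python
import numpy as np


def enlarge_box(box, img_w, img_h, scale=1.25):
    """Scale an (xmin, ymin, xmax, ymax) box about its center, then clip to the image."""
    xmin, ymin, xmax, ymax = box
    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    half_w = (xmax - xmin) * scale / 2.0
    half_h = (ymax - ymin) * scale / 2.0
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(img_w), cx + half_w), min(float(img_h), cy + half_h))


def select_and_enlarge(boxes, scores, img_w, img_h, thresh=0.5, scale=1.25):
    """Keep boxes whose score is at least `thresh` and enlarge each of them."""
    keep = scores >= thresh
    return [enlarge_box(b, img_w, img_h, scale) for b in boxes[keep]]


# Two made-up detections on a 605x512 (width x height) image.
boxes = np.array([[100.0, 100.0, 200.0, 300.0],
                  [0.0, 0.0, 100.0, 100.0]])
scores = np.array([0.9, 0.3])
enlarged = select_and_enlarge(boxes, scores, img_w=605, img_h=512)
print(enlarged)
```

Only the first detection survives the threshold here; the second would also have been clipped at the top-left corner of the image had it been kept.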

Predict with an Alpha Pose network

Now we can make predictions.

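An Alpha Pose network predicts a heatmap for each joint (i.e. keypoint). After inference, we find the highest value in each heatmap and map its location back to coordinates on the original image.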

predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord_alpha_pose(predicted_heatmap, upscale_bbox)
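The peak-finding idea can be sketched in plain NumPy. This is a simplified illustration under stated assumptions, not the library routine: the function name `heatmap_peaks_to_image` is hypothetical, the heatmap is assumed to cover the enlarged box exactly, and any aspect-ratio padding or sub-pixel refinement that heatmap_to_coord_alpha_pose may apply is ignored.

```python
import numpy as np


def heatmap_peaks_to_image(heatmaps, box):
    """Map the per-joint heatmap maximum to image coordinates.

    heatmaps: array of shape (num_joints, H, W) for one person.
    box: (xmin, ymin, xmax, ymax) of that person's crop in image pixels.
    Returns (coords, conf) with shapes (num_joints, 2) and (num_joints,).
    """
    num_joints, hm_h, hm_w = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    idx = flat.argmax(axis=1)            # flat index of each joint's peak
    conf = flat.max(axis=1)              # peak value, used as confidence
    ys, xs = np.unravel_index(idx, (hm_h, hm_w))
    xmin, ymin, xmax, ymax = box
    # Use the cell center (+0.5) and rescale from heatmap cells to box pixels.
    img_x = xmin + (xs + 0.5) / hm_w * (xmax - xmin)
    img_y = ymin + (ys + 0.5) / hm_h * (ymax - ymin)
    return np.stack([img_x, img_y], axis=1), conf


# Two synthetic joints on a 64x48 heatmap, inside a made-up 96x256 box.
heatmaps = np.zeros((2, 64, 48))
heatmaps[0, 10, 20] = 0.9
heatmaps[1, 63, 0] = 0.4
coords, conf = heatmap_peaks_to_image(heatmaps, box=(100.0, 50.0, 196.0, 306.0))
print(coords, conf)
```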

Display the pose estimation results

We can use gluoncv.utils.viz.plot_keypoints() to visualize the results.

ax = utils.viz.plot_keypoints(img, pred_coords, confidence,
                              class_IDs, bounding_boxs, scores,
                              box_thresh=0.5, keypoint_thresh=0.2)
plt.show()
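Besides plotting, it can be handy to read the keypoints as named values. The sketch below assumes the model follows the standard 17-keypoint COCO ordering and that, for one person, the coordinates have shape (17, 2) with one confidence per joint; both are assumptions about the outputs rather than something stated above. The helper `confident_keypoints` is hypothetical and reuses the 0.2 threshold from the plotting call purely as an example.

```python
import numpy as np

# Standard COCO keypoint order (assumed to match the model's output order).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def confident_keypoints(coords, confidence, keypoint_thresh=0.2):
    """Return {name: (x, y)} for joints whose confidence reaches the threshold."""
    conf = np.asarray(confidence).reshape(-1)
    return {
        name: (float(x), float(y))
        for name, (x, y), c in zip(COCO_KEYPOINTS, np.asarray(coords), conf)
        if c >= keypoint_thresh
    }


# Synthetic single-person data; only the nose is confident enough to keep.
coords = np.arange(34, dtype=float).reshape(17, 2)
confidence = np.zeros((17, 1))
confidence[0] = 0.9   # nose
confidence[5] = 0.1   # left_shoulder, below the threshold
named = confident_keypoints(coords, confidence)
print(named)
```

With the tutorial's outputs, one would presumably convert a single person's slice to NumPy first, for example pred_coords[0].asnumpy() and confidence[0].asnumpy().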

Total running time of the script: ( 0 minutes 7.768 seconds)
