01. Predicting depth from a single image with a pre-trained Monodepth2 model

This is a quick demo of using a GluonCV Monodepth2 model pre-trained on KITTI on real-world images. If you have not yet installed MXNet and GluonCV, please follow the installation guide first.

import numpy as np

import mxnet as mx
from mxnet.gluon.data.vision import transforms
import gluoncv
# using cpu
ctx = mx.cpu(0)
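
If a CUDA-enabled MXNet build and a GPU are available, the same demo runs unchanged on a GPU context; a minimal alternative to the line above (illustrative, assuming at least one GPU):

# alternative: run on the first GPU instead of the CPU
ctx = mx.gpu(0)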

Prepare the image

First, we download the example image,

url = 'https://raw.githubusercontent.com/KuangHaofei/GluonCV_Test/master/monodepthv2/tutorials/test_img.png'
filename = 'test_img.png'
gluoncv.utils.download(url, filename, True)

Output

Downloading test_img.png from https://raw.githubusercontent.com/KuangHaofei/GluonCV_Test/master/monodepthv2/tutorials/test_img.png...

  0%|          | 0/728 [00:00<?, ?KB/s]
729KB [00:00, 58393.29KB/s]

Then we load and visualize the image,

import PIL.Image as pil
img = pil.open(filename).convert('RGB')

from matplotlib import pyplot as plt
plt.imshow(img)
plt.show()
[Figure: demo monodepth2 (input image)]

We resize the image to the input size expected by the pre-trained model and convert it to an NDArray,

original_width, original_height = img.size
feed_height = 192
feed_width = 640

img = img.resize((feed_width, feed_height), pil.LANCZOS)
img = transforms.ToTensor()(mx.nd.array(img)).expand_dims(0).as_in_context(context=ctx)
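
As a quick sanity check (not part of the original script), the batch passed to the network should now have shape (1, 3, 192, 640), i.e. (batch, channels, feed_height, feed_width):

# the pre-trained model expects a single-image batch of 3 x 192 x 640
print(img.shape)  # expected: (1, 3, 192, 640)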

Load the pre-trained model and make a prediction

Next, we get a pre-trained model from the model zoo,

model = gluoncv.model_zoo.get_model('monodepth2_resnet18_kitti_stereo_640x192',
                                    pretrained_base=False, ctx=ctx, pretrained=True)

Output

Downloading /root/.mxnet/models/monodepth2_resnet18_kitti_stereo_640x192-83eea4a9.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/monodepth2_resnet18_kitti_stereo_640x192-83eea4a9.zip...

  0%|          | 0/70343 [00:00<?, ?KB/s]
  0%|          | 98/70343 [00:00<01:27, 801.78KB/s]
  1%|          | 514/70343 [00:00<00:30, 2317.41KB/s]
  3%|3         | 2178/70343 [00:00<00:09, 7357.90KB/s]
 12%|#1        | 8210/70343 [00:00<00:02, 25582.76KB/s]
 22%|##1       | 15318/70343 [00:00<00:01, 40577.04KB/s]
 34%|###3      | 23586/70343 [00:00<00:00, 52791.09KB/s]
 44%|####4     | 31075/70343 [00:00<00:00, 59610.49KB/s]
 55%|#####5    | 38785/70343 [00:00<00:00, 64958.78KB/s]
 67%|######7   | 47340/70343 [00:00<00:00, 68967.64KB/s]
 78%|#######8  | 55029/70343 [00:01<00:00, 71312.58KB/s]
 90%|######### | 63469/70343 [00:01<00:00, 75200.27KB/s]
70344KB [00:01, 55337.75KB/s]
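
Besides the stereo 640x192 model used here, the model zoo provides other Monodepth2 variants. A small illustrative snippet to list them (gluoncv.model_zoo.get_model_list returns all registered model names; the exact set depends on your GluonCV version):

# list every Monodepth2 variant registered in the model zoo
monodepth2_models = [name for name in gluoncv.model_zoo.get_model_list()
                     if 'monodepth2' in name]
print(monodepth2_models)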

We run the disparity map prediction directly on the image and resize the result back to the original image size,

outputs = model.predict(img)
disp = outputs[("disp", 0)]
disp_resized = mx.nd.contrib.BilinearResize2D(disp, height=original_height, width=original_width)
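
The network predicts relative disparity. If a depth map is also wanted, here is a hedged sketch following the usual Monodepth2 convention of mapping disparity into a fixed depth range (the 0.1-100 m bounds below are the values commonly used for KITTI and are illustrative, not part of this tutorial):

# map disparity into [min_disp, max_disp] and invert it to obtain depth
min_depth, max_depth = 0.1, 100.0
min_disp, max_disp = 1.0 / max_depth, 1.0 / min_depth
scaled_disp = min_disp + (max_disp - min_disp) * disp_resized
depth = 1.0 / scaled_disp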

Finally, we apply a normalized colormap to visualize the predicted disparity map,

import matplotlib as mpl
import matplotlib.cm as cm
disp_resized_np = disp_resized.squeeze().as_in_context(mx.cpu()).asnumpy()
vmax = np.percentile(disp_resized_np, 95)
normalizer = mpl.colors.Normalize(vmin=disp_resized_np.min(), vmax=vmax)
mapper = cm.ScalarMappable(norm=normalizer, cmap='magma')
colormapped_im = (mapper.to_rgba(disp_resized_np)[:, :, :3] * 255).astype(np.uint8)
im = pil.fromarray(colormapped_im)
im.save('test_output.png')

import matplotlib.image as mpimg
disp_map = mpimg.imread('test_output.png')
plt.imshow(disp_map)
plt.show()
[Figure: demo monodepth2 (predicted disparity map)]

Total running time of the script: (0 minutes 2.807 seconds)

Gallery generated by Sphinx-Gallery