01. Predict depth from a single image with a pre-trained Monodepth2 model
This is a quick demo of using a GluonCV Monodepth2 model trained on KITTI to process real-world images. If you have not yet installed MXNet and GluonCV, please follow the installation guide first.
import numpy as np
import mxnet as mx
from mxnet.gluon.data.vision import transforms
import gluoncv
# using cpu
ctx = mx.cpu(0)
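If a GPU is available, inference can optionally run there instead; a minimal sketch, assuming a CUDA-enabled MXNet build:
if mx.context.num_gpus() > 0:
    # switch to the first GPU when one is detected, otherwise keep the CPU context above
    ctx = mx.gpu(0)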
Prepare the image
First, we download the example image,
url = 'https://raw.githubusercontent.com/KuangHaofei/GluonCV_Test/master/monodepthv2/tutorials/test_img.png'
filename = 'test_img.png'
gluoncv.utils.download(url, filename, True)
Output
Downloading test_img.png from https://raw.githubusercontent.com/KuangHaofei/GluonCV_Test/master/monodepthv2/tutorials/test_img.png...
Then we load and visualize the image,
import PIL.Image as pil
img = pil.open(filename).convert('RGB')
from matplotlib import pyplot as plt
plt.imshow(img)
plt.show()

We resize the image so that it matches the input size of the pre-trained model, and convert it to an NDArray,
original_width, original_height = img.size
feed_height = 192
feed_width = 640
img = img.resize((feed_width, feed_height), pil.LANCZOS)
img = transforms.ToTensor()(mx.nd.array(img)).expand_dims(0).as_in_context(context=ctx)
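At this point the image is a batched, channel-first tensor; a quick sanity check of its shape (it should match the 192x640 feed size):
# expect a batch of one 3-channel image at the model's feed resolution
print(img.shape)  # (1, 3, 192, 640)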
Load the pre-trained model and make prediction
Next, we get a pre-trained model from the model zoo,
model = gluoncv.model_zoo.get_model('monodepth2_resnet18_kitti_stereo_640x192',
                                    pretrained_base=False, ctx=ctx, pretrained=True)
Output
Downloading /root/.mxnet/models/monodepth2_resnet18_kitti_stereo_640x192-83eea4a9.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/monodepth2_resnet18_kitti_stereo_640x192-83eea4a9.zip...
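Besides the stereo-trained weights used here, the model zoo also provides other Monodepth2 variants; a small sketch to list them, assuming get_model_list is available in your GluonCV version:
# print every model zoo entry whose name contains 'monodepth2'
print([name for name in gluoncv.model_zoo.get_model_list() if 'monodepth2' in name])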
We make the disparity map prediction directly on the image, and resize it back to the original image size,
outputs = model.predict(img)
disp = outputs[("disp", 0)]
disp_resized = mx.nd.contrib.BilinearResize2D(disp, height=original_height, width=original_width)
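The network actually predicts a sigmoid disparity. If depth values are needed, Monodepth2 maps the disparity to the range [1/max_depth, 1/min_depth] and inverts it; the sketch below follows that convention, where min_depth=0.1, max_depth=100 and the 5.4 stereo scale factor are assumptions taken from the original Monodepth2 training setup:
# convert sigmoid disparity to depth (constants below are assumed from the Monodepth2 setup)
min_depth, max_depth = 0.1, 100.0
min_disp, max_disp = 1.0 / max_depth, 1.0 / min_depth
scaled_disp = min_disp + (max_disp - min_disp) * disp_resized
depth = 1.0 / scaled_disp
# the stereo-trained model is roughly metric after scaling by ~5.4 (assumed factor)
depth_metric = 5.4 * depth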
Finally, we apply a normalized colormap to visualize the predicted disparity map,
import matplotlib as mpl
import matplotlib.cm as cm
disp_resized_np = disp_resized.squeeze().as_in_context(mx.cpu()).asnumpy()
vmax = np.percentile(disp_resized_np, 95)
normalizer = mpl.colors.Normalize(vmin=disp_resized_np.min(), vmax=vmax)
mapper = cm.ScalarMappable(norm=normalizer, cmap='magma')
colormapped_im = (mapper.to_rgba(disp_resized_np)[:, :, :3] * 255).astype(np.uint8)
im = pil.fromarray(colormapped_im)
im.save('test_output.png')
import matplotlib.image as mpimg
disp_map = mpimg.imread('test_output.png')
plt.imshow(disp_map)
plt.show()

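For a quick qualitative comparison, the input image and the disparity map can also be shown side by side; a minimal sketch:
# plot the RGB input and the colormapped disparity next to each other
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].imshow(pil.open(filename))
axes[0].set_title('input')
axes[0].axis('off')
axes[1].imshow(disp_map)
axes[1].set_title('predicted disparity')
axes[1].axis('off')
plt.show()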
Total running time of the script: ( 0 minutes 2.807 seconds)