From self-driving cars detecting objects on the road to spotting possible criminal activity through subtle facial expressions and body language, researchers have for years been exploring ways to let machines recognize objects through vision.
This field is known as Computer Vision (CV), and it has a wide range of applications in modern life.
Object Detection is, without question, one of the coolest applications of computer vision.
Today's CV tools make it easy to apply object detection to images and even live video streams. This article walks through how to build a simple real-time object detector with TensorFlow.
Building a simple object detector
Setup requirements:
TensorFlow version 1.15.0 or higher
Run pip install tensorflow to install the latest version
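Not sure which version you already have? A quick check from Python (this snippet is only a sanity check, not part of the detector code):
import tensorflow as tf
print(tf.__version__)  # should print 1.15.0 or higher for this tutorial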
With everything in place, let's get started!
Setting up the environment
Step 1: Download or clone the TensorFlow object detection code from GitHub to your local machine
Run the following command in the terminal:
git clone https://github.com/tensorflow/models.git
Step 2: Install the dependencies
The next step is to make sure your machine has all the libraries and components needed to run the object detector.
The libraries this project depends on are listed below. (Most of them ship with TensorFlow.)
· Cython
· contextlib2
· pillow
· lxml
· matplotlib
If any of them are missing, install them with pip install in your environment.
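For example, all of the packages listed above can be installed in one go (trim the list to whatever your environment is actually missing):
pip install Cython contextlib2 pillow lxml matplotlib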
Step 3: Install the Protobuf compiler
Google's Protobuf, also known as Protocol Buffers, is a language-neutral, platform-neutral, extensible mechanism for serializing structured data. It lets programmers define data structures once and then easily write and read that structured data, across data streams and in a variety of languages.
Protobuf is also one of this project's dependencies. (See the official Protocol Buffers documentation to learn more.) Next, install it on your computer.
Open a terminal or command prompt, change into the cloned repository, and run the following commands:
cd models/research
wget -O protobuf.zip https://github.com/protocolbuffers/protobuf/releases/download/v3.9.1/protoc-3.9.1-osx-x86_64.zip
unzip protobuf.zip
Note: be sure to unzip the protobuf.zip file inside the models/research directory. (The archive above is the macOS x86_64 build of protoc v3.9.1; on another OS, grab the matching build from the same releases page.)
Step 4: Compile the Protobuf files
From the research/ directory, run the following command to compile the proto files:
./bin/protoc object_detection/protos/*.proto --python_out=.
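If the command succeeds, every .proto file under object_detection/protos/ now has a matching *_pb2.py module generated next to it. As an optional sanity check, try importing one of them from the models/research directory (pipeline_pb2 below is just one example of the generated modules):
python -c "from object_detection.protos import pipeline_pb2; print('protos compiled')"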
Implementing object detection in Python
Now that all the dependencies are installed, we can implement object detection in Python.
In the cloned repository, change the directory to:
models/research/object_detection
This directory contains an IPython notebook named object_detection_tutorial.ipynb. It is a demo of the object detection algorithm and uses the following model when executed:
ssd_mobilenet_v1_coco_2017_11_17
The demo runs detection on the two test images bundled with the repository and draws the detected objects on them.
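To run the demo yourself, launch the notebook from this directory (assuming Jupyter is installed in the same environment as TensorFlow) and execute all of its cells:
jupyter notebook object_detection_tutorial.ipynb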
Detecting objects in a live video stream takes a little extra tweaking. Create a new Jupyter notebook in the same folder and follow the code below:
[1]:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from utils import ops as utils_ops
if StrictVersion(tf.__version__) < StrictVersion('1.12.0'):
  raise ImportError('Please upgrade your TensorFlow installation to v1.12.*.')
[2]:
# This is needed to display the images.
get_ipython().run_line_magic('matplotlib', 'inline')
[3]:
# Object detection imports
# Here are the imports from the object detection module.
from utils import label_map_util
from utils import visualization_utils as vis_util
[4]:
# Model preparation
# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.
# By default we use an "SSD with Mobilenet" model here.
# See https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
# for a list of other models that can be run out-of-the-box with varying speeds and accuracies.
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
[5]:
# Download Model
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())
[6]:
# Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')
[7]:
# Loading label map
# Label maps map indices to category names, so that when our convolution network predicts `5`,
# we know that this corresponds to `airplane`. Here we use internal utility functions,
# but anything that returns a dictionary mapping integers to appropriate string labels would be fine.
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
[8]:
def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks']:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for a single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[1], image.shape[2])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')
      # Run inference
      output_dict = sess.run(tensor_dict, feed_dict={image_tensor: image})
      # All outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.int64)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict
[9]:
import cv2
# Open the default webcam (device 0).
cam = cv2.VideoCapture(0)
rolling = True
while rolling:
  ret, image_np = cam.read()
  if not ret:
    break
  # The model expects a batch of images, so add a batch dimension.
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np_expanded, detection_graph)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)
  cv2.imshow('image', cv2.resize(image_np, (1000, 800)))
  # Press 'q' in the video window to stop the stream.
  if cv2.waitKey(25) & 0xFF == ord('q'):
    break
cv2.destroyAllWindows()
cam.release()
When you run this notebook, the webcam turns on and the detector recognizes any of the object categories the underlying model was trained on. Press 'q' in the video window to stop the stream.
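If you want to trade detection speed for accuracy, the only cell that needs to change is the model-preparation cell ([4]): point MODEL_NAME at a different entry from the detection model zoo linked in that cell's comments. For example (the model name below is one entry from that zoo; double-check the exact name on the zoo page before relying on it):
MODEL_NAME = 'faster_rcnn_inception_v2_coco_2018_01_28'
MODEL_FILE = MODEL_NAME + '.tar.gz'
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'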
Thanks for reading! If you have any suggestions, feel free to share them in the comments.