
Deep Learning Course 4, Week 3: Detection Algorithms Quiz Notes

Detection Algorithms

  1. You are building a 3-class object classification and localization algorithm. The classes are: pedestrian (c=1), car (c=2), motorcycle (c=3). What should y be for the image below? Remember that “?” means “don’t care”, which means that the neural network loss function won’t care what the neural network gives for that component of the output. Recall y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3].
  • y=[1,?,?,?,?,1,?,?]
  • y=[1,0.66,0.5,0.75,0.16,1,0,0]
  • y=[1,0.66,0.5,0.16,0.75,1,0,0]
  • y=[1,0.66,0.5,0.75,0.16,0,0,0]
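
The “don’t care” convention can be made concrete with a small sketch. It assumes the lecture layout y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]; the helper names `make_target` and `squared_error` are hypothetical, `None` stands in for “?”, and a plain squared error is used as a simplification of the real loss.

```python
def make_target(box=None, class_id=None, num_classes=3):
    """Build y = [p_c, b_x, b_y, b_h, b_w, c_1, ..., c_n].

    box is (b_x, b_y, b_h, b_w); class_id is 1-based.
    None marks a "don't care" component.
    """
    if box is None:
        # No object: only p_c = 0 matters, everything else is "?".
        return [0] + [None] * (4 + num_classes)
    b_x, b_y, b_h, b_w = box
    one_hot = [1 if k == class_id else 0 for k in range(1, num_classes + 1)]
    return [1, b_x, b_y, b_h, b_w] + one_hot


def squared_error(y_true, y_pred):
    """Sum of squared errors that skips the "don't care" components."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred) if t is not None)


# Hypothetical example: a car (c=2) with an illustrative box.
print(make_target((0.5, 0.5, 0.2, 0.3), class_id=2))
# Background image: only the p_c component contributes to the loss.
print(squared_error(make_target(), [0.5] * 8))
```

When no object is present, whatever the network outputs for the box and class components leaves the loss unchanged, which is exactly what “?” expresses.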

  2. You are working on a factory automation task. Your system will see a can of soft drink coming down a conveyor belt, and you want it to take a picture and decide whether (i) there is a soft-drink can in the image, and if so (ii) its bounding box. Since the soft-drink can is round, the bounding box is always square, and the soft-drink can always appears the same size in the image. There is at most one soft-drink can in each image. Here are some typical images in your training set:

     To solve this task it is necessary to divide it into two steps: (1) construct a system that detects whether a can is present, and (2) construct a system that calculates the bounding box of the can when present. Which one of the following do you agree with the most?
  • We can approach the task as an image classification with a localization problem.
  • An end-to-end solution is always superior to a two-step system.
  • The two-step system is always a better option compared to an end-to-end solution.
  • We can’t solve the task as an image classification with a localization problem since all the bounding boxes have the same dimensions.
  3. If you build a neural network that inputs a picture of a person’s face and outputs N landmarks on the face (assume the input image always contains exactly one face), how many output units will the network have?
  • 2N
  • N
  • N^2
  • 2
  • 3N
  4. When training one of the object detection systems described in the lectures, you need a training set that contains many pictures of the object(s) you wish to detect. However, bounding boxes do not need to be provided in the training set, since the algorithm can learn to detect the objects by itself.
  • False
  • True
Explanation: You need bounding boxes in the training set. The loss function should try to match the predicted bounding boxes to the true bounding boxes from the training set.
  5. What is the IoU between the red box and the blue box in the following figure? Assume that all the squares have the same measurements.
Explanation: IoU = intersection / union = (2 × 2) / (4 × 4 + 4 × 4 − 2 × 2) = 4 / 28 = 1/7
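
The same arithmetic can be checked with a short sketch. The corner coordinates below are an assumption, chosen only so that two 4 × 4 boxes share a 2 × 2 overlap as in the explanation; the function name `iou` and the (x1, y1, x2, y2) box format are illustrative choices.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) corners with x1 < x2 and y1 < y2.
    """
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union


red = (0, 0, 4, 4)   # hypothetical 4 x 4 box
blue = (2, 2, 6, 6)  # hypothetical 4 x 4 box overlapping red in a 2 x 2 square
print(iou(red, blue))  # 4 / 28, i.e. 1/7
```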
  6. Suppose you run non-max suppression on the predicted boxes below. The parameters you use for non-max suppression are that boxes with probability ≤ 0.4 are discarded, and the IoU threshold for deciding if two boxes overlap is 0.5. How many boxes will remain after non-max suppression?
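
Since the figure with the predicted boxes is not available here, the following is only a minimal sketch of non-max suppression with the thresholds from the question (0.4 and 0.5). It assumes a single class, boxes given as (x1, y1, x2, y2) corners, and suppression of any box whose IoU with an already kept box is at least the threshold; the function names and any boxes you pass in are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union for (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes, scores, score_threshold=0.4, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    # Step 1: discard boxes whose probability is at or below the threshold.
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    while candidates:
        # Step 2: keep the most confident remaining box.
        best = candidates.pop(0)
        kept.append(best)
        # Step 3: drop boxes that overlap it too much.
        candidates = [i for i in candidates
                      if iou(boxes[best], boxes[i]) < iou_threshold]
    return kept
```

With several classes, the same procedure would typically be run once per class.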
  7. If we use anchor boxes in YOLO we no longer need the coordinates of the bounding box since they are given by the cell position of the grid and the anchor box selection. True/False?
  • False
  • True
Explanation: The grid and the anchor boxes improve the algorithm's ability to localize and detect objects, for example when two different objects overlap, but the network still predicts the bounding box coordinates.
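
One way to see this is to count the outputs per anchor box. The sketch below assumes the usual YOLO-style layout in which every anchor in every grid cell carries p_c, four box coordinates, and one score per class; the function name `yolo_target_shape` and the example sizes are illustrative.

```python
def yolo_target_shape(grid_size, num_anchors, num_classes):
    """Shape of the target volume under the assumed layout.

    Each anchor keeps its own p_c, (b_x, b_y, b_h, b_w) and class scores,
    so the box coordinates do not disappear when anchors are added.
    """
    per_anchor = 1 + 4 + num_classes
    return (grid_size, grid_size, num_anchors, per_anchor)


# Hypothetical sizes: 3 x 3 grid, 2 anchor boxes, 3 classes.
print(yolo_target_shape(3, 2, 3))  # (3, 3, 2, 8)
```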
  8. Semantic segmentation can only be applied to classify pixels of images in a binary way as 1 or 0, according to whether they belong to a certain class or not. True/False?
  • False
  • True
Explanation: The same ideas used for multi-class classification can be applied to semantic segmentation.
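
As a small illustration of the multi-class case, the sketch below assumes the network has already produced one score per class for every pixel and simply picks the highest-scoring class at each position. The function name `segment` and the nested-list input format are hypothetical.

```python
def segment(scores):
    """Turn per-pixel class scores into a map of class indices.

    scores[r][c] is a list with one score per class for the pixel at
    row r, column c; the result has one class index per pixel.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Hypothetical 2 x 2 image with 3 classes.
scores = [
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
]
print(segment(scores))  # [[1, 0], [2, 1]]
```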
  9. Using the concept of Transpose Convolution, fill in the values of X, Y and Z below. (padding = 1, stride = 2)

     Input: 2x2
     Filter: 3x3
     Result: 6x6
  • X = 3, Y = 0, Z = 4
  • X = 10, Y = 0, Z = 6
  • X = 4, Y = 3, Z = 2
  • X = 10, Y = 0, Z = 0
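
The figure is missing, so the quiz's X, Y and Z cannot be reproduced here; the sketch below only illustrates the mechanics of a transpose convolution. It assumes the common scatter-and-crop definition, in which each input value stamps a scaled copy of the filter onto a canvas at stride offsets, overlaps are summed, and `padding` cells are cropped from each border. Under that assumption the output side is (n − 1) × stride + f − 2 × padding; the quiz figure may draw its result grid under a different convention, so the sizes need not line up with it. The function name is hypothetical.

```python
def transpose_conv2d(inp, kernel, stride=2, padding=1):
    """Transpose convolution of a square input with a square kernel."""
    n, f = len(inp), len(kernel)
    full = (n - 1) * stride + f  # canvas side before cropping
    canvas = [[0] * full for _ in range(full)]
    for i in range(n):
        for j in range(n):
            # Stamp inp[i][j] * kernel with its top-left at (i*stride, j*stride).
            for a in range(f):
                for b in range(f):
                    canvas[i * stride + a][j * stride + b] += inp[i][j] * kernel[a][b]
    # Crop `padding` cells from every border.
    end = full - padding
    return [row[padding:end] for row in canvas[padding:end]]


ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# Hypothetical 2 x 2 input of ones: overlapping stamps add up in the middle.
print(transpose_conv2d([[1, 1], [1, 1]], ones))
```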
  10. When using the U-Net architecture with an input h × w × c, where c denotes the number of channels, the output will always have the shape h × w × c. True/False?
  • False
  • True
