Image-to-Image Translation with Conditional Adversarial Networks: Summary

2023-03-28 11:33:09

cGAN:

Problem: translating an input image into a corresponding output image.
Traditionally: each task was tackled with separate, special-purpose machinery.
GANs: instead specify only a high-level goal (make the output indistinguishable from reality) and learn a loss function that satisfies it.
cGANs: suitable for image-to-image translation tasks, where we condition on an input image and generate a corresponding output image (image-conditional GANs).

Structured losses for image modeling

Image-to-image translation problems are often formulated as per-pixel classification or regression, which treats each output pixel as conditionally independent of all others given the input.
Conditional GANs instead learn a structured loss.
Structured losses penalize the joint configuration of the output.
The conditional GAN loss is learned, so in theory it can penalize any possible structure that differs between output and target.

Conditional GANs

Prior work: used GANs for image-to-image mappings, but only applied the GAN unconditionally, relying on other terms (such as L2 regression) to force the output to be conditioned on the input.
Architectural choices:

Generator: “U-Net”-based architecture.

Discriminator: convolutional “PatchGAN” classifier, which only penalizes structure at the scale of image patches.

GANs: G : z → y (random noise vector z to output image y)

Conditional GANs: G : {x, z} → y (observed image x and random noise vector z to output image y)

Objective
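The paper tests the importance of conditioning the discriminator by comparing against an unconditional variant in which the discriminator does not observe x.
Previous approaches have found it beneficial to mix the GAN objective with a more traditional loss, such as L2 distance.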
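
For reference, the two objectives being compared can be written as follows, using the notation of these notes (x is the input image, z the noise, y the target). The only difference is whether the discriminator D sees x.

```latex
\begin{align}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \\
\mathcal{L}_{GAN}(G, D) &= \mathbb{E}_{y}\big[\log D(y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(G(x, z))\big)\big]
\end{align}
```

G tries to minimize the objective while D tries to maximize it.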

Explored: L1 distance rather than L2, since L1 encourages less blurring.
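
A minimal sketch of how the mixed generator objective might be computed for one sample, i.e. an adversarial term plus a weighted L1 term. The function names, the flat pixel lists, the default weight `lam=100.0`, and the use of the non-saturating form `-log D(x, G(x, z))` are assumptions made for illustration, not details taken from these notes.

```python
import math

def l1_distance(target, output):
    """Mean absolute difference between two equally sized flat pixel lists."""
    return sum(abs(t - o) for t, o in zip(target, output)) / len(target)

def generator_loss(d_scores_fake, target, output, lam=100.0, eps=1e-12):
    """Adversarial term plus lam * L1 term for one sample.

    d_scores_fake: discriminator probabilities D(x, G(x, z)), each in (0, 1].
    lam: assumed weight of the L1 term (hypothetical default).
    """
    # Non-saturating adversarial term: small when D is fooled (scores near 1).
    adv = -sum(math.log(p + eps) for p in d_scores_fake) / len(d_scores_fake)
    # L1 term: pulls the output toward the target pixel by pixel.
    return adv + lam * l1_distance(target, output)

target = [0.0, 0.5, 1.0, 0.5]
# Output equals the target and fully fools D: both terms vanish.
perfect = generator_loss([1.0, 1.0], target, target)
# A flat gray output: D is undecided and the L1 term dominates.
blurry = generator_loss([0.5, 0.5], target, [0.5, 0.5, 0.5, 0.5])
```

With a large weight on L1, the pixel-wise term dominates the numeric value of the loss, while the adversarial term contributes the pressure toward realistic detail.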
Finding: when Gaussian noise z was fed to the generator as an input, the generator simply learned to ignore the noise.

Instead, noise is provided only in the form of dropout, applied on several layers of the generator at both training and test time.

Even with dropout, only minor stochasticity is observed in the output.
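
To illustrate what keeping dropout active at test time means, here is a toy sketch. `toy_generator` is a hypothetical stand-in for a real generator, and the dropout rate of 0.5 is an assumption; the only point is that repeated calls on the same input differ when dropout stays on and are identical when it is switched off.

```python
import random

def dropout(values, p, rng, active=True):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not active:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def toy_generator(x, rng, dropout_active):
    """Hypothetical stand-in for a generator: one hidden layer with dropout."""
    hidden = [2.0 * v for v in x]
    hidden = dropout(hidden, p=0.5, rng=rng, active=dropout_active)
    return sum(hidden) / len(hidden)

x = [0.1 * i for i in range(20)]
rng = random.Random(0)
# Same input, dropout left on at "test time": outputs vary between calls.
stochastic = [toy_generator(x, rng, True) for _ in range(20)]
# Same input, dropout switched off: every call returns the same value.
deterministic = [toy_generator(x, rng, False) for _ in range(20)]
```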

Designing conditional GANs that produce highly stochastic output, and thereby capture the full entropy of the conditional distributions they model, is an important question left open.
Network architectures: both generator and discriminator use modules of the form convolution-BatchNorm-ReLU.
Generator with skips
Structure in the input is roughly aligned with structure in the output.

Previous work: used an encoder-decoder network, in which all information must pass through a bottleneck layer.
To circumvent the bottleneck: add skip connections, following the general shape of a “U-Net”.
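
The effect of the skip connections can be sketched with simple shape bookkeeping. The sketch assumes each encoder layer halves the spatial size, each decoder layer mirrors an encoder layer, and a skip concatenates the mirrored encoder activation along the channel axis. The channel widths and function names are hypothetical.

```python
def encoder_shapes(in_shape, channels):
    """(C, H, W) after each stride-2 encoder layer, which halves H and W."""
    _, h, w = in_shape
    shapes = []
    for out_c in channels:
        h, w = h // 2, w // 2
        shapes.append((out_c, h, w))
    return shapes

def decoder_input_channels(enc, use_skips):
    """Channel count fed into each decoder layer, from the bottleneck outward."""
    inputs = []
    current = enc[-1][0]                  # the bottleneck activation
    for mirrored in reversed(enc[:-1]):
        inputs.append(current)
        out_c = mirrored[0]               # decoder layer mirrors the encoder width
        # A skip concatenates the mirrored encoder activation channel-wise.
        current = out_c + (mirrored[0] if use_skips else 0)
    inputs.append(current)                # input to the final output layer
    return inputs

enc = encoder_shapes((3, 256, 256), [64, 128, 256, 512])
plain = decoder_input_channels(enc, use_skips=False)
unet = decoder_input_channels(enc, use_skips=True)
```

Without skips, every decoder layer sees only what survived the bottleneck. With skips, each layer after the first also receives the encoder activation at the same resolution, so low-level structure shared by input and output can bypass the bottleneck.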
Markovian discriminator (PatchGAN)
L1 (like L2) produces blurry results on image generation problems, but it does accurately capture the low frequencies.
The GAN discriminator therefore only needs to model high-frequency structure, relying on an L1 term to force low-frequency correctness.
In order to model high frequencies, it is sufficient to restrict our attention to the structure in local image patches.
This motivates a discriminator architecture, termed a PatchGAN, that only penalizes structure at the scale of patches.
The PatchGAN can be understood as a form of texture/style loss.
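
A small sketch of what "the scale of patches" means in practice: each unit of the discriminator's output grid sees only a limited patch of the input, and the per-patch scores are averaged into one decision. The layer list (4x4 kernels, three stride-2 layers followed by two stride-1 layers), the padding of 1, and the 256-pixel input are assumptions based on a commonly used configuration, not details from these notes.

```python
def receptive_field(layers):
    """Input patch size seen by one output unit of a stack of (kernel, stride) convs."""
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

def output_size(size, layers, padding=1):
    """Side length of the output grid for a square input of the given size."""
    for kernel, stride in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size

def patch_decision(patch_scores):
    """Average the per-patch real/fake scores into one discriminator output."""
    flat = [s for row in patch_scores for s in row]
    return sum(flat) / len(flat)

# Assumed configuration: (kernel, stride) per convolution layer.
layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
rf = receptive_field(layers)      # patch side length judged by each output unit
grid = output_size(256, layers)   # side length of the grid of patch scores
score = patch_decision([[1.0, 0.0], [0.0, 1.0]])
```

Under these assumptions each score covers a 70x70 patch and a 256x256 input yields a 30x30 grid of scores, so the discriminator has far fewer parameters than one that looks at the whole image at once.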