Normalization Series (BN, LN, GN ...)


< Table of Contents >


gn_paper_figure1 Fig. The many normalization methods used in modern deep learning

Why Normalization?

cs182_lec7_bn_fig1 Fig.


cs182_lec7_bn_fig2 Fig.


cs182_lec7_bn_fig3 Fig.

cs182_lec7_bn_fig4 Fig.

Batch Normalization (BN)

In the end, what we want is the following:

cs182_lec7_bn_fig3 Fig.

vgg_bn_good_train Fig.

normal_schematic Fig. Vanilla Network

bn_schematic Fig. Vanilla Network + BatchNorm Layer

vgg_bn_good Fig.

cs182_lec7_bn_fig5 Fig.

cs182_lec7_bn_fig6 Fig.

cs182_lec7_bn_fig7 Fig.

cs182_lec7_bn_fig8 Fig.

cs182_lec7_bn_fig9 Fig.

cs182_lec7_bn_fig10 Fig.

cs182_lec7_bn_fig11 Fig.

cs182_lec7_bn_fig12 Fig.
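
To make the mechanics concrete, here is a minimal training-mode sketch of the BN computation on an (N, C, H, W) tensor: per-channel statistics are taken over the batch and spatial dimensions, then a learnable per-channel scale (gamma) and shift (beta) are applied. The function name `batch_norm_2d` is only illustrative; at default initialization it should agree with `nn.BatchNorm2d` in training mode.

import torch
import torch.nn as nn
torch.manual_seed(1234)

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    # x: (N, C, H, W). Per-channel statistics over the batch and spatial dims.
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # Learnable per-channel scale (gamma) and shift (beta).
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)

x = torch.randn(4, 3, 8, 8)
gamma, beta = torch.ones(3), torch.zeros(3)
out = batch_norm_2d(x, gamma, beta)

# At default initialization this agrees with nn.BatchNorm2d in training mode.
print(torch.allclose(out, nn.BatchNorm2d(3)(x), atol=1e-5))  # True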

Internal Covariate Shift (ICS)

noisy_bn Fig.

noisy_bn2 Fig.

Layer Normalization (LN)

bn_figure Fig. Batch Normalization

ln_figure Fig. Layer Normalization

bn_vs_ln Fig. BN vs LN in 3D Sentence Tensor
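
The key difference is the axis over which statistics are computed: BN pools over the batch (and spatial positions) per feature, while LN normalizes over the feature dimension per sample, so it does not depend on the batch size at all. A minimal sketch on an NLP-style (batch, sentence_length, embedding_dim) tensor, assuming the default `nn.LayerNorm` initialization:

import torch
import torch.nn as nn
torch.manual_seed(1234)

x = torch.randn(20, 5, 10)  # (batch, sentence_length, embedding_dim)

# LN: statistics per token, over the embedding dimension only.
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)
x_ln = (x - mean) / torch.sqrt(var + 1e-5)

# At default initialization (weight=1, bias=0) this agrees with nn.LayerNorm.
print(torch.allclose(x_ln, nn.LayerNorm(10)(x), atol=1e-5))  # True

# BN on the same tensor would instead pool statistics for each embedding
# dimension across the whole batch and all sequence positions.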

PreNorm vs PostNorm

ln_in_transformer_ Fig.

\[\boldsymbol{x}_{\ell+1}=\boldsymbol{x}_{\ell}+F_{\ell}\left(\boldsymbol{x}_{\ell}\right)\]

\[\text{PostNorm:} \quad \boldsymbol{x}_{\ell+1}=\operatorname{LayerNorm}\left(\boldsymbol{x}_{\ell}+F_{\ell}\left(\boldsymbol{x}_{\ell}\right)\right)\]

\[\text{PreNorm:} \quad \boldsymbol{x}_{\ell+1}=\boldsymbol{x}_{\ell}+F_{\ell}\left(\operatorname{LayerNorm}\left(\boldsymbol{x}_{\ell}\right)\right)\]

prenorm_table2 Fig.
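
The only difference between the two variants is whether LayerNorm sits before the sub-layer F or after the residual addition. A minimal sketch of both orderings, with an illustrative feed-forward network standing in for F (the module names here are mine, not taken from any particular Transformer implementation):

import torch
import torch.nn as nn
torch.manual_seed(1234)

class PostNormBlock(nn.Module):
    # x_{l+1} = LayerNorm(x_l + F(x_l))
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))

class PreNormBlock(nn.Module):
    # x_{l+1} = x_l + F(LayerNorm(x_l))
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))

d_model = 10
ffn = nn.Sequential(nn.Linear(d_model, 40), nn.ReLU(), nn.Linear(40, d_model))
x = torch.randn(20, 5, d_model)
y_post = PostNormBlock(d_model, ffn)(x)
y_pre = PreNormBlock(d_model, ffn)(x)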

Instance Normalization (IN)

in_figure Fig. Instance Normalization
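
IN computes statistics per sample and per channel, over the spatial dimensions only. A minimal sketch, assuming the default `nn.InstanceNorm2d` settings (no affine parameters, no running statistics):

import torch
import torch.nn as nn
torch.manual_seed(1234)

x = torch.randn(20, 6, 10, 10)  # (N, C, H, W)

# IN: statistics per (sample, channel), over H and W only.
mean = x.mean(dim=(2, 3), keepdim=True)
var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
x_in = (x - mean) / torch.sqrt(var + 1e-5)

# nn.InstanceNorm2d defaults to affine=False, so the outputs should match.
print(torch.allclose(x_in, nn.InstanceNorm2d(6)(x), atol=1e-5))  # True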

Group Normalization (GN)

gn_figure Fig. Group Normalization
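
GN splits the C channels into G groups and normalizes each group per sample, over the grouped channels and the spatial dimensions. A minimal sketch that makes the grouping reshape explicit, assuming the default `nn.GroupNorm` initialization (weight=1, bias=0):

import torch
import torch.nn as nn
torch.manual_seed(1234)

N, C, H, W, G = 20, 6, 10, 10, 3
x = torch.randn(N, C, H, W)

# GN: reshape the channels into G groups, then normalize each group per sample.
xg = x.view(N, G, C // G, H, W)
mean = xg.mean(dim=(2, 3, 4), keepdim=True)
var = xg.var(dim=(2, 3, 4), unbiased=False, keepdim=True)
x_gn = ((xg - mean) / torch.sqrt(var + 1e-5)).view(N, C, H, W)

# At default initialization this agrees with nn.GroupNorm.
print(torch.allclose(x_gn, nn.GroupNorm(G, C)(x), atol=1e-5))  # True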

PyTorch Implementation

import torch
import torch.nn as nn
import torch.nn.functional as F
torch.manual_seed(1234)

## 1d BN
# With Learnable Parameters
m = nn.BatchNorm1d(100)
# Without Learnable Parameters
m = nn.BatchNorm1d(100, affine=False)
input = torch.randn(20, 100)
output = m(input)

## 2d BN
# With Learnable Parameters
m = nn.BatchNorm2d(100)
# Without Learnable Parameters
m = nn.BatchNorm2d(100, affine=False)
input = torch.randn(20, 100, 35, 45)
output = m(input)

## 3d BN
# With Learnable Parameters
m = nn.BatchNorm3d(100)
# Without Learnable Parameters
m = nn.BatchNorm3d(100, affine=False)
input = torch.randn(20, 100, 35, 45, 10)
output = m(input)
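
The BN modules above run in training mode, so they normalize with the statistics of the current batch while also updating running estimates. At inference you call `.eval()`, and BN switches to the accumulated `running_mean` / `running_var`. A short sketch:

import torch
import torch.nn as nn
torch.manual_seed(1234)

m = nn.BatchNorm1d(100)
for _ in range(10):
    m(torch.randn(20, 100))   # training mode: batch statistics, running stats updated

m.eval()                      # inference mode: use running_mean / running_var
out = m(torch.randn(20, 100))
print(m.running_mean.shape, m.running_var.shape)  # torch.Size([100]) torch.Size([100])
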
import torch
import torch.nn as nn
import torch.nn.functional as F
torch.manual_seed(1234)

# NLP Example
batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)

# Activate module
a = layer_norm(embedding)

# Image Example
N, C, H, W = 20, 5, 10, 10
input = torch.randn(N, C, H, W)
# Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
# as shown in the image below
layer_norm = nn.LayerNorm([C, H, W])
output = layer_norm(input)
import torch
import torch.nn as nn
import torch.nn.functional as F
torch.manual_seed(1234)

input = torch.randn(20, 6, 10, 10)

# Separate 6 channels into 3 groups
m = nn.GroupNorm(3, 6)
# Separate 6 channels into 6 groups (equivalent to InstanceNorm)
m = nn.GroupNorm(6, 6)
# Put all 6 channels into a single group (equivalent to LayerNorm)
m = nn.GroupNorm(1, 6)
# Activating the module
output = m(input)
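
A quick check of the equivalences noted in the comments above, assuming default initialization (GroupNorm's affine parameters start at weight=1, bias=0, so they do not yet change the output):

import torch
import torch.nn as nn
torch.manual_seed(1234)

x = torch.randn(20, 6, 10, 10)

# GroupNorm with one group per channel behaves like InstanceNorm2d.
print(torch.allclose(nn.GroupNorm(6, 6)(x), nn.InstanceNorm2d(6)(x), atol=1e-5))        # True

# GroupNorm with a single group behaves like LayerNorm over (C, H, W).
print(torch.allclose(nn.GroupNorm(1, 6)(x), nn.LayerNorm([6, 10, 10])(x), atol=1e-5))   # True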

References