A model is made up of multiple layers.
The activation function determines the output of the neurons in a layer. There are two ways to add an activation function to a layer: you can either pass it via the activation argument of the layer, or you can add the activation function as its own layer.
model.add(keras.layers.Dense(32, activation='relu'))
model.add(keras.layers.Dense(32))
model.add(keras.layers.Activation('relu'))
Built-in activation functions:
relu: max(x, 0). You can set the max value using the max_value argument or the min value using the threshold argument.
selu:
if x ≥ 0: return scale * x
if x ≤ 0: return scale * alpha * (exp(x) - 1)
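A quick sketch of relu with these arguments (values chosen only for illustration):
import tensorflow as tf
x = tf.constant([-2.0, -0.5, 1.0, 5.0])
print(tf.keras.activations.relu(x))                                 # [0. 0. 1. 5.]
print(tf.keras.activations.relu(x, max_value=3.0, threshold=0.5))   # values below 0.5 become 0, output capped at 3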
Advanced Activation Functions:
tf.keras.layers.LeakyReLU(alpha=0.3)
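Advanced activations are typically added as their own layer; a minimal sketch:
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32),
    tf.keras.layers.LeakyReLU(alpha=0.3),
])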
Pros and cons of most activation functions: HERE.
All layers inherit from the base layer class.
tf.keras.layers.Layer(
trainable=True, name=None, dtype=None, dynamic=False, **kwargs
)
model.layers[0].name
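Custom layers can be created by subclassing this base class; a minimal sketch (the SimpleDense name and shapes are just for illustration):
import tensorflow as tf

class SimpleDense(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # weights are created lazily, once the input shape is known
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="glorot_uniform",
            trainable=True,
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w)

layer = SimpleDense(32, name="my_dense")
print(layer(tf.random.normal((4, 8))).shape)  # (4, 32)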
Convolutional layers are used in convolutional neural networks (CNNs).
There are three convolutional layer dimensions: Conv1D, Conv2D, and Conv3D.
tf.keras.layers.Conv2D(
filters,
kernel_size,
strides=(1, 1),
padding="valid",
data_format=None,
dilation_rate=(1, 1),
groups=1,
activation=None,
use_bias=True,
kernel_initializer="glorot_uniform",
bias_initializer="zeros",
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs
)
data_format="channels_last" expects (batch_size, height, width, channels) as input; data_format="channels_first" expects (batch_size, channels, height, width) as input.
Input Shape: batch_shape + (channels, rows, cols) if data format is channels first.
Output Shape: batch_shape + (filters, new_rows, new_cols) if data format is channels first. Rows and columns may change due to padding.
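A quick shape check as a sketch (sizes chosen arbitrarily):
import tensorflow as tf
x = tf.random.normal((4, 28, 28, 3))  # (batch_size, height, width, channels) with channels_last
y = tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding="valid")(x)
print(y.shape)  # (4, 26, 26, 16) -- rows and columns shrink with "valid" padding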
SeparableConv layers are similar to Conv*D layers, but channels are kept separate at first and then mixed at the end. The 2D version is similar to an Inception block.
DepthwiseConv2D performs the first half of a SeparableConv2D, where channels are kept separate.
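A small shape sketch (sizes arbitrary):
import tensorflow as tf
x = tf.random.normal((1, 28, 28, 3))
print(tf.keras.layers.SeparableConv2D(16, kernel_size=3)(x).shape)  # (1, 26, 26, 16)
print(tf.keras.layers.DepthwiseConv2D(kernel_size=3)(x).shape)      # (1, 26, 26, 3) -- channels stay separate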
"Undoes" a convolutional layer. Is generally used to increase the dimensionality (rows and columns) while decreasing the channel number.
tf.keras.layers.Conv2DTranspose(
filters,
kernel_size,
strides=(1, 1),
padding="valid",
output_padding=None,
data_format=None,
dilation_rate=(1, 1),
activation=None,
use_bias=True,
kernel_initializer="glorot_uniform",
bias_initializer="zeros",
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs
)
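A minimal sketch of upsampling with a transposed convolution (sizes arbitrary):
import tensorflow as tf
x = tf.random.normal((4, 7, 7, 64))
y = tf.keras.layers.Conv2DTranspose(32, kernel_size=3, strides=2, padding="same")(x)
print(y.shape)  # (4, 14, 14, 32) -- rows and columns doubled, channel count reduced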
Used to instantiate a Keras tensor.
tf.keras.Input(
shape=None,
batch_size=None,
name=None,
dtype=None,
sparse=False,
tensor=None,
ragged=False,
**kwargs
)
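Typically used as the entry point of a functional-API model; a minimal sketch (shapes arbitrary):
import tensorflow as tf
inputs = tf.keras.Input(shape=(784,))
outputs = tf.keras.layers.Dense(10, activation="softmax")(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)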
The most common layer type: a fully connected layer, where every unit is connected to every output of the previous layer.
tf.keras.layers.Dense(
units,
activation=None,
use_bias=True,
kernel_initializer="glorot_uniform",
bias_initializer="zeros",
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs
)
Adds an activation function to the output of the previous layer.
tf.keras.layers.Activation(activation, **kwargs)
THIS site does a great job of explaining what an embedding layer does: "an embedding layer tries to find the optimal mapping of each of the unique words to a vector of real numbers. The size of that vector is equal to the output_dim". In other words, an embedding layer maps a vector of vocabulary indices (a small sample of the vocabulary) to feature vectors.
Must be the first layer of a model.
tf.keras.layers.Embedding(
input_dim,
output_dim,
embeddings_initializer="uniform",
embeddings_regularizer=None,
activity_regularizer=None,
embeddings_constraint=None,
mask_zero=False,
input_length=None,
**kwargs
)
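A minimal sketch (vocabulary size and dimensions chosen arbitrarily):
import tensorflow as tf
embedding = tf.keras.layers.Embedding(input_dim=1000, output_dim=64)
x = tf.constant([[4, 7, 12, 0]])  # one padded sequence of word indices
print(embedding(x).shape)         # (1, 4, 64) -- each index mapped to a 64-dimensional vector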
Flatten: flattens the input; commonly followed by a Dense layer later on.
Masking: used primarily in RNNs. Skips timesteps, which is good for skipping padding when using an LSTM.
tf.keras.layers.Masking(mask_value=0.0, **kwargs)
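A minimal sketch, assuming padded timesteps are all-zero vectors:
import tensorflow as tf
inputs = tf.keras.Input(shape=(10, 8))               # 10 timesteps, 8 features each
x = tf.keras.layers.Masking(mask_value=0.0)(inputs)  # all-zero timesteps are masked downstream
outputs = tf.keras.layers.LSTM(32)(x)
model = tf.keras.Model(inputs, outputs)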
tf.keras.layers.Lambda(
function, output_shape=None, mask=None, arguments=None, **kwargs
)
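Lambda wraps an arbitrary expression as a layer; a minimal sketch:
import tensorflow as tf
square = tf.keras.layers.Lambda(lambda t: tf.square(t))
print(square(tf.constant([1.0, 2.0, 3.0])))  # [1. 4. 9.]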
Pooling layers are used to downsample. They are generally used with convolutional layers to reduce the size of the feature space.
Max pooling passes the max value over a window to the next layer. There are three pooling layer dimensions: MaxPooling1D, MaxPooling2D, and MaxPooling3D.
tf.keras.layers.MaxPooling2D(
pool_size=(2, 2),
strides=None,
padding="valid",
data_format=None,
**kwargs
)
output_shape = (input_shape - pool_size + 1) / strides when padding="valid"
output_shape = input_shape / strides when padding="same"
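A quick shape check as a sketch:
import tensorflow as tf
x = tf.random.normal((1, 28, 28, 3))
y = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)  # strides default to pool_size
print(y.shape)  # (1, 14, 14, 3)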
Average pooling passes the average value over a window to the next layer. There are three different pooling layer dimensions:
tf.keras.layers.AveragePooling2D(
pool_size=(2, 2), strides=None, padding="valid", data_format=None, **kwargs
)
Arguments are the same as MaxPooling2D.
There are also GlobalMaxPooling and GlobalAveragePooling variants that don't use a window, but instead pool over the entire input.
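A minimal sketch of global pooling collapsing the spatial dimensions:
import tensorflow as tf
x = tf.random.normal((1, 7, 7, 64))
print(tf.keras.layers.GlobalAveragePooling2D()(x).shape)  # (1, 64) -- spatial dimensions collapsed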
Constraints can be added to the weights of a layer. For example, a constraint might disallow negative weights, or it might limit the norm of the layer's weights.
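A minimal sketch using the built-in NonNeg and MaxNorm constraints (parameter values arbitrary):
import tensorflow as tf
layer = tf.keras.layers.Dense(
    64,
    kernel_constraint=tf.keras.constraints.NonNeg(),    # no negative weights allowed
    bias_constraint=tf.keras.constraints.MaxNorm(2.0),  # bias vector norm capped at 2
)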
Weight initializers initialize a layer's weights.
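A minimal sketch (initializer choices arbitrary):
import tensorflow as tf
layer = tf.keras.layers.Dense(
    64,
    kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01),
    bias_initializer="zeros",
)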
Weight regularizers penalize certain aspects of a layer's parameters during optimization (training).
Three regularizer arguments exist for most layer types:
kernel_regularizer: Applies a regularization function to the weights matrix
bias_regularizer: Applies a regularization function to the bias
activity_regularizer: Applies a regularization function to the output of the layer
Good stackexchange post about the three regularizers.
There are three available regularizers:
tf.keras.regularizers.l1(l1=0.01)
: loss = l1 * reduce_sum(abs(x))
tf.keras.regularizers.l2(l2=0.01)
: loss = l2 * reduce_sum(square(x))
tf.keras.regularizers.l1_l2(l1=0.01, l2=0.02)
: loss = l1 * reduce_sum(abs(x)) + l2 * reduce_sum(square(x))
For example:
from tensorflow.keras import regularizers

layer = tf.keras.layers.Dense(
    units=64,
    kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
    bias_regularizer=regularizers.l2(1e-4),
    activity_regularizer=regularizers.l2(1e-5),
)