Keras is a high-level deep learning API that runs on top of machine learning platforms such as TensorFlow.
Docs.
A model is made up of multiple layers.
The activation function determines the output of the neurons in a layer. There are two ways to add an activation function to a layer: you can either pass it via the activation argument on any layer, or you can add the activation function as its own layer.
model.add(keras.layers.Dense(32, activation='relu'))  # via the activation argument
model.add(keras.layers.Dense(32))
model.add(keras.layers.Activation('relu'))  # as a separate layer
Built-in activation functions:
relu: max(x, 0). You can cap the maximum output using the max_value argument or set the activation threshold using the threshold argument.
selu:
if x > 0: return scale * x
if x < 0: return scale * alpha * (exp(x) - 1)
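A quick sketch of those relu arguments (the input values are illustrative):

import tensorflow as tf

x = tf.constant([-3.0, 1.0, 5.0, 9.0])
# threshold=2.0 zeroes values below 2; max_value=6.0 caps the output at 6
print(tf.keras.activations.relu(x, max_value=6.0, threshold=2.0))
# -> [0. 0. 5. 6.]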
Advanced activation functions are available as layers, e.g.:
tf.keras.layers.LeakyReLU(alpha=0.3)
Pros and cons of most activation functions: HERE.
All layers inherit from the base layer class.
tf.keras.layers.Layer(
trainable=True, name=None, dtype=None, dynamic=False, **kwargs
)
model.layers[0].name  # layers are accessible (and named) via the layers attribute
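A minimal sketch of that inheritance (the Linear class and its defaults are illustrative, not a Keras built-in):

import tensorflow as tf

class Linear(tf.keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # add_weight registers trainable variables with the layer
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b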
Convolutional layers are used in convolutional neural networks (CNNs).
There are three convolutional layer dimensions: Conv1D, Conv2D, and Conv3D.
tf.keras.layers.Conv2D(
filters,
kernel_size,
strides=(1, 1),
padding="valid",
data_format=None,
dilation_rate=(1, 1),
groups=1,
activation=None,
use_bias=True,
kernel_initializer="glorot_uniform",
bias_initializer="zeros",
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs
)
With data_format="channels_last" (the default), Conv2D expects (batch_size, height, width, channels) as input; with data_format="channels_first" it expects (batch_size, channels, height, width) as input.
Input shape: batch_shape + (channels, rows, cols) if data format is channels first, else batch_shape + (rows, cols, channels).
Output shape: batch_shape + (filters, new_rows, new_cols) if data format is channels first, else batch_shape + (new_rows, new_cols, filters). Rows and columns may change due to padding.
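A small shape check under the default channels_last format (filter count and input sizes are arbitrary):

import tensorflow as tf

x = tf.random.normal((4, 28, 28, 3))                  # (batch, rows, cols, channels)
y = tf.keras.layers.Conv2D(32, 4, padding="valid")(x)
print(y.shape)                                        # (4, 25, 25, 32): 28 - 4 + 1 = 25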
SeparableConv*D: similar to Conv*D layers, but channels are kept separate at first and then mixed at the end. The 2D version is similar to an Inception block.
DepthwiseConv2D: performs the first half of SeparableConv2D, where channels are kept separate.
"Undoes" a convolutional layer. Is generally used to increase the dimensionality (rows and columns) while decreasing the channel number.
tf.keras.layers.Conv2DTranspose(
filters,
kernel_size,
strides=(1, 1),
padding="valid",
output_padding=None,
data_format=None,
dilation_rate=(1, 1),
activation=None,
use_bias=True,
kernel_initializer="glorot_uniform",
bias_initializer="zeros",
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs
)
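A sketch of the upsampling behavior (sizes are arbitrary): strides=2 with "same" padding doubles rows and columns while filters reduces the channel count.

import tensorflow as tf

x = tf.random.normal((4, 7, 7, 64))
y = tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding="same")(x)
print(y.shape)  # (4, 14, 14, 32)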
Used to instantiate a Keras tensor:
tf.keras.Input(
shape=None,
batch_size=None,
name=None,
dtype=None,
sparse=False,
tensor=None,
ragged=False,
**kwargs
)
Dense is the most common layer type: a layer that is fully connected to the previous layer.
tf.keras.layers.Dense(
units,
activation=None,
use_bias=True,
kernel_initializer="glorot_uniform",
bias_initializer="zeros",
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs
)
Add an activation function to the previous layer.
tf.keras.layers.Activation(activation, **kwargs)
THIS site does a great job of explaining what an embedding layer does: "an embedding tries to find the optimal mapping of each of the unique words to a vector of real numbers. The size of that vector is equal to output_dim". An embedding layer maps each integer index (e.g., a word ID from the vocabulary) to a dense feature vector.
Must be the first layer of a model.
tf.keras.layers.Embedding(
input_dim,
output_dim,
embeddings_initializer="uniform",
embeddings_regularizer=None,
activity_regularizer=None,
embeddings_constraint=None,
mask_zero=False,
input_length=None,
**kwargs
)
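A minimal sketch (vocabulary size and dimensions are illustrative):

import tensorflow as tf

# Maps each integer word ID in [0, 1000) to a 64-dimensional vector
embedding = tf.keras.layers.Embedding(input_dim=1000, output_dim=64)
ids = tf.constant([[4, 25, 7]])     # one sequence of three word IDs
print(embedding(ids).shape)         # (1, 3, 64)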
Flatten: flattens the input; commonly followed by a Dense layer later on.
Masking: used primarily in RNNs. Skips timesteps whose values equal the mask value, which is good for skipping padding when using an LSTM.
tf.keras.layers.Masking(mask_value=0.0, **kwargs)
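A sketch of masking padded timesteps before an LSTM (shapes are illustrative):

import tensorflow as tf

model = tf.keras.Sequential([
    # timesteps whose features are all 0.0 are skipped by downstream layers
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(5, 8)),
    tf.keras.layers.LSTM(16),
])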
Lambda wraps an arbitrary function as a layer:
tf.keras.layers.Lambda(
    function, output_shape=None, mask=None, arguments=None, **kwargs
)
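For example, wrapping a simple expression:

import tensorflow as tf

square = tf.keras.layers.Lambda(lambda x: x ** 2)
print(square(tf.constant([1.0, 2.0, 3.0])))  # [1. 4. 9.]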
Pooling layers are used to downsample. They are generally used with convolutional layers to reduce the size of the feature space.
Max pooling passes the maximum value within a window to the next layer. There are three pooling layer dimensions: MaxPooling1D, MaxPooling2D, and MaxPooling3D.
tf.keras.layers.MaxPooling2D(
pool_size=(2, 2),
strides=None,
padding="valid",
data_format=None,
**kwargs
)
With padding="valid": output_shape = (input_shape - pool_size + 1) / strides, rounded up.
With padding="same": output_shape = input_shape / strides, rounded up.
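Checking the "valid" formula on an arbitrary input (strides default to pool_size):

import tensorflow as tf

x = tf.random.normal((1, 28, 28, 3))
y = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
print(y.shape)  # (1, 14, 14, 3): (28 - 2 + 1) / 2 rounded up = 14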
Average pooling passes the average value within a window to the next layer. There are three pooling layer dimensions: AveragePooling1D, AveragePooling2D, and AveragePooling3D.
tf.keras.layers.AveragePooling2D(
pool_size=(2, 2), strides=None, padding="valid", data_format=None, **kwargs
)
Args same as MaxPooling
There are also GlobalMaxPooling and GlobalAveragePooling variants that don't use a window but pool over the entire input.
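A sketch of the global variant (shapes are illustrative):

import tensorflow as tf

x = tf.random.normal((1, 14, 14, 32))
y = tf.keras.layers.GlobalAveragePooling2D()(x)
print(y.shape)  # (1, 32): one value per channel, no window needed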
Constraints can be added to the weights of a layer. For example, a constraint might not allow negative weights or a constraint might limit the norm of a layer.
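For example (the constraint choices here are illustrative): MaxNorm caps the norm of the weights, and NonNeg forbids negative biases.

import tensorflow as tf

layer = tf.keras.layers.Dense(
    64,
    kernel_constraint=tf.keras.constraints.MaxNorm(max_value=2.0),
    bias_constraint=tf.keras.constraints.NonNeg(),
)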
Weight initializers initialize a layer's weights.
Weight regularizers penalize certain aspects of a layer's parameters during optimization (training).
Three common regularizers exist for most layer types:
kernel_regularizer: applies a regularization function to the weights matrix
bias_regularizer: applies a regularization function to the bias
activity_regularizer: applies a regularization function to the output of the layer
Good StackExchange post about the three regularizers.
There are three available regularizers:
tf.keras.regularizers.l1(l1=0.01): loss = l1 * reduce_sum(abs(x))
tf.keras.regularizers.l2(l2=0.01): loss = l2 * reduce_sum(square(x))
tf.keras.regularizers.l1_l2(l1=0.01, l2=0.01): loss = l1 * reduce_sum(abs(x)) + l2 * reduce_sum(square(x))
For example:
from tensorflow.keras import regularizers

layer = tf.keras.layers.Dense(
    units=64,
    kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
    bias_regularizer=regularizers.l2(1e-4),
    activity_regularizer=regularizers.l2(1e-5)
)
A model in Keras is just a group of layers.
A complete example that goes through creating, training, and evaluating a keras model:
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils import shuffle
import keras
# Load dataset
iris = load_iris()
data = iris.data
enc = LabelBinarizer()
target = enc.fit_transform(iris.target)
X, y = shuffle(data, target, random_state=0)
# Make model
inputs = keras.Input(shape=(4,))
x = keras.layers.Dense(5, activation='relu')(inputs)
outputs = keras.layers.Dense(3, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train
model.fit(x=X, y=y, batch_size=8, epochs=150, validation_split=0.3)
# Evaluate
loss, accuracy = model.evaluate(X, y)
# Predict
predictions = model.predict(X) # softmax gives us a probability of each category
The Model class requires two things: the inputs to a model and the outputs of a model. There is an optional name parameter.
There are two ways to instantiate a Model class:
Subclassing the Model class
Functional API method:
import tensorflow as tf
inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
Saving a model whole is super easy.
model.save('my_model')
This will create a directory called my_model with assets, saved_model.pb, and variables as contents. This only works if using tf.keras instead of native keras. See below if using native keras.
To load it again:
model = tf.keras.models.load_model('my_model')
Saving the model as a single HDF5 file is an option; however, some items are not saved, such as custom layers and external losses and metrics. These can be quite annoying to add back later if you get a model from someone else.
H5:
model.save('model.h5')
model = tf.keras.models.load_model('model.h5')
model.get_weights()  # returns the weights as a list of numpy arrays
model.set_weights(weights)  # sets the weights from a list of numpy arrays
model.save_weights('file_path.h5')  # saves only the weights to disk
model.load_weights('file_path.h5')  # loads weights into a model with the same architecture
model.to_json()  # returns the architecture (no weights) as a JSON string
model = tf.keras.models.model_from_json(config)  # rebuilds the architecture from JSON
new_model = tf.keras.models.clone_model(model)  # copies the architecture with newly initialized weights
The Sequential class allows you to add layers sequentially to a model.
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(16,))) # Add input layer that accepts a feature vector of length 16
model.add(tf.keras.layers.Dense(6)) # Adds a layer containing 6 neurons
Model.summary() can be used to summarize your model by outputting the layers:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 3)] 0
_________________________________________________________________
dense (Dense) (None, 4) 16
_________________________________________________________________
dense_1 (Dense) (None, 5) 25
=================================================================
Total params: 41
Trainable params: 41
Non-trainable params: 0
_________________________________________________________________
Keras training APIs involve compiling, fitting, evaluating, and predicting using a model.
compile prepares the model for training (does a lot of hidden stuff).
Model.compile(
optimizer="rmsprop",
loss=None,
metrics=None,
loss_weights=None,
weighted_metrics=None,
run_eagerly=None,
steps_per_execution=None,
**kwargs
)
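A minimal compile call for a single-output classifier (reusing the iris model from the complete example above) might look like:

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)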
['accuracy'] is the most common metric. loss_weights such as [10, 1] would weight the first loss function 10 times heavier than the second loss function.
Used to train a model:
Model.fit(
x=None,
y=None,
batch_size=None,
epochs=1,
verbose=1,
callbacks=None,
validation_split=0.0,
validation_data=None,
shuffle=True,
class_weight=None,
sample_weight=None,
initial_epoch=0,
steps_per_epoch=None,
validation_steps=None,
validation_batch_size=None,
validation_freq=1,
max_queue_size=10,
workers=1,
use_multiprocessing=False,
)
validation_data expects (x_val, y_val) format; do not use it together with validation_split.
fit returns a History object that can be used for plotting.
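A sketch of plotting that History (assumes the model, X, and y from the iris example above, plus matplotlib):

import matplotlib.pyplot as plt

history = model.fit(x=X, y=y, epochs=150, validation_split=0.3)
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="val loss")
plt.legend()
plt.show()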
Like fit, but without the training. Used to find the loss and metric values for the model.
Model.evaluate(
x=None,
y=None,
batch_size=None,
verbose=1,
sample_weight=None,
steps=None,
callbacks=None,
max_queue_size=10,
workers=1,
use_multiprocessing=False,
return_dict=False,
)
Used to generate predictions on input data. A batch is expected.
Model.predict(
x,
batch_size=None,
verbose=0,
steps=None,
callbacks=None,
max_queue_size=10,
workers=1,
use_multiprocessing=False,
)
A NumPy array of predictions is returned.
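To turn those softmax probabilities into class labels (continuing the iris example):

import numpy as np

predictions = model.predict(X)
labels = np.argmax(predictions, axis=1)  # index of the highest probability per row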