Keras QuickRef
Keras is a high-level neural networks API, written in Python, that runs on top of the deep learning framework TensorFlow. In fact,
tf.keras will be integrated directly into TensorFlow 1.2!
Here are my API notes:
Model API
summary()
get_config()
from_config(config)
get_weights()
set_weights(weights)
to_json()
to_yaml()
save_weights(filepath)
load_weights(filepath, by_name)
layers
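A minimal sketch of how these inspection / serialization methods fit together (assumes the Keras 1.x API used throughout these notes; the file name is made up):

```python
from keras.models import Sequential, model_from_json
from keras.layers import Dense

model = Sequential([Dense(32, input_dim=784, activation='relu'),
                    Dense(10, activation='softmax')])

model.summary()                      # layer-by-layer overview
config = model.get_config()          # architecture as a Python structure
json_string = model.to_json()        # architecture only, as JSON
weights = model.get_weights()        # list of numpy arrays
model.save_weights('weights.h5')     # weights only, as HDF5 (needs h5py)

# rebuild: architecture from JSON, then weights from HDF5
restored = model_from_json(json_string)
restored.load_weights('weights.h5')
```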
Sequential / Functional Model APIs
add(layer)
compile(optimizer, loss, metrics, sample_weight_mode)
fit(x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight)
evaluate(x, y, batch_size, verbose, sample_weight)
predict(x, batch_size, verbose)
predict_classes(x, batch_size, verbose)
predict_proba(x, batch_size, verbose)
train_on_batch(x, y, class_weight, sample_weight)
test_on_batch(x, y, class_weight)
predict_on_batch(x)
fit_generator(generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe)
evaluate_generator(generator, val_samples, max_q_size, nb_worker, pickle_safe)
predict_generator(generator, val_samples, max_q_size, nb_worker, pickle_safe)
get_layer(name, index)
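Putting the Sequential workflow together (a sketch with random placeholder data; note the Keras 1.x spelling nb_epoch):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout

# toy data: 20 features, binary target
X_train = np.random.random((1000, 20)); y_train = np.random.randint(2, size=(1000, 1))
X_test = np.random.random((200, 20));   y_test = np.random.randint(2, size=(200, 1))

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, nb_epoch=10,
          validation_split=0.1, shuffle=True, verbose=1)

loss, acc = model.evaluate(X_test, y_test, batch_size=32, verbose=0)
probs = model.predict(X_test, batch_size=32)             # probabilities
classes = model.predict_classes(X_test, batch_size=32)   # hard 0/1 labels (Sequential only)
```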
Layers
Core
Layer | description | IO | params |
---|---|---|---|
Dense | vanilla fully connected NN layer | (nb_samples, input_dim) --> (nb_samples, output_dim) | output_dim/shape, init, activation, weights, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, bias, input_dim/shape |
Activation | Applies an activation function to an output | TN --> TN | activation |
Dropout | randomly set fraction p of input units to 0 at each update during training time --> reduce overfitting | TN --> TN | p |
SpatialDropout2D/3D | dropout of entire 2D/3D feature maps to counter pixel / voxel proximity correlation | (samples, rows, cols, [stacks,] channels) --> (samples, rows, cols, [stacks,] channels) | p |
Flatten | Flattens the input to 1D | (nb_samples, D1, D2, D3) --> (nb_samples, D1xD2xD3) | - |
Reshape | Reshapes an output to a different factorization | eg (None, 3, 4) --> (None, 12) or (None, 2, 6) | target_shape |
Permute | Permutes dimensions of input - output_shape is same as the input shape, but with the dimensions re-ordered | eg (None, A, B) --> (None, B, A) | dims |
RepeatVector | Repeats the input n times | (nb_samples, features) --> (nb_samples, n, features) | n |
Merge | merge a list of tensors into a single tensor | [TN] --> TN | layers, mode, concat_axis, dot_axes, output_shape, output_mask, node_indices, tensor_indices, name |
Lambda | wraps an arbitrary TensorFlow expression as a layer | flexible | function, output_shape, arguments |
ActivityRegularization | regularize the cost function | TN --> TN | l1, l2 |
Masking | identify timesteps in D1 to be skipped | TN --> TN | mask_value |
Highway | densely connected highway network layer - applies LSTM-style gating to a feed-forward layer | (nb_samples, input_dim) --> (nb_samples, output_dim) | same as Dense + transform_bias |
MaxoutDense | takes the element-wise maximum of nb_feature linear layers - learns a convex, piecewise-linear activation function over the inputs | (nb_samples, input_dim) --> (nb_samples, output_dim) | same as Dense + nb_feature |
TimeDistributed | applies a layer (e.g. Dense) to every temporal slice (D1) of the input | (nb_sample, time_dimension, input_dim) --> (nb_sample, time_dimension, output_dim) | layer |
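A quick functional-API sketch wiring a few of these core layers together (shapes are arbitrary; assumes Keras 1.x, where Model takes input=/output=):

```python
from keras.models import Model
from keras.layers import Input, Dense, Dropout, RepeatVector, TimeDistributed

x = Input(shape=(100,))                 # (nb_samples, 100)
h = Dense(64, activation='relu')(x)     # fully connected
h = Dropout(0.5)(h)                     # zero out half the units during training
h = RepeatVector(10)(h)                 # (nb_samples, 10, 64)
y = TimeDistributed(Dense(1))(h)        # the same Dense applied to each of the 10 steps
model = Model(input=x, output=y)        # output shape (nb_samples, 10, 1)
```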
Convolutional
Layer | description | IO | params |
---|---|---|---|
Convolution1D | filter neighborhoods of 1D inputs | (samples, steps, input_dim) --> (samples, new_steps, nb_filter) | nb_filter, filter_length, init, activation, weights, border_mode, subsample_length, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, bias, input_dim, input_length |
Convolution2D | filter neighborhoods of 2D inputs | (samples, rows, cols, channels) --> (samples, new_rows, new_cols, nb_filter) | like Convolution1D + nb_row, nb_col instead of filter_length , subsample, dim_ordering |
AtrousConvolution1/2D | dilated convolution with holes | same as Convolution2D | same as Convolution1/2D + atrous_rate |
SeparableConvolution2D | first performs a depthwise spatial convolution on each input channel separately, then a pointwise convolution that mixes the resulting output channels | same as Convolution2D | same as Convolution2D + depth_multiplier, depthwise_regularizer, pointwise_regularizer, depthwise_constraint, pointwise_constraint |
Deconvolution2D | transposed convolution - goes in the opposite spatial direction of a normal Convolution2D (e.g. for upsampling) | | |
Convolution3D | filter neighborhoods of 3D inputs | (samples, conv_dim1, conv_dim2, conv_dim3, channels) --> (samples, new_conv_dim1, new_conv_dim2, new_conv_dim3, nb_filter) | kernel_dim1, kernel_dim2, kernel_dim3 |
Cropping1D/2D/3D | crops along the dimension(s) | (samples, depth, [axes_to_crop]) --> (samples, depth, [cropped_axes]) | cropping, dim_ordering |
UpSampling1D/2D/3D | repeat each step x times along the specified axes | (samples, [dims], channels) --> (samples, [upsampled_dims], channels) | size, dim_ordering |
ZeroPadding1/2/3D | zero padding | (samples, [dims], channels) --> (samples, [padded_dims], channels) | padding, dim_ordering |
Pooling & Locally Connected
Layer | description | IO | params |
---|---|---|---|
Max/AveragePooling1/2/3D | downscale to max / average | (samples, [len_pool_dimN], channels) -->(samples, [pooled_dimN], channels) | pool_size, strides, border_mode, dim_ordering |
GlobalMax/GlobalAveragePooling1/2D | collapse the whole spatial extent to its max / average | (samples, [len_dimN], channels) --> (samples, channels) | dim_ordering |
LocallyConnected1D/2D | like ConvolutionxD but weights are unshared - a different set of filters is applied at each patch | like ConvolutionxD | like ConvolutionxD + subsample |
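Convolution and pooling layers usually appear together; a small image-classifier sketch (Keras 1.x positional args nb_filter, nb_row, nb_col and 'tf' dim_ordering, i.e. channels last, are assumed):

```python
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Convolution2D(32, 3, 3, activation='relu', border_mode='same',
                        input_shape=(32, 32, 3)))      # 32 filters of size 3x3
model.add(MaxPooling2D(pool_size=(2, 2)))              # halve rows/cols
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())                                   # (samples, rows*cols*nb_filter)
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```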
Recurrent
Layer | description | IO | params |
---|---|---|---|
Recurrent | abstract base class | (nb_samples, timesteps, input_dim) --> (return_sequences)?(nb_samples, timesteps, output_dim):(nb_samples, output_dim) | weights, return_sequences, go_backwards, stateful, unroll, consume_less, input_dim, input_length |
SimpleRNN | Fully-connected RNN where output is fed back as input | like Recurrent | Recurrent + output_dim, init, inner_init, activation, W_regularizer, U_regularizer, b_regularizer, dropout_W, dropout_U |
GRU | Gated Recurrent Unit | like Recurrent | like SimpleRNN |
LSTM | Long Short-Term Memory unit | like Recurrent | like SimpleRNN |
Misc
Layer | description | IO | params |
---|---|---|---|
Embedding | Turn positive integers (indexes) into dense vectors of fixed size | (nb_samples, sequence_length) --> (nb_samples, sequence_length, output_dim) | input_dim, output_dim, init, input_length, W_regularizer, activity_regularizer, W_constraint, mask_zero, weights, dropout |
BatchNormalization | at each batch, normalize activations of previous layer (mean:0, sd: 1) | TN --> TN | epsilon, mode, axis, momentum, weights, beta_init, gamma_init, gamma_regularizer, beta_regularizer |
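Embedding usually feeds a recurrent layer; a sequence-classification sketch using the Keras 1.x dropout_W/dropout_U arguments (vocabulary size and sequence length are made up):

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=20000, output_dim=128, input_length=80))  # word indexes -> 128-d vectors
model.add(LSTM(64, dropout_W=0.2, dropout_U=0.2))                       # (nb_samples, 64)
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```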
Advanced Activations
Layer | description | IO | params |
---|---|---|---|
LeakyReLU | ReLU that allows a small gradient when unit is inactive: f(x) = alpha * x for x < 0, f(x) = x for x >= 0 | TN --> TN | alpha |
PReLU | Parametric ReLU - gradient is a learned array: f(x) = alphas * x for x < 0, f(x) = x for x >= 0 | TN --> TN | init, weights |
ELU | Exponential Linear Unit: f(x) = alpha * (exp(x) - 1.) for x < 0, f(x) = x for x >= 0 | TN --> TN | alpha |
ParametricSoftplus | alpha * log(1 + exp(beta * x)) | TN --> TN | alpha, beta |
ThresholdedReLU | f(x) = x for x > theta, f(x) = 0 otherwise | TN --> TN | theta |
SReLU | S-shaped ReLU | TN --> TN | t_left_init, a_left_init, t_right_init, a_right_init |
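These are layers, not activation strings: add them after a layer that has no (i.e. linear) activation. A minimal sketch:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.layers.advanced_activations import LeakyReLU, PReLU

model = Sequential()
model.add(Dense(64, input_dim=20))   # linear output...
model.add(LeakyReLU(alpha=0.1))      # ...then a small slope for x < 0
model.add(Dense(64))
model.add(PReLU())                   # slope for x < 0 is learned per unit
```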
Noise
Layer | description | IO | params |
---|---|---|---|
GaussianNoise | mitigate overfitting by smoothing: 0-centered Gaussian noise with standard deviation sigma | TN --> TN | sigma |
GaussianDropout | mitigate overfitting by smoothing: 0-centered Gaussian noise with standard deviation sqrt(p/(1-p)) | TN --> TN | p |
Preprocessing
type | name | transform | params |
---|---|---|---|
sequence | pad_sequences | list of nb_samples scalar sequences --> 2D array of shape (nb_samples, nb_timesteps) | sequences, maxlen, dtype |
| skipgrams | word index list of int --> list of (word,word) | sequence, vocabulary_size, window_size, negative_samples, shuffle, categorical, sampling_table |
| make_sampling_table | generate word index array of shape (size,) for skipgrams | size, sampling_factor |
text | text_to_word_sequence | sentence --> list of words | text, filters, lower, split |
| one_hot | text --> list of n word indexes | text, n, filters, lower, split |
| Tokenizer | text --> list of word indexes | nb_words, filters, lower, split |
image | ImageDataGenerator | batches of image tensors | featurewise_center, samplewise_center, featurewise_std_normalization, samplewise_std_normalization,zca_whitening, rotation_range,width_shift_range, height_shift_range,shear_range,zoom_range,channel_shift_range, fill_mode, cval, horizontal_flip, vertical_flip, rescale, dim_ordering |
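A sketch of the text and image helpers (toy data; 'tf' dim_ordering assumed for the image tensor):

```python
import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.image import ImageDataGenerator

texts = ['the cat sat on the mat', 'the dog ate my homework']
tok = Tokenizer(nb_words=1000)
tok.fit_on_texts(texts)
seqs = tok.texts_to_sequences(texts)     # lists of word indexes
X_text = pad_sequences(seqs, maxlen=10)  # (nb_samples, 10), zero-padded on the left

X_img = np.random.random((100, 32, 32, 3))             # toy image batch
y_img = np.random.randint(10, size=(100,))
datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)
batches = datagen.flow(X_img, y_img, batch_size=32)    # yields augmented batches for fit_generator
```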
Objectives (Loss Functions)
- mean_squared_error / mse
- mean_absolute_error / mae
- mean_absolute_percentage_error / mape
- mean_squared_logarithmic_error / msle
- squared_hinge
- hinge
- binary_crossentropy (logloss)
- categorical_crossentropy (multiclass logloss) - requires labels to be binary arrays of shape (nb_samples, nb_classes)
- sparse_categorical_crossentropy - as above, but accepts sparse (integer) labels
- kullback_leibler_divergence / kld - information gain from a predicted probability distribution Q to a true probability distribution P
- poisson - mean of (predictions - targets * log(predictions))
- cosine_proximity - negative mean cosine proximity between predictions and targets
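Note that categorical_crossentropy needs one-hot labels; to_categorical (from keras.utils.np_utils) does the conversion, while sparse_categorical_crossentropy takes the integer labels directly:

```python
import numpy as np
from keras.utils.np_utils import to_categorical

y = np.array([0, 2, 1, 2])          # integer class labels
y_onehot = to_categorical(y, 3)     # (nb_samples, nb_classes) binary array
# use y_onehot with loss='categorical_crossentropy',
# or the raw integer y with loss='sparse_categorical_crossentropy'
```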
Metrics
- binary_accuracy - for binary classification
- categorical_accuracy - for multiclass classification
- sparse_categorical_accuracy
- top_k_categorical_accuracy - when the target class is within the top-k predictions provided
- mean_squared_error (mse) - for regression
- mean_absolute_error (mae)
- mean_absolute_percentage_error (mape)
- mean_squared_logarithmic_error (msle)
- hinge - hinge loss: `max(1 - y_true * y_pred, 0)`
- squared_hinge - squared hinge loss
- categorical_crossentropy - for multiclass classification
- sparse_categorical_crossentropy
- binary_crossentropy - for binary classification
- kullback_leibler_divergence
- poisson
- cosine_proximity
- matthews_correlation - for quality of binary classification
- fbeta_score - weighted harmonic mean of precision and recall in multi-label classification
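Metrics are passed to compile() by name (or as functions from keras.metrics); a sketch, assuming the metric names above exist in your Keras version:

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(10, input_dim=20, activation='softmax')])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy', 'top_k_categorical_accuracy', 'fbeta_score'])
```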
Optimizers
- SGD - Stochastic gradient descent, with support for momentum, learning rate decay, and Nesterov momentum
- RMSprop - good for RNNs
- Adagrad
- Adadelta
- Adamax
- Adam
- Nadam
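Passing an optimizer by name uses its defaults; instantiating it exposes the knobs (a sketch):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

model = Sequential([Dense(10, input_dim=20, activation='softmax')])
sgd = SGD(lr=0.01, momentum=0.9, decay=1e-6, nesterov=True)   # tuned SGD instead of optimizer='sgd'
model.compile(optimizer=sgd, loss='categorical_crossentropy')
```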
Activation Functions
- softmax
- softplus
- softsign
- relu
- tanh
- sigmoid
- hard_sigmoid
- linear
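These can be given by name in a layer's activation argument, or applied with a standalone Activation layer:

```python
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))  # by name, inside the layer
model.add(Dense(10))
model.add(Activation('softmax'))                       # or as a separate layer
```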
Callbacks
name | description | params |
---|---|---|
Callback | abstract base class - hooks: on_epoch_end, on_batch_begin, on_batch_end | |
BaseLogger | accumulates epoch averages of metrics being monitored | |
ProgbarLogger | writes to stdout | |
History | records events into a History object (automatic) | |
ModelCheckpoint | Save model after every epoch, according to monitored quantity | filepath, monitor, verbose, save_best_only, save_weights_only, mode |
EarlyStopping | stop training when a monitored quantity has stopped improving after patience | monitor, min_delta, patience, verbose, mode |
RemoteMonitor | stream events to a server | root, path, field |
LearningRateScheduler | set the learning rate at the start of each epoch from a user-supplied schedule function (epoch index --> learning rate) | schedule |
TensorBoard | write a log for TensorBoard to visualize | log_dir, histogram_freq, write_graph, write_images |
ReduceLROnPlateau | Reduce learning rate when a metric has stopped improving | monitor, factor, patience, verbose, mode, epsilon, cooldown, min_lr |
CSVLogger | stream epoch results to a csv file | filename, separator, append |
LambdaCallback | custom callback | on_epoch_begin, on_epoch_end, on_batch_begin, on_batch_end, on_train_begin, on_train_end |
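Callbacks are passed as a list to fit(); a sketch combining checkpointing, early stopping, and learning-rate reduction (toy data, hypothetical file name):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

X = np.random.random((500, 20)); y = np.random.randint(2, size=(500, 1))
model = Sequential([Dense(16, input_dim=20, activation='relu'),
                    Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy')

callbacks = [
    ModelCheckpoint('best.h5', monitor='val_loss', save_best_only=True),  # keep the best epoch (needs h5py)
    EarlyStopping(monitor='val_loss', patience=3),                        # stop once val_loss stalls
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2),        # halve the learning rate on plateau
]
model.fit(X, y, nb_epoch=50, batch_size=32, validation_split=0.1, callbacks=callbacks)
```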
Init Functions
- uniform
- lecun_uniform
- identity
- orthogonal
- zero
- glorot_normal - Gaussian initialization scaled by fan_in + fan_out
- glorot_uniform
- he_uniform
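Initializations are passed by name to a layer's init argument (Keras 1.x):

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, input_dim=20, init='he_uniform', activation='relu'))
model.add(Dense(10, init='glorot_normal', activation='softmax'))
```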
Regularizers
arguments
- W_regularizer, b_regularizer (WeightRegularizer)
- activity_regularizer (ActivityRegularizer)
penalties:
- l1 - LASSO
- l2 - weight decay, Ridge
- l1l2 - ElasticNet
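A sketch of attaching penalties to a layer with the Keras 1.x W_regularizer argument:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l2, l1l2

model = Sequential()
model.add(Dense(64, input_dim=20, W_regularizer=l2(0.01)))      # weight decay on W
model.add(Dense(10, W_regularizer=l1l2(l1=0.001, l2=0.001)))    # ElasticNet-style penalty
```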
Constraints
arguments
- W_constraint - for the main weights matrix
- b_constraint - for the bias
constraints
- maxnorm - maximum-norm
- nonneg - non-negativity
- unitnorm - unit-norm
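Constraints are attached per layer the same way (a sketch):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.constraints import maxnorm, nonneg

model = Sequential()
model.add(Dense(64, input_dim=20, W_constraint=maxnorm(2)))  # cap each weight vector's norm at 2
model.add(Dense(10, b_constraint=nonneg()))                  # keep biases non-negative
```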
Tuning Hyper-Parameters:
- batch size
- number of epochs
- training optimization algorithm
- learning rate
- momentum
- network weight initialization
- activation function
- dropout regularization
- number of neurons in a hidden layer
- depth of hidden layers
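One common way to search these is to wrap the model-building function in KerasClassifier and hand it to scikit-learn's GridSearchCV (a sketch with toy data and made-up parameter grids):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV   # sklearn.grid_search on very old scikit-learn

def create_model(optimizer='adam', neurons=16):
    model = Sequential()
    model.add(Dense(neurons, input_dim=20, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

X = np.random.random((200, 20)); y = np.random.randint(2, size=(200,))
clf = KerasClassifier(build_fn=create_model, verbose=0)
param_grid = {'optimizer': ['sgd', 'adam'],   # training algorithm
              'neurons': [16, 32],            # hidden layer width
              'batch_size': [16, 32],
              'nb_epoch': [10, 20]}
grid = GridSearchCV(estimator=clf, param_grid=param_grid)
print(grid.fit(X, y).best_params_)
```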