Geeks With Blogs

Josh Reuben

Keras QuickRef

Keras is a high-level neural networks API, written in Python, that runs on top of the deep learning framework TensorFlow (it can also run on Theano). In fact, tf.keras will be integrated directly into TensorFlow 1.2!
Here are my API notes:

Model API

load_weights(filepath, by_name)

Model Sequential / Functional APIs

compile(optimizer, loss, metrics, sample_weight_mode)
fit(x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight)
evaluate(x, y, batch_size, verbose, sample_weight)

predict(x, batch_size, verbose)
predict_classes(x, batch_size, verbose)
predict_proba(x, batch_size, verbose)

train_on_batch(x, y, class_weight, sample_weight)
test_on_batch(x, y, sample_weight)

fit_generator(generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe)
evaluate_generator(generator, val_samples, max_q_size, nb_worker, pickle_safe)
predict_generator(generator, val_samples, max_q_size, nb_worker, pickle_safe)

get_layer(name, index)
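The `validation_split` and `shuffle` arguments of `fit()` interact in a way that is easy to get wrong: per the Keras documentation, the validation data is taken from the *last* samples of `x` and `y`, before any shuffling, and only the training part is shuffled each epoch. The NumPy sketch below mimics that bookkeeping; `split_and_batch` is a hypothetical helper, not part of Keras.

```python
import numpy as np

def split_and_batch(x, y, validation_split=0.2, batch_size=32, shuffle=True, seed=0):
    """Hypothetical helper: mimic how fit() carves out validation data and batches."""
    # Validation samples come from the END of the arrays, before shuffling.
    split_at = int(len(x) * (1.0 - validation_split))
    x_train, x_val = x[:split_at], x[split_at:]
    y_train, y_val = y[:split_at], y[split_at:]

    # Only the training portion is shuffled (Keras reshuffles once per epoch).
    idx = np.arange(len(x_train))
    if shuffle:
        np.random.RandomState(seed).shuffle(idx)

    # Index batches of size batch_size; the last batch may be smaller.
    batches = [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]
    return (x_train, y_train), (x_val, y_val), batches

x = np.arange(100).reshape(100, 1).astype("float32")
y = (x[:, 0] % 2).astype("int32")
(train_x, train_y), (val_x, val_y), batches = split_and_batch(x, y, 0.2, 32)
print(len(train_x), len(val_x), [len(b) for b in batches])  # 80 20 [32, 32, 16]
```

So if your data is sorted (e.g. by class), shuffle it yourself before calling `fit(..., validation_split=...)`, otherwise the validation set will not be representative.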

Core Layers

  • Dense - vanilla fully connected NN layer. Shape: (nb_samples, input_dim) --> (nb_samples, output_dim). Args: output_dim, init, activation, weights, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, bias, input_dim (or input_shape)
  • Activation - applies an activation function to an output. Shape: TN --> TN (TN: a tensor of any shape; the output shape equals the input shape). Args: activation
  • Dropout - randomly sets a fraction p of input units to 0 at each update during training time, to reduce overfitting. Shape: TN --> TN. Args: p
  • SpatialDropout2D/3D - drops entire 2D/3D feature maps instead of individual elements, to counter the correlation between neighbouring pixels / voxels. Shape: (samples, rows, cols, [stacks,] channels) --> (samples, rows, cols, [stacks,] channels). Args: p
  • Flatten - flattens the input to 1D per sample. Shape: (nb_samples, D1, D2, D3) --> (nb_samples, D1xD2xD3). Args: none
  • Reshape - reshapes an output to a different factorization. Shape: e.g. (None, 3, 4) --> (None, 12) or (None, 2, 6). Args: target_shape
  • Permute - permutes the dimensions of the input; the output shape is the input shape with the dimensions re-ordered. Shape: e.g. (None, A, B) --> (None, B, A). Args: dims
  • RepeatVector - repeats the input n times. Shape: (nb_samples, features) --> (nb_samples, n, features). Args: n
  • Merge - merges a list of tensors into a single tensor. Shape: [TN] --> TN. Args: layers, mode, concat_axis, dot_axes, output_shape, output_mask, node_indices, tensor_indices, name
  • Lambda - wraps an arbitrary expression (e.g. a TensorFlow / backend function) as a layer. Shape: flexible. Args: function, output_shape, arguments
  • ActivityRegularization - passes its input through unchanged, but adds a penalty based on that activity to the cost function. Shape: TN --> TN. Args: l1, l2
  • Masking - uses mask_value to identify timesteps (along D1) to be skipped: a timestep whose features all equal mask_value is masked in all downstream layers. Shape: TN --> TN. Args: mask_value
  • Highway - densely connected highway layer: LSTM-style gating applied to a feed-forward network. Shape: (nb_samples, input_dim) --> (nb_samples, input_dim). Args: same as Dense (without output_dim) + transform_bias
  • MaxoutDense - takes the element-wise maximum of nb_feature linear Dense(input_dim, output_dim) layers, which lets it learn a convex, piecewise linear activation function over the inputs. Shape: (nb_samples, input_dim) --> (nb_samples, output_dim). Args: same as Dense + nb_feature
  • TimeDistributed - wrapper that applies a layer (e.g. Dense) to every temporal slice along D1. Shape: (nb_sample, time_dimension, input_dim) --> (nb_sample, time_dimension, output_dim). Args: layer (e.g. a Dense layer)
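To make the shape notation concrete, here is a NumPy sketch of what Dense, Flatten, Reshape, Permute and RepeatVector do to a batch. It only mirrors the shape arithmetic and the Dense formula output = activation(dot(input, W) + b); it is not Keras code, and the weights are random placeholders.

```python
import numpy as np

rng = np.random.RandomState(0)
batch = 5  # plays the role of nb_samples / None

# Dense(3, activation='relu'): (nb_samples, input_dim) --> (nb_samples, output_dim)
x = rng.randn(batch, 8)
W, b = rng.randn(8, 3), np.zeros(3)          # placeholder weights
dense_out = np.maximum(0.0, x.dot(W) + b)    # relu(dot(x, W) + b)

# Flatten: (nb_samples, D1, D2, D3) --> (nb_samples, D1*D2*D3)
t = rng.randn(batch, 2, 3, 4)
flat = t.reshape(batch, -1)

# Reshape(target_shape=(2, 6)): (None, 3, 4) --> (None, 2, 6); batch axis untouched
r = rng.randn(batch, 3, 4).reshape(batch, 2, 6)

# Permute(dims=(2, 1)): (None, A, B) --> (None, B, A); dims exclude the batch axis
p = rng.randn(batch, 3, 7).transpose(0, 2, 1)

# RepeatVector(n=4): (nb_samples, features) --> (nb_samples, n, features)
rep = np.repeat(dense_out[:, np.newaxis, :], 4, axis=1)

print(dense_out.shape, flat.shape, r.shape, p.shape, rep.shape)
# (5, 3) (5, 24) (5, 2, 6) (5, 7, 3) (5, 4, 3)
```

Note that Permute's dims are 1-indexed and never include the samples axis, which is why (2, 1) swaps the two non-batch axes.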

Convolutional Layers

  • Convolution1D - filters neighborhoods of 1D inputs. Shape: (samples, steps, input_dim) --> (samples, new_steps, nb_filter). Args: nb_filter, filter_length, init, activation, weights, border_mode, subsample_length, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, bias, input_dim, input_length
  • Convolution2D - filters neighborhoods of 2D inputs. Shape: (samples, rows, cols, channels) --> (samples, new_rows, new_cols, nb_filter). Args: like Convolution1D, with nb_row, nb_col instead of filter_length, plus subsample, dim_ordering
  • AtrousConvolution1D/2D - dilated convolution ("convolution with holes"). Shape: same as Convolution1D/2D. Args: same as Convolution1D/2D + atrous_rate
  • SeparableConvolution2D - first performs a depthwise spatial convolution on each input channel separately, then a pointwise convolution which mixes together the resulting output channels. Shape: same as Convolution2D. Args: same as Convolution2D + depth_multiplier, depthwise_regularizer, pointwise_regularizer, depthwise_constraint, pointwise_constraint
  • Deconvolution2D - transposed convolution: a transformation going in the opposite direction of a normal convolution (e.g. for upsampling). Shape: (samples, rows, cols, channels) --> (samples, new_rows, new_cols, nb_filter). Args: same as Convolution2D + output_shape
  • Convolution3D - filters neighborhoods of 3D inputs. Shape: (samples, conv_dim1, conv_dim2, conv_dim3, channels) --> (samples, new_conv_dim1, new_conv_dim2, new_conv_dim3, nb_filter). Args: like Convolution2D, with kernel_dim1, kernel_dim2, kernel_dim3 instead of nb_row, nb_col
  • Cropping1D/2D/3D - crops along the dimension(s). Shape: (samples, depth, [axes_to_crop]) --> (samples, depth, [cropped_axes]). Args: cropping, dim_ordering
  • UpSampling1D/2D/3D - repeats each step size times along the specified axes. Shape: (samples, [dims], channels) --> (samples, [upsampled_dims], channels). Args: size, dim_ordering
  • ZeroPadding1D/2D/3D - zero padding. Shape: (samples, [dims], channels) --> (samples, [padded_dims], channels). Args: padding, dim_ordering
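The new_steps / new_rows values above depend on border_mode, the stride (subsample) and, for the atrous layers, the dilation rate. The sketch below reproduces that arithmetic and checks it against a naive NumPy 1D convolution (implemented as cross-correlation, as deep learning libraries usually do). Both function names are illustrative, not Keras API, and only the 'valid' and 'same' border modes are covered.

```python
import numpy as np

def conv_output_length(length, filter_length, border_mode, stride=1, dilation=1):
    """Output length of a 1D convolution (illustrative helper)."""
    # A dilated ("atrous") filter spans filter_length + (filter_length - 1) * (dilation - 1) steps.
    span = filter_length + (filter_length - 1) * (dilation - 1)
    if border_mode == "same":
        out = length
    elif border_mode == "valid":
        out = length - span + 1
    else:
        raise ValueError("only 'valid' and 'same' are sketched here")
    return (out + stride - 1) // stride  # ceil(out / stride)

def conv1d_valid(x, kernel, stride=1, dilation=1):
    """Naive single-channel, single-filter 'valid' 1D convolution (cross-correlation)."""
    k = len(kernel)
    span = k + (k - 1) * (dilation - 1)
    starts = range(0, len(x) - span + 1, stride)
    # x[s:s + span:dilation] picks the k input steps the dilated filter touches.
    return np.array([np.dot(x[s:s + span:dilation], kernel) for s in starts])

x = np.arange(10, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])
y = conv1d_valid(x, kernel, stride=2, dilation=2)
print(len(y), conv_output_length(10, 3, "valid", stride=2, dilation=2))  # 3 3
```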

Pooling && Locally Connected

  • Max/AveragePooling1D/2D/3D - downscales by taking the max / average over each pooling window. Shape: (samples, [len_pool_dimN], channels) --> (samples, [pooled_dimN], channels). Args: pool_size, strides, border_mode, dim_ordering
  • GlobalMax/GlobalAveragePooling1D/2D - downscales by taking the max / average over all the temporal / spatial dimensions. Shape: (samples, [dimN], channels) --> (samples, channels). Args: dim_ordering
  • LocallyConnected1D/2D - works like ConvolutionxD, but weights are unshared: a different set of filters is applied at each patch. Args: like ConvolutionxD + subsample
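A NumPy sketch contrasting 'valid' max pooling (window 2, stride 2) with global max / average pooling on a (samples, steps, channels) tensor. Global pooling collapses the whole temporal axis, which is why its output is 2D. The helper name is hypothetical.

```python
import numpy as np

def max_pool_1d(x, pool_length=2, stride=2):
    """'valid' max pooling over axis 1 of a (samples, steps, channels) array."""
    steps = (x.shape[1] - pool_length) // stride + 1
    return np.stack([x[:, i * stride:i * stride + pool_length, :].max(axis=1)
                     for i in range(steps)], axis=1)

x = np.arange(2 * 6 * 3, dtype=float).reshape(2, 6, 3)   # (samples, steps, channels)
pooled = max_pool_1d(x)          # MaxPooling1D-style: steps downscaled by 2
global_max = x.max(axis=1)       # GlobalMaxPooling1D-style: (samples, channels)
global_avg = x.mean(axis=1)      # GlobalAveragePooling1D-style: (samples, channels)
print(pooled.shape, global_max.shape, global_avg.shape)  # (2, 3, 3) (2, 3) (2, 3)
```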

Recurrent Layers

  • Recurrent - abstract base class. Shape: (nb_samples, timesteps, input_dim) --> (nb_samples, timesteps, output_dim) if return_sequences, else (nb_samples, output_dim). Args: weights, return_sequences, go_backwards, stateful, unroll, consume_less, input_dim, input_length
  • SimpleRNN - fully connected RNN where the output is fed back as input. Shape: like Recurrent. Args: Recurrent + output_dim, init, inner_init, activation, W_regularizer, U_regularizer, b_regularizer, dropout_W, dropout_U
  • GRU - Gated Recurrent Unit. Shape: like Recurrent. Args: like SimpleRNN
  • LSTM - Long Short-Term Memory unit. Shape: like Recurrent. Args: like SimpleRNN
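A NumPy sketch of the SimpleRNN recurrence h_t = tanh(x_t W + h_(t-1) U + b), showing how return_sequences changes the output shape. W, U and b correspond to the W_/U_/b_ names in the regularizer arguments above; the values are random placeholders, and dropout, statefulness and go_backwards are left out.

```python
import numpy as np

def simple_rnn(x, W, U, b, return_sequences=False):
    """x: (nb_samples, timesteps, input_dim) -> output shaped per the Recurrent rule."""
    nb_samples, timesteps, _ = x.shape
    h = np.zeros((nb_samples, U.shape[0]))       # initial state of zeros
    outputs = []
    for t in range(timesteps):
        # The previous output h is fed back in through the recurrent weights U.
        h = np.tanh(x[:, t, :].dot(W) + h.dot(U) + b)
        outputs.append(h)
    return np.stack(outputs, axis=1) if return_sequences else h

rng = np.random.RandomState(1)
x = rng.randn(4, 7, 5)                                      # (nb_samples, timesteps, input_dim)
W, U, b = rng.randn(5, 3), rng.randn(3, 3), np.zeros(3)     # output_dim = 3
seq = simple_rnn(x, W, U, b, return_sequences=True)
last = simple_rnn(x, W, U, b)
print(seq.shape, last.shape)  # (4, 7, 3) (4, 3)
```

With return_sequences=False only the final step's output is returned, which is exactly the last slice of the full sequence.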

Embedding and Normalization

  • Embedding - turns positive integers (indexes) into dense vectors of fixed size. Shape: (nb_samples, sequence_length) --> (nb_samples, sequence_length, output_dim). Args: input_dim, output_dim, init, input_length, W_regularizer, activity_regularizer, W_constraint, mask_zero, weights, dropout
  • BatchNormalization - normalizes the activations of the previous layer at each batch (mean close to 0, standard deviation close to 1). Shape: TN --> TN. Args: epsilon, mode, axis, momentum, weights, beta_init, gamma_init, gamma_regularizer, beta_regularizer
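Two small NumPy sketches: an Embedding layer behaves like a trainable lookup table of shape (input_dim, output_dim) indexed by the integer inputs, and BatchNormalization (in its basic training-time form) standardizes each feature over the batch, then applies a learned scale gamma and shift beta. The running averages controlled by momentum and the other mode options are left out; epsilon=1e-3 is an assumed small constant.

```python
import numpy as np

rng = np.random.RandomState(2)

# Embedding(input_dim=10, output_dim=4): row i of the table is the vector for index i.
table = rng.randn(10, 4)
indexes = np.array([[1, 2, 0], [9, 9, 3]])     # (nb_samples, sequence_length)
embedded = table[indexes]                      # (nb_samples, sequence_length, output_dim)

def batch_norm(x, gamma, beta, epsilon=1e-3):
    """Training-time batch normalization over axis 0 (the batch)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + epsilon) + beta

acts = rng.randn(64, 3) * 5.0 + 10.0           # activations far from mean 0 / std 1
normed = batch_norm(acts, gamma=np.ones(3), beta=np.zeros(3))
print(embedded.shape)  # (2, 3, 4)
print(normed.mean(axis=0), normed.std(axis=0))
```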

Advanced Activations

  • LeakyReLU - ReLU that allows a small gradient when the unit is inactive: f(x) = alpha * x for x < 0, f(x) = x for x >= 0. Shape: TN --> TN. Args: alpha
  • PReLU - parametric ReLU, where alphas is a learned array: f(x) = alphas * x for x < 0, f(x) = x for x >= 0. Shape: TN --> TN. Args: init, weights
  • ELU - Exponential Linear Unit: f(x) = alpha * (exp(x) - 1.) for x < 0, f(x) = x for x >= 0. Shape: TN --> TN. Args: alpha
  • ParametricSoftplus - f(x) = alpha * log(1 + exp(beta * x)), with alpha and beta learned. Shape: TN --> TN. Args: alpha_init, beta_init
  • ThresholdedReLU - f(x) = x for x > theta, f(x) = 0 otherwise. Shape: TN --> TN. Args: theta
  • SReLU - S-shaped ReLU. Shape: TN --> TN. Args: t_left_init, a_left_init, t_right_init, a_right_init
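The formulas above written out in NumPy, as a quick way to see how the layers differ on negative inputs. The default parameter values here are illustrative; PReLU is the same as leaky_relu except that alpha becomes a learned per-unit array.

```python
import numpy as np

def leaky_relu(x, alpha=0.3):
    return np.where(x < 0, alpha * x, x)

def elu(x, alpha=1.0):
    return np.where(x < 0, alpha * (np.exp(x) - 1.0), x)

def thresholded_relu(x, theta=1.0):
    return np.where(x > theta, x, 0.0)

def parametric_softplus(x, alpha=0.2, beta=5.0):
    # log1p(exp(z)) == log(1 + exp(z)), computed more accurately for small exp(z).
    return alpha * np.log1p(np.exp(beta * x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(leaky_relu(x))
print(elu(x))
print(thresholded_relu(x))
print(parametric_softplus(x))
```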

Noise

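  • GaussianNoise - mitigates overfitting by smoothing: adds zero-centered Gaussian noise with standard deviation sigma (training time only). Shape: TN --> TN. Args: sigma
  • GaussianDropout - mitigates overfitting by smoothing: multiplies the input by one-centered Gaussian noise with standard deviation sqrt(p/(1-p)) (training time only). Shape: TN --> TN. Args: p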
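A NumPy sketch of the training-time behaviour of the two noise layers: GaussianNoise is additive and zero-centered, GaussianDropout is multiplicative and one-centered with standard deviation sqrt(p/(1-p)). At test time both layers pass their input through unchanged. The function names and seed handling are illustrative.

```python
import numpy as np

def gaussian_noise(x, sigma, rng):
    # Additive, zero-centered noise.
    return x + rng.normal(0.0, sigma, size=x.shape)

def gaussian_dropout(x, p, rng):
    # Multiplicative, one-centered noise with std sqrt(p / (1 - p)).
    return x * rng.normal(1.0, np.sqrt(p / (1.0 - p)), size=x.shape)

rng = np.random.RandomState(3)
x = np.ones((1000, 100))
noisy = gaussian_noise(x, sigma=0.1, rng=rng)
dropped = gaussian_dropout(x, p=0.5, rng=rng)   # p = 0.5 gives noise std 1.0
print(noisy.mean(), noisy.std(), dropped.mean(), dropped.std())
```

Because the multiplicative noise has mean 1, GaussianDropout keeps the expected activation unchanged, so no rescaling is needed at test time.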

Preprocessing

  • sequence.pad_sequences - turns a list of nb_samples scalar sequences into a 2D array of shape (nb_samples, nb_timesteps). Args: sequences, maxlen, dtype
  • sequence.skipgrams - turns a sequence of word indexes (ints) into a list of (word, word) couples, plus their labels. Args: sequence, vocabulary_size, window_size, negative_samples, shuffle, categorical, sampling_table
  • sequence.make_sampling_table - generates a word-rank-based sampling table of shape (size,) for skipgrams. Args: size, sampling_factor
  • text.text_to_word_sequence - splits a sentence into a list of words. Args: text, filters, lower, split
  • text.one_hot - encodes a text into a list of word indexes in a vocabulary of size n. Args: text, n, filters, lower, split
  • text.Tokenizer - vectorizes texts into lists of word indexes. Args: nb_words, filters, lower, split
  • image.ImageDataGenerator - generates batches of image tensors with real-time data augmentation. Args: featurewise_center, samplewise_center, featurewise_std_normalization, samplewise_std_normalization, zca_whitening, rotation_range, width_shift_range, height_shift_range, shear_range, zoom_range, channel_shift_range, fill_mode, cval, horizontal_flip, vertical_flip, rescale, dim_ordering
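A sketch of pad_sequences' default behaviour as I understand it: shorter sequences are left-padded with 0 and longer ones lose their first elements. This relies on three real pad_sequences arguments not listed above (padding, truncating and value, defaulting to 'pre', 'pre' and 0); `pad_sequences_sketch` itself is a hypothetical stand-in, not the Keras function.

```python
import numpy as np

def pad_sequences_sketch(sequences, maxlen=None, dtype="int32",
                         padding="pre", truncating="pre", value=0):
    """Hypothetical re-implementation of keras.preprocessing.sequence.pad_sequences."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    out = np.full((len(sequences), maxlen), value, dtype=dtype)
    for i, seq in enumerate(sequences):
        # Drop elements from the front ('pre') or the back ('post') if too long.
        if truncating == "pre":
            trunc = seq[max(0, len(seq) - maxlen):]
        else:
            trunc = seq[:maxlen]
        if not trunc:
            continue
        # Place the values at the end ('pre' padding) or at the start ('post' padding).
        if padding == "pre":
            out[i, maxlen - len(trunc):] = trunc
        else:
            out[i, :len(trunc)] = trunc
    return out

print(pad_sequences_sketch([[1, 2], [3, 4, 5, 6], [7]], maxlen=3))
# [[0 1 2]
#  [4 5 6]
#  [0 0 7]]
```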

Objectives (Loss Functions)

  • mean_squared_error / mse
  • mean_absolute_error / mae
  • mean_absolute_percentage_error / mape
  • mean_squared_logarithmic_error / msle
  • squared_hinge
  • hinge
  • binary_crossentropy (logloss)
  • categorical_crossentropy (multiclass logloss) - requires labels to be binary (one-hot) arrays of shape (nb_samples, nb_classes)
  • sparse_categorical_crossentropy - as above, but accepts sparse (integer) labels
  • kullback_leibler_divergence / kld - information gain from a predicted probability distribution Q to a true probability distribution P
  • poisson - mean of (predictions - targets * log(predictions))
  • cosine_proximity - negative mean cosine proximity between predictions and targets
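NumPy versions of several of these objectives, written from the definitions above: each returns one loss per sample by reducing over the last axis, and training minimizes their mean. The clipping constant EPS is an assumption that keeps the logarithms finite, standing in for the small epsilon the Keras backend uses for the same purpose.

```python
import numpy as np

EPS = 1e-7  # assumed clipping constant

def mse(y_true, y_pred):
    return np.mean(np.square(y_pred - y_true), axis=-1)

def binary_crossentropy(y_true, y_pred):
    y_pred = np.clip(y_pred, EPS, 1 - EPS)
    return np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)), axis=-1)

def categorical_crossentropy(y_true, y_pred):
    # y_true is one-hot with shape (nb_samples, nb_classes).
    y_pred = np.clip(y_pred, EPS, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

def sparse_categorical_crossentropy(labels, y_pred):
    # Same loss, but labels are integer class indexes instead of one-hot rows.
    one_hot = np.eye(y_pred.shape[-1])[labels]
    return categorical_crossentropy(one_hot, y_pred)

def kld(y_true, y_pred):
    y_true, y_pred = np.clip(y_true, EPS, 1), np.clip(y_pred, EPS, 1)
    return np.sum(y_true * np.log(y_true / y_pred), axis=-1)

def poisson(y_true, y_pred):
    return np.mean(y_pred - y_true * np.log(y_pred + EPS), axis=-1)

def hinge(y_true, y_pred):
    # y_true is expected to contain -1 / 1 labels.
    return np.mean(np.maximum(1.0 - y_true * y_pred, 0.0), axis=-1)

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
labels = np.array([0, 2])
print(sparse_categorical_crossentropy(labels, probs))  # equals [-log(0.7), -log(0.8)]
```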

Metrics

  • binary_accuracy - for binary classification
  • categorical_accuracy - for multiclass classification
  • sparse_categorical_accuracy
  • top_k_categorical_accuracy - counts a prediction as correct when the target class is within the top-k predictions
  • mean_squared_error (mse) - for regression
  • mean_absolute_error (mae)
  • mean_absolute_percentage_error (mape)
  • mean_squared_logarithmic_error (msle)
  • hinge - hinge loss: `max(1 - y_true * y_pred, 0)`
  • squared_hinge - squared hinge loss: `max(1 - y_true * y_pred, 0) ^ 2`
  • categorical_crossentropy - for multiclass classification
  • sparse_categorical_crossentropy
  • binary_crossentropy - for binary classification
  • kullback_leibler_divergence
  • poisson
  • cosine_proximity
  • matthews_correlation - for quality of binary classification
  • fbeta_score - weighted harmonic mean of precision and recall in multi-label classification
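A NumPy sketch of the three accuracy metrics, to show exactly what each one compares: binary_accuracy rounds the predictions, categorical_accuracy compares argmaxes, and top_k_categorical_accuracy checks whether the true class is among the k highest scores. These are illustrations of the definitions, not the Keras implementations.

```python
import numpy as np

def binary_accuracy(y_true, y_pred):
    # Predictions are rounded to 0 / 1 before comparison.
    return np.mean(y_true == np.round(y_pred))

def categorical_accuracy(y_true, y_pred):
    return np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))

def top_k_categorical_accuracy(y_true, y_pred, k=5):
    # Correct if the true class is among the k highest-scoring classes.
    top_k = np.argsort(y_pred, axis=-1)[:, -k:]
    true_class = np.argmax(y_true, axis=-1)
    return np.mean([t in row for t, row in zip(true_class, top_k)])

y_true = np.eye(4)[[0, 1, 2, 3]]                 # one-hot targets for classes 0..3
y_pred = np.array([[0.6, 0.2, 0.1, 0.1],
                   [0.5, 0.3, 0.1, 0.1],
                   [0.3, 0.1, 0.0, 0.6],
                   [0.1, 0.1, 0.2, 0.6]])
print(categorical_accuracy(y_true, y_pred), top_k_categorical_accuracy(y_true, y_pred, k=2))
# 0.5 0.75
```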

Optimizers

  • SGD - Stochastic gradient descent, with support for momentum, learning rate decay, and Nesterov momentum
  • RMSprop - usually a good choice for RNNs
  • Adagrad
  • Adadelta
  • Adamax
  • Adam
  • Nadam
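The SGD options (momentum, learning rate decay, Nesterov momentum) boil down to a short update rule. Below is a NumPy sketch of the common velocity formulation (velocity = momentum * velocity - lr * grad) with time-based decay lr / (1 + decay * iteration); treat it as an illustration rather than a line-for-line copy of Keras' source. `sgd_step` and its `iteration` argument are hypothetical.

```python
import numpy as np

def sgd_step(param, grad, velocity, lr=0.01, momentum=0.9, decay=0.0,
             iteration=0, nesterov=False):
    """One SGD update with momentum, time-based lr decay and optional Nesterov."""
    lr = lr / (1.0 + decay * iteration)           # learning rate decay
    velocity = momentum * velocity - lr * grad    # momentum accumulates past steps
    if nesterov:
        # Nesterov: step along the updated velocity, plus the current gradient step.
        param = param + momentum * velocity - lr * grad
    else:
        param = param + velocity
    return param, velocity

# Minimize f(w) = w^2 (gradient 2w), starting from w = 5.
w, v = np.array(5.0), np.array(0.0)
for i in range(200):
    w, v = sgd_step(w, 2 * w, v, lr=0.05, momentum=0.9, nesterov=True, iteration=i)
print(float(w))  # close to 0
```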

Activation Functions

  • softmax
  • softplus
  • softsign
  • relu
  • tanh
  • sigmoid
  • hard_sigmoid
  • linear
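A few of these activations written out in NumPy. The softmax subtracts the row maximum for numerical stability (which does not change the result). For hard_sigmoid, the slope 0.2 and offset 0.5 are, to my knowledge, what the Keras backend uses; treat them as an assumption.

```python
import numpy as np

def softmax(x):
    # Subtract the row max for numerical stability; each row sums to 1.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def softplus(x):
    return np.log1p(np.exp(x))

def softsign(x):
    return x / (1.0 + np.abs(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid(x):
    # Cheap piecewise-linear approximation of the sigmoid.
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(softmax(x).sum(), hard_sigmoid(x), sigmoid(x))
```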

Callbacks

  • Callback - abstract base class; hooks: on_epoch_begin, on_epoch_end, on_batch_begin, on_batch_end, on_train_begin, on_train_end
  • BaseLogger - accumulates epoch averages of the metrics being monitored
  • ProgbarLogger - writes metrics to stdout
  • History - records events into a History object (applied automatically to every model)
  • ModelCheckpoint - saves the model after every epoch, optionally keeping only the best one according to the monitored quantity. Args: filepath, monitor, verbose, save_best_only, save_weights_only, mode
  • EarlyStopping - stops training when a monitored quantity has stopped improving for patience epochs. Args: monitor, min_delta, patience, verbose, mode
  • RemoteMonitor - streams events to a server. Args: root, path, field
  • TensorBoard - writes a log for TensorBoard to visualize. Args: log_dir, histogram_freq, write_graph, write_images
  • ReduceLROnPlateau - reduces the learning rate when a metric has stopped improving. Args: monitor, factor, patience, verbose, mode, epsilon, cooldown, min_lr
  • CSVLogger - streams epoch results to a CSV file. Args: filename, separator, append
  • LambdaCallback - builds a custom callback from simple functions. Args: on_epoch_begin, on_epoch_end, on_batch_begin, on_batch_end, on_train_begin, on_train_end
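The interplay of monitor, min_delta and patience in EarlyStopping can be sketched in plain Python for a loss monitored in 'min' mode. This is a simplified illustration, not the Keras callback; in particular, the exact off-by-one handling of patience has differed between Keras versions. `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the index of the epoch at which training would stop, or None.

    An epoch counts as an improvement only if the monitored value drops
    more than min_delta below the best value seen so far.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0          # improvement: reset the patience counter
        else:
            wait += 1                     # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

history = [0.90, 0.70, 0.65, 0.66, 0.64, 0.67, 0.68]
print(early_stopping_epoch(history, patience=2))  # 6
```

Raising min_delta makes small improvements (like 0.65 to 0.64 above) not count, so training stops earlier.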

Init Functions

  • uniform
  • lecun_uniform
  • identity
  • orthogonal
  • zero
  • glorot_normal - Gaussian initialization scaled by fan_in + fan_out
  • glorot_uniform
  • he_uniform
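A NumPy sketch of the scaled initializers, for a weight matrix of shape (fan_in, fan_out) as in a Dense layer. The scale formulas (sqrt(2 / (fan_in + fan_out)) as the Glorot normal std, sqrt(6 / (fan_in + fan_out)) as the Glorot uniform limit, sqrt(6 / fan_in) for He uniform, sqrt(3 / fan_in) for LeCun uniform) are the standard ones and, as I understand it, match Keras 1.x; later Keras versions changed some details (e.g. truncated normals).

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def glorot_normal(fan_in, fan_out, rng):
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_uniform(fan_in, fan_out, rng):
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def lecun_uniform(fan_in, fan_out, rng):
    limit = np.sqrt(3.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.RandomState(4)
W = glorot_normal(300, 100, rng)
print(W.std(), np.sqrt(2.0 / 400))   # sample std vs. target std
```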

Regularizers

  • W_regularizer, b_regularizer (WeightRegularizer)
  • activity_regularizer (ActivityRegularizer)


  • l1 - LASSO
  • l2 - weight decay, Ridge
  • l1l2 - ElasticNet
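What these penalties add to the training objective, sketched in NumPy for a weight matrix. The 0.01 factors mirror the default strength I believe Keras 1.x used, and data_loss is a placeholder number; an activity_regularizer applies the same kind of penalty to a layer's outputs instead of its weights.

```python
import numpy as np

def l1(w, factor=0.01):
    return factor * np.sum(np.abs(w))

def l2(w, factor=0.01):
    return factor * np.sum(np.square(w))

def l1l2(w, l1_factor=0.01, l2_factor=0.01):
    # ElasticNet-style: both penalties added together.
    return l1(w, l1_factor) + l2(w, l2_factor)

W = np.array([[1.0, -2.0], [0.5, 0.0]])
data_loss = 0.25                     # placeholder for the unregularized loss
total_loss = data_loss + l1l2(W)     # penalty is added to the objective during training
print(total_loss)
```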

Constraints

  • W_constraint - for the main weights matrix
  • b_constraint - for the bias


  • maxnorm - maximum-norm
  • nonneg - non-negativity
  • unitnorm - unit-norm
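Constraints are applied to the weights after each update. A NumPy sketch of the three listed above, normalizing along axis 0 (the incoming weights of each unit in a Dense kernel). The axis, the max norm m=2 and the small EPS guard are assumptions for the illustration.

```python
import numpy as np

EPS = 1e-7  # assumed small constant guarding against division by zero

def max_norm(W, m=2.0, axis=0):
    # Rescale each weight vector whose L2 norm exceeds m down to norm m.
    norms = np.sqrt(np.sum(np.square(W), axis=axis, keepdims=True))
    desired = np.clip(norms, 0.0, m)
    return W * (desired / (EPS + norms))

def non_neg(W):
    # Zero out negative weights.
    return W * (W >= 0)

def unit_norm(W, axis=0):
    return W / (EPS + np.sqrt(np.sum(np.square(W), axis=axis, keepdims=True)))

W = np.array([[3.0, 0.3], [4.0, -0.4]])      # column norms: 5.0 and 0.5
print(np.linalg.norm(max_norm(W), axis=0))   # first column capped near 2, second kept near 0.5
```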

Tuning Hyper-Parameters:

  • batch size
  • number of epochs
  • training optimization algorithm
  • learning rate
  • momentum
  • network weight initialization
  • activation function
  • dropout regularization
  • number of neurons in a hidden layer
  • depth of hidden layers
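One simple way to explore the hyper-parameters listed above is an exhaustive grid search. The sketch below is framework-agnostic: `train_and_score` is a hypothetical placeholder that you would replace with code that builds, fits and evaluates a Keras model, and its dummy scoring rule exists only so the example runs.

```python
import itertools

# Hypothetical search space over a few of the hyper-parameters listed above.
search_space = {
    "batch_size": [16, 64],
    "nb_epoch": [10, 50],
    "optimizer": ["sgd", "adam"],
    "dropout_p": [0.0, 0.5],
}

def train_and_score(params):
    """Hypothetical stand-in: build, fit and evaluate a model, return a validation score.

    Replace this dummy rule with real model code; higher scores are better.
    """
    return -params["dropout_p"] + (0.1 if params["optimizer"] == "adam" else 0.0)

keys = sorted(search_space)
results = []
for values in itertools.product(*(search_space[k] for k in keys)):
    params = dict(zip(keys, values))
    results.append((train_and_score(params), params))

best_score, best_params = max(results, key=lambda r: r[0])
print(len(results), best_params)
```

The number of runs grows multiplicatively with each added hyper-parameter (16 here), so in practice it is common to tune a few at a time or to sample combinations randomly.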
Posted on Friday, March 17, 2017 3:26 PM Artificial Intelligence, TensorFlow

Copyright © JoshReuben