
Even advanced frameworks cannot keep up with every innovation. For example, you cannot use a Swish-based activation function in Keras out of the box today. Support might land in an upcoming release, but until the related patch is merged you would have to fall back to another activation function. So, this post will guide you through plugging a custom activation function such as Swish or E-Swish into Keras and TensorFlow yourself.

Code wins arguments
All you need to do is create your custom activation function. In this case, I'll use swish, which is x times sigmoid(x). Then, I'll include it in a convolutional neural network model.
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def swish(x):
	beta = 1.5 #1, 1.5 or 2
	return beta * x * keras.backend.sigmoid(x)

model = Sequential()

#1st convolution layer. 32 is the number of filters and (3, 3) is the size of the filter.
model.add(Conv2D(32, (3, 3), activation = swish, input_shape=(28,28,1)))
model.add(MaxPooling2D(pool_size=(2,2)))

#2nd convolution layer. apply 64 filters sized of (3x3).
model.add(Conv2D(64, (3, 3), activation = swish))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())

#fully connected layer. 1 hidden layer consisting of 512 nodes.
model.add(Dense(512, activation = swish))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy'
	, optimizer=keras.optimizers.Adam()
	, metrics=['accuracy']
)

#fit expects plain arrays; fit_generator would require a generator instead
model.fit(x_train, y_train
	, epochs=epochs
	, validation_data=(x_test, y_test)
)
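Passing the Python function object directly, as above, is enough. Optionally, you can also register the function under a string name so that layers accept it just like a built-in activation. A minimal sketch, assuming Keras 2.x where get_custom_objects is exposed in keras.utils.generic_utils:

from keras.utils.generic_utils import get_custom_objects

#register swish under a string name, like 'relu' or 'sigmoid'
get_custom_objects().update({'swish': swish})

#now layers can reference the custom function by name
model.add(Dense(512, activation='swish'))

The same mapping also helps later when reloading a saved model, because load_model accepts a custom_objects dictionary to resolve the name.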
Ok, but how?
Remember that we use this activation function in the feed-forward step, whereas we need its derivative in backpropagation. We only define the activation function itself; we never provide its derivative. That's the power of TensorFlow: the framework knows how to differentiate it for backpropagation. This works because the function is built from the keras.backend module. If you designed swish without keras.backend (say, with plain numpy), fitting would fail because the framework could not trace the gradient.
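You can verify this yourself by asking the backend for the derivative of swish symbolically. A minimal sketch, assuming a Keras 2.x / TensorFlow 1.x style backend where K.placeholder, K.gradients and K.function are available (on TensorFlow 2 you would reach for tf.GradientTape instead); it reuses the swish function defined above:

import numpy as np
import keras.backend as K

x = K.placeholder(shape=(None,))
y = swish(x)

#the framework derives d(swish)/dx for us; we never wrote the derivative
grad = K.gradients(y, x)[0]
get_grad = K.function([x], [grad])

#at x = 0 this should print beta * 0.5 = 0.75 for beta = 1.5
print(get_grad([np.array([-1.0, 0.0, 1.0])])[0])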
To sum up
So, we've covered how to include a new activation function in the learning process of the Keras / TensorFlow pair. Picking the most convenient activation function is still an open question for scientists, just like the network structure (number of hidden layers, number of nodes in the hidden layers) and the learning parameters (learning rate, number of epochs). Now, you can design your own activation function, or consume any newly introduced one, in just the same way.
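For instance, the original Swish paper places the β parameter inside the sigmoid, f(x) = x · sigmoid(βx), whereas E-Swish scales the output instead, f(x) = β · x · sigmoid(x). Both plug into a model exactly like the swish function above. A minimal sketch; the function names are mine, and the beta values follow the 1 to 2 range mentioned in the code comment earlier:

import keras.backend as K

def parametric_swish(x, beta = 1.0):
	#Swish: f(x) = x * sigmoid(beta * x)
	return x * K.sigmoid(beta * x)

def e_swish(x, beta = 1.5):
	#E-Swish: f(x) = beta * x * sigmoid(x)
	return beta * x * K.sigmoid(x)

model.add(Dense(512, activation = e_swish))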

My friend and colleague Giray inspired me to produce this post. I am grateful to him, as usual.
Let’s dance
These are the dance moves of the most common activation functions in deep learning. Be sure to turn the volume up 🙂
Support this blog if you like it!