Activation functions are the decision-making units of neural networks. They calculate the net output of a neural node. The Heaviside step function is one of the most common activation functions in neural networks. It produces a binary output, which is why it is also called the binary step function. The function produces 1 (or true) when the input passes the threshold, and 0 (or false) when it does not. That’s why it is very useful for binary classification studies.
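The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the name `heaviside_step` and the default threshold of 0 are assumptions, and so is the choice to return 0 when the input is exactly equal to the threshold (conventions differ on that point).

```python
def heaviside_step(x, threshold=0.0):
    """Return 1 when x passes the threshold, otherwise 0.

    Assumption: an input exactly equal to the threshold does not
    "pass" it, so it maps to 0. Other conventions exist.
    """
    return 1 if x > threshold else 0


# Try a few inputs on both sides of the threshold.
for value in (-2.0, -0.1, 0.3, 5.0):
    print(value, heaviside_step(value))
```

Whatever the size of the input, the output is only ever 0 or 1, which is what makes the function a natural fit for yes/no decisions.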
Human reflexes act on the same principle. A person withdraws his hand when he touches a hot surface, because his sensory neuron detects the high temperature and fires. Passing the threshold triggers a response, and the withdrawal reflex is performed. You can think of a true output as the neuron firing.
Every logic function can be implemented by neural networks. The step function is therefore commonly used in primitive neural networks without a hidden layer, widely known as single-layer perceptrons. This type of network can classify linearly separable problems such as the AND gate or the OR gate. In other words, the two classes (0 and 1) can be separated by a single straight line, as illustrated below.
Suppose the threshold value is chosen as 0. Then the following single-layer neural network models satisfy these logic functions.
Neural networks with hidden layers (aka multilayer perceptrons) can classify non-linearly separable problems. Such a problem, for example the XOR gate, can be solved by combining a few perceptrons into a multilayer network that uses the step function.
The XOR problem is not linearly separable, as illustrated above. In other words, the two classes (0 and 1) cannot be separated by a single straight line. In this case, a multilayer network model can satisfy the XOR gate results.
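As a rough sketch of such single-layer models, the code below wires up one perceptron per gate with a threshold of 0. The weights and biases (1, 1 with bias -1.5 for AND, and 1, 1 with bias -0.5 for OR) are hand-picked assumptions for illustration; many other values would work equally well, and they are not necessarily the ones used in the original figures.

```python
def step(x, threshold=0.0):
    """Binary step activation: 1 if x passes the threshold, else 0."""
    return 1 if x > threshold else 0


def perceptron(inputs, weights, bias):
    """Single-layer perceptron: weighted sum plus bias, then step."""
    weighted_sum = bias + sum(w * x for w, x in zip(weights, inputs))
    return step(weighted_sum)


# Hand-picked (assumed) parameters, not learned from data.
AND_MODEL = {"weights": (1.0, 1.0), "bias": -1.5}
OR_MODEL = {"weights": (1.0, 1.0), "bias": -0.5}


def and_gate(x1, x2):
    return perceptron((x1, x2), **AND_MODEL)


def or_gate(x1, x2):
    return perceptron((x1, x2), **OR_MODEL)


# Print the truth tables of both gates.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", and_gate(x1, x2), "OR:", or_gate(x1, x2))
```

The only difference between the two gates is the bias, which shifts the separating line: AND needs both inputs to push the sum over 0, while OR needs just one.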
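One possible multilayer model for XOR is sketched below. It assumes a hidden layer with two step-activated nodes, one behaving like OR and one like NAND, followed by an output node behaving like AND. All weights and biases are hand-picked assumptions for illustration, not the result of training and not necessarily the design from the original figure.

```python
def step(x, threshold=0.0):
    """Binary step activation: 1 if x passes the threshold, else 0."""
    return 1 if x > threshold else 0


def xor_gate(x1, x2):
    """Two-layer network with step activations (weights are assumed)."""
    # Hidden node 1 acts like OR: fires when at least one input is 1.
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)
    # Hidden node 2 acts like NAND: fires unless both inputs are 1.
    h2 = step(-1.0 * x1 - 1.0 * x2 + 1.5)
    # Output node acts like AND over the two hidden nodes.
    return step(1.0 * h1 + 1.0 * h2 - 1.5)


# Print the XOR truth table.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "XOR:", xor_gate(x1, x2))
```

Each hidden node draws one straight line in the input plane, and the output node keeps only the region between the two lines, which is something a single perceptron cannot do.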
However, the step function is not perfect. The previous network models are manually designed samples. In the real world, the backpropagation algorithm is run to train multilayer neural networks (that is, to update the weights). It requires a differentiable activation function, because the algorithm uses the derivative of the activation function as a multiplier (this is covered in the post Math behind backpropagation). But the derivative of the step function is 0 everywhere except at the threshold, where it is undefined. This means every weight update is multiplied by 0, so gradient descent cannot make progress in updating the weights and backpropagation fails. That’s why the sigmoid function and the hyperbolic tangent function are common activation functions in practice: they are differentiable and their derivatives are easy to compute.
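The problem can be made concrete with a small numerical check. The sketch below estimates derivatives with a central difference (an assumption made for illustration; backpropagation itself uses analytic derivatives) and compares the step function against the sigmoid at a few arbitrarily chosen points away from the threshold.

```python
import math


def step(x):
    """Binary step activation with threshold 0."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def numerical_derivative(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Sample points on both sides of the threshold (arbitrary choices).
points = (-2.0, -0.5, 0.5, 2.0)

step_grads = [numerical_derivative(step, x) for x in points]
sigmoid_grads = [numerical_derivative(sigmoid, x) for x in points]

print("step   :", step_grads)
print("sigmoid:", sigmoid_grads)
```

The step function is flat on both sides of the threshold, so its estimated derivative is zero at every sampled point and carries no signal for a weight update. The sigmoid has a positive slope everywhere, equal to sigmoid(x) * (1 - sigmoid(x)), so the multiplier in backpropagation never vanishes entirely.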
So, we’ve covered one activation function for neural networks. Although it is not effective for complex neural network systems, we mostly see it in legacy perceptrons. It is good at binary classification. Moreover, all logic functions can be implemented using the step function. Furthermore, the way human reflexes trigger an action resembles the step activation function in perceptrons.
Hopefully, your decisions will be as certain as the step function. As former US President Roosevelt said, in any moment of decision, the best thing you can do is the right thing, the next best thing is the wrong thing, and the worst thing you can do is nothing.