Convolutional Autoencoder: Clustering Images with Neural Networks

Previously, we applied a conventional autoencoder to the handwritten digit database (MNIST). That approach worked pretty well. We can apply the same model to non-image problems such as fraud or anomaly detection. If the problem is pixel based, you might remember that convolutional neural networks are more successful than conventional ones. However, we have only tested them on labeled, supervised learning problems. The question is: can we adapt convolutional neural networks to unlabeled images for clustering? Absolutely yes! This customized form of CNN is the convolutional autoencoder.

Remembering regular autoencoders

Remember the autoencoder post. The network design is symmetric about the centroid: the number of nodes decreases from the left to the centroid and increases from the centroid to the right. The centroid layer holds the compressed representation. We will apply the same procedure to CNNs, too. For a convolutional autoencoder, we will additionally use convolution, activation and pooling layers.


Figure: Convolutional autoencoder

We can call the left-to-centroid side convolution and the centroid-to-right side deconvolution. The deconvolution side is also known as upsampling or transposed convolution. We’ve mentioned how the pooling operation works: it is a basic reduction operation. How can we apply its reverse? That might be a little confusing. I’ve found an excellent animation for upsampling. An input matrix of size 2×2 (the blue one) is deconvolved to a matrix of size 4×4 (the cyan one). To do this, we add imaginary elements (e.g. zeros) around the base matrix so that it becomes a 6×6 matrix; sliding a 3×3 filter over that padded matrix then yields the 4×4 output.
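
The zero-padding idea from the animation can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the exact operation of any library layer: the helper name `upsample_by_zero_padding` and the all-ones 3×3 filter are hypothetical choices, the matrices are assumed square, and the stride is 1.

```python
import numpy as np

def upsample_by_zero_padding(x, kernel):
    """Pad x with zeros, then slide the kernel over it with stride 1."""
    k = kernel.shape[0]
    # k - 1 zeros on every side: a 2x2 input becomes 6x6 for a 3x3 kernel
    padded = np.pad(x, k - 1)
    out_size = padded.shape[0] - k + 1  # 6 - 3 + 1 = 4
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # weighted sum of the current 3x3 window
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return padded, out

x = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2x2 input (the blue matrix)
kernel = np.ones((3, 3))                # hypothetical filter, chosen for readability
padded, out = upsample_by_zero_padding(x, kernel)
print(padded.shape, out.shape)  # (6, 6) (4, 4)
```

In a trained network the filter values would be learned; the all-ones filter is used here only so the output is easy to check by hand.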

Figure: Upsampling

We will work on the handwritten digit database again. We’ll design the structure of the convolutional autoencoder as illustrated above.

from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, UpSampling2D

model = Sequential()

# 1st convolution layer: 16 filters of size (3, 3)
model.add(Conv2D(16, (3, 3), padding='same', input_shape=(28, 28, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))

# 2nd convolution layer: 2 filters of size (3, 3)
model.add(Conv2D(2, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))

# the output of the pooling layer above is the compressed representation

# 3rd convolution layer: 2 filters of size (3, 3)
model.add(Conv2D(2, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(UpSampling2D((2, 2)))

# 4th convolution layer: 16 filters of size (3, 3)
model.add(Conv2D(16, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(UpSampling2D((2, 2)))

# output layer: a single filter restores a one-channel 28x28 image
model.add(Conv2D(1, (3, 3), padding='same'))
model.add(Activation('sigmoid'))
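
Note that the `UpSampling2D` layers in the restoration half have no trainable weights and do not insert zeros: by default they simply repeat each row and column. A rough NumPy equivalent for a single-channel map is sketched below; `repeat_upsample` is a hypothetical helper for illustration, not Keras’s implementation.

```python
import numpy as np

def repeat_upsample(x, factor=2):
    """Repeat every row and every column `factor` times (nearest neighbour)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

feature_map = np.array([[1, 2],
                        [3, 4]])
upsampled = repeat_upsample(feature_map)
print(upsampled)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```

The convolution layers that follow each upsampling step are the ones that learn how to smooth these repeated blocks into a plausible image.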

You can summarize the constructed network structure.


model.summary()

This command dumps the following output. The base input is 28×28 at the beginning. The first two convolution blocks are responsible for reduction, and the following two are in charge of restoration. The final layer restores the same size as the input, as seen.

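_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 28, 28, 16)        160
_________________________________________________________________
activation_1 (Activation)    (None, 28, 28, 16)        0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 16)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 2)         290
_________________________________________________________________
activation_2 (Activation)    (None, 14, 14, 2)         0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 7, 7, 2)           0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 7, 7, 2)           38
_________________________________________________________________
activation_3 (Activation)    (None, 7, 7, 2)           0
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 14, 14, 2)         0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 14, 14, 16)        304
_________________________________________________________________
activation_4 (Activation)    (None, 14, 14, 16)        0
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 28, 28, 16)        0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 28, 28, 1)         145
_________________________________________________________________
activation_5 (Activation)    (None, 28, 28, 1)         0
=================================================================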
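
The numbers in this summary can be checked by hand. The sketch below recomputes the parameter count of each convolution layer (kernel weights plus one bias per filter) and traces the spatial size through the pooling and upsampling steps. The helper name `conv2d_params` is hypothetical; the layer configuration is taken from the model above.

```python
import math

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights of every filter plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

# (input channels, filters) for the five Conv2D layers of the model
conv_layers = [(1, 16), (16, 2), (2, 2), (2, 16), (16, 1)]
param_counts = [conv2d_params(3, 3, c_in, f) for c_in, f in conv_layers]
print(param_counts)  # [160, 290, 38, 304, 145]

# Spatial size: pooling with pool_size 2 and 'same' padding gives ceil(n / 2),
# upsampling by 2 doubles the size
size = 28
sizes = [size]
for op in ["pool", "pool", "up", "up"]:
    size = math.ceil(size / 2) if op == "pool" else size * 2
    sizes.append(size)
print(sizes)  # [28, 14, 7, 14, 28]
```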

Here, we can start training.

model.compile(optimizer='adadelta', loss='binary_crossentropy')
model.fit(x_train, x_train, epochs=3, validation_data=(x_test, x_test))

Loss values for both training set and test set are satisfactory.

loss: 0.0968 – val_loss: 0.0926
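
The `fit` call above assumes that `x_train` and `x_test` are float arrays of shape (n, 28, 28, 1) scaled to the range [0, 1], which matches the sigmoid output and the binary cross-entropy loss. The post does not show that step, so the following is only a sketch: `prepare_images` is a hypothetical helper, and random integers stand in for the raw MNIST arrays (which could be loaded with `keras.datasets.mnist.load_data()`).

```python
import numpy as np

def prepare_images(images):
    """Scale uint8 pixels to [0, 1] and add a trailing channel axis."""
    images = images.astype("float32") / 255.0
    return images.reshape(len(images), 28, 28, 1)

# Stand-in for raw MNIST images of shape (n, 28, 28) with values 0..255
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)

x_demo = prepare_images(raw)
print(x_demo.shape, x_demo.dtype)
```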





Let’s visualize some restorations.

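import matplotlib.pyplot as plt

restored_imgs = model.predict(x_test)

for i in range(5):
    # original image
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    plt.show()

    # image restored from the compressed representation
    plt.imshow(restored_imgs[i].reshape(28, 28))
    plt.gray()
    plt.show()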

Testing

The restorations seem really satisfactory. Images on the left side are the originals, whereas images on the right side are restored from the compressed representation.

Figure: Some restorations of the convolutional autoencoder

Notice that the layer at index 5 (the sixth layer), named max_pooling2d_2, holds the compressed representation, and its size is (None, 7, 7, 2). This work reveals that we can restore a 28×28 pixel image from a 7×7×2 matrix with little loss. In other words, the compressed representation takes 8 times less space than the original image.
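
The factor of 8 follows directly from the layer shapes, as this small check shows (the variable names are only illustrative):

```python
original_size = 28 * 28 * 1   # values per input image
compressed_size = 7 * 7 * 2   # values at the max_pooling2d_2 output
ratio = original_size / compressed_size
print(original_size, compressed_size, ratio)  # 784 98 8.0
```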

Compressed Representations

You might wonder how to extract compressed representations.

from keras import backend as K

compressed_layer = 5  # index of max_pooling2d_2
get_compressed_output = K.function([model.layers[0].input], [model.layers[compressed_layer].output])
compressed = get_compressed_output([x_test])[0]

# flatten the compressed representation to a 1-dimensional array per image
compressed = compressed.reshape(10000, 7*7*2)

Now, we can apply clustering to compressed representation. I would like to apply k-means clustering.

# requires TensorFlow 1.x (tf.contrib)
import tensorflow as tf
from tensorflow.contrib.factorization.python.ops import clustering_ops

def train_input_fn():
    data = tf.constant(compressed, tf.float32)
    return (data, None)

unsupervised_model = tf.contrib.learn.KMeansClustering(
    10,  # number of clusters
    distance_metric=clustering_ops.SQUARED_EUCLIDEAN_DISTANCE,
    initial_clusters=tf.contrib.learn.KMeansClustering.RANDOM_INIT
)

unsupervised_model.fit(input_fn=train_input_fn, steps=1000)
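
Since `tf.contrib` is not available in TensorFlow 2.x, readers on a newer setup may want a library-independent alternative. Below is a minimal NumPy sketch of k-means with squared Euclidean distance and random initialization. It is not the implementation used in this post: `kmeans` and the toy `points` array are hypothetical, and in the post’s setting the flattened `compressed` array would be passed with `k=10`.

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0):
    """Lloyd's algorithm with squared Euclidean distance and random init."""
    data = np.asarray(data, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # random initialization: pick k distinct samples as the starting centers
    centers = data[rng.choice(len(data), size=k, replace=False)]

    def assign(current_centers):
        # squared distance from every sample to every center: shape (n, k)
        diffs = data[:, None, :] - current_centers[None, :, :]
        return (diffs ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = centers.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return assign(centers), centers

# Toy stand-in for the flattened compressed representations
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = kmeans(points, k=2)
print(labels, centers.shape)
```

The returned `labels` array plays the role of the `cluster_idx` values: one cluster id per input row.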

Training is over. Now, we can check the clusters for the whole test set.

clusters = unsupervised_model.predict(input_fn=train_input_fn)

index = 0
for i in clusters:
    current_cluster = i['cluster_idx']  # cluster assigned to this test image
    features = x_test[index]            # the corresponding original image
    index = index + 1

For example, the 6th cluster consists of 46 items. Among them, 22 items are 4s, 14 are 9s, 7 are 7s, and 1 is a 5. It seems that mostly the digits 4 and 9 are put in this cluster.
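
A distribution like this can be tallied by counting the true digit of every image per cluster. The sketch below uses made-up toy values rather than results from this post. `label_distribution` is a hypothetical helper, and in the post’s setting the cluster ids would come from `i['cluster_idx']` and the true digits from the MNIST test labels.

```python
from collections import Counter, defaultdict

def label_distribution(cluster_ids, true_labels):
    """Count how often each true label appears in each cluster."""
    per_cluster = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, true_labels):
        per_cluster[cluster_id][label] += 1
    return per_cluster

# Made-up toy values for illustration only
cluster_ids = [0, 0, 1, 1, 1, 0]
true_labels = [4, 9, 7, 7, 1, 4]

dist = label_distribution(cluster_ids, true_labels)
print(dist[0].most_common())  # [(4, 2), (9, 1)]
print(dist[1].most_common())  # [(7, 2), (1, 1)]
```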

So, we’ve combined convolutional neural networks and autoencoders to reduce the information in image-based data. That serves as a pre-processing step for clustering. In this way, we can apply k-means clustering with 98 features instead of 784 features. This could speed up the labeling process for unlabeled data. Of course, with autoencoding comes great speed. The source code of this post is already pushed to GitHub.




8 Comments

  1. Thank you for your useful post!
    Hello, I’m a graduate student.
    I’m interested in deep learning.
    I want to do clustering with deep learning.
    Can I use a CSV file as input instead of image files?
    If it’s OK to use a CSV file, what should I do for preprocessing?

    model.fit(x_train, x_train, epochs=3, validation_data=(x_test, x_test))
    I notice your code passes x_train twice. Is this because an autoencoder is unsupervised learning?
    I look forward to your reply.
    Thank you.

    1. You can feed raw data to autoencoders as well (this is called dimension reduction), but convolutional autoencoders are not designed for raw data. Please read the following blog post. Even though I fed images as input in that post, you can also feed a raw dataset.

      The model takes x_train twice, as both input and output, because we try to restore the input almost losslessly. If it can be restored with little loss, then the middle part of the network stores a smaller but still significant representation of the data. That is the idea behind autoencoders.

      Autoencoder: Neural Networks For Unsupervised Learning: https://sefiks.com/2018/03/21/autoencoder-neural-networks-for-unsupervised-learning/

  2. Hello! Thank you for your useful post! Have you tried to transform the encoded representation into 2D space (using methods like PCA or t-SNE) to see if any clusters were created? It would be very interesting to see those results!

  3. Hi Sefik,
    I see in your post that you are applying K-Means once the model has been trained. I am interested in doing something similar. I want to apply K-Means to the outputs of each layer when testing the model. For example, consider a model with following config:
    Conv1–>Pool1–>Conv2–>Pool2–>Flatten–>Dense1–>Output
    I would like this to be:
    Conv1–>Pool1–>Clustering1–>Conv2–>Pool2–>Clustering2–>Flatten–>Dense1–>Clustering3–>Output
    I am unsure how to approach this. Do you have any suggestions?

    1. It is an interesting approach but I cannot imagine what you expect by applying this.

      1. You can think of this as quantization. I am interested in seeing how the accuracy changes if the output of these layers are represented by the cluster centers only. Furthermore, how many clusters would yield same or similar accuracy to without clustering. Therefore, I would ask again if you’re aware of a method to do this using tensorflow/keras. Please let me know.

  4. Please sir, can you share code that uses an autoencoder and an RBM for X-ray image classification of COVID-19 patients? These should be separate models. Or any code that uses an autoencoder and an RBM for image classification. Any link or video that helps with this would be appreciated.
