Tips for Building AutoML with AutoKeras

Previously, we mentioned AutoKeras, an AutoML tool backed by the Keras team. It handles image data elegantly. Today, we will cover some tips and tricks.

The new logo of Auto-Keras

GPU Training

If you are going to build your model on a GPU, you should monitor GPU usage when you run the AutoKeras model, because AutoKeras declares the plain tensorflow package as a dependency. For GPU training you must install the tensorflow-gpu package instead, so running the installation command with the no-dependencies argument is helpful here. Besides, you should confirm that the CPU-only tensorflow package is not installed in your environment. You can list the installed packages by running the “pip freeze” command. If plain tensorflow is already installed, you should uninstall it first.
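
For instance, the check and cleanup can look like the following; the grep filter assumes a Linux-based notebook environment.

!pip freeze | grep tensorflow

!pip uninstall tensorflow -y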


I run the following commands in order to run AutoKeras on GPU.

!pip install tensorflow-gpu==1.12.0

!pip install keras==2.2.0

!pip install autokeras==0.3.7 --no-deps

In this way, I can run AutoKeras on GPU. Besides, if you have several GPUs, AutoKeras runs on multiple GPUs by default.

Also, you must not import tensorflow or keras in your notebook yourself. Otherwise, AutoKeras will run on CPU even though GPU memory is allocated. Importing AutoKeras should print the “Using TensorFlow backend” message.
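
For reference, a minimal AutoKeras 0.3.x training run looks like the sketch below. The random placeholder data, shapes and one-hour time limit are illustrative choices of mine, not something AutoKeras requires.

import numpy as np
import autokeras as ak # should print "Using TensorFlow backend"

# placeholder image data: 100 28x28 grayscale samples, 10 classes
x_train = np.random.rand(100, 28, 28, 1)
y_train = np.random.randint(0, 10, 100)
x_test = np.random.rand(20, 28, 28, 1)
y_test = np.random.randint(0, 10, 20)

# search for the best architecture within a time budget (in seconds)
clf = ak.ImageClassifier(verbose=True)
clf.fit(x_train, y_train, time_limit=60 * 60)

# retrain the best model found and evaluate it
clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
print(clf.evaluate(x_test, y_test))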

Validation sets

We mostly separate the data set into training, test and cross validation sets. The cross validation set is important to confirm that the model is not overfitted to the test set. Here, we should only separate the data set into train and test sets, because AutoKeras already separates the train set into train and validation sets. You can see this in the constant file under the autokeras folder for version 0.3.7. The validation set ratio is set to 0.08333. In other words, AutoKeras reserves roughly 8% of your training data as a validation set.
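
A typical split then looks like the sketch below. The 80/20 ratio and the random placeholder data are illustrative choices of mine.

import numpy as np
from sklearn.model_selection import train_test_split

# placeholder data just for illustration
x = np.random.rand(1000, 28, 28, 1)
y = np.random.randint(0, 10, 1000)

# split into train and test only; autokeras internally reserves
# roughly 8.3% (0.08333) of the train set as its validation set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)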

AutoKeras to Keras

This is similar to the relation between Java and JavaScript: AutoKeras is not a Keras distribution. Even though you can export the AutoKeras model structure in Keras format, the exported model requires training from scratch. To be honest, I tried to fit the exported Keras model, but it could not get close to the accuracy level of the AutoKeras model.
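
If I recall the 0.3.x API correctly, the export and reload steps look like the sketch below, where clf is assumed to be an already fitted AutoKeras classifier such as the one in the earlier GPU sketch.

from keras.models import load_model

# export the discovered architecture in keras format
clf.export_keras_model('keras_model.h5')

# reload it as a plain keras model; remember it still requires training
model = load_model('keras_model.h5')
model.summary()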

Importing AutoKeras into a Kaggle Kernel

You might build an AutoML model externally and adapt it to your Kaggle kernel, because Kaggle kernels die after 9 hours, and that time might be very short for an AutoML study.

Firstly, you should export the AutoKeras model in pickle format.

model.export_autokeras_model('autokeras_model.pkl')

If the competition lets you turn the internet on, it is easy to load AutoKeras models. Turn the internet on in the settings tab under the toggle sidebar. Then, you can run the following command in your notebook.

!pip install autokeras==0.3.7

If the kernel requirements enforce turning the internet off, you can still load the AutoKeras model in your kernel. First, download the code repository here. A version number appears in the file name and in the folder inside the zip; remove this version number. Then, you can add the zipped code repository as a file in your kernel: follow the add data and upload steps. Finally, you can import AutoKeras and its dependencies by appending its location to the system path.

import sys

# make the uploaded autokeras source code importable
package_dir = '../input/autokeras/autokeras'
sys.path.insert(0, package_dir)

# now autokeras can be imported as usual
import autokeras as ak
from autokeras.utils import pickle_from_file
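
Loading the exported model and generating predictions then takes just a couple of lines. The input path and x_test below are assumptions about your own kernel setup.

# load the model exported earlier with export_autokeras_model
model = pickle_from_file('../input/autokeras-model/autokeras_model.pkl')

# predict on your prepared test data
predictions = model.predict(x_test)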

In this way, I can run an AutoML model for days and load its outcomes in minutes in my Kaggle studies, even if the internet connection is disabled. You can install and import any other external code repository in your Kaggle kernel in the same way.

Here, you can find an example Kaggle kernel adapting AutoKeras. The internet connection must be disabled in this competition, but I can still submit AutoKeras predictions. I trained the AutoKeras model externally; both the pre-trained weights and the framework itself are attached to the kernel. I spent 12 hours finding the best model; overall, 44 different models were tested. On the other hand, a kernel has only 9 hours of up-time. Besides, I used several GPUs. It got promising results. Notice that the longer the training time, the better the result.

