The post A Gentle Introduction to Auto-Keras appeared first on Sefik Ilkin Serengil.

The AutoML idea is basically based on brute force. This makes sense both for people outside the data science world and for data scientists. Today, Google AutoML, H2O, Auto-WEKA and Auto-Sklearn all exist in the market as pre-release versions. Herein, Auto-Keras is a free alternative to these self-service AI solutions, and it is supported by the Keras community as well.

It is very easy to use. All you need to do is feed in your data set and an execution time limit.

I feed in the FER 2013 data set. This data set stores facial expression labels for face photos. It consists of 28709 instances in the training set and 3589 instances in the public test set. Each instance is a 48×48 image; in other words, there are 2304 features. The same procedure as in this repository is applied to load the data.

```python
import numpy as np

num_classes = 7 #angry, disgust, fear, happy, sad, surprise, neutral

with open("fer2013.csv") as f:
    content = f.readlines()

lines = np.array(content)
num_of_instances = lines.size

x_train, y_train, x_test, y_test = [], [], [], []

for i in range(1, num_of_instances):
    try:
        emotion, img, usage = lines[i].split(",")
        val = img.split(" ")
        pixels = np.array(val, 'float32')

        #auto-keras expects labels as numeric instead of one hot encoded
        #emotion = keras.utils.to_categorical(emotion, num_classes)

        if 'Training' in usage:
            y_train.append(emotion)
            x_train.append(pixels)
        elif 'PublicTest' in usage:
            y_test.append(emotion)
            x_test.append(pixels)
    except Exception as ex:
        print(ex)

#------------------------

x_train = np.array(x_train, 'float32')
y_train = np.array(y_train, 'float32')
x_test = np.array(x_test, 'float32')
y_test = np.array(y_test, 'float32')

x_train /= 255 #normalize inputs between [0, 1]
x_test /= 255

x_train = x_train.reshape(x_train.shape[0], 48, 48, 1).astype('float32')
x_test = x_test.reshape(x_test.shape[0], 48, 48, 1).astype('float32')

print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
```

Here, I set the time limit to 24 hours. This is the default configuration.

```python
import autokeras as ak

model = ak.ImageClassifier(path="/automodels/", verbose=True)
model.fit(x_train, y_train, time_limit=60*60*24)
model.final_fit(x_train, y_train, x_test, y_test, retrain=True)
```

My custom design was able to get 57% accuracy, whereas Auto-Keras got 66.23%. Increasing the time limit might improve the accuracy further, but the as-is run already comes with almost a **10 percent improvement**, and this is very satisfactory.

```python
score = model.evaluate(x_test, y_test)
print("accuracy on test set is ", 100*score)
```

This is a classification problem, and pure accuracy alone doesn’t give enough of an idea to evaluate the system.

```python
from sklearn.metrics import confusion_matrix

predictions = model.predict(x_test)
confusion_matrix(y_test, predictions)
```

Auto-Keras generates the following confusion matrix. The x-axis holds predictions whereas the y-axis holds actual values.

| | angry | disgust | fear | happy | sad | surprise | neutral | Recall |
|---|---|---|---|---|---|---|---|---|
| angry | 233 | 9 | 49 | 23 | 90 | 11 | 52 | 50% |
| disgust | 12 | 31 | 1 | 2 | 5 | 0 | 5 | 55% |
| fear | 30 | 3 | 189 | 21 | 130 | 41 | 82 | 38% |
| happy | 7 | 1 | 11 | 778 | 22 | 17 | 59 | 87% |
| sad | 35 | 4 | 51 | 24 | 391 | 9 | 139 | 60% |
| surprise | 8 | 1 | 20 | 18 | 13 | 333 | 22 | 80% |
| neutral | 13 | 1 | 29 | 49 | 90 | 3 | 422 | 70% |
| Precision | 69% | 62% | 54% | 85% | 53% | 80% | 54% | acc=66% |
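The recall, precision and accuracy figures above can be reproduced from the raw counts. A minimal sketch, with the matrix copied from the table (rows are actual classes, columns are predictions):

```python
labels = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
cm = [
    [233,   9,  49,  23,  90,  11,  52],
    [ 12,  31,   1,   2,   5,   0,   5],
    [ 30,   3, 189,  21, 130,  41,  82],
    [  7,   1,  11, 778,  22,  17,  59],
    [ 35,   4,  51,  24, 391,   9, 139],
    [  8,   1,  20,  18,  13, 333,  22],
    [ 13,   1,  29,  49,  90,   3, 422],
]

for i, label in enumerate(labels):
    tp = cm[i][i]
    recall = tp / sum(cm[i])                    #correct / all actual instances of the class
    precision = tp / sum(row[i] for row in cm)  #correct / all predictions of the class
    print("%s: recall=%.0f%%, precision=%.0f%%" % (label, 100*recall, 100*precision))

#overall accuracy is the trace over the total instance count
accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))
print("accuracy = %.2f%%" % (100*accuracy))   #66.23% over 3589 test instances
```

Notice that the row sums add up to 3589, which is exactly the public test set size.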

The 24-hour run was able to train 41 different models.

```
Preprocessing the images.
Preprocessing finished.
Initializing search.
Initialization finished.

+----------------------------------------------+
| Training model 0                             |
+----------------------------------------------+
No loss decrease after 5 epochs.
Saving model.
+--------------------------------------------------------------------------+
| Model ID | Loss               | Metric Value        |
+--------------------------------------------------------------------------+
| 0        | 5.586169672012329  | 0.48439999999999994 |
+--------------------------------------------------------------------------+
+----------------------------------------------+
| Training model 1                             |
+----------------------------------------------+
No loss decrease after 5 epochs.
Saving model.
+--------------------------------------------------------------------------+
| Model ID | Loss               | Metric Value        |
+--------------------------------------------------------------------------+
| 1        | 3.8872979521751403 | 0.6352              |
+--------------------------------------------------------------------------+
...
+----------------------------------------------+
| Training model 40                            |
+----------------------------------------------+
No loss decrease after 5 epochs.
Saving model.
+--------------------------------------------------------------------------+
| Model ID | Loss               | Metric Value        |
+--------------------------------------------------------------------------+
| 40       | 3.762257432937622  | 0.6532              |
+--------------------------------------------------------------------------+
+----------------------------------------------+
| Training model 41                            |
+----------------------------------------------+
Epoch-10, Current Metric - 0.668: 59%|█████████████▌ | 130/221 [01:24<01:02, 1.45 batch/s]Time is out.
```

Auto-Keras stores model related files in the automodels folder, which I specified while initializing the model. best_model.txt contains the text “best model: 29”.

Making predictions is very similar to regular Keras.

```python
predictions = model.predict(x_test)

instances = 0
correct_ones = 0

for i in range(0, len(predictions)):
    print("Prediction: ", predictions[i], ", Actual: ", y_test[i])
    if predictions[i] == y_test[i]:
        correct_ones = correct_ones + 1
    instances = instances + 1
```

Restoring models is naturally supported in Auto-Keras. We can re-use the best model via the export autokeras model and pickle from file functions respectively.

```python
model.export_autokeras_model('best_auto_keras_model.h5')
```

When I load this Auto-Keras model in a different notebook, I get the same accuracy score for the same data set. This means that exporting an Auto-Keras model stores both the network structure and the final weights. Notice that the restored model is not a regular Keras model!

```python
from autokeras.utils import pickle_from_file

model = pickle_from_file("best_auto_keras_model.h5")
score = model.evaluate(x_test, y_test)
print(score)
predictions = model.predict(x_test)
```

We can export regular Keras models in Auto-Keras, too.

```python
model.export_keras_model('best_keras_model.h5')
```

Now, I need to call Keras functions to load models.

```python
from keras.models import load_model

keras_model = load_model('best_keras_model.h5')
```

Remember that model summaries are available in regular Keras.

It seems Auto-Keras built a model consisting of 501 layers and 7M parameters. This is really amazing! The exported Keras model structure can be found here.

```python
keras_model.summary()

layers = keras_model.layers
index = 1
for layer in layers:
    print(index, "- ", layer.name)
    index = index + 1
```

It seems that Auto-Keras exports just the network structure and excludes the weights — if the weights were included, the predictions would match the ones in the previous step. This means that the exported regular Keras model must be re-trained.

```python
keras_predictions = keras_model.predict(x_test)

correctly_classified = 0
for i in range(0, keras_predictions.shape[0]):
    print("actual: ", y_test[i], " - prediction: ", np.argmax(keras_predictions[i]))
    if y_test[i] == np.argmax(keras_predictions[i]):
        correctly_classified = correctly_classified + 1
```

This disappoints me because exporting a model without its weights doesn’t come in handy. Maybe this is because Auto-Keras is still a pre-release version. I reported this to the developers and I’ll update this post if the issue is resolved.

Automated machine learning is a promising field in the machine learning world. However, all the alternatives are still pre-release versions. Besides, the most powerful one – Google AutoML – costs 20 dollars per hour. That wouldn’t be adopted for personal usage. On the other hand, you need powerful computational resources (a GPU) to run an open source alternative such as Auto-Keras. All things considered, it seems that we still have a long way to go to reach artificial general intelligence.

I pushed the jupyter notebook of this post to GitHub. I will also share the model, including both the network structure and the pre-trained weights for Auto-Keras, on Google Drive soon because of the file size limit.


The post Tips and Tricks for GPU and Multiprocessing in TensorFlow appeared first on Sefik Ilkin Serengil.

If you have multiple GPUs but need to work on a single one, you can specify the GPU index. Indices run from 0 up to the GPU count by default. That GPU is then reserved for you and all memory of the device is allocated.

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" #specific index
```

Alternatively, you can specify the GPU name before creating a model. Suppose that you are going to use the pre-trained VGG model. You can store 3 different models on 2 GPUs as demonstrated below.

```python
with tf.device('/gpu:0'):
    content_model = vgg19.VGG19(input_tensor=base_image, weights='imagenet', include_top=False)
    style_model = vgg19.VGG19(input_tensor=style_reference_image, weights='imagenet', include_top=False)

with tf.device('/gpu:1'):
    generated_model = vgg19.VGG19(input_tensor=combination_image, weights='vgg_weights.h5', include_top=False)
```

If working on CPU cores is OK for your case, you might prefer not to consume GPU memory at all. In this case, specify the device counts for both CPU and GPU.

```python
config = tf.ConfigProto(device_count={'GPU': 0, 'CPU': 5})
sess = tf.Session(config=config)
keras.backend.set_session(sess)
```

Memory demand squeezes you even if you are working on small data. **TensorFlow tends to allocate all memory of all GPUs**. Consider allocating 16 GB of memory on 4 different GPUs for a small task, e.g. building an XOR classifier. This is really unfair. It blocks your own other processes and everyone else’s tasks. Everybody, including you, has to wait until the process ends. You can avoid this problem by modifying the session config. Allowing growth prevents TensorFlow from allocating all the memory up front.

```python
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

session = tf.Session(config=config)
keras.backend.set_session(session)
```

In some cases, you might need to build several machine learning models **simultaneously**. Building models serially causes significant time loss. Here, Python provides a strong multiprocessing library.

If you run multiprocessing with the default configuration, the first thread allocates all the memory and an out of memory exception is thrown by the second thread. Combining multiprocessing, GPUs and GPU memory growth is a rarely covered topic.

```
from device: CUDA_ERROR_OUT_OF_MEMORY

E tensorflow/core/common_runtime/direct_session.cc:154] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
```

If you set allowing growth only once, in the main process, you will still face the following error.

```
E tensorflow/core/grappler/clusters/utils.cc:81] Failed to get device properties, error code: 3
```

These problems are solved when you move the session configuration into the dedicated training function. But I had a confusing problem this time: sessions hang even though the command seems to be running. In other words, GPU consumption is 0% while the memory allocation still exists. I solved this problem by passing spawn as the start method of multiprocessing on my Linux machine. The default start method is fork on Linux and spawn on Windows.

```python
multiprocessing.set_start_method('spawn', force=True)
```

Interestingly, the machine got confused when it came to prediction. This time you **must not** specify the spawn start method. I guess this is because each training run lasts more than 30 seconds whereas a prediction lasts a second. In my experience, the machine struggles with short tasks in spawn mode and with longer tasks in fork mode.

We feed 10 items into the pool, and the multiprocessing library processes these 10 items simultaneously; it would process the next 10 items once the first pool completed. In other words, we can speed processing up almost 10 times.

```python
import pandas as pd
import multiprocessing
from multiprocessing import Pool

def train(index, df):
    import tensorflow as tf
    import keras
    from keras.models import Sequential
    from keras.layers import Dense, Activation

    #------------------------------
    #this block enables GPU enabled multiprocessing
    core_config = tf.ConfigProto()
    core_config.gpu_options.allow_growth = True
    session = tf.Session(config=core_config)
    keras.backend.set_session(session)

    #------------------------------
    #prepare input and output values
    df = df.drop(columns=['index'])
    data = df.drop(columns=['target']).values
    target = df['target']

    #------------------------------
    model = Sequential()
    model.add(Dense(5, input_shape=(data.shape[1],))) #5 hidden units, input_shape = num of features
    model.add(Activation('sigmoid'))
    model.add(Dense(1)) #number of nodes in output layer
    model.add(Activation('sigmoid'))
    model.compile(loss='mse', optimizer=keras.optimizers.Adam())

    #------------------------------
    model.fit(data, target, epochs=5000, verbose=1)
    model.save("model_for_%s.hdf5" % index)

    #------------------------------
    #finally, close sessions
    session.close()
    keras.backend.clear_session()
    return 0

#-----------------------------
#main program

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn', force=True)

    df = pd.read_csv("dataset.csv")
    my_tuple = [(i, df[df['index'] == i]) for i in range(0, 10)]

    with Pool(10) as pool:
        pool.starmap(train, my_tuple)
```

I pushed the code for GPU enabled multiprocessing task to GitHub.

This case might seem the simplest, but it is not. Having multiple GPUs won’t automatically make you several times faster or stronger. Gradients and computations must be stored on the same GPU. Herein, NVLink would speed you up, but still not by a factor of the number of GPUs. Besides, Horovod can be a choice here. It was developed by Uber engineers, and it combines separate GPUs virtually.

So, I’ve shared some tips and tricks for GPU and multiprocessing in TensorFlow and Keras that I learned over time. In one case, I reduced the time for a prediction task from 3.3 hours to 4 minutes — almost 50 times faster. Similarly, the time for a training task dropped from 25 hours to 1 hour. It is amazing, right? You can just run the **watch nvidia-smi** command, monitor and have fun!


The post Machine Learning meets Blockchain appeared first on Sefik Ilkin Serengil.

The magazine cover of The Economist declared that data is the new oil of this century. The net worth of the data-first tech giants exceeds the gross national incomes of many countries. For instance, the combined worth of the companies on that cover is 2.5 trillion dollars, whereas the gross national income of Turkey was 2.2 trillion dollars in 2017.

Social media users feed their own data to these services just like volunteers. Consider Obama’s public meeting in Berlin: almost every attendee takes a photo and creates data in this scene.

These giant companies are all fed by data. But just having data doesn’t make them giants; they also have the power to process and understand it. Herein, deep learning offers no theoretical limit on what it can learn: the more data and the more computational time you provide, the better it gets. These are the words of Geoffrey Hinton. Still, these models are not as intelligent as human beings. We can fool neural networks.

The Associated Press once tweeted breaking news announcing two explosions in the White House and that Obama was injured. It soon emerged that Syrian hackers had accessed the Associated Press account. However, some trading bots immediately decided to sell shares. This caused 139 billion dollars of damage.

Even though neural networks and deep learning are the most powerful machine learning algorithms, they are not explainable. For instance, Google’s Inception V3 model won Kaggle’s imagenet contest, where models compete to classify millions of pictures into hundreds of classes. Inception V3 classifies the following picture as a panda successfully, but adding some carefully crafted (non-random) noise to the original image manipulates it so that the model classifies the new one as a gibbon. Even though the human eye cannot tell them apart, they differ exactly by that noise. So, if you feed the noise-added picture into the learning step, you cause neural networks to mislearn.
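The shape of such an attack can be illustrated in a few lines. This is only a sketch of the fast gradient sign idea behind the panda example — here a random vector stands in for the true loss gradient, and the epsilon value is a hypothetical perturbation budget:

```python
import numpy as np

np.random.seed(0)

epsilon = 0.007                 #perturbation budget, small enough to be invisible
x = np.random.rand(48, 48)      #stand-in for a normalized input image
grad = np.random.randn(48, 48)  #stand-in for the loss gradient w.r.t. the input

#fast gradient sign method: step every pixel in the direction that increases the loss
x_adv = np.clip(x + epsilon * np.sign(grad), 0, 1)

#the two images differ by at most epsilon per pixel, yet such a shift can flip the prediction
print("max pixel change:", np.abs(x_adv - x).max())
```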

These attacks become vital in some cases. We will see driverless cars everywhere soon; Google, Uber and Tesla invest in this technology, and computer vision is a key component of these systems. The following picture shows a physical adversarial attack: the sign is classified as a speed limit: 45 mph sign! Consider a driverless car going 45 mph when it should stop.

People enjoy trolling and manipulating systems. Will we always be defenceless?

Herein, the solution is simple. Attacking a public key system is much harder than cracking a password. For instance, breaking an ECC private key requires on the order of 10^{18} years, whereas the age of the universe is on the order of 10^{10} years.
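The 10^{18}-year figure is easy to sanity check with back-of-the-envelope arithmetic. Assuming a brute-force search over a 128-bit security level (roughly a 256-bit ECC key) at a generous, hypothetical 10^12 guesses per second:

```python
keyspace = 2 ** 128            #~128-bit security level, e.g. a 256-bit ECC key
guesses_per_second = 10 ** 12  #an optimistic rate for an attacker
seconds_per_year = 60 * 60 * 24 * 365

years = keyspace / (guesses_per_second * seconds_per_year)
print("%.1e years" % years)    #on the order of 10^19 years
```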

If the Associated Press had signed its tweets, readers would have realized that the account was accessed by some bad guys. So, is cryptography enough to be defensive? Still, some malicious actor could feed manipulated data and sign it with their own private key.

Herein, think of blockchain as a public database. Someone feeds data to the blockchain and users verify the correctness of the data before adding it to the chain. The feeder signs the data with their private key; that’s why we know the data was fed by that user. If you spot manipulated data, the feeder will be banned from the system. In this case, the noise-added panda picture won’t be added to the public data set and you won’t cause neural networks to mislearn.

Blockchain and machine learning are like yin yang. Even though they are based on opposite ideas, they complement each other like puzzle parts.

Herein, PayPal mafia members described the relation of blockchain and machine learning in a very interesting way. Peter Thiel said that **crypto is libertarian, AI is communist**. Reid Hoffman explained that **crypto is anarchy, AI is the rule of law**.

Firstly, blockchain is deterministic whereas machine learning is probabilistic. Probabilistic models can fail — indeed, they should fail occasionally to avoid overfitting.

Secondly, blockchain offers a decentralized architecture whereas machine learning is centralized. You build a machine learning model and it is your own property.

Thirdly, blockchain is transparent whereas machine learning models, in particular deep learning, are complete black boxes. You can walk backwards through all previous transactions in a blockchain. In contrast, you do not fully know how deep learning models work.

Finally, blockchain transactions are permanent whereas machine learning models change. You cannot remove or change a verified transaction on a blockchain because the previous block hashes are carried into the following blocks; you would have to change the whole chain, which is not feasible. On the other hand, you might need to remodel or retrain a machine learning model in time even though it worked well on first deployment.
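This immutability argument can be demonstrated with a toy chain, where each block commits to the hash of the previous one — rewriting any old block invalidates every block after it:

```python
import hashlib

def block_hash(prev_hash, data):
    #each block's hash covers the previous hash, chaining the blocks together
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

#build a tiny chain of three "transactions"
chain = []
prev = "0" * 64  #genesis
for data in ["tx1", "tx2", "tx3"]:
    prev = block_hash(prev, data)
    chain.append({"data": data, "hash": prev})

def verify(chain):
    #recompute every hash from the genesis and compare against the stored ones
    prev = "0" * 64
    for block in chain:
        if block_hash(prev, block["data"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True

print(verify(chain))                #True
chain[1]["data"] = "tx2-tampered"   #rewrite history...
print(verify(chain))                #False: the stored hashes no longer match
```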

Bringing blockchain and machine learning together creates more data that is also secure, because people are encouraged to feed data to public data sets. Having more, secure data yields better machine learning models, and better models offer better results and actions.

In this post, we described a utopia which can make tomorrow better. We could then trust blockchain-enabled AI solutions for intelligent homes or driverless cars in the near future. Moreover, this wouldn’t be a POC study just aiming to use blockchain; it solves a real world problem for machine learning practitioners.

I’ve captured this blog post as a webinar.


The post Machine Learning Wars: Deep Learning vs GBM appeared first on Sefik Ilkin Serengil.

It is a fact that deep learning offers superpowers. Face recognition, mood analysis and making art are not hard tasks anymore. However, deep neural networks **hit the wall** when decisions have to be explained, because they are total black boxes. They cannot answer why and how questions. Why does your network have this number of hidden layers and nodes? Why did you set the learning rate and learning time to these values? More importantly, how does it work? This is because neural networks are combinations of matrix multiplications, non-linear functions (e.g. sigmoid, relu), derivatives and normalizations. Notice that even basic information gathering requires answering 5W1H; otherwise, it won’t be considered complete.

Being unexplainable makes it hard to fix troubles. Remember that Microsoft shut down its chatbot Tay instead of publishing a patch when it became racist.

Herein, some industries such as banking and finance have heavy regulations. Interpretability is a must here. You cannot deploy an unexplainable model to production even if it works.

On the other hand, decisions made by decision tree algorithms can be read clearly, because they can be transformed into if statements. This lets you modify rules based on your custom requirements. Moreover, the rules are not complex. A decision tree algorithm finds the most dominant feature and checks it in every if block. This means that if the features are luggage capacity and number of doors, then a node can check either luggage capacity or number of doors. There **cannot be** a statement like “if luggage capacity is big, else number of doors is 2”; the else condition must check the same feature as the if statement.
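A learned tree over those two features therefore translates into nested if statements like the following — the thresholds and class names here are made up purely for illustration:

```python
def predict_vehicle(luggage_capacity, num_doors):
    #each node tests exactly one feature; both branches of a node test the same feature
    if luggage_capacity > 400:      #most dominant feature first (hypothetical threshold)
        if num_doors >= 4:
            return "family car"
        else:
            return "coupe"
    else:
        if num_doors >= 4:
            return "compact"
        else:
            return "sports car"

print(predict_vehicle(520, 5))  #family car
print(predict_vehicle(250, 2))  #sports car
```

Notice that no branch ever mixes features at the same node — exactly the property described above.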

The most dominant feature can be found by information gain in ID3, gain ratio in C4.5, the gini index in CART or standard deviation in regression trees. Then, the decision tree algorithm is run on each sub data set recursively. This is called divide and conquer.
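For instance, ID3’s information gain is just the drop in entropy after a split. A minimal sketch on a toy label set, where a hypothetical feature splits the parent set into a left and a right branch:

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

#toy data: a candidate split separates the classes into two branches
parent = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
left, right = ["yes", "yes", "yes", "no"], ["no", "no", "no", "no"]

#information gain = parent entropy minus the weighted entropy of the branches
gain = entropy(parent) - (len(left) / len(parent)) * entropy(left) \
                       - (len(right) / len(parent)) * entropy(right)
print(round(gain, 3))
```

The feature whose split maximizes this gain is chosen as the node, and the algorithm recurses on each branch.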

Decision trees are transparent algorithms, but interpretability and accuracy are inversely proportional, and we cannot sacrifice accuracy for interpretability.

GBM pushes decision trees close to the accuracy level of neural networks. Its premise is that a single decision tree is not strong enough, but repeatedly fitting a decision tree to the errors of the previous round gets close to, or goes beyond, neural network accuracy — while remaining explainable.
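This “fit the error of the previous round” idea can be sketched in pure Python. Each round fits a weak learner — here a one-split stump, the simplest possible tree — to the residuals of the ensemble so far (the data, learning rate and round count below are illustrative choices, not a real GBM implementation):

```python
def fit_stump(x, residuals):
    #weak learner: one threshold split, predicting the mean residual on each side
    best = None
    for t in x:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, rounds=20, lr=0.5):
    #gradient boosting for squared loss: each stump is fit to the current residuals
    base = sum(y) / len(y)                 #start from the mean prediction
    stumps = []
    pred = [base] * len(x)
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]         #roughly y = x
model = boost(x, y)
print([round(model(xi), 1) for xi in x])
```

No single stump can fit this data, but the ensemble of residual-fitted stumps tracks it closely — the “several weak trees become GBM” point made below.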

Consider the two-spiral data set. It is really hard to classify this kind of data set. It exists in the tensorflow playground. Enabling all input candidates and increasing the number of nodes in the first hidden layer to 8 performs very well.

A similar interface for GBM, named the gradient boosting interactive playground, exists. Building 30 decision trees recursively performs very well, too.

Deep learning might be **Superman**, able to move a really heavy rock. Decision trees have no super powers, but several decision trees come together and become GBM — and GBM can move the same heavy rock, too. So, GBM would be **Flash** if deep learning were **Superman**!

So I have hopefully convinced you that GBM is as strong as deep learning. Could it be even stronger?

**Kaggle** is the platform for data enthusiasts. It is not mandatory, but winning model owners can explain their model details. We have the data for the 29 winning solutions in 2015: 17 solutions involve GBM whereas 11 use neural networks. This means that GBM brings **more than half** of the winning solutions. GBM dominates the **KDDCup** challenge, too: every winning team in the top 10 used GBM.

Herein, comparing GBM and deep learning might not be fair. Firstly, 9 solutions use both neural networks and GBM as ensemble models. Moreover, deep learning wins are mostly related to image-based unstructured data whereas GBM wins are mostly related to structured data. The ensemble of a GBM and neural network pair is very successful, because such models keep appearing on the podium.

Kaggle also published a survey on data science and machine learning. Half of the respondents stated that they are familiar with decision trees in their daily work whereas only 37.6% declared that they use neural networks. We’ve mentioned the heavy regulations in the banking and finance industry. There, decision trees are in the toolbox of 60% of finance employees whereas only 30% have put neural networks in their toolbox.

In this post, I have tried to convince you that neither GBM nor deep learning is superior to the other. GBM is a very powerful machine learning algorithm that practitioners should put in their toolbox. It should not be ignored by any practitioner.

Even though the result of the race between Superman and Flash is not mentioned in Justice League, it appears in the comic book: they arrived at the finish line simultaneously. There is no winner. Maybe the challenge between deep learning and GBM has no winner either. The winner might be ensemble models.

I created the content of this blog post for the Bilisim IO Tech Talks event, which was performed in Turkish. I captured the same slides in **webinar** format **in English** and published them on my Youtube channel. You can find the Machine Learning Wars webinar here.


The post Apparent Age and Gender Prediction in Keras appeared first on Sefik Ilkin Serengil.

The original work consumed face pictures collected from IMDB (7 GB) and Wikipedia (1 GB). You can find these data sets here. In this post, I will consume just the wiki data source to develop the solution fast. You should **download the faces only** files.

Extracting wiki_crop.tar creates 100 folders and an index file (wiki.mat). The index file is saved in Matlab format. We can read Matlab files in python with SciPy.

```python
import scipy.io

mat = scipy.io.loadmat('wiki_crop/wiki.mat')
```

Converting it to a pandas data frame will make transformations easier.

```python
import pandas as pd

instances = mat['wiki'][0][0][0].shape[1]

columns = ["dob", "photo_taken", "full_path", "gender", "name", "face_location", "face_score", "second_face_score"]

df = pd.DataFrame(index=range(0, instances), columns=columns)

for i in mat:
    if i == "wiki":
        current_array = mat[i][0][0]
        for j in range(len(current_array)):
            df[columns[j]] = pd.DataFrame(current_array[j][0])
```

The data set contains the date of birth (dob) in Matlab datenum format. We need to convert this to Python datetime format; we just need the birth year.

```python
from datetime import datetime, timedelta

def datenum_to_datetime(datenum):
    days = datenum % 1
    hours = days % 1 * 24
    minutes = hours % 1 * 60
    seconds = minutes % 1 * 60

    exact_date = datetime.fromordinal(int(datenum)) \
        + timedelta(days=int(days)) + timedelta(hours=int(hours)) \
        + timedelta(minutes=int(minutes)) + timedelta(seconds=round(seconds)) \
        - timedelta(days=366)

    return exact_date.year

df['date_of_birth'] = df['dob'].apply(datenum_to_datetime)
```

Extracting date of birth from matlab datenum format

Now, we have both date of birth and photo taken time. Subtracting these values will give us the ages.

```python
df['age'] = df['photo_taken'] - df['date_of_birth']
```

Some pictures in the wiki data set don’t include people; for example, there is a vase picture. Moreover, some pictures include two people, and some are taken from a distance. The face score value helps us understand whether the picture is clear or not. Also, age information is missing for some records. All of these might confuse the model, so we should ignore them. Finally, unnecessary columns should be dropped to occupy less memory.

```python
import numpy as np

#remove pictures that do not include a face
df = df[df['face_score'] != -np.inf]

#some pictures include more than one face, remove them
df = df[df['second_face_score'].isna()]

#check threshold
df = df[df['face_score'] >= 3]

#some records do not have gender information
df = df[~df['gender'].isna()]

df = df.drop(columns=['name', 'face_score', 'second_face_score', 'date_of_birth', 'face_location'])
```

Some pictures seem to have been taken before the person was born — the age value is negative for some records. Dirty data might cause this. Moreover, some people seem to be older than 100. We should restrict the age prediction problem to ages 0 to 100.

```python
#some guys seem to be greater than 100. some of these are paintings. remove these old guys
df = df[df['age'] <= 100]

#some guys seem to be unborn in the data set
df = df[df['age'] > 0]
```

The raw data set will look like the following data frame.

We can visualize the target label distribution.

```python
histogram_age = df['age'].hist(bins=df['age'].nunique())
histogram_gender = df['gender'].hist(bins=df['gender'].nunique())
```

The full path column states the exact location of the picture on disk. We need its pixel values.

```python
from keras.preprocessing import image

target_size = (224, 224)

def getImagePixels(image_path):
    img = image.load_img("wiki_crop/%s" % image_path[0], grayscale=False, target_size=target_size)
    x = image.img_to_array(img).reshape(1, -1)[0]
    #x = preprocess_input(x)
    return x

df['pixels'] = df['full_path'].apply(getImagePixels)
```

We can extract the real pixel values of pictures

Age prediction is a regression problem, but the researchers defined it as a classification problem, with 101 classes in the output layer for ages 0 to 100. They applied transfer learning for this duty; their choice was VGG trained on imagenet.
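Although the output layer is a 101-class softmax, the apparent age can then be read off as the expected value of that distribution, i.e. the probability-weighted sum of the class indices. A sketch with a hypothetical, made-up probability vector:

```python
#softmax output: 101 probabilities, one per age class 0..100 (hypothetical values)
probs = [0.0] * 101
probs[28], probs[29], probs[30], probs[31] = 0.1, 0.4, 0.3, 0.2

#apparent age = expected value of the class distribution
apparent_age = sum(age * p for age, p in enumerate(probs))
print(round(apparent_age, 1))  #29.6
```

This turns the coarse class decision into a continuous age estimate without changing the classification setup.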

The pandas data frame includes both input and output information for the age and gender prediction tasks. We should just focus on the age task here.

```python
import keras
import numpy as np

classes = 101 #0 to 100
target = df['age'].values
target_classes = keras.utils.to_categorical(target, classes)

features = []

for i in range(0, df.shape[0]):
    features.append(df['pixels'].values[i])

features = np.array(features)
features = features.reshape(features.shape[0], 224, 224, 3)
```

Also, we need to split data set as training and testing set.

```python
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(features, target_classes, test_size=0.30)
```

The final data set consists of 22578 instances. It is split into 15905 training instances and 6673 test instances.

As mentioned, the researchers used the VGG imagenet model and tuned its weights for this data set. Herein, I prefer to use the **VGG-Face** model instead, because this model is already tuned for the face recognition task. In this way, we might get better outcomes for patterns in human faces.

```python
from keras.models import Sequential
from keras.layers import Convolution2D, ZeroPadding2D, MaxPooling2D, Flatten, Dropout, Activation

#VGG-Face model
model = Sequential()
model.add(ZeroPadding2D((1,1), input_shape=(224,224, 3)))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(Convolution2D(4096, (7, 7), activation='relu'))
model.add(Dropout(0.5))
model.add(Convolution2D(4096, (1, 1), activation='relu'))
model.add(Dropout(0.5))
model.add(Convolution2D(2622, (1, 1)))
model.add(Flatten())
model.add(Activation('softmax'))
```

Load the pre-trained weights for the VGG-Face model. You can find the related blog post here.

```python
#pre-trained weights of vgg-face model.
#you can find it here: https://drive.google.com/file/d/1CPSeum3HpopfomUEK1gybeuIVoeJT_Eo/view?usp=sharing
#related blog post: https://sefiks.com/2018/08/06/deep-face-recognition-with-keras/
model.load_weights('vgg_face_weights.h5')
```

We should freeze the weights of the early layers because they already detect generic patterns; training the network from scratch might lose this important information. I prefer to freeze all layers except the last 3 convolution layers (in other words, the last 7 model.add units). Also, I cut the last convolution layer because it has 2622 units; I need just 101 units (ages from 0 to 100) for the age prediction task. Then, I add a custom convolution layer consisting of 101 units.

```python
for layer in model.layers[:-7]:
    layer.trainable = False

base_model_output = Convolution2D(101, (1, 1), name='predictions')(model.layers[-4].output)
base_model_output = Flatten()(base_model_output)
base_model_output = Activation('softmax')(base_model_output)

age_model = Model(inputs=model.input, outputs=base_model_output)
```

This is a multi-class classification problem, so the loss function must be categorical crossentropy. The optimization algorithm will be Adam to converge faster. I create a checkpoint to monitor the model over iterations and avoid overfitting. The iteration with the minimum validation loss contains the optimum weights. That is why I monitor the validation loss and save only the best weights.

To avoid overfitting, I feed a random subset of 256 instances in each epoch.

```python
age_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(), metrics=['accuracy'])

checkpointer = ModelCheckpoint(filepath='age_model.hdf5', monitor="val_loss", verbose=1, save_best_only=True, mode='auto')

scores = []
epochs = 250; batch_size = 256

for i in range(epochs):
    print("epoch ", i)
    ix_train = np.random.choice(train_x.shape[0], size=batch_size)
    score = age_model.fit(train_x[ix_train], train_y[ix_train], epochs=1, validation_data=(test_x, test_y), callbacks=[checkpointer])
    scores.append(score)
```

It seems that the validation loss reaches its minimum; training for more epochs would cause overfitting.

We can evaluate the final model on the test set.

age_model.evaluate(test_x, test_y, verbose=1)

This gives the validation loss and accuracy, respectively, for the 6673 test instances. It seems that we have the following results.

[2.871919590848929, 0.24298789490543357]

24% accuracy seems very low, right? Actually, it is not. Herein, the researchers developed an age prediction approach that converts the classification task into a regression: multiply each softmax output by its class label, and the sum of these products is the apparent age prediction.

This is a very easy operation with numpy in Python.

```python
predictions = age_model.predict(test_x)

output_indexes = np.array([i for i in range(0, 101)])
apparent_predictions = np.sum(predictions * output_indexes, axis=1)
```

Herein, the mean absolute error metric might be more meaningful to evaluate the system.

```python
mae = 0

for i in range(0, apparent_predictions.shape[0]):
    prediction = int(apparent_predictions[i])
    actual = np.argmax(test_y[i])

    abs_error = abs(prediction - actual)
    mae = mae + abs_error

mae = mae / apparent_predictions.shape[0]

print("mae: ", mae)
print("instances: ", apparent_predictions.shape[0])
```

Our apparent age prediction model predicts ages with a mean absolute error of ±4.65 years. This is acceptable.

We can feel the power of the model when we feed custom images into it.

```python
from keras.preprocessing import image

def loadImage(filepath):
    test_img = image.load_img(filepath, target_size=(224, 224))
    test_img = image.img_to_array(test_img)
    test_img = np.expand_dims(test_img, axis=0)
    test_img /= 255
    return test_img

picture = "marlon-brando.jpg"
prediction = age_model.predict(loadImage(picture))
```

The prediction variable stores the distribution over age classes. Monitoring it might be interesting.

```python
y_pos = np.arange(101)
plt.bar(y_pos, prediction[0], align='center', alpha=0.3)
plt.ylabel('percentage')
plt.title('age')
plt.show()
```

This is the age prediction distribution of Marlon Brando in The Godfather. The most dominant age class is 44, whereas the weighted age is 48, which was his exact age in 1972.

We’ll calculate the apparent age from these age distributions.

```python
img = image.load_img(picture)
plt.imshow(img)
plt.show()

print("most dominant age class (not apparent age): ", np.argmax(prediction))

apparent_age = np.round(np.sum(prediction * output_indexes, axis=1))
print("apparent age: ", int(apparent_age[0]))
```

The results are very satisfactory even though the photo does not have a good perspective. Marlon Brando was 48 and Al Pacino was 32 in The Godfather Part I.

Apparent age prediction is a challenging problem. However, gender prediction is a much easier task.

We’ll apply binary encoding to target gender class.

```python
target = df['gender'].values
target_classes = keras.utils.to_categorical(target, 2)
```

We then just need to put 2 classes in the output layer for man and woman.

```python
for layer in model.layers[:-7]:
    layer.trainable = False

base_model_output = Convolution2D(2, (1, 1), name='predictions')(model.layers[-4].output)
base_model_output = Flatten()(base_model_output)
base_model_output = Activation('softmax')(base_model_output)

gender_model = Model(inputs=model.input, outputs=base_model_output)
```

Now, the model is ready to fit.

```python
gender_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(), metrics=['accuracy'])

#use a separate checkpoint file so the gender weights do not overwrite the age model
checkpointer = ModelCheckpoint(filepath='gender_model.hdf5', monitor="val_loss", verbose=1, save_best_only=True, mode='auto')

scores = []
epochs = 250; batch_size = 256

for i in range(epochs):
    print("epoch ", i)
    ix_train = np.random.choice(train_x.shape[0], size=batch_size)
    score = gender_model.fit(train_x[ix_train], train_y[ix_train], epochs=1, validation_data=(test_x, test_y), callbacks=[checkpointer])
    scores.append(score)
```

It seems that the model is saturated; it is sensible to stop training here.

gender_model.evaluate(test_x, test_y, verbose=1)

The model has the following validation loss and accuracy. It is really satisfactory.

[0.07324957040103375, 0.9744245524655362]

Unlike age prediction, this is a true classification problem. Accuracy should not be the only metric we monitor; precision and recall should also be checked.

```python
from sklearn.metrics import classification_report, confusion_matrix

predictions = gender_model.predict(test_x)

pred_list = []; actual_list = []

for i in predictions:
    pred_list.append(np.argmax(i))

for i in test_y:
    actual_list.append(np.argmax(i))

confusion_matrix(actual_list, pred_list)
```

The model generates the following confusion matrix.

| | Prediction: Female | Prediction: Male |
| --- | --- | --- |
| **Actual: Female** | 1873 | 98 |
| **Actual: Male** | 72 | 4604 |

This means that we have 96.29% precision and 95.05% recall. These metrics are as satisfactory as the accuracy.
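As a quick sanity check, these metrics can be derived directly from the confusion matrix above, treating the female class as positive (the class assignment is my reading of the matrix; the original repository may compute these slightly differently):

```python
# Derive precision and recall from the confusion matrix above,
# treating female as the positive class.
tp, fn = 1873, 98    # actual female: predicted female / predicted male
fp, tn = 72, 4604    # actual male: predicted female / predicted male

precision = tp / (tp + fp)  # how many predicted females are actually female
recall = tp / (tp + fn)     # how many actual females were caught

print(round(100 * precision, 2), round(100 * recall, 2))
```

These come out around 96.3% and 95.0%, matching the reported figures up to rounding.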

We just need to feed images to the model.

```python
picture = "katy-perry.jpg"
prediction = gender_model.predict(loadImage(picture))

img = image.load_img(picture)
plt.imshow(img)
plt.show()

gender = "Male" if np.argmax(prediction) == 1 else "Female"
print("gender: ", gender)
```

We can apply age and gender predictions in real time as well.

So, we’ve built apparent age and gender predictors from scratch based on the research article of the computer vision group of ETH Zurich. In particular, the way they propose to calculate apparent age is a novel, high-performing method. Deep learning really has limitless power for learning.

I pushed the source code for both apparent age prediction and gender prediction to GitHub. Similarly, the real-time age and gender prediction implementation is pushed here. You might want to just use the pre-trained weights; I put the pre-trained weights for the age and gender tasks on Google Drive.

The post Apparent Age and Gender Prediction in Keras appeared first on Sefik Ilkin Serengil.

The post Twisted Edwards Curves for Digital Signatures appeared first on Sefik Ilkin Serengil.

Twisted Edwards curves look like a bird’s-eye view of a roundabout intersection of a road.

Regular Edwards curves are a special form of twisted Edwards curves where a = 1. We can prove the addition formula for the twisted form similarly. Besides, the proof for twisted Edwards curves will also cover the regular Edwards form that Bernstein and Lange simplified.

Suppose that (x_{1}, y_{1}) and (x_{2}, y_{2}) are points on the curve ax^{2} + y^{2} = 1 + dx^{2}y^{2}. In this case, (x_{3}, y_{3}) derived from the following formula will be on the same curve.

x_{3} = (x_{1}y_{2} + y_{1}x_{2})/(1 + dx_{1}x_{2}y_{1}y_{2})

y_{3} = (y_{1}y_{2} – ax_{1}x_{2})/(1 – dx_{1}x_{2}y_{1}y_{2})
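Before working through the algebraic proof, we can sanity-check the formula numerically over a toy prime field. The parameters below are illustrative: a = 4 is a square and d = 2 is a non-square mod 13, which makes the addition law complete, so no denominator vanishes (pow(x, -1, p) requires Python 3.8+):

```python
# Numeric sanity check of the twisted Edwards addition formula over a
# toy prime field; p = 13, a = 4, d = 2 are illustrative choices only.
p, a, d = 13, 4, 2

def on_curve(P):
    x, y = P
    return (a*x*x + y*y) % p == (1 + d*x*x*y*y) % p

def add(P, Q):
    x1, y1 = P
    x2, y2 = Q
    t = d * x1 * x2 * y1 * y2
    x3 = (x1*y2 + y1*x2) * pow((1 + t) % p, -1, p) % p
    y3 = (y1*y2 - a*x1*x2) * pow((1 - t) % p, -1, p) % p
    return x3, y3

# brute-force all curve points, then check that every pairwise sum stays on the curve
points = [(x, y) for x in range(p) for y in range(p) if on_curve((x, y))]
assert all(on_curve(add(P, Q)) for P in points for Q in points)
print(len(points), "points; closure under addition verified")
```

Every sum of two curve points lands back on the curve, which is exactly what the proof below establishes in general.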

We can validate the addition formula by substituting the (x_{3}, y_{3}) values into the twisted Edwards curve equation.

ax_{3}^{2} + y_{3}^{2} = 1 + dx_{3}^{2}y_{3}^{2}

a(x_{1}y_{2} + y_{1}x_{2})^{2}/(1 + dx_{1}x_{2}y_{1}y_{2})^{2} + (y_{1}y_{2} – ax_{1}x_{2})^{2}/(1 – dx_{1}x_{2}y_{1}y_{2})^{2} = 1 + d(x_{1}y_{2} + y_{1}x_{2})^{2}(y_{1}y_{2} – ax_{1}x_{2})^{2}/(1 + dx_{1}x_{2}y_{1}y_{2})^{2}(1 – dx_{1}x_{2}y_{1}y_{2})^{2}

Make the denominators the same

a(x_{1}y_{2} + y_{1}x_{2})^{2}(1 – dx_{1}x_{2}y_{1}y_{2})^{2}/(1 + dx_{1}x_{2}y_{1}y_{2})^{2}(1 – dx_{1}x_{2}y_{1}y_{2})^{2} + (y_{1}y_{2} – ax_{1}x_{2})^{2}(1 + dx_{1}x_{2}y_{1}y_{2})^{2}/(1 – dx_{1}x_{2}y_{1}y_{2})^{2}(1 + dx_{1}x_{2}y_{1}y_{2})^{2} = (1 – dx_{1}x_{2}y_{1}y_{2})^{2}(1 + dx_{1}x_{2}y_{1}y_{2})^{2}/(1 – dx_{1}x_{2}y_{1}y_{2})^{2}(1 + dx_{1}x_{2}y_{1}y_{2})^{2} + d(x_{1}y_{2} + y_{1}x_{2})^{2}(y_{1}y_{2} – ax_{1}x_{2})^{2}/(1 + dx_{1}x_{2}y_{1}y_{2})^{2}(1 – dx_{1}x_{2}y_{1}y_{2})^{2}

Now that all denominators are the same, we can cancel them. Note that (1 + dx_{1}x_{2}y_{1}y_{2})^{2}(1 – dx_{1}x_{2}y_{1}y_{2})^{2} cannot be 0.

a(x_{1}y_{2} + y_{1}x_{2})^{2}(1 – dx_{1}x_{2}y_{1}y_{2})^{2} + (y_{1}y_{2} – ax_{1}x_{2})^{2}(1 + dx_{1}x_{2}y_{1}y_{2})^{2} = (1 – dx_{1}x_{2}y_{1}y_{2})^{2}(1 + dx_{1}x_{2}y_{1}y_{2})^{2} + d(x_{1}y_{2} + y_{1}x_{2})^{2}(y_{1}y_{2} – ax_{1}x_{2})^{2}

Set P to x_{1}x_{2}y_{1}y_{2} to express this complex equation more simply.

a(x_{1}y_{2} + y_{1}x_{2})^{2}(1 – dP)^{2} + (y_{1}y_{2} – ax_{1}x_{2})^{2}(1 + dP)^{2} = (1 – dP)^{2}(1 + dP)^{2} + d(x_{1}y_{2} + y_{1}x_{2})^{2}(y_{1}y_{2} – ax_{1}x_{2})^{2}

The term (1 – dP)^{2}(1 + dP)^{2} can also be written as [(1 – dP)(1 + dP)]^{2} = (1 – d^{2}P^{2})^{2}

a(x_{1}y_{2} + y_{1}x_{2})^{2}(1 – dP)^{2} + (y_{1}y_{2} – ax_{1}x_{2})^{2}(1 + dP)^{2} = (1 – d^{2}P^{2})^{2} + d(x_{1}y_{2} + y_{1}x_{2})^{2}(y_{1}y_{2} – ax_{1}x_{2})^{2}

Evaluate the powers except (1 – d^{2}P^{2})^{2}

a(x_{1}^{2}y_{2}^{2} + y_{1}^{2}x_{2}^{2} + 2P)(1 + d^{2}P^{2} – 2dP) + (y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2} – 2aP)(1 + d^{2}P^{2} + 2dP) = (1 – d^{2}P^{2})^{2} + d(x_{1}^{2}y_{2}^{2} + y_{1}^{2}x_{2}^{2} + 2P)(y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2} – 2aP)

Focus on the left side. We can separate the 1 + d^{2}P^{2} and 2dP multipliers.

(1 + d^{2}P^{2})(a(x_{1}^{2}y_{2}^{2} + y_{1}^{2}x_{2}^{2} + 2P) + y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2} – 2aP) + (2dP)(a(- x_{1}^{2}y_{2}^{2} – y_{1}^{2}x_{2}^{2} – 2P) + y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2} – 2aP)

Move the multiplier a into the parentheses

(1 + d^{2}P^{2})(ax_{1}^{2}y_{2}^{2} + ay_{1}^{2}x_{2}^{2} + 2aP + y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2} – 2aP) + (2dP)(- ax_{1}^{2}y_{2}^{2} – ay_{1}^{2}x_{2}^{2} – 2aP + y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2} – 2aP)

Plus and minus 2aP terms exist in the first parentheses. We can cancel them.

(1 + d^{2}P^{2})(ax_{1}^{2}y_{2}^{2} + ay_{1}^{2}x_{2}^{2} + y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2}) + (2dP)(- ax_{1}^{2}y_{2}^{2} – ay_{1}^{2}x_{2}^{2} – 2aP + y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2} – 2aP)

We can rewrite the term (ax_{1}^{2}y_{2}^{2} + ay_{1}^{2}x_{2}^{2} + y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2}) as (ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}). Similarly, the term (- ax_{1}^{2}y_{2}^{2} – ay_{1}^{2}x_{2}^{2} – 2aP + y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2} – 2aP) can be rewritten as ((ax_{1}^{2} – y_{1}^{2})(ax_{2}^{2} – y_{2}^{2}) – 4aP).

(1 + d^{2}P^{2})(ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}) + (2dP)[(ax_{1}^{2} – y_{1}^{2})(ax_{2}^{2} – y_{2}^{2}) – 4aP].

Move 2dP into the parentheses

(1 + d^{2}P^{2})(ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}) + (2dP)(ax_{1}^{2} – y_{1}^{2})(ax_{2}^{2} – y_{2}^{2}) – 8adP^{2}

Now, focus on the right side.

(1 – d^{2}P^{2})^{2} + d(x_{1}^{2}y_{2}^{2} + y_{1}^{2}x_{2}^{2} + 2P)(y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2} – 2aP)

Take the 2P term outside the parentheses

(1 – d^{2}P^{2})^{2} + d[(x_{1}^{2}y_{2}^{2} + y_{1}^{2}x_{2}^{2})(y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2}) + 2P(y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2} – ax_{1}^{2}y_{2}^{2} – ay_{1}^{2}x_{2}^{2}) – 4aP^{2}]

The term (y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2} – ax_{1}^{2}y_{2}^{2} – ay_{1}^{2}x_{2}^{2}) can be rewritten as (ax_{1}^{2} – y_{1}^{2})(ax_{2}^{2} – y_{2}^{2})

(1 – d^{2}P^{2})^{2} + d[(x_{1}^{2}y_{2}^{2} + y_{1}^{2}x_{2}^{2})(y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2}) + 2P(ax_{1}^{2} – y_{1}^{2})(ax_{2}^{2} – y_{2}^{2})- 4aP^{2}]

Also, (x_{1}^{2}y_{2}^{2} + y_{1}^{2}x_{2}^{2})(y_{1}^{2}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{2}) can be rewritten as (x_{1}^{2}y_{1}^{2}y_{2}^{4} + a^{2}x_{1}^{4}x_{2}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{4}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{4}y_{1}^{2})

(1 – d^{2}P^{2})^{2} + d[(x_{1}^{2}y_{1}^{2}y_{2}^{4} + a^{2}x_{1}^{4}x_{2}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{4}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{4}y_{1}^{2}) + 2P(ax_{1}^{2} – y_{1}^{2})(ax_{2}^{2} – y_{2}^{2})- 4aP^{2}]

The left side of the equation has (1 + d^{2}P^{2}). We can refactor the term (1 – d^{2}P^{2})^{2} on the right side.

(1 – d^{2}P^{2})^{2} = (1 + d^{2}P^{2})^{2} – 4d^{2}P^{2} = (1 + d^{2}P^{2})(1 + d^{2}P^{2}) – 4d^{2}P^{2}

We got the term (1 + d^{2}P^{2}). Now, replace P with its original value in the other multiplier.

(1 + d^{2}P^{2})(1 + d^{2}x_{1}^{2}x_{2}^{2}y_{1}^{2}y_{2}^{2}) – 4d^{2}P^{2}

Adding and subtracting the dx_{1}^{2}y_{1}^{2} and dx_{2}^{2}y_{2}^{2} values would not change the content.

(1 + d^{2}P^{2})(1 + d^{2}x_{1}^{2}x_{2}^{2}y_{1}^{2}y_{2}^{2} + dx_{1}^{2}y_{1}^{2} + dx_{2}^{2}y_{2}^{2} – dx_{1}^{2}y_{1}^{2} – dx_{2}^{2}y_{2}^{2}) – 4d^{2}P^{2}

Separate the plus- and minus-signed terms on the same side.

(1 + d^{2}P^{2})(1 + d^{2}x_{1}^{2}x_{2}^{2}y_{1}^{2}y_{2}^{2} + dx_{1}^{2}y_{1}^{2} + dx_{2}^{2}y_{2}^{2}) +(1 + d^{2}P^{2})( – dx_{1}^{2}y_{1}^{2} – dx_{2}^{2}y_{2}^{2}) – 4d^{2}P^{2}

We can rewrite the term (1 + d^{2}x_{1}^{2}x_{2}^{2}y_{1}^{2}y_{2}^{2} + dx_{1}^{2}y_{1}^{2} + dx_{2}^{2}y_{2}^{2}) as (1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2})

(1 + d^{2}P^{2})(1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) + (1 + d^{2}P^{2})( – dx_{1}^{2}y_{1}^{2} – dx_{2}^{2}y_{2}^{2}) – 4d^{2}P^{2}

Rewrite (1 + d^{2}P^{2})( – dx_{1}^{2}y_{1}^{2} – dx_{2}^{2}y_{2}^{2}) and split – 4d^{2}P^{2} into two – 2d^{2}x_{1}^{2}y_{1}^{2}x_{2}^{2}y_{2}^{2} terms

(1 + d^{2}P^{2})(1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) – dx_{1}^{2}y_{1}^{2} – dx_{2}^{2}y_{2}^{2} – d^{3}x_{2}^{2}y_{2}^{2}(x_{1}^{4}y_{1}^{4}) – d^{3}x_{1}^{2}y_{1}^{2}(x_{2}^{4}y_{2}^{4}) – 2d^{2}x_{1}^{2}y_{1}^{2}x_{2}^{2}y_{2}^{2} – 2d^{2}x_{1}^{2}y_{1}^{2}x_{2}^{2}y_{2}^{2}

Combine the parts that contain dx_{1}^{2}y_{1}^{2} and dx_{2}^{2}y_{2}^{2}

(1 + d^{2}P^{2})(1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) – dx_{1}^{2}y_{1}^{2}(1 + d^{2}x_{2}^{4}y_{2}^{4} + 2dx_{2}^{2}y_{2}^{2}) – dx_{2}^{2}y_{2}^{2}(1 + d^{2}x_{1}^{4}y_{1}^{4} + 2dx_{1}^{2}y_{1}^{2})

The second and third terms can be expressed as power of an addition.

(1 + d^{2}P^{2})(1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) – dx_{1}^{2}y_{1}^{2}(1 + dx_{2}^{2}y_{2}^{2})^{2} – dx_{2}^{2}y_{2}^{2}(1 + dx_{1}^{2}y_{1}^{2})^{2}

Combine left and right sides

(1 + d^{2}P^{2})(ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}) + (2dP)(ax_{1}^{2} – y_{1}^{2})(ax_{2}^{2} – y_{2}^{2}) – 8adP^{2} = d[(x_{1}^{2}y_{1}^{2}y_{2}^{4} + a^{2}x_{1}^{4}x_{2}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{4}y_{2}^{2} + a^{2}x_{1}^{2}x_{2}^{4}y_{1}^{2}) + 2P(ax_{1}^{2} – y_{1}^{2})(ax_{2}^{2} – y_{2}^{2})- 4aP^{2}] + (1 + d^{2}P^{2})(1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) – dx_{1}^{2}y_{1}^{2}(1 + dx_{2}^{2}y_{2}^{2})^{2} – dx_{2}^{2}y_{2}^{2}(1 + dx_{1}^{2}y_{1}^{2})^{2}

Distribute the multiplier d over the brackets

(1 + d^{2}P^{2})(ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}) + (2dP)(ax_{1}^{2} – y_{1}^{2})(ax_{2}^{2} – y_{2}^{2}) – 8adP^{2} = dx_{1}^{2}y_{1}^{2}y_{2}^{4} + da^{2}x_{1}^{4}x_{2}^{2}y_{2}^{2} + dx_{2}^{2}y_{1}^{4}y_{2}^{2} + da^{2}x_{1}^{2}x_{2}^{4}y_{1}^{2} + 2dP(ax_{1}^{2} – y_{1}^{2})(ax_{2}^{2} – y_{2}^{2})- 4adP^{2} + (1 + d^{2}P^{2})(1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) – dx_{1}^{2}y_{1}^{2}(1 + dx_{2}^{2}y_{2}^{2})^{2} – dx_{2}^{2}y_{2}^{2}(1 + dx_{1}^{2}y_{1}^{2})^{2}

Both the left and right sides have (2dP)(ax_{1}^{2} – y_{1}^{2})(ax_{2}^{2} – y_{2}^{2}). We can remove these terms.

(1 + d^{2}P^{2})(ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}) – 8adP^{2} = dx_{1}^{2}y_{1}^{2}y_{2}^{4} + da^{2}x_{1}^{4}x_{2}^{2}y_{2}^{2} + dx_{2}^{2}y_{1}^{4}y_{2}^{2} + da^{2}x_{1}^{2}x_{2}^{4}y_{1}^{2} – 4adP^{2} + (1 + d^{2}P^{2})(1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) – dx_{1}^{2}y_{1}^{2}(1 + dx_{2}^{2}y_{2}^{2})^{2} – dx_{2}^{2}y_{2}^{2}(1 + dx_{1}^{2}y_{1}^{2})^{2}

Move – 8adP^{2} to the right side.

(1 + d^{2}P^{2})(ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}) = dx_{1}^{2}y_{1}^{2}y_{2}^{4} + da^{2}x_{1}^{4}x_{2}^{2}y_{2}^{2} + dx_{2}^{2}y_{1}^{4}y_{2}^{2} + da^{2}x_{1}^{2}x_{2}^{4}y_{1}^{2} + 4adP^{2} + (1 + d^{2}P^{2})(1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) – dx_{1}^{2}y_{1}^{2}(1 + dx_{2}^{2}y_{2}^{2})^{2} – dx_{2}^{2}y_{2}^{2}(1 + dx_{1}^{2}y_{1}^{2})^{2}

Put the real value of P in 4adP^{2}

(1 + d^{2}P^{2})(ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}) = dx_{1}^{2}y_{1}^{2}y_{2}^{4} + da^{2}x_{1}^{4}x_{2}^{2}y_{2}^{2} + dx_{2}^{2}y_{1}^{4}y_{2}^{2} + da^{2}x_{1}^{2}x_{2}^{4}y_{1}^{2} + 2adx_{1}^{2}x_{2}^{2}y_{1}^{2}y_{2}^{2} + 2adx_{1}^{2}x_{2}^{2}y_{1}^{2}y_{2}^{2 }+ (1 + d^{2}P^{2})(1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) – dx_{1}^{2}y_{1}^{2}(1 + dx_{2}^{2}y_{2}^{2})^{2} – dx_{2}^{2}y_{2}^{2}(1 + dx_{1}^{2}y_{1}^{2})^{2}

Combine the terms that contain x_{1}^{2}y_{1}^{2} and x_{2}^{2}y_{2}^{2}

(1 + d^{2}P^{2})(ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}) = dx_{1}^{2}y_{1}^{2}(y_{2}^{4} + a^{2}x_{2}^{4} + 2ax_{2}^{2}y_{2}^{2}) + dx_{2}^{2}y_{2}^{2}(y_{1}^{4} + a^{2}x_{1}^{4} + 2ax_{1}^{2}y_{1}^{2}) + (1 + d^{2}P^{2})(1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) – dx_{1}^{2}y_{1}^{2}(1 + dx_{2}^{2}y_{2}^{2})^{2} – dx_{2}^{2}y_{2}^{2}(1 + dx_{1}^{2}y_{1}^{2})^{2}

We can rewrite the term (y_{2}^{4} + a^{2}x_{2}^{4} + 2ax_{2}^{2}y_{2}^{2}) as (ax_{2}^{2} + y_{2}^{2})^{2} and (y_{1}^{4} + a^{2}x_{1}^{4} + 2ax_{1}^{2}y_{1}^{2}) as (ax_{1}^{2} + y_{1}^{2})^{2}

(1 + d^{2}P^{2})(ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}) = dx_{1}^{2}y_{1}^{2}(ax_{2}^{2} + y_{2}^{2})^{2} + dx_{2}^{2}y_{2}^{2}(ax_{1}^{2} + y_{1}^{2})^{2}+ (1 + d^{2}P^{2})(1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) – dx_{1}^{2}y_{1}^{2}(1 + dx_{2}^{2}y_{2}^{2})^{2} – dx_{2}^{2}y_{2}^{2}(1 + dx_{1}^{2}y_{1}^{2})^{2}

Combine the terms that contain (1 + d^{2}P^{2})

(1 + d^{2}P^{2})[(ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}) – (1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) ] = dx_{1}^{2}y_{1}^{2}(ax_{2}^{2} + y_{2}^{2})^{2} – dx_{1}^{2}y_{1}^{2}(1 + dx_{2}^{2}y_{2}^{2})^{2} + dx_{2}^{2}y_{2}^{2}(ax_{1}^{2} + y_{1}^{2})^{2} – dx_{2}^{2}y_{2}^{2}(1 + dx_{1}^{2}y_{1}^{2})^{2}

Still, we can combine terms containing dx_{1}^{2}y_{1}^{2} and dx_{2}^{2}y_{2}^{2}

(1 + d^{2}P^{2})[(ax_{1}^{2} + y_{1}^{2})(ax_{2}^{2} + y_{2}^{2}) – (1 + dx_{1}^{2}y_{1}^{2})(1 + dx_{2}^{2}y_{2}^{2}) ] = dx_{1}^{2}y_{1}^{2}[(ax_{2}^{2} + y_{2}^{2})^{2} – (1 + dx_{2}^{2}y_{2}^{2})^{2}]+ dx_{2}^{2}y_{2}^{2}[(ax_{1}^{2} + y_{1}^{2})^{2} – (1 + dx_{1}^{2}y_{1}^{2})^{2}]

All terms in brackets must be equal to zero based on the twisted Edwards curve equation. This proves the theorem as claimed.

We have also proven the addition formula for the regular Edwards curves that Bernstein and Lange introduced, by setting the variable a to 1.

x^{2} + y^{2} = 1 + dx^{2}y^{2}

x_{3} = (x_{1}y_{2} + y_{1}x_{2})/(1 + dx_{1}x_{2}y_{1}y_{2})

y_{3} = (y_{1}y_{2} – ax_{1}x_{2})/(1 – dx_{1}x_{2}y_{1}y_{2}) = (y_{1}y_{2} – x_{1}x_{2})/(1 – dx_{1}x_{2}y_{1}y_{2})

So, we have proven the addition formula for both twisted and regular Edwards curves. In particular, the twisted ones are the backbone of Edwards-curve based digital signatures. These signatures offer both high speed and high security. Every security specialist should have Edwards curves in their toolbox.


The post A Gentle Introduction to Edwards-curve Digital Signature Algorithm (EdDSA) appeared first on Sefik Ilkin Serengil.

The original paper recommends using a **twisted Edwards curve**. This curve looks like a bird’s-eye view of a roundabout intersection of a road.

ax^{2} + y^{2} = 1 + dx^{2}y^{2}

This has an addition formula similar to that of regular Edwards curves. The difference, marked in bold below, is in the numerator of the y coordinate of the new point.

(x_{1}, y_{1}) + (x_{2}, y_{2}) = (x_{3}, y_{3})

x_{3} = (x_{1}y_{2} + y_{1}x_{2})/(1 + dx_{1}x_{2}y_{1}y_{2})

y_{3} = (y_{1}y_{2} – **a**x_{1}x_{2})/(1 – dx_{1}x_{2}y_{1}y_{2})

Ed25519 is a special form of this curve where a = -1 and d = -121665/121666. It is defined over the prime field where p = 2^{255} – 19. The final form of ed25519 is illustrated below.

-x^{2} + y^{2} = 1 – (121665/121666)x^{2}y^{2} (mod 2^{255} – 19)

The base point of the curve is y = (u-1)/(u+1) where u = 9. The integer equivalent is demonstrated below.

```python
p = pow(2, 255) - 19

base = (15112221349535400772501151409588531511454012693041857206046113283949847762202, 46316835694926478169428394003475163141307993866256225615783033603165251855960)
```
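As a quick check, the y coordinate above can be derived directly from u = 9 (a sketch using Python 3.8+'s modular inverse support in pow):

```python
# y = (u - 1) / (u + 1) mod p with u = 9 gives the base point's y coordinate
p = pow(2, 255) - 19
u = 9
y = (u - 1) * pow(u + 1, -1, p) % p
print(y)
```

This reproduces the y coordinate listed in the snippet above.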

Moreover, the variable d is a rational number. We can convert it to an integer in the field by replacing the division by its denominator with multiplication by the denominator’s modular multiplicative inverse.

```python
#ax^2 + y^2 = 1 + dx^2y^2
a = -1
d = findPositiveModulus(-121665 * findModInverse(121666, p), p) #ed25519
```
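findPositiveModulus and findModInverse are helper functions from the author's utility code; if you do not have them, minimal stand-ins could look like this sketch (pow(a, -1, m) requires Python 3.8+):

```python
def findModInverse(a, m):
    #modular multiplicative inverse: returns x such that (a * x) % m == 1
    return pow(a, -1, m)

def findPositiveModulus(a, m):
    #Python's % operator already returns a value in [0, m) for positive m
    return a % m

p = pow(2, 255) - 19
d = findPositiveModulus(-121665 * findModInverse(121666, p), p)

#d is congruent to -121665/121666 mod p
assert (d * 121666 + 121665) % p == 0
```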

Regular elliptic curves in Weierstrass form have different formulas for addition and doubling. In contrast, both operations are handled by the same addition formula on Edwards curves. Working over finite fields requires moving denominators into the numerator as multiplicative inverses.

```python
def pointAddition(P, Q, a, d, mod):
    x1 = P[0]; y1 = P[1]
    x2 = Q[0]; y2 = Q[1]

    x3 = (((x1*y2 + y1*x2) % mod) * findModInverse(1 + d*x1*x2*y1*y2, mod)) % mod
    y3 = (((y1*y2 - a*x1*x2) % mod) * findModInverse(1 - d*x1*x2*y1*y2, mod)) % mod

    return x3, y3
```

Alice needs to generate a 32-byte private key. Then, she needs to calculate the private key times the base point. This will be her public key.

```python
import random

privateKey = random.getrandbits(256) #32 byte secret key
publicKey = applyDoubleAndAddMethod(base, privateKey, a, d, p)
```

She can use double-and-add method to find her public key fast.

```python
def applyDoubleAndAddMethod(P, k, a, d, mod):
    additionPoint = (P[0], P[1])

    kAsBinary = bin(k) #0b1111111001
    kAsBinary = kAsBinary[2:len(kAsBinary)] #1111111001

    for i in range(1, len(kAsBinary)):
        currentBit = kAsBinary[i: i+1]

        #always apply doubling
        additionPoint = pointAddition(additionPoint, additionPoint, a, d, mod)

        if currentBit == '1':
            #add base point
            additionPoint = pointAddition(additionPoint, P, a, d, mod)

    return additionPoint
```

Firstly, Alice needs to convert the message to a numeric value.

```python
def textToInt(text):
    encoded_text = text.encode('utf-8')
    hex_text = encoded_text.hex()
    int_text = int(hex_text, 16)
    return int_text

message = textToInt("Hello, world!")
```

Remember that a random key is involved in the elliptic curve digital signature algorithm (ECDSA). It must be different for each signing; otherwise, it causes a serious security issue. This security disaster appeared in the Sony PlayStation 3 game console in 2010. In EdDSA, this is handled by deriving the random key from the hash of the message. In this way, every message has a different random key.

```python
import hashlib

def hashing(message):
    return int(hashlib.sha512(str(message).encode("utf-8")).hexdigest(), 16)

r = hashing(hashing(message) + message) % p

Random key times base point will be random point R and it is a type of curve point. Extracting secret random key r from known random point R is a really hard problem (ECDLP). Besides, combination of the random point, public key and the message will be stored in the variable h after hashing. This can be calculated by receiver party, too. Then, s variable stores (r + h x private key) which is a type of integer. Signature of the message consists of (R, s) pair.

```python
R = applyDoubleAndAddMethod(base, r, a, d, p)
h = hashing(R[0] + publicKey[0] + message) % p
s = (r + h * privateKey)
```

Bob receives the message and its signature (R, s). Also, he knows Alice’s public key and the public curve configuration (base point, a, d, p). He needs to find the following P1 and P2 pair.

```python
h = hashing(R[0] + publicKey[0] + message) % p

P1 = applyDoubleAndAddMethod(base, s, a, d, p)
P2 = pointAddition(R, applyDoubleAndAddMethod(publicKey, h, a, d, p), a, d, p)
```

P1 is the signature’s s value times the base point. P2 is the addition of the signature’s R value and h times the public key. Remember that h can be calculated by Bob, too. Herein, the P1 and P2 pair must be equal if the signature is valid.

You might wonder how this works. Focus on the calculation of P1.

P1 = s x basePoint

The signature’s s value was computed as (r + h x privateKey). Bob knows the exact value of s and he can calculate h, but he knows neither the random key r nor the private key of Alice. Replace s in the P1 calculation.

P1 = (r + h x privateKey) x basePoint

Distribute the base point multiplication over the parentheses.

P1 = r x basePoint + h x privateKey x basePoint

The equation above includes private key times base point, which is exactly equal to the public key of Alice. Moreover, the random key r times the base point is equal to the random point R.

P1 = R + h x publicKey

Now, P1 is exactly equal to P2, so the equality of these two points is precisely what we expect for a valid signature.
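The whole sign-and-verify identity can be demonstrated end to end with a compact sketch. Note that the curve parameters, base point, keys and message below are toy values chosen for illustration, not Ed25519 (pow(x, -1, p) requires Python 3.8+):

```python
import hashlib

# toy twisted Edwards curve: a = 4 is a square and d = 2 a non-square mod 13,
# so the addition law is complete; these are illustrative values, not Ed25519
p, a, d = 13, 4, 2
base = (1, 4)  # satisfies 4x^2 + y^2 = 1 + 2x^2y^2 (mod 13)

def add(P, Q):
    x1, y1 = P; x2, y2 = Q
    t = d * x1 * x2 * y1 * y2
    x3 = (x1*y2 + y1*x2) * pow((1 + t) % p, -1, p) % p
    y3 = (y1*y2 - a*x1*x2) * pow((1 - t) % p, -1, p) % p
    return x3, y3

def mul(P, k):  # double-and-add; k = 0 yields the neutral element (0, 1)
    R = (0, 1)
    while k:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

def H(x):
    return int(hashlib.sha512(str(x).encode()).hexdigest(), 16)

message = 12345
privateKey = 7
publicKey = mul(base, privateKey)

# signing
r = H(H(message) + message) % p
R = mul(base, r)
h = H(R[0] + publicKey[0] + message) % p
s = r + h * privateKey

# verification: P1 and P2 must match
P1 = mul(base, s)
P2 = add(R, mul(publicKey, h))
assert P1 == P2
```

Replacing the toy parameters with the ed25519 configuration above gives the real scheme.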

So, we have covered EdDSA and a simple Python implementation of it. This scheme is designed to be faster than any existing digital signature scheme. Also, signing two different messages with the same random key discloses the secret key in ECDSA; this issue is handled in EdDSA. Finally, the code project of this post is pushed to GitHub.


The post A Gentle Introduction to Edwards Curves appeared first on Sefik Ilkin Serengil.

Edwards curves show similarity with the unit circle, which satisfies the following equation.

x^{2} + y^{2} = 1

Suppose that (x_{1}, y_{1}) and (x_{2}, y_{2}) are points on the unit circle. The angle between y-axis and (x_{1}, y_{1}) is α and angle between y-axis and (x_{2}, y_{2}) is β.

We can express (x_{1}, y_{1}) as (sinα, cosα) and (x_{2}, y_{2}) as (sinβ, cosβ).

Now, I can add these two points by adding their corresponding angles. The angle sum identities will help me formulate this.

x_{3} = sin(α+β) = sinα.cosβ + cosα.sinβ

y_{3} = cos(α+β) = cosα.cosβ – sinα.sinβ

We know that (x_{3}, y_{3}) will satisfy the unit circle equation. This is satisfactory but not elliptic!
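A tiny numeric sketch (with arbitrary angles) confirms that this angle-sum addition keeps us on the unit circle:

```python
import math

alpha, beta = 0.7, 1.1  # arbitrary angles for illustration

x1, y1 = math.sin(alpha), math.cos(alpha)
x2, y2 = math.sin(beta), math.cos(beta)

# angle-sum addition law on the unit circle
x3 = x1 * y2 + y1 * x2
y3 = y1 * y2 - x1 * x2

assert abs(x3 - math.sin(alpha + beta)) < 1e-12
assert abs(y3 - math.cos(alpha + beta)) < 1e-12
assert abs(x3**2 + y3**2 - 1) < 1e-12  # the sum stays on the circle
```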

Edwards curves satisfy the form x^{2} + y^{2} = a^{2} + a^{2}x^{2}y^{2}. This is the form Harold Edwards studied in the original paper. Additionally, Bernstein and Lange built on the study and transformed Edwards curves into the simpler form x^{2} + y^{2} = 1 + dx^{2}y^{2}.

Setting the variable d to 0 creates the unit circle. The curve looks like a starfish as the variable d grows in the negative direction.

I will use the form Harold Edwards studied in the following parts of this post. You can find the proof for the simpler form x^{2} + y^{2} = 1 + dx^{2}y^{2} here.

Elliptic curves are based on constructing new points from existing points. Traditional forms such as Weierstrass or Koblitz use chords and tangents to construct a new point.

Edwards curves use neither chords nor tangents. They have their own characteristic construction method, similar to the unit circle’s addition law.

The Edwards addition law says that if (x_{1}, y_{1}) and (x_{2}, y_{2}) are points on the Edwards curve, the following point (x_{3}, y_{3}) derived from the known points must be on the same curve.

x_{3} = (x_{1}y_{2} + x_{2}y_{1})/(a.(1 + x_{1}y_{1}x_{2}y_{2}))

y_{3} = (y_{1}y_{2} – x_{1}x_{2})/(a.(1 – x_{1}y_{1}x_{2}y_{2}))
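We can brute-force check this law over a toy prime field (p = 13 and a = 2 are arbitrary illustrative choices). Unlike the complete twisted form, this law can have vanishing denominators for some point pairs, so those exceptional pairs are skipped (pow(x, -1, p) requires Python 3.8+):

```python
# Brute-force check of the Edwards addition law on x^2 + y^2 = a^2 + a^2x^2y^2
# over a toy prime field; p = 13, a = 2 are illustrative choices only.
p, a = 13, 2

def on_curve(P):
    x, y = P
    return (x*x + y*y) % p == (a*a * (1 + x*x*y*y)) % p

def add(P, Q):
    x1, y1 = P; x2, y2 = Q
    t = x1 * y1 * x2 * y2
    x3 = (x1*y2 + x2*y1) * pow((a * (1 + t)) % p, -1, p) % p
    y3 = (y1*y2 - x1*x2) * pow((a * (1 - t)) % p, -1, p) % p
    return x3, y3

points = [(x, y) for x in range(p) for y in range(p) if on_curve((x, y))]
for P in points:
    for Q in points:
        t = P[0] * P[1] * Q[0] * Q[1]
        if (1 + t) % p == 0 or (1 - t) % p == 0:
            continue  # exceptional pair: the addition law is undefined here
        assert on_curve(add(P, Q))
print(len(points), "points checked")
```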

Euler and Gauss worked on this kind of elliptic equation and discovered addition formulas in the late 1700s.

Euler’s very first study of this appears in **Observations on the comparison of arcs of unrectifiable curves** (Observationes de Comparatione Arcuum Curvarum Irrectificabilium), published in 1761 (pp. 83 to 103).

Then, Gauss was interested in the same integrals and documented them in his **Werke**, published in 1799 (pp. 404). He worked on the form x^{2} + y^{2} = 1 + dx^{2}y^{2} where d = -1.

The following illustration demonstrates the difference between unit circle and elliptic form worked by Gauss.

We can say that Harold Edwards brought already discovered theorems back to light.

It is not fully clear how Euler and Gauss found this addition formula; they might simply have observed it. However, we can still validate the formula by direct substitution. The exact values of the new point (x_{3}, y_{3}) derived from the known points x_{1}, y_{1}, x_{2}, y_{2} must satisfy the equation x^{2} + y^{2} = a^{2} + a^{2}x^{2}y^{2} if the addition formula is valid. This is exactly how Harold Edwards proves the addition law in the original paper.

x_{3}^{2} + y_{3}^{2} = a^{2} + a^{2}x_{3}^{2}y_{3}^{2}

x_{3} = (x_{1}y_{2} + x_{2}y_{1})/(a.(1 + x_{1}y_{1}x_{2}y_{2})) , y_{3} = (y_{1}y_{2} – x_{1}x_{2})/(a.(1 – x_{1}y_{1}x_{2}y_{2}))

Put the exact values into the Edwards form.

(x_{1}y_{2 }+ x_{2}y_{1})^{2}/a^{2}.(1+x_{1}y_{1}x_{2}y_{2})^{2} + (y_{1}y_{2} – x_{1}x_{2})^{2}/a^{2}.(1 – x_{1}y_{1}x_{2}y_{2})^{2} = a^{2} + a^{2}.(x_{1}y_{2 }+ x_{2}y_{1})^{2}(y_{1}y_{2} – x_{1}x_{2})^{2}/a^{2}.(1+x_{1}y_{1}x_{2}y_{2})^{2}.a^{2}.(1 – x_{1}y_{1}x_{2}y_{2})^{2}

The identity will become very complex. That’s why we write P for x_{1}y_{1}x_{2}y_{2}. We will restore it later.

(x_{1}y_{2 }+ x_{2}y_{1})^{2}/a^{2}.(1+P)^{2} + (y_{1}y_{2} – x_{1}x_{2})^{2}/a^{2}.(1 – P)^{2} = a^{2} + a^{2}.(x_{1}y_{2 }+ x_{2}y_{1})^{2}(y_{1}y_{2} – x_{1}x_{2})^{2}/a^{2}.(1+P)^{2}.a^{2}.(1 – P)^{2}

The denominators must be equal before we can add the fractions

(x_{1}y_{2 }+ x_{2}y_{1})^{2}.(1 – P)^{2}/a^{2}.(1+P)^{2}.(1 – P)^{2} + (y_{1}y_{2} – x_{1}x_{2})^{2}.(1+P)^{2}/a^{2}.(1 – P)^{2}.(1+P)^{2} = a^{2}.a^{2}.(1+P)^{2}.(1 – P)^{2}/1.a^{2}.(1+P)^{2}.(1 – P)^{2} + a^{2}.(x_{1}y_{2 }+ x_{2}y_{1})^{2}(y_{1}y_{2} – x_{1}x_{2})^{2}/a^{2}.(1+P)^{2}.a^{2}.(1 – P)^{2}

The second term on the right side has an a^{2} multiplier in both the numerator and the denominator. We can simplify the expression.

(x_{1}y_{2 }+ x_{2}y_{1})^{2}.(1 – P)^{2}/a^{2}.(1+P)^{2}.(1 – P)^{2} + (y_{1}y_{2} – x_{1}x_{2})^{2}.(1+P)^{2}/a^{2}.(1 – P)^{2}.(1+P)^{2} = a^{2}.a^{2}.(1+P)^{2}.(1 – P)^{2}/1.a^{2}.(1+P)^{2}.(1 – P)^{2} + (x_{1}y_{2 }+ x_{2}y_{1})^{2}(y_{1}y_{2} – x_{1}x_{2})^{2}/a^{2}.(1+P)^{2}.(1 – P)^{2}

Now, all denominators are the same. We can simplify them away.

(x_{1}y_{2 }+ x_{2}y_{1})^{2}.(1 – P)^{2} + (y_{1}y_{2} – x_{1}x_{2})^{2}.(1+P)^{2} = a^{4}.(1+P)^{2}.(1 – P)^{2} + (x_{1}y_{2 }+ x_{2}y_{1})^{2}(y_{1}y_{2} – x_{1}x_{2})^{2}

Note that simplified denominator must not be equal to 0.

a^{2}.(1+x_{1}y_{1}x_{2}y_{2})^{2}.(1 – x_{1}y_{1}x_{2}y_{2})^{2} ≠ 0

Please focus on the term (1+P)^{2}.(1 – P)^{2}. We can rewrite it as [(1+P)(1-P)]^{2} = (1 – P^{2})^{2}

(x_{1}y_{2} + x_{2}y_{1})^{2}.(1 – P)^{2} + (y_{1}y_{2} – x_{1}x_{2})^{2}.(1+P)^{2} = a^{4}.(1 – P^{2})^{2} + (x_{1}y_{2} + x_{2}y_{1})^{2}(y_{1}y_{2} – x_{1}x_{2})^{2}

Focus on the left side of the equation. Evaluate the powers.

(x_{1}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{2} + 2P)(1 + P^{2} – 2P) + (y_{1}^{2}y_{2}^{2} + x_{1}^{2}x_{2}^{2} – 2P)(1 + P^{2} + 2P)

Combine the parts that contain 1 + P^{2} and 2P respectively.

(1 + P^{2})(x_{1}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{2} + 2P + y_{1}^{2}y_{2}^{2} + x_{1}^{2}x_{2}^{2} – 2P) + (2P)(- x_{1}^{2}y_{2}^{2} – x_{2}^{2}y_{1}^{2} – 2P + y_{1}^{2}y_{2}^{2} + x_{1}^{2}x_{2}^{2} – 2P)

(1 + P^{2})(x_{1}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{2} + y_{1}^{2}y_{2}^{2} + x_{1}^{2}x_{2}^{2}) + (2P)(- x_{1}^{2}y_{2}^{2} – x_{2}^{2}y_{1}^{2} + y_{1}^{2}y_{2}^{2} + x_{1}^{2}x_{2}^{2} – 4P)

(1 + P^{2})(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) + (2P)[(x_{1}^{2} – y_{1}^{2})(x_{2}^{2} – y_{2}^{2}) – 4P]

(1 + P^{2})(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) + 2P.(x_{1}^{2} – y_{1}^{2})(x_{2}^{2} – y_{2}^{2}) – 8P^{2}

Now, focus on the right side of the equation.

(x_{1}y_{2} + x_{2}y_{1})^{2}(y_{1}y_{2} – x_{1}x_{2})^{2} = (x_{1}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{2} + 2P)(y_{1}^{2}y_{2}^{2} + x_{1}^{2}x_{2}^{2} – 2P)

Factor the term 2P out.

(x_{1}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{2})(y_{1}^{2}y_{2}^{2} + x_{1}^{2}x_{2}^{2}) + 2P(y_{1}^{2}y_{2}^{2} + x_{1}^{2}x_{2}^{2} – x_{1}^{2}y_{2}^{2} – x_{2}^{2}y_{1}^{2}) – 4P^{2}

(x_{1}^{2}y_{1}^{2}y_{2}^{4} + x_{1}^{4}x_{2}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{4}y_{2}^{2} + x_{1}^{2}x_{2}^{4}y_{1}^{2}) + 2P(x_{1}^{2} – y_{1}^{2})(x_{2}^{2} – y_{2}^{2}) – 4P^{2}

Put the left and right sides together again.

(1 + P^{2})(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) + 2P.(x_{1}^{2} – y_{1}^{2})(x_{2}^{2} – y_{2}^{2}) – 8P^{2} = a^{4}.(1 – P^{2})^{2} + (x_{1}^{2}y_{1}^{2}y_{2}^{4} + x_{1}^{4}x_{2}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{4}y_{2}^{2} + x_{1}^{2}x_{2}^{4}y_{1}^{2}) + 2P(x_{1}^{2} – y_{1}^{2})(x_{2}^{2} – y_{2}^{2}) – 4P^{2}

Both the left and right sides contain the term 2P.(x_{1}^{2} – y_{1}^{2})(x_{2}^{2} – y_{2}^{2}), so we can cancel it. Also, we can add the term +8P^{2} to both sides.

(1 + P^{2})(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) = a^{4}.(1 – P^{2})^{2} + (x_{1}^{2}y_{1}^{2}y_{2}^{4} + x_{1}^{4}x_{2}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{4}y_{2}^{2} + x_{1}^{2}x_{2}^{4}y_{1}^{2}) + 4P^{2}

Here, we can manipulate the term (1 – P^{2})^{2}. The left side contains the term (1 + P^{2}), so we will rewrite (1 – P^{2})^{2} in terms of (1 + P^{2}).

(1 – P^{2})^{2} = (1 + P^{2})^{2} – 4P^{2} = (1 + P^{2})(1 + P^{2}) – 4P^{2}

Substitute the real value of P into the second factor.

(1 + P^{2})(1 + x_{1}^{2}y_{1}^{2}x_{2}^{2}y_{2}^{2}) – 4P^{2}

Adding and subtracting the same terms does not change the value.

(1 + P^{2})(1 + x_{1}^{2}y_{1}^{2}x_{2}^{2}y_{2}^{2} + x_{1}^{2}y_{1}^{2} + x_{2}^{2}y_{2}^{2} – x_{1}^{2}y_{1}^{2} – x_{2}^{2}y_{2}^{2}) – 4P^{2}

We can separate the second parentheses.

(1 + P^{2})(1 + x_{1}^{2}y_{1}^{2}x_{2}^{2}y_{2}^{2} + x_{1}^{2}y_{1}^{2} + x_{2}^{2}y_{2}^{2}) – (1 + P^{2})(x_{1}^{2}y_{1}^{2} + x_{2}^{2}y_{2}^{2}) – 4P^{2}

The second parentheses of the first term can be factored into a product of two terms.

(1 + P^{2})(1 + x_{1}^{2}y_{1}^{2})(1 + x_{2}^{2}y_{2}^{2}) – (1 + P^{2})(x_{1}^{2}y_{1}^{2} + x_{2}^{2}y_{2}^{2}) – 4P^{2}

Reflect the minus sign into the parentheses

(1 + P^{2})(1 + x_{1}^{2}y_{1}^{2})(1 + x_{2}^{2}y_{2}^{2}) + (- 1 – P^{2})(x_{1}^{2}y_{1}^{2} + x_{2}^{2}y_{2}^{2}) – 4P^{2}

Multiply the two parentheses in the second term.

(1 + P^{2})(1 + x_{1}^{2}y_{1}^{2})(1 + x_{2}^{2}y_{2}^{2}) – x_{1}^{2}y_{1}^{2} – x_{2}^{2}y_{2}^{2} – P^{2}x_{1}^{2}y_{1}^{2} – P^{2}x_{2}^{2}y_{2}^{2} – 4P^{2}

(1 + P^{2})(1 + x_{1}^{2}y_{1}^{2})(1 + x_{2}^{2}y_{2}^{2}) – x_{1}^{2}y_{1}^{2} – x_{2}^{2}y_{2}^{2} – x_{2}^{2}y_{2}^{2}(x_{1}^{4}y_{1}^{4}) – x_{1}^{2}y_{1}^{2}(x_{2}^{4}y_{2}^{4}) – 2x_{1}^{2}y_{1}^{2}x_{2}^{2}y_{2}^{2} – 2x_{1}^{2}y_{1}^{2}x_{2}^{2}y_{2}^{2}

(1 + P^{2})(1 + x_{1}^{2}y_{1}^{2})(1 + x_{2}^{2}y_{2}^{2}) – x_{1}^{2}y_{1}^{2}(1 + x_{2}^{4}y_{2}^{4} + 2x_{2}^{2}y_{2}^{2}) – x_{2}^{2}y_{2}^{2}(1 + x_{1}^{4}y_{1}^{4} + 2x_{1}^{2}y_{1}^{2})

(1 + P^{2})(1 + x_{1}^{2}y_{1}^{2})(1 + x_{2}^{2}y_{2}^{2}) – x_{1}^{2}y_{1}^{2}(1 + x_{2}^{2}y_{2}^{2})^{2} – x_{2}^{2}y_{2}^{2}(1 + x_{1}^{2}y_{1}^{2})^{2}

So, the term (1 – P^{2})^{2} can be expressed as (1 + P^{2})(1 + x_{1}^{2}y_{1}^{2})(1 + x_{2}^{2}y_{2}^{2}) – x_{1}^{2}y_{1}^{2}(1 + x_{2}^{2}y_{2}^{2})^{2} – x_{2}^{2}y_{2}^{2}(1 + x_{1}^{2}y_{1}^{2})^{2}

Return to the main equation.

(1 + P^{2})(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) = a^{4}.(1 – P^{2})^{2} + (x_{1}^{2}y_{1}^{2}y_{2}^{4} + x_{1}^{4}x_{2}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{4}y_{2}^{2} + x_{1}^{2}x_{2}^{4}y_{1}^{2}) + 4P^{2}

Restore P in the term 4P^{2} (4P^{2} = 2P^{2} + 2P^{2}).

(1 + P^{2})(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) = a^{4}.(1 – P^{2})^{2} + (x_{1}^{2}y_{1}^{2}y_{2}^{4} + x_{1}^{4}x_{2}^{2}y_{2}^{2} + x_{2}^{2}y_{1}^{4}y_{2}^{2} + x_{1}^{2}x_{2}^{4}y_{1}^{2}) + 2x_{1}^{2}y_{1}^{2}x_{2}^{2}y_{2}^{2} + 2x_{1}^{2}y_{1}^{2}x_{2}^{2}y_{2}^{2}

Combine the parts that contain x_{1}^{2}y_{1}^{2} and x_{2}^{2}y_{2}^{2}

(1 + P^{2})(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) = a^{4}.(1 – P^{2})^{2} + x_{1}^{2}y_{1}^{2}(x_{2}^{4} + y_{2}^{4} + 2x_{2}^{2}y_{2}^{2}) + x_{2}^{2}y_{2}^{2}(x_{1}^{4} + y_{1}^{4} + 2x_{1}^{2}y_{1}^{2})

(1 + P^{2})(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) = a^{4}.(1 – P^{2})^{2} + x_{1}^{2}y_{1}^{2}(x_{2}^{2} + y_{2}^{2})^{2} + x_{2}^{2}y_{2}^{2}(x_{1}^{2} + y_{1}^{2})^{2}

Now, replace (1 – P^{2})^{2} with its manipulated form.

(1 + P^{2})(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) = a^{4}.[(1 + P^{2})(1 + x_{1}^{2}y_{1}^{2})(1 + x_{2}^{2}y_{2}^{2}) – x_{1}^{2}y_{1}^{2}(1 + x_{2}^{2}y_{2}^{2})^{2} – x_{2}^{2}y_{2}^{2}(1 + x_{1}^{2}y_{1}^{2})^{2}] + x_{1}^{2}y_{1}^{2}(x_{2}^{2} + y_{2}^{2})^{2} + x_{2}^{2}y_{2}^{2}(x_{1}^{2} + y_{1}^{2})^{2}

(1 + P^{2})(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) = a^{4}(1 + P^{2})(1 + x_{1}^{2}y_{1}^{2})(1 + x_{2}^{2}y_{2}^{2}) – a^{4}.x_{1}^{2}y_{1}^{2}(1 + x_{2}^{2}y_{2}^{2})^{2} – a^{4}.x_{2}^{2}y_{2}^{2}(1 + x_{1}^{2}y_{1}^{2})^{2} + x_{1}^{2}y_{1}^{2}(x_{2}^{2} + y_{2}^{2})^{2} + x_{2}^{2}y_{2}^{2}(x_{1}^{2} + y_{1}^{2})^{2}

Combine the parts that contain (1 + P^{2})

(1 + P^{2}).[(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) – a^{4}(1 + x_{1}^{2}y_{1}^{2})(1 + x_{2}^{2}y_{2}^{2})] + a^{4}.x_{1}^{2}y_{1}^{2}(1 + x_{2}^{2}y_{2}^{2})^{2} + a^{4}.x_{2}^{2}y_{2}^{2}(1 + x_{1}^{2}y_{1}^{2})^{2} – x_{1}^{2}y_{1}^{2}(x_{2}^{2} + y_{2}^{2})^{2} – x_{2}^{2}y_{2}^{2}(x_{1}^{2} + y_{1}^{2})^{2} = 0

Distribute the a^{2} multipliers into the parentheses.

(1 + P^{2}).[(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) – (a^{2} + a^{2}x_{1}^{2}y_{1}^{2})(a^{2} + a^{2}x_{2}^{2}y_{2}^{2})] + (a^{2})^{2}.x_{1}^{2}y_{1}^{2}(1 + x_{2}^{2}y_{2}^{2})^{2} + (a^{2})^{2}.x_{2}^{2}y_{2}^{2}(1 + x_{1}^{2}y_{1}^{2})^{2} – x_{1}^{2}y_{1}^{2}(x_{2}^{2} + y_{2}^{2})^{2} – x_{2}^{2}y_{2}^{2}(x_{1}^{2} + y_{1}^{2})^{2} = 0

(1 + P^{2}).[(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) – (a^{2} + a^{2}x_{1}^{2}y_{1}^{2})(a^{2} + a^{2}x_{2}^{2}y_{2}^{2})] + x_{1}^{2}y_{1}^{2}(a^{2} + a^{2}x_{2}^{2}y_{2}^{2})^{2} + x_{2}^{2}y_{2}^{2}(a^{2} + a^{2}x_{1}^{2}y_{1}^{2})^{2} – x_{1}^{2}y_{1}^{2}(x_{2}^{2} + y_{2}^{2})^{2} – x_{2}^{2}y_{2}^{2}(x_{1}^{2} + y_{1}^{2})^{2} = 0

(1 + P^{2}).[(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) – (a^{2} + a^{2}x_{1}^{2}y_{1}^{2})(a^{2} + a^{2}x_{2}^{2}y_{2}^{2})] + (x_{1}^{2}y_{1}^{2})[(a^{2} + a^{2}x_{2}^{2}y_{2}^{2})^{2} – (x_{2}^{2} + y_{2}^{2})^{2}] + (x_{2}^{2}y_{2}^{2})[(a^{2} + a^{2}x_{1}^{2}y_{1}^{2})^{2} – (x_{1}^{2} + y_{1}^{2})^{2}] = 0

Remember the main equation for the Edwards form: x^{2} + y^{2} = a^{2} + a^{2}x^{2}y^{2}. We already know that the points (x_{1}, y_{1}) and (x_{2}, y_{2}) satisfy this equation.

x_{1}^{2} + y_{1}^{2} = a^{2} + a^{2}x_{1}^{2}y_{1}^{2}

x_{2}^{2} + y_{2}^{2} = a^{2} + a^{2}x_{2}^{2}y_{2}^{2}

Multiply these two equations

(x_{1}^{2} + y_{1}^{2}).(x_{2}^{2} + y_{2}^{2}) = (a^{2} + a^{2}x_{1}^{2}y_{1}^{2})(a^{2} + a^{2}x_{2}^{2}y_{2}^{2})

Move the terms on the right side to the left side

(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) – (a^{2} + a^{2}x_{1}^{2}y_{1}^{2})(a^{2} + a^{2}x_{2}^{2}y_{2}^{2}) = 0

Also, we can apply the same approach to each individual equation.

x_{1}^{2} + y_{1}^{2} – (a^{2} + a^{2}x_{1}^{2}y_{1}^{2}) = 0

x_{2}^{2} + y_{2}^{2} – (a^{2} + a^{2}x_{2}^{2}y_{2}^{2}) = 0

As seen, all of these expressions appear in the final form of the equation.

(1 + P^{2}).[(x_{1}^{2} + y_{1}^{2})(x_{2}^{2} + y_{2}^{2}) – (a^{2} + a^{2}x_{1}^{2}y_{1}^{2})(a^{2} + a^{2}x_{2}^{2}y_{2}^{2})] + (x_{1}^{2}y_{1}^{2})[(a^{2} + a^{2}x_{2}^{2}y_{2}^{2})^{2} – (x_{2}^{2} + y_{2}^{2})^{2}] + (x_{2}^{2}y_{2}^{2})[(a^{2} + a^{2}x_{1}^{2}y_{1}^{2})^{2} – (x_{1}^{2} + y_{1}^{2})^{2}] = 0

(1 + P^{2}).[0] + (x_{1}^{2}y_{1}^{2})[0] + (x_{2}^{2}y_{2}^{2})[0] = 0

Finally, the equation becomes 0 = 0. This proves the addition law as claimed!
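The proof above can also be sanity-checked numerically. The following is a minimal pure-Python sketch (the curve parameter a = 1.4 and the sample x-coordinates are arbitrary choices, not from this post): it picks points on x^{2} + y^{2} = a^{2} + a^{2}x^{2}y^{2}, adds them with the formula above, and verifies that the result stays on the curve. Notice that the point (0, a) behaves as the identity element under this addition law.

```python
import math

a = 1.4  # arbitrary curve parameter for x^2 + y^2 = a^2 + a^2.x^2.y^2

def on_curve(x, y, tol=1e-8):
    # does (x, y) satisfy the Edwards form equation?
    return abs(x**2 + y**2 - (a**2 + a**2 * x**2 * y**2)) < tol

def point_from_x(x):
    # solve the curve equation for y: y^2 = (a^2 - x^2) / (1 - a^2.x^2)
    return (x, math.sqrt((a**2 - x**2) / (1 - a**2 * x**2)))

def add(p, q):
    # the Edwards addition law proven above
    (x1, y1), (x2, y2) = p, q
    x3 = (x1 * y2 + x2 * y1) / (a * (1 + x1 * y1 * x2 * y2))
    y3 = (y1 * y2 - x1 * x2) / (a * (1 - x1 * y1 * x2 * y2))
    return (x3, y3)

identity = (0.0, a)  # (0, a) acts as the neutral element
p = point_from_x(0.5)
q = point_from_x(0.3)

assert on_curve(*p) and on_curve(*q)
assert on_curve(*add(p, q))  # the sum stays on the curve
assert all(abs(u - v) < 1e-12 for u, v in zip(add(p, identity), p))
```

Running it raises no assertion error, confirming that the sum of two curve points lands back on the curve, exactly as the identity 0 = 0 guarantees.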

The addition law can also be applied for doubling a point. Replacing the (x_{2}, y_{2}) pair with (x_{1}, y_{1}) in the addition formula gives the doubling formula.

(x_{1}, y_{1}) + (x_{1}, y_{1}) = (x_{3}, y_{3})

x_{3} = (x_{1}y_{1} + x_{1}y_{1})/(a.(1+x_{1}y_{1}x_{1}y_{1})) = 2x_{1}y_{1}/(a.(1+x_{1}^{2}y_{1}^{2}))

y_{3} = (y_{1}y_{1} – x_{1}x_{1})/(a.(1 – x_{1}y_{1}x_{1}y_{1})) = (y_{1}^{2} – x_{1}^{2})/(a.(1 – x_{1}^{2}y_{1}^{2}))

So, we have all the necessary tools to find the coordinates of a target point. Point addition and doubling enable calculating a target point quickly with the double-and-add method.
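The double-and-add method can be sketched generically: to compute k·P, scan the bits of k, doubling at each step and adding when the bit is set. The helper below is a minimal illustration that works with any associative addition function and identity element; it is exercised here with plain integer addition as a stand-in for point addition.

```python
def double_and_add(k, point, add, identity):
    # computes k * point with O(log k) group operations
    result = identity
    addend = point
    while k > 0:
        if k & 1:                     # current bit of k is set
            result = add(result, addend)
        addend = add(addend, addend)  # doubling step
        k >>= 1
    return result

# stand-in check with ordinary integers: 13 * 7 via doubling and adding
print(double_and_add(13, 7, lambda p, q: p + q, 0))  # 91
```

Plugging the Edwards point addition function and the identity point in place of integer addition turns this directly into fast scalar multiplication on the curve.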

So, we’ve mentioned elliptic curves in Edwards form. Even though the addition law was discovered more than two centuries ago by math geniuses, its adaptation to cryptography happened only in the last decade. The proof of the Edwards addition law may seem much harder than for Weierstrass or Koblitz curves, but the calculations themselves are handled much more easily. This makes Edwards curves so popular today.

Bernstein depicts Weierstrass as a turtle (in bird’s-eye view) and Edwards as a starfish in his slides. This metaphor reflects the speed of these elliptic curve forms: Weierstrass is old and slow whereas Edwards is new and fast. This is really funny!

The publications of Christiane Peters, in particular her PhD thesis, drove me to enjoy and understand Edwards curves. Besides, I got much help from the studies of Tanja Lange and Daniel J. Bernstein.

The post A Gentle Introduction to Edwards Curves appeared first on Sefik Ilkin Serengil.

The post A Step by Step Hill Cipher Example appeared first on Sefik Ilkin Serengil.

First, the sender and receiver parties need to agree on a secret key. This key must be a square matrix.

import numpy as np

key = np.array([
    [3, 10, 20],
    [20, 9, 17],
    [9, 4, 17]
])

key_rows = key.shape[0]
key_columns = key.shape[1]

if key_rows != key_columns:
    raise Exception('key must be square matrix!')

The key matrix must have an inverse matrix. This means that the determinant of the matrix must not be 0 (and, for the modular inverse we will need later, the determinant must also be coprime to 26).

if np.linalg.det(key) == 0:
    raise Exception('matrix must have an inverse matrix')

Hill cipher is a language-dependent encryption method. That’s why all characters will be lowercased and we’ll remove blank characters as well. Then, every letter will be replaced with its index value in the alphabet.

import string

def letterToNumber(letter):
    return string.ascii_lowercase.index(letter)

raw_message = "attack is to night"
print("raw message: ", raw_message)

message = []
for i in range(0, len(raw_message)):
    current_letter = raw_message[i:i+1].lower()
    if current_letter != ' ':  # discard blank characters
        letter_index = letterToNumber(current_letter)
        message.append(letter_index)

Encryption is handled by multiplying the message and the key. This requires that the column count of the message be equal to the row count of the key; otherwise, the multiplication cannot be performed. We could append the first letter of the alphabet to the end of the message until the multiplication becomes possible. Hill cipher is a block cipher method and repetition will not cause a weakness. Still, I prefer to append the beginning of the message instead of repeating a single character. By the way, the column count of my message and the row count of my key are already compatible, so the following code block won’t run for this case.

if len(message) % key_rows != 0:
    for i in range(0, len(message)):
        message.append(message[i])
        if len(message) % key_rows == 0:
            break

Now, we can transform the message into a matrix.

message = np.array(message)
message_length = message.shape[0]
message.resize(int(message_length/key_rows), key_rows)

Now, my message is stored in a 5×3 sized matrix as illustrated below.

[[ 0 19 19]
 [ 0  2 10]
 [ 8 18 19]
 [14 13  8]
 [ 6  7 19]]

The message is a 5×3 sized matrix and the key is a 3×3 sized matrix. The message’s column count is equal to the key matrix’s row count, so they can be multiplied. Multiplication might produce values greater than the alphabet size; that’s why we apply modular arithmetic. Here, 26 refers to the size of the English alphabet. We can use either the matmul or the dot function.

encryption = np.matmul(message, key)
encryption = np.remainder(encryption, 26)

Encrypted text will be stored in 5×3 sized matrix as illustrated below.

[[ 5 13 22]
 [ 0  6 22]
 [ 9  6  9]
 [10  3 13]
 [17 17 16]]

Remember that the plaintext was attackistonight. Please focus on the 2nd and 3rd letters in the plaintext. They are both the letter t. However, the 2nd and 3rd characters in the ciphertext are 13 and 22 respectively. The same character is substituted with different characters. This is the idea behind block ciphers.

Multiplying the ciphertext and the inverse of the key recreates the plaintext. Here, we need to find the inverse of the key. Finding a matrix inverse is a complex operation. Even though numpy has a matrix inverse function, we also need to apply modular arithmetic to this decimal matrix. On the other hand, SymPy handles modular arithmetic for matrix inverse operations easily.

from sympy import Matrix

inverse_key = Matrix(key).inv_mod(26)
inverse_key = np.array(inverse_key)  # sympy to numpy
inverse_key = inverse_key.astype(float)

This finds the inverse key.

[[11. 22. 14.]
 [ 7.  9. 21.]
 [17.  0.  3.]]

We can validate the inverse key matrix: the multiplication of the key and the inverse key must be equal to the identity matrix.

check = np.matmul(key, inverse_key)
check = np.remainder(check, 26)

This really produces the identity matrix.

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

Bob found the inverse key and he has the ciphertext. He needs to multiply the ciphertext and inverse key matrices.

decryption = np.matmul(encryption, inverse_key)
decryption = np.remainder(decryption, 26).flatten()

As seen, decryption stores the exact message Alice sent.

decryption: [ 0. 19. 19. 0. 2. 10. 8. 18. 19. 14. 13. 8. 6. 7. 19.]

We can restore these values into characters.

def numberToLetter(number):
    return string.ascii_lowercase[int(number)]

decrypted_message = ""
for i in range(0, len(decryption)):
    letter = numberToLetter(decryption[i])
    decrypted_message = decrypted_message + letter

This restores the following message.

decrypted message: attackistonight

Inventor Lester S. Hill registered this idea at the patent office. You should have a look at his drawings. He designed an **encrypted telegraph machine** at the beginning of the 1930s and named it message protector. Today, we call this Hill’s Cipher Machine.

In this post, we’ve worked with a 3×3 sized key, whose key space is 26^{9}. The patented mechanism works with 6×6 sized keys. This increases the key space to 26^{36}, which is very large even for today’s computation power. Increasing the size of the key matrix makes the cipher much stronger. We can say that Hill cipher is secure against *ciphertext-only attacks*.

However, if an attacker can capture a plaintext-ciphertext pair, then he can calculate the key easily. That’s why Hill cipher is weak against *known-plaintext attacks*, and that’s why this cipher went out of date.
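The known-plaintext attack can be demonstrated concretely. The sketch below uses a hypothetical 2×2 key (not the one from this post): given an invertible plaintext block matrix P and its ciphertext C = P.K mod 26, the attacker recovers the key as K = P^{-1}.C mod 26. Pure Python is used instead of numpy/sympy to keep it self-contained.

```python
def matmul_mod(p, q, mod=26):
    # 2x2 matrix product reduced mod 26
    return [[sum(p[i][k] * q[k][j] for k in range(2)) % mod for j in range(2)]
            for i in range(2)]

def inv2x2_mod(m, mod=26):
    # 2x2 modular inverse via the adjugate; requires gcd(det, mod) == 1
    (a, b), (c, d) = m
    det_inv = pow((a * d - b * c) % mod, -1, mod)
    return [[(d * det_inv) % mod, (-b * det_inv) % mod],
            [(-c * det_inv) % mod, (a * det_inv) % mod]]

key = [[3, 3], [2, 5]]           # the secret key (unknown to the attacker)
plain = [[7, 4], [11, 15]]       # plaintext blocks the attacker knows
cipher = matmul_mod(plain, key)  # ciphertext blocks the attacker observes

# the attack itself: K = P^(-1) . C (mod 26)
recovered = matmul_mod(inv2x2_mod(plain), cipher)
print(recovered)  # [[3, 3], [2, 5]] -- the secret key leaks
```

One captured pair of blocks with an invertible plaintext matrix is enough; no brute force over the 26^{9} (or 26^{36}) key space is needed.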

The source code of this post is pushed to GitHub.

The post A Step by Step Hill Cipher Example appeared first on Sefik Ilkin Serengil.

The post Using Custom Activation Functions in Keras appeared first on Sefik Ilkin Serengil.

Herein, advanced frameworks cannot always keep pace with innovations. For example, you cannot use Swish based activation functions in Keras today. Swish might appear in an upcoming release, but you may need such an activation function before the related patch is pushed. So, this post will guide you on consuming a custom activation function outside of stock Keras and TensorFlow, such as Swish or E-Swish.

All you need is to create your custom activation function. In this case, I’ll use swish, which is x times sigmoid (scaled here by a beta multiplier, as in E-Swish). Besides, I include this in a convolutional neural network model.

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def swish(x):
    beta = 1.5  # 1, 1.5 or 2
    return beta * x * keras.backend.sigmoid(x)

model = Sequential()

# 1st convolution layer. 32 is the number of filters and (3, 3) is the size of the filter.
model.add(Conv2D(32, (3, 3), activation=swish, input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))

# 2nd convolution layer. Apply 64 filters sized of (3x3).
model.add(Conv2D(64, (3, 3), activation=swish))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())

# Fully connected layer. 1 hidden layer consisting of 512 nodes.
model.add(Dense(512, activation=swish))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
    optimizer=keras.optimizers.Adam(),
    metrics=['accuracy'])

model.fit(x_train, y_train,
    epochs=epochs,
    validation_data=(x_test, y_test))

Remember that we use this activation function in the feed forward step, whereas we need its derivative in backpropagation. We just define the activation function, but we do not offer its derivative. That’s the power of TensorFlow: the framework knows how to apply differentiation for backpropagation. This comes from importing the **keras backend module**. If you design the swish function without keras.backend, fitting will fail.
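What the framework derives under the hood can be illustrated with a plain-Python sketch: the derivative of beta.x.sigmoid(x) that backpropagation needs is beta.(s + x.s.(1 – s)) with s = sigmoid(x), and a finite-difference check confirms it. (This is only an illustration; Keras derives the gradient symbolically from the backend ops, with no numeric differencing involved.)

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x, beta=1.5):
    return beta * x * sigmoid(x)

def swish_prime(x, beta=1.5):
    # analytic derivative of beta.x.sigmoid(x)
    s = sigmoid(x)
    return beta * (s + x * s * (1 - s))

def numeric_derivative(f, x, h=1e-6):
    # central difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
print(abs(swish_prime(x) - numeric_derivative(swish, x)) < 1e-6)  # True
```

Because the activation is built from backend ops, TensorFlow constructs the analytic derivative shown here automatically; that is exactly what is lost if you implement swish with plain math functions.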

So, we’ve mentioned how to include a new activation function in the learning process for the Keras / TensorFlow pair. Picking the most convenient activation function is state-of-the-art work for scientists, just like network structure (number of hidden layers, number of nodes in the hidden layers) and learning parameters (learning rate, number of epochs). Now, you can design your own activation function or consume any newly introduced activation function, just similar to the following picture.

*My friend and colleague Giray inspired me to produce this post. I am grateful to him as usual.*

The post Using Custom Activation Functions in Keras appeared first on Sefik Ilkin Serengil.
