Homomorphic Facial Recognition with TenSEAL

Facial recognition raises serious privacy concerns. Careless use of facial data can lead to stolen credentials and security breaches. Cryptography helps to prevent most data leaks. In particular, homomorphic encryption lets us perform calculations on encrypted data. It is becoming more important nowadays because of high cloud adoption rates. In this post, we are going to apply homomorphic encryption to a facial recognition pipeline with TenSEAL.

Cute seal on crystal clear water by Elianne Dipp

Vlog

You can either continue to read this tutorial or watch the following video. They both cover the same topic.



Do we really need homomorphic encryption?

Consider a scenario where we need to find the sum of a customer's account activities. Suppose that the account activity records are stored in the cloud in encrypted form.

What if we covered this use case with regular encryption algorithms? If the account has 1K records, then we need to retrieve those 1K encrypted records from the cloud to the client, decrypt them and sum the transactions. In other words, all costly operations have to be done on the client side.

On the other hand, we can find the sum of those transactions directly on the cloud side if they are homomorphically encrypted. We just need to decrypt the encrypted sum on the client side. In this way, most of the processing power is consumed on the server side. This makes the technology very IoT and edge friendly!
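To make the scenario concrete, here is a minimal sketch of an encrypted sum. It uses a toy Paillier scheme (an additively homomorphic algorithm) written from scratch with tiny hard-coded primes. The primes, the function names and the transaction values are all assumptions for illustration, and the key size is far too small to be secure.

```python
import math
import secrets

# Toy Paillier cryptosystem, for illustration only.
# These primes are far too small for any real use.
P, Q = 104729, 1299709
N = P * Q                      # public modulus
N_SQ = N * N
G = N + 1                      # standard choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1), private


def _l(x):
    # the L function of Paillier: L(x) = (x - 1) / n
    return (x - 1) // N


MU = pow(_l(pow(G, LAM, N_SQ)), -1, N)  # private


def encrypt(m):
    # client side: needs the public key (N, G) only
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    # client side: needs the private values LAM and MU
    return (_l(pow(c, LAM, N_SQ)) * MU) % N


def cloud_sum(ciphertexts):
    # cloud side: multiplying Paillier ciphertexts adds the plaintexts,
    # so no key and no decryption is needed here
    total = 1
    for c in ciphertexts:
        total = (total * c) % N_SQ
    return total


# hypothetical account activities of a customer
transactions = [120, 75, 300, 5]
encrypted_records = [encrypt(t) for t in transactions]

encrypted_total = cloud_sum(encrypted_records)  # done in the cloud
print(decrypt(encrypted_total))                 # done on the client
```

The client decrypts a single value instead of every record, which is the whole point of the scenario.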

Partially Homomorphic Encryption

Even though fully homomorphic encryption (FHE) has become available in recent years, as this tutorial demonstrates, partially homomorphic encryption (PHE) is often the more efficient and practical choice once you consider the trade-offs. If your task doesn't demand full homomorphic capabilities, opting for partial homomorphism is the logical decision. PHE is notably faster and demands fewer computational resources than FHE. Besides, it generates smaller ciphertexts, which makes it well suited to memory-constrained environments. Overall, PHE strikes a favorable balance between security and efficiency for practical use cases.

Herein, LightPHE is a lightweight partially homomorphic encryption library for Python. It wraps many partially homomorphic algorithms such as RSA, ElGamal, Exponential ElGamal, Elliptic Curve ElGamal, Paillier, Damgard-Jurik, Okamoto-Uchiyama, Benaloh, Naccache-Stern and Goldwasser-Micali. With LightPHE, you can build homomorphic cryptosystems with a couple of lines of code, encrypt and decrypt your data, and perform homomorphic operations such as homomorphic addition, homomorphic multiplication, homomorphic xor, regenerating ciphertexts and multiplying a ciphertext by a known plain constant, depending on the algorithm you built.

# pip install lightphe pytest
import pytest
from lightphe import LightPHE
 
# supported algorithms
algorithms = [
  'RSA',
  'ElGamal',
  'Exponential-ElGamal',
  'Paillier',
  'Damgard-Jurik',
  'Okamoto-Uchiyama',
  'Benaloh',
  'Naccache-Stern',
  'Goldwasser-Micali',
  'EllipticCurve-ElGamal'
]
 
# build a Paillier cryptosystem which is homomorphic
# with respect to the addition
cs = LightPHE(algorithm_name = algorithms[3])
 
# define plaintexts
m1 = 17
m2 = 23
 
# calculate ciphertexts
c1 = cs.encrypt(m1)
c2 = cs.encrypt(m2)
 
# performing homomorphic addition on ciphertexts
assert cs.decrypt(c1 + c2) == m1 + m2
 
# scalar multiplication (increase its value 5%)
k = 1.05
assert cs.decrypt(k * c1) == k * m1
 
# Paillier is not homomorphic with respect to multiplication
with pytest.raises(ValueError):
  c1 * c2
 
# Paillier is not homomorphic with respect to xor
with pytest.raises(ValueError):
  c1 ^ c2

You may consider watching the following demo of LightPHE.





Let's turn back to fully homomorphic encryption.

TenSEAL

Microsoft SEAL is a homomorphic encryption library written mainly in C++, whereas TenSEAL is a Python library built on top of it. It supports both addition and multiplication on the same ciphertexts, which is what sets fully homomorphic schemes apart from partially homomorphic ones. In other words, we can add, subtract and multiply encrypted values. Division by an encrypted value is not supported directly, but dividing by a known plain constant can be done by multiplying with its reciprocal. Besides, it accepts tensors as inputs. Notice that the outputs of face recognition models are just tensors (aka multidimensional vectors). So, we can pass the output of facial recognition models to TenSEAL as is.

The library is available on PyPI. The easiest way to install it is to run the pip install tenseal command. Then, you will be able to import the library and use its functionality.

#!pip install tenseal
import tenseal as ts

Initialize TenSEAL

Homomorphic encryption is a type of public key cryptography. So, it has a secret and public key pair. Firstly, we are going to initialize a context, which will create a random key pair. Here, the output vectors of facial recognition models consist of real numbers. So, we have to use CKKS as the scheme type. Alternatively, the BFV scheme type serves integer tensors.

context = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree = 8192, coeff_mod_bit_sizes = [60, 40, 40, 60])
context.generate_galois_keys()
context.global_scale = 2**40
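To give some intuition for the global scale, here is a deliberately simplified sketch in plain Python. It only models the fixed-point idea behind CKKS: real numbers are multiplied by the scale and rounded to integers, and a product has to be rescaled once. Real CKKS also packs vectors into polynomials and adds noise, none of which is modeled here, and the function names are assumptions of mine rather than TenSEAL APIs.

```python
SCALE = 2 ** 40  # same value as the global scale chosen above


def encode(x):
    # real number -> scaled integer (rounding causes a tiny error)
    return round(x * SCALE)


def decode(v):
    # scaled integer -> real number
    return v / SCALE


a, b = 0.1, 0.7
encoded_a, encoded_b = encode(a), encode(b)

# addition keeps the scale unchanged
sum_decoded = decode(encoded_a + encoded_b)

# multiplication squares the scale, so we rescale once afterwards
product = encoded_a * encoded_b  # scale is SCALE ** 2 here
rescaled = product // SCALE      # back to SCALE
product_decoded = decode(rescaled)

print(sum_decoded, product_decoded)
print(abs(product_decoded - a * b))  # small but non-zero error
```

A larger scale keeps more precision, which is why results computed with CKKS are close to, but not exactly equal to, the plain results.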

Storing Keys

The context object stores both the private and public keys. Now, we are going to export it for re-use. We tend to use pickle to store complex objects in Python. Unfortunately, TenSEAL objects do not support pickle. The documentation proposes serializing the context to export it. The serialization function converts the context and tensors to bytes. In this way, they can be restored later.

We are going to use the following read and write functions in the following stages. Notice that I prefer to store SEAL objects base64 encoded instead of as raw bytes.

import base64

def write_data(file_name, data):
    # serialized TenSEAL objects arrive as bytes; store them base64 encoded
    if isinstance(data, bytes):
        data = base64.b64encode(data)

    with open(file_name, "wb") as f:
        f.write(data)

def read_data(file_name):
    with open(file_name, "rb") as f:
        data = f.read()
    # decode base64 back to the original bytes
    return base64.b64decode(data)
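As a quick sanity check, the helpers can be exercised without TenSEAL at all. The sketch below repeats the two functions so that it runs on its own, and uses a made-up byte string as a stand-in for a serialized context. The payload and the temporary file name are assumptions for illustration.

```python
import base64
import os
import tempfile


def write_data(file_name, data):
    # bytes are stored base64 encoded, as in the helper above
    if isinstance(data, bytes):
        data = base64.b64encode(data)
    with open(file_name, "wb") as f:
        f.write(data)


def read_data(file_name):
    with open(file_name, "rb") as f:
        data = f.read()
    return base64.b64decode(data)


# hypothetical payload standing in for the output of context.serialize()
payload = bytes(range(256))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "payload.txt")
    write_data(path, payload)

    # what actually lands on disk is printable base64 text
    with open(path, "rb") as f:
        stored = f.read()

    restored = read_data(path)

print(restored == payload)
print(stored[:16])
```

Storing base64 text instead of raw bytes makes the files easy to copy into a database column or a JSON payload, at the cost of roughly one third more space.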

Once the context is initialized, we are able to store both the secret and public keys with those functions. Notice that the context holds both keys. The make_context_public function drops the secret key so that only the public key remains. Calling this function before sharing the context is very important because skipping it would leak the private key.

secret_context = context.serialize(save_secret_key = True)
write_data("secret.txt", secret_context)

context.make_context_public() #drop the secret_key from the context
public_context = context.serialize()
write_data("public.txt", public_context)

Notice that the cloud system will have public.txt whereas our client system will have secret.txt.

Finding facial embeddings

Basically, facial recognition models find vector representations of facial images. Storing plain vector embeddings in cloud systems causes privacy problems, because an attacker who obtains the facial embedding of an identity can use it to mount adversarial attacks. That is why we have to store facial embeddings in cloud systems in encrypted form.





Here, we are going to find vector representations of facial images with the DeepFace library. I use the unit test images of the deepface library in this study. Besides, I will build the FaceNet model. Notice that this operation will be done on the client side.

#!pip install deepface
from deepface import DeepFace

img1_path = "deepface/tests/dataset/img1.jpg"
img2_path = "deepface/tests/dataset/img2.jpg"

img1_embedding = DeepFace.represent(img1_path, model_name = 'Facenet')[0]["embedding"]
img2_embedding = DeepFace.represent(img2_path, model_name = 'Facenet')[0]["embedding"]

In this experiment, I am using the following image pair.

Image pair

Encryption

We are going to encrypt the facial embeddings with homomorphic encryption and store the homomorphically encrypted tensors in the cloud system. This operation will be done on the client side.

context = ts.context_from(read_data("secret.txt"))

enc_v1 = ts.ckks_vector(context, img1_embedding)
enc_v2 = ts.ckks_vector(context, img2_embedding)

enc_v1_proto = enc_v1.serialize()
enc_v2_proto = enc_v2.serialize()

write_data("enc_v1.txt", enc_v1_proto)
write_data("enc_v2.txt", enc_v2_proto)

Here, enc_v1 and enc_v2 are the homomorphically encrypted tensors of the facial embeddings. We can store the contents of enc_v1.txt and enc_v2.txt in the cloud.

Calculations

This operation will be done on the server side. Once we have the homomorphically encrypted tensors, we are able to make calculations on the encrypted data. We basically need to find the Euclidean distance between two vectors to verify an identity.

Let's remember the formula of Euclidean distance. We need to find the difference in each dimension first and square it second. The sum of these squares gives the squared Euclidean distance. This is demonstrated for 3 dimensional space below, but it applies to vectors of any dimension. Remember that FaceNet creates 128 dimensional vector embeddings.

p1 = (x1, y1, z1), p2 = (x2, y2, z2)

d^2 = (x1 - x2)^2 + (y1 - y2)^2 + (z1 - z2)^2
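The formula can be sketched on plain numbers first. The function name and the sample points below are assumptions for illustration. Notice that the squared distance is just the dot product of the difference vector with itself, so it only needs subtraction, multiplication and addition.

```python
def squared_euclidean(p1, p2):
    # difference of each dimension first
    diff = [a - b for a, b in zip(p1, p2)]
    # then the dot product of the difference with itself
    return sum(d * d for d in diff)


# hypothetical 3 dimensional points
p1 = (1.0, 2.0, 3.0)
p2 = (4.0, 6.0, 3.0)

print(squared_euclidean(p1, p2))  # 9 + 16 + 0 = 25.0
```

We stop at the squared distance on purpose, because taking a square root is not one of the operations we can rely on over encrypted data.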

On the cloud side, we have homomorphically encrypted tensors for both p1 and p2. Let's code this logic with TenSEAL.

#cloud system will have the public key
context = ts.context_from(read_data("public.txt"))

#restore the embedding of person 1
enc_v1_proto = read_data("enc_v1.txt")
enc_v1 = ts.lazy_ckks_vector_from(enc_v1_proto)
enc_v1.link_context(context)

#restore the embedding of person 2
enc_v2_proto = read_data("enc_v2.txt")
enc_v2 = ts.lazy_ckks_vector_from(enc_v2_proto)
enc_v2.link_context(context)

#euclidean distance
euclidean_squared = enc_v1 - enc_v2
euclidean_squared = euclidean_squared.dot(euclidean_squared)

#store the homomorphic encrypted squared euclidean distance
write_data("euclidean_squared.txt", euclidean_squared.serialize())

Remember that we dropped the private key when we stored the public key. So, the euclidean_squared variable is homomorphically encrypted and the cloud system cannot decrypt it. Let's try to decrypt it anyway.





try:
    euclidean_squared.decrypt()
except Exception as err:
    print("Exception: ", str(err))

As expected, the decrypt function throws an exception saying that the current context of the tensor doesn't hold a secret_key and asking to provide one as an argument. So, you have to have the secret key to decrypt the data.

Decryption

We will transfer the homomorphically encrypted squared Euclidean distance from the cloud to the client.

#client has the secret key
context = ts.context_from(read_data("secret.txt"))

#load euclidean squared value
euclidean_squared_proto = read_data("euclidean_squared.txt")
euclidean_squared = ts.lazy_ckks_vector_from(euclidean_squared_proto)
euclidean_squared.link_context(context)

#decrypt it
euclidean_squared_plain = euclidean_squared.decrypt()[0]

The euclidean_squared_plain variable stores 66.36 in my case. In deepface, the FaceNet model with Euclidean distance is tuned for a threshold of 10. In other words, the threshold for the squared Euclidean distance should be 100. We need to check whether the plain value is less than the threshold value of 100.

if euclidean_squared_plain < 100:
    print("they are same person")
else:
    print("they are different persons")

This check will return the decision that they are the same person. They really are!

Validation

We computed the Euclidean distance formula on encrypted data. What if we did everything on the client side? The squared Euclidean distance should be the same.

from deepface.commons import distance as dst

distance = dst.findEuclideanDistance(img1_embedding, img2_embedding)
squared_distance = distance * distance

I got 66.36774485705642 when I applied homomorphic encryption whereas I got 66.3677359167053 when I ran everything on the client side. The small difference comes from the approximate arithmetic of the CKKS scheme and it is totally acceptable.
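Because the two results are not bit-for-bit identical, an exact equality check would fail. A minimal sketch of a tolerance-based comparison on the two values reported above could look like this. The tolerance is my own assumption, not a value recommended by TenSEAL.

```python
import math

encrypted_result = 66.36774485705642  # computed on encrypted data
plain_result = 66.3677359167053       # computed on plain data

abs_error = abs(encrypted_result - plain_result)
rel_error = abs_error / plain_result
print(f"absolute error: {abs_error:.2e}, relative error: {rel_error:.2e}")

# compare with a tolerance instead of ==
print(math.isclose(encrypted_result, plain_result, rel_tol=1e-6))

# both values lead to the same decision against the threshold of 100
print((encrypted_result < 100) == (plain_result < 100))
```

The error is many orders of magnitude smaller than the gap between the distance and the threshold, so it does not affect the verification decision here.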

Conclusion

So, homomorphic encryption is a public key cryptography method that enables calculations on encrypted data. We applied the fully homomorphic TenSEAL library to a facial recognition use case in this tutorial. In that way, facial embeddings can be stored in the cloud system without exposing them in plain form.

Unfortunately, fully homomorphic algorithms do not natively support comparison operations, such as checking whether one value is greater than another, on encrypted data. That's why we have to decrypt the result on the client side and make the decision there. So, finding an identity in a set has O(n) time complexity, which is not pretty.
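To illustrate the O(n) cost, here is a minimal sketch of the client-side search. It assumes that the cloud returned one encrypted squared distance per stored identity and that the client already decrypted all of them. The function name, the identity names and the distance values are hypothetical.

```python
def find_identity(decrypted_squared_distances, squared_threshold=100.0):
    """Return the closest identity under the threshold, or None."""
    best_name, best_distance = None, squared_threshold
    # linear scan: every candidate has to be decrypted and checked
    for name, distance in decrypted_squared_distances.items():
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name


# hypothetical decrypted squared distances to each stored identity
candidates = {"alice": 151.2, "bob": 66.36, "carol": 98.7}

print(find_identity(candidates))  # bob
```

The number of decryptions on the client grows with the size of the database, because the comparison itself cannot be pushed to the cloud.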

I pushed the source code of this tutorial to GitHub. You can star⭐️ the repo to support this study.





