Faces: AI Blitz XIII with Team GLaDOS

Sneha Nanavati
AIcrowd
10 min read · May 26, 2022

This blog covers the top winning solutions for all the puzzles in Blitz XIII: Faces. It covers the leaderboard winning team GLaDOS and their paper detailing their solutions. Keep reading to learn more.

Introducing Team GLaDOS

AIcrowd Blitz XIII was themed around Faces. The Blitz challenge consisted of five puzzles: Sentiment Classification, Age Prediction, Mask Prediction, Face Recognition and Face De-Blurring. Team GLaDOS, consisting of the users kita, eren23, kadruschki and Hik and led by izmehd, secured second position on the leaderboard, winning a cash prize of $100.

The members of team GLaDOS published a paper detailing their approach and solutions, sharing it with the community. The paper “Faces: AI Blitz XIII Solutions” is authored by Andrew Melnik, Eren Akbulut, Jannik Sheikh, Kira Loos, Michael Büttner and Tobias Lenze. The authors are all students at the Faculty of Technology of Bielefeld University in Germany. The team has open-sourced their code implementation for all five puzzles; you can explore them over here. We’ll now go through each puzzle and understand more about the dataset, the methods and the final results. You can read the paper over here. The team also published a survey on Deep Learning for Face Generation and Editing, which may be useful to readers who want to go deeper into the topic.

What is Blitz 13: Faces all about?

Face recognition technology was seen as something out of science fiction until recently. But the technological advances of the past few years have made it viable and widespread. The fast-growing field of facial recognition has real-world applications ranging from banking and law enforcement to biometrics and retail. One of the most common uses is the Face Unlock feature that you might be using on your phone. Like any technology, face recognition is not free of controversy. The technology can aid forensics, make our devices safer, or help solve cases of missing persons. Yet it can also be used for purposes such as mass surveillance. But face recognition is here to stay, and the first step towards ensuring its appropriate use is to understand the technology.

We built Blitz 13 to celebrate the good AI can do for the world through computer vision and facial recognition. Our aim is to use AI to increase accessibility for all. This Blitz consists of essential computer vision AI puzzles around face recognition. By solving these five puzzles, you will become well-versed in basic face recognition problems in AI. Using the easy-to-understand starter kits, you can make a submission for these puzzles in less than 15 minutes.

Let’s dive deeper into these puzzles and see how team GLaDOS solved them.

Sentiment Classification Puzzle

In this puzzle, participants are given an embedding generated from an input image of a face, and their model needs to classify the expression into one of three categories: negative, neutral or positive.

The dataset doesn’t contain real images but 512-dimensional embedding vectors generated from images of faces by models such as ResNet18 or VGG16. The embedding vectors are extracted from a specific layer of such a model. The dataset is divided into a training set consisting of 5000 samples, a validation set consisting of 2000 samples, and a test set consisting of 3001 samples. The metrics used for evaluation are the F1 score (average=weighted) as a primary score and the accuracy score as a secondary score.
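To make the two metrics concrete, here is a minimal pure-Python sketch of a weighted-average F1 score and an accuracy score. It is an illustration of the definitions, not the official scoring code; the function names and the toy labels are made up for this example.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * count / total
    return score


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented for illustration only.
y_true = ["negative", "neutral", "positive", "positive"]
y_pred = ["negative", "positive", "positive", "positive"]
print(weighted_f1(y_true, y_pred))  # about 0.65
print(accuracy(y_true, y_pred))     # 0.75
```

Because the F1 score is weighted by class frequency, a class that appears more often in the labels contributes more to the final score.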

How did they solve this puzzle?

The team tested the following approaches: Random Forest Classifier (RFC), K-Nearest Neighbours, Gaussian Naive Bayes, LightGBM, Neural Networks, and Support Vector Machines (SVM).

The Support Vector Classifier (SVC) gave the best results out of all the models. The multi-class classification is performed using a one-vs-rest approach. Normalization with the l1 norm was the preprocessing method that improved the predictions, albeit only slightly.
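As a rough illustration of what l1 normalization does to a single embedding, the sketch below rescales a vector so that the absolute values of its entries sum to 1. It assumes the normalization is applied per sample; the function name and the three-element vector are placeholders for a real 512-dimensional embedding.

```python
def l1_normalize(vector):
    """Scale a vector so that the sum of the absolute values of its entries is 1."""
    norm = sum(abs(x) for x in vector)
    if norm == 0:
        return list(vector)  # leave an all-zero vector unchanged
    return [x / norm for x in vector]


# Tiny stand-in for a 512-dimensional embedding.
print(l1_normalize([1.0, -3.0, 4.0]))  # [0.125, -0.375, 0.5]
```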

Results

For the predictions, they used the SVC with l1 normalization. They combined the training and validation sets, shuffled and normalized them, and ran tenfold cross-validation, obtaining ten estimators. On the test set, the final F1 score and accuracy score were both 0.806.
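The tenfold procedure can be sketched roughly as follows. The splitting logic mirrors the description above (combine, shuffle, split into ten folds). How the ten estimators were combined into one test prediction is not stated in this post, so the majority vote here is purely an assumption, and both function names are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and yield (train_idx, held_out_idx) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


def majority_vote(per_estimator_predictions):
    """Assumed combination rule: the most common label per sample across estimators."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*per_estimator_predictions)]


# 5000 training + 2000 validation samples combined into one pool of 7000.
for train_idx, held_out_idx in k_fold_indices(7000, k=10):
    pass  # fit one estimator on train_idx, check it on held_out_idx

print(majority_vote([["positive", "neutral"],
                     ["positive", "negative"],
                     ["neutral", "negative"]]))  # ['positive', 'negative']
```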

Read about the various models and their results in detail over here.

Age Prediction Puzzle

The participants are tasked with building an age prediction model that accurately predicts the age of a human face from an input image. There are ten buckets, each with an age range of 10 years (0–10, 10–20, 20–30, … 90–100). The task is to assign each input image to one of the buckets.
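For readers who want to see the bucketing spelled out, here is a small sketch that maps an age in years to one of the ten buckets. The label strings and the handling of boundary ages (an age of exactly 20 goes into 20-30) are assumptions, since the post only lists the ranges.

```python
def age_to_bucket(age, width=10, max_age=100):
    """Map an age in years to a bucket label such as '20-30'.

    Assumption: buckets are half-open [start, start + width), except that
    max_age itself falls into the last bucket.
    """
    if not 0 <= age <= max_age:
        raise ValueError(f"age must be between 0 and {max_age}")
    start = min(int(age // width) * width, max_age - width)
    return f"{start}-{start + width}"


print(age_to_bucket(25))   # 20-30
print(age_to_bucket(100))  # 90-100
```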

The data is divided into training, validation, and testing sets. The train set contains 4000, the validation set 2000, and the test set 3000 images. The labels are given in the training and validation parts of the dataset. The distribution of the dataset is shared below.

Images of human faces in the dataset were generated by a StyleGAN3 model trained using the Flickr-Faces-HQ dataset. All images are colour images (RGB) of size 512x512x3. Since the images are generated artificially, some images have unnatural artefacts of human faces. For evaluation of this puzzle, the weighted average F1 score is used as the primary score and the accuracy score as the secondary score. The best score for the F1 score is 1, and the worst score is 0.

How did they solve this puzzle?

Using the You Only Look Once (YOLO) architecture, the team achieved good results. Although YOLO is an object detection algorithm, the team used only its class prediction capabilities and dropped the bounding box prediction functionality. The model was trained to predict the age class.

They merged the training and validation sets to get more images for training. Additionally, the team created a cropped fixed-size version of every image, doubling the training set. They trained YOLOv5 with network size L for 200 epochs and used a batch size of 16. Read more about their novel approach in detail over here.

Results

The team’s result for the Age Prediction problem with the mentioned method and hyperparameters was an F1 score of 0.870 and an accuracy score of 0.871.

Mask Prediction Puzzle

In this puzzle, you are given an input image of a person wearing a mask, and your task is to predict their mask type. Additionally, you need to identify the mask in the image by creating a bounding box. The provided dataset is divided into train, validation and test sets, containing 5000, 2000 and 3000 images of 512x512 pixels, respectively. Each image contains one of 4 mask types: Surgical, N95, KN95 and Cloth. CSV files are also provided for the training and validation images, which contain information about the mask type for each image and the corresponding bounding box of the mask in pixel coordinates.
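YOLOv5 expects each label as a class id followed by the box centre, width and height, all normalized to the image size. The sketch below shows one way to convert a pixel-coordinate box into that format. The CSV column layout (xmin, ymin, xmax, ymax) and the class order are assumptions made for illustration; check the actual files before relying on them.

```python
# Assumed class order; the real mapping depends on how the dataset YAML is written.
MASK_CLASSES = ["surgical", "N95", "KN95", "cloth"]


def to_yolo_label(mask_type, xmin, ymin, xmax, ymax, img_size=512):
    """Convert a pixel-coordinate box into a YOLO label line:
    'class_id x_center y_center width height', all normalized to [0, 1]."""
    class_id = MASK_CLASSES.index(mask_type)
    x_center = (xmin + xmax) / 2 / img_size
    y_center = (ymin + ymax) / 2 / img_size
    width = (xmax - xmin) / img_size
    height = (ymax - ymin) / img_size
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(to_yolo_label("N95", 128, 256, 384, 448))
# 1 0.500000 0.687500 0.500000 0.375000
```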

How did they solve this puzzle?

Like the previous puzzle, You Only Look Once was chosen as the prediction method. YOLOv5 is the fifth iteration of YOLO, a state-of-the-art, real-time object detection system capable of creating bounding boxes around detected objects in images.

Training in YOLOv5 can be started by a command with parameters that specify the image size in pixels, the batch size, the number of epochs, the size of the YOLOv5 network on which to train and the path to the mask_prediction.yml. The training time per epoch for the training set of 3000 images and a batch size of 16 is about 15 minutes on a Tesla K80 via Google Colab. The predicted bounding box coordinates have to be converted into the format of the CSV files to be submitted for scoring. The scoring is based on the Average Precision of mask type and bounding box. Read more about YOLOv5 for mask prediction in detail over here.
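The post does not reproduce the team's exact command, so the following is only a sketch of what such a call could look like. It uses the standard flags of the train.py script in the YOLOv5 repository; the image size of 512 is an assumption based on the dataset's image size, and the helper function name is hypothetical.

```python
import shlex


def yolov5_train_command(img_size, batch_size, epochs, weights, data_yaml):
    """Build a YOLOv5 train.py command line (to be run from the YOLOv5 repo root)."""
    args = [
        "python", "train.py",
        "--img", str(img_size),      # image size in pixels
        "--batch", str(batch_size),  # batch size
        "--epochs", str(epochs),     # number of epochs
        "--weights", weights,        # pretrained checkpoint, which sets the network size
        "--data", data_yaml,         # dataset description file
    ]
    return shlex.join(args)


print(yolov5_train_command(512, 16, 100, "yolov5l.pt", "mask_prediction.yml"))
# python train.py --img 512 --batch 16 --epochs 100 --weights yolov5l.pt --data mask_prediction.yml
```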

Results

The prediction with the highest confidence score for each image was selected as the predicted object. With 99.3% accuracy, the best result was achieved after training for 100 epochs, with a batch size of 16 images and the YOLOv5l model.
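Selecting the most confident detection and turning it back into pixel coordinates could look roughly like this. The tuple layout of a detection and the (xmin, ymin, xmax, ymax) output order are assumptions for the sake of the example, not the team's actual submission code.

```python
def best_detection_in_pixels(detections, img_size=512):
    """Pick the highest-confidence detection and return its box in pixels.

    Each detection is assumed to be a tuple
    (class_id, x_center, y_center, width, height, confidence)
    with coordinates normalized to [0, 1].
    """
    if not detections:
        return None
    class_id, xc, yc, w, h, _conf = max(detections, key=lambda d: d[5])
    xmin = round((xc - w / 2) * img_size)
    ymin = round((yc - h / 2) * img_size)
    xmax = round((xc + w / 2) * img_size)
    ymax = round((yc + h / 2) * img_size)
    return class_id, xmin, ymin, xmax, ymax


detections = [
    (0, 0.5, 0.5, 0.25, 0.25, 0.40),
    (2, 0.5, 0.6875, 0.5, 0.375, 0.93),
]
print(best_detection_in_pixels(detections))  # (2, 128, 256, 384, 448)
```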

Face Recognition Puzzle

Build your very own face recognition system to find a target face from a collection of 100 other faces. Your input will be a target face image, and your output will be the location of the input image from a grid of 100 other faces. Your task is to find a missing face in a crowd (target face).

Each target image shows 100 faces. The unlabeled dataset consists of 1000 missing and target image pairs. The faces were all generated using a StyleGAN3 model. The person in the missing image differs in some attributes in the corresponding target image.

How did they solve this puzzle?

Since the provided dataset is unlabeled, the team used pre-trained models and methods for face recognition.

In the first approach, the team used face-alignment to compute 68 3D landmarks. They computed the distances between each possible pair of landmarks for each given face. They then compared these distances between the missing face and every face in the target image and selected the face with the smallest error as the prediction. The distances are normalized by the jaw width, which was defined as the distance between landmarks 1 and 17. This gave an accuracy of 22%, which could not be improved by using only a subset of distances for the comparison.
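The landmark comparison can be sketched in plain Python as follows. Landmark extraction itself is left out; the functions expect lists of 68 (x, y, z) points. The error measure (mean absolute difference between distance lists) is an assumption, as the post only speaks of "the smallest error", and the function names are hypothetical.

```python
import math
from itertools import combinations


def normalized_pairwise_distances(landmarks):
    """All pairwise landmark distances, divided by the jaw width.

    The jaw width is the distance between landmarks 1 and 17 (1-indexed),
    which are indices 0 and 16 in a Python list.
    """
    jaw_width = math.dist(landmarks[0], landmarks[16])
    return [math.dist(a, b) / jaw_width for a, b in combinations(landmarks, 2)]


def closest_face(missing_landmarks, candidate_landmarks):
    """Index of the candidate whose distance profile is closest to the missing face."""
    target = normalized_pairwise_distances(missing_landmarks)

    def error(candidate):
        dists = normalized_pairwise_distances(candidate)
        # Assumed error measure: mean absolute difference.
        return sum(abs(t - d) for t, d in zip(target, dists)) / len(target)

    errors = [error(c) for c in candidate_landmarks]
    return min(range(len(errors)), key=errors.__getitem__)
```

With 68 landmarks there are 2278 pairwise distances per face. Dividing by the jaw width makes the comparison independent of how large the face appears in the image.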

In the second approach, they used Google’s MediaPipe to compute 468 3D landmarks for a given face. Initially, only the landmark groups provided by Google’s MediaPipe were used, but adding the team’s own groups increased the accuracy to 50%. The team’s own groups provided landmark paths for the nose and the facial structure.

In the third approach, they decided to use a face-recognition method. The team used two different encoding models to find the encodings: the small model uses five landmarks, and the large model uses 68 landmarks. The mean of the distances computed with these two models was used to predict the target images. This increased the initial accuracy for this approach by 2%, resulting in a final accuracy of 78.8%. Read more about how they achieved the accuracy in detail over here.
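Averaging the two models' distances and picking the closest face might be sketched like this. The distances are assumed to be precomputed, one per face in the grid and per model; the function name and the numbers are invented for illustration.

```python
def predict_position(small_distances, large_distances):
    """Average the per-face distances from both encoding models and
    return the index of the face with the smallest mean distance."""
    mean_distances = [(s + l) / 2 for s, l in zip(small_distances, large_distances)]
    return min(range(len(mean_distances)), key=mean_distances.__getitem__)


# Made-up distances for a grid of three faces (the real grid has 100).
small = [0.62, 0.41, 0.55]
large = [0.58, 0.47, 0.39]
print(predict_position(small, large))  # 1
```

In this toy case the large model alone would have picked face 2, while the averaged distances favour face 1, which shows how combining the two models can change the prediction.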

Results

For the third approach, the team got an accuracy of 78.8%. The selection of a random image would give an accuracy of 1% since there are 100 possible target images. The prediction time for the notebook was 2.5 hours using Google Colab Pro computation resources.

Face De-blurring Puzzle

You are given a blurred face image as input data in this puzzle. Your task is to convert the blurred face into a clear image. The dataset used for this challenge consists of images of human faces generated by a StyleGAN3 model, which was trained on the Flickr-Faces-HQ dataset. The data is split into train, validation and test sets. The train set contains 5000 pairs of blurred and original images, the validation set 2000 pairs of blurred and original images, and the test set 3000 blurred images. All images are colour images (RGB) of size 512x512x3. Face images are highly structured and composed of several components and semantic information.

How did they solve this puzzle?

The team used the SRCNN approach for this puzzle as it was the best they could do given their computing resources. SRCNN is an end-to-end learning approach that maps low-resolution images to high-resolution images. This approach is based on CNN layers and can be adapted to all input sizes of images.

The network contains only 3 CNN layers. The first CNN layer, with kernel size 9, takes a lower-resolution image and extracts a set of feature maps. The second CNN layer, with kernel size 1, maps these features non-linearly to higher-resolution patch representations. The final CNN layer, with kernel size 5, combines the predictions within a spatial neighbourhood to produce the final high-resolution image. Read about the SRCNN method in detail over here.
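To get a feel for how small this network is, the sketch below does some back-of-the-envelope arithmetic on the 9-1-5 layout. The kernel sizes come from the description above; the channel counts (64 and 32 feature maps, as in the original SRCNN paper) and the 3-channel RGB input and output are assumptions, since the team's exact configuration is not given here.

```python
# (in_channels, out_channels, kernel_size) for each convolutional layer.
# Channel counts are assumed, only the kernel sizes are from the text.
LAYERS = [
    (3, 64, 9),   # feature extraction
    (64, 32, 1),  # non-linear mapping
    (32, 3, 5),   # reconstruction
]


def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of one 2D convolution layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


def receptive_field(layers):
    """Side length of the input patch that influences one output pixel (stride 1)."""
    return 1 + sum(kernel - 1 for _, _, kernel in layers)


def unpadded_output_size(input_size, layers):
    """Output side length if no padding is used in any layer."""
    return input_size - sum(kernel - 1 for _, _, kernel in layers)


print(sum(conv_params(*layer) for layer in LAYERS))  # 20099
print(receptive_field(LAYERS))                        # 13
print(unpadded_output_size(512, LAYERS))              # 500
```

Under these assumptions the whole model has roughly twenty thousand parameters, which fits the remark that it was chosen with limited computing resources in mind. Without padding, a 512x512 input would shrink to 500x500, so an implementation that must return 512x512 images would need to pad the convolutions.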

Results

In this problem, the team achieved an SSIM of 0.748 and a PSNR of 26.627.
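For reference, PSNR (peak signal-to-noise ratio) is derived from the mean squared error between the original and the restored image. Here is a minimal sketch that works on flat lists of pixel values. It assumes 8-bit images with a maximum value of 255 and is not the challenge's scoring code.

```python
import math


def psnr(original, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    mse = sum((o - r) ** 2 for o, r in zip(original, restored)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Every pixel is off by 25.5, one tenth of the value range, which gives 20 dB.
print(psnr([100.0] * 4, [125.5] * 4))  # 20.0
```

Higher values mean that the restored image is closer to the original.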

Conclusion

The team used a variety of approaches and experiments to get the best score possible using the tools at their disposal. You can read about it in their paper in detail over here.

Did you learn something new from this blog? Let us know your feedback and suggestions for new topics in the comments section. :)
