Finds celebrities who look similar to the person in a given photo.
You can test the service here: http://95.213.170.235:50002
To build the index:
- Extract embeddings.
- Cluster them with the K-means algorithm into K=450 centroids.
- Build an inverted index: a mapping from each centroid to the list of IDs of the images assigned to it.
- Subtract each centroid from the embeddings of the images assigned to it.
- Build compressed descriptors: compute the signs of the scalar products with 512 random directions, with a small bias added. This step reduces the descriptor from 128 components to eight 64-bit numbers (512 bits).
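The compression step can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `Projection`, `make_projection` and `compress` are hypothetical, and the way the random directions and the bias are drawn (Gaussian directions, small uniform bias added before taking the sign) is an assumption, since the text above does not specify it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

constexpr std::size_t kDim = 128;           // full descriptor length (from the text)
constexpr std::size_t kBits = 512;          // number of random directions (from the text)
constexpr std::size_t kWords = kBits / 64;  // eight 64-bit numbers

using Code = std::array<std::uint64_t, kWords>;

// Hypothetical container for the random directions and their biases.
struct Projection {
    std::vector<float> directions;  // kBits * kDim values, row-major
    std::vector<float> bias;        // kBits values
};

// Assumption: Gaussian directions and a small uniform bias.
Projection make_projection(unsigned seed, float bias_scale = 0.01f) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    std::uniform_real_distribution<float> uni(-bias_scale, bias_scale);
    Projection p;
    p.directions.resize(kBits * kDim);
    p.bias.resize(kBits);
    for (float& d : p.directions) d = gauss(rng);
    for (float& b : p.bias) b = uni(rng);
    return p;
}

// Bit i of the code is set when dot(direction_i, residual) + bias_i > 0.
// `residual` is the embedding minus its centroid and must hold kDim values.
Code compress(const Projection& p, const std::vector<float>& residual) {
    Code code{};  // all bits start at zero
    for (std::size_t i = 0; i < kBits; ++i) {
        float dot = p.bias[i];
        for (std::size_t j = 0; j < kDim; ++j)
            dot += p.directions[i * kDim + j] * residual[j];
        if (dot > 0.0f) code[i / 64] |= std::uint64_t{1} << (i % 64);
    }
    return code;
}
```

The same `Projection` has to be used for the indexed images and for the query, otherwise the resulting bits are not comparable.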
To process a new image:
- Load all the data produced by the index-building step into memory.
- Find the 10 nearest centroids (out of 450).
- For each of these centroids, subtract the centroid from the query embedding.
- Compute the Hamming distance in the compressed descriptor space between the query and every image assigned to the nearest centroids, separately for each of the eight 64-bit numbers.
- Take the top N=10 images from each table and re-rank them by Euclidean distance on the full descriptors.
- Output the top 5 images.
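The candidate selection and re-ranking steps above can be sketched as follows. This is a simplified illustration under stated assumptions: each of the eight 64-bit numbers is treated as its own "table", the candidates are given as one flat list (the loop over the 10 nearest centroids and the per-centroid subtraction are left out), and the names `hamming64`, `shortlist` and `rerank` are hypothetical rather than taken from the project.

```cpp
#include <algorithm>
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <set>
#include <vector>

constexpr std::size_t kWords = 8;  // eight 64-bit numbers per compressed descriptor
using Code = std::array<std::uint64_t, kWords>;

// Hamming distance restricted to a single 64-bit word.
inline int hamming64(std::uint64_t a, std::uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}

// For each of the 8 words, keep the n candidates closest to the query in
// that word; return the union of the kept ids in ascending order.
std::vector<std::size_t> shortlist(const Code& query,
                                   const std::vector<Code>& codes,
                                   std::size_t n) {
    std::set<std::size_t> picked;
    std::vector<std::size_t> order(codes.size());
    const auto keep = static_cast<std::ptrdiff_t>(std::min(n, codes.size()));
    for (std::size_t w = 0; w < kWords; ++w) {
        std::iota(order.begin(), order.end(), std::size_t{0});
        std::partial_sort(order.begin(), order.begin() + keep, order.end(),
            [&](std::size_t a, std::size_t b) {
                const int da = hamming64(query[w], codes[a][w]);
                const int db = hamming64(query[w], codes[b][w]);
                return da != db ? da < db : a < b;  // ties broken by id
            });
        picked.insert(order.begin(), order.begin() + keep);
    }
    return std::vector<std::size_t>(picked.begin(), picked.end());
}

// Squared Euclidean distance; it orders candidates the same way as the
// Euclidean distance itself.
float squared_l2(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Sort the shortlisted ids by distance on the full descriptors, keep `top`.
std::vector<std::size_t> rerank(const std::vector<float>& query,
                                const std::vector<std::vector<float>>& full,
                                std::vector<std::size_t> candidates,
                                std::size_t top) {
    std::sort(candidates.begin(), candidates.end(),
        [&](std::size_t a, std::size_t b) {
            return squared_l2(query, full[a]) < squared_l2(query, full[b]);
        });
    if (candidates.size() > top) candidates.resize(top);
    return candidates;
}
```

With the numbers from the list above, one would call `shortlist(query_code, codes, 10)` and then `rerank(query, full, ids, 5)`. The cheap bit-count comparison narrows the candidates so that the more expensive full-descriptor distance is computed only for a short list.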
The algorithm is implemented in C++.