Programmer/Data Scientist・Mostly write Python & R・Big fan of OpenCV
Published Mar 04, 2018
In this post we’ll be using the pretrained ResNet50 ImageNet weights shipped with Keras as a foundation for building a small image search engine. In the below image we can see some sample output from our final product.
The above diagram shows a high level flow for not only our image search engine, but for search engines in general.
You’ll notice the foundation for searching is built around having quantitative features to compare the query against the potential results. The engine we’re about to build will generate these features using the pretrained ResNet50 ImageNet weights shipped with Keras.
So how do we extract features with the ResNet50 model? It turns out it’s pretty simple thanks to the amazing work of the Keras developers.
By default, the pretrained model will classify the images we throw at it. The translation into the ImageNet classes is done by the fully-connected layer at the ‘top’ of the network. In our case we don’t want this top classification layer, so we set include_top=False and voila, we now have features coming out of the network. We’ll also set pooling='avg', which applies global average pooling and flattens our feature output into 1 dimension (a 2048-element vector per image). The full definition of our feature extractor is shown below.
from keras.applications.resnet50 import ResNet50
model = ResNet50(weights='imagenet', include_top=False, pooling='avg')
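As a rough illustration of how this extractor might be called on a single image, the sketch below wraps it in two helpers. The names load_extractor and extract_features are hypothetical and are not taken from the original script. Keras is imported lazily inside load_extractor, and the input is assumed to be an RGB pixel array that has already been resized (e.g. to 224x224, ResNet50's default input size).

```python
import numpy as np


def load_extractor():
    """Build the headless ResNet50 and return it with its preprocessing function."""
    # Imported here so that extract_features can be exercised without loading Keras.
    from keras.applications.resnet50 import ResNet50, preprocess_input

    model = ResNet50(weights="imagenet", include_top=False, pooling="avg")
    return model, preprocess_input


def extract_features(model, preprocess, pixels):
    """Return a 1-D feature vector for a single H x W x 3 RGB pixel array."""
    # np.array copies the input, so preprocessing cannot modify the caller's data.
    batch = np.expand_dims(np.array(pixels, dtype="float32"), axis=0)
    features = model.predict(preprocess(batch))  # shape (1, n_features)
    return np.asarray(features).flatten()
```

With Keras installed, calling load_extractor() once and then extract_features(model, preprocess, pixels) per image should give one 2048-element vector per image for this model configuration.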
With that model definition we’re well on our way to building a feature DB for us to search against. The rest of the script that builds the database iterates over our images, extracts features from each one, and writes the output to file. Since we’re working with a relatively small set of images (300) we’re going to write to a .csv. If we were working with larger data there would come a point where looking into other storage methods could be beneficial (e.g., hdf5 or a traditional database technology).
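The loop described above could look roughly like the following. This is a minimal sketch rather than the original script: the function name write_feature_db, the extract callable, and the row layout (image path first, then the feature values) are all assumptions made for illustration.

```python
import csv


def write_feature_db(image_paths, extract, output_path):
    """Write one CSV row per image: the image path followed by its feature values."""
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        for path in image_paths:
            features = extract(path)  # any callable returning a sequence of numbers
            writer.writerow([path] + [float(v) for v in features])
```

Passing the extractor in as a callable keeps the file-writing logic independent of Keras, so the storage format can be swapped out later without touching the model code.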
The full script that was used to generate features for our search engine
can be found
We can call the script from the command line by providing a path to our directory of images and a path to our output file:
python create_imagenet_features.py --dataset icons --output imagenet_features.csv
To perform a search we need to build a framework to accept a query image, extract features, compare these features to the images in our feature DB, and return the images with the most similar features. The first two steps are repeats from building up our feature DB. The next two steps will have us stepping into new territory.
To be able to compare our features we’ll need to decide on a distance/similarity metric. Potential metrics could be Euclidean distance, cosine distance, and χ² (chi-squared) distance. In our search we’re going to be using χ² (there was no quantitative testing to arrive at this decision, but it qualitatively produces good results with our icon data). The code chunk below shows how we can implement χ² in Python.
import numpy as np

# function to compute chi square dist
def chi2_distance(histA, histB, eps=1e-10):
    d = 0.5 * np.sum(((histA - histB) ** 2) / (histA + histB + eps))
    return d
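For comparison, the other two candidate metrics mentioned above could be sketched in the same style. The function names below are illustrative and are not part of the original search script. Both expect 1-D NumPy arrays, and the eps guard against division by zero mirrors the one in chi2_distance.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two feature vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def cosine_distance(a, b, eps=1e-10):
    """One minus cosine similarity: near 0 for vectors pointing the same way."""
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return float(1.0 - similarity)
```

Because all three functions take two vectors and return a single number, any of them could be dropped into the search loop to compare results qualitatively.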
The rest of our search script handles applying our distance metric exhaustively to our query and each image in our database. Again, we’re taking advantage of how small our data is; if we had larger data this exhaustive search would be painfully slow and we’d want to take some steps to speed up our queries with some additional preprocessing of our feature DB. Once we’ve calculated the distance between our query and all possible results we’ll do some sorting and return the top N results.
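The exhaustive comparison and top N selection described above might be sketched as follows. This is an assumption-laden illustration, not the original script: the search function, its top_n parameter, and the in-memory dictionary mapping image names to feature vectors are all hypothetical.

```python
import numpy as np


def chi2_distance(histA, histB, eps=1e-10):
    """Chi-squared distance, repeated here so the sketch is self-contained."""
    return 0.5 * np.sum(((histA - histB) ** 2) / (histA + histB + eps))


def search(query_features, feature_db, top_n=5):
    """Return the top_n (distance, name) pairs, closest first."""
    # Compare the query against every stored vector (a linear scan).
    results = [
        (float(chi2_distance(query_features, features)), name)
        for name, features in feature_db.items()
    ]
    results.sort(key=lambda pair: pair[0])
    return results[:top_n]
```

The linear scan costs one distance computation per stored image, which is fine for a few hundred icons but is the part that would need rethinking for a much larger database.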
To call the search script you can use the below line from your terminal. We just need to specify the paths to our query image and the feature DB we’re querying against.
python imagenet_search.py --query search.jpg --featuresPath imagenet_features.csv
So we now have an image search engine built up. Let’s take a look at some example results.
The first output we’ll look at is searching with app icons that exist in our feature DB. Since we already have features for these images, our search script looks them up in the database instead of calling ResNet50 to extract them.
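One way this lookup-before-extract behaviour could be written is shown below. It is only a sketch: the helper names, the idea of keying the database by image name, and the CSV layout (name in the first column, feature values after it) are assumptions rather than details confirmed by the original script.

```python
import csv

import numpy as np


def load_feature_db(csv_path):
    """Read rows of 'name, f1, f2, ...' into a dict of name -> feature vector."""
    with open(csv_path, newline="") as f:
        return {row[0]: np.array(row[1:], dtype=float) for row in csv.reader(f) if row}


def features_for_query(query_name, feature_db, extract):
    """Reuse stored features when the query is already indexed, else compute them."""
    if query_name in feature_db:
        return feature_db[query_name]
    return extract(query_name)  # hypothetical callable that runs the network
```

Skipping the network for known images also means the model only has to be run when a query is genuinely new.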
Our first example result is using the Papa’s Burgeria To Go! app icon. There are a few other ‘To Go!’ apps in our dataset, and if our search is performing well we should see these come up in our results. As you can see below, our engine performed exceptionally well on this search.
Our output includes the icon used as our query with a distance of 0.00. This is an uninteresting result, but it serves as confirmation that our distance metric is doing what it’s supposed to. The other aspect in our output to note is our distance metric displayed at the top of each result. The relative differences in these distances are more important to note than the actual values.
Below are some more example results of searching with an image already included in our feature DB.
Let’s say we’re big fans of Assassin’s Creed and want to perform an image search for an app suggestion. To do this we can take a screenshot of our favorite hooded assassin and query our app icons.
Our top result is Assassin’s Creed Identity, which is as good a result as we could have hoped for. The rest of our results aren’t as relevant, but note that we see a steep increase in distance after our first result.