Using Google Cloud's Vision API

I must start with this: I can't believe I got this to work...

Preamble

For this, the final exam submission, this website is required to make use of ONE external dependency or framework in order to explore the concept of augmenting, reflecting or challenging the physical world with the digital, specifically in the context of the representation of Johannesburg. Since I suspected that implementing an interesting and useful dependency would take time and effort (I was correct), I also wanted to use a dependency that my character, Eddie, would actually be interested in and inspired by. In other words, I really wanted this dependency to be something that Eddie would actually want in his life, and one of the most important aspects of Eddie's life is the fact that he lives in Johannesburg.

Thus, my first train of thought was:

When Eddie travels around Johannesburg...
How does he view his city?
What does he see?
Does he want to know more?

Another extremely important aspect of Eddie's life is academics: learning and teaching. Eddie has been established as a person with a keen urge to learn. His second love is teaching, and he has incorporated studying into most of his life. He holds a diploma in Mechanical Engineering and has studied further in other areas such as Theology and Social Studies. Over and above this, he lectured and taught mechanical drawing for many years.

With this in mind, my next thought was:

Since Eddie LOVES learning...
What if he could learn more about his city no matter where he is?

What if I could implement a dependency on this website that allows the user to simply upload any image and immediately tells them about that image? Specifically, it could tell them WHERE the image was taken, perhaps using landmarks as the means of identification. After lots of research, I found the perfect dependency: Google's Cloud Vision API.
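For the "simply upload any image" part, a browser page typically reads the chosen file with FileReader.readAsDataURL. That method produces a string such as "data:image/jpeg;base64,...". The Vision API expects only the base64 part of that string in its image content field, so the prefix has to be removed. Below is a minimal TypeScript sketch of that step. The helper name dataUrlToBase64 is hypothetical and is not part of any Google library.

```typescript
// Hypothetical helper: converts the data URL produced by
// FileReader.readAsDataURL into the raw base64 string that the
// Vision API's image content field expects.
function dataUrlToBase64(dataUrl: string): string {
  // A data URL looks like "data:<mime type>;base64,<payload>".
  const marker = ";base64,";
  const markerIndex = dataUrl.indexOf(marker);
  if (dataUrl.indexOf("data:") !== 0 || markerIndex === -1) {
    throw new Error("Expected a base64-encoded data URL");
  }
  // Keep only the payload after the marker.
  return dataUrl.slice(markerIndex + marker.length);
}
```

On a page, this would typically run inside the FileReader's onload handler, with reader.result passed in as the data URL.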

About the Vision API

According to Google's documentation, the Cloud Vision API allows developers to easily integrate vision detection features within applications, including image labelling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.

In other words, this API does EXACTLY what I need it to do, which was super exciting!
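To make this more concrete, each of those features is requested by name when calling the API's images:annotate REST method, and one request can ask for several features. Here is a minimal TypeScript sketch of the JSON request body. It assumes the image has already been converted to base64. The helper buildAnnotateBody and its default maxResults of 5 are hypothetical choices made for this example.

```typescript
// Feature types mentioned in this post, named as in the Vision API's
// Feature.Type enum.
type FeatureType =
  | "LABEL_DETECTION"
  | "FACE_DETECTION"
  | "LANDMARK_DETECTION"
  | "TEXT_DETECTION"
  | "DOCUMENT_TEXT_DETECTION"
  | "SAFE_SEARCH_DETECTION";

interface Feature {
  type: FeatureType;
  maxResults: number;
}

interface AnnotateRequestBody {
  requests: { image: { content: string }; features: Feature[] }[];
}

// Hypothetical helper: builds the JSON body for images:annotate, asking for
// several kinds of analysis of one base64-encoded image in a single request.
function buildAnnotateBody(
  base64Image: string,
  features: FeatureType[],
  maxResults: number = 5,
): AnnotateRequestBody {
  return {
    requests: [
      {
        image: { content: base64Image },
        // One entry per requested feature, each capped at maxResults matches.
        features: features.map((type) => ({ type: type, maxResults: maxResults })),
      },
    ],
  };
}
```

For example, JSON.stringify(buildAnnotateBody(base64Image, ["LANDMARK_DETECTION"])) would give the body for a landmark-only request.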

After more research, it became apparent that the Vision API is definitely the best choice for me. It is known as an almost plug-and-play image recognition API and has a very wide range of image analysis options. It is arguably the best API to use for landmark and object detection, since it is able to take advantage of Google's own data and machine-learning libraries. Its Optical Character Recognition (OCR) feature is exciting, as it can identify not only printed but also handwritten text in an image, a PDF or even a TIFF file. Most reviews that I read warned that it is very expensive to use for commercial/bulk image analysis. Luckily (see the Google Cloud Account section below), there is not only a free trial that gives new users $300 to spend within the first year, but also a free tier that covers the first 1000 units PER analysis feature every month.
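As a rough illustration of reading OCR results for an image, the API returns recognised text in a textAnnotations list, where the first entry holds the whole block of text. It also returns the text, with page and block structure, in fullTextAnnotation, which is the main output of DOCUMENT_TEXT_DETECTION (the variant aimed at dense text). The TypeScript sketch below models only the fields it reads, and extractText is a hypothetical helper. PDF and TIFF files are handled by a separate file-annotation method, which this sketch does not cover.

```typescript
// Only the OCR-related fields of one image's response are modelled here.
interface OcrResponse {
  textAnnotations?: { description?: string }[];
  fullTextAnnotation?: { text?: string };
}

// Hypothetical helper: returns all text detected in one image, or "" if none.
function extractText(response: OcrResponse): string {
  // fullTextAnnotation carries the structured result; its text field is the
  // whole recognised text.
  const fullText = response.fullTextAnnotation?.text;
  if (fullText) {
    return fullText;
  }
  // Otherwise fall back to the first textAnnotations entry, which holds
  // the entire block of text found by TEXT_DETECTION.
  const first = response.textAnnotations?.[0];
  return first?.description ?? "";
}
```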

Given that the Vision API is the best-suited API for detecting landmarks, thanks to Google's large data and machine-learning libraries, it was actually disappointing how few Johannesburg, and even South African, landmarks it was able to detect compared to those of other (usually Northern) countries. I talk more about this in my next blog post, Testing the Vision API's Landmark Detection with Local Landmarks, as well as in the actual Vision API section of this website.
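When the API does recognise a landmark, each match comes back in a landmarkAnnotations list. Each entry has a description (the landmark's name), a confidence score, and one or more latitude/longitude locations. When the API recognises nothing, that list is simply absent from the response. Below is a minimal TypeScript sketch of turning that into something a page can display. It uses a hypothetical helper, extractLandmarks, and an optional minimum score that is an assumption for this example.

```typescript
// Minimal shape of a landmark result (only the fields read here).
interface LandmarkAnnotation {
  description?: string;
  score?: number;
  locations?: { latLng?: { latitude?: number; longitude?: number } }[];
}

interface LandmarkResponse {
  landmarkAnnotations?: LandmarkAnnotation[];
}

interface Landmark {
  name: string;
  score: number;
  latitude?: number;
  longitude?: number;
}

// Hypothetical helper: flattens one image's response into a simple list,
// dropping matches below minScore. A missing landmarkAnnotations field
// (nothing recognised) yields an empty list.
function extractLandmarks(response: LandmarkResponse, minScore: number = 0): Landmark[] {
  return (response.landmarkAnnotations ?? [])
    .filter((annotation) => (annotation.score ?? 0) >= minScore)
    .map((annotation) => {
      // Use the first reported location, if any.
      const latLng = annotation.locations?.[0]?.latLng;
      return {
        name: annotation.description ?? "Unknown landmark",
        score: annotation.score ?? 0,
        latitude: latLng?.latitude,
        longitude: latLng?.longitude,
      };
    });
}
```

An empty result is the case the page has to handle gracefully, for example by telling the user that no landmark was recognised.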

Implementing the Vision API

Requirements: Google Cloud Account

In order to use the API, Google requires that I create a Google Cloud account with billing enabled. I was initially concerned about this, as I did not want to spend any money. Fortunately, my concerns were unwarranted, as a Google Cloud account has a few convenient features:

There is a free trial period in which a new account gets $300 to spend (within one year).
A credit card is required to start an account, but it will not be billed, even after the free trial period has ended.
- Projects are required to have billing set up before you can use many of the APIs.
- This is only to prove your identity and authenticity to Google.
Every month, the first 1000 units analysed PER feature by the Vision API are free.
- See Google's pricing page for more details.
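Google's pricing counts each feature applied to each image as one unit. For example, a single image analysed for both landmarks and labels uses one landmark unit and one label unit. The TypeScript sketch below tallies units per feature and subtracts the free allowance. FREE_UNITS_PER_FEATURE mirrors the 1000 figure above, so check the pricing page for the current number. The helpers tallyUnits and unitsBeyondFreeTier are hypothetical. Prices beyond the free tier are deliberately left out.

```typescript
// Assumption: mirrors the monthly free allowance quoted in this post;
// check Google's pricing page for the current figure.
const FREE_UNITS_PER_FEATURE = 1000;

// Hypothetical helper: each inner array lists the features requested for one
// image; every feature applied to an image counts as one unit of that feature.
function tallyUnits(requests: string[][]): Record<string, number> {
  const tally: Record<string, number> = {};
  for (const features of requests) {
    for (const feature of features) {
      tally[feature] = (tally[feature] ?? 0) + 1;
    }
  }
  return tally;
}

// Hypothetical helper: units per feature left over once the free allowance is used up.
function unitsBeyondFreeTier(tally: Record<string, number>): Record<string, number> {
  const result: Record<string, number> = {};
  for (const feature of Object.keys(tally)) {
    result[feature] = Math.max(0, tally[feature] - FREE_UNITS_PER_FEATURE);
  }
  return result;
}
```

For example, tallyUnits([["LANDMARK_DETECTION", "LABEL_DETECTION"], ["LANDMARK_DETECTION"]]) counts two landmark units and one label unit, both well inside the free allowance.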

I was thus able to set up a Google Cloud account with sufficient access to use the Vision API without a hitch. Little did I know, this would be the least of my problems.

Requirements: API Key

To use the API from my website, I had to create an API key that authenticates calls made to the Vision API. It took me SO long to figure out how this worked. I spent so much time researching and learning about keys and attempting various solutions, to no avail. I was honestly about to give up when I came across the perfect example of how to do this in Google Cloud's own community GitHub repository, specifically the GoogleCloudPlatform/web-docs-samples repository.
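In a browser-only setup like this one, the key is passed as the "key" query parameter on the images:annotate endpoint. The TypeScript sketch below shows a minimal version of such a call. It is not the web-docs-samples code and not necessarily how this site does it. The helpers buildAnnotateUrl and annotate are hypothetical. fetch is passed in as a parameter so the sketch stays self-contained, and in the browser the global fetch can be passed straight in.

```typescript
// Public REST endpoint for synchronous image annotation.
const VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate";

// Hypothetical helper: an API key is sent as the "key" query parameter.
function buildAnnotateUrl(apiKey: string): string {
  return VISION_ENDPOINT + "?key=" + encodeURIComponent(apiKey);
}

// Minimal fetch-like type so the sketch is self-contained;
// the browser's global fetch fits this shape.
type FetchLike = (
  url: string,
  init: { method: string; headers: { [name: string]: string }; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: POSTs a request body (such as { requests: [...] })
// and resolves with the parsed JSON response.
function annotate(fetchFn: FetchLike, apiKey: string, body: object): Promise<unknown> {
  return fetchFn(buildAnnotateUrl(apiKey), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  }).then((response) => {
    if (!response.ok) {
      throw new Error("Vision API request failed with status " + response.status);
    }
    return response.json();
  });
}
```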

After successfully using this API key within my HTML and JavaScript, I was able to set up a connection to the Vision API on my website. However, I quickly realised that I had just done something very insecure and rather dangerous: I had made my API key public. This means that anyone can simply inspect my website, copy my key and use it to make authenticated calls to the Vision API... Not good! Google even sent me an email saying:

Dear Customer,
We have detected a publicly accessible Google API key associated with the following Google Cloud Platform project:

Project WSOA3028A (id: XXX) with API key XYZ

The key was found at the following URL: JessWhosBack

We believe that you or your organization may have inadvertently published the affected API key in public sources or on public websites (for example, credentials mistakenly uploaded to a service such as GitHub.)

Please note that as the project/account owner, you are responsible for securing your keys. Therefore, we recommend that you take the following steps to remedy this situation:

If this key is intended to be public (or if a publicly accessible key isn’t preventable):

Log in to the Google Cloud Console and review the API and billing activity on your account, ensuring the usage is in line with what you expected.

Add API key restrictions to your API key, if applicable.

If this key was NOT meant to be public:

Regenerate the compromised API key: Search for Credentials in the cloud console platform, Edit the leaked key, and use the Regenerate Key button to rotate the key. For more details, review the instructions on handling compromised GCP credentials.

Take immediate steps to ensure that your API key(s) are not embedded in public source code systems, stored in download directories, or unintentionally shared in other ways.

Add API key restrictions to your API key, if applicable.

The security of your Google Cloud Platform account(s) is important to us.

Sincerely,
Google Cloud Platform Trust & Safety

All the research I did told me that hiding this key was impossible when using only client-side JavaScript, HTML, CSS, etc. I was completely stuck. Fortunately, following the advice in the above email from Google Cloud, I was able to restrict my API key to ONLY allow specified webpages or domains to call the API. In other words, only requests that appear to come from THIS website are accepted. While I am definitely not happy with this "solution", I believe it is the best that I can do for this assignment within the allocated time. As soon as this assignment has been marked, I will remove the public API key.

Alternatives to the Vision API

There are some amazing and super useful image analysis APIs out there, but hardly any of them facilitate landmark identification or detection. A few alternatives to the Vision API are listed below, with the reasons why they were not chosen.

Amazon's Rekognition API
- Known for its Celebrity Recognition, Capture Movement and Detect Text in Image features. This API also has a free tier for one year; however, it is not as well suited to analysing images for landmarks as the Vision API.
IBM's Watson Visual Recognition API
- Known for allowing users to build, train and test custom machine learning models, which sounds SUPER cool but is probably too complicated for this assignment. It also does not facilitate landmark detection.
Microsoft Azure Cloud's Computer Vision API
- I think this is the best alternative to the Vision API to facilitate my needs.
- Perhaps the best API for analysing image properties. It also facilitates landmark detection, but does not have as extensive a database to draw from as Google's. It is also priced per region as well as per number of images analysed.
Clarifai
- This API also allows for machine learning and is known for its fashion identification system. Another noteworthy feature is the food algorithm (which can analyse food items down to the ingredient level). Again, it does not facilitate landmark detection.
Imagga
- This API is an automated image tagging and categorization API - i.e. a Digital Asset Management API. While an incredibly useful API for the analysis of images and other digital media, this does not suit my need for a landmark identification API.