
Using Google Cloud's Vision API

I must start with this: I can't believe I got this to work...

Preamble

For this, the final exam submission, the website is required to make use of ONE external dependency or framework in order to explore the concept of augmenting, reflecting or challenging the physical world with the digital, specifically in the context of the representation of Johannesburg. Since I suspected that implementing an interesting and useful dependency would take time and effort (I was correct), I wanted to use a dependency that my character, Eddie, would actually be interested in and inspired by. In other words, I really wanted this dependency to be something that Eddie would actually want in his life, and one of the most important aspects of Eddie's life is the fact that he lives in Johannesburg.

Thus, my first thought process was:

When Eddie travels around Johannesburg...
How does he view his city?
What does he see?
Does he want to know more?

Another extremely important aspect of Eddie's life is academics; learning and teaching. Eddie has been established as a person with a keen urge to learn. His second love is teaching and he has incorporated studying into most of his life. He holds a diploma in Mechanical Engineering and has studied further in other areas such as Theology and Social Studies. Over and above this, he lectured and taught mechanical drawing for many years.

With this in mind, my next thought was:

Since Eddie LOVES learning...
What if he could learn more about his city no matter where he is?
What if I could implement some dependency into this website that allows the user to simply upload any image, and it immediately tells the user information about that image? Specifically WHERE the image was taken - perhaps using landmarks as the means of identification. After lots of research, I found the perfect dependency: Google's Cloud Vision API.

About the Vision API

According to Google's documentation, the Cloud Vision API allows developers to easily integrate vision detection features within applications, including image labelling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.
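To make that feature list a little more concrete, here is a minimal sketch of how a request body for the Vision API's `images:annotate` method could be put together in JavaScript. The feature type strings are the ones the API uses for the features named above; the helper name `buildAnnotateRequest`, the short keys in `FEATURE_TYPES` and the placeholder base64 string are my own assumptions for illustration, not something taken from this website's code.

```javascript
// Feature type names accepted by the Vision API's images:annotate method.
// The short keys on the left are hypothetical, chosen only for this sketch.
const FEATURE_TYPES = {
  labels: 'LABEL_DETECTION',
  faces: 'FACE_DETECTION',
  landmarks: 'LANDMARK_DETECTION',
  text: 'TEXT_DETECTION',
  explicitContent: 'SAFE_SEARCH_DETECTION',
};

// Build the JSON body for one image and a list of requested features.
// `base64Image` is the image file's bytes encoded as base64 (no "data:" prefix).
function buildAnnotateRequest(base64Image, featureKeys, maxResults = 5) {
  return {
    requests: [
      {
        image: { content: base64Image },
        features: featureKeys.map((key) => {
          if (!Object.prototype.hasOwnProperty.call(FEATURE_TYPES, key)) {
            throw new Error(`Unknown feature: ${key}`);
          }
          return { type: FEATURE_TYPES[key], maxResults };
        }),
      },
    ],
  };
}

// Placeholder image data, only to show the shape of the request.
const exampleBody = buildAnnotateRequest('aGVsbG8=', ['landmarks', 'labels']);
console.log(JSON.stringify(exampleBody, null, 2));
```

One request can ask for several features at once, which is why `features` is a list rather than a single value.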

In other words, this API does EXACTLY what I need it to do! Which was super exciting!

After more research, it became apparent that the Vision API is definitely the best choice for me. It is known as an almost plug-and-play image recognition API and has a very wide range of image analysis options. It is arguably the best API to use for landmark and object detection since it is able to take advantage of Google's own data and machine-learning libraries. Its Optical Character Recognition (OCR) feature is exciting as it can identify not only printed but also handwritten text from an image, a PDF or even a TIFF file. Most reviews that I read warned that it is very expensive to use for commercial/bulk image analysis, but luckily (see the Google Cloud Account section below) there is a free-trial option that not only gives the user $300 free to spend within the first year, but also allows the account to run 1000 different image searches PER analysis category.

Given that the Vision API is definitely the best-suited API for detecting landmarks - due to Google's large data and machine-learning libraries - it was actually disappointing how few Johannesburg, and even South African, landmarks it was able to detect compared to those of other (usually Northern) countries. I talk more about this in my next blog about Testing the Vision API's Landmark Detection with Local Landmarks, as well as in the actual Vision API section of this website.
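For readers wondering what a landmark result looks like, the sketch below reads the `landmarkAnnotations` list from a landmark detection response and handles the case where nothing was recognised. The field names (`description`, `score`, `locations`, `latLng`) follow the Vision API's response format as far as I know it; the sample data, the landmark name and the helper `summariseLandmarks` are made up for illustration.

```javascript
// Hypothetical response, shaped like a Vision API landmark detection result.
// The landmark name, score and coordinates are invented for this sketch.
const sampleResponse = {
  responses: [
    {
      landmarkAnnotations: [
        {
          description: 'Example Landmark',
          score: 0.83,
          locations: [{ latLng: { latitude: -26.2, longitude: 28.04 } }],
        },
      ],
    },
  ],
};

// Turn the raw response into a simple list of { name, confidence, latitude, longitude }.
// Returns an empty list when the response contains no landmark annotations.
function summariseLandmarks(apiResponse) {
  const first = (apiResponse.responses || [])[0] || {};
  const annotations = first.landmarkAnnotations || [];
  return annotations.map((annotation) => {
    const location = (annotation.locations || [])[0];
    const latLng = location ? location.latLng : null;
    return {
      name: annotation.description,
      confidence: annotation.score,
      latitude: latLng ? latLng.latitude : null,
      longitude: latLng ? latLng.longitude : null,
    };
  });
}

console.log(summariseLandmarks(sampleResponse));
// An image with no recognised landmark is assumed to give an empty result object.
console.log(summariseLandmarks({ responses: [{}] }).length === 0 ? 'No landmark found' : 'Found');
```

Treating a missing `landmarkAnnotations` field as "no landmark found" matters here, because that is the case a local image is most likely to hit.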

Implementing the Vision API

Requirements: Google Cloud Account

In order to use the API, Google requires that I create a Google Cloud account with billing enabled. I was initially concerned about this, as I did not want to spend any money. Fortunately my concerns were unwarranted, as a Google Cloud account comes with the free-trial credit and the free image-search allowance mentioned above.

I was thus able to set up a Google Cloud account with sufficient access to use the Vision API without a hitch. Little did I know, this would be the least of my problems.

Requirements: API Key

To use the API from my website, I had to create an API key that authenticates calls made to the Vision API. It took me SO long to figure out how this worked. I had spent so much time researching and learning about keys and attempting various solutions to no avail. I was honestly about to give up when I came across the perfect example of how to do this in Google Cloud's own community GitHub repository, specifically the GoogleCloudPlatform/web-docs-samples repository.
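As a rough idea of what an API-key call looks like (this is a sketch under my own assumptions, not the code from that repository), the key is passed as a `key` query parameter on the `images:annotate` endpoint and the image is sent as base64 in a JSON body. The function names `buildVisionUrl` and `detectLandmarks` and the `YOUR_API_KEY` placeholder are hypothetical, and the sketch assumes an environment where `fetch` is available, such as a modern browser.

```javascript
// REST endpoint for the Vision API's annotate method.
const VISION_ENDPOINT = 'https://vision.googleapis.com/v1/images:annotate';

// Append the API key as the `key` query parameter.
function buildVisionUrl(apiKey) {
  return `${VISION_ENDPOINT}?key=${encodeURIComponent(apiKey)}`;
}

// Send one base64-encoded image and ask for landmark detection only.
// Resolves with the parsed JSON response, or rejects if the request fails.
async function detectLandmarks(base64Image, apiKey) {
  const response = await fetch(buildVisionUrl(apiKey), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      requests: [
        {
          image: { content: base64Image },
          features: [{ type: 'LANDMARK_DETECTION', maxResults: 5 }],
        },
      ],
    }),
  });
  if (!response.ok) {
    throw new Error(`Vision API request failed with status ${response.status}`);
  }
  return response.json();
}

// Placeholder key, only to show the resulting URL.
console.log(buildVisionUrl('YOUR_API_KEY'));
```

Because this runs in the browser, the key ends up in code that every visitor can read.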

After successfully using this API key within my HTML and JavaScript, I was able to set up a connection to the Vision API on my website. However, I quickly realised that I had just done something very insecure and rather dangerous: I had made my API key public. This means that anyone can simply inspect my website, copy out my key and use it to make authenticated calls to the Vision API... Not good! Google even sent me an email saying:

Dear Customer,
We have detected a publicly accessible Google API key associated with the following Google Cloud Platform project:

Project WSOA3028A (id: XXX) with API key XYZ

The key was found at the following URL: JessWhosBack

We believe that you or your organization may have inadvertently published the affected API key in public sources or on public websites (for example, credentials mistakenly uploaded to a service such as GitHub.)

Please note that as the project/account owner, you are responsible for securing your keys. Therefore, we recommend that you take the following steps to remedy this situation:
  1. If this key is intended to be public (or if a publicly accessible key isn’t preventable):
    • Log in to the Google Cloud Console and review the API and billing activity on your account, ensuring the usage is in line with what you expected.
    • Add API key restrictions to your API key, if applicable.
  2. If this key was NOT meant to be public:
    • Regenerate the compromised API key: Search for Credentials in the cloud console platform, Edit the leaked key, and use the Regenerate Key button to rotate the key. For more details, review the instructions on handling compromised GCP credentials.
    • Take immediate steps to ensure that your API key(s) are not embedded in public source code systems, stored in download directories, or unintentionally shared in other ways.
    • Add API key restrictions to your API key, if applicable.
The security of your Google Cloud Platform account(s) is important to us.

Sincerely,
Google Cloud Platform Trust & Safety

All the research I did told me that hiding this key when using only JavaScript, HTML, CSS, etc. was impossible. I was completely stuck. Fortunately, following the advice in the above email from Google Cloud, I was able to restrict my API key to ONLY allow specified webpages or domains to call the API - in other words, only THIS website can use the key. While I am definitely not happy with this "solution", I believe it is the best that I can do for this assignment within the allocated time. As soon as this assignment has been marked, I will remove the public API key.

Alternatives to the Vision API

There are some amazing and super useful image analysis APIs out there, but hardly any of them facilitate landmark identification or detection. A few alternatives to the Vision API are listed below, with the reasons why they were not chosen.