Google recently introduced a new API called the Google Cloud Vision API. It can understand the contents of an image by using Google's machine learning platform and TensorFlow, processing individual pieces of an image separately and returning the results very fast in a unified format. In simple terms, you can now submit an image to the Google Cloud Vision API and find out what's in it. What's more, when making a request to process an image, Google gives us the capability to specify the types of analysis that should be performed on it: simple object identification, landmark detection, face detection, sentiment analysis, and many more. Even better, this API can also be integrated directly into Android apps, making Android image recognition very simple. This is yet another way Google lets us perform image recognition on Android, powered by the Google Cloud Platform. Imagine the power an end user can have through this API.
The new Google Cloud Vision API is a multi-platform solution for image recognition: whether it's an Android app, an iOS app, or cloud storage, this API is available for image analysis, with SDK support for Java, Go, Node.js, Python, and, more importantly, the plain JSON format. In this article, however, we will mainly discuss how to do image recognition on Android. Continuing with Java/Android, you might be aware that there are many other ways to perform Android image recognition, such as OpenCV, OCR reading libraries, and facial recognition APIs. But none of them are as accurate and lightweight as the new Cloud Vision API, since they require a huge amount of data to be present in the app beforehand to perform object matching, which in turn increases the APK size and can still produce inaccurate results. Also, to use this new Cloud Vision API and perform all sorts of image analysis, no heavy Gradle dependencies or project files need to be included in your project. Just the basic google-api-client-android, google-http-client-gson, and google-api-services-vision dependencies are required in your Gradle file for the models. Read more about it in the next section:
Enabling the Google Cloud Vision API on Android
To perform Android image recognition using the Google Cloud Vision API, we must first enable it from the Google Cloud Developer Console. Please follow these steps:
- Create a project in Google Cloud Console or use an existing one.
- Enable billing for the project. If you have a new or unused account you can start a free trial (it might ask for your credit card info to validate your identity, but it will not charge you).
- Enable the Google Cloud Vision API using this link, or:
- Navigate to “API Manager” section from the hamburger menu.
- Search and select “Google Cloud Vision API”.
- Enable it.
- Then go to the credentials section from the side menu.
- Click on the credentials drop-down menu and select OAuth client ID.
- Select Application Type as Android.
- Add a suitable name like "Android client for Cloud Vision API".
- Enter your SHA1 fingerprint in the desired format, using the command shown on screen, or use this SHA1 fingerprint tutorial to get your fingerprint.
- Enter the package name for your app, which can be located in the defaultConfig block of your Gradle file.
- Click on Create.
- That’s it, you’re done.
If you are planning to access the Google Cloud Vision API through some other platform, you may need to create the credentials in a different manner. When accessing this API on Android to perform image analysis, we need the end user's consent, since we are accessing their pictures. If you are planning to perform image recognition in the background, however, such as in a server-to-server interaction, things should be handled differently: we may not need to generate an OAuth client ID, and a service account key may be required instead. But let's not discuss that here, as our main focus in this tutorial is Android image recognition. Before moving on to the next section, add the following dependencies to your app's Gradle file (the full source code is available at the end of this example):
dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
    testCompile 'junit:junit:4.12'
    compile 'com.android.support:appcompat-v7:23.4.0'
    compile 'com.google.android.gms:play-services-base:9.0.2'
    compile 'com.google.android.gms:play-services-auth:9.0.2'
    compile 'com.google.apis:google-api-services-vision:v1-rev16-1.22.0'
    compile('com.google.api-client:google-api-client-android:1.22.0') {
        exclude module: 'httpclient'
    }
    compile('com.google.http-client:google-http-client-gson:1.20.0') {
        exclude module: 'httpclient'
    }
}
Supported Image Analysis Techniques in Cloud Vision API on Android
In general the Google Cloud Vision API supports many types of image analysis techniques. Be it optical character recognition, landmark detection, or simple logo detection, the Cloud Vision API does it very accurately. The best part about this API is that it is cross-platform and available via API access, which results in very lightweight applications. Also, since this technology is comparatively new, there is a lot of scope for improvement, and while that happens behind the scenes we don't need to worry about a changing code base, as it is basically just an API call. Moving on, let's have a look at the supported Android image recognition techniques.
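Since every call to the Cloud Vision API ultimately boils down to a single HTTPS POST with a JSON body, it helps to see roughly what that batch request looks like on the wire before going through the individual techniques. The sketch below is a hypothetical request body (the base64 content is truncated, and the values are illustrative) asking for two types of analysis on one image:

```json
{
  "requests": [
    {
      "image": {
        "content": "/9j/4AAQSkZJRg...base64-encoded-jpeg..."
      },
      "features": [
        { "type": "LABEL_DETECTION", "maxResults": 10 },
        { "type": "TEXT_DETECTION", "maxResults": 10 }
      ]
    }
  ]
}
```

The Java client library used later in this tutorial builds exactly this kind of payload for us out of Feature and AnnotateImageRequest objects.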
1. LABEL_DETECTION
One of the most basic techniques available in the Cloud Vision API is label detection. It allows the API to analyse the image content and individually list the items contained in it. Have a look at the image I uploaded:
2. TEXT_DETECTION
Another interesting image analysis technique in the Google Cloud Vision API is the text detection feature. It allows us to extract the text from an image. My personal opinion is that this feature alone could turn out to be the torch bearer for the whole Cloud Vision API suite.
3. LANDMARK_DETECTION
Google has gathered so much data that it can now even identify a landmark just by scanning through its picture. In a way it might sound a little scary, but this image recognition technique is equally powerful and useful. Imagine how useful it could be for automatically adding captions to your vacation images.
4. LOGO_DETECTION
By using the new Cloud Vision API you can also identify some popular logos. Although when I tried this API with some logos it wasn't able to identify most of them, this will likely improve in the future.
5. FACE_DETECTION
This is also one of the most powerful features of the Google Cloud Vision API. It allows us to mark the number of faces in a picture and helps in identifying the placement of individual facial features in an image. By using this image recognition technique on Android we can highlight a face with a polygon very easily.
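To draw that polygon, the response gives us pixel coordinates for each detected face. A shortened, hypothetical faceAnnotation from the JSON response might look roughly like this; the boundingPoly vertices are what you would feed into a Canvas drawing call:

```json
{
  "faceAnnotations": [
    {
      "boundingPoly": {
        "vertices": [
          { "x": 142, "y": 57 },
          { "x": 301, "y": 57 },
          { "x": 301, "y": 243 },
          { "x": 142, "y": 243 }
        ]
      },
      "detectionConfidence": 0.93,
      "joyLikelihood": "VERY_LIKELY"
    }
  ]
}
```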
6. SAFE_SEARCH_DETECTION
This technique allows us to detect inappropriate images. It could be a major help in moderating images when performing a server-to-server integration. Currently it supports four annotation types: adult, spoof, medical, and violence.
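In the JSON response, these four annotations come back as likelihood strings rather than numeric scores. A hypothetical safeSearchAnnotation block might look like this (the values shown are just examples):

```json
{
  "safeSearchAnnotation": {
    "adult": "VERY_UNLIKELY",
    "spoof": "UNLIKELY",
    "medical": "POSSIBLE",
    "violence": "VERY_UNLIKELY"
  }
}
```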
7. IMAGE_PROPERTIES
The Cloud Vision API also provides a way to identify the dominant colors in an image. This feature may not see much use on Android, as a great alternative in the form of the Palette library is already available, through which dominant colors can be identified.
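For reference, the imagePropertiesAnnotation in the response reports each dominant color as RGB components with a score and the fraction of pixels it covers, roughly like this (values hypothetical):

```json
{
  "imagePropertiesAnnotation": {
    "dominantColors": {
      "colors": [
        {
          "color": { "red": 66, "green": 133, "blue": 244 },
          "score": 0.42,
          "pixelFraction": 0.18
        }
      ]
    }
  }
}
```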
Is the Google Cloud Vision API Free?
The answer is NO. To perform image recognition on Android via this API, an infrastructure cost is involved on Google's side. Therefore, just like most Google products, the Cloud Vision API is free for initial use on all platforms, but when usage increases they will start charging you. Please find the detailed pricing plan here.
Android Image Recognition using Google’s Cloud Vision API
Behind the scenes Google harnesses the power of TensorFlow and its machine learning platform to perform this powerful image analysis on Android. As I mentioned earlier, through this Android image recognition technique we can categorize our images into thousands of tags. This will definitely help us organize our data in a better way, leading us to build better location-aware apps. To build an Android image recognition app using the Cloud Vision API, I hope you have enabled the API from the Cloud Console using the steps mentioned in the first section of this tutorial and included all the dependencies in your build.gradle file. Next, let's define a layout where all the results will be displayed. (The full source code is available at the end of the tutorial.)
<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:paddingBottom="@dimen/activity_vertical_margin"
    android:paddingLeft="@dimen/activity_horizontal_margin"
    android:paddingRight="@dimen/activity_horizontal_margin"
    android:paddingTop="@dimen/activity_vertical_margin"
    tools:context="com.truiton.cloudvisionapi.MainActivity">

    <TextView
        android:id="@+id/selected_image_txt"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_alignParentTop="true"
        android:layout_marginTop="10dp"
        android:text="Selected Image: "/>

    <ImageView
        android:id="@+id/selected_image"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_below="@+id/selected_image_txt"
        android:layout_centerHorizontal="true"
        android:layout_marginTop="10dp"/>

    <TextView
        android:id="@+id/result"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_below="@+id/selected_image"
        android:layout_marginTop="10dp"/>

    <Button
        android:id="@+id/select_image_button"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_alignParentBottom="true"
        android:layout_centerHorizontal="true"
        android:text="Select Image"/>
</RelativeLayout>
Next, let's define the GET_ACCOUNTS and INTERNET permissions in the manifest, as performing an OAuth request from Android to the Cloud Vision API requires access to the account information on the device.
<?xml version="1.0" encoding="utf-8"?>
<manifest package="com.truiton.cloudvisionapi"
    xmlns:android="http://schemas.android.com/apk/res/android">

    <uses-permission android:name="android.permission.GET_ACCOUNTS"/>
    <uses-permission android:name="android.permission.INTERNET"/>

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:supportsRtl="true"
        android:theme="@style/AppTheme">
        <activity android:name=".MainActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN"/>
                <category android:name="android.intent.category.LAUNCHER"/>
            </intent-filter>
        </activity>
    </application>
</manifest>
In this example, to make a Google Cloud Vision API request on Android, we will be using the Google API Client Library for Java. As you might know, to perform a Google API Client OAuth request on Android we first need to get an auth token from Google via a client call. Therefore let's first define a class to get an OAuth token.
Please note: Don't forget to generate a client ID for your OAuth token in the Cloud Console, using the steps mentioned in the first section.
package com.truiton.cloudvisionapi;

import android.accounts.Account;
import android.app.Activity;
import android.os.AsyncTask;

import com.google.android.gms.auth.GoogleAuthException;
import com.google.android.gms.auth.GoogleAuthUtil;
import com.google.android.gms.auth.UserRecoverableAuthException;

import java.io.IOException;

/**
 * Created by MG on 04-06-2016.
 */
public class GetTokenTask extends AsyncTask<Void, Void, Void> {
    Activity mActivity;
    String mScope;
    Account mAccount;
    int mRequestCode;

    GetTokenTask(Activity activity, Account account, String scope, int requestCode) {
        this.mActivity = activity;
        this.mScope = scope;
        this.mAccount = account;
        this.mRequestCode = requestCode;
    }

    @Override
    protected Void doInBackground(Void... params) {
        try {
            String token = fetchToken();
            if (token != null) {
                ((MainActivity) mActivity).onTokenReceived(token);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Gets an authentication token from Google and handles any
     * GoogleAuthException that may occur.
     */
    protected String fetchToken() throws IOException {
        String accessToken;
        try {
            accessToken = GoogleAuthUtil.getToken(mActivity, mAccount, mScope);
            GoogleAuthUtil.clearToken(mActivity, accessToken); // used to remove stale tokens
            accessToken = GoogleAuthUtil.getToken(mActivity, mAccount, mScope);
            return accessToken;
        } catch (UserRecoverableAuthException userRecoverableException) {
            mActivity.startActivityForResult(userRecoverableException.getIntent(),
                    mRequestCode);
        } catch (GoogleAuthException fatalException) {
            fatalException.printStackTrace();
        }
        return null;
    }
}
Moving on, let's define the MainActivity where the Android image recognition will take place. In this example we will perform LABEL_DETECTION, TEXT_DETECTION, and LANDMARK_DETECTION. If you wish to perform more analysis on a single image, you can add more feature requests to the same image upload. But since the Google Cloud Vision API is at a very nascent stage, requests get slower as more features are added.
package com.truiton.cloudvisionapi;

import android.Manifest;
import android.accounts.Account;
import android.accounts.AccountManager;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.graphics.Bitmap;
import android.net.Uri;
import android.os.AsyncTask;
import android.os.Bundle;
import android.provider.MediaStore;
import android.support.annotation.NonNull;
import android.support.v4.app.ActivityCompat;
import android.support.v7.app.AppCompatActivity;
import android.util.Log;
import android.view.View;
import android.widget.Button;
import android.widget.ImageView;
import android.widget.TextView;
import android.widget.Toast;

import com.google.android.gms.auth.GoogleAuthUtil;
import com.google.android.gms.common.AccountPicker;
import com.google.api.client.extensions.android.http.AndroidHttp;
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.gson.GsonFactory;
import com.google.api.services.vision.v1.Vision;
import com.google.api.services.vision.v1.model.AnnotateImageRequest;
import com.google.api.services.vision.v1.model.BatchAnnotateImagesRequest;
import com.google.api.services.vision.v1.model.BatchAnnotateImagesResponse;
import com.google.api.services.vision.v1.model.EntityAnnotation;
import com.google.api.services.vision.v1.model.Feature;
import com.google.api.services.vision.v1.model.Image;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MainActivity extends AppCompatActivity {
    private static String accessToken;
    static final int REQUEST_GALLERY_IMAGE = 10;
    static final int REQUEST_CODE_PICK_ACCOUNT = 11;
    static final int REQUEST_ACCOUNT_AUTHORIZATION = 12;
    static final int REQUEST_PERMISSIONS = 13;
    private final String LOG_TAG = "MainActivity";
    private ImageView selectedImage;
    private TextView resultTextView;
    Account mAccount;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        Button selectImageButton = (Button) findViewById(R.id.select_image_button);
        selectedImage = (ImageView) findViewById(R.id.selected_image);
        resultTextView = (TextView) findViewById(R.id.result);
        selectImageButton.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                ActivityCompat.requestPermissions(MainActivity.this,
                        new String[]{Manifest.permission.GET_ACCOUNTS},
                        REQUEST_PERMISSIONS);
            }
        });
    }

    private void launchImagePicker() {
        Intent intent = new Intent();
        intent.setType("image/*");
        intent.setAction(Intent.ACTION_GET_CONTENT);
        startActivityForResult(Intent.createChooser(intent, "Select an image"),
                REQUEST_GALLERY_IMAGE);
    }

    @Override
    public void onRequestPermissionsResult(int requestCode,
                                           @NonNull String[] permissions,
                                           @NonNull int[] grantResults) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults);
        switch (requestCode) {
            case REQUEST_PERMISSIONS:
                if (grantResults.length > 0
                        && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
                    getAuthToken();
                } else {
                    Toast.makeText(MainActivity.this, "Permission Denied!",
                            Toast.LENGTH_SHORT).show();
                }
        }
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == REQUEST_GALLERY_IMAGE && resultCode == RESULT_OK
                && data != null) {
            uploadImage(data.getData());
        } else if (requestCode == REQUEST_CODE_PICK_ACCOUNT) {
            if (resultCode == RESULT_OK) {
                String email = data.getStringExtra(AccountManager.KEY_ACCOUNT_NAME);
                AccountManager am = AccountManager.get(this);
                Account[] accounts =
                        am.getAccountsByType(GoogleAuthUtil.GOOGLE_ACCOUNT_TYPE);
                for (Account account : accounts) {
                    if (account.name.equals(email)) {
                        mAccount = account;
                        break;
                    }
                }
                getAuthToken();
            } else if (resultCode == RESULT_CANCELED) {
                Toast.makeText(this, "No Account Selected", Toast.LENGTH_SHORT)
                        .show();
            }
        } else if (requestCode == REQUEST_ACCOUNT_AUTHORIZATION) {
            if (resultCode == RESULT_OK) {
                Bundle extra = data.getExtras();
                onTokenReceived(extra.getString("authtoken"));
            } else if (resultCode == RESULT_CANCELED) {
                Toast.makeText(this, "Authorization Failed", Toast.LENGTH_SHORT)
                        .show();
            }
        }
    }

    public void uploadImage(Uri uri) {
        if (uri != null) {
            try {
                Bitmap bitmap = resizeBitmap(
                        MediaStore.Images.Media.getBitmap(getContentResolver(), uri));
                callCloudVision(bitmap);
                selectedImage.setImageBitmap(bitmap);
            } catch (IOException e) {
                Log.e(LOG_TAG, e.getMessage());
            }
        } else {
            Log.e(LOG_TAG, "Null image was returned.");
        }
    }

    private void callCloudVision(final Bitmap bitmap) throws IOException {
        resultTextView.setText("Retrieving results from cloud");
        new AsyncTask<Object, Void, String>() {
            @Override
            protected String doInBackground(Object... params) {
                try {
                    GoogleCredential credential =
                            new GoogleCredential().setAccessToken(accessToken);
                    HttpTransport httpTransport = AndroidHttp.newCompatibleTransport();
                    JsonFactory jsonFactory = GsonFactory.getDefaultInstance();
                    Vision.Builder builder =
                            new Vision.Builder(httpTransport, jsonFactory, credential);
                    Vision vision = builder.build();

                    List<Feature> featureList = new ArrayList<>();
                    Feature labelDetection = new Feature();
                    labelDetection.setType("LABEL_DETECTION");
                    labelDetection.setMaxResults(10);
                    featureList.add(labelDetection);

                    Feature textDetection = new Feature();
                    textDetection.setType("TEXT_DETECTION");
                    textDetection.setMaxResults(10);
                    featureList.add(textDetection);

                    Feature landmarkDetection = new Feature();
                    landmarkDetection.setType("LANDMARK_DETECTION");
                    landmarkDetection.setMaxResults(10);
                    featureList.add(landmarkDetection);

                    List<AnnotateImageRequest> imageList = new ArrayList<>();
                    AnnotateImageRequest annotateImageRequest =
                            new AnnotateImageRequest();
                    Image base64EncodedImage = getBase64EncodedJpeg(bitmap);
                    annotateImageRequest.setImage(base64EncodedImage);
                    annotateImageRequest.setFeatures(featureList);
                    imageList.add(annotateImageRequest);

                    BatchAnnotateImagesRequest batchAnnotateImagesRequest =
                            new BatchAnnotateImagesRequest();
                    batchAnnotateImagesRequest.setRequests(imageList);

                    Vision.Images.Annotate annotateRequest =
                            vision.images().annotate(batchAnnotateImagesRequest);
                    // Due to a bug: requests to Vision API containing large images fail
                    // when GZipped.
                    annotateRequest.setDisableGZipContent(true);
                    Log.d(LOG_TAG, "sending request");

                    BatchAnnotateImagesResponse response = annotateRequest.execute();
                    return convertResponseToString(response);
                } catch (GoogleJsonResponseException e) {
                    Log.e(LOG_TAG, "Request failed: " + e.getContent());
                } catch (IOException e) {
                    Log.d(LOG_TAG, "Request failed: " + e.getMessage());
                }
                return "Cloud Vision API request failed.";
            }

            protected void onPostExecute(String result) {
                resultTextView.setText(result);
            }
        }.execute();
    }

    private String convertResponseToString(BatchAnnotateImagesResponse response) {
        StringBuilder message = new StringBuilder("Results:\n\n");

        message.append("Labels:\n");
        List<EntityAnnotation> labels =
                response.getResponses().get(0).getLabelAnnotations();
        if (labels != null) {
            for (EntityAnnotation label : labels) {
                message.append(String.format(Locale.getDefault(), "%.3f: %s",
                        label.getScore(), label.getDescription()));
                message.append("\n");
            }
        } else {
            message.append("nothing\n");
        }

        message.append("Texts:\n");
        List<EntityAnnotation> texts =
                response.getResponses().get(0).getTextAnnotations();
        if (texts != null) {
            for (EntityAnnotation text : texts) {
                message.append(String.format(Locale.getDefault(), "%s: %s",
                        text.getLocale(), text.getDescription()));
                message.append("\n");
            }
        } else {
            message.append("nothing\n");
        }

        message.append("Landmarks:\n");
        List<EntityAnnotation> landmarks =
                response.getResponses().get(0).getLandmarkAnnotations();
        if (landmarks != null) {
            for (EntityAnnotation landmark : landmarks) {
                message.append(String.format(Locale.getDefault(), "%.3f: %s",
                        landmark.getScore(), landmark.getDescription()));
                message.append("\n");
            }
        } else {
            message.append("nothing\n");
        }

        return message.toString();
    }

    public Bitmap resizeBitmap(Bitmap bitmap) {
        int maxDimension = 1024;
        int originalWidth = bitmap.getWidth();
        int originalHeight = bitmap.getHeight();
        int resizedWidth = maxDimension;
        int resizedHeight = maxDimension;

        if (originalHeight > originalWidth) {
            resizedHeight = maxDimension;
            resizedWidth = (int) (resizedHeight
                    * (float) originalWidth / (float) originalHeight);
        } else if (originalWidth > originalHeight) {
            resizedWidth = maxDimension;
            resizedHeight = (int) (resizedWidth
                    * (float) originalHeight / (float) originalWidth);
        } else if (originalHeight == originalWidth) {
            resizedHeight = maxDimension;
            resizedWidth = maxDimension;
        }
        return Bitmap.createScaledBitmap(bitmap, resizedWidth, resizedHeight, false);
    }

    public Image getBase64EncodedJpeg(Bitmap bitmap) {
        Image image = new Image();
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        bitmap.compress(Bitmap.CompressFormat.JPEG, 90, byteArrayOutputStream);
        byte[] imageBytes = byteArrayOutputStream.toByteArray();
        image.encodeContent(imageBytes);
        return image;
    }

    private void pickUserAccount() {
        String[] accountTypes = new String[]{GoogleAuthUtil.GOOGLE_ACCOUNT_TYPE};
        Intent intent = AccountPicker.newChooseAccountIntent(null, null,
                accountTypes, false, null, null, null, null);
        startActivityForResult(intent, REQUEST_CODE_PICK_ACCOUNT);
    }

    private void getAuthToken() {
        String SCOPE = "oauth2:https://www.googleapis.com/auth/cloud-platform";
        if (mAccount == null) {
            pickUserAccount();
        } else {
            new GetTokenTask(MainActivity.this, mAccount, SCOPE,
                    REQUEST_ACCOUNT_AUTHORIZATION).execute();
        }
    }

    public void onTokenReceived(String token) {
        accessToken = token;
        launchImagePicker();
    }
}
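In the activity above, getBase64EncodedJpeg() relies on the client library's Image.encodeContent() to base64-encode the compressed JPEG bytes for the request body. Outside of the library, plain Java can do the equivalent with java.util.Base64 (available on Java 8 and on Android API 26+). This small sketch uses stand-in bytes rather than real JPEG data, just to show the idea:

```java
import java.util.Base64;

public class EncodeSketch {
    // Standard base64 over the compressed image bytes, which is what the
    // JSON request body's "content" field expects. (The client library may
    // internally use the URL-safe base64 variant; both are accepted by the API.)
    static String toBase64(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        // Stand-in for the byte[] produced by bitmap.compress() on Android.
        byte[] fakeJpegBytes = "hello".getBytes();
        System.out.println(toBase64(fakeJpegBytes));
    }
}
```

On Android itself you would keep using Image.encodeContent(), as in the listing above; this sketch is only to demystify what that call does.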
To perform successful Android image recognition using the new Google Cloud Vision API, we first need to authenticate our app and generate an OAuth token for making an API call. Before generating an OAuth token we need to pick an account with which the token can be generated, and to pick an account we first need the GET_ACCOUNTS permission. Once we have that permission, we can call the getAuthToken() method mentioned above to pick an account and get its OAuth token. Internally this method takes consent from the user, using GetTokenTask, after an account has been selected via the pickUserAccount() method. Once we have this API token we can call the launchImagePicker() method to pick an image from the gallery and pass it on to the callCloudVision() method, which uploads the image to the Google Cloud Vision API and applies all the image analysis techniques on it. Internally this method encodes the bitmap as a JPEG, then creates and sends a request to the Google Cloud Vision API. Finally we parse the result and display it on screen. View the full source code here:
In the above Android image recognition example, we simply picked an account, authenticated it, then selected and uploaded an image to perform image analysis on it. Although we applied a very basic set of image recognition techniques (LABEL_DETECTION, TEXT_DETECTION, and LANDMARK_DETECTION), they gave deep insights into how Google looks at an image. Maybe in the future Google will enhance this API and launch something similar for videos as well. But as of now, even the simple Google Cloud Vision API for images is doing wonders. Let me know what you are planning to build with this awesome set of APIs. Connect with us on Facebook, Google+ and Twitter for more updates.
Born in New Delhi, India. A software engineer by profession, an Android enthusiast and an evangelist. My motive here is to create a group of skilled developers who can develop something new and good. Programming is my passion, and it also feels good to make a device do something you want. In a very short span of time I have professionally worked with many tech firms, and as of now I am employed as a senior engineer at a leading tech company. In total I may have worked on more than 20 projects professionally, and whenever I get spare time I share my thoughts here at Truiton.
Thank you, Mohit, for your tuts. Also, the pricing plan link changed to this: https://cloud.google.com/vision/pricing
The Google Cloud Vision API is not accepting my credit card; it says "Correct this card info or try a different card". I tried with different cards but it didn't help. Please help me.
It takes only international cards.
Why do I get this error?
com.google.android.gms.auth.GoogleAuthException: UNREGISTERED_ON_API_CONSOLE
Have you solved this problem?
I'm experiencing the same problem. Tried lots of things, can't solve it. Please help.
This helped me out: https://stackoverflow.com/questions/40941556/com-google-android-gms-auth-googleauthexception-unregistered-on-api-console
Also refer : https://www.truiton.com/2015/04/obtaining-sha1-fingerprint-android-keystore/
How do I upload the image to Cloud Vision if I already have a preloaded image in my ImageView? It's just that I already have a crop function, and the finished image is loaded into the ImageView, so there will be no need to launch the image picker. TIA
Try using the Mobile Vision API. It performs image recognition on the device itself.
Hey Mohit,
Thank you for this!
How do I use the WEB_DETECTION feature? Can’t seem to find any samples for that.
Hi Rohan,
Will update the article soon.
-Thanks