Create a deep learning dataset using Google Images

Ravi Ranjan
Jun 12, 2020

Today’s article is part one of a three-part series on building a Johnnie Walker red wine detector. We are going to spend the next three articles building a model that identifies Johnnie Walker red wine. Sometimes a deep learning dataset is not available for the task at hand, so we build our own from Google Images.

Part #1: Gather Johnnie Walker red wine training data using Google Images (this post).

Part #2: Train our Johnnie Walker red wine detector using deep learning, Python, and Keras.

Part #3: Deploy our trained deep learning model to the Raspberry Pi.

Let’s go ahead and get started!

Using Google Images for training data and machine learning models

The method I’m about to share with you for gathering Google Images for deep learning comes from fellow deep learning practitioners and friends of mine, Singhal, Alok, and Trivedi.

I am going to elaborate on these steps and provide further instructions on how you can use this technique to quickly gather training data for deep learning models using Google Images, JavaScript, and a bit of Python.

The first step in using Google Images to gather training data for our Convolutional Neural Network is to head to Google Images and enter a query.

In this case we’ll be using the query term “Jhonny Walker Red Wine”:

Query of Jhonny Walker Red Wine

As you can see from the example image above we have our search results.

The next step is to use a tiny bit of JavaScript to gather the image URLs.

Fire up the JavaScript console (I will assume you are using the Chrome web browser, but you can use Firefox as well) by pressing Ctrl+Shift+J on Windows.

Opening Google Chrome’s JavaScript Console

This will enable you to execute JavaScript in a REPL-like manner. The next step is to start scrolling through the search results so that the images you want are loaded on the page.

From there, we manually intervene with JavaScript. Switch back to the JavaScript console and copy and paste the following function into it to simulate a right click on an image.

JavaScript Code:

function simulateRightClick( element ) {
    var event1 = new MouseEvent( 'mousedown', {
        bubbles: true,
        cancelable: false,
        view: window,
        button: 2,
        buttons: 2,
        clientX: element.getBoundingClientRect().x,
        clientY: element.getBoundingClientRect().y
    } );
    element.dispatchEvent( event1 );

    var event2 = new MouseEvent( 'mouseup', {
        bubbles: true,
        cancelable: false,
        view: window,
        button: 2,
        buttons: 0,
        clientX: element.getBoundingClientRect().x,
        clientY: element.getBoundingClientRect().y
    } );
    element.dispatchEvent( event2 );

    var event3 = new MouseEvent( 'contextmenu', {
        bubbles: true,
        cancelable: false,
        view: window,
        button: 2,
        buttons: 0,
        clientX: element.getBoundingClientRect().x,
        clientY: element.getBoundingClientRect().y
    } );
    element.dispatchEvent( event3 );
}

This function effectively simulates right-clicking on an image shown in your browser. Notice how the click involves dispatching a mousedown and a mouseup event, followed by a contextmenu event.

Next we’ll define a function to extract the URL:

JavaScript Code:

function getURLParam( queryString, key ) {
    var vars = queryString.replace( /^\?/, '' ).split( '&' );
    for ( let i = 0; i < vars.length; i++ ) {
        let pair = vars[ i ].split( '=' );
        if ( pair[0] == key ) {
            return pair[1];
        }
    }
    return false;
}

Each image URL is stored in a query string. The snippet above pulls the value for a given key (here, the image URL) out of that query string.
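For readers who find Python easier to follow than JavaScript, here is a minimal sketch of the same idea using only the standard library. It is an illustration, not part of the console workflow: the function name get_url_param and the example link (host and paths included) are made up for this sketch. Unlike the JavaScript helper, it also decodes the value and returns None rather than false when the key is missing.

```python
from urllib.parse import urlsplit, parse_qs


def get_url_param(link, key):
    """Return the decoded value of `key` from the query string of `link`, or None."""
    query = urlsplit(link).query
    values = parse_qs(query).get(key)
    return values[0] if values else None


# Hypothetical link in the shape described above; host and paths are made up.
example = (
    "https://www.google.com/imgres"
    "?imgurl=https%3A%2F%2Fexample.com%2Fbottle.jpg"
    "&imgrefurl=https%3A%2F%2Fexample.com%2Fpage"
)
print(get_url_param(example, "imgurl"))  # https://example.com/bottle.jpg
```

The full-size image address travels percent-encoded inside the imgurl parameter, which is why a decoding step is needed in either language.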

JavaScript Console

Our next function assembles all the URLs in a convenient text file:

JavaScript Code:

function createDownload( contents ) {
    var hiddenElement = document.createElement( 'a' );
    hiddenElement.href = 'data:attachment/text,' + encodeURI( contents );
    hiddenElement.target = '_blank';
    hiddenElement.download = 'urls.txt';
    hiddenElement.click();
}

Each of our URLs will be in the contents parameter passed to our createDownload function. Here we first create a hidden <a> element. We then populate it with the contents, set its download filename to urls.txt, and simulate a click on the element.

Ultimately, when the createDownload function runs, your browser will trigger a download. Depending on your browser settings, the file may go to your default download location, or you may be prompted to select a name and location for your image URLs file.

Our last function brings the components together:

JavaScript Code:

function grabUrls() {
    var urls = [];
    return new Promise( function( resolve, reject ) {
        var count = document.querySelectorAll(
            '.isv-r a:first-of-type' ).length,
            index = 0;
        Array.prototype.forEach.call( document.querySelectorAll(
            '.isv-r a:first-of-type' ), function( element ) {
            // using the right click menu Google will generate the
            // full-size URL; won't work in Internet Explorer
            // (http://pyimg.co/byukr)
            simulateRightClick( element.querySelector( ':scope img' ) );
            // Wait for it to appear on the <a> element
            var interval = setInterval( function() {
                if ( element.href.trim() !== '' ) {
                    clearInterval( interval );
                    // extract the full-size version of the image
                    let googleUrl = element.href.replace( /.*(\?)/, '$1' ),
                        fullImageUrl = decodeURIComponent(
                            getURLParam( googleUrl, 'imgurl' ) );
                    if ( fullImageUrl !== 'false' ) {
                        urls.push( fullImageUrl );
                    }
                    // sometimes the URL returns a "false" string and
                    // we still want to count those so our Promise
                    // resolves
                    index++;
                    if ( index == ( count - 1 ) ) {
                        resolve( urls );
                    }
                }
            }, 10 );
        } );
    } );
}

Our grabUrls function creates what JavaScript calls a Promise.

The promise is that all image URLs will be obtained via the right-click context menu simulation.

The final snippet you need to paste into the JavaScript console is the one that calls our grabUrls function.

JavaScript Code:

grabUrls().then( function( urls ) {
    urls = urls.join( '\n' );
    createDownload( urls );
} );

Our main entry point to start execution is this call to grabUrls. Notice how the URLs are joined with a newline character so that each URL is on its own line in the text file. As you can see, the createDownload function is called from here as the final step.

While this method calls the functions we defined directly in the JavaScript console, you could alternatively use the same logic to create a Chrome browser plugin without too much hassle.

After executing the above snippet, you’ll have a file named urls.txt in your default Downloads directory.

Final snippet in the console

Downloading Google Images using Python

Now that we have our urls.txt file, we need to download each of the individual images.

Contents of urls.txt

There are two ways to convert the txt file into a CSV file:

  1. Convert the txt file into a CSV file using MS Excel.
  2. Convert the txt file into a CSV file using a Python script.

Convert the txt file into a CSV file using MS Excel

Steps:

  1. Open MS Excel.
  2. Open the txt file in an MS Excel sheet.
  3. Go to the File menu and select the Export option.
  4. Click on the Change File Type option and choose the Comma Separated Values (CSV) file option.
  5. Save it.

Convert the txt file into a CSV file using a Python script

Python Script:

import pandas as pd

# urls.txt has no header row, so read every line as data
df = pd.read_csv("urls.txt", delimiter=",", header=None)
df.to_csv("Ravi_urls.csv")

However, urls.txt contains no ',' delimiter (each line holds a single URL), which is why we go through an alternative method.
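One possible alternative, sketched here under my own assumptions, is to skip delimiter parsing altogether and write each line of the text file as one CSV row with Python’s built-in csv module. The function name urls_txt_to_csv and the column name url are illustrative choices, not something the rest of this article depends on.

```python
import csv
from pathlib import Path


def urls_txt_to_csv(txt_path, csv_path):
    """Copy one-URL-per-line text into a single-column CSV; return the row count."""
    lines = Path(txt_path).read_text(encoding="utf-8").splitlines()
    urls = [line.strip() for line in lines if line.strip()]  # drop blank lines
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url"])  # assumed column name
        writer.writerows([url] for url in urls)
    return len(urls)
```

Calling urls_txt_to_csv("urls.txt", "Ravi_urls.csv") would produce the CSV file. Because the csv module quotes any field that contains a comma, a URL with a comma in it still ends up in a single cell.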

Using requests, we just need to specify the URL and a timeout for the download. We attempt to download the image file into a variable.

Snippet of Python Script

The code above reads the URLs from the CSV file and saves each downloaded image as a file with a .jpg extension.

Directory where the image files are saved
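Since the script itself appears only as a screenshot, here is a rough sketch of what such a downloader could look like. It is not the exact code from the screenshot: the function names, the 60-second timeout, and the zero-padded filenames are assumptions made for this illustration. The only requests calls used are requests.get with a timeout and raise_for_status.

```python
import os


def fetch_with_requests(url, timeout=60):
    """Download `url` and return the raw bytes (needs the requests package)."""
    import requests  # imported lazily so download_images can be tried without it

    response = requests.get(url, timeout=timeout)  # timeout value is an assumption
    response.raise_for_status()
    return response.content


def download_images(urls, out_dir, fetch=fetch_with_requests):
    """Save each URL as a zero-padded .jpg in `out_dir`; return the saved paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        try:
            data = fetch(url)  # the image bytes are held in a variable first
        except Exception as error:  # skip anything that fails to download
            print(f"[INFO] skipping {url}: {error}")
            continue
        path = os.path.join(out_dir, f"{len(saved):08d}.jpg")
        with open(path, "wb") as handle:
            handle.write(data)
        saved.append(path)
    return saved
```

The fetch parameter exists only so the loop can be tried with a stand-in function and no network access. In normal use you would pass the list of URLs and an output folder and leave fetch at its default.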

That’s all there is to the Google Images downloader script. It’s pretty self-explanatory.

To download our example images, make sure you use the “Downloads” section of this article.

You should also expect some images to be corrupt and unable to open. These images get deleted from our dataset.
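As a rough illustration of that clean-up step, the sketch below deletes files whose first bytes do not match a known image signature. This is a simplification I am assuming for the example. It catches downloads that are not images at all (an HTML error page saved as .jpg, say) but not truncated or partially damaged images. Fully decoding each file with an image library is the stricter check. The function names are made up for this sketch.

```python
import os

# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF (1987 version)
    b"GIF89a",             # GIF (1989 version)
)


def looks_like_image(path):
    """Return True if the file starts with one of the known image signatures."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    return header.startswith(IMAGE_SIGNATURES)


def prune_non_images(directory):
    """Delete files in `directory` that fail the signature check; return their names."""
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and not looks_like_image(path):
            os.remove(path)
            removed.append(name)
    return removed
```

Since this deletes files, it is worth running it on a copy of the download folder first.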

Pruning irrelevant images from our dataset

Of course, not every image we downloaded is relevant.

To resolve this, we need to do a bit of manual inspection.

Screenshot of the downloaded images
Images downloaded from Google Images

And that’s it. I hope this article is useful for deep learning practitioners and that you now know how to build a deep learning dataset from Google Images.

Thank you so much!

Ravi Ranjan

Software Developer, Boeing | MSDS, University of Arizona | Machine Learning | Deep Learning Enthusiast