Introduction to TensorFlow on Google Cloud Platform's Datalab

Discover TensorFlow and develop your first program


30/05/2018

What is TensorFlow?

 

TensorFlow is an open-source numerical computation library developed by Google and released in 2015. Its particularity is its use of data flow graphs.

TensorFlow was developed by researchers and engineers at Google to carry out research projects in machine learning and deep neural networks. The system is nonetheless general enough to be applicable in a wide range of other domains.

TensorFlow can be used to train models requiring large volumes of data (e.g. image banks) in an optimal way.

TensorFlow’s flexible architecture makes it possible to deploy computation across one or more CPUs or GPUs, on a personal computer, a server, and so on, without having to rewrite code.

Google has also developed Tensor Processing Units (TPUs), built specifically for machine learning and for use with TensorFlow. TPUs are intended for running and evaluating models rather than training them. Since February 2018, TPUs have been available in beta on Google Cloud Platform.

TensorFlow is based on the DistBelief infrastructure (Google, 2011) and has a Python interface that is unusually low-level (closer to the machine architecture) compared to typical Python usage. So do not panic if you need some time to adapt to TensorFlow!

Many companies and applications use TensorFlow today. Among them: Airbnb, Nvidia, Uber, Dropbox, eBay, Google (of course), Snapchat, Twitter, and many more!

Thanks to Datalab, TensorFlow can be used on GCP either with the default configuration or on a customized virtual machine: number of cores, CPU or GPU, etc.

 

What does TensorFlow look like?

 

As mentioned above, TensorFlow represents computations as an execution graph.

The nodes of the graph represent mathematical operations, such as addition, multiplication, matrix multiplication, differentiation, and so on.

The edges of the graph represent the tensors that flow between the nodes. A tensor can be, for example, an integer, a vector, an image, etc. Each node of the graph takes tensors as input, carries out its computation, and returns new tensors.

 

Example:

1 + 4 = 5

d(2x)/dx = 2
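
To make this concrete, here is a minimal sketch of a graph computing these two operations (assuming the TensorFlow 1.x API available at the time of writing):

 import tensorflow as tf

 a = tf.constant(1)
 b = tf.constant(4)
 total = a + b  # addition node; the edges carry the tensors a and b

 x = tf.constant(2.0)
 grad = tf.gradients(2 * x, x)  # node computing the derivative of 2x with respect to x

 with tf.Session() as session:
     print(session.run(total))  # 5
     print(session.run(grad))   # [2.0]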

The code associated with TensorFlow is divided into two main stages: construction and execution. During the construction phase, the variables and operations of the graph are defined and assembled. Graph creation is then managed automatically by TensorFlow, allowing the code and its execution to be optimized and parallelized.

 

The execution phase uses a session to execute the graph operations. A graph only executes operations after a session has been created. A session places the operations of the graph on the available components (CPU / GPU / TPU) and provides methods to execute them. Graph operations are launched via the session's run() method, as we will see a little later. This graph execution system is one of the fundamental properties of TensorFlow and lets you execute several operations of the graph in a single call.
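
For instance, run() can fetch several operations of the graph in one call. A short sketch, again assuming the TensorFlow 1.x API:

 import tensorflow as tf

 a = tf.constant(1)
 b = tf.constant(4)
 sum_op = tf.add(a, b)        # construction phase: define the operations
 prod_op = tf.multiply(a, b)

 with tf.Session() as session:              # execution phase
     s, p = session.run([sum_op, prod_op])  # both results in a single call
     print(s, p)                            # 5 4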

 

TensorFlow offers many options for deep learning, making it easy to build a neural network, use it, and train it efficiently.
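
As a taste of what this looks like, here is a hedged sketch of how a small network could be assembled with the 1.x API (the layer sizes and shapes are illustrative assumptions, not something used later in this tutorial):

 import tensorflow as tf

 inputs = tf.placeholder(tf.float32, shape=(None, 10))        # a batch of examples with 10 features
 hidden = tf.layers.dense(inputs, 32, activation=tf.nn.relu)  # fully connected hidden layer
 output = tf.layers.dense(hidden, 1)                          # linear output layer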

 

 

Google Cloud Platform

 

Google Cloud Platform (or GCP) is an online platform developed by Google. This platform provides services for creating virtual machines and networks.

What can we do?

  • compute
  • store data
  • do machine learning / deep learning

And much more !

In this mini-tutorial, we will teach you how to:

  • open a free account on GCP
  • use Datalab on GCP from a personal computer running GNU/Linux
  • write your first TensorFlow program

 

Create an account on GCP

 

How to access GCP?

 

You get a free credit of about $300 / €250 on the platform, valid for one year. You will be asked to enter your credit card details. Your account will NOT be charged; this is a precautionary measure against misuse of GCP.

 

Then,

  • click Discover console. A tutorial starts to guide you through the platform.
  • create / select a project
  • follow the instructions
  • click Go to console

Create (or select) a project on GCP via the following link:
https://console.cloud.google.com/cloud-resource-manager

Click the following link to enable the Google Compute Engine and Cloud Source Repositories APIs for the selected project:
https://console.cloud.google.com/flows/enableapi?apiid=compute,sourcerepo.googleapis.com

 

Installing gcloud on your computer

 

We will now install the Google Cloud SDK on your computer. At the time of writing, the Google Cloud SDK does not work with Python 3. Verify that you have Python 2.7.9 or later on your computer using the command

 python2 -V

If necessary, install or upgrade Python 2:

 sudo apt-get install python2.7

Now, download one of the following packages depending on the architecture of your machine:

Platform                Link                                                                                                      Size
Linux 64-bit (x86_64)   https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-187.0.0-linux-x86_64.tar.gz   18.5 MB
Linux 32-bit (x86)      https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-187.0.0-linux-x86.tar.gz      18.1 MB

You can extract the file wherever you want in your file system.

 tar zxvf filename.tar.gz

 

Then type the following two commands:

 source ./google-cloud-sdk/path.bash.inc

 source ./google-cloud-sdk/completion.bash.inc

 

Finally run gcloud init to initialize the SDK.

 ./google-cloud-sdk/bin/gcloud init

 

Follow the configuration instructions: choose or create a project, choose a zone, etc.

You can now access help:

 gcloud --help

 

Access to the datalab

 

Now update gcloud:

gcloud components update

 

Now install the gcloud Datalab component:

gcloud components install datalab

 

Now create a Cloud Datalab instance. The instance name must start with a lowercase letter, followed by up to 63 lowercase letters, digits, or hyphens.

Warning! The instance name cannot end with a hyphen.

datalab create name_of_your_instance

 

Follow the instructions. Choose a geographic zone close to where you are. You will be asked for a passphrase in order to generate a public SSH key.

Warning! If the process takes too long or fails, feel free to restart the instance creation; connection problems are frequent during this step.

If you go back to your GCP dashboard and click the three bars at the top left, then Compute Engine, your instance appears as active.

Finally, when the connection to the datalab is established, open the Cloud Datalab homepage in your browser at the following link: http://localhost:8081

You now have access to a Jupyter notebook. Create a new notebook by clicking the '+' symbol to the left of Notebook.

Note: to reconnect to an instance after exiting:

datalab connect name_of_your_instance

 

We now offer a micro-tutorial introducing TensorFlow on the Google Cloud Platform.

 

 

My first program in TensorFlow on GCP

 

1) Hello world!

 

Type the following commands in your Jupyter notebook. To run a cell, press Shift + Enter.

 import tensorflow as tf  # Import the TensorFlow library

 hello = tf.constant('Hello!')  # Define a constant hello containing the string 'Hello!'

 

If you type hello in your notebook, you get

           <tf.Tensor 'Const:0' shape=() dtype=string>

 

This is normal: a fundamental TensorFlow brick discussed earlier in this article is missing, the session, created via tf.Session().

session = tf.Session()  # Create a session

A warning message may appear; you can ignore it.

session.run(hello)  # Run the session

'Hello!'

This small example introduces two essential elements of any TensorFlow program: the Session object and its run() method.

 

2) Basic mathematical operations

 

x = tf.constant(3)  # Define the constant x = 3

y = tf.constant(2)  # Define the constant y = 2

X = tf.constant([1, 0], shape=(2, 1))  # Define the constant vector X

M = tf.constant([1, 1, 2, 2], shape=(2, 2))  # Define the constant matrix M

result_1 = tf.add(x, y)  # Addition

result_2 = tf.multiply(x, y)  # Multiplication

result_3 = tf.matmul(M, X)  # Matrix multiplication

session = tf.Session()  # Create a session

session.run(result_1)  # Run the session

session.run(result_2)

session.run(result_3)

As before, do not forget to run the session!

 

3) Variables and initialization

 

In TensorFlow, variables are defined and manipulated via tf.Variable().

x = tf.Variable(0)

A variable represents a tensor whose value can change by running a calculation.

x = tf.constant(0)

y = tf.Variable(x + 1)

Note: variables must be initialized before the graph can use them. The function tf.global_variables_initializer() is used for this purpose.

A small example of a program:

import tensorflow as tf

x = tf.constant(0)

y = tf.Variable(x + 1)

initialization = tf.global_variables_initializer()  # Initialize the variables

with tf.Session() as session:
    session.run(initialization)
    print(session.run(y))
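
To see a variable's value actually change by running a computation, here is a small complementary sketch of our own, using the standard tf.assign operation:

 import tensorflow as tf

 counter = tf.Variable(0)                     # a hypothetical counter variable
 increment = tf.assign(counter, counter + 1)  # operation that updates the variable

 with tf.Session() as session:
     session.run(tf.global_variables_initializer())
     for _ in range(3):
         print(session.run(increment))        # prints 1, 2, 3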

 

Example of use: a Variable can be used to hold the weights w of a neural network, and is thus updated during the training of the model.
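
As a hedged illustration of that idea (the data and the one-weight model below are invented for the example), an optimizer updates the weight variable at each run of the training operation:

 import tensorflow as tf

 w = tf.Variable(0.0)                         # hypothetical weight to learn
 x = tf.constant([1.0, 2.0, 3.0])
 y = tf.constant([2.0, 4.0, 6.0])             # invented data following y = 2x
 loss = tf.reduce_mean(tf.square(w * x - y))  # mean squared error
 train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

 with tf.Session() as session:
     session.run(tf.global_variables_initializer())
     for _ in range(100):
         session.run(train_op)                # each run updates w
     print(session.run(w))                    # close to 2.0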

 

4) Placeholders

A Placeholder is a tensor created via the tf.placeholder() method. When it is created, no precise value is assigned to the Placeholder; we only specify the type of the data and its dimensions.

x = tf.placeholder(tf.float32, shape=(1024, 1024))  # create placeholder x of type float32 and dimensions (1024, 1024)

A Placeholder can be seen as a Variable that only receives its data later in the program, during the execution phase. Its value is supplied when a computation is run.

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(1024, 1024))  # create the placeholder x

y = tf.matmul(x, x)  # matrix product that will consume x

with tf.Session() as sess:
    rand_array = np.random.rand(1024, 1024)  # define rand_array
    print(sess.run(y, feed_dict={x: rand_array}))  # give x the value rand_array during execution

 

Example of use: at each iteration of the training of a neural network, a Placeholder is used to feed the model with a new batch of images.
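
Schematically, such a feeding loop might look like the following sketch (the shapes, the random batches, and the toy model are invented for illustration):

 import numpy as np
 import tensorflow as tf

 images = tf.placeholder(tf.float32, shape=(None, 28, 28))  # None: batch size chosen at run time
 flat = tf.reshape(images, [-1, 28 * 28])
 logits = tf.layers.dense(flat, 10)                         # toy model standing in for a real network

 with tf.Session() as session:
     session.run(tf.global_variables_initializer())
     for _ in range(5):                                     # one feed per training iteration
         batch = np.random.rand(32, 28, 28)                 # invented batch of 32 "images"
         session.run(logits, feed_dict={images: batch})     # evaluate the model on the new batch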

 

Conclusion

You now have a credit of €250 on GCP and you know how to access the datalab from your computer. Finally, you have written your first lines of TensorFlow code and you have the basics needed to understand more advanced programs written with this library.

To go further, you will find in the GCP datalab more complex examples of TensorFlow applications, as well as other machine learning tutorials.

Enjoy your exploration!

 

References and sources

https://www.tensorflow.org/

https://github.com/tensorflow/tensorflow

https://cloud.google.com/

https://cloud.google.com/datalab/docs/quickstart

https://cloud.google.com/solutions/running-distributed-tensorflow-on-compute-engine

https://www.nvidia.fr/data-center/gpu-accelerated-applications/tensorflow/

https://blog.xebia.fr/2017/03/01/tensorflow-deep-learning-episode-1-introduction/

https://learningtensorflow.com/lesson2/

 

