Successfully completing a data project: a path still strewn with pitfalls

Focus on the recurrent pitfalls to avoid in data projects.

By Pierre Farès Digital Offering & Business Transformation Officer @pierrefares


Companies need to create innovative products adapted to their customers much more quickly and much more regularly. These days this is the main way of retaining or increasing market share in a constantly changing economic context. In-depth analysis of market data and consumer behaviour makes it possible to anticipate future requirements. By focusing on identifying future uses, whether to meet customer expectations or to disrupt the market, a company can make better operational decisions, define its strategy and position itself in its market segment. Data analysis is thus becoming, more than ever, a major driver of growth. In 2020, corporate investment in data projects is expected to exceed 203 billion dollars worldwide. But at a time when many companies claim to be data-driven, lots of data projects end in failure. Yet most of these failures are avoidable and due to well-known causes! Here is a look at the recurrent pitfalls to avoid.



An approach focused exclusively on the technology

Many data projects concentrate on implementing technology solutions (Data Lake, installation of Hadoop clusters, use of a NoSQL database etc.) without considering their purpose, in other words the requirements they must meet or the uses to which they will be put. The investment is in fact focused on IT rather than on the operational departments, and provides few business benefits. Such projects are not entirely a waste of money, as they at least help develop the technology skills of the IT and IS teams, but they provide little value at company level.


Results built on non-industrialisable models

Data Scientists have the required knowledge of models (predictive analysis, machine learning etc.) but usually have little experience of software development, especially in an industrial environment. The scripts they provide are often unusable for IT teams. On the IT side, teams are expert in industrialisation but struggle to understand the inner workings and sequencing of the models proposed by the Data Scientists, whose constraints they do not always grasp. As a result, most data initiatives end up with unsuitable results that are hard to use, both for analysing data and for deploying the use cases handled. Designing industrial, automatable methods requires a learning process in which the IT department and the Data Science team both participate. This is a prerequisite for making effective deployment the rule.


A lack of perspective, analysis and preparation of the company

Companies that launch data projects often assume that the models and strategies to be implemented are the same as those they have already encountered in Business Intelligence projects, thinking that in the end only the technology and tools will change. This encourages them to retain the same organisation and the same design methods as on previous projects. Yet data science is a discipline based on a prospective approach, with exploratory and trial-and-error stages. It is impossible to run a project today as we did ten years ago, on the basis of predetermined studies, without a learning phase and without collaboration between the Data Scientists and IT.


Little collaboration due to in-house cultural differences

Companies, especially major groups, currently operate in internal silos. There are few synergies between departments, each of which works according to its own culture and habits. Confusion can arise when the operational departments, the data science team and IT are working towards distinct and mutually exclusive objectives. Companies lack maturity with respect to new approaches and working methods, both in how work is organised and in how data is handled. To remove this persistent brake on innovation, the various teams must learn to work together more.


An organisation awaiting mobilisation

Despite placing high expectations on data projects, top management is still scarcely involved in them. Yet the success of data initiatives also depends on appropriate commitment from company management. This mobilisation must be reflected throughout the entire structure, from management to operational teams, to ensure that everyone involved is working towards the same objectives. It must be built around a set of common practices capable of developing the skills of every team, from the operational departments to IT.


A complex initiative

A data project is a complex initiative. It requires suitable architectures to handle huge quantities of data (structured and unstructured), advanced statistical techniques (proven algorithms, integration of the predictive dimension, training) and access to multiple sources of information (stored and managed in-house or externally, or shared and accessible via initiatives such as Open Data or Open API).


But the key to success lies elsewhere, in the structuring of the approach. The only way to avoid the usual pitfalls is to bring them to light upstream, by completing each of the various stages (concept, framing of use cases, trialling) within defined timescales and with precise objectives. Completing these three stages will prove the project’s business benefits before it enters the industrialisation phase, giving it the best chance of success!



