Data Science applied to the retail industry: 10 essential use cases

Using Big Data in retail brings you a competitive advantage

By Youssef Bennani, Senior Data Scientist


Data Science is having a growing impact on business models in every industry, including retail. According to IBM, 62% of retailers say that using Big Data techniques gives them a serious competitive advantage. Thanks to data science, knowing what your customers want, and when they want it, is now within reach. You just need the right tools and the right processes. This article presents 10 essential applications of data science in retail.


Offering a smarter customer experience

Consumers expect companies to anticipate their needs, to stock the products they want and to communicate with them in real time. To meet these expectations, Mall of America, the second largest US shopping complex, worked with IBM to provide a chatbot named E.L.F. The chatbot guides visitors through the vast complex by creating personalized shopping itineraries and tailoring the customer experience to each visitor's needs.

This personalization also means offering incentives at the point of sale, such as loyalty rewards or promotions. The personal data collected also enables companies to send targeted direct mail to customers to boost sales.

With these kinds of techniques, retailers can combine customer data with inventory levels or price promotion data to determine which products to sell in each store. This ensures that the products on offer match the shopping habits of the customers at each site.


Using social media to forecast trends

As a retailer, if you do not listen to social media, you are missing a lot of free and potentially valuable information that can help you spot trends.

Nordstrom, a luxury retailer, has mastered the use of Big Data to merge online and offline experiences. For example, the Nordstrom marketing team follows Pinterest, Instagram and Twitter to identify the most popular products. They then use this data to promote the "right" products in their physical stores. In addition, Nordstrom has installed interactive touch screens in its fitting rooms so that customers can order products and check inventory online.

This kind of social media monitoring relies mainly on unstructured data. Using Natural Language Processing (NLP) to extract information from social media, and machine learning to make sense of it, can give a business an edge over the competition. However, you must strike the right balance: use this kind of data to earn customers' loyalty while respecting their privacy.
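As a rough illustration of extracting a trend signal from unstructured posts, the sketch below counts how many posts mention each tracked product. It is a deliberately simple keyword-matching stand-in for real NLP, not a description of what Nordstrom does; the posts, the product list and the function name mention_counts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample of public posts (illustrative data only).
posts = [
    "Loving my new backpack!",
    "That backpack is everywhere this fall",
    "Trench coat season is back",
    "Need a trench coat and a backpack",
]

# Hypothetical list of products the retailer wants to track.
tracked_products = ["backpack", "trench coat", "sandals"]

def mention_counts(posts, products):
    """Count the number of posts that mention each product at least once."""
    counts = Counter()
    for post in posts:
        # Normalize: lowercase and keep only alphanumeric tokens.
        text = " ".join(re.findall(r"[a-z0-9]+", post.lower()))
        for product in products:
            if re.search(rf"\b{re.escape(product)}\b", text):
                counts[product] += 1
    return counts

counts = mention_counts(posts, tracked_products)
print(counts.most_common())
```

A production system would replace the keyword match with proper entity recognition and would only process data that customers have agreed to share.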


Implementing augmented reality

Since 2010, TopShop, a multinational clothing company, has been experimenting with new technologies to integrate augmented reality into its shopping experience. Flagship stores have virtual fitting rooms where customers can choose clothes and see on a screen how they would look wearing them. This saves customers the time and effort of trying the clothes on.

The Swedish furniture giant IKEA first introduced image recognition and augmented reality with its 2013 catalog. Customers could browse the catalog with their mobile devices and highlight the products they were interested in; the brand then offered personalized digital content and reviews to inform their purchase. With the image recognition technology, customers could also scan catalog items and virtually place them in their own home to see how they would look. They could then select the colors and sizes that worked best in the space without having to go out and buy the product. This allowed catalog readers to make informed purchases, increasing customer satisfaction and reducing the number of items returned.


Boosting recommendation engines

It has been reported that more than 35% of all Amazon sales are generated by its recommendation engine. The principle is fairly simple: recommendations for other products are generated automatically from a user's purchase history, the items already in their shopping cart, the items they have viewed or liked in the past, and what other customers have viewed or bought recently.

Recommendation is one of the classic use cases of data science in retail. Training machine learning models on historical data can produce accurate and effective recommendations.
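To make the principle concrete, here is a minimal sketch of an item-to-item recommender based on co-occurrence counts. It is not Amazon's algorithm, only one simple way to turn purchase histories into suggestions; the data and the function names build_cooccurrence and recommend are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase histories: one set of products per customer.
purchase_histories = [
    {"jeans", "belt", "sneakers"},
    {"jeans", "belt"},
    {"jeans", "t-shirt"},
    {"sneakers", "socks"},
    {"belt", "wallet"},
]

def build_cooccurrence(histories):
    """Count how often each pair of products appears in the same history."""
    cooccurrence = defaultdict(Counter)
    for items in histories:
        for a, b in combinations(sorted(items), 2):
            cooccurrence[a][b] += 1
            cooccurrence[b][a] += 1
    return cooccurrence

def recommend(cart, cooccurrence, top_n=3):
    """Rank products not in the cart by total co-occurrence with cart items."""
    scores = Counter()
    for item in cart:
        for other, count in cooccurrence[item].items():
            if other not in cart:
                scores[other] += count
    # Sort by score (descending), then by name, so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_n]]

cooccurrence = build_cooccurrence(purchase_histories)
recommendations = recommend({"jeans"}, cooccurrence)
print(recommendations)  # ['belt', 'sneakers', 't-shirt']
```

Real engines typically add signals such as views, ratings and recency, and normalize the counts so that best-sellers do not dominate every list.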


Analyzing the Path to Purchase

Analyzing the way a customer came to make a purchase is another retail tool that can be improved by Data Science.

While marketers have been studying Path to Purchase techniques for many years, the advent of Data Science enables them to make the most of this type of analysis. The rise of multichannel marketing in retail and omnichannel sales has created a lot of different paths that customers can follow to buy a product.

Machine learning tools can help understand customers’ buying habits and focus on what exactly works in the real world.
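As a starting point before any machine learning, the paths themselves can simply be counted. The sketch below computes a conversion rate for each observed sequence of touchpoints; the journeys, the channel names and the function path_conversion_rates are hypothetical.

```python
from collections import Counter

# Hypothetical customer journeys: ordered touchpoints, plus whether
# the journey ended in a purchase.
journeys = [
    (("email", "website", "store"), True),
    (("social", "website"), True),
    (("email", "website", "store"), True),
    (("social", "website"), False),
    (("search", "website"), False),
]

def path_conversion_rates(journeys):
    """Return the share of journeys that ended in a purchase, per path."""
    seen = Counter()
    converted = Counter()
    for path, purchased in journeys:
        seen[path] += 1
        if purchased:
            converted[path] += 1
    return {path: converted[path] / seen[path] for path in seen}

rates = path_conversion_rates(journeys)
for path, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(" > ".join(path), f"{rate:.0%}")
```

With enough journeys, the same table can feed an attribution or classification model that estimates how much each touchpoint contributes to a sale.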


Analyzing purchase receipts

Market basket analysis is a standard technique used by retailers to determine which groups of products customers are likely to buy together. It is a long-established business process, but Data Science now makes it possible to automate it.

Ever-increasing data storage capacity makes it possible to analyze a larger volume of receipts and therefore to have more confidence in the results.
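The usual measures behind market basket analysis are support (how often a pair appears), confidence (how often the second product appears when the first does) and lift (how much more often than chance). The sketch below computes them for product pairs; the baskets, the 0.4 support threshold and the function name pair_rules are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: one set of products per purchase.
receipts = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def pair_rules(receipts, min_support=0.4):
    """Return (antecedent, consequent, support, confidence, lift) for frequent pairs."""
    n = len(receipts)
    item_counts = Counter(item for r in receipts for item in r)
    pair_counts = Counter(
        pair for r in receipts for pair in combinations(sorted(r), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # Pair is too rare to be trusted.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            lift = confidence / (item_counts[consequent] / n)
            rules.append((antecedent, consequent, support, confidence, lift))
    return rules

rules = pair_rules(receipts)
for antecedent, consequent, support, confidence, lift in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two products are bought together more often than their individual popularity would explain. With larger volumes of data, the same idea is usually implemented with algorithms such as Apriori or FP-Growth, which also handle groups of more than two products.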


Managing real estate

Data science can also help large retailers optimize their real estate management spending. For example, analyzing data from a building's equipment can help prevent catastrophic failures. Applying machine learning to this historical data for predictive maintenance yields models that improve over time while reducing the associated costs. Retailers can also save a lot of money by using Data Science to analyze their energy consumption. In this context, Data Science helps not only to establish a budget, but also to identify improvements in particularly energy-intensive properties such as shopping malls.


Optimizing prices

Having the right price on a product can make the difference between making a sale and losing a customer. But what is the right price? Retailers who approach this issue with Big Data tools may have an advantage over those who do not.

In many cases, setting the right price requires knowing what your competitors are charging. These data can be collected electronically using algorithms that explore competitors’ websites for detailed product price information.


Optimizing inventory

Accelerating product life cycles and increasingly complex operations are forcing retailers to use Data Science to understand their supply chains and distribute products optimally.

Optimizing inventory affects many aspects of the supply chain and often requires close coordination between manufacturers and distributors. Retailers are increasingly looking to improve product availability while increasing store profitability, in order to gain a competitive advantage and drive better business performance.

This is possible thanks to shipping algorithms that determine which products to stock by taking into account external data such as macroeconomic conditions, climate data and social data. Servers, factory machines, customer-owned devices, and energy network infrastructure are all examples of valuable data sources.


Detecting fraud

Fraud is a huge problem in retail, and large sums are lost to it each year.

Data science can help retailers create reference sales forecasts for each product. If a product's actual sales deviate significantly from the forecast range, it could indicate some "fishy" activity.
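One simple way to express "deviates significantly" is a z-score against a historical baseline. The sketch below uses the mean of past daily sales as a stand-in for the reference forecast and flags products whose sales today are more than three standard deviations away; the figures, the threshold and the function name flag_deviations are assumptions, and a flag only means the case deserves a closer look.

```python
from statistics import mean, stdev

# Hypothetical daily unit sales per product over the past week.
history = {
    "gift_card": [20, 22, 19, 21, 20, 23, 21],
    "headphones": [5, 6, 4, 5, 6, 5, 4],
}

# Hypothetical sales observed today.
today = {"gift_card": 58, "headphones": 6}

def flag_deviations(history, observed, threshold=3.0):
    """Return products whose observed sales are far from their baseline."""
    flagged = []
    for product, past in history.items():
        baseline = mean(past)   # Stand-in for the reference forecast.
        spread = stdev(past)    # Sample standard deviation of past sales.
        if spread == 0:
            continue            # No variation observed: z-score undefined.
        z_score = (observed[product] - baseline) / spread
        if abs(z_score) > threshold:
            flagged.append(product)
    return flagged

flagged = flag_deviations(history, today)
print(flagged)  # ['gift_card']
```

In practice the baseline would come from a proper forecasting model that accounts for seasonality and promotions, so that a legitimate sales peak is not mistaken for fraud.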

Fraud committed by employees can be difficult to stop. But with the power of Data Science, controllers may be able to create more transparency in internal activities.




These innovative uses of Data Science genuinely improve the customer experience and have the potential to boost retail sales. The benefits are many: better risk management, improved performance and the ability to uncover information that would otherwise have remained hidden.

Most retailers are already using Data Science solutions to increase customer loyalty, enhance brand awareness, and improve developer ratings. As technology continues to advance, one thing is certain: Data Science still has a lot to offer in the world of retail!
