
5 principles to integrate data science and AI into product development


Fabian | 9 Aug 2019

Data science and AI are a big thing right now. In this article I want to present some principles for integrating that “magic” into your product development process.

When you know and understand your data

Back in the days when I was a young, highly motivated data mining consultant, I had an assignment at a large pharmaceutical company. I was seated in the IT department, eager to deliver real value to the research departments. One day it happened: I was told that I would have the chance to talk to a renowned chemist, the head of the ADME department. I prepared myself and created some basic machine learning models as a starting point. To do that, I also visualized the data in a scatterplot.

Then the day arrived, and I sat down with this extremely experienced and successful research expert. I wanted to quickly show him which data I had loaded, so I showed him the scatterplot. It completely captivated him. We sat together for two hours while he told me exactly which data to show and which dimensions to combine, and had me highlight and inspect certain data points. When he left, he had a list of molecules he wanted his team to look at. And he told me: “I have never seen my data like this before.” That experience deeply impressed me and has shaped my attitude towards data science to this day. He was not interested in complicated and advanced machine learning algorithms. For him it was enough to look at the data, because he knew and thoroughly understood his data.

As outlined in my previous post, there are two modes in product management: one for operating a product and one for developing a new one. In this post I will present some principles for the innovation mode.

1. Start with the user need

Always start with the user need. We have been preaching this for years now, and I have the impression that with the hype about AI people have forgotten about it. People are searching desperately for opportunities to apply AI regardless of the user value. This won’t work. Let me give you an example of why. We are currently working on hotel recommendations based on style or taste. Think of the last time you planned a trip and searched for a hotel. You enter the relevant filter criteria such as stars, budget, close to the beach etc. and end up with a list of hundreds of hotels, and you have to go through many of them just to find that hotel number 27 is interesting. How nice would it be if you could now say: show me more hotels like this one in terms of style. And that’s exactly what we are working on right now. When you think about how you would design the interface, you have two different options. Either you go for a Tinder-style app where the user swipes left or right to indicate which hotels they like or dislike, or the user enters their three favorite hotels. Believe it or not: the different UX approaches have deep implications for the data science part, such as which algorithm to use and how to train the model.
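To make the link between UX and data science a bit more tangible, here is a minimal sketch of the “three favorite hotels” variant. It is not the approach we actually use. It assumes that every hotel is already described by a numeric style vector (the hotel names and numbers below are made up for illustration) and it ranks the remaining hotels by cosine similarity to the average of the user’s favorites.

```python
import math

# Hypothetical style vectors, invented for illustration only.
# In practice they might be derived from images or descriptions.
HOTELS = {
    "Hotel A": [0.9, 0.1, 0.3],
    "Hotel B": [0.8, 0.2, 0.4],
    "Hotel C": [0.1, 0.9, 0.7],
    "Hotel D": [0.2, 0.8, 0.6],
}


def cosine(u, v):
    """Cosine similarity of two equally long vectors (0.0 if one is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def more_like(favorites, hotels=HOTELS, top_n=2):
    """Rank hotels that are not favorites by similarity to the mean favorite vector."""
    dims = len(next(iter(hotels.values())))
    profile = [
        sum(hotels[name][i] for name in favorites) / len(favorites)
        for i in range(dims)
    ]
    candidates = [
        (name, cosine(profile, vec))
        for name, vec in hotels.items()
        if name not in favorites
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in candidates[:top_n]]


print(more_like(["Hotel A"]))  # hotels most similar in style to Hotel A
```

A swipe interface would collect likes and dislikes instead of a few favorites, which would probably point towards a model trained on positive and negative examples rather than a pure similarity search.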

For advanced users: you may start by exploring your data or evaluating AI possibilities and identify the user value later on. But I wouldn’t recommend that for your first data science or AI project. All too easily, the technical capabilities can lure you into a mode where you want the user to adapt to them, and where you ignore simpler solutions just because you wanted to adopt AI so much.

Hence, always start with the user. Talk to them, understand their preferences, needs and values. Then ideate about solutions and see whether you need data science or not. And involve your data scientists, align them with the user needs and give them an understanding of the context and goals.

2. Machines learn from data. Data first.

Machines learn from data. That is how machine learning works. You provide a set of examples and the machine tries to generalize from that. The more examples – the better the results. The data where you provide the correct answer/solution is called training data or labeled data.

Another example is our internal product TiO. Inspired by a project together with Langenscheidt, we came up with the idea of supporting language learners, because once you reach a certain level, no one corrects you anymore and you stop improving. Hiring an individual language coach is expensive, so we wanted to build an AI-enabled language coach. It turned out that this wasn’t easy at all. The existing speech APIs only supported native speakers. We could always reproduce the excellent recognition rates with native speakers, but the models always failed when we tried to recognize non-native speakers. So the algorithms themselves are fine and we could use them, but we would have to train them with enough data from non-native speakers. Hence, we completely shifted our focus from building the product to collecting the necessary data.

Think about which data you have, and how you can get or generate the data you are missing. A concept for how to get training data is key for every data-driven product.

3. Create a baseline model

Part of the job of a data scientist is science. They are constantly catching up with the latest research, reading papers, etc. That’s important. However, this leads to a bias towards more complex and sophisticated solutions. It is your job as a product manager to ensure that you start with a simple, comprehensible baseline model and add complexity only step by step.

Another example, this time a recommendation for new cars. As cars are not bought that frequently, an “other customers also bought” model (aka collaborative filtering) is not appropriate. So we looked for different data sources. One of them was articles and test reviews of cars. Our hypothesis was that we could capture the relationships between makes and models by analyzing the reviews and articles. We had one more complex model using word embeddings and a simple one which only counts the co-occurrences of makes in the same article. The baseline model performed better on several test examples and captured relationships that the naive word embedding approach missed.
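To give an idea of how little such a counting baseline needs, here is a minimal sketch. It is not our original implementation: the list of makes, the example articles and the very simple word matching are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical list of makes; a real system needs a fuller list and better matching.
MAKES = ["audi", "bmw", "mercedes", "toyota", "honda"]


def makes_in(article):
    """Return the sorted makes that appear as whole words in an article."""
    words = set(article.lower().replace(",", " ").replace(".", " ").split())
    return sorted(make for make in MAKES if make in words)


def cooccurrences(articles):
    """Count how often each pair of makes is mentioned in the same article."""
    counts = Counter()
    for article in articles:
        for pair in combinations(makes_in(article), 2):
            counts[pair] += 1
    return counts


def related(make, counts, top_n=3):
    """Return the makes most often mentioned together with the given make."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == make:
            scores[b] += n
        elif b == make:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]


# Made-up example articles.
articles = [
    "The Audi and the BMW compete with Mercedes.",
    "BMW versus Audi in our test.",
    "Toyota and Honda lead on reliability.",
]
counts = cooccurrences(articles)
print(related("audi", counts))  # makes mentioned together with Audi, most frequent first
```

Because every number in the output can be traced back to specific articles, a model like this is easy to explain and easy to debug, which is exactly what you want from a baseline.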

Always start with the easiest way to solve the problem, e.g. counting, linear regression, etc. By doing so you learn about the data and about the problem you want to solve, and when you later add more sophisticated models you already have an understanding of the output and how to improve it.

4. Collaborate across disciplines

No product manager would expect a developer to sit down and implement features that create user value without a specification. Why would you expect that from a data scientist? The funny thing is that some companies do exactly that: they hire data scientists, put them in a room and expect them to create user or business value. It is important to align your data scientists with the user needs and the business roadmap. Extend your knowledge about data science and about the problems and requirements of training and deploying a model. Showing them how you understand the model and interpret the outcome helps them explain their approach and data science principles better.

It is your job as a product manager to understand your data and the basic principles of data science, and to align your data science team with your users’ needs and business goals.

5. Know and understand your data

Imagine the following situation:

  • You come to the office on a Monday morning.
  • You get into the elevator and your boss steps in.
  • The door closes. Then your boss says: “How does your AI feature work?”

Can you imagine why this can be a stressful situation?

Let’s assume you achieved 87% accuracy.

  • Is that good and sufficient? Is it on par with the benchmark for that application, or is it rather poor performance?
  • Do you know where your model makes mistakes and why?
  • What can you do in order to improve it?
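A first step towards answering these questions can be sketched in a few lines. The labels and predictions below are made up for illustration: a data set with 90 examples of class "a" and 10 of class "b", and a model that gets 87 of the 100 right. The sketch compares the model with the trivial baseline of always predicting the most frequent class and counts the mistakes per class.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most frequent label."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


def errors_by_class(y_true, y_pred):
    """Number of misclassified examples per true label."""
    return Counter(t for t, p in zip(y_true, y_pred) if t != p)


# Made-up data: 90 examples of class "a", 10 of class "b".
y_true = ["a"] * 90 + ["b"] * 10
# Made-up predictions: 85 "a" right, 5 "a" wrong, 2 "b" right, 8 "b" wrong.
y_pred = ["a"] * 85 + ["b"] * 5 + ["b"] * 2 + ["a"] * 8

print(accuracy(y_true, y_pred))            # 0.87
print(majority_baseline_accuracy(y_true))  # 0.9
print(errors_by_class(y_true, y_pred))     # mistakes per true class
```

In this invented case the 87% is worse than simply always answering "a", and the model misses 8 of the 10 "b" examples. Whether that matters depends on your application, but you can only judge it if you know the distribution of your data.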

In order to prevent this kind of situation, you have to know and understand your data!

Dig into your data, explore it, visualize it, understand it. Then start with a simple and small approach (a baseline model) that you completely understand. Add complexity step by step and try to beat your baseline model. And of course, work together with your data scientists: they might have some ideas about how to improve the model.


There are reasons why AI is such a hyped topic nowadays. However, it is rooted in a data-driven culture. In order to utilize the power of AI for your product, you have to do your homework first and operate your product based on data, as described in the first article of this series. Once you have established data thinking in your company, you may want to start with more advanced uses of data such as data science or machine learning. If you do so, let the principles presented here guide you in order to make it a success.
