“I can’t see a future without greater use of machine learning. Customers want better, personalised experiences - faster. You can’t achieve that by just hiring more people.”
Retail is making the transition from focusing on ‘the average consumer’ to obsessing over the individual customer. This demands a shift in thinking from bi-annual pricing, stocking and distribution decisions to real-time decisions personalised to every single unique customer. Of course, achieving this kind of scale hasn’t been easy.
There are few experts who understand this challenge better than Gabriel Straub, Chief Data Scientist at Ocado Technology. Gabriel’s role is to help Ocado find new ways of using customer data and machine learning models to successfully automate decisions at the segment of one. These models then need to be flexible enough to be scaled across a massive global customer base of different people, preferences and priorities.
We recently sat down with Gabriel to discuss his journey at Ocado, his approach to managing large and complex development teams, and the value machine learning delivers for customers and retailers.
An introduction to Gabriel Straub
Gabriel joined Ocado Technology in 2020 as Chief Data Scientist, bringing over 10 years’ experience leading data science teams and helping organisations realise the value of their data. At Ocado Technology, his role is to help the organisation take advantage of data and machine learning to better serve retail partners and their customers.
Before joining Ocado, Gabriel was Head of Data Science at the BBC, Data Director at notonthehighstreet.com, and Head of Data Science at Tesco. He has also advised start-ups and VCs on data and machine learning strategies. During this time, Gabriel built up his experience of helping businesses solve complex challenges by scaling their use of data and AI technology.
Gabriel is a guest lecturer at London Business School and an Honorary Senior Research Associate at UCL. He holds an MA in Mathematics from Cambridge and an MBA from London Business School.
An introduction to Ocado Technology
Ocado Group is a UK-based technology company running the world’s largest dedicated online supermarket, with over 639,000 active customers. It has spent 20 years innovating online grocery and investing in a wide technology estate. Its aim is to provide customers with the best end-to-end grocery shopping experience, founded on sustainability and unbeaten service.
Ocado Technology supports this goal by developing world-class systems and solutions in the areas of automation, robotics, AI, machine learning, simulation, big data and more. Ocado Technology is the creator of the Ocado Smart Platform (OSP) - the unique end-to-end ecommerce, fulfilment, and logistics platform which allows its customers across the world to do online grocery scalably, sustainably, and profitably.
Ed Challis (00:08): (Music) Hi, Gabriel, thanks so much for joining me today and contributing to this interview series, what we're calling the Pioneer Series. We set up these conversations because we felt that this field of data, machine learning, AI, is still relatively new, and more than being new, it is evolving and changing really, really fast. And as a result of all of that change, there's no manual or set pieces or accepted practice about how businesses and organizations adopt this technology, embed it, get the most value they can from it in a safe way, and all of those concerns. And so we want to talk to people that have been at the forefront of that field, adopting the technology, and solving these hard challenges. So it's a real pleasure to have you on the series today, you're currently chief data scientist at Ocado Technology.
Ed Challis (01:13): So Ocado Technology is a platform which enables really scalable grocery deliveries, from e-commerce to the point of delivery. I'm a customer. And from a data perspective, from a technology perspective, I don't think you could have a more interesting stack. Taking it from acquiring customers, to the customer journey, customer experience, e-commerce, but then beyond that whole e-commerce thing, it's really interesting in terms of how you're fulfilling that. You've got robots, computer vision, a very interesting logistical set of challenges, operational set of challenges, so super interesting work I would imagine. So it's really great to have you here today. I know you started off studying maths at university and went into management consulting, but how did you get into data? How did you get into machine learning?
Gabriel Straub (02:20): Thanks, Ed, for having me, I'm really looking forward to our conversation. I have to be honest, I almost fell into it a bit by coincidence. I worked as a management consultant, as you said, for a couple of years in the Middle East and in Germany, and decided to come back to the UK and do an MBA. And doing my MBA I got on really well with one of my professors who was doing operations research, and he introduced me to someone who had been building operations research stuff and forecasting algorithms and something like that for Tesco.
Gabriel Straub (02:50): And at that point, they were starting to build a new program to do demand forecasting and order optimization for general merchandise, and they were looking for someone who could help build up their team and do the translation between the maths, so the algorithmic side, and the business side of it, and it sounded like something really interesting to me. I wanted to do something that was a bit more technological and a bit more practical, if you want, after my consulting times. And, yeah, so I ended up more or less coincidentally working at Tesco, and continued building a team there for a couple of years and extending what the team was doing from demand forecasting to lots of applications over time.
Ed Challis (03:31): Wow. What year was that? Because Tesco was really at the forefront, they invented many of the early... it was my understanding they invented the loyalty card, they were really the vanguard of data in terms of driving efficiencies and better customer experiences.
Gabriel Straub (03:50): Yeah. So Clubcard started before Google was formed and in the '90s already Tesco did a pretty good job with demand forecasting, especially on groceries. They had done less investment into general merchandise, so that's where I came in. That was about 10 years ago. It's quite a while now.
Ed Challis (04:05): So general merchandise was... What was the data challenge there?
Gabriel Straub (04:10): General merchandise is products that have a longer lead time. You have to bring in new TVs and electricals most of the time from China, so it's about six months. They sell significantly less than a can of baked beans. You might sell one or two TVs in a big store, and that means that your patterns are a little bit different. So you need to be able to forecast lower demand products, but you also need to be able to deal with a much longer lead time. So that was the big distinction compared to grocery items.
Ed Challis (04:36): Okay, right. So it was a really different, high value and high costs, I would imagine, but slightly harder to model statistically.
Gabriel Straub (04:46): Yeah, exactly. There are trade-offs, but on the other hand, they don't go off. So it's not like a can of baked beans that you can only sell... Well, actually that's a bad example. It's not like a salad that you can only sell for a couple of days. It stays good much longer, but then at some point the trend changes and you might not be able to sell last year's iPhone quite as well as this year's, for example.
Ed Challis (05:03): Okay. And was that the first production... I was always interested, what was the first time you got a machine learning or a data science system into production? Was that the problem for you?
Gabriel Straub (05:16): Yeah, we were forecasting about 40,000 products across hundreds of Tesco stores up to six months in advance. So it was quite a complicated system from a forecasting perspective alone, and then it was a multi-echelon optimization problem. Tesco's stores get their demand from a distribution center, so you model the demand on the store, that then gets contributed back to the distribution center. And then you try and figure out when are you going to order, from which supplier, how many cases? So it was a really, really interesting way to learn about production machine learning at scale.
Ed Challis (05:53): Well, yeah. And were there any mega gotchas or something that you... It's interesting you came into it through operations research, I guess, because that is so practical, but a lot of people come into this from stats or maths backgrounds or computer science backgrounds. And when you encounter your first real life problem, you're like, "Oh, this is thorny." There's so many more constraints and real life considerations. I don't know. Is that an experience that you had there?
Gabriel Straub (06:26): Yeah, I think what was interesting is 10 years ago was about the time when neural networks were starting to slowly resurface, but not yet properly. It was I think more one of the troughs in the deep learning time. And so while we looked at that stuff, we actually figured out that putting more effort into data manipulation upfront was significantly more valuable. So the machine learning models themselves were fairly simplistic, they were multi-level regression models, but we put a lot of effort upfront into the data aspect of it.
Gabriel Straub (06:58): To be honest though, for me, the bigger things I learned were around how do you run a program of that size? Or how do you organize yourself? Because back then Tesco was still fairly traditional IT, so it was fairly waterfall, so you had to define all of your data requirements upfront. You would then in inverted commas 'throw it over the fence' to the IT department, that was so-called IT back then, who would then deliver on it. You then looked at the data and you realized that the quality wasn't quite there, and stuff like that. So it was actually quite hard to deliver something like this, where you didn't quite know yet what data was valuable and how quality could be measured and stuff like that. So I think I learned probably more about that aspect of it than anything else.
Ed Challis (07:41): Well, that's so interesting. It's that the traditional software lifecycle, that whole way in which software was developed and released in this legacy environment being quite waterfall, just didn't match up really with the requirements of what you needed in this test and learn... Am I hearing you right? You wanted a more iterative framework to put the best model forward, see how it did and iteratively calibrate?
Gabriel Straub (08:11): Yeah. There was a lot of design going up front to say, "Look, this is how we're going to build the system." Then we brought in external help. We were writing all of this bespoke software. So it was actually also really hard to hire people with the right skill sets, because it wasn't necessarily the language that everyone learns at university or that is the most common language used across other industries. So yeah, there was lots of different, interesting challenges that I've definitely since then tried to tackle differently.
Ed Challis (08:40): So did you actually change the way in which software was released? Did you get a model in place which was more agile? Or did you have to get your program of work to fit within that more waterfall approach that was already in place?
Gabriel Straub (08:58): On that project itself we weren't able to change that anymore, it was more for subsequent projects where we were trying to be a bit more... getting something out faster, using more open source software, so switching more towards the Python direction. Hiring people much more based on their ability and attitudes towards being able to solve problems, than necessarily based on specific skill sets related to a specific problem domain. So, yeah, I think it changed over time, but not on that project itself.
Ed Challis (09:26): Yeah, okay. So the two things I'm hearing is, the quality of the data and ensuring that that is... Am I right? The two big lessons, the two most important things were the quality of the data coming into this thing and getting that in a really good shape, and then the second piece is a framework for it to be more agile and iterative, to refine and...
Gabriel Straub (09:54): Yeah, and I think for me this has been one of the biggest learnings throughout my time in this space. I think at the beginning a lot of the focus was on solving a data science problem, and that was hard. But fundamentally now, there's a lot of AutoML tools and all of that stuff, so that has become significantly easier. The second challenge we then had to figure out is how do you solve a data science problem at scale when you have a lot of demand against your service, or you have a lot of data. But even that's getting easier and easier now, because you have tools like TensorFlow or other cloud things that allow you to build this relatively simply. So now the thing that I think is the most difficult is how do you scale your data science organization, and how do you scale your approach to data science? And that's definitely where most of my learning has been and where there have probably been the biggest changes I've implemented over the last 10 years.
Ed Challis (10:43): That's super interesting. That's super interesting. And then I guess from that first experience of getting something live and in production in an industrial, at scale, kind of context, how about today? How's machine learning embedded in the work you're doing today?
Gabriel Straub (11:04): One of the things that, as you mentioned, is really interesting around Ocado Technology is that we have technology across the whole end to end of the grocery e-commerce life cycle. So we use machine learning in the front end to help you select which product is the most appropriate and most exciting for you out of the 50,000 or 60,000 products that are available on the website. We then use machine learning to try and figure out which slots should be available for you, in order to get the vans to you in the most efficient way. It's good from our perspective because it saves costs, but it's good from an environmental perspective because it saves petrol, and stuff like that. We then also use machine learning because we have all these robots that are whizzing around on our grid. And so, the way that those robots move around is based on top of a lot of pretty impressive simulation, but we also have something in place where we use vision systems, et cetera.
Gabriel Straub (11:54): If something does go wrong, which happens every now and then, we can spot where the robot is stranded, and we can generally also spot which robot it is, and then that makes the engineering part of it simpler so it reduces the cost of operating quite a complex system. And to be honest, that for me was one of the key things that I found exciting about Ocado Technology, that diverse use of machine learning that I think is quite unique among retail businesses.
Ed Challis (12:21): Yeah, that was almost going to be what I was going to say next, it's like, how do you perceive the value of what machine learning can bring to an operation, to an organization?
Gabriel Straub (12:33): I think retail has been always more at the forefront of machine learning, I think for two reasons. Retail, it's fairly easy to track, because you have repeat customers, what the value is, as in I can show you that by using a recommendation engine it increases the adds to basket or basket size, et cetera. But grocery retail as well has been super low margin in general, so taking out costs out of the operations has been always super easy to then make an argument for. And so from that perspective, for a lot of the stuff that we do, we can either track the benefit it has on customer retention, higher basket sizes, whatever it is, or we can track it in terms of the costs of the operations and improvements in efficiency in that space. And so that makes ROI tracking in that space almost easier than in some other industries.
Ed Challis (13:24): Yeah, that's cool. You always think, efficiency arguments also... they're not distinct things. If you have more margin that feeds directly into growth. It feeds into growth in multiple ways where it gives you more cash to acquire customers, you can acquire customers better, you can sell your products cheaper than your competitors and therefore also would drive acquisition that way. So is that how traditionally investments in this kind of stuff have been made in your... not just where you are today, but throughout your career, are you making ROI cases based on efficiency saves or do you have sophisticated models that also factor in the growth narrative?
Gabriel Straub (14:22): I think it really depends on the industry, but I think Ocado Technology is interesting because we're a B2B business. So we sell platforms to other retailers around the world. The metrics that we care about on the basic level are fairly simple. We want to make sure that once the customers get to the platform, the experience is as good as possible so that our retail partners can have satisfied customers and high sales. And then there's aspects of the operation that end up being costs to us, and we're trying to minimize that, and there's aspects of the operation that are costs to retailers, and we're trying to minimize that as well. So from that perspective, the ROI cases are fairly straightforward in many cases. I think, as you say, the growing demand is always a little bit more challenging. How do you account for that stuff? Because the demand isn't necessarily profitable in some industries, so I think you need to take both sides into account.
Ed Challis (15:19): And then also, there's cookie cutter applications of machine learning where you know the technology is going to work and then there's just a simple ROI calculation. But then there are these other applications where we have to try, it's true R&D, we have to try, we have to make this investment, we believe that it could work or it will work, but we don't know that it will. That's so often the case in a more exploratory data analysis, that you have to put the investment in with no guaranteed success. Is that something that you encounter a lot or..?
Gabriel Straub (15:57): Yes, I think there's definitely certain areas that we know will always be important to us, and therefore it's easy to invest into that thing. So we know that we have a massive amount of products and we want to make sure that customers find them as quickly as possible, so investing into product discovery probably always makes sense. There's other areas as well where, again, it's a big part of the operational costs, so therefore investing into making those operations better and more efficient makes sense. We might sometimes not know what the right approach to it is, but we will generally know that there's value to be gained there. And so we will maybe at the beginning invest a little bit, try and see, okay, which ones out of these different approaches make sense. And then as we narrow down on a more promising approach, we can then scale up the investment in that space.
Ed Challis (16:39): Yeah, that makes sense. If you have a big block of cost or you know that there's something sizable there, it just makes sense to attack it. You don't know what the exact opportunity will be but you have to invest in it, and as you find things that don't work, you're narrowing down the- [crosstalk 00:16:55]
Gabriel Straub (16:55): Yeah, exactly. I mean, vehicle routing is one of these typical ones. Vehicle routing, you will have to do it. A big part of grocery delivery to people's homes is to bring the stuff to their home, so it makes sense to invest into something like that and make those as efficient as possible, because you know it will always be a cost aspect of your business.
Ed Challis (17:15): You say obviously, but I think that's more a consequence probably of the culture of the firms you've worked at. There's a lot of places where people have huge cost bases and don't apply that analytical rigor or that data led approach to attacking those things. But yeah, that's super interesting. And then, this field is changing so quickly, I've been working in this space I think for 10 years or something, something is different now. I don't know how you think about that. How have your views on machine learning and AI changed over the last two, four years?
Gabriel Straub (17:53): I think there's a few different things that have changed. I think in the past we've talked a lot more about the metrics of success being related to accuracy, and I think that's changed a little bit. I think data science or machine learning was much more academic, so most of the papers you would look at were related to basically training something on a toy data set and trying to beat each other. And I think there's much more of an appreciation that in the real world the constraints might be a bit different. And so actually the metrics that you need to look at are probably not just accuracy, there's probably an aspect of, how easy is it to put this out? How easy is it to change it? There's obviously also much more discussion right now about, what's the responsibility angle in this? Is it going to create a fair result for all of your customers, all the stakeholders and this stuff. So I think that's changed a bit.
Gabriel Straub (18:40): Obviously also, AI has been massively more prominent in literature, et cetera, fictional literature as well as in the press. And I think you now get into this weird world where, while responsibility is definitely a sensible thing for us to think about, AI has almost become like, 'AI is responsible for fake news' and stuff like that, which I think is fundamentally not true. It's business models that are using AI in a way that is responsible for fake news and things like that. And I think getting back to a discussion that's a bit more nuanced around the benefits and the cost of machine learning, but again, also talking about the benefits, is the other thing that I think we're going to see over the coming years as well.
Gabriel Straub (19:19): So it was like, "AI's all amazing." Then there was a time of, "AI's really scary and it's going to destroy our world." And I think now we're getting towards this world where it's like, "Well, AI like any other technology has its pros and cons and you need to know how to use it, and you need to make sure that you have the right risk management in place, but then actually it can be super beneficial."
Ed Challis (19:37): Yeah. And that accuracy thing, it's so true. I don't think we are over that hump fully. There are people that are really at the forefront of implementing these things and they know... It's almost like thinking about a car only in terms of what's its max speed. It's like, that is not how you buy a car. And if you did, you'd end up with some sort of Frankenstein that you could never ever use. So, so interesting. [crosstalk 00:20:04]
Gabriel Straub (20:04): Yeah. And I think there's still some work around, how do we get broader stakeholders to understand the differences between traditional software engineering and machine learning-based software engineering as well. Because I think that journey has evolved a lot as well.
Ed Challis (20:19): Evolved a lot, but still so early. I think that, especially in some of these more legacy environments where it is a real waterfall approach, and UAT, dev and test. We operate in environments where UAT can't have production data, so it's like, well that invalidates machine learning, you know what I mean? If you can't have production data in UAT, I don't know. It's like there's a problem there that needs to be worked through.
Gabriel Straub (20:54): Yeah, and testing for machine learning applications in general is an interesting question. How do you do that in the right way? How do you make sure that you test the performance over the long-term? What kind of metrics can you figure out upfront before you put it live? But then also, how do you keep a good understanding of your model drift and data drift and all of that stuff? And I think there's just not yet the platforms in place either. There's obviously a lot of investment by cloud providers to make that easier, but we're nowhere near the maturity that we have with software engineering, more broadly.
Ed Challis (21:25): Yeah. Nowhere near. I guess you've had to build a lot of that stuff yourself from scratch, right?
Gabriel Straub (21:31): Yeah, we have internal teams that are building our own ML platforms and stuff like that. I think we're also in quite a unique space in the way that we develop software, because each of our retailers has their own environments. So there's a bit more complexity in terms of how we deploy software than maybe other organizations that tend to deploy this once and they're done with it. We have to take into account that Japanese customers behave differently to American customers, and that depending on how sophisticated the retailer is and how much data there is already available there, and what kind of constraints they have, the models have to be able to behave slightly differently. But we need to be able to build this once and then ideally deploy them to all the different areas, and still have good quality results. So it's definitely a little bit different to the general way that I've seen machine learning done in previous experiences.
Ed Challis (22:24): Yeah. There are so many constraints like that. Even a single business will typically have those same constraints because of jurisdictional data laws and various sub entities within the parent company. And I guess, do you think there's so much hype around machine learning and AI, you just read about it in the popular press, you hear about it on the radio, it's on the television, it's in your face, in popular culture. Do you think it is in the mainstream in business? A simple question, but how do you react to that?
Gabriel Straub (23:05): I think there's almost two aspects to it. Why is machine learning necessary? And I think fundamentally, if we look at most of the organizations today or the products that they sell, I don't really see a future without much more use of machine learning. Customers want better personalized experiences faster, and fundamentally, at some point you can't staff that up by individuals. You can't hire enough shoppers to decide for you what you should be buying every single week out of a catalog of 60,000 products, and bring it to your doorstep within a couple of hours, that's just fundamentally not possible.
Gabriel Straub (23:40): So I think there's a business requirement to get towards more automation and machine learning purely because of the demand from a customer perspective. And similarly, you have your other organizations within your space, they're likely going to invest into this. It is very likely that at some point you will in some way be competing with big tech, and they have been investing into this space for a long time. So there's the competitive pressure and the customer demand, so I think organizations are more and more waking up to the reality that without machine learning and automation, in some way, it's going to be really, really hard to find your space that allows you to still operate profitably.
Ed Challis (24:27): So what's holding them back? I mean, I totally agree. What's holding them back? Because, as someone who knows what ML can do and knows what ML can't do, you just see so much opportunity. So what do you think is holding a bit faster, more widespread adoption of this technology back?
Gabriel Straub (24:50): I think there's probably been a lot of hype over the last years around like 'AI can solve every problem'. That, I think, in certain times wasn't particularly helpful. We know that there's a lot of AI solutions out there that aren't really delivering on their promises, and there's been a lot of over-promising. And I think probably a few organizations have been burned by having tried something that fundamentally didn't deliver. I think that's one reason. I think the other reason is that actually culturally, that shift is quite significant. And the cultural shift has to happen both on the AI professional side and on the other side of things.
Gabriel Straub (25:25): I remember when I started my career, we were proud of how complicated this stuff was that we were building. We had hired lots of PhDs and stuff like that, but actually trying to sell something to someone by saying, "Look, our solution is so smart," doesn't necessarily always work that well. You have to shift towards the, "Look, my solution solves your problem so much better than something else, and here's all the value and stuff like that." And there's a little bit of learning that machine learning professionals had to do in that case. And then obviously on the other side, if you look at many traditional organizations, they've grown up being successful doing what they've done for a long time. So in the retail businesses, a lot of the retailers were super successful because they knew how to manage averages and they knew how to manage cycle times.
Gabriel Straub (26:10): So you had an average customer that was doing an average shop in an average store, and you would look at this every six months or something like that, because your operations were super constrained. You didn't want to change prices every day, you couldn't change ranging in your stores every day, because it cost a lot of money. And so they became super successful in that space, and now making that shift towards the online world where you have to be super personalized and just stuff can change all the time, you have to get rid of averages, because Ed has a very different shopping behavior than Gabriel, and Ed wants to make sure that he gets the stuff that he cares about, and that might be different to Gabriel. And at the same time, Ed wants that now and wants the changes now. I think that requires a fundamental rethink and that's not necessarily easy. And all of those businesses also have significant businesses that still make money in the old ways and that shift, again, isn't necessarily easy.
Ed Challis (27:00): That's super interesting. And what do you think of that foundational... I don't know what percentage of firms have a chief data scientist role, but what do you think of the... I guess, when you were ever considering moving to a new job, what are the things, the foundational bits of infrastructure, culture, technology that you think are ground zero for doing this stuff?
Gabriel Straub (27:30): It's probably around three dimensions, there's something around structures, and probably culture sits a bit in there, the people aspect of it. Where do you discuss data? Is that something that's discussed in the teams, lower down, if you want, in your organization? Or is it something that's being discussed at the top table? And what kind of discussions is it about data? Is it purely around like, "Oh wow, we have a risk because something happened and we've had a data breach." Or is it also about the opportunities? Therefore, how is your structure and how is the people... What kind of skills do you have? Is a data skill something that's required from everyone, or is there a bunch of specialists, and basically it's their problem. Data people solve data problems, and data problems are there to be solved by data people. Then there's something around processes and- [crosstalk 00:28:14]
Ed Challis (28:16): So, do you have an opinion on that? Should you have an internal, centralized data team that is almost a shared service for every line of business? Or do you make sure it's embedded with almost a data scientist for every team? Do you have any opinions on that?
Gabriel Straub (28:37): I think it does depend on the maturity and the set up of your organization. So if I look through my career, I've seen this move where, very early on before data science was really a thing, they ended up being embedded because you just hired smart people who could solve a problem. They didn't have any bespoke tooling, they didn't have any bespoke skills. Then over time you realized, well, actually those people are doing something similar and you have them all over the place and you have a bit of a marketplace issue. You have smart people who can solve very complex problems, not necessarily sitting in the part of the organization where the biggest impact problems are. So you then went into the centralized approach, where you then had bespoke ways of working, you had bespoke tooling, et cetera. But what you then ended up having as a problem is that it was actually really difficult for those people to deliver value as quickly, because they were getting quite far away from the implementation and from the teams.
Gabriel Straub (29:24): And that's the way that we are running it, you're almost in what we call central line management but local task management, where you have a data science organization that is all part of the same craft, but they are embedded within delivery teams so that they can really understand the local needs. And I think there's similar questions to be had around like, how do you organize your data teams more broadly? I think you can't scale particularly well with the central data team, at some point you run into problems. So we prefer to have teams that build the central tooling, and then the rest of it, putting the data into that is distributed. But you need to make sure that you have really good best practices in place and that you figure out, well, how does your data model add up and stuff like that. And so that, having autonomous teams that can move fast with enough central support, I think is at the moment the model that I believe works best if you have a more mature organization that is in general running technology that way.
Ed Challis (30:17): Makes sense. You need that because without an investment in tooling and without an investment in those kind of central data resources, a lone data scientist in every team is just going to be... every single one's going to be reinventing the wheel, every single one's going to be not sharing that knowledge. So that's just so interesting from a cultural and organizational perspective. What else do you think is foundational?
Gabriel Straub (30:45): Well, then the other two parts, and we've semi talked about them already, is that process. So is your data science team separate or is it part of an embedded team? And how does work go from one part to the other? Is there a handover or are they all..? Is it the same team developing, running, maintaining, what you build? And then tooling. I think, for us, one of the things that we really want to get to is that the development and deployment of machine learning at Ocado scale is as simple as shopping online. Because it has to be, right? The more tools we're going to build, the less time we need to spend on trying to get to a point where it ends up being in production, and we need to be able to ship early and ship often so that we can really iterate on what we build, and really learn what works and what doesn't work.
Ed Challis (31:27): What are the most interesting trends in this field that are exciting you, and do you think will be the most impactful over the next few years?
Gabriel Straub (31:37): I think the stuff that I'm most interested in at the moment is the discussions around MLOps. I think DevOps made such a massive impact on software engineering more generally, then there was something called DataOps, which I think is still fairly new as well. And then obviously MLOps is the aspect of how do you build and manage your machine learning algorithms. I think that's going to become more and more complicated, as you have more models that are in production all the time, as you need to manage model drift and data drift and all of that stuff. Being able to be on top of that stuff without necessarily increasing the size of your organization, I think it's going to be super fascinating to try and figure out.
Gabriel Straub (32:15): What's the right tooling that you need to put in place? How do you keep on top of your increasing data sets to measure that quality before it goes into it? Who's responsible for this? Do you put your data scientists on 24/7 support, even though you might have fewer of them, or is that something that the software engineers do? How do you avoid handovers? Because handovers generally create some sort of grumpiness in the teams and probably slow down your development process as well.
Gabriel Straub (32:38): I think that operationalizing it, so that you can really put machine learning in all of the parts of the organization where it provides value, without you having to build an insanely large data science organization, I think that's super interesting. Obviously there's a lot of stuff also in deep learning and stuff like that, and I think the stuff that I'm interested in in that space is actually almost more active learning. And how do you get on top of your data quality issues and make sure that you find the right places and explore and reinforce, I guess, data collection and data validation in those specific areas.
Ed Challis (33:12): That's so interesting, because the rate of advance in deep learning is fascinating, and so is what it's capable of, but your two most exciting things were active learning and MLOps. MLOps maybe sounds very dry, but both of those things, to me, active learning and MLOps, they're about agility. They're all about, "I have an idea, I want it in production and I want to iterate and improve on that thing." And if I don't have either of those things, a great MLOps framework or any MLOps framework or an active learning approach, getting value, changing the system, improving it, is just going to be a nightmare.
Gabriel Straub (33:58): I think that, for me, a lot of the stuff that I think is interesting is how do you get to value quicker? Because that's really where the crux is. We're not doing ML because it's interesting, we're doing ML because it creates better customer experiences and ideally better business opportunities for the organizations that we work for, that we serve. And I think therefore that's where, in my view, the interesting aspects are right now. And this is the difference, I guess, maybe with what you see a lot of in academia, in terms of where their focus is, most of the focus is on new algorithms. While actually in businesses, the bigger challenge is probably data, and data collection, data quality, and then, as we talked about, MLOps and that kind of area.
Ed Challis (34:38): It's one of the greatest gaps. Academia is still obsessed in many ways with accuracy and what's ultimately possible in some sort of asymptotic notion. Whereas business is totally compounded by time to value, agility, how quickly you can get version one into production, and how quickly you can get to version 20. It's almost a chasm, it's just totally different priorities, totally different needs and requirements.
Gabriel Straub (35:10): Yeah, and how easy can you fix it if it goes wrong? How easy can you understand if a customer complains about not having had the right order? How easy can you improve it and explain to them around like, "Well, sorry, this is what happened, but it won't happen again."
Ed Challis (35:22): 100%. This is the last question, and it's a bit cheesy, but what do you see as the ultimate evolution of how ML and AI will work with people? Business is people doing stuff, right? And now it's ML also doing stuff, technology doing stuff. How do you see this thing panning out in the 10 to 20 year timescale?
Gabriel Straub (35:50): For me, I think the reason why ML is beneficial, if we look from a customer perspective. I don't know if you've ever watched a parent trying to take a child through a supermarket, you can clearly see that they're not enjoying their journey. There's so much cognitive stuff that's happening, they're trying to keep their child quiet, and there's so much complexity of trying to make the right choices, given all of the different offers and all the products that are available there. I think the world has become super complicated, both from an employee perspective and from a customer perspective, and machine learning can help make it simpler. If we can help you make the choices on your behalf that really respect what you care about, but mean that you have less effort to do so. And similarly, if we can help automate it from a business perspective to make the choices that make sense, that trade off business requirements in the right way. I think that is super helpful because then a lot of our customers and our colleagues can focus on the more creative aspects of their job.
Gabriel Straub (36:44): So I think that's one thing where data science or machine learning will go a bit more. It's that understanding that, in my view, the most rare resource you have in an organization is cognitive energy, and helping you optimize that is going to be really, really big. I think the second aspect of it is, I think at the moment most of our machine learning models are, if you want, single-turn conversations with a retailer. Or with the organization, sorry, it's not just retailers.
Gabriel Straub (37:10): It's the same, to be honest, in the media. I tell you, "I'm bored." You show me a bunch of possible movies. I select one or I don't select one. We then have a separate interaction, it's completely independent of that previous interaction. I think with stuff like voice coming around, we're going to get more and more towards a multi-turn conversation. And that doesn't have to be voice, it can be other interactions, it can be chat messages, or it could just be how you flow through a user interface. I think that's going to be really interesting, around how do we build true multi-turn conversations with our customers? If you talk to friends or you say, "Oh, I'm interested in this thing." You then follow up on it, there's a link. And that currently doesn't exist that much- [crosstalk 00:37:49]
Ed Challis (37:48): You mean there's this... it's like Markovian, it's like, "I only remember the last day."
Gabriel Straub (37:54): Yeah, exactly.
Ed Challis (37:55): But a true relationship, true loyalty is the whole history of that.
Gabriel Straub (38:01): Where do you come from? What was your previous question? How did you interact with me beforehand? What did you tell me you didn't like that I can not show you in this next interaction with you. How do I just make your life so much easier, so that your experience is really pleasant and as short as it has to be?
Ed Challis (38:20): Great. Well, super great to talk to you today, Gabriel, thank you so much for taking the time. Thank you.
Gabriel Straub (38:27): Thanks a lot, Ed. Always good catching up with you.
Ed Challis (38:29): Cheers. (Music)