
Newsletter #084 – ML in production

by info.odysseyx@gmail.com


Last week I wrote that companies fall into one of two groups when it comes to machine learning and data science. The first class contains companies where ML/DS is a core competency. Without their machine learning models, datasets and expertise, these companies would be completely different in terms of their product and service offerings, their technical philosophies, and staffing. The second, much larger class contains companies where ML/DS is not core to their existence. These companies vary in their experience levels with applying data science, the types of data science roles they employ, and the number of employees they have in those roles. What they have in common, however, is that data science is a value-add. These companies want to make their operations more efficient, or they want to add a new feature to their product to increase revenue, or they just want to say to their investors, “Look, we’re doing machine learning too!”

I don’t see either group as objectively better than the other. Whether you want to work for companies in group 1 or group 2 is purely a matter of personal preference. I’ve worked for companies in both groups and can say with certainty that both have advantages and disadvantages. I could even make a case for working at companies that have data science teams mainly to market that capability. A person’s satisfaction is a function of their expectations, goals, and beliefs.

While neither group is objectively “better,” they are quite different. The biggest difference between companies in these groups is probably the extent to which data science has an impact on the business. In the remainder of this issue, I want to emphasize the importance of influence for data science teams at companies where ML/DS is not a core competency.

If you work at one of these companies, the biggest challenge you face is convincing key business stakeholders and decision makers to take a chance on data science. These companies have gotten to where they are without relying on sophisticated machine learning algorithms. They likely have several long-tenured employees who have the final say on key decisions. These employees have likely been there for years and have seen many hype cycles come and go. They have also seen many employees come and go. Their livelihoods are tied to the success of the company. I won’t go any further – my point is that these people must have a very good reason to introduce risk into their organization.

If you’re on a data science team at one of these companies, you need to be able to influence these people if you want them to embrace your solutions. How do you develop that influence?

One way is to have a seat at the table with the most senior decision makers. This might mean having a chief data scientist who is involved in shaping the company’s strategy at the highest levels. This person has the difficult task of building trusting relationships with people across the business, deeply understanding their needs, and communicating those needs to the data science team. It takes a real mix of business knowledge, marketing skills, and technical knowledge to fill this role.

What if you don’t have a C-level data science leader? While it’s not impossible to make progress here, I’d argue the task is more difficult. I recommend learning as much as you can about different areas of the business and identifying the key pain points and opportunities. Learn to make a business case for why machine learning or data science is needed to solve these challenges. Be able to talk in dollars and cents. And be willing to be in it for the long haul. Ultimately, you’re trying to convince someone to risk their reputation on stochastic algorithms that can fail for a variety of reasons. It shouldn’t be too shocking that they’d rather solve their problems with a stack of if-else statements.


Here’s what I’ve been reading/watching/listening to lately:

  • Overlapping experimental infrastructure: more, better, faster experiments – At Google, experiments are used to evaluate “almost any change that could potentially affect what users experience,” including not only tweaks to a user interface but also testing various changes to machine learning algorithms that could affect rankings or the selection of content. This article describes Google’s “overlapping experimental infrastructure” and its associated tools and educational processes to support more, better, and faster experiments. While the article is specific to Google web search, the lessons contained therein are applicable to any company looking to evaluate changes with empirical data.
  • How to build an experiment pipeline from scratch – While the previous article discusses how to optimize an already mature experimental infrastructure, this blog post describes a quick and resourceful way to spin up an experimental framework. The author describes the approach he used to build an experimental pipeline at Shopify to measure marketing decisions, such as the benefits of a recommendation engine that generates personalized suggestions for blog posts. The post includes his step-by-step approach to understanding the business need, getting stakeholder buy-in, technical planning, and implementation.
  • How to Use Quasi-Experiments and Counterfactuals to Build Great Products – While A/B testing is the gold standard method for determining causality, sometimes it’s not possible to set up experiments for reasons such as lack of tooling, lack of time, or ethical concerns. In these situations, we can look to other methods on the “evidence ladder for causal inference,” including quasi-experiments and counterfactuals. While these methods are great when completely randomized experiments aren’t possible, keep in mind that they make calculating confidence intervals more difficult, i.e., there’s more uncertainty involved.
  • How should our company structure our data team? – Should your data team be organized functionally or by division (different business units)? Do you use a center of excellence model, where all data people are on one team, or an embedded model where people report to individual business units? As this post describes, these are difficult questions that depend on the specific context of your company. The author describes how his team evolved their organization in 5 phases over 3 years and the pros and cons of each.
  • What is a Feature Store? – I’ve read many blog posts and watched many conference talks about the benefits of feature stores, but none cover them as comprehensively as this blog post from Tecton. According to the post, feature stores enable operational machine learning (i.e., ML-driven real-time applications like recommendation engines, fraud detection, and personalization) by helping teams solve common data management problems. There are 5 components of a modern feature store: transformation, storage, serving, monitoring, and feature registry. While the post is clearly marketing material, I think it is valuable for data science leaders who are facing data management challenges.
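
For anyone who wants something concrete to go with the experimentation links above, here is a minimal sketch of the arithmetic behind a simple A/B comparison. It is not taken from any of the linked articles. The function name `conversion_diff_ci` and all of the numbers are made up for illustration, and it assumes a plain two-group randomized test with a normal (Wald) approximation, which is only reasonable when both groups are fairly large.

```python
import math


def conversion_diff_ci(control_conv, control_n, treat_conv, treat_n, z=1.96):
    """Return (diff, lower, upper) for treatment minus control conversion rate.

    Uses the normal (Wald) approximation for a difference of two
    proportions; z=1.96 corresponds to roughly a 95% interval.
    """
    p_c = control_conv / control_n
    p_t = treat_conv / treat_n
    diff = p_t - p_c
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_c * (1 - p_c) / control_n + p_t * (1 - p_t) / treat_n)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts, purely for illustration.
diff, lower, upper = conversion_diff_ci(
    control_conv=200, control_n=10_000,
    treat_conv=250, treat_n=10_000,
)
print(f"lift: {diff:.4f}, approx. 95% CI: [{lower:.4f}, {upper:.4f}]")
```

With these made-up counts the interval sits entirely above zero, which is the kind of evidence that tends to land better with a skeptical stakeholder than a single point estimate. Multiplying both ends of the interval by traffic and value per conversion is one rough way to restate the result in dollars and cents.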

That’s it for this week. If you have any ideas, I’d love to hear them in the comments below!
