
Newsletter #086 – ML in production



A common perception is that individual contributors (ICs) in technical roles who move to management must sacrifice the amount of technical work they perform. To some extent, I think this is true, at least in terms of the type of technical work performed as an individual contributor. However, as someone who moved from an IC role to management, I can confidently say that technical work is a huge part of data science management and that managers have ample opportunity to improve their technical skills. In fact, depending on the types of projects and employees you manage, you may be exposed to even more technical work as a manager.

Let me first talk about the type of technical work that you will probably stop doing, at least if you are 100% managing and are not expected to contribute code or perform analysis. As an IC, you are writing code all day long: analyzing data, generating hypotheses, training models, validating those models, and deploying them. This is technical work in the sense that you are building; you are translating your knowledge directly into deliverables like code or visualizations. When you move into management and are no longer expected to produce those deliverables, this type of technical work stops. And if you have never managed before, you might think, “Oh, I don’t do technical work anymore.” But that’s not true. It’s just that you no longer do that particular type of technical work.

What technical work do you actually do?

First, you will likely be doing some technical work when coaching a direct report. Suppose one of your reports needs to explain their analysis to you. You need to be able to understand their approach. You need to be able to look at their code and understand it. You need to be able to advise them on their approach and implementation. Providing technical advice is an area where there is scope for a lot of technical work as a manager.

If you are managing multiple reports across projects, the breadth of techniques and skills required is much wider than on any single project. You may need traditional statistical analysis tools for one project, while another requires modeling skills. One project may be more focused on engineering, another on analysis. And regardless of the techniques used, you will continually be asking technical questions and thinking about how the answers connect the analytical results to business outcomes. This is all very, very technical work.

One caveat here is that I’m specifically talking about managing individual contributors, i.e. the people who actually do the work. I don’t know if the same applies to managers of managers, because I’ve never managed managers. I’ve managed data science ICs for a little over a year, and I’ve been a tech lead for a couple of years. So I’m still very close to projects and individual products.

I will also add that technical work at the management level is higher level in the sense that you are typically trying to answer more general questions than you would as an IC. You have to take a general question, break it down into sub-questions, and then distribute those questions across a team. While ICs are producing analysis, your job is to synthesize the individual insights into a comprehensive and coherent body of work that answers the original general question. This process is highly non-linear, iterative, and very technical. And keep in mind that answering the general question requires you to step in and understand each sub-question. You can think of this as a neural network, where the lower layers (ICs) are responsible for learning low-level feature representations, such as edge detection in computer vision. The higher layers (managers) are responsible for learning higher-level concepts, such as detecting faces in an image.

High fives for using a neural net as an analogy for data science management? No? Okay. See you next week then!


Here’s what I’ve been reading/watching/listening to lately:

  • Data Cleaning IS Analysis, Not Boring Work – “The act of cleaning data imposes values/judgments/interpretations on the data that are intended to enable downstream analysis algorithms to function and produce results. This is exactly the same as performing data analysis. In fact, ‘cleaning’ is just a spectrum of reusable data transformations on the path to performing full data analysis.” This eloquently and humorously written post argues that the goal of data cleaning is to improve the signal-to-noise ratio in the data in order to improve analytical results. Rather than viewing cleaning as a separate step, we should recognize that data cleaning IS data analysis (see the first sketch after this list).
  • Four Communication Techniques for Solving Technical Problems – A few weeks ago I wrote that generating business value with ML is often a people problem where key decision makers must be convinced to take risks. Effectively solving problems with people requires strong communication skills, especially when discussing technical topics that involve multiple trade-offs. This article describes 4 approaches to technical communication: “1) Get on the right track by working from problem to solution; 2) Avoid chaotic conversations in a circle by breaking them up; 3) Avoid friction and landmines by emphasizing empathy; 4) Monitor if conversations are getting stuck or deviating above, below, or to the side of the scope you have set in your agenda.”
  • Andrew Ng: Bridging the gap between AI proof-of-concept and production – Andrew Ng recently gave a talk on the common obstacles that prevent AI projects from making it to production. According to him, the biggest challenges in bridging the gap from research to production are small data, generalizability and robustness, and change management. While AI’s successes so far have come primarily from the big-data domain of consumer internet companies, the biggest opportunities lie in non-consumer sectors such as retail and healthcare. Note: the talk itself starts around 6:30 and ends after 45 minutes.
  • A contextual bandit approach to personalized news article recommendations – A multi-armed bandit is an experimental approach in which the system learns to redirect traffic from poorly performing treatments to better performing ones. A contextual bandit uses additional context about users/items to improve the assignment of users to treatments (see the second sketch after this list). This article describes Yahoo’s success (that’s right, Yahoo!) with using contextual bandits to recommend news articles to users, and their approach to validating their algorithms in an offline setting. For a great (and short) introduction to bandit algorithms for website optimization, check out Bandit Algorithms for Website Optimization (affiliate link).
  • WhyLogs: Embrace Data Logging in All Your ML Systems – Last week I wrote about a critical bug in one of our production applications that could have been diagnosed with better monitoring and logging. This post introduces WhyLogs, an open source package purpose-built for data logging in ML pipelines. According to the post, WhyLogs records properties of data as it moves through an ML system, aggregates logs, supports a wide range of ML data types, and tags data segments with labels for slicing and dicing (see the third sketch after this list). I’m excited to see companies and tools emerge that tackle challenges at the intersection of data science and software engineering.
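
To make “cleaning is just a spectrum of reusable data transformations” concrete, here is a minimal pandas sketch. The columns, plausibility bounds, and toy data are invented examples of mine, not from the post:

    import pandas as pd

    def drop_impossible_ages(df: pd.DataFrame) -> pd.DataFrame:
        # A "cleaning" step that is really an analytical judgment:
        # we are deciding which ages are plausible for this population.
        return df[df["age"].between(0, 120)]

    def normalize_country(df: pd.DataFrame) -> pd.DataFrame:
        # Collapsing spelling variants imposes an interpretation on the data.
        return df.assign(country=df["country"].str.strip().str.upper())

    raw = pd.DataFrame({"age": [34, -1, 56, 130],
                        "country": [" us", "US", "us ", "CA"]})

    # Each step is a reusable, composable transformation -- cleaning as analysis.
    clean = raw.pipe(drop_impossible_ages).pipe(normalize_country)
    print(clean)

Each step encodes a judgment about what the data should look like, which is exactly the interpretive work the post describes.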
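Next, a rough sketch of the disjoint LinUCB algorithm from the Yahoo paper, which scores each arm (article) by its predicted reward plus an uncertainty bonus. The feature dimension, the simulated click model, and all constants here are made up for illustration:

    import numpy as np

    class LinUCBArm:
        """One arm (article) with a ridge-regression payoff model."""
        def __init__(self, d, alpha=1.0):
            self.A = np.eye(d)     # regularized design matrix
            self.b = np.zeros(d)   # reward-weighted context sum
            self.alpha = alpha     # exploration strength

        def ucb(self, x):
            A_inv = np.linalg.inv(self.A)
            theta = A_inv @ self.b  # current coefficient estimate
            # Predicted reward plus an uncertainty bonus for exploration.
            return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

        def update(self, x, reward):
            self.A += np.outer(x, x)
            self.b += reward * x

    rng = np.random.default_rng(0)
    d, n_arms = 5, 3
    arms = [LinUCBArm(d) for _ in range(n_arms)]
    true_theta = rng.normal(size=(n_arms, d))  # hidden click model (simulation only)

    for _ in range(1000):
        x = rng.normal(size=d)  # user/article context features
        chosen = max(range(n_arms), key=lambda a: arms[a].ucb(x))
        # Simulate a click with a logistic model, then update the chosen arm.
        reward = float(rng.random() < 1 / (1 + np.exp(-true_theta[chosen] @ x)))
        arms[chosen].update(x, reward)

Traffic naturally shifts toward arms with higher estimated payoff, while the bonus term keeps some exploration going for under-sampled arms.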
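Finally, a hand-rolled illustration of the data-logging idea. This is not the WhyLogs API, just a sketch of the kind of properties such a tool records at each pipeline stage:

    import logging
    import pandas as pd

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("data_profile")

    def log_profile(df: pd.DataFrame, stage: str) -> pd.DataFrame:
        # Record lightweight statistical properties of the data at a
        # pipeline stage, in the spirit of what WhyLogs automates.
        for col in df.columns:
            logger.info("stage=%s col=%s nulls=%d distinct=%d",
                        stage, col, df[col].isna().sum(), df[col].nunique())
            if pd.api.types.is_numeric_dtype(df[col]):
                logger.info("stage=%s col=%s mean=%.3f std=%.3f",
                            stage, col, df[col].mean(), df[col].std())
        return df  # pass-through, so it can sit inside a pipeline

    features = pd.DataFrame({"age": [34.0, None, 56.0],
                             "country": ["US", "US", "CA"]})
    features = log_profile(features, stage="pre_model")

Comparing these profiles across runs is what lets you catch a silent data bug like the one I described last week.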

That’s it for this week. If you have any ideas, I’d love to hear them in the comments below!
