
Newsletter #087 – ML in production

by info.odysseyx@gmail.com


After much consideration, I have decided to temporarily stop publishing my weekly newsletter. It was not an easy decision, but I believe it is the right one for me, given where I am in my life right now. I won’t go into too much detail, but I would like to briefly explain the reasoning behind my decision.

When I sent the first issue of the newsletter, I was a data scientist at 2U. As an individual contributor, I was focused on the technical work of building, deploying, and operating machine learning models in production environments. When I was promoted to Director, my focus started to shift away from the purely technical aspects of ML. Even so, my responsibilities were still split about 50-50 between managing a small team and continuing to work on implementation.

I’m proud to have continued growing at 2U, and my responsibilities have grown as well. Today, I lead a much larger team, oversee multiple projects, and think about how we can expand the data science practice at the company. One of the reasons I’m choosing to pause my newsletter is to focus on these new responsibilities. I believe that one’s manager is one of the most important aspects of an employee’s work life, if not the most important, and I want to be the best manager I can be for my team. I’m obsessed with the concept of mastery, and I know I still have a lot to learn about being a great manager. That’s why I want to focus on improving as much as possible in this role.

There are also 2 personal reasons behind my decision.

As a son of immigrants, I really believe in hard work. Taking time off is very difficult for me. Sometimes that’s good, but sometimes it can really detract from my life. For example, I wrote about experiencing burnout and depression in my post on why I started MLinProduction. I am super proud of how MLinProduction has developed over the past 2 years, but a few months ago I was on the verge of another burnout. I was busy with my course on SageMaker, a consulting assignment, a few sponsored blog posts, my full-time job, and a newborn baby. I invested all the time I gained through the coronavirus lockdowns into MLinProduction. Luckily, I have enough experience to know that I cannot keep up this pace. So I am consciously choosing to take a step back.

The second personal reason, and perhaps the most important reason of all, is my son. I am the proud father of a beautiful 7-month-old boy and I want to spend as much time with him as possible right now. I have been told many times that children grow up fast. I am choosing to follow this advice and enjoy my moments with him. Plus, he is a lot cuter than my Sublime Text editor.

This will be my last weekly email for a while. It has been a real pleasure writing to you for the past 2 years.


Here’s what I’ve been reading/watching/listening to lately:

  • The Definitive Guide to AI Monitoring – If you can look past the clickbait title, this blog post presents 5 useful classes of metrics to track for ML systems. These metrics appear to be organized in terms of increasing complexity and required infrastructure – starting with model performance metrics where predictions are compared to actual values, and ending with metadata and performance measurements captured during training and testing. Helpful when considering which metrics to monitor for your production ML systems.
  • Data Quality at Airbnb – As Airbnb scaled from a startup to a mature organization with thousands of employees, the company realized it needed to overhaul its entire process to ensure the timeliness and quality of its data. This post summarizes their Data Quality Initiative, a company-wide investment in data ownership, architecture, and governance. If your company is growing rapidly and needs to scale its data infrastructure, this is a super valuable post.
  • Bootstrapping prediction intervals – While confidence intervals measure the uncertainty around statistics such as means or model parameters, prediction intervals measure the uncertainty around single values and can be used to estimate the interval in which the outcomes of a regression model are likely to fall. The author of this post does a very nice job of explaining a general way to calculate prediction intervals for generic regression models (including nonlinear models such as random forests). +1 to the author for including a discussion of the math, a Python implementation of the algorithm, and a simulation all in one easy-to-read post!
  • 4 Principles to Make Experiments Count – A data scientist on the Growth team at Airbnb describes 4 principles that helped the company scale from 100 experiments per week to over 700. Key pieces of advice include adding sanity metrics to experiments to ensure appropriate user exposure and understanding the base rates of the phenomenon you’re testing (this is something I’ve been focusing on lately).
  • The technical problem of A/B testing – It’s one thing to say you want to A/B test a product change; it’s another thing to actually run the tests. The technical requirements for running tests vary widely depending on the type of application you need to test (e.g. mobile app vs. single page site vs. microservice). This short post outlines several key technical features needed and five ways to implement A/B testing. (Regular readers will notice that I’ve been sharing a lot of A/B testing resources lately. We’re ramping up experiments at 2U, so I’ve been catching up on best practices. Here’s another technical-style link that shows you how to do A/B testing with Kubernetes and Istio.)
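
To make the prediction interval idea from the bootstrapping link a bit more concrete, here is a minimal sketch in plain Python. It is not the algorithm from the linked post, just a simplified illustration under stated assumptions: the model is a one-variable least-squares line, each bootstrap round refits the model on a resample of the data (model uncertainty) and adds one resampled training residual (noise), and the interval is read off the percentiles of those simulated outcomes. The function names (`fit_line`, `percentile`, `bootstrap_prediction_interval`) and the synthetic data are all made up for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate resample (all x identical): fall back to a flat line.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def percentile(sorted_vals, q):
    """Linearly interpolated percentile of a sorted list, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bootstrap_prediction_interval(xs, ys, x_new, n_boot=500, alpha=0.1, seed=0):
    """Approximate (1 - alpha) prediction interval for a new outcome at x_new."""
    rng = random.Random(seed)
    n = len(xs)

    # Residuals of the model fit on the full training data stand in for noise.
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

    simulated = []
    for _ in range(n_boot):
        # Resample rows with replacement and refit: captures model uncertainty.
        idx = [rng.randrange(n) for _ in range(n)]
        a_b, b_b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        # Add one resampled residual: captures the noise around a single value.
        simulated.append(a_b + b_b * x_new + rng.choice(residuals))

    simulated.sort()
    return percentile(simulated, alpha / 2), percentile(simulated, 1 - alpha / 2)


# Synthetic data for illustration only: y = 2 + 3x plus unit Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(200)]
ys = [2 + 3 * x + data_rng.gauss(0, 1) for x in xs]

lower, upper = bootstrap_prediction_interval(xs, ys, x_new=5.0)
print(f"90% prediction interval at x=5: [{lower:.2f}, {upper:.2f}]")
```

Note how most of the interval's width comes from the resampled residual rather than from refitting: that is the difference between a prediction interval for a single value and a confidence interval for the fitted mean. Using in-sample residuals tends to understate the noise for flexible models, so treat this as a starting point rather than a production recipe.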

That’s it for this week. If you have any ideas, I’d love to hear them in the comments below!
