Setting up an effective experiment program (Experiment Program Series: Guide 01)

Seamless. Frictionless. Elegant. Efficient.

These are some of the words used to describe the products and services offered by the world’s largest and most successful companies. For example, unwrapping a package and holding the physical copy of a book whose image you clicked on in an unsolicited email from Amazon the day before is nothing short of magical.

A little too magical to have happened by chance. But the experience was very likely not shaped in some top-down, divine-intervention way either.

Instead, many of these products and services went through hundreds, sometimes thousands, of iterations, with teams of engineers, designers, product managers, data scientists, and others tweaking every conceivable aspect of the experience to arrive at a new version that’s slightly better than the previous iteration.

In many cases, teams ran randomized controlled experiments: they modified certain parts of their products, released them into the wild for users to consume, and measured the results in a statistically rigorous way. These teams can be thought of as demigods of multiple parallel universes, where the only difference between universes might be the color of a button on a web page. Winning universes are selected by optimizing some metric, such as the total revenue generated by a web page, producing a lineage of product improvements shaped by an evolutionary process.
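To make the "randomize, release, measure" loop concrete, here is a minimal sketch of its two statistical building blocks, assuming a binary outcome (for example, converted or not). The function names `assign_variant` and `two_proportion_z_test` are hypothetical, and hash-based assignment plus a pooled two-proportion z-test are just one common choice, not necessarily the approach used in the program described in this series.

```python
import hashlib
import math


def assign_variant(unit_id: str, experiment: str, variants=("control", "treatment")) -> str:
    """Deterministically map a unit to a variant by hashing (experiment, unit_id).

    The same unit always lands in the same variant for a given experiment,
    and different experiments get independent-looking assignments.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: p_a == p_b, using the pooled standard error."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 100/1000 conversions in control, 130/1000 in treatment.
    z, p = two_proportion_z_test(100, 1000, 130, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts the test gives a z-statistic of roughly 2.1. Hashing rather than drawing random numbers at assignment time means a unit's variant can be recomputed later without storing it, which simplifies analysis.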

Many business leaders today are familiar with examples of companies that have developed their products and services, and optimized their profit and loss statements, through experimentation. These executives are eager to improve the performance of their own business units, but often lack the know-how or technical resources to run experiments without external support.

Data science teams don’t need convincing about the benefits of running experiments. But they often lack the business knowledge, cross-team relationships, and structured processes to engage with business teams and help them optimize their products through experiments.

Building an Experiment Program from Scratch

Over the past 16 months I have been intensively involved in developing and implementing an experiment program at 2U. Our goal was to achieve positive business outcomes through experimentation, which required us to develop the processes, infrastructure, institutional knowledge, and relationships needed to run reliable online controlled experiments.

While there are many resources for learning the technical parts of an A/B test, such as how to randomize units into variants or how to perform statistical hypothesis testing, there is little content that comprehensively describes how to build an experiment program from scratch. My goal in this blog series is to help fill that gap by describing my experiences designing, implementing, and iteratively improving an experiment program (which I'll call an ExPr for short).

These posts are primarily intended to help data science teams, particularly the leadership of those teams – including chief data scientists, VPs, and senior managers – and data science product managers. However, I believe individual contributors, i.e., data scientists, data analysts, and data engineers, will also find the content valuable. This is especially true for ICs leading experimental initiatives or considering moving into data science management.

This blog series is a summary of my experiences

This series of posts is based on, and subject to the limitations of, my experiences over the past 16 months:

I lead a medium-sized team of 5 data scientists and 3 engineers and work closely with a dedicated project manager.
We focused on optimizing the operational efficiency of our tech-enabled service stack by testing the impact of specific human interventions. These experiments are better classified as offline field tests than as online, software-based experiments. That said, we also ran software-based (i.e., "online") experiments of the kind more familiar to those in the tech industry.
Typical sample sizes for these experiments were in the range of 1,000 to 10,000 units, so we focused on learning how to run small- to medium-sized experiments.
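To give a rough sense of what sample sizes in this range can detect, here is a minimal sketch of the standard normal-approximation formula for the minimum detectable effect (MDE) of a two-arm test on a proportion. It assumes a binary outcome, an even split between arms, a two-sided test at alpha = 0.05 with 80% power, and a hypothetical 20% baseline rate; none of these values come from the experiments described here, and field tests with other metrics would need a different variance term.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(n_per_arm: int, baseline_rate: float,
                              alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate absolute MDE for a two-sided, two-arm test of proportions.

    Assumes equal arm sizes and that both arms share the baseline variance
    p * (1 - p), i.e. MDE = (z_{1 - alpha/2} + z_{power}) * sqrt(2 p (1 - p) / n).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    return (z_alpha + z_power) * se


if __name__ == "__main__":
    # Hypothetical 20% baseline rate; total units split evenly across two arms.
    for total_units in (1_000, 10_000):
        mde = minimum_detectable_effect(total_units // 2, baseline_rate=0.20)
        print(f"{total_units} units -> MDE of about {mde:.3f} (absolute)")
```

Under these assumptions, 1,000 units can only reliably detect a shift of about 7 percentage points, while 10,000 units bring that down to about 2.2 points. Because the MDE shrinks with the square root of the sample size, a tenfold increase in units improves sensitivity only about threefold.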

Before we developed our ExPr, we had no direct experience with A/B testing. There were no internal processes in place to manage experiments, nor was there an infrastructure to run experiments. But after 16 months of research, development, bugs, and iterative improvements, we:

Have run dozens of experiments.
Have operationalized the results of several successful experiments, which senior management credited with significantly increasing operational efficiency.
Have developed an effective working model for collaborating with diverse stakeholders across the business to devise, prioritize, and implement experiments.
Have built infrastructure that lets us easily design, launch, and analyze controlled experiments.

The plan for this series

Now that I have covered this background, I want to discuss the following in future posts:

What is an experiment program?
Which stakeholders should be involved in an ExPr? What level of involvement can be expected from each of these stakeholders?
How should these stakeholders interact to move the ExPr forward? How often should they meet? How should they interact outside of meetings?
What is each stakeholder accountable for? How should they be held accountable?
How do you go from an idea or hypothesis to an experimentally driven conclusion? What is the end-to-end process for conducting a controlled experiment?
What is expected of a data science team? How should that team manage its efforts?
What infrastructure and tools do data scientists need to run reliable experiments?
How do you measure the results of an experiment?
How do you measure the success of the experiment program?

In the next post in this series, I will discuss what an experiment program is and which stakeholders need to be involved to maximize the chances of success. If you would like to be notified as new posts in this series are published, sign up below and I will email you each one.

The opinions expressed here are my own and do not reflect the views or opinions of my employers.
