Reference: Mediation Modeling at Uber: Understanding Why Product Changes Work (and Don’t Work)

Scenario formulation

We start with the simplest, idealized case for uplift modeling. Suppose we have a dataset that satisfies the following properties:

The dataset is not confounded (e.g. RCT setup)
All users assigned to the treatment actually receive it
The treatment affects all users in the same way

With these properties, we can use the average treatment effect (ATE) to represent the effect of the treatment. Below is an example setup:

Scenario: Uber drivers may file support tickets asking for an explanation of their earnings. We want to reduce the number of support tickets by showing earning graphs to drivers.
Treatment: Showing earning graphs to drivers
Outcome \(Y\): The number of support tickets filed

Mediation modeling formulation

However, sometimes we want to know how much of the treatment effect flows through a mediator. In this case, the mediator could be the drivers' understanding of the graphs. We want to know whether the ATE arises because drivers actually understand the graph, or whether simply seeing an earning graph, without understanding it, would produce the same effect. So the mediator is:

Mediator \(M\): Understanding of earning graphs

With the above setup, the ATE can be decomposed into two components: ATE = ADE (average direct effect) + ACME (average causal mediation effect)

ATE: The total effect of the treatment on the outcome.
- Usually modeled as \(\mathbb{E}[Y(1, M(1)) - Y(0, M(0))]\), where \(Y(t, M(t'))\) is the potential outcome when the treatment status is \(t\) and the mediator takes the value it would have under treatment status \(t'\). Informally, \(M(1)\) means the mediator is "turned on" by the treatment.
ADE: The impact of the treatment on the outcome that does not go through the mediator.
- Usually modeled as \(\mathbb{E}[Y(1, M(0)) - Y(0, M(0))]\), where \(M(0)\) is the value the mediator takes without the treatment. Informally, the mediator is "turned off".
ACME: The difference in potential outcomes that would occur if we changed the mediator from the value it takes without the treatment to the value it takes under the treatment, while holding the treatment status itself fixed.
- Usually modeled as \(\mathbb{E}[Y(1, M(1)) - Y(1, M(0))]\).
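
To see why the two components add up to the total effect, one can add and subtract the term \(Y(1, M(0))\) inside the expectation. The derivation below is a sketch that uses only the standard potential-outcome definitions of ATE, ADE, and ACME:

\[
\begin{aligned}
\text{ATE} &= \mathbb{E}[Y(1, M(1)) - Y(0, M(0))] \\
&= \mathbb{E}[Y(1, M(1)) - Y(1, M(0))] + \mathbb{E}[Y(1, M(0)) - Y(0, M(0))] \\
&= \text{ACME} + \text{ADE}
\end{aligned}
\]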

[Figure: mediation modeling diagram]

Example

Goal: Using the setup above, we want to quantify how much of the reduction in support tickets can be attributed to the graphs helping drivers better understand their earnings.

Step 1: Estimate ATE

Suppose in our dataset we see that

Treatment group average support tickets per driver: 0.80
Control group average support tickets per driver: 1.00

So, \(ATE = 0.80 - 1.00 = -0.20\). This means the graph reduces tickets by 0.20 per driver.
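
As a minimal sketch of this step, the ATE under an RCT setup can be estimated as a plain difference in group means. The per-driver ticket counts and the names `treated_tickets`, `control_tickets`, and `estimate_ate` are hypothetical, chosen only so that the group means match the 0.80 and 1.00 used here.

```python
from statistics import fmean

# Hypothetical per-driver ticket counts (illustrative only, not real data).
treated_tickets = [0, 1, 1, 2, 0, 1, 0, 1, 2, 0]  # group mean is 0.8
control_tickets = [1, 1, 2, 0, 1, 1, 0, 2, 1, 1]  # group mean is 1.0


def estimate_ate(treated, control):
    """Difference in mean outcomes; valid as an ATE estimate when assignment is randomized."""
    return fmean(treated) - fmean(control)


ate = estimate_ate(treated_tickets, control_tickets)
print(f"ATE = {ate:.2f}")  # prints "ATE = -0.20"
```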

Step 2: Estimate ACME

This breaks down into three parts.

Part 1: Fit a model for the mediator

We want to know how much showing the graphs improves drivers' understanding.

We use user survey data to measure drivers' understanding.
In the survey, a score of 5 indicates good understanding and a score of 1 indicates poor understanding. We surveyed drivers both with and without the earning graphs (our treatment and control groups).
We train a model on this dataset, with the survey score as the label.

With the above, we can use the model to estimate the treatment effect on driver understanding. Suppose we have

\[\mathbb{E}[M \mid T=1] - \mathbb{E}[M \mid T=0] = 0.05\]

This shows that showing the earning graphs increases the drivers' understanding score by 0.05 points on average.

Part 2: Fit a model for the outcome

Second, we want to know how improved understanding influences the outcome, i.e. the number of support tickets filed, for drivers in the treatment group.

We train a model on treatment-group drivers, using the driver's understanding score as the input and the number of tickets the driver filed as the label.
For simplicity, we model it as a simple linear function.

Suppose we find that the slope of the linear function is \(\frac{dY}{dM}=-0.75\). That is, under the treatment, each 1-point increase in the understanding score reduces the number of tickets by 0.75.
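
A minimal sketch of how such a slope could be obtained with ordinary least squares is shown below. The data points and the helper name `ols_slope_intercept` are hypothetical, constructed so that the fitted slope matches the -0.75 used here. Reading the slope as the effect of understanding on tickets also assumes that nothing else drives both the survey score and the ticket count.

```python
from statistics import fmean

# Hypothetical treatment-group data: survey score (1-5) and tickets filed per driver.
understanding = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
tickets = [3, 3, 2, 1, 0, 4, 2, 1, 1, 1]


def ols_slope_intercept(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = ols_slope_intercept(understanding, tickets)
print(f"dY/dM = {slope:.2f}")  # prints "dY/dM = -0.75"
```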

Part 3: ACME estimation

Combining the above two parts, we know that

The earning graphs increase the drivers' understanding score by 0.05
Under the treatment, each 1-point increase in the understanding score reduces tickets by 0.75

Therefore, the overall impact on the outcome caused by the increased understanding (ACME) is \(0.05 \times (-0.75) = -0.0375\) tickets.

Step 3: Calculate the Proportion Mediated

With the previous two steps, we know:

Total effect (ATE) = -0.20
Mediated effect (ACME) = -0.0375

The proportion mediated is \(ACME/ATE = -0.0375/-0.20 = 0.1875\), i.e. about 19% of the total effect.

The interpretation is that, out of the 0.20 ticket reduction:

0.0375 tickets were reduced because of improved understanding (mediated effect).
The rest (0.1625 tickets) came from pathways other than understanding (direct effect).
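
The arithmetic of the whole example can be sketched in a few lines. This uses the product-of-coefficients approach from the steps above, which assumes linear models and that the effect of understanding on tickets does not depend on the treatment. The function name `mediation_summary` and its parameter names are hypothetical.

```python
def mediation_summary(ate, effect_on_mediator, outcome_slope):
    """Decompose a total effect into mediated and direct parts (product-of-coefficients)."""
    if ate == 0:
        raise ValueError("proportion mediated is undefined when ATE is zero")
    acme = effect_on_mediator * outcome_slope  # effect flowing through the mediator
    ade = ate - acme                           # remainder: the direct effect
    return {"acme": acme, "ade": ade, "proportion_mediated": acme / ate}


# Numbers taken from the worked example.
summary = mediation_summary(ate=-0.20, effect_on_mediator=0.05, outcome_slope=-0.75)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
# prints:
# acme: -0.0375
# ade: -0.1625
# proportion_mediated: 0.1875
```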