Matched-Pair Analysis


Data analysts often want to know whether two related sets of data, typically measured before and after a campaign, differ significantly: that is, whether the observed difference could plausibly arise from random fluctuation or is large enough to indicate a real effect.

What is Matched-Pair Analysis?

Matched-pair analysis (MPA) involves two groups, a study group and a comparison group, formed by individually pairing each study subject with a comparison subject.

There are usually two situations that produce related data:

  1. When we take repeated measurements from the same set of participants
  2. When we match items or participants according to some characteristic

In either situation, the analysis is conducted on the differences between the two related values rather than on the individual values themselves. Because the paired groups are comparable, the differences can be tested directly for statistical significance.
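As a sketch of what "conducting the analysis on the differences" means, the quantities involved can be written as follows. The symbols d_i, d-bar, s_d and n are notation assumed here for illustration (they do not appear in the original text), with x_i denoting the two related values of pair i.

```latex
d_i = x_i^{\text{after}} - x_i^{\text{before}}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2}
```

Only the n differences enter the analysis, so variation between pairs that affects both values equally cancels out.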

Why Matched-Pair Analysis?

The purpose of matched samples is to obtain a more accurate assessment of whether a difference is significant by controlling for the effects of all other characteristics. Since each observation is paired, all characteristics other than the one being analysed remain the same within a pair. For example, if we are analysing the impact of a campaign conducted by a company in the beauty industry, we can control for age-related shopping behaviour by matching participants of similar age. Each pair can consist of the same person, the same thing, or the same group of observations.

Types of Matched-Pair Analysis:

Example:

Let us consider an e-commerce retailer who wants to determine the impact of a dollar-value discount campaign on the conversion rate across states in the US.

It is important to note that, as a rule of thumb, parametric tests are generally recommended for sample sizes of at least 30. As the sample size increases, the statistical power increases.

H0 : µD = 0 (There is no difference in the mean conversion rate before and after the implementation of the campaign)

HA : µD > 0 (The mean conversion rate is higher after the implementation of the campaign than before)

Since the objective is to identify whether the campaign results in significantly better conversion rates, the alternative hypothesis states that the mean difference µD (After minus Before) is greater than 0.

State           Before   After    Difference
Massachusetts   1.86%    2.01%    0.15%
Colorado        1.83%    2.01%    0.17%
Maryland        1.79%    1.91%    0.11%
California      1.77%    1.88%    0.11%
Washington      1.71%    1.77%    0.06%
Connecticut     1.70%    1.74%    0.04%
Minnesota       1.70%    1.74%    0.04%
Utah            1.66%    1.74%    0.08%
Virginia        1.66%    1.73%    0.07%
Delaware        1.62%    1.70%    0.08%
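To make the calculation concrete, here is a minimal Python sketch of a paired t statistic computed from the Difference column above. It assumes the paired t-test is the intended test and uses the differences exactly as published (in percentage points). The variable names and the hard-coded critical value of 1.833 (one-tailed, α = 0.05, 9 degrees of freedom, from a standard t table) are assumptions for illustration, not part of the original analysis.

```python
import math
import statistics

# Difference column (After - Before, in percentage points) as published in the table.
differences = [0.15, 0.17, 0.11, 0.11, 0.06, 0.04, 0.04, 0.08, 0.07, 0.08]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)   # sample standard deviation (divides by n - 1)
std_error = sd_diff / math.sqrt(n)        # standard error of the mean difference
t_stat = mean_diff / std_error            # paired t statistic with n - 1 degrees of freedom

# Assumption: one-tailed critical value for alpha = 0.05 and df = 9 from a t table.
T_CRITICAL = 1.833

# Upper-tailed test: reject H0 when the statistic exceeds the critical value.
reject_h0 = t_stat > T_CRITICAL
print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

With only 10 states the sample is below the 30-observation rule of thumb, so the result should be read with that caveat in mind.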

Representations of null and alternative hypotheses:

  • H0   : µD = 0
  • HA1 : µD ≠ 0 (two-tailed)
  • HA2 : µD > 0 (upper-tailed)
  • HA3 : µD < 0 (lower-tailed)
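Assuming a paired t-test, each alternative corresponds to a different rejection region for the same test statistic. The notation below is assumed for illustration: d-bar is the mean of the paired differences, s_d their sample standard deviation, n the number of pairs, and t with subscripts denotes a critical value of the t distribution with n - 1 degrees of freedom.

```latex
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \text{df} = n - 1

\begin{aligned}
H_{A1}: \mu_D \neq 0 &\;\Rightarrow\; \text{reject } H_0 \text{ if } |t| > t_{\alpha/2,\, n-1} \\
H_{A2}: \mu_D > 0    &\;\Rightarrow\; \text{reject } H_0 \text{ if } t > t_{\alpha,\, n-1} \\
H_{A3}: \mu_D < 0    &\;\Rightarrow\; \text{reject } H_0 \text{ if } t < -t_{\alpha,\, n-1}
\end{aligned}
```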

There are 4 important assumptions in performing a paired t-test, as listed below:

  1. The dependent variable must be continuous
  2. The pairs of observations are independent of one another
  3. The differences between paired values must be approximately normally distributed
  4. The differences between paired values must not contain outliers
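The outlier assumption can be screened with a few lines of Python. The sketch below applies the common 1.5 × IQR rule to the Difference column of the example table. The choice of rule and the variable names are assumptions made here for illustration, and the check covers assumption 4 only.

```python
import statistics

# Difference column (After - Before, in percentage points) from the example table.
differences = [0.15, 0.17, 0.11, 0.11, 0.06, 0.04, 0.04, 0.08, 0.07, 0.08]

# Quartiles of the differences (statistics.quantiles returns Q1, median, Q3 for n=4).
q1, _median, q3 = statistics.quantiles(differences, n=4)
iqr = q3 - q1

# Assumption: values beyond 1.5 * IQR from the quartiles are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [d for d in differences if d < lower_fence or d > upper_fence]
print(f"fences = ({lower_fence:.4f}, {upper_fence:.4f}), outliers = {outliers}")
```

An empty list means no difference falls outside the fences. Normality would need a separate check, such as a Q-Q plot or a formal test.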

We reject the null hypothesis at the 5% significance level (α = 0.05) and conclude that the conversion rate is significantly higher after the implementation of the campaign; the observed improvement is unlikely to be due to chance alone.
