Mastering Statistical Correlation: A Complete Guide with 10+ Solved Problems

1. What is Correlation?

Correlation is a statistical technique that measures the degree and direction of the relationship between two or more variables.

Simple Definition
Correlation tells us: Do two things move together? Example: When temperature rises, do ice cream sales also rise? If YES → They are correlated. If NO → They are not correlated.

Key Point: Correlation does NOT mean Causation. Just because two variables move together doesn’t mean one causes the other.

2. Types of Correlation

A. Based on Direction

| Type | Meaning | Example |
|---|---|---|
| Positive Correlation | Both variables move in the SAME direction | Height ↑ Weight ↑ |
| Negative Correlation | Variables move in OPPOSITE directions | Price ↑ Demand ↓ |
| Zero Correlation | NO relationship between variables | Shoe size & IQ |

B. Based on Number of Variables

| Type | Meaning | Example |
|---|---|---|
| Simple | 2 variables only | Price vs Demand |
| Multiple | 3 or more variables | Crop yield vs Rain, Fertilizer |
| Partial | 2 variables, effect of others removed | Income vs Savings (removing age) |

C. Based on Ratio of Change

| Type | Meaning | Graph Shape |
|---|---|---|
| Linear | Constant ratio of change | Straight line on graph |
| Non-Linear (Curvilinear) | Changing ratio | Curve on graph |

3. Methods of Studying Correlation

Overview of Methods
  1. Scatter Diagram Method: Visual/Graphical
  2. Karl Pearson’s Coefficient (r): Mathematical (quantitative data)
  3. Spearman’s Rank Correlation (ρ): Mathematical (ranked/qualitative data)

4. Scatter Diagram Method

A scatter diagram is a graph where each pair of values (X, Y) is plotted as a dot. The pattern of dots shows the type and strength of correlation.

| Pattern | Type | r value |
|---|---|---|
| Dots rise left to right | Positive Correlation | +ve |
| Dots fall left to right | Negative Correlation | –ve |
| Dots scattered randomly | No Correlation | 0 |
| Dots close to a line | High Correlation | Close to ±1 |
| Dots spread widely | Low Correlation | Near 0 |

Advantage: Quick visual understanding. Limitation: Cannot give exact numerical value of correlation.

5. Method 1: Karl Pearson’s Coefficient (r)

This is the most widely used method to calculate the exact value of correlation between two quantitative variables.

Formula
r = Σ(X–X̄)(Y–Ȳ) / √[Σ(X–X̄)² × Σ(Y–Ȳ)²]

or, equivalently:

r = [ΣXY – n·X̄·Ȳ] / √[(ΣX² – nX̄²)(ΣY² – nȲ²)]

Where:

  • X̄ = Mean of X values
  • Ȳ = Mean of Y values
  • n = Number of pairs of observations
  • r always lies between –1 and +1
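The two forms above are the same formula written differently. Coding both and comparing them makes this concrete. The Python sketch below uses the data from Solved Problem 1. The function names `r_deviation_form` and `r_raw_sum_form` are illustrative only and do not come from any library.

```python
from math import sqrt

def r_deviation_form(x, y):
    """r = Σ(X–X̄)(Y–Ȳ) / √[Σ(X–X̄)² × Σ(Y–Ȳ)²]"""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_raw_sum_form(x, y):
    """r = [ΣXY – n·X̄·Ȳ] / √[(ΣX² – nX̄²)(ΣY² – nȲ²)]"""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - n * x_bar * y_bar
    denominator = sqrt((sum_x2 - n * x_bar ** 2) * (sum_y2 - n * y_bar ** 2))
    return numerator / denominator

# Data from Solved Problem 1
x = [10, 20, 30, 40, 50]
y = [15, 25, 35, 40, 50]
print(f"{r_deviation_form(x, y):.4f}")  # 0.9948
print(f"{r_raw_sum_form(x, y):.4f}")    # 0.9948
```

Both forms give the same r. On a computer, the deviation form is usually the safer choice. With large values, the raw-sum form subtracts two big, nearly equal numbers and can lose precision.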

Properties of r:

  1. r = +1 → Perfect positive correlation
  2. r = –1 → Perfect negative correlation
  3. r = 0 → No correlation
  4. r is unit-free (no units like kg, cm, etc.)
  5. r is not affected by change of origin or scale (if a variable is multiplied by a negative number, only the sign of r changes)
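Properties 4 and 5 can be checked numerically. This sketch uses the advertising/sales data from Solved Problem 3, which is recorded in ₹1000; the helper `pearson_r` is a hypothetical name. Three transformations are applied:

  • Converting X to plain rupees changes its scale.
  • Subtracting 70 from Y shifts its origin.
  • Negating Y is a negative scale factor, which keeps the size of r but flips its sign.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's r in the deviation form."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Solved Problem 3: advertising (X) and sales (Y), both in ₹1000
x = [5, 10, 15, 20, 25, 30]
y = [40, 50, 60, 80, 90, 100]

x_rupees = [1000 * a for a in x]   # change of scale (₹1000 -> ₹)
y_shifted = [b - 70 for b in y]    # change of origin
y_negated = [-b for b in y]        # negative scale factor

print(f"{pearson_r(x, y):.4f}")          # 0.9939
print(f"{pearson_r(x_rupees, y):.4f}")   # 0.9939 (units do not matter)
print(f"{pearson_r(x, y_shifted):.4f}")  # 0.9939 (origin does not matter)
print(f"{pearson_r(x, y_negated):.4f}")  # -0.9939 (same size, sign flips)
```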

6. Method 2: Spearman’s Rank Correlation (ρ)

Used when data is in the form of ranks (ordinal data) or when exact measurement is not possible.

Formula (Without Repeated Ranks)
ρ = 1 – [6ΣD² / n(n²–1)]
Formula (With Repeated Ranks)
ρ = 1 – [6(ΣD² + ΣCF) / n(n²–1)]
Correction Factor: CF = (m³–m)/12 for each group of tied ranks, where m is the number of items sharing that rank. Add the CF of every tie group, in either series, to ΣD².

Where:

  • D = Difference between ranks (R₁ – R₂)
  • n = Number of pairs
  • ρ lies between –1 and +1
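Both Spearman formulas fit in one small function, because with no ties the correction term is simply zero. In this Python sketch, `spearman_rho` and its `tie_sizes` parameter are illustrative names; each entry of `tie_sizes` is the m of one tie group. The ranks come from the table in Solved Problem 5, where two students share rank 3.5.

```python
def spearman_rho(r1, r2, tie_sizes=()):
    """ρ = 1 – 6(ΣD² + ΣCF) / n(n² – 1), with CF = (m³ – m)/12 per tie group."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))  # ΣD²
    cf = sum((m ** 3 - m) / 12 for m in tie_sizes)      # ΣCF (0 when no ties)
    return 1 - 6 * (sum_d2 + cf) / (n * (n ** 2 - 1))

# Ranks from Solved Problem 5 (students P to V); rank 3.5 is shared by two students
r1 = [2, 3.5, 1, 6, 3.5, 7, 5]
r2 = [3, 2, 1, 4, 5, 7, 6]
print(f"{spearman_rho(r1, r2, tie_sizes=[2]):.4f}")  # 0.8036 (≈ 0.804)
print(f"{spearman_rho(r1, r2):.4f}")                 # 0.8125 (correction left out)
```

Dropping the correction term gives a slightly larger ρ here: 0.8125 instead of about 0.804.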

7. Solved Problems

Solved Problem 1 — Karl Pearson’s Method
Find the Karl Pearson’s coefficient of correlation for the following data:
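| X | Y | XY | X² | Y² |
|---|---|---|---|---|
| 10 | 15 | 150 | 100 | 225 |
| 20 | 25 | 500 | 400 | 625 |
| 30 | 35 | 1050 | 900 | 1225 |
| 40 | 40 | 1600 | 1600 | 1600 |
| 50 | 50 | 2500 | 2500 | 2500 |
| ΣX = 150 | ΣY = 165 | ΣXY = 5800 | ΣX² = 5500 | ΣY² = 6175 |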

n = 5, X̄ = 150/5 = 30, Ȳ = 165/5 = 33

r = [5800 – 5×30×33] / √[(5500 – 5×900)(6175 – 5×1089)]

r = [5800 – 4950] / √[(5500–4500)(6175–5445)]

r = 850 / √[1000 × 730] = 850 / √730000 = 850 / 854.4 = 0.9948

Answer: r = 0.995 (approx.) → Very high positive correlation
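The column totals and the substitution can be reproduced in a few lines. This is a checking sketch, with variable names chosen to mirror the table headings:

```python
from math import sqrt

X = [10, 20, 30, 40, 50]
Y = [15, 25, 35, 40, 50]
n = len(X)

# Column totals of the working table
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 150 165 5800 5500 6175

x_bar, y_bar = sum_x / n, sum_y / n           # 30.0, 33.0
numerator = sum_xy - n * x_bar * y_bar        # 5800 – 4950 = 850
denominator = sqrt((sum_x2 - n * x_bar ** 2) * (sum_y2 - n * y_bar ** 2))  # √(1000 × 730)
r = numerator / denominator
print(round(r, 3))  # 0.995
```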

Solved Problem 2 — Spearman’s Rank Method
Two judges ranked 8 contestants. Find the rank correlation coefficient.
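| R₁ (Judge A) | R₂ (Judge B) | D = R₁ – R₂ | D² |
|---|---|---|---|
| 1 | 2 | -1 | 1 |
| 2 | 1 | 1 | 1 |
| 3 | 4 | -1 | 1 |
| 4 | 3 | 1 | 1 |
| 5 | 6 | -1 | 1 |
| 6 | 5 | 1 | 1 |
| 7 | 8 | -1 | 1 |
| 8 | 7 | 1 | 1 |
| | | ΣD² | 8 |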

ρ = 1 – [6 × 8 / 8(64–1)] = 1 – 48/504 = 1 – 0.0952 = 0.905

Answer: ρ = 0.905 → High positive correlation
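This sketch recomputes D, ΣD² and ρ from the two judges' ranks, entered in contestant order. It also allows a useful side check: ΣD is always 0 when both series rank the same n items, so a non-zero ΣD signals a ranking mistake.

```python
judge_a = [1, 2, 3, 4, 5, 6, 7, 8]
judge_b = [2, 1, 4, 3, 6, 5, 8, 7]
n = len(judge_a)

d = [a - b for a, b in zip(judge_a, judge_b)]  # D = R₁ – R₂
sum_d2 = sum(di ** 2 for di in d)              # ΣD²
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))      # 1 – 48/504

print(d)              # [-1, 1, -1, 1, -1, 1, -1, 1]
print(sum_d2)         # 8
print(round(rho, 3))  # 0.905
```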

Solved Problem 3 — Karl Pearson (Larger Dataset)
Calculate r for the following data of advertising expenditure (X in ₹1000) and sales (Y in ₹1000):
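| X | Y | XY | X² | Y² |
|---|---|---|---|---|
| 5 | 40 | 200 | 25 | 1600 |
| 10 | 50 | 500 | 100 | 2500 |
| 15 | 60 | 900 | 225 | 3600 |
| 20 | 80 | 1600 | 400 | 6400 |
| 25 | 90 | 2250 | 625 | 8100 |
| 30 | 100 | 3000 | 900 | 10000 |
| Σ = 105 | Σ = 420 | Σ = 8450 | Σ = 2275 | Σ = 32200 |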

n = 6, X̄ = 17.5, Ȳ = 70

r = [8450 – 6×17.5×70] / √[(2275 – 6×306.25)(32200 – 6×4900)]

r = [8450 – 7350] / √[(2275–1837.5)(32200–29400)]

r = 1100 / √[437.5 × 2800] = 1100 / √1225000 = 1100 / 1106.8 = 0.9939

Answer: r = 0.994 → Very strong positive correlation
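Here are the same steps in Python, as a sketch for checking the hand calculation. The names `sxy`, `sxx` and `syy` are illustrative: `sxy` is the numerator, and `sxx` and `syy` are the two bracketed terms under the square root.

```python
from math import sqrt

# Solved Problem 3: advertising X and sales Y (both in ₹1000)
X = [5, 10, 15, 20, 25, 30]
Y = [40, 50, 60, 80, 90, 100]
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n                        # 17.5, 70.0
sxy = sum(x * y for x, y in zip(X, Y)) - n * x_bar * y_bar   # 8450 – 7350 = 1100
sxx = sum(x * x for x in X) - n * x_bar ** 2                 # 2275 – 1837.5 = 437.5
syy = sum(y * y for y in Y) - n * y_bar ** 2                 # 32200 – 29400 = 2800
r = sxy / sqrt(sxx * syy)
print(round(r, 3))  # 0.994
```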

Solved Problem 4 — Karl Pearson (Negative Correlation)
Price (X) and Demand (Y) for a commodity. Find r.
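| X (Price) | Y (Demand) | XY | X² | Y² |
|---|---|---|---|---|
| 2 | 50 | 100 | 4 | 2500 |
| 4 | 40 | 160 | 16 | 1600 |
| 6 | 35 | 210 | 36 | 1225 |
| 8 | 25 | 200 | 64 | 625 |
| 10 | 15 | 150 | 100 | 225 |
| Σ = 30 | Σ = 165 | Σ = 820 | Σ = 220 | Σ = 6175 |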

n = 5, X̄ = 6, Ȳ = 33

r = [820 – 5×6×33] / √[(220 – 5×36)(6175 – 5×1089)]

r = [820 – 990] / √[(220–180)(6175–5445)] = –170 / √[40×730]

r = –170 / √29200 = –170 / 170.88 = –0.9948

Answer: r = –0.995 → Very strong negative correlation
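The denominator is a square root and therefore never negative. The sign of r therefore comes entirely from the numerator ΣXY – n·X̄·Ȳ. This sketch reproduces the working:

```python
from math import sqrt

price = [2, 4, 6, 8, 10]
demand = [50, 40, 35, 25, 15]
n = len(price)

x_bar, y_bar = sum(price) / n, sum(demand) / n  # 6.0, 33.0
numerator = sum(x * y for x, y in zip(price, demand)) - n * x_bar * y_bar  # 820 – 990
denominator = sqrt((sum(x * x for x in price) - n * x_bar ** 2)
                   * (sum(y * y for y in demand) - n * y_bar ** 2))       # √(40 × 730)
r = numerator / denominator
print(numerator)    # -170.0
print(round(r, 3))  # -0.995
```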

Solved Problem 5 — Spearman’s (With Repeated Ranks)
Marks of 7 students in two subjects. Some marks are equal. Find ρ.
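| Student | Marks A | Rank R₁ | Marks B | Rank R₂ | D | D² |
|---|---|---|---|---|---|---|
| P | 68 | 2 | 65 | 3 | -1 | 1 |
| Q | 64 | 3.5 | 68 | 2 | 1.5 | 2.25 |
| R | 75 | 1 | 70 | 1 | 0 | 0 |
| S | 55 | 6 | 60 | 4 | 2 | 4 |
| T | 64 | 3.5 | 58 | 5 | -1.5 | 2.25 |
| U | 50 | 7 | 45 | 7 | 0 | 0 |
| V | 60 | 5 | 50 | 6 | -1 | 1 |
| | | | | | ΣD² | 10.5 |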

Repeated rank: 64 appears twice in A → Ranks 3 & 4 → Average = 3.5, m = 2

CF = (2³–2)/12 = 6/12 = 0.5

ρ = 1 – [6(10.5 + 0.5) / 7(49–1)] = 1 – [66/336] = 1 – 0.1964 = 0.804

Answer: ρ = 0.804 → High positive correlation
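The tricky step in this problem is assigning average ranks to equal marks. The sketch below uses a hypothetical helper, `average_ranks`, that works as follows:

  • It gives rank 1 to the highest mark.
  • Tied marks share the average of the positions they occupy.
  • Tie groups in both subjects are then counted to build the correction factor.

```python
from collections import Counter

def average_ranks(marks):
    """Rank 1 = highest mark; tied marks share the average of their positions."""
    ordered = sorted(marks, reverse=True)
    counts = Counter(marks)
    first_pos = {}
    for pos, m in enumerate(ordered, start=1):
        first_pos.setdefault(m, pos)  # first position where this mark appears
    # positions first..first+count-1 average to first + (count – 1)/2
    return [first_pos[m] + (counts[m] - 1) / 2 for m in marks]

marks_a = [68, 64, 75, 55, 64, 50, 60]  # students P to V
marks_b = [65, 68, 70, 60, 58, 45, 50]

r1, r2 = average_ranks(marks_a), average_ranks(marks_b)
print(r1)  # [2.0, 3.5, 1.0, 6.0, 3.5, 7.0, 5.0]
print(r2)  # [3.0, 2.0, 1.0, 4.0, 5.0, 7.0, 6.0]

n = len(marks_a)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))  # 10.5
cf = sum((m ** 3 - m) / 12                          # 0.5 (one pair of tied 64s)
         for data in (marks_a, marks_b)
         for m in Counter(data).values() if m > 1)
rho = 1 - 6 * (sum_d2 + cf) / (n * (n ** 2 - 1))
print(round(rho, 3))  # 0.804
```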

Solved Problem 6 — Karl Pearson (Step Deviation Method)
Use step deviation to find r when: X: 100, 200, 300, 400, 500 and Y: 30, 50, 60, 80, 100

Take A = 300 (assumed mean for X), B = 60 (assumed mean for Y)

Let dX = (X–300)/100, dY = (Y–60)/10

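| X | Y | dX | dY | dX×dY | dX² | dY² |
|---|---|---|---|---|---|---|
| 100 | 30 | -2 | -3 | 6 | 4 | 9 |
| 200 | 50 | -1 | -1 | 1 | 1 | 1 |
| 300 | 60 | 0 | 0 | 0 | 0 | 0 |
| 400 | 80 | 1 | 2 | 2 | 1 | 4 |
| 500 | 100 | 2 | 4 | 8 | 4 | 16 |
| Σ | | 0 | 2 | 17 | 10 | 30 |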

d̄X = ΣdX / n = 0/5 = 0, d̄Y = ΣdY / n = 2/5 = 0.4

r = [ΣdXdY – n·d̄X·d̄Y] / √[(ΣdX² – nd̄X²)(ΣdY² – nd̄Y²)]

r = [17 – 5×0×0.4] / √[(10 – 0)(30 – 5×0.16)] = [17–0] / √[10 × 29.2]

r = 17 / √292 = 17 / 17.088 = 0.9948

Answer: r = 0.995 → Very strong positive correlation

Note: Step deviation simplifies calculations but gives the SAME r value!
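The note can be verified directly. This sketch computes r twice: once from the step deviations dX = (X – 300)/100 and dY = (Y – 60)/10, and once from the original values using the deviation form of Karl Pearson's formula.

```python
from math import sqrt

X = [100, 200, 300, 400, 500]
Y = [30, 50, 60, 80, 100]
n = len(X)

dx = [(x - 300) / 100 for x in X]  # assumed mean A = 300, scale factor 100
dy = [(y - 60) / 10 for y in Y]    # assumed mean B = 60, scale factor 10

mean_dx, mean_dy = sum(dx) / n, sum(dy) / n                       # 0.0, 0.4
num = sum(a * b for a, b in zip(dx, dy)) - n * mean_dx * mean_dy  # 17
den = sqrt((sum(a * a for a in dx) - n * mean_dx ** 2)
           * (sum(b * b for b in dy) - n * mean_dy ** 2))         # √(10 × 29.2)
r_step = num / den

# Same coefficient from the original (unscaled) values, deviation form
x_bar, y_bar = sum(X) / n, sum(Y) / n
r_direct = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
            / sqrt(sum((x - x_bar) ** 2 for x in X) * sum((y - y_bar) ** 2 for y in Y)))

print(round(r_step, 4), round(r_direct, 4))  # 0.9948 0.9948
```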

Solved Problem 7 — Spearman’s (Ranks Given Directly)
10 students ranked by two teachers. Find ρ.
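| Student | R₁ | R₂ | D | D² |
|---|---|---|---|---|
| A | 1 | 3 | -2 | 4 |
| B | 2 | 1 | 1 | 1 |
| C | 3 | 5 | -2 | 4 |
| D | 4 | 2 | 2 | 4 |
| E | 5 | 4 | 1 | 1 |
| F | 6 | 8 | -2 | 4 |
| G | 7 | 6 | 1 | 1 |
| H | 8 | 7 | 1 | 1 |
| I | 9 | 10 | -1 | 1 |
| J | 10 | 9 | 1 | 1 |
| | | | ΣD² | 22 |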

ρ = 1 – [6×22 / 10(100–1)] = 1 – 132/990 = 1 – 0.1333 = 0.867

Answer: ρ = 0.867 → High positive agreement
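A short sketch for this problem follows. It also checks that R₂ uses each rank from 1 to 10 exactly once. That confirms there are no tied ranks, so no correction factor is needed.

```python
teacher_1 = list(range(1, 11))              # students A to J
teacher_2 = [3, 1, 5, 2, 4, 8, 6, 7, 10, 9]
n = len(teacher_1)

assert sorted(teacher_2) == teacher_1       # every rank used once: no ties

sum_d2 = sum((a - b) ** 2 for a, b in zip(teacher_1, teacher_2))  # ΣD²
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))                        # 1 – 132/990
print(sum_d2, round(rho, 3))  # 22 0.867
```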

Solved Problem 8 — Perfect Positive Correlation
X: 1, 2, 3, 4, 5 and Y: 2, 4, 6, 8, 10. Find r.
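| X | Y | XY | X² | Y² |
|---|---|---|---|---|
| 1 | 2 | 2 | 1 | 4 |
| 2 | 4 | 8 | 4 | 16 |
| 3 | 6 | 18 | 9 | 36 |
| 4 | 8 | 32 | 16 | 64 |
| 5 | 10 | 50 | 25 | 100 |
| Σ = 15 | Σ = 30 | Σ = 110 | Σ = 55 | Σ = 220 |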

r = [110 – 5×3×6] / √[(55–45)(220–180)] = 20/√[10×40] = 20/20 = 1.00

✅ Answer: r = +1.00 → Perfect positive correlation (Y = 2X)
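Whenever Y is an exact straight-line function of X, r is +1 for a rising line and –1 for a falling line. In the sketch below, `pearson_r` is an illustrative helper. The second line, Y = 7 – 3X, is an arbitrary example chosen only to show the negative case.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's r in the deviation form."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - x_bar) ** 2 for a in x) * sum((b - y_bar) ** 2 for b in y))

X = [1, 2, 3, 4, 5]
print(pearson_r(X, [2 * x for x in X]))      # 1.0  (Y = 2X, this problem)
print(pearson_r(X, [7 - 3 * x for x in X]))  # -1.0 (illustrative falling line)
```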

Solved Problem 9 — Karl Pearson (Real-World: Study Hours vs Marks)
Hours studied (X) and Marks obtained (Y). Find r.
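| X (Hours) | Y (Marks) | XY | X² | Y² |
|---|---|---|---|---|
| 2 | 40 | 80 | 4 | 1600 |
| 3 | 50 | 150 | 9 | 2500 |
| 5 | 65 | 325 | 25 | 4225 |
| 7 | 80 | 560 | 49 | 6400 |
| 8 | 85 | 680 | 64 | 7225 |
| 10 | 95 | 950 | 100 | 9025 |
| Σ = 35 | Σ = 415 | Σ = 2745 | Σ = 251 | Σ = 30975 |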

n = 6, X̄ = 35/6 = 5.833, Ȳ = 415/6 = 69.167

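r = [2745 – 6 × (35/6) × (415/6)] / √[(251 – 6 × (35/6)²)(30975 – 6 × (415/6)²)]

r = [2745 – 2420.83] / √[(251 – 204.17)(30975 – 28704.17)]

r = 324.17 / √[46.833 × 2270.833] ≈ 324.17 / √106350 ≈ 324.17 / 326.11 ≈ 0.994

(Keep X̄ and Ȳ as fractions until the end: rounding Ȳ to 69.167 before squaring it shifts the last bracket by about 0.3.)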

Answer: r = 0.994 → Very strong positive correlation: students who studied longer tended to score higher marks.
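Multiplying the numerator and the denominator of the section 5 formula by n gives an equivalent form. When the data are whole numbers, every intermediate value in this form is also a whole number:

r = [nΣXY – ΣX·ΣY] / √[(nΣX² – (ΣX)²)(nΣY² – (ΣY)²)]

Here is a sketch applying it to this problem:

```python
from math import sqrt

hours = [2, 3, 5, 7, 8, 10]
marks = [40, 50, 65, 80, 85, 95]
n = len(hours)

sx, sy = sum(hours), sum(marks)                 # 35, 415
sxy = sum(x * y for x, y in zip(hours, marks))  # 2745
sx2 = sum(x * x for x in hours)                 # 251
sy2 = sum(y * y for y in marks)                 # 30975

numerator = n * sxy - sx * sy                   # 16470 – 14525 = 1945
denominator = sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))  # √(281 × 13625)
r = numerator / denominator
print(numerator, f"{r:.4f}")  # 1945 0.9940
```

Because no mean is rounded along the way, 1945 / √3828625 ≈ 0.9940 agrees with the answer above.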

Solved Problem 10 — Spearman’s (Negative Correlation)
Ranks of 6 items by Quality (R₁) and Price (R₂). Find ρ.
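| Item | R₁ (Quality) | R₂ (Price) | D | D² |
|---|---|---|---|---|
| A | 1 | 6 | -5 | 25 |
| B | 2 | 5 | -3 | 9 |
| C | 3 | 4 | -1 | 1 |
| D | 4 | 3 | 1 | 1 |
| E | 5 | 2 | 3 | 9 |
| F | 6 | 1 | 5 | 25 |
| | | | ΣD² | 70 |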

ρ = 1 – [6×70 / 6(36–1)] = 1 – 420/210 = 1 – 2 = –1.00

✅ Answer: ρ = –1.00 → Perfect negative correlation (ranks are exactly reversed!)
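For any complete reversal of n ranks, ΣD² = n(n² – 1)/3. Substituting gives ρ = 1 – 6 × [n(n² – 1)/3] / n(n² – 1) = 1 – 2 = –1. The sketch below checks this for the six items:

```python
quality = [1, 2, 3, 4, 5, 6]  # items A to F
price = quality[::-1]         # [6, 5, 4, 3, 2, 1]: exactly reversed
n = len(quality)

sum_d2 = sum((a - b) ** 2 for a, b in zip(quality, price))  # 70
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))                   # 1 – 420/210

# For a complete reversal, ΣD² = n(n² – 1)/3, so ρ is always exactly –1
print(sum_d2, n * (n ** 2 - 1) // 3, rho)  # 70 70 -1.0
```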

8. Interpretation of r and ρ Values

| Value of r/ρ | Degree | Meaning |
|---|---|---|
| +1.00 | Perfect Positive | Variables move exactly together |
| +0.75 to +0.99 | High Positive | Strong direct relationship |
| +0.50 to +0.74 | Moderate Positive | Fair direct relationship |
| +0.25 to +0.49 | Low Positive | Weak direct relationship |
| 0 | No Correlation | No linear relationship |
| –0.25 to –0.49 | Low Negative | Weak inverse relationship |
| –0.50 to –0.74 | Moderate Negative | Fair inverse relationship |
| –0.75 to –0.99 | High Negative | Strong inverse relationship |
| –1.00 | Perfect Negative | Variables move exactly opposite |
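The table can be turned into a small lookup function. The function name `describe` is illustrative, and the sketch rests on two assumptions:

  • Each band runs from its lower threshold up to the next one, so 0.745 counts as Moderate.
  • The table names no band for values strictly between 0 and ±0.25, so the sketch labels those "Negligible".

The sample values are answers from the solved problems above.

```python
def describe(r):
    """Map r (or ρ) to the verbal degree used in the interpretation table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1; recheck the calculation")
    if r == 0:
        return "No correlation"
    direction = "Positive" if r > 0 else "Negative"
    size = abs(r)
    if size == 1:
        degree = "Perfect"
    elif size >= 0.75:
        degree = "High"
    elif size >= 0.50:
        degree = "Moderate"
    elif size >= 0.25:
        degree = "Low"
    else:
        degree = "Negligible"  # assumption: band not named in the table
    return f"{degree} {direction}"

for value in (0.995, 0.804, -0.995, -1.0, 0.0):
    print(value, describe(value))
```

The checks `r == 0` and `size == 1` are exact. A computed value such as 0.9999999 is therefore reported as High, not Perfect, unless it is rounded first.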
Important Warning
Correlation ≠ Causation! Example: Ice cream sales and drowning deaths are positively correlated. Does ice cream cause drowning? NO! Both increase in summer (hidden variable: temperature). Always look for the LURKING VARIABLE before concluding causation.

9. Which Method to Use When?

| Situation | Best Method | Symbol |
|---|---|---|
| Data is quantitative (numbers) | Karl Pearson | r |
| Data is already ranked | Spearman | ρ |
| Data is qualitative (beauty, intelligence) | Spearman | ρ |
| Quick visual check needed | Scatter Diagram | Visual |
| Small sample + ordinal data | Spearman | ρ |
| Large sample + continuous data | Karl Pearson | r |
| Data has extreme outliers | Spearman (more robust) | ρ |
| Exact numerical value needed | Karl Pearson/Spearman | r or ρ |

10. Practice Problems (Try Yourself!)

Practice 1
Find Karl Pearson’s r: X: 12, 14, 16, 18, 20, 22 and Y: 25, 30, 28, 35, 40, 38. Hint: n = 6. First find ΣX, ΣY, ΣXY, ΣX², ΣY².
Practice 2
Find Spearman’s ρ for ranks: R₁: 1, 2, 3, 4, 5, 6, 7 and R₂: 7, 5, 3, 4, 1, 2, 6. Hint: Compute D = R₁ – R₂, then ΣD².
Practice 3
X: 5, 8, 12, 15, 18, 22, 25 and Y: 110, 95, 80, 70, 55, 40, 25. Find r and interpret. Hint: Expect a negative value!
Practice 4
Two interviewers rank 8 candidates. Some candidates score equal marks. Find ρ with correction factor. Marks A: 75, 80, 68, 80, 90, 65, 72, 85 | Marks B: 70, 85, 65, 75, 95, 60, 70, 80

11. Exam Tips

Top Exam Tips
  • Always check: –1 ≤ r ≤ +1. If your answer is outside this range, recheck!
  • Write the formula first, then substitute values; it earns marks even if the calculation goes wrong.
  • Make a table for calculations; it keeps your work organized and reduces errors.
  • Don’t forget the correction factor for repeated ranks in Spearman’s!
  • Always state the interpretation: “high positive”, “moderate negative”, etc.
  • Remember: r is unit-free. Don’t write units in your answer.
  • For the step deviation method: r doesn’t change with change of origin/scale.
  • If a question asks you to “comment on the relationship”, always mention direction + strength.
  • Practice at least 3 problems of each type before exams.
  • In MCQs: if all ranks are reversed, ρ = –1. If ranks are identical, ρ = +1.

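12. Formula Summary Box 📋

  • Karl Pearson: r = [ΣXY – n·X̄·Ȳ] / √[(ΣX² – nX̄²)(ΣY² – nȲ²)]
  • Spearman (no repeated ranks): ρ = 1 – [6ΣD² / n(n²–1)]
  • Spearman (repeated ranks): ρ = 1 – [6(ΣD² + ΣCF) / n(n²–1)], where CF = (m³–m)/12 for each tie group
  • Range: –1 ≤ r ≤ +1 and –1 ≤ ρ ≤ +1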
