⏱ 40–45 min · 👥 Groups of 3–4 · SL & HL · 🖊 Work on your whiteboard
Phase 1: What Do You See?
4.4 · Scatter diagrams · Correlation
Task 1a
Describe and rank the six scatter plots
Six soccer club datasets are shown below. For each plot, describe the relationship in words — direction, strength, pattern. Then rank all six from the strongest relationship to the weakest.
[Six scatter plots, labelled A to F]
Plot A: Training hrs vs Sprint speed
Plot B: Possession % vs Goals
Plot C: Fouls vs Yellow cards
Plot D: Age vs Transfer value
Plot E: Shirt number vs Goals
Plot F: Goals conceded vs Rank
Task 1b
Invent a measuring number
If you had to create a single number to measure the strength of a relationship between two variables, what properties would it need to have? Write your answer on the board.
Task 1c
Find the trap
One of the six plots would give a misleading result for any linear measuring number. Which one, and why?
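One way a linear measure can mislead is a strong but curved pattern. The sketch below uses invented data (not read from any of the six plots) that follows a perfect U-shape, and computes a Pearson-style correlation with a hypothetical helper `pearson_r`. It is only an illustration of the idea, not the answer to the task.

```python
import math

def pearson_r(xs, ys):
    """Correlation from deviations about the means (illustrative helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Invented data: y is fully determined by x, but along a symmetric U-shape.
xs = list(range(11))              # 0, 1, ..., 10
ys = [(x - 5) ** 2 for x in xs]   # falls, then rises

r_curved = pearson_r(xs, ys)
print(r_curved)  # prints 0.0: no linear trend at all
```

A value of zero here does not mean "no relationship"; it means "no linear relationship", which is why it helps to look at the plot before trusting the number.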
Phase 2: Someone Already Invented It
4.4 · Pearson's r · Regression line of y on x
📋 Dataset — Youth Academy Sprint Training (n = 12)
x (hrs/wk) | 4 | 5 | 6 | 6 | 7 | 8 | 8 | 9 | 10 | 11 | 12 | 13
y (m/s) | 6.1 | 6.4 | 6.3 | 6.8 | 7.0 | 7.2 | 6.9 | 7.5 | 7.4 | 7.8 | 7.9 | 8.1
Task 2a
Calculate Pearson's r by hand
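The measuring number you were asked to invent in Task 1b already exists. It is called Pearson's product-moment correlation coefficient, denoted \(r\), and it is calculated as:
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} \]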
Split the 12 players across your group members. Each person computes their rows, then reconvene to sum the columns and calculate \(r\). Check: does your answer have the properties your group listed in Task 1b?
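The row-by-row method can be sketched in code as a check on the hand calculation. This is a minimal sketch that assumes the deviation-from-the-mean form of Pearson's \(r\); the variable names (`rows`, `s_xy`, `s_xx`, `s_yy`) are illustrative and not part of the worksheet.

```python
import math

# Data copied from the Youth Academy Sprint Training table (n = 12).
x = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13]                        # hrs/wk
y = [6.1, 6.4, 6.3, 6.8, 7.0, 7.2, 6.9, 7.5, 7.4, 7.8, 7.9, 8.1]   # m/s

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# One row per player: the three columns a group would fill in by hand.
rows = [
    ((xi - x_bar) * (yi - y_bar), (xi - x_bar) ** 2, (yi - y_bar) ** 2)
    for xi, yi in zip(x, y)
]

# Column totals.
s_xy = sum(row[0] for row in rows)
s_xx = sum(row[1] for row in rows)
s_yy = sum(row[2] for row in rows)

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))
```

With this data the result should come out at roughly 0.97; a hand answer that differs in the second decimal place usually points to early rounding of \(\bar{x}\) or \(\bar{y}\).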
Task 2b
Verify with your GDC and interpret
Now let your calculator do it. Follow the steps for your model:
Casio fx-CG50: MENU → Statistics → enter x values in List 1, y values in List 2 → CALC (F2) → REG (F3) → ax+b (F1). Read off \(r\) from the output. If \(r\) is not showing, go to SET UP → Stat Wind → Manual.
TI-Nspire: Insert a Lists & Spreadsheet page → enter x in column A, y in column B → Menu → Statistics → Stat Calculations → Linear Regression (mx+b) → select columns A and B. Read off \(r\). If \(r\) is missing, go to Home → Settings → General → Diagnostics → On.
Does the GDC value of \(r\) match your hand calculation? Write one sentence interpreting \(r\) in context — what does it tell the scout about the relationship between training and sprint speed?
Task 2c
Find the mean point and the regression line of y on x
Calculate \(\bar{x}\) and \(\bar{y}\). Plot the mean point \((\bar{x}, \bar{y})\) on your scatter diagram. Draw a line of best fit by eye through this point. Then use your GDC to find the regression line of \(y\) on \(x\). Compare the GDC line to your sketch — how close were you?
The regression line must pass through the mean point. Why do you think this has to be the case?
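For comparison with the GDC output, here is a minimal sketch of the least-squares line of \(y\) on \(x\), written as \(y = a + bx\). The names `a`, `b` and `predict_y` are assumptions made for this example; your GDC may label the coefficients differently.

```python
# Data copied from the Youth Academy Sprint Training table (n = 12).
x = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13]                        # hrs/wk
y = [6.1, 6.4, 6.3, 6.8, 7.0, 7.2, 6.9, 7.5, 7.4, 7.8, 7.9, 8.1]   # m/s

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b = s_xy / s_xx          # gradient of the y-on-x line
a = y_bar - b * x_bar    # intercept, defined in terms of the mean point

def predict_y(hours):
    """Predicted sprint speed (m/s) for a given number of training hours."""
    return a + b * hours

print(f"mean point: ({x_bar:.2f}, {y_bar:.2f})")
print(f"y = {a:.3f} + {b:.3f}x")
# Substituting x_bar returns y_bar, up to floating-point rounding.
print(abs(predict_y(x_bar) - y_bar) < 1e-9)
```

Because the intercept is computed as \(a = \bar{y} - b\bar{x}\), substituting \(x = \bar{x}\) gives back \(\bar{y}\), which is one way to see why the line goes through the mean point.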
Task 2d
Make your first prediction
Use the regression line of \(y\) on \(x\) to predict the sprint speed of a player who trains 10 hours per week. Show your substitution clearly.
Task 3a
Using the regression line from Phase 2, make three predictions and classify each one. For each: state whether it is an interpolation or extrapolation, and decide whether you trust the prediction. Justify your answer on the board.
Predict sprint speed for a player training 9 hrs/wk
Predict sprint speed for a player training 12 hrs/wk
Predict sprint speed for a player training 25 hrs/wk
The data was collected for players training between 4 and 13 hours per week. Does that matter?
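The classification step can be sketched as a simple range check. This assumes the usual convention that a prediction inside the observed range of \(x\) (here 4 to 13 hours) is an interpolation and anything outside it is an extrapolation; the helper names `fit_line` and `classify` are hypothetical.

```python
# Data copied from the Youth Academy Sprint Training table (n = 12).
x = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13]                        # hrs/wk
y = [6.1, 6.4, 6.3, 6.8, 7.0, 7.2, 6.9, 7.5, 7.4, 7.8, 7.9, 8.1]   # m/s

def fit_line(xs, ys):
    """Least-squares line of ys on xs, returned as (intercept, gradient)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    gradient = s_xy / s_xx
    return y_bar - gradient * x_bar, gradient

a, b = fit_line(x, y)
x_min, x_max = min(x), max(x)

def classify(hours):
    """Label a prediction by whether hours lies inside the observed x range."""
    return "interpolation" if x_min <= hours <= x_max else "extrapolation"

results = {}
for hours in (9, 12, 25):
    results[hours] = (a + b * hours, classify(hours))
    speed, label = results[hours]
    print(f"{hours} hrs/wk -> {speed:.2f} m/s ({label})")
```

The code will happily produce a number for 25 hrs/wk; whether that number is believable for a youth player is a judgement the formula cannot make for you.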
Task 3b
The reverse question
The scout wants to recruit a player who runs at exactly 7.6 m/s. How many training hours per week should that player be doing?
Step 1: Try answering using the regression line you already have. Rearrange it to make \(x\) the subject and substitute \(y = 7.6\).
Step 2: Now find the regression line of \(x\) on \(y\) using your GDC (swap the list roles). Substitute \(y = 7.6\) directly into this new line.
Step 3: Compare both answers — are they the same? Discuss with your group why you might get different results.
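The two approaches in Steps 1 and 2 can be compared side by side. This is a sketch under the assumption that both lines are ordinary least-squares fits; `fit_line`, `x_rearranged` and `x_direct` are names invented for the example.

```python
# Data copied from the Youth Academy Sprint Training table (n = 12).
x = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13]                        # hrs/wk
y = [6.1, 6.4, 6.3, 6.8, 7.0, 7.2, 6.9, 7.5, 7.4, 7.8, 7.9, 8.1]   # m/s

def fit_line(inputs, outputs):
    """Least-squares line of outputs on inputs: (intercept, gradient)."""
    n = len(inputs)
    in_bar = sum(inputs) / n
    out_bar = sum(outputs) / n
    s_io = sum((i - in_bar) * (o - out_bar) for i, o in zip(inputs, outputs))
    s_ii = sum((i - in_bar) ** 2 for i in inputs)
    gradient = s_io / s_ii
    return out_bar - gradient * in_bar, gradient

target_speed = 7.6

# Step 1: y on x, rearranged to make x the subject.
a, b = fit_line(x, y)
x_rearranged = (target_speed - a) / b

# Step 2: x on y (list roles swapped), substitute directly.
c, d = fit_line(y, x)
x_direct = c + d * target_speed

print(f"rearranged y-on-x: {x_rearranged:.2f} hrs/wk")
print(f"x-on-y line:       {x_direct:.2f} hrs/wk")
```

The two answers are close but not equal, because each line minimises errors in a different direction: vertical distances for \(y\) on \(x\), horizontal distances for \(x\) on \(y\).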
Task 3c
Two lines — why?
There is a regression line of \(y\) on \(x\), and a separate regression line of \(x\) on \(y\). Both pass through the mean point \((\bar{x}, \bar{y})\). Use your GDC to find the \(x\) on \(y\) line, plot both lines on the same diagram, and answer:
Where do the two lines intersect?
Are they the same line? If not, what is different about them?
Which line should you use to answer Task 3b — and why?
When would the two lines be identical?
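A short derivation may help with the last question. It assumes the usual least-squares definitions; the symbols \(S_{xx}\), \(S_{yy}\), \(S_{xy}\), \(b\) and \(d\) are notation introduced here, not taken from the worksheet.

\[ S_{xy} = \sum (x - \bar{x})(y - \bar{y}), \qquad S_{xx} = \sum (x - \bar{x})^2, \qquad S_{yy} = \sum (y - \bar{y})^2 \]

\[ y \text{ on } x: \; y - \bar{y} = b\,(x - \bar{x}), \quad b = \frac{S_{xy}}{S_{xx}} \qquad\qquad x \text{ on } y: \; x - \bar{x} = d\,(y - \bar{y}), \quad d = \frac{S_{xy}}{S_{yy}} \]

\[ b\,d = \frac{S_{xy}^2}{S_{xx}\,S_{yy}} = r^2 \]

Drawn on the same axes, the \(x\) on \(y\) line has gradient \(1/d\). Both lines pass through \((\bar{x}, \bar{y})\), so they coincide only when \(b = 1/d\), that is when \(r^2 = 1\) and every data point lies exactly on a straight line.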
Phase 4: The Churros Conspiracy
4.4 · Reading data critically · Correlation vs causation
📋 Dataset — Churros Sold vs Goals Scored (20 home matches)
Match | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Churros sold (x) | 210 | 185 | 340 | 290 | 155 | 410 | 375 | 230 | 460 | 310
Goals scored (y) | 1 | 0 | 3 | 2 | 1 | 4 | 3 | 1 | 4 | 2

Match | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
Churros sold (x) | 270 | 195 | 385 | 440 | 160 | 325 | 280 | 500 | 215 | 355
Goals scored (y) | 2 | 1 | 3 | 4 | 0 | 3 | 2 | 5 | 1 | 3
Task 4a
Run the numbers
A club sponsor has looked at this data and made a claim: "At matches where we sell more churros, the team scores more goals. We should invest in selling more churros to help the team win."
Before you agree or disagree, run the full analysis: calculate \(r\), find the regression line of goals on churros, and predict the number of goals if 600 churros are sold. Write all results on the board — do not draw any conclusions yet.
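The full analysis can be sketched in a few lines as a check on the GDC output. This is a minimal sketch assuming a least-squares line of goals on churros; the variable names are illustrative, and the code reports numbers only, with no interpretation.

```python
import math

# Data copied from the Churros Sold vs Goals Scored table (20 home matches).
churros = [210, 185, 340, 290, 155, 410, 375, 230, 460, 310,
           270, 195, 385, 440, 160, 325, 280, 500, 215, 355]
goals = [1, 0, 3, 2, 1, 4, 3, 1, 4, 2,
         2, 1, 3, 4, 0, 3, 2, 5, 1, 3]

n = len(churros)
x_bar = sum(churros) / n
y_bar = sum(goals) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(churros, goals))
s_xx = sum((x - x_bar) ** 2 for x in churros)
s_yy = sum((y - y_bar) ** 2 for y in goals)

r = s_xy / math.sqrt(s_xx * s_yy)   # Pearson's r
b = s_xy / s_xx                     # gradient: goals per extra churro sold
a = y_bar - b * x_bar               # intercept

predicted_goals = a + b * 600       # 600 is above the largest observed x

print(f"r = {r:.3f}")
print(f"goals = {a:.3f} + {b:.5f} * churros")
print(f"predicted goals at 600 churros: {predicted_goals:.2f}")
print(f"observed churros range: {min(churros)} to {max(churros)}")
```

Printing the observed range alongside the prediction makes it easy to see, when you reach Task 4b, whether 600 lies inside or outside the data.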
Task 4b
Question the claim
Now look critically at your results and discuss these questions on the board:
Is your prediction for 600 churros an interpolation or an extrapolation? Does that affect how much you trust it?
Think about when matches tend to have large crowds. What kinds of matches attract more fans?
Would those same matches also tend to have more goals? Why?
So what is actually driving both the churros sales and the goals — is it the churros themselves, or something else?
If the club sells 600 churros at a quiet Tuesday league match, do you expect the prediction to hold?
Task 4c
The verdict
Write a final recommendation on the board for the scout. Your verdict must include: the value of \(r\) and what it shows, whether you trust the prediction and why, and the real reason the two variables appear to be linked. Should the sponsor invest in churros?
🔐 Show your teacher to complete the activity
🎉
Activity Complete!
You've worked through all four phases of The Soccer Scout Report. Well done — get ready for the class consolidation.