⚽ The Scenario

Background: your teacher will read this aloud

A football scout has been collecting data on youth academy players and rival clubs. Before you analyse any numbers, she needs you to look at some charts and tell her what you see. Your job: figure out how to measure relationships in data, build the tool mathematicians invented for exactly this, and call out a very suspicious claim.

Phase 1
What Do You See?
Task 1a
Describe and rank the six scatter plots.
For each plot: describe the direction, strength, and pattern. Then rank all six from strongest relationship to weakest.
Task 1b
Invent a measuring number.
If you had to create a single number to measure relationship strength, what properties would it need? Write them on the board.
Task 1c
Find the trap.
One plot would give a misleading result for any linear measure. Which one, and why?
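To see why a linear measure can mislead, here is a minimal sketch using invented data (a perfect U-shape, not one of the six plots). It applies the Pearson's r formula that appears in Phase 2; the function name `pearson_r` and the data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r built from the sums of squares S_xx, S_yy and S_xy."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)

# Invented illustration data: y depends on x exactly, but along a curve.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(f"r for the U-shaped data: {r_curve:.3f}")
```

Every point lies exactly on the curve, yet the positive and negative products cancel in S_xy, so the linear measure reports no relationship at all.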
Code to unlock Phase 2: SCATTER
Phase 2
Someone Already Invented It
Dataset: Youth Academy (n = 12)
x (hrs/wk)   4    5    6    6    7    8    8    9    10   11   12   13
y (m/s)      6.1  6.4  6.3  6.8  7.0  7.2  6.9  7.5  7.4  7.8  7.9  8.1
Task 2a
Calculate Pearson's r by hand.
\(r = S_{xy}/\sqrt{S_{xx}\cdot S_{yy}}\) where \(S_{xx}=\sum(x_i-\bar x)^2\), \(S_{yy}=\sum(y_i-\bar y)^2\), \(S_{xy}=\sum(x_i-\bar x)(y_i-\bar y)\). Split rows across group members.
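Once the hand calculation is done, the same sums can be reproduced in a few lines of Python as a cross-check. This is a minimal sketch rather than part of the task: the variable names (`hours`, `speed`, `s_xx` and so on) are illustrative choices, and the data are copied from the Youth Academy table.

```python
import math

# Youth Academy data (n = 12), copied from the dataset table.
hours = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13]                      # x, hrs/wk
speed = [6.1, 6.4, 6.3, 6.8, 7.0, 7.2, 6.9, 7.5, 7.4, 7.8, 7.9, 8.1]  # y, m/s

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(speed) / n

# Each term below corresponds to one row of the hand-calculation table.
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in speed)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, speed))

r = s_xy / math.sqrt(s_xx * s_yy)

print(f"S_xx = {s_xx:.2f}, S_yy = {s_yy:.4f}, S_xy = {s_xy:.2f}")
print(f"r = {r:.4f}")
```

Compare the three printed sums with your group's totals before comparing r; a mismatch in one sum usually traces back to a single row.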
Task 2b
Verify with GDC and interpret.
Casio: STAT → CALC → REG → ax+b. Read r. TI: Lists & Spreadsheet → Statistics → Linear Regression. Write one sentence interpreting r in context.
Task 2c
Find \(\bar x\), \(\bar y\), plot the mean point, find the regression line.
Sketch a line of best fit through the mean point, then use GDC. Why must the line pass through the mean point?
Task 2d
Predict sprint speed for 10 hrs/wk.
Substitute into the regression line. Show full working.
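As a check on Tasks 2c and 2d, the regression line of y on x can also be computed from the same sums of squares. The sketch below assumes the standard least-squares formulas (gradient = S_xy / S_xx, intercept = mean of y minus gradient times mean of x); the names `gradient`, `intercept` and `predict_speed` are illustrative, and your GDC may label the coefficients differently.

```python
# Youth Academy data (n = 12), copied from the dataset table.
hours = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13]                      # x, hrs/wk
speed = [6.1, 6.4, 6.3, 6.8, 7.0, 7.2, 6.9, 7.5, 7.4, 7.8, 7.9, 8.1]  # y, m/s

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(speed) / n
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, speed))

# Least-squares line of y on x: y = intercept + gradient * x.
gradient = s_xy / s_xx
intercept = y_bar - gradient * x_bar

def predict_speed(hrs):
    """Predicted sprint speed (m/s) for a training load in hrs/wk."""
    return intercept + gradient * hrs

print(f"mean point: ({x_bar:.2f}, {y_bar:.3f})")
print(f"line: y = {intercept:.3f} + {gradient:.4f}x")
print(f"value of the line at the mean of x: {predict_speed(x_bar):.3f}")
print(f"prediction at 10 hrs/wk: {predict_speed(10):.2f} m/s")
```

Look at how `intercept` is defined and compare the first and third printed lines: that is a strong hint for the question about the mean point.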
Code to unlock Phase 3: PEARSON
Phase 3
How Far Can You Trust It?
Task 3a
Which predictions can you trust?
For each training load, state interpolation or extrapolation and whether you trust the prediction: 9 hrs/wk · 12 hrs/wk · 25 hrs/wk
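The interpolation or extrapolation label comes down to whether the training load lies inside the range of x-values in the data. A minimal sketch of that range check follows; the function name `classify` is an illustrative assumption. The label is only the first step: whether to trust a prediction also depends on the strength of the correlation and on the context.

```python
# Observed training loads (hrs/wk) from the Youth Academy dataset.
hours = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13]

def classify(load, observed):
    """Label a prediction by whether the load lies inside the observed x-range."""
    if min(observed) <= load <= max(observed):
        return "interpolation"
    return "extrapolation"

labels = {load: classify(load, hours) for load in (9, 12, 25)}
for load, label in labels.items():
    print(f"{load} hrs/wk: {label}")
```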
Task 3b
The reverse question: find x when y = 7.6 m/s.
Step 1: rearrange the y-on-x line. Step 2: find the x-on-y line via GDC (swap list roles). Step 3: compare both answers. Are they the same?
Task 3c
Two lines: why?
Plot both regression lines on the same diagram. Where do they intersect? Are they the same line? Which should you use for Task 3b, and why? When would they be identical?
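The two approaches in Tasks 3b and 3c can be compared numerically. The sketch below assumes the standard least-squares formulas for both lines (y-on-x gradient S_xy / S_xx, x-on-y gradient S_xy / S_yy); all variable names are illustrative.

```python
# Youth Academy data (n = 12), copied from the dataset table.
hours = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13]                      # x, hrs/wk
speed = [6.1, 6.4, 6.3, 6.8, 7.0, 7.2, 6.9, 7.5, 7.4, 7.8, 7.9, 8.1]  # y, m/s

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(speed) / n
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in speed)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, speed))

# y-on-x line: y = intercept_yx + gradient_yx * x (minimises vertical distances).
gradient_yx = s_xy / s_xx
intercept_yx = y_bar - gradient_yx * x_bar

# x-on-y line: x = intercept_xy + gradient_xy * y (minimises horizontal distances).
gradient_xy = s_xy / s_yy
intercept_xy = x_bar - gradient_xy * y_bar

target = 7.6  # sprint speed in m/s

x_rearranged = (target - intercept_yx) / gradient_yx  # Step 1: rearranged y-on-x line
x_direct = intercept_xy + gradient_xy * target        # Step 2: x-on-y line used directly

r_squared = s_xy ** 2 / (s_xx * s_yy)

print(f"rearranged y-on-x: x = {x_rearranged:.2f} hrs/wk")
print(f"x-on-y:            x = {x_direct:.2f} hrs/wk")
print(f"product of gradients = {gradient_yx * gradient_xy:.4f}, r^2 = {r_squared:.4f}")
```

The last printed line puts the product of the two gradients next to r squared, which is a useful clue for the final question in Task 3c.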
Code to unlock Phase 4: PREDICT
Phase 4
The Churros Conspiracy
Dataset: Churros vs Goals (20 home matches)
Match        1    2    3    4    5    6    7    8    9    10
Churros (x)  210  185  340  290  155  410  375  230  460  310
Goals (y)    1    0    3    2    1    4    3    1    4    2

Match        11   12   13   14   15   16   17   18   19   20
Churros (x)  270  195  385  440  160  325  280  500  215  355
Goals (y)    2    1    3    4    0    3    2    5    1    3
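As a cross-check on GDC output for this dataset, the sketch below computes the three quantities Task 4a asks for: r, the regression line of goals on churros, and the predicted goals for 600 churros. It assumes the same sums-of-squares formulas used in Phase 2; the variable names are illustrative. The numbers alone say nothing about what causes what.

```python
import math

# Churros sold (x) and goals scored (y) for the 20 home matches.
churros = [210, 185, 340, 290, 155, 410, 375, 230, 460, 310,
           270, 195, 385, 440, 160, 325, 280, 500, 215, 355]
goals = [1, 0, 3, 2, 1, 4, 3, 1, 4, 2,
         2, 1, 3, 4, 0, 3, 2, 5, 1, 3]

n = len(churros)
x_bar = sum(churros) / n
y_bar = sum(goals) / n
s_xx = sum((x - x_bar) ** 2 for x in churros)
s_yy = sum((y - y_bar) ** 2 for y in goals)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(churros, goals))

r = s_xy / math.sqrt(s_xx * s_yy)

# Regression line of goals on churros: y = intercept + gradient * x.
gradient = s_xy / s_xx
intercept = y_bar - gradient * x_bar
goals_at_600 = intercept + gradient * 600

print(f"r = {r:.3f}")
print(f"line: y = {intercept:.3f} + {gradient:.5f}x")
print(f"predicted goals for 600 churros: {goals_at_600:.2f}")
print(f"largest churros count in the data: {max(churros)}")
```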
Task 4a
Run the numbers first.
"We sell more churros → team scores more goals. Invest in churros!" Find r, the regression line of goals on churros, and predict goals for 600 churros. Write results on the board; no conclusions yet.
Task 4b
Question the claim.
Is 600 churros interpolation or extrapolation? When do crowds tend to be large, and would those matches also tend to have more goals? What is actually driving both variables?
Task 4c
Write the verdict.
Include: the value of r, whether the prediction is reliable, and the real reason both variables appear linked. Should the sponsor invest in churros?
Code to complete: CHURROS