| Player | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training hrs/wk (x) | 4 | 5 | 6 | 6 | 7 | 8 | 8 | 9 | 10 | 11 | 12 | 13 |
| Sprint speed m/s (y) | 6.1 | 6.4 | 6.3 | 6.8 | 7.0 | 7.2 | 6.9 | 7.5 | 7.4 | 7.8 | 7.9 | 8.1 |
| Means | \(\bar{x} = 8.25\) hrs/wk · \(\bar{y} \approx 7.117\) m/s | |||||||||||

| Match | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Churros sold (x) | 210 | 185 | 340 | 290 | 155 | 410 | 375 | 230 | 460 | 310 |
| Goals scored (y) | 1 | 0 | 3 | 2 | 1 | 4 | 3 | 1 | 4 | 2 |

| Match | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| Churros sold (x) | 270 | 195 | 385 | 440 | 160 | 325 | 280 | 500 | 215 | 355 |
| Goals scored (y) | 2 | 1 | 3 | 4 | 0 | 3 | 2 | 5 | 1 | 3 |
| Means | \(\bar{x} = 304.5\) churros · \(\bar{y} = 2.25\) goals | |||||||||
Give code SCATTER once all groups have ranked all six plots and answered Task 1c.
Expected descriptions and rankings. Accept reasonable variation in language — the goal is that students use direction, pattern, and spread.
Expected ranking (strongest to weakest, by |r|):
Generative task — no single correct answer. Look for groups who independently arrive at any of:
Target answer: Plot D (Age vs Transfer value).
The relationship is clearly curved: transfer value rises through a player's twenties and falls sharply after about 27–30. A measure of linear association would give a value near zero, suggesting there is no relationship, which is obviously false.
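A quick way to convince sceptical groups is a toy check. The ages and values below are hypothetical (a perfect parabola peaking at 27, not the Plot D data). `pearson_r` is an illustrative helper that uses Pearson's r, which students meet later in the lesson, as the linear measure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical toy data: value rises to a peak at age 27, then falls
# symmetrically, so value is perfectly (but not linearly) determined by age.
ages = list(range(21, 34))                        # 21, 22, ..., 33
values = [100 - (age - 27) ** 2 for age in ages]

r = pearson_r(ages, values)
print(f"r = {r:.3f}")  # → r = 0.000
```

A perfect relationship still scores zero because the rising and falling halves cancel in \(S_{xy}\).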
Give code PEARSON once all groups have found the regression line and made their first prediction.
The activity introduces the formula directly on screen, so no teacher introduction is needed. Students read the definitions of \(S_{xx}\), \(S_{yy}\), and \(S_{xy}\) on the page and begin the table immediately.
Step 1 — compute the means:
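Working from the column totals of the player table (\(\sum x_i = 99\), \(\sum y_i = 85.4\), \(n = 12\)), this step can be laid out as:

\[
\bar{x} = \frac{99}{12} = 8.25 \text{ hrs/wk}, \qquad \bar{y} = \frac{85.4}{12} \approx 7.117 \text{ m/s}
\]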
Step 2 — complete the calculation table:
| \(x_i\) | \(y_i\) | \(x_i - \bar{x}\) | \(y_i - \bar{y}\) | \((x_i-\bar{x})^2\) | \((y_i-\bar{y})^2\) | \((x_i-\bar{x})(y_i-\bar{y})\) |
|---|---|---|---|---|---|---|
| 4 | 6.1 | −4.25 | −1.017 | 18.063 | 1.034 | 4.321 |
| 5 | 6.4 | −3.25 | −0.717 | 10.563 | 0.514 | 2.329 |
| 6 | 6.3 | −2.25 | −0.817 | 5.063 | 0.667 | 1.838 |
| 6 | 6.8 | −2.25 | −0.317 | 5.063 | 0.100 | 0.713 |
| 7 | 7.0 | −1.25 | −0.117 | 1.563 | 0.014 | 0.146 |
| 8 | 7.2 | −0.25 | +0.083 | 0.063 | 0.007 | −0.021 |
| 8 | 6.9 | −0.25 | −0.217 | 0.063 | 0.047 | 0.054 |
| 9 | 7.5 | +0.75 | +0.383 | 0.563 | 0.147 | 0.288 |
| 10 | 7.4 | +1.75 | +0.283 | 3.063 | 0.080 | 0.496 |
| 11 | 7.8 | +2.75 | +0.683 | 7.563 | 0.467 | 1.879 |
| 12 | 7.9 | +3.75 | +0.783 | 14.063 | 0.614 | 2.938 |
| 13 | 8.1 | +4.75 | +0.983 | 22.563 | 0.967 | 4.671 |
| Sums → | | | | \(S_{xx} = 88.25\) | \(S_{yy} = 4.657\) | \(S_{xy} = 19.65\) |
Step 3 — substitute into the formula:
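Assuming the on-screen formula is the standard product-moment form, the substitution with the table sums runs:

\[
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{19.65}{\sqrt{88.25 \times 4.657}} \approx \frac{19.65}{20.273} \approx 0.969
\]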
GDC steps are printed on the student page, so students can follow them independently. Your role here is to circulate and check that the DiagnosticOn / Stat Wizards settings are active before anyone gets stuck.
The GDC applies the same formula internally. The hand calculation was for understanding; the GDC gives speed in examinations.
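To show students the GDC is doing nothing mysterious, the same sums and r can be reproduced in a few lines of Python. This is a sketch for the teacher's own checking, not the GDC's actual implementation. `deviation_sums` is a hypothetical helper name.

```python
from math import sqrt

# Phase 2 data: weekly training hours (x) and sprint speed in m/s (y)
hours = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13]
speed = [6.1, 6.4, 6.3, 6.8, 7.0, 7.2, 6.9, 7.5, 7.4, 7.8, 7.9, 8.1]


def deviation_sums(xs, ys):
    """Return (S_xx, S_yy, S_xy) built from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy


s_xx, s_yy, s_xy = deviation_sums(hours, speed)
r = s_xy / sqrt(s_xx * s_yy)  # Pearson's product-moment correlation
print(f"S_xx = {s_xx:.2f}, S_yy = {s_yy:.3f}, S_xy = {s_xy:.2f}, r = {r:.3f}")
# → S_xx = 88.25, S_yy = 4.657, S_xy = 19.65, r = 0.969
```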
Interpretation: There is a strong positive linear correlation between weekly training hours and sprint speed. Players who train more hours per week tend to have higher sprint speeds, and the relationship is well-described by a straight line.
Mean point from the sums above:
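With \(n = 12\) players, \(\sum x_i = 99\) and \(\sum y_i = 85.4\), so:

\[
(\bar{x}, \bar{y}) = \left(\frac{99}{12},\ \frac{85.4}{12}\right) \approx (8.25,\ 7.117)
\]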
Students plot this point on their scatter diagram and draw a line through it that best follows the trend. The constraint is explicit: the line must pass through \((\bar{x}, \bar{y})\).
The regression coefficients follow directly from the table sums:
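Writing the least-squares line as \(\hat{y} = bx + a\), the form checked in the verification line, the working can be laid out as:

\[
\begin{aligned}
b &= \frac{S_{xy}}{S_{xx}} = \frac{19.65}{88.25} \approx 0.2227 \\
a &= \bar{y} - b\,\bar{x} = 7.117 - 0.2227(8.25) \approx 5.280 \\
\hat{y} &= 0.2227x + 5.280
\end{aligned}
\]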
Verify mean point: \(\hat{y}(8.25) = 0.2227(8.25) + 5.280 = 1.837 + 5.280 = 7.117\) ✓
Students compare their hand-sketched line to the GDC line. In most cases these are close but not identical — the GDC minimises the sum of squared vertical residuals, which the eye cannot do exactly.
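The claim that the GDC line beats any hand-drawn line can be checked numerically. The sketch below compares the least-squares line with a hypothetical "by eye" line that also passes through the mean point but has gradient 0.25 (an arbitrary illustrative choice). `sse` is an illustrative helper name.

```python
# Phase 2 data: weekly training hours (x) and sprint speed in m/s (y)
hours = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13]
speed = [6.1, 6.4, 6.3, 6.8, 7.0, 7.2, 6.9, 7.5, 7.4, 7.8, 7.9, 8.1]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(speed) / n
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, speed))


def sse(slope, intercept):
    """Sum of squared vertical residuals y_i - (slope * x_i + intercept)."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(hours, speed))


# Least-squares line: gradient S_xy / S_xx, forced through (x_bar, y_bar)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Hypothetical hand-drawn line: same mean point, slightly steeper gradient
eye_slope = 0.25
eye_intercept = y_bar - eye_slope * x_bar

print(f"GDC line: y = {b:.4f}x + {a:.3f}, SSE = {sse(b, a):.4f}")
print(f"Eye line: y = {eye_slope:.4f}x + {eye_intercept:.3f}, "
      f"SSE = {sse(eye_slope, eye_intercept):.4f}")
```

For any line through the mean point, the SSE is a quadratic in the gradient with its minimum at \(S_{xy}/S_{xx}\). Every other gradient therefore scores worse.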
x = 10 lies within the data range (4–13) — this is an interpolation.
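Substituting into \(\hat{y} = 0.2227x + 5.280\):

\[
\hat{y}(10) = 0.2227(10) + 5.280 = 2.227 + 5.280 = 7.507 \approx 7.51 \text{ m/s}
\]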
The scout now has a concrete tool: give a training load, get a predicted sprint speed. Phase 3 will examine how far this tool can be trusted — and in which direction.
Give code PREDICT once all groups have found the x on y line and completed Task 3c.
Using the regression line from Phase 2 (\(\hat{y} = 0.2227x + 5.280\)), classify and assess three predictions:
| x value | In range? (4–13 hrs) | Predicted ŷ | Verdict |
|---|---|---|---|
| 9 hrs/wk | ✓ Interpolation | 0.2227(9) + 5.280 = 7.28 m/s | Reliable |
| 12 hrs/wk | ✓ Interpolation (near edge) | 0.2227(12) + 5.280 = 7.95 m/s | Reliable, with mild caution |
| 25 hrs/wk | ✗ Extrapolation | 0.2227(25) + 5.280 = 10.85 m/s | Unreliable |
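The range check in the table can be written as a small rule. This is a sketch in which `predict_speed` and the range constants are illustrative names. It only flags interpolation versus extrapolation and does not judge "near the edge" cases like 12 hrs/wk, which still need discussion.

```python
# Phase 2 regression line y_hat = 0.2227x + 5.280 and the observed x-range
SLOPE, INTERCEPT = 0.2227, 5.280
X_MIN, X_MAX = 4, 13  # hrs/wk covered by the sample


def predict_speed(hours):
    """Return (predicted sprint speed, 'interpolation' or 'extrapolation')."""
    kind = "interpolation" if X_MIN <= hours <= X_MAX else "extrapolation"
    return SLOPE * hours + INTERCEPT, kind


for h in (9, 12, 25):
    y_hat, kind = predict_speed(h)
    print(f"{h} hrs/wk -> {y_hat:.2f} m/s ({kind})")
# → 9 hrs/wk -> 7.28 m/s (interpolation)
#   12 hrs/wk -> 7.95 m/s (interpolation)
#   25 hrs/wk -> 10.85 m/s (extrapolation)
```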
Discussion questions for the whiteboard:
Students work through three steps independently — the activity page guides all of it. Your role is to circulate and prompt the Step 3 discussion if groups accept the two answers without comparing them.
Step 1 — rearranging the y-on-x line (what most groups do first):
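Assuming groups rearrange \(\hat{y} = 0.2227x + 5.280\) and substitute the 7.6 m/s target, the working is:

\[
x = \frac{y - 5.280}{0.2227}, \qquad x(7.6) = \frac{7.6 - 5.280}{0.2227} = \frac{2.320}{0.2227} \approx 10.42 \text{ hrs/wk}
\]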
Now find the proper x on y regression line using the same sums from Phase 2:
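Writing the x-on-y line as \(\hat{x} = cy + d\) (c and d are labels chosen here; the student page may use different letters), and taking \(S_{yy}\) to four decimal places (4.6567) so the rounding matches the GDC:

\[
\begin{aligned}
c &= \frac{S_{xy}}{S_{yy}} = \frac{19.65}{4.6567} \approx 4.220 \\
d &= \bar{x} - c\,\bar{y} = 8.25 - 4.220(7.117) \approx -21.78 \\
\hat{x} &= 4.220y - 21.78
\end{aligned}
\]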
Verify mean point: \(\hat{x}(7.117) = 4.220(7.117) - 21.78 = 30.03 - 21.78 = 8.25\) ✓
Predict x for y = 7.6 m/s:
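Substituting into \(\hat{x} = 4.220y - 21.78\):

\[
\hat{x}(7.6) = 4.220(7.6) - 21.78 = 32.07 - 21.78 = 10.29 \text{ hrs/wk}
\]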
This is the conceptual heart of Phase 3. Push groups to articulate the answer on their boards before giving it.
What each line minimises:
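In symbols, for the \(n = 12\) players:

\[
\begin{aligned}
y \text{ on } x &: \text{ minimise } \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \quad \text{(squared vertical distances)} \\
x \text{ on } y &: \text{ minimise } \sum_{i=1}^{n} \left(x_i - \hat{x}_i\right)^2 \quad \text{(squared horizontal distances)}
\end{aligned}
\]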
These are genuinely different optimisation problems, so they give different lines. They only coincide when r = ±1 (perfect linear fit — all points on the line, no residuals in either direction). As |r| decreases, the angle between them opens up.
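The coincidence condition can be checked algebraically. Rewriting the x-on-y line \(\hat{x} = cy + d\) (with \(c = S_{xy}/S_{yy}\)) with y as the subject gives a gradient of \(1/c\). The y-on-x gradient is \(b = S_{xy}/S_{xx}\). Both lines pass through \((\bar{x}, \bar{y})\), so they coincide exactly when their gradients match:

\[
\frac{b}{1/c} = bc = \frac{S_{xy}^2}{S_{xx}\,S_{yy}} = r^2
\]

For the sprint data, \(bc \approx 0.2227 \times 4.220 \approx 0.940\), which agrees with \(r^2\) for \(r \approx 0.969\) to rounding. The two lines are therefore close but distinct.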
Rule for which line to use:
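Below is a sketch of the usual rule in code, assuming the standard convention: regress the variable you want to predict on the variable you know. The helper names are illustrative. The sketch also reproduces the two competing answers for a 7.6 m/s target.

```python
# Phase 2 data: weekly training hours (x) and sprint speed in m/s (y)
hours = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13]
speed = [6.1, 6.4, 6.3, 6.8, 7.0, 7.2, 6.9, 7.5, 7.4, 7.8, 7.9, 8.1]


def regression_lines(xs, ys):
    """Return ((b, a), (c, d)) for y_hat = b*x + a and x_hat = c*y + d."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx          # y on x: minimises squared vertical residuals
    a = y_bar - b * x_bar
    c = s_xy / s_yy          # x on y: minimises squared horizontal residuals
    d = x_bar - c * y_bar
    return (b, a), (c, d)


(b, a), (c, d) = regression_lines(hours, speed)


def predict_speed_from_hours(x):
    """Known x, want y: use the y-on-x line."""
    return b * x + a


def predict_hours_from_speed(y):
    """Known y, want x: use the x-on-y line, not the rearranged y-on-x line."""
    return c * y + d


print(f"x on y:            {predict_hours_from_speed(7.6):.2f} hrs/wk")  # → 10.29
print(f"rearranged y on x: {(7.6 - a) / b:.2f} hrs/wk")                  # → 10.42
```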
Give code CHURROS once all groups have written their verdict and invented a spurious correlation.
Students run the full analysis before drawing any conclusions. Do not pre-empt the tension.
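To check groups' numbers, the full analysis on the 20 matches can be sketched the same way. This is pure Python and the variable names are illustrative. It reproduces the means in the churros table and gives a very strong positive r, which is the result groups then have to reconcile with common sense.

```python
from math import sqrt

# Churros sold (x) and goals scored (y) for matches 1-20
churros = [210, 185, 340, 290, 155, 410, 375, 230, 460, 310,
           270, 195, 385, 440, 160, 325, 280, 500, 215, 355]
goals = [1, 0, 3, 2, 1, 4, 3, 1, 4, 2,
         2, 1, 3, 4, 0, 3, 2, 5, 1, 3]

n = len(churros)
x_bar = sum(churros) / n
y_bar = sum(goals) / n
s_xx = sum((x - x_bar) ** 2 for x in churros)
s_yy = sum((y - y_bar) ** 2 for y in goals)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(churros, goals))

r = s_xy / sqrt(s_xx * s_yy)   # Pearson's r
b = s_xy / s_xx                # y-on-x gradient (goals per churro sold)
a = y_bar - b * x_bar          # intercept, so the line passes through the mean point

print(f"means: x_bar = {x_bar}, y_bar = {y_bar}")  # → means: x_bar = 304.5, y_bar = 2.25
print(f"r = {r:.3f}")                              # → r = 0.972
print(f"y_hat = {b:.4f}x {a:+.3f}")                # → y_hat = 0.0133x -1.802
```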
The activity scaffolds the discovery through five guided questions. Expected responses: