
| Hours (h) | 4≤h<5 | 5≤h<6 | 6≤h<7 | 7≤h<8 | 8≤h<9 | 9≤h<10 | 10≤h<11 |
|---|---|---|---|---|---|---|---|
| Frequency | 3 | 7 | 18 | 24 | 16 | 9 | 3 |
| Midpoint | 4.5 | 5.5 | 6.5 | 7.5 | 8.5 | 9.5 | 10.5 |
| Cumul. freq. | 3 | 10 | 28 | 52 | 68 | 77 | 80 |
"A public health researcher surveyed 80 teenagers and young adults in a large city about their nightly sleep. The data is in front of you. Your job today: interrogate this data — figure out how it was collected, what patterns it shows, and what it tells us about the city's health. All work goes on the whiteboard first."
SL & HL 4.1 · Sampling methods & bias
Source of bias: People at a shopping mall on Saturday afternoon are not representative of all residents aged 13–25. Teenagers may be under-represented if they don't visit malls independently; wealthier individuals may be over-represented.
Justification: Sleep patterns likely differ between teenagers (school schedules) and young adults (work/university schedules). Stratified sampling ensures both groups are proportionally represented, producing a more reliable estimate for each subgroup.
Total population = 3 200 + 4 800 = 8 000
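The proportional allocation behind a stratified sample can be sketched as follows. This is a minimal illustration, not the mark scheme's working: the stratum labels are hypothetical, and the sample size of 80 is an assumption taken from the size of the survey.

```python
# Proportional (stratified) allocation: each stratum's share of the sample
# matches its share of the population.
strata_sizes = {"stratum_a": 3200, "stratum_b": 4800}  # hypothetical labels
sample_size = 80  # assumed: same size as the survey

population = sum(strata_sizes.values())
allocation = {
    name: round(sample_size * size / population)
    for name, size in strata_sizes.items()
}
print(population, allocation)
```

Under these assumptions the two strata contribute 32 and 48 people respectively, which keeps the 2 : 3 ratio of the population.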
SL & HL 4.2 · Ogive · Quartiles · Percentiles · IQR
| Upper boundary | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|
| Cumulative frequency | 3 | 10 | 28 | 52 | 68 | 77 | 80 |
Quartiles are read at cumulative frequencies \(\tfrac{1}{4}n\), \(\tfrac{1}{2}n\), \(\tfrac{3}{4}n\):
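With \(n=80\) (the final cumulative frequency in the table), the three reading positions on the vertical axis work out as:

\[
\tfrac{1}{4}n = 20, \qquad \tfrac{1}{2}n = 40, \qquad \tfrac{3}{4}n = 60
\]

Each position is read across to the curve and then down to the hours axis.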
Accept: \(Q_1\in[6.3,6.5]\) | \(Q_2\in[7.2,7.4]\) | \(Q_3\in[8.1,8.3]\)
Accept: \(P_{20}\in[5.9,6.2]\) | \(P_{90}\in[9.1,9.4]\)
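The percentile readings follow the same procedure as the quartiles: \(P_k\) is read at cumulative frequency \(\tfrac{k}{100}n\). As a quick check with \(n=80\):

\[
P_{20}:\ 0.20\times 80 = 16, \qquad P_{90}:\ 0.90\times 80 = 72
\]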
Contextual interpretation of \(P_{90}\): About 90% of the 80 residents surveyed (72 people) sleep less than approximately 9.2 hours per night. Only about 10% (8 people) sleep that long or longer.
Why IQR is more useful: The IQR describes the spread of the middle 50% of the data, ignoring extreme values at both ends. The range is sensitive to just two data points (the minimum and maximum) and is easily distorted by outliers.
IQR is resistant to outliers; range is not. Both describe spread, but IQR is more robust for skewed or extreme data.
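The difference in robustness can be seen in a small sketch. The values below are hypothetical sleep durations, not taken from the survey, and the quartiles use the median-of-halves rule, which is an assumption about how the calculator computes them.

```python
from statistics import median


def iqr(values):
    """IQR using the median-of-halves rule (median excluded when n is odd)."""
    s = sorted(values)
    half = len(s) // 2
    return median(s[len(s) - half:]) - median(s[:half])


def value_range(values):
    """Range: the distance between the largest and smallest values."""
    return max(values) - min(values)


# Hypothetical sleep durations in hours (not taken from the survey).
typical = [5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0]
with_outlier = typical[:-1] + [14.0]  # replace the maximum with an extreme value

print(value_range(typical), iqr(typical))
print(value_range(with_outlier), iqr(with_outlier))
```

Changing a single value more than doubles the range (3.5 h to 8.5 h), while the IQR stays at 2.0 h.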
SL & HL 4.3 · Mean · σ · Outlier test · Box and whisker
Justification: The 80 residents are the complete dataset being analysed — we treat them as the full population of interest, not a sample used to estimate a larger unknown parameter, so we divide by \(n\).
Contextual meaning: A resident's nightly sleep typically differs from the mean (≈7.5 h) by about 1.38 hours. The researcher prefers \(\sigma\) over the range because \(\sigma\) uses every data value, making it a more stable measure of spread.
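The grouped-data calculation can be reproduced with a short sketch, assuming every resident in a class sleeps exactly the class midpoint (the usual grouped-data approximation). The sample standard deviation `s` is included only to contrast with the divide-by-\(n\) choice justified above.

```python
from math import sqrt

midpoints = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
frequencies = [3, 7, 18, 24, 16, 9, 3]

n = sum(frequencies)
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
sum_sq_dev = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))

sigma = sqrt(sum_sq_dev / n)    # population standard deviation (divide by n)
s = sqrt(sum_sq_dev / (n - 1))  # sample standard deviation, for contrast only
print(round(mean, 3), round(sigma, 2), round(s, 2))
```

With \(n=80\) the two versions differ only in the second decimal place, so the choice matters more for the justification than for the numerical answer.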
Using GDC quartile values based on midpoints: \(Q_1=6.5\) h, \(Q_3=8.5\) h, IQR \(=2.0\) h
Contextual meaning: All sleep durations fall within the expected range. An outlier would represent a resident sleeping an unusually extreme amount — fewer than 3.5 h or more than 11.5 h — which would warrant a follow-up interview to check for data errors or unusual circumstances.
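The quartiles and the outlier test can be checked with the sketch below. It expands the frequency table into 80 midpoint values and uses the median-of-halves rule for the quartiles, which is an assumption about the calculator's method; other quartile conventions can give slightly different values.

```python
from statistics import median

midpoints = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
frequencies = [3, 7, 18, 24, 16, 9, 3]

# Expand the grouped table into individual values, one midpoint per resident.
values = sorted(m for m, f in zip(midpoints, frequencies) for _ in range(f))

half = len(values) // 2
q1 = median(values[:half])                 # median of the lower half
q3 = median(values[len(values) - half:])   # median of the upper half
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```

The smallest and largest midpoints (4.5 h and 10.5 h) both lie inside the fences, so the list of outliers is empty.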
Five-number summary (midpoint-based, no outliers): min \(=4.5\) h, \(Q_1=6.5\) h, \(Q_2=7.5\) h, \(Q_3=8.5\) h, max \(=10.5\) h.
Mean vs median: \(\bar{x}\approx7.53\) h, \(Q_2=7.5\) h. The mean and median are very close, suggesting an approximately symmetric distribution with a very slight positive skew.
SL & HL 4.2–4.3 · Comparative box plots · Linear transformations
Given data for the neighbouring city: mean = 6.8 h, σ = 1.1 h, \(Q_1=6.1\) h, \(Q_2=6.7\) h, \(Q_3=7.4\) h.
Box plot summary (neighbouring city): box from 6.1 to 7.4; median line at 6.7. No minimum or maximum is given, so accept whiskers that end at any reasonable values inside the outlier fences, approximately \(6.1 - 1.5(1.3) = 4.15\) and \(7.4 + 1.5(1.3) = 9.35\), where \(1.3 = Q_3 - Q_1\) is the IQR.
Model comparative statements:
Multiplying all data values by 60 applies the transformation \(Y=60X\):
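As a worked sketch using the midpoint-based statistics found earlier (\(\bar{x}\approx7.53\) h, \(Q_2=7.5\) h, \(\sigma\approx1.38\) h, IQR \(=2.0\) h, with unrounded values carried through the multiplication), every measure of location and of spread is multiplied by 60 to convert hours to minutes:

\[
\bar{y} = 60\bar{x} \approx 451.5\ \text{min}, \qquad
Q_{2,Y} = 60(7.5) = 450\ \text{min}, \qquad
\sigma_Y = 60\sigma \approx 82.7\ \text{min}, \qquad
\text{IQR}_Y = 60(2.0) = 120\ \text{min}
\]

The variance scales by \(60^2 = 3600\), because it is measured in squared units.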
What stays the same: The shape of the distribution — skewness, symmetry, relative position of the median within the box, and whether outliers exist are all unchanged.
For \(Y=X+c\): location shifts by \(c\), spread is unchanged. For \(Y=kX\): both location and spread scale by \(k\). This is a high-frequency exam question.
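Both rules can be verified numerically on the midpoint-based data. In this sketch \(k=60\) matches the hours-to-minutes conversion above, while the shift \(c=1\) is a hypothetical value chosen only for illustration.

```python
from statistics import mean, pstdev

# Midpoint-based sleep data in hours, expanded from the frequency table.
midpoints = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
frequencies = [3, 7, 18, 24, 16, 9, 3]
x = [m for m, f in zip(midpoints, frequencies) for _ in range(f)]

k, c = 60, 1.0  # k converts hours to minutes; c = 1.0 is a hypothetical shift

scaled = [k * v for v in x]   # Y = kX
shifted = [v + c for v in x]  # Y = X + c

# Scaling: the ratios of the means and of the standard deviations both equal k.
print(mean(scaled) / mean(x), pstdev(scaled) / pstdev(x))
# Shifting: the mean moves by c, and the standard deviation does not change.
print(mean(shifted) - mean(x), pstdev(shifted) - pstdev(x))
```

Up to floating-point rounding, the first line prints two values of 60 and the second prints 1 followed by 0.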