| Hours (h) | 4≤h<5 | 5≤h<6 | 6≤h<7 | 7≤h<8 | 8≤h<9 | 9≤h<10 | 10≤h<11 |
|---|---|---|---|---|---|---|---|
| Frequency | 3 | 7 | 18 | 24 | 16 | 9 | 3 |
| Midpoint | 4.5 | 5.5 | 6.5 | 7.5 | 8.5 | 9.5 | 10.5 |
| Cumulative frequency | 3 | 10 | 28 | 52 | 68 | 77 | 80 |
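
The midpoint and cumulative frequency rows can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the class boundaries and frequencies are copied from the table above.

```python
from itertools import accumulate

# Class boundaries and frequencies copied from the table above.
lower_bounds = [4, 5, 6, 7, 8, 9, 10]
upper_bounds = [5, 6, 7, 8, 9, 10, 11]
frequencies = [3, 7, 18, 24, 16, 9, 3]

# Midpoint of each class: the average of its two boundaries.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_bounds, upper_bounds)]

# A running total of the frequencies gives the cumulative frequency row.
cumulative = list(accumulate(frequencies))

print(midpoints)
print(cumulative)
```

The last cumulative frequency equals the total number of residents surveyed, which is a quick check that no class has been missed.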
Method:
Source of bias: People at a shopping mall on Saturday afternoon are not representative of all residents aged 13–25. Teenagers may be under-represented if they don't visit malls independently; wealthier individuals may be over-represented. Any well-reasoned contextual bias is acceptable.
Most appropriate: Stratified sampling.
Justification: Since the study aims to compare sleep across age groups, stratified sampling ensures both groups appear in their correct proportions. A simple random sample carries the risk that one group is under-represented purely by chance.
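
Proportional allocation in a stratified sample can be sketched as follows. The age bands and population counts here are hypothetical, chosen only to illustrate the arithmetic; the source does not give them. The function name `proportional_allocation` is likewise illustrative.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Share a sample across strata in proportion to each stratum's size.

    Rounded shares may not add up to sample_size exactly in general;
    a real design would adjust one stratum by hand if that happens.
    """
    total = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }

# HYPOTHETICAL strata and population counts, for illustration only.
strata = {"13-17": 1200, "18-25": 1800}

allocation = proportional_allocation(strata, sample_size=80)
print(allocation)
```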
| Upper boundary (h) | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|
| Frequency | 3 | 7 | 18 | 24 | 16 | 9 | 3 |
| Cumulative frequency | 3 | 10 | 28 | 52 | 68 | 77 | 80 |
Quartiles are read at cumulative frequencies \(\tfrac{1}{4}n\), \(\tfrac{1}{2}n\), \(\tfrac{3}{4}n\); with \(n=80\) these are 20, 40 and 60:
Accept: \(Q_1\in[6.4,6.7]\) | \(Q_2\in[7.4,7.6]\) | \(Q_3\in[8.4,8.6]\)
Accept: \(P_{20}\in[6.0,6.3]\) | \(P_{90}\in[9.3,9.6]\)
Contextual interpretation of \(P_{90}\): 90% of the surveyed residents aged 13–25 sleep less than approximately 9.4 hours per night on a typical school/work night. Only 10% sleep more than this.
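
As a cross-check on the graph readings, the quartiles and percentiles can be estimated by straight-line interpolation between the plotted points. This is a sketch under that assumption: a hand-drawn smooth curve will generally give slightly different readings, and the helper name `read_hours` is illustrative.

```python
# Points of the cumulative frequency curve, from the table above.
upper_boundaries = [5, 6, 7, 8, 9, 10, 11]
cumulative = [3, 10, 28, 52, 68, 77, 80]

# The curve starts at cumulative frequency 0 at the lowest class boundary (4 h).
xs = [4] + upper_boundaries
cfs = [0] + cumulative
n = cfs[-1]

def read_hours(target_cf):
    """Return the hours value at a given cumulative frequency,
    assuming straight line segments between consecutive plotted points."""
    for (x0, c0), (x1, c1) in zip(zip(xs, cfs), zip(xs[1:], cfs[1:])):
        if c0 <= target_cf <= c1:
            return x0 + (target_cf - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target cumulative frequency is outside the curve")

# Quartiles at 25%, 50%, 75% of n; percentiles at 20% and 90% of n.
q1, q2, q3 = (read_hours(n * k / 100) for k in (25, 50, 75))
p20, p90 = read_hours(n * 20 / 100), read_hours(n * 90 / 100)

print(round(q1, 2), round(q2, 2), round(q3, 2))
print(round(p20, 2), round(p90, 2))
```

Under this assumption the 90th percentile comes out at about 9.44 h, in line with the value of approximately 9.4 hours used in the interpretation above.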
IQR: accept any value consistent with the student's own \(Q_1\) and \(Q_3\).
IQR vs range: The IQR measures the spread of the middle 50% of the data and is resistant to extreme values and outliers. The range uses only the two most extreme values and can be heavily distorted by a single outlier, giving a misleading impression of typical spread.
The two values the GDC gives:
Correct choice: \(\sigma\approx1.38\) h, the population standard deviation.
Justification: The 80 surveyed residents are the complete dataset being analysed — we are not using a sample to estimate a larger unknown population parameter. We treat the 80 values as the full population of interest, so we divide by \(n\).
Contextual interpretation: On average, a resident's nightly sleep deviates from the mean (\(\approx 7.5\) h) by about 1.38 hours. The researcher prefers \(\sigma\) over the range because \(\sigma\) incorporates every data value, not just the two extremes, making it a more stable and informative measure of spread.
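
The difference between dividing by \(n\) and by \(n-1\) can be seen by computing both values from the grouped data. This is a minimal sketch that, like the GDC calculation, treats every value in a class as equal to the class midpoint; the variable names are illustrative.

```python
import math

# Class midpoints and frequencies from the grouped frequency table.
midpoints = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
frequencies = [3, 7, 18, 24, 16, 9, 3]

n = sum(frequencies)
mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n

# Sum of squared deviations from the mean, weighted by frequency.
sum_sq_dev = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))

sigma = math.sqrt(sum_sq_dev / n)        # population SD: divide by n
s = math.sqrt(sum_sq_dev / (n - 1))      # sample SD: divide by n - 1

print(round(mean, 3), round(sigma, 2), round(s, 2))
```

With \(n=80\) the two values differ only in the second decimal place, so the choice matters more for the justification than for the number itself.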
Using the GDC quartile values based on midpoints:
Contextual meaning: All sleep durations in the sample fall within the expected range. No resident's sleep time is extreme enough to be flagged as an outlier relative to the rest of the group.
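
The outlier check can be reproduced as below. This sketch assumes the GDC finds the quartiles as the medians of the lower and upper halves of the data (a common convention, but calculators differ) and applies the usual \(1.5\times\text{IQR}\) fences to the midpoint data.

```python
from statistics import median

midpoints = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
frequencies = [3, 7, 18, 24, 16, 9, 3]

# Expand the grouped table into 80 individual (midpoint) values, in order.
data = sorted(x for x, f in zip(midpoints, frequencies) for _ in range(f))

# Assumed quartile convention: medians of the lower and upper halves.
half = len(data) // 2
q1 = median(data[:half])
q3 = median(data[-half:])
iqr = q3 - q1

# Values beyond these fences are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```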
Five-number summary (midpoint-based, no outliers):
Mean vs median: \(\bar{x}\approx7.53\) h, \(Q_2=7.5\) h. The mean and median are very close, suggesting the distribution is approximately symmetric with only a very slight positive skew.
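
The five-number summary and the mean can be computed together from the midpoint data. This is a sketch using Python's `statistics.quantiles` with its default method, which is an assumption about the quartile convention; it happens to agree with the midpoint-based quartiles for this data set.

```python
from statistics import fmean, quantiles

midpoints = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
frequencies = [3, 7, 18, 24, 16, 9, 3]

# Expand the grouped table into 80 individual (midpoint) values, in order.
data = sorted(x for x, f in zip(midpoints, frequencies) for _ in range(f))

# Quartiles with the default method of statistics.quantiles.
q1, q2, q3 = quantiles(data, n=4)

# Minimum, Q1, median, Q3, maximum.
five_number = (data[0], q1, q2, q3, data[-1])
mean = fmean(data)

print(five_number)
print(round(mean, 3), round(mean - q2, 3))
```

A mean slightly above the median is what a very slight positive skew would produce.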
Model comparative statements:
Multiplying all data values by 60 applies the linear transformation \(Y=60X\):
What stays the same: The shape of the distribution is unchanged. This includes its skewness and symmetry, the relative position of the median within the box, and whether any outliers exist.
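
The effect of converting hours to minutes can be checked numerically. This is a sketch on the midpoint data: it confirms that the mean, median and standard deviation are each multiplied by 60, while a moment-based skewness coefficient (one possible measure of shape, chosen here for illustration) is unchanged.

```python
import math
from statistics import fmean, median, pstdev

midpoints = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
frequencies = [3, 7, 18, 24, 16, 9, 3]

# Sleep times in hours (midpoint data) and the same values in minutes.
hours = [x for x, f in zip(midpoints, frequencies) for _ in range(f)]
minutes = [60 * x for x in hours]

def skewness(values):
    """Moment coefficient of skewness: third central moment over sigma cubed."""
    m = fmean(values)
    sd = pstdev(values)
    return fmean((v - m) ** 3 for v in values) / sd ** 3

# Measures of centre and spread are each scaled by 60.
mean_scaled = math.isclose(fmean(minutes), 60 * fmean(hours))
median_scaled = math.isclose(median(minutes), 60 * median(hours))
sd_scaled = math.isclose(pstdev(minutes), 60 * pstdev(hours))

# The shape measure is unaffected by the change of units.
shape_unchanged = math.isclose(skewness(minutes), skewness(hours), rel_tol=1e-9)

print(mean_scaled, median_scaled, sd_scaled, shape_unchanged)
```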
Formal justification: