Stand Up & Math · Teacher Answers

The Flagged Essay

Not for distribution · Full worked solutions + marking guidance · AHL 4.13
🔐 Interactive Student HTML — Unlock Codes (case-insensitive)
Phase 1 → Phase 2: PRIOR01
Phase 2 → Phase 3: BAYES02
Phase 3 → Phase 4: UPDATE03
Phase 4 → Complete: VERDICT04
📋 Reference — Dataset (all phases, n = 1000 essays)
Detector result | AI-assisted | Human-written | Total
Flagged by detector | 153 | 82 | 235
Not flagged | 27 | 738 | 765
Total | 180 | 820 | 1000

Additional data for Phase 3: among AI-assisted essays, 76% score "low" on vocabulary diversity. Among human-written essays, 24% score "low".

Phase 1 · Reading the Evidence: Priors & Likelihoods

AHL 4.13 · Conditional probability from a table · Vocabulary: prior, likelihood, hypothesis
Task 1a Prior probabilities
Reading directly from the totals row:
\( P(\text{AI}) = \dfrac{180}{1000} = \) 0.18
\( P(\text{Human}) = \dfrac{820}{1000} = \) 0.82
These are called prior probabilities — our belief about an essay's origin before any evidence from the detector is considered.
Task 1b Likelihoods — how the detector behaves
\( P(\text{Flagged} \mid \text{AI}) = \dfrac{153}{180} = \) 0.85
\( P(\text{Flagged} \mid \text{Human}) = \dfrac{82}{820} = \) 0.10
\( P(\text{Not flagged} \mid \text{AI}) = \dfrac{27}{180} = \) 0.15
\( P(\text{Not flagged} \mid \text{Human}) = \dfrac{738}{820} = \) 0.90
These are called likelihoods: how probable each piece of evidence is, given a particular hypothesis. Note that a likelihood such as P(Flagged | AI) is not the probability of the hypothesis given the evidence, P(AI | Flagged).
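For teachers who want to double-check the Phase 1 values quickly, here is a minimal Python sketch (not part of the student activity) that derives the priors and likelihoods directly from the four cell counts of the reference table. The dictionary layout and the helper name `column_total` are illustrative choices, not anything the activity prescribes.

```python
# Cell counts from the reference dataset (n = 1000 essays).
# Keys are (detector outcome, true origin).
counts = {
    ("flagged", "ai"): 153,
    ("flagged", "human"): 82,
    ("not_flagged", "ai"): 27,
    ("not_flagged", "human"): 738,
}

n = sum(counts.values())  # grand total: 1000


def column_total(origin):
    """Number of essays with the given true origin (a column total)."""
    return sum(c for (_, o), c in counts.items() if o == origin)


# Priors: column totals divided by the grand total.
prior = {origin: column_total(origin) / n for origin in ("ai", "human")}

# Likelihoods: each cell divided by its COLUMN total, i.e. P(outcome | origin).
likelihood = {
    (outcome, origin): c / column_total(origin)
    for (outcome, origin), c in counts.items()
}

for origin, p in prior.items():
    print(f"P({origin}) = {p:.2f}")
for (outcome, origin), p in likelihood.items():
    print(f"P({outcome} | {origin}) = {p:.2f}")
```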
Task 1c Interpreting the detector
The detector correctly flags 85% of AI essays — this is its sensitivity (true positive rate). It incorrectly flags 10% of human essays — this is its false positive rate.
If a student is flagged, this does not mean P(AI | Flagged) = 0.85. Groups should articulate why: P(AI | Flagged) is a share of all 235 flagged essays, and that pool contains both the 153 AI essays that were correctly flagged and the 82 human essays that were wrongly flagged.
Teaching note: Push groups to distinguish between P(Flagged | AI) = 0.85 and P(AI | Flagged). Many students conflate these. This is precisely the misconception Bayes resolves in Phase 2.
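One way to make the distinction concrete is to note that both conditionals share the same numerator (the 153 essays that are AI-assisted and flagged) and differ only in the denominator. A small sketch using only the table counts; the variable names are illustrative. The second value is the Phase 2 answer, so this is for the teacher's reference rather than something to show groups in Phase 1.

```python
# Both conditionals share one numerator: essays that are AI-assisted AND flagged.
ai_and_flagged = 153
ai_total = 180        # column total: all AI-assisted essays
flagged_total = 235   # row total: all flagged essays

p_flagged_given_ai = ai_and_flagged / ai_total       # condition on the AI column
p_ai_given_flagged = ai_and_flagged / flagged_total  # condition on the Flagged row

print(f"P(Flagged | AI) = {p_flagged_given_ai:.4f}")  # 0.8500
print(f"P(AI | Flagged) = {p_ai_given_flagged:.4f}")  # 0.6511
```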
Phase 2 · The Bayes Flip: First Encounter with Bayes' Theorem

AHL 4.13 · Tree diagram (student-constructed) → posterior → formula as formalisation
First-exposure note: Students have not seen Bayes' theorem before this activity. The design is intentional — they construct the tree from scratch on the VNPS (Task 2a), arrive at the posterior using tree logic alone (Task 2b), verify it against the frequency table (Task 2c), and only then encounter the formula as a formalisation of what they already did (Task 2d). Do not show or name the formula before Task 2d. Let groups struggle productively with Task 2b — the "wait, really?" moment at 65.1% is the pedagogical core of this phase.
Task 2a Tree diagram — fully labelled (student-constructed on VNPS)
Students build this from scratch using their Phase 1 values. The expected tree has two first-level branches (AI-assisted, Human-written) with prior probabilities, and four second-level branches (Flagged / Not flagged from each) with likelihood probabilities. All four end-node products should be written and sum to 1.
🌿 Tree Diagram: Visual Correspondence with the Bayes Formula
Essay selected
· AI-assisted (H₁), P(H₁) = 0.18
  · Flagged, P(F|H₁) = 0.85 → H₁ ∩ Flagged: 0.85 × 0.18 = 0.153 ✦
  · Not flagged, P(NF|H₁) = 0.15 → H₁ ∩ Not Flagged: 0.15 × 0.18 = 0.027
· Human-written (H₂), P(H₂) = 0.82
  · Flagged, P(F|H₂) = 0.10 → H₂ ∩ Flagged: 0.10 × 0.82 = 0.082 ✦
  · Not flagged, P(NF|H₂) = 0.90 → H₂ ∩ Not Flagged: 0.90 × 0.82 = 0.738
Tree → Formula mapping
Numerator = AI ∩ Flagged path product = 0.153
Denominator term 1 = AI ∩ Flagged = 0.153
Denominator term 2 = Human ∩ Flagged = 0.082
Full denominator = sum of ✦ end-nodes = 0.153 + 0.082 = 0.235
Teaching note
The formula is not separate from the tree — it is the tree written algebraically. The denominator collects every path that ends in the observed evidence. The numerator is the single path corresponding to the hypothesis of interest. Students who understand this mapping can set up Bayes from a tree without memorising the formula.
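The tree-to-formula mapping can also be written as a few lines of Python: compute every end-node product, confirm the four products sum to 1, then collect the ✦ paths as the denominator. This is a sketch for checking answers, assuming the Phase 1 branch values; the key names are hypothetical.

```python
# Branch probabilities read from the Phase 1 answers.
prior = {"AI": 0.18, "Human": 0.82}
likelihood = {
    ("AI", "Flagged"): 0.85, ("AI", "Not flagged"): 0.15,
    ("Human", "Flagged"): 0.10, ("Human", "Not flagged"): 0.90,
}

# Each end-node is the product along its path: P(E | H) * P(H).
end_nodes = {(h, e): likelihood[(h, e)] * prior[h] for (h, e) in likelihood}

# All four end-nodes sum to 1 (every essay lands on exactly one node).
assert abs(sum(end_nodes.values()) - 1) < 1e-12

# Denominator: every path ending in the observed evidence (the ✦ nodes).
denominator = sum(p for (h, e), p in end_nodes.items() if e == "Flagged")
# Numerator: the single path for the hypothesis of interest.
numerator = end_nodes[("AI", "Flagged")]

print(f"denominator = {denominator:.3f}")            # 0.235
print(f"posterior   = {numerator / denominator:.4f}")  # 0.6511
```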
Task 2b Posterior probability — the Bayes flip
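An essay has been flagged. Groups should reach this result from the tree alone, since the formula is not introduced until Task 2d; the working below is written in Bayes' theorem form for teacher reference.
Step 1: Total probability of being flagged (sum of the ✦ end-nodes)
\( P(\text{Flagged}) = 0.85 \times 0.18 + 0.10 \times 0.82 = 0.153 + 0.082 = 0.235 \)

Step 2: Apply Bayes' theorem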
\( P(\text{AI} \mid \text{Flagged}) = \dfrac{P(\text{Flagged} \mid \text{AI}) \cdot P(\text{AI})}{P(\text{Flagged})} \)

\( = \dfrac{0.85 \times 0.18}{0.235} = \dfrac{0.153}{0.235} \)

\( \approx \) 0.6511 (≈ 65.1%)
Even though the detector flagged the essay, there is only a 65.1% probability it is genuinely AI-assisted. This is the posterior probability — our updated belief after incorporating the evidence.
The "wait, really?" moment: Students typically expect this to be close to 85% (or even 100%). The gap arises because 82 human-written essays are also flagged: they make up 82 of the 235 flagged essays (about 35%), diluting the flagged pool. This is the Bayes surprise.
Task 2c Verify from the frequency table
From the table: 235 essays were flagged in total. Of these, 153 were genuinely AI-assisted.
\( P(\text{AI} \mid \text{Flagged}) = \dfrac{153}{235} \approx \) 0.6511 ✓
This confirms the Bayes calculation. Groups can use the table as a sanity check for all Phase 2 answers.
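If a group's decimal answer from the tree differs slightly from the table, rounding is usually the cause. A sketch using Python's standard `fractions.Fraction` shows that, with exact arithmetic, the tree route and the table route give the identical fraction:

```python
from fractions import Fraction

# Tree route with exact fractions (no rounding at any step).
p_ai, p_human = Fraction(18, 100), Fraction(82, 100)
p_f_given_ai, p_f_given_human = Fraction(85, 100), Fraction(10, 100)

tree_posterior = (p_f_given_ai * p_ai) / (
    p_f_given_ai * p_ai + p_f_given_human * p_human
)

# Table route: flagged AI essays over all flagged essays.
table_posterior = Fraction(153, 235)

print(tree_posterior)                     # 153/235
print(tree_posterior == table_posterior)  # True
print(f"{float(tree_posterior):.4f}")     # 0.6511
```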
Task 2d Formula formalisation — Bayes' theorem introduced
After groups have computed the posterior from their tree, present the formula and ask them to label every part using their tree diagram on the board:
\( P(H_1 \mid F) = \dfrac{P(F \mid H_1) \cdot P(H_1)}{P(F \mid H_1) \cdot P(H_1) \;+\; P(F \mid H_2) \cdot P(H_2)} \)
Expected labelling:
· Numerator = the end-node product for the path they wanted (AI ∩ Flagged = 0.153)
· Each denominator term = one path product ending in "Flagged" (0.153 and 0.082)
· Full denominator = sum of all end-node products that end in the observed evidence = 0.235
Substituting: \( \dfrac{0.153}{0.153 + 0.082} = \dfrac{0.153}{0.235} \approx \) 0.6511 ✓
Key consolidation point: The formula is not a new method — it is the tree written algebraically. The denominator always collects every path that ends in the observed evidence. The numerator is the single path for the hypothesis of interest. Students who see this mapping can reconstruct the formula from any tree, without memorising it. This is the insight to draw out during consolidation from the walls.
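For teachers who like to see the formula as a reusable procedure, here is a minimal sketch of the two-hypothesis form from Task 2d, with each line commented by the tree element it corresponds to. The function name `posterior_h1` is hypothetical, and the function assumes exactly two complementary hypotheses, as in this activity.

```python
def posterior_h1(prior_h1, p_e_given_h1, p_e_given_h2):
    """P(H1 | E) when H1 and H2 are the only two (complementary) hypotheses."""
    prior_h2 = 1 - prior_h1                             # second first-level branch
    numerator = p_e_given_h1 * prior_h1                 # the single path H1 -> E
    denominator = numerator + p_e_given_h2 * prior_h2   # every path ending in E
    return numerator / denominator


print(f"{posterior_h1(0.18, 0.85, 0.10):.4f}")  # 0.6511
```

The same function handles the Phase 3 update if the Phase 2 posterior is passed in as `prior_h1` with the vocabulary likelihoods.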
Phase 3 · Sequential Update: The Posterior Becomes the Prior

AHL 4.13 · Bayes with three events · Second signal: vocabulary diversity
Task 3a Tree diagram — updated prior, vocabulary diversity signal
Students draw a fresh tree using their Phase 2 posterior as the new root probabilities. The structure is identical to Phase 2 but with updated numbers. Expected values:
\( P(H_1) = 0.6511 \)  ·  \( P(H_2) = 0.3489 \)  ·  \( P(\text{Low vocab} \mid H_1) = 0.76 \)  ·  \( P(\text{Low vocab} \mid H_2) = 0.24 \)
The key observation to draw out: the tree is structurally identical to Phase 2 — same two hypotheses, same two outcomes — but the root has changed. This is the sequential update made visible. The posterior from one tree becomes the prior for the next.
🌿 Phase 3 Tree: Updated Prior, Vocabulary Diversity Evidence
Given: Flagged
· AI-assisted (H₁), P(H₁) = 0.6511
  · Low vocab, P(LV|H₁) = 0.76 → H₁ ∩ Low vocab: 0.76 × 0.6511 = 0.4948 ✦
  · Not low vocab, P(NLV|H₁) = 0.24 → H₁ ∩ Not low vocab: 0.24 × 0.6511 = 0.1563
· Human-written (H₂), P(H₂) = 0.3489
  · Low vocab, P(LV|H₂) = 0.24 → H₂ ∩ Low vocab: 0.24 × 0.3489 = 0.0837 ✦
  · Not low vocab, P(NLV|H₂) = 0.76 → H₂ ∩ Not low vocab: 0.76 × 0.3489 = 0.2652
✦ Low vocab paths — what we need
AI ∩ Low vocab: 0.4948
Human ∩ Low vocab: 0.0837
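Denominator: 0.4948 + 0.0837 ≈ 0.5786 (unrounded products: 0.494836 + 0.083736 = 0.578572)
Posterior: 0.494836 ÷ 0.578572 ≈ 0.8553 (dividing the rounded values 0.4948 ÷ 0.5786 gives 0.8552; accept either)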
What students should notice
The tree structure is identical to Phase 2 — two hypotheses, two outcomes, four end-nodes. Only the numbers changed. This is the power of the sequential update: the method is always the same, only the prior shifts.
LV = Low vocab  ·  NLV = Not low vocab
Task 3b Updated posterior — both signals combined
Let \( r = P(\text{AI} \mid \text{Flagged}) = 0.6511 \) from Phase 2. This is our new prior.
Step 1 — New denominator (total probability of low vocab, given flagged)
\( P(\text{Low vocab} \mid \text{Flagged}) = 0.76 \times 0.6511 + 0.24 \times 0.3489 \)
\( = 0.494836 + 0.083736 = 0.578572 \approx 0.5786 \)

Step 2 — Apply Bayes again
\( P(\text{AI} \mid \text{Flagged} \cap \text{Low vocab}) = \dfrac{0.76 \times 0.6511}{0.5786} \)

\( = \dfrac{0.494836}{0.578572} \approx \) 0.8553 (≈ 85.5%, using the unrounded products from Step 1)
The vocabulary signal raises the posterior from 65.1% to 85.5%, an increase of about 20 percentage points. The two signals together give substantially stronger evidence than either alone.
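To support the claim that the combined evidence beats either signal alone, the sketch below runs three updates from the 18% base rate: detector only, vocabulary only, and both in sequence. It assumes, as the Phase 3 calculation does, that the two signals are conditionally independent given the essay's origin; `update` is an illustrative helper name.

```python
def update(prior, p_e_given_ai, p_e_given_human):
    """One Bayes update for two complementary hypotheses (AI vs Human)."""
    num = p_e_given_ai * prior
    return num / (num + p_e_given_human * (1 - prior))


base_rate = 0.18
detector_only = update(base_rate, 0.85, 0.10)  # evidence: Flagged
vocab_only = update(base_rate, 0.76, 0.24)     # evidence: Low vocab
both = update(detector_only, 0.76, 0.24)       # Flagged, then Low vocab

print(f"detector only: {detector_only:.3f}")  # 0.651
print(f"vocab only:    {vocab_only:.3f}")     # 0.410
print(f"both signals:  {both:.3f}")           # 0.855
```

With these inputs the vocabulary signal alone would lift the 18% prior only to about 41%, while the two signals together reach about 85.5%.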
Teaching note: This sequential update is equivalent to applying Bayes' theorem directly with the joint evidence (assuming conditional independence of the two signals given the hypothesis). Stronger groups can verify this algebraically.
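For the algebraic check suggested above, a sketch with exact fractions compares the two routes: two sequential updates, and a single update using the product of the likelihoods. The product form is only valid under the conditional-independence assumption stated in the teaching note; the names are illustrative.

```python
from fractions import Fraction as F

prior_ai = F(18, 100)
# Likelihood of each signal under each hypothesis (dataset + Phase 3 data).
flag = {"ai": F(85, 100), "human": F(10, 100)}
low_vocab = {"ai": F(76, 100), "human": F(24, 100)}


def update(prior, lik):
    """Two-hypothesis Bayes update; lik maps hypothesis -> P(evidence | hypothesis)."""
    num = lik["ai"] * prior
    return num / (num + lik["human"] * (1 - prior))


# Route 1: two sequential updates (the posterior becomes the prior).
sequential = update(update(prior_ai, flag), low_vocab)

# Route 2: one update with the joint likelihood, ASSUMING conditional
# independence given the hypothesis: P(F and LV | H) = P(F | H) * P(LV | H).
joint = {h: flag[h] * low_vocab[h] for h in ("ai", "human")}
batch = update(prior_ai, joint)

print(sequential == batch)    # True
print(f"{float(batch):.4f}")  # 0.8553
```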
Task 3c Interpretation
With both signals present, there is an 85.5% probability the essay is AI-assisted and a 14.5% probability it is human-written. The committee has stronger grounds to act — but meaningful uncertainty remains.
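Expected response: groups should notice the posterior has increased significantly but is still not certain. Strong groups may question whether the two signals are conditionally independent given the essay's origin. If the detector itself relies partly on vocabulary diversity, the two signals carry overlapping information, and multiplying their likelihoods would count that information twice, tending to overstate the posterior.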
Phase 4 · The Threshold: Sensitivity of the Posterior to the Prior

AHL 4.13 · Posterior as a function of prior · Numerical solution
Task 4a Two-tree chain — general form
Students draw two linked trees with \( p \) at the root of Tree 1. Expected general expressions:
Tree 1 — Detector:   \( r(p) = \dfrac{0.85p}{0.85p + 0.10(1-p)} = \dfrac{0.85p}{0.75p + 0.10} \)

Tree 2 — Vocab:   \( \text{Final posterior} = \dfrac{0.76\,r(p)}{0.76\,r(p) + 0.24\,(1 - r(p))} \)
Groups that understand the chain structure can write these expressions directly from their tree labels — no formula memorisation needed.
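The two general expressions translate directly into code. This sketch (function names hypothetical) uses the simplified Tree 1 denominator 0.75p + 0.10 from above and evaluates the chain at the Phase 3 prior and at the two test values; at p = 0.18 it reproduces the Phase 3 posterior.

```python
def r(p):
    """Tree 1 (detector): P(AI | Flagged) when the prior is p."""
    return 0.85 * p / (0.75 * p + 0.10)


def final_posterior(p):
    """Tree 2 (vocab): r(p) becomes the new root."""
    rp = r(p)
    return 0.76 * rp / (0.76 * rp + 0.24 * (1 - rp))


for p in (0.18, 0.25, 0.26):
    print(f"p = {p:.2f}:  r = {r(p):.4f},  final posterior = {final_posterior(p):.4f}")
# p = 0.18:  r = 0.6511,  final posterior = 0.8553
# p = 0.25:  r = 0.7391,  final posterior = 0.8997
# p = 0.26:  r = 0.7492,  final posterior = 0.9044
```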
🌿 Phase 4 Tree Chain: Testing p = 0.25  ·  Final posterior 89.97%  ✗ below 90%
TREE 1: Detector (root p = 0.25)
· AI, 0.25
  · Flagged, 0.85 → AI ∩ Flagged: 0.85 × 0.25 = 0.2125 ✦
  · Not flagged, 0.15 → AI ∩ Not flagged: 0.15 × 0.25 = 0.0375
· Human, 0.75
  · Flagged, 0.10 → H ∩ Flagged: 0.10 × 0.75 = 0.0750 ✦
  · Not flagged, 0.90 → H ∩ Not flagged: 0.90 × 0.75 = 0.6750
r = 0.2125 / 0.2875 = 0.7391, which becomes the new root →
TREE 2: Vocab Diversity (root r = 0.7391)
· AI, 0.7391
  · Low vocab, 0.76 → AI ∩ Low vocab: 0.76 × 0.7391 = 0.5617 ✦
  · Not low vocab, 0.24 → AI ∩ Not low vocab: 0.24 × 0.7391 = 0.1774
· Human, 0.2609
  · Low vocab, 0.24 → H ∩ Low vocab: 0.24 × 0.2609 = 0.0626 ✦
  · Not low vocab, 0.76 → H ∩ Not low vocab: 0.76 × 0.2609 = 0.1983
Final posterior = 0.5617 / (0.5617 + 0.0626) = 89.97% ✗
Tree 1 result
AI∩F = 0.2125  ·  H∩F = 0.0750
r = 0.2125 ÷ 0.2875 = 0.7391
Tree 2 result
AI∩LV = 0.5617  ·  H∩LV = 0.0626
Posterior = 0.5617 ÷ 0.6243 = 89.97% — below 90%
🌿 Phase 4 Tree Chain: Testing p = 0.26  ·  Final posterior 90.44%  ✓ exceeds 90%
TREE 1: Detector (root p = 0.26)
· AI, 0.26
  · Flagged, 0.85 → AI ∩ Flagged: 0.85 × 0.26 = 0.2210 ✦
  · Not flagged, 0.15 → AI ∩ Not flagged: 0.15 × 0.26 = 0.0390
· Human, 0.74
  · Flagged, 0.10 → H ∩ Flagged: 0.10 × 0.74 = 0.0740 ✦
  · Not flagged, 0.90 → H ∩ Not flagged: 0.90 × 0.74 = 0.6660
r = 0.2210 / 0.2950 = 0.7492, which becomes the new root →
TREE 2: Vocab Diversity (root r = 0.7492)
· AI, 0.7492
  · Low vocab, 0.76 → AI ∩ Low vocab: 0.76 × 0.7492 = 0.5694 ✦
  · Not low vocab, 0.24 → AI ∩ Not low vocab: 0.24 × 0.7492 = 0.1798
· Human, 0.2508
  · Low vocab, 0.24 → H ∩ Low vocab: 0.24 × 0.2508 = 0.0602 ✦
  · Not low vocab, 0.76 → H ∩ Not low vocab: 0.76 × 0.2508 = 0.1906
Final posterior = 0.5694 / (0.5694 + 0.0602) = 90.44% ✓
Tree 1 result
AI∩F = 0.2210  ·  H∩F = 0.0740
r = 0.2210 ÷ 0.2950 = 0.7492
Tree 2 result
AI∩LV = 0.5694  ·  H∩LV = 0.0602
Posterior = 0.5694 ÷ 0.6296 = 90.44% — exceeds 90% ✓
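The tree chains above can be reproduced end-node by end-node, which is a quick way to check a group's board work for any test value of p. A sketch, assuming the same likelihoods; the helper names `tree`, `posterior_from` and `chain` are hypothetical, and the labels use `&` in place of ∩.

```python
def tree(prior_ai, p_e_given_ai, p_e_given_human):
    """Four end-node products of one two-level tree (E = the evidence in that tree)."""
    prior_human = 1 - prior_ai
    return {
        "AI & E": p_e_given_ai * prior_ai,
        "AI & not E": (1 - p_e_given_ai) * prior_ai,
        "H & E": p_e_given_human * prior_human,
        "H & not E": (1 - p_e_given_human) * prior_human,
    }


def posterior_from(nodes):
    """Share of the evidence paths that belong to AI (the Bayes flip)."""
    return nodes["AI & E"] / (nodes["AI & E"] + nodes["H & E"])


def chain(p):
    t1 = tree(p, 0.85, 0.10)  # Tree 1: E = Flagged
    r = posterior_from(t1)    # r becomes the root of Tree 2
    t2 = tree(r, 0.76, 0.24)  # Tree 2: E = Low vocab
    return t1, r, t2, posterior_from(t2)


for p in (0.25, 0.26):
    t1, r, t2, final = chain(p)
    verdict = "meets 90%" if final >= 0.90 else "below 90%"
    print(f"p = {p}")
    print("  Tree 1:", {k: round(v, 4) for k, v in t1.items()})
    print(f"  r = {r:.4f}")
    print("  Tree 2:", {k: round(v, 4) for k, v in t2.items()})
    print(f"  final posterior = {final:.2%} ({verdict})")
```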
Task 4b Finding the critical prior
Testing values numerically:
Test p = 0.25
\( r(0.25) = \dfrac{0.85 \times 0.25}{0.75 \times 0.25 + 0.10} = \dfrac{0.2125}{0.2875} \approx 0.7391 \)
Posterior \( = \dfrac{0.76 \times 0.7391}{0.76 \times 0.7391 + 0.24 \times 0.2609} \approx \dfrac{0.5617}{0.6243} \approx 0.8997 \approx 89.97\% \)

Test p = 0.26
\( r(0.26) = \dfrac{0.85 \times 0.26}{0.75 \times 0.26 + 0.10} = \dfrac{0.221}{0.295} \approx 0.7492 \)
Posterior \( = \dfrac{0.76 \times 0.7492}{0.76 \times 0.7492 + 0.24 \times 0.2508} \approx \dfrac{0.5694}{0.6296} \approx 0.9044 \approx 90.4\% \)
Minimum prior: \( p = \) 0.26   (i.e. 26% prior probability of AI use)
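Combining the two likelihoods, the final posterior is \( \dfrac{0.646p}{0.646p + 0.024(1-p)} \) (where \( 0.646 = 0.76 \times 0.85 \) and \( 0.024 = 0.24 \times 0.10 \)). Setting this equal to 0.9 gives \( 0.0862p = 0.0216 \), so the exact threshold is \( p = \dfrac{0.0216}{0.0862} \approx 0.2506 \). Hence \( p = 0.25 \) gives a posterior just below 90% (89.97%), while \( p = 0.26 \) exceeds it (90.4%). Ordinary rounding to 2 d.p. would give 0.25, but a minimum must be rounded up: the smallest two-decimal prior that reaches 90% is 0.26.
Teaching note: This question reverses the usual direction: instead of computing a posterior from a given prior, students find what prior is needed to reach a decision threshold. It shows that when the posterior sits near a decision threshold, a modest change in the prior can flip the decision.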
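Groups find the threshold by trial, but it can also be located by bisection or solved in closed form. Because the combined posterior 0.646p / (0.646p + 0.024(1 − p)) increases with p (under the conditional-independence assumption used throughout), a simple bisection works. A sketch, with hypothetical names:

```python
import math

A = 0.76 * 0.85  # P(Low vocab | AI) * P(Flagged | AI) = 0.646
B = 0.24 * 0.10  # P(Low vocab | Human) * P(Flagged | Human) = 0.024
TARGET = 0.90


def final_posterior(p):
    """P(AI | Flagged and Low vocab) for prior p (conditional independence assumed)."""
    return A * p / (A * p + B * (1 - p))


def critical_prior(target=TARGET, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection for the smallest p with final_posterior(p) >= target.

    Valid because final_posterior is increasing on [0, 1].
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if final_posterior(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


p_star = critical_prior()

# Closed form: solve A p / (A p + B (1 - p)) = t for p.
closed_form = TARGET * B / (A * (1 - TARGET) + TARGET * B)

# A minimum must be rounded UP so that it still reaches the threshold.
smallest_2dp = math.ceil(p_star * 100) / 100

print(f"bisection:    {p_star:.4f}")        # 0.2506
print(f"closed form:  {closed_form:.4f}")   # 0.2506
print(f"smallest 2dp: {smallest_2dp:.2f}")  # 0.26
```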
Task 4c Reflection — what does this tell us about the model?
At our current prior of 18%, the posterior (85.5%) falls 4.5 percentage points short of the 90% threshold. Reaching the threshold would require the prior to rise by about 7 percentage points, from 18% to roughly 25.1%. This reveals how sensitive the conclusion is to the assumed base rate.
If the school's estimate of AI use were updated from 18% to 26%, a plausible shift, the same evidence (a detector flag plus low vocabulary diversity) would now justify acting. The prior matters enormously.
Expected discussion: groups should recognise that the "objectivity" of the algorithm rests as much on the prior chosen as on the detector's accuracy. A Bayesian system is only as good as the prior (and the likelihood estimates) it is given.
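To make the sensitivity visible during discussion, a short sweep over priors from 18% to 26% shows the posterior climbing past the 90% threshold. This is a sketch using the same likelihoods and the same conditional-independence assumption; the decision labels are illustrative.

```python
A = 0.76 * 0.85  # P(Low vocab | AI) * P(Flagged | AI)
B = 0.24 * 0.10  # P(Low vocab | Human) * P(Flagged | Human)


def final_posterior(p):
    """P(AI | Flagged and Low vocab) for prior p, assuming conditional independence."""
    return A * p / (A * p + B * (1 - p))


rows = []
for pct in range(18, 27):
    post = final_posterior(pct / 100)
    decision = "act" if post >= 0.90 else "do not act"
    rows.append((pct, post, decision))
    print(f"prior {pct}%  ->  posterior {post:.2%}  ({decision})")
```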