Teacher Answers — The Flagged Essay

	AI-assisted	Human-written	Total
Flagged by detector	153	82	235
Not flagged	27	738	765
Total	180	820	1000

Phase 1

Reading the Evidence — Priors & Likelihoods

AHL 4.13 · Conditional probability from a table · Vocabulary: prior, likelihood, hypothesis

Task 1a Prior probabilities

Reading directly from the totals row:

\( P(\text{AI}) = \dfrac{180}{1000} = \) 0.18

\( P(\text{Human}) = \dfrac{820}{1000} = \) 0.82

These are called prior probabilities — our belief about an essay's origin before any evidence from the detector is considered.

Task 1b Likelihoods — how the detector behaves

\( P(\text{Flagged} \mid \text{AI}) = \dfrac{153}{180} = \) 0.85

\( P(\text{Flagged} \mid \text{Human}) = \dfrac{82}{820} = \) 0.10

\( P(\text{Not flagged} \mid \text{AI}) = \dfrac{27}{180} = \) 0.15

\( P(\text{Not flagged} \mid \text{Human}) = \dfrac{738}{820} = \) 0.90

These are called likelihoods — how probable each piece of evidence is, given a particular hypothesis. Note: likelihoods are not the probability of the hypothesis.
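The Task 1a and 1b values can be reproduced directly from the table counts. The following is a minimal Python sketch for checking answers; the dictionary layout and the helper names (`column_total`, `prior`, `likelihood`) are illustrative choices, not part of the original activity.

```python
# Counts from the two-way table: (detector outcome, essay origin) -> essays.
counts = {
    ("flagged", "ai"): 153,
    ("flagged", "human"): 82,
    ("not_flagged", "ai"): 27,
    ("not_flagged", "human"): 738,
}

total = sum(counts.values())  # grand total of essays in the table


def column_total(origin):
    """Number of essays with the given origin, flagged or not."""
    return sum(n for (_, o), n in counts.items() if o == origin)


def prior(origin):
    """P(origin): column total divided by the grand total."""
    return column_total(origin) / total


def likelihood(outcome, origin):
    """P(outcome | origin): cell count divided by its column total."""
    return counts[(outcome, origin)] / column_total(origin)


print("priors:", prior("ai"), prior("human"))
for origin in ("ai", "human"):
    for outcome in ("flagged", "not_flagged"):
        print(f"P({outcome} | {origin}) =", round(likelihood(outcome, origin), 4))
```

Note that each likelihood divides by a column total, never by the row total; dividing by the row total is what produces the Phase 2 quantity instead.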

Task 1c Interpreting the detector

The detector correctly flags 85% of AI essays — this is its sensitivity (true positive rate). It incorrectly flags 10% of human essays — this is its false positive rate.

If an essay is flagged, this does not mean P(AI | Flagged) = 0.85. Groups should articulate why: P(AI | Flagged) is taken over all 235 flagged essays, which include both the 153 AI essays that were flagged and the 82 human essays that were wrongly flagged.

Teaching note: Push groups to distinguish between P(Flagged | AI) = 0.85 and P(AI | Flagged). Many students conflate these. This is precisely the misconception Bayes resolves in Phase 2.

Phase 2

The Bayes Flip — First Encounter with Bayes' Theorem

AHL 4.13 · Tree diagram (student-constructed) → posterior → formula as formalisation

First-exposure note: Students have not seen Bayes' theorem before this activity. The design is intentional — they construct the tree from scratch on the VNPS (Task 2a), arrive at the posterior using tree logic alone (Task 2b), verify it against the frequency table (Task 2c), and only then encounter the formula as a formalisation of what they already did (Task 2d). Do not show or name the formula before Task 2d. Let groups struggle productively with Task 2b — the "wait, really?" moment at 65.1% is the pedagogical core of this phase.

Task 2a Tree diagram — fully labelled (student-constructed on VNPS)

Students build this from scratch using their Phase 1 values. The expected tree has two first-level branches (AI-assisted, Human-written) with prior probabilities, and four second-level branches (Flagged / Not flagged from each) with likelihood probabilities. All four end-node products should be written and sum to 1.
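For checking a group's board work, the four end-node products can be generated from the Phase 1 values. This is a small Python sketch under that assumption; the variable names are illustrative only.

```python
# First-level branches: prior probabilities from Task 1a.
priors = {"AI": 0.18, "Human": 0.82}

# Second-level branches: likelihoods from Task 1b.
likelihoods = {
    "AI": {"Flagged": 0.85, "Not flagged": 0.15},
    "Human": {"Flagged": 0.10, "Not flagged": 0.90},
}

# Each end-node is the product of the probabilities along its path.
end_nodes = {
    (origin, outcome): priors[origin] * likelihoods[origin][outcome]
    for origin in priors
    for outcome in likelihoods[origin]
}

for path, product in end_nodes.items():
    print(path, round(product, 4))

# The four end-nodes partition all essays, so the products should sum to 1.
print("sum of end-nodes:", round(sum(end_nodes.values()), 10))
```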

🌿 Tree Diagram — Visual Correspondence with the Bayes Formula

Tree → Formula mapping

Numerator = AI ∩ Flagged path product = 0.153

Denominator term 2 = Human ∩ Flagged = 0.082

Full denominator = sum of ✦ end-nodes = 0.235

Teaching note

The formula is not separate from the tree — it is the tree written algebraically. The denominator collects every path that ends in the observed evidence. The numerator is the single path corresponding to the hypothesis of interest. Students who understand this mapping can set up Bayes from a tree without memorising the formula.

Task 2b Posterior probability — the Bayes flip

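An essay has been flagged. Working from the tree:

Step 1 — Total probability of a flag (sum of the two "Flagged" end-nodes)

\( P(\text{Flagged}) = 0.85 \times 0.18 + 0.10 \times 0.82 = 0.153 + 0.082 = 0.235 \)

Step 2 — Apply Bayes' theorem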

\( P(\text{AI} \mid \text{Flagged}) = \dfrac{P(\text{Flagged} \mid \text{AI}) \cdot P(\text{AI})}{P(\text{Flagged})} \)

\( = \dfrac{0.85 \times 0.18}{0.235} = \dfrac{0.153}{0.235} \)

\( \approx \) 0.6511 (≈ 65.1%)

Even though the detector flagged the essay, there is only a 65.1% probability it is genuinely AI-assisted. This is the posterior probability — our updated belief after incorporating the evidence.

The "wait, really?" moment: Students typically expect this to be close to 85% (or even 100%). The large gap arises because 82 human-written essays are also flagged — they dilute the pool of flagged essays significantly. This is the Bayes surprise.

Task 2c Verify from the frequency table

From the table: 235 essays were flagged in total. Of these, 153 were genuinely AI-assisted.

\( P(\text{AI} \mid \text{Flagged}) = \dfrac{153}{235} \approx \) 0.6511 ✓

This confirms the Bayes calculation. Groups can use the table as a sanity check for all Phase 2 answers.
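The agreement between the two routes is exact, not just approximate. A short Python sketch using the standard-library `fractions` module shows this; the variable names are illustrative.

```python
from fractions import Fraction

# Priors and likelihoods as exact fractions taken from the table.
prior_ai = Fraction(180, 1000)
prior_human = Fraction(820, 1000)
flag_given_ai = Fraction(153, 180)
flag_given_human = Fraction(82, 820)

# Route 1: tree logic (path product over the sum of "Flagged" path products).
p_flagged = flag_given_ai * prior_ai + flag_given_human * prior_human
posterior_bayes = flag_given_ai * prior_ai / p_flagged

# Route 2: read straight from the "Flagged by detector" row of the table.
posterior_table = Fraction(153, 235)

print("P(Flagged) =", p_flagged)
print("Bayes:", posterior_bayes, "Table:", posterior_table)
print("Same value?", posterior_bayes == posterior_table)
print("Decimal:", round(float(posterior_bayes), 4))
```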

Task 2d Formula formalisation — Bayes' theorem introduced

After groups have computed the posterior from their tree, present the formula and ask them to label every part using their tree diagram on the board:

\( P(H_1 \mid F) = \dfrac{P(F \mid H_1) \cdot P(H_1)}{P(F \mid H_1) \cdot P(H_1) \;+\; P(F \mid H_2) \cdot P(H_2)} \)

Expected labelling:

· Numerator = the end-node product for the path they wanted (AI ∩ Flagged = 0.153)

· Each denominator term = one path product ending in "Flagged" (0.153 and 0.082)

· Full denominator = sum of all end-node products that end in the observed evidence = 0.235

Substituting: \( \dfrac{0.153}{0.153 + 0.082} = \dfrac{0.153}{0.235} \approx \) 0.6511 ✓

Key consolidation point: The formula is not a new method — it is the tree written algebraically. The denominator always collects every path that ends in the observed evidence. The numerator is the single path for the hypothesis of interest. Students who see this mapping can reconstruct the formula from any tree, without memorising it. This is the insight to draw out during consolidation from the walls.
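The "tree written algebraically" idea can also be expressed as a single function: build one path product per hypothesis, then divide the path of interest by their sum. The sketch below is one possible Python rendering; the function name `posterior` and the `H1`/`H2` keys are assumptions made for illustration.

```python
def posterior(priors, likelihoods, hypothesis):
    """Return P(hypothesis | evidence).

    priors[h] is P(h); likelihoods[h] is P(evidence | h) for the one
    piece of evidence that was observed.
    """
    # One path product per hypothesis: every path ending in the evidence.
    path_products = {h: priors[h] * likelihoods[h] for h in priors}
    # Numerator: the path of interest. Denominator: all such paths.
    return path_products[hypothesis] / sum(path_products.values())


priors = {"H1": 0.18, "H2": 0.82}          # H1 = AI-assisted, H2 = Human-written
flagged = {"H1": 0.85, "H2": 0.10}          # P(Flagged | h)
not_flagged = {"H1": 0.15, "H2": 0.90}      # P(Not flagged | h)

print(round(posterior(priors, flagged, "H1"), 4))
print(round(posterior(priors, not_flagged, "H1"), 4))
```

The second call gives P(AI | Not flagged), which can be checked against the table as 27 ÷ 765. Because the function sums over whatever hypotheses it is given, the same sketch would work for a tree with more than two first-level branches.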

Phase 3

Sequential Update — The Posterior Becomes the Prior

AHL 4.13 · Bayes with three events · Second signal: vocabulary diversity

Task 3a Tree diagram — updated prior, vocabulary diversity signal

Students draw a fresh tree using their Phase 2 posterior as the new root probabilities. The structure is identical to Phase 2 but with updated numbers. Expected values:

\( P(H_1) = 0.6511 \) · \( P(H_2) = 0.3489 \) · \( P(\text{Low vocab} \mid H_1) = 0.76 \) · \( P(\text{Low vocab} \mid H_2) = 0.24 \)

The key observation to draw out: the tree is structurally identical to Phase 2 — same two hypotheses, same two outcomes — but the root has changed. This is the sequential update made visible. The posterior from one tree becomes the prior for the next.

🌿 Phase 3 Tree — Updated Prior, Vocabulary Diversity Evidence

✦ Low vocab paths — what we need

AI ∩ Low vocab: 0.4948

Human ∩ Low vocab: 0.0837

Denominator: 0.4948 + 0.0837 ≈ 0.5786 (from unrounded products)

Posterior: 0.4948 ÷ 0.5786 ≈ 0.8553

What students should notice

The tree structure is identical to Phase 2 — two hypotheses, two outcomes, four end-nodes. Only the numbers changed. This is the power of the sequential update: the method is always the same, only the prior shifts.

LV = Low vocab · NLV = Not low vocab

Task 3b Updated posterior — both signals combined

Let \( r = P(\text{AI} \mid \text{Flagged}) = 0.6511 \) from Phase 2. This is our new prior.

Step 1 — New denominator (total probability of low vocab, given flagged)

\( P(\text{Low vocab} \mid \text{Flagged}) = 0.76 \times 0.6511 + 0.24 \times 0.3489 \)
\( \approx 0.4948 + 0.0837 \approx 0.5786 \) (computed from unrounded products; the rounded terms alone sum to 0.5785)

Step 2 — Apply Bayes again

\( P(\text{AI} \mid \text{Flagged} \cap \text{Low vocab}) = \dfrac{0.76 \times 0.6511}{0.5786} \)

\( = \dfrac{0.4948}{0.5786} \approx \) 0.8553 (≈ 85.5%)

The vocabulary signal moves the posterior from 65.1% to 85.5% — a meaningful update. Both signals together provide substantially stronger evidence than either alone.

Teaching note: This sequential update is equivalent to applying Bayes' theorem directly with the joint evidence (assuming conditional independence of the two signals given the hypothesis). Stronger groups can verify this algebraically.
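Groups that prefer a numerical check to an algebraic one can compare the two routes directly. The Python sketch below assumes, as the note does, that the two signals are conditionally independent given the hypothesis, so the joint likelihoods are simple products; the helper name `update` is illustrative.

```python
def update(prior_h1, lik_h1, lik_h2):
    """One Bayes update for two hypotheses; returns P(H1 | evidence)."""
    numerator = lik_h1 * prior_h1
    return numerator / (numerator + lik_h2 * (1 - prior_h1))


# Route 1: sequential. Detector first, then vocabulary diversity.
after_flag = update(0.18, 0.85, 0.10)
sequential = update(after_flag, 0.76, 0.24)

# Route 2: one update with joint evidence.
# ASSUMPTION: conditional independence, so P(F and LV | h) = P(F | h) * P(LV | h).
joint = update(0.18, 0.85 * 0.76, 0.10 * 0.24)

print("sequential:", round(sequential, 4))
print("joint:     ", round(joint, 4))
```

If the signals were not conditionally independent, the joint likelihoods would no longer be these products and the two routes could disagree.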

Task 3c Interpretation

With both signals present, there is an 85.5% probability the essay is AI-assisted and a 14.5% probability it is human-written. The committee has stronger grounds to act — but meaningful uncertainty remains.

Expected response: groups should notice the posterior has increased significantly but is still not certain. Strong groups may question whether the two signals are truly independent given the essay's origin: if the detector itself responds to vocabulary features, the flag and the low-vocabulary signal overlap, and multiplying their likelihoods would overstate the evidence.

Phase 4

The Threshold — Sensitivity of the Posterior to the Prior

AHL 4.13 · Posterior as a function of prior · Numerical solution

Task 4a Two-tree chain — general form

Students draw two linked trees with \( p \) at the root of Tree 1. Expected general expressions:

Tree 1 — Detector: \( r(p) = \dfrac{0.85p}{0.85p + 0.10(1-p)} = \dfrac{0.85p}{0.75p + 0.10} \)

Tree 2 — Vocab: \( \text{Final posterior} = \dfrac{0.76\,r(p)}{0.76\,r(p) + 0.24\,(1 - r(p))} \)

Groups that understand the chain structure can write these expressions directly from their tree labels — no formula memorisation needed.
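The two general expressions translate directly into a pair of functions, which makes it easy to test any prior a group proposes. This is a sketch only; the function names `detector_posterior` and `final_posterior` are illustrative.

```python
def detector_posterior(p):
    """Tree 1: r(p) = P(AI | Flagged) for a prior p."""
    return 0.85 * p / (0.75 * p + 0.10)


def final_posterior(p):
    """Tree 2: feed r(p) in as the new prior and update on low vocabulary."""
    r = detector_posterior(p)
    return 0.76 * r / (0.76 * r + 0.24 * (1 - r))


for p in (0.18, 0.25, 0.26):
    print(p, round(detector_posterior(p), 4), round(final_posterior(p), 4))
```

The row for p = 0.18 should reproduce the Phase 2 and Phase 3 posteriors, which is a useful check that the general form has been set up correctly.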

🌿 Phase 4 Tree Chain — Testing p = 0.25 · Final posterior: 89.97% ✗ below 90%

Tree 1 result

AI∩F = 0.2125 · H∩F = 0.0750

r = 0.2125 ÷ 0.2875 = 0.7391

Tree 2 result

AI∩LV = 0.5617 · H∩LV = 0.0626

Posterior = 0.5617 ÷ 0.6243 = 89.97% — below 90%

🌿 Phase 4 Tree Chain — Testing p = 0.26 · Final posterior: 90.44% ✓ exceeds 90%

Tree 1 result

AI∩F = 0.2210 · H∩F = 0.0740

r = 0.2210 ÷ 0.2950 = 0.7492

Tree 2 result

AI∩LV = 0.5694 · H∩LV = 0.0602

Posterior = 0.5694 ÷ 0.6296 = 90.44% — exceeds 90% ✓

Task 4b Finding the critical prior

Testing values numerically:

Test p = 0.25

\( r(0.25) = \dfrac{0.85 \times 0.25}{0.75 \times 0.25 + 0.10} = \dfrac{0.2125}{0.2875} \approx 0.7391 \)
Posterior \( = \dfrac{0.76 \times 0.7391}{0.76 \times 0.7391 + 0.24 \times 0.2609} \approx \dfrac{0.5617}{0.6243} \approx 0.8997 \approx 89.97\% \)

Test p = 0.26

\( r(0.26) = \dfrac{0.85 \times 0.26}{0.75 \times 0.26 + 0.10} = \dfrac{0.221}{0.295} \approx 0.7492 \)
Posterior \( = \dfrac{0.76 \times 0.7492}{0.76 \times 0.7492 + 0.24 \times 0.2508} \approx \dfrac{0.5694}{0.6296} \approx 0.9044 \approx 90.4\% \)

Minimum prior: \( p = \) 0.26 (i.e. 26% prior probability of AI use)

The exact threshold is \( p \approx 0.2506 \), so \( p = 0.25 \) gives a posterior just below 90% (89.97%) while \( p = 0.26 \) exceeds it (90.4%). The answer to 2 d.p. is 0.26.
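One way to see where the threshold value comes from is to write both updates in odds form. This is a sketch for stronger groups, not part of the required solution; it assumes the same likelihoods as above and, as in Phase 3, conditional independence of the two signals. Each update multiplies the odds by a likelihood ratio:

\( \dfrac{\text{posterior}}{1 - \text{posterior}} = \dfrac{p}{1-p} \times \dfrac{0.85}{0.10} \times \dfrac{0.76}{0.24} = \dfrac{p}{1-p} \times \dfrac{323}{12} \)

A posterior of at least 0.90 corresponds to odds of at least \( 0.90 / 0.10 = 9 \), so

\( \dfrac{p}{1-p} \geq 9 \times \dfrac{12}{323} = \dfrac{108}{323} \quad \Rightarrow \quad p \geq \dfrac{108}{108 + 323} = \dfrac{108}{431} \approx 0.2506 \)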

Teaching note: This question reverses the usual direction: instead of computing a posterior from a given prior, students find what prior is needed to reach a decision threshold. It shows that near a decision boundary the conclusion can hinge on the prior: moving p from 0.25 to 0.26 shifts the posterior by less than half a percentage point, yet that is enough to cross the 90% line.
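Trial and error on two values can be extended into a systematic search. The sketch below uses bisection, which is one reasonable choice here because the final posterior increases steadily with the prior; it is not the only way to solve the problem, and the function names and tolerance are illustrative assumptions.

```python
def final_posterior(p):
    """Chain both updates (detector, then vocabulary) for a prior p."""
    r = 0.85 * p / (0.85 * p + 0.10 * (1 - p))
    return 0.76 * r / (0.76 * r + 0.24 * (1 - r))


def critical_prior(target=0.90, tol=1e-9):
    """Smallest prior whose final posterior reaches the target, by bisection."""
    lo, hi = 0.0, 1.0  # final_posterior(0) = 0 and final_posterior(1) = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if final_posterior(mid) < target:
            lo = mid   # still below the target: the threshold is higher
        else:
            hi = mid   # target reached: the threshold is here or lower
    return hi


threshold = critical_prior()
print(round(threshold, 4))
```

The result should land between the two tested values 0.25 and 0.26, consistent with the threshold quoted in Task 4b. Changing `target` lets groups explore how a stricter or more lenient committee rule would move the required prior.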

Task 4c Reflection — what does this tell us about the model?

At our current prior of 18%, the posterior (85.5%) falls short of the 90% threshold. The prior would need to rise to about 25.1%, roughly seven percentage points above the current estimate, before the same evidence triggers action. This reveals how much the conclusion depends on the assumed base rate.

If the school's estimate of AI use were updated from 18% to 26% — a plausible shift — the same detector output would now justify acting. The prior matters enormously.

Expected discussion: groups should recognise that "objectivity" of the algorithm depends entirely on the prior chosen. A Bayesian system is only as good as the prior it is given.