Part 06 — Adversarial Review Without a Peer Team


The institutional analyst writes inside a system that is structurally designed to challenge her conclusions before they are released. Drafts pass through line editors, branch chiefs, dissent channels, and — in mature services — formal red teams. The independent analyst has none of that. The draft moves directly from the working file to the published artefact, mediated only by the analyst’s own judgment.

This chapter is about closing that gap deliberately rather than pretending it does not exist.

1. Why the Absence of Peer Review Amplifies Risk

Fact: Institutional analytic production distributes the cognitive bias mitigation function across multiple humans and structural checkpoints. Solo production collapses that function onto a single operator.

This is not a quality-of-life observation. It is a structural vulnerability with predictable failure modes. Each of the following is a well-documented mechanism (Heuer, Psychology of Intelligence Analysis, 1999; the UK Professional Head of Intelligence Assessment’s Common Analytical Standards) and each operates more aggressively in solo production:

Confirmation Bias is invisible to the analyst experiencing it. By definition, an analyst cannot detect, from inside her own reasoning, the systematic discounting of disconfirming evidence — because the discounting feels like ordinary evidentiary triage. An external reviewer, working from the same evidence base but without the analyst’s narrative investment, catches the pattern. The solo analyst lacks that observer.
Mirror Imaging is culturally conditioned. A reviewer from a different cultural, linguistic, or doctrinal background is more likely to flag the imposition of the analyst’s own logic onto the adversary or subject. Solo analysts working in their native cultural register cannot self-audit what is culturally invisible to them.
Overconfidence in novel interpretations is corrected institutionally by the friction of having to defend the interpretation to a skeptical chain. Solo analysts publish without that friction, so novelty-overconfidence can be expected to go uncorrected more often.
Publication momentum. Once an analyst has invested two weeks of evidence collection in a hypothesis, the sunk cost generates psychological pressure to confirm rather than revise. Institutional pipelines absorb this by separating the analyst from the editorial decision; solo analysts must do both jobs and cannot — without deliberate intervention — separate the roles in time.

Assessment (high confidence): The paradox at the core of this chapter is that the absence of peer review makes disciplined adversarial self-review more important, not less. The common rationalisation in independent practice — “I’m experienced, I don’t need to review my own work that aggressively” — is a precise description of the cognitive conditions under which these biases operate unchecked. Experience does not immunise the analyst against confirmation bias; in many studies, it worsens the effect by increasing confidence without proportionally increasing calibration (Tetlock, Expert Political Judgment, 2005).

Gap: There is no body of literature on the differential failure rates of solo OSINT analysts versus institutional analysts on equivalent problems. We are reasoning from first principles and from the institutional cognitive-bias literature, which is robust but was developed against the assumption of multi-analyst pipelines.

The operational consequence: every solo analyst needs an explicit, written, repeatable adversarial review protocol. Improvisation is insufficient.

2. The Two-AI Adversarial Protocol (Adapted for Independent Use)

The most operationally powerful tool currently available to the independent analyst for compensating for the absence of a human peer team is the Two-AI Adversarial Protocol, adapted from the A2IC SOP (AI-Augmented Intelligence Cell methodology, UK Defence Intelligence lineage). The full technical architecture — model selection, context isolation, prompt scaffolding, evidence-handling — is documented in LLM-OSINT-SOP-A2IC. This chapter does not duplicate that material. It focuses on the analytical workflow and the discipline required to use the protocol seriously rather than performatively.

2.1 The Workflow

The protocol has three stages, each with its own role: two isolated AI sessions followed by an analyst-led reconciliation. The discipline is in keeping them separate.

Analyst AI (working session). This is the session in which the assessment is drafted, with full access to the evidence base, collection notes, source-confidence matrix, and the analyst’s working hypotheses. The output of this stage is a finished draft assessment, not a work-in-progress.
Adversary AI (fresh session, different context, ideally different model family). A second AI instance is presented with the draft assessment only — no working notes, no collection base, no analyst commentary. The prompt is explicit and hostile.
Reconciliation (analyst-led). The analyst reviews the adversary AI output, scores each criticism, drafts written responses to each, and decides which require revision of the assessment.

2.2 The Adversary AI Prompt

The prompt is not “review this assessment.” That produces commentary, not adversarial review. The prompt must explicitly licence hostility and specify the failure modes to hunt for. The following has been tested in independent practice and consistently produces useful output:

You are a Senior Intelligence Reviewer with 20 years of experience and a strong
institutional bias toward skepticism. Your task is to critique the following
assessment. You are NOT trying to be constructive — you are trying to find
fatal flaws. Specifically:

1. List every assumption the assessment rests on that is not explicitly stated.
2. Identify every place where correlation is presented as causation.
3. Identify every place where absence of contradicting evidence is used as
   positive evidence.
4. List at least 3 alternative hypotheses the assessment dismisses or ignores
   entirely.
5. Rate the assessment's analytical discipline on a scale of 1 (commentary
   with facts attached) to 5 (rigorous structured analysis).
6. Identify the single weakest link in the analytical chain.

Be specific. Quote the assessment. Do not soften your critique. If the
assessment is weak, say so directly.

Why each clause matters:

“You are NOT trying to be constructive” — current frontier models default to constructive critique. Without explicit permission to be destructive, the output is filled with affirmations that dilute the signal.
Unstated assumptions — this is the highest-yield prompt element. Solo analysts routinely embed assumptions that feel like premises but are actually load-bearing claims.
Correlation/causation — catches the most common analytical failure in OSINT work, especially in attribution and intent assessments.
Absence as evidence — catches the second-most-common failure: “no public reporting suggests X” deployed as evidence that X is not happening, when the actual epistemic status is that nothing is known either way.
Three alternative hypotheses — forces the model to actively generate competing explanations rather than simply assessing the offered one. This is the ACH instinct (Analysis of Competing Hypotheses) imported into the review step.
Discipline rating 1–5 — the rating is a calibration check. A “3” on this scale is the median honest assessment of well-researched independent work. A “5” is genuinely rare. If the model returns “5” unprompted, the rating is performative and should be discarded; re-prompt.
Single weakest link — forces prioritisation. If the analyst can only fix one thing before publication, this is the target.

2.3 Technical Requirements for the Protocol to Actually Work

The protocol fails — silently and predictably — if any of the following are skipped:

Fresh context window, zero evidence access. The adversary AI must see only the draft assessment, not the collection notes. This mimics a real adversarial reviewer who does not have access to the analyst’s working file. If the adversary AI sees the evidence base, it will tend to confirm the analyst’s reading of that evidence rather than challenge the analytical structure. The whole point is to test whether the assessment stands on its own to a reader who does not already share the analyst’s working context.
Different model family where possible. A frontier model reviewing an assessment drafted by a model from the same family catches fewer novel failure patterns than a reviewer drawn from a different family. A leading model from another lab (e.g. OpenAI, Google, Anthropic, or a strong open-weight model) carries different training biases, different RLHF priors, and different blind spots. Using two instances of the same model is better than nothing but materially worse than cross-family review. As of writing, a reasonable pairing is a frontier model from one family for drafting and a frontier model from a different family for adversarial review.
Documented output. The adversarial review output and the analyst’s written responses go into the analytical log (see §7). Undocumented reviews produce drift — the analyst remembers the criticisms she found easy to dismiss and forgets the ones she found uncomfortable.
No iteration on the adversary’s output. The analyst does not get to re-prompt the adversary AI to soften its critique. The first run is the record. Re-prompting to extract a more palatable review is the AI-era equivalent of asking the harshest reviewer to “consider revising the tone”, and it defeats the purpose. The only re-prompt the protocol allows is the one in §2.2, where an unprompted “5” rating is discarded for being too lenient.
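The isolation and first-run requirements above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the A2IC SOP: the names `build_adversary_input` and `record_first_run`, the in-memory dictionary used as a log, and the shortened prompt text are all hypothetical, and the model call itself is deliberately left out.

```python
import hashlib

# Shortened stand-in for the full adversary prompt in section 2.2 (assumption).
ADVERSARY_PROMPT = (
    "You are a Senior Intelligence Reviewer. You are NOT trying to be "
    "constructive. Find fatal flaws in the following assessment."
)


def build_adversary_input(draft: str) -> str:
    """Build the adversary session input from the draft alone.

    The function accepts no notes, sources, or commentary, so working
    context cannot leak into the review by accident.
    """
    return f"{ADVERSARY_PROMPT}\n\n--- ASSESSMENT ---\n{draft.strip()}"


def record_first_run(log: dict, draft: str, review: str) -> bool:
    """Store the review keyed by a hash of the draft; refuse to overwrite.

    Returns True if the review was recorded, False if a review for this
    exact draft is already on record (the first run stays the record).
    """
    key = hashlib.sha256(draft.strip().encode("utf-8")).hexdigest()
    if key in log:
        return False
    log[key] = review
    return True


log = {}
draft = "Actor X is assessed to be responsible for the intrusion."
prompt = build_adversary_input(draft)
first = record_first_run(log, draft, "Weakest link: attribution rests on one indicator.")
second = record_first_run(log, draft, "A softer re-run of the same review.")
print(first, second)  # True False
```

Keying the log on a hash of the draft means a revised draft gets a new review slot, while a second review of an unchanged draft is rejected. A real log would live on disk; the dictionary only keeps the sketch short.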

2.4 Reconciliation: Reading the Adversary AI Critically

The adversarial review is an input, not a verdict. AI adversarial reviewers are not infallible:

They can miss domain-specific context that an experienced human reviewer would catch.
They can produce false positives — criticisms that sound substantive but rest on the reviewer’s own factual errors or misreading.
They have systematic tendencies — e.g., over-flagging “absence of primary sources” even when primary sources are cited but in non-English; over-recommending “more diverse perspectives” as a substantive critique.

Operational rule: every criticism deserves a written response. The response can be a revision to the assessment, a documented rebuttal in the analytical log, or a noted gap to be addressed in a follow-on assessment. What it cannot be is silent dismissal. If the analyst cannot rebut a criticism in writing, the assessment needs revision before publication.
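The operational rule lends itself to a mechanical check before publication. The sketch below is one hypothetical way to encode it: the `Criticism` record, the three disposition labels, and the function names are illustrative assumptions, not an established schema.

```python
from dataclasses import dataclass

# The three acceptable outcomes named in the operational rule.
ALLOWED_DISPOSITIONS = {"revision", "rebuttal", "noted_gap"}


@dataclass
class Criticism:
    text: str
    disposition: str = ""  # one of ALLOWED_DISPOSITIONS once handled
    response: str = ""     # the analyst's written response


def unresolved(criticisms: list) -> list:
    """Return criticisms lacking a valid disposition or a written response."""
    return [
        c for c in criticisms
        if c.disposition not in ALLOWED_DISPOSITIONS or not c.response.strip()
    ]


def ready_to_publish(criticisms: list) -> bool:
    """Publication is blocked while any criticism is silently dismissed."""
    return not unresolved(criticisms)


review = [
    Criticism(
        "Correlation presented as causation in paragraph 3.",
        "revision",
        "Paragraph 3 rewritten to claim co-occurrence only.",
    ),
    Criticism("No alternative hypothesis offered for the timing."),
]
print(ready_to_publish(review))  # False
print([c.text for c in unresolved(review)])
```

The check only confirms that a response exists. Whether a rebuttal actually holds up remains the analyst's judgement.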

3. Pre-Publication Review Networks

The Two-AI protocol does not replace human peer review; it compensates for its absence. Independent analysts who can build even a thin layer of human review on top of the AI protocol produce materially stronger work.

3.1 Available Infrastructure

The OSINT and independent intelligence community is small, geographically distributed, and reachable. The mechanisms that work in practice:

Trusted-colleague reciprocal review networks. Three to five peers who agree to review each other’s pre-publication drafts on a reciprocal basis. This is the single highest-value non-institutional substitute for formal peer review available to the independent analyst. The relationships develop over years and require active maintenance.
Community infrastructure. OSINT Foundation, Global Investigative Journalism Network (GIJN) for the investigative-journalism overlap, First Draft Network / Meedan ecosystem for verification methodology, and subject-specific researcher communities (e.g., Bellingcat Discord, OSINT-focused Mastodon servers, regional researcher mailing lists). These provide ad-hoc review for specific technical questions — geolocation calls, attribution chains, source authentication — rather than full-draft review.
Pre-publication embargo with mainstream media. For significant investigations, approaching a mainstream publication for co-publication gives access to professional fact-checking standards and legal review. The tradeoff is editorial control and attribution. This is a high-cost, high-value option appropriate for major investigations, not routine assessments.
Academic peer review (slow path). For analyses with sufficient methodological substance, submission to peer-reviewed journals (e.g., Intelligence and National Security, International Journal of Intelligence and CounterIntelligence) provides the most rigorous review available, at the cost of 6–18 month timelines. Not suitable for time-sensitive intelligence reporting but valuable for methodological work and retrospective assessments.

3.2 Building a Reciprocal Review Network

The networks that actually work share four properties:

Reciprocity is the foundation. Offer reviews before requesting them, and offer reviews of equivalent quality to what you expect to receive. Networks that operate on extractive logic — one analyst constantly requesting review and rarely returning the favour — collapse within a year.
Clear scope expectations. Specify the review you want: fact-check, logical structure, source assessment, domain expertise gap-fill, or full adversarial read. Reviewers cannot provide all five in one pass, and unclear scope produces shallow reviews.
OPSEC discipline for pre-publication material. What can be shared with a reviewer before publication without compromising sources, ongoing investigation security, or third-party identities? This must be negotiated explicitly with each reviewer. Some material — investigations involving named non-public figures, material with source-protection considerations, work involving cooperation with law enforcement — cannot enter even a trusted review network. The reviewer must understand the constraint; the analyst must understand that some material will not be peer-reviewed and must compensate with stronger self-review.
Realistic timelines. Reviewers need lead time. Sending a 4,000-word investigation for review at 23:00 with a “publishing tomorrow morning” tag produces a review that is worse than no review at all, and damages the relationship. Minimum reasonable lead time for a substantive review: 72 hours for short pieces, one week for major investigations.

Assessment (high confidence): The single most underdeveloped capability among independent analysts is the deliberate construction of pre-publication review networks. The cost is high — years of relationship-building, sustained reciprocal investment — but the analytical return is greater than any tool, framework, or AI workflow can provide on its own.

4. Structured Self-Challenge Protocols

For occasions when no human reviewer is available and the AI adversarial protocol has been run, additional structured self-challenge mechanisms further reduce error rates.

4.1 The 24-Hour Rule

Finalise the evidence collection. Write the draft to completion. Set it aside for at least 24 hours. Return to it cold and read it as a skeptical external reader.

The mechanism: active drafting produces attentional capture — the analyst is reading her own argument in the order and frame in which she constructed it. The drafted text is internally coherent to the analyst because the analyst built the coherence. After 24 hours, the construction context has decayed enough that the text reads as text rather than as the externalisation of the analyst’s working memory. Logical gaps that were invisible during drafting become visible.

For investigations with hard time pressure (breaking-news adjacency), the rule degrades to a minimum 4-hour gap with a different activity (sleep, physical exercise, unrelated task) between drafting completion and re-read. The full 24-hour rule should be the default for any assessment that does not have a genuine publication deadline within the same news cycle.

4.2 The Hostile Reader Exercise

Read the draft as if you are:

The subject of the investigation (for investigative work focused on a named actor).
An analyst holding the opposite geopolitical priors (for conflict and geopolitical analysis).
The institutional defender of the position the draft attacks (for accountability journalism / corruption-focused work).

The question for each reading: where does the argument break down? Where would I, as this hostile reader, attack first?

This is distinct from the adversary AI protocol because the hostile reader exercise is positional rather than methodological. The AI critic looks for logical and evidentiary failures; the hostile reader looks for rhetorical and framing vulnerabilities that the subject or opposing analyst would weaponise. Both matter.

4.3 The Reversal Test

Write a 300-word brief arguing the opposite conclusion of your assessment. Use the same evidence base. Take it seriously — write it as well as you can.

Two outcomes:

You cannot write a coherent reversal. The hypothesis is either trivially correct or — more often — not actually a well-formed hypothesis. If no plausible alternative exists, the assessment is not adding analytical value; it is restating an established fact. Re-frame.
You can write a very convincing reversal. The evidence base is weaker than your draft assessment implies. Either strengthen the evidence base or reduce the confidence level of the assessment to reflect the genuine state of the evidence.

The reversal test is harder than it sounds. Most analysts attempting it for the first time produce a weak reversal because they are not actually arguing the opposite; they are producing a straw-man version that they expect to demolish. The discipline is to write the reversal as if you genuinely held it.

4.4 The Confidence Ladder Audit

Review every confidence-tagged claim in the draft. For each:

“High confidence” — list the minimum number of independent sources supporting the claim. Confirm there is no single-source dependence. If a high-confidence claim rests on a single source — even an excellent source — the confidence level is wrong. Either find independent confirmation or downgrade to “Moderate confidence.”
“Moderate confidence” — confirm the claim has at least one supporting source and no significant contradicting evidence. If contradicting evidence exists, address it in-text or downgrade.
“Low confidence” — confirm the claim is genuinely contributing analytical value rather than padding the assessment with weakly-supported speculation. Low-confidence claims should be present only when the analytical structure requires acknowledging the possibility.
Untagged “Assessment” — assign a confidence level. Every assessment in the draft must have one. The act of forced tagging surfaces over-confident claims that the analyst would not have flagged voluntarily.
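The mechanical parts of the ladder audit can be expressed as a small checker. This is a sketch under assumptions: the `Claim` fields, the lowercase tag strings, and the wording of the findings are hypothetical, and the two-source threshold is simply the "no single-source dependence" rule restated as a number.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    confidence: str = ""  # "high", "moderate", "low", or "" when untagged
    independent_sources: list = field(default_factory=list)
    unaddressed_contradiction: bool = False  # contradicting evidence not handled in-text


def audit_claim(claim: Claim) -> list:
    """Return the ladder rules this claim breaks (empty list means it passes)."""
    findings = []
    n_sources = len(set(claim.independent_sources))
    if claim.confidence == "":
        findings.append("untagged: assign a confidence level")
    elif claim.confidence == "high":
        if n_sources < 2:
            findings.append("high confidence on fewer than two independent sources")
    elif claim.confidence == "moderate":
        if n_sources < 1:
            findings.append("moderate confidence with no supporting source")
        if claim.unaddressed_contradiction:
            findings.append("contradicting evidence: address in-text or downgrade")
    elif claim.confidence == "low":
        # Whether a low-confidence claim earns its place is a judgement
        # call that this sketch does not try to automate.
        pass
    else:
        findings.append("unrecognised confidence tag")
    return findings


claims = [
    Claim("The shipment left port on 3 May.", "high", ["port authority notice"]),
    Claim("The buyer is a front company.", "moderate", ["registry filing"]),
    Claim("The route was chosen to avoid inspection."),
]
for claim in claims:
    print(claim.text, "->", audit_claim(claim))
```

The checker cannot tell whether two sources are genuinely independent. That determination still has to be made by the analyst before the sources are listed.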

4.5 The Intelligence Gap Completion

Before publication, the draft must contain an explicit intelligence gaps section. If it does not, write one before going further. If it does, challenge it:

Are these the real gaps in my knowledge, or the gaps that are least embarrassing to admit?
Are there gaps I have rationalised away by treating them as outside the scope of the assessment, when in fact they bear directly on the assessment’s central claims?
Are there gaps I have not stated because stating them would weaken the publishable conclusion?

The intelligence gap section is the most important section of the assessment for the experienced reader. Analysts who present comprehensive-looking assessments without identified gaps are signalling either inexperience or epistemic dishonesty. Analysts who identify gaps clearly are signalling discipline. The reputational economics favour explicit gap-statement.

5. Mirror Imaging Detection

Mirror Imaging is, in the long view, the bias that has produced the largest number of catastrophic intelligence failures in the 20th and 21st centuries (Pearl Harbor, Tet, Yom Kippur, the 1998 Indian nuclear tests, the 2003 Iraq WMD assessments, the February 2022 estimates of Ukrainian collapse). It is the hardest bias for the solo analyst to detect because it is culturally invisible — the analyst cannot see her own cultural priors from inside them.

5.1 Operational Definition

Mirror imaging occurs when the analyst assumes the adversary will behave as the analyst would behave in the adversary’s position, rather than as the adversary will behave given the adversary’s actual strategic culture, historical experience, institutional incentives, operational doctrine, and risk tolerance.

The error is not in using empathy or perspective-taking — those are valuable. The error is in substituting the analyst’s own decision logic for the adversary’s, and presenting the substitution as analysis of the adversary.

5.2 Domain-Specific Manifestations

Geopolitics. Assuming PRC cost-benefit calculations reflect Western economic rationality and short-horizon discount rates, when PRC strategic culture operates on substantially different time horizons and weights regime-survival considerations Western analysts systematically under-rate. Assuming Russian decision-making follows Western escalation-management doctrine, when Russian doctrine treats escalation thresholds as instruments rather than constraints. Assuming Iranian foreign policy decision-makers respond to deterrence signals as Western policy actors would, when Iranian decision-making is structured around a different theology of legitimacy and a different domestic political economy.
Cyber threat intelligence (CTI). Assuming nation-state APT groups have the same operational security priorities as commercial criminal hackers. Assuming attribution evidence is always planted deliberately (the “everything is a false flag” failure mode) or assuming it is never planted (the naive-attribution failure mode). Assuming attacker tradecraft will improve linearly over time, when in fact APT groups often allow tradecraft to degrade once a particular operation has achieved its access objective.
Financial crime / illicit finance. Assuming corrupt actors behave like Western-legal-framework financiers operating illicitly, when in many jurisdictions the actors operate within a system where the formal rules have never been the operative rules. Western OSINT analysts routinely mis-read jurisdiction-specific corruption patterns by importing the assumption that the formal legal framework is the baseline against which deviations are measured.
Fact-checking and disinformation analysis. Assuming audiences who believe false claims are simply uninformed, when in fact most belief in disinformation is motivated reasoning and the audience is not lacking information — they are processing it through identity-protective cognition. Counter-disinformation work that mirrors the audience as a rational-information-deficit actor systematically fails.

5.3 Countermeasures

Area-studies depth. The single most powerful mirror-imaging countermeasure is sustained area-studies investment in the adversary or subject’s strategic culture, history, and institutional logic. This is a years-long commitment, not a check-the-box task. Independent analysts who specialise in two or three regions and develop genuine area-studies depth produce better adversary analysis than generalists with broad superficial coverage.
Domain-expert consultation. When working outside the analyst’s area-studies depth, consult experts from the relevant cultural context. This is not the same as consulting Western academics who study the region; it is consulting people whose own strategic-culture instincts are shaped by the context being analysed.
Explicit articulation. Before applying any inference about adversary behaviour, write out — in one paragraph — the adversary’s strategic logic as the adversary understands it. If the paragraph reads like a Western strategic-studies department’s version of the adversary’s logic rather than the adversary’s actual self-understanding, the inference is mirror-imaging.
Native-language primary sources. Apply the actor-language-tier discipline (see LLM-OSINT-SOP-A2IC and the CLAUDE.md actor-language-tiers reference). Reading the adversary’s official communications in the adversary’s primary language, even with translation assistance, surfaces framing deltas that the English-only relay strips out. The framing deltas are often the most analytically valuable signal.

6. Managing Cognitive Bias in Extended Investigations

For investigations spanning weeks or months, the cognitive bias problem intensifies in identifiable ways. The mechanisms below are well-documented in the institutional literature and operate aggressively in solo work:

Sunk cost in a hypothesis. The more time the analyst has invested in a hypothesis, the harder it becomes to abandon it — even when accumulated evidence no longer supports it. Institutional analysis manages this by rotating analysts or by formal hypothesis-testing checkpoints. Solo analysts must impose checkpoints on themselves.
Familiarity bias in sources. Sources consulted repeatedly feel more reliable over time, not because their track records have improved, but because they are familiar. A source the analyst has read every week for six months acquires a perceived authority that is decoupled from its actual reliability. This is corrosive in extended investigations where source diversity tends to narrow over time as the analyst returns to the sources that “work” for the investigation’s frame.
Narrative coherence pressure. Partial evidence begins to feel like complete evidence once a coherent story has been constructed. The story-completeness is doing analytical work that the evidence itself does not actually support. This is the single most dangerous failure mode in long-form investigative work.
Source-network capture. Analysts conducting extended investigations develop relationships with sources, fellow researchers, and subject-matter contacts. These relationships generate access — and access generates obligation. Over time, the obligation pressures filter what the analyst is willing to publish.

6.1 Management Discipline

Run a cold assessment of the investigation on a fixed schedule: weekly or biweekly for active investigations, monthly at minimum for longer-cycle work. Each cold assessment has four parts:

Write a one-page status assessment as if briefing someone who does not share the working context. The act of explaining the state of the investigation to a hypothetical naive reader exposes assumptions that have become invisible.
Rebuild the ACH matrix from scratch (Analysis of Competing Hypotheses). Do not edit the existing matrix; rebuild it. Hypotheses that were eliminated months ago may now be back in play given new evidence, or may have been eliminated on grounds that no longer hold up.
Re-evaluate the Priority Intelligence Requirements against current evidence. The PIR set drafted at investigation-start often diverges from what the investigation is actually answering by month two. Either update the PIR or refocus the investigation.
Source-base audit. Which sources have I consulted in the last 30 days? Has the diversity narrowed? Are there source categories I have stopped consulting because they were not producing material that fit the working hypothesis?
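The source-base audit in particular is easy to run from a consultation log. The sketch below assumes the analyst keeps a list of (date, source, category) tuples; the function name, the category labels, and the 30-day comparison window against the preceding 30 days are illustrative choices, not a prescribed format.

```python
from datetime import date, timedelta


def source_base_audit(consultations, today, window_days=30):
    """Compare source categories in the last window with the window before it.

    consultations: iterable of (date, source_name, category) tuples.
    """
    recent_start = today - timedelta(days=window_days)
    prior_start = recent_start - timedelta(days=window_days)
    recent, prior = set(), set()
    for when, _source, category in consultations:
        if recent_start < when <= today:
            recent.add(category)
        elif prior_start < when <= recent_start:
            prior.add(category)
    return {
        "recent_categories": sorted(recent),
        # Categories consulted in the prior window but not since.
        "dropped_categories": sorted(prior - recent),
    }


consultation_log = [
    (date(2025, 3, 20), "Outlet A", "regional press"),
    (date(2025, 3, 5), "Registry B", "corporate records"),
    (date(2025, 2, 15), "Outlet A", "regional press"),
    (date(2025, 2, 10), "Channel C", "adversary official media"),
]
audit = source_base_audit(consultation_log, today=date(2025, 3, 31))
print(audit)
```

A non-empty `dropped_categories` list is a prompt for the question in the text, namely whether the category was dropped because it stopped fitting the working hypothesis. It is not evidence of bias on its own.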

This discipline is uncomfortable. The first time an analyst rebuilds an ACH matrix and discovers that the working hypothesis is no longer the strongest hypothesis after eight weeks of investment, the cost is significant. The cost of skipping the discipline, which means publishing an assessment that the analyst’s own ACH no longer supports, is far larger.

7. Documentation as Analytical Accountability

The analytical log is the operational substrate of all of the above. Without documentation, none of the protocols in this chapter are auditable, repeatable, or defensible. The log serves four functions: quality control, legal defence, correction capability, and professional accountability.

7.1 Log Entry Minimums

For every assessment that reaches publication, the log must contain:

Date of assessment.
Evidence base at time of assessment — list of sources consulted, including those reviewed and rejected. Not just the sources cited in the publication; the full collection set.
Hypotheses considered — the ACH set, including eliminated hypotheses.
Elimination reasoning — for each eliminated hypothesis, the specific evidence or reasoning that eliminated it.
Confidence level at time of assessment — for the published conclusion and for the major sub-claims.
Indicators that would change the assessment — what new evidence, if it emerged, would force a revision? The act of writing this out explicitly is a discipline check; analysts who cannot articulate what would change their assessment are signalling that the assessment is not falsifiable.
Adversarial review output — the AI adversary review and the analyst’s responses; the human peer reviews (if any) and the responses.
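One way to make these minimums enforceable is to give the log entry a fixed shape and check it before publication. The following is a sketch only: the `LogEntry` field names mirror the list above but are hypothetical, as is the `missing_minimums` helper.

```python
from dataclasses import dataclass, field, fields


@dataclass
class LogEntry:
    assessment_date: str = ""
    evidence_base: list = field(default_factory=list)          # includes rejected sources
    hypotheses_considered: list = field(default_factory=list)  # includes eliminated ones
    elimination_reasoning: dict = field(default_factory=dict)  # hypothesis -> reason
    confidence_levels: dict = field(default_factory=dict)      # claim -> level
    change_indicators: list = field(default_factory=list)      # what would force a revision
    adversarial_review: list = field(default_factory=list)     # reviews plus responses


def missing_minimums(entry: LogEntry) -> list:
    """Return the names of required fields that are still empty."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


entry = LogEntry(
    assessment_date="2025-03-31",
    evidence_base=["source A (cited)", "source B (reviewed, rejected)"],
)
print(missing_minimums(entry))
```

An empty result says only that every field has content. It does not check that each eliminated hypothesis has a matching reason, which would be a natural next validation to add.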

7.2 When the Assessment Changes

If subsequent evidence forces a revision of a published assessment:

The log entry is updated with the triggering evidence and the reasoning for the revision.
The published artefact is updated with a dated correction note that is clearly visible — not buried in a footnote. The correction note states what changed and why.
The original published version is preserved (in archive or via versioning) so that the revision history is auditable.
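The three revision steps can be bundled so that none is skipped. The sketch below is a hypothetical in-memory model: the `PublishedAssessment` class, its method names, and the correction-note wording are assumptions for illustration, and a real workflow would more likely rely on version control or an archive service.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PublishedAssessment:
    body: str
    corrections: list = field(default_factory=list)  # visible, dated notes
    archive: list = field(default_factory=list)      # earlier versions, kept verbatim

    def revise(self, new_body: str, what_changed: str, why: str, on: date) -> None:
        """Replace the body only alongside a dated note and an archived original."""
        if not what_changed.strip() or not why.strip():
            raise ValueError("a revision needs a stated change and a reason")
        self.archive.append(self.body)
        self.corrections.append(
            f"Correction ({on.isoformat()}): {what_changed} Reason: {why}"
        )
        self.body = new_body

    def render(self) -> str:
        # Correction notes are placed above the body, not in a footnote.
        return "\n".join(self.corrections + [self.body])


original_text = "We assess with moderate confidence that the facility is inactive."
assessment = PublishedAssessment(original_text)
assessment.revise(
    "We assess with moderate confidence that the facility resumed activity.",
    "Activity status changed from inactive to active.",
    "New imagery shows vehicle movement.",
    on=date(2025, 3, 31),
)
print(assessment.render())
```

Because `revise` is the only path that changes `body`, a silent edit requires deliberately bypassing the method, which is the behaviour the next paragraph warns against.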

Silent revision of published work — editing the original to fit the new view without a dated correction note — is one of the most damaging failures in independent intelligence practice. It destroys the audit trail, makes the analyst’s track record un-evaluable, and signals to sophisticated readers that the analyst is managing reputation rather than analysis. The credibility cost is permanent.

7.3 The Correction Discipline

A public correction issued promptly, with clear methodology and acknowledgment of the prior error, is an analytical asset, not a liability. Sophisticated readers — the audience that matters for independent intelligence analysis — treat the presence of a documented correction history as evidence of analytical seriousness. The reputational economics favour transparency.

The norm to internalise: a corrected analyst with a documented track record of corrections is more trustworthy than an analyst whose published work has never been corrected. The latter is either too cautious to make non-trivial claims or too defensive to acknowledge errors. Neither is the analyst whose work should be trusted on time-sensitive questions.

This norm is owed in significant part to Richards J. Heuer Jr’s articulation of the analyst’s professional obligation in Psychology of Intelligence Analysis — the obligation runs not to the analyst’s reputation but to the consumer of the analysis. Corrections serve the consumer; reputation-management serves the analyst.

Key Connections

Methodology series: Part 05 — Analysis Without Institutional Support · Part 07 — Production and Writing for Non-Institutional Consumers
Core frameworks: Analysis of Competing Hypotheses · LLM-OSINT-SOP-A2IC
Cognitive biases: Confirmation Bias · Mirror Imaging · Cognitive Warfare
Foundational thinker: Richards J. Heuer Jr
