Part 11 — Tools and Technology Stack

“A practitioner’s stack is a working artefact, not a wishlist. Every tool you adopt is a tool you must maintain, defend, and justify under analytical scrutiny.”

This chapter is the operational tool reference for the independent intelligence analyst. It does not catalogue every tool that exists — that work is done elsewhere, and is obsolete within a quarter. It catalogues the tools that an independent practitioner can realistically deploy, justify against an OPSEC threat model, and sustain at solo-operator scale, organised by analytical function.

The audience is the working analyst building or rebuilding a personal stack: a journalist, a researcher, a small consultancy operator, a solo open-source intelligence practitioner. The frame is operational. Each tool entry answers four questions: what does it do, what is the cost-tier and what do you get for it, what does using it cost you in OPSEC terms, and when do you prefer it to its alternatives.

For the foundational tool-by-tool walkthrough at beginner level, see OSINT Toolkit Essentials. This chapter does not duplicate that material — it sits one layer above, treating tools as components of an analyst’s working architecture rather than as standalone capabilities.

1. Design Principles for an Independent Analyst’s Stack

The independent analyst designs a stack under constraints that institutional environments do not face. A government intelligence officer or a corporate threat-intelligence team operates inside a procured stack, with vendor relationships, dedicated infrastructure staff, classified data feeds, and a budget envelope that absorbs $50,000 per-seat enterprise licences without flinching. The independent practitioner has none of that. The independent practitioner has a laptop, a credit card, and a finite weekly maintenance budget measured in hours, not in full-time engineering staff.

Four design principles govern selection. Each is a constraint, and each constraint forces an architectural decision before any specific tool is named.

1.1 Open-source preferred

Tools with auditable source code are preferable to black-box commercial tools, particularly for OPSEC-sensitive collection work. The reasons are operational, not ideological:

Auditability. You can inspect what an open-source tool does on your machine. With a commercial closed-source tool, you trust the vendor’s privacy policy and their implementation of it.
No vendor lock-in. Commercial vendors discontinue products, pivot business models, raise prices, or get acquired and have their tooling absorbed into enterprise suites that no longer serve independent users (the RiskIQ acquisition by Microsoft is a textbook case). An open-source tool you can fork, archive, and self-host outlives its vendor.
No budget risk. A free-tier commercial tool can move behind a paywall on a quarterly earnings call. Your stack should not have single points of catastrophic failure tied to another company’s product decisions.
Better community knowledge. Open-source tools have public issue trackers, public documentation, public abuse reports. You learn how a tool actually behaves in the field, not how its marketing department describes its behaviour.

This is a preference, not an absolute rule. There are categories — high-quality satellite imagery, certain corporate registries, the better neural machine translation systems — where the open-source alternative is materially weaker than the commercial product, and the commercial product is the right operational choice. The principle says prefer open-source when capability is comparable, not refuse commercial tools categorically.

1.2 OPSEC-compatible

Every tool that requires connecting to a third-party server with your analytical activity creates an OPSEC surface. The tool’s vendor sees what you search, what entities you investigate, what subjects you find interesting. For the independent analyst, this is not theoretical:

A practitioner who runs Shodan searches for industrial control systems in a specific country has told Shodan that they are interested in industrial control systems in that country.
A practitioner who reverse-image-searches a face on a Russian search engine has handed that face, and their interest in it, to that search engine’s owner.
A practitioner who routes their email through a free webmail service has agreed that the provider may scan that mail for advertising or training data.

Evaluate tool selection against the threat model developed in Part 08. Where the analytical subject is low-sensitivity (open-source historical research, public-policy analysis), permissive tool selection is acceptable. Where the subject is high-sensitivity (live conflict attribution, threat-actor profiling, work touching state interests), prefer local processing, hardened collection environments, and tools with strong data-handling policies. At any sensitivity level, choose local processing where it is technically feasible without significant quality loss.

The independent analyst’s OPSEC threat model is not the threat model of a clandestine intelligence officer; it is the threat model of a professional whose research subject may attempt to identify, profile, or retaliate against them, and whose work has commercial and reputational value that should not leak to competitors or training-data scrapers.

1.3 Maintainable at solo scale

Tools that require significant ongoing maintenance or have frequent breaking changes create operational drag. Every tool you adopt costs you maintenance time: updates, configuration drift, dependency churn, integration with other tools in your stack. At solo scale, the maintenance cost is paid out of your analytical time.

The failure mode is the analyst who spends Tuesday updating their self-hosted instances, Wednesday debugging a broken integration, Thursday refactoring their scraper scripts after a platform change, and Friday writing the deliverable they were supposed to write all week. Maintenance is not analysis. A stack that maximises analytical throughput minimises tool-tending time.

Three operational heuristics follow:

Adopt slowly. Resist the impulse to add a new tool for each new analytical need. Stretch existing tools first.
Standardise where possible. One reference manager. One knowledge base. One translation tool. Parallel tools fragment your attention and your archive.
Retire ruthlessly. A tool you have not used in six months is consuming maintenance overhead for no analytical return. Remove it.

1.4 Cost-justified

For commercial tools, analytical value must justify cost. A simple test: would you pay this much for this tool if you were paying out of your own pocket every month? You are.

Many free-tier tools are adequate for independent practitioners. Free-tier Shodan, free-tier Censys, free Sentinel Hub, free OpenSanctions, free OCCRP Aleph, free SEC EDGAR — these are not constrained substitutes for premium tools; they are professional-grade resources that institutions also use. The premium tiers add scale (higher rate limits, larger result sets, historical depth), not capability.

The highest-cost tools — Palantir, Babel Street, Refinitiv/LSEG World-Check, LexisNexis Risk Solutions, the enterprise tier of Maltego — are not accessible at independent-practitioner price points (entry pricing is typically $30,000–$250,000 per year), and are not the appropriate tools for independent practice in any case. Their architecture assumes a team with shared workspaces, an institutional data agreement framework, and infrastructure to absorb the data they produce. The independent practitioner who somehow accessed Palantir Gotham would discover they have purchased a Ferrari to drive on a dirt road.

2. Stack by Functional Category

The functional taxonomy below maps the analyst’s workflow: collection → infrastructure investigation → corporate/financial investigation → geolocation → image/video verification → document processing → knowledge management → AI augmentation → OPSEC. Tools are organised by what they let you do, not by what category they belong to in a vendor catalogue.

Pricing and tiers in this chapter were verified June 2026. Vendor pricing and free-tier limits drift faster than any other fact in a tool catalogue — confirm current cost and tier limits on the vendor’s own pricing page before relying on any figure below.

2.1 Collection and Feed Monitoring

Tool	Cost tier	OPSEC	Best for
Inoreader Pro	Freemium ($9.99/mo Pro)	Server-side; provider sees feed list	High-volume RSS + rule-based routing
Feedly	Freemium	Server-side; provider sees feed list	AI-filtered feed summarisation
Telegram Desktop	Free	Telegram metadata visible	Manual channel monitoring
tgstat.com / Telemetr.io	Freemium	Server-side; web access	Channel analytics, cross-post mapping
Google Alerts	Free	Server-side; very weak privacy	Crude name monitoring
Talkwalker Alerts	Freemium	Server-side	Better than Google Alerts
F5Bot	Free	Server-side	Reddit keyword monitoring

Inoreader Pro is the operational default for serious RSS-based collection. It is the strongest filtering and rule-based routing engine in the public market; the Pro tier ($9.99/month) supports rules that route feed items by keyword, source pattern, or topic into separate streams, which is the only way to manage a high-volume cross-domain feed architecture without drowning. The free tier is adequate below 500 feeds. The API is exposed for programmatic integration into downstream pipelines.
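The routing idea can be sketched independently of any particular reader. The following is a minimal Python illustration of first-match keyword and source-pattern routing; it is not Inoreader's rule engine or API, and the `FeedItem` and `Rule` types, the stream names, and the default stream are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class FeedItem:
    source: str  # feed URL or name
    title: str


@dataclass
class Rule:
    stream: str               # destination stream name (hypothetical)
    keyword: str = ""         # case-insensitive substring matched against the title
    source_pattern: str = ""  # regular expression matched against the source

    def matches(self, item: FeedItem) -> bool:
        # An empty condition is treated as "no constraint".
        if self.keyword and self.keyword.lower() not in item.title.lower():
            return False
        if self.source_pattern and not re.search(self.source_pattern, item.source):
            return False
        return True


def route(item, rules, default="inbox"):
    """Return the stream of the first matching rule, else the default stream."""
    for rule in rules:
        if rule.matches(item):
            return rule.stream
    return default
```

Rule order matters in this sketch: the first matching rule wins, so narrow rules belong ahead of broad ones.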

Feedly is the principal alternative. Comparable free tier; the Leo AI feature in paid tiers performs adequate priority filtering and summarisation, which can compress the morning triage. UX is more polished than Inoreader’s. The trade-off is weaker rule-based routing — Feedly is opinionated about how it thinks you should consume feeds, where Inoreader is configurable.

Telegram monitoring is the unsolved problem of the open-source collection toolkit. There is no single dominant tool. Operational approaches:

Manual monitoring via Telegram Desktop (free) — adequate for following 20–50 channels personally. Above that count, fatigue sets in and you start missing things.
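Telegram API for programmatic monitoring (free, requires scripting): a client running under a user account on Telegram’s MTProto API (for example via the Telethon library) can join public channels and forward content to your own ingestion pipeline. The Bot API alone is a poor fit, because a bot receives channel posts only after a channel administrator adds it. Rate limits are non-trivial. Requires a phone number and self-hosted scripting.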
tgstat.com and Telemetr.io (freemium) — channel analytics, cross-posting analysis, growth metrics, audience overlap. Best for understanding a network of channels, not for live monitoring of message content.
Channel discovery — use existing curated monitoring lists (Ukrainian war-reporter aggregations, Middle East tracker networks, regional aggregator channels) rather than building a list from scratch.

Google Alerts is crude but free. False-positive rate is high; coverage is patchy; latency is unpredictable. Adequate as a supplementary tripwire for less-covered names or topics where you want to know if anything gets indexed, but not reliable as a primary monitoring tool. Mention.com and Talkwalker Alerts are materially more capable; Talkwalker has a usable free tier.

F5Bot is the underrated tool in this category — free keyword monitoring across Reddit (and historically other forums). For analytical subjects with active community discussion, F5Bot surfaces conversation that does not reach mainstream RSS feeds.

2.2 Network and Infrastructure Investigation (Cyber CTI)

This is the category where free-tier commercial tools and free open-source tools are jointly so strong that an independent practitioner can match institutional capability for routine infrastructure investigation work. See BGP Routing and DNS Infrastructure for the analytical background.

Tool	Cost tier	OPSEC	Best for
Shodan	Freemium ($69/mo Freelancer)	Server-side; queries logged	Internet-exposed services, CVE search
Censys	Freemium (250 queries/mo free)	Server-side	Complement to Shodan; different scan methodology
GreyNoise	Freemium	Server-side	Distinguishing background scanning from targeted activity
MS Defender TI (ex-RiskIQ) / Validin	Free tier (check current access)	Server-side	Passive DNS, certificate history, infrastructure pivot
crt.sh	Free	Public log search	Certificate transparency, subdomain enumeration
ViewDNS.info	Free	Server-side	WHOIS history (free tier)
DomainTools	Paid-required	Server-side	WHOIS history (comprehensive, expensive)
CIRCL pDNS	Free (registration)	Server-side	Passive DNS lookups
Farsight DNSDB	Paid-required	Server-side	Comprehensive passive DNS
VirusTotal	Freemium	Server-side; uploads are searchable	File and URL reputation, passive DNS
BGP.tools / BGPView	Free	Server-side queries	ASN, BGP routing analysis
RIPE NCC RIS	Free	Server-side queries	BGP routing data

Shodan is the foundational tool. The free tier (2 saved searches, 100 results per query, no historical data) is adequate for occasional infrastructure lookups; the Freelancer tier ($69/month) unlocks full search, ~1M results/month, scanning and monitoring, more API calls, and commercial use, and is the right tier for any practitioner doing routine cyber-CTI work. (The higher Small Business and Corporate tiers — $359 and $1,099/month — add scan and result scale aimed at teams, not solo practice; note Shodan grandfathers existing subscribers’ pricing.) Shodan’s CVE search — querying by CVE identifier to find exposed vulnerable services — is the single most operationally useful feature for vulnerability-context investigations.

Censys uses a different scanning methodology than Shodan and frequently captures hosts and services that Shodan does not, and vice versa. Censys Community provides 250 free queries per month. Censys ASM (the enterprise attack-surface-management product) is not accessible to independent practitioners and is not necessary for independent work. The operational pattern is: run both Shodan and Censys on infrastructure questions and merge results. Their coverage is non-redundant.
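The merge step can be sketched as follows. This is a minimal illustration that assumes each platform's results have already been exported and reduced to records with `ip` and `port` keys; that record shape and the function names are assumptions made for the example, not either vendor's export format.

```python
from collections import defaultdict


def merge_scan_results(**results_by_source):
    """Merge per-source lists of {"ip", "port"} records, keyed on (ip, port).

    Returns {(ip, port): [sources that reported it]}.
    """
    merged = defaultdict(set)
    for source, records in results_by_source.items():
        for rec in records:
            # Normalise the port to int so "443" and 443 collapse to one key.
            merged[(rec["ip"], int(rec["port"]))].add(source)
    return {key: sorted(sources) for key, sources in sorted(merged.items())}


def only_in_one(merged):
    """Return the services that exactly one source reported."""
    return {key: sources[0] for key, sources in merged.items() if len(sources) == 1}
```

Calling `merge_scan_results(shodan=..., censys=...)` and then `only_in_one` shows which findings rest on a single platform, which is worth recording in the sourcing notes.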

GreyNoise solves a specific analytical problem: distinguishing internet background-scanning noise from targeted activity directed at a specific host. If you find an IP scanning your target, GreyNoise tells you whether that IP has been observed scanning the internet at large (in which case it is usually uninteresting) or has not been seen in background scanning (in which case the activity may be directed at your target and warrants investigation). The Community tier provides limited API calls; the web visualiser at viz.greynoise.io is free for IP lookups.

Microsoft Defender Threat Intelligence (Defender TI) is the successor to RiskIQ Community / PassiveTotal, folded into Microsoft’s security portal after the 2021 acquisition. The standalone RiskIQ Community free tier has been retired post-acquisition; a free/limited community tier of Defender TI still exists in the Defender portal but with reduced data depth, and access should be checked at time of use. It provides passive DNS, certificate history, WHOIS history, and infrastructure pivot — the four pillars of infrastructure investigation in a single interface. Validin offers a free community edition for researchers — passive DNS and historical DNS for threat-hunting and infrastructure tracking — and is the most useful current free replacement for the role RiskIQ Community used to play; SecurityTrails (now Recorded Future) offers a limited free trial but is primarily a paid product.

crt.sh is the certificate transparency log search engine. It is free, has no rate limits worth caring about, and is the canonical tool for subdomain enumeration via certificate logs. For any target domain, the first move is crt.sh?q=%.target.tld to enumerate every subdomain that has ever had a publicly logged certificate. Nothing else in the free toolkit is this productive on a per-query basis.
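A minimal Python sketch of the same query done programmatically. It assumes that crt.sh returns JSON when `output=json` is added to the query and that each record carries a newline-separated `name_value` field; both reflect the service's observed behaviour rather than a documented contract, so check a live response before relying on them. The function names are illustrative, and the network fetch itself is left out.

```python
import json
from urllib.parse import urlencode


def crtsh_url(domain: str) -> str:
    """Build a crt.sh query URL for every logged certificate name under `domain`.

    The % wildcard must be percent-encoded (%25) when the URL is built in code.
    """
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})


def extract_names(raw_json: str, domain: str) -> list[str]:
    """Return the sorted, de-duplicated hostnames under `domain` from a JSON response."""
    names = set()
    for record in json.loads(raw_json):
        # Assumed field: "name_value" holds one or more names separated by newlines.
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # fold wildcard entries into their parent name
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)
```

Fetch the URL with any HTTP client, pass the response body to `extract_names`, and archive the raw response alongside the derived list.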

Passive DNS (the historical record of which domains have resolved to which IPs over time) is critical for infrastructure investigation and is the area where free tooling is most constrained. CIRCL’s passive DNS service (free with registration) provides moderate historical depth. VirusTotal’s passive DNS data (free with a login) is broad but shallow. Farsight DNSDB is the gold-standard commercial source and is priced for institutions. For the independent practitioner, the operational pattern is to triangulate CIRCL + VirusTotal + Validin (free community tier) for passive DNS, accept that you will sometimes miss historical resolutions, and document the limitation in your sourcing.
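The triangulation step can be sketched as a merge over normalised records. The tuple layout `(source, domain, ip, first_seen, last_seen)` is an assumption made for the example; each service returns its own format, which would need mapping into this shape first.

```python
from datetime import date


def triangulate(*records):
    """Merge passive DNS observations from several sources.

    Each record is (source, domain, ip, first_seen, last_seen) with date objects.
    Returns {(domain, ip): {"first_seen", "last_seen", "sources"}}.
    """
    merged = {}
    for source, domain, ip, first, last in records:
        key = (domain.lower().rstrip("."), ip)  # ignore case and a trailing root dot
        entry = merged.setdefault(
            key, {"first_seen": first, "last_seen": last, "sources": set()}
        )
        entry["first_seen"] = min(entry["first_seen"], first)
        entry["last_seen"] = max(entry["last_seen"], last)
        entry["sources"].add(source)
    return merged


def single_source(merged):
    """Return resolutions attested by only one source, for flagging in sourcing notes."""
    return sorted(key for key, entry in merged.items() if len(entry["sources"]) == 1)
```

The merged window is the widest span any source reports, and `single_source` lists the resolutions where the limitation should be stated explicitly.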

BGP analysis — BGP.tools, BGPView, RIPE NCC RIS — is essential for attributing infrastructure to organisations and jurisdictions. Knowing which ASN announces a prefix, who owns that ASN, where it peers, and how routing has changed over time is the bridge between “this IP is hosting X” and “X is operated from jurisdiction Y by organisation Z”. All three sources are free; their coverage is highly complementary.

2.3 Corporate and Financial Investigation

Tool	Cost tier	OPSEC	Best for
OpenSanctions	Free (API: free with registration)	Server-side	Sanctions screening, 180+ source aggregation
OpenCorporates	Freemium	Server-side	Global corporate registry search
ICIJ Offshore Leaks DB	Free	Server-side	Panama, Pandora, Paradise, Bahamas papers
OpenOwnership / BODS Register	Free	Server-side	Beneficial ownership
OCCRP Aleph	Free (registration)	Server-side	Investigative documents, oligarch research
SEC EDGAR / EFTS	Free	Server-side	US public company filings, full-text search
PACER	Paid-per-page	Server-side	US federal court records (primary)
CourtListener / RECAP	Free	Server-side	US federal court records (mirror)
USASpending.gov	Free	Server-side	US federal contracts, grants, loans
TED.europa.eu	Free	Server-side	EU public procurement

OpenSanctions is the most operationally useful free corporate tool in existence. It aggregates 180+ sanctions and political-exposure sources (OFAC SDN, EU Consolidated, UN, UK, Swiss SECO, Ukrainian SBU lists, Canadian SEMA, Australia DFAT, and dozens of national lists) into a single searchable database, updated daily. The API is free with registration. Every corporate investigation begins here. The data quality is good enough that institutional sanctions-screening vendors increasingly use OpenSanctions as one of their feeds.

OpenCorporates is the largest open corporate registry aggregator, covering 150+ jurisdictions. The free tier provides web search with rate limits; bulk and API access is paid. For independent practice, the web tier is usually sufficient. The structural value is being able to search for officers and directors across jurisdictions in a single query, which catches cross-border corporate networks that a per-jurisdiction search would miss.

ICIJ Offshore Leaks Database is the front-end to the Panama Papers, Pandora Papers, Paradise Papers, Bahamas Leaks, and the original 2013 Offshore Leaks. Free search and download. Essential for any investigation touching offshore structures, beneficial-ownership obfuscation, or wealth-concealment networks. The data is dated (the leaks are from specific years) but the entity and relationship graph remains analytically valuable for understanding network structure.

OpenOwnership aggregates beneficial-ownership data from national registries with growing global coverage. The UK PSC register, Denmark, Norway, and Ukraine (via OpenDataBot) are the most useful jurisdictions inside the OpenOwnership ecosystem. Beneficial-ownership registers in most other countries remain either non-existent, paywalled, or restricted to authorised users.

OCCRP Aleph is the investigative-journalism document and entity database operated by the Organized Crime and Corruption Reporting Project. Free for individual researchers with registration. Coverage is strongest for oligarch research, Eastern Europe, South America, and corruption networks. The corpus combines court records, corporate filings, media reports, leaks, and contributed datasets. Treat Aleph as the equivalent of an institutional analyst’s case-management system: a place to search across heterogeneous document collections and find the entity-of-interest references that no single source surfaces.

SEC EDGAR is the US Securities and Exchange Commission’s filing system. It is free and exhaustive for US public-company filings. The EDGAR Full-Text Search (efts.sec.gov), added in 2022, is dramatically more useful than the legacy filing-by-filing browse interface — you can now search the text of every 10-K, 10-Q, 8-K, proxy statement, and registration statement in a single query. For any investigation touching US public companies, EDGAR full-text search is the first stop.

PACER and CourtListener together cover US federal court records. PACER (the official Public Access to Court Electronic Records system) charges per-page fees; the fees are modest individually but cumulative across a large case. CourtListener, operated by the Free Law Project, hosts the RECAP Archive, which mirrors documents that RECAP users have purchased from PACER and makes them available for free. The operational pattern is to check CourtListener first for any document, and only buy from PACER when CourtListener does not have it. For state court records, Trellis is the principal commercial aggregator, and state-specific portals exist with widely varying quality and access models.

USASpending.gov provides full-text search of US federal contracts, grants, and loans. Free. Essential for tracing government-contractor relationships, identifying which entities receive federal funds, and mapping contract-network structure. For EU public procurement, TED.europa.eu (Tenders Electronic Daily) is the equivalent free resource.

2.4 Geolocation and Satellite Imagery

Tool	Cost tier	OPSEC	Best for
SunCalc.net	Free	Server-side query	Shadow-angle geolocation, temporal verification
Sentinel Hub EO Browser	Free	Server-side	Copernicus Sentinel-2 (10m, 5-day revisit)
Google Earth Pro	Free	Server-side	Historical imagery, terrain, measurement
Planet Labs	Paid-required (academic via NICFI)	Server-side	Daily-revisit commercial (3m)
Maxar Open Data	Free (event-specific releases)	Server-side	30cm imagery for major events
Google Maps / Bing Maps	Free	Server-side	Comparative imagery, different revisits
What3words	Free	Server-side	Imprecise-location resolution

SunCalc.net is the irreplaceable geolocation tool. Enter a coordinate and a datetime; it returns sun azimuth and elevation. Shadow-angle analysis on a photograph — measuring the bearing and length of a shadow against a known object’s height — yields a sun-azimuth/elevation pair, which constrains the photograph’s location and time to a narrow band. SunCalc is the lookup that turns the measurement into a verification.
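The measurement side of that workflow is simple trigonometry. This minimal sketch assumes flat ground, a vertical object, and a shadow bearing measured clockwise from true north in the direction the shadow points away from the object's base. The function name is illustrative.

```python
import math


def sun_from_shadow(object_height_m, shadow_length_m, shadow_bearing_deg):
    """Return (sun_azimuth_deg, sun_elevation_deg) implied by a measured shadow.

    Assumes flat ground and a vertical object; slopes or leaning objects bias
    the result.
    """
    # tan(elevation) = height / shadow length
    elevation = math.degrees(math.atan2(object_height_m, shadow_length_m))
    # The sun sits directly opposite the direction in which the shadow falls.
    azimuth = (shadow_bearing_deg + 180.0) % 360.0
    return azimuth, elevation
```

A 10 m pole casting a 10 m shadow towards bearing 45° implies a sun elevation of 45° at azimuth 225°. Candidate locations and times can then be checked against what SunCalc reports for the same pair.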

Sentinel Hub EO Browser is the free alternative to commercial satellite imagery. It provides browser access to the European Space Agency’s Copernicus Sentinel-2 imagery (10-metre resolution, 5-day revisit at the equator, free under Copernicus open-data policy). True-colour, false-colour, and specialised band combinations are available. The 10-metre resolution is inadequate for identifying individual vehicles or persons but is fully adequate for confirming the presence of constructed infrastructure, troop concentrations, damage extent, and large-scale ground change. For independent conflict OSINT and geopolitical analysis where commercial imagery is unavailable, Sentinel-2 is the analytical baseline.

Google Earth Pro (free since 2015) is the historical-imagery workhorse. Its archive of historical imagery in specific areas often spans a decade or more, which is sometimes irreplaceable for before-and-after analysis. The 3D terrain, measurement tools, and overlay capability make it the standard tool for terrain matching during geolocation work.

Planet Labs is the commercial standard for high-cadence imagery: daily revisit, 3-metre resolution, global coverage. No meaningful free tier exists for operational use. Academic and NGO access is available through Planet’s NICFI (Norway’s International Climate and Forests Initiative) programme, which is restricted to tropical-forest research. The commercial cost is significant — typical institutional licences run into five figures annually. Independent practitioners who need Planet imagery for specific work occasionally collaborate with an institutional partner who has access; this is the only realistic path.

Maxar Open Data Program releases high-resolution imagery (30-centimetre) for significant events — major disasters, escalating conflicts, attention-grabbing geopolitical incidents — on an ad-hoc basis. It is not a routine monitoring tool, but the quality when available is exceptional and the licence is permissive for analytical use.

Google Maps and Bing Maps use different imagery sources, captured on different dates. Comparing them — and comparing both with Google Earth’s archive — frequently reveals additional resolution or different temporal coverage. Bing has periodically shown higher-resolution imagery than Google in specific areas, particularly in parts of Eastern Europe and Central Asia.

What3words is a geocoding system that converts coordinates into three-word identifiers. The analytical utility is the reverse: resolving three-word addresses that appear in social media posts and witness reports (often attached to otherwise vague descriptions such as “near the bridge by the market”) into a checkable coordinate. Not a collection tool, but a useful translation layer between human and machine geographic vocabulary.

2.5 Image and Video Verification

Tool	Cost tier	OPSEC	Best for
InVID/WeVerify	Free (browser extension)	Server-side	Video keyframe extraction, multi-engine reverse search
Google Reverse Image	Free	Server-side; query logged	Reverse image — global content
Yandex Images	Free	Server-side; Russian jurisdiction	Reverse image — Russian, Eastern European, faces
Bing Visual Search	Free	Server-side	Reverse image — additional coverage
TinEye	Freemium	Server-side	Reverse image — exact-match focused
FotoForensics	Free	Server-side; upload	JPEG manipulation detection (ELA)
ExifTool	Free (CLI, local)	Local	Comprehensive metadata extraction
yt-dlp	Free (CLI, local)	Local processing; downloads contact the hosting site	Video preservation, 1,000+ site support

InVID/WeVerify is the verification workhorse — a browser extension developed by an EU research consortium specifically for journalistic and analytical verification. Keyframe extraction from video, simultaneous reverse-image search across Google/Yandex/Bing/TinEye/Baidu, EXIF and metadata viewing, image magnification, video forensic indicators. Free. There is no realistic substitute. Every analyst doing image or video verification should have InVID/WeVerify installed.

Reverse image search is a multi-engine practice, not a single-engine practice. The engines have meaningfully different indices:

Yandex significantly outperforms all others for Russian, Ukrainian, and broader Eastern European content, and for face matching in general. The strong face-matching capability is operationally valuable and ethically fraught — see Part 10 for ethical constraints.
Google is adequate for global English-language content and has the largest overall index for general images.
Bing Visual Search sometimes produces results the others miss, particularly for commercial imagery and certain regions.
TinEye is exact-match focused — useful when you need to know precisely where the same image (not similar images) has been published.

The operational pattern is to run all four (InVID/WeVerify automates this) and merge results.

FotoForensics (fotoforensics.com) provides Error Level Analysis (ELA) for JPEG manipulation detection. ELA is not a definitive forensic test: it identifies areas of an image with different compression histories, which often correlates with editing but can also occur for benign reasons (an image saved twice, or re-encoded by different software). Treat ELA results as a prompt for closer examination, not as proof of manipulation.

ExifTool is the comprehensive metadata extraction tool, written by Phil Harvey, free and open-source, command-line. It handles essentially any file format and extracts every metadata field present. The standard for metadata forensics. Available in most Linux package managers; the canonical reference distribution is the author’s website. Run on any image or document you receive before analysing it.
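For analysts who script this step, a minimal Python sketch of calling ExifTool and filtering its JSON output follows. It assumes `exiftool` is installed and on the PATH. The `-json` and `-G` options are ExifTool's own; the helper names and the choice of tag groups to keep are illustrative.

```python
import json
import subprocess


def exiftool_command(path):
    # -json: emit JSON; -G: prefix each tag with its group name (e.g. "EXIF:Model")
    return ["exiftool", "-json", "-G", str(path)]


def run_exiftool(path):
    """Run ExifTool on one file and return its raw JSON text."""
    result = subprocess.run(
        exiftool_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout


def tags_of_interest(raw_json, prefixes=("EXIF:", "XMP:", "IPTC:")):
    """Keep only tags in the embedded-metadata groups named by `prefixes`.

    ExifTool returns a JSON array with one object per input file.
    """
    record = json.loads(raw_json)[0]
    return {tag: value for tag, value in record.items() if tag.startswith(prefixes)}
```

Storing the full raw JSON next to the file, and filtering only for display, keeps the original extraction available for later review.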

yt-dlp is the open-source successor to youtube-dl, supporting downloads from 1,000+ video sites. Free. Essential for preserving video evidence before deletion — and given the rate at which sensitive video is removed by platforms or by uploaders under pressure, preservation discipline is not optional. Pair with a hash-and-archive workflow (compute SHA-256 on download, store hash and timestamp in your case notes) for chain-of-custody-grade preservation.
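The hash-and-archive step can be sketched with the Python standard library alone. The record's field names are illustrative, not a chain-of-custody standard; adapt them to whatever your case notes already use.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Compute SHA-256 in chunks so large video files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path, source_url):
    """Build a preservation record for a downloaded file (hypothetical field names)."""
    p = Path(path)
    return {
        "file": p.name,
        "size_bytes": p.stat().st_size,
        "sha256": sha256_file(p),
        "source_url": source_url,
        "hashed_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
```

Hash immediately after download and before any re-encoding or trimming, so the recorded digest refers to the file exactly as it was retrieved.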

2.6 Document Analysis and Language

Tool	Cost tier	OPSEC	Best for
DeepL	Freemium (free API: 500k chars/mo; paid from ~$10/mo)	Server-side	High-quality translation; EU + AR/ZH/JA/KO
Google Translate	Free	Server-side	Broadest-language gisting (incl. FA and long-tail)
pdfminer.six / pdfplumber	Free (Python, local)	Local	PDF text and table extraction (programmatic)
Tesseract OCR	Free (CLI, local)	Local	OCR, 100+ languages
GROBID	Free (self-hosted)	Local	Academic-paper structured extraction
Camelot / Tabula	Free (local)	Local	PDF table extraction to structured data

DeepL is the best neural machine translation available for European languages. For German, French, Spanish, Italian, Portuguese, Russian, and Polish, DeepL’s output is materially better than Google Translate’s, and the gap matters when analytical decisions depend on precise reading of source material. DeepL has also substantially expanded its language coverage: Arabic, Chinese (Simplified and Traditional), Japanese, and Korean are now fully supported, so for these languages DeepL is a competitive option. The free API tier covers 500,000 characters per month (the free web translator is more limited, ~50,000 chars/month with per-translation caps); paid web/Pro plans start at roughly $10/month (Starter) and remove the limit. For an independent analyst doing serious multilingual collection, a paid DeepL plan is one of the highest-return subscriptions in the stack.

Google Translate retains the broadest raw language coverage — useful for Persian and the long tail of lower-resource languages where DeepL still has no support — and is a strong cross-check for Arabic, Chinese, Japanese, and Korean now that DeepL also covers them. Free. Use it for gisting (understanding the substance of source material) rather than for quoting (citing language verbatim). For quotation, find a human translator or, at minimum, get a second machine-translation pass (run DeepL and Google against each other) and a native-speaker spot-check.
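Running two engines against each other is easier to act on when the disagreements are surfaced automatically. The following minimal sketch uses Python's standard `difflib` to list the word spans where two English renderings diverge. It is a crude word-level comparison (punctuation and synonyms count as differences), not a translation-quality metric, and the function name is illustrative.

```python
from difflib import SequenceMatcher


def divergences(translation_a, translation_b):
    """Return (span_in_a, span_in_b) pairs where two translations differ.

    Comparison is case-insensitive and split on whitespace.
    """
    a = translation_a.lower().split()
    b = translation_b.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

If one engine renders a sentence as "The minister denied the report" and the other as "The minister confirmed the report", the function isolates the pair `("denied", "confirmed")`, which is exactly the kind of divergence to send to a native speaker.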

The multilingual collection rule from Part 03 — that you read primary state-actor sources in their native language whenever possible — depends entirely on having competent translation tooling.

Programmatic document processing (pdfminer.six, pdfplumber, Tesseract, GROBID, Camelot, Tabula) is the toolkit for analysts who routinely process large document collections. The case for learning enough Python to run these tools — even at the level of adapting example scripts — is strong: institutional analysts have research-engineering teams that do this work; the independent analyst either does it themselves or does not do it at all.

pdfminer.six / pdfplumber for text extraction from PDFs.
Tesseract OCR (paired with Poppler / pdf2image for PDF page rendering) for scanned documents.
GROBID for structured extraction from academic papers — title, authors, abstract, sections, references — into clean structured records. Self-hosted; one-time setup; high return for any analyst processing literature.
Camelot and Tabula for PDF table extraction. Regulatory filings, financial statements, official reports, and most government documents contain tables that are useless until extracted into machine-readable form. These tools do that extraction.
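As an indication of how small an adapted script can be, here is a minimal sketch of table extraction with pdfplumber followed by clean-up into CSV. It assumes pdfplumber is installed (`pip install pdfplumber`). The helper names are illustrative, and real documents usually need per-source tuning of the extraction settings.

```python
import csv
import io


def extract_tables(pdf_path):
    """Yield (page_number, rows) for each table pdfplumber detects in the PDF."""
    import pdfplumber  # third-party; imported here so the CSV helper works without it

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for rows in page.extract_tables():
                yield page.page_number, rows


def rows_to_csv(rows):
    """Normalise extracted cells and return CSV text.

    Extracted cells can be None (empty) or contain line breaks from wrapped text.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([" ".join((cell or "").split()) for cell in row])
    return buffer.getvalue()
```

Keeping extraction and clean-up separate means the same `rows_to_csv` step can be reused when the rows come from Camelot or Tabula instead.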

2.7 Knowledge Management and Research Organisation

Tool	Cost tier	OPSEC	Best for
Obsidian	Free (personal) / paid Sync optional	Local-first	Analyst’s vault-as-intelligence-system
Zotero	Free	Local + optional sync	Reference management, PDF annotation
Joplin	Free	Local + E2E-encrypted sync	Simpler alternative to Obsidian

Obsidian is the reference standard for the independent analyst’s knowledge management foundation. Markdown-based (so the underlying files are plain text, future-proof, and grep-able); local-first (no cloud lock-in; your vault lives on your filesystem); graph visualisation; bidirectional backlinks; tag hierarchies; an extensive plugin ecosystem. Free for personal use. The vault-as-intelligence-system pattern — where the vault is not a notes-app but a working knowledge architecture organised by analytical function, with frontmatter-driven metadata, wikilinked entity references, and disciplined section structure — is the foundation on which the rest of the independent practice is built. See Obsidian for Intelligence Analysis for the operational architecture.
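Because the vault is plain Markdown, its metadata and entity links can be read by a few lines of script without any Obsidian plugin. The sketch below is a minimal illustration, assuming flat `key: value` frontmatter between `---` fences and standard `[[wikilink]]` syntax. The field names and entity names in the example are hypothetical, and nested YAML is deliberately out of scope.

```python
import re

# Captures the link target, ignoring an optional #heading and |alias.
WIKILINK = re.compile(r"\[\[([^\]\|#]+)(?:#[^\]\|]*)?(?:\|[^\]]*)?\]\]")


def parse_note(text):
    """Return (frontmatter_dict, sorted_wikilink_targets) for one Markdown note."""
    meta, body = {}, text
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        fields = {}
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                # Frontmatter closed properly: accept it and keep the rest as body.
                meta, body = fields, "\n".join(lines[i + 1:])
                break
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    links = sorted({m.group(1).strip() for m in WIKILINK.finditer(body)})
    return meta, links


if __name__ == "__main__":
    note = (
        "---\ntype: entity\nconfidence: moderate\n---\n"
        "Registered agent for [[Acme Holdings Ltd]]; see "
        "[[Acme Holdings Ltd#Officers|officer list]] and [[Example Person]].\n"
    )
    print(parse_note(note))
```

Run across a whole vault, the same function gives a simple entity index that can be grepped or loaded into a graph tool.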

Zotero is the reference manager. PDF annotation, automatic bibliographic metadata capture (via the Zotero Connector browser extension), citation export in any major style, shared-library support, full-text search across the collection. Free and open-source. The complement to Obsidian: Zotero holds the documents and their formal bibliographic records; Obsidian holds your analytical work, with wikilinks into the Zotero corpus where citations matter.

Joplin is the open-source, end-to-end-encrypted note-taking alternative for analysts who want strong encryption-at-sync without Obsidian’s graph and plugin complexity. Free. Adequate for sensitive operational notes that should not live in a typical cloud notes service. The trade-off is feature minimalism — Joplin is not a vault-as-intelligence-system; it is a notes app with good encryption.

2.8 AI Augmentation for the Independent Analyst

The realistic assessment of LLM utility under independent-analyst constraints is documented in LLM-OSINT-SOP-A2IC. The summary below is for tool-selection purposes.

What LLMs genuinely improve for the solo independent analyst:

Translation quality and domain-specific terminology. LLMs handle technical military, legal, and financial vocabulary better than general MT engines, particularly for low-frequency terms-of-art that DeepL and Google Translate render literally.
Bulk processing of unstructured text. Telegram channel archive analysis, document-collection summarisation, multi-document entity extraction — tasks that scale linearly with text volume and that no single analyst can do by hand in reasonable time.
Adversarial review. The two-AI protocol from Part 06 — using one model to draft and a separate model to critique — substitutes (imperfectly) for the institutional red-team function.
Report drafting scaffolding. BLUF, key judgements structure, executive summaries, alternative-hypothesis framing — LLMs produce serviceable first drafts of structured analytical product that the analyst then revises and tightens.
Entity extraction from large document sets. Pulling person/organisation/place references out of a corpus into a structured list, suitable for downstream graph-construction or for sanctions cross-checking.

What LLMs do not replace:

Source verification. LLMs confabulate. Verification requires primary-source cross-check. Any citation generated by an LLM is presumed-fabricated until you have personally verified the source exists, says what the LLM claims, and is the authority the LLM implies.
Geolocation. LLMs cannot reliably interpret satellite imagery, perform shadow-angle analysis, or read terrain. Multimodal models perform poorly on this category and overstate their confidence.
Attribution. LLMs do not have access to current threat intelligence. Their pre-training data is temporally stale (months to years behind), and they have no live signal on actor infrastructure, current campaigns, or recent reattributions.
Confidence calibration. LLMs systematically overstate confidence. Output that reads as “highly probable” should be treated as a hypothesis, not a finding.
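The presumed-fabricated rule for LLM citations can be enforced with a small record that refuses to mark a citation usable until all three checks have been done by hand. The sketch below is one possible shape for such a record; the class name, field names, and example reference are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LLMCitation:
    """A citation proposed by an LLM, unverified until the analyst says otherwise."""
    reference: str
    source_exists: bool = False          # you located the source yourself
    says_what_was_claimed: bool = False  # you read it and it supports the claim
    authority_as_implied: bool = False   # the author or outlet is what the LLM implied

    @property
    def usable(self) -> bool:
        return (self.source_exists
                and self.says_what_was_claimed
                and self.authority_as_implied)


def unverified(citations):
    """Return the references that must not appear in a product yet."""
    return [c.reference for c in citations if not c.usable]


if __name__ == "__main__":
    checked = LLMCitation("Hypothetical ministry report, 2021", True, True, True)
    pending = LLMCitation("Hypothetical journal article, 2019", source_exists=True)
    print(unverified([checked, pending]))
```

The default of `False` on every check is the design point: a citation starts as fabricated and has to earn its way into the product.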

Tool selection for AI augmentation:

Option	OPSEC posture	Hardware	Cost
Claude (Anthropic API, ZDR endpoint)	Strong (zero data retention if enabled at org level)	None special	Pay-per-token, typically $20–100/mo
Gemini API / Vertex AI (current Gemini generation, ZDR)	Strong	None special	Subscription or pay-per-token
Local — Ollama with a current ~70B-class open-weights model (latest Llama or Qwen generation) / Mistral Large	Strongest	High-end GPU or large RAM	Hardware-only after acquisition
Local — smaller models (current ~7–8B-class open-weights, e.g. latest Llama/Qwen)	Strongest	Consumer GPU	Hardware-only

The trade-off is straightforward: local deployment offers the strongest OPSEC posture but requires hardware investment and accepts a meaningful capability gap relative to frontier hosted models. A dense ~70B-class open-weights model (the latest Llama or Qwen generation) needs roughly 140 GB for its weights alone at 16-bit precision (70 billion parameters × 2 bytes), and Mistral Large is larger still; even a 4-bit quantisation of a 70B model needs roughly 35–40 GB, which is workstation-class hardware or a large unified-memory machine. Many current frontier open models are sparse Mixture-of-Experts, where the active-parameter count that determines inference speed is far smaller than the total weight count that determines memory footprint. Quantised versions run on more accessible hardware with quality degradation that varies by task. For analysts on consumer hardware, Claude API with zero-data-retention (configured at the organisation level) is the best balance of capability, OPSEC, and cost.
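The hardware question reduces to simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below works that through for a few model sizes and precisions. It is a lower bound under stated assumptions, since it counts weights only and ignores the KV cache, activations, and runtime overhead, and the function name is illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    One billion parameters at 8 bits is one gigabyte, so the result is
    params_billion * bits_per_param / 8. KV cache and overhead are excluded.
    """
    return params_billion * bits_per_param / 8


if __name__ == "__main__":
    for params, bits in [(70, 16), (70, 8), (70, 4), (8, 16), (8, 4)]:
        gb = weight_memory_gb(params, bits)
        print(f"{params}B parameters at {bits}-bit: about {gb:.0f} GB for weights")
```

For a Mixture-of-Experts model, pass the total parameter count, not the active count: all experts must be resident in memory even though only some run per token.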

Budget realism: at typical independent-analyst usage volume — say, 2–3 investigations per month plus a weekly newsletter — Claude API costs run $30–50 per month at current pricing. A heavy bulk-processing analyst could reach $100–200 per month. This is an order of magnitude cheaper than any institutional analytics platform and within the operational budget of any seriously commercial independent practice.

2.9 OPSEC Toolchain

Tool	Cost tier	Purpose
Signal	Free	Encrypted messaging, source comms
Proton Suite (Mail / Drive / VPN)	Freemium ($9.99/mo paid)	E2E-encrypted mail/storage/VPN
KeePassXC	Free	Local-only password manager
YubiKey	Paid ($25–55 one-time)	Hardware 2FA
Tailscale	Free (Personal plan: 6 users / unlimited devices — check current limits)	WireGuard mesh for personal infra
Librewolf / Firefox + arkenfox	Free	Privacy-hardened collection browser
Qubes OS	Free (high learning curve)	VM-compartmentalised OS

Signal is the standard for encrypted messaging and source communications. Free, open-source, operated by the Signal Foundation (a non-profit). Use the desktop and mobile clients together (linked via QR code) so that you can handle long-form source material from the keyboard rather than from a phone. Signal’s metadata posture is the strongest in the consumer messaging ecosystem; treat it as the default for anything sensitive.

Proton Suite (Proton Mail, Proton Drive, Proton VPN) provides end-to-end-encrypted email and cloud storage, plus a VPN, in a single ecosystem under Swiss jurisdiction. The Mail free tier is adequate for basic use; paid plans ($9.99/month) add storage, custom-domain support, and more aliases. Proton VPN’s free tier is genuinely usable, with no bandwidth limits but limited server selection, which makes it one of the few free VPNs an analyst can actually rely on. The Proton ecosystem is not the only way to achieve these capabilities, but the integration across services is operationally convenient and the jurisdiction is favourable.

KeePassXC is the OPSEC-optimal password manager: free, open-source, local-only, no cloud sync, no vendor dependency. The encrypted database file is yours; sync between devices is achieved by syncing the file via Syncthing or Proton Drive if you want multi-device access, or by accepting single-device discipline. The trade-off relative to a cloud password manager (1Password, Bitwarden) is convenience: KeePassXC requires you to handle your own backup and sync, whereas cloud managers do it for you while taking on the OPSEC cost of holding your full credential set.

YubiKey is the hardware security key for 2FA. $25–55 depending on model. One-time cost. The operational benefit is removal of the SIM-swap and SMS-2FA attack vector for high-value accounts: once a YubiKey is enrolled and SMS fallback is switched off, taking over the account through its second factor requires physical possession of the key, not just compromise of the phone number associated with the account. For any account whose compromise would be analytically or commercially significant, YubiKey enrolment is non-negotiable. Buy at least two keys (a primary and a backup) so that key loss does not lock you out.

Tailscale is zero-configuration WireGuard mesh networking for connecting personal devices and self-hosted infrastructure. The free Personal plan covers up to 6 users with unlimited user devices (as of the April 2026 pricing change; check current free-tier limits before relying on them). For analysts running self-hosted tools — Obsidian sync, local AI inference, private databases, personal n8n automation — Tailscale enables secure remote access between devices without exposing services to the public internet. The architectural value is that your private analytical infrastructure stays private: it is reachable from your laptop, your phone, and your home server, but not from the internet at large.

Librewolf (a privacy-hardened Firefox fork) or Firefox with the arkenfox user.js provides a collection-browsing environment separate from your public/personal browsing identity. Free, open-source. The operational separation matters: your collection browser should not share fingerprint, cookies, or session state with your personal browsing. Sandbox the collection identity.

Qubes OS is the compartmentalisation operating system: every application runs in an isolated Xen virtual machine, and you organise work into colour-coded VMs (a “work” VM, a “personal” VM, a “research” VM, a “vault” VM) with controlled inter-VM file transfer. The strongest desktop OPSEC architecture available. The learning curve is high; the hardware overhead is real (Qubes wants 16+ GB RAM and a fast SSD). Qubes is appropriate for analysts who face persistent adversarial interest — investigative journalists working on organised crime or state-actor subjects, for example. For most independent practitioners working on lower-sensitivity subjects, Qubes is overkill, and the maintenance burden makes it operationally counter-productive.

3. Tool Selection Decision Framework

When a tool is needed for a category, the selection decision is a function of seven variables. The matrix below makes the trade-offs explicit.

3.1 Decision criteria

Criterion	What it captures
Open-source vs. proprietary	Auditability, lock-in risk, fork-ability
Cost tier	Free / freemium / paid-required / enterprise-only
OPSEC risk	Server-side processing vs. local; vendor data handling; jurisdiction
Coverage / quality	Index size, source breadth, output fidelity
Maintenance burden	Update cadence, breaking changes, integration cost
Community support	Documentation, issue trackers, peer expertise
Alternative available	Substitutability — is there a fallback?

The principles from Section 1 set the weights: open-source is preferred on auditability and lock-in grounds; OPSEC risk is weighted up where collection touches sensitive subjects; maintenance burden is weighted up at solo scale; coverage is weighted up where institutional alternatives would obviously win on capability but are out of reach.
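Treating selection as a function of the seven criteria can be made literal with a weighted score. The sketch below is a minimal illustration, assuming each candidate is rated 1 to 5 per criterion with 5 always meaning better for the analyst (so a high `opsec` rating means low OPSEC risk). The criterion keys, the weights, and the example ratings are all hypothetical; the numbers serve to make a choice explicit, not to replace judgement.

```python
CRITERIA = (
    "open_source", "cost", "opsec", "coverage",
    "maintenance", "community", "substitutability",
)


def weighted_score(ratings, weights):
    """Weighted mean of 1-5 ratings across the seven selection criteria."""
    missing = [c for c in CRITERIA if c not in ratings or c not in weights]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(ratings[c] * weights[c] for c in CRITERIA) / total_weight


# Hypothetical weighting for sensitive work at solo scale:
# OPSEC counts triple, maintenance burden counts double.
SENSITIVE_SOLO = {**{c: 1.0 for c in CRITERIA}, "opsec": 3.0, "maintenance": 2.0}

if __name__ == "__main__":
    local_tool = {**{c: 3 for c in CRITERIA}, "opsec": 5}
    hosted_tool = {**{c: 3 for c in CRITERIA}, "coverage": 5}
    print("local :", round(weighted_score(local_tool, SENSITIVE_SOLO), 2))
    print("hosted:", round(weighted_score(hosted_tool, SENSITIVE_SOLO), 2))
```

Under equal weights the two example tools tie; the sensitive-work weighting is what separates them, which is the sense in which the principles set the weights.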

3.2 Reference selection matrix

Category	Recommended tool	Open / proprietary	Cost	OPSEC (risk or posture)	Best for	Alternative
Feed aggregation	Inoreader Pro	Proprietary	$9.99/mo	Moderate	Rule-based routing	Feedly
Telegram monitoring	Telegram Desktop + tgstat	Mixed	Free / freemium	Moderate	Channel coverage	Bot API self-host
Internet device scan	Shodan	Proprietary	$69/mo (Freelancer)	Moderate	CVE search, services	Censys
Passive DNS	CIRCL + VirusTotal + Validin	Mixed	Free	Moderate	Domain history	DNSDB (paid)
Certificate transparency	crt.sh	Open access	Free	Low	Subdomain enumeration	Censys certs
BGP / ASN	BGP.tools	Open access	Free	Low	Routing analysis	RIPE RIS, BGPView
Sanctions screening	OpenSanctions	Open	Free	Low	Daily-updated aggregation	Direct OFAC SDN
Corporate registry	OpenCorporates	Open data	Free	Low	Global officer search	Per-jurisdiction registries
Offshore	ICIJ Offshore Leaks	Open	Free	Low	Panama/Pandora pivot	OCCRP Aleph
Investigative docs	OCCRP Aleph	Open (registration)	Free	Low	Cross-source entity search	National FOIA
US public filings	SEC EDGAR (EFTS)	Open	Free	Low	Full-text 10-K/8-K	—
US court records	CourtListener + PACER	Mixed	Free / per-page	Low	Federal dockets	Trellis (state)
Free satellite	Sentinel Hub EO Browser	Open data	Free	Low	10m, 5-day revisit	Google Earth Pro
Commercial satellite	Planet Labs	Proprietary	Paid	Moderate	Daily 3m	Maxar Open Data (event)
Solar position	SunCalc.net	Open	Free	Low	Shadow-angle geo	—
Video verification	InVID/WeVerify	Open (EU consortium)	Free	Moderate	Keyframe + multi-engine RIS	—
Reverse image	Yandex + Google + Bing	Proprietary	Free	Moderate	Multi-engine merge	TinEye
Image forensics	FotoForensics	Closed (web)	Free	Server-upload	ELA	Self-hosted tools
Metadata	ExifTool	Open	Free	Local	All formats	—
Video archive	yt-dlp	Open	Free	Local + server	1,000+ sites	—
Translation (high-quality)	DeepL (paid)	Proprietary	~$10/mo+	Server-side	DE/FR/ES/RU + AR/ZH/JA/KO quality	Google Translate
Translation (broadest coverage)	Google Translate	Proprietary	Free	Server-side	FA + long-tail-language gist	Claude/Gemini
OCR	Tesseract	Open	Free	Local	100+ languages	Cloud Vision APIs
PDF tables	Camelot / Tabula	Open	Free	Local	Structured extraction	—
Reference mgmt	Zotero	Open	Free	Local + opt sync	PDF annotation, citations	—
Knowledge base	Obsidian	Closed source (free)	Free	Local-first	Vault architecture	Joplin
LLM (hosted)	Claude API (ZDR)	Proprietary	$30–100/mo	Strong (ZDR)	Drafting, review, extraction	Gemini API / Vertex AI
LLM (local)	Ollama + current ~70B-class open-weights (latest Llama/Qwen)	Open weights	Hardware	Strongest	Sensitive bulk processing	Mistral Large
Encrypted messaging	Signal	Open	Free	Strong	Source comms	—
Email + VPN	Proton Suite	Mixed	Freemium	Strong	E2E mail, VPN	Tutanota + Mullvad
Password manager	KeePassXC	Open	Free	Strongest	Local-only credentials	1Password (cloud)
Hardware 2FA	YubiKey	Proprietary	$25–55 once	Strongest	SIM-swap mitigation	—
Personal mesh VPN	Tailscale	Closed source (free tier)	Free	Strong	Self-host access	Headscale (FOSS)
Hardened browser	Librewolf	Open	Free	Strong	Collection identity	Firefox + arkenfox
Compartmentalised OS	Qubes OS	Open	Free	Strongest	VM isolation	—

The matrix is a starting point, not a prescription. Where your work has unusual requirements — extreme sensitivity, specific jurisdictional coverage, a particular collection subject — substitute accordingly. The point of the matrix is that each substitution should be a deliberate selection against the seven criteria, not a default.

4. What Not to Build / What Not to Buy

Independent analyst stacks fail in characteristic ways. The four most common failures are catalogued below.

4.1 Overbuilding infrastructure

The most common failure is the analyst who spends more time maintaining their stack than using it. Self-hosting is seductive: every commercial tool has an open-source equivalent that you can run yourself, and at first glance each self-host saves a subscription fee. In practice, the maintenance cost of self-hosted tools on single-person infrastructure is real, and it is paid in the currency that matters most — analytical hours.

The diagnostic question: how much time per week do you spend on infrastructure work (updates, debugging, integration, backups, monitoring)? If the answer is more than a few hours, your stack is over-built relative to your analytical throughput. The corrective is to retire self-hosted tools whose cloud equivalent is cheap, reliable, and OPSEC-acceptable for the work in question.

The exception is infrastructure whose existence is non-negotiable for OPSEC reasons — local AI inference for sensitive collection, a private knowledge base, an air-gapped credential store. For these, self-hosting is the architecture; the maintenance cost is a fixed cost of the threat model.

4.2 Enterprise tools without enterprise budgets

Palantir Gotham. Babel Street. Refinitiv/LSEG World-Check. LexisNexis Risk Solutions. Maltego enterprise. Recorded Future. ZeroFox. The list of tools that cost $30,000–$250,000 per seat per year is long, and the temptation to find a way to access them — through trial accounts, through institutional partnerships, through grey-market arrangements — is recurrent.

The temptation should be resisted on two grounds. First, the price point makes sustained access impossible for an independent practitioner; the work becomes dependent on a tool you cannot keep, which is a structural fragility. Second, and more importantly, these tools are not the right tools for independent practice. They are architected for institutional teams with workspace-sharing, data-governance, and downstream integration with classified or proprietary feeds. The independent practitioner who somehow accessed them would discover that their analytical workflow does not align with the tool’s assumptions and that the marginal capability over the free-tier and open-source stack is smaller than the marketing implies.

The successful independent practice is built on the free-tier and low-cost commercial stack documented above. It is not a constrained substitute for the institutional stack; it is a different architecture that fits a different operating model.

4.3 Scraper-heavy architectures without legal review

Building custom scrapers for platforms with Terms of Service restrictions creates legal exposure that the independent practitioner is poorly positioned to absorb. LinkedIn, Facebook, Instagram, X/Twitter, TikTok — all major platforms have explicit ToS restrictions on automated collection, and several have pursued litigation against scraping operations. The legal landscape is unsettled (see hiQ v. LinkedIn and its aftermath, where the legal status of scraping public profiles oscillated through multiple appellate decisions), but “the law is unsettled” is not a safe operational posture for a solo practitioner without a litigation budget.

The operational alternatives:

API-based access where the platform offers it, even on restrictive terms. The trade-off is rate limits and content gaps; the benefit is legal cover.
Research partnerships with institutional researchers who have data-access agreements (academic institutions, journalism collectives). The trade-off is collaboration overhead; the benefit is access through a legitimate channel.
Manual collection at reasonable cadence for high-value targets, with documented research purpose. Manual collection at human-scale rate is not what platform legal teams pursue.
Use of platform-archive services (Bellingcat’s tooling, Wayback Machine, archive.today) which have established themselves under fair-use and research-purpose frameworks.

The risk-acceptance posture is: do not build infrastructure whose ongoing operation creates legal exposure you cannot absorb if the platform escalates.

4.4 Trusting free tools with sensitive data

Free tiers of commercial tools are not gifts. The vendor’s economic model has to pay for the infrastructure, and free-tier user activity is one of the inputs to that model — through advertising, through aggregate data sales, through training-data collection, or through the funnel into paid conversion.

For low-sensitivity collection, this is acceptable. For sensitive collection, it is not. The discipline is to classify each tool by data handling:

Local processing: data does not leave your machine (ExifTool, Tesseract, Camelot, KeePassXC, Ollama, local Joplin/Obsidian).
Server-side with strong data handling: vendor has clear, contractually-binding data-handling commitments (Proton, Signal, Claude with ZDR enabled, paid Tailscale).
Server-side with permissive data handling: vendor’s free-tier terms allow them to use your activity broadly (Google Translate, free Inoreader/Feedly, free reverse-image services, many other freemium tools).

Match the tool’s data-handling tier to the sensitivity of the data you put through it. There is nothing wrong with running Google Translate on a public press release. There is everything wrong with running it on a confidential source’s identifying details.
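The matching rule can be written down as a lookup that refuses anything unclassified. The sketch below is a minimal illustration using a handful of the tools named above; the enum names, the two sensitivity labels, and the default-deny behaviour are assumptions made for the example.

```python
from enum import IntEnum


class Handling(IntEnum):
    PERMISSIVE_SERVER = 1  # vendor may use your activity broadly
    STRONG_SERVER = 2      # contractually binding data-handling commitments
    LOCAL = 3              # data does not leave your machine


# Tiers taken from the classification above; extend with your own stack.
TOOL_TIER = {
    "ExifTool": Handling.LOCAL,
    "Tesseract": Handling.LOCAL,
    "KeePassXC": Handling.LOCAL,
    "Signal": Handling.STRONG_SERVER,
    "Proton": Handling.STRONG_SERVER,
    "Google Translate": Handling.PERMISSIVE_SERVER,
}

# Minimum acceptable tier per sensitivity label (labels are illustrative).
MINIMUM = {"low": Handling.PERMISSIVE_SERVER, "sensitive": Handling.STRONG_SERVER}


def permitted(tool: str, sensitivity: str) -> bool:
    """True if the tool's data-handling tier is adequate for the data."""
    tier = TOOL_TIER.get(tool)
    if tier is None:
        return False  # unclassified tools are refused until you classify them
    return tier >= MINIMUM[sensitivity]


if __name__ == "__main__":
    print(permitted("Google Translate", "low"))        # public press release
    print(permitted("Google Translate", "sensitive"))  # source-identifying text
```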

5. Maintenance and Version Control for Your Stack

A stack is a working artefact, and like any artefact it accumulates entropy. Maintenance discipline keeps the stack useful instead of letting it decay into a collection of half-configured, half-broken, half-remembered tools.

5.1 Document your stack

Maintain a private, never-published reference note in the operations folder of your private knowledge base, not in the public layer. For each tool it records: name, version (where applicable), installation method, configuration paths, account / API credentials (referenced by name to your password manager, never inline), licensing tier and renewal date, and the analytical purpose it serves. The discipline is not optional. The cost of forgetting which version of a tool you had configured, and how, six months ago when it last worked is paid the next time it breaks.
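The stack record can also be kept in a structured form, which makes renewal dates queryable. The sketch below mirrors the fields listed above in a dataclass and adds a helper that lists renewals falling due. All names, dates, and the 30-day window are hypothetical, and the credential field holds only the name of a password-manager entry, never a secret.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ToolRecord:
    name: str
    purpose: str                          # the analytical purpose it serves
    version: Optional[str] = None
    install_method: str = ""
    config_paths: List[str] = field(default_factory=list)
    credential_ref: str = ""              # password-manager entry name only
    licence_tier: str = "free"
    renewal_date: Optional[date] = None


def renewals_due(records, today, within_days=30):
    """Names of tools whose renewal date is on or before today + within_days.

    Already-lapsed renewals are included, since they need attention too.
    """
    cutoff = today + timedelta(days=within_days)
    return [r.name for r in records
            if r.renewal_date is not None and r.renewal_date <= cutoff]


if __name__ == "__main__":
    stack = [
        ToolRecord("Inoreader Pro", "feed routing", licence_tier="paid",
                   credential_ref="inoreader-main", renewal_date=date(2025, 1, 15)),
        ToolRecord("DeepL", "translation", licence_tier="paid",
                   credential_ref="deepl-api", renewal_date=date(2025, 6, 1)),
        ToolRecord("ExifTool", "metadata extraction", install_method="package manager"),
    ]
    print(renewals_due(stack, today=date(2025, 1, 1)))
```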

5.2 Track breaking changes

For tools you depend on, subscribe to release notifications:

GitHub releases (use the “watch → custom → releases” feature for repositories of tools you rely on).
Vendor announcement channels — change blogs, status pages, email release notes.
Community channels for major tools — the Obsidian forum, OSINT-tool community Discord servers, relevant subreddits.

The point is not to read every release note but to be alerted that a release happened, so that the discovery of a breaking change is not when it breaks your workflow mid-investigation.
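One low-maintenance way to get that alert is the Atom feed GitHub publishes for a repository's releases, which can be added to the feed reader you already run or checked with a short script. The sketch below builds the feed URL and parses entries from feed text you have already downloaded; fetching is left out so the example stays offline. The function names are illustrative, and the URL pattern is assumed to be GitHub's current one.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def releases_feed_url(owner: str, repo: str) -> str:
    """Release feed URL for a GitHub repository (assumed URL pattern)."""
    return f"https://github.com/{owner}/{repo}/releases.atom"


def latest_entries(atom_xml: str, limit: int = 5):
    """Return (title, updated) pairs for the first entries in an Atom feed."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry")[:limit]:
        title = entry.findtext(f"{ATOM}title", default="").strip()
        updated = entry.findtext(f"{ATOM}updated", default="").strip()
        entries.append((title, updated))
    return entries


if __name__ == "__main__":
    print(releases_feed_url("yt-dlp", "yt-dlp"))
    sample = (
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        "<entry><title>v2.0.0</title><updated>2025-01-02T00:00:00Z</updated></entry>"
        "<entry><title>v1.9.0</title><updated>2024-12-01T00:00:00Z</updated></entry>"
        "</feed>"
    )
    print(latest_entries(sample))
```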

5.3 Backup configurations

Configurations are part of the stack. Store them in a private, encrypted backup:

Browser profiles (with credential stores extracted into the password manager, not in browser storage)
Obsidian vault configuration (the .obsidian/ directory: workspace, plugins, themes)
Zotero library and settings
Tool-specific configurations — ExifTool config files, custom Inoreader rules, automation-workflow definitions, Jupyter kernel configs
Shell configurations — .bashrc, .zshrc, custom scripts, aliases
Editor configurations — vim/neovim/VSCode settings

A modest discipline like a daily encrypted snapshot to a self-hosted backup target (Restic to a Tailscale-connected NAS, for example) protects against the bad outcomes — laptop loss, drive failure, configuration corruption — that otherwise wipe out months of stack tuning.

5.4 Evaluate the stack annually

Once a year, audit the stack:

Remove tools you no longer use. A tool unused for six months has no analytical claim on your maintenance time.
Identify capability gaps. What kind of work did you turn down or do badly in the last year because you lacked the right tool? Address it.
Identify maintenance over-spend. Which tools cost the most maintenance relative to analytical return? Replace or retire.
Re-check OPSEC posture. Vendors change data-handling policies. The tool you adopted on strong terms two years ago may be operating under weaker terms today.
Re-check cost. Subscription prices drift upward. Each paid tool should still pass the “would I pay this if it were new?” test.

The annual audit is the discipline that prevents a stack from becoming what every long-running analyst’s stack tends to become: strata of tools accumulated over time, half of which are no longer optimal for current work and none of which are actively maintained.
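The six-month rule in the audit is easy to apply mechanically if you record when each tool was last used. The sketch below is a minimal illustration, assuming a simple mapping of tool name to last-use date kept by hand; the function name, the 183-day reading of "six months", and the dates are assumptions.

```python
from datetime import date, timedelta


def stale_tools(last_used, today, max_idle_days=183):
    """Return tool names not used within max_idle_days, sorted alphabetically.

    last_used: dict mapping tool name to the date it was last used.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, used in last_used.items() if used < cutoff)


if __name__ == "__main__":
    usage = {
        "Tabula": date(2024, 11, 1),
        "ExifTool": date(2025, 6, 20),
        "GROBID": date(2024, 9, 15),
    }
    print(stale_tools(usage, today=date(2025, 7, 1)))
```

The output is a list of candidates for removal, not a verdict; a tool used once a year for a recurring task may still earn its place.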

6. Closing Reflection — The Stack as Analytical Asset

The tool stack is not the analyst. The judgement is the analyst.

This chapter has catalogued tools because the question “what should I use” is asked frequently and deserves a serious answer. But the answer should be held in proportion. An analyst with weak judgement and the best stack in the world produces poor analysis with elaborate infrastructure. An analyst with strong judgement and a modest stack produces strong analysis. The tools are amplifiers; what they amplify is whatever capability is already there.

The independent practitioner’s competitive position is built primarily on judgement, subject-matter depth, source-cultivation, and analytical discipline. The stack is in service of those. A practitioner who chooses tools with the principles in Section 1, organises them by function as in Section 2, selects deliberately against the criteria in Section 3, avoids the failures in Section 4, and maintains them as in Section 5 — that practitioner has done the necessary work and can return to the analytical work itself.

The stack is finite, knowable, and instrumental. The analysis is unbounded, demanding, and the point of the practice.
