Friend or Foe? Language as an Ideological Switch in Open-Weight LLMs
Full title: Language as an ideological switch in open-weight LLMs under Russian disinformation stress
Authors: Anna Małgorzata Kamińska, Tetiana Klynina
Published: 2026-06-07
arXiv: 2606.08512
Source: arXiv (cs.CY, cs.CL)
Abstract
As Russia’s war against Ukraine extends into generative AI, large language models (LLMs) adapted for local post-Soviet languages are deployed in contested information environments. Policy and industry discourse assumes that culturally aligned adaptation encodes the political orientation of the target community: a Ukrainian-oriented model will resist Russian narratives, a Russian-oriented one will reinforce them. Does it? This article systematically disconfirms that assumption. We run a controlled audit of four openly available LLMs sharing a common base model but fine-tuned for different linguistic communities, querying them in Ukrainian, Russian and English across ten contested wartime narratives: Crimea, “denazification”, the “one people” thesis, and atrocity denial at Bucha and Mariupol. The result is a Fine-Tuning Paradox: the Ukrainian-oriented model shows the weakest resistance to Russian disinformation in Russian, while the Russian-oriented one exhibits the strongest rejection. Corpus composition, language coverage and prompt format prove more decisive than nominal cultural provenance. We situate these findings within debates on hybrid warfare, digital sovereignty and post-imperial information orders, arguing that the principal threat to regional information sovereignty is not adversarial fine-tuning but the untested assumption that cultural alignment guarantees resilience.
Why This Work Matters
This paper introduces the Fine-Tuning Paradox, a counterintuitive finding from a controlled audit with direct implications for information warfare in post-Soviet digital spaces. The assumption that “culturally aligned = ideologically resilient” underlies national AI procurement and platform governance decisions in Ukraine and across Eastern Europe. The paradox undercuts that assumption and relocates the vulnerability from a model's nominal cultural provenance to its corpus composition, language coverage and prompt format.
The findings bear directly on the Russia-Ukraine war: the AI layer of the information environment is not behaving as policymakers and defense planners expected. They have immediate implications for Ukrainian digital sovereignty policy and for any state seeking to deploy local-language LLMs as information-resilience tools.
Core Concepts and Contributions
Fine-Tuning Paradox: In controlled testing of four open-weight LLMs that share a common base model, across ten contested wartime narratives, the Ukrainian-oriented model showed the weakest resistance to Russian disinformation when queried in Russian, while the Russian-oriented model showed the strongest rejection. Under cross-language stress, observed behavior inverts the alignment that cultural provenance would predict.
Corpus composition as primary determinant: The decisive variables are not nominal cultural alignment but corpus composition, language coverage, and prompt format. On this reading, a model with thin Russian-language training data is brittle when queried in Russian, regardless of its ostensible political orientation.
Ten tested narratives: The set includes Crimea's status, the “denazification” framing, the “one people” thesis, and atrocity denial at Bucha and Mariupol, all core contested claims of Russian information operations. This selection anchors the audit in operationally relevant content.
Post-imperial information orders: The paper situates its findings in a structural argument about digital sovereignty in the post-Soviet space, where linguistic infrastructure remains contested alongside territorial and political dimensions of the conflict.
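The audit design summarized above (four models, three query languages, ten narratives) can be pictured as a grid of cells, each scored for resistance and then averaged per model and language. The sketch below is only an illustration of that bookkeeping, not the authors' pipeline: the model and narrative labels are placeholders, the 0-to-1 resistance scale is an assumption, and the function names are hypothetical. It contains no results from the paper.

```python
from collections import defaultdict
from itertools import product

# Placeholder labels: this note does not name the four models or all ten narratives.
MODELS = ["model_a", "model_b", "model_c", "model_d"]
LANGUAGES = ["uk", "ru", "en"]  # Ukrainian, Russian, English
NARRATIVES = [f"narrative_{i:02d}" for i in range(1, 11)]


def build_grid(models, languages, narratives):
    """Return every (model, language, narrative) cell of the audit grid."""
    return list(product(models, languages, narratives))


def mean_resistance(scores):
    """Average per-cell scores into a (model, language) table.

    `scores` maps (model, language, narrative) to a float in [0, 1], where
    1.0 means the response fully rejects the narrative (assumed scale).
    """
    buckets = defaultdict(list)
    for (model, language, _narrative), value in scores.items():
        buckets[(model, language)].append(value)
    return {key: sum(values) / len(values) for key, values in buckets.items()}


def extremes(table):
    """Return the (model, language) pairs with the lowest and highest mean."""
    weakest = min(table, key=table.get)
    strongest = max(table, key=table.get)
    return weakest, strongest
```

With these placeholder lists the grid has 4 × 3 × 10 = 120 cells before any prompt-format variants are added. Comparing cells by (model, language) rather than by model alone is what makes a pattern like the Fine-Tuning Paradox visible, since it only appears when a model is queried in a particular language.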
Connections
- Information Warfare — the broader domain
- Disinformation Campaign — Russian IO methodology tested
- Russia-Ukraine War — the active conflict this paper directly investigates
- Russian Federation — principal actor driving the disinformation narratives tested
- Ukraine — primary target and field site
- Cognitive Warfare: Definition, Framework, and Case Study — theoretical framework that contextualizes these findings