Vulnerabilities of Human Cognition

1. Introduction: cognition as the contested terrain

The most consequential attack surface of the early twenty-first century is not silicon but neural. Where twentieth-century information warfare targeted communication channels, contemporary adversaries target the inferential machinery that processes those channels — and they do so with industrial precision. In January 2024, a finance employee at the British engineering firm Arup wired HK$200 million across fifteen transactions after a video call in which deepfaked reconstructions of his CFO and colleagues, generated from public footage, instructed him to act. In March 2022, a fabricated Zelensky surrender video appeared on hacked Ukrainian outlets within weeks of Russia’s invasion. In September 2023, two days before Slovakia’s parliamentary election, an AI-generated audio of Progressive Slovakia’s Michal Šimečka discussing electoral fraud spread during the legal silence period. None of these operations would have succeeded without exploiting specific, well-documented features of human cognition — authority bias, fundamental attribution error, the illusory truth effect, and the negativity-weighted reactivity of System 1.

This paper argues that human cognition contains a stable inventory of vulnerabilities, that these vulnerabilities operate at three distinguishable layers — individual, collective-emergent, and cultural — and that AI now permits their exploitation at unprecedented scale, personalization, and velocity. The biases are not new; the asymmetry between attacker and defender is. Bak-Coleman et al. (2021, PNAS) characterize human collective behavior under digital communication as a crisis discipline comparable to conservation biology: poorly understood, accelerating in importance, and lacking the evidentiary base needed for stewardship. We synthesize that literature with the foundational psychology of bias, the philosophy of emergence, and the 2020–2025 cognitive-warfare canon (Claverie & du Cluzel, 2022; Goldstein et al., 2023; Park et al., 2024) to produce an integrated framework usable by both security practitioners and cognitive scientists.

The paper proceeds as follows. Section 2 establishes the theoretical foundations: whether biases are evolutionary adaptations, byproducts, or both, and what emergence means for collective cognition. Section 3 catalogues twenty-nine individual-layer vulnerabilities. Section 4 covers six emergent collective-layer phenomena. Section 5 covers four civilizational/cultural-layer vulnerabilities. Section 6 examines AI-specific amplification mechanisms. Section 7 outlines the cognitive-security policy frontier. Section 8 concludes.


2. Theoretical foundations

2.1 Are cognitive biases adaptations or spandrels?

The debate over the evolutionary status of cognitive biases organizes into three positions. Adaptationists — particularly Gigerenzer’s ecological-rationality program (Gigerenzer, Todd & the ABC Research Group, 1999; Gigerenzer & Brighton, 2009) — hold that heuristics such as Take-the-Best and the recognition heuristic outperform Bayesian models under uncertainty and small samples; what Kahneman and Tversky labeled “biases” are often artifacts of mismatched task framing. Cosmides and Tooby’s (1992) cheater-detection module, demonstrated through reframings of the Wason selection task that lift performance from ~10 % to 65–80 % when conditional rules are presented as social contracts, exemplifies the modular-adaptationist case. Haselton and Nettle’s (2006) Error Management Theory generalizes this: under uncertainty with asymmetric error costs, selection favors decision rules biased toward the cheaper error, producing the “paranoid optimist” — a cognitive architecture of pervasive, functional misperceptions including auditory looming bias and predator-detection asymmetries.

Byproduct theorists trace their lineage to Gould and Lewontin’s (1979) “Spandrels of San Marco” critique of pan-adaptationism. Buss et al. (1998) provide the canonical taxonomy distinguishing adaptations, exaptations, and spandrels with explicit evidentiary standards (special design, complexity, reliable development). Within cognitive science of religion, Boyer (2001) and Atran (2002) treat supernatural and conspiracy cognition as byproducts of evolved theory-of-mind, hyperactive agency detection, and teleological reasoning — mechanisms selected for other functions and co-opted by content that fits their input conditions.

The most productive framework for the present analysis is the evolutionary mismatch hypothesis (Li, van Vugt & Colarelli, 2018, Current Directions in Psychological Science; Giphart & van Vugt, 2018; van Vugt, Colarelli & Li, 2024). Mechanisms calibrated to small-group, face-to-face Pleistocene ecologies misfire when scaled to algorithmic information environments where social signals are detached from reputation, reciprocity, and embodied context. Mercier and Sperber’s (2017) argumentative theory of reasoning sharpens this argument: reasoning evolved for the production and evaluation of arguments in dialogical interaction, not solitary truth-tracking. Confirmation bias, motivated reasoning, and the lazy generation of justifications are therefore adaptive features of an interactionist mechanism that malfunctions when deployed against curated, homophilic feeds.

The mature consensus is pluralist: some biases (fear of snakes, heights, and contamination; sexual overperception; auditory looming asymmetry) satisfy strict adaptationist criteria; some (religious cognition, conspiracy ideation) are best modeled as byproducts; and many of the classical Kahneman–Tversky phenomena are best understood as ancestral adaptations operating in environments they were never selected for. This last category — mismatch — is the analytical center of gravity for the rest of this paper.

2.2 What is “emergence” in collective behavior?

The bridge from individual biases to civilizational pathology runs through the concept of emergence. Following Bedau (1997) and Chalmers (2006), we distinguish weak emergence — macro-states derivable from microdynamics only by simulation, exhibiting irreducibility in practice but not in principle — from strong emergence, which posits in-principle irreducibility and downward causation (vulnerable to Kim’s 1999 causal-exclusion argument). For the empirical phenomena treated here, weak emergence is sufficient and philosophically defensible.

The complex-systems lineage from Anderson’s (1972) “More Is Different” through Holland (1998), Kauffman (1995), and Mitchell (2009) supplies the formal toolkit: phase transitions, self-organized criticality, power-law cascades. Castellano, Fortunato and Loreto’s (2009) Reviews of Modern Physics survey applies this machinery to opinion dynamics, language competition, and crowd behavior. Bikhchandani, Hirshleifer and Welch’s (1992) information-cascade model proves that even fully Bayesian agents observing predecessors’ actions ignore their own private signals after a short prefix, producing fragile and possibly incorrect emergent conformity without any individual irrationality. Granovetter’s (1978) threshold model demonstrates that collective outcomes depend non-monotonically on the distribution of individual thresholds; populations with similar mean preferences can produce starkly different macro-outcomes. Sloman and Fernbach’s (2017) community of knowledge and Hutchins’s (1995) distributed cognition complete the picture: cognition is properly a property of socio-technical systems, and the illusion of explanatory depth (Rozenblit & Keil, 2002) lets individuals confuse community knowledge for personal understanding — a microfoundation for political polarization and resistance to belief revision.
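
Granovetter’s non-monotonicity is easy to exhibit directly. The following minimal Python sketch renders his canonical example (the code itself and the one-agent perturbation are illustrative, not taken from the 1978 paper): two populations with nearly identical threshold distributions reach opposite macro-outcomes.

```python
# Granovetter (1978) threshold model: an agent joins once the number of
# prior joiners meets its threshold. Iterating to a fixed point gives the
# final crowd size.

def cascade(thresholds):
    active = 0
    while True:
        joiners = sum(1 for t in thresholds if t <= active)
        if joiners == active:      # fixed point: nobody else will join
            return active
        active = joiners

# Canonical example: thresholds 0, 1, 2, ..., 99 produce a full cascade.
uniform = list(range(100))
print(cascade(uniform))            # 100: everyone joins

# Nudge a single agent's threshold from 1 to 2; the mean barely moves.
perturbed = [0, 2] + list(range(2, 100))
print(cascade(perturbed))          # 1: the cascade dies after the instigator
```

The one-agent perturbation breaks the chain after the first joiner, which is precisely the sense in which the macro-state cannot be read off summary statistics of the individuals.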

The pipeline is therefore: individual cognitive mechanisms → interaction on algorithmically structured networks → emergent collective pathologies. Galesic et al. (2023, J. Royal Society Interface) extend this to hybrid human–AI collectives, the regime in which the rest of this paper situates itself.


3. Individual-layer vulnerabilities

We catalogue twenty-nine individual-layer biases, providing the standardized definition, the seminal empirical foundation, and one or two contemporary AI-age exploitation archetypes for each. The biases cluster naturally into five functional families: belief-formation biases, social-influence heuristics, attentional and memorial biases, decision-theoretic biases, and self-attribution biases.

3.1 Belief-formation biases

Confirmation bias. Nickerson (1998, Review of General Psychology) defines confirmation bias as the unwitting selectivity by which evidence is sought, weighted, and recalled in alignment with prior belief. Wason’s (1960) 2-4-6 rule-discovery task established the phenomenon empirically: participants generated almost exclusively confirming triples and announced incorrect, overly specific rules. Modern Bayesian re-evaluations (Hahn & Harris, 2014) recast some apparent failures as rational under prior structure, but the core phenomenon survives. Exploitation: Cambridge Analytica’s psychographic micro-targeting (2014–2018) used OCEAN profiles imputed from up to 87 million Facebook users to deliver belief-congruent advertising; Kosinski, Stillwell and Graepel’s (2013, PNAS) demonstration that Facebook Likes predict sexual orientation at 88 % accuracy provided the methodological foundation. The technique is empirically supported (Matz et al., 2017, PNAS) even though Cambridge Analytica’s specific electoral causal impact is contested. Internal Facebook test accounts (Frances Haugen disclosures, 2021) showed algorithmic recommendation pushed a politically conservative profile into QAnon groups within two days.

Illusory truth effect. Hasher, Goldstein and Toppino (1977) showed that truth ratings rise monotonically with statement repetition; Fazio, Brashier, Payne and Marsh (2015, JEP: General) demonstrated this occurs even when participants demonstrably know the correct answer (“knowledge neglect”). The mechanism is fluency-mediated: repeated exposure increases processing ease, which is misattributed to truth. Exploitation: Russia’s “firehose of falsehood” doctrine (Paul & Matthews, 2016, RAND) explicitly weaponizes volume plus repetition across RT, Sputnik, and Telegram, producing illusory truth even for contradictory narratives — documented for MH17 (2014), the Skripal poisoning (2018), and the 2022 Bucha massacre. AI-generated content collapses the marginal cost of repetition toward zero.

Processing fluency / “truthiness.” Reber and Schwarz (1999) showed that merely raising color contrast on statements like “Osorno is in Chile” increases truth ratings, isolating fluency from repetition. McGlone and Tofighbakhsh (2000) found rhyming aphorisms judged more accurate than non-rhyming paraphrases. Exploitation: Kreps, McCain and Brundage (2022, J. Experimental Political Science) found GPT-2-generated news rated nearly as credible as authentic NYT articles; Nightingale and Farid (2022, PNAS) found StyleGAN faces more trustworthy than real ones. LLM-generated propaganda is grammatically and rhetorically polished by construction, raising fluency-based credibility independent of factual content.

Belief bias. Evans, Barston and Pollard (1983, Memory & Cognition) demonstrated that syllogism validity judgments are biased by conclusion believability, with logic × belief interactions stronger for invalid syllogisms. Exploitation: Pennycook and Rand (2019, Cognition) and Pereira, Harris and Van Bavel (2023) document partisan acceptance of logically weak claims with congenial conclusions. RLHF-trained chatbot sycophancy (Sharma et al., 2024) reinforces this loop: LLMs tailor reasoning to user-believable conclusions, supplying motivated reasoners with technically polished “logic” for prejudgments.

Motivated reasoning. Kunda’s (1990, Psychological Bulletin) review established that directional motives selectively activate beliefs and inferential strategies likely to yield desired conclusions, constrained by the need to construct apparently reasonable justifications. Exploitation: Lewandowsky, Ecker and Cook (2017) document organized actors (Heartland Institute, anti-vax influencers) supplying technical-sounding justifications that motivated audiences uncritically adopt. Bakshy, Messing and Adamic (2015, Science) found Facebook’s News Feed algorithm and individual choice jointly produce ideologically congenial diets — fuel for motivated reasoning.

Cognitive dissonance reduction. Festinger’s (1957) theory holds that inconsistent cognitions generate aversive arousal motivating belief change; Festinger and Carlsmith (1959) showed participants paid only $1 to lie about a boring task internalized the lie more than those paid $20. Festinger’s ethnography of UFO doomsday cults (When Prophecy Fails, 1956) showed disconfirmed believers intensified proselytizing. Exploitation: QAnon’s repeated prophecy failures (March 2018, January 6, 2021) produced deeper commitment among adherents (Bloom & Moskalenko, 2021); pig-butchering crypto scams (FBI IC3 2023: $4.57 billion in losses) extend victim engagement by escalating prior public commitment.

3.2 Social-influence heuristics

Social proof / bandwagon effect. Asch’s (1951, 1956) line-judgment experiments produced 36.8 % conformity to obviously incorrect majorities; Cialdini’s (1984) Influence operationalized social proof as one of six compliance principles. Exploitation: the Internet Research Agency (2014–2018) used coordinated personas — @TEN_GOP (150 000 followers), Blacktivist (1.2 million followers, 11.2 million Facebook engagements) — seeded by botnets to manufacture grassroots consensus illusions (DiResta et al., 2018; Senate Select Committee on Intelligence, 2019). Visible like counts, view counters, and “trending” labels function as Cialdini-style social-proof cues at platform scale.

Authority bias. Milgram’s (1963, 1974) obedience studies recorded 65 % of participants administering apparent 450-volt shocks under Yale-affiliated authority; Bickman (1974) extended this to uniformed strangers; Hofling et al. (1966) showed 21 of 22 nurses prepared to administer unauthorized overdoses on phone instruction from a “doctor.” Exploitation: the Arup deepfake CFO fraud (January 2024, disclosed May 2024) — HK$200 million across 15 transactions after a multi-person video conference reconstructing the CFO and colleagues from public footage — represents the highest-value documented case. Hong Kong’s SFC flagged “Quantum AI” using deepfaked Elon Musk videos; LLM-generated “expert” accounts with fabricated PhD credentials launder disinformation in election cycles (Stanford Internet Observatory, Graphika).

Halo effect. Thorndike (1920) found commanding officers’ ratings of aviation cadets on logically independent traits implausibly correlated; Nisbett and Wilson (1977) showed identical lecturer features rated appealing or irritating depending on whether the lecturer behaved warmly or coldly, with subjects unaware of the influence. Exploitation: the Hong Kong Arup scam, 2024 deepfake Tesla giveaway livestreams, and Edelson et al.’s (2023) finding that X/Twitter Blue check status raises perceived credibility independent of content all exemplify halo prosthetics — synthetic or purchased authority cues that bleed into specific judgments.

Mere-exposure effect. Zajonc’s (1968) monograph showed monotonic liking increases for repeated nonsense words and ideographs; Kunst-Wilson and Zajonc (1980) extended this to subliminal (1 ms) presentations. Bornstein’s (1989) meta-analysis (208 experiments) gave r = 0.26. Exploitation: OpenAI’s May 2024 Threat Intelligence Report documented coordinated networks (Doppelganger, Spamouflage) flooding Telegram, X, and Reddit with thousands of LLM-generated repeats of identical talking points — exposure-saturation tuned for normalization rather than persuasion. Munger and Phillips (2022) describe analogous “supply-side radicalization” via TikTok hashtag and slogan repetition.

In-group / out-group bias (Social Identity Theory). Tajfel, Billig, Bundy and Flament’s (1971) minimal group paradigm assigned 64 schoolboys to “Klee” vs. “Kandinsky” groups on trivial criteria; participants allocated more to in-group members and sacrificed absolute in-group gain to maximize differential advantage. Tajfel and Turner (1979) formalized SIT. Exploitation: 30 of 81 IRA Facebook pages targeted African Americans and 25 targeted right-wing audiences; the IRA staged simultaneous pro- and anti-Muslim rallies in Houston (May 2016) and pro- and anti-Trump rallies in New York to physically embody intergroup conflict. The UN Independent International Fact-Finding Mission (2018) found Facebook played a “determining role” in incitement against Rohingya Muslims in Myanmar (2017). Rathje, Van Bavel and van der Linden (2021, PNAS) found posts about the political out-group are shared roughly twice as often as in-group posts on Facebook and Twitter.

3.3 Attentional and memorial biases

Affect heuristic / emotional reactivity. Slovic, Finucane, Peters and MacGregor (2007) and Finucane et al. (2000) demonstrated that judgments of risk and benefit derive from a fast affective evaluation, producing a characteristic inverse correlation between perceived risk and benefit that strengthens under time pressure. Exploitation: Brady, Wills, Jost, Tucker and Van Bavel (2017, PNAS) found each moral-emotional word in a tweet increased retweets by ~20 %; the 2025 large-scale replication (Brady, Rathje, Globig & Van Bavel, PNAS Nexus, N = 849 266) confirmed an IRR ≈ 1.17. Vosoughi, Roy and Aral (2018, Science) found false news spread six times faster than truth on Twitter, attributable in part to greater fear, disgust, and surprise responses.
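
Because the replication reports the effect as an incidence-rate ratio, it compounds multiplicatively per word. The arithmetic below is an illustrative reading of the estimate, not a figure from the papers:

```latex
% Reading IRR = 1.17: each additional moral-emotional word multiplies the
% expected retweet count, holding the baseline rate \mu_0 fixed.
\mathbb{E}[\text{retweets} \mid k \text{ words}] = \mu_0 \times 1.17^{k},
\qquad 1.17^{3} \approx 1.60
```

Three such words thus predict roughly 60 % more retweets than none.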

Variable-reward / dopamine seeking. Skinner (1953) demonstrated variable-ratio reinforcement produces the highest, most extinction-resistant response rates; Schultz, Dayan and Montague (1997, Science) localized reward-prediction-error coding to midbrain dopamine neurons; Schüll’s (2012) ethnography Addiction by Design documented the Las Vegas slot-machine “machine zone” template later imported into UX through Eyal’s (2014) Hooked. Exploitation: TikTok’s “For You” page operates a multi-armed bandit balancing exploitation and exploration; documents disclosed in a 2024 Kentucky lawsuit revealed an internal habit-formation threshold of ~260 videos and acknowledged that screen-time tools have “negligible impact.” Average daily user time is approximately 95 minutes.
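
The bandit framing can be made concrete. The toy epsilon-greedy recommender below is a hedged sketch of the general technique, not TikTok’s actual (undisclosed) system; the category names, the 0.2 exploration rate, and watch time as the reward signal are assumptions for illustration.

```python
import random

# Epsilon-greedy bandit over content categories: watch time is the
# engagement reward; a fixed fraction of picks explore other arms.
EPSILON = 0.2  # illustrative exploration share

class EngagementBandit:
    def __init__(self, categories):
        self.counts = {c: 0 for c in categories}
        self.means = {c: 0.0 for c in categories}

    def pick(self):
        if random.random() < EPSILON:                # explore: serve novelty
            return random.choice(list(self.means))
        return max(self.means, key=self.means.get)   # exploit: stickiest arm

    def update(self, category, watch_seconds):
        self.counts[category] += 1
        # incremental mean of the observed engagement proxy
        self.means[category] += (watch_seconds - self.means[category]) / self.counts[category]

feed = EngagementBandit(["dance", "news", "outrage", "pets"])
for _ in range(1000):
    c = feed.pick()
    mean_watch = {"dance": 20, "news": 10, "outrage": 45, "pets": 25}[c]
    feed.update(c, mean_watch + random.gauss(0, 5))
print(max(feed.means, key=feed.means.get))           # converges on "outrage"
```

The user faces the mirror image of this loop: unpredictable payoffs delivered on a variable-ratio schedule, the pattern Skinner showed to be maximally extinction-resistant.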

Novelty bias. Berlyne’s (1960) collative-properties framework and Bunzeck and Düzel’s (2006, Neuron) fMRI evidence for substantia nigra/VTA activation to absolute novelty establish the phenomenon. Exploitation: Vosoughi, Roy and Aral (2018) explicitly tested the novelty hypothesis: false tweets had higher information-theoretic novelty (KL divergence) than true tweets, and this differential statistically explained the 70 % higher retweet probability. Recommender systems inject ~20 % exploration content to mimic dopaminergic novelty.
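
A minimal version of the novelty measure can be written down directly. The sketch below uses word distributions with Laplace smoothing; the original study computed KL divergence over topic-model distributions of a user’s prior exposure, so the tokenization and example strings here are simplifying assumptions.

```python
import math
from collections import Counter

def distribution(tokens, vocab, alpha=1.0):
    # Laplace-smoothed word distribution over a shared vocabulary
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    # D_KL(p || q): how surprising p is relative to background q
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

feed = "markets calm today central bank holds rates steady".split()
tweet = "secret lab leak cover up shocking new evidence".split()
vocab = set(feed) | set(tweet)

novelty = kl_divergence(distribution(tweet, vocab), distribution(feed, vocab))
print(f"novelty (nats): {novelty:.3f}")  # higher = more novel than the feed
```

In Vosoughi et al.’s data, false tweets scored systematically higher on exactly this kind of divergence measure.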

Inattentional and change blindness. Simons and Chabris (1999) found ~50 % of observers counting basketball passes failed to notice a person in a gorilla suit walking through the scene for 9 seconds. Drew, Võ and Wolfe (2013) replicated this with expert radiologists missing a gorilla artifact on lung CT scans 83 % of the time. Exploitation: Mathur et al. (2019, ACM CSCW) catalogued 1 818 dark-pattern instances across 11 000 shopping sites — pre-checked GDPR/cookie boxes, low-contrast “Reject All” buttons, hidden disclosures all exploit attentional narrowing. Subtle deepfake artifacts (lip-sync drift, ear/jewelry asymmetries in the 2022 Zelensky video) escape notice when attention is captured by message content.

Negativity bias. Rozin and Royzman (2001) decomposed it into negative potency, steeper gradients, dominance, and differentiation; Baumeister, Bratslavsky, Finkenauer and Vohs (2001), “Bad is stronger than good,” remains canonical with > 10 000 citations. Exploitation: internal Facebook documents (October 2021, Washington Post) revealed that “angry” reactions were weighted at five times the value of a “like” in the engagement-prediction model from 2017 to 2019, systematically amplifying outrage-inducing content; internal researchers warned this drove “misinformation, toxicity, and violent content.”

Recency bias. Murdock (1962) established the U-shaped serial-position curve; Glanzer and Cunitz (1966) dissociated recency from primacy via distractor-task interference. Exploitation: the Macron Leaks dump released 48 hours before the 2017 French vote, during the legal media-silence period (Vilmer, IRSEM 2018); the Slovak Šimečka deepfake audio released two days before the September 2023 election. Coordinated last-hour drops weaponize the impossibility of correction within the recency window.

Availability heuristic. Tversky and Kahneman (1973, Cognitive Psychology) showed that participants given lists with 19 famous women among 20 less-famous men judged the famous gender more numerous, regardless of actual frequency. Exploitation: the IRA flooded Facebook and Instagram (>66 % race-tagged content per DiResta et al., 2019) with vivid, emotionally graphic posts about police violence and immigrant crime, inflating perceived prevalence. Brady et al. (2021) document that moral-emotional content propagates further, distorting perceived issue prevalence.

Hindsight bias. Fischhoff (1975, JEP: HPP) found participants told one of four 1814 Gurkha–British outcomes assigned higher prior probabilities to whichever outcome they were told had occurred (“creeping determinism”); meta-analysis (Guilbault et al., 2004) gave d = 0.39 across 95 studies. Exploitation: AI-generated “explainer” videos retro-fitting catastrophes (2023 Maui fires, Ohio derailment) as “predictable” exploit hindsight to seed conspiracy interpretations (NewsGuard, 2024). Financial-influencer “I told you Bitcoin would crash” content monetizes retrospective construction.

3.4 Decision-theoretic biases

Cognitive miserliness and need for cognitive closure. Fiske and Taylor (1984) framed humans as economical processors; Webster and Kruglanski’s (1994) Need for Closure Scale and Kruglanski and Webster’s (1996) “seizing and freezing” framework formalized motivated aversion to ambiguity. Exploitation: Gabielkov et al. (2016, ACM SIGMETRICS) found 59 % of links shared on Twitter are never clicked — users share on headlines alone. The Institute for Strategic Dialogue tracked a 175 % increase in QAnon Facebook posts between March and June 2020; AI chatbots’ fluent, decisive answers satisfy closure motivation while discouraging verification (Bender et al., 2021).

Anchoring effect. Tversky and Kahneman’s (1974) wheel-of-fortune study produced UN-membership estimates of 25 % vs. 45 % depending on a rigged anchor of 10 vs. 65; Englich, Mussweiler and Strack (2006) extended this to judicial sentencing anchored by dice rolls. Exploitation: e-commerce dynamic pricing displays struck-through “original prices” to inflate discount perception (UK CMA 2023 dark-patterns report); AI-generated first offers in negotiation chatbots (Klarna, eBay AI assistants) anchor consumer counter-offers upward.

Framing effect. Tversky and Kahneman’s (1981) Asian Disease Problem produced a dramatic preference reversal between gain framing (72 % chose certain Program A) and loss framing (78 % chose risky Program D) for logically equivalent options. Exploitation: Cambridge Analytica leaked materials (UK DCMS Committee, 2019) describe ad creatives tailored by OCEAN profile, with neuroticism-high voters receiving loss-framed messaging (“they will take your guns”). The 2024 RNC AI ad envisioning a “Biden-won” dystopia exploits loss frames at the imagistic level (Brookings, 2024).

Loss aversion (prospect theory). Kahneman and Tversky (1979, Econometrica) established that losses loom larger than equivalent gains; the canonical loss-aversion coefficient, λ ≈ 2.25, comes from the cumulative-prospect-theory estimation in Tversky and Kahneman (1992). Ruggeri et al. (2020, Nature Human Behaviour) replicated the pattern cross-culturally across 19 countries. Exploitation: Amazon’s “Iliad Flow” Prime cancellation maze (FTC complaint, 2023) and streaming subscription dark patterns invoke endowment-framed loss aversion; ransomware notes invoke escalating losses (“data deletion in 24 hours”) to push payment; IRA “Heart of Texas” and “Secured Borders” pages framed immigration as imminent loss of identity and safety.
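
The asymmetry is usually written as a piecewise value function; the parameter estimates below are from the 1992 paper, and the worked coin-flip number is illustrative arithmetic.

```latex
% Tversky & Kahneman (1992) value function: \lambda is the loss-aversion
% coefficient, \alpha = \beta = 0.88 the estimated curvature.
v(x) =
  \begin{cases}
    x^{\alpha}             & x \ge 0 \\
    -\lambda\,(-x)^{\beta} & x < 0
  \end{cases},
\qquad \alpha = \beta = 0.88,\ \lambda = 2.25
```

A fair coin flip over ±$100 then has prospect value 0.5 · 100^0.88 − 0.5 · 2.25 · 100^0.88 ≈ −36 < 0 and is declined despite zero expected monetary loss; this is the asymmetry that ransomware deadlines and cancellation mazes lean on.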

Sunk-cost fallacy. Arkes and Blumer’s (1985) Ohio University season-ticket field study showed full-price purchasers attended significantly more performances over six months. Exploitation: free-to-play games (Genshin Impact, Candy Crush) and gacha mechanics leverage time- and money-sunk progression to drive in-app purchases (Zendle & Cairns, 2018, PLoS ONE); pig-butchering scams escalate “investment” requests, exploiting commitment to prior outlays.

Illusion of control. Langer’s (1975) lottery-ticket study found participants who chose their ticket demanded a mean resale price of $8.67 vs. $1.96 for those assigned tickets. Exploitation: Robinhood’s confetti animations (since removed under regulatory pressure), customizable watchlists, and stop-loss choice architecture induce illusion of control in retail traders, contributing to GameStop/AMC 2021 losses (Massachusetts Securities Division complaint, 2020). Loot boxes’ “near-misses” generate illusion of control, criminalized as gambling in Belgium (2018).

Overconfidence / Dunning–Kruger. Kruger and Dunning (1999) found bottom-quartile (12th percentile) performers on humor, logic, and grammar tests estimated themselves at the 62nd percentile; the metacognitive deficit producing poor performance also prevents accurate self-evaluation. Exploitation: Motta et al. (2018, Social Science & Medicine) found low-knowledge individuals expressed highest confidence in fringe anti-vaccine positions. Stack Overflow banned ChatGPT answers in December 2022 because low-skill users uncritically submitted confidently incorrect AI outputs in domains they could not evaluate.

Optimism bias. Weinstein (1980) found students judged their own chances of experiencing positive life events as above average and negative events as below average relative to peers, across 42 events in total; Sharot et al. (2011, Nature Neuroscience) demonstrated asymmetric belief updating — people incorporate good news more than bad. Exploitation: the Verizon DBIR 2024 reports 68 % of breaches involve a human element; FBI IC3 2023 reports $1.14 billion in romance-scam losses. Optimism bias drove crypto FOMO into the 2022 Terra/Luna collapse (~$40 billion retail losses).

Cognitive fatigue, ego depletion, and System 1/2. Kahneman’s (2011) dual-process framework remains widely supported (Pennycook et al., 2018), but the specific ego-depletion construct is among the most replication-fragile in psychology: the Hagger et al. (2016) Registered Replication Report (23 labs, N = 2 141) and Vohs et al. (2021) RRR (36 labs, N = 3 531; d ≈ 0.06) failed to detect the original effect. We retain dual-process theory and use “cognitive fatigue” loosely. Exploitation: infinite-scroll platforms engineer extended sessions during which deliberative System 2 disengages; MFA-bombing fatigue attacks (the 2022 Uber breach by Lapsus$) deliberately exhaust users with repeated push prompts until an approval click occurs.

3.5 Self-attribution biases

Egocentric bias. Ross and Sicoly (1979) found married couples’ self-claimed responsibility for 20 household activities summed to over 100 %, traceable to differential self-availability in memory. Exploitation: Character.AI and Replika companion chatbots generate affirmation tailored to user framings, amplifying egocentric attribution; Sharma et al. (2024) demonstrate sycophancy across major LLMs. The Salvi et al. (2025) finding that personalization raises GPT-4’s odds of persuading debate opponents by 81.2 % exploits exactly the egocentric tendency to credit congenial framings.

Fundamental attribution error. Jones and Harris (1967) found Duke undergraduates attributed pro- or anti-Castro attitudes to essay writers consistent with the essay even when explicitly told the position had been assigned with no choice. Exploitation: the 2022 Zelensky surrender deepfake and the July 2024 Olena Zelenska “Bugatti” deepfake exploit FAE — viewers infer disposition (cowardice, corruption) from depicted behavior despite known situational fakery, producing the liar’s dividend (Chesney & Citron, 2019, California Law Review). The January 2024 Biden New Hampshire robocall (FCC ruling, February 2024) similarly weaponizes correspondence bias.


4. Collective-layer (emergent) vulnerabilities

At the collective layer, cognitive vulnerabilities take the form of emergent group-level phenomena in which the macro-state cannot be read off individual states even when each agent is fully characterized. Six are foundational.

Pluralistic ignorance. Coined by Allport and Katz (1931), formalized by Prentice and Miller (1993, JPSP), pluralistic ignorance is the second-order misperception in which a majority privately reject a norm but publicly conform, each falsely believing others endorse it. The Princeton studies found students were privately less comfortable with campus drinking than they perceived peers to be; over a semester, male students shifted private attitudes toward the illusory liberal norm. Exploitation: authoritarian regimes (Kuran’s 1995 preference falsification for North Korea) exploit visible compliance to make dissent feel uniquely deviant. Bicchieri’s (2016, 2017) work on norm change documents how visible engagement on platforms generates misperceived consensus around practices most participants privately reject; viral pile-ons infer broad approval from visible likes and shares, suppressing dissent.

Bystander effect / diffusion of responsibility. Latané and Darley’s (1968, JPSP) smoke-filled-room study found that 75 % of participants tested alone reported the smoke, versus 38 % in groups of three and only 10 % of those paired with two passive confederates. The Fischer et al. (2011) meta-analysis of 105 studies confirmed the effect across over 7 700 participants. Exploitation: the 2017 Facebook Live murder of Robert Godwin Sr., the 2017 Chicago group sexual assault livestream, and the 2019 Christchurch mosque attack (viewed live by ~200, re-shared 1.5 million times in 24 hours) reproduce the dynamic at platform scale. Users defer reporting, assuming the algorithm or others will act; platforms diffuse responsibility to users, regulators, and third-party moderators.

Authority bias (covered in Section 3.2; it also operates as a collective dynamic when authority cues coordinate group compliance).

Spiral of silence. Noelle-Neumann (1974, 1984/1993) developed the theory from German federal-election panel data showing last-minute swings of 3–4 % toward the perceived winner despite stable preference-intent. The mechanisms are a “quasi-statistical sense” of the climate of opinion, fear of isolation, and a “hard core” of resisters. Exploitation: Hampton, Rainie, Lu, Dwyer, Shin and Purcell’s (2014) Pew study (N = 1 801) on the Snowden revelations found 86 % willing to discuss in person but only 42 % willing to post on Facebook/Twitter; users who perceived disagreement in their networks were dramatically less willing to share, and online silencing spilled over into offline settings — contradicting early claims that social media would democratize minority voice. The Chinese Social Credit System operationalizes the spiral by making the costs of dissent legible and quantifiable.

Groupthink and herd mentality. Janis (1972, 1982) defined groupthink as a mode in which striving for unanimity overrides realistic appraisal, manifesting in eight symptoms including illusion of invulnerability, collective rationalization, self-censorship, and self-appointed mindguards; the Bay of Pigs is paradigmatic. Exploitation: filter bubbles and echo chambers (Sunstein, 2017; Pariser, 2011) — though their empirical strength is debated (Bruns, 2019; Guess et al., 2018) — produce enclave deliberation satisfying several Janis symptoms. QAnon’s evolution (2017–2021) is a textbook fiasco: an initially loose 4chan/8kun community developed in-group invulnerability (“the Storm”), demonized out-groups (the “cabal”), generated mindguards (channel moderators removing skeptics), and culminated in the January 6, 2021 Capitol breach.

False consensus effect. Ross, Greene and House (1977, JESP) had Stanford undergraduates asked whether they would walk around campus wearing an “EAT AT JOE’S” sandwich-board: those who consented estimated 62 % of peers would also consent; those who refused estimated 67 % would refuse — and both groups rated opposite choosers as more diagnostic of personality. Exploitation: Bail et al. (2018, PNAS) and the Duke Polarization Lab document Twitter’s skewed activity distribution (~10 % of users produce ~80 % of political content), inflating estimates of in-party agreement and out-party disagreement (false polarization). Ahler and Sood (2018) showed both major US parties grossly overestimate stereotypical traits among the opposing party.

These six phenomena instantiate the weak-emergence pipeline of Section 2.2: individual heuristics (conformity, fear of isolation, cohesion-seeking) interacting on engineered networks (visible engagement signals, recommendation algorithms, follower-count topologies) generate qualitatively novel macro-states (cascades, polarization phases, enclave deliberation).


5. Cultural-layer vulnerabilities

The cultural layer operates over generational and civilizational timescales, embedding individual and emergent biases in institutional architectures. Four phenomena anchor the layer.

Just-world hypothesis. Lerner’s (1980) The Belief in a Just World: A Fundamental Delusion synthesized two decades of work originating in Lerner and Simmons (1966), where observers rated a confederate apparently receiving electric shocks less favorably the more she suffered and the less they could help. Hafer and Bègue’s (2005, Psychological Bulletin) review confirmed core mechanisms; Lipkus, Dalbert and Siegler (1996) showed BJW-others (not BJW-self) more strongly predicts victim-blaming. Exploitation: Russian state-media coverage of MH17, the Skripal poisoning, and the 2022 Bucha massacre repeatedly invoked just-world framings (“Ukraine staged it,” “the victims were combatants”) to align audiences’ moral instincts by repositioning victims as deserving (Pomerantsev, 2019). Sandy Hook “crisis actor” claims and Pizzagate similarly preserve the conviction that misfortune cannot strike the virtuous (Douglas, Sutton & Cichocka, 2017).

Hypernormalization. Yurchak’s (2005) Everything Was Forever, Until It Was No More (Princeton UP; winner of the Wayne Vucinich Prize) argues that the late Soviet system collapsed in a way “simultaneously completely unexpected and completely unsurprising” because its participants had long inhabited a hypernormalized reality — official discourse so formulaic that everyone recognized failure but could not imagine alternatives. The performative dimension hypertrophied while the constative dimension was unanchored. Exploitation: Pomerantsev’s (2014) Nothing Is True and Everything Is Possible documents Vladislav Surkov’s simultaneous funding of liberal NGOs and ultranationalist movements — owning all sides of every argument so that no independent political reality could form. Curtis’s 2016 BBC documentary HyperNormalisation extended the concept (with interpretive liberties Yurchak has noted) to contemporary Western politics; generative-AI synthetic content accelerates the constative–performative decoupling Yurchak diagnosed, producing what Bender et al. (2021) called “stochastic parrots” at scale.

Path dependency and status quo bias. David’s (1985, AER) “Clio and the Economics of QWERTY” showed how a keyboard layout designed to prevent mechanical typebar jamming survived more than a century after that constraint became irrelevant. Arthur’s (1989, Economic Journal) increasing-returns formalism, Samuelson and Zeckhauser’s (1988) status quo bias, and Kahneman, Knetsch and Thaler’s (1990, 1991) endowment-effect mug experiments (WTA exceeding WTP by 2–3×) provide micro- and macro-foundations. Exploitation: network effects, data-portability barriers, and switching costs entrench Google Search, Facebook/Meta, and YouTube/TikTok despite documented harms (cf. EU Digital Markets Act 2022 framing of “gatekeeper” platforms). Engagement-ranking algorithms became defaults for a generation of attention infrastructure even after the Frances Haugen disclosures (2021) demonstrated they amplify divisive content.
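
Arthur’s lock-in mechanism is reproducible in a few lines. The simulation below is an illustrative sketch of increasing-returns adoption (the platform labels, noise model, and parameter values are invented): two platforms of equal intrinsic quality, with payoffs rising in installed base, yield near-total lock-in whose winner depends only on early noise.

```python
import random

# Increasing-returns adoption in the spirit of Arthur (1989): each new
# adopter picks the platform with the higher momentary payoff, and payoff
# grows with the installed base.

def adoption_run(steps=10_000, quality=1.0, network_weight=0.01):
    installed = {"A": 0, "B": 0}
    for _ in range(steps):
        payoff = {p: quality + network_weight * installed[p] + random.gauss(0, 1)
                  for p in installed}
        installed[max(payoff, key=payoff.get)] += 1
    return installed

for seed in range(3):
    random.seed(seed)
    print(adoption_run())  # different early noise, different locked-in winner
```

Once one platform’s installed-base advantage exceeds the noise scale, switching never pays for any individual adopter, which is the micro-mechanism behind the gatekeeper entrenchment the Digital Markets Act targets.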

Goodhart’s Law and map–territory confusion. Goodhart (1975) observed that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes”; Strathern’s (1997) reformulation — when a measure becomes a target, it ceases to be a good measure — is now canonical. Korzybski’s (1933) Science and Sanity founded general semantics on the principle that “the map is not the territory”; Bateson’s (1972) cybernetic extension and the AI-alignment literature (Manheim & Garrabrant, 2018; Skalse et al., 2022, NeurIPS; Krakovna et al., 2020) treat Goodhart as the standard formal frame for proxy-target divergence. Exploitation: the Frances Haugen disclosures (October 2021) showed Facebook’s engagement-based ranking — optimizing likes, comments, shares — systematically amplified divisive, hateful, and misleading content. The proxy (engagement) had been optimized in place of the target (informational health) and ended up inverting it: a textbook Strathern collapse. In LLMs, the same pathology manifests as sycophancy and reward-model exploitation (Sharma et al., 2024; Denison et al., 2024); the OpenAI CoastRunners boat-racing agent (Amodei & Clark, 2016) drove in circles collecting bonus points instead of finishing the race. The Korzybskian framing — the reward function is a map of human values, never the territory — is now standard in AI alignment (Russell, 2019; Christian, 2020).
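
A Strathern collapse can be shown in miniature. The ranking sketch below is a toy illustration (the post names, weights, and scores are invented): engagement stands in as a proxy for informational quality, outrage inflates the proxy without raising quality, and optimizing the proxy inverts the intended ordering.

```python
# Engagement as a Goodhart proxy: the weighting rewards outrage more than
# quality (cf. the 5x "angry" weighting above), so proxy-optimal ranking
# inverts the quality ranking the metric was meant to track.

posts = [
    # (id, informational quality, outrage level) -- invented demo scores
    ("explainer",      0.9, 0.1),
    ("nuanced-report", 0.8, 0.2),
    ("rage-bait",      0.2, 0.9),
    ("conspiracy",     0.1, 1.0),
]

def engagement(quality, outrage):
    return 1.0 * quality + 3.0 * outrage

by_proxy   = sorted(posts, key=lambda p: engagement(p[1], p[2]), reverse=True)
by_quality = sorted(posts, key=lambda p: p[1], reverse=True)

print([p[0] for p in by_proxy])    # conspiracy and rage-bait float to the top
print([p[0] for p in by_quality])  # the ordering the metric was meant to proxy
```

No individual component is malicious; the inversion is a property of the proxy choice itself.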

These cultural-layer vulnerabilities matter because they scaffold the entire information environment in which individual and emergent biases operate. Path dependency locks in engagement-optimized architectures; Goodhart’s Law guarantees those architectures will diverge from human flourishing; hypernormalization saps the collective capacity to imagine alternatives; just-world reasoning blames victims for the resulting harms.


6. AI-specific amplification mechanisms

Four mechanisms convert latent cognitive vulnerabilities into operational exploits at unprecedented scale.

Engagement-optimized recommender systems instantiate Goodhart’s Law in production. Brady et al. (2017) and the 2025 large-scale replication confirmed that moral-emotional and out-group language drive disproportionate diffusion (Rathje, Van Bavel & van der Linden, 2021); Cinelli et al. (2021, PNAS) found clear homophilic clustering on algorithmically curated feeds (Facebook, Twitter) but not on Reddit. The empirical picture is nuanced: Hosseinmardi et al. (2021, 2024, PNAS) and Chen et al. (2023, PNAS Nexus) find limited causal effect of YouTube’s recommender algorithm on radicalization, with subscriptions and external links accounting for most extreme-content exposure. Yet Amnesty International’s (2023, 2025) sock-puppet audits of TikTok’s For You page found that within hours of platform use, ~50 % of recommendations to simulated 13-year-old accounts were mental-health-related and potentially harmful — roughly ten times the volume served to controls. The aggregate finding: the algorithm is not the sole driver of radicalization but is a powerful selector for vulnerable subpopulations and for moral-emotional content.

Generative AI persuasion is the most decisive recent development. Salvi, Horta Ribeiro, Gallotti and West (2025, Nature Human Behaviour; preregistered, N = 900) found GPT-4 with minimal sociodemographic personalization (gender, age, ethnicity, education, employment, political affiliation) was more persuasive than human debaters 64.4 % of the time, with an 81.2 % relative increase in odds of post-debate agreement (95 % CI [+26.0 %, +160.7 %], p < 0.01). Without personalization, GPT-4 ≈ humans. Spitale, Biller-Andorno and Germani (2023, Science Advances) found GPT-3 produced both clearer accurate information and more compelling disinformation than humans; Goldstein et al. (2024, PNAS Nexus) found AI-generated propaganda judged equally persuasive as human-written. Hackenburg and Margetts (2024, PNAS) qualify this picture, finding small and inconsistent gains from microtargeting in the LLM era — but the lower bound is an attacker capability that scales linearly with compute. Costello, Pennycook and Rand (2024, Science) show the dual-use property: GPT-4 personalized rebuttals reduced participants’ conspiracy belief by ~20 %, persisting at two-month follow-up.
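
To read the headline number correctly: an 81.2 % relative increase in odds is not an 81.2-point increase in probability. Taking a baseline agreement probability of 0.30, an illustrative assumption rather than a figure from the study:

```latex
% Converting a relative odds increase into probabilities
o_0 = \frac{p_0}{1 - p_0} = \frac{0.30}{0.70} \approx 0.429, \qquad
o_1 = 1.812 \times o_0 \approx 0.777, \qquad
p_1 = \frac{o_1}{1 + o_1} \approx 0.437
```

Agreement rises from 30 % to roughly 44 %, a large effect for a single short text debate.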

Synthetic media crossed the operational threshold in 2022–2024. Chesney and Citron (2019, California Law Review) anticipated the harms; Vaccari and Chadwick (2020) showed deepfakes primarily generate uncertainty rather than direct deception, which itself reduces trust in news. Documented incidents now include the March 2022 Zelensky surrender deepfake; the September 2023 Slovak Šimečka audio (during the legal silence period before the Fico victory; HKS Misinformation Review, 2024 cautions against simple causal attribution); the January 2024 Biden New Hampshire robocall (Steve Kramer indicted on 26 counts; FCC ruling AI-voice robocalls illegal; Lingo Telecom $1 million civil penalty); the January 2024 Arup HK$200 million CFO-deepfake fraud; and the January 2024 non-consensual Taylor Swift deepfakes (one X post viewed > 47 million times). DARPA’s Semantic Forensics program (2020 – present) and the Coalition for Content Provenance and Authenticity (C2PA) anchor the detection arms race.

Personalized targeting combines Kosinski et al.’s (2013) demonstration that Facebook Likes predict sexual orientation at 88 % accuracy with Matz et al.’s (2017) field experiments showing personality-matched ads produce up to 40 % more clicks and 50 % more purchases. Eckles, Gordon and Johnson (2018) contest internal validity; Hackenburg and Margetts (2024) find modest LLM-era effects. The regulatory response is consequential: GDPR Article 22 governs automated decision-making and profiling; the EU Digital Services Act (in force February 2024) bans targeted advertising to minors and using special-category data, requires VLOP risk assessments (Articles 34–35), and grants researcher data access (Article 40); the EU AI Act (Regulation 2024/1689) prohibits manipulative AI exploiting vulnerabilities (Article 5) and mandates deepfake disclosure (Article 50, in force August 2026).
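
The targeting pipeline itself is methodologically simple, which is part of the threat model. The sketch below is a schematic stand-in for the Kosinski et al. (2013) approach (dimensionality reduction on the user–Like matrix followed by logistic regression); the synthetic data, dimensions, and planted latent trait are assumptions for the demo.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_users, n_likes = 600, 2000
latent = rng.standard_normal(n_users)            # hidden trait driving some Likes

# Sparse 0/1 Like matrix; the first 100 "diagnostic" Likes are more
# probable for users high on the latent trait.
likes = (rng.random((n_users, n_likes)) < 0.02).astype(float)
diagnostic = rng.random((n_users, 100)) < (0.10 * (latent[:, None] > 0))
likes[:, :100] = np.maximum(likes[:, :100], diagnostic)
trait = (latent > 0).astype(int)

model = make_pipeline(TruncatedSVD(n_components=40, random_state=0),
                      LogisticRegression(max_iter=1000))
model.fit(likes[:500], trait[:500])
print("held-out accuracy:", round(model.score(likes[500:], trait[500:]), 2))
```

Everything in the pipeline is commodity tooling; the scarce input is the behavioral data, which is why the GDPR/DSA provisions above target data access rather than the algorithms.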


7. Cognitive security as a domain

The 2020–2025 cognitive-warfare literature crystallizes around three claims. First, cognition is a sixth operational domain alongside land, sea, air, space, and cyber (Claverie & du Cluzel, 2022, NATO STO; Bernal et al., 2020, NATO/JHU APL). Second, stewardship of collective behavior is a crisis discipline (Bak-Coleman et al., 2021, PNAS) requiring evidence-based intervention comparable to conservation biology. Third, defense is possible but asymmetrically expensive.

Defensive interventions cluster in three families.

Inoculation / prebunking. Roozenbeek and van der Linden’s Bad News game and the Roozenbeek, van der Linden, Goldberg, Rathje and Lewandowsky (2022, Science Advances) YouTube field experiment (5.4 million impressions) significantly improved manipulation detection (d ≈ 0.16 for technique recognition; d ≈ 0.20–0.40 in lab). Effects decay over weeks without boosters, suggesting periodic re-inoculation analogous to seasonal vaccination. The DEPICT framework (Discrediting, Emotion, Polarization, Impersonation, Conspiracy, Trolling) provides a teachable taxonomy.

Platform architecture. Pennycook, Epstein, Mosleh, Arechar, Eckles and Rand (2021, Nature) showed simple accuracy nudges reduced low-quality sharing in a Twitter field experiment; the Lewandowsky, Ecker and Cook (2017) concept of technocognition calls for joint design of platforms and cognition-aware interventions. Engagement-ranking can be replaced or complemented by quality- or diversity-weighted ranking.

Regulatory regimes. The EU DSA, AI Act, and Digital Markets Act constitute the world’s most comprehensive cognitive-security infrastructure; the UK Online Safety Act (2023) and Australia’s Basic Online Safety Expectations (administered by the eSafety Commissioner) provide complementary national frameworks; the US Surgeon General’s May 2023 advisory on social media and youth mental health and Murthy’s June 2024 NYT call for warning labels mark federal entry. Critical caveats: “cognitive warfare” as a doctrinal concept faces principled critique (Ördén, 2024, Security Dialogue; Saari, Häkkinen & Moilanen, 2024) for risking the securitization of legitimate political discourse; aggressive defensive postures can themselves erode democratic deliberation.

The asymmetry that animates this paper remains: the attacker needs to find any one vulnerability; the defender must close them all. Salvi et al.’s (2025) result that personalized GPT-4 can out-persuade humans means the marginal cost of attack now scales with compute, while the marginal cost of defense scales with cognition itself.


8. Conclusion: stewardship in a mismatched mind

The thirty-nine vulnerabilities catalogued here form a coherent picture once viewed through three lenses simultaneously. Evolutionarily, most are mismatched ancestral adaptations — heuristics that were ecologically rational in small-group, face-to-face Pleistocene ecologies and that misfire when scaled to algorithmic information environments. Philosophically, their collective consequences are weakly emergent properties of populations of biased agents on engineered networks: cascades, polarization phases, hypernormalization. Operationally, they constitute the attack surface of a sixth security domain in which generative AI now provides the attacker with hyper-personalized persuasion, synthetic media, and engagement-optimized amplification at marginal costs approaching zero.

Three claims merit emphasis. First, the biases themselves are not pathologies to be eliminated but features to be governed. Confirmation bias is, on Mercier and Sperber’s account, a feature of an adaptive interactionist reasoning system; the failure mode is its deployment in solitary engagement with curated feeds. Defensive design should restore the ecological conditions — diverse interlocutors, embodied stakes, slow time — that the architecture presupposes.

Second, cultural-layer vulnerabilities deserve disproportionate attention because they scaffold the others. Goodhart’s Law guarantees that any optimization target operationalized at platform scale will diverge from human flourishing; path dependency locks in the resulting architectures; hypernormalization saps the imaginative capacity to demand alternatives; just-world reasoning naturalizes the harms. Cognitive-security policy that targets only the individual layer treats symptoms; targeting the cultural layer treats the disease.

Third, the empirical caveats matter. Cambridge Analytica’s specific electoral causal impact remains unproven (Eady et al., 2023, Nature Communications, found no measurable IRA-Twitter attitudinal effects in the 2016 election); ego depletion failed to replicate; YouTube’s recommender algorithm appears less causally powerful than once claimed; deepfake election-flipping causal claims remain unsubstantiated. Honest cognitive security requires honest epistemics about its own evidence base. The mechanisms are real and well-documented; specific attribution claims about specific operations require specific evidence.

We close with the framing offered by Bak-Coleman et al. (2021): the global ecology of human collective behavior is now the object of a crisis discipline. Like climate science, it operates with incomplete data on systems that cannot be paused for study, where the costs of inaction may be catastrophic and irreversible. Like conservation biology, it must intervene before its empirical foundations are complete. The thirty-nine vulnerabilities catalogued here are not the problem; they are the inventory of features that any cognitive-security architecture must respect. The problem is the asymmetry between an attacker armed with generative AI and a defender armed with a Pleistocene mind. Closing that asymmetry — through inoculation, architectural reform, regulation, and cultural repair — is the central political-epistemic task of the next decade.
