top of page

North Korea Built Malware That Gaslights Your AI Analyst. The Sandbox Isn't the Target Anymore.

  • Writer: Patrick Duggan
    Patrick Duggan
  • 5 minutes ago
  • 3 min read

Every malware evasion technique for the last decade has been aimed at the same things: bypass the sandbox, evade the EDR, outlast the dynamic analysis timeout, hide from the static signature. The implicit assumption was that the analyst is a human, and the human's tools are machines that can be fooled mechanically.


macOS.Gaslight changes the assumption.



What It Does


SentinelLabs disclosed macOS.Gaslight yesterday. It's a Rust-based macOS implant attributed with high confidence to North Korea-aligned threat actors. The targeting profile is consistent with DPRK's established pattern: cryptocurrency, finance, blockchain, and technology firms.


The implant does what DPRK implants do. It installs a LaunchAgent for persistence. It blocks system sleep to keep its polling loop alive. It communicates over a Telegram bot API — an interactive C2 channel that operators use to issue commands and receive results over an encrypted, legitimate messaging platform. It self-redacts its own Telegram bot token in runtime output, denying it to anyone capturing logs. It steals.


None of that is new. The new part is this:


Embedded in the malware artifact is a Markdown-fenced block containing 38 fabricated system messages designed to trick an AI analysis tool into aborting its analysis of the sample.


The fake messages simulate infrastructure failure: token expiry, out-of-memory kills, disk exhaustion, repeated operation errors. The goal is to make an AI assistant — the kind a malware analyst uses to speed up triage — conclude that its analysis environment is broken and stop working.


The sandbox isn't the target. The analyst's AI is.



Why This Is Different


Security tooling has been integrating AI for two years. Analysts run samples through LLM-assisted triage pipelines. They paste decompiled code into ChatGPT. They use Copilot to summarize behavior. They ask Claude to explain what a function does.


The attacker modeled this. They understood that the modern malware analysis workflow includes an AI layer, and they built a payload designed to poison that layer specifically. The 38 fabricated messages are not trying to fool a sandbox or an EDR. They are trying to fool the AI that the human analyst is trusting to do the first pass.


The name is exact. Gaslighting is the act of making someone doubt their own perception. macOS.Gaslight makes the AI doubt its own ability to function — "your environment is broken, I can't complete this analysis" — so the human analyst gets a false negative and moves on.



The Escalation This Represents


The prompt injection technique in Gaslight is not sophisticated by AI jailbreak standards. Thirty-eight fake system messages in a Markdown block is blunt. What makes it significant is not the technique — it's the targeting decision.


DPRK threat actors sat down and thought: defenders are now using AI to analyze our malware. We should build countermeasures against that. They built countermeasures against that.


This is the first documented malware that targets the AI assistant as an attack surface. It will not be the last. The technique will improve. More actors will adopt it. In six months we will be looking at malware with sophisticated, context-aware prompt injection payloads that adapt based on which AI tool they detect in the analysis chain.


The arms race just opened a new front.



What This Means for Defenders


If you use AI for malware triage — and you do — the AI's output is now an attack surface. A sample that contains prompt injection payloads can tell your AI to lie to you, truncate its analysis, refuse to continue, or produce a clean bill of health for something that isn't clean.


  • Treat AI triage output as advisory, not conclusive, on samples with DPRK-consistent profiles (macOS, crypto-adjacent targets, Rust or Go binaries)

  • Don't trust an AI that reports analysis failure without investigating why — "environment error" is now a malware evasion signal

  • Run static analysis and behavioral analysis independently of AI-assisted triage, not as a downstream step from it

  • If your AI analysis pipeline runs on a system prompt, that system prompt is now part of your attack surface


Our Exposure


We use AI throughout our analysis pipeline. Oz makes autonomous threat decisions. Our exploit harvester runs through model-assisted classification. The AI council processes threat data.


None of our AI analysis runs unsandboxed on live malware. Our pipeline ingests indicators and metadata, not raw binaries. But the principle holds: any AI-assisted analysis workflow that processes attacker-controlled input is potentially targetable by prompt injection.


macOS.Gaslight is the proof of concept. The category is now real.







The threat feed this post is built on

1.14M+ IOCs, STIX 2.1, precursor signals, supply-chain detection. Free API key in 30 seconds.


bottom of page