Financial crime is not a fixed target. It moves, adapts, and scales faster than most institutions can respond to it. The numbers make this uncomfortable to ignore: in the United States alone, fraud losses exceeded $12.5 billion in 2024, up 53% from 2021. Identity theft hit 1.1 million reported cases. In Australia, nearly one in ten adults was a victim of credit card fraud in a single financial year.
Meanwhile, fraudsters are increasingly using AI themselves: testing stolen card credentials at scale, building fake platforms, and automating social engineering attacks that would have required entire teams a decade ago.
The arms race is real. And ignoring it is not a neutral choice.
The Problem Nobody Talks About Honestly
Most banks have spent the last decade getting compliant. Building KYC processes, hiring compliance teams, implementing transaction monitoring systems. That work was necessary. But it created something nobody advertised: enormous operational cost, mostly spent on reviewing alerts that go nowhere.
Here is a concrete illustration. Consider a mid-sized bank. Its rule-based transaction monitoring system flags thousands of alerts per month. Analysts review them. The vast majority turn out to be false positives: legitimate transactions caught by an overly broad rule. In a plausible scenario, the time spent on those false positives can exceed 6,000 analyst days per year. That is roughly 23 full-time employees spending their entire working year reviewing noise.
Not catching criminals. Reviewing noise.
This is the core problem AI is being asked to solve. Not detecting more fraud. Spending less time on cases that were never fraud to begin with.
Where Most Banks Actually Are Today
It helps to think about financial crime prevention as a maturity journey with three broad stages.
The first stage is where most banks were in the 2000s: limited compliance, small teams, and a focus mainly on onboarding. The second stage, where most banks are now, is remediation. Increased compliance, but achieved through large manual efforts, siloed systems, and no integrated view of client risk.
The third stage, the one the industry is now working toward, is advanced methods. Better detection models. More focus on effectiveness rather than just coverage. Real integration between AML, fraud, sanctions, and KYC.
The shift matters because the question is no longer "are we compliant?" It is "are we actually preventing financial crime, and at what cost?"
What Is Actually Being Built
The most common entry point for AI in this space right now is not replacing the existing system. It is sitting on top of it.
The Noise Reduction Model is a good example. Instead of training AI on raw transactions, you train it on something you already have: historical alerts that analysts have reviewed and closed as false positives. The model learns what a dead-end alert looks like, and starts closing those automatically before they reach an analyst's desk.
A companion control model runs alongside it. Before the system auto-closes anything, the control model checks whether it has seen enough similar cases to act confidently. If not, the alert goes to a human.
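The two-model pattern above can be sketched in a few lines. This is a deliberately simplified illustration, not a production design: the features, thresholds, and nearest-neighbour approach are all assumptions chosen to make the idea concrete.

```python
from dataclasses import dataclass

# Illustrative sketch only: feature names, thresholds, and the k-NN
# approach are assumptions, not a specific vendor's implementation.

@dataclass
class Alert:
    features: tuple  # e.g. (amount_zscore, country_risk, hits_last_90d)
    label: str       # historical outcome: "false_positive" or "escalated"

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def triage(new_features, history, k=5, radius=1.0, min_support=3):
    """Noise-reduction model (k-NN vote) wrapped in a control check.

    The control model refuses to auto-close unless at least
    `min_support` sufficiently similar historical alerts exist."""
    neighbours = sorted(history, key=lambda h: distance(h.features, new_features))
    close = [h for h in neighbours[:k] if distance(h.features, new_features) <= radius]
    if len(close) < min_support:
        return "route_to_analyst"   # control model: not enough similar cases
    if all(h.label == "false_positive" for h in close):
        return "auto_close"         # noise model: confident this is a dead end
    return "route_to_analyst"       # mixed history: a human decides
```

The key design point is the order of the checks: the control model's evidence test runs before any auto-close decision, so an alert unlike anything in the labelled history always reaches an analyst.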
The payoff is concrete: analysts spend time on cases that actually warrant attention. And the feedback loop strengthens over time as more labelled data accumulates.
The honest limitation: the underlying rule-based system stays live. You are adding intelligence on top of legacy architecture, not replacing it. That increases complexity. It is a step in the right direction, not the destination.
Statistical and unsupervised models are the next layer. Rather than fixed rules, these models learn what normal behaviour looks like for each client and flag meaningful deviations. A commercial customer in the food and beverage sector whose cash deposits suddenly spike two standard deviations above their 12-month average gets flagged. Not because a rule says "flag cash deposits above X" but because their own history says something unusual is happening.
This matters especially in situations where rule-based approaches struggle: correspondent banking, for instance, where a bank is processing transactions on behalf of customers it knows very little about. Unsupervised models can assign an outlier score to each transaction and, critically, explain which specific features drove that score. The analyst sees not just a flag but a reason.
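The per-client baseline idea can be sketched as a z-score check against each client's own history, with the per-feature deviations doubling as the "reason" shown to the analyst. This is a minimal illustration; real systems use richer models, but the shape of the output is the point.

```python
import statistics

# Minimal sketch, assuming monthly aggregates per client.
# Feature names and the 2-sigma threshold are illustrative.

def deviation_report(history_by_feature, current, threshold=2.0):
    """Return (outlier_score, reasons) for one client snapshot.

    history_by_feature: {"cash_deposits": [12 monthly values], ...}
    current:            {"cash_deposits": this_month_value, ...}"""
    reasons = {}
    for feature, series in history_by_feature.items():
        mean = statistics.fmean(series)
        sd = statistics.pstdev(series) or 1.0   # guard against zero variance
        z = (current[feature] - mean) / sd
        if z > threshold:
            reasons[feature] = round(z, 2)      # which feature drove the flag
    score = max(reasons.values(), default=0.0)
    return score, reasons
```

A client whose cash deposits jump three standard deviations above their own average gets flagged with `{"cash_deposits": 3.0}` attached, so the analyst sees not just a flag but a reason.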
The LLM Question: AI Reviewing People
One of the more interesting recent applications is using Large Language Models not to detect suspicious transactions, but to evaluate how well analysts are investigating them.
The setup is simple. The LLM receives the details of a flagged transaction and the analyst's written justification for closing it, then returns a score from 0 to 100 with an explanation.
A justification reading "customer does this all the time" scores around 10. The model correctly identifies the absence of any source-of-funds explanation, transaction purpose, or verification.
A more detailed justification, where a client explains a large cash deposit by referencing a used car purchase from a private seller, scores around 30. Plausible, but missing the seller's identity, a traceable payment trail, and supporting documentation.
Add the sales contract and vehicle registration to that same justification, and the score rises to 70. Still not perfect, because paying CHF 165,000 in cash for a Lamborghini raises its own questions. But the reasoning is documented and the risk is named.
This is not AI replacing the compliance officer. It is AI standardising the quality bar across an entire team. Catching the analyst who closes a complex case with two sentences. Making every decision more defensible when the regulator asks questions.
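The review step can be sketched as a prompt template plus a strict parser for the model's reply. The rubric wording and JSON response format here are assumptions for illustration, not a specific vendor's API; the actual LLM call is left out.

```python
import json

# Hypothetical sketch: rubric text and response schema are assumptions.

RUBRIC = (
    "Score this alert-closure justification from 0 to 100. "
    "Reward: source-of-funds explanation, transaction purpose, "
    "verification steps, supporting documents. "
    'Reply as JSON: {"score": <int>, "explanation": "<text>"}'
)

def build_review_prompt(transaction, justification):
    """Assemble the input sent to the scoring model."""
    return (
        f"{RUBRIC}\n\n"
        f"Transaction: {json.dumps(transaction)}\n"
        f"Analyst justification: {justification}"
    )

def parse_review(raw_reply):
    """Validate the model's reply before it enters the QA workflow."""
    reply = json.loads(raw_reply)
    score = int(reply["score"])
    if not 0 <= score <= 100:
        raise ValueError("score out of range")
    return score, reply["explanation"]
```

Validating the reply before it feeds any workflow is the unglamorous part that matters: a malformed or out-of-range score should fail loudly, not silently pass a weak justification.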
One open question is worth sitting with: what happens when two LLMs disagree on a borderline case? Is that a stronger control, or does it just diffuse accountability? The answer depends on how the governance around the system is designed. Which brings us to the part most vendors skip.
Where This Is Going
Five trends are shaping the near future of this space.
Risk-triggered monitoring over periodic review. The annual or biennial KYC review cycle is giving way to event-driven monitoring. A sudden change in transaction behaviour, a new adverse media hit, a sanctions list update: these trigger review. Not the calendar. This concentrates human attention where risk actually exists, rather than spreading it thin across the entire client base on a fixed schedule.
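At its simplest, event-driven review is a mapping from event types to actions. The event names and actions below are illustrative, not a regulatory taxonomy:

```python
# Hypothetical trigger table: event names and actions are illustrative.
TRIGGERS = {
    "adverse_media_hit": "immediate",
    "sanctions_list_update": "immediate",
    "behaviour_change": "priority_queue",
}

def review_action(events):
    """Decide whether a batch of client events opens a review."""
    decisions = [TRIGGERS[e] for e in events if e in TRIGGERS]
    if "immediate" in decisions:
        return "open_review_now"
    if decisions:
        return "queue_review"
    return "no_action"   # no calendar-driven review is scheduled at all
```

The notable absence in this sketch is any date arithmetic: nothing fires on a schedule, only on events.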
MLOps as serious infrastructure. Building a model is no longer the hard part. Keeping it working is. Fraud patterns shift. A model trained on last year's data may be significantly less effective against this year's tactics. Monitoring for model drift, retraining systematically, and maintaining clear audit trails of model decisions: these are operational capabilities that need to be built and owned.
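One common (though not the only) drift check compares the model's current input distribution against its training baseline using the Population Stability Index. The thresholds below are widely used conventions, not regulatory values:

```python
import math

# Illustrative drift monitor; PSI thresholds 0.1 / 0.25 are conventions.

def psi(expected_fractions, actual_fractions, eps=1e-6):
    """Population Stability Index over pre-binned score distributions.

    Both inputs are lists of bin fractions summing to 1."""
    total = 0.0
    for e, a in zip(expected_fractions, actual_fractions):
        e, a = max(e, eps), max(a, eps)   # avoid log(0)
        total += (a - e) * math.log(a / e)
    return total

def drift_status(psi_value):
    if psi_value < 0.1:
        return "stable"
    if psi_value < 0.25:
        return "investigate"   # the named drift owner is notified
    return "retrain"           # escalation per the governance policy
```

The function is trivial; the governance around it is not. The `investigate` and `retrain` outcomes only mean something if a named person owns them.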
GenAI for unstructured data. The highest-value application of generative AI in this context is not writing summaries. It is extracting structured, usable signals from unstructured sources: KYC documents, due diligence reports, adverse media, analyst notes. Turning qualitative information into something a detection model can actually use is one of the most promising applications currently being piloted.
Breaking down silos. AML, fraud detection, sanctions screening, and tax compliance have historically operated in separate systems with separate teams. A client who looks clean in each silo individually may look very different when the signals are combined. AI architectures trained on consolidated data are beginning to make that integrated view possible.
Federated learning for shared intelligence. The most powerful tool for detecting sophisticated laundering schemes is network analysis: mapping relationships between entities across transaction flows. The problem is that no single institution sees enough of the network to make it work. Federated learning allows institutions to train models collaboratively without sharing raw client data. It is technically complex and still maturing, but it is the realistic path toward cross-institution intelligence that respects privacy obligations.
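The core mechanic can be sketched with federated averaging: each institution takes a gradient step on its own data, and only the resulting weights are pooled. The linear model and single gradient step below are deliberate simplifications of what real federated systems do:

```python
# Minimal federated-averaging sketch. Each institution trains locally;
# only model weights cross institutional boundaries, never raw client data.
# The linear least-squares model is an illustrative stand-in.

def local_step(weights, data, lr=0.1):
    """One gradient step of least-squares on one institution's data."""
    grads = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grads[i] += 2 * err * xi / len(data)
    return [w - lr * g for w, g in zip(weights, grads)]

def federated_round(global_weights, institutions):
    """Average locally updated weights; raw transactions stay on-premise."""
    updates = [local_step(global_weights, data) for data in institutions]
    return [sum(ws) / len(updates) for ws in zip(*updates)]
```

What the sketch makes visible is the privacy boundary: `federated_round` never touches the transaction data itself, only each institution's locally computed weights.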
The Governance Layer: The Part That Actually Determines Success
Every capability described above comes with governance requirements. In a regulated environment, these are not optional.
Explainability is a regulatory expectation. FINMA's Circular 08/2024 sets clear expectations around model risk management. A system that automatically closes AML alerts or escalates cases to suspicious activity reporting must explain its decisions. To analysts, to senior management, and to examiners. Building explainability in from the start is not a technical nicety. It is a compliance requirement.
Model drift needs an owner. When a model starts performing worse because fraud patterns have evolved, someone needs to notice, escalate, and act. That requires monitoring metrics, defined thresholds, and named accountability. "The model handles it" is not a governance structure.
Bias deserves explicit attention. Models trained on historical decisions inherit historical biases. In financial crime, where AI outputs can affect access to services, that is both an ethical issue and a regulatory exposure. It needs to be examined during model development and validation, not discovered during an examination.
Human oversight cannot be designed out. The right approach builds trust incrementally. Use model outputs to prioritise human review first. Then suppress only the most clearly benign cases. Then expand model authority as confidence is established. Full automation without meaningful human checkpoints is not defensible given where model capabilities actually are today.
And the accountability question that never goes away: when an AI system misses a genuine case of money laundering, who is responsible? The answer cannot be "the model." It has to be a named person, with a clear mandate, appropriate tools, and documented processes.
AI is genuinely changing financial crime prevention. The institutions getting it right are treating governance with the same seriousness as the technology. Not as a constraint on what they can build. As the foundation that makes it sustainable.