Representation learning with a transformer by contrastive learning for money laundering detection

This document introduces a novel two-step methodology for money laundering detection that significantly improves upon existing rule-based and traditional machine-learning methods. In the first step, a transformer neural network is pre-trained on complex financial time-series data through contrastive learning; this self-supervised objective requires no labels and helps the model capture the inherent patterns in transactions. In the second step, the learned representations feed a two-threshold classification procedure, calibrated with the Benjamini-Hochberg (BH) procedure, that controls the false positive rate while accurately identifying both fraudulent and non-fraudulent accounts, thereby addressing the severe class imbalance typical of money laundering datasets. Experimental results on real-world, anonymized financial data demonstrate that this transformer-based approach outperforms other models in detecting fraudulent activities.
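To make the first step concrete, the sketch below shows one common way to pre-train a transformer encoder with a contrastive (NT-Xent / InfoNCE) objective on two randomly augmented views of each account's transaction sequence. The architecture sizes, the `augment` function, and all hyperparameters are illustrative assumptions, not the exact setup used in the document.

```python
# Minimal sketch of contrastive pre-training on transaction sequences.
# Everything here (shapes, augmentations, hyperparameters) is assumed
# for illustration; it is not the document's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransactionEncoder(nn.Module):
    def __init__(self, n_features=8, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.projection = nn.Linear(d_model, 32)  # projection head for the loss

    def forward(self, x):                    # x: (batch, seq_len, n_features)
        h = self.encoder(self.input_proj(x))
        z = self.projection(h.mean(dim=1))   # mean-pool over time steps
        return F.normalize(z, dim=-1)        # unit-norm embeddings

def augment(x):
    """Illustrative augmentation: feature jitter plus random time-step dropout."""
    noise = 0.05 * torch.randn_like(x)
    keep = (torch.rand(x.shape[:2]) > 0.1).unsqueeze(-1).float()
    return (x + noise) * keep

def nt_xent_loss(z1, z2, temperature=0.2):
    """NT-Xent: each view must identify its partner among all 2N embeddings."""
    n = z1.shape[0]
    z = torch.cat([z1, z2], dim=0)                    # (2N, d)
    sim = (z @ z.t()) / temperature                   # cosine similarities
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Toy training loop on random "transaction sequences" -- no labels needed.
encoder = TransactionEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for step in range(5):
    batch = torch.randn(16, 30, 8)   # 16 accounts, 30 transactions, 8 features
    loss = nt_xent_loss(encoder(augment(batch)), encoder(augment(batch)))
    opt.zero_grad(); loss.backward(); opt.step()
```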
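The second step can be sketched as follows, under one plausible reading: each account receives a suspicion score derived from the learned representations, scores are converted to empirical p-values against a calibration set of known-legitimate accounts, and the BH procedure then sets the "flag" threshold so that the false discovery rate is controlled at a target level, while a second threshold clears clearly legitimate accounts. The scoring, the conformal-style p-values, and the clearing threshold below are assumptions for illustration; only the BH step itself is the standard algorithm.

```python
# Minimal sketch of BH-calibrated two-threshold classification.
# Scores, p-value construction, and the "clear" threshold are assumed.
import numpy as np

def empirical_p_values(scores, calib_scores):
    """p-value = fraction of calibration (legitimate) scores >= observed score."""
    calib = np.sort(calib_scores)
    n = len(calib)
    ge = n - np.searchsorted(calib, scores, side="left")  # count of calib >= s
    return (1 + ge) / (1 + n)                             # conformal-style correction

def benjamini_hochberg(p, alpha=0.05):
    """Boolean mask of rejected (flagged) hypotheses at FDR level alpha."""
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True            # reject the k smallest p-values
    return rejected

rng = np.random.default_rng(0)
calib = rng.normal(0.0, 1.0, size=2000)            # scores of legitimate accounts
test = np.concatenate([rng.normal(0.0, 1.0, 480),  # mostly legitimate...
                       rng.normal(3.0, 1.0, 20)])  # ...plus a few anomalies
p = empirical_p_values(test, calib)
flagged = benjamini_hochberg(p, alpha=0.05)        # high-suspicion accounts
cleared = p > 0.5                                  # illustrative "clear" threshold
print(f"flagged: {flagged.sum()}, cleared: {cleared.sum()}, "
      f"review: {len(test) - flagged.sum() - cleared.sum()}")
```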