Representation learning with a transformer by contrastive learning for money laundering detection
This document introduces a two-step methodology for money laundering detection that improves upon existing rule-based systems and traditional machine learning methods. In the first step, a transformer neural network learns representations of complex financial time series through contrastive learning, a self-supervised pre-training scheme that requires no labels and helps the model capture the inherent patterns in transaction sequences. In the second step, these learned representations feed a two-threshold classification procedure calibrated with the Benjamini-Hochberg (BH) procedure, which controls the false discovery rate while identifying both fraudulent and non-fraudulent accounts, thereby addressing the severe class imbalance typical of money laundering datasets. Experimental results on real-world, anonymized financial data show that this transformer-based approach outperforms competing models at detecting fraudulent activity.
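To make the first step concrete, the sketch below shows how a transformer encoder can be pre-trained on unlabeled transaction sequences with an NT-Xent (SimCLR-style) contrastive loss, where two augmented views of the same account's history form a positive pair. This is a minimal illustration under stated assumptions: the encoder architecture, the jitter-and-masking augmentation, and all hyperparameters are hypothetical choices for the sketch, not the paper's exact configuration.

```python
# Minimal contrastive pre-training sketch (PyTorch assumed).
# Architecture, augmentations, and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransactionEncoder(nn.Module):
    """Transformer encoder mapping a transaction sequence to an embedding."""
    def __init__(self, n_features=8, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, d_model)  # projection head

    def forward(self, x):                      # x: (batch, seq_len, n_features)
        h = self.encoder(self.input_proj(x))   # (batch, seq_len, d_model)
        z = self.head(h.mean(dim=1))           # mean-pool over time, project
        return F.normalize(z, dim=-1)          # unit-norm embeddings

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent: each view's positive is the other view of the same account."""
    z = torch.cat([z1, z2], dim=0)                 # (2N, d)
    sim = z @ z.t() / temperature                  # pairwise cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))          # exclude self-similarity
    targets = torch.cat(
        [torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def augment(x):
    """Illustrative augmentation: Gaussian jitter plus random step masking."""
    noisy = x + 0.01 * torch.randn_like(x)
    keep = (torch.rand(x.size(0), x.size(1), 1, device=x.device) > 0.1).float()
    return noisy * keep

# One self-supervised training step on a batch of unlabeled sequences.
encoder = TransactionEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
batch = torch.randn(32, 50, 8)   # 32 accounts, 50 transactions, 8 features
loss = nt_xent_loss(encoder(augment(batch)), encoder(augment(batch)))
opt.zero_grad(); loss.backward(); opt.step()
```

No labels appear anywhere in this loop: the training signal comes entirely from matching the two augmented views of each account against the rest of the batch.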
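For the second step, the sketch below illustrates one plausible reading of the two-threshold procedure: anomaly scores on the learned representations are converted to empirical p-values against a labeled calibration set, and BH is run in each direction to pick one threshold that flags accounts as fraudulent and one that clears them as legitimate, with the remainder left ambiguous. The score construction, the calibration sets, and the alpha levels are assumptions for illustration; BH itself rejects the k smallest p-values for the largest k with p_(k) <= k * alpha / m, which controls the false discovery rate at alpha.

```python
# Two-threshold classification sketch calibrated with Benjamini-Hochberg.
# Empirical p-values against calibration scores are an assumption; the paper's
# exact score and threshold construction may differ.
import numpy as np

def bh_threshold(p_values, alpha=0.05):
    """Return the BH rejection threshold (0.0 if nothing is rejected)."""
    p = np.sort(p_values)
    m = len(p)
    below = p <= alpha * np.arange(1, m + 1) / m
    if not below.any():
        return 0.0
    return p[np.max(np.nonzero(below))]

def empirical_p_values(scores, null_scores):
    """P-value of each score under the empirical null (higher = more extreme)."""
    null_sorted = np.sort(null_scores)
    ranks = len(null_sorted) - np.searchsorted(null_sorted, scores, side="left")
    return (ranks + 1) / (len(null_sorted) + 1)

rng = np.random.default_rng(0)
scores = rng.normal(0, 1, 1000)        # anomaly scores for new accounts
scores[:20] += 4                       # a few truly anomalous accounts
null_fraud = rng.normal(4, 1, 200)     # calibration scores: known fraud
null_legit = rng.normal(0, 1, 2000)    # calibration scores: known non-fraud

# Two BH tests, one per direction.
p_fraud = empirical_p_values(scores, null_legit)    # "is it anomalous?"
p_legit = empirical_p_values(-scores, -null_fraud)  # "is it clearly normal?"
t_fraud = bh_threshold(p_fraud, alpha=0.05)
t_legit = bh_threshold(p_legit, alpha=0.05)

flag_fraud = p_fraud <= t_fraud        # confidently fraudulent
flag_legit = p_legit <= t_legit        # confidently legitimate
review = ~(flag_fraud | flag_legit)    # ambiguous: route to analysts
print(flag_fraud.sum(), flag_legit.sum(), review.sum())
```

The middle band between the two thresholds is what makes the procedure practical under heavy class imbalance: only the accounts that neither BH test can confidently resolve are escalated for manual review.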