Can Words Reveal Fraud? A Lexicon Approach to Detecting Fraudulent Financial Reporting

The study introduces a fraud lexicon and a Balanced Random Forest classifier for detecting fraudulent financial reporting. With the fraud lexicon as its feature set, the classifier predicts fraud accurately across multiple samples drawn from 2000 to 2017, outperforming random guessing by 40 to 48 percent. The lexicon is also useful on its own for "bag-of-words" analysis. It gives researchers, practitioners, auditors, regulators, and investors a tool for strengthening fraud risk assessment procedures.
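The sketch below illustrates two ideas from the summary above. First, it shows how a lexicon can serve as a feature set: each document is reduced to one count per lexicon term, which is a bag-of-words restricted to the lexicon. Second, it shows the class-balanced bootstrap sampling that distinguishes a Balanced Random Forest from an ordinary one. The terms in `FRAUD_LEXICON`, the sample texts, and the helper names `lexicon_features` and `balanced_bootstrap` are hypothetical placeholders. They are not the study's actual lexicon or implementation, and the tree-fitting step is omitted.

```python
import random
import re

# Hypothetical placeholder terms; NOT the study's actual fraud lexicon.
FRAUD_LEXICON = ["restatement", "aggressive", "unusual", "adjustment", "related party"]


def lexicon_features(text, lexicon=FRAUD_LEXICON):
    """Return one count per lexicon term (bag-of-words limited to the lexicon)."""
    lowered = text.lower()
    counts = []
    for term in lexicon:
        # Match the term as a whole word or phrase (word boundaries on both ends).
        pattern = r"\b" + re.escape(term) + r"\b"
        counts.append(len(re.findall(pattern, lowered)))
    return counts


def balanced_bootstrap(labels, rng):
    """Return indices of a class-balanced bootstrap sample.

    A Balanced Random Forest trains each tree on a sample like this: it draws
    the same number of cases (the minority-class size), with replacement,
    from every class, so fraud and non-fraud cases are equally represented.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n))  # sampling with replacement
    return sample


if __name__ == "__main__":
    docs = [
        "The restatement followed an unusual adjustment.",
        "Revenue grew in line with guidance.",
    ]
    for d in docs:
        print(lexicon_features(d))  # first doc → [1, 0, 1, 1, 0]

    # Toy labels: 1 = fraud (minority), 0 = non-fraud (majority).
    labels = [0, 0, 0, 0, 0, 0, 1, 1]
    print(balanced_bootstrap(labels, random.Random(0)))
```

A full classifier would fit one decision tree on each balanced sample and aggregate their votes. The imbalanced-learn library provides a `BalancedRandomForestClassifier` that follows this scheme.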

