Definition:Causal discovery

🧭 Causal discovery encompasses a family of algorithms and statistical methods designed to infer the structure of causal relationships among variables directly from observational data, rather than relying solely on pre-specified domain knowledge or experimental manipulation. In the insurance industry, where vast quantities of claims, policy, and behavioral data are collected but controlled experiments are rarely feasible, causal discovery offers a path toward understanding the true drivers of outcomes like claims frequency, severity, policyholder lapse, and fraud — moving beyond the correlational patterns that machine learning models typically exploit.

🔬 The main algorithmic families include constraint-based methods (such as the PC algorithm and FCI algorithm), which use conditional independence tests to eliminate edges in a causal graph; score-based methods (such as GES), which search over possible directed acyclic graph structures to maximize a fit criterion; and functional causal model approaches that exploit asymmetries in the data-generating process to determine causal direction. In an insurance setting, a data science team at a property insurer might apply causal discovery to a dataset containing building characteristics, maintenance records, weather patterns, and claims history to uncover which variables causally influence loss outcomes — and which merely co-occur due to shared upstream causes. The output is typically a partially directed graph that reveals both confirmed causal edges and ambiguous relationships requiring further investigation. This graph can then guide variable selection for pricing models, inform risk selection strategies, or identify previously unrecognized confounding factors that distort simpler analyses.

🌍 The insurance industry's growing interest in causal discovery is driven by both analytical and regulatory imperatives. Traditional actuarial modeling has relied heavily on expert judgment to specify which variables matter and how they interact, which works well in stable, well-understood lines but struggles with emerging risks such as cyber or pandemic-related exposures, where historical intuition may be limited. Causal discovery can surface unexpected relationships — for example, revealing that a particular policyholder behavior causally mediates the relationship between a demographic variable and loss, suggesting the demographic variable may be a proxy rather than a direct cause. This distinction has significant fairness and regulatory implications, as supervisory authorities across jurisdictions are increasingly scrutinizing whether rating factors used in underwriting models have genuine causal links to risk or merely encode protected characteristics through indirect pathways. By making the causal structure of insurance data explicit and testable, causal discovery serves as a bridge between data-driven modeling and the interpretability that insurers, regulators, and consumers require.

Related concepts: