
Definition:Directed acyclic graph (DAG)

From Insurer Brain
Revision as of 14:01, 27 March 2026 by PlumBot (talk | contribs) (Bot: Creating new article from JSON)

📋 A directed acyclic graph (DAG) is a visual and mathematical tool for representing causal assumptions about how variables relate to one another. It consists of nodes (variables) connected by directed edges (arrows) that indicate the assumed direction of causation, with no path looping back on itself. In insurance analytics, DAGs serve as blueprints for causal reasoning, making explicit the assumptions an actuary or data scientist holds about how rating factors, policyholder behaviors, market conditions, and claims outcomes are connected. By laying out these assumptions transparently, a DAG reveals which confounders must be adjusted for, which variables should not be conditioned on, and whether a particular causal question is even answerable with the available data.
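The "no path loops back on itself" requirement can be checked mechanically. The sketch below (plain Python, illustrative only; the edge lists are invented for this example) uses Kahn's topological-sort algorithm: nodes with no incoming arrows are removed repeatedly, and a cycle leaves nodes that can never be removed.

```python
from collections import deque

def is_dag(edges):
    """Return True if the directed graph given as (source, target)
    edge pairs contains no directed cycle (Kahn's algorithm)."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    # Start from nodes with no incoming arrows and peel the graph apart.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # If a cycle exists, its nodes never reach indegree 0.
    return removed == len(nodes)

# A valid causal graph: age influences both enrollment and claims.
print(is_dag([("age", "enrollment"), ("enrollment", "claims"),
              ("age", "claims")]))                      # True
# A feedback arrow creates a cycle, so this is not a DAG.
print(is_dag([("a", "b"), ("b", "c"), ("c", "a")]))     # False
```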

⚙️ Constructing a DAG requires domain knowledge rather than statistical estimation: the analyst draws arrows based on expert understanding of the data-generating process. Once the graph is in place, formal rules — most notably do-calculus and the backdoor criterion — determine the minimal set of variables that must be controlled for to obtain an unbiased causal estimate. Consider an insurer examining whether a new telematics program causally reduces claims frequency. The DAG might include nodes for policyholder age, driving experience, program enrollment, driving behavior (a mediator), and claims outcome, with arrows reflecting the analyst's beliefs about which variables influence which. The graph then reveals, for instance, that conditioning on driving behavior would block the very causal pathway the insurer wants to measure, while failing to adjust for age and driving experience would leave confounding bias intact. This kind of clarity prevents common analytical mistakes that purely data-driven approaches can miss.
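The telematics example can be made concrete. The sketch below (illustrative; node names and the edge list encode the hypothesised graph described above, not any real insurer's model) enumerates every path between program enrollment and the claims outcome in the graph's skeleton. A path whose first arrow points away from enrollment is a causal pathway (here, via the driving-behavior mediator); a path whose first arrow points into enrollment is a backdoor path through a confounder that must be blocked by adjustment.

```python
def all_paths(edges, start, end):
    """Enumerate simple paths between start and end in the skeleton of
    a DAG, recording each step's arrow direction for classification."""
    # neighbours: node -> list of (other, arrow), where arrow is '->'
    # if it points away from node and '<-' if it points into node.
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append((v, "->"))
        nbrs.setdefault(v, []).append((u, "<-"))
    paths = []
    def walk(node, visited, steps):
        if node == end:
            paths.append(steps)
            return
        for other, arrow in nbrs.get(node, []):
            if other not in visited:
                walk(other, visited | {other}, steps + [(arrow, other)])
    walk(start, {start}, [])
    return paths

# Hypothetical telematics DAG from the example above.
edges = [
    ("age", "enrollment"), ("age", "claims"),
    ("experience", "enrollment"), ("experience", "claims"),
    ("enrollment", "behavior"), ("behavior", "claims"),
]

for steps in all_paths(edges, "enrollment", "claims"):
    kind = "causal" if steps[0][0] == "->" else "backdoor"
    route = "enrollment" + "".join(f" {a} {n}" for a, n in steps)
    print(f"{kind:8s} {route}")
```

Running this lists one causal path (enrollment -> behavior -> claims, which must be left open) and two backdoor paths (through age and through experience, which must be closed by adjusting for those confounders), matching the reasoning in the paragraph above.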

💡 As predictive modeling and machine learning become embedded in insurance operations — from underwriting to fraud detection — DAGs offer a much-needed bridge between statistical sophistication and interpretability. Regulators across jurisdictions, including Solvency II supervisors, the UK's Financial Conduct Authority, and U.S. state insurance departments, increasingly expect insurers to articulate not just what their models predict but why variables are included and how they relate to the risk being priced. A well-constructed DAG provides exactly this rationale in a form that technical and non-technical stakeholders alike can scrutinize. It also disciplines model-building by preventing analysts from inadvertently introducing collider bias or other structural errors that can distort conclusions about loss ratios, adverse selection, or intervention effectiveness.
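The collider bias mentioned above can be demonstrated with a small simulation (stdlib Python; the investigation scenario and all numbers are a hypothetical illustration, not drawn from the article). Two independent risk factors both trigger an investigation flag; conditioning the analysis on that flag manufactures a spurious negative association between factors that are, in truth, unrelated.

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation from stdlib pieces."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)
n = 50_000
# Two independent drivers of investigation (the collider): e.g. claim
# size and claimant inconsistency, each standardised to N(0, 1).
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
investigated = [xi + yi > 1.0 for xi, yi in zip(x, y)]

full = corr(x, y)  # near zero: x and y are independent by construction
sub_x = [xi for xi, flag in zip(x, investigated) if flag]
sub_y = [yi for yi, flag in zip(y, investigated) if flag]
conditioned = corr(sub_x, sub_y)  # strongly negative: collider bias

print(f"corr overall: {full:+.3f}, among investigated: {conditioned:+.3f}")
```

Restricting to investigated claims is exactly the "conditioning on a collider" error a DAG makes visible before any data are touched: both arrows point into the investigation node, so selecting on it opens a non-causal path between its parents.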

Related concepts: