Peng Shi - University of Wisconsin-Madison
21 May 2021
A unique feature in nonlife insurance risk classification: rating variables are categorical and many have a large number of levels
The high cardinality in the categorical rating variables imposes challenges in the implementation of the traditional actuarial methods
In particular, generalized linear models (GLMs) face several difficulties:
The high-dimensional design matrix demands an unrealistic amount of computational resources
Some categories of the rating variable are more likely to have insufficient data
The relationship between different levels of the rating variable is usually ignored
We present several actuarial applications of categorical embedding in the context of nonlife insurance risk classification.
Single insurance risk
Dependent insurance risks
Pricing new risks with sparse data
Based on paper:
P. Shi and K. Shi (2021). Nonlife Insurance Risk Classification Using Categorical Embedding. Available at SSRN.
The idea is due to Guo and Berkhahn (2016). The method maps each categorical variable into a real-valued representation in the Euclidean space.
In the embedding space, categories with similar effects are close to each other. This is similar to word embedding in natural language processing.
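For the categorical variable \(x\) with \(K\) levels \(c_1,\ldots,c_K\), the embedding function into a \(d\)-dimensional embedding space is given by: \[\begin{align} e: x \mapsto \bf{\Gamma} \times \bf{\delta}, \end{align}\] where \(\bf{\Gamma}\) is the \(d \times K\) embedding matrix and \(\bf{\delta}\) is the \(K\)-vector of Kronecker deltas \(\delta_{x,c_k}\) that indicates the level of \(x\).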
The \(k\)th category is represented by the \(k\)th column of \(\bf{\Gamma}\). To see this, for the \(i\)th data point with \(x_i=c_k\), we note: \[\begin{align} e(x_i) = \left( \begin{array}{ccc} \gamma_{11} & \cdots & \gamma_{1K} \\ \vdots & \ddots & \vdots \\ \gamma_{d1} & \cdots & \gamma_{dK} \\ \end{array} \right) \times \left( \begin{array}{c} \delta_{x_i,c_1} \\ \vdots \\ \delta_{x_i,c_K} \\ \end{array} \right) = \left( \begin{array}{c} \gamma_{1k} \\ \vdots \\ \gamma_{dk} \\ \end{array} \right). \end{align}\]
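The identity above can be checked numerically. The following is a minimal sketch in plain Python; the level names and the entries of the embedding matrix are hypothetical values chosen only for illustration, not taken from the paper.

```python
def one_hot(level, levels):
    """Return the Kronecker-delta vector (delta_{x,c_1}, ..., delta_{x,c_K})."""
    return [1.0 if level == c else 0.0 for c in levels]


def embed(gamma, delta):
    """Compute Gamma x delta for a d-by-K matrix stored as a list of rows."""
    return [sum(g * dk for g, dk in zip(row, delta)) for row in gamma]


# Hypothetical example: K = 3 levels and a d = 2 dimensional embedding space.
levels = ["A", "B", "C"]
gamma = [
    [0.1, 0.5, -0.3],
    [0.7, -0.2, 0.4],
]

x_i = "B"                                    # observation with x_i = c_2
e_xi = embed(gamma, one_hot(x_i, levels))    # matrix-vector product
k = levels.index(x_i)
column_k = [row[k] for row in gamma]         # direct lookup of column k

print(e_xi, column_k)
```

The matrix product and the column lookup agree, which is why an embedding layer can be implemented as a table lookup rather than as a multiplication by a one-hot vector.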
The embeddings can be automatically learned by a neural network in the supervised training process.
Add an embedding layer, an extra layer between the input layer and the hidden layer, in the neural network
Treat the embedding matrix as the weight parameters of the embedding neurons
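To illustrate how the embedding matrix can be learned as ordinary network weights, here is a minimal sketch of the smallest such network: an embedding layer followed directly by a single logistic output unit, trained by full-batch gradient descent on the log loss. The data, the initial weights, the learning rate, and the absence of a hidden layer are all simplifying assumptions made for illustration; this is not the architecture used in the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(gamma, w, b, k):
    """Embedding layer (column k of gamma) followed by a logistic output unit."""
    e = [row[k] for row in gamma]
    return sigmoid(sum(wj * ej for wj, ej in zip(w, e)) + b)


def log_loss(gamma, w, b, data):
    total = 0.0
    for k, y in data:
        p = predict(gamma, w, b, k)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train(gamma, w, b, data, lr=0.1, epochs=500):
    """Full-batch gradient descent; gamma is updated like any other weight."""
    d, n_levels, n = len(gamma), len(gamma[0]), len(data)
    for _ in range(epochs):
        g_gamma = [[0.0] * n_levels for _ in range(d)]
        g_w, g_b = [0.0] * d, 0.0
        for k, y in data:
            err = predict(gamma, w, b, k) - y      # d(loss)/d(logit)
            for j in range(d):
                g_w[j] += err * gamma[j][k] / n
                g_gamma[j][k] += err * w[j] / n    # only column k receives gradient
            g_b += err / n
        for j in range(d):
            w[j] -= lr * g_w[j]
            for c in range(n_levels):
                gamma[j][c] -= lr * g_gamma[j][c]
        b -= lr * g_b
    return gamma, w, b


# Hypothetical data: (level index, claim indicator) for three levels.
data = [(0, 1)] * 3 + [(0, 0)] + [(1, 1)] + [(1, 0)] * 3 + [(2, 0)] * 4
gamma = [[0.1, -0.1, 0.05], [-0.05, 0.1, 0.02]]   # d = 2, K = 3, arbitrary start
w, b = [0.5, -0.5], 0.0

loss_before = log_loss(gamma, w, b, data)
gamma, w, b = train(gamma, w, b, data)
loss_after = log_loss(gamma, w, b, data)
print(loss_before, loss_after)
```

Each observation only sends gradient to the column of the embedding matrix that corresponds to its own level, so levels with similar outcomes are pushed toward similar columns.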
We emphasize that categorical embedding is especially useful in two scenarios:
The insurance claims dataset is obtained from the local government property insurance fund of Wisconsin
We examine the building and contents insurance that covers damage to both physical structures and items inside
There are over one thousand entities observed during years 2006-2013, resulting in 8,880 policy-year observations.
Description of rating variables
We consider a binary outcome that measures the claim frequency by peril
Claim frequency outcomes are dependent:
In this case, we consider the context where there is a single insurance risk:
Treat the open-peril property insurance as an umbrella policy
Define the claim frequency as a risk measurement for the aggregate claims from all perils
We fit neural networks:
One-hot encoding
Categorical embedding
Some results on prediction:
We could also use the embeddings to create risk classes:
In this case, we consider a model for multi-peril risks
Let \(Z_j\) be the outcome for peril \(j\). We formulate the problem as a multi-output network for the vector \(Y=(Z_1,Z_2,Z_3)\)
We use the dependence ratio to describe the relationship among perils
\[\begin{align} \rho(z_1,z_2,z_3) = \frac{{\rm Pr}(Z_1=z_1,Z_2=z_2,Z_3=z_3)}{{\rm Pr}(Z_1=z_1){\rm Pr}(Z_2=z_2){\rm Pr}(Z_3=z_3)} \end{align}\]
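As a rough illustration of the dependence ratio, the sketch below estimates it from empirical frequencies of binary outcomes. The function name `dependence_ratio` and the sample outcomes are hypothetical; in the modeling context the probabilities would come from the fitted network rather than from raw counts.

```python
def dependence_ratio(outcomes, z):
    """Empirical joint probability of z divided by the product of the marginals.

    outcomes: list of tuples (Z_1, Z_2, Z_3) with binary entries.
    z: the configuration (z_1, z_2, z_3) to evaluate.
    Assumes every marginal event in z occurs at least once in the sample.
    """
    n = len(outcomes)
    joint = sum(1 for row in outcomes if tuple(row) == tuple(z)) / n
    marginal_product = 1.0
    for j, zj in enumerate(z):
        marginal_product *= sum(1 for row in outcomes if row[j] == zj) / n
    return joint / marginal_product


# Hypothetical policy-year outcomes for three perils.
outcomes = [
    (1, 1, 1), (1, 1, 0), (0, 0, 0), (0, 0, 0),
    (0, 0, 0), (0, 0, 0), (1, 0, 0), (0, 1, 1),
]

rho = dependence_ratio(outcomes, (1, 1, 1))
print(rho)
```

A ratio of 1 corresponds to independence of the three outcomes at that configuration; a value above 1 means the joint event occurs more often than independence would imply.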
We consider two types of insurance coverage, the stop-loss insurance and the excess-of-loss insurance. The insurer’s retained loss can be represented as: \[\begin{align*} {\rm Stop ~loss}:& ~R_1 = \min\{S,d_1\}\\ {\rm Excess~ of~ loss}:& ~R_2 = \max\{S-d_2,0\} \end{align*}\]
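The two retained-loss formulas translate directly into code. In this minimal sketch, \(S\) is read as the aggregate loss and \(d_1\), \(d_2\) as the contract thresholds; that reading and the numeric values are assumptions made for illustration.

```python
def stop_loss_retained(s, d1):
    """R_1 = min{S, d_1}: the retained loss is capped at d_1."""
    return min(s, d1)


def excess_of_loss_retained(s, d2):
    """R_2 = max{S - d_2, 0}: only the part of S above d_2 is retained."""
    return max(s - d2, 0.0)


# Hypothetical aggregate losses evaluated with d_1 = d_2 = 100.
for s in (80.0, 120.0):
    print(s, stop_loss_retained(s, 100.0), excess_of_loss_retained(s, 100.0))
```

When the two thresholds coincide, \(R_1 + R_2 = S\), so the two quantities split the aggregate loss at the threshold.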
Suppose that the insurer has only provided coverage for water and other perils during years 2006-2011. Starting from year 2012, the insurer plans to offer fire coverage as well.
We demonstrate the idea of transfer learning using the categorical variable county.
Comparison of similarity matrix
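One way to build a similarity matrix from learned embeddings is to compare the columns of the embedding matrix pairwise. The sketch below uses cosine similarity, which is an assumption here since the similarity measure is not stated on the slide; the embedding values are also hypothetical.

```python
import math


def cosine_similarity_matrix(gamma):
    """Pairwise cosine similarity between the columns of a d-by-K matrix."""
    n_levels = len(gamma[0])
    columns = [[row[k] for row in gamma] for k in range(n_levels)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    return [
        [
            sum(a * b for a, b in zip(columns[i], columns[j])) / (norms[i] * norms[j])
            for j in range(n_levels)
        ]
        for i in range(n_levels)
    ]


# Hypothetical 2-by-3 embedding matrix for three levels of a variable such as county.
gamma = [
    [1.0, 2.0, -1.0],
    [0.0, 0.0, 1.0],
]

sim = cosine_similarity_matrix(gamma)
for row in sim:
    print([round(v, 3) for v in row])
```

Computing this matrix once from embeddings learned on the existing perils and once from embeddings learned with the new peril gives two tables that can be compared entry by entry.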