
POPULATION STABILITY INDEX AND CHARACTERISTIC ANALYSIS

Use of Population Stability Index (PSI)

Population Stability Index (PSI) has several uses, listed below:

Economic changes can affect a model. Suppose you built a risk model during the economic recession (2008) and are still using it to score datasets in 2016. There is a high chance that the distributions of many model attributes have changed drastically over those 8 years. If the model's features have shifted significantly, it no longer makes sense to use the model.

Product offerings can change because of internal policy changes. For example, if one of your products was relaunched recently, its attributes may behave differently from the attributes the model was built on.

PSI can detect data integration or programming issues in the process that runs the scoring code.

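How is PSI calculated?

PSI = Σ (A - B) * ln(A / B), summed over all groups

where A is the % of records in a group of the scoring variable in the scoring sample and B is the % of records in the same group in the training sample, both expressed as proportions (for example, 0.10 for 10%).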

Steps

  1. Sort the scoring variable in descending order in the scoring sample
  2. Split the data into 10 or 20 groups (deciling) and apply the same group cut-offs to the training sample
  3. Calculate the % of records in each group based on the scoring sample
  4. Calculate the % of records in each group based on the training sample
  5. Calculate the difference between Step 3 and Step 4
  6. Take the natural log of (Step 3 / Step 4)
  7. Multiply Step 5 and Step 6
  8. Sum the values from Step 7 across all groups to get the PSI
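
The steps above can be sketched in Python with NumPy, with the per-group products summed into a single PSI value. This is a minimal sketch, not a definitive implementation: the function name, the `n_bins` and `eps` parameters, and the handling of empty groups are assumptions made for illustration. The group cut-offs are taken from the scoring sample, as in the steps above. Another common choice is to take them from the training sample.

```python
import numpy as np

def population_stability_index(training, scoring, n_bins=10, eps=1e-6):
    """PSI of one numeric variable, scoring sample vs. training sample."""
    training = np.asarray(training, dtype=float)
    scoring = np.asarray(scoring, dtype=float)

    # Steps 1-2: equal-frequency cut-offs from the scoring sample.
    edges = np.unique(np.quantile(scoring, np.linspace(0, 1, n_bins + 1)))
    inner = edges[1:-1]  # the first and last groups are open-ended

    # Steps 3-4: share of records per group in each sample.
    score_counts = np.bincount(
        np.searchsorted(inner, scoring, side="right"), minlength=len(inner) + 1
    )
    train_counts = np.bincount(
        np.searchsorted(inner, training, side="right"), minlength=len(inner) + 1
    )
    # Assumption: empty groups are floored at eps so the log stays finite.
    a = np.clip(score_counts / score_counts.sum(), eps, None)
    b = np.clip(train_counts / train_counts.sum(), eps, None)

    # Steps 5-7, then the sum over all groups.
    return float(np.sum((a - b) * np.log(a / b)))
```

Calling `population_stability_index(train_scores, new_scores)` on two arrays of model scores returns a single non-negative number that can be read against the rules that follow.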

Rules

  1. PSI < 0.1: No change. You can continue using the existing model.
  2. 0.1 <= PSI < 0.2: Slight change is required.
  3. PSI >= 0.2: Significant change is required. Ideally, you should not use this model any more.
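
These thresholds are easy to encode when PSI is monitored automatically. A small sketch follows; the function name and the returned labels are hypothetical.

```python
def interpret_psi(psi):
    """Map a PSI value to the rule-of-thumb bands listed above."""
    if psi < 0.1:
        return "no change"
    if psi < 0.2:
        return "slight change"
    return "significant change"
```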

To understand the cause of a change, we need to generate the characteristic analysis report.

Characteristic Analysis

It answers which variable is causing a shift in the population distribution. It compares the distribution of each independent variable in the scoring data set with its distribution in the development data set, and so detects shifts over time in the input variables that are submitted for scoring.

It helps to determine which changing variable is most influential in causing the model score shift.
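
One way to sketch such a report is to compute a PSI-style index for every input variable and rank the variables by it. The code below assumes numeric variables held in dictionaries of arrays. The names `characteristic_analysis` and `_psi`, and the choice of scoring-sample deciles as groups, are assumptions for illustration rather than a standard API.

```python
import numpy as np

def _psi(development, scoring, n_bins=10, eps=1e-6):
    # Group cut-offs from the scoring sample (an assumption of this sketch).
    edges = np.unique(np.quantile(scoring, np.linspace(0, 1, n_bins + 1)))
    inner = edges[1:-1]
    s = np.bincount(np.searchsorted(inner, scoring, side="right"),
                    minlength=len(inner) + 1)
    d = np.bincount(np.searchsorted(inner, development, side="right"),
                    minlength=len(inner) + 1)
    s = np.clip(s / s.sum(), eps, None)  # floor empty groups at eps
    d = np.clip(d / d.sum(), eps, None)
    return float(np.sum((s - d) * np.log(s / d)))

def characteristic_analysis(development, scoring):
    """Return (variable, index) pairs, most shifted variable first."""
    report = [
        (name,
         _psi(np.asarray(development[name], dtype=float),
              np.asarray(scoring[name], dtype=float)))
        for name in development
    ]
    return sorted(report, key=lambda row: row[1], reverse=True)
```

The variables at the top of the returned list are the first candidates to investigate when the model score itself has shifted.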

Most important: check the direction of impact due to model variable shifts.

Check the signs of the shifted attributes and the average values of those attributes compared to those from the previously scored population or development sample. This will indicate whether the model attribute shifts are increasing or decreasing the model scores.
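
As a rough illustration of this check, the sketch below assumes a linear (or logistic) model whose coefficients are known, and multiplies each coefficient by the change in the attribute's average value. The function name, the dictionary inputs, and the coefficient values in the usage note are all hypothetical.

```python
import numpy as np

def shift_direction(development, scoring, coefficients):
    """For each attribute, report the mean shift and its effect on the score.

    Assumes score = sum(coefficient * attribute), so the sign of
    coefficient * (scoring mean - development mean) gives the direction.
    """
    rows = []
    for name, coef in coefficients.items():
        delta = float(np.mean(scoring[name]) - np.mean(development[name]))
        impact = coef * delta
        if impact > 0:
            direction = "increases"
        elif impact < 0:
            direction = "decreases"
        else:
            direction = "no change"
        rows.append((name, delta, impact, direction))
    return rows
```

For example, with a positive coefficient of 2.0 on a hypothetical `utilization` attribute whose average rises between the development and scoring samples, the row for that attribute reports "increases", meaning the shift pushes model scores up.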
