The Predictive Power Score was developed by Florian Wetschoreck and the team at 8080 Labs in order to improve upon correlation metrics.
Correlation is limited because it misses non-linear relationships (for example, a quadratic relationship between daily temperature and theme park ticket sales, a step function relating an attraction's ticket price to the number of people waiting in line, or the Gaussian function behind the “Guess Your Weight” carnival game). A correlation matrix also misses any relationship involving categorical variables.
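To make the quadratic case concrete, here is a minimal sketch using only the Python standard library. The temperature and ticket sales numbers are invented for illustration, and the pearson helper is a hand-written function, not part of any library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: temperature measured as an offset from a comfortable
# midpoint. Sales peak at the midpoint and fall off on both sides.
temp_offset = list(range(-10, 11))
ticket_sales = [500 - 4 * t ** 2 for t in temp_offset]

# Sales are fully determined by temperature, yet the correlation is ~0
# because the rising and falling halves of the curve cancel out.
r_quadratic = pearson(temp_offset, ticket_sales)

# For comparison, a straight-line relationship gives a correlation of ~1.
r_linear = pearson(temp_offset, [3 * t + 20 for t in temp_offset])

print(f"quadratic: {r_quadratic:.3f}, linear: {r_linear:.3f}")
```

A correlation near zero here does not mean the two columns are unrelated, only that the relationship is not linear.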
Moreover, correlation cannot capture the asymmetry of a relationship, because it is symmetric by definition: the correlation of A with B always equals the correlation of B with A. For example, knowing a customer’s favorite part of the park might not predict their favorite ride, but knowing their favorite ride would predict their favorite part of the park much more strongly.
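The ride and park example can be sketched with a deliberately naive measure: for each value of one column, always guess the most common value of the other column and count how often that guess is right. This is not how the Predictive Power Score is calculated. It is only an illustration of why direction matters, and the survey rows and the best_guess_accuracy helper are made up for the purpose.

```python
from collections import Counter, defaultdict

def best_guess_accuracy(pairs):
    """Fraction of rows guessed correctly when, for each value of the first
    element, we always guess its most common second element (in-sample)."""
    by_key = defaultdict(Counter)
    for key, target in pairs:
        by_key[key][target] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_key.values())
    return correct / len(pairs)

# Hypothetical survey rows: (favorite ride, favorite part of the park).
survey = [
    ("Log Flume", "Water Zone"),
    ("Log Flume", "Water Zone"),
    ("Wave Pool", "Water Zone"),
    ("Lazy River", "Water Zone"),
    ("Roller Coaster", "Thrill Zone"),
    ("Roller Coaster", "Thrill Zone"),
    ("Drop Tower", "Thrill Zone"),
    ("Carousel", "Kids Zone"),
]

# Each ride sits in exactly one part of the park, so ride -> area is easy.
ride_to_area = best_guess_accuracy(survey)

# Each part of the park holds several rides, so area -> ride is much harder.
area_to_ride = best_guess_accuracy([(area, ride) for ride, area in survey])

print(ride_to_area, area_to_ride)
```

A single symmetric number cannot express that one direction is easy to predict while the other is not.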
By contrast, the Predictive Power Score can detect non-linear effects, automatically encodes categorical variables, and quantifies asymmetry. It computes the predictive relationship for each ordered pair of columns and returns a score ranging from 0 (no predictive power) to 1 (perfect predictive power).
To use it, simply import ppscore as pps and call pps.matrix(df).