Modeling extreme cases with peaks over threshold - 1
Overview of extreme value statistics
What we want to model
Let's say you want to model rainfall so heavy that it occurs only once every 250 years. That much rainfall would cause flooding, which in turn would cause damage for shop owners and vehicle owners.
Rainfall is not the only kind of extreme event. A stock market crash is another example, as is any event that rarely happens but has a huge impact when it does.
How we can model extreme cases
When assessing this kind of risk, we typically use Value at Risk (VaR) methods. VaR is the loss level that is exceeded only with a given small probability. VaR methods can be parametric or non-parametric, and this article focuses on parametric ones.
"Parametric" means that the model is described by a small set of parameters. In other words, we want to estimate parameters such that the resulting distribution represents the data well.
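To make the distinction concrete, here is a minimal sketch that computes a 99% VaR both ways. The data are simulated losses, and the normal distribution is only an assumption chosen for illustration; it is not a recommendation for extreme-value work.

```python
import random
import statistics

random.seed(0)
# Hypothetical daily losses (positive = loss); simulated, not real data.
losses = [random.gauss(0.0, 1.0) for _ in range(5000)]

confidence = 0.99

# Parametric VaR: estimate the parameters of an assumed distribution
# (here a normal), then read the quantile off the fitted model.
mu = statistics.fmean(losses)
sigma = statistics.stdev(losses)
var_parametric = statistics.NormalDist(mu, sigma).inv_cdf(confidence)

# Non-parametric (historical) VaR: no model, just the empirical quantile.
sorted_losses = sorted(losses)
index = min(int(confidence * len(sorted_losses)), len(sorted_losses) - 1)
var_historical = sorted_losses[index]

print(f"parametric VaR: {var_parametric:.3f}")
print(f"historical VaR: {var_historical:.3f}")
```

The two numbers are close here only because the simulated data really are normal. With heavy-tailed data they can differ noticeably, which is the motivation for the tail-specific methods that follow.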
For modeling extreme cases, we could choose between two methods.
- Block maxima
- Peaks over threshold
Both methods aim to fit a distribution to the extreme cases, that is, to the tail of the data.
A normal distribution is not a good fit here because it puts very little weight on data points far to the right or left. So instead of fitting the whole distribution, we cut off the tail and model it on its own.
Block maxima
The block maxima method takes the maximum value within each time span (block), e.g. each year. The maxima extracted this way tend to follow a GEV (Generalized Extreme Value) distribution. However, the Value at Risk estimated with this method tends to be unstable, because it depends on the chosen time span.
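The following is a rough sketch of block maxima, assuming NumPy and SciPy are available. The rainfall series is simulated from a gamma distribution, and the names `daily`, `annual_max`, and `level_250` are hypothetical. Each row is treated as one year-long block.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
# Hypothetical daily rainfall for 50 "years" of 365 days (simulated data).
daily = rng.gamma(shape=0.5, scale=10.0, size=(50, 365))

# Block maxima: keep only the largest value in each block (here, each year).
annual_max = daily.max(axis=1)

# Fit a GEV distribution to the maxima by maximum likelihood.
# Note: SciPy's shape parameter c is the negative of the shape parameter
# xi used in most extreme value textbooks.
c, loc, scale = genextreme.fit(annual_max)

# Level exceeded on average once every 250 blocks (the "250-year" level).
return_period = 250
level_250 = genextreme.ppf(1 - 1 / return_period, c, loc=loc, scale=scale)

print(f"GEV fit: c={c:.3f}, loc={loc:.2f}, scale={scale:.2f}")
print(f"estimated 250-year level: {level_250:.2f}")
```

Only 50 of the 18,250 simulated observations enter the fit. Changing the block length changes both the number of maxima and the fitted parameters, which is one way to see the sensitivity to the time span.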
We will focus on a more stable method: peaks over threshold.
Peaks over threshold
As the name suggests, this method sets a threshold and fits a distribution to the data points that exceed it.
This makes choosing the right threshold a critical step of the method.
Here are some of the approaches proposed in the literature for choosing an appropriate threshold.
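The basic workflow can be sketched as follows. This article has not yet said which distribution is fitted to the exceedances; the sketch assumes the generalized Pareto distribution (GPD), which is the standard choice for peaks over threshold. The data are simulated, and the 95% quantile threshold is an arbitrary assumption for illustration.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
# Hypothetical heavy-tailed losses (Student t, 4 degrees of freedom).
losses = rng.standard_t(df=4, size=10_000)

# Assumed threshold: the empirical 95% quantile of the losses.
threshold = np.quantile(losses, 0.95)

# Keep only the points above the threshold, measured from the threshold.
excesses = losses[losses > threshold] - threshold

# Fit a GPD to the excesses, with the location fixed at zero.
xi, _, beta = genpareto.fit(excesses, floc=0)

# Tail quantile (VaR) at level q, combining the empirical exceedance
# rate n_u / n with the fitted GPD for the part above the threshold.
q = 0.999
n, n_u = losses.size, excesses.size
p_conditional = 1 - (1 - q) * n / n_u
var_q = threshold + genpareto.ppf(p_conditional, xi, loc=0, scale=beta)

print(f"threshold={threshold:.3f}, exceedances={n_u}")
print(f"GPD fit: xi={xi:.3f}, beta={beta:.3f}")
print(f"estimated {q:.1%} VaR: {var_q:.3f}")
```

Moving `threshold` up leaves fewer points to fit, while moving it down brings in points that are not really part of the tail, so the fitted `xi` and `beta` shift with it.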
| Paper | Proposed way of setting the threshold |
|---|---|
| Davison and Smith (1990) / Neftci (2000) | Set it to 1.65 × standard deviation |
| Danielsson and de Vries (1997) | Set the threshold based on MSE (mean squared error) |
| Coles S. (2001) | Use the mean excess plot and the Hill plot |
| Massahiro F. and Yasufumi S. (2002) | Value that cuts off the top 4 to 6% of the total data |
| Christoffersen (2003) | Value that cuts off the top 5% of the total data |
| Embrechts (2003) | Point from which the mean excess plot is linear |
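A few of these rules are easy to try out. The sketch below uses simulated data and hypothetical names (`u_percentile`, `u_std`, `mean_excess`). It reads the standard deviation rule literally as 1.65 times the sample standard deviation, which implicitly assumes roughly zero-mean data. The mean excess at a threshold `u` is the average amount by which observations above `u` exceed it; for GPD-distributed tails this quantity is linear in `u`, which is why a linear stretch of the plot suggests a usable threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical heavy-tailed losses (simulated, not real data).
losses = rng.standard_t(df=4, size=10_000)

# Fixed-fraction rule: threshold that leaves the largest 5% above it.
u_percentile = np.quantile(losses, 0.95)

# Standard deviation rule, read literally as 1.65 * sample std
# (an assumption: the data are treated as roughly zero-mean).
u_std = 1.65 * losses.std(ddof=1)


def mean_excess(data, u):
    """Average amount by which observations above u exceed u."""
    exceed = data[data > u]
    return (exceed - u).mean()


# Mean excess values over a grid of candidate thresholds. Plotting
# me_values against candidates gives the mean excess plot.
candidates = np.quantile(losses, np.linspace(0.80, 0.99, 20))
me_values = np.array([mean_excess(losses, u) for u in candidates])

print(f"5% rule threshold:    {u_percentile:.3f}")
print(f"1.65 * std threshold: {u_std:.3f}")
for u, me in zip(candidates[::5], me_values[::5]):
    print(f"u={u:.3f}  mean excess={me:.3f}")
```

The grid stops at the 99% quantile because higher thresholds leave very few points, and the mean excess estimate becomes noisy there.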
The hard part of modeling extreme values is that there are only a few data points in the tail. This makes it hard to decide where the tail should be 'cut'. It is therefore important to find distribution parameters that give stable results with minimal loss.
In the next post, we'll look into the mathematical details of peaks over threshold.