BeeFlix is using percentiles: why and how


What are percentiles?

A percentile is a statistical measure indicating the value below which a given percentage of the observations in a group fall. For example, the 20th percentile is the value (or score) below which 20% of the observations may be found (source: Wikipedia).
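To make the definition concrete, here is a minimal sketch using the nearest-rank method, which is only one of several common ways to compute a percentile. The `percentile` helper and the `scores` data are hypothetical illustrations, not BeeFlix code.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p% of the observations are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # Rank is 1-based; multiply before dividing to limit rounding error.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical observations (already sorted here for readability).
scores = [12, 15, 18, 21, 24, 30, 35, 41, 50, 95]

p20 = percentile(scores, 20)  # 2 of the 10 scores are <= 15
p90 = percentile(scores, 90)  # 9 of the 10 scores are <= 50
print(p20, p90)
```

Other methods interpolate between neighbouring values, so results can differ slightly on small data sets.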

Why percentiles?

Mean and median tend to hide outliers. In contrast, the max is easily distorted by a single outlier. You can find explanations and examples here and here.
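The following sketch illustrates the point with made-up numbers: 100 hypothetical samples containing a few busy readings and one extreme spike. It uses only Python's standard `statistics` module; the data is an assumption for illustration, not a BeeFlix measurement.

```python
import statistics

# Hypothetical samples: 95 quiet readings, 4 busy ones, 1 extreme spike.
samples = [10] * 95 + [60] * 4 + [500]

mean = statistics.mean(samples)      # pulled only slightly upward by the spike
median = statistics.median(samples)  # ignores the busy readings entirely
peak = max(samples)                  # determined by the single spike

# quantiles(n=100) returns 99 cut points; the last one is the 99th percentile.
cut_points = statistics.quantiles(samples, n=100, method="inclusive")
p99 = cut_points[98]

print(f"mean={mean} median={median} max={peak} p99={p99}")
```

Here the mean and the median stay close to the quiet readings, the max reports only the spike, and the 99th percentile lands between the busy readings and the spike, which is why showing several percentiles side by side gives a fuller picture.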


BeeFlix percentiles with the analyze module

PostgreSQL Performance Analyze module
Each resource is displayed with percentiles. In the above example, we can see that at 18h the network throughput is fairly stable (no big jump between the max, the 99th and the 50th percentiles).

On the other hand, you can see in the following:

PostgreSQL Performance IOs percentiles

that the I/O throughput is more than 5.1 GB per second about 1% of the time at 22h and is less than 2.5 GB per second 90% of the time.

In the analyze module, the percentiles help to spot outliers and to get a clear picture of the resource consumption of your databases.


BeeFlix percentiles with the simulation module


PostgreSQL Performance heatmap simulation

In the above example, we can see that an Exadata X7-2 1/8 (with 96 CPUs) is not able to handle the max CPU load observed: 99 CPUs are needed on Thursday at 22h.

Should I try another Exadata configuration? What risk do I take if I order this configuration with 96 CPUs?

To answer the last question, let’s have a look at the 99th and 90th percentiles.


99th percentile


PostgreSQL Performance 99th percentiles

90th percentile


PostgreSQL Performance 90th percentiles

It means that, on an Exadata X7-2 1/8 with 96 CPUs:

  • 90% of the time the databases would use less than 39 CPUs.
  • 99% of the time the databases would use less than 84 CPUs (on Thursday at 22h, and even less during the other hours).
  • The databases would request more than 96 CPUs only on Thursday at 22h, and less than 1% of the time.
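The reasoning behind these bullets can be sketched as a simple "share of time that fits" calculation. The `fit_ratio` helper and the sample values below are hypothetical and only illustrate the idea; they are not how the simulation module is implemented, nor real measurements.

```python
def fit_ratio(demand_samples, capacity):
    """Return the fraction of samples whose CPU demand fits within capacity."""
    if not demand_samples:
        raise ValueError("demand_samples must not be empty")
    fitting = sum(1 for demand in demand_samples if demand <= capacity)
    return fitting / len(demand_samples)


# Hypothetical CPU demand samples for one hour slot (e.g. Thursday 22h).
thursday_22h = [30] * 90 + [80] * 9 + [99]

capacity = 96  # CPUs available on the target machine
ratio = fit_ratio(thursday_22h, capacity)
risk = 1 - ratio

print(f"fits {ratio:.0%} of the time, overflow risk {risk:.0%}")
```

With these assumed samples, only the single 99-CPU reading exceeds the 96 CPUs available, so the workload fits 99% of the time in that slot.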

So what?


  • Thanks to the percentiles, you have a clear view of your databases' workload in the analyze module.
  • In the simulation module, the percentiles show the percentage of time the workload would fit on the destination machine. They also show the risk you take if the workload does not fit 100% of the time.
  • The percentiles are also used in the optimization module.

BeeFlix team.