Calculation of outlier score in series_outlier method #146

@Lopa2016

I want to reimplement Kusto's series_outliers() function in Python, and I used the following code:

import pandas as pd
import numpy as np
from scipy.stats import norm

# Load the data into a DataFrame
data = {
    'series': [67.95675, 58.63898, 33.59188, 4906.018, 5.372538, 702.1194, 0.037261, 11161.05, 1.403496, 100.116]
}
df = pd.DataFrame(data)

# Function to calculate the outlier score based on custom percentiles
def custom_percentile_outliers(series, p_low=10, p_high=90):
    # Calculate custom percentiles
    percentile_low = np.percentile(series, p_low)
    percentile_high = np.percentile(series, p_high)

    # Calculate Z-scores for the percentiles assuming normal distribution
    z_low = norm.ppf(p_low / 100)
    z_high = norm.ppf(p_high / 100)

    # Calculate normalization factor
    normalization_factor = (2 * z_high - z_low) / (2 * z_high - 2.704)

    # Calculate outliers score
    return series.apply(
        lambda x: (x - percentile_high) / (percentile_high - percentile_low) * normalization_factor
        if x > percentile_high
        else ((x - percentile_low) / (percentile_high - percentile_low) * normalization_factor
              if x < percentile_low else 0)
    )

# Apply the custom percentile outlier scoring function
df['outliers'] = custom_percentile_outliers(df['series'], p_low=10, p_high=90)

# Display the DataFrame with outliers
print(df)

I get the following results for the series:

         series   outliers
0     67.956750   0.000000
1     58.638980   0.000000
2     33.591880   0.000000
3   4906.018000   0.000000
4      5.372538   0.000000
5    702.119400   0.000000
6      0.037261   0.006067
7  11161.050000 -27.776847
8      1.403496   0.000000
9    100.116000   0.000000

With the series_outliers() function, however, I get the results below:

[image: series_outliers() results]

I referred to #136 on GitHub and also tried implementing and manually calculating the scores with the help of the solution given on Stack Overflow: "How does Kusto series_outliers() calculate anomaly scores?"

I am probably going wrong in the normalization factor calculation. It would be great if someone could help.
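
A quick check of the intermediate values may help narrow this down. The sketch below recomputes them using only the standard library. It rests on two assumptions: linear_percentile is a hypothetical helper written to mirror the default linear interpolation of np.percentile, and NormalDist().inv_cdf stands in for norm.ppf. With p_low=10 and p_high=90, the denominator 2 * z_high - 2.704 comes out negative, so the normalization factor is negative and flips the sign of every non-zero score. That would explain why the largest value scores about -27.78 rather than a positive number.

```python
from statistics import NormalDist

series = [67.95675, 58.63898, 33.59188, 4906.018, 5.372538,
          702.1194, 0.037261, 11161.05, 1.403496, 100.116]


def linear_percentile(values, p):
    """Hypothetical helper: percentile with linear interpolation between ranks."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100      # fractional rank of the percentile
    lo = int(pos)                      # rank just below the position
    hi = min(lo + 1, len(xs) - 1)      # rank just above, clamped to the end
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


p_low, p_high = 10, 90
percentile_low = linear_percentile(series, p_low)
percentile_high = linear_percentile(series, p_high)

# Z-scores of the two percentiles under a standard normal distribution
z_low = NormalDist().inv_cdf(p_low / 100)
z_high = NormalDist().inv_cdf(p_high / 100)

# Same expression as in custom_percentile_outliers above
denominator = 2 * z_high - 2.704
normalization_factor = (2 * z_high - z_low) / denominator

# Score of the largest value, which lies above percentile_high
max_score = ((max(series) - percentile_high)
             / (percentile_high - percentile_low) * normalization_factor)

print(f"percentile_low       = {percentile_low:.4f}")
print(f"percentile_high      = {percentile_high:.4f}")
print(f"z_low, z_high        = {z_low:.4f}, {z_high:.4f}")
print(f"denominator          = {denominator:.4f}")
print(f"normalization_factor = {normalization_factor:.4f}")
print(f"score of max value   = {max_score:.4f}")
```

This only confirms where the sign flip comes from; it does not show what the correct normalization should be.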
