Multivariate detector #52

Open
abaranov25 wants to merge 18 commits into sintel-dev:master from abaranov25:Multivariate-Detector

Collaborator

@abaranov25 abaranov25 commented Dec 10, 2025

Resolve #57
Added a multivariate detector pipeline with various formatting methods.

raw=False,
samples=1,
padding=0,
multivariate_allowed_symbols = [],
Contributor

Add multivariate_allowed_symbols to the docstring above.

@@ -0,0 +1,72 @@
from .multivariate_formatting import MultivariateFormattingMethod
Contributor

We rely on absolute imports rather than relative ones in our packaging:

Suggested change
- from .multivariate_formatting import MultivariateFormattingMethod
+ from sigllm.primitives.formatting.multivariate_formatting import MultivariateFormattingMethod

Comment on lines +1 to +2
from .multivariate_formatting import MultivariateFormattingMethod
import numpy as np
Contributor

We typically structure imports in three groups:

# Python standard library (e.g. import os)

# third-party libraries (e.g. import numpy)

# this library (e.g. import sigllm)

This follows the Google Python style guide, so in your case it will be:

import numpy as np

from sigllm.primitives.formatting.multivariate_formatting import MultivariateFormattingMethod

Comment on lines +66 to +72
if __name__ == "__main__":
    method = DigitInterleave(digits_per_timestamp=3)
    method.test_multivariate_formatting_validity(verbose=False)
    errs, y_hat, y = method.run_pipeline(return_y_hat=True)
    print(errs)
    print(y_hat)
    print(y)
Contributor

After you finish testing, this can be removed.

})


def run_pipeline(self, data=create_test_data(),
Contributor

What's the purpose of this method? It can be removed or moved to utils, since it doesn't belong in formatting.

Contributor

Can you remove this file from the PR? I don't think it's related.

Contributor

Rename it to multivariate-detector-pipeline.

Contributor

Can you make it an end-to-end tutorial of using the pipeline? In addition to the new formatting, you can have a full detection process and show the anomalies.

@abaranov25 abaranov25 requested a review from sarahmish February 17, 2026 23:05
@abaranov25 abaranov25 self-assigned this Feb 19, 2026
Contributor

@sarahmish sarahmish left a comment

Overall, the PR is great in terms of functionality but still needs to be cleaned up. I have a few comments about different aspects.

1. Unit tests

Unit tests are a great way to ensure the validity of your code and to make sure that a function keeps behaving as expected over time. If an underlying dependency changes its behavior, solid unit tests let us catch it immediately.

There are existing tests under tests/primitives that you can mimic to create yours. I typically like a test function to have three blocks:

def test_example():
    # setup, here you create your variables and instances.

    # run, here you run the function you want to test.

    # assert, here you check that the expected value matches the output.

This makes the test function easier to read.
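
A minimal sketch of the three-block layout described above. The `scale_to_int` helper is hypothetical and is not part of this PR; only the setup/run/assert structure is the point.

```python
def scale_to_int(values, decimals=2):
    """Hypothetical helper: shift floats to start at zero and scale to integers."""
    minimum = min(values)
    return [int(round((v - minimum) * 10**decimals)) for v in values]


def test_scale_to_int():
    # setup
    values = [1.5, 2.25, 1.0]
    expected = [50, 125, 0]

    # run
    result = scale_to_int(values, decimals=2)

    # assert
    assert result == expected
```

Keeping each block visually separate makes it easy to see which inputs were used and which output is being checked when a test fails.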

2. Docstrings

A couple of things should be considered regarding the docstrings for this and other PRs as well:

  • The - can be removed when listing the Args, so that each line starts directly with the argument name.
  • A Returns block should be added, listing the return type with a description on the next line.
  • A blank line should always exist between the first line and the rest, and also before Args and Returns.

Here's the recommended docstring format:

"""Short description in a single line ending with a dot.

Longer description that can span across multiple lines. Longer
description that can span across multiple lines. Longer description
that can span across multiple lines.

Args:
    arg_name (arg_type):
        argument description.
    arg_name (arg_type):
        Argument description that spans across multiple lines. Argument
        description that spans across multiple lines. Argument description
        that spans across multiple lines.

Returns:
    return_type:
        Description of the returned objects.
"""
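
Applied to a concrete function, the recommended format might look like the sketch below. The `clip_values` function is a hypothetical example and does not exist in the PR; it is only there to carry the docstring.

```python
def clip_values(values, lower, upper):
    """Clip every value to a closed interval.

    Values below ``lower`` are replaced by ``lower`` and values above
    ``upper`` are replaced by ``upper``. The input list is not modified.

    Args:
        values (list):
            Numbers to clip.
        lower (float):
            Smallest value allowed in the output.
        upper (float):
            Largest value allowed in the output.

    Returns:
        list:
            A new list with every element inside ``[lower, upper]``.
    """
    return [min(max(v, lower), upper) for v in values]
```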

3. Unnecessary files

This applies to this PR only. tutorials/pipelines/detector-pipeline.ipynb should not be changed in this PR.

padding (int):
    Additional padding token to forecast to reduce short horizon predictions.
    Default to `0`.
"""
Contributor

multivariate_allowed_symbols should be added to the Args docstrings here

results_by_step = {step: [] for step in steps_ahead}

for window in X:
    step_samples = {step: [] for step in steps_ahead}
Contributor

If you want to set up a dictionary whose values default to an empty list, you can do so using:

from collections import defaultdict

step_samples = defaultdict(list)

Then any key in the dictionary will have an empty list by default.
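
A short sketch of how the two approaches differ in practice. The `steps_ahead` values here are illustrative and not taken from the PR.

```python
from collections import defaultdict

steps_ahead = [1, 2, 3]  # illustrative values, not taken from the PR

# Explicit initialisation: every step is a key from the start.
explicit = {step: [] for step in steps_ahead}

# defaultdict: a key is created, with an empty list, on first access.
step_samples = defaultdict(list)
step_samples[1].append(0.5)
step_samples[3].append(0.7)

print(sorted(explicit))      # [1, 2, 3]
print(sorted(step_samples))  # [1, 3]
```

One thing to keep in mind: a defaultdict only contains the keys that have been accessed. If later code expects an entry for every step, it is safer to loop over `steps_ahead` than over the dictionary itself.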

})


def test_multivariate_formatting_validity(method, verbose=False):
Contributor

This function, along with create_test_data, can be moved to the unit tests, since testing is what it does.
Create a file named test/primitives/formatting/test_{file_name}.py, e.g. test/primitives/formatting/test_json_format.py.

There you can test whether format_as_string and format_as_integer return the expected output. I would strongly recommend doing this for every formatting method.
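
As a rough sketch of what such a test file could contain: the `ToyFormat` class below is a hypothetical stand-in, and its method signatures and string layout are assumptions, not the actual API of the formatting methods in this PR. Only the method names format_as_string and format_as_integer come from the comment above.

```python
class ToyFormat:
    """Hypothetical stand-in for a multivariate formatting method."""

    def format_as_string(self, X):
        # Assumed layout: ',' between dimensions, ';' between timestamps.
        return ';'.join(','.join(str(int(v)) for v in row) for row in X)

    def format_as_integer(self, text):
        # Inverse of format_as_string under the assumed layout.
        return [[int(v) for v in row.split(',')] for row in text.split(';')]


def test_format_as_string():
    # setup
    method = ToyFormat()
    X = [[1, 20], [3, 40]]

    # run
    result = method.format_as_string(X)

    # assert
    assert result == '1,20;3,40'


def test_format_round_trip():
    # setup
    method = ToyFormat()
    X = [[1, 20], [3, 40]]

    # run
    result = method.format_as_integer(method.format_as_string(X))

    # assert
    assert result == X
```

A round-trip test like the second one is a cheap way to check that the two methods stay consistent with each other when either one changes.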

Development

Successfully merging this pull request may close these issues.

Multivariate Detector Pipeline

2 participants