Conversation
| raw=False, | ||
| samples=1, | ||
| padding=0, | ||
| multivariate_allowed_symbols = [], |
There was a problem hiding this comment.
add multivariate_allowed_symbols to the docstrings above
| @@ -0,0 +1,72 @@ | |||
| from .multivariate_formatting import MultivariateFormattingMethod | |||
There was a problem hiding this comment.
we rely on absolute imports rather than relative in our packaging:
| from .multivariate_formatting import MultivariateFormattingMethod | |
| from sigllm.primitives.formatting.multivariate_formatting import MultivariateFormattingMethod |
| from .multivariate_formatting import MultivariateFormattingMethod | ||
| import numpy as np |
There was a problem hiding this comment.
typically we follow the following structure for imports:
# python inherent libraries (e.g. import os)
# 3rd party libraries (e.g. import numpy)
# this library (e.g. import sigllm)this is google python style coding, so in your case it will be:
import numpy as np
from sigllm.primitives.formatting.multivariate_formatting import MultivariateFormattingMethod| if __name__ == "__main__": | ||
| method = DigitInterleave(digits_per_timestamp=3) | ||
| method.test_multivariate_formatting_validity(verbose=False) | ||
| errs, y_hat, y = method.run_pipeline(return_y_hat=True) | ||
| print(errs) | ||
| print(y_hat) | ||
| print(y) No newline at end of file |
There was a problem hiding this comment.
after you finish testing, this can be removed.
| }) | ||
|
|
||
|
|
||
| def run_pipeline(self, data=create_test_data(), |
There was a problem hiding this comment.
what's the purpose of this method? It can be removed or moved to utils since it doesn't belong in formatting
There was a problem hiding this comment.
can you remove this file from the PR? I don't think it's related.
There was a problem hiding this comment.
rename it to multivariate-detector-pipeline
There was a problem hiding this comment.
Can you make it an end-to-end tutorial of using the pipeline? In addition to the new formatting, you can have a full detection process and show the anomalies.
sarahmish
left a comment
There was a problem hiding this comment.
Overall the PR is great in terms of functionality but still needs to be cleaned up, I have a few comments about different aspects.
1. Unittests
Unittest are a great way to ensure the validity of your code and making sure that overtime the function is behaving as expected even if a underlying dependency changes its behavior, we can immediately catch it when we have solid unittests.
There are existing tests provided under tests/primitives that you can mimic to create your tests, typically I like there to be 3 blocks in a function:
def test_example():
# setup, here you create your variables and instances.
# run, here you run the function you want to test.
# assert, here you check that the expected value matches the output.This will make readability of the test function easier.
2. Docstrings
A couple of things should be considered regarding the docstrings for this and other PRs as well:
- The
-can be removed when listing theArgs, so line starts directly with the argument name. - A
Returnsblock should be added listing the return type and a description in the next line. - A blank line should always exist between the first line and the rest, and also before Args and Returns
Here's the recommended way of having docstrings.
"""Short description in a single line ending with a dot.
Longer description that can span across multiple lines. Longer
description that can span across multiple lines. Longer description
that can span across multiple lines.
Args:
arg_name (arg_type):
argument description.
arg_name (arg_type):
Argument description that spans across multiple lines. Argument
description that spans across multiple lines. Argument description
that spans across multiple lines.
Returns:
return_type:
description of the returned objects
3. Unnecessary files
This applies to this PR only. tutorials/pipelines/detector-pipeline.ipynb should not be changed in this PR.
| padding (int): | ||
| Additional padding token to forecast to reduce short horizon predictions. | ||
| Default to `0`. | ||
| """ |
There was a problem hiding this comment.
multivariate_allowed_symbols should be added to the Args docstrings here
| results_by_step = {step: [] for step in steps_ahead} | ||
|
|
||
| for window in X: | ||
| step_samples = {step: [] for step in steps_ahead} |
There was a problem hiding this comment.
If you want to setup a dictionary with an empty list, you can do so using:
from collections import defaultdict
step_samples = defaultdict(list)Then any key in the dictionary will have an empty list by default.
| }) | ||
|
|
||
|
|
||
| def test_multivariate_formatting_validity(method, verbose=False): |
There was a problem hiding this comment.
This function, along with create_test_data, can be moved to unit tests since that's what it's doing.
Create a file in test/primitives/formatting/test_{file_name}.py e.g. test/primitives/formatting/test_json_format.py.
There you can test if format_as_string and format_as_integer are returning the expecting output. I would strongly recommend doing this for every formatting method.
Resolve #57
Added a multivariate detector pipeline with various formatting methods.