Refactor averaging.py and manipulations.py to use quantities#207
Refactor averaging.py and manipulations.py to use quantities#207DrPaulSharp wants to merge 6 commits into
Conversation
b149a6b to
d0e7867
Compare
d0e7867 to
8d3eb43
Compare
krzywon
left a comment
There was a problem hiding this comment.
Most of my comments are related to three concepts.
- The differences between
three_dimvstwo_dimdata and how the former would be handled in the current implementation. Ideally, the Qz values would be used if they exist. - Variances and taking the square root of them versus using the bare values.
- x * x vs x ** 2
Overall, this is a good port, but bypasses the mesh paradigm added in the sasdata.slicing package. Where do you see that coming into play?
| dqx_data = np.sqrt(data2d._data_contents["Qx"].variance.value) | ||
| dqy_data = np.sqrt(data2d._data_contents["Qy"].variance.value) |
There was a problem hiding this comment.
Why the square root of the variance? Aren't the variances already the dq values?
There was a problem hiding this comment.
From line 1579 of quantity.py we have: self._variance = standard_error**2, where standard_error is the input of a Quantity object. I'll look through and see if there is the opportunity to refactor and use variances here if possible, but my understanding is that this is the correct transcription of the existing code.
There was a problem hiding this comment.
Okay, then you are doing the correct operation here, but from a fundamental standpoint, by taking the square of the standard error, aren't we changing the value of the data that should be immutable? And be squaring the data, how do we know which root is correct, the positive or the negative. Also, if future developers aren't aware of this, taking the raw variance will create an issue in our error propagation which will be difficult to debug.
Fundamentally, I'm not sure I agree with the underlying structure for many reasons, but this is something we can discuss.
There was a problem hiding this comment.
Yes, I'm having the same thoughts. I don't understand why the Quantity object takes in an error and records a variance. I'll ask around and see what I can find, and we can discuss.
There was a problem hiding this comment.
Quantity does have a standard_deviation method, which is:
def standard_deviation(self) -> "Quantity":
return self.variance**0.5
which just seems to add to the mystery of why it's structured like this.
There was a problem hiding this comment.
In error propagation, the variances are used most often, so I think I understand the reasoning now, but we should be storing the standard_error as-presented, for reproducibility reasons.
| mask_all = data2D.mask[finite_mask] | ||
| data_all = data2D.ordinate.value[finite_mask] | ||
| qx_all = data2D._data_contents["Qx"].value[finite_mask] | ||
| qy_all = data2D._data_contents["Qy"].value[finite_mask] |
There was a problem hiding this comment.
You shouldn't be peeking at the internals of SasData from outside the class. You could make data_contents a public attribute. Since users of the class need to compose data_content for the constructor this is appropriate. Or use data2D.abscissae to get an array of (qx, qy) points.
Note: data2D.abscissae makes a copy but data2D.ordinate does not. This will trip somebody up some day. Also, properties should not be doing a lot of work. Maybe turn abscissae into function call, or maybe cache it.
Your type system allows calling with 1D data. Are you sure you don't want a subclass of SasData that you know to be 2D? Then you can have qx,qy accessors and the (qx,qy) -> (q, phi) transforms built into the class.
For q_all below, you already have masked qx_all and qy_all. If you are not using abscissae with the 2-norm or a SasData2D subclass with a q method you can use qx_all and qy_all to calculate q.
Moving the masking operation into SasData seems appropriate. It is an operation that many users of the data will want. You want to be clear on the copy semantics: is it masking in place, or returning a masked copy, even if the copy contains everything.
Comments on the preexisting code:
It is suspicious that this slicer has an isfinite mask but others do not.
Why the _all tag on the variable names?
There was a problem hiding this comment.
You shouldn't be peeking at the internals of SasData from outside the class. You could make
data_contentsa public attribute. Since users of the class need to composedata_contentfor the constructor this is appropriate. Or usedata2D.abscissaeto get an array of (qx, qy) points.Note:
data2D.abscissaemakes a copy butdata2D.ordinatedoes not. This will trip somebody up some day. Also, properties should not be doing a lot of work. Maybe turnabscissaeinto function call, or maybe cache it.
This PR has shown that we need to refactor the SasData object. I am planning on doing that next in a separate PR.
Your type system allows calling with 1D data. Are you sure you don't want a subclass of SasData that you know to be 2D? Then you can have qx,qy accessors and the (qx,qy) -> (q, phi) transforms built into the class.
Given the SasData Design Decisions, we do want the Data to be duck-typed. At present SasData has a dataset_type quantity that we can test to ascertain whether or not data is 2D. I'll keep this in mind going forward.
For
q_allbelow, you already have maskedqx_allandqy_all. If you are not using abscissae with the 2-norm or a SasData2D subclass with a q method you can use qx_all and qy_all to calculate q.
Good plan, makes sense.
Comments on the preexisting code:
It is suspicious that this slicer has an isfinite mask but others do not.
Why the
_alltag on the variable names?
The other slicers get their isfinite masks through a call to the validate_and_assign_data routine in roi.py. This slicer is done in this way instead to preserve the ismask flag, which was the call made in #47. No idea on the _all tag. I'll remove it now, I wanted to keep things like this in place at first to make reviewing the ported code easier.
There was a problem hiding this comment.
No quality gates enabled for this code.
See analysis details in CodeScene
Quality Gate Profile: Custom Configuration
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.
|
Thanks @krzywon and @pkienzle for your reviews. I propose merging this PR soon, and working on a new PR to refactor the Good point about the |
This PR updates the modules refactored in #47 to use
Quantityrather thanData1D/Data2Dobjects. It should serve as an example of how we use these objects in the refactoring project.