-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-10889: [Rust] [Proposal] Add guidelines about usage of unsafe
#8901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
unsafeunsafe
|
This is funny timing. I just filed ARROW-10889 for this and then saw this PR! |
Codecov Report
@@ Coverage Diff @@
## master #8901 +/- ##
==========================================
+ Coverage 75.48% 83.26% +7.78%
==========================================
Files 181 195 +14
Lines 41649 48066 +6417
==========================================
+ Hits 31439 40023 +8584
+ Misses 10210 8043 -2167
Continue to review full report at Codecov.
|
andygrove
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Thanks @jorgecarleitao !
|
It would be really cool to have a GitHub Action that detected |
|
Very cool proposal. Will take some time soon to read it in detail. I think currently there is a lot of code violating what is proposed, so that's clearly needed 👍 I think it also makes sense to keep track of the amount of Also I think the unsafe code can /should be more tested, e.g. by having more |
rust/arrow/README.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe 10% is a bit arbitrary and depends on context? Maybe it should.bd demonstrated / be likely that it has an impact on real world usage of arrow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this wording would be stronger without a specific bound. I would rather say something like "usage of unsafe for performance reasons is justified only when all other alternatives have been exhausted and the performance benefits are sufficiently large. Performance benefits should be quantified with a bench".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the sentiment.
My concern is that I am not sure I would like to maintain a complex unsafe code for a 6% improvement at this particular phase of the project.
The rational for a concrete number is to impose a bound that we consider to not have the capacity to handle the maintenance cost for "small" improvements, and thus people should not try to focus on those types of improvements (again, at this particular phase of the project).
My concern with "performance benefits are sufficiently large" is that anything bigger than zero is always "sufficiently large" when compared to zero.
What if we write something like
usage of unsafe for performance reasons is justified only when all other alternatives have been exhausted and the performance benefits are sufficiently large (>~10%)
so that we allow other values, but we offer a number for reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the improvement be expressed by the "hotness" of the code? I can imagine that a 6% improvement in the Buffers is much more valuable than a 11% percentage improvement in a specific kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like
usage of unsafe for performance reasons is justified only when all other alternatives have been exhausted and the performance benefits are sufficiently large (e.g. >~10%)
@ritchie46 I agree with the sentiments that some performance improvements are more important than others, which is why I was suggesting leaving room for interpretation in the writeup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it would be preferable to have some room to trade safety for performance and vice versa. A change that makes everything a few percent faster might be acceptable, while a 50% improvement on a micro benchmark might not in some cases. And in some cases you maybe want to reduce the amount of safety, but might want to accept a certain negative performance impact.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great idea -- thank you for writing it up @jorgecarleitao !
rust/arrow/README.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://doc.rust-lang.org/reference/behavior-considered-undefined.html might be a reasonable citation, though it doesn't quantify the sources of unsafetly.
rust/arrow/README.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest being more explicit here -- maybe "Omitting bounds checking for performance" or something
rust/arrow/README.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stylistically, I think we could remove the 'Usage of unsafe for ...' sentences. The rationale for use of unsafe is already explained above so this repetition seems redundant to me.
I also think such wording may give the impression that unsafe is always allowed for these operations, when I think the intent is "unsafe can be used as a last resort"
rust/arrow/README.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this wording would be stronger without a specific bound. I would rather say something like "usage of unsafe for performance reasons is justified only when all other alternatives have been exhausted and the performance benefits are sufficiently large. Performance benefits should be quantified with a bench".
|
@jorgecarleitao I am not sure what has happened to this PR but it now seemingly includes many changes that are unrelated to the readme improvements: |
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
|
Some mistake on my part. Fixed |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. @sunchao @paddyhoran or @nevi-me would you like to express an opinion on this proposal?
rust/arrow/README.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like
usage of unsafe for performance reasons is justified only when all other alternatives have been exhausted and the performance benefits are sufficiently large (e.g. >~10%)
@ritchie46 I agree with the sentiments that some performance improvements are more important than others, which is why I was suggesting leaving room for interpretation in the writeup
Dandandan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me the changes are clear enough and provide a good guideline for using unsafe.
|
I saw a tweet yesterday from @timClicks showing an example of how the imageproc project documents their unsafe code and I really like it. I think we should do something similiar. Maybe we could add this to the guidelines @jorgecarleitao ? // JUSTIFICATION
// Benefit
// Using checked indexing here makes bench_integral_image_rgb take 1.05x as long
// (The results are noisy, but this seems to be reproducible. I've not checked the generated assembly.)
// Correctness
// x and y are within bounds by definition of in_width and in_height
let input = unsafe { image.unsafe_get_pixel(x, y) }; |
Saw that too :) I think it's an interesting approach. In this PR #8900 I added a comment with "SAFETY", but that misses the benefit part. |
paddyhoran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Thanks all for the suggestions. Really good stuff. I have incorporated all the changes:
I have used the term |
nevi-me
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any additional comments
|
I think this PR has enough approvals / 👍 so I merged it in. We can continue to iterate on it in the future but this is a great base to work from. Thank you @jorgecarleitao and everyone else who contributed. A true team effort 👍 |
|
Thanks all for working hard to keep Arrow as robust as possible |

I would like to propose that we outline and enforce guidelines on the arrow crate implementation with respect to the usage of
unsafe.The background of this proposal are PRs #8645 and #8829. In both cases, while addressing an unrelated issue, they hit undefined behavior (UB) due to an incorrect usage of
unsafein the code base. This UB was very time-consuming to identify and debug: combined, they accounted for more than 12hs of my time.Safety against undefined behavior is the core premise of the Rust language. In many cases, the maintenance burden (time to find and fix bugs) does not justify the performance improvements and the decrease in motivation in handling them (they are just painful due to how difficult they are to debug). In particular, IMO those 12 hours would have been better spent in other parts of the code if
unsafewould have not been used in the first place, which would have been likely translated in faster code or more features.There are situations where
unsafeis necessary, and the guidelines outline these cases. However, I also see many uses ofunsafethat are not necessary nor properly documented.The goal of these guidelines is to motivate contributors of the Rust implementation to be conscious about the maintenance cost of
unsafe, and outline specific necessary conditions for any newunsafeto be introduced in the code base.