Skip to content

Conversation

@RalfJung
Copy link
Member

@RalfJung RalfJung commented Jul 31, 2023

As discussed by @rust-lang/opsem in rust-lang/opsem-team#9: instead of saying that *ptr is insta-UB for "bad" pointers, only have UB when the place is actually accessed, and when place projections go out-of-bounds.

This makes the following code (which currently has UB) legal:

let ptr = 23 as *const i32;
// `ptr` is not aligned and doesn't point to an actual allocation.
// Still, this is allowed:
let ptr = addr_of!(*ptr);
// Even with a null pointer, this is allowed:
let ptr = addr_of!(*(0 as *const i32));

However, the following code is still UB:

// Accessing (loading from or storing to) a place that is dangling or unaligned
let mem = 0u32;
let odd_ptr = addr_of!(mem).byte_add(1).cast::<u16>();
let _val = *odd_ptr; // pointer is 1-aligned but must be 2-aligned
// Performing a place projection that violates the requirements of in-bounds pointer arithmetic
let ptr = 23 as *const (i32, i32);
let ptr = addr_of!((*ptr).1); // pointer arithmetic not inside the bounds of an allocation

@ehuss ehuss added the T-opsem Team: opsem label Jul 31, 2023
@RalfJung RalfJung added the T-lang Team: Lang label Jul 31, 2023
@RalfJung
Copy link
Member Author

RalfJung commented Jul 31, 2023

I am not sure if this counts as a "small" decision that t-opsem can just make without t-lang involvement, so I'll add both teams. This is an irreversible decision after all and a change to the core "list of UB", that seems quite fundamental.

The justification for why we think this first example should not be UB is mostly that many people are confused about the entire place expression business, so they don't realize addr_of!(*ptr) is "dereferencing" a pointer. There's also not a clear benefit from making it UB -- our current codegen is already completely compatible with this change, the UB we are removing is only recognized by Miri but never exploited by the compiler.

The justification for keeping the later 2 examples UB is that our current codegen actually does exploit both of them, so allowing them comes at a cost. The first example should be rather uncontroversial; the fact that every access has to be in-bounds and aligned has never been up for debate. The second example is probably less clear, but we do want to keep using getelementptr inbounds for place projections, so we have to make this UB. (Maybe we will come back to this later and weaken place projections from "inbounds" to "nowrap". But for now we don't see a good motivation for doing this, and without a good motivation it is unlikely someone is going to put in the work of implementing those semantics in the codegen backend. LLVM currently only supports "inbounds" and "wrapping", and we definitely don't want "wrapping".)

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Jul 31, 2023

Team member @RalfJung has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@JakobDegen
Copy link

Fwiw I don't expect this example to have needed T-lang signoff, but there's no harm.

@rfcbot reviewed

@saethlin
Copy link
Member

saethlin commented Aug 1, 2023

@rfcbot reviewed

@RalfJung
Copy link
Member Author

RalfJung commented Aug 1, 2023

rust-lang/rust#114330 implements this change in Miri. I found some other interesting cases that are allowed now.

A mere "place mention" of a dangling place is no longer UB -- that is in line with addr_of! no longer being UB here:

fn deref_invalid_underscore() {
    unsafe {
        let _ = *ptr::invalid::<i32>(0);
        let _ = *ptr::invalid::<i32>(1);
    }
}

If xptr only partially points to memory, and (*xptr).1 accesses the inbounds part, then this is now allowed -- both the place projection (pointer arithmetic) and the load from that place are fully inbounds, so there's nothing to cause UB here:

fn deref_partially_dangling() {
    let x = (1, 13);
    let xptr = &x as *const _ as *const (i32, i32, i32);
    let val = unsafe { (*xptr).1 };
    assert_eq!(val, 13);
}

@RalfJung
Copy link
Member Author

RalfJung commented Aug 1, 2023

Ah, and here is another fun one that is allowed now:

fn deref_too_big_slice() {
    unsafe {
        let slice: *const [u8] = mem::transmute((1usize, usize::MAX));
        let _val = addr_of!(*slice);
    }
}

addr_of!(*slice) doesn't check the wide pointer metadata for consistency any more.

@lukas-code
Copy link
Member

fn deref_partially_dangling() {
    let x = (1, 13);
    let xptr = &x as *const _ as *const (i32, i32, i32);
    let val = unsafe { (*xptr).1 };
    assert_eq!(val, 13);
}

Have we decided that the order of the fields in the tuple (i32, i32, i32) is guaranteed? In the UCG it specifically says that tuple fields can be reordered, such that .1 could actually be the third field in memory and out-of-bounds.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 1, 2023

No we have not decided that, the proper version of that test would use a repr(C) struct. I was just lazy...

@scottmcm
Copy link
Member

scottmcm commented Aug 6, 2023

This makes good sense to me. Having addr_of!(*p) be a no-op seems like what most people would expect.

@digama0
Copy link

digama0 commented Aug 7, 2023

@rfcbot reviewed

1 similar comment
@CAD97
Copy link

CAD97 commented Aug 7, 2023

@rfcbot reviewed

@tmandry
Copy link
Member

tmandry commented Aug 17, 2023

This makes good sense to me. Having addr_of!(*p) be a no-op seems like what most people would expect.

I agree with this and don't have any blocking concerns over this change specifically. But I am concerned that this will lead users to a mental model where place projections inside addr_of! (the last example in the description) are also a no-op. I found it surprising that the latter example is UB if the first example is not.

At a high level I would like us to strongly prefer simple rules for what is and isn't UB, wherever possible, and justify complexity as either inherently necessary or temporary. "*foo is insta-UB for bad pointers" is quite simple, despite being overly restrictive. "Only if actually accessed" is quite simple. "Only if actually accessed or a place projection goes out of bounds" is less so.

From the description it sounded like this condition might be temporary (and is due to a current limitation in LLVM), but if we could wave a magic wand and fix the LLVM limitation, are we sure we would?

@saethlin
Copy link
Member

saethlin commented Aug 18, 2023

I am concerned that this will lead users to a mental model...

I agree, generally, but I think this is a problem of documentation. Currently we have essentially no documentation for what valid uses of addr_of! are. Our documentation for this currently just punts to "the usual rules" while never defining that as a means to avoid specifying what uses are and aren't valid, because we haven't made decisions about the rules such as the one that this PR is about.

@scottmcm
Copy link
Member

"Only if actually accessed or a place projection goes out of bounds" is less so.

With any luck offset_of! will help here, since we'll be able to talk about how
addr_of!((*p).foo) is p.byte_add(offset_of!(Type, foo)).cast(), and how p.wrapping_byte_add(offset_of!(Type, foo)).cast() is available for cases where you don't want the UB from GEPi.

@RalfJung
Copy link
Member Author

At a high level I would like us to strongly prefer simple rules for what is and isn't UB, wherever possible, and justify complexity as either inherently necessary or temporary. "*foo is insta-UB for bad pointers" is quite simple, despite being overly restrictive.

Those are the rules we currently have. I also think they are simpler. But many, many people have been caught off-guard by the fact that *foo itself can be UB, even in a context like addr_of!(*foo). Yes, our docs are bad, but at this point I think we also have enough evidence that people strongly expect addr_of!(*foo) to never be UB, and it might be better to align the model with people's expectations -- even if that nominally makes the model more complicated, and given in particular that the extra UB does not actually seem that useful.

"Only if actually accessed" is quite simple. "Only if actually accessed or a place projection goes out of bounds" is less so.

You just have to separate alignment and inbounds:

@RalfJung
Copy link
Member Author

RalfJung commented Aug 18, 2023

From the description it sounded like this condition might be temporary (and is due to a current limitation in LLVM), but if we could wave a magic wand and fix the LLVM limitation, are we sure we would?

If this is talking about the requirement that field projections follow the rules of the offset method, then this is intended to be permanent. I can't imagine a good reason for why we'd want field projections and that method to behave differently.

We considered weakening the requirements of offset from inbounds to nowrap. However, we don't have any motivating usecase for this -- as in, we don't know any real code that people might write that would benefit from weakening the requirement. (The one exception is manual offset_of implementations but we consider that solved by the language-native macro.) Without a usecase, it's not worth trying to convince LLVM to add support for this attribute. Also, this weakening causes some trouble for offset as a const fn -- basically, in a const fn we cannot know if the pointer overflows or not since we don't know where in the address space it lies, so we have to still require inbounds there.

@rfcbot
Copy link

rfcbot commented Aug 18, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

psst @RalfJung, I wasn't able to add the final-comment-period label, please do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants