Skip to content

Implement nom::Offset on BitPtr #21

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
myrrlyn opened this issue Jul 28, 2019 · 7 comments
Closed

Implement nom::Offset on BitPtr #21

myrrlyn opened this issue Jul 28, 2019 · 7 comments
Assignees

Comments

@myrrlyn
Copy link
Collaborator

myrrlyn commented Jul 28, 2019

Feature request by @fasterthanlime

Implementation notes:

  • pointer-to-pointer arithmetic is, according to C++ and thus likely LLVM,
    defined only between two pointers known to be in the same allocation region.
    (article 1) I do not know at this time what Miri thinks of pointer-to-pointer
    arithmetic, but I strongly suspect, given Miri’s implementation of pointers in
    its evaluator model (article 2) that this is suspect at best and a candidate
    for rejection at worst.

    Given that the rest of the crate is also UB in Miri that the compiler merely
    happens to not yet reject, I am not too concerned about adding yet more
    pointer operations that cause errors in Miri.

    See @RalfJung’s blog article (article 3) for more about how Miri (and by
    extension, eventually, rustc) treats pointer manipulation.

  • BitPtr-to-BitPtr arithmetic is required by the constraints above to only
    be meaningful within the same allocation region, and thus, with the same type.
    This means that the function signature is not required to generalize as

    fn BitPtr<T: BitStore>::ptr_diff<U: BitStore>(self, other: BitPtr<U)>) -> _;

    but may be kept specific as just

    fn BitPtr<T: BitStore>::ptr_diff(self, other: Self) -> _;

    We cannot enforce this in the type system, but this is an unsafe fn with the
    precondition that other and self be derived from the same overarching
    BitSlice<_, T> region. It is firmly undefined behavior, even by the
    UB-adjacent standards of this crate, to call ptr_diff with pointers from two
    different regions.

  • Prior art in fn store::BitStore::offset indicates that the return type
    should be the anonymous record { elts: isize, bits: i8 } (canonicalized as
    the tuple (isize, i8)). That function produces (isize, BitIdx) because it
    computes a jump value for ptr::offset and an absolute bit index in the
    element to which the jump value refers. This function produces a jump value,
    and a bit distance between the start pointer self and the end pointer
    other.

  • BitPtr<T: BitStore> is notably missing a C: Cursor type parameter. It is
    defined behavior for two BitSlice<C: Cursor, T: BitStore> handles drawn from
    the same region to have different C type parameters. How should this be
    handled?

    • BitPtr::ptr_diff can only compute BitIdx differences. This will
      necessarily produce a bit distance that describes the index difference
      rather than the electrical difference, but, this is unavoidable with the
      information available.

    • fn BitSlice<C: Cursor, T>::offset<D: Cursor>(&self, other: &BitSlice<D, T>) -> _;

      could convert both self and other’s BitIdx values to BitPos using
      their provided Cursor implementations, then construct BitPtr<T> pointers
      from the BitPos values and call ptr_diff on them. This would compute the
      electrical bit distance between the two BitSlice handles.

Nom integration notes

  • bitvec currently relies on behavior that is compiled as expected in current
    rustc but is marked as UB in Miri, and will likely eventually be rejected by
    the compiler. For this reason, I am hesitant to encourage the nom crate to
    begin depending on bitvec for its bit-stream parsing.

    However, it is advantageous to have BitSlice integrate with nom’s traits,
    so that clients who wish to use bitvec for bitstream nom parsing are able
    to drop BitSlice into any trait-driven nom functions.

  • Once bitvec depends on nom, it becomes illegal for nom to also depend on
    bitvec. I don’t know the dependency rules offhand, but I am hoping that if
    bitvec depends on nom optionally, the default feature set of bitvec is
    dependable by nom without creating a dependency cycle.

    If nom elects to depend on bitvec, bitvec’s nom integration will be
    removed and placed in nom instead. This may constitute a major-breaking
    change if I release a 1.0.

@RalfJung
Copy link

RalfJung commented Jul 29, 2019

The [n] references seem to be broken, so now I am wondering which blog post this refers to. ;)

bitvec currently relies on behavior that is compiled as expected in current
rustc but is marked as UB in Miri, and will likely eventually be rejected by
the compiler.

Is there some documentation on what you are doing here and why you are deliberately causing UB? Basically I am curious if there is something you want to do that is just not possible UB-free with the current language; that would be interesting input for the future development of our UB spec.

@myrrlyn
Copy link
Collaborator Author

myrrlyn commented Jul 29, 2019

Did not realize GitHub would retarget those, cool.

I mangle raw pointers using bit masks when I know that there are low bits in the pointer available for use, and crash the program when there aren't. The last time I ran Miri, it complained about my use of integer bit operations on pointers to (a) look at the low bits, and (b) shove other bits in there.

The document Bit Patterns describes the encoding scheme; encoding is implemented in pointer::BitPtr::<T>::new_unchecked, and decoding in pointer::BitPtr::<T>::pointer.

I've written most of a draft for defining pointer encodings that I intend to submit to the Unsafe Code Working Group after I review it again.


I'm honestly uncertain if my creation of &BitSlice reference structures is UB or not; I talked with eddyb about it and they were of the opinion that using a ZST as my slice referent type would convince the compiler to allow any address/length, so this is, fingers crossed, valid.

@RalfJung
Copy link

RalfJung commented Jul 29, 2019

The last time I ran Miri, it complained about my use of integer bit operations on pointers to (a) look at the low bits, and (b) shove other bits in there.

Ah, I see. That would not have been a UB issue then, just with Miri not supporting int-ptr-casts yet. They have been implemented in the mean time, so I encourage you to try again. :) (But also keep in mind that Miri does not detect all UB, just a lot of it.)

I've written most of a draft for defining pointer encodings that I intend to submit to the Unsafe Code Working Group after I review it again.

All right. :)

I'm honestly uncertain if my creation of &BitSlice reference structures is UB or not; I talked with eddyb about it and they were of the opinion that using a ZST as my slice referent type would convince the compiler to allow any address/length, so this is, fingers crossed, valid.

I see the (), but the real question is what is BitSlice defined as.

EDIT: Ah, it is also a ZST. That should help, yes. :) Not sure if it is enough, there seems to be quite a few coding tricks going on here.

@myrrlyn
Copy link
Collaborator Author

myrrlyn commented Jul 29, 2019

Slice of ()

@myrrlyn
Copy link
Collaborator Author

myrrlyn commented Aug 24, 2019

I have filed #916 on Miri, which will (when landed) allow me to run .trailing_zeros() on pointers to determine alignment. This is a blocker for using Miri to find other UB, though from conversation with @oli-obk I strongly suspect that my pointer manipulation will have many fewer instances of UB on modern Miri than I was observing earlier this year.

@RalfJung
Copy link

many fewer instances of UB on modern Miri than I was observing earlier this year.

Those errors were likely not indicating UB but just operations Miri does not support. The error messages are not great at telling these two cases apart.

@myrrlyn
Copy link
Collaborator Author

myrrlyn commented Oct 7, 2020

Done

@myrrlyn myrrlyn closed this as completed Oct 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants