-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace PID with Task Pointer in Rusty #713
Conversation
da81085
to
1137f00
Compare
Wouldn't this design throw away the origin safety provided by |
Btw I want to ask what's the actual cause of |
That was not apparent to me but now that you mention it, this sounds like it might be what I've been fighting the verifier over wrt/ all the different attempts to make this work (i.e. for every attempt, once the code makes sense, instruction count skyrockets and the verifier refuses to pass this code).
I'm going to give that a try using
tl;dr -- pid's can change (but task struct ptrs cannot): #610 |
5800828
to
2d78731
Compare
I think this is the best we (or well, at least I) can do atm. So like, I tried a few ways of tracking the association between the initial pid of a task struct and the pointer of the task struct. My most recent attempts errored at null-checking the task context read using that association:
Unless I'm reading something wrong or misunderstand something, I think this is the best we can do ATM (i.e. until it's possible to null check the retrieved variable). I think this might be alright though because like, tests do pass and logs on those tests do not have stalls (which, I think, is the main reason to be concerned we are tracking everything). |
It seems to me that what the commit does is only refactor the origin code , rather than actually changing any logic or mechanism when dealing with |
This is correct. What I am wondering is if the current behavior (without that exception being thrown) is problematic. I would think I would see a stall if it were if I understand how things work correctly now, but I'm not super certain of that. |
I think the idea is that if correct scheduling decisions can't be made by the the scheduler it's better to exit (the scheduler) rather than making bad scheduling decisions. There's still some bugs in rusty, so in some ways it maybe is better to fail fast in more places. In this case of looking up the task context I think it's probably ok to bail the scheduler, because if the task context is lost/incorrect then making correct scheduling decisions is probably not handled correctly elsewhere. |
That makes sense.
This is the part where like, IDK. So like, if we lost the task, it wouldn't be scheduled, and we would see a stall? If that is correct, then maybe (like, super maybe, idk), we're getting the task again after the pid change happens, and because we're keying off the PID, we're treating task as an entirely new task (hence no stall)? |
I think the problem with |
gonna run it on laptop for a day and do the terraria/compile test/see if I can find bad behavior, good idea, thx! |
6298ad6
to
146e7e8
Compare
Alright so I think I finally got this (and learned a ton along the way). This PR now has Rusty use Task pointer casted to u64 to track tasks instead of pid. A couple things I learned: In bpf, you can't cast "trusted pointers". I don't fully get it, but t_to_tptr makes the cast work w/ the verifier. Making functions static makes some problems go away, but can also super quickly drive things over the instruction limit. More interestingly, I tried only grabbing pid via probe_read (rather than a pointer to task_struct), and to then grab task_struct via What I found is that is super racy (i.e. adding a counter and a while loop made stress test last longer while before failing with a missing pid error), so I kind of think we need to not use |
b1d2c70
to
ab3995c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will work!
Replace PID with Task Pointer in Rusty Fixes: sched-ext#610
3bc75a5
to
efabcfc
Compare
I wasnt't able to reopen #672 because I force pushed (just learned about that limit).This is a continuation of that PR.I closed that PR and opened #710 because, while the first PR's approach passed stress tests and seemed to work, it kind of didn't make sense that I could deref the casted pointer to task_struct.I updated that initial PR to explicltly deref that casted pointer to confirm this works/the verifier allows it and it seems to work so, this works?see latest comment