Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Added timestamping of log entries written to the log server #40

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

perlun
Copy link
Contributor

@perlun perlun commented May 27, 2015

(Title of this PR changed halfway. The original title was "//ramdisk/config/servers/boot/startup not found")


This is an attempt to work in a systematic way with trying to nail down this issue...

When booting up chaos now, I get this error:

[boot] beginning of boot
[boot] Mounted the first available block service as //ramdisk.
[boot] Reading startup script...
[virtual_file_system] Mounting 23 at //ramdisk.
[boot] //ramdisk/config/servers/boot/startup not found.

My conclusions thus far:

  • The boot server is responsible both for mounting the file system (from the initial ramdisk) and reading its file. The former seems to work, the latter seems to fail.
  • The FAT server (which should be serving this filesystem) never seems to get any messages, which seem to indicate that the connection between VFS and individual file systems are broken somehow. To me right now, it feels unclear how this work: what is it that determines what file system server should be used for a given mount point?
  • When trying to debug the vfs server, I've placed breakpoints in vfs_get_file_info to try and step through the code. However, it's been an extremely unpleasant situation trying to get my hands dirty on this one. Because of chaos multiprocess/multi-threaded nature, stepping in the debugger will make things "jump around" very much, making it really really hard to understand what's going on.

So I think:

  1. Either we have to fix gdb to properly understand our process/thread model. Probably doable, amount of work unknown OR
  2. Resort to printf style debugging, i.e. log_print_formatted. This is the approach I'm leaning towards and which I've tried to a certain points.

Comments/suggestions/ideas from all of you are highly appreciated. This bug is really bothering me, and it's frustrating to not understand the code that we wrote 15 years ago... 😄

perlun added 12 commits May 27, 2015 08:25
This gives us a better picture of the time elapsed between log entries. It also enables us to understand in what order things really occur (since they might get logged in the wrong order).
It's actually already mounted when the message is logged.
The reason is to get more predicable order of messages being printed in the log. It feels silly that the boot server goes on with its work before the vfs server has ACKnowledged that the mount has been done.
@perlun
Copy link
Contributor Author

perlun commented May 27, 2015

Alright. After having spent a bunch of unfruitful attempts at getting this working, I'm leaning towards the notion that this stuff has never worked. Which is strange, since I seem to remember it working. But it could very well be that I am mistaken, and the average quality (or lack thereof) of the code seems to indicate that...

In chaos4ever/chaos-old@1aadeee, I noted that we seem to support starting cluido from grub instead. That's of course much easier. I don't like that approach, but maybe we'll live with that for now.

@perlun
Copy link
Contributor Author

perlun commented May 27, 2015

The only real value that his PR adds is the timestamping of log entries. However, for whatever reason it doesn't work as intended; it always logs the same time... I don't fully know what the problem is here, but I feel like I want to resolve that before we merge this.

@perlun perlun changed the title //ramdisk/config/servers/boot/startup not found Added timestamping of log entries written to the log server May 27, 2015
@perlun
Copy link
Contributor Author

perlun commented May 28, 2015

Debugged this a bit further. The value that the log server receives always seem to have the same value in the timestamp field, so I'm suspecting the system call doesn't work as expected (or the output constraints messes things up).

@perlun
Copy link
Contributor Author

perlun commented May 28, 2015

Weird. The timers_update is indeed updating the timeslice in the kernel, but all system_call_timer_read calls seemed to return the same value...

@perlun
Copy link
Contributor Author

perlun commented Feb 14, 2016

This PR is stale and should be rebased and fixed.

@perlun perlun self-assigned this Feb 20, 2016
@perlun perlun changed the title Added timestamping of log entries written to the log server WIP: Added timestamping of log entries written to the log server Mar 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant