Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor overhaul of history writing #65

Draft
wants to merge 13 commits into
base: master
Choose a base branch
from
Draft

Minor overhaul of history writing #65

wants to merge 13 commits into from

Conversation

jpco
Copy link
Collaborator

@jpco jpco commented Nov 30, 2023

With this PR, the way history writing works is like this:

  1. During command input, the input line is buffered in memory
  2. When $&parse returns a full parsed command, it also returns the input that produced the command
  3. If %is-interactive, %parse passes that input to the function %write-history
  4. If readline support is compiled in, %write-history calls $&writehistory which puts the input in the readline history list and asks the history library to write it to the $history file (or, technically, whatever was passed to $&sethistory most recently).
  5. If readline support is NOT compiled in, then there is no $&sethistory or $&writehistory, and %write-history is defined as
fn %write-history cmd {
  if {!~ $history ()} {
    if {access -w $history} {
      echo $cmd >> $history
    } {
      history = ()
    }
  }
}

Buffering the full command input until returning produces much nicer behavior for multi-line commands than the line-by-line logging that es did previously. With readline on, it makes scrolling up into history for multi-line commands pleasant; with readline off, it still helps prevent "interleaved" commands in the off chance that the user is writing two multi-line commands in separate terminals at once.

Readline is fully used for history when support is compiled in. This is less "dodgy" than the alternative ( :) ), and means things like history_write_timestamps can be used, which then makes multi-line commands work nicely even across es invocations.

Having a hook function makes it pretty easy to do things like imitate HISTCONTROL or even get fancier (unset $history and use a networked history server? Something like https://github.com/atuinsh/atuin?) I don't really need to waste bytes trying to convince es people about how nice hook functions are, though.

A small additional benefit of this change is that it allows users to use a variable other than history for their history file if they want, by just changing which settor function calls $&sethistory (or modifying the %write-history function, in the no-readline case.) This can be useful for users who are, say, jumping between rc and es and want an easy way to keep the two shells' history files separate.

Some caveats:

  • Bikeshedding %write-history and everything else is very welcome. I don't know if that's the best name, but I'm too deep in the code at this point, so fresh eyes would be very helpful here.

  • Despite making it possible to enable history_write_timestamps with this change, I didn't do so, since it would make history files both different and a bit uglier. We could turn this on now and make things extra shiny right away.

  • %write-history is currently run before the command is executed, and there is no protection in %interactive-loop from it throwing an exception. Therefore if you do something foolish like set fn-%write-history = throw error %write-history, you can just break your shell. I didn't add any protection for this for the purposes of not growing %interactive-loop too much, but I do want to call out the omission. We could also put the %write-history call after the command is executed. I think it should go before, but it would be a simpler fix than some fancy unwind-protect thing.

  • The major limitation on getting wackier and using something like a remote history server or SQL database or whatever instead of a file is that loading history into the readline history list requires giving $&sethistory a simple file to read. Perhaps that's an extension for the future, if people want it.

  • This PR just deletes the disablehistory variable that seemed to be for the purpose of avoiding logging heredoc contents into the history file. That variable was already broken as far as I could tell, but it may be nice to support that behavior in the future for people who want that. Another extension for the future.

Hopefully the individual commit messages make it obvious what's going on in each file.

This produces nicer behavior with multi-line commands, especially when
using readline.  Where before you always have to cobble together multi-
line commands from history with multiple searches, now you can hit up
and get the whole multi-line command at once.

Also moves history logic into a separate history.c file.  This is just
because input.c is already huge and sprawling.
This allows, for example, the use of `history_write_timestamps`, which
inserts a `#` and a timestamp before each command written to history.
This then allows multi-line commands to be properly read from the
history file.

That feature is left disabled, though, because it writes incompatible
history files.
This change makes %parse output the typed line(s) in addition to the
parsed command.  This is then passed to the new %write-history hook
function, which is set to either $&writehistory if readline support is
included, or an es-native function otherwise.
@jpco
Copy link
Collaborator Author

jpco commented Dec 1, 2023

Another potential design for readline-history primitive(s) would be to have just one $&history primitive with several "subcommands". For example:

  • $&history log $cmd
  • $&history setfile $file
  • $&history setpos $n
  • $&history list

and so on. This could be used if someone wanted to write a history command like exists in other shells, without filling the primitive-space too much. But it would make the $&history primitive a real complex beast. Just an idea.

Typically when I write a syntax error my instinct is to hit up and then try to correct it; that requires writing syntax errors to history in order to happen.
@memreflect
Copy link
Contributor

I like the buffered command input and the write to the history file even when an error occurs.

I also am grateful that you did not enable timestamps.
There are other methods of preserving newlines without breaking existing line-based history entries, such as writing \\n and \\\\ escape sequences or splitting the command on \n and using the equivalent of a %W conversion to write control characters in place of the newlines.

I'm not so sure about trying to remove $history and $&sethistory.
$history makes history file code conveniently easy to write, and most users would simply do something like the following in their .esrc if hooks are the way to go:

history-file = /path/to/history/file
fn-%write-history = $&writehistory $history-file
fn %read-history {
  $&readhistory $history-file
}

But i recognize that someone might want to do something else with history, which hook functions would facilitate.
As for $&sethistory, it is necessary unless you create a $&readhistory primitive for readline's sake.

There are definitely more history operations that can be moved into es code if hooks are desirable.
Readline's read_history() is just a loop over lines of a file, and append_history() can pretty much be accomplished using >> filename.
That also would help transition to a format that supports multi-line commands.
Primitives might be handy for encoding and decoding multi-line commands for performance and memory reasons, and another primitive is required to invoke add_history(cmd) for readline's sake.
Lastly, you would need some way to get the lines (or perhaps even store them in an es list and keep things synchronized with the line editor using a settor) to be able to write them to a file in the first place.

And because es can do all of those things itself with the help of some new primitives, i'm curious what you want out of a $&history primitive; also, how would hooks work with the subcommands?
$&history setfile $filename for example would be more simply written as history = $filename.
The alternative to both of those would be something like $&history output $cmd >> $filename or $&history append $filename $cmd, but that would require a user to create a %history-append hook or something for you to check and invoke on each command.
I do like the idea, but i wonder if you had something specific in mind when suggesting it.
I already mentioned four history-related primitives in the previous paragraph, so i can understand why you might want a single "one command to do it all" primitive that sort of groups the related operations together and delegates those operations to subcommands.

@jpco
Copy link
Collaborator Author

jpco commented Dec 7, 2023

As for $&sethistory, it is necessary unless you create a $&readhistory primitive for readline's sake.

Oops, I missed that. Good catch.

There are definitely more history operations that can be moved into es code if hooks are desirable.

I already mentioned four history-related primitives in the previous paragraph, so i can understand why you might want a single "one command to do it all" primitive that sort of groups the related operations together and delegates those operations to subcommands.

This is exactly what I'm picturing. If you wanted to, say, duplicate the whole behavior of a robust history command, with a good amount of https://tiswww.case.edu/php/chet/readline/history.html#History-Functions exposed as individual primitives, you'd be adding quite a lot to the output of echo <=$&primitives. A single $&history primitive with sub-commands would help keep that cleaner. Then you could wrap those calls in hook functions in a similar pattern as $&openfile has in the shell now.

However, I'm not really convinced I even want all that flexibility.

$&history setfile $filename for example would be more simply written as history = $filename.

In this scenario, I would want initial.es to contain set-history = $&history setfile so that when a user runs history = $filename it would alert the readline library by default.

But all of this is a bit hypothetical, and I only wanted this PR to be a minor overhaul of history writing :)

Do you have any comments on the PR as written? I'm not 100% sure which of my harebrained ideas

I'm not so sure about trying to remove $history and $&amp;sethistory.

is referring to -- is that about what I've already written, or one of my proposed alternatives?

@memreflect
Copy link
Contributor

Do you have any comments on the PR as written?

I initially lamented the change from an always-open file descriptor to repeated opening and closing of a file by name.
The file descriptor was a better option if you ignore the fact that something like rm -f $history means subsequent writes to a history file in that shell would write to the open file description of a file that no longer exists, but the new design ensures that the file is always written, even after a $history file is removed, which is a more predictable behavior.

That minor detail aside, the changes to %parse, the buffering of command input, and the writing of all commands, including those that cause syntax errors, are all welcome choices, which is pretty much this entire PR, so +1 from me.

I'm not 100% sure which of my harebrained ideas

I'm not so sure about trying to remove $history and $&sethistory.

is referring to -- is that about what I've already written, or one of my proposed alternatives?

One of your alternatives suggested that $&writehistory [FILE] [CMD] could be used to get $&sethistory out of the shell.
For some reason, my brain mistakenly processed $&writehistory [FILE] combined with the removal of $&sethistory as a complete replacement for $history as well.
That is definitely my blunder.

@wryun
Copy link
Owner

wryun commented Dec 23, 2023

This is the sort of quality of life improvement I'm pretty keen on (like the tests thing).

However, given that it affects run-time functionality, I don't want to merge it until I've had a chance to go through it fairly carefully. Hopefully in the next month :) (pre-commitment!)

@jpco jpco closed this Jan 20, 2024
@jpco jpco deleted the readline branch January 20, 2024 20:07
@jpco jpco restored the readline branch January 20, 2024 22:35
@jpco jpco reopened this Jan 20, 2024
@jpco
Copy link
Collaborator Author

jpco commented Jan 20, 2024

(Sorry, trying to clean up some old branches, I took this one out as well. Restored.)

@wryun
Copy link
Owner

wryun commented Sep 15, 2024

Sadly, I don't think you should wait on me for this. I'm happy for you to merge this if you're confident it won't break anything.

@jpco
Copy link
Collaborator Author

jpco commented Sep 16, 2024

Looking back at this with fresh eyes, there are a few things I want to change:

  • (back-compat) no-readline %write-history should test access -w $history and handle the false case by unsetting history.
  • (back-compat) a line like # hello, which produces no actual code, should still get logged to history.
  • (bug fix) handle non-syntax error errors from $&parse correctly (e.g., incomplete here document errors).
  • (style) encapsulate history writing logic (including exception-handling) better, possibly within %parse. (Doing it in %parse is tricky because %parse is run by %batch-loop as well.)

In addition, it's possible that in the future with something like runflags from #79, the -v flag could be implemented in all-es script with the new return value from $&parse added here (after all, what is -v but logging history to stderr?) So I'd like it if this design were speculatively "forward-compatible" with that. In practice, I think that just means that $&parse needs to make sure to return everything it reads in in the new return value. I'll need to see how far from that we are now.

I'm also hoping that this design can be compatible with some future input system that actually allows es code to be used while reading command input, but that would require a lot of novel design so it's hard to prepare for now.

@jpco
Copy link
Collaborator Author

jpco commented Sep 16, 2024

Also, I'm realizing now that the return values of $&parse are... terrible in general with the PR as it is. Sometimes $&parse turns a line (like # foo) into a null command. Sometimes $&parse returns a command but no input due to not running interactively. Sometimes it returns both command and input, and sometimes neither. Because of that, you can't actually tell what any element of its return value actually is. It works as used in initial.es today, but that's a pretty terrible experience. Instead $&parse should always return its input, even if it's just ''.

The $&parse primitive now always returns the input it collected as the
first element of the return value, and includes the input it collected
as part of any error exceptions it throws.

This extra element is "consumed" and handled by %parse, which may call
%write-history with it.
@jpco
Copy link
Collaborator Author

jpco commented Sep 17, 2024

(Score one for running CircleCI tests with -DGCDEBUG=1 -- it caught a memory error that wasn't getting triggered on my machine!)

I have refactored $&parse so that now its return value(s) always start with a single element representing the input it ingested to do the parse. $&parse also includes that input when it throws an error exception -- error $&parse INPUT MESSAGE is the format in that case. %parse "consumes" that input when handling returns and exceptions, similar to how %backquote "consumes" the status element that $&backquote returns.

This is nice because it's a simple and consistent behavior from $&parse, which I prefer from primitives. This is a bummer because it means that in non-interactive behavior, $&parse still buffers and returns its input only for that input to almost always be immediately thrown away. There are ways to remove that wasted work, but they all sacrifice simplicity by making $&parse more complicated or adding other primitives.

A couple questions:

  • I'm not sure how I feel about the current way that %parse is implemented -- in particular that it tests %is-interactive every time it runs, despite the fact that %is-interactive changes only rarely and in controlled ways -- this way seems like more wasted work, especially during %batch-loop. Does it make more sense to lexically bind a simpler %parse to %batch-loop in initial.es?

  • I don't feel entirely confident whether it should be %parse's job or %write-history's job to avoid writing empty lines to history. I decided to put it in %write-history to give the user more flexibility, but ... will anybody actually make use of that flexibility?

@jpco jpco marked this pull request as draft November 2, 2024 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants