-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URL-encoded session names #55
URL-encoded session names #55
Conversation
Thanks! This looks really good. As far as I understand enabling url-encoding shouldn't break anything for anyone, am I right? Or rather: it could break user sessions if they have names for which url-encoding is not a no-op, but that should be rare as I guess that most people should be using names without any special characters. If so, then maybe we can go for a breaking change and set the default value to |
I rebased your changes on top of current master and fixed some stylua issues. On top of that I implemented some things and pushed to your branch, I hope that it's not a problem. The changes include:
Are you okay with these changes? If so than I would wait some time using this to make sure it doesn't break stuff and then I would squash the commits and merge it as a breaking change (even though it should not break anything for 90% case). Thanks again for the idea and PR, this allowed to add some functionality that has already been requested multiple times. Related: #12, #45 |
I'm glad you found it useful, and of course, I agree with your changes. Thank you for maintaining such a great plugin. 👍 |
There is a lot of good work here, but some issues remain. Firstly, I think calling it url encoding is a bit misleading since that is not precisely what's happening. Correct url encoding has a different set of characters that must be encoded and different set of reserved characters, and this is also not a full or correct implementation. Perhaps something like Secondly, part of the algorithm uses simple substitution, which is not sufficient. I've outlined some issues with substitution in a different session plugin here: rmagatti/auto-session#273 Here are the results of some test cases with the "url encoding" algorithm as it currently stands: So while it might work for a good number of cases, it's not entirely robust. Alternatives: I don't know if you want to take a dependency, but this might be a quick solution. Another possibility is that so long as any invalid filename chars are stripped or encoded, we can still use this idea provided it is a one way function. That is, so long as we are always "encoding", it can work. For example, currently the loading of the last session attempts to decode the filename. It could instead just read the CWD inside the JSON (which is not encoded). I have yet to consider whether this is possible with the migrate function and whatever Thanks @awerebea for your work on this! For reference, here is the code I was testing out at https://codapi.org/lua/ function url_encode(str)
if type(str) == 'string' then
str = string.gsub(str, '\n', '\r\n')
str = string.gsub(str, '([^%w %%%+%-%_%.])', function(c)
return string.format('%%%02X', string.byte(c))
end)
str = string.gsub(str, ' ', '+')
end
return str
end
function url_decode(str)
if type(str) == 'string' then
str = string.gsub(str, '+', ' ')
str = string.gsub(str, '%%(%x%x)', function(h)
return string.char(tonumber(h, 16))
end)
str = string.gsub(str, '\r\n', '\n')
end
return str
end
function enc_dec(string)
local encoded = url_encode(string)
local decoded = url_decode(encoded)
print(string .. " -> " .. encoded .. " -> " .. decoded)
end
enc_dec([[foo\bar]])
enc_dec([[unc:\\bar]])
enc_dec([[c:\foo\bar]])
enc_dec([[foo+bar]])
enc_dec([[foo%bar]])
enc_dec([[foo bar]])
enc_dec([[foo bar]])
enc_dec([[foo++bar]])
enc_dec([[foo+%bar]])
enc_dec([[foo%%bar]])
enc_dec([=[foo[[]]bar]=]) |
After a quick play, it looks like the following works for my above samples.
Encode What's the point in substituting function url_encode(str)
if type(str) == 'string' then
--str = string.gsub(str, '\n', '\r\n')
str = string.gsub(str, '([^%w %+%-%_%.])', function(c) -- removed %% to let % encode
return string.format('%%%02X', string.byte(c))
end)
--str = string.gsub(str, ' ', '+')
end
return str
end
function url_decode(str)
if type(str) == 'string' then
--str = string.gsub(str, '+', ' ')
str = string.gsub(str, '%%(%x%x)', function(h)
return string.char(tonumber(h, 16))
end)
--str = string.gsub(str, '\r\n', '\n')
end
return str
end |
@josh-nz, thank you for your feedback and comments. |
My personal preference would be to only percent encode characters that are not legal filename characters (this be get tricky to define exhaustively and is platform specific, which I wouldn't worry about here, just the general cases). So I would not encode Ultimately, the current implementation still works, and I doubt I'll really be looking at the encoded filenames on disk and dealing with them directly in any capacity, but it does have an impact on code understanding. |
A few other things to consider. Does anyone know how this plays with non-English characters? I'm guessing it depends on how Lua treats the Also, this encoding transforms a single character into 3 characters now. There might be limits on file system name lengths. Windows in particular can be pretty fussy about this. These limits are probably a minor edge case, but worth keeping in mind. |
The current implementation allows you to create a session with a name of your choice. It can be not only CWD, but also, for example, "My session :*)". I think it is necessary to escape all potentially unsafe characters. If this approach does not seem reasonable to you, I prefer to encode all characters except alphanumeric, instead of allowing spaces, asterisks, commas, question marks and other legal filename characters. |
Thanks for your research. I read through it and it looks like this is much more complicated than I thought :P I took a look at how Neovim/Vim do this kind of encoding for undo files (what we get from undofile()). It looks like the only thing that is done is to replace path separators with percent character (undo.c). So all other characters should probably be valid file name characters on all platforms? The main problem I see is that on Windows this is not a one-to-one mapping. We could try some heuristic on how to decide where to put What do you think about it? If I find some time I'll try to implement this, but if you have time and will to do so, then feel free. Or are there any issues that you see with this idea? |
Looks like I was naive, there seem to be a lot of forbidden characters on Windows: https://stackoverflow.com/questions/1976007/what-characters-are-forbidden-in-windows-and-linux-directory-names. We could:
In any case it's probably good to stop relying on file names at all when reading sessions and just parse JSON. This would just need some "health checks" in case for some reason two files with different filenames have the same session name in JSON. This is up for discussion, I don't know which solution is best, I just think that banning is the worst one. |
I think given the many different filesystems, what @awerebea said about encoding
Of these two, I agree on the second. Just let Neovim and filesystem deal with the error handling. The current encoding and decoding I think is working. I wasn't aware it was encoding manually entered session names also. One option is to only encode the automatic CWD names, and leave manually entered names to be dealt with by Neovim natively (the second point above). I would probably be in favour of this, but also a bit ambivalent about the decision. Maybe it is preferable to be consistent and just continue to encode everything. Edit: yeah, I think? I'm leaning towards consistency. Let users have their "My session :*)" if they want it. Looking forward, I can see my suggestion here being useful: #59 (comment) |
@awerebea @josh-nz
Please review these changes to see if it's ok. I will then squash this as there are a lot of commits. When this is merged we can think of other suggestions from #59, e.g. "remove last symlink and use mtime" or "load CWD session based on session.cwd from most recent session". |
Hi @jedrzejboczar,
It looks good to me. Thank you for your attention to this issue. |
I'll have a look at the changes later today. I've also got some local commits for "remove last symlink and use mtime" or "load CWD session based on session.cwd from most recent session" (this last one is still a work in progress, only started on it last night and will need some adjustments based on your changes). You know the code a lot better than I do, so I'd consider these as proof-of-concepts, there might be a better way to implement them. Happy to share them when I've fixed them up for the latest changes in this branch. |
Looks ok to me. I'm wondering if there is a need to support the
The only question is do we want to decode session names for display, eg PossessionList. If we do, there is a very slight chance we mis-display a filename if the user had manually entered something like |
Thanks for quick test :D As you suggested @josh-nz I removed the config option and we will just always encode - this should be less confusing for users. |
I think this is good to merge in. Any other issues that arise can be separate issues? I don't think there are any breaking changes that this PR will introduce. (I've been working on updating my implementations for "remove last symlink and use mtime" or "load CWD session based on session.cwd from most recent session" based on the recent changes here) |
…sions Encode/decode session names using percent-encoding syntax commonly used in web development to encode URLs. This way we can encode paths as session file names to save CWD sessions. Now we always decode session name from JSON data, name->filename is just for generating unique files.
Thanks @jedrzejboczar for your help and creating this plugin, and @awerebea for getting this particular feature started. |
Add a configuration option to encode/decode session names using percent-encoding syntax, very similar to what is commonly used in web development to encode URLs, to avoid using illegal characters in file names, but still allow them to be used in session names.
When this option is set to true, the session name is encoded for use as part of the session filename before saving and decoded back before displaying in the list of sessions and notification messages.
This change allows us to use CWD paths as the session name to manage sessions based on path, for example:
Here's what it looks like in practice: