-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: Listen3 API #214
Comments
Something like that for Listen 3.x, should be fine right? listener = Listen.to('dir/to/listen', 'dir/to/listen2') do |changed|
puts "changed absolute path: #{changed}"
end
listener.start # not blocking
sleep |
TL;DR; - scroll to the summary to see the Guardfile example using a new Listen API Proposals and usage examples for new APIImportant introThere are ONLY 2 things people really want to track:
This means distinguishing between moving/deleting/renaming/adding makes no sense for E.g. if anyone wants to be notified when files are added or removed or renamed for use cases such as:
... then what they REALLY need to do instead is track the parent/containing directory for modifications. So if someone wants to know if dir1 contains new files, they need to watch dir1 for modifications and compare with ... well .... some previous state. Since If anyone really needs to track additions/removals/renames ...... then here's a workaround to prove a point:
With this workaround, there's NO reason for Since we can now safely cut ourselves from even considering bring adding/removing/deleting into any listen-related discussion, let's move on to ... Current APIHere's what a callback currently gets: {
modified: ['dir1/foo'], # from `echo 'foo' > dir1/foo`
added: ['dir1/dir2/bar'], # from `touch dir1/dir2/bar`
removed: ['dir1/dir2/dir3/baz'] # from `rm dir1/dir2/dir3/baz`
} This is way too problematic to describe and properly support. Reducing the event types to "modified"
New API proposalWhile the events are too complex, the watcher configuration isn't flexible enough. INotify support multiple watchers per listener and there's currently no API to allow configuring different callbacks and configuration options per directory. A new API could look like this: listen = Listen.new(ignore: '.swp', recursive: true) #global options if any
watcher1 = listen.to('dir1') { |dir, object| ...}
watcher2 = listen.to('dir1/dir2', recursive: false, ignore!: 'dir3', wait_for_changes: 0.5) { |dir, object| ...}
watcher3 = listen.to('dir1/dir2/dir3/baz', file: true, ignore: '*.swp', wait_for_changes: 0.1) { |dir, object| ...} In general this means: A watcher is a adapter "instance" or "configuration" - depending on whatever makes sense for an adapter. E.g. there will obviously only be one allowed TCP instance listening on a host/port, but it can "simulate" instances by keeping a hash of "instance configs" and their callbacks. Notes/differences:
The first argument of the callback should the directory the change is related to and how that's matched depends on the adapter, e.g. TCP could match user@host:post prefix (mentioned below), while Linux adapter would need to match a INotify watcher's config with the associated callback). The third option (with Overall, per-watcher-configuration allows different listening options for different plugins/directories, e.g. longer polling times + ignore masks for "editing" directories vs maximum responsivity for virtual machine changed files and TCP listening, etc. This also allows fine tuning Guard/listen based on the amount of resources, the performance improved (concurrency), and polling parameters based on how frequently files are modified in a given directory. What the callback would look like (new API)So, assuming the above 3 watches with callbacks, we'd get: # callback 1
[
[ 'dir1', '' ], # from `echo 'foo' > dir1/foo`
[ 'dir1', 'foo'], # from `echo 'foo' > dir1/foo`
[ 'dir1', 'dir2'], # because of `touch dir1/dir2/bar`
[ 'dir1', 'dir2/dir3'], # because of `rm dir1/dir2/dir3/baz`
[ 'dir1', 'dir2/dir3/bar'], # because of `rm dir1/dir2/dir3/baz`
]
# callback 2
[
[ 'dir1/dir2', ''], # from `touch dir1/dir2/bar`
[ 'dir1/dir2', 'bar'], # from `touch dir1/dir2/bar`
]
# callback 3
[
# from `rm dir1/dir2/dir3/baz`
['dir1/dir2/dir3/baz', nil] # nil here means the key is a file path,
# not that the file was deleted
] (With TCP, the adapter would prefix the keys with user@host:port, so multiple TCP broadcasters could send to a single receiver, and callbacks could be correctly handled per node). Or, with comments: [
# results of watching dir1 RECURSIVELY
[ 'dir1', '' ], # something about dir1 changed (attributes)
# '' - because File.join(dir1,'') => 'dir1/', while nil would give an error
[ 'dir1', 'foo' ], # Something about foo in dir1 changed
# (attributes as a result of content change)
[ 'dir1' => 'dir2' ], # Something about dir2 changed (attributes)
[ 'dir1' => 'dir2/dir3' ], # Something about dir3 changed (attributes because bar was removed)
[ 'dir1' => 'dir2/dir3/bar' ], # Something about dir3 changed, specifically - attributes for bar
]
[
# results of watching dir1/dir2 NON-RECURSIVELY
[ 'dir1/dir2', '' ], # something about dir1/dir2 changed (attributes)
[ 'dir1/dir2', 'bar' ], # Something about dir1/dir2/bar changed
# (attributes as a result of adding the file)
# Notice how value is relative to path in key
# Note: changes to (...)/dir3 and (...)/dir3/bar are ignored
]
[
# results of watching a specific file's content
[ 'dir1/dir2/dir3/bar', nil ], # Assuming dir1 is watched recursively
# note: IMHO it's ok to cause a File.join(key, value) error here
] Silencing vs ignores vs pattern matching (guard)Since there's no conceptual difference between silencing events (listen level) and regexp matching (guard level), and since the above new API would allows options per watcher configuration... ... we could throw away pattern handling from Guard and put it in the listener, where it would make more sense, since less notifications would be needed and more fine-tuning would be possible. This would also make things easier to debug, because removing the pattern would show all the file notifications a given More complex example of usage from GuardWith the above callback invocation, the TCP listener could also track different changes with different (local) options per directory, because it could group notifications by watched directory - effectively the TCP listener could distinguish on which node a file changed and trigger a different callback. Here's a non-existing new API proposal for directory('db', recursive: false) do
# for long running rake db:migrate
watch('schema') do |dir, object|
notify "migration finished!"
end
end
directory('.', ignore: %w(git db tmp log), recursive: true) do
# quick workspace backup
watch do |dir, object|
sh "rsync -arR #{File.join(dir, object)} ../copy-of-project.$(date +%s)"
end
end
directory('[email protected]:db', wait_for_delay: 3600, on: localhost:4321) do
# Get latest list of 'must support' cases (entered by client on website)
watch('*.db-journal') do
sh 'wget -O spec/fixtures/valid_cfg.csv http://vm1/currently_used.csv'
end
end
directory('ci@server1', wait_for_delay: 30, on: localhost:4321) do
# if a team member changes the schema, we want to know ASAP
watch('db/schema') do |dir, object|
sh "scp #{File.join(dir, object)} new_schema.rb"
unless sh 'cmp new_schema.rb db/schema.rb'
notify "schema changed on master!"
end
end
# notify when ci server finishes reporting coverage stats
watch('coverage/index.html') do |dir, object|
notify "build completed and coverage data available on server1"
end
end and on the nodes, respectively: # in Guardfile on example.org
directory('db', latency: 10, recursive: falseignore: 'migrate', forward_to: 'localhost:4000') # in Guardfile on server1, where ci home is /var/lib/ci
directory("/var/lib/ci/checkouts/current", latency: 10, forward_to: 'localhost:4000') do
watch('db/schema')
watch('coverage/index.html')
end Notes about the examples and broadcasting
SummaryThe above examples show how guard could pass options to Here's another example. Here too, the actual guard DSL is irrelevant - the point is separating individual adapter configurations and scoping. directory('app', ignore: 'schema.rb') do
directory('views') { watch(%w(*.erb *.haml *.slim), :rspec, :cucumber, :livereload) }
directory(%w(models controllers helpers config lib) { watch('*.rb', :rspec, :rubocop, :livereload) }
watch('config/locales/*.yml', :cucumber, :livereload)
watch('db/**/*.rb', :migrate)
watch('spec/**/*.rb', :rspec)
directory('features') do
watch('*.feature', :cucumber)
directory(%w(support step_definitions)) { watch('*.rb', :cucumber, :rubocop) }
end
directory(%w(assets public)) do
watch('*.html', :livereload)
watch(%w(*.css *.js), :livereload, :jammit)
end
watch('Gemfile', :bunder)
end
directory('vendor/assets') { watch(%w(*.css *.js), :livereload) } This would in total setup only 6 INotify watchers, 11 callbacks (11 filters) and without monitoring and notifying changes in directories like lib, tmp, db, log, etc. - or specifically ignoring anything in them. Further more, since watch now specifies a pattern and event type isn't necessary, it's free for allowing parameters for e.g. local dependencies and options for plugins, etc. So that's my proposal on how to change the API in a way helps extend and adapt to new ideas in the future without major modifications in the new API. |
Sorry - I started editing earlier and took a longer break, so I didn't refresh and see your comment. listener = Listen.to('dir/to/listen', 'dir/to/listen2') do |changed|
puts "changed absolute path: #{changed}"
end
listener.start # not blocking
sleep The key differences with what I wrote above are:
listener = Listen.to('dir/to/listen', 'dir/to/listen2') do |dir, object| (so we're not locked to only using one listener + one adapter + one configuration + one callback per client or having to manage listeners on the client side)
|
My writing probably sucks, so feel free to skim through the examples and raise issues with them. |
I should probably write less and put in more examples. |
PS2: your writing is pretty good for me, continue like that! 👌 |
Update + status: I'm trying to safely "evolve" guard without breaking everything (which is tricky). I've updated the above examples (my hash examples were dumb). I decided on arrays because ... because ... because Arrays! (from proposal above): # callback 1
[
# [ #<Pathname:dir1>, '.' ], # from `echo 'foo' > dir1/foo`
# [ #<Pathname:dir1>, 'foo'], # from `echo 'foo' > dir1/foo`
# [ #<Pathname:dir1>, 'dir2'], # because of `touch dir1/dir2/bar`
# [ #<Pathname:dir1>, 'dir2/dir3'], # because of `rm dir1/dir2/dir3/baz`
# [ #<Pathname:dir1>, 'dir2/dir3/bar'], # because of `rm dir1/dir2/dir3/baz`
] This is so people can simply do: listen = Listen(options) # object to manage instances through Celluloid registry
adapter_instance = listen.to('app/views', ignore: '*.swp') do |changes|
changes.each do |watched_dir, rel_path|
puts watched_dir + rel_path # or watched_dir.join(rel_path), instead of ::File.join(watched_dir, rel_path)
end
end This should gives the conveniences of Pathname methods for testing if the directory exists, if the path exists, if there's a symlink, use relative paths for nicer outputs, etc. Ideally, I'd want to make Listen abstract enough to allow any changes in the form of: [ [ signal1, object1 ], [signal2, object2], ... ] where signal could be a:
And object would be:
Using above APII've run into a few race conditions during the tests. To avoid them with the new API (and to make good use of the new functionality), here's how the above example could be used by Guard: # listen starts in paused mode - meaning it's listening for adapter events, but
# there are no initialized adapters yet
# (Re)Initialize watchers + start collecting notifications, setup internal watchers, etc.
# Filters are run on collected/queued events (e.g. ignore, broadcast, smoosh) and
# the callback is called with a batch of optimized/reduced files
cucumber_adapter_instance.start
# The given Guard callback is called here, which Guard knows is associated with
# cucumber plugin
# Before running non-background commands, Guard switches back to silent event
# "collecting/queuing" mode, e.g. while cucumber is running
cucumber_adapter_instance.pause
# Pause notifications, so they are collected and shown in bulk only once the
# whole triggered chain completes or fails
notifications_adapter_instance.pause
# Guard runs cucumber, cucumber notifications are collected by notifications
# adapter, user edits files
# Manually called by a `guard-auto-dependency` plugin (works like `make deps`),
# which detected a layout file changed using a different non-paused
# adapter_instance
cucumber_adapter_instance.trigger('reports/browsing.feature')
# Guard cucumber finishes here
# Show optimized/reduced/summary notifications first, so user isn't spammed
notifications_adapter_instance.resume
# Resume here triggers collected results, including 'reports/browsing.features'
cucumber_adapter_instance.resume
# If no callbacks were called for a certain period of time, the computer could
# do some useful stuff, e.g. rubocop, coverage, static code analysis on files
# changed since last run, etc.
idle_adapter_instance.resume
# pauses all adapters (probably doesn't make sense, since non-interactive
# "background" tasks could do useful stuff in idle mode)
listen.pause Implementation roadmapBecause of how everything is coupled together in Listen, my "plan of step-by-step changes" is as follows:
Any feedback, questions, criticism, suggestions on where I could be going - all much appreciated. [Maybe I should put this on a separate wiki page?] |
Looks really good to me. |
I meant gems like rspec does it: rspec, rspec-core, rspec-mockes, etc. Here's a list of ideas for gems/extensions:
The idea is that listen itself would have to be really, really minimalistic, |
Ok now I get the "like rspec", yeah I like that approach. Seems like a really great list! What do you have in mind for listen-filters-livereload? |
listen-filters-livereload would just allow listen to effectively work like guard-livereload ... except without the guard part. It would be a "quick and dirty" way to temporarily watch/debug parts of a project without Gemfile+Guard+plugins+guard init + tweaking. Like guard-livereload without Guard and EventMachine. It could be useful for adding livereload to non-ruby/non-rails projects without changing anything in the project (e.g. static sites). Though personally I'd prefer to use guard set up in every project I use. I'm thinking that - if actually created - it would demo how to create more "server-like" adapters/filters without having to worry out your own event handling, etc. (listen-filter-tcp would be so similar). I wouldn't be surprised if some non-interactive guard plugins (like guard-livereload) were ultimately moved to listen, where they could be more easily configured from guard. That means moving some "managing" functionality from guard ... and making room in guard for more high-level workflow related features. But currently I have to get polling working perfectly with the new API... |
Ok sounds great! |
TL;DR - I'll rename classes completely (!) and prepare a PR for master - without breaking anything. Crap ... I kept hitting dead ends trying to reliably refactor the existing code - without ending up with huge rewrites. And the way Celluloid responds to errors doesn't help rework things. The challenge: changing things step by step to make room for the new API, while keeping backward compatibility and keeping the version working, and while making the final changes easier to understand for potential contributors. In short: I'd prefer to be quickly making "master branch worthy" changes and not breaking things for as long as possible. There are lots of "hidden" aspects, like there's no need for a
The second thing is with the current classes: While they separate the ideas pretty well, the confusion is in:
So I think there's a way to "reorganize" things without actually breaking anything or changing the overall API:
With the above, backward compatibility wouldn't be an issue, while the "new" API would simply be creating a And I think this is all doable without too many changes in the codebase or even tests. Then, with every adapter instance tied to a directory, an ugly but simple hack is all that's necessary to support the new API - possibly also closing symlink and recursion issues. (The hack/workaround can't be done now, because events aren't separated by directory - they're mashed up together and OS-dependent). What do you think? Would this make things less confusing for contributors or more? I know I'm taking my 'nth stab at this, but I think I'm on the right track now. |
I think that looks good. Don't hesitate to change everything for Listen 3.0 if that makes sense to you. You're doing massive change so that's normal to have massive renaming :) I can't wait to see the new structure taking live! Thanks for your work! |
Perhaps a silly request but seems that the issue matches the topic: is it possible for the new API to pass mtime and other stats? I believe |
@antitoxic - if you're not sending events over a network, you can pretty much get any file info in your callback - ether by calling Ruby API directly, or use the Listen::Record for getting information from Listen's internal structures. Over a network, the TCP::Broadcaster is really only a convenience - you should setup your own server, decide on a protocol you want, and write your own client or Listen adapter (there's no example for this, though). Also, getting stuff from the filesystem is usually cheap (except for virtual machines and network file systems) - and in those cases, RSync is usually probably faster/cheaper anyway. |
I'm looking exactly at the virtual machine scenario. And rsync + ssh is terribly slow method to sync or at least how vagrant do it is terribly slow: https://github.com/mitchellh/vagrant/blob/master/plugins/synced_folders/rsync/helper.rb#L106 So I'm thinking Listen + notifying the vm that changes are made + clients on the vm that listen to events. I'll have a look at Listen::Record. Yesterday I was looking at Listen::File and I saw something like This question wasn't much about the TCP but just an observation on how Listen works and how other watchers work - they provide changed path and it's stats. |
@antitoxic - yeah, rsync+ssh on Windows is probably extremely slow - you may want to try mounting a guest folder as a network folder on the host instead (and run the listen server on the VM). Note that the Listen::Record is mostly for being able to detect which files changes actually occurred (mostly for polling and on OSX) - it's not an optimization or for consistency. In fact, it's best to check the mtime explicitly, because it might have changed by the time the client gets the message. So having just the filename in the API is pretty much just a "hint", nothing more - mtime is even less reliable (especially on FAT and HFS). |
Having spent some time with the
listen
internals, issues and thinking about solutions, rethinking the API constantly comes up.Since I don't know much about listen use cases outside the typical
guard-cucumber
andguard-rspec
usage throughguard
, I'd appreciate and feedback, caveats, issues and requests that could be affected by such API changes.I've had some epiphanies about how to think about tracking files and directories to avoid pitfalls. It's long, it's boring, but it's here: https://gist.github.com/e2/10948802
The text was updated successfully, but these errors were encountered: