Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ".active" suffix to actively recording files #1672

Open
wants to merge 5 commits into
base: rolling
Choose a base branch
from

Conversation

facontidavide
Copy link

@facontidavide facontidavide commented May 23, 2024

@facontidavide facontidavide requested a review from a team as a code owner May 23, 2024 16:24
@facontidavide facontidavide requested review from emersonknapp and hidmic and removed request for a team May 23, 2024 16:24
Signed-off-by: Davide Faconti <[email protected]>
Copy link
Contributor

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the PR, a few minor comments.

rosbag2_cpp/src/rosbag2_cpp/writers/sequential_writer.cpp Outdated Show resolved Hide resolved
rosbag2_cpp/src/rosbag2_cpp/writers/sequential_writer.cpp Outdated Show resolved Hide resolved
rosbag2_cpp/src/rosbag2_cpp/writers/sequential_writer.cpp Outdated Show resolved Hide resolved
@facontidavide
Copy link
Author

@fujitatomoya thanks for the feedback!

@facontidavide
Copy link
Author

any additional concern blocking this PR?

@facontidavide facontidavide changed the title fix issue #1597 (replaces #1671) fix issue #1597: add suffix "active" to files being written Jun 3, 2024
@defunctzombie
Copy link

@tonynajjar If you have the ability to incorporate this change into your workspace you can give this a try as a fix for #1597

@tonynajjar
Copy link

tonynajjar commented Jun 4, 2024

Thanks for the ping @defunctzombie and thanks for the fix @facontidavide. I'm on humble so after fixing some conflicts in the backport I was able to build.

I also tested it out and looks like it works well (at least my humble backport does):

$ ls
rosbag2_2024_06_04-17_41_56_0.active.mcap

$ ls
metadata.yaml  rosbag2_2024_06_04-17_41_56_0.mcap  rosbag2_2024_06_04-17_41_56_1.active.mcap

@tonynajjar
Copy link

One comment though, the suffix is now .active.mcap. It should be .mcap.active for two reasons:

1- It was like that in ROS1
2- when it's active, it's technically not a valid .mcap file yet

@facontidavide
Copy link
Author

@tonynajjar , that was my original intentions, but since the suffix is added by the storage plugins, that would require more changes in multiple files

@MichaelOrlov
Copy link
Contributor

that was my original intentions, but since the suffix is added by the storage plugins, that would require more changes in multiple files

That is why the solution with creating a temporary empty file with the same name as the original currently recording file but with the .active extention instead of renaming the active file is better.

@MichaelOrlov
Copy link
Contributor

@facontidavide As regards

any additional concern blocking this PR?

Yes. it doesn't handle properly cases when enabled file compression.
See a similar issue in the #1643.

@defunctzombie
Copy link

That is why the solution with creating a temporary empty file with the same name as the original currently recording file but with the .active extention instead of renaming the active file is better.

I disagree that it is "better". It may be technically easier to implement because of other architectural choices in the code but I would not define it as "better" from an end-user standpoint. (My opinion of course).

@facontidavide
Copy link
Author

Ok, based on the input (thanks!) I decided to address this in the individual storage plugins and it seems to work correctly, also when using file compression. All the changes to sequential writer have been reverted.

@MichaelOrlov MichaelOrlov linked an issue Jun 8, 2024 that may be closed by this pull request
@MichaelOrlov MichaelOrlov changed the title fix issue #1597: add suffix "active" to files being written Add ".active" suffix to actively recording files Jun 8, 2024
Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@facontidavide Thanks for your contribution.
The approach with changes in the storage plugins looks more clear.
Although, it causes multiple failures in unit tests from various rosbag2 packages.
I've tried to fix linter issues and expectations in failing tests. However, It would be good to have a dedicated unit test in the rosbag2_storage package to make sure that all plugins adhere the same behavior and also make sure that .active will be removed after closing read_write storage.
Also, need to consider if we want to add the same .active suffix when we are openning storage with append mode. Not sure what would be an implication from that and if we do really need it.
cc: @defunctzombie Opinion about append mode and .active suffix?

@defunctzombie
Copy link

Also, need to consider if we want to add the same .active suffix when we are openning storage with append mode. Not sure what would be an implication from that and if we do really need it.

When does this happen? Is this when someone uses the rosbag2 interfaces as a library to read/update files in some "offline" workflows? Or does this also happen if you start/stop the recording process in some particular way?

My perspective is that the .active behavior is a feature of the recording process, for files it initializes, to indicate a file is actively being written to, not closed, and not finalized. For formats like MCAP that have a defined EOF this is more important than what I know of the sql recording plugin. If I was using a library to open/update my file I don't think I would expect the library to rename my file.

If there is a way to start the recording process and tell it to append to an existing file if one exists (rather than start a new one) I am not sure what to think about that because the existing file may have already been copied elsewhere by the user so modifying it is kind of asking for strangeness with what it might mean for managing recordings on and off robot. If I had to be more principled about what should happen I would say that if the writer has updated the file on-disk to something that has now made it an invalid file (i.e in MCAP removed the footer, etc) then it would be appropriate to mark as .active since the file is in a partially valid state.

@MichaelOrlov
Copy link
Contributor

@defunctzombie We actually don't have a real use case in rosbag2 for using storage in append mode.
It seems it was added just in case for future opportunities, or maybe someone is using it from the library in their own recorder implementation or converter.
I was curious if you have a real use case in mind when it is used.

Ok, let's keep append mode unmodified for a while.

By the way, it seems we don't really support append mode in the mcap storage plugin. As you mentioned, it is supposed to remove the footer from an already recorded mcap file, add more data, and write a new footer. But we are not removing the footer from the already recorded file. I am not sure, but it could be a valid case by not removing a footer for the MCAP standard since we rely on offsets and the reader can skip the footer, which is in the middle of the file. Anyway, it seems no one tried to test this mode.

@oysstu
Copy link
Contributor

oysstu commented Jun 10, 2024

that was my original intentions, but since the suffix is added by the storage plugins, that would require more changes in multiple files

That is why the solution with creating a temporary empty file with the same name as the original currently recording file but with the .active extention instead of renaming the active file is better.

Another possibility is to add a get_storage_extension method to storage_interfaces::ReadWriteInterface, and have the storage plugin expect the full path including the extension.

Like you're suggesting, I have used a separate file on the side with a different extension (.lock). In practice, there's some difference, because if the file is closed improperly all scripts that currently look for files ending with .mcap or .db3 now also have to also look for .active.

@facontidavide
Copy link
Author

@MichaelOrlov thanks a lot for turning CI green!

Since we said that changes to support "append" are not required yet, is there anything blocking this?

@MichaelOrlov
Copy link
Contributor

MichaelOrlov commented Jun 11, 2024

@facontidavide as regards

Since we said that changes to support "append" are not required yet, is there anything blocking this?

After some more investigation and weighting all the pros and cons, I am actually seriously considering closing this PR as wouldn't be merged in favor of the alternative approach of creating a temporary empty file with the same name as the original currently recording file but with the .active extension.

I am still not convinced that the current approach is better than alternative with creating a new empty file with .active extension. I don't envision any pros for the current approach rather than it will be the same approach as in ROS 1.
I don't quite understand why @facontidavide and @defunctzombie so promoting and advocating for the current approach with files renamig. I don't see any benefits in it in comparison to the alternative approach.
Please correct me if I am wrong.

Pros and cons of the current approach from this PR with direct file renaming.

Pros:

  1. It will be the same approach and behavior as in the ROS 1.
  2. For the MCAP format, which has a defined EOF, it will be more obvious for the third-party tools to know if the file was not closed properly.
  • That statement is actually defeated by the MCAP file format itself and MCAP reader implementation. The MCAP file format is designed to be resilient for crashes and abnormal write termination. Also, mcap reader has an option to fall back to the full scan of the mcap file to recreate (indexes) summary if the footer or summary section is not found. And we are using this option by default see
    void MCAPStorage::ensure_summary_read()
    {
    if (!has_read_summary_) {
    const auto status = mcap_reader_->readSummary(mcap::ReadSummaryMethod::AllowFallbackScan);

Cons:

  1. The user will have a hard time figuring out what to do with the .active file in case of the abnormal recorder termination in case of any segfault or power outage. Or even the user will think that the recording was not saved since rosbag2 player or any other tools don't recognize .active file as a valid bag file.
  2. As it was mentioned earlier by the @oysstu in the Add ".active" suffix to actively recording files #1672 (comment) if the file is closed improperly all scripts that currently look for files ending with .mcap or .db3 now also have to also look for .active files.
  3. Since file renaming is done on the storage plugin level and we can't enforce restrictions for third-party developed storage plugins to do the same renaming it will likely be inconsistency in behaviour and wrong expectations for some third-party storage plugins or in-house written recorders.
  4. In the case of using per-file compression, it will be an odd behavior with the race conditions for the scripts which can't be resolved by design. Consider a workflow
    1. some bag_file_name1.mcap.active created.
    2. The bag file is finished and bag_file_name1.mcap.active renamed to the bag_file_name1.mcap.
    3. Another bag_file_name2.mcap.active created.
    4. The file compressor started compressing the bag_file_name1.mcap file and created the bag_file_name1.mcap.gzip file.
    5. The file compressor finished compression and deleted bag_file_name1.mcap.

The scripts that are looking for the finished files will take the bag_file_name1.mcap file on the step 2 and will try to process it, likely try to upload it to the cloud. Although, the file bag_file_name1.mcap will be deleted by the file compressor somewhere in the middle of the uploading to the cloud in the step 5.

Pros and cons of the approach of creating a temporary empty file with the same name as the original currently recording but with the .active extension.

Pros:

  1. It will be a smooth experience for the users in case of the abnormal recorder termination, and it will not be any differences from the current behavior. This is because the MCAP reader already gracefully handles the situation in case of the absent footer or summary section in the MCAP file and tries to recreate (index) "summary" in memory on the fly via parsing the whole file. The parsing will take a bit longer than if it would normally save a summary at the end of the file. However, the entire playback will be the same and timings between messages will be preserved, and the user will not even notice that the map file wasn't properly finished during recording.
  2. The third-party tools will only handle files that don't have counterparts with the same name and .active extension at the end. i.e. easy to write scripts and no disambiguate when and what file is ready to process.
  3. The per-file compression could be properly handled. on step 2 we will not delete the bag_file_name1.mcap.active but rather create another one bag_file_name1.mcap.gzip.active on step 5 and will delete both bag_file_name1.mcap.active and bag_file_name1.mcap.gzip.active on step 6 and script can easily figure out that available only bag_file_name1.mcap.gzip and the right moment when it is ready to handle.
  4. The third-party storage plugins will automatically be compliant with the new feature. i.e., no changes are required on their side.

Cons:

  1. It will be NOT the same approach and behavior as in the ROS 1.
  2. The implementation will need to be done above Bugfix for bag_split event callbacks called to early with file compression #1643
  3. The playback of the abnormally finished mcap file e.g. without a footer or summary section will be a bit delayed due to the on the fly reindexing. However, this drawback could be overcome via using ros2 bag rewrite or mcap-cli tools by recreating indexes and writing a new mcap file from the existent.
  4. Please add if you envision anything else.

@Timple
Copy link

Timple commented Jun 11, 2024

I kind of proposed a similar thing (an active file next to the file being recorded). @tonynajjar had a valid comment on that: #1597 (comment)

However, your proposal by keeping the same name with a suffix actually mitigates that. So seems like a feasible plan.

One note: how will this file be cleaned up in case of crashes? It might stay there forever and syncing scripts might always assume it's being written. Although this can of course be scripted around pragmatically.

@MichaelOrlov
Copy link
Contributor

@Timple As regards your comment

how will this file be cleaned up in case of crashes? It might stay there forever and syncing scripts might always assume it's being written. Although this can of course be scripted around pragmatically.

Well, this is a corner case, which is a challenge for both approaches.
I think that the monitoring script can periodically run ps -s to see if the rosbag2 recorder is still alive. If not, the script can detect the abnormal recorder termination situation and treat all bag files as finished, regardless of whether they have .active counterparts or not.

@facontidavide
Copy link
Author

Sooo... Shal we consider again the option of Linux file locks, then?

Note that they are not mutually exclusive.

It should technically work, even if people don't "see" that as they would with the "active" extension.

@MichaelOrlov
Copy link
Contributor

@facontidavide As regards

Sooo... Shal we consider again the option of Linux file locks, then?

The rosbag2 is one of the core packages, and we need to support Windows as well. I think that Linux file locks would be out of scope for consideration.

@facontidavide
Copy link
Author

I believe Windows have that too.
I can investigate, if you believe that it's worth it.

@oysstu
Copy link
Contributor

oysstu commented Jun 11, 2024

I kind of proposed a similar thing (an active file next to the file being recorded). @tonynajjar had a valid comment on that: #1597 (comment)

Won't the file next to the bag file being recorded have the same bag split suffix? E.g. HHMMSS_mmss_0.active, HHMMSS_mmss_1.active, and so on?

@oysstu
Copy link
Contributor

oysstu commented Jun 11, 2024

I believe Windows have that too. I can investigate, if you believe that it's worth it.

I have no need for this currently, but should it also support network shares (e.g., NFS, SMB)? Because I think that may complicate things even if there is OS support on both Linux and Windows for local files.

@MichaelOrlov
Copy link
Contributor

As regards:

Won't the file next to the bag file being recorded have the same bag split suffix? E.g. HHMMSS_mmss_0.active, HHMMSS_mmss_1.active, and so on?

Yes, it is going to be HHMMSS_mmss_0.mcap.active, HHMMSS_mmss_1.mcap.active, and so on.

@fujitatomoya
Copy link
Contributor

thanks to all comments and information.

In the case of using per-file compression, it will be an odd behavior with the race conditions for the scripts which can't be resolved by design. Consider a workflow

this looks really problematic for user... i think we need to avoid this for sure. but this also looks like it comes from current implementation details. if we can keep the active postfix until compression is completed, it is okay for user?

and i think requirement here is to keep the notice for the user the file id not ready yet in a common way of rosbag2 standard, as long as those are supported i think that is okay for users.

That is why the solution with creating a temporary empty file with the same name as the original currently recording file but with the .active extention instead of renaming the active file is better.

just checking, during recording we can see 2 files, the one is bag_file_name1.mcap (empty) and bag_file_name1.mcap.active? if that is true, there would be more complication for user to determine bag_file_name1.mcap is ready or not? because use extract the postfix from active file then find the empty file is not just ready yet?

in the file system (linux, no idea about windows), when we rename it, we can reuse the same inode without having any temporary one. but creating temporary file generates extra inode to be overwritten. i do not think this is the problem, but wanted to mentioned this because we would use inotify to monitor the directory and files and send it to the other place without scripting...

@jhurliman
Copy link

I kind of proposed a similar thing (an active file next to the file being recorded). @tonynajjar had a valid comment on that: #1597 (comment)

However, your proposal by keeping the same name with a suffix actually mitigates that. So seems like a feasible plan.

One note: how will this file be cleaned up in case of crashes? It might stay there forever and syncing scripts might always assume it's being written. Although this can of course be scripted around pragmatically.

Can you provide a reference for this pattern being used in some other software? I've never heard of it. On the other hand, the .active suffix on in-progress files and renaming after completion follows the same pattern used by all major browsers (with a .part or .crdownload suffix), so I would be inclined to follow that established design pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mark bags that are still being recorded
8 participants