Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FR] export_frames_media option when exporting datasets #4717

Open
2 of 6 tasks
evatt-harvey-salinger opened this issue Aug 23, 2024 · 2 comments
Open
2 of 6 tasks

[FR] export_frames_media option when exporting datasets #4717

evatt-harvey-salinger opened this issue Aug 23, 2024 · 2 comments
Labels
feature Work on a feature request

Comments

@evatt-harvey-salinger
Copy link
Contributor

evatt-harvey-salinger commented Aug 23, 2024

Proposal Summary

When exporting any dataset with dataset.export(...), the export_media=True option copies the source imagery (or videos) into the export_dir directory. This is especially useful when the rel_dir option is the same export_dir, since that directory can be moved anywhere and reloaded instantly. However, this option only applies to media in sample.filepath, currently.

Frames media in frames.filepath is not exportable in that way, OR importable in that way. It makes sense that not exporting the frames media would be the default; it's much tidier to only depend on the video samples and sample the frames at will. But, there are plenty of use cases where it would be helpful to export and reimport the frames media directly. For example:

  • maybe I hadn’t used dataset.to_frames(sample_frames=True) to sample the frames, but I used some other fancy ffmpeg command. I don’t want to have to rerun it.
  • maybe the video samples were formed by stitching together raw images, and encoding then re-sampling degrades the image quality too much (my actual use case)

So, I'm proposing an option like export_frames_media...

dataset.export(
    ...
    rel_dir=
    export_media=True,         # currently exists, exports samples only
    export_frames_media=True,  # doesn't exist
)

...which gives the user the option to export to frames media into the export_dir along with the samples.

Bonus Issue: rel_dir does not extend to frames.filepath

One related change would be for rel_dir to operate on frame.filepath in the same way it applies to sample.filepath. Lets say I export a video dataset with rel_dir, move the dataset+media to a new directory, then import my dataset with rel_dir. Currently, my sample.filepaths will point to the new directory but my frame.filepaths will still point to the old directory! Even if I moved the sampled frames into the export_dir, the frames would still have to be resampled. This is a clear inconsistency, but I'll link this slack thread for details.

These seems like straightforward and helpful changes! Wish I had more time right now to implement them...

P.S.: passing ffmpeg options to dataset.to_frames could be easier. I followed dataset.to_frames down the rabbit hole to eta.video.FFmpeg and tried to expose the in_opts and out_opts back up the chain to to_frames, but the sparse sampling techniques threw me.

Motivation

Captured above.

What areas of FiftyOne does this feature affect?

  • App: FiftyOne application
  • Core: Core fiftyone Python library
  • Server: FiftyOne server

Details

Captured above.

Willingness to contribute

  • Yes. I can contribute this feature independently
  • Yes. I would be willing to contribute this feature with guidance from the FiftyOne community
  • No. I cannot contribute this feature at this time 😢
@evatt-harvey-salinger evatt-harvey-salinger added the feature Work on a feature request label Aug 23, 2024
@brimoor
Copy link
Contributor

brimoor commented Aug 23, 2024

Hi @evatt-harvey-salinger 👋

To clarify: are you exporting in FiftyOneDataset format?

I think it would be reasonable to just update the exporter to check if a frames.filepath field exists and, if so, assume that users wants that media to be included in the export as well when export_media=True. After all, if the dataset has multiple media fields declared, then it already includes those media in the export.

In fact, a natural thing to do could be to support this:

# Declare that dataset contains frame images that should be treated as media for export (and other purposes)
dataset.app_config.media_fields = ["filepath", "frames.filepath"]

dataset.export(
    export_dir="/path/to/export",
    dataset_type=fo.types.FiftyOneDataset,
    export_media=True,
)

@evatt-harvey-salinger
Copy link
Contributor Author

evatt-harvey-salinger commented Aug 23, 2024

Ah, yes I am specifically exporting to fo.types.FiftyOneDataset.

I like the idea of an "opt-in" option rather than exporting the frames media, since the sampling of frames can be easily recreated at-will in most contexts. That's why I'm suggesting the export_frame_media kwarg.

Interesting, I've never touched the app_config, or declared what was "media" field and what wasn't... maybe I need to learn more in that area, but it strikes me as more intuitive to provide a means of doing it without touching anything "app" related. (In fact, I almost never launch the app for my video datasets, because it takes so long for them to load across my network 🤣 Since I have a ton of high-res videos, I almost always pick a couple videos and/or subset of frames and launch the framesView. Usually, I've only labelled a fraction of the frames anyways...)

Just my two cents as consistent user of video datasets, without deep knowledge of the inner workings. Whatever you think is best!

Edit: Reading up a bit about the concept of multiple media fields, your proposal to use dataset.app_config.media_fields does make more sense. (This is pretty neat, I might try to use it when doing data augmentation stuff..)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Work on a feature request
Projects
None yet
Development

No branches or pull requests

2 participants