Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Mediainfo Dependency #5762

Closed
1 task
joncameron opened this issue Apr 2, 2024 · 8 comments
Closed
1 task

Remove Mediainfo Dependency #5762

joncameron opened this issue Apr 2, 2024 · 8 comments
Assignees

Comments

@joncameron
Copy link
Contributor

Description

Work related to incorrect duration values being written into Avalon metadata for files (see #5754) has brought to attention Avalon's use of an older version of Mediainfo. Mediainfo is only being used for basic information gathering and could be removed as a dependency altogether in favor of the FFmpeg toolchain. This would free Avalon from using this older version of mediainfo, and reduce breaking changes related to mediainfo.

This is separate from the Active Encode gem which still uses mediainfo. Active Encode uses the current version of mediainfo with the updated XML schema and does not have this same problem.

Done Looks Like

  • Mediainfo calls replaced with ffprobe
@joncameron
Copy link
Contributor Author

joncameron commented Apr 10, 2024

A separate issue can be created in the Active Encode repo to remove mediainfo as a dependency there as well.

Edit: samvera-labs/active_encode#136

@joncameron
Copy link
Contributor Author

joncameron commented Sep 4, 2024

While testing this I passed in a video file (https://drive.google.com/file/d/110WAo6DaHKXAre5ksW7cLYsEOrBoDzyB/view?usp=drive_link) that went through as audio:

The raw object info in the dashboard reports the video stream as expected but the encode profile is for audio, and after file upload the flash message appears indicating that Avalon found it to be an audio item.

I tried an upload with lunchroom manners and it seemed fine and reported as a video: https://avalon-dev.dlib.indiana.edu/media_objects/8910jt59d.

I tried another time with the Libya to Somalia file and could reproduce that it identifies as audio. This is all via web upload.

@masaball do you think this is related or should I open a new issue?

@masaball
Copy link
Contributor

masaball commented Sep 4, 2024

Yes, that is a bug with this work. One of the checks we use for identifying video is to confirm that there is a display aspect ratio value. This check was because text files are treated as image/video files by ffprobe but don't generate that field, so we could use it to reject them.

Running the example file through ffprobe, there is no display aspect ratio so it fails the video check. So looks like we will need to go back and reevaluate those checks to avoid false positives/negatives.

@joncameron
Copy link
Contributor Author

My proposal for this would be to use the existence of duration rather than aspect ratio and packet values.

Using ffprobe version 7.0.1 on my local machine, it looks like a plain text file returned an empty hash when I used the parameters in

raw_output = `#{ffprobe} -i "#{media_path}" -v quiet -show_format -show_streams -count_packets -of json`
. A PNG file returned "codec_type": "video" but it doesn't look like duration is present for images. I thought maybe we could just let these pass through, but it looks like they'll just error out during the encode job anyway without getting wrapped up into a 1-frame video file so it doesn't really matter there. If a video stream is present and has a duration value, I assume that should go through the encode process without a problem unless there was some other unrelated issue with the file.

For this check:

return true if @video_stream && @video_stream[:display_aspect_ratio] && @video_stream[:nb_read_packets].to_i > 1

I was thinking something like return true if @video_stream && @video_stream[:duration]

What do you think?

@masaball
Copy link
Contributor

masaball commented Sep 5, 2024

That does look like it should work. Some stuff may slip through but will probably just error in active encode. I am a little baffled because I have a test jpeg that returns a duration when I run the command in the CLI but does not return a duration when processed during the upload. No clue why there is a discrepancy with that, but keying on @video_stream[:duration] does seem to be consistently catching what it should.

Will make the change.

@masaball
Copy link
Contributor

masaball commented Sep 5, 2024

I'm still not sure why there is this discrepancy in how ffprobe is reporting duration (maybe something changes when the file is loaded into minio during upload that changes how ffprobe reads it?) but an image file is failing to be processed properly in the automated tests. I'm going to remove the display aspect ratio test as discussed, but am going to switch back to using frame count for image filtering because that seems to be more consistent than duration.

@cjcolvar
Copy link
Member

cjcolvar commented Sep 6, 2024

ActiveStorage uses marcel to determine mime type which seems like something we could use before ffprobe to deal with this problem.
I think we'd do something like (although I'm not totally sure about how this works with s3 vs. on-disk):

Marcel::MimeType.for IO.binread(media_path, 4.kilobytes)

See https://github.com/rails/rails/blob/main/activestorage/app/models/active_storage/blob/identifiable.rb#L23

@joncameron
Copy link
Contributor Author

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants