Skipping silence sometimes adds delay #884

Open
lesderid opened this issue Aug 4, 2024 · 6 comments

@lesderid

lesderid commented Aug 4, 2024

Actual behaviour

With some songs, skipping silence introduces delay between output and microphone input.

Expected behaviour

Skipping silence doesn't introduce delay for any song.

Steps to reproduce

  1. Play an affected song
  2. Skip silence (with S)
  3. There is now a delay of 100 ms or more between output and microphone input

Details

  • USDX version: v2024.5.1
  • Operating System + version: Windows Server 2019

@barbeque-squared
Member

some songs

Can you check if those songs happen to be mp3 with variable bitrate? I don't know where you can check this on Windows, but Windows has different mp3 handling compared to mac and linux, and variable bitrate has been known to cause issues.

If they are constant bitrate, or not an mp3 in the first place, you're going to need to share an affected song somehow. If it was downloaded through the USDB Syncer, just put the USDB link here (plus the audio format you're using). Otherwise the best way is probably to PM it to me on Discord (see https://usdx.eu/contact/ for the Discord link, then find my username in the #ultrastar-deluxe channel) or to email it to the address on my git commits.

(don't put audio files / real txt's directly on github)

@lesderid
Author

lesderid commented Aug 4, 2024

Can you check if those songs happen to be mp3 with variable bitrate?

I checked some MP3s with a variable bitrate and some with a constant bitrate, and it does indeed seem to only be an issue with variable bitrate MP3s.

(Edit: For reference, I used mediainfo in MSYS2 to check.)

@lesderid
Author

lesderid commented Aug 4, 2024

If it's downloaded through the USDB Syncer

I use my own script to download from USDB. I don't know if USDB Syncer existed when I made it, but at least I didn't know about it.

or not an mp3 in the first place, you're going to somehow need to share an affected song.

My script essentially does ffmpeg -i source.* -q:a 0 output.mp3, which, depending on the input, can output VBR MP3s.

What audio codecs does USDX support in the #MP3 field? Can I just store the source audio in FLAC, or encode to Opus?

@s09bQ5
Collaborator

s09bQ5 commented Aug 4, 2024

In the #MP3 field, USDX supports everything that ffmpeg supports.

VBR MP3 files are not designed for accurate seeking. When we tell libavformat that we want to seek to position x, the library has to guess where x is inside the MP3 file based on the frame sizes it has already seen. And once it has chosen a position inside the file, it has no way to verify which point in time that corresponds to since MP3 frames don't contain time stamps. This works ok if all frames have the same number of bytes, but with VBR this is not the case. The only way to avoid problems like these is to decode the whole audio file from the beginning to the point where you want to skip to.
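To make the guessing problem concrete, here is a small simulation. It is a deliberately naive sketch and not libavformat's actual seek code: it assumes the seek target is turned into a byte offset purely in proportion to file size, and it uses two made-up songs (one at a constant 128 kbps, one that is 320 kbps for the first half and 64 kbps for the second). The names `frame_size` and `guessed_seek_error` are hypothetical, and the bitrates are exaggerated to make the effect obvious.

```python
from bisect import bisect_right
from itertools import accumulate

SAMPLES_PER_FRAME = 1152  # samples in one MPEG-1 Layer III frame


def frame_size(bitrate_kbps, sample_rate=44100):
    """Bytes in one unpadded MPEG-1 Layer III frame."""
    return 144 * bitrate_kbps * 1000 // sample_rate


def guessed_seek_error(frame_sizes, target_s, sample_rate=44100):
    """Seconds by which a proportional byte-offset guess misses target_s.

    Negative means the guess lands earlier in the song than requested.
    """
    frame_s = SAMPLES_PER_FRAME / sample_rate
    duration_s = len(frame_sizes) * frame_s
    # Naive guess (an assumption of this sketch): bytes are spread evenly over time.
    guess_byte = int(sum(frame_sizes) * target_s / duration_s)
    # Find the frame that actually contains that byte.
    frame_ends = list(accumulate(frame_sizes))
    index = min(bisect_right(frame_ends, guess_byte), len(frame_sizes) - 1)
    return index * frame_s - target_s


cbr = [frame_size(128)] * 2000
vbr = [frame_size(320)] * 1000 + [frame_size(64)] * 1000
print(guessed_seek_error(cbr, 26.0))  # within one frame of zero
print(guessed_seek_error(vbr, 26.0))  # several seconds too early
```

With equal frame sizes the guess is off by less than one frame. With the mixed bitrates the same byte offset falls well inside the large early frames, so playback would resume far from where the program believes it is.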

It's not the fault of mp3. The problem is the lack of a container format like matroska or mp4 around the audio data. You can add one e.g. with ffmpeg -i x.mp3 -f matroska -c:a copy x.mka. But if you are not limited to mp3, using a different codec whose data traditionally lives inside a better container would fix this as well.
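For anyone who wants to apply that remux from a download script, a minimal sketch could look like this. It only wraps the exact ffmpeg invocation given above. The function names `build_remux_command` and `remux` are hypothetical, and it assumes `ffmpeg` is on the PATH.

```python
import subprocess
from pathlib import Path


def build_remux_command(src, dst=None):
    """Build the ffmpeg call that copies the audio stream into Matroska.

    If dst is omitted, the output keeps the input name with a .mka suffix.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix(".mka")
    # -c:a copy keeps the MP3 frames as they are; nothing is re-encoded.
    return ["ffmpeg", "-i", str(src), "-f", "matroska", "-c:a", "copy", str(dst)]


def remux(src, dst=None):
    """Run the remux and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(build_remux_command(src, dst), check=True)


print(build_remux_command("x.mp3"))
```

The #MP3 line of the song's txt would presumably then need to name the .mka file instead of the .mp3.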

@lesderid
Author

lesderid commented Aug 4, 2024

In the #MP3 field, USDX supports everything that ffmpeg supports.

Oh, that's good to know.

VBR MP3 files are not designed for accurate seeking. When we tell libavformat that we want to seek to position x, the library has to guess where x is inside the MP3 file based on the frame sizes it has already seen. And once it has chosen a position inside the file, it has no way to verify which point in time that corresponds to since MP3 frames don't contain time stamps. This works ok if all frames have the same number of bytes, but with VBR this is not the case. The only way to avoid problems like these is to decode the whole audio file from the beginning to the point where you want to skip to.

Ah, that clears things up, thanks for the explanation!

It's not the fault of mp3. The problem is the lack of a container format like matroska or mp4 around the audio data. You can add one e.g. with ffmpeg -i x.mp3 -f matroska -c:a copy x.mka. But if you are not limited to mp3, using a different codec whose data traditionally lives inside a better container would fix this as well.

I'm not sure I really understand this. A Matroska file contains timestamps that point to a media track number and a frame number? If so, if I make a Matroska file with my VBR MP3 data (like the ffmpeg example you gave), wouldn't it have to decode the whole audio stream to generate those timestamps? If so, why does it take significantly less time than decoding to a PCM stream?

lesderid@hachi10 /c/G/u/vbrtest> time ffmpeg -i in.mp3 -f matroska -c:a copy - > /dev/null 2>&1

________________________________________________________
Executed in   73.61 millis    fish           external
   usr time    0.00 millis    0.00 micros    0.00 millis
   sys time   31.00 millis    0.00 micros   31.00 millis

vs

lesderid@hachi10 /c/G/u/vbrtest> time ffmpeg -i in.mp3 -f s16le - > /dev/null 2>&1

________________________________________________________
Executed in  252.20 millis    fish           external
   usr time   15.00 millis    0.00 micros   15.00 millis
   sys time   15.00 millis    0.00 micros   15.00 millis

(both fastest of 10 runs)


In any case, I feel like decoding the 'whole' audio file up to the target position would be reasonable for USDX to do in the case of VBR MP3s (in the native MPEG-1/MPEG-2 container). I can't think of many cases where the extra decoding time would exceed the time saved by skipping.

@s09bQ5
Collaborator

s09bQ5 commented Aug 4, 2024

A Matroska file contains timestamps that point to a media track number and a frame number? If so, if I make a Matroska file with my VBR MP3 data (like the ffmpeg example you gave), wouldn't it have to decode the whole audio stream to generate those timestamps? If so, why does it take significantly less time than decoding to a PCM stream?

Matroska files usually contain a KaxCues block that tells FFmpeg where to find some points in time inside the file. In the test file I created from an MP3, these points are 5 seconds apart. In addition to that, Matroska places a time stamp right before every MP3 frame.
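As a rough illustration of why such an index makes seeking exact, here is a simplified model. It is not the real Matroska layout: it just records a (time, byte offset) pair about every 5 seconds and looks up the last pair at or before the target. `build_cues` and `seek` are hypothetical names, and the frame sizes are assumed to be known already.

```python
from bisect import bisect_right


def build_cues(frame_sizes, frame_duration_s, step_s=5.0):
    """Return [(time_s, byte_offset)] with one entry roughly every step_s."""
    cues = []
    offset = 0
    next_cue_s = 0.0
    for i, size in enumerate(frame_sizes):
        start_s = i * frame_duration_s
        if start_s >= next_cue_s:
            cues.append((start_s, offset))
            next_cue_s += step_s
        offset += size
    return cues


def seek(cues, target_s):
    """Return the last cue at or before target_s (or the first cue)."""
    times = [t for t, _ in cues]
    i = max(bisect_right(times, target_s) - 1, 0)
    return cues[i]


# 20 frames of 0.5 s each; the second half uses much larger frames.
sizes = [100] * 10 + [300] * 10
cues = build_cues(sizes, 0.5)
print(seek(cues, 7.0))  # (5.0, 1000)
```

From the returned offset a reader only has to step forward over at most 5 seconds of frames, and the time at that offset is known rather than guessed.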

FFmpeg treats MP3 both as a container format and as an audio codec. The muxing into Matroska only needs the container format code from libavformat. That code will look at each frame to determine how long it is in bytes and (micro)seconds. The code in libavcodec to convert an MP3 frame into PCM samples is not used. It is not full decoding, but every frame has to be looked at in the correct order to calculate the time stamps.
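The "look at each frame without decoding it" step can be sketched like this. It is a minimal illustration rather than FFmpeg's demuxer: it assumes a bare MPEG-1 Layer III stream with no ID3 tags, no Xing/Info header, and no junk between frames. `walk_frames` and `is_vbr` are hypothetical names.

```python
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
SAMPLE_RATES = [44100, 48000, 32000]
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def walk_frames(data):
    """Yield (byte_offset, size_bytes, start_time_s, bitrate_kbps) per frame.

    Only the 4-byte headers are read; the audio payload is never decoded.
    """
    pos = 0
    samples = 0
    while pos + 4 <= len(data):
        b1, b2 = data[pos + 1], data[pos + 2]
        # 0xFF 0xFA/0xFB: frame sync, MPEG-1, Layer III (with or without CRC).
        if data[pos] != 0xFF or (b1 & 0xFE) != 0xFA:
            raise ValueError(f"no MPEG-1 Layer III header at byte {pos}")
        bitrate_idx = b2 >> 4
        rate_idx = (b2 >> 2) & 0x3
        if bitrate_idx in (0, 15) or rate_idx == 3:
            raise ValueError(f"unsupported header at byte {pos}")
        padding = (b2 >> 1) & 0x1
        bitrate = BITRATES_KBPS[bitrate_idx]
        rate = SAMPLE_RATES[rate_idx]
        size = 144 * bitrate * 1000 // rate + padding
        yield pos, size, samples / rate, bitrate
        pos += size
        samples += SAMPLES_PER_FRAME


def is_vbr(data):
    """True if the frames do not all share one bitrate."""
    return len({bitrate for _, _, _, bitrate in walk_frames(data)}) > 1
```

Because each frame's position depends on the size of the one before it, the headers have to be visited in order from the start. That matches the timings earlier in the thread: the remux is much cheaper than decoding to PCM, but it still has to touch every frame.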
