ValueError: converting frame count is not supported. #81
On second thought, is this because batched inference might not be supported? I am using the PyTorch model btw.
Batched inference is supported. The error you're getting, however, suggests that you're taking a query point from one video and using transforms.convert_grid_coordinates to convert it to a different framerate for use with the same video. This seldom makes sense, and it's easy to get it wrong, which is why we prevent the library function from doing it and force you to do it manually (are you converting the framerate or cropping the video? The required operation will be different). In other words, I think I need more details about how and why you're converting across framerates in order to be able to help you.
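For reference, the supported use is a pure spatial rescale where the frame count stays the same. A minimal sketch (the import path and call signature are taken from memory of the demo Colabs, so treat them as assumptions):

```python
import numpy as np
from tapnet.utils import transforms  # import path as in the demo Colabs; adjust if needed

# Supported use: rescale coordinates when only the spatial size changes.
# Query points are in (t, y, x) format; the frame count (8) is identical on both sides.
query_points = np.array([[0.0, 112.0, 112.0]])  # one query at frame 0 of a 224x224 video

rescaled = transforms.convert_grid_coordinates(
    query_points,
    (8, 224, 224),   # (frames, height, width) of the source grid
    (8, 256, 256),   # same frame count, new spatial size
    coordinate_format='tyx')

# Asking it to change the frame count as well, e.g. (8, ...) -> (16, ...), is what raises
# "ValueError: converting frame count is not supported."
```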
Really sorry for the late response. Currently, here is what I am trying to do:
I have a long video of shape, say, (100, 224, 224, 3). I resize it to (100, 256, 256, 3) and randomly sample a bunch of 8-frame clips from it of shape (8, 256, 256, 3); suppose I sample, say, 16 clips per video. As for my query points, I pick N points spaced on a grid and use the same set of N query points for each video. Therefore, my batched input to the model is a video tensor of dimension (16, 8, 256, 256, 3) and queries of dimension (16, N, 3). The video pixels start in the range (0, 255) and are normalized to (-1, 1), and the query points are in the range (0, 255) (since the height and width of each image is 256).
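In code, that setup looks roughly like this (a minimal sketch; the clip sampling, grid spacing, and N = 64 here are illustrative, not the exact values used):

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the setup described above; sampling and grid details are illustrative.
long_video = torch.randint(0, 256, (100, 224, 224, 3), dtype=torch.uint8)  # stand-in video

# Resize every frame to 256x256 (F.interpolate expects (N, C, H, W)).
frames = long_video.permute(0, 3, 1, 2).float()
frames = F.interpolate(frames, size=(256, 256), mode='bilinear', align_corners=False)
frames = frames.permute(0, 2, 3, 1)  # back to (100, 256, 256, 3)

# Randomly sample 16 clips of 8 frames each -> (16, 8, 256, 256, 3).
num_clips, clip_len = 16, 8
starts = torch.randint(0, frames.shape[0] - clip_len + 1, (num_clips,)).tolist()
clips = torch.stack([frames[s:s + clip_len] for s in starts])

# N query points on a regular grid, all queried at frame 0, in (t, y, x) pixel coords.
grid = torch.linspace(16, 240, 8)
yy, xx = torch.meshgrid(grid, grid, indexing='ij')
queries = torch.stack([torch.zeros_like(yy), yy, xx], dim=-1).reshape(-1, 3)  # (64, 3)
queries = queries[None].repeat(num_clips, 1, 1)  # (16, N, 3), same points for every clip

# Normalize pixels from [0, 255] to [-1, 1], as in the TAPIR demos.
clips = clips / 255.0 * 2.0 - 1.0
```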
That is all that I input to the model. I notice that there are other inputs I can feed to the forward function, but I am not using any of them currently since I have been more or less following along with the tutorial in the PyTorch Colab!
Could you post a full stack trace? I don't see anything about this setup that shouldn't work.
Sure! Here is what the stack trace looks like (I did wrap the model with DataParallel to allow for faster multi-batch inference):
Here is what the code for calling TAPIR looks like:
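Roughly, the call is structured like this (a minimal sketch: the checkpoint path is a placeholder, and the constructor and output keys follow the PyTorch Colab rather than the exact script):

```python
import torch
from tapnet.torch import tapir_model  # import path as in the PyTorch Colab

# Sketch of the batched call; checkpoint path and constructor args are placeholders.
model = tapir_model.TAPIR()  # constructed as in the Colab
model.load_state_dict(torch.load('tapir_checkpoint.pt'))
model = torch.nn.DataParallel(model).cuda()
model.eval()

video = torch.zeros(16, 8, 256, 256, 3).cuda()  # clips, already normalized to [-1, 1]
query_points = torch.zeros(16, 64, 3).cuda()    # (t, y, x) queries, same grid per clip

with torch.no_grad():
    outputs = model(video, query_points)

tracks = outputs['tracks']         # expected shape (16, 64, 8, 2)
occlusions = outputs['occlusion']  # expected shape (16, 64, 8)
```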
@cdoersch sorry for the ping! Do you have any insight into what might be going wrong here? Let me know if there is any other information I can provide.
I think the issue is that TAPIR thinks that the time frame index in the query refers to a different number of frames than the video it actually receives.
But @cdoersch, I did a simpler experiment where I am just trying to pass 4 videos from Kubric (tensor dimensions are [4, 256, 256, 3]) to the model as a batch, with the corresponding query points (tensor dim: [4, 256, 3]), and am still running into the same error. Here is the full stack trace:
Specifically,
is where I am making a forward pass with a batch. I think the issue is here: Line 338 in b0c6aa6
You flatten the video batch along the first two dimensions, but never unflatten it. Hence, if the batch size is 1, it will be fine, but if you have a batch size > 1, then the model thinks that you have a single video with batch size * T frames (where T is the number of frames per video).
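To make the shape mismatch concrete, here is a schematic sketch (not the actual tapir_model code):

```python
import torch

b, t, h, w, c = 4, 8, 256, 256, 3
video = torch.zeros(b, t, h, w, c)

# Folding batch and time together is the usual trick for running a 2-D backbone
# over every frame at once:
flat = video.reshape(b * t, h, w, c)

# If the result is never reshaped back to (b, t, ...), downstream code that reads the
# frame count off the leading "time" axis sees 32 frames instead of 8, while the query
# points still describe an 8-frame video -- the mismatch that convert_grid_coordinates
# refuses to handle.
restored = flat.reshape(b, t, h, w, c)  # the step described as missing above
print(flat.shape, restored.shape)  # torch.Size([32, 256, 256, 3]) torch.Size([4, 8, 256, 256, 3])
```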
@cdoersch just bumping this up again, could you please confirm that this is an actual bug in the PyTorch version of the code which prevents batched forward passes? Thanks!
@cdoersch bumping the bump above!
@cdoersch here is a corrected version of the function. Could you please take a look and confirm if this is correct?
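In outline, the fix is to restore the batch/time split after the per-frame pass. A hypothetical helper illustrating that pattern (not the actual corrected function):

```python
import torch

def run_per_frame(video, frame_fn):
    """Apply a per-frame op by folding (B, T) together, then restoring the split.

    Hypothetical helper illustrating the pattern; `frame_fn` stands in for the real
    per-frame feature extraction inside TAPIR.
    """
    b, t = video.shape[:2]
    flat = video.reshape(b * t, *video.shape[2:])
    out = frame_fn(flat)
    return out.reshape(b, t, *out.shape[1:])  # the unflatten step discussed above

video = torch.zeros(4, 8, 256, 256, 3)
feats = run_per_frame(video, lambda x: x)  # identity stands in for the backbone
print(feats.shape)  # torch.Size([4, 8, 256, 256, 3])
```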
Hi all, I tried to reproduce your error as described above, but I could not. Here is what I adjusted in https://colab.sandbox.google.com/github/deepmind/tapnet/blob/master/colabs/torch_tapir_demo.ipynb:
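One way such a batched check can be set up (a sketch only; the exact Colab edit may differ):

```python
import torch

# Stand-ins with the demo shapes; in the Colab, `frames` is the preprocessed video
# and `query_points` are the sampled (t, y, x) queries from the earlier cells.
frames = torch.zeros(8, 256, 256, 3)
query_points = torch.zeros(32, 3)

# Batch the single-video demo inputs by repeating them along a new leading dimension.
batched_frames = frames[None].repeat(4, 1, 1, 1, 1)   # (4, 8, 256, 256, 3)
batched_queries = query_points[None].repeat(4, 1, 1)  # (4, 32, 3)

# The forward pass from the Colab is then called on the batched tensors, e.g.:
#   outputs = model(batched_frames, batched_queries)
#   print(outputs['tracks'].shape)  # (4, 32, 8, 2) once batching works
print(batched_frames.shape, batched_queries.shape)
```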
The output looks reasonable to me.
I want to track points across many short-horizon videos of, say, dimension (8, 256, 256, 3). Suppose I have B such videos and I want to track the same N points in each video. Then, the input that I pass into TAPIR has dimensions (B, 8, 256, 256, 3) for the frames and (B, N, 3) for the queries. Suppose B = 4 and N = 32 for the sake of an example.
However, this always seems to give me a ValueError:
converting frame count is not supported.
Any ideas of what might be happening here?