
Bad projection? #30

Closed · LoickCh opened this issue Jun 21, 2022 · 9 comments

@LoickCh commented Jun 21, 2022

Hello,

I have noticed something strange in the definitions of `generate_planes` and `project_onto_planes`. If I have understood correctly, you define three transfer matrices in `generate_planes` that are used to project coordinates in `project_onto_planes`, before keeping only the first two coordinates of the projection.


Problem:

```python
import torch
# generate_planes / project_onto_planes as defined in the repo

B = 2
N_rays = 11
coordinates = torch.randn(B, N_rays, 3)
planes = generate_planes()
out = project_onto_planes(planes, coordinates)
```

If we set P = coordinates[0][0], then since out[:3, 0, :] contains the three projections of P, it should return:

```
[P[0], P[1]]
[P[1], P[2]]
[P[2], P[0]]
```

However, I get:

```
[P[0], P[1]]
[P[0], P[2]]
[P[2], P[0]]
```

In other words, I get [(X,Y), (X,Z), (Z,X)].


Reason:
I believe I have found the reason. You defined the planes with the following matrices:

```
[[1, 0, 0], [0, 1, 0], [0, 0, 1]]
[[1, 0, 0], [0, 0, 1], [0, 1, 0]]
[[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

Let us call these matrices M1, M2, and M3. Their inverses are:

```
M1^{-1} = [[1, 0, 0],
           [0, 1, 0],
           [0, 0, 1]]

M2^{-1} = [[1, 0, 0],
           [0, 0, 1],
           [0, 1, 0]]

M3^{-1} = [[0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]]
```

For a point P = (X, Y, Z), I get:

```
P @ M1^{-1} = (X, Y, Z)
P @ M2^{-1} = (X, Z, Y)
P @ M3^{-1} = (Z, X, Y)
```

Keeping only the first two coordinates then gives: [(X,Y), (X,Z), (Z,X)].
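A quick numerical check of this reasoning (a minimal sketch; the plane matrices are copied from `generate_planes` as quoted above, and P is a hypothetical point):

```python
import torch

# Plane matrices as currently defined in generate_planes (quoted above).
planes = torch.tensor([[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
                       [[1, 0, 0], [0, 0, 1], [0, 1, 0]],
                       [[0, 0, 1], [1, 0, 0], [0, 1, 0]]], dtype=torch.float32)

P = torch.tensor([1., 2., 3.])  # stand-in for (X, Y, Z)
for M in planes:
    # Right-multiply by the inverse, as project_onto_planes does via bmm.
    print(P @ torch.linalg.inv(M))
# tensor([1., 2., 3.])  -> first two coords: (X, Y)
# tensor([1., 3., 2.])  -> first two coords: (X, Z)
# tensor([3., 1., 2.])  -> first two coords: (Z, X)
```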


Possible solution:
Update `generate_planes` to:

```python
torch.tensor([[[1, 0, 0],
               [0, 1, 0],
               [0, 0, 1]],
              [[0, 1, 0],
               [0, 0, 1],
               [1, 0, 0]],
              [[0, 0, 1],
               [1, 0, 0],
               [0, 1, 0]]], dtype=torch.float32)
```
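Running the same check as above against these matrices (a sketch, same hypothetical P) gives the expected pairs:

```python
import torch

fixed_planes = torch.tensor([[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
                             [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
                             [[0, 0, 1], [1, 0, 0], [0, 1, 0]]], dtype=torch.float32)

P = torch.tensor([1., 2., 3.])  # (X, Y, Z)
for M in fixed_planes:
    print((P @ torch.linalg.inv(M))[:2])
# tensor([1., 2.])  -> (X, Y)
# tensor([2., 3.])  -> (Y, Z)
# tensor([3., 1.])  -> (Z, X)
```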

Do not hesitate to tell me if I am misunderstanding something.

@e4s2022 commented Jun 23, 2022

I have a similar question. The inv_planes in the code is actually:

```
tensor([[[1., 0., 0.],
         [0., 1., 0.],
         [0., 0., 1.]],

        [[1., 0., 0.],
         [0., 0., 1.],
         [0., 1., 0.]],

        [[0., 1., 0.],
         [0., 0., 1.],
         [1., 0., 0.]]])
```

According to the PyTorch bmm doc, bmm multiplies the inv_planes on the right. If an input coordinate is [[x, y, z]], the bmm result will be [xy, xz, zx]. However, if we multiply the inv_planes on the left (after transposing the coordinates beforehand), the result will be [xy, xz, yz].

Not a hundred percent sure, though.
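A minimal side-by-side check of the two conventions (a sketch using the inv_planes printed above and a single hypothetical point):

```python
import torch

inv_planes = torch.tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]],
                           [[1., 0., 0.], [0., 0., 1.], [0., 1., 0.]],
                           [[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]]])

# One point (x, y, z) = (1, 2, 3), repeated once per plane for bmm.
coords = torch.tensor([[[1., 2., 3.]]]).expand(3, 1, 3)

right = torch.bmm(coords, inv_planes)[..., :2]  # current code: inv on the right
left = torch.bmm(inv_planes, coords.transpose(1, 2)).transpose(1, 2)[..., :2]
print(right.squeeze(1))  # [[1., 2.], [1., 3.], [3., 1.]] -> [xy, xz, zx]
print(left.squeeze(1))   # [[1., 2.], [1., 3.], [2., 3.]] -> [xy, xz, yz]
```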

@41xu commented Jul 13, 2022

@ericryanchan I also ran into this question about the projection.

@WeichuangLi

I hit the same problem, and I changed the projection to [xy, xz, yz] with the following code:

```python
projections = torch.bmm(inv_planes, torch.transpose(coordinates, 1, 2))
return torch.transpose(projections, 1, 2)[..., :2]
```
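For context, the full modified function would look roughly like this (a sketch based on `project_onto_planes` from this repo; only the last two lines change, and the shape bookkeeping is assumed from the original):

```python
import torch

def project_onto_planes(planes, coordinates):
    """Project 3D points onto each plane, now left-multiplying the inverse
    plane matrices so the projections come out as [xy, xz, yz]."""
    N, M, C = coordinates.shape
    n_planes, _, _ = planes.shape
    coordinates = coordinates.unsqueeze(1).expand(-1, n_planes, -1, -1).reshape(N * n_planes, M, C)
    inv_planes = torch.linalg.inv(planes).unsqueeze(0).expand(N, -1, -1, -1).reshape(N * n_planes, 3, 3)
    # Changed lines: multiply inv_planes on the left of the transposed coordinates.
    projections = torch.bmm(inv_planes, torch.transpose(coordinates, 1, 2))
    return torch.transpose(projections, 1, 2)[..., :2]
```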

After that, however, the visual quality of the generated images appears to have deteriorated instead.

[two images, both generated at 201kimg (top: official version, bottom: changed version)]

@e4s2022 commented Jul 14, 2022

@WeichuangLi

Hey, I got similarly deteriorated results to your top image. Did you strictly preprocess FFHQ according to the given script? In my experience, if you use the original well-cropped FFHQ, the training results will look like your top image. I can confirm it is caused by a mismatch between the camera poses and the face images, since I got the expected training results after re-cropping.

I think you could try re-cropping the images from the FFHQ in-the-wild set first, then train the model with the changed projection code to see if it works. BTW, if you cannot process all the images (70k in total), you can use a subset, say 5k images. Please let me know if you have any updates, thank you.

@WeichuangLi

> Hey, I met similar deteriorated results as your top image shown. […]

Hi @bd20222,

Thanks for your kind advice. I used the same dataset as the official version, obtained by emailing Eric.

After training for a longer time, the model seems to generate much better results. I do not have a good explanation for this; personally, I think it might be caused by different initializations. I have attached the generated images below.

[image: generated samples after longer training]

As for the projection, I think both strategies should work, since each covers all three coordinates: even though the original result is [xy, xz, zx], it still includes the z-coordinate. That said, I think the revised version aligns better with the strategy described in the paper and with my intuition.

Best regards,
Weichuang

@e4s2022 commented Jul 14, 2022

Yuh, I can also get similar training results by following how Eric processed the dataset. Below are my generated faces ([xy, xz, zx] version):

[image: generated faces, original [xy, xz, zx] version]

I agree, since all three coordinates are covered either way. So the faces you attached above were generated with the revised version, i.e., [xy, xz, yz]?

@WeichuangLi

> Yuh, I can also get similar training results by following how Eric processed the dataset […]

Sorry for leaving out that information. Yes, the images attached above were generated with the revised version at 2217kimg. With longer training, the results might improve further.

@e4s2022 commented Jul 14, 2022

Cool, mine is at 2400kimg, but I trained on a subset of FFHQ (~5k images).

Have a nice day. : )

LoickCh closed this as completed on Aug 20, 2022.
@luminohope (Collaborator)

Please see the relevant post here: #67
