Doing source finding before cropping the training area #9

Open
eahussein opened this issue Jun 28, 2022 · 6 comments

@eahussein
Owner

Why do we have to crop the image first and then run source finding separately on the training image and on the whole image?

Can't we just run source finding on the whole image and then split the resulting catalogue into training and testing sets?

@eahussein eahussein changed the title Doing source finding before coping the training area Doing source finding before cropping the training area Jun 29, 2022
@eahussein eahussein added the question Further information is requested label Jun 29, 2022
@eahussein eahussein self-assigned this Jun 29, 2022
@eahussein
Owner Author

eahussein commented Jul 13, 2022

There is a lot to this:

  • It seems that they select every second row of the training DataFrame. But if we already cropped an area for training, why skip rows, i.e. why cut the training data in half?
  • It seems that they use scorer.run() to compare against the ground-truth (GT) data, but I have no idea what the parameters mean or what training has to do with anything here.
  • The training parameter:
    - if it is true, only the training area is used;
    - otherwise, the training area is excluded.
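To make the behaviour described above concrete, here is a minimal sketch, not the repository's actual code. The function name `select_sources`, the column names `ra` and `dec`, and the toy catalogue are all assumptions for illustration; the bounds are the 1400 values quoted elsewhere in this thread.

```python
import pandas as pd

# Bounds quoted in this thread for 1400.
AREA = {"ra_min": -0.2688, "ra_max": 0.0, "dec_min": -29.9400, "dec_max": -29.7265}

def select_sources(df, area, training):
    """Hypothetical helper: keep rows inside the area if training is True,
    otherwise keep rows outside it."""
    inside = (
        df["ra"].between(area["ra_min"], area["ra_max"])
        & df["dec"].between(area["dec_min"], area["dec_max"])
    )
    return df[inside] if training else df[~inside]

# Toy catalogue with assumed column names "ra" and "dec".
catalogue = pd.DataFrame({
    "ra": [-0.25, -0.10, -0.05, -0.50, 0.20, -0.20],
    "dec": [-29.90, -29.80, -29.75, -29.90, -29.80, -29.95],
})

train = select_sources(catalogue, AREA, training=True)
rest = select_sources(catalogue, AREA, training=False)

# "Every second row" of the training DataFrame: positions 0, 2, 4, ...
every_second = train.iloc[::2]
```

With a step of 2, `iloc[::2]` keeps roughly half of the training rows, which is the halving questioned above.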

Screenshot 2022-07-13 at 13 48 46

@eahussein
Owner Author

The above method uses the following coordinates for 1400: {"ra_min": -0.2688, "ra_max": 0.0, "dec_min": -29.9400, "dec_max": -29.7265},

which are the same as the ones used when cropping the data,

so it seems that there is no data leakage.
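The comparison above was done by eye. A small check like the following could automate it; this is only a sketch, and `CROP_AREA`, `same_area`, and the idea that both sets of bounds live in dictionaries of this shape are assumptions, not something confirmed by the code.

```python
import math

# Bounds quoted in this thread for 1400 (used by the scoring step).
SCORER_AREA = {"ra_min": -0.2688, "ra_max": 0.0, "dec_min": -29.9400, "dec_max": -29.7265}

# Hypothetical: the bounds used when cropping, assumed to be stored the same way.
CROP_AREA = {"ra_min": -0.2688, "ra_max": 0.0, "dec_min": -29.9400, "dec_max": -29.7265}

def same_area(a, b, tol=1e-6):
    """Return True when both dictionaries have the same keys and matching values."""
    return a.keys() == b.keys() and all(
        math.isclose(a[k], b[k], abs_tol=tol) for k in a
    )

areas_match = same_area(SCORER_AREA, CROP_AREA)

# A deliberately shifted area, to show what a mismatch looks like.
shifted = dict(CROP_AREA, dec_max=-29.70)
shifted_matches = same_area(SCORER_AREA, shifted)
```

If the two areas differed, sources from the training crop could end up in the scored region, which is the leakage being ruled out here.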

@eahussein
Owner Author

eahussein commented Jul 13, 2022

New questions:

  • Why, then, are there still duplicated sources?
  • Why do we need to separate the training image from the full image, if we can just shuffle the whole DataFrame from the full image later?
  • Why are we skipping every second row ON THE TRAINING DATA?

In the meantime, I will follow my own path unless someone convinces me otherwise
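For reference, the "shuffle the whole df" alternative raised in the questions above could look roughly like this. It is a sketch under assumptions: `shuffle_split`, the 50/50 fraction, and the toy catalogue are illustrative and do not come from the repository.

```python
import pandas as pd

def shuffle_split(df, train_fraction=0.5, seed=0):
    """Hypothetical helper: shuffle all rows, then cut into train and test parts."""
    shuffled = df.sample(frac=1.0, random_state=seed)  # frac=1.0 returns every row in random order
    n_train = int(len(shuffled) * train_fraction)
    return shuffled.iloc[:n_train], shuffled.iloc[n_train:]

# Toy catalogue standing in for the source list from the full image.
catalogue = pd.DataFrame({
    "ra": [-0.01 * i for i in range(10)],
    "dec": [-29.9 + 0.01 * i for i in range(10)],
})

train, test = shuffle_split(catalogue, train_fraction=0.5)
```

Note that a random split draws both sets from the same sky region, whereas the cropped approach keeps them spatially separate. Whether that difference matters here is not settled in this thread.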

@eahussein
Owner Author

That explains why, when we used ML on the testing set, there was no sign of duplicates.
Screenshot 2022-07-14 at 13 48 39
Screenshot 2022-07-14 at 13 49 06

@eahussein
Owner Author

New questions:

  • Why, then, are duplicated sources still found when using np.close?
    • (Maybe the sources are simply close to each other. A weak explanation, but the code above looks good.)
  • Why do we need to separate the training image from the full image, if we can just shuffle the whole DataFrame from the full image later?
  • Why are we skipping every second row ON THE TRAINING DATA?
    - Why are we using the smaller image for training and the bigger image for testing? I don't understand.

In the meantime, I will follow their solution, no problem.
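On the duplication question, a tolerance-based check can be sketched as follows. This assumes the check meant above is NumPy's `np.isclose`; the helper `find_close_pairs`, the tolerance, and the coordinates are made up for illustration.

```python
import numpy as np

def find_close_pairs(ra, dec, atol=1e-4):
    """Hypothetical helper: return index pairs (i, j), i < j, whose RA and Dec
    both agree within atol."""
    ra = np.asarray(ra, dtype=float)
    dec = np.asarray(dec, dtype=float)
    # Broadcasting builds an N x N table: entry (i, j) compares source i with source j.
    close = (
        np.isclose(ra[:, None], ra[None, :], rtol=0.0, atol=atol)
        & np.isclose(dec[:, None], dec[None, :], rtol=0.0, atol=atol)
    )
    # Keep only the upper triangle so each pair is reported once and self-matches are dropped.
    i, j = np.nonzero(np.triu(close, k=1))
    return list(zip(i.tolist(), j.tolist()))

# Toy coordinates: the first two sources are nearly identical, the others are far apart.
ra = [-0.2500, -0.25003, -0.1000, -0.0500]
dec = [-29.9000, -29.90002, -29.8000, -29.7500]

pairs = find_close_pairs(ra, dec)
```

A check like this cannot tell a true duplicate from two distinct sources that happen to lie within the tolerance, which fits the tentative explanation in the list above.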

@eahussein
Owner Author

training area before the slice:
Screenshot 2022-07-14 at 14 31 52

training area after the slice:
Screenshot 2022-07-14 at 14 34 21

eahussein added a commit that referenced this issue Aug 25, 2022