Very large data dimension size when initialising training #62
So, PINK expects a unique file format that contains the input images to train against. Among other things, you have to specify the dimensionality of the images, their actual dimensions, their data type, and the number of images. Having a quick look at your issue, I am guessing the problem is that PINK is reading a set of bytes at the start of the file and inferring that your data has many (many!) dimensions. What it is actually reading, though, are random bytes belonging to the numpy array that has been written to disk. There is an example of the expected header format in script form here: https://github.com/HITS-AIN/PINK/blob/master/scripts/create_test_image.py A couple of years ago, when I was working with PINK, I did write a very basic ImageWriter helper for creating these files. Hopefully this helps :) |
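For illustration, here is a minimal Python sketch of writing such a file, loosely modelled on the create_test_image.py script linked above. The file name and image array are placeholders, and the exact header fields and their order should be double-checked against PINK's FILE_FORMATS.md:

    import struct
    import numpy as np

    # A stack of training images: (number_of_images, height, width), float32.
    # Random data here, purely as a placeholder.
    images = np.random.rand(3, 128, 128).astype(np.float32)
    n_images, height, width = images.shape

    with open("train.bin", "wb") as f:        # placeholder file name
        # Header fields as int32 values, per my reading of FILE_FORMATS.md.
        f.write(struct.pack("i", 2))          # file format version
        f.write(struct.pack("i", 0))          # file type: 0 = data (images)
        f.write(struct.pack("i", 0))          # data type: 0 = float32
        f.write(struct.pack("i", n_images))   # number of data entries
        f.write(struct.pack("i", 0))          # data layout: 0 = cartesian
        f.write(struct.pack("i", 2))          # dimensionality of each entry
        f.write(struct.pack("i", height))     # dimension 1
        f.write(struct.pack("i", width))      # dimension 2
        f.write(images.tobytes())             # raw float32 pixel values follow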
Ooooh, okay that makes sense! Thanks for the tips and links, I'll have a read through these scripts and see if I can get these images into the right format finally; your ImageWriter looks really really helpful, thanks! |
It was a few years ago when I wrote that toy. Please feel free to reach out if there is anything I can help with.
|
Please find a description of the binary file format at https://github.com/HITS-AIN/PINK/blob/master/FILE_FORMATS.md. |
Thanks, I really appreciate that! |
Hi both,

Thanks for your replies in this thread. Happy to close it if it's annoying having me leave it open, but I thought I'd ask a quick question first!

I've been able to train the SOM and map images onto the trained map using PINK's mapping mode. But is there a way of providing the training images to the trained map and replacing the neuron images that training outputs with the real training images that best match them?

Sorry if that's a really naive question. I'm still struggling with the binary file paradigm, which means that even though I know how to do it in theory, the direct reading (and then comparison) of the mapped_data.bin file and the saved SOM.bin files is a little confusing for me!

@tjgalvin, if there's a specific function in your helper package that does this, I'd be glad to hear about it.

Many thanks for any help on this! |
Sorry for my delayed reply @SpaceMeerkat. The use of binary files can be a little tricky to get around.

I am a little unclear on what you are asking, but I think it is how to list all images that best match a particular BMU of interest. To do this you will need to load the map file, which holds each image's Euclidean distance to every neuron (so its shape is roughly number of images by SOM width by SOM height). You would pick a particular neuron of interest, and then find all images whose minimum Euclidean distance falls at that neuron's coordinate. Does that make sense? Once you have these indices, you can use them to pull out the images of interest from your image binary file. If you are using my code, some of this is already wrapped up in helper functions.

Sorry that there are no doc pages for this old code of mine. I never got around to doing it. |
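As a rough sketch of that lookup, assuming the mapping result has already been loaded into a numpy array of Euclidean distances with shape (number of images, SOM height, SOM width); the array below is a random stand-in and the neuron coordinate is just an example:

    import numpy as np

    # Stand-in for the loaded mapping result: one distance per image per neuron.
    # Assumed shape: (number_of_images, som_height, som_width).
    distances = np.random.rand(3, 10, 10).astype(np.float32)
    n_images, som_height, som_width = distances.shape

    # Best-matching unit (BMU) of every image: the neuron with the smallest distance.
    flat_bmu = distances.reshape(n_images, -1).argmin(axis=1)
    bmu_y, bmu_x = np.unravel_index(flat_bmu, (som_height, som_width))

    # Indices of all images whose BMU is a chosen neuron of interest.
    neuron_row, neuron_col = 4, 7  # example coordinate on the 10x10 grid
    matches = np.where((bmu_y == neuron_row) & (bmu_x == neuron_col))[0]
    print(matches)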
Hey @tjgalvin, don't be sorry, I'm just happy to have any help on this at all! I didn't do a great job of explaining my question, but you were pretty close with what you guessed I was getting at! I was actually aiming to do the reverse: for a particular map neuron, find the image from the training images that best matches it... but from your description of the mapping file I think I can work that out.

It's a great package of helper functions. I can imagine it'll be a pain to go back and write docs for it now, but when used hand-in-hand with the PINK package this makes life so much easier! Thanks for all this help :D |
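For completeness, a sketch of that reverse direction under the same assumed (number of images, SOM height, SOM width) layout; again the distance array is a random stand-in:

    import numpy as np

    # Stand-in for the mapping result: per-image distance to every neuron.
    distances = np.random.rand(3, 10, 10).astype(np.float32)

    # For every neuron on the grid, the index of its best-matching training image.
    best_image_per_neuron = distances.argmin(axis=0)   # shape (som_height, som_width)

    # The single best-matching training image for one particular neuron.
    row, col = 4, 7  # example neuron coordinate
    best_for_that_neuron = distances[:, row, col].argmin()
    print(best_for_that_neuron)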
Hi,
I'm trying to train an SOM (just to get used to the way PINK works, hence the tiny file numbers and parameters below), but when I set the training run going, my output log file grows at a rate of ~1 GB every 30 seconds, filling with information like that shown in the quote below.
I set the training run going using the following:
$Pink --train _scripts/test.bin _pink_out/som.bin --som-width 10 --som-height 10 --num-iter 1 --numrot 4
So, as you can see, I'm keeping this little training run simple, with a small grid size and only 4 rotations per image.
This is for a training run using just 3 images of size 128x128 in float32 format, so I'm confused as to why the number of data entries is so high in the first line of the quote above.
I'd really appreciate any help you can give on this!
Extra info
My images are stored in test.bin using the following numpy raw binary example: ... where stacked_images has shape (128, 128, 3) or (3, 128, 128). (I've tried both orderings to see if that was the problem, but it still causes the issue above.)
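One rough way to see where the huge numbers come from is to read back the first few 32-bit integers of the raw file, i.e. the bytes PINK interprets as a header (assuming test.bin from this issue is in the working directory; the count of eight matches the header fields expected for 2-D data, but check FILE_FORMATS.md for the exact layout):

    import numpy as np

    # The first eight int32 values are what PINK treats as the header. A raw numpy
    # dump has no header at all, so these are just the leading float32 pixels
    # reinterpreted as integers, which is where the enormous "number of data
    # entries" / dimension values in the training log come from.
    header_view = np.fromfile("test.bin", dtype=np.int32, count=8)
    print(header_view)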