Data Input
This page introduces the data input methods in cxxnet. cxxnet uses data iterators to provide data to the neural network. Iterators do some preprocessing and generate batches for the neural network.
- We provide basic iterators for MNIST, CIFAR-10, Image, and Image Binary.
- To boost performance, we provide a thread buffer for loading.
- Putting a threadbuffer iterator after an input iterator opens an independent thread that fetches from the input, so learning and data fetching run in parallel.
- We recommend using the thread buffer in all cases to avoid an I/O bottleneck.
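The idea behind the thread buffer can be sketched in a few lines of Python. This is only an illustration of the producer/consumer pattern, not cxxnet's implementation; the names `thread_buffer`, `slow_batches`, and `buffer_size` are hypothetical.

```python
import queue
import threading


def thread_buffer(source, buffer_size=2):
    """Yield items from `source`, fetching them on a background thread."""
    q = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the input

    def worker():
        for batch in source:
            q.put(batch)  # blocks while the buffer is full
        q.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = q.get()  # blocks until the worker has produced a batch
        if batch is done:
            return
        yield batch


def slow_batches(n):
    """Stand-in for an input iterator (hypothetical)."""
    for i in range(n):
        yield [i] * 4


batches = list(thread_buffer(slow_batches(3)))
```

While the consumer works on one batch, the worker thread is already fetching the next, which is the overlap of learning and data fetching described above.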
Declare the iterator in the form
iter = iterator_type
options 1 =
options 2 =
...
iter = end
- The basic iterator types are mnist, cifar, image, and imgbin.
- To use the thread buffer, declare the iterator in this form
iter = iterator_type
options 1 =
options 2 =
...
iter = threadbuffer
iter = end
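Because the declaration is plain text, it can also be generated from a script, for example when trying several option settings. The helper below is a hypothetical sketch (`render_iter_block` is not part of cxxnet), and the file paths are placeholders.

```python
def render_iter_block(iter_type, options, use_threadbuffer=True):
    """Return the text of one iterator declaration."""
    lines = ["iter = " + iter_type]
    for key, value in options.items():
        lines.append("{} = {}".format(key, value))
    if use_threadbuffer:
        # The thread buffer goes after the input iterator and its options.
        lines.append("iter = threadbuffer")
    lines.append("iter = end")
    return "\n".join(lines)


block = render_iter_block(
    "mnist",
    {
        "path_img": "path/to/images.gz",    # placeholder path
        "path_label": "path/to/labels.gz",  # placeholder path
        "input_flat": 1,
    },
)
print(block)
```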
= Iterators
shuffle = 1
- shuffle: set to 1 to shuffle the training data.
= MNIST
- Required fields
path_img = path to gz file of image
path_label = path to gz file of label
input_flat = 1
- input_flat selects the shape of each loaded image: 1 loads it flat, in shape 1,1,784; 0 loads it in shape 1,28,28.
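The two shapes can be illustrated with a small reader for the gzipped MNIST IDX image file. This is a sketch using nested lists, not cxxnet's loader; it assumes that input_flat = 1 gives the flat 1,1,784 shape and 0 gives 1,28,28, and the function name `load_mnist_images` is hypothetical.

```python
import gzip
import struct


def load_mnist_images(path_img, input_flat):
    """Read an MNIST IDX image file and shape each image per input_flat."""
    with gzip.open(path_img, "rb") as f:
        # Header: four big-endian 32-bit integers.
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:  # magic number of the MNIST image file
            raise ValueError("not an MNIST image file")
        pixels = f.read(count * rows * cols)
    size = rows * cols
    images = []
    for i in range(count):
        flat = list(pixels[i * size:(i + 1) * size])
        if input_flat:
            images.append([[flat]])  # shape 1,1,784
        else:
            image_rows = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
            images.append([image_rows])  # shape 1,28,28
    return images
```

Both settings hold the same pixel values; only the layout handed to the network differs.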
= CIFAR-10
- Required fields
path = path to CIFAR file folder
input_flat = 0
test = 1
batch1 = 1
batch2 = 0
batch3 = 1
batch4 = 0
batch5 = 0
- test, batch1, batch2, batch3, batch4, and batch5 are binary flags that choose which batch files are used.
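As a rough sketch of how such flags could map to files, the snippet below selects batch files from a folder. The file names are those of the CIFAR-10 binary distribution and are an assumption here; this page does not state which files cxxnet expects, and `select_cifar_files` is a hypothetical helper.

```python
import os


def select_cifar_files(path, flags):
    """Return the batch files enabled by the binary flags in `flags`."""
    # Assumed file names, taken from the CIFAR-10 binary distribution.
    names = {
        "test": "test_batch.bin",
        "batch1": "data_batch_1.bin",
        "batch2": "data_batch_2.bin",
        "batch3": "data_batch_3.bin",
        "batch4": "data_batch_4.bin",
        "batch5": "data_batch_5.bin",
    }
    return [os.path.join(path, names[key])
            for key in names if flags.get(key, 0) == 1]


# Same flag values as the example configuration above.
files = select_cifar_files(
    "data",  # placeholder folder
    {"test": 1, "batch1": 1, "batch2": 0, "batch3": 1, "batch4": 0, "batch5": 0},
)
```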
= Image and Image Binary
There are two ways to load images: the image iterator, which takes a list of images on disk, and the image binary iterator, which reads images from a packed binary file. I/O is usually a bottleneck, and the image binary iterator makes training faster. However, we also provide the image iterator for convenience.
- Preprocessing Options for Image/Image Binary
rand_crop = 1
rand_mirror = 1
divideby = 256
image_mean = "img_mean.bin"
- rand_crop: set to 1 to take a random crop from a larger source image.
- rand_mirror: set to 1 to randomly mirror the training data.
- divideby: normalize the data by dividing by this value.
- image_mean: normalize the data by subtracting the mean of all images. The value is the path to the mean image file. If the file doesn't exist, cxxnet will generate one.
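The four options can be pictured with a small pure-Python sketch on a single-channel image stored as a list of rows. Several details are assumptions made for illustration, not taken from cxxnet: the order of operations, the centered crop used when rand_crop is 0, the 0.5 mirror probability, and a mean image with the same size as the source image. The name `preprocess` is hypothetical.

```python
import random


def preprocess(image, mean, crop_h, crop_w, divideby=256.0,
               rand_crop=1, rand_mirror=1, rng=random):
    """Mean-subtract, crop, optionally mirror, and scale one 2-D image."""
    height, width = len(image), len(image[0])
    if rand_crop:
        # Pick a random top-left corner so the crop stays inside the image.
        top = rng.randint(0, height - crop_h)
        left = rng.randint(0, width - crop_w)
    else:
        # Assumption: use a centered crop when rand_crop is 0.
        top = (height - crop_h) // 2
        left = (width - crop_w) // 2
    # Assumption: mirror with probability 0.5 when rand_mirror is set.
    flip = bool(rand_mirror) and rng.random() < 0.5
    out = []
    for y in range(top, top + crop_h):
        row = [(image[y][x] - mean[y][x]) / divideby
               for x in range(left, left + crop_w)]
        out.append(row[::-1] if flip else row)
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the crop and mirror choices reproducible.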
- Required fields
image_list = path to the image list file
image_root = path to the image folder
The image_list is a tab-separated text file with one image per line. The format is
image_index \t label \t file_name
A valid image list file looks like the following (no header):
1 0 cat.5396.jpg
2 0 cat.11780.jpg
3 1 dog.11254.jpg
4 0 cat.6791.jpg
5 0 cat.7937.jpg
6 1 dog.9329.jpg
- image_root is the path to the folder that contains the files listed in the image list file.
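To make the relation between image_list and image_root concrete, here is a minimal parser for the list format. It is a sketch, not cxxnet's reader; it assumes integer labels as in the example above, and `read_image_list` is a hypothetical name.

```python
import os


def read_image_list(image_list, image_root):
    """Parse an image list file into (index, label, full_path) tuples."""
    entries = []
    with open(image_list) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Fields are separated by tabs: image_index, label, file_name.
            index, label, file_name = line.split("\t", 2)
            entries.append((int(index), int(label),
                            os.path.join(image_root, file_name)))
    return entries
```

Each file name is resolved relative to image_root, so the list file itself only needs bare file names.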
The image binary iterator aims to reduce the I/O cost of random seeks. It is especially useful when dealing with large amounts of data, as in ImageNet.
- Required fields
image_list = path to the image list file
image_bin = path to the image binary file
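Why a packed file helps can be shown with a toy format: each encoded image is stored as a 4-byte length followed by its bytes, so all images come back from one sequential pass over a single file. This layout is made up for illustration and is not cxxnet's actual image binary format, which this page does not describe; `pack_images` and `iter_packed` are hypothetical names.

```python
import struct


def pack_images(blobs, bin_path):
    """Write each encoded image as a 4-byte length followed by its bytes."""
    with open(bin_path, "wb") as out:
        for blob in blobs:
            out.write(struct.pack("<I", len(blob)))
            out.write(blob)


def iter_packed(bin_path):
    """Yield the image byte strings in order with one sequential read."""
    with open(bin_path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                return  # end of file
            (size,) = struct.unpack("<I", header)
            yield f.read(size)
```

Reading thousands of small image files means one open and seek per image, while the packed file is opened once and read front to back.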