
Cost and Weights are NAN #6

Open
tRosenflanz opened this issue Jul 17, 2017 · 2 comments

@tRosenflanz
Hello,

During training the cost goes to NaN, probably because one of the weights becomes so large that values overflow float32; once that happens, all the other weights become NaN as well. I think the classic way to deal with this is to add Batch Normalization layers, which normalize the intermediate activations and keep them from blowing up, but my limited understanding of Theano and of your script prevents me from testing that out. Also, the cost seems quite high; have you seen similar values in your training? Let me know your thoughts on this:
[screenshot: training output showing the cost values]
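
In case it helps with reproducing, this is roughly the check I run between iterations to see which parameters go bad first (a minimal sketch; `params` is just a placeholder for whatever list of Theano shared variables the model keeps, I have not matched it to the actual names in med2vec.py):

import numpy as np

def find_nonfinite_params(params):
    # Pull each shared variable back to the host and flag any that
    # contain NaN or Inf; the first one to appear is usually the culprit.
    return [p.name for p in params
            if not np.all(np.isfinite(p.get_value(borrow=True)))]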

@mp2893 (Owner) commented Jul 19, 2017

Hi,

I've seen similar cost values to yours in my experiments, although I don't recall running into NaNs.
Since I don't know the details of your experiment (e.g. the dataset, the application, etc.), I can't say for sure why you are running into NaNs, but I think simple gradient clipping would suffice in this situation.
I can't guarantee when, but I will add an option to turn on gradient clipping in the future.
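
For reference, the kind of clipping I have in mind looks roughly like the sketch below (a toy graph, not the actual Med2Vec model; the parameter, cost, and learning rate are all placeholders):

import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
W = theano.shared(np.random.randn(5, 5).astype(theano.config.floatX), name='W')
cost = T.sum(T.dot(x, W) ** 2)

# Clip each gradient element-wise before building the SGD updates, so a
# single exploding gradient cannot push a weight out of float32 range
# in one step.
grads = T.grad(cost, wrt=[W])
clipped = [T.clip(g, -1.0, 1.0) for g in grads]
updates = [(p, p - 0.01 * g) for p, g in zip([W], clipped)]

train = theano.function([x], cost, updates=updates)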

@tRosenflanz (Author) commented Jul 21, 2017

That sounds fair to me. I appreciate the help and understand that you have other things going on!

For all practical purposes my dataset is similar to the one from the paper: it is a list of ICD, CPT, and NDC codes from each patient's visits to the doctor, with the transformations from the provided ReadMe applied (lists of integer indices, with different patients separated by [-1]).
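
Concretely, a toy version of my input would look like this (made-up code indices, just to illustrate the structure):

# Two patients: the first has three visits, the second has one.
# Each inner list holds the integer indices of the codes in one visit,
# and [-1] marks the boundary between patients.
seqs = [[4, 17, 250], [17, 903], [12, 44], [-1], [7, 88]]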

I have tried implementing gradient clipping by adding grad_clip on total_cost in the build_model method, but even with thresholds of -.5 and .5 I still eventually get NaN (probably because that is not the right way to do it).
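
My current guess at why that failed: theano.gradient.grad_clip is an identity on the forward pass and only clips the gradient flowing through the node it wraps, so wrapping total_cost just clips the seed gradient of 1.0 and uniformly rescales everything instead of clipping individual entries. Wrapping an intermediate variable, roughly as in the sketch below, or clipping each parameter gradient as you suggested, is probably what is actually needed (the exp node here is a stand-in, not a variable from build_model):

import theano
import theano.tensor as T
from theano.gradient import grad_clip

x = T.vector('x')
# Identity on the forward pass; on the backward pass the gradient arriving
# at this node is clipped to [-0.5, 0.5] before propagating further down.
h = grad_clip(T.exp(x), -0.5, 0.5)
cost = T.sum(h ** 2)
g = T.grad(cost, x)
f = theano.function([x], g)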
Here is the lengthy output of NanGuardMode, in case it helps:

Med2Vec$ CUDA_VISIBLE_DEVICES=0 THEANO_FLAGS=mode=NanGuardMode python med2vec.py /data/trosenfl/visit.pkl 32228 output.pkl --batch_size 10 --cr_size 500 --vr_size 1000 --window_size 3 --verbose --n_epoch 20
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: Tesla P100-SXM2-16GB (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5110)
initializing parameters
building models
loading data
object
training start
epoch:0, iteration:0/1475771, cost:374.362030
epoch:0, iteration:10/1475771, cost:292.984375
epoch:0, iteration:20/1475771, cost:333.820068
epoch:0, iteration:30/1475771, cost:496.689789
Traceback (most recent call last):
  File "med2vec.py", line 321, in <module>
    train_med2vec(seqFile=args.seq_file, demoFile=args.demo_file, labelFile=args.label_file, outFile=args.out_file, numXcodes=args.n_input_codes, numYcodes=args.n_output_codes, embDimSize=args.cr_size, hiddenDimSize=args.vr_size, batchSize=args.batch_size, maxEpochs=args.n_epoch, L2_reg=args.L2_reg, demoSize=args.demo_size, windowSize=args.window_size, logEps=args.log_eps, verbose=args.verbose)
  File "med2vec.py", line 289, in train_med2vec
    cost = f_grad_shared(x, mask, iVector, jVector)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/vm.py", line 513, in __call__
    storage_map=storage_map)
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/vm.py", line 482, in __call__
    _, dt = self.run_thunk_of_node(current_apply)
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/vm.py", line 402, in run_thunk_of_node
    compute_map=self.compute_map,
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/nanguardmode.py", line 344, in nan_check
    do_check_on(storage_map[var][0], node)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/nanguardmode.py", line 332, in do_check_on
    raise AssertionError(msg)
AssertionError: NaN detected
NanGuardMode found an error in the output of a node in this variable:
GpuElemwise{Composite{(((((((-i0) + (-i1)) / i2) + (((-i3) + (-i4)) / i5)) + (((-i6) + (-i7)) / i8)) + ((-i9) / i10)) + (i11 * i12))}}[(0, 0)] [id A] ''
 |GpuCAReduce{add}{1,1} [id B] ''
 | |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id C] ''
 |   |GpuSubtensor{int64::} [id D] ''
 |   | |GpuFromHost [id E] ''
 |   | | |x [id F]
 |   | |Constant{1} [id G]
 |   |GpuElemwise{add,no_inplace} [id H] ''
 |   | |CudaNdarrayConstant{[[  9.99999994e-09]]} [id I]
 |   | |GpuElemwise{mul,no_inplace} [id J] ''
 |   |   |GpuSubtensor{:int64:} [id K] ''
 |   |   | |GpuSoftmaxWithBias [id L] ''
 |   |   | | |GpuDot22 [id M] ''
 |   |   | | | |GpuElemwise{maximum,no_inplace} [id N] ''
 |   |   | | | | |GpuElemwise{Add}[(0, 0)] [id O] ''
 |   |   | | | | | |GpuDot22 [id P] ''
 |   |   | | | | | | |GpuElemwise{maximum,no_inplace} [id Q] ''
 |   |   | | | | | | | |GpuElemwise{Add}[(0, 0)] [id R] ''
 |   |   | | | | | | | | |GpuDot22 [id S] ''
 |   |   | | | | | | | | | |GpuFromHost [id E] ''
 |   |   | | | | | | | | | |W_emb [id T]
 |   |   | | | | | | | | |GpuDimShuffle{x,0} [id U] ''
 |   |   | | | | | | | |   |b_emb [id V]
 |   |   | | | | | | | |CudaNdarrayConstant{[[ 0.]]} [id W]
 |   |   | | | | | | |W_hidden [id X]
 |   |   | | | | | |GpuDimShuffle{x,0} [id Y] ''
 |   |   | | | | |   |b_hidden [id Z]
 |   |   | | | | |CudaNdarrayConstant{[[ 0.]]} [id W]
 |   |   | | | |W_output [id BA]
 |   |   | | |b_output [id BB]
 |   |   | |Constant{-1} [id BC]
 |   |   |GpuElemwise{mul,no_inplace} [id BD] ''
 |   |     |GpuDimShuffle{0,x} [id BE] ''
 |   |     | |GpuSubtensor{:int64:} [id BF] ''
 |   |     |   |GpuFromHost [id BG] ''
 |   |     |   | |mask [id BH]
 |   |     |   |Constant{-1} [id BC]
 |   |     |GpuDimShuffle{0,x} [id BI] ''
 |   |       |GpuSubtensor{int64::} [id BJ] ''
 |   |         |GpuFromHost [id BG] ''
 |   |         |Constant{1} [id G]
 |   |GpuElemwise{sub,no_inplace} [id BK] ''
 |   | |CudaNdarrayConstant{[[ 1.]]} [id BL]
 |   | |GpuSubtensor{int64::} [id D] ''
 |   |GpuElemwise{mul,no_inplace} [id J] ''
 |GpuCAReduce{add}{1,1} [id BM] ''
 | |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id BN] ''
 |   |GpuSubtensor{:int64:} [id BO] ''
 |   | |GpuFromHost [id E] ''
 |   | |Constant{-1} [id BC]
 |   |GpuElemwise{add,no_inplace} [id BP] ''
 |   | |CudaNdarrayConstant{[[  9.99999994e-09]]} [id I]
 |   | |GpuElemwise{mul,no_inplace} [id BQ] ''
 |   |   |GpuSubtensor{int64::} [id BR] ''
 |   |   | |GpuSoftmaxWithBias [id L] ''
 |   |   | |Constant{1} [id G]
 |   |   |GpuElemwise{mul,no_inplace} [id BD] ''
 |   |GpuElemwise{sub,no_inplace} [id BS] ''
 |   | |CudaNdarrayConstant{[[ 1.]]} [id BL]
 |   | |GpuSubtensor{:int64:} [id BO] ''
 |   |GpuElemwise{mul,no_inplace} [id BQ] ''
 |GpuElemwise{Add}[(0, 1)] [id BT] ''
 | |CudaNdarrayConstant{9.99999993923e-09} [id BU]
 | |GpuCAReduce{add}{1,1} [id BV] ''
 |   |GpuElemwise{mul,no_inplace} [id BD] ''
 |GpuCAReduce{add}{1,1} [id BW] ''
 | |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id BX] ''
 |   |GpuSubtensor{int64::} [id BY] ''
 |   | |GpuFromHost [id E] ''
 |   | |Constant{2} [id BZ]
 |   |GpuElemwise{add,no_inplace} [id CA] ''
 |   | |CudaNdarrayConstant{[[  9.99999994e-09]]} [id I]
 |   | |GpuElemwise{mul,no_inplace} [id CB] ''
 |   |   |GpuSubtensor{:int64:} [id CC] ''
 |   |   | |GpuSoftmaxWithBias [id L] ''
 |   |   | |Constant{-2} [id CD]
 |   |   |GpuElemwise{Composite{((i0 * i1) * i2)},no_inplace} [id CE] ''
 |   |     |GpuDimShuffle{0,x} [id CF] ''
 |   |     | |GpuSubtensor{:int64:} [id CG] ''
 |   |     |   |GpuFromHost [id BG] ''
 |   |     |   |Constant{-2} [id CD]
 |   |     |GpuDimShuffle{0,x} [id CH] ''
 |   |     | |GpuSubtensor{int64:int64:} [id CI] ''
 |   |     |   |GpuFromHost [id BG] ''
 |   |     |   |Constant{1} [id G]
 |   |     |   |Constant{-1} [id BC]
 |   |     |GpuDimShuffle{0,x} [id CJ] ''
 |   |       |GpuSubtensor{int64::} [id CK] ''
 |   |         |GpuFromHost [id BG] ''
 |   |         |Constant{2} [id BZ]
 |   |GpuElemwise{sub,no_inplace} [id CL] ''
 |   | |GpuSubtensor{int64::} [id BY] ''
 |   |GpuElemwise{mul,no_inplace} [id CB] ''
 |GpuCAReduce{add}{1,1} [id CM] ''
 | |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id CN] ''
 |   |GpuSubtensor{:int64:} [id CO] ''
 |   | |GpuFromHost [id E] ''
 |   | |Constant{-2} [id CD]
 |   |GpuElemwise{add,no_inplace} [id CP] ''
 |   | |CudaNdarrayConstant{[[  9.99999994e-09]]} [id I]
 |   | |GpuElemwise{mul,no_inplace} [id CQ] ''
 |   |   |GpuSubtensor{int64::} [id CR] ''
 |   |   | |GpuSoftmaxWithBias [id L] ''
 |   |   | |Constant{2} [id BZ]
 |   |   |GpuElemwise{Composite{((i0 * i1) * i2)},no_inplace} [id CE] ''
 |   |GpuElemwise{sub,no_inplace} [id CS] ''
 |   | |CudaNdarrayConstant{[[ 1.]]} [id BL]
 |   | |GpuSubtensor{:int64:} [id CO] ''
 |   |GpuElemwise{mul,no_inplace} [id CQ] ''
 |GpuElemwise{Add}[(0, 1)] [id CT] ''
 | |CudaNdarrayConstant{9.99999993923e-09} [id BU]
 | |GpuCAReduce{add}{1,1} [id CU] ''
 |   |GpuElemwise{Composite{((i0 * i1) * i2)},no_inplace} [id CE] ''
 |GpuCAReduce{add}{1,1} [id CV] ''
 | |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id CW] ''
 |   |GpuSubtensor{int64::} [id CX] ''
 |   | |GpuFromHost [id E] ''
 |   | |Constant{3} [id CY]
 |   |GpuElemwise{add,no_inplace} [id CZ] ''
 |   | |CudaNdarrayConstant{[[  9.99999994e-09]]} [id I]
 |   | |GpuElemwise{mul,no_inplace} [id DA] ''
 |   |   |GpuSubtensor{:int64:} [id DB] ''
 |   |   | |GpuSoftmaxWithBias [id L] ''
 |   |   | |Constant{-3} [id DC]
 |   |   |GpuElemwise{Composite{(((i0 * i1) * i2) * i3)},no_inplace} [id DD] ''
 |   |     |GpuDimShuffle{0,x} [id DE] ''
 |   |     | |GpuSubtensor{:int64:} [id DF] ''
 |   |     |   |GpuFromHost [id BG] ''
 |   |     |   |Constant{-3} [id DC]
 |   |     |GpuDimShuffle{0,x} [id DG] ''
 |   |     | |GpuSubtensor{int64:int64:} [id DH] ''
 |   |     |   |GpuFromHost [id BG] ''
 |   |     |   |Constant{1} [id G]
 |   |     |   |Constant{-2} [id CD]
 |   |     |GpuDimShuffle{0,x} [id DI] ''
 |   |     | |GpuSubtensor{int64:int64:} [id DJ] ''
 |   |     |   |GpuFromHost [id BG] ''
 |   |     |   |Constant{2} [id BZ]
 |   |     |   |Constant{-1} [id BC]
 |   |     |GpuDimShuffle{0,x} [id DK] ''
 |   |       |GpuSubtensor{int64::} [id DL] ''
 |   |         |GpuFromHost [id BG] ''
 |   |         |Constant{3} [id CY]
 |   |GpuElemwise{sub,no_inplace} [id DM] ''
 |   | |CudaNdarrayConstant{[[ 1.]]} [id BL]
 |   | |GpuSubtensor{int64::} [id CX] ''
 |   |GpuElemwise{mul,no_inplace} [id DA] ''
 |GpuCAReduce{add}{1,1} [id DN] ''
 | |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id DO] ''
 |   |GpuSubtensor{:int64:} [id DP] ''
 |   | |GpuFromHost [id E] ''
 |   | |Constant{-3} [id DC]
 |   |GpuElemwise{add,no_inplace} [id DQ] ''
 |   | |CudaNdarrayConstant{[[  9.99999994e-09]]} [id I]
 |   | |GpuElemwise{mul,no_inplace} [id DR] ''
 |   |   |GpuSubtensor{int64::} [id DS] ''
 |   |   | |GpuSoftmaxWithBias [id L] ''
 |   |   | |Constant{3} [id CY]
 |   |   |GpuElemwise{Composite{(((i0 * i1) * i2) * i3)},no_inplace} [id DD] ''
 |   |GpuElemwise{sub,no_inplace} [id DT] ''
 |   | |CudaNdarrayConstant{[[ 1.]]} [id BL]
 |   | |GpuSubtensor{:int64:} [id DP] ''
 |   |GpuElemwise{mul,no_inplace} [id DR] ''
 |GpuElemwise{Add}[(0, 1)] [id DU] ''
 | |CudaNdarrayConstant{9.99999993923e-09} [id BU]
 | |GpuCAReduce{add}{1,1} [id DV] ''
 |   |GpuElemwise{Composite{(((i0 * i1) * i2) * i3)},no_inplace} [id DD] ''
 |GpuCAReduce{add}{1} [id DW] ''
 | |GpuElemwise{log,no_inplace} [id DX] ''
 |   |GpuElemwise{Composite{(i0 + (i1 / i2))},no_inplace} [id DY] ''
 |     |CudaNdarrayConstant{[  9.99999994e-09]} [id DZ]
 |     |GpuElemwise{Exp}[(0, 0)] [id EA] ''
 |     | |GpuCAReduce{add}{0,1} [id EB] ''
 |     |   |GpuElemwise{mul,no_inplace} [id EC] ''
 |     |     |GpuAdvancedSubtensor1 [id ED] ''
 |     |     | |GpuElemwise{maximum,no_inplace} [id EE] ''
 |     |     | | |W_emb [id T]
 |     |     | | |CudaNdarrayConstant{[[ 0.]]} [id W]
 |     |     | |Elemwise{Cast{int64}} [id EF] ''
 |     |     |   |iVector [id EG]
 |     |     |GpuAdvancedSubtensor1 [id EH] ''
 |     |       |GpuElemwise{maximum,no_inplace} [id EE] ''
 |     |       |Elemwise{Cast{int64}} [id EI] ''
 |     |         |jVector [id EJ]
 |     |GpuAdvancedSubtensor1 [id EK] ''
 |       |GpuCAReduce{add}{0,1} [id EL] ''
 |       | |GpuElemwise{Exp}[(0, 0)] [id EM] ''
 |       |   |GpuDot22 [id EN] ''
 |       |     |GpuElemwise{maximum,no_inplace} [id EE] ''
 |       |     |GpuDimShuffle{1,0} [id EO] ''
 |       |       |GpuElemwise{maximum,no_inplace} [id EE] ''
 |       |Elemwise{Cast{int64}} [id EF] ''
 |GpuFromHost [id EP] ''
 | |Elemwise{Cast{float32}} [id EQ] ''
 |   |Shape_i{0} [id ER] ''
 |     |iVector [id EG]
 |CudaNdarrayConstant{0.0010000000475} [id ES]
 |GpuCAReduce{pre=sqr,red=add}{1,1} [id ET] ''
   |W_emb [id T]



Apply node that caused the error: GpuElemwise{Composite{(((((((-i0) + (-i1)) / i2) + (((-i3) + (-i4)) / i5)) + (((-i6) + (-i7)) / i8)) + ((-i9) / i10)) + (i11 * i12))}}[(0, 0)](GpuCAReduce{add}{1,1}.0, GpuCAReduce{add}{1,1}.0, GpuElemwise{Add}[(0, 1)].0, GpuCAReduce{add}{1,1}.0, GpuCAReduce{add}{1,1}.0, GpuElemwise{Add}[(0, 1)].0, GpuCAReduce{add}{1,1}.0, GpuCAReduce{add}{1,1}.0, GpuElemwise{Add}[(0, 1)].0, GpuCAReduce{add}{1}.0, GpuFromHost.0, CudaNdarrayConstant{0.0010000000475}, GpuCAReduce{pre=sqr,red=add}{1,1}.0)
Toposort index: 138
Inputs types: [CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar)]
Inputs shapes: [(), (), (), (), (), (), (), (), (), (), (), (), ()]
Inputs strides: [(), (), (), (), (), (), (), (), (), (), (), (), ()]
Inputs values: [CudaNdarray(nan), CudaNdarray(-75.6651153564), CudaNdarray(9.0), CudaNdarray(-68.2716598511), CudaNdarray(-68.2715835571), CudaNdarray(8.0), CudaNdarray(-60.6214866638), CudaNdarray(-60.6214637756), CudaNdarray(7.0), CudaNdarray(0.0), CudaNdarray(0.0), CudaNdarray(0.0010000000475), CudaNdarray(462.694641113)]
Outputs clients: [[HostFromGpu(GpuElemwise{Composite{(((((((-i0) + (-i1)) / i2) + (((-i3) + (-i4)) / i5)) + (((-i6) + (-i7)) / i8)) + ((-i9) / i10)) + (i11 * i12))}}[(0, 0)].0)]]
