The performance during training is always the same #95

Open
KassemKallas opened this issue May 11, 2018 · 10 comments

@KassemKallas

Dear all,
I am trying to train the model on Windows 10 (CPU). The problem I am finding is that the performance doesn't change at all, even though the cost changes a little. If I rerun the training, the performance values change but then stay constant again. Here is a snippet:

B7860 64.00% 64.00% loss: 10794.2373046875 (digits: 1051.9578857421875, presence: 9742.279296875) | X X XX X X XXX X X XX X X X XX |
time for 60 batches 324.8394412994385
PV73LEX 0.0 <-> QM69OTK 0.0
KZ48OUS 1.0 <-> QM69OTK 0.0
XF10UGX 0.0 <-> QM69OTK 0.0
HP51SYY 0.0 <-> QM69OTK 0.0
MQ82HOD 0.0 <-> QM69OTK 0.0
YF62RYQ 0.0 <-> QM69OTK 0.0
LE19HIO 0.0 <-> QM69OTK 0.0
XG44DHU 1.0 <-> QM69OTK 0.0
WM08RYQ 0.0 <-> QM69OTK 0.0
TZ23KIA 0.0 <-> QM69OTK 0.0
FB39LOJ 1.0 <-> QM69OTW 0.0
CP55DID 1.0 <-> QM69OTK 0.0
PN26VBI 0.0 <-> QM69OTK 0.0
FO65FUI 0.0 <-> QM69OTK 0.0
OP09YVZ 1.0 <-> QM69OTK 0.0
SK87TTT 0.0 <-> QM69OTK 0.0
EE78HSB 0.0 <-> QM69OTK 0.0
NM15DHP 1.0 <-> QM69OTK 0.0
WY52RKZ 0.0 <-> QM69OTK 0.0
AE21YYQ 0.0 <-> QM39OTK 0.0
AT37NOB 0.0 <-> QM69OTK 0.0
DD97XRW 0.0 <-> QM69OTK 0.0
DV44XSO 0.0 <-> QM69OTK 0.0
EX56ARF 1.0 <-> QM69OTK 0.0
RN63AOR 1.0 <-> QM69OTK 0.0
SQ19HKQ 1.0 <-> QM69OTK 0.0
QL68VPS 0.0 <-> QM69OTK 0.0
UJ87YEA 0.0 <-> QM69OTK 0.0
VN48ULX 1.0 <-> QM69OTK 0.0
DG23BSJ 0.0 <-> QM69OTK 0.0
GD77UFQ 0.0 <-> QM69OTK 0.0
RN27AOA 0.0 <-> QM69OTK 0.0
QX18QPV 0.0 <-> QM69OTK 0.0
KQ35RDE 1.0 <-> QM69OTK 0.0
IF80QMX 0.0 <-> QM69OTK 0.0
CE21AVV 1.0 <-> QM69OTK 0.0
UB26TQZ 1.0 <-> QM69OTK 0.0
EI30JGL 0.0 <-> QM69OTK 0.0
OU28NEY 1.0 <-> QM69OTK 0.0
MN01XZT 0.0 <-> QM69OTK 0.0
WK15APF 0.0 <-> QM69OTK 0.0
SS66HYB 1.0 <-> QM69OTK 0.0
NW44SQL 0.0 <-> QM69OTK 0.0
XI75LCF 0.0 <-> QM69OTK 0.0
IQ93XRG 0.0 <-> QM69OTK 0.0
NJ17XKK 1.0 <-> QM69OTK 0.0
MV55MGF 0.0 <-> QM69OTK 0.0
DK30EQB 1.0 <-> QM69OTK 0.0
WO74RMB 1.0 <-> QM69OTK 0.0
HV08HRX 0.0 <-> QM69OTK 0.0
B7880 64.00% 64.00% loss: 10789.783203125 (digits: 1051.4071044921875, presence: 9738.3759765625) | X X XX X X XXX X X XX X X X XX |
time for 60 batches 319.3657536506653

Has anyone had a similar issue?

Thank you in advance,
Best

@muneeb991

How long did you train to get 64% accuracy?

@KassemKallas
Author

Sir,

I have been training the model for more than 24 hours and the performance did not change. It started at 64% and remained there.

@muneeb991

After 6 to 7 hours of training, my correct-recognition rate was 0%. It started at 0 and remained at 0 after all that training. I'm training on a GTX 1060 GPU. Any suggestions?

@Cazador6

Cazador6 commented Jul 1, 2018

I have the same issue. Did you manage to fix it?

@WaGjUb

WaGjUb commented Jan 11, 2019

Decrease your learning rate.
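For example, here is a minimal, self-contained TF 1.x sketch (not the project's graph; a dummy quadratic loss stands in for the real one) showing where a smaller learning rate would be plugged into the optimizer. The value is just a placeholder to tune:

import tensorflow as tf

# Toy sketch only: a dummy loss stands in for the project's real loss.
w = tf.Variable(5.0)
loss = tf.square(w - 2.0)

learn_rate = 1e-4  # placeholder: try reducing the current value by 10x or more
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_step)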

@Abduoit

Abduoit commented Jan 28, 2019

How did you guys change the batch_size? It takes only the first 50 images?!

@WaGjUb

WaGjUb commented Feb 1, 2019

@Abduoit Around line 265 of train.py there is a parameter to the train method called batch_size.
But it is not taking only the first 50 images: that value is the batch size, and a different batch is taken for each epoch of training. What does stay the same is the test batch; around line 232, 50 images are taken from the dataset for testing. (See the toy sketch below.)
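Just to illustrate the difference, a toy, self-contained sketch (made-up file names stand in for the project's read_data()/unzip() helpers; this is not the actual train.py code):

import itertools
import random

# Made-up stand-in for the dataset the project reads from disk.
dataset = ["img_%03d.png" % i for i in range(500)]

# The test slice is fixed: always the same first 50 items.
test_set = dataset[:50]

# Training draws a fresh batch of batch_size items for every step.
batch_size = 50

def training_batches():
    while True:
        yield random.sample(dataset, batch_size)

for step, batch in enumerate(itertools.islice(training_batches(), 3)):
    print("step", step, "starts with", batch[0])
print("test set always starts with", test_set[0])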

@Abduoit

Abduoit commented Feb 1, 2019

Thanks @WaGjUb.
Do you mean that the first 50 images we see in the terminal are for testing, not for training? Does this affect the training process?

I found this line in train.py:

test_xs, test_ys = unzip(list(read_data("test/*.png"))[:50])

I changed it to this:

test_xs, test_ys = unzip(list(read_data("test/*.png"))[:batch_size])

and I left the line at the end unchanged, like this:

batch_size=50,

But I don't think this is correct. Any suggestions, please? Should I leave it as it is?

@WaGjUb

WaGjUb commented Feb 2, 2019

@Abduoit

Do you mean that the first 50 images we see in the terminal are for testing, not for training? Does this affect the training process?

Yes, I think so, because training tries to minimize the loss, as you can see around line 175: "train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss)".
It tries to minimize the loss, and the loss is calculated from the prediction results on the test set.

But I don't think this is correct. Any suggestions, please? Should I leave it as it is?

I don't think you have to do it, but it will work as well. You just changed your test size to be the same as the training batch size.
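For reference, a minimal TF 1.x-style sketch (a made-up toy graph, not the project's code) of how minimize() only ever sees the batch that is fed to that particular sess.run call:

import numpy as np
import tensorflow as tf

# Toy graph: a linear model with a mean-squared-error loss.
x = tf.placeholder(tf.float32, [None, 1])
y_true = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([1, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y_true))
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch_xs = np.random.rand(50, 1).astype(np.float32)
    batch_ys = 2.0 * batch_xs
    # The gradient step is computed from whatever is in feed_dict here.
    sess.run(train_step, feed_dict={x: batch_xs, y_true: batch_ys})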

@mazcallu

mazcallu commented Feb 2, 2019

I think there is an error with the get_loss function:

def get_loss(y, y_):
    # Calculate the loss from digits being incorrect. Don't count loss from
    # digits that are in non-present plates.
    digits_loss = tf.nn.softmax_cross_entropy_with_logits(
        tf.reshape(y[:, 1:], [-1, len(common.CHARS)]),
        tf.reshape(y_[:, 1:], [-1, len(common.CHARS)]))

If I understand right, "y" holds the predictions and "y_" the labels, so when calling tf.nn.softmax_cross_entropy_with_logits the parameter order matters. The signature is:

tf.nn.softmax_cross_entropy_with_logits(
    _sentinel=None,
    labels=None,
    logits=None,
    dim=-1,
    name=None
)
So y_ should go in the first position and then y, and the order cannot be reversed: by definition there is a log applied to only one of the terms, so the function is not commutative:

https://stackoverflow.com/questions/36078411/tensorflow-are-my-logits-in-the-right-format-for-cross-entropy-function

So, if I am not wrong, get_loss has a bug and the argument order should be reversed.

Let me know if there is a mistake in my reasoning.
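If that's right, passing them by keyword (labels=..., logits=...) would avoid any ambiguity. And as a quick self-contained check (TF 1.x) that the two orderings really are not interchangeable, with made-up values:

import tensorflow as tf

# Arbitrary placeholder values: one row of logits and a one-hot label.
logits = tf.constant([[2.0, 0.5, 0.1]])
labels = tf.constant([[1.0, 0.0, 0.0]])

right_order = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
swapped = tf.nn.softmax_cross_entropy_with_logits(labels=logits, logits=labels)

with tf.Session() as sess:
    print(sess.run(right_order))  # ~0.32
    print(sess.run(swapped))      # a different value, ~2.03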
