The performance during training is always the same #95

Open
KassemKallas opened this issue May 11, 2018 · 10 comments

@KassemKallas

Dear all,
I am trying to train the model on Windows 10 (CPU). The problem I am finding is that the performance doesn't change at all, even though the cost changes a little. If I rerun the training, the performance values change but then stay constant again. Here is a snippet:

B7860 64.00% 64.00% loss: 10794.2373046875 (digits: 1051.9578857421875, presence: 9742.279296875) | X X XX X X XXX X X XX X X X XX |
time for 60 batches 324.8394412994385
PV73LEX 0.0 <-> QM69OTK 0.0
KZ48OUS 1.0 <-> QM69OTK 0.0
XF10UGX 0.0 <-> QM69OTK 0.0
HP51SYY 0.0 <-> QM69OTK 0.0
MQ82HOD 0.0 <-> QM69OTK 0.0
YF62RYQ 0.0 <-> QM69OTK 0.0
LE19HIO 0.0 <-> QM69OTK 0.0
XG44DHU 1.0 <-> QM69OTK 0.0
WM08RYQ 0.0 <-> QM69OTK 0.0
TZ23KIA 0.0 <-> QM69OTK 0.0
FB39LOJ 1.0 <-> QM69OTW 0.0
CP55DID 1.0 <-> QM69OTK 0.0
PN26VBI 0.0 <-> QM69OTK 0.0
FO65FUI 0.0 <-> QM69OTK 0.0
OP09YVZ 1.0 <-> QM69OTK 0.0
SK87TTT 0.0 <-> QM69OTK 0.0
EE78HSB 0.0 <-> QM69OTK 0.0
NM15DHP 1.0 <-> QM69OTK 0.0
WY52RKZ 0.0 <-> QM69OTK 0.0
AE21YYQ 0.0 <-> QM39OTK 0.0
AT37NOB 0.0 <-> QM69OTK 0.0
DD97XRW 0.0 <-> QM69OTK 0.0
DV44XSO 0.0 <-> QM69OTK 0.0
EX56ARF 1.0 <-> QM69OTK 0.0
RN63AOR 1.0 <-> QM69OTK 0.0
SQ19HKQ 1.0 <-> QM69OTK 0.0
QL68VPS 0.0 <-> QM69OTK 0.0
UJ87YEA 0.0 <-> QM69OTK 0.0
VN48ULX 1.0 <-> QM69OTK 0.0
DG23BSJ 0.0 <-> QM69OTK 0.0
GD77UFQ 0.0 <-> QM69OTK 0.0
RN27AOA 0.0 <-> QM69OTK 0.0
QX18QPV 0.0 <-> QM69OTK 0.0
KQ35RDE 1.0 <-> QM69OTK 0.0
IF80QMX 0.0 <-> QM69OTK 0.0
CE21AVV 1.0 <-> QM69OTK 0.0
UB26TQZ 1.0 <-> QM69OTK 0.0
EI30JGL 0.0 <-> QM69OTK 0.0
OU28NEY 1.0 <-> QM69OTK 0.0
MN01XZT 0.0 <-> QM69OTK 0.0
WK15APF 0.0 <-> QM69OTK 0.0
SS66HYB 1.0 <-> QM69OTK 0.0
NW44SQL 0.0 <-> QM69OTK 0.0
XI75LCF 0.0 <-> QM69OTK 0.0
IQ93XRG 0.0 <-> QM69OTK 0.0
NJ17XKK 1.0 <-> QM69OTK 0.0
MV55MGF 0.0 <-> QM69OTK 0.0
DK30EQB 1.0 <-> QM69OTK 0.0
WO74RMB 1.0 <-> QM69OTK 0.0
HV08HRX 0.0 <-> QM69OTK 0.0
B7880 64.00% 64.00% loss: 10789.783203125 (digits: 1051.4071044921875, presence: 9738.3759765625) | X X XX X X XXX X X XX X X X XX |
time for 60 batches 319.3657536506653

Has anyone had a similar issue?

Thank you in advance,
Best

@muneeb991

How long did you train to get 64% accuracy?

@KassemKallas
Author

Sir,

I have been training the model for more than 24 hours and the performance did not change. It started at 64% and remained there.

@muneeb991

After 6 to 7 hours of training, my correct-recognition rate was 0%. It started at 0 and remained at 0 after all that training. I'm training on a GTX 1060 GPU. Any suggestions?

@Cazador6

Cazador6 commented Jul 1, 2018

I have the same issue. Did you manage to fix it?

@WaGjUb

WaGjUb commented Jan 11, 2019

Decrease your learning rate.
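For example, here is a minimal, self-contained TF 1.x sketch (not the project's graph; a dummy quadratic loss stands in for the real one) showing where a smaller learning rate would be plugged into the optimizer. The value is just a placeholder to tune:

import tensorflow as tf

# Toy sketch only: a dummy loss stands in for the project's real loss.
w = tf.Variable(5.0)
loss = tf.square(w - 2.0)

learn_rate = 1e-4  # placeholder: try reducing the current value by 10x or more
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_step)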

@Abduoit

Abduoit commented Jan 28, 2019

How did you guys change the batch_size? It takes only the first 50 images?!

@WaGjUb

WaGjUb commented Feb 1, 2019

@Abduoit Around line 265 of train.py there is a parameter to the train method called batch_size.
But it is not taking only the first 50 images: that value is the batch size, and a different batch is taken for each epoch of training. What does stay the same is the test batch; around line 232, 50 images are taken from the dataset for testing. (See the toy sketch below.)
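Just to illustrate the difference, a toy, self-contained sketch (made-up file names stand in for the project's read_data()/unzip() helpers; this is not the actual train.py code):

import itertools
import random

# Made-up stand-in for the dataset the project reads from disk.
dataset = ["img_%03d.png" % i for i in range(500)]

# The test slice is fixed: always the same first 50 items.
test_set = dataset[:50]

# Training draws a fresh batch of batch_size items for every step.
batch_size = 50

def training_batches():
    while True:
        yield random.sample(dataset, batch_size)

for step, batch in enumerate(itertools.islice(training_batches(), 3)):
    print("step", step, "starts with", batch[0])
print("test set always starts with", test_set[0])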

@Abduoit

Abduoit commented Feb 1, 2019

Thanks @WaGjUb.
Do you mean that the first 50 images we see in the terminal are for testing, not for training? Does this affect the training process?

I found this line in train.py:

test_xs, test_ys = unzip(list(read_data("test/*.png"))[:50])

I changed it to this:

test_xs, test_ys = unzip(list(read_data("test/*.png"))[:batch_size])

and I left the line at the end unchanged, like this:

batch_size=50,

But I don't think this is correct. Any suggestions, please? Should I leave it as it is?

@WaGjUb

WaGjUb commented Feb 2, 2019

@Abduoit

Do you mean that the first 50 images we see in the terminal are for testing, not for training? Does this affect the training process?

Yes, I think so, because training tries to minimize the loss, as you can see around line 175: "train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss)".
It tries to minimize the loss, and the loss is calculated from the prediction results on the test set.

But I don't think this is correct. Any suggestions, please? Should I leave it as it is?

I don't think you have to do it, but it will work as well. You just changed your test size to be the same as the training batch size.
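For reference, a minimal TF 1.x-style sketch (a made-up toy graph, not the project's code) of how minimize() only ever sees the batch that is fed to that particular sess.run call:

import numpy as np
import tensorflow as tf

# Toy graph: a linear model with a mean-squared-error loss.
x = tf.placeholder(tf.float32, [None, 1])
y_true = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([1, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y_true))
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch_xs = np.random.rand(50, 1).astype(np.float32)
    batch_ys = 2.0 * batch_xs
    # The gradient step is computed from whatever is in feed_dict here.
    sess.run(train_step, feed_dict={x: batch_xs, y_true: batch_ys})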

@mazcallu

mazcallu commented Feb 2, 2019

I think there is an error with the get_loss function:

def get_loss(y, y_):
    # Calculate the loss from digits being incorrect. Don't count loss from
    # digits that are in non-present plates.
    digits_loss = tf.nn.softmax_cross_entropy_with_logits(
        tf.reshape(y[:, 1:], [-1, len(common.CHARS)]),
        tf.reshape(y_[:, 1:], [-1, len(common.CHARS)]))

If I understand right, "y" holds the predictions and "y_" the labels, so when calling tf.nn.softmax_cross_entropy_with_logits the parameter order matters. The signature is:

tf.nn.softmax_cross_entropy_with_logits(
    _sentinel=None,
    labels=None,
    logits=None,
    dim=-1,
    name=None
)
So y_ should go in the first position and then y, and the order cannot be reversed: by definition there is a log applied to only one of the terms, so the function is not commutative:

https://stackoverflow.com/questions/36078411/tensorflow-are-my-logits-in-the-right-format-for-cross-entropy-function

So, if I am not wrong, get_loss has a bug and the argument order should be reversed.

Let me know if there is a mistake in my reasoning.
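If that's right, passing them by keyword (labels=..., logits=...) would avoid any ambiguity. And as a quick self-contained check (TF 1.x) that the two orderings really are not interchangeable, with made-up values:

import tensorflow as tf

# Arbitrary placeholder values: one row of logits and a one-hot label.
logits = tf.constant([[2.0, 0.5, 0.1]])
labels = tf.constant([[1.0, 0.0, 0.0]])

right_order = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
swapped = tf.nn.softmax_cross_entropy_with_logits(labels=logits, logits=labels)

with tf.Session() as sess:
    print(sess.run(right_order))  # ~0.32
    print(sess.run(swapped))      # a different value, ~2.03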
