
ML results

Venkatesh Iyer edited this page Apr 6, 2023 · 10 revisions

Handwritten Alphanumeric model

Iteration 1:

| Model details | Dataset | Accuracy achieved |
| --- | --- | --- |
| Model training and inference on | Existing dataset | 99.80% |
| Model inference on | NIST dataset | 62.70% |
| Average across all datasets | - | 81.25% |
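
The average row is consistent with an unweighted mean of the per-dataset accuracies. A minimal sketch of that arithmetic, assuming each dataset is weighted equally (the function name `mean_accuracy` is illustrative and not taken from the project):

```python
def mean_accuracy(accuracies):
    """Unweighted mean of per-dataset accuracies, in percent."""
    if not accuracies:
        raise ValueError("need at least one accuracy value")
    return sum(accuracies) / len(accuracies)


# Iteration 1 figures for the alphanumeric model, as reported in the table.
iteration_1 = {"Existing dataset": 99.80, "NIST dataset": 62.70}

average = mean_accuracy(list(iteration_1.values()))
print(f"{average:.2f}%")  # 81.25%
```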

Reasoning - Because the model was trained on the existing dataset, it does not perform well on the NIST dataset.

Iteration 2:

| Model details | Dataset | Accuracy achieved |
| --- | --- | --- |
| Model trained on | Existing dataset + NIST misclassifications | 83.30% |

Reasoning - Training the model on the NIST misclassifications improved accuracy.
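
One way to picture this retraining step: run the current model over labeled NIST samples, keep the ones it gets wrong, and add them to the training set. The sketch below is hypothetical. `predict`, the `(image, label)` tuple format, and every name in it are stand-ins for illustration, not the project's actual pipeline.

```python
from typing import Callable, Iterable, List, Tuple

Sample = Tuple[str, str]  # (image identifier, true label); assumed format


def collect_misclassified(
    predict: Callable[[str], str], samples: Iterable[Sample]
) -> List[Sample]:
    """Return the samples whose predicted label differs from the true label."""
    return [(image, label) for image, label in samples if predict(image) != label]


def toy_predict(image: str) -> str:
    """Toy stand-in for a trained model: always predicts "0"."""
    return "0"


# Made-up samples, used only to exercise the function.
nist_samples = [("img_a", "0"), ("img_b", "7"), ("img_c", "O")]
existing_train = [("img_x", "3")]

hard_examples = collect_misclassified(toy_predict, nist_samples)
augmented_train = existing_train + hard_examples
print(augmented_train)
```

The augmented list would then be fed to whatever training routine the project uses; that part is outside this sketch.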

Iteration 3:

| Model details | Dataset | Accuracy achieved |
| --- | --- | --- |
| Model trained on | Existing dataset + NIST misclassifications | 93.90% |

Reasoning - Training the model on the NIST misclassifications from the previous checkpoint improved accuracy.

Iteration 4:

| Model details | Dataset | Accuracy achieved |
| --- | --- | --- |
| Model trained on | Existing dataset + manually collected dataset from sheets (~1K samples) | 93.90% |

Reasoning - The model was trained from the previous checkpoint with the manually collected data added, which improved accuracy.

Handwritten Digits model

Iteration 1:

| Model details | Dataset | Accuracy achieved |
| --- | --- | --- |
| Model training and inference on | Existing dataset | 99.90% |
| Model inference on | NIST dataset | 60.00% |
| Model inference on | Obtained production data (~50 samples) | 97.70% |
| Average across all datasets | - | 85.80% |

Reasoning - Because the model was trained on the existing dataset, it does not perform well on the NIST dataset.

Iteration 2:

| Model details | Dataset | Accuracy achieved |
| --- | --- | --- |
| Model trained on | Existing dataset + NIST misclassifications | 96.40% |

Reasoning - Training the model on the NIST misclassifications improved accuracy.

Iteration 3:

| Model details | Dataset | Accuracy achieved |
| --- | --- | --- |
| Model trained on | Existing dataset + NIST misclassifications + production dataset (~50 samples) | 99.70% |

Reasoning: Training the model on the NIST misclassifications and the production dataset improved accuracy.

Iteration 4:

| Model details | Dataset | Accuracy achieved |
| --- | --- | --- |
| Model trained on | Existing dataset + NIST misclassifications + manually collected dataset from sheets (~8.6k samples) | 98.30% |

Reasoning: Because accuracy is now averaged over a large production dataset, it dips slightly compared to iteration 3.

Sample dataset images

Existing dataset

Handwritten alphanumeric

0a56f9eb1545428f8d9aea0665b7c460 0b0de241b7e64e918fada2f896efdf24 00b6ef24818b4323b4ada81ad433af67 0ac132e3693c47ccb2fe40fb465ba060 0aeaa24048ff4c5d8312d37671c6e08e 0b7f1f188faf453b83704a069791f28f 0bd15fe615db458781bce7dc32a926b5 0adb585eae814b62a56695650cac0666 0b0ffa9c3dcf497c8da259d391f5f159 0b2c4c540a7247629e1dd7a9abd59962

Handwritten digits

0b1a2b29b0904989a3f6df3a13b93d89 0abb9b222a1142bdaabd30b6742ca8b2 0afd4ac792f14b8185d54cb9d0937b3e 0ac2ad40-7af8-44b4-9829-284a2b33416c_printed 00af4fd32ddc42efbfb1222725599ecf 0aa35cc5-4472-455d-a575-7167c15df849_generated 0a8986b29e804002a3136ddf110f3638 0a1d909f2f2847b795ac40cefa452c3e 0__0040c46a-eae7-4219-bffb-7dd418ad9ffb_up_govt 00c9de07f7b640d083680a5c21661a8b

NIST dataset

Handwritten alphanumeric

hsf_0_00017 hsf_0_00043 hsf_0_00017 hsf_0_00020 hsf_0_00010 hsf_0_00017 hsf_0_00020 hsf_0_00008 hsf_0_00025 hsf_0_00010

Handwritten digits

hsf_0_00015 hsf_0_00010 hsf_0_00017 hsf_0_00009 hsf_0_00117 hsf_0_00009 hsf_0_00012 hsf_0_00019 hsf_0_00009 hsf_0_00013

Manually collected

Handwritten alphanumeric

img014-032 img015-044 img016-045 img023-039 img026-001 img029-045 img013-040 img032-053 img034-055 img011-033

Handwritten digits

33665 33679 33758 33745 33650 33753 31372 33362 31075 31065

Some unhandled misclassifications

18103 28926 607 30658 402 32358 24141 30838 24158 32225

Reasoning: These misclassifications generally occur when the digits are written in the corners of the cell.
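
A minimal sketch of how corner-written digits might be flagged for review, assuming each cell is available as a binarized 2D grid where nonzero means ink. The function names and the 0.25 threshold are illustrative assumptions, not part of the project.

```python
from typing import List, Optional, Tuple


def ink_centroid_offset(cell: List[List[int]]) -> Optional[Tuple[float, float]]:
    """Offset of the ink centroid from the cell center.

    Returned as (row offset, column offset) in fractions of the cell height
    and width. (0.0, 0.0) means centered; magnitudes approaching 0.5 mean the
    ink sits at an edge. Returns None for a blank cell.
    """
    height, width = len(cell), len(cell[0])
    ink = [(r, c) for r, row in enumerate(cell) for c, v in enumerate(row) if v]
    if not ink:
        return None
    mean_r = sum(r for r, _ in ink) / len(ink)
    mean_c = sum(c for _, c in ink) / len(ink)
    return (
        (mean_r - (height - 1) / 2) / height,
        (mean_c - (width - 1) / 2) / width,
    )


def is_off_center(cell: List[List[int]], threshold: float = 0.25) -> bool:
    """True if the ink centroid is farther than `threshold` from the center.

    The default threshold is an arbitrary assumption and would need tuning.
    """
    offset = ink_centroid_offset(cell)
    return offset is not None and max(abs(offset[0]), abs(offset[1])) > threshold


# Toy 4x4 cells: ink in the middle versus ink in the top-left corner.
centered = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
cornered = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(is_off_center(centered), is_off_center(cornered))
```

Cells flagged this way could be inspected separately or excluded when comparing accuracy across datasets.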
