
Dataset issue #2

Open
Kannadasa opened this issue May 12, 2020 · 20 comments

Comments

@Kannadasa

Hi,

I am testing your model, but I am not getting the desired output. I think I am not distributing the data properly into the train and valid folders.

Please let me know how you are creating the folder structure and loading the images for the train and valid datasets. This is for binary classification.

@LelisThanos

Hello,
Same issue here: I am having trouble reproducing your code when it comes to loading and distributing the images into train and validation datasets.

@muhammedtalo
Owner

You may use https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
for splitting the datasets.
I have also provided our results for three classes. Please see the COVID-19 main repository.
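
For reference, a minimal sketch of how scikit-learn's StratifiedKFold produces fold indices; the file names and labels below are placeholders, not the repository's actual data:

from sklearn.model_selection import StratifiedKFold

# Toy stand-ins for the real file list and labels (replace with your own data).
image_paths = [f"img_{i}.png" for i in range(25)]
labels = ["Covid-19"] * 5 + ["No_findings"] * 20

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, valid_idx) in enumerate(skf.split(image_paths, labels)):
    print(f"fold {fold}: {len(train_idx)} train / {len(valid_idx)} valid")

Each fold keeps the 1:4 class ratio, so the small Covid-19 class is represented in every validation split.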

@Kannadasa
Author

Do we know the actual results of the X-ray images? Or can I assume that all 125 X-ray images inside the Covid-19 folder are COVID-19 positive?

Thanks
Kannadasan

@muhammedtalo
Owner

Do we know the actual results of the X-ray images? Or can I assume that all 125 X-ray images inside the Covid-19 folder are COVID-19 positive?

Thanks
Kannadasan
Yes, the X-ray images inside the Covid-19 folder are COVID-19 positive. The folder names correspond to the diagnosis results.

@Kannadasa
Author

Hi Muhammed,

Have you got the code to implement K-Fold on the datasets?

Thanks
Kannadasan

@Kannadasa
Author

Hi Muhammed,
With 125 images in the Covid-19 folder and 500 images in the No_findings folder, are we not dealing with an imbalanced dataset?

The reason I am asking is that I trained your model using K-Fold splits, but I am getting only 58% accuracy. I am printing below the output of one of the iterations, and I think something is wrong somewhere in my code.

epoch | train_loss | valid_loss | accuracy | time

0 | 0.003996 | 0.006837 | 1.000000 | 02:26
1 | 0.002480 | 0.004546 | 1.000000 | 02:10
2 | 0.001856 | 0.002552 | 1.000000 | 02:06
3 | 0.001408 | 0.001160 | 1.000000 | 02:04
4 | 0.001097 | 0.000621 | 1.000000 | 02:06
5 | 0.000911 | 0.000315 | 1.000000 | 02:11
6 | 0.000743 | 0.000152 | 1.000000 | 02:10
7 | 0.000620 | 0.000084 | 1.000000 | 02:12
8 | 0.000522 | 0.000066 | 1.000000 | 02:10
9 | 0.000442 | 0.000044 | 1.000000 | 02:09
10 | 0.000372 | 0.000033 | 1.000000 | 02:10
11 | 0.000316 | 0.000022 | 1.000000 | 02:09
12 | 0.000272 | 0.000018 | 1.000000 | 02:10
13 | 0.000233 | 0.000018 | 1.000000 | 02:10
14 | 0.000201 | 0.000017 | 1.000000 | 02:08
15 | 0.000173 | 0.000017 | 1.000000 | 02:10
16 | 0.000149 | 0.000015 | 1.000000 | 02:08
17 | 0.000129 | 0.000014 | 1.000000 | 02:10
18 | 0.000112 | 0.000014 | 1.000000 | 02:06
19 | 0.000097 | 0.000014 | 1.000000 | 02:07
20 | 0.000084 | 0.000015 | 1.000000 | 02:05
21 | 0.000074 | 0.000014 | 1.000000 | 02:07
22 | 0.000064 | 0.000014 | 1.000000 | 02:07
23 | 0.000056 | 0.000011 | 1.000000 | 02:07
24 | 0.000049 | 0.000010 | 1.000000 | 02:07
25 | 0.000043 | 0.000009 | 1.000000 | 02:07
26 | 0.000038 | 0.000009 | 1.000000 | 02:09
27 | 0.000034 | 0.000008 | 1.000000 | 02:10
28 | 0.000030 | 0.000007 | 1.000000 | 02:10
29 | 0.000026 | 0.000007 | 1.000000 | 02:10
30 | 0.000023 | 0.000007 | 1.000000 | 02:10
31 | 0.000020 | 0.000007 | 1.000000 | 02:11
32 | 0.000018 | 0.000007 | 1.000000 | 02:06
33 | 0.000016 | 0.000006 | 1.000000 | 02:07
34 | 0.000014 | 0.000006 | 1.000000 | 02:06
35 | 0.000012 | 0.000006 | 1.000000 | 02:08
36 | 0.000011 | 0.000005 | 1.000000 | 02:07
37 | 0.000010 | 0.000005 | 1.000000 | 02:07
38 | 0.000009 | 0.000005 | 1.000000 | 02:07
39 | 0.000008 | 0.000005 | 1.000000 | 02:10
40 | 0.000007 | 0.000005 | 1.000000 | 02:09
41 | 0.000006 | 0.000005 | 1.000000 | 02:10
42 | 0.000006 | 0.000005 | 1.000000 | 02:08
43 | 0.000005 | 0.000005 | 1.000000 | 02:10
44 | 0.000005 | 0.000004 | 1.000000 | 02:11
45 | 0.000004 | 0.000004 | 1.000000 | 02:12
46 | 0.000004 | 0.000004 | 1.000000 | 02:10
47 | 0.000003 | 0.000004 | 1.000000 | 02:13
48 | 0.000003 | 0.000004 | 1.000000 | 02:06
49 | 0.000003 | 0.000004 | 1.000000 | 02:07
50 | 0.000003 | 0.000004 | 1.000000 | 02:12
51 | 0.000002 | 0.000004 | 1.000000 | 02:10
52 | 0.000002 | 0.000004 | 1.000000 | 02:12
53 | 0.000002 | 0.000004 | 1.000000 | 02:10
54 | 0.000002 | 0.000004 | 1.000000 | 02:10
55 | 0.000002 | 0.000004 | 1.000000 | 02:07
56 | 0.000002 | 0.000003 | 1.000000 | 02:09
57 | 0.000001 | 0.000003 | 1.000000 | 02:10
58 | 0.000001 | 0.000004 | 1.000000 | 02:08
59 | 0.000001 | 0.000004 | 1.000000 | 02:09
60 | 0.000001 | 0.000004 | 1.000000 | 02:10
61 | 0.000001 | 0.000004 | 1.000000 | 02:12
62 | 0.000001 | 0.000004 | 1.000000 | 02:09
63 | 0.000001 | 0.000004 | 1.000000 | 02:09
64 | 0.000001 | 0.000003 | 1.000000 | 02:08
65 | 0.000001 | 0.000003 | 1.000000 | 02:09
66 | 0.000001 | 0.000003 | 1.000000 | 02:10
67 | 0.000001 | 0.000004 | 1.000000 | 02:09
68 | 0.000001 | 0.000004 | 1.000000 | 02:11
69 | 0.000001 | 0.000003 | 1.000000 | 02:08
70 | 0.000001 | 0.000003 | 1.000000 | 02:09
71 | 0.000001 | 0.000003 | 1.000000 | 02:08
72 | 0.000001 | 0.000003 | 1.000000 | 02:08
73 | 0.000001 | 0.000003 | 1.000000 | 02:07
74 | 0.000001 | 0.000003 | 1.000000 | 02:06
75 | 0.000001 | 0.000003 | 1.000000 | 02:08
76 | 0.000001 | 0.000003 | 1.000000 | 02:07
77 | 0.000001 | 0.000003 | 1.000000 | 02:07
78 | 0.000001 | 0.000003 | 1.000000 | 02:07
79 | 0.000001 | 0.000003 | 1.000000 | 02:07
80 | 0.000001 | 0.000003 | 1.000000 | 02:06
81 | 0.000000 | 0.000004 | 1.000000 | 02:08
82 | 0.000000 | 0.000003 | 1.000000 | 02:11
83 | 0.000000 | 0.000003 | 1.000000 | 02:08
84 | 0.000000 | 0.000003 | 1.000000 | 02:10
85 | 0.000000 | 0.000003 | 1.000000 | 02:09
86 | 0.000000 | 0.000003 | 1.000000 | 02:08
87 | 0.000000 | 0.000003 | 1.000000 | 02:08
88 | 0.000000 | 0.000003 | 1.000000 | 02:09
89 | 0.000000 | 0.000003 | 1.000000 | 02:09
90 | 0.000000 | 0.000003 | 1.000000 | 02:08
91 | 0.000000 | 0.000003 | 1.000000 | 02:08
92 | 0.000000 | 0.000003 | 1.000000 | 02:09
93 | 0.000000 | 0.000003 | 1.000000 | 02:09
94 | 0.000000 | 0.000003 | 1.000000 | 02:09
95 | 0.000000 | 0.000003 | 1.000000 | 02:09
96 | 0.000000 | 0.000003 | 1.000000 | 02:11
97 | 0.000000 | 0.000003 | 1.000000 | 02:11
98 | 0.000000 | 0.000003 | 1.000000 | 02:09
99 | 0.000000 | 0.000003 | 1.000000 | 02:08
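
As an editorial side note on the imbalance question above: with 125 Covid-19 images against 500 No_findings images, one common mitigation is to weight the loss by inverse class frequency. The snippet below is a generic sketch (PyTorch loss plus scikit-learn class weights), not something the repository itself is shown to do in this thread:

import numpy as np
import torch
import torch.nn as nn
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical label array: 125 Covid-19 images (class 0) and 500 No_findings images (class 1).
y = np.array([0] * 125 + [1] * 500)

weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
loss_func = nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))
# With fastai v1 a weighted loss can be handed to the learner, e.g. Learner(data, model, loss_func=loss_func).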

@muhammedtalo
Owner

(quoting the previous comment and its training log in full, as shown above)

It seems you are using the test set during the training.
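
One common way to avoid that kind of leakage is to hold out a test set before any cross-validation. A minimal sketch, assuming a dataframe df with one row per image and the label in column 'y' (as produced by fastai's data.to_df() later in this thread):

from sklearn.model_selection import train_test_split, StratifiedKFold

# Hold out 15% of the images as a test set that is never touched during training.
trainval_df, test_df = train_test_split(df, test_size=0.15, stratify=df["y"], random_state=42)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, valid_idx in skf.split(trainval_df.index, trainval_df["y"]):
    pass  # train on train_idx, validate on valid_idx; evaluate on test_df only at the very end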

@Kannadasa
Author

I have not created any test set during the training.

All I did was split the data using StratifiedKFold with a 20% split, i.e. KFold with n_splits=5.
Then I ran 5 iterations during training, with 100 epochs each.

In each iteration, 20% of my entire dataset acts as the test set.

I used StratifiedKFold to split the data so that each fold contains a proportional share of every class.

For example, this is how the data is split during training:

[ 25 26 27 28 ... 621 622 623 624] [ 0 1 2 3 ... 221 222 223 224]
[ 0 1 2 3 ... 621 622 623 624] [ 25 26 27 28 ... 321 322 323 324]
[ 0 1 2 3 ... 621 622 623 624] [ 50 51 52 53 ... 421 422 423 424]
[ 0 1 2 3 ... 621 622 623 624] [ 75 76 77 78 ... 521 522 523 524]
[ 0 1 2 3 ... 521 522 523 524] [100 101 102 103 ... 621 622 623 624]

@Kannadasa
Author

Also, whatever dataset I am using is only the training set and validation set. My test set consists of completely unseen X-ray images, and the accuracy I am getting on it is 67%.
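
For reference, fastai v1 lets you export a trained learner and run it on completely unseen images; a minimal sketch (the file names are placeholders):

from fastai.vision import load_learner, open_image

# After training: learn.export()  -> writes export.pkl into the data path.
learn = load_learner(path)                 # loads path/'export.pkl' by default
img = open_image('unseen_xray.png')        # placeholder file name
pred_class, pred_idx, probs = learn.predict(img)
print(pred_class, probs)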

@AmiZya

AmiZya commented May 23, 2020

@Kannadasa can you please provide the code you used for KFolds?

@Kannadasa
Author

Please find below my code for KFolds.

from fastai.vision import *   # fastai v1; provides ImageList, imagenet_stats, etc.
from sklearn.model_selection import KFold, StratifiedKFold

kf = KFold(n_splits=5)
skf = StratifiedKFold(n_splits=5)

# Load everything without a split, just to get a dataframe of file names and labels.
data = (ImageList.from_folder(path)
        .split_none()
        .label_from_folder()
        .transform(size=(256, 256))
        .databunch()).normalize(imagenet_stats)

df = data.to_df()

for train_index, test_index in skf.split(df.index, df['y']):
    print(len(train_index), len(test_index))
    print(train_index, test_index)

    # Rebuild the databunch for this fold from the split indices.
    d = (ImageList.from_folder(path)
         .split_by_idxs(train_index, test_index)
         .label_from_folder()
         .transform(size=(256, 256))
         .databunch(num_workers=0)).normalize(imagenet_stats)

@AmiZya

AmiZya commented May 23, 2020

Thanks, much appreciated.

On a side note, did you manage to get higher accuracy? I'm running the model now and it sits around 78% for the three-class model.

@Kannadasa
Author

Hi,

I did not test 3 classes; I tested only 2 classes. My KFold code is also for 2 classes.

With the training set and validation set, the model works fine, but I am not getting good accuracy on new data that the model has not seen before.

Thanks
Kannadasan

@Kannadasa
Author

Are you using KFold to split the data for the 3-class prediction?

Is my KFold split code working for you with 3 classes?

Thanks
Kannadasan

@Shambhujii

Hi,

I am testing your model, but I am not getting the desired output. I think I am not distributing the data properly into the train and valid folders.

Please let me know how you are creating the folder structure and loading the images for the train and valid datasets. This is for binary classification.

I am also facing the same issue. I hope you have fixed this problem now. Please let me know how you are creating the folder structure and loading the images for the train and valid datasets.

@Kannadasa
Author

Hi,

First of all, you need to have directories called train and valid, because fastai will look for these names while running the code. I am using K-Fold cross-validation to split the data into training and validation sets.

Please find below my code for KFolds.

from fastai.vision import *   # fastai v1; provides ImageList, imagenet_stats, etc.
from sklearn.model_selection import KFold, StratifiedKFold

kf = KFold(n_splits=5)
skf = StratifiedKFold(n_splits=5)

# Load everything without a split, just to get a dataframe of file names and labels.
data = (ImageList.from_folder(path)
        .split_none()
        .label_from_folder()
        .transform(size=(256, 256))
        .databunch()).normalize(imagenet_stats)

df = data.to_df()

for train_index, test_index in skf.split(df.index, df['y']):
    print(len(train_index), len(test_index))
    print(train_index, test_index)

    # Rebuild the databunch for this fold from the split indices.
    d = (ImageList.from_folder(path)
         .split_by_idxs(train_index, test_index)
         .label_from_folder()
         .transform(size=(256, 256))
         .databunch(num_workers=0)).normalize(imagenet_stats)

@Shambhujii

(quoting the previous comment and KFold code in full, as shown above)

Thank you so much, my friend, for this valuable comment. I will try to split the train and validation sets as per your guidance. Thank you again. Let's collaborate to fight against this pandemic.

@Kannadasa
Author

It works fine for me on the training set and validation set. If I show some unseen X-ray images to my model, the model does not predict well. I don't know how to fix this problem.
If you get any solution, please let me know.

Thanks

@rahuls321

(quoting the previous comment and KFold code in full, as shown above)

Hey @Kannadasa, I successfully ran the code with a normal split: 80% for training, 10% for validation, and 10% for testing. But I'm still facing issues with K-Fold cross-validation. After creating the train and valid directories, this code didn't produce anything. Could you please give a brief explanation of this code?

Thanks in advance
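
For what it's worth, the loop in the snippet above only builds a databunch for each fold and prints the indices; nothing is trained inside it, which is why it appears to produce nothing. A minimal sketch of training inside the fold loop with a fastai v1 learner, assuming skf, df and path from the earlier snippet; resnet18 is a placeholder architecture, not the model from this repository:

from fastai.vision import ImageList, cnn_learner, models, accuracy, imagenet_stats

fold_scores = []
for train_index, valid_index in skf.split(df.index, df['y']):
    d = (ImageList.from_folder(path)
         .split_by_idxs(train_index, valid_index)
         .label_from_folder()
         .transform(size=(256, 256))
         .databunch(num_workers=0)).normalize(imagenet_stats)

    # Placeholder architecture; swap in the model actually used by the repository.
    learn = cnn_learner(d, models.resnet18, metrics=accuracy)
    learn.fit_one_cycle(10)

    # learn.validate() returns [valid_loss, metric_1, ...] on the current validation fold.
    fold_scores.append(float(learn.validate()[1]))

print(fold_scores)
print(sum(fold_scores) / len(fold_scores))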

@aliranic

Hello! Why did you use the validation dataset as the test dataset? Or did you do something different that I don't understand?
