Add Training Process for Nodule Detection and Classification - added customized datasets #300

swarm-ai · 2018-01-25T23:45:59Z

Description

Using the documented process in the Training/Readme, a developer can prepare custom datasets from radiologists who have annotated series of CT scans. The data should have lesion box annotations in a .csv file using the specified format, as well as cancer/non-cancer labels. An example using a CT scan dataset from a Taiwan-based clinic is included.
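The exact CSV format is specified in the Training/Readme and is not reproduced here. Purely as an illustration, the sketch below assumes a LUNA16-style layout (seriesuid, coordX, coordY, coordZ, diameter_mm) plus a hypothetical cancer column holding a 0/1 label. The function name load_annotations and the sample rows are made up for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout: LUNA16-style columns plus an assumed "cancer" column
# (1 = cancer, 0 = non-cancer). This is NOT necessarily the PR's actual format.
SAMPLE = """seriesuid,coordX,coordY,coordZ,diameter_mm,cancer
case001,-56.0,-67.5,-311.0,10.0,1
case001,53.0,-244.5,-245.0,6.5,0
case002,103.5,-121.0,-286.0,4.0,0
"""


def load_annotations(fileobj):
    """Group lesion annotations by series UID.

    Each lesion is returned as (z, y, x, diameter_mm, label); the center plus
    the diameter defines a cubic box around the nodule.
    """
    lesions = defaultdict(list)
    for row in csv.DictReader(fileobj):
        lesions[row["seriesuid"]].append((
            float(row["coordZ"]),
            float(row["coordY"]),
            float(row["coordX"]),
            float(row["diameter_mm"]),
            int(row["cancer"]),
        ))
    return dict(lesions)


if __name__ == "__main__":
    for uid, boxes in load_annotations(io.StringIO(SAMPLE)).items():
        print(uid, boxes)
```

Grouping by series UID means each CT scan's lesions can be looked up in one step when building training samples.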

Reference to official issue

Issue #130
Issue #131

Motivation and Context

The motivation is to increase the number of available training examples so that the concept-to-clinic classifier can handle complex lung cancer cases beyond those in the Luna and LIDC datasets. We have seen improved model accuracy in a preliminary run that used additional datasets. A new model is currently being trained and has reached epoch 80.

How Has This Been Tested?

We have run the training process using the Luna, LIDC, and NSCLC-Radiomics datasets. The NSCLC-Radiomics dataset contains 422 cases of non-small cell lung cancer. We labeled these datasets with lesion locations and cancer/non-cancer labels using the Horos software, then imported them for training alongside the Luna16 and LIDC datasets. The datasets can be downloaded from: http://www.cibl-harvard.org/data

CLA

I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well.


lamby · 2018-01-26T03:43:00Z

prediction/src/algorithms/preprocessing/step1.py

+def load_scan(dirpath):
+    print('loading scan %s' % dirpath)
+
+    if dirpath.startswith('s3://'):


Have you seen urlparse, ooi?
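A minimal sketch of how the urlparse suggestion might apply here, assuming load_scan only needs to distinguish s3:// locations from local paths. The helper name split_scan_location is hypothetical; on Python 2 the same function lives in the urlparse module rather than urllib.parse.

```python
from urllib.parse import urlparse


def split_scan_location(dirpath):
    """Return (bucket, key) for s3:// URLs, or (None, dirpath) for local paths."""
    parsed = urlparse(dirpath)
    if parsed.scheme == "s3":
        # netloc holds the bucket name; path keeps a leading '/' we strip off
        return parsed.netloc, parsed.path.lstrip("/")
    # No s3 scheme: treat the argument as a local directory path
    return None, dirpath


print(split_scan_location("s3://my-bucket/scans/case001"))
print(split_scan_location("/data/scans/case001"))
```

Checking parsed.scheme avoids string-prefix checks and also exposes the bucket and key without manual slicing.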

WGierke · 2018-01-26T07:07:12Z

prediction/src/algorithms/preprocessing/AddSegmentation.asv

@@ -0,0 +1,41 @@
+function AddSegmentation(SegmentDataFolder, FolderDelimiter, BatchSize, ParFor_flag, IgnoreExisting_flag)


What's the reason for using a language other than Python here?

WGierke · 2018-01-26T07:08:28Z

prediction/src/algorithms/preprocessing/step1.py

+    return bw
+
+
+def all_slice_analysis(bw, spacing, cut_num=0, vol_limit=[0.68, 8.2], area_th=6e3, dist_th=62):


Could you add a few docstrings so it's easier to grasp what the functions are expecting and doing? :)

WGierke · 2018-01-26T08:51:45Z

prediction/src/algorithms/training/classifier/trainval_detector.py

+    end_time = time.time()
+
+    print('elapsed time is %3.2f seconds' % (end_time - start_time))
+    print


I think you can achieve the same by appending a '\n' to the previous print statement :)

also, that last print statement won't do what you expect in py3: print is a function there, so a bare print is a no-op and no blank line is printed.
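A minimal Python 2/3-compatible sketch of the timing output, assuming one blank line after the message is all the bare print was meant to produce. The helper format_elapsed is hypothetical. A single trailing '\n' plus print's own newline reproduces the original output.

```python
import time


def format_elapsed(start_time, end_time):
    """Build the timing message; the trailing newline replaces the bare print."""
    return 'elapsed time is %3.2f seconds\n' % (end_time - start_time)


start_time = time.time()
# ... training epoch would run here ...
end_time = time.time()
# print() adds its own newline, so the message is followed by one blank line
print(format_elapsed(start_time, end_time))
```

Adding from __future__ import print_function at the top of the module keeps print() behaving identically on Python 2.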

WGierke · 2018-01-26T08:55:26Z

Would you mind converting your code to comply with PEP8? There are a few things that need to be fixed according to flake8 and pycodestyle :)

lamby · 2018-01-26T21:30:05Z

Thanks for the review @WGierke :)

WGierke · 2018-01-27T09:39:12Z

All data are resized to 1x1x1 mm, and the luminance is clipped between -1200 and 600, scaled to 0-255 and converted to uint8. A mask that includes the lungs is calculated, and the luminance of every pixel outside the mask is set to 170. The results are stored in 'preprocess_result_path', defined in config_training.py, along with their corresponding detection labels.

I think we already have those preprocessing steps. Converting the data to voxels, clipping Hounsfield units outside the tissue range of interest and rescaling the image is a very common practice among the top solutions. Could you have a look at lung_segmentation.py and improved_lung_segmentation.py? I think there is already a lot of logic there that might be useful for the steps you defined :)
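A minimal NumPy/SciPy sketch of the preprocessing described in the quoted Readme text, assuming a HU volume in (z, y, x) order, per-axis voxel spacing in mm, and a precomputed boolean lung mask of the same shape. The name preprocess and its signature are hypothetical; linear interpolation for the image, nearest-neighbour for the mask, and rounding before the uint8 cast are choices made here, not taken from the PR or from lung_segmentation.py.

```python
import numpy as np
from scipy import ndimage


def preprocess(volume_hu, spacing_mm, lung_mask, outside_value=170):
    """Resample to 1x1x1 mm, window HU to [-1200, 600], map to uint8, mask.

    volume_hu  : 3-D array of Hounsfield units, (z, y, x) order (assumed)
    spacing_mm : per-axis voxel spacing in mm, same order as volume_hu
    lung_mask  : boolean array with the same shape as volume_hu
    """
    # Zoom factor per axis = current spacing / target spacing (1 mm)
    zoom = np.asarray(spacing_mm, dtype=float) / 1.0
    image = ndimage.zoom(volume_hu.astype(np.float32), zoom, order=1)
    # Nearest-neighbour keeps the mask binary after resampling
    mask = ndimage.zoom(lung_mask.astype(np.uint8), zoom, order=0).astype(bool)

    # Clip to the [-1200, 600] HU window and scale linearly to 0-255
    lo, hi = -1200.0, 600.0
    image = np.clip(image, lo, hi)
    image = np.round((image - lo) / (hi - lo) * 255.0).astype(np.uint8)

    # Every voxel outside the lung mask gets a constant value
    image[~mask] = outside_value
    return image
```

Because the image and the mask are zoomed with the same factors from the same input shape, their resampled shapes match and the mask can index the image directly.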

isms · 2018-01-28T01:37:52Z

@swarm-ai We'll need quite a bit more context for the PR. This is a big PR with very little reference to any of the pieces of the existing project.

The data should have lesion box annotations

What are these? Where do they come from? Could they be expected to come from new CT imagery without hand labeling?

An example using a CT scan data set from a Taiwan-based clinic is included.

This is extremely interesting, but it is hard to envision how to integrate it so close to the end of the last phase.

isms · 2018-01-29T18:42:54Z

We've discussed internally, and have concluded that both of the following points are true:

There is some really interesting and potentially helpful stuff in this PR.
We can't accept the PR as-is and there is no apparent roadmap to acceptance.

We're going to close the PR but we encourage community members to use this as a resource to help inform model training and potentially other pieces of the application. The submission will be recognized for this aspect of contribution under the "Community" heading.

swarm-ai · 2018-01-29T21:33:56Z

Hi @isms, I only just saw these comments. Can you give me 1-2 days to work on resolving these issues?

isms · 2018-01-29T22:25:19Z

@swarm-ai You are more than welcome to keep working on the PR if you'd like but at this point it won't result in additional points. Feel free to email us directly if you have questions or concerns.

swarm-ai added 11 commits January 25, 2018 16:15

added some configurations for the custom data set

c6d4f6c

added preprocessing folder, added readme, added requirements file, added custom annotation file example

7abc8c7

fix readme formatting on github

26d34c0

removed some lingering debug messages

d69b65d

udpated readme

207595e

fix github readme formatting

f38db29

fix github readme formatting

deaa50f

added images for readme

711523e

fixed typo in requirements doc

c9ace29

Fix pycodestyle errors

aa24e80

lamby reviewed Jan 26, 2018


WGierke reviewed Jan 26, 2018


isms closed this Jan 29, 2018


reubano reviewed Jan 30, 2018
