Update getting started page #494
Conversation
The commit to solve the readthedocs error is already in main. I think this is ready to merge.
The error here is a formatting check failure. This means the code is not well formatted. I suggest running yapf in place using `-i` to format the code: `yapf -i --recursive deepforest/`
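If it is more convenient to format from within Python rather than the shell, yapf also exposes a programmatic API. A minimal sketch (the sample string is only an illustration):

```
# Format a source string with yapf's Python API
from yapf.yapflib.yapf_api import FormatCode

source = "x   =  {'a':1 }\n"
formatted, changed = FormatCode(source)  # returns (formatted_source, changed_flag)
print(formatted)  # x = {'a': 1}
```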
@henrykironde - this is fixed in #493 so I recommend reviewing and merging it and then either ignoring it here or having @bw4sz rebase
Okay, I can update the evaluation.md in a new PR.
On Tue, Oct 3, 2023, henry senyondo commented on this pull request, in docs/Evaluation.md:
> @@ -1,5 +1,43 @@
# Evaluation
+Independent analysis of whether a model can generalize from training data to new areas is critical for creating a robust workflow.
+We stress that evaluation data must be different from training data, as neural networks have millions of parameters and can easily memorize thousands of samples. Therefore, while it would be rather easy to tune the model to get extremely high scores on the training data, it would fail when exposed to new images.
+
+To get an evaluation score, specify an annotations file in the same format as the training example above. The model will
+```
+csv_file = get_data("OSBS_029.csv")
+root_dir = os.path.dirname(csv_file)
+results = model.evaluate(csv_file, root_dir, iou_threshold = 0.4)
+```
+
+The results object is a dictionary with keys, 'results',"recall","precision". Results is the intersection-over-union scores for each ground truth object in the csv_file.
Can we rephrase this? I would think of this: "The returned object is a dictionary containing the three keys: results, recall, and precision."
Not sure if the second part translates to this: "The result in the csv-file represents the intersection-over-union score for each ground truth object."
These are some of the changes; I will put them in a new PR:
This dataframe contains a numeric id for each predicted crown in each image, the matched ground truth crown in each image. The intersection-over-union score between predicted and ground truth (IoU), and whether that score is greater than the IoU threshold ('match').
The recall is the proportion of ground truth which have a true positive match with a prediction based on the intersection-over-union threshold, this threshold is default 0.4 and can be changed in model.evaluate(iou_threshold=<>)
This dataframe contains a numeric id for each predicted crown in each image and the matched ground truth crown in each image. The intersection-over-union score between predicted and ground truth (IoU), and whether that score is greater than the IoU threshold ('match').
The recall is the proportion of ground truth that has a true positive match with a prediction based on the intersection-over-union threshold. The default threshold is 0.4 and can be changed in the model.evaluate(iou_threshold=<>)
results["box_precision"] | ||
0.781 | ||
``` | ||
|
||
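For context, a hedged end-to-end sketch of the evaluation workflow discussed in this thread. The `use_release` call and the `box_recall` key are assumptions based on the surrounding docs; `box_recall` simply mirrors the `box_precision` key shown above:

```
import os

from deepforest import main, get_data

# Load the prebuilt tree crown model (assumed entry point)
model = main.deepforest()
model.use_release()

csv_file = get_data("OSBS_029.csv")
root_dir = os.path.dirname(csv_file)
results = model.evaluate(csv_file, root_dir, iou_threshold=0.4)

# "results" holds per-ground-truth IoU scores and match flags
print(results["results"].head())
print(results["box_precision"])  # e.g. 0.781 as above
print(results["box_recall"])     # assumed key, by analogy with box_precision
```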
To convert overlap among predicted and ground truth bounding boxes into measures of accuracy and precision, the most common approach is to compare the overlap using the intersection-over-union metric (IoU).
IoU is the ratio between the area of the overlap between the predicted polygon box and the ground truth polygon box divided by and the area of the combined bounding box region.
Should we remove the and?
polygon box divided by and the area of the combined
..>> polygon box divided by the area of the combined
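To make the corrected definition concrete, a minimal sketch of the IoU computation for two axis-aligned boxes. This is only the formula from the sentence above, not the library's implementation:

```
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    # Overlap rectangle between the two boxes
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    # Union = sum of the two areas minus the overlap counted twice
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0
```

For example, iou((0, 0, 10, 10), (5, 5, 15, 15)) gives 25 / 175 ≈ 0.14.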
![QGISannotation](../www/QGIS_annotation.png)
## Do I need annotate all objects in my image?
Yes! Object detection models use the non-annotated areas of an image as negative data. We know that it can be exceptionally hard to annotate all trees in an image, or determine the classes of all birds in an image. However, if you have objects in the image that are not annotated, the model is learning *to ignore* those portion of the image. This can severly effect model performance.
Change This can severly effect model performance.
to
This can severely affect model performance.
## Can I annotate points instead of bounding boxes?
Yes. This make more sense for the bird detection task, as trees tend to vary widely in size. Often birds will be a standard size compared to the image resolution.
makes
Yes. This makes more sense for the bird detection task, as trees tend to vary widely in size. Often, birds will be a standard size compared to the image resolution.
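If annotations start out as points, one simple approach is to buffer each point into a fixed-size box before training. A hedged sketch using pandas; the file names, column names, and buffer size are illustrative assumptions, not part of the deepforest API:

```
import pandas as pd

# Hypothetical point annotations with columns: image_path, x, y, label
points = pd.read_csv("bird_points.csv")

half_width = 25  # pixels; tune to the typical bird size at your image resolution
points["xmin"] = points["x"] - half_width
points["ymin"] = points["y"] - half_width
points["xmax"] = points["x"] + half_width
points["ymax"] = points["y"] + half_width

points[["image_path", "xmin", "ymin", "xmax", "ymax", "label"]].to_csv(
    "bird_boxes.csv", index=False)
```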
## Tree Crown Detection
The model was initially described in [Remote Sensing](https://www.mdpi.com/2072-4292/11/11/1309) on a single site. The prebuilt model uses a semi-supervised approach in which millions of moderate quality annotations are generated using a LiDAR unsupervised tree detection algorithm, followed by hand-annotations of RGB imagery from select sites. Comparisons among geographic sites was added in [Ecological Informatics](https://www.sciencedirect.com/science/article/pii/S157495412030011X). The model was further improved and the python package was released in [Methods in Ecology and Evolution](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13472)
Comparisons among geographic sites were added to Ecological Informatics. The model was further improved, and the Python package was released in Methods in Ecology and Evolution.
# Training
The prebuilt models will always be improved by adding data from the target area. In our work, we have found that even one hour's worth of carefully chosen hand-annotation can yield enormous improvements in accuracy and precision.
We envision that for the majority of scientific applications at least some fine-tuning of the prebuilt model will be worthwhile. When starting from the prebuilt model for training, we have found that 5-10 epochs is sufficient.
We expect that the prebuilt model will benefit from at least some fine-tuning for the vast majority of scientific applications. We have discovered that 5–10 epochs of training with the prebuilt model are adequate.
The improvement of a retraining task after 10–30 epochs has never been observed, but it is theoretically feasible if there are very big datasets with extremely varied classes.
```
OSBS_029.jpg,161,155,199,191,Tree
```
We tell the config that we want to train on this csv file, and that the images are in the same directory. If images are in a separate folder, change the root_dir.
The config file specifies the path to the CSV file that we want to use when training. The images are located in the working directory by default, and a user can provide a path to a different image directory.
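A hedged sketch of what that fine-tuning setup might look like in code. The exact config keys are assumptions based on deepforest_config.yml, and the epoch count follows the 5-10 epoch guidance above:

```
import os

from deepforest import main, get_data

model = main.deepforest()
model.use_release()  # start from the prebuilt model rather than from scratch

# Point the training config at the annotations; here images sit next to the CSV
csv_file = get_data("OSBS_029.csv")
model.config["train"]["csv_file"] = csv_file
model.config["train"]["root_dir"] = os.path.dirname(csv_file)
model.config["train"]["epochs"] = 10  # 5-10 epochs is usually enough when fine-tuning

model.create_trainer()
model.trainer.fit(model)
```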
```
myimage.png, 0,0,0,0,"Tree"
```
Excessive use of negative samples may have negative impact on model performance, but used sparingly it can increase precision. These samples are removed from evaluation and do not count in precision/recall.
Excessive use of negative samples may have a negative impact on model performance, but when used sparingly, they can increase precision. These samples are removed from evaluation and do not contribute to the precision or recall evaluation.
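Since a negative sample is just an annotation row whose box is all zeros, appending one is straightforward with pandas. A small sketch; the file name and column names are illustrative:

```
import pandas as pd

annotations = pd.read_csv("annotations.csv")

# A 0,0,0,0 box marks the whole image as containing no objects of this class
negative = pd.DataFrame([{"image_path": "myimage.png", "xmin": 0, "ymin": 0,
                          "xmax": 0, "ymax": 0, "label": "Tree"}])

pd.concat([annotations, negative], ignore_index=True).to_csv(
    "annotations.csv", index=False)
```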
---
Note that when reloading models, you should carefully inspect the model parameters, such as the score_thresh and nms_thresh. These parameters are updated during model creation and the config file is not read when loading from checkpoint!
It is best to be direct to specify after loading checkpoint. If you want to save hyperparameters, edit the deepforest_config.yml directly. This will allow the hyperparameters to be reloaded on deepforest.save_model().
Can we rephrase this sentence.
It is best to be direct to specify after loading checkpoint. If you want to save hyperparameters, edit the deepforest_config.yml directly. This will allow the hyperparameters to be reloaded on deepforest.save_model().
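One way to make that advice concrete: set the thresholds explicitly right after loading, since the config file is not re-read at that point. A hedged sketch; the checkpoint path and threshold values are illustrative:

```
from deepforest import main

# load_from_checkpoint is the standard pytorch-lightning entry point
model = main.deepforest.load_from_checkpoint("path/to/checkpoint.pl")

# Thresholds are not restored from deepforest_config.yml, so set them directly
model.model.score_thresh = 0.3
model.model.nms_thresh = 0.05
```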
* update docs to split our getting started to smaller markdown files
* ignore the mac import
* style changes
I added several new doc pages to reduce the giant scrolling through getting_started.md. It was too long and not helpful. I am also working on making the docs feel like the bird detector is not just an add-on, but that we are building towards more models. Added links to recent papers and more images.