Update getting started page #494

Merged
merged 4 commits into from
Oct 3, 2023


bw4sz
Collaborator

@bw4sz bw4sz commented Oct 2, 2023

I added several new doc pages to reduce the giant scrolling through the getting_started.md. It was too long and not helpful. I am also working on making the docs feel like the bird detector is not just an add on, but that we are building towards more models. Added links to recent papers and more images.

@bw4sz bw4sz self-assigned this Oct 2, 2023
@bw4sz
Collaborator Author

bw4sz commented Oct 3, 2023

The commit to solve the readthedocs error is already in main. I think this is ready to merge.

@henrykironde
Contributor

The error here is

(deepforest) ➜  DeepForest git:(update_getting_started_page) ✗ yapf -d --recursive deepforest/ 
--- deepforest/preprocess.py	(original)
+++ deepforest/preprocess.py	(reformatted)
@@ -78,15 +78,13 @@
     offset = 40
     selected_annotations = annotations[(annotations.xmin > (window_xmin - offset)) &
                                        (annotations.xmin < (window_xmax)) &
-                                       (annotations.xmax >
-                                        (window_xmin)) & (annotations.ymin >
-                                                          (window_ymin - offset)) &
+                                       (annotations.xmax > (window_xmin)) &
+                                       (annotations.ymin > (window_ymin - offset)) &
                                        (annotations.xmax < (window_xmax + offset)) &
-                                       (annotations.ymin <
-                                        (window_ymax)) & (annotations.ymax >
-                                                          (window_ymin)) &
-                                       (annotations.ymax <
-                                        (window_ymax + offset))].copy(deep=True)
+                                       (annotations.ymin < (window_ymax)) &
+                                       (annotations.ymax > (window_ymin)) &
+                                       (annotations.ymax < (window_ymax + offset))].copy(
+                                           deep=True)
     # change the image name
     image_basename = os.path.splitext("{}".format(annotations.image_path.unique()[0]))[0]
     selected_annotations.image_path = "{}_{}.png".format(image_basename, index)

This means the code is not well formatted. I suggest running yapf in place, using `-i`, to format the code: `yapf -i --recursive deepforest/`

@ethanwhite
Member

@henrykironde - this is fixed in #493 so I recommend reviewing and merging it and then either ignoring it here or having @bw4sz rebase

@bw4sz bw4sz merged commit 283cb75 into main Oct 3, 2023
3 checks passed
@bw4sz
Collaborator Author

bw4sz commented Oct 3, 2023 via email

Contributor

@henrykironde henrykironde left a comment

These are some of the changes; I will put them in a new PR.

```
results = model.evaluate(csv_file, root_dir, iou_threshold = 0.4)
```

The results object is a dictionary with keys, 'results',"recall","precision". Results is the intersection-over-union scores for each ground truth object in the csv_file.
Contributor

Can we rephrase this? I would suggest: "The returned object is a dictionary containing three keys: results, recall, and precision."

I am not sure whether the second part translates to this:

"The result in the csv-file represents the intersection-over-union score for each ground truth object."

This dataframe contains a numeric id for each predicted crown in each image, the matched ground truth crown in each image. The intersection-over-union score between predicted and ground truth (IoU), and whether that score is greater than the IoU threshold ('match').

The recall is the proportion of ground truth which have a true positive match with a prediction based on the intersection-over-union threshold, this threshold is default 0.4 and can be changed in model.evaluate(iou_threshold=<>)

Contributor

This dataframe contains a numeric id for each predicted crown in each image, the matched ground truth crown in each image, the intersection-over-union (IoU) score between the predicted and ground truth crowns, and whether that score is greater than the IoU threshold ('match').

Recall is the proportion of ground truth objects that have a true positive match with a prediction, based on the intersection-over-union threshold. The default threshold is 0.4 and can be changed with model.evaluate(iou_threshold=<>).
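
To make the recall wording concrete, here is a minimal sketch of how recall and precision could be derived from the per-ground-truth IoU scores. It assumes each ground truth object has already been matched to at most one prediction. The function name `recall_precision` and its inputs are hypothetical and for illustration only; this is not DeepForest's own evaluation code.

```python
def recall_precision(matched_ious, n_predictions, iou_threshold=0.4):
    """Compute recall and precision from matched IoU scores.

    matched_ious: one entry per ground truth object, holding the IoU with
        its matched prediction, or None when nothing was matched.
    n_predictions: total number of predicted boxes.
    """
    # A ground truth object is a true positive when its IoU exceeds the threshold.
    true_positives = sum(
        1 for iou in matched_ious if iou is not None and iou > iou_threshold
    )
    # Recall: share of ground truth objects that were found.
    recall = true_positives / len(matched_ious) if matched_ious else 0.0
    # Precision: share of predictions that correspond to a ground truth object.
    precision = true_positives / n_predictions if n_predictions else 0.0
    return recall, precision


# Four ground truth crowns, five predictions, default threshold of 0.4.
ious = [0.82, 0.55, 0.31, None]
recall, precision = recall_precision(ious, n_predictions=5)
print(recall, precision)
```

In this toy case two of the four ground truth crowns clear the threshold, so raising or lowering `iou_threshold` changes which rows count as a 'match' and therefore both metrics.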

```
results["box_precision"]
0.781
```

To convert overlap among predicted and ground truth bounding boxes into measures of accuracy and precision, the most common approach is to compare the overlap using the intersection-over-union metric (IoU).
IoU is the ratio between the area of the overlap between the predicted polygon box and the ground truth polygon box divided by and the area of the combined bounding box region.
Contributor

Should we remove the "and"?

"...polygon box divided by and the area of the combined..." should become "...polygon box divided by the area of the combined..."
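
For reference, the ratio described in that sentence can be sketched as below for axis-aligned boxes, taking "the combined bounding box region" to mean the union of the two boxes. The helper name `box_iou` and the `(xmin, ymin, xmax, ymax)` tuple layout are assumptions made for this illustration, not the package's implementation.

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    # Union = sum of both areas minus the part counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)
print(box_iou(ground_truth, prediction))
```

Here the overlap is 50 square pixels and the union is 150, so the score is one third, which would fall below the default 0.4 threshold.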

![QGISannotation](../www/QGIS_annotation.png)

## Do I need annotate all objects in my image?
Yes! Object detection models use the non-annotated areas of an image as negative data. We know that it can be exceptionally hard to annotate all trees in an image, or determine the classes of all birds in an image. However, if you have objects in the image that are not annotated, the model is learning *to ignore* those portion of the image. This can severly effect model performance.
Contributor

Change "This can severly effect model performance." to "This can severely affect model performance."

Yes! Object detection models use the non-annotated areas of an image as negative data. We know that it can be exceptionally hard to annotate all trees in an image, or determine the classes of all birds in an image. However, if you have objects in the image that are not annotated, the model is learning *to ignore* those portion of the image. This can severly effect model performance.

## Can I annotate points instead of bounding boxes?
Yes. This make more sense for the bird detection task, as trees tend to vary widely in size. Often birds will be a standard size compared to the image resolution.
Contributor

"make" should be "makes":

Yes. This makes more sense for the bird detection task, as trees tend to vary widely in size. Often, birds will be a standard size compared to the image resolution.


## Tree Crown Detection

The model was initially described in [Remote Sensing](https://www.mdpi.com/2072-4292/11/11/1309) on a single site. The prebuilt model uses a semi-supervised approach in which millions of moderate quality annotations are generated using a LiDAR unsupervised tree detection algorithm, followed by hand-annotations of RGB imagery from select sites. Comparisons among geographic sites was added in [Ecological Informatics](https://www.sciencedirect.com/science/article/pii/S157495412030011X). The model was further improved and the python package was released in [Methods in Ecology and Evolution](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13472)
Contributor

Comparisons among geographic sites were added in Ecological Informatics. The model was further improved, and the Python package was released in Methods in Ecology and Evolution.

# Training

The prebuilt models will always be improved by adding data from the target area. In our work, we have found that even one hour's worth of carefully chosen hand-annotation can yield enormous improvements in accuracy and precision.
We envision that for the majority of scientific applications at least some fine-tuning of the prebuilt model will be worthwhile. When starting from the prebuilt model for training, we have found that 5-10 epochs is sufficient.
Contributor

We expect that, for the vast majority of scientific applications, the prebuilt model will benefit from at least some fine-tuning. We have found that 5–10 epochs of training are sufficient when starting from the prebuilt model.
We have never observed a retraining task improve after 10–30 epochs, but it is theoretically possible with very large datasets that have extremely varied classes.

```
OSBS_029.jpg,161,155,199,191,Tree
```

We tell the config that we want to train on this csv file, and that the images are in the same directory. If images are in a separate folder, change the root_dir.
Contributor

The config file specifies the path to the CSV file that we want to use when training. The images are located in the working directory by default, and a user can provide a path to a different image directory.

```
myimage.png, 0,0,0,0,"Tree"
```

Excessive use of negative samples may have negative impact on model performance, but used sparingly it can increase precision. These samples are removed from evaluation and do not count in precision/recall.
Contributor

Excessive use of negative samples may have a negative impact on model performance, but when used sparingly, they can increase precision. These samples are removed from evaluation and do not contribute to the precision or recall evaluation.
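
As a rough illustration of how the two kinds of rows quoted in this review differ, the sketch below separates ordinary annotations from negative samples by checking for an all-zero box. The column names in `COLUMNS` and the helper `split_negative_samples` are assumptions for this example (the rows quoted here carry no header), and the snippet only uses the standard library rather than any DeepForest function.

```python
import csv
import io

# Assumed column order, inferred from the fields discussed in this thread.
COLUMNS = ["image_path", "xmin", "ymin", "xmax", "ymax", "label"]

annotations_csv = """OSBS_029.jpg,161,155,199,191,Tree
myimage.png,0,0,0,0,Tree
"""


def split_negative_samples(text):
    """Return (annotated_images, negative_images) from headerless CSV text."""
    annotated, negatives = [], []
    for row in csv.DictReader(io.StringIO(text), fieldnames=COLUMNS):
        box = tuple(int(row[key]) for key in ("xmin", "ymin", "xmax", "ymax"))
        # A box of 0,0,0,0 marks an image with no objects (a negative sample).
        if box == (0, 0, 0, 0):
            negatives.append(row["image_path"])
        else:
            annotated.append(row["image_path"])
    return annotated, negatives


annotated, negatives = split_negative_samples(annotations_csv)
print("annotated:", annotated)
print("negative samples:", negatives)
```

Counting the two lists before training is one simple way to check that negative samples remain a small share of the dataset.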

---

Note that when reloading models, you should carefully inspect the model parameters, such as the score_thresh and nms_thresh. These parameters are updated during model creation and the config file is not read when loading from checkpoint!
It is best to be direct to specify after loading checkpoint. If you want to save hyperparameters, edit the deepforest_config.yml directly. This will allow the hyperparameters to be reloaded on deepforest.save_model().
Contributor

Can we rephrase this sentence?

It is best to be direct to specify after loading checkpoint. If you want to save hyperparameters, edit the deepforest_config.yml directly. This will allow the hyperparameters to be reloaded on deepforest.save_model().

@henrykironde henrykironde deleted the update_getting_started_page branch October 3, 2023 22:59
henrykironde added a commit to henrykironde/DeepForest that referenced this pull request Oct 3, 2023
henrykironde added a commit to henrykironde/DeepForest that referenced this pull request Oct 4, 2023
bw4sz pushed a commit that referenced this pull request Oct 4, 2023
janjatovic pushed a commit to Treeconomy/DeepForest_new that referenced this pull request Mar 26, 2024
* update docs to split our getting started to smaller markdown files

* ignore the mac import

* style changes
janjatovic pushed a commit to Treeconomy/DeepForest_new that referenced this pull request Mar 26, 2024
3 participants