Rephrase documentation (weecology#501)
henrykironde authored Oct 4, 2023
1 parent d035acd commit 28ce221
Showing 7 changed files with 26 additions and 19 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -46,7 +46,7 @@ Free software: [MIT license](https://github.com/weecology/DeepForest/blob/master

## Why DeepForest?

Remote sensing can transform the speed, scale, and cost of biodiversity and forestry surveys. Data acquisition currently outpaces the ability to identify individual organisms in high resolution imagery. Individual crown delineation has been a long-standing challenge in remote sensing and available algorithms produce mixed results. DeepForest is the first open source implementation of a deep learning model for crown detection. Deep learning has made enormous strides in a range of computer vision tasks but requires significant amounts of training data. By including a trained model, we hope to simplify the process of retraining deep learning models for a range of forests, sensors, and spatial resolutions.
Remote sensing can transform the speed, scale, and cost of biodiversity and forestry surveys. Data acquisition currently outpaces the ability to identify individual organisms in high-resolution imagery. Individual crown delineation has been a long-standing challenge in remote sensing, and available algorithms produce mixed results. DeepForest is the first open-source implementation of a deep learning model for crown detection. Deep learning has made enormous strides in a range of computer vision tasks but requires significant amounts of training data. By including a trained model, we hope to simplify the process of retraining deep learning models for a range of forests, sensors, and spatial resolutions.

## Citation

10 changes: 6 additions & 4 deletions docs/Evaluation.md
@@ -10,7 +10,8 @@ root_dir = os.path.dirname(csv_file)
results = model.evaluate(csv_file, root_dir, iou_threshold = 0.4)
```

The results object is a dictionary with keys, 'results',"recall","precision". Results is the intersection-over-union scores for each ground truth object in the csv_file.
The returned object is a dictionary with three keys: results, recall, and precision. The results key holds the intersection-over-union score for each ground truth object in the csv_file.


```
results["results"].head()
@@ -22,9 +23,10 @@
28 28 4 0.37461 OSBS_029.tif False
```

This dataframe contains a numeric id for each predicted crown in each image, the matched ground truth crown in each image. The intersection-over-union score between predicted and ground truth (IoU), and whether that score is greater than the IoU threshold ('match').
This dataframe contains a numeric id for each predicted crown in each image and the matched ground truth crown in each image, along with the intersection-over-union score between predicted and ground truth (IoU) and whether that score is greater than the IoU threshold ('match').

The recall is the proportion of ground truth which have a true positive match with a prediction based on the intersection-over-union threshold, this threshold is default 0.4 and can be changed in model.evaluate(iou_threshold=<>)

The recall is the proportion of ground truth that has a true positive match with a prediction based on the intersection-over-union threshold. The default threshold is 0.4 and can be changed via model.evaluate(iou_threshold=<>).

```
results["box_recall"]
@@ -39,7 +41,7 @@ results["box_precision"]
```

To convert overlap among predicted and ground truth bounding boxes into measures of accuracy and precision, the most common approach is to compare the overlap using the intersection-over-union metric (IoU).
IoU is the ratio between the area of the overlap between the predicted polygon box and the ground truth polygon box divided by and the area of the combined bounding box region.
IoU is the ratio between the area of the overlap between the predicted polygon box and the ground truth polygon box divided by the area of the combined bounding box region.
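
To make the metric concrete, here is a minimal sketch (an illustration only, not part of the DeepForest API) of computing IoU for two axis-aligned boxes:

```python
def iou(box1, box2):
    # Boxes are (xmin, ymin, xmax, ymax)
    ixmin, iymin = max(box1[0], box2[0]), max(box1[1], box2[1])
    ixmax, iymax = min(box1[2], box2[2]), min(box1[3], box2[3])
    intersection = max(0, ixmax - ixmin) * max(0, iymax - iymin)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return intersection / union if union else 0.0

iou((0, 0, 10, 10), (5, 5, 15, 15))  # 25 / 175 = 0.143
```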

Let's start by getting some sample data and predictions

11 changes: 6 additions & 5 deletions docs/annotation.md
@@ -7,16 +7,16 @@ For quick annotations of a few images, we recommend using QGIS or ArcGIS. Either
![QGISannotation](../www/QGIS_annotation.png)

## Do I need to annotate all objects in my image?
Yes! Object detection models use the non-annotated areas of an image as negative data. We know that it can be exceptionally hard to annotate all trees in an image, or determine the classes of all birds in an image. However, if you have objects in the image that are not annotated, the model is learning *to ignore* those portion of the image. This can severly effect model performance.
Yes! Object detection models use the non-annotated areas of an image as negative data. We know that it can be exceptionally hard to annotate all trees in an image, or determine the classes of all birds in an image. However, if you have objects in the image that are not annotated, the model is learning *to ignore* those portions of the image. This can severely affect model performance.

## Can I annotate points instead of bounding boxes?
Yes. This make more sense for the bird detection task, as trees tend to vary widely in size. Often birds will be a standard size compared to the image resolution.
Yes. This makes more sense for the bird detection task, as trees tend to vary widely in size. Often, birds will be a standard size compared to the image resolution.

If you would like to train a model, here is a quick video on a simple way to annotate images.

<div style="position: relative; padding-bottom: 62.5%; height: 0;"><iframe src="https://www.loom.com/embed/e1639d36b6ef4118a31b7b892344ba83" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>

Using a shapefile we could turn it into a dataframe of bounding box annotations by converting the points into boxes. If you already have boxes, you can exclude convert_to_boxes and buffer_size.
Using a shapefile, we could turn it into a dataframe of bounding box annotations by converting the points into boxes. If you already have boxes, you can exclude convert_to_boxes and buffer_size.

```
df = shapefile_to_annotations(
@@ -25,7 +25,7 @@ df = shapefile_to_annotations(
)
```
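
The full call might look like the following (a sketch under assumptions: the function is taken to live in `deepforest.utilities`, the paths and buffer value are placeholders, and `convert_to_boxes` and `buffer_size` are the arguments described above):

```python
from deepforest.utilities import shapefile_to_annotations

# Convert point annotations in a shapefile to box annotations
# by buffering each point (hypothetical paths and buffer value)
df = shapefile_to_annotations(
    shapefile="annotations.shp",
    rgb="image.tif",
    convert_to_boxes=True,
    buffer_size=0.15,
)
```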

Optionally we can split these annotations into crops if the image is large and will not fit into memory. This is often the case.
Optionally, we can split these annotations into crops if the image is large and will not fit into memory. This is often the case.

```
df.to_csv("full_annotations.csv",index=False)
@@ -40,7 +40,8 @@ annotations = preprocess.split_raster(
```
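
A fuller version of that call might look like this (a sketch; `patch_size`, `patch_overlap`, and `base_dir` are assumed parameter names from the `deepforest.preprocess` API, and the paths are placeholders):

```python
from deepforest import preprocess

# Split a large raster and its annotations into fixed-size crops
annotations = preprocess.split_raster(
    annotations_file="full_annotations.csv",
    path_to_raster="large_image.tif",
    patch_size=400,
    patch_overlap=0.05,
    base_dir="crops/",
)
annotations.to_csv("crop_annotations.csv", index=False)
```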

## How can I view current predictions as shapefiles?
It often useful to train new training annotations starting from current predictions. This allows users to more quickly find and correct errors. The following example shows how to create a list of files, predict detections in each and save as shapefiles. A user can then edit this shapefiles in a program like QGIS.

It is often useful to create new training annotations starting from current predictions. This allows users to more quickly find and correct errors. The following example shows how to create a list of files, predict detections in each, and save as shapefiles. A user can then edit these shapefiles in a program like QGIS.

```
from deepforest import main
...
```
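
A sketch of that workflow (hedged: `boxes_to_shapefile` and the exact columns it expects are assumptions based on `deepforest.utilities`, and the directory and file extensions are placeholders):

```python
import glob
import os

from deepforest import main, utilities

m = main.deepforest()
m.use_release()

for path in glob.glob("tiles/*.tif"):
    boxes = m.predict_image(path=path, return_plot=False)
    # boxes_to_shapefile needs to know which image each box came from
    boxes["image_path"] = os.path.basename(path)
    shp = utilities.boxes_to_shapefile(boxes, root_dir="tiles/", projected=True)
    shp.to_file(path.replace(".tif", ".shp"))
```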
4 changes: 2 additions & 2 deletions docs/getting_started.md
@@ -27,15 +27,15 @@ For single images, ```predict_image``` can read an image from memory or file and

### Sample data

DeepForest comes with a small set of sample data to help run the docs examples. Since users may install in a variety of manners, and it is impossible to know the relative location of the files, the helper function ```get_data``` is used. This function looks to where DeepForest is installed, and finds the deepforest/data/ directory.
DeepForest comes with a small set of sample data that can be used to test out the provided examples. The data resides in the DeepForest data directory. Use the `get_data` helper function to locate the path to this directory, if needed.

```python
from deepforest import get_data

sample_image = get_data("OSBS_029.png")
sample_image
'/Users/benweinstein/Documents/DeepForest/deepforest/data/OSBS_029.png'
```

For non-tutorial images, you do not need the get_data function, just provide the full path to the data anywhere on your computer.
To use images other than those in the sample data directory, provide the full path for the images.

```python
image_path = get_data("OSBS_029.png")
...
```
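
A minimal end-to-end sketch (assuming the prebuilt tree crown model; the `predict_image` keyword arguments follow the deepforest API):

```python
from deepforest import main, get_data

# Load the prebuilt tree crown model and predict boxes for a sample image
m = main.deepforest()
m.use_release()

image_path = get_data("OSBS_029.png")
boxes = m.predict_image(path=image_path, return_plot=False)
print(boxes.head())  # columns: xmin, ymin, xmax, ymax, label, score
```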
3 changes: 2 additions & 1 deletion docs/multi_species.md
@@ -7,7 +7,8 @@ When creating a deepforest model object, pass the desired number of classes and
m = main.deepforest(num_classes=2,label_dict={"Alive":0,"Dead":1})
```

It is often, but not always, useful to start from a prebuilt model when trying to identify multiple species. This helps the model focus on learning the multiple classes and not wasting data and time re-learning bounding boxes.
It is often, but not always, useful to start with a prebuilt model when trying to identify multiple species. This helps the model focus on learning the multiple classes and not waste data and time re-learning bounding boxes.

To load the backbone and box prediction portions of the release model, but create a classification model for more than one species:

Here is an example using the alive/dead tree data stored in the package, but the same logic applies to the bird detector.
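
One way to do this (a sketch, not the package's only recipe; the attribute paths `model.backbone` and `model.head.regression_head` assume the torchvision RetinaNet structure DeepForest uses):

```python
from deepforest import main

# Two-class model for alive/dead trees
m = main.deepforest(num_classes=2, label_dict={"Alive": 0, "Dead": 1})

# Release model with pretrained weights
release = main.deepforest()
release.use_release()

# Copy the backbone and box regression weights; the classification
# head stays freshly initialized for the new classes
m.model.backbone.load_state_dict(release.model.backbone.state_dict())
m.model.head.regression_head.load_state_dict(
    release.model.head.regression_head.state_dict()
)
```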
Expand Down
4 changes: 2 additions & 2 deletions docs/prebuilt.md
@@ -1,10 +1,10 @@
# Prebuilt models

DeepForest current has two prebuilt models.
At the moment, DeepForest has two prebuilt models: Bird Detection and Tree Crown Detection.

## Tree Crown Detection

The model was initially described in [Remote Sensing](https://www.mdpi.com/2072-4292/11/11/1309) on a single site. The prebuilt model uses a semi-supervised approach in which millions of moderate quality annotations are generated using a LiDAR unsupervised tree detection algorithm, followed by hand-annotations of RGB imagery from select sites. Comparisons among geographic sites was added in [Ecological Informatics](https://www.sciencedirect.com/science/article/pii/S157495412030011X). The model was further improved and the python package was released in [Methods in Ecology and Evolution](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13472)
The model was initially described in [Remote Sensing](https://www.mdpi.com/2072-4292/11/11/1309) on a single site. The prebuilt model uses a semi-supervised approach in which millions of moderate quality annotations are generated using a LiDAR unsupervised tree detection algorithm, followed by hand-annotations of RGB imagery from select sites. Comparisons among geographic sites were added to [Ecological Informatics](https://www.sciencedirect.com/science/article/pii/S157495412030011X). The model was further improved, and the Python package was released in [Methods in Ecology and Evolution](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13472).

![image](../www/MEE_Figure4.png)

11 changes: 7 additions & 4 deletions docs/training.md
@@ -1,8 +1,9 @@
# Training

The prebuilt models will always be improved by adding data from the target area. In our work, we have found that even one hour's worth of carefully chosen hand-annotation can yield enormous improvements in accuracy and precision.
We envision that for the majority of scientific applications at least some fine-tuning of the prebuilt model will be worthwhile. When starting from the prebuilt model for training, we have found that 5-10 epochs is sufficient.
We have never seen a retraining task that improved after 10-30 epochs, but it is possible if there are very large datasets with very diverse classes.

We expect that the prebuilt model will benefit from at least some fine-tuning for the vast majority of scientific applications. When starting from the prebuilt model, we have found that 5-10 epochs of training are sufficient.
We have not seen a retraining task keep improving after 10-30 epochs, though it is theoretically feasible for very large datasets with extremely varied classes.

Consider an annotations.csv file in the following format

@@ -22,7 +23,7 @@ OSBS_029.jpg,115,109,150,152,Tree
OSBS_029.jpg,161,155,199,191,Tree
```

We tell the config that we want to train on this csv file, and that the images are in the same directory. If images are in a separate folder, change the root_dir.
The config file specifies the path to the CSV file that we want to use when training. The images are located in the working directory by default, and a user can provide a path to a different image directory.

```python
# Example run with short training
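# A hedged sketch of a short training run: the config keys below
# (train: csv_file, root_dir, epochs) follow deepforest_config.yml,
# and get_data("example.csv") stands in for your own annotations file.
import os

from deepforest import main, get_data

m = main.deepforest()
m.use_release()  # start from the prebuilt tree model

csv_file = get_data("example.csv")
m.config["train"]["csv_file"] = csv_file
m.config["train"]["root_dir"] = os.path.dirname(csv_file)
m.config["train"]["epochs"] = 5

m.create_trainer()
m.trainer.fit(m)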
```

@@ -107,7 +108,8 @@

```
image_path, xmin, ymin, xmax, ymax, label
myimage.png, 0,0,0,0,"Tree"
```

Excessive use of negative samples may have negative impact on model performance, but used sparingly it can increase precision. These samples are removed from evaluation and do not count in precision/recall.
Excessive use of negative samples may have a negative impact on model performance, but when used sparingly, they can increase precision. These samples are removed from evaluation and do not contribute to the precision or recall evaluation.


### Model checkpoints

@@ -153,6 +155,7 @@ pd.testing.assert_frame_equal(pred_after_train,pred_after_reload)
---

Note that when reloading models, you should carefully inspect the model parameters, such as the score_thresh and nms_thresh. These parameters are updated during model creation and the config file is not read when loading from checkpoint!

It is best to specify these parameters directly after loading a checkpoint. If you want to save hyperparameters, edit the deepforest_config.yml directly. This will allow the hyperparameters to be reloaded on deepforest.save_model().
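
For example (a sketch; `load_from_checkpoint` comes from the PyTorch Lightning API that `deepforest.main` builds on, and the checkpoint path is a placeholder):

```python
from deepforest import main

# Reload a checkpoint, then set detection thresholds explicitly,
# since deepforest_config.yml is not re-read at this point
m = main.deepforest.load_from_checkpoint("checkpoint.ckpt")
m.model.score_thresh = 0.3
m.model.nms_thresh = 0.4
```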

---
