Update getting started page #494

Merged: 4 commits, Oct 3, 2023
14 changes: 6 additions & 8 deletions deepforest/preprocess.py
@@ -78,15 +78,13 @@ def select_annotations(annotations, windows, index, allow_empty=False):
offset = 40
    selected_annotations = annotations[(annotations.xmin > (window_xmin - offset)) &
                                       (annotations.xmin < (window_xmax)) &
                                       (annotations.xmax > (window_xmin)) &
                                       (annotations.ymin > (window_ymin - offset)) &
                                       (annotations.xmax < (window_xmax + offset)) &
                                       (annotations.ymin < (window_ymax)) &
                                       (annotations.ymax > (window_ymin)) &
                                       (annotations.ymax < (window_ymax + offset))].copy(deep=True)
# change the image name
image_basename = os.path.splitext("{}".format(annotations.image_path.unique()[0]))[0]
selected_annotations.image_path = "{}_{}.png".format(image_basename, index)
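
The buffered-window selection above can be illustrated with a toy dataframe (hypothetical coordinates, for illustration only; not part of the diff). Boxes whose corners fall within the window plus the 40-pixel buffer are kept:

```python
import pandas as pd

# Window is (xmin, ymin, xmax, ymax) = (100, 100, 300, 300), with a 40-pixel buffer
annotations = pd.DataFrame({
    "xmin": [80, 120, 400],
    "ymin": [90, 150, 410],
    "xmax": [150, 200, 450],
    "ymax": [160, 220, 460],
})
window_xmin, window_ymin, window_xmax, window_ymax = 100, 100, 300, 300
offset = 40

selected = annotations[(annotations.xmin > (window_xmin - offset)) &
                       (annotations.xmin < window_xmax) &
                       (annotations.xmax > window_xmin) &
                       (annotations.ymin > (window_ymin - offset)) &
                       (annotations.xmax < (window_xmax + offset)) &
                       (annotations.ymin < window_ymax) &
                       (annotations.ymax > window_ymin) &
                       (annotations.ymax < (window_ymax + offset))]
print(len(selected))  # the first two boxes fall within the buffered window
```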
40 changes: 38 additions & 2 deletions docs/Evaluation.md
@@ -1,5 +1,43 @@
# Evaluation

Independent analysis of whether a model can generalize from training data to new areas is critical for creating a robust workflow.
We stress that evaluation data must be different from training data, as neural networks have millions of parameters and can easily memorize thousands of samples. Therefore, while it would be rather easy to tune the model to get extremely high scores on the training data, it would fail when exposed to new images.

To get an evaluation score, specify an annotations file in the same format as the training example above. The model will evaluate its predictions against these annotations:
```
csv_file = get_data("OSBS_029.csv")
root_dir = os.path.dirname(csv_file)
results = model.evaluate(csv_file, root_dir, iou_threshold = 0.4)
```

The returned object is a dictionary containing three keys: 'results', 'recall', and 'precision'. The 'results' entry is a dataframe of intersection-over-union scores for each ground truth object in the csv_file.
> Contributor: Can we rephrase this? I would think of this: "The returned object is a dictionary containing the three keys: results, recall, and precision."
>
> Not sure if the second part translates to this: "The result in the csv_file represents the intersection-over-union score for each ground truth object."


```
results["results"].head()
prediction_id truth_id IoU image_path match
39 39 0 0.00000 OSBS_029.tif False
19 19 1 0.50524 OSBS_029.tif True
44 44 2 0.42246 OSBS_029.tif True
67 67 3 0.41404 OSBS_029.tif True
28 28 4 0.37461 OSBS_029.tif False
```

This dataframe contains a numeric id for each predicted crown in each image and the matched ground truth crown in that image, the intersection-over-union score between predicted and ground truth ('IoU'), and whether that score is greater than the IoU threshold ('match').

The recall is the proportion of ground truth that has a true positive match with a prediction based on the intersection-over-union threshold. The default threshold is 0.4 and can be changed with model.evaluate(iou_threshold=<>).

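Given the 'match' column of the results dataframe, box recall is just the fraction of true entries. A minimal sketch, using hypothetical values that mirror the five-row example table above:

```python
import pandas as pd

# Hypothetical 'match' column mirroring the example table above
matches = pd.Series([False, True, True, True, False])

# Box recall: the fraction of ground-truth crowns with a true-positive match
recall = matches.mean()
print(recall)  # 0.6
```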
> Contributor: "This dataframe contains a numeric id for each predicted crown in each image and the matched ground truth crown in each image. The intersection-over-union score between predicted and ground truth (IoU), and whether that score is greater than the IoU threshold ('match')."
>
> "The recall is the proportion of ground truth that has a true positive match with a prediction based on the intersection-over-union threshold. The default threshold is 0.4 and can be changed in the model.evaluate(iou_threshold=<>)."

```
results["box_recall"]
0.705
```

The box precision is the proportion of predicted boxes that overlap a ground truth box.

```
results["box_precision"]
0.781
```

To convert overlap among predicted and ground truth bounding boxes into measures of accuracy and precision, the most common approach is to compare the overlap using the intersection-over-union metric (IoU).
IoU is the ratio between the area of the overlap between the predicted polygon box and the ground truth polygon box divided by the area of the combined bounding box region.
> Contributor: Should we remove the "and"? "polygon box divided by and the area of the combined" >> "polygon box divided by the area of the combined"

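As a plain-Python sketch (not the DeepForest implementation), IoU for two axis-aligned boxes in (xmin, ymin, xmax, ymax) form can be computed as:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    # Intersection rectangle, clamped to zero width/height when boxes are disjoint
    ixmin = max(box_a[0], box_b[0])
    iymin = max(box_a[1], box_b[1])
    ixmax = min(box_a[2], box_b[2])
    iymax = min(box_a[3], box_b[3])
    inter = max(0, ixmax - ixmin) * max(0, iymax - iymin)
    # Union area = sum of both areas minus the double-counted intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Overlap area 50, union area 150, so IoU = 1/3
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```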

@@ -135,7 +173,5 @@ results = model.evaluate(
csv_file="new_annotations.csv",
root_dir=<base_dir from above>
)
```

85 changes: 85 additions & 0 deletions docs/annotation.md
@@ -0,0 +1,85 @@
# Annotation
Annotation is likely the most important part of machine learning projects. Fancy models are nice, but data is always paramount. If you aren't happy with model performance, annotating new samples is usually the best first step.

## How should I annotate images?
For quick annotations of a few images, we recommend using QGIS or ArcGIS, with either projected or unprojected data. Create a shapefile for each image.

![QGISannotation](../www/QGIS_annotation.png)

## Do I need to annotate all objects in my image?
Yes! Object detection models use the non-annotated areas of an image as negative data. We know that it can be exceptionally hard to annotate all trees in an image, or to determine the classes of all birds in an image. However, if you have objects in the image that are not annotated, the model is learning *to ignore* those portions of the image. This can severely affect model performance.
> Contributor: Change "This can severly effect model performance." to "This can severely affect model performance."


## Can I annotate points instead of bounding boxes?
Yes. This makes more sense for the bird detection task, as trees tend to vary widely in size. Often, birds will be a standard size compared to the image resolution.
> Contributor: "makes": "Yes. This makes more sense for the bird detection task, as trees tend to vary widely in size. Often, birds will be a standard size compared to the image resolution."


If you would like to train a model, here is a quick video on a simple way to annotate images.

<div style="position: relative; padding-bottom: 62.5%; height: 0;"><iframe src="https://www.loom.com/embed/e1639d36b6ef4118a31b7b892344ba83" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>

Using a shapefile, we can turn it into a dataframe of bounding box annotations by converting the points into boxes. If you already have boxes, you can omit convert_to_boxes and buffer_size.
> Contributor: "Using a shapefile, we"


```
from deepforest.utilities import shapefile_to_annotations

df = shapefile_to_annotations(
    shapefile="annotations.shp",
    rgb="image_path", convert_to_boxes=True, buffer_size=0.15
)
```
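
Conceptually, converting a point annotation into a box just buffers the point into a square. A minimal sketch of the idea (point_to_box is a hypothetical helper for illustration, not part of the DeepForest API):

```python
# Hypothetical helper illustrating what convert_to_boxes does conceptually:
# buffer a point into a square box extending buffer_size in each direction.
def point_to_box(x, y, buffer_size=0.15):
    return (x - buffer_size, y - buffer_size, x + buffer_size, y + buffer_size)

print(point_to_box(10.0, 20.0))  # (9.85, 19.85, 10.15, 20.15)
```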

Optionally, we can split these annotations into crops if the image is large and will not fit into memory. This is often the case.
> Contributor: "Optionally, we"


```
from deepforest import preprocess

df.to_csv("full_annotations.csv", index=False)
annotations = preprocess.split_raster(
    path_to_raster=image_path,
    annotations_file="full_annotations.csv",
    patch_size=450,
    patch_overlap=0,
    base_dir=directory_to_save_crops,
    allow_empty=False
)
```

## How can I view current predictions as shapefiles?
It is often useful to create new training annotations starting from current predictions. This allows users to more quickly find and correct errors. The following example shows how to create a list of files, predict detections in each, and save them as shapefiles. A user can then edit these shapefiles in a program like QGIS.
> Contributor: Change "It often useful"; "edit these shapefiles". "It is often useful to train new training annotations starting from current predictions. This allows users to more quickly find and correct errors. The following example shows how to create a list of files, predict detections in each, and save as shapefiles. A user can then edit these shapefiles in a program like QGIS."


```
from deepforest import main
from deepforest.visualize import plot_predictions
from deepforest.utilities import boxes_to_shapefile

import rasterio as rio
from glob import glob
import os
import matplotlib.pyplot as plt
import numpy as np

PATH_TO_DIR = "/Users/benweinstein/Dropbox/Weecology/everglades_species/easyidp/HiddenLittle_03_24_2022"
files = glob("{}/*.JPG".format(PATH_TO_DIR))
m = main.deepforest(label_dict={"Bird": 0})
m.use_bird_release()
for path in files:
    # Use predict_tile instead if each file is an orthomosaic
    boxes = m.predict_image(path=path)

    # Open each file and get the geospatial information to convert output into a shapefile
    rio_src = rio.open(path)
    image = rio_src.read()

    # Skip empty images
    if boxes is None:
        continue

    # View the result
    image = np.rollaxis(image, 0, 3)
    fig = plot_predictions(df=boxes, image=image)
    plt.imshow(fig)

    # Create a shapefile; in this case the image data was unprojected
    shp = boxes_to_shapefile(boxes, root_dir=PATH_TO_DIR, projected=False)

    # Get the name of the image and save a .shp in the same folder
    basename = os.path.splitext(os.path.basename(path))[0]
    shp.to_file("{}/{}.shp".format(PATH_TO_DIR, basename))
```

122 changes: 0 additions & 122 deletions docs/bird_detector.md

This file was deleted.
