Fix DataSet management.
druzhynin-oleksii committed Dec 22, 2021
1 parent cd5e18f commit 5df64a8
Showing 65 changed files with 1,929 additions and 2,055 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@
.vscode

# python
.env
.venv
__pycache__/
*.py[cod]
208 changes: 208 additions & 0 deletions SETUP.md
@@ -0,0 +1,208 @@
# Setup

## Model Garden Minimal Setup

### Part I. Setup Cloud Account

#### Register Cloud Account

Create an AWS account following the instructions at
[aws.amazon.com/free](http://aws.amazon.com/free).

The 5 GB of S3 bucket storage included in the free tier should be enough for
testing purposes.

**NOTE**: If you created a free tier account a couple of years ago and it has
since been suspended (deleted), such an account usually can't be restored
because of how the Amazon service is implemented. In this case a new account
can be registered with the same email address by appending a plus sign ("+")
and any combination of words or numbers before the at sign ("@") (
see [2 hidden ways to get more from your Gmail
address](http://gmail.googleblog.com/2008/03/2-hidden-ways-to-get-more-from-your.html)
for more details).
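
The plus-addressing trick from the note can be sketched in a few lines of
Python. The helper name `plus_alias` and the sample address are hypothetical,
and whether a particular mail provider delivers plus-addressed mail is an
assumption worth checking before registering.

```python
def plus_alias(email: str, tag: str) -> str:
    """Insert '+tag' before the '@' sign of an email address."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid email address: {email!r}")
    if not tag:
        raise ValueError("tag must not be empty")
    return f"{local}+{tag}@{domain}"


# Hypothetical address used only for illustration.
print(plus_alias("jane.doe@example.com", "aws2"))
```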


#### Activate MFA in Cloud Account

It is recommended to enable
[Multi-factor authentication](http://www.wikipedia.org/wiki/Multi-factor_authentication)
in order to avoid any account data blocking issues:

* Install
[Google Authenticator](http://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2&hl=en),
[Microsoft Authenticator](http://play.google.com/store/apps/details?id=com.azure.authenticator)
or any other MFA mobile application on the smartphone in order to perform
[two-factor authentication](http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_mfa.html)
at the AWS account login.

* Open '**[My Security Credentials](http://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html)**'
from the account menu (having the same name as the created account) in the top
right corner of [console.aws.amazon.com](http://console.aws.amazon.com).

* In the cloud console expand the '**[Multi-factor authentication (MFA)](http://aws.amazon.com/iam/features/mfa)**'
section, which displays a [QR code](http://www.wikipedia.org/wiki/QR_code), and
[scan](http://www.wikipedia.org/wiki/Barcode_Scanner_(application)) it in the
installed MFA mobile application using the smartphone camera.


#### Create Account Access Key

An access key is needed for the
[Add to Backend .env File](backend/README.md#add-backend-env-file) step (see
[backend/README.md](backend/README.md)). This key can be taken from the account
[AWS security credentials](http://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html):

* Open '**My Security Credentials**' from the account menu (having the same name
as the account) in the top right corner of
[console.aws.amazon.com](http://console.aws.amazon.com).

* Expand '**[Access keys](http://docs.aws.amazon.com/general/latest/gr/aws-access-keys-best-practices.html)**'
section and press '**Create New Access Key**'.

* Save the '**Access Key ID**' and the secret key itself in a safe place.

<table>
<tr>
<th style="text-align:center">AWS Parameter</th>
<th style="text-align:center">Backend Env Variable</th>
<th style="text-align:center">Value Example</th>
</tr>
<tr>
<td>AWSAccessKeyId</td>
<td>AWS_ACCESS_KEY_ID</td>
<td>ABCDEFGHIJKLMNOPQRST</td>
</tr>
<tr>
<td>AWSSecretKey</td>
<td>AWS_SECRET_KEY</td>
<td>abcdefghijklmnopqrstuvwxyz0123456789-+/</td>
</tr>
</table>
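
As a rough sketch of how the two variables from the table might be checked
before the backend starts, the snippet below reads them from an environment
mapping and fails early when one is missing. The helper name
`load_aws_credentials` is hypothetical; how the backend actually loads its
`.env` file is described in [backend/README.md](backend/README.md) and is not
reproduced here.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the table above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_KEY")


def load_aws_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required variables, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Placeholder values from the table, not real credentials.
sample_env = {
    "AWS_ACCESS_KEY_ID": "ABCDEFGHIJKLMNOPQRST",
    "AWS_SECRET_KEY": "abcdefghijklmnopqrstuvwxyz0123456789-+/",
}
print(sorted(load_aws_credentials(sample_env)))
```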


#### Add Cloud Bucket

[Create an S3 Bucket](http://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html)
in [s3.console.aws.amazon.com](http://s3.console.aws.amazon.com) with the
following settings and properties:

* Frankfurt **Region** (eu-central-1)

* Blocked *all public access*

* Disabled *Bucket Versioning* and *Encryption*

The recommended bucket name is *model-garden-*<your_initials>.

See more details in the [Setup S3 Bucket](deploy/README.md#setup-s3-bucket)
section of the [Deployment](deploy/README.md) instructions.
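
To make the naming recommendation concrete, the sketch below builds the
recommended bucket name from a set of initials and runs a basic check against
the general S3 naming rules (3 to 63 characters; lowercase letters, digits, dots
and hyphens; starting and ending with a letter or digit). The helper
`bucket_name` is hypothetical, and the check covers only this subset of the
rules, so treat it as a sanity check rather than a full validator.

```python
import re

# Subset of the S3 bucket naming rules; not an exhaustive validator.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def bucket_name(initials: str, prefix: str = "model-garden-") -> str:
    """Build 'model-garden-<initials>' and reject obviously invalid names."""
    name = f"{prefix}{initials.strip().lower()}"
    if not _BUCKET_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    return name


print(bucket_name("AB"))
```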


#### Setup Content Delivery Network for Bucket Content

A [content delivery network](http://www.wikipedia.org/wiki/Content_delivery_network)
(CDN) is used to reduce the latency of media content delivery to the frontend
pages. The content can be delivered from the S3 bucket directly, but the latency
in that case is at least half a second, whereas the average CDN latency
fluctuates between 30 and 45 ms (see
[Benchmarking CDNs: CloudFront, Cloudflare, Fastly, and Google Cloud](http://www.pingdom.com/blog/benchmarking-cdns-cloudfront-cloudflare-fastly-and-google-cloud/)).

Steps to set up the AWS CloudFront CDN:

1. Press '**[Create Distribution](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-creating-console.html)**'
on the AWS **[CloudFront Distributions](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-working-with.html)**
page at [console.aws.amazon.com/cloudfront/home](http://console.aws.amazon.com/cloudfront/home).

2. Select **Web** as the delivery method.

3. Specify *model-garden-*<your_initials>.s3.*eu-central-1*.amazonaws.com as a
**[Origin Domain Name](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesDomainName)**
(*model-garden-*<your_initials> is the bucket name and *eu-central-1* is the
bucket **Region** specified at the bucket creation) and leave the automatically generated
**[Origin ID](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesId)**
(e.g. S3-*model-garden-*<your_initials>).

**ATTENTION**: The '**Create Distribution**' wizard usually proposes
*model-garden-*<your_initials>.s3.amazonaws.com as the **Origin Domain Name**.
This option leads to content access errors. Insert the *eu-central-1* **Region**
between s3 and [amazonaws.com](http://amazon.com) to prevent the CDN service from
redirecting to restricted S3 bucket links (see publications about
[AWS CloudFront redirecting to S3 bucket](http://www.stackoverflow.com/questions/38735306/aws-cloudfront-redirecting-to-s3-bucket)).

4. Leave the rest of the parameters at their defaults and press
'**Create Distribution**' at the bottom.

5. Open the newly created distribution and, in the
[Origins and Origin Groups](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesTargetOriginId)
tab, select *model-garden-*<your_initials>.s3.*eu-central-1*.amazonaws.com to
edit it.

6. On the **Edit Origin** page for
*model-garden-*<your_initials>.s3.*eu-central-1*.amazonaws.com, click '*Yes*' in
the **Restrict Bucket Access** radio buttons. The following options should
appear; select them as described:
* Choose '*Create a New Identity*' in
**[Origin Access Identity](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesOAI)**.
* Choose '*Yes, Update Bucket Policy*' option in order to
[grant read permissions on bucket within CloudFront](https://acloud.guru/forums/aws-certified-solutions-architect-associate/discussion/-KaF8wx8WH-hwgkWkoP_/grant-read-permissions-on-bucket-within-cloudfront).

**NOTE**: By default the cloud bucket doesn't provide content read access to
other cloud services, including the CDN. The '*Yes, Update Bucket Policy*'
option is required to allow such access.
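
The region-specific **Origin Domain Name** from step 3 can be assembled
mechanically. The sketch below uses two hypothetical helpers:
`origin_domain_name` builds the regional form from a bucket name, and
`with_region` rewrites the region-less form that the wizard tends to propose.
The default region is the *eu-central-1* value used throughout this guide.

```python
def origin_domain_name(bucket: str, region: str = "eu-central-1") -> str:
    """Return the regional S3 endpoint to use as the CloudFront origin."""
    return f"{bucket}.s3.{region}.amazonaws.com"


def with_region(domain: str, region: str = "eu-central-1") -> str:
    """Insert the region into a '<bucket>.s3.amazonaws.com' domain name.

    Domain names that do not end in '.s3.amazonaws.com' are returned unchanged.
    """
    suffix = ".s3.amazonaws.com"
    if domain.endswith(suffix):
        return origin_domain_name(domain[: -len(suffix)], region)
    return domain


# 'model-garden-ab' is a placeholder bucket name.
print(with_region("model-garden-ab.s3.amazonaws.com"))
```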

Finally, the bucket CDN **Domain Name** required by the Model Garden admin pages
should appear in the **[CloudFront Distributions](http://console.aws.amazon.com/cloudfront/home?distributions)**
table. For instance:
* [https://abc7ah1iunm93.cloudfront.net/](https://abc7ah1iunm93.cloudfront.net/)
* [https://d1l3nuymkh9b8l.cloudfront.net/](https://d1l3nuymkh9b8l.cloudfront.net/)

See more details in
[Create CloudFront Distribution](deploy/README.md#create-cloudfront-distribution)
section of [<model_garden_root>/deploy/README.md](deploy/README.md).


#### Test Access to Bucket Content through CDN

Put a test image in the root of the bucket in order to test the CDN setup:

1. Download an image from [Google Image Search](http://www.google.com/search?q=cats).

2. Rename the image to *test.jpg* to keep the name short.

3. Open the created bucket root in
[s3.console.aws.amazon.com/s3/buckets/*model-garden-*<your_initials>](http://s3.console.aws.amazon.com/s3/buckets/)
and upload *test.jpg* there (e.g. via drag-and-drop).

4. Open the image using the root CDN link (e.g.
[https://dcn7ah1iunm93.cloudfront.net/test.jpg](https://dcn7ah1iunm93.cloudfront.net/test.jpg)).

The *test.jpg* image should appear in the browser.

If you see the
[AccessDenied](http://aws.amazon.com/premiumsupport/knowledge-center/s3-website-cloudfront-error-403/)
error message with the following title, repeat the steps from the
[Setup Content Delivery Network](#setup-content-delivery-network-for-bucket-content)
instructions:

- *[This XML file does not appear to have any style information associated with
it. The document tree is shown below.](http://www.stackoverflow.com/questions/44741287/cloudfront-error-this-xml-file-does-not-appear-to-have-any-style-information-as)*
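
The browser check above can also be scripted. The following is a minimal sketch
with two hypothetical helpers: `cdn_url` builds the link to an uploaded object,
and `diagnose` interprets an HTTP status and response body, assuming the error
body is the usual S3 XML document containing `<Code>AccessDenied</Code>`. The
domain name is the example one from step 4.

```python
from urllib.parse import quote


def cdn_url(domain: str, key: str) -> str:
    """Build the HTTPS URL of a bucket object served through the CDN."""
    return f"https://{domain.strip('/')}/{quote(key.lstrip('/'))}"


def diagnose(status: int, body: str = "") -> str:
    """Map an HTTP status and response body to a short diagnosis."""
    if status == 200:
        return "ok"
    if status == 403 and "<Code>AccessDenied</Code>" in body:
        return "access denied: repeat the CDN setup steps"
    return f"unexpected response: HTTP {status}"


print(cdn_url("dcn7ah1iunm93.cloudfront.net", "test.jpg"))
```

To run the check against a real distribution, the URL can be fetched with
`urllib.request.urlopen`; note that it raises `urllib.error.HTTPError` for a 403
response, in which case the status is available as `err.code` and the body via
`err.read()`.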


#### Results

The cloud account setup should have the following results which are necessary
for [backend](backend/README.md) and [frontend](frontend/README.md) operations:

* Created Account Access Key (and its id).

<table>
<tr>
<td style="text-align:center">AWSAccessKeyId</td>
<td style="text-align:center">AWSSecretKey</td>
</tr>
<tr>
<td style="text-align:center">AWS_ACCESS_KEY_ID</td>
<td style="text-align:center">AWS_SECRET_KEY</td>
</tr>
</table>

* Distribution mechanism to display the bucket media files (e.g. images) in
the browser through CDN.
2 changes: 1 addition & 1 deletion backend/README.md
@@ -89,7 +89,7 @@ DJANGO_DB_HOST='localhost'
DJANGO_DB_PORT=5444
```

<sup>* - environment specific values</sup>
<sup>* - environment specific values. Ask the team to provide the real values.</sup>

### Make sure you have Python 3.8 installed
```
73 changes: 7 additions & 66 deletions backend/model_garden/admin/dataset.py
@@ -1,13 +1,9 @@
import logging
from collections import defaultdict
from typing import List, Set, Tuple

from django.contrib import admin
from django.db.models import QuerySet

from model_garden.models import Dataset, MediaAsset
from model_garden.services import S3Client
from model_garden.services.s3 import DeleteError
from model_garden.models import Dataset
from model_garden.services import DatasetService

from .common import FilterCreatedFixture, format_date

@@ -36,66 +32,11 @@ def get_search_results(self, request, queryset, search_term):

def delete_model(self, request, obj):
queryset = type(obj).objects.filter(pk=obj.pk)

self.delete_queryset(request, queryset)

def delete_queryset(self, request, queryset):
media_assets = list(get_media_assets(queryset))

bucket_map = defaultdict(list)
for asset in media_assets:
bucket_map[asset.dataset.bucket.name].append(asset)

error_keys: Set[Tuple(str, str)] = set()
for bucket, assets in bucket_map.items():
file_path_to_remove = ([asset.full_path for asset in assets]
+ [asset.full_label_path for asset in assets])
delete_errors = delete_files_in_s3(
bucket, file_path_to_remove,
)
error_keys |= set((bucket, error.key) for error in delete_errors)

(
MediaAsset.objects
.filter(
pk__in=[
asset.pk for asset in media_assets
if (asset.dataset.bucket.name, asset.full_path) not in error_keys
],
).delete()
)
(
queryset
.exclude(
pk__in=set(
asset.dataset.pk for asset in media_assets
if (asset.dataset.bucket.name, asset.full_path) in error_keys
),
)
.delete()
)


def get_media_assets(dataset: QuerySet) -> QuerySet:
return (
MediaAsset.objects
.filter(dataset__in=dataset)
.select_related('dataset')
.select_related('dataset__bucket')
)


def delete_files_in_s3(bucket: str, keys: List[str]) -> List[DeleteError]:
if not keys:
return []

client = S3Client(bucket_name=bucket)

errors = client.delete_files_concurrent(*keys)

if errors:
logger.error(
'Unable to delete media_assets in bucket %s: %s', bucket, errors,
)

return errors
dataset_service = DatasetService()
media_assets = dataset_service.get_media_assets(queryset)
dataset = media_assets.first().dataset
dataset_service.delete_media_assets_by_dataset(media_assets)
dataset.delete()
25 changes: 16 additions & 9 deletions backend/model_garden/management/commands/process_task_statuses.py
@@ -124,6 +124,12 @@ def _get_annotations_format(dataset: Dataset):
else:
return AnnotationsFormat.PASCAL_VOB_ZIP_1_1

def generate_empty_file(self):
bytesIO = BytesIO()
bytesIO.write(b'')
bytesIO.seek(0)
return bytesIO

def _upload_labeling_task_annotations(self, labeling_task: LabelingTask):
dataset = labeling_task.media_assets.first().dataset
annotation_frmt = self._get_annotations_format(dataset)
Expand Down Expand Up @@ -163,15 +169,16 @@ def _upload_labeling_task_annotations(self, labeling_task: LabelingTask):
labeling_file_name = f"{asset_filename}" + self._get_label_file_extension(annotation_frmt)
if labeling_file_name in annotation_filenames:
file_object = annotation_filenames[f"{asset_filename}" + self._get_label_file_extension(annotation_frmt)]

# TODO:remove deprecated remote_label_path property
media_asset.labeling_asset_filepath = media_asset.remote_label_path
s3_client.upload_file_obj(
file_obj=file_object,
bucket=bucket_name,
key=media_asset.full_label_path,
)
media_asset.save()
else:
file_object = self.generate_empty_file()
# TODO:remove deprecated remote_label_path property
media_asset.labeling_asset_filepath = media_asset.remote_label_path
s3_client.upload_file_obj(
file_obj=file_object,
bucket=bucket_name,
key=media_asset.full_label_path,
)
media_asset.save()
logger.info(f"Uploaded annotation '{media_asset.full_label_path}'")
except Exception as e:
raise Exception(f"Failed to upload task annotations: {e}")
46 changes: 23 additions & 23 deletions backend/model_garden/migrations/0025_auto_20200914_1241.py
@@ -1,23 +1,23 @@
# Generated by Django 3.0.6 on 2020-09-14 17:41

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
('model_garden', '0024_auto_20200810_0335'),
]

operations = [
migrations.AddField(
model_name='mediaasset',
name='labeling_asset_filepath',
field=models.CharField(default='', max_length=512),
),
migrations.AlterField(
model_name='dataset',
name='dataset_format',
field=models.CharField(choices=[('PASCAL_VOC', 'Pascal VOC'), ('YOLO', 'YOLO')], default='PASCAL_VOC', max_length=16),
),
]
# Generated by Django 3.0.6 on 2020-09-14 17:41

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
('model_garden', '0024_auto_20200810_0335'),
]

operations = [
migrations.AddField(
model_name='mediaasset',
name='labeling_asset_filepath',
field=models.CharField(default='', max_length=512),
),
migrations.AlterField(
model_name='dataset',
name='dataset_format',
field=models.CharField(choices=[('PASCAL_VOC', 'Pascal VOC'), ('YOLO', 'YOLO')], default='PASCAL_VOC', max_length=16),
),
]