Validation API core #8348

Merged
merged 265 commits into from
Oct 3, 2024

@zhiltsov-max (Contributor) commented Aug 26, 2024

Motivation and context

Depends on #8272
Depends on #8321

  • Added server API for creation of a GT job on task creation
  • Added server support for task creation with GT pool (aka Honeypot)
  • Added new GT job frame selection method random_per_job, which guarantees each annotation job gets the specified GT overlap, making each annotation job validatable
  • Added new GT job frame count selection options based on task size % and segment size %
  • Changed GT job creation parameter "frames" to accept relative frame ids instead of absolute (source data) ones
  • Allowed frame deletion in GT jobs. Deleted GT frames are excluded from validation, so they should not appear in quality reports. Removing frames from a simple GT job (in tasks without honeypots) removes only the GT job frames, not the task frames.
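
As a rough illustration of the random_per_job idea listed above, the sketch below picks a fixed number of GT frames independently inside each job's frame range, so every job ends up with validation frames. The function name, its parameters, and the assumption of contiguous, non-overlapping job segments are hypothetical and not taken from the CVAT code base.

```python
import random


def select_gt_frames_per_job(task_size, segment_size, frames_per_job, seed=None):
    """Pick `frames_per_job` random relative frame ids from every job segment.

    Hypothetical helper: assumes jobs cover contiguous, non-overlapping
    ranges of `segment_size` frames (the last one may be shorter).
    """
    rng = random.Random(seed)
    selected = []
    for start in range(0, task_size, segment_size):
        segment = range(start, min(start + segment_size, task_size))
        # A short last segment cannot provide more frames than it contains
        count = min(frames_per_job, len(segment))
        selected.extend(sorted(rng.sample(segment, count)))
    return selected


gt_frames = select_gt_frames_per_job(task_size=25, segment_size=10, frames_per_job=3, seed=42)
print(gt_frames)  # 9 relative frame ids, 3 from each of the 3 segments
```

The returned ids are relative to the task, in line with the changed meaning of the "frames" parameter. A share-based variant could derive the per-job count from the segment size before sampling.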

Server API changes:

  • GET /api/tasks/{id}/ got a new validation_mode field, reflecting the current validation configuration (immutable)
  • POST /api/tasks/{id}/data got a new validation_params field, which allows enabling GT / GT_POOL validation for a task at its creation
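
For illustration, a request body enabling validation at task creation might be assembled as in the sketch below. Only the validation_params field and the GT / GT_POOL modes come from the description above. The nested field names (mode, frame_selection_method, frame_count, frames_per_job_count), their values, and the other data fields are assumptions that should be checked against the API schema.

```python
import json


def build_task_data_payload(mode, frame_count, frames_per_job_count=None):
    """Build a hypothetical body for POST /api/tasks/{id}/data."""
    validation_params = {
        "mode": mode,  # assumed values: "gt" or "gt_pool"
        "frame_selection_method": "random_uniform",  # assumed field and value
        "frame_count": frame_count,  # assumed field: size of the validation set
    }
    payload = {
        "image_quality": 70,  # assumed, unrelated data field
        "validation_params": validation_params,
    }
    if mode == "gt_pool":
        # Assumed field: number of honeypot frames added to each job
        validation_params["frames_per_job_count"] = frames_per_job_count
        # The PR states that honeypots require random frame ordering;
        # the field name used to request it here is an assumption
        payload["sorting_method"] = "random"
    return payload


print(json.dumps(build_task_data_payload("gt_pool", frame_count=10, frames_per_job_count=2), indent=2))
```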

Tasks with Honeypots

This validation mode affects how the task is created, so it can only be enabled at task creation. It cannot be disabled or changed after the task is created. When honeypots are configured, each job in the task gets several extra validation frames.
The pool of available frames and the number of validation frames per job are specified by the user at task creation.
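
The honeypot layout can be pictured with the following sketch: a pool of validation frames is set aside, the remaining frames are shuffled and split into jobs, and each job receives a few frames drawn from the pool. This is a simplified model under stated assumptions, not CVAT's implementation. The function and parameter names are hypothetical.

```python
import random


def build_honeypot_jobs(frame_ids, pool_size, segment_size, honeypots_per_job, seed=None):
    """Split frames into jobs and add honeypot frames from a validation pool.

    Hypothetical helper: requires honeypots_per_job <= pool_size.
    """
    rng = random.Random(seed)
    frames = list(frame_ids)
    # Set aside the validation pool; these frames carry the GT annotations
    pool = rng.sample(frames, pool_size)
    pool_set = set(pool)
    regular = [f for f in frames if f not in pool_set]
    rng.shuffle(regular)  # honeypot tasks use random frame ordering
    jobs = []
    for start in range(0, len(regular), segment_size):
        job_frames = regular[start:start + segment_size]
        # Each job gets its own random subset of the pool
        job_frames += rng.sample(pool, honeypots_per_job)
        rng.shuffle(job_frames)  # mix honeypots in among the regular frames
        jobs.append(job_frames)
    return pool, jobs


pool, jobs = build_honeypot_jobs(range(20), pool_size=4, segment_size=8, honeypots_per_job=2, seed=0)
print(len(jobs), [len(job) for job in jobs])
```

In this model the same pool frame may appear in several jobs, which is what allows each job to be validated against a single set of GT annotations.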

Limitations:

  • This validation mode can only be used with random frame ordering.
  • Consequently, job_frame_mapping and overlap cannot be used in such tasks.
  • Track annotations are prohibited in tasks with honeypots enabled.

Honeypot frames and GT annotations are accessible via the GT job, as in the case with regular GT jobs. However, unlike regular tasks with GT jobs, task annotation import affects the GT job as well in tasks with honeypots. Task annotation export contains only GT annotations on validation frames (so, only the GT copy of validation frames is included).

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)
  • I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

  • New Features

    • Introduced a server setting to disable media chunks on the local filesystem, enhancing configurability.
    • Added tracking for the last assignee update date in quality reports, improving task management.
    • Enhanced job chunk identifiers for better clarity and uniqueness.
  • Bug Fixes

    • Resolved memory management issues and refined job assignment logic in video processing.
  • Documentation

    • Updated API schema with new enhancements related to job management and validation processes.
  • Chores

    • Updated package dependencies and added new configuration settings for Redis in the Helm chart.

)
path = models.CharField(max_length=1024, default='')

class ValidationLayout(models.Model):
Member

What is the reason to bind ValidationParams and ValidationLayout to Data instead of Task model?

Member

Could you explain the name ValidationLayout? I feel it should be something like ValidationPool.

Contributor Author

@zhiltsov-max, Oct 1, 2024

Layout:

  1. Merriam-Webster

the plan or design or arrangement of something laid out

  2. Wiki

In general terms, a layout is a structured arrangement of items within certain limits, or a plan for such arrangement. Specifically, layout may refer to: Page layout, the arrangement of visual elements on a page.

It's used to describe validation frames in tasks, both for simple GT and for Honeypots. That's why it doesn't have pool in the name.

> What is the reason to bind ValidationParams and ValidationLayout to Data instead of Task model?

This is made to be the same as storing deleted_frames in the Data model. Basically, it describes task data.

Member

Layout sounds like some set of elements and their relation to each other.
But in our case it is just a couple of sets. Pool would describe it well, and it applies in general to both the GT job and the Honeypot job.

However, if you do not want to use the word Pool, that is okay; it is up to you.

Member

> This is made to be the same as storing deleted_frames in the Data model.

I can't really understand the explanation. However, in the future this design may become a problem if we want to use the same Data object to create multiple tasks (it is not certain that we will do this, but anyway).

Member

Never mind. Given the existing database layout, a feature like importing raw data, selecting some of it, and creating tasks from the selection is already not implementable without new database classes.

elif db_segment.type == "specific_frames":
    frame_set = set(frame_range).intersection(db_segment.frames or [])
else:
    raise ValueError(f"Unknown segment type: {db_segment.type}")
Member

I am not sure that raising an uncaught exception is a good idea in a migration file.

Member

On our prod, we only have specific_frames defined, so it will not be a problem.

Contributor Author

The reason is to fail the migration if the DB contains invalid entries. We don't know why they are there or what to do with them.
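
The fail-fast behaviour discussed in this thread can be sketched outside of Django as follows. The Segment dataclass is a stand-in for the real model, and the "range" branch is an assumption, since only the specific_frames and fallback branches are visible in the quoted snippet.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Segment:
    """Simplified stand-in for a DB segment row (hypothetical fields)."""
    type: str
    start_frame: int
    stop_frame: int
    frames: Optional[List[int]] = None


def get_segment_frame_set(segment: Segment) -> Set[int]:
    frame_range = range(segment.start_frame, segment.stop_frame + 1)
    if segment.type == "range":  # assumed branch, not shown in the snippet
        return set(frame_range)
    elif segment.type == "specific_frames":
        return set(frame_range).intersection(segment.frames or [])
    else:
        # Unknown data: stop instead of guessing how to migrate it
        raise ValueError(f"Unknown segment type: {segment.type}")


print(get_segment_frame_set(Segment("specific_frames", 0, 5, [1, 4, 9])))
```

Letting the exception propagate aborts the migration, so unexpected rows are left untouched for manual inspection.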

@bsekachev
Member

can't create a task with honeypot job and context images:

Traceback (most recent call last):
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/rq/worker.py", line 1431, in perform_job
    rv = job.perform()
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/rq/job.py", line 1280, in perform
    self._result = self._execute()
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/rq/job.py", line 1317, in _execute
    result = self.func(*self.args, **self.kwargs)
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/cvat/cvat/apps/engine/task.py", line 1347, in _create_thread
    models.RelatedFile.objects.bulk_create(db_related_files)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/models/query.py", line 803, in bulk_create
    returned_columns = self._batched_insert(
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/models/query.py", line 1831, in _batched_insert
    self._insert(
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/models/query.py", line 1805, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1822, in execute_sql
    cursor.execute(sql, params)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/backends/utils.py", line 102, in execute
    return super().execute(sql, params)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.IntegrityError: duplicate key value violates unique constraint "engine_relatedfile_data_id_path_a7223d1e_uniq"
DETAIL:  Key (data_id, path)=(5, /home/bsekachev/app.cvat.ai/cvat_enterprise/cvat/data/data/5/raw/context_images example/related_images/3Z2A3692_jpg/3Z2A3692.jpg) already exists.

@zhiltsov-max
Contributor Author

@bsekachev

> can't create a task with honeypot job and context images:

Should be fixed now.
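
The thread does not say how the problem was fixed. One generic way to avoid this kind of unique-constraint violation is to deduplicate records by their unique key before a bulk insert, which matters when several frames reference the same context image. The sketch below uses plain tuples instead of Django models. The helper name is hypothetical, and this is not claimed to be the change made in this PR.

```python
def unique_related_files(records):
    """Keep the first record for every (data_id, path) pair, preserving order."""
    seen = set()
    unique = []
    for data_id, path in records:
        key = (data_id, path)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


records = [
    (5, "related_images/a_jpg/a.jpg"),
    (5, "related_images/b_jpg/b.jpg"),
    (5, "related_images/a_jpg/a.jpg"),  # duplicate, e.g. from a repeated frame
]
print(unique_related_files(records))
```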

sonarcloud bot commented Oct 2, 2024

@bsekachev bsekachev merged commit 1285858 into develop Oct 3, 2024
34 checks passed
5 participants