A database table/model "Ticket" has 1 million rows. The table has a "token" column that holds a random unique UUID value for each row, determined by Django/Python logic at the time of creation.
Due to a data leak, the candidate should write a Django management command to iterate over every Ticket record to regenerate the unique UUID value.
The command should inform the user of progress as the script runs and estimate the remaining time.
The script should also be sensitive to the potentially limited amount of server memory, avoiding loading the full dataset into memory all at once, and should show an understanding of Django ORM memory usage and query execution.
Finally, the script should ensure that if it is interrupted, it saves progress and restarts near the point of interruption so that it does not need to process the entire table from the start.
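For reference, a minimal Ticket model matching this description might look like the following sketch (the field options here are assumptions, not part of the task):
import uuid
from django.db import models

class Ticket(models.Model):
    # Random unique token, generated by Python at creation time (not by the database)
    token = models.UUIDField(default=uuid.uuid4, unique=True, editable=False)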
- Build and start the Docker Compose services:
docker-compose up -d --build
Note: stop any service already running on port 8000 or port 5432 (the PostgreSQL database) first to avoid build failures.
- Access the app container:
docker exec -it app /bin/bash
- Migrate the database:
python manage.py migrate
- Add sample data to the database:
python manage.py ticket --init --size 50000
- Run the command to regenerate a token for each ticket record:
python manage.py ticket --process-token
- The command shows the progress of the script as well as the estimated time remaining:
completed: 2.30 % (1150 of 50000) | remaining time: 31 second(s)
- If the command is interrupted for any reason, it saves its progress and, when restarted, continues from near the point of interruption:
completed: 30.30 % (14800 of 48850) | remaining time: 21 second(s)
We assume that the Ticket table's primary key is the ID field, an auto-incrementing integer.
We use a batchprogress table to keep track of the state of the batch process and the ID of the last record successfully processed.
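A possible shape for that table and the resume logic, as a sketch (the model and field names are assumptions based on this description):
from django.db import models

class BatchProgress(models.Model):
    name = models.CharField(max_length=100, unique=True)  # e.g. "process-token"
    last_value = models.BigIntegerField(default=0)        # pk of the last record processed

# On startup, resume from the saved position (0 means start from the beginning)
state, _ = BatchProgress.objects.get_or_create(name="process-token")
last_pk = state.last_value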
The sample project demonstrates how to process a large dataset with Django querysets, so we don't focus on the logic of token generation.
For testing purposes, the database credentials (user, password, etc.) are hardcoded. In a real development environment, these should be stored in a .env file.
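For illustration, settings.py could read them from the environment instead (a sketch; the variable names are assumptions, e.g. provided by Docker Compose or a .env loader):
# settings.py (sketch): read credentials from the environment instead of hardcoding
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("POSTGRES_DB", "app"),
        "USER": os.environ.get("POSTGRES_USER", "app"),
        "PASSWORD": os.environ.get("POSTGRES_PASSWORD", ""),
        "HOST": os.environ.get("POSTGRES_HOST", "db"),
        "PORT": os.environ.get("POSTGRES_PORT", "5432"),
    }
}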
There are two main management commands, init and process-token, located at src/app/management/commands/ticket.py.
The init command cleans up the data and regenerates sample data with the total number of records specified by the size argument (default: 1000).
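As a sketch of what that could look like under the hood (the helper name and module path are assumptions, not the project's actual code):
import uuid
from app.models import Ticket  # assumed module path

def init_sample_data(size=1000, chunk=1000):
    Ticket.objects.all().delete()  # clean up existing data
    created = 0
    while created < size:
        n = min(chunk, size - created)
        # Create one chunk at a time so we never build a huge list in memory
        Ticket.objects.bulk_create([Ticket(token=uuid.uuid4()) for _ in range(n)])
        created += n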
We build a queryset that filters and orders the records based on the primary key (which by default has a B-Tree index).
We first count the total number of records that need to be processed, then loop until all of them are handled.
total = Ticket.objects.filter(pk__gt=last_pk).count()
...
offset = 0
while offset < total:
    ...
    last_pk = self.batch_process(last_pk)
    offset += self.BULK_SIZE
For each iteration of the loop, we limit the results to BULK_SIZE records (default: 50):
chunk = iter(Ticket.objects.filter(pk__gt=pk).order_by("pk")[:self.BULK_SIZE])
The chunk here is an iterator over a lazy queryset: no query runs and no rows are loaded into memory until we start consuming it, and even then the query fetches at most BULK_SIZE rows.
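For context, the per-chunk processing that builds the bulk list might look like this sketch (bulk and last_pk are inferred from the snippet below; the token logic is illustrative):
import uuid

bulk = []
last_pk = pk
for ticket in chunk:
    ticket.token = uuid.uuid4()  # regenerate the leaked token
    bulk.append(ticket)
    last_pk = ticket.pk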
For each chunk of data, we update the batch state by saving the ID of the last record processed. We also use bulk_update on the tickets to reduce the number of query operations:
try:
    Ticket.objects.bulk_update(bulk, ["token"])
    self.update_batch_state({'last_value': last_pk}, None, None)
except Exception:
    # On failure, keep the old pk so this chunk is retried after a restart
    self.update_batch_state({'last_value': pk}, None, None)
    return pk
return last_pk
The progress and estimated time remaining are calculated from the total number of records to be processed and the number already processed in the time taken so far:
percentage = offset * 100.0 / total
took = time.time() - start_time
return {
    'percentage': percentage,
    'remaining_time': "%0.0f" % ((total - offset) / offset * took) if offset > 0 else "-"
}
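As a sanity check against the first sample output: with offset = 1150, total = 50000, and roughly 0.73 seconds elapsed, the estimate is (50000 - 1150) / 1150 * 0.73 ≈ 31 second(s).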