This platform was developed with the help of the Pansurg community to support the document annotation and evaluation method currently used by Pansurg.
The platform makes use of Amazon Web Services resources such as S3 and DynamoDB through the AWS Amplify framework. An AppSync GraphQL API is used to interact with data items stored in DynamoDB.
Administrators are assumed to have full access to the AWS account on which the resources will be hosted.
The administrator is able to assign tasks to curators, reassign incomplete tasks and delete tasks.
Each annotation task has several features: the annotation instructions, the document to be annotated, the Named Entity Recognition (NER) labels available for text highlighting, and the categorical annotation questions used to classify or evaluate the document. Each feature must be selected on the task creation page, which is accessible to the administrator upon sign-in.
Question forms can be created from the “Question Form creation” tab within the task creation page, and will then be stored in the Question Form DynamoDB table for the environment.
Administrators can reassign any incomplete tasks from the “Reassign tasks” page.
Question-centric task deletion is possible from the task deletion page.
Ongoing progress of incomplete tasks can be seen in the “Active tasks” page.
Curators are able to view all annotation tasks assigned to them within the “Tasks” tab, and can then annotate and submit them. All completed task results are viewable in the “Completed tasks” page.
The home page is available to all users and displays annotation results for completed tasks.
The outline of the platform architecture can be seen below:
The semantic agreement and the inter-annotator agreement are calculated with Lambda functions. The spaCy library used to calculate the semantic agreement is too large for the classic Lambda deployment package, so the function is hosted within a Docker image (a feature currently not supported by Amplify). The semantic agreement therefore has its own manual setup process, as described in the Setup section.
All annotation data is stored as items in DynamoDB and can be retrieved freely. NER labels for each annotation task are stored in the annotation task DynamoDB table within the “labels” item. Categorical question answers are stored in the “question_answers” item.
The semantic agreement and inter-annotator agreement, calculated by the Lambda functions, are stored within the “Medical Question” data item.
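As an illustration of retrieving stored annotations, the following is a minimal sketch using boto3. It assumes that “labels” and “question_answers” are readable as attributes of a single annotation task item, and that the table uses a partition key named `id`; neither is confirmed here, so check the table schema in the DynamoDB console first. The function names and the table name argument are hypothetical.

```python
from typing import Any, Dict, Tuple


def extract_annotations(item: Dict[str, Any]) -> Tuple[Any, Any]:
    """Return (labels, question_answers) from a task item, or None where absent."""
    return item.get("labels"), item.get("question_answers")


def fetch_task(table_name: str, task_id: str) -> Dict[str, Any]:
    """Fetch one annotation task item; returns {} if no item has this id."""
    import boto3  # imported lazily so extract_annotations works without AWS access

    table = boto3.resource("dynamodb").Table(table_name)
    # Assumption: the partition key is called "id". Adjust to the real schema.
    response = table.get_item(Key={"id": task_id})
    return response.get("Item", {})
```

A call such as `extract_annotations(fetch_task("<annotation task table>", "<task id>"))` would then return both values for one task, provided the AWS profile in use has read access to the table.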
Annotation documents are stored within the platform S3 image bucket.
It is assumed that the administrator has access to the AWS account hosting the platform resources. Once the annotation platform is fully set up, create an account by pressing the “Sign in” button in the top right-hand corner and then pressing “Create account”. Once the account is confirmed, the new user must be added to the correct group manually. From the AWS web console, navigate to the Cognito interface and select “Manage User Pools”. Select the user pool for the newly created environment. From “Users and groups”, select “Groups” and then the “Admin” group. From here, add the newly created user to the “Admin” group to give them access to the administrative features of the site. To add curators, repeat this process with the “Curators” group instead. Each user must be reviewed manually to decide which permissions they should have. Once a curator has been added, annotation tasks can be assigned to them through the task creation process.
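The group assignment described above can also be scripted. The sketch below uses the boto3 `cognito-idp` client and its `admin_add_user_to_group` call; the helper names are hypothetical, and the user pool ID and username must be taken from your own environment. It is an alternative to the console steps, not part of the platform.

```python
# Group names as given in the console steps above.
VALID_GROUPS = ("Admin", "Curators")


def build_group_request(user_pool_id: str, username: str, group: str) -> dict:
    """Build the keyword arguments for admin_add_user_to_group."""
    if group not in VALID_GROUPS:
        raise ValueError(f"group must be one of {VALID_GROUPS}, got {group!r}")
    return {"UserPoolId": user_pool_id, "Username": username, "GroupName": group}


def add_user_to_group(user_pool_id: str, username: str, group: str) -> None:
    """Add a confirmed user to the Admin or Curators group."""
    import boto3  # imported lazily; requires configured AWS credentials

    client = boto3.client("cognito-idp")
    client.admin_add_user_to_group(
        **build_group_request(user_pool_id, username, group)
    )
```

Validating the group name before the call keeps a typo from surfacing as an AWS error.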
Documents for annotation are stored in the S3 bucket associated with the created Amplify environment, and must be uploaded through the AWS console. Navigate to S3 and select the image bucket for the relevant Amplify environment. Create a “public” folder and navigate to it. Within this folder, create new folders to subdivide the annotation documents, and store the documents within these folders. The documents can then be selected from the task creation page.
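For larger document sets, the same folder layout can be produced with a script. The following is a rough sketch using the boto3 S3 client's `upload_file`; the helper names are hypothetical, and the bucket name must be the image bucket of your own Amplify environment.

```python
import os
import posixpath


def build_document_key(folder: str, filename: str) -> str:
    """Return the S3 key for a document: public/<folder>/<filename>."""
    return posixpath.join("public", folder.strip("/"), filename)


def upload_document(bucket: str, local_path: str, folder: str) -> str:
    """Upload one local document into the given subfolder and return its key."""
    import boto3  # imported lazily; requires configured AWS credentials

    key = build_document_key(folder, os.path.basename(local_path))
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key
```

S3 has no real folders, so writing an object under the `public/<folder>/` prefix has the same effect as creating the folders in the console.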
The documents for annotation will have to be formatted as follows:
# URL to online version
Url_to_online_version
# Title
Document title
# Abstract
Abstract paragraph 1
# Main text
Main text paragraph 1
Main text paragraph 2
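To check a document against this layout before uploading it, a small parser along the following lines may help. It is only a sketch: it assumes that each heading line starts with "# " and that each paragraph occupies one line, as in the template above. The function names are hypothetical and not part of the platform.

```python
from typing import Dict, List

# Section headings taken from the template above.
REQUIRED_SECTIONS = ("URL to online version", "Title", "Abstract", "Main text")


def parse_document(text: str) -> Dict[str, List[str]]:
    """Split a document into {heading: [paragraph lines]}."""
    sections: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between paragraphs
        if line.startswith("# "):
            current = line[2:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def missing_sections(sections: Dict[str, List[str]]) -> List[str]:
    """List the required headings that the document does not contain."""
    return [name for name in REQUIRED_SECTIONS if name not in sections]
```

A paragraph that itself begins with "# " would be read as a heading, so such lines need rewording before the check is useful.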
The number of curators per document can be changed by editing the .env file within the platform directory and changing the REACT_APP_NUMBER_CURATORS environment variable.
More features may be added to the platform. Currently, the question format only supports single-answer categorical questions. More question formats may be added by altering the relevant elements, such as the “QuestionFormCreation”, “AnnotationQuestions” and “TasksId” components within the directory.
The highlight component does not currently support mobile use. If a new highlight tool is to be created, the relevant changes will need to be made to the “AnnotationPage” component.
Currently, a maximum of 10 different labels with colours can be assigned to the documents. Additional colours may be added by altering the colours defined in the “adminConstants” file.
If you wish to make any changes to the code, fork the repository, add a feature or fix a bug and then file a pull request to merge changes into the main repository.
This project makes use of the Amplify CLI and React-specific dependencies:
npm install -g @aws-amplify/cli
npm install aws-amplify @aws-amplify/ui-react
An AWS account must also be configured, details of which are shown here:
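The Lambda function for the semantic agreement makes use of Docker. Docker itself must be installed and running on the local machine (for example Docker Desktop or Docker Engine); the npm package named docker is a different tool and does not install it.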
The project can then be cloned by running the following command from an empty directory:
git clone https://github.com/PanSurg/annotate-it.git
Initialize the Amplify application with:
amplify init
Accept all existing configurations. You will then be issued with a series of prompts. Enter the new name for the Amplify environment that will host the resources for the platform. Choose your preferred default editor, and use the AWS profile that was set up previously.
More details are available here:
https://docs.amplify.aws/cli/start/workflows/#amplify-console
Once the resources have been successfully set up, run the following command:
amplify push
Set the codegen maximum depth to 6:
amplify codegen --maxDepth 6
Install all dependencies with the following command:
npm install
Once all dependencies have been installed, it should now be possible to launch the application in your local host with:
npm start
The semantic agreement lambda function requires separate setup, as Amplify unfortunately does not support deployment of Docker images as Lambda functions. Docker is required, so make sure this is installed.
Navigate to the “lambda-docker” folder within the project directory and change the variables medicalQuestionTable and annotationTaskTable within app.py to the names of your project tables. Change the endpoint to the one for your API (accessible via the AppSync console > API for platform environment > Settings > API ID), and replace the environment name with the name of your environment. The number of curators can be set with the number_required_curators variable.
Next, create a new ECR repository to store the image.
Navigate to the Elastic Container Registry interface and click “Create Repository” in the right-hand corner.
Enter the name of the repository (e.g. semantic-agreement) and click “Create repository”.
Navigate to the newly created repository and click “View push commands” in the top right-hand corner of the page. A sequence of push commands will be displayed.
Run these commands in order from within the lambda-docker folder, and the image will be built and pushed to the ECR repository.
Navigate to the Lambda management console within the AWS user interface and select “Create function”.
Within the create function page, click “Container image”. Enter the function name (e.g. semantic-agreement) and paste the URI of the ECR repository created earlier.
Change the default execution role to an existing one. From the dropdown, select the Lambda role associated with the project environment, which should have been created upon project initialisation.
After creation, click “Add trigger” and select the “Annotation task” table for the project environment. Be sure to reduce the batch size from 100 to 20.
A final step is to change the timeout in “General configuration” from 3 seconds to 60 seconds, the maximum memory to 3000 MB and the ephemeral storage to 2048 MB.
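The trigger and configuration values above can also be applied with boto3 rather than the console. The sketch below uses the Lambda client's `update_function_configuration` and `create_event_source_mapping` calls. The helper names are hypothetical, the stream ARN must be that of the DynamoDB stream on your “Annotation task” table, and the `LATEST` starting position is an assumption rather than something this guide specifies.

```python
def build_function_settings(function_name: str) -> dict:
    """Timeout (seconds), memory (MB) and ephemeral storage (MB) from the step above."""
    return {
        "FunctionName": function_name,
        "Timeout": 60,
        "MemorySize": 3000,
        "EphemeralStorage": {"Size": 2048},
    }


def build_trigger_settings(function_name: str, stream_arn: str) -> dict:
    """DynamoDB stream trigger with the batch size reduced to 20."""
    return {
        "FunctionName": function_name,
        "EventSourceArn": stream_arn,
        "BatchSize": 20,
        "StartingPosition": "LATEST",  # assumption: only process new records
    }


def configure_function(function_name: str) -> None:
    """Apply the timeout, memory and storage settings to an existing function."""
    import boto3  # imported lazily; requires configured AWS credentials

    boto3.client("lambda").update_function_configuration(
        **build_function_settings(function_name)
    )


def add_trigger(function_name: str, stream_arn: str) -> None:
    """Attach the table's stream to the function as an event source."""
    import boto3

    boto3.client("lambda").create_event_source_mapping(
        **build_trigger_settings(function_name, stream_arn)
    )
```

The two steps are kept as separate functions so that each can be run and checked on its own, mirroring the console workflow.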
To deploy the platform, navigate to the Amplify console within the AWS web user interface and select the annotation platform application. Within “Hosting environments”, connect a new branch and select the desired branch as well as the Amplify environment. Make sure the GitHub account is authenticated, then confirm and deploy. The amplify.yml build file is shown here:
version: 1
env:
  variables:
    VERSION_AMPLIFY: 8.4.0
backend:
  phases:
    preBuild:
      commands:
        - npm i aws-amplify
        - npm i @aws-amplify/auth
    build:
      commands:
        - '# Execute Amplify CLI with the helper script'
        - amplifyPush --simple
frontend:
  phases:
    preBuild:
      commands:
        - npm install
    build:
      commands:
        - npm run build
  artifacts:
    baseDirectory: build
    files:
      - '**/*'
  cache:
    paths:
      - node_modules/**/*
Within Rewrites and Redirects, also make sure to add the following rule:
Source address: </^((?!\.(css|gif|ico|jpg|js|png|txt|svg|woff|ttf)$).)*$/>
Target address: /index.html
Type: 200 (Rewrite)
Country code: -
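To see which request paths this rule sends to index.html, the pattern can be tried locally. The sketch below uses Python's `re` module and writes the pattern with the dot before the extension list escaped, which is assumed to be the intended form of the rule. Python's regex engine may differ in detail from the one used by Amplify Hosting, so treat the result as indicative only; the function name is hypothetical.

```python
import re

# The rewrite rule, without the surrounding </ and /> delimiters.
REWRITE_PATTERN = re.compile(
    r"^((?!\.(css|gif|ico|jpg|js|png|txt|svg|woff|ttf)$).)*$"
)


def is_rewritten(path: str) -> bool:
    """True if the path matches the rule and would be served index.html."""
    return REWRITE_PATTERN.match(path) is not None
```

Application routes such as `/tasks` match and are rewritten, so the React router can handle them, while paths ending in a listed static extension such as `/static/js/main.js` do not match and are served as files.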