Dense captioning #8

Open: wants to merge 5 commits into base: main
4 changes: 4 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,4 @@
{
    "python.defaultInterpreterPath": "/usr/local/bin/python3",
    "python.pythonPath": "/usr/local/bin/python3"
}
48 changes: 48 additions & 0 deletions Andy-Manual.md
@@ -0,0 +1,48 @@
# Dense Captioning Front-End Manual

### Introduction

The front-end for Dense Captioning is an easy-to-use interface for running a machine learning algorithm on images submitted by the user. The algorithm takes an input image and returns a large JSON file containing its analysis.

This output contains:

- The image name
- The captions for each image: what the algorithm has identified in the image
- The "score" for each caption: the confidence the algorithm has assigned to that specific caption
- The bounding box coordinates for each caption, which define the region of the original image the caption refers to
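
As a rough illustration of the output described above, the sketch below flattens one result entry into per-caption records. The field names (`img_name`, `scores`, `captions`, `boxes`) and the x, y, w, h box order follow the back-end `CreateResult` schema and `create_dense_caption` function; the sample values and the `flatten_result` helper are hypothetical.

```python
import json

# Hypothetical sample shaped like the "results" list of the algorithm's output.
SAMPLE_OUTPUT = """
{
  "results": [
    {
      "img_name": "doorbellDay.jpg",
      "scores": [0.91, 0.45],
      "captions": ["a person at the door", "a green bush"],
      "boxes": [[331.0, 103.0, 230.0, 292.0], [10.0, 20.0, 50.0, 60.0]]
    }
  ]
}
"""


def flatten_result(result):
    """Turn one result entry into a list of per-caption dicts."""
    records = []
    # scores, captions and boxes are parallel lists: index i describes caption i.
    for score, caption, box in zip(result["scores"], result["captions"], result["boxes"]):
        x, y, w, h = box
        records.append({
            "image": result["img_name"],
            "caption": caption,
            "score": score,
            "box": {"x": x, "y": y, "w": w, "h": h},
        })
    return records


output = json.loads(SAMPLE_OUTPUT)
records = flatten_result(output["results"][0])
for record in records:
    print(record["caption"], record["score"], record["box"])
```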

The front-end abstracts away the intricate processes involved in running this algorithm and presents the results in a more digestible format.

### How does it work?

The front-end is built with React.js, and Material UI is used to style the components. The user submits an image file, which is sent to Azure cloud storage and retrieved by the algorithm in the back-end. The algorithm runs and uploads its output to the back-end database, where all the information is stored. The front-end then retrieves the data from the database and displays the 4 captions with the highest scores. Each caption is displayed along with its score and the coordinates of its bounding box.
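
The selection of the 4 highest-scoring captions can be sketched as follows. This is a minimal Python illustration, not the React code the front-end actually uses; the `caption` and `score` keys follow the back-end `DenseCaptionChild` schema, while `top_captions` and the sample data are hypothetical.

```python
def top_captions(children, count=4):
    """Return the `count` highest-scoring caption records, best first."""
    return sorted(children, key=lambda child: child["score"], reverse=True)[:count]


# Hypothetical caption records as they might come back from the database.
children = [
    {"caption": "a white door", "score": 0.62},
    {"caption": "a person standing", "score": 0.93},
    {"caption": "a green bush", "score": 0.41},
    {"caption": "a brick wall", "score": 0.77},
    {"caption": "a paved path", "score": 0.55},
    {"caption": "a blue sky", "score": 0.30},
]

top = top_captions(children)
for child in top:
    print(child["caption"], child["score"])
```

Making `count` a parameter is also one way to approach the adjustable caption count mentioned under Future Plans.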

Information on how the back-end and the algorithm work can be found on their own manual pages.

### Setup

- Download Docker and follow the setup guide in the README of the UMass-Rescue/596-S22-Backend repository
- Download and install node, then install yarn through node
- In the dense_frontend subdirectory of that repository, run "yarn start" in the terminal
- The web app will be served at localhost:3000

## What is Implemented vs "In-Production"

### Currently Implemented

- Image submission to the front-end
- Image submission to Azure and retrieval of the image from Azure for analysis (separate from the front-end)
- Front-end retrieval of results from the database
- Results tab that displays all the output information
- Back-end route for obtaining the results for a given image name

### In-Production

- Linking the front-end to the Azure functionality
- Containerizing the algorithm so that everything works within the same directory
- Adding an image search function so the user can type in the name of any image stored in the database and have its results displayed

## Future Plans

A potential feature would be to display the given image cropped at the bounding box of each caption. This would be more user-friendly than displaying only the bounding box coordinates for each caption. Another potential feature is letting the user increase or decrease the number of captions displayed. Currently only 4 captions are displayed, and showing more or fewer requires changing the code manually.
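
The cropping idea could look roughly like the sketch below. It assumes each box is (x, y, w, h) in pixels with (x, y) as the top-left corner, which is how bounding_boxes.py draws its rectangle, and it represents an image as a plain list of pixel rows so the example needs no extra libraries. `crop_bounds` and `crop` are hypothetical helpers.

```python
def crop_bounds(box, image_width, image_height):
    """Convert an (x, y, w, h) box to integer (left, top, right, bottom) bounds.

    Bounds are clamped so they never fall outside the image.
    """
    x, y, w, h = box
    left = max(0, int(round(x)))
    top = max(0, int(round(y)))
    right = min(image_width, int(round(x + w)))
    bottom = min(image_height, int(round(y + h)))
    return left, top, right, bottom


def crop(image_rows, box):
    """Return the part of `image_rows` (a list of pixel rows) inside `box`."""
    height = len(image_rows)
    width = len(image_rows[0]) if height else 0
    left, top, right, bottom = crop_bounds(box, width, height)
    return [row[left:right] for row in image_rows[top:bottom]]


# A tiny 4-row by 5-column "image" whose pixel values encode row and column.
image = [[row * 10 + col for col in range(5)] for row in range(4)]
patch = crop(image, (1.0, 1.0, 2.0, 2.0))
print(patch)
```

With an image loaded by OpenCV (a NumPy array), the same bounds would be applied as `img[top:bottom, left:right]`.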
2 changes: 2 additions & 0 deletions Dockerfile
@@ -6,4 +6,6 @@ COPY requirements.txt requirements.txt

RUN pip install --no-cache-dir --upgrade -r /rescue/requirements.txt

RUN pip install python-multipart

COPY ./app /rescue/app
44 changes: 44 additions & 0 deletions alembic/versions/6cd5ad04ca7a_densecaptioning.py
@@ -0,0 +1,44 @@
"""DenseCaptioning

Revision ID: 6cd5ad04ca7a
Revises: bcdeb6b47060
Create Date: 2022-03-09 12:07:45.244683

"""
from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = '6cd5ad04ca7a'
down_revision = 'bcdeb6b47060'
branch_labels = None
depends_on = None


def upgrade():
    op.create_table(
        "denseCaptionParent",
        sa.Column("id", sa.Integer, primary_key=True, index=True),
        sa.Column("imageName", sa.String, unique=True, index=True),
    )

    op.create_table(
        "denseCaptionChild",
        sa.Column("id", sa.Integer, primary_key=True, index=True),
        sa.Column("caption", sa.String, index=True),
        sa.Column("score", sa.Float),
        sa.Column("bounding_x", sa.Float),
        sa.Column("bounding_y", sa.Float),
        sa.Column("bounding_w", sa.Float),
        sa.Column("bounding_h", sa.Float),
        sa.Column("parent_id", sa.Integer),

        sa.ForeignKeyConstraint(('parent_id',), ['denseCaptionParent.id'], ),
    )


def downgrade():
    # Drop the child table first: it holds the foreign key to the parent table.
    op.drop_table("denseCaptionChild")
    op.drop_table("denseCaptionParent")

39 changes: 38 additions & 1 deletion app/crud.py
@@ -51,4 +51,41 @@ def create_message(message: schemas.MessageCreate, db: Session):
    db.add(db_message)
    db.commit()
    db.refresh(db_message)
    return db_message

def create_dense_caption(data: schemas.DenseCaptionCreate, db: Session):
    for entry in data.results:
        # One parent row per image.
        db_dense_caption_parent = models.DenseCaptionParent(imageName=entry.img_name)
        db.add(db_dense_caption_parent)
        db.commit()
        db.refresh(db_dense_caption_parent)
        # One child row per caption; scores, captions and boxes are parallel lists.
        for i in range(len(entry.scores)):
            db_dense_caption_child = models.DenseCaptionChild(
                score=entry.scores[i],
                caption=entry.captions[i],
                bounding_x=entry.boxes[i][0],
                bounding_y=entry.boxes[i][1],
                bounding_w=entry.boxes[i][2],
                bounding_h=entry.boxes[i][3],
                parent_id=db_dense_caption_parent.id,
            )
            db.add(db_dense_caption_child)
            db.commit()
            db.refresh(db_dense_caption_child)
    # Returns the parent row of the last image processed.
    return db_dense_caption_parent

def get_children(parent_id: int, db: Session, skip: int = 0, limit: int = 100):
    data = (
        db.query(models.DenseCaptionChild)
        .filter(models.DenseCaptionChild.parent_id == parent_id)
        .offset(skip)
        .limit(limit)
        .all()
    )
    final = {}
    final['children'] = data
    return final

def get_parents(image_name: str, db: Session, skip: int = 0, limit: int = 5):
    data = (
        db.query(models.DenseCaptionParent)
        .filter(models.DenseCaptionParent.imageName == image_name)
        .offset(skip)
        .limit(limit)
        .all()
    )
    final = {}
    final['parents'] = data
    return final

def get_images_keyword(keyword: str, db: Session, skip: int = 0, limit: int = 100):
    data = (
        db.query(models.DenseCaptionChild)
        .filter(models.DenseCaptionChild.caption.contains(keyword))
        .offset(skip)
        .limit(limit)
        .all()
    )
    final_data = []
    for child in data:
        # Look up the parent by primary key. skip/limit must not be reused here:
        # a non-zero skip would return no row and the lookup would fail.
        parent = (
            db.query(models.DenseCaptionParent)
            .filter(models.DenseCaptionParent.id == child.parent_id)
            .first()
        )
        if parent is not None:
            final_data.append(parent.imageName)
            #final_data.append(child.parent_id)
    # Remove duplicate image names while preserving their order.
    real_final = list(dict.fromkeys(final_data))
    final = {}
    final['temp'] = real_final
    return final
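
`get_images_keyword` issues one parent query per matching caption and then removes duplicates in Python. A possible alternative is to let the database do both in one statement. The sketch below shows the idea with an `EXISTS` subquery, using the standard-library `sqlite3` module rather than the project's SQLAlchemy session so that it runs on its own. The table and column names mirror a subset of the migration; the sample rows and `images_for_keyword` are hypothetical.

```python
import sqlite3


def images_for_keyword(conn, keyword):
    """Return distinct image names that have at least one caption containing `keyword`."""
    rows = conn.execute(
        """
        SELECT p.imageName
        FROM denseCaptionParent AS p
        WHERE EXISTS (
            SELECT 1
            FROM denseCaptionChild AS c
            WHERE c.parent_id = p.id
              AND c.caption LIKE '%' || ? || '%'
        )
        ORDER BY p.id
        """,
        (keyword,),
    ).fetchall()
    return [row[0] for row in rows]


# In-memory database with a reduced version of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE denseCaptionParent (id INTEGER PRIMARY KEY, imageName TEXT UNIQUE);
    CREATE TABLE denseCaptionChild (
        id INTEGER PRIMARY KEY,
        caption TEXT,
        score REAL,
        parent_id INTEGER REFERENCES denseCaptionParent(id)
    );
    """
)
conn.executemany(
    "INSERT INTO denseCaptionParent (id, imageName) VALUES (?, ?)",
    [(1, "a.jpg"), (2, "b.jpg"), (3, "c.jpg")],
)
conn.executemany(
    "INSERT INTO denseCaptionChild (id, caption, score, parent_id) VALUES (?, ?, ?, ?)",
    [
        (1, "a dog on grass", 0.9, 1),
        (2, "a brown dog", 0.8, 1),
        (3, "a red car", 0.7, 2),
        (4, "dog near a door", 0.6, 3),
    ],
)

print(images_for_keyword(conn, "dog"))
```

Because each parent row appears at most once in the outer query, no separate de-duplication step is needed.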
43 changes: 40 additions & 3 deletions app/main.py
@@ -1,14 +1,25 @@
from email import message
from typing import List, Dict
import shutil

from fastapi.middleware.cors import CORSMiddleware
from fastapi import Depends, FastAPI, HTTPException, File, UploadFile
from sqlalchemy.orm import Session

from . import crud, models, schemas
from .database import SessionLocal, engine

app = FastAPI()

origins = ["*"]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
# Dependency
def get_db():
    db = SessionLocal()
@@ -70,4 +81,30 @@ def reader_messages_for_user(user_id: int, skip: int = 0, limit: int = 100, db:
# Route - POST - create new message between user and recipient
@app.post("/messages/", response_model=schemas.Message)
def create_message(message: schemas.MessageCreate, db: Session = Depends(get_db)):
    return crud.create_message(db=db, message=message)

# Route - POST - store dense captioning output
# crud.create_dense_caption returns a parent row, so the response model is the parent schema.
@app.post("/denseCaptionCreate/", response_model=schemas.DenseCaptionParent)
def create_dense_caption(data: schemas.DenseCaptionCreate, db: Session = Depends(get_db)):
    return crud.create_dense_caption(data=data, db=db)

# Route - GET - captions stored for one parent (image) id
@app.get("/denseCaptionGet/{parent_id}/child", response_model=Dict[str, List[schemas.DenseCaptionChild]])
def get_children(parent_id: int, skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    children = crud.get_children(parent_id=parent_id, db=db, skip=skip, limit=limit)
    return children

# Route - GET - parent rows matching an image name
@app.get("/denseCaptionGetParents/{image_name}", response_model=Dict[str, List[schemas.DenseCaptionParent]])
def get_parents(image_name: str, skip: int = 0, limit: int = 5, db: Session = Depends(get_db)):
    parents = crud.get_parents(image_name=image_name, db=db, skip=skip, limit=limit)
    return parents

# Route - GET - names of images with a caption containing the keyword
@app.get("/denseCaptionGetimages/{keyword}", response_model=Dict[str, List[str]])
def get_images(keyword: str, skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    images = crud.get_images_keyword(keyword=keyword, db=db, skip=skip, limit=limit)
    return images

# Route - POST - upload an image file
@app.post("/denseCaptionUploadImages", response_model=Dict[str, str])
def image(image: UploadFile = File(...)):
    image_object = image.file
    # Every upload is written to the same fixed path, overwriting the previous one.
    with open("destination.jpg", "wb+") as upload:
        shutil.copyfileobj(image_object, upload)
    return {"filename": image.filename}
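
The upload route above always writes to `destination.jpg`, so each upload replaces the last. One way this might be avoided is to derive a fresh destination path per upload. The helper below is only a sketch under assumptions: `destination_for`, the `uploads` directory and the suffix allow-list are all hypothetical and not part of this PR.

```python
from pathlib import Path
import uuid

# Hypothetical allow-list of image file extensions.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def destination_for(filename, upload_dir="uploads"):
    """Build a unique destination path for an uploaded file.

    Only the extension of the client-supplied name is kept, so path
    components such as "../" in the name cannot affect where the file lands.
    """
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return Path(upload_dir) / f"{uuid.uuid4().hex}{suffix}"


first = destination_for("../../etc/cat.JPG")
second = destination_for("cat.jpg")
print(first, second)
```

The route would then open `destination_for(image.filename)` instead of the fixed name, after making sure the upload directory exists.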
26 changes: 24 additions & 2 deletions app/models.py
@@ -1,5 +1,5 @@
from datetime import datetime
from sqlalchemy import TIMESTAMP, Boolean, Column, ForeignKey, Integer, String, Date, Float
from sqlalchemy.orm import relationship

from .database import Base
@@ -33,4 +33,26 @@ class Message(Base):
    sender_id = Column(Integer, ForeignKey("users.id"))
    recipient_id = Column(Integer, index=True)

    sender = relationship("User", back_populates="messages")

# One row per analyzed image.
class DenseCaptionParent(Base):
    __tablename__ = "denseCaptionParent"

    id = Column(Integer, primary_key=True, index=True)
    imageName = Column(String, index=True)

    children = relationship("DenseCaptionChild", back_populates="parent")

# One row per caption, linked to its image through parent_id.
class DenseCaptionChild(Base):
    __tablename__ = "denseCaptionChild"

    id = Column(Integer, primary_key=True, index=True)
    caption = Column(String, index=True)
    score = Column(Float, index=True)
    bounding_x = Column(Float, index=True)
    bounding_y = Column(Float, index=True)
    bounding_w = Column(Float, index=True)
    bounding_h = Column(Float, index=True)
    parent_id = Column(Integer, ForeignKey("denseCaptionParent.id"))

    parent = relationship("DenseCaptionParent", back_populates="children")
60 changes: 59 additions & 1 deletion app/schemas.py
@@ -1,5 +1,5 @@
from datetime import datetime
from typing import List, Optional, Dict, Union
from pydantic import BaseModel
from sqlalchemy import TIMESTAMP, true

@@ -48,3 +48,61 @@ class Message(BaseModel):

    class Config:
        orm_mode = True

#---------------------------------------------------------


# Options block of the algorithm's JSON output.
class CreateOpt(BaseModel):
    output_dir: str
    num_to_draw: int
    final_nms_thresh: float
    use_cudnn: int
    text_size: int
    max_images: int
    gpu: int
    splits_json: str
    vg_img_root_dir: str
    checkpoint: str
    num_proposals: int
    rpn_nms_thresh: float
    image_size: int
    input_image: str
    input_split: str
    box_width: int
    input_dir: str
    output_vis_dir: str
    output_vis: int


# Results for one image: scores, captions and boxes are parallel lists.
class CreateResult(BaseModel):
    img_name: str
    scores: List[float]
    captions: List[str]
    boxes: List[List[float]]


class DenseCaptionCreate(BaseModel):
    opt: CreateOpt
    results: List[CreateResult]


class DenseCaptionChild(BaseModel):
    id: int
    caption: str
    score: float
    bounding_x: float
    bounding_y: float
    bounding_w: float
    bounding_h: float
    parent_id: int

    class Config:
        orm_mode = True


class DenseCaptionParent(BaseModel):
    id: int
    imageName: str

    class Config:
        orm_mode = True
15 changes: 15 additions & 0 deletions bounding_boxes.py
@@ -0,0 +1,15 @@
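import cv2

# Load the sample image; cv2.imread returns None if the file cannot be read.
img = cv2.imread('imgs/doorbellDay.jpg')
if img is None:
    raise FileNotFoundError('imgs/doorbellDay.jpg')

# Bounding box: top-left corner (x, y), width w and height h, in pixels.
x = 331
y = 103
w = 230
h = 292

# Draw the box outline in green (OpenCV colors are in BGR order).
cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0))

cv2.imshow('image', img)

# Keep the window open until a key is pressed.
cv2.waitKey(0)
cv2.destroyAllWindows()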
23 changes: 23 additions & 0 deletions dense_frontend/.gitignore
@@ -0,0 +1,23 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.

# dependencies
/node_modules
/.pnp
.pnp.js

# testing
/coverage

# production
/build

# misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*