V1.2 crontask #9

Open
wants to merge 54 commits into
base: master
7d2b2ce
changed title
vg-leanix Dec 14, 2020
ea740de
new title
vg-leanix Dec 14, 2020
e4b09fe
updated gitignore
vg-leanix Dec 14, 2020
00ea1b5
cleansing
vg-leanix Dec 19, 2020
8e8548f
monngodb backend + ws
vg-leanix Dec 19, 2020
cbc1381
docker cheatsheet
vg-leanix Dec 19, 2020
bc7fdf1
more stable build
vg-leanix Dec 19, 2020
791db04
bug fix: no update data available
vg-leanix Dec 19, 2020
84ef043
persist taskList to localStorage for smoother UX
vg-leanix Dec 19, 2020
e2dd6c8
enable file sharing between api hub and celery wor
vg-leanix Dec 19, 2020
12c42d5
Merge branch 'master' into v1.1_mongodb
vg-leanix Dec 19, 2020
a01045c
added api call for serving pptx
vg-leanix Dec 20, 2020
97f8c34
optimized build
vg-leanix Dec 20, 2020
24d6e0e
Merge branch 'v1.1_mongodb' of https://github.com/vg-leanix/knowlix i…
vg-leanix Dec 20, 2020
764445e
renamed alert to more succicent naming
vg-leanix Dec 20, 2020
5b9606d
minor bug
vg-leanix Dec 20, 2020
6640ad7
change title to knowlix
vg-leanix Dec 20, 2020
f2a9c8d
plausibilty checks before launching expensive job
vg-leanix Dec 20, 2020
60ac893
notifications for rejections by APIHub
vg-leanix Dec 20, 2020
5ab4cf0
architechture maps
vg-leanix Dec 20, 2020
949cb43
cutting fluff
vg-leanix Dec 21, 2020
fd229fb
inserted new architecture pic
vg-leanix Dec 21, 2020
b13f804
bugfix on README
vg-leanix Dec 21, 2020
c555d0f
headline to readme
vg-leanix Dec 21, 2020
b600667
new architecture map
vg-leanix Dec 21, 2020
44211c8
update
vg-leanix Dec 21, 2020
fa51ccc
first commit on branch
vg-leanix Dec 21, 2020
d4c5d08
push
vg-leanix Dec 21, 2020
20099b8
cleaned setup
vg-leanix Dec 21, 2020
12b15c4
first stage of new celery beat workflow
vg-leanix Dec 21, 2020
8a0d015
error handling for various mongo events
vg-leanix Dec 22, 2020
181bda9
created crontask to delete downloaded files
vg-leanix Dec 22, 2020
ac9f870
smaller python image
vg-leanix Dec 22, 2020
8d02157
basic logging for task
vg-leanix Dec 22, 2020
128f240
added celery beat cronjob worker
vg-leanix Dec 22, 2020
80ecbf3
changed worker naming convention
vg-leanix Dec 22, 2020
c95bf02
fixed cronscheduler
vg-leanix Dec 23, 2020
1dbd93c
built api call to register download status
vg-leanix Dec 23, 2020
3748284
documentation of in code api calls
vg-leanix Dec 23, 2020
5fc7154
track status started
vg-leanix Dec 23, 2020
681b30b
see last commit
vg-leanix Dec 23, 2020
6346a1f
corrected db link
vg-leanix Dec 23, 2020
119e15a
filename wrong
vg-leanix Dec 23, 2020
1a2cc1b
smaller build image
vg-leanix Dec 23, 2020
daad944
bugfix
vg-leanix Dec 23, 2020
b8e31be
fixed deletion error
vg-leanix Dec 23, 2020
709f3d5
consistent use of MDB as persistent layer
vg-leanix Dec 23, 2020
92b15aa
no localstorage needed anymore
vg-leanix Dec 23, 2020
c6b7b40
finished work on DB sync
vg-leanix Dec 23, 2020
15d925d
used localStorage to prevent exessive use of api
vg-leanix Dec 24, 2020
0fc2cc9
changed build name
vg-leanix Dec 24, 2020
5ada050
more stable build
vg-leanix Dec 24, 2020
4d67c62
donwload section finish
vg-leanix Dec 24, 2020
c983731
cleaning build files
vg-leanix Dec 27, 2020
4 changes: 3 additions & 1 deletion .gitignore
@@ -10,4 +10,6 @@ data.json
*.pptx
/backend/output
*.pyc
*.log
*.log
mongodb/
db-backup/
5 changes: 5 additions & 0 deletions README.md
@@ -1,5 +1,10 @@

# knowlix


Microservice to automatically create onboarding slides

![LIX Builder](https://github.com/vg-leanix/pptx-tool/blob/main/Thumbnail.png)

## Architecture
![Architecture](https://github.com/vg-leanix/knowlix/blob/v1.1_mongodb/knowlix%20architecture.png)
1 change: 1 addition & 0 deletions backend/.gitignore
@@ -0,0 +1 @@
env/
5 changes: 3 additions & 2 deletions backend/Dockerfile
@@ -1,11 +1,12 @@
FROM python:3.8.1
FROM python:3.9.1-slim-buster

WORKDIR /usr/app


COPY req.txt ./
COPY api.py core.py main.py master.pptx req.txt server.py ./

# RUN mkdir output
RUN mkdir output

RUN pip install --upgrade pip
RUN pip install -r req.txt --no-cache-dir
164 changes: 131 additions & 33 deletions backend/api.py
@@ -1,39 +1,46 @@
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
import pymongo
from fastapi.responses import FileResponse
from pydantic import BaseModel
from typing import List
from main import create_pptx, get_sections
import os
from datetime import datetime
from pptx import Presentation
from server import celery
import json
import uuid
from datetime import datetime
from pymongo import MongoClient


## CONFIG ##
file_path = "master.pptx"
pres= Presentation(file_path)
pres = Presentation(file_path)
MONGODB = os.getenv("MONGODB")
client = MongoClient(MONGODB)
db = client["taskdb"]["ta"]

tags_metadata= [
tags_metadata = [
{
"name": "powerpoint",
"description": "handling powerpoint"
"name": "powerpoint",
"description": "handling powerpoint"
},
{
"name": "job management",
"description": "managing celery tasks"
"name": "job management",
"description": "managing celery tasks"
},

]

app = FastAPI(
title= "SurfBoard",
description= "API Hub for the LeanIX Onboarding Deck",
version= "1.0.0",
title="Knowlix",
description="API Hub for the LeanIX Onboarding Deck",
version="1.0.0",
openapi_tags=tags_metadata)



app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
@@ -43,42 +50,133 @@
expose_headers=[]
)


class PPTX(BaseModel):
sections: List[str]


@app.get("/v1/sections", tags = ["powerpoint"])


class Download(BaseModel):
taskID: str


## API ENDPOINTS ##
@app.get("/v1/sections", tags=["powerpoint"])
async def provide_sections():

sections = get_sections(pres)

if (not sections) or (len(sections)==0):
raise HTTPException(status_code=404, detail="No Sections in Master pptx")
if (not sections) or (len(sections) == 0):
raise HTTPException(
status_code=404, detail="No Sections in Master pptx")

return JSONResponse(sections,status_code=200)
return JSONResponse(sections, status_code=200)


@app.post("/v1/pptxjob", tags = ["job management"])
async def deliver_pptx(pptx: PPTX):
@app.post("/v1/pptxjob", tags=["job management"])
async def trigger_pptx_task(pptx: PPTX):
task_name = "pptx"
sections = pptx.sections
kwargs ={
'sections':sections,
'downloadStatus': 'ready'
}

no_sections = len(sections)
sections_available = True
exists_already = False
status = None
custom_id = str(uuid.uuid4().hex)
timestamp = datetime.now().isoformat()

kwargs = {
'sections': sections,
'customID': custom_id,
'downloaded': False,
'date_started': timestamp

}

if no_sections != 0:
exists_already = check_existence(sections, db)
else:
sections_available = False

if not exists_already and sections_available:
task = celery.send_task(task_name, kwargs=kwargs, serializer='json')

if sections_available and not exists_already:
status = "success"

elif not sections_available:
status = "no_sections"

task = celery.send_task(task_name, kwargs = kwargs, serializer='json')
elif exists_already:
status = "pptx_exists"

package = {
'taskID': task.id,
'sections': sections
'taskID': custom_id,
'sections': sections,
'status': status
}


return JSONResponse(package)






@app.post("/v1/download", tags=["powerpoint"])
async def download_pptx(download: Download):

task_id = download.taskID

result = db.find_one({"kwargs.customID": task_id}, {'result': 1, '_id': 0})
unpack = result["result"]
unpack = json.loads(unpack)
file_path = unpack["filePath"]

# return file_path
return FileResponse(file_path)


@app.post("/v1/registerDownload", tags=["powerpoint"], status_code=201)
async def register_download(task_id: Download):
task_id = task_id.taskID

res = db.update_one({"kwargs.customID": task_id},
{"$set": {"kwargs.downloaded": True}
})

changed_docs = res.modified_count

return {'changedDocuments': changed_docs}


@app.get("/v1/getDownloads", tags=["powerpoint"])
async def getDownloads():
res = db.find({}).sort(
[("kwargs.date_started", pymongo.DESCENDING)]).limit(10)
results = list()

for item in res:
taskID = item["kwargs"]["customID"]
date_started = item["kwargs"]["date_started"]
status = item["status"]
sections = item["kwargs"]["sections"]

package = {
'taskID': taskID,
'date_started': date_started,
'status': status,
'sections': sections
}
results.append(package)

return JSONResponse(results, status_code=200)


### UTILS ###

def check_existence(sections, db):
exists_already = False
no_sections = len(sections)
query = {"kwargs.sections": {"$size": no_sections, "$all": sections}}

hits = db.count_documents(query)

if hits > 0:
exists_already = True

return exists_already
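
The `check_existence` helper above combines `$size` and `$all` so that a stored section list counts as a duplicate when it has the same length and contains every requested section, regardless of order. The following is a minimal in-memory sketch of that matching rule, not the PR's implementation: `matches_size_and_all`, `check_existence_in_memory`, `normalise_sections`, and the sample documents are hypothetical names and data introduced for illustration, and no MongoDB connection is involved. It also shows one edge case worth a look: a request with repeated section names can match a stored list that differs from it.

```python
def matches_size_and_all(stored, wanted):
    """Emulate {"$size": len(wanted), "$all": wanted} for one stored list."""
    if not wanted:
        # Assumption: the endpoint never queries with an empty list,
        # so this sketch simply treats that case as "no match".
        return False
    return len(stored) == len(wanted) and all(item in stored for item in wanted)


def check_existence_in_memory(sections, documents):
    """Return True if any document's kwargs.sections matches the request."""
    return any(
        matches_size_and_all(doc["kwargs"]["sections"], sections)
        for doc in documents
    )


def normalise_sections(sections):
    """Drop repeated section names while keeping the original order."""
    return list(dict.fromkeys(sections))


# Hypothetical task documents shaped like the ones the API reads.
documents = [{"kwargs": {"sections": ["Intro", "Pricing"]}}]

same_set_other_order = check_existence_in_memory(["Pricing", "Intro"], documents)
subset_only = check_existence_in_memory(["Intro"], documents)
duplicate_request = check_existence_in_memory(["Intro", "Intro"], documents)
deduplicated = check_existence_in_memory(
    normalise_sections(["Intro", "Intro"]), documents
)
```

If repeated section names are not meaningful to the deck builder, de-duplicating the request before calling `check_existence` (as `normalise_sections` does here) would avoid reporting `pptx_exists` for a deck that was never built.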
7 changes: 0 additions & 7 deletions backend/clean_output.py

This file was deleted.

95 changes: 48 additions & 47 deletions backend/core.py
@@ -2,87 +2,88 @@
import uuid
import lxml.etree as etree


def extract_slide_mapping(slidelist):
"""this method will get the mapping between a slide_id and rID"""

slide_mapping=dict()
slide_mapping = dict()

for slide in slidelist:
rid=slide.attrib['{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id']
slide_id=slide.attrib['id']
slide_mapping[slide_id]=rid
rid = slide.attrib['{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id']
slide_id = slide.attrib['id']
slide_mapping[slide_id] = rid

return slide_mapping

def prepare_sections(keys, presentation, mapping,all_sections=False):

def prepare_sections(keys, presentation, mapping, all_sections=False):
"""this method will prepare a xml tree based on the passed section names the user wants to have in
the pptx"""

nmap=presentation.slides._sldIdLst.nsmap

all_sections=compile_sections(presentation,mapping)
root=etree.Element('{http://schemas.openxmlformats.org/presentationml/2006/main}sldIdLst', nsmap=nmap)

#TODO: create toggle for
if (all_sections) and (len(keys)!=0):
nmap = presentation.slides._sldIdLst.nsmap

all_sections = compile_sections(presentation, mapping)
root = etree.Element(
'{http://schemas.openxmlformats.org/presentationml/2006/main}sldIdLst', nsmap=nmap)

# TODO: create toggle for
if (all_sections) and (len(keys) != 0):
for key in keys:
section=all_sections[key]
section = all_sections[key]

for slide in section:
etree.SubElement(root, '{http://schemas.openxmlformats.org/presentationml/2006/main}sldId',attrib=slide,nsmap=nmap)

etree.SubElement(
root, '{http://schemas.openxmlformats.org/presentationml/2006/main}sldId', attrib=slide, nsmap=nmap)

return root


def compile_sections(presentation, mapping):
"""this method will get all the sections that are in the pptx"""
ns='{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id'
xml=etree.fromstring(presentation.part.blob)
nsmap = {'p14':'http://schemas.microsoft.com/office/powerpoint/2010/main'}

ns = '{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id'
xml = etree.fromstring(presentation.part.blob)
nsmap = {'p14': 'http://schemas.microsoft.com/office/powerpoint/2010/main'}
sections = xml.xpath('.//p14:sectionLst', namespaces=nsmap)[0]

collector=dict()
pairs_col=list()


collector = dict()
pairs_col = list()

for section in sections:
key=section.attrib['name']
key = section.attrib['name']

for slidelist in section:
for slide in slidelist:
pairs=dict()
slide_id=slide.attrib['id']
pairs = dict()
slide_id = slide.attrib['id']

# lookup in slide mapping to get rID
rID = mapping[slide_id]

#lookup in slide mapping to get rID
rID=mapping[slide_id]

pairs['id']=slide_id
pairs[ns]=rID
pairs['id'] = slide_id
pairs[ns] = rID

pairs_col.append(pairs)

collector[key] = pairs_col
pairs_col = list()

collector[key]=pairs_col
pairs_col=list()

return collector

def replace_slides(new_xml,presentation,folder, save=False):


def replace_slides(new_xml, presentation, folder, save=False):
"""This method will take a xml tree and create the final pptx out of it"""
uid=str(uuid.uuid4().hex)[:10]
file_path= f"{folder}/{uid}.pptx"
slidelist=presentation.slides._sldIdLst
uid = str(uuid.uuid4().hex)[:10]
file_path = f"{folder}/{uid}.pptx"
slidelist = presentation.slides._sldIdLst

slidelist.getparent().replace(slidelist, new_xml)


slidelist.getparent().replace(slidelist,new_xml)

if save:
presentation.save(file_path)

return file_path


def print_xml(xml):
print(etree.tostring(xml, pretty_print=True, encoding="unicode"))
print(etree.tostring(xml, pretty_print=True, encoding="unicode"))
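
The three helpers in `core.py` work together: `extract_slide_mapping` maps slide ids to relationship ids, `compile_sections` groups slides by section name, and `prepare_sections` builds a new `sldIdLst` holding only the requested sections. The sketch below walks through the same flow on a tiny hand-written XML fragment. It is an illustration under stated assumptions rather than the PR's code: it uses the standard library's `xml.etree.ElementTree` instead of `lxml` and `python-pptx`, the XML is a heavily trimmed stand-in for a real `presentation.xml`, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
P14_NS = "http://schemas.microsoft.com/office/powerpoint/2010/main"

# Hypothetical, trimmed stand-in for presentation.xml (ids are made up).
SAMPLE_XML = """
<p:presentation
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:p14="http://schemas.microsoft.com/office/powerpoint/2010/main">
  <p:sldIdLst>
    <p:sldId id="256" r:id="rId2"/>
    <p:sldId id="257" r:id="rId3"/>
    <p:sldId id="258" r:id="rId4"/>
  </p:sldIdLst>
  <p:extLst>
    <p:ext>
      <p14:sectionLst>
        <p14:section name="Intro" id="s1">
          <p14:sldIdLst><p14:sldId id="256"/></p14:sldIdLst>
        </p14:section>
        <p14:section name="Outro" id="s2">
          <p14:sldIdLst><p14:sldId id="257"/><p14:sldId id="258"/></p14:sldIdLst>
        </p14:section>
      </p14:sectionLst>
    </p:ext>
  </p:extLst>
</p:presentation>
"""


def slide_mapping(root):
    """Map each slide id to its relationship id (rId)."""
    return {
        sld.attrib["id"]: sld.attrib[f"{{{R_NS}}}id"]
        for sld in root.findall(f"{{{P_NS}}}sldIdLst/{{{P_NS}}}sldId")
    }


def sections_by_name(root, mapping):
    """Group slides by section name, resolving each slide's rId."""
    sections = {}
    for section in root.iter(f"{{{P14_NS}}}section"):
        slides = []
        for sld in section.iter(f"{{{P14_NS}}}sldId"):
            slide_id = sld.attrib["id"]
            slides.append({"id": slide_id, "rId": mapping[slide_id]})
        sections[section.attrib["name"]] = slides
    return sections


def build_slide_list(section_names, sections):
    """Build a new sldIdLst containing only the requested sections."""
    new_list = ET.Element(f"{{{P_NS}}}sldIdLst")
    for name in section_names:
        for slide in sections[name]:
            ET.SubElement(
                new_list,
                f"{{{P_NS}}}sldId",
                {"id": slide["id"], f"{{{R_NS}}}id": slide["rId"]},
            )
    return new_list


root = ET.fromstring(SAMPLE_XML)
sections = sections_by_name(root, slide_mapping(root))
selected = build_slide_list(["Outro", "Intro"], sections)
selected_ids = [sld.attrib["id"] for sld in selected]
```

The order of `section_names` decides the slide order in the new list, which mirrors how `prepare_sections` iterates over `keys`. A section name that is not in the deck raises `KeyError` in this sketch, so a caller would likely want to validate the requested names first.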