OOPreview service
kam193 committed Oct 10, 2024
1 parent c04976e commit 6f7d06b
Showing 11 changed files with 230 additions and 0 deletions.
6 changes: 6 additions & 0 deletions README.md
@@ -56,6 +56,12 @@ Supported heuristics:

- newly created domains (based on WHOIS data).

### OOPreview

Simple service that uses [OnlyOffice Document Builder](https://api.onlyoffice.com/docbuilder/basic)
to generate document previews with high compatibility with Microsoft Office formats. It supports
generating a preview of the first page or of all pages.

### PCAP Extractor

This service lists TCP flows from a pcap file using Tshark. If supported by Tshark, it can also extract files.
4 changes: 4 additions & 0 deletions oo-preview/.dockerignore
@@ -0,0 +1,4 @@
.env
.randomnotes/
.git/
*.pyc
42 changes: 42 additions & 0 deletions oo-preview/Dockerfile
@@ -0,0 +1,42 @@
ARG REGISTRY=
ARG MANIFEST_REGISTRY=ghcr.io/
ARG BASE_IMAGE=cccs/assemblyline-v4-service-base:stable
FROM ${BASE_IMAGE}

ENV SERVICE_PATH=service.al_run.AssemblylineService

USER root
RUN --mount=type=secret,id=apt,target=/etc/apt/apt.conf.d/00cfg \
    echo "deb http://deb.debian.org/debian bookworm contrib non-free" > /etc/apt/sources.list.d/contrib.list \
    && apt update && apt upgrade -y \
    && apt install -y libstdc++6 \
        libcurl4-gnutls-dev \
        libc6 \
        libxml2 \
        libcurl4 \
        fonts-dejavu \
        fonts-opensymbol \
        fonts-liberation \
        ttf-mscorefonts-installer \
        fonts-crosextra-carlito

ARG OO_VERSION=v8.1.0
RUN wget https://github.com/ONLYOFFICE/DocumentBuilder/releases/download/${OO_VERSION}/onlyoffice-documentbuilder_amd64.deb \
    && apt-get install -y ./onlyoffice-documentbuilder_amd64.deb

USER assemblyline
COPY requirements.txt requirements.txt

RUN --mount=type=secret,id=pypi,target=/etc/pip.conf,mode=0444 \
    pip install --no-cache-dir --user --requirement requirements.txt && rm -rf ~/.cache/pip

WORKDIR /opt/al_service
COPY . .

USER root
ARG BASE_TAG=4.5.0.stable
# An ARG declared before FROM is not visible inside the build stage unless it is redeclared.
ARG MANIFEST_REGISTRY
RUN sed -i "s|\(image: \${REGISTRY}\).*\(kam193/.*\)|\1$MANIFEST_REGISTRY\2|g" service_manifest.yml && \
    sed -i "s/\$SERVICE_TAG/$BASE_TAG$(cat VERSION)/g" service_manifest.yml && \
    python /opt/al_service/service/finish_installation.py

USER assemblyline
4 changes: 4 additions & 0 deletions oo-preview/Makefile
@@ -0,0 +1,4 @@
include ../common.mk

AL_SERVICE_NAME=OOPreview
# SERVICE_NAME=assemblyline-service-template
9 changes: 9 additions & 0 deletions oo-preview/README.md
@@ -0,0 +1,9 @@
# OOPreview

Simple service that uses [OnlyOffice Document Builder](https://api.onlyoffice.com/docbuilder/basic)
to generate document previews with high compatibility with Microsoft Office formats. It supports
generating a preview of the first page or of all pages.

Theoretically supported formats: https://api.onlyoffice.com/editors/conversionapi#text-matrix (not all of them are recognized by Assemblyline).

The built service image includes the OnlyOffice binaries, licensed under the AGPL (see the [OnlyOffice license](https://github.com/ONLYOFFICE/DocumentBuilder/blob/master/LICENSE.txt)), and non-free Microsoft fonts installed by [ttf-mscorefonts-installer](https://packages.debian.org/bookworm/ttf-mscorefonts-installer).
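
The preview is driven by a small XML options string that the service passes to Document Builder's `SaveFile` call. As a rough illustration, the string could be assembled as in the sketch below. `thumbnail_options` and its parameter names are hypothetical and are not part of the service or of the OnlyOffice API; the element names and default values simply mirror the ones used in `service/al_run.py`, where format `4` is paired with a `.png` output.

```python
def thumbnail_options(first_only: bool = True, size: int = 1500, image_format: int = 4) -> str:
    """Build the <m_oThumbnail> options string (hypothetical helper, for illustration only)."""
    # The service writes the flag as a lowercase "true"/"false" string.
    first = "true" if first_only else "false"
    return (
        "<m_oThumbnail>"
        f"<format>{image_format}</format>"
        "<aspect>1</aspect>"
        f"<first>{first}</first>"
        f"<width>{size}</width><height>{size}</height>"
        "</m_oThumbnail>"
    )


if __name__ == "__main__":
    # First page only, as in the default service configuration.
    print(thumbnail_options())
    # All pages, as requested by the preview_all_pages submission parameter.
    print(thumbnail_options(first_only=False))
```

Keeping the string in one place would let the service and the installation script differ only in the size they request.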
1 change: 1 addition & 0 deletions oo-preview/VERSION
@@ -0,0 +1 @@
1
1 change: 1 addition & 0 deletions oo-preview/requirements.txt
@@ -0,0 +1 @@
assemblyline-v4-service
Empty file added oo-preview/service/__init__.py
Empty file.
73 changes: 73 additions & 0 deletions oo-preview/service/al_run.py
@@ -0,0 +1,73 @@
import os
import sys
import zipfile

from assemblyline.common.identify_defaults import type_to_extension
from assemblyline_v4_service.common.base import ServiceBase
from assemblyline_v4_service.common.request import ServiceRequest
from assemblyline_v4_service.common.result import Result, ResultImageSection

# The docbuilder module is loaded from the Document Builder installation directory.
sys.path.append("/opt/onlyoffice/documentbuilder/")

import docbuilder  # noqa: I001


class AssemblylineService(ServiceBase):
    def __init__(self, config=None):
        super().__init__(config)

    def start(self):
        self.log.info(f"start() from {self.service_attributes.name} service called")

        self.log.info(f"{self.service_attributes.name} service started")

    def execute(self, request: ServiceRequest) -> None:
        result = Result()
        request.result = result

        # Derive a file extension from the Assemblyline file type.
        try:
            ext = type_to_extension[request.file_type]
        except KeyError:
            # Fall back to the last component of the type, e.g. "code/html" -> ".html".
            ext = "." + request.file_type.split("/")[-1]

        # Expose the submitted file under a name that carries the extension.
        input_path = f"/tmp/input{ext}"
        # Remove a link left by a previous request; os.symlink() raises FileExistsError otherwise.
        if os.path.lexists(input_path):
            os.remove(input_path)
        os.symlink(request.file_path, input_path)

        # A single page is saved as one PNG; all pages are saved as a ZIP of images.
        all_pages = bool(request.get_param("preview_all_pages"))
        output_file = "images.zip" if all_pages else "image.png"
        only_first = "false" if all_pages else "true"
        output_path = os.path.join(self.working_directory, output_file)

        builder = docbuilder.CDocBuilder()
        builder.OpenFile(input_path, "")
        builder.SaveFile(
            "image",
            output_path,
            f"<m_oThumbnail><format>4</format><aspect>1</aspect><first>{only_first}</first><width>1500</width><height>1500</height></m_oThumbnail>",
        )
        builder.CloseFile()

        request.set_service_context(f"OnlyOffice {builder.GetVersion().decode()}")
        section = ResultImageSection(request, "Document preview")
        if not all_pages:
            section.add_image(output_path, "First page", "Preview of the first page")
        else:
            with zipfile.ZipFile(output_path, "r") as zip_ref:
                total_pages = len(zip_ref.filelist)
                for page, file in enumerate(zip_ref.filelist, start=1):
                    zip_ref.extract(file, self.working_directory)
                    section.add_image(
                        os.path.join(self.working_directory, file.filename),
                        file.filename,
                        f"Preview of page {page} of {total_pages}",
                    )

        section.promote_as_screenshot()
        result.add_section(section)
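
The all-pages branch of `execute()` can be tried out without Assemblyline or Document Builder. The sketch below is only a stand-in under stated assumptions: `extract_pages` is a hypothetical helper, and the archive is built by hand from placeholder bytes rather than produced by `SaveFile`. It shows the same extract-and-caption loop in isolation.

```python
import os
import tempfile
import zipfile


def extract_pages(zip_path: str, dest_dir: str) -> list[tuple[str, str]]:
    """Extract every archive member and return (path, caption) pairs in archive order."""
    pages = []
    with zipfile.ZipFile(zip_path, "r") as archive:
        members = archive.infolist()
        total = len(members)
        for number, member in enumerate(members, start=1):
            # extract() returns the path of the file it created under dest_dir.
            extracted = archive.extract(member, dest_dir)
            pages.append((extracted, f"Preview of page {number} of {total}"))
    return pages


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hand-made archive with placeholder content; the member names are invented.
        archive_path = os.path.join(workdir, "images.zip")
        with zipfile.ZipFile(archive_path, "w") as archive:
            archive.writestr("page1.png", b"placeholder")
            archive.writestr("page2.png", b"placeholder")

        for path, caption in extract_pages(archive_path, workdir):
            print(os.path.basename(path), "-", caption)
```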
19 changes: 19 additions & 0 deletions oo-preview/service/finish_installation.py
@@ -0,0 +1,19 @@
# For some reason, the DocumentBuilder is not ready to use
# by the assemblyline user until the first usage by root.

import os
import sys

sys.path.append("/opt/onlyoffice/documentbuilder/")

import docbuilder

builder = docbuilder.CDocBuilder()
builder.OpenFile("/opt/onlyoffice/documentbuilder/empty/new.docx", "")
builder.SaveFile(
    "image",
    "/tmp/thumbnail.png",
    "<m_oThumbnail><format>4</format><aspect>1</aspect><first>true</first><width>1024</width><height>1024</height></m_oThumbnail>",
)
builder.CloseFile()
os.remove("/tmp/thumbnail.png")
71 changes: 71 additions & 0 deletions oo-preview/service_manifest.yml
@@ -0,0 +1,71 @@
name: OOPreview
version: $SERVICE_TAG
description: |
  Generate document previews using OnlyOffice Document Builder, keeping high compatibility
  with Microsoft Office formats. Supports generating a preview of the first page or of all pages.
  OnlyOffice Document Builder license: https://github.com/ONLYOFFICE/DocumentBuilder/blob/master/LICENSE.txt
enabled: true

accepts: document/(pdf$|office/.*|mobi|epub)|code/html
rejects: empty
stage: CORE
category: Static Analysis
uses_tags: false
file_required: true
timeout: 90
is_external: false

# config:
#   KEY: "VALUE"

submission_params:
  - default: false
    name: preview_all_pages
    type: bool
    value: false

# -1000: safe
# 0 - 299: informative
# 300 - 699: suspicious
# 700 - 999: highly suspicious
# >= 1000: malicious

# heuristics:
#   - description: Some score
#     filetype: "*"
#     heur_id: 1
#     name: Score
#     score: 0

docker_config:
  image: ${REGISTRY}ghcr.io/kam193/assemblyline-service-oopreview:$SERVICE_TAG
  cpu_cores: 0.5
  ram_mb: 768
  ram_mb_min: 256
  allow_internet_access: true

# update_config:
#   update_interval_seconds: 7200 # 2 hours
#   generates_signatures: false
#   wait_for_update: true
#   sources:

# dependencies:
#   updates:
#     container:
#       ram_mb: 3072
#       ram_mb_min: 256
#       allow_internet_access: true
#       command: ["python", "-m", "service.updater"]
#       image: ${REGISTRY}ghcr.io/kam193/assemblyline-service-oopreview:$SERVICE_TAG
#       ports: ["5003"]
#       environment:
#         - name: UPDATER_DIR
#           value: /opt/clamav_db/
#     volumes:
#       updates:
#         mount_path: /opt/clamav_db/
#         capacity: 2147483648 # 2 GB
#         storage_class: default
#     run_as_core: True
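
The `accepts` pattern in this manifest decides which file types reach the service. The sketch below checks a few type strings against it, assuming the pattern is applied with Python's `re.match` (anchored at the start only). That is an assumption about the dispatcher, not something stated in this commit, and `is_accepted` and the sample type strings are illustrative only.

```python
import re

# Pattern copied from the `accepts` field of service_manifest.yml.
ACCEPTS = re.compile(r"document/(pdf$|office/.*|mobi|epub)|code/html")


def is_accepted(file_type: str) -> bool:
    """Return True when the pattern matches at the start of file_type (re.match semantics)."""
    return ACCEPTS.match(file_type) is not None


if __name__ == "__main__":
    # Sample type strings chosen for illustration; this is not an exhaustive list.
    samples = (
        "document/pdf",
        "document/pdf/passwordprotected",
        "document/office/word",
        "code/html",
        "image/png",
    )
    for candidate in samples:
        print(f"{candidate}: {is_accepted(candidate)}")
```

Under that assumption, the `$` after `pdf` limits the match to the plain `document/pdf` type and excludes its subtypes, while `office/.*` admits every Office subtype.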
