From 78877c8b355f41de2b1bb031a7534ba0c2333b51 Mon Sep 17 00:00:00 2001 From: Marcel Klehr Date: Fri, 27 Oct 2023 14:14:52 +0200 Subject: [PATCH 1/3] add(Text2Image): Add dev docs for TextToImage OCP and OCS API as well as admin docs for TextToImage Signed-off-by: Marcel Klehr --- admin_manual/ai/index.rst | 13 +- developer_manual/client_apis/OCS/index.rst | 1 + .../client_apis/OCS/ocs-text2image-api.rst | 176 +++++++++++++++++ developer_manual/digging_deeper/index.rst | 1 + .../digging_deeper/text2image.rst | 187 ++++++++++++++++++ 5 files changed, 377 insertions(+), 1 deletion(-) create mode 100644 developer_manual/client_apis/OCS/ocs-text2image-api.rst create mode 100644 developer_manual/digging_deeper/text2image.rst diff --git a/admin_manual/ai/index.rst b/admin_manual/ai/index.rst index 3608550157b..0e1bd3a0896 100644 --- a/admin_manual/ai/index.rst +++ b/admin_manual/ai/index.rst @@ -27,7 +27,8 @@ Overview of AI features "Speech-To-Text","`Whisper Speech-To-Text `_","Yellow","Yes","Yes - Whisper models by OpenAI","No","Yes" "","`OpenAI and LocalAI integration `_","Yellow","Yes","Yes - Whisper models by OpenAI","No","No" "","`Replicate integration `_","Yellow","Yes","Yes - Whisper models by OpenAI","No","No" - "Image generation","`OpenAI and LocalAI integration (via OpenAI API) `_","Red","No","No","No","No" + "Image generation","`Local Stable Diffusion `_","Yellow","Yes","Yes - StableDiffusion XL model by StabilityAI","No","Yes" + "","`OpenAI and LocalAI integration (via OpenAI API) `_","Red","No","No","No","No" "","`OpenAI and LocalAI integration (via LocalAI) `_","Yellow","Yes","Yes - StableDiffusion models by StabilityAI","No","Yes" "","`Replicate integration `_","Yellow","Yes","Yes - StableDiffusion models by StabilityAI","No","No" "Text generation","`Local large language model (via GPT4all Falcon) `_","Green","Yes","Yes","Yes","Yes" @@ -91,3 +92,13 @@ Implementing apps * `Assistant `_ for various tasks * `Mail `_ for summarizing mail threads (see :ref:`the 
Nextcloud Mail docs` for how to enable this) + + +Image generation +^^^^^^^^^^^^^^^^ +As you can see in the table above we have multiple apps offering Image generation capabilities. In downstream apps like the Text-to-Image helper app, users can use the image generation functionality regardless of which app implements it behind the scenes. + +Implementing apps +~~~~~~~~~~~~~~~~~ + +* `Text-to-Image Helper `_ for providing a Text-to-Image smart picker diff --git a/developer_manual/client_apis/OCS/index.rst b/developer_manual/client_apis/OCS/index.rst index 2cc3bede65d..af76db8d8c9 100644 --- a/developer_manual/client_apis/OCS/index.rst +++ b/developer_manual/client_apis/OCS/index.rst @@ -19,3 +19,4 @@ The old documentation is still kept as it provides some additional documentation ocs-user-preferences-api ocs-translation-api ocs-textprocessing-api + ocs-text2image-api diff --git a/developer_manual/client_apis/OCS/ocs-text2image-api.rst b/developer_manual/client_apis/OCS/ocs-text2image-api.rst new file mode 100644 index 00000000000..cdd01e9b31b --- /dev/null +++ b/developer_manual/client_apis/OCS/ocs-text2image-api.rst @@ -0,0 +1,176 @@ +.. _ocs-text2image-api: + +====================== +OCS Text-To-Image API +====================== + +.. versionadded:: 28 + +The OCS Text-To-Image API allows you to run image generation tasks implemented by apps using :ref:`the backend Text-To-Image API`. + +The base URL for all calls to this API is: */ocs/v2.php/text2image/* + +All calls to OCS endpoints require the ``OCS-APIRequest`` header to be set to ``true``. + + +Check availability +------------------ + +.. 
versionadded:: 28 + +* Method: ``GET`` +* Endpoint: ``/is_available`` +* Response: + - Status code: + + ``200 OK`` + - Data: + ++----------------------+--------+---------------------------------------------------------------------------------------------------------------+ +| field | type | Description | ++----------------------+--------+---------------------------------------------------------------------------------------------------------------+ +|``isAvailable`` | bool | Boolean indicating whether any Text-To-Image providers are installed | ++----------------------+--------+---------------------------------------------------------------------------------------------------------------+ + +Schedule a task +--------------- + +.. versionadded:: 28 + +.. note:: The endpoint is rate limited as it can be quite resource intensive. Users can make 20 requests in 2 minutes, guests only 5 + +* Method: ``POST`` +* Endpoint: ``/schedule`` +* Data: + ++-------------------+-------------+--------------------------------------------------------------------------------+ +| field | type | Description | ++-------------------+-------------+--------------------------------------------------------------------------------+ +|``input`` | string | The input text for the task | ++-------------------+-------------+--------------------------------------------------------------------------------+ +|``numberOfImages`` | int | The number of images to generate (optional; default: 8) | ++-------------------+-------------+--------------------------------------------------------------------------------+ +|``appId`` | string | The id of the requesting app | ++-------------------+-------------+--------------------------------------------------------------------------------+ +|``identifier`` | string | An app-defined identifier for the task (optional) | ++-------------------+-------------+--------------------------------------------------------------------------------+ + +If possible the task will be 
executed while the request is processed on the server; otherwise it is scheduled as a background job. + +* Response: + - Status code: + + ``200 OK`` + + ``412 Precondition Failed`` - When the task type is not available currently + + ``429 Too Many Requests`` - When the rate limit was exceeded + + - Data: + + ``id`` - Only provided in case of ``200 OK``, the assigned task id, int + + ``input`` - Only provided in case of ``200 OK``, the task input, string + + ``status`` - Only provided in case of ``200 OK``, the current task status, int, see backend API + + ``userId`` - Only provided in case of ``200 OK``, the originating userId of the task, string + + ``appId`` - Only provided in case of ``200 OK``, the originating appId of the task, string + + ``identifier`` - Only provided in case of ``200 OK``, the app-defined identifier of the task, string + + ``numberOfImages`` - Only provided in case of ``200 OK``, the number of generated images, int + + ``completionExpectedAt`` - Only provided in case of ``200 OK``, the date and time when the result is expected to be completed as a UNIX timestamp, int + + ``message`` - Only provided when not ``200 OK``, an error message in the user's language, ready to be displayed + +Fetch a task by ID +------------------ + +.. versionadded:: 28 + +.. note:: The endpoint is rate limited as it can be quite resource intensive. 
Users can make 20 requests in 2 minutes, guests only 5 + +* Method: ``GET`` +* Endpoint: ``/task/{id}`` + +* Response: + - Status code: + + ``200 OK`` + + ``404 Not Found`` - When the task could not be found + + - Data: + + ``id`` - Only provided in case of ``200 OK``, the assigned task id, int + + ``input`` - Only provided in case of ``200 OK``, the task input, string + + ``status`` - Only provided in case of ``200 OK``, the current task status, int, see backend API + + ``userId`` - Only provided in case of ``200 OK``, the originating userId of the task, string + + ``appId`` - Only provided in case of ``200 OK``, the originating appId of the task, string + + ``identifier`` - Only provided in case of ``200 OK``, the app-defined identifier of the task, string + + ``numberOfImages`` - Only provided in case of ``200 OK``, the number of generated images, int + + ``completionExpectedAt`` - Only provided in case of ``200 OK``, the date and time when the result is expected to be completed as a UNIX timestamp, int + + ``message`` - Only provided when not ``200 OK``, an error message in the user's language, ready to be displayed + +Fetch a result image +-------------------- + +.. versionadded:: 28 + +* Method: ``GET`` +* Endpoint: ``/task/{id}/image/{index}`` + * ``index``: The index of the image, starting at 0 + +* Response: + - Status code: + + ``200 OK`` + + ``404 Not Found`` - When the task could not be found, isn't successful, isn't completed yet, or the index is out of bounds + + - Data: The raw image data + +Delete a task +------------- + +.. 
versionadded:: 28 + +* Method: ``DELETE`` +* Endpoint: ``/task/{id}`` + +* Response: + - Status code: + + ``200 OK`` + + ``404 Not Found`` - When the task could not be found + + - Data: + + ``id`` - Only provided in case of ``200 OK``, the assigned task id, int + + ``input`` - Only provided in case of ``200 OK``, the task input, string + + ``status`` - Only provided in case of ``200 OK``, the current task status, int, see backend API + + ``userId`` - Only provided in case of ``200 OK``, the originating userId of the task, string + + ``appId`` - Only provided in case of ``200 OK``, the originating appId of the task, string + + ``identifier`` - Only provided in case of ``200 OK``, the app-defined identifier of the task, string + + ``numberOfImages`` - Only provided in case of ``200 OK``, the number of generated images, int + + ``completionExpectedAt`` - Only provided in case of ``200 OK``, the date and time when the result is expected to be completed as a UNIX timestamp, int + + ``message`` - Only provided when not ``200 OK``, an error message in the user's language, ready to be displayed + +List tasks by App +------------------ + +.. versionadded:: 28 + +.. note:: The endpoint is rate limited as it can be quite resource intensive. 
Guests can only make 5 requests within 2 minutes + +* Method: ``GET`` +* Endpoint: ``/tasks/app/{appId}`` +* Data: + ++-------------------+-------------+--------------------------------------------------------------------------------+ +| field | type | Description | ++-------------------+-------------+--------------------------------------------------------------------------------+ +|``appId`` | string | The id of the requesting app | ++-------------------+-------------+--------------------------------------------------------------------------------+ +|``identifier`` | string | An app-defined identifier for the task (optional) | ++-------------------+-------------+--------------------------------------------------------------------------------+ + +* Response: + - Status code: + + ``200 OK`` + + ``404 Not Found`` - When the task could not be found + + - Data: + + Only provided in case of ``200 OK``, an array of objects: + + ``id`` - the assigned task id, int + + ``input`` - the task input, string + + ``status`` - the current task status, int, see backend API + + ``userId`` - the originating userId of the task, string + + ``appId`` - the originating appId of the task, string + + ``identifier`` - the app-defined identifier of the task, string + + ``numberOfImages`` - the number of generated images, int + + ``completionExpectedAt`` - the date and time when the result is expected to be completed as a UNIX timestamp, int + + ``message`` - Only provided when not ``200 OK``, an error message in the user's language, ready to be displayed diff --git a/developer_manual/digging_deeper/index.rst b/developer_manual/digging_deeper/index.rst index 60caccb4a87..c610dbad2e8 100644 --- a/developer_manual/digging_deeper/index.rst +++ b/developer_manual/digging_deeper/index.rst @@ -27,6 +27,7 @@ Digging deeper talk translation text_processing + text2image two-factor-provider users dashboard diff --git a/developer_manual/digging_deeper/text2image.rst 
b/developer_manual/digging_deeper/text2image.rst new file mode 100644 index 00000000000..ce2dce494a4 --- /dev/null +++ b/developer_manual/digging_deeper/text2image.rst @@ -0,0 +1,187 @@ +.. _text2image: + +============= +Text-To-Image +============= + +.. versionadded:: 28 + +Nextcloud offers a **Text-To-Image** API. The overall idea is that there is a central OCP API that apps can use to submit tasks to latent diffusion AI models and similar image generation tools. To stay technology-agnostic, any app can provide this functionality by registering a Text-To-Image provider. + +Consuming the Text-To-Image API +------------------------------- + +To consume the Text-To-Image API, you will need to :ref:`inject` ``\OCP\TextToImage\IManager``. This manager offers the following methods: + + * ``hasProviders()`` This method returns a boolean which indicates if any providers have been registered. If this is false, you cannot use the image generation feature. + * ``runTask(Task $task)`` This method provides the actual functionality. The task is defined using the Task class. This method runs the task synchronously, so, depending on the provider, it can take anywhere between 3 seconds and several hours. + * ``scheduleTask(Task $task)`` This method also runs a task, but asynchronously in a background job. The task is defined using the Task class. + * ``runOrScheduleTask(Task $task)`` This method also runs a task, but first checks the expected runtime of the provider to be used. If the runtime fits inside the available processing time for the current request, the task is run synchronously; otherwise it is scheduled as a background job. The task is defined using the Task class. + * ``getTask(int $id)`` This method fetches a task specified by its id. + * ``getUserTask(int $id, ?string $userId)`` This method fetches a task specified by its id and the user that is associated with it. 
+ * ``getUserTasksByApp(?string $userId, string $appId, ?string $identifier = null)`` This method fetches the tasks of a user that were created by a specific app (optionally, you can also specify the task identifier as an additional filter) + +If you would like to use the image generation functionality in a client, there are also OCS endpoints available for this: :ref:`OCS Text-To-Image API` + +Tasks +^^^^^ +To create a task we use the ``\OCP\TextToImage\Task`` class. Its constructor takes the following arguments: ``new \OCP\TextToImage\Task(string $input, string $appId, int $numberOfImages, ?string $userId, string $identifier = '')``. For example: + +.. code-block:: php + + $text2imageTask = new Task($documentTitle, "my_app", 8, $userId, (string) $documentId); + $text2imageManager->scheduleTask($text2imageTask); + +The task class objects have the following methods available: + + * ``getStatus()`` This method returns one of the below statuses. + * ``getId()`` This method will return ``null`` before the task has been passed to ``runTask`` or ``scheduleTask``, otherwise it will return an integer + * ``getInput()`` This returns the input string. + * ``getAppId()`` This returns the originating application ID of the task. + * ``getNumberOfImages()`` This returns the number of generated images for the task. + * ``getIdentifier()`` This returns the original scheduler-defined identifier for the task + * ``getUserId()`` This returns the originating user ID of the task. + * ``getOutputImages()`` This method will return ``null`` unless the task is successful; if it is, it will return a list of ``IImage`` objects + +Task statuses +^^^^^^^^^^^^^ + +All tasks always have one of the below statuses: + +.. 
code-block:: php + + Task::STATUS_FAILED = 4; + Task::STATUS_SUCCESSFUL = 3; + Task::STATUS_RUNNING = 2; + Task::STATUS_SCHEDULED = 1; + Task::STATUS_UNKNOWN = 0; + + +Listening to the image generation events +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Since ``scheduleTask`` does not block, you will need to listen to the following events in your app to obtain the resulting images or be notified of any failure. + + * ``OCP\TextToImage\Events\TaskSuccessfulEvent`` This event class offers the ``getTask()`` method which returns the up-to-date task object, with the output from the model. + * ``OCP\TextToImage\Events\TaskFailedEvent`` In addition to the ``getTask()`` method, this event class provides the ``getErrorMessage()`` method which returns the error message as a string (only in English and for debugging purposes, so don't show this to the user) + + +For example, in your ``lib/AppInfo/Application.php`` file: + +.. code-block:: php + + $context->registerEventListener(\OCP\TextToImage\Events\TaskSuccessfulEvent::class, ImageGenerationResultListener::class); + $context->registerEventListener(\OCP\TextToImage\Events\TaskFailedEvent::class, ImageGenerationResultListener::class); + +The corresponding ``ImageGenerationResultListener`` class could look like the following: + +.. code-block:: php + + <?php + + use OCP\EventDispatcher\Event; + use OCP\EventDispatcher\IEventListener; + use OCP\TextToImage\Events\TaskFailedEvent; + use OCP\TextToImage\Events\TaskSuccessfulEvent; + + // Application is assumed to be your app's own AppInfo\Application class + class ImageGenerationResultListener implements IEventListener { + public function handle(Event $event): void { + if (!($event instanceof TaskSuccessfulEvent || $event instanceof TaskFailedEvent) || $event->getTask()->getAppId() !== Application::APP_ID) { + return; + } + + if ($event instanceof TaskSuccessfulEvent) { + $images = $event->getTask()->getOutputImages(); + // store $images somewhere + } + + if ($event instanceof TaskFailedEvent) { + $error = $event->getErrorMessage(); + $userId = $event->getTask()->getUserId(); + // Notify relevant user about failure + } + } + } + + +Implementing a Text-To-Image provider +-------------------------------------- + +A **Text-To-Image provider** is a class that implements the interface ``OCP\TextToImage\IProvider``. + +.. 
code-block:: php + + l->t('My awesome text to image provider'); + } + + public function generate(string $input, array $resources): void { + // write the resulting images to the file resources in $resources + } + } + +The method ``getId`` returns a string to uniquely identify the registered provider. You can use the class name for this for example. + +The method ``getName`` returns a string to identify the registered provider in the user interface and should be localized. + +The method ``generate`` implements the image generation step. It gets passed an array of ``resource`` values. The length of the array indicates how many images should be generated. Each image should be written to one of the resources, e.g. using ``fwrite()``. In case execution fails for some reason, you should throw a ``RuntimeException`` with an explanatory error message. + +The class would typically be saved into a file in ``lib/TextToImage`` of your app but you are free to put it elsewhere as long as it's loadable by Nextcloud's :ref:`dependency injection container`. + + +Provider registration +--------------------- + +The provider class is registered via the :ref:`bootstrap mechanism` of the ``Application`` class. + +.. 
code-block:: php + :emphasize-lines: 16 + + registerTextToImageProvider(ImageGenerationProvider::class); + } + + public function boot(IBootContext $context): void {} + + } From 8725d5185ae4645abd81d4cabd2711af10189dc4 Mon Sep 17 00:00:00 2001 From: Marcel Klehr Date: Mon, 6 Nov 2023 19:18:50 +0100 Subject: [PATCH 2/3] enh(Text2Image): Explain in more detail the outcomes of schedule/run/runOrSchedule with code Signed-off-by: Marcel Klehr --- .../digging_deeper/text2image.rst | 49 ++++++++++++++++++- 1 file changed, 48 insertions(+), 1 deletion(-) diff --git a/developer_manual/digging_deeper/text2image.rst b/developer_manual/digging_deeper/text2image.rst index ce2dce494a4..13f11df6c6b 100644 --- a/developer_manual/digging_deeper/text2image.rst +++ b/developer_manual/digging_deeper/text2image.rst @@ -30,7 +30,6 @@ To create a task we use the ``\OCP\TextToImage\Task`` class. Its constructor tak .. code-block:: php $text2imageTask = new Task($documentTitle, "my_app", 8, $userId, (string) $documentId); - $text2imageManager->scheduleTask($text2imageTask); The task class objects have the following methods available: @@ -43,6 +42,54 @@ The task class objects have the following methods available: * ``getUserId()`` This returns the originating user ID of the task. * ``getOutputImages()`` This method will return ``null`` unless the task is successful; if it is, it will return a list of ``IImage`` objects +You could run the task directly as follows. However, this will block the current PHP process until the task is done, which can sometimes take dozens of minutes, depending on which provider is used. + +.. 
code-block:: php + + try { + $text2imageManager->runTask($text2imageTask); + } catch (\OCP\PreConditionNotMetException|\OCP\TextToImage\Exception\TaskFailureException $e) { + // task failed + // return error + } + // task was successful + +The wiser choice, when you are in the context of an HTTP controller, is to schedule the task for execution in a background job, as follows: + +.. code-block:: php + + try { + $text2imageManager->scheduleTask($text2imageTask); + } catch (\OCP\PreConditionNotMetException|\OCP\DB\Exception $e) { + // scheduling task failed + } + // task was scheduled successfully + +Of course, you might want to schedule the task in a background job **only** if it takes longer than the request timeout. This is what ``runOrScheduleTask`` does. + +.. code-block:: php + + try { + $text2imageManager->runOrScheduleTask($text2imageTask); + } catch (\OCP\PreConditionNotMetException|\OCP\DB\Exception $e) { + // scheduling task failed + // return error + } catch (\OCP\TextToImage\Exception\TaskFailureException $e) { + // task was run but failed + // status will be STATUS_FAILED + // return error + } + + switch ($text2imageTask->getStatus()) { + case \OCP\TextToImage\Task::STATUS_SUCCESSFUL: + // task was run directly and was successful + case \OCP\TextToImage\Task::STATUS_RUNNING: + case \OCP\TextToImage\Task::STATUS_SCHEDULED: + // task was deferred to background job + default: + // something went wrong + } + Task statuses ^^^^^^^^^^^^^ From e5a29f54bd648055cea3c71f91029a01213801df Mon Sep 17 00:00:00 2001 From: Marcel Klehr Date: Mon, 6 Nov 2023 19:19:26 +0100 Subject: [PATCH 3/3] enh(Text2Image): Add cross-references for task status values to OCS API docs Signed-off-by: Marcel Klehr --- developer_manual/client_apis/OCS/ocs-text2image-api.rst | 8 ++++---- developer_manual/digging_deeper/text2image.rst | 2 ++ 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/developer_manual/client_apis/OCS/ocs-text2image-api.rst 
b/developer_manual/client_apis/OCS/ocs-text2image-api.rst index cdd01e9b31b..908772e57fa 100644 --- a/developer_manual/client_apis/OCS/ocs-text2image-api.rst +++ b/developer_manual/client_apis/OCS/ocs-text2image-api.rst @@ -65,7 +65,7 @@ If possible the task will be executed while the request is processed on the serv - Data: + ``id`` - Only provided in case of ``200 OK``, the assigned task id, int + ``input`` - Only provided in case of ``200 OK``, the task input, string - + ``status`` - Only provided in case of ``200 OK``, the current task status, int, see backend API + + ``status`` - Only provided in case of ``200 OK``, the current task status, int, see :ref:`the backend Text-To-Image API` + ``userId`` - Only provided in case of ``200 OK``, the originating userId of the task, string + ``appId`` - Only provided in case of ``200 OK``, the originating appId of the task, string + ``identifier`` - Only provided in case of ``200 OK``, the app-defined identifier of the task, string @@ -91,7 +91,7 @@ Fetch a task by ID - Data: + ``id`` - Only provided in case of ``200 OK``, the assigned task id, int + ``input`` - Only provided in case of ``200 OK``, the task input, string - + ``status`` - Only provided in case of ``200 OK``, the current task status, int, see backend API + + ``status`` - Only provided in case of ``200 OK``, the current task status, int, see :ref:`the backend Text-To-Image API` + ``userId`` - Only provided in case of ``200 OK``, the originating userId of the task, string + ``appId`` - Only provided in case of ``200 OK``, the originating appId of the task, string + ``identifier`` - Only provided in case of ``200 OK``, the app-defined identifier of the task, string @@ -131,7 +131,7 @@ Delete a task - Data: + ``id`` - Only provided in case of ``200 OK``, the assigned task id, int + ``input`` - Only provided in case of ``200 OK``, the task input, string - + ``status`` - Only provided in case of ``200 OK``, the current task status, int, see backend API + + ``status`` - 
Only provided in case of ``200 OK``, the current task status, int, see :ref:`the backend Text-To-Image API` + ``userId`` - Only provided in case of ``200 OK``, the originating userId of the task, string + ``appId`` - Only provided in case of ``200 OK``, the originating appId of the task, string + ``identifier`` - Only provided in case of ``200 OK``, the app-defined identifier of the task, string @@ -167,7 +167,7 @@ List tasks by App + Only provided in case of ``200 OK``, an array of objects: + ``id`` - the assigned task id, int + ``input`` - the task input, string - + ``status`` - the current task status, int, see backend API + + ``status`` - the current task status, int, see :ref:`the backend Text-To-Image API` + ``userId`` - the originating userId of the task, string + ``appId`` - the originating appId of the task, string + ``identifier`` - the app-defined identifier of the task, string diff --git a/developer_manual/digging_deeper/text2image.rst b/developer_manual/digging_deeper/text2image.rst index 13f11df6c6b..ba91fcb9979 100644 --- a/developer_manual/digging_deeper/text2image.rst +++ b/developer_manual/digging_deeper/text2image.rst @@ -93,6 +93,8 @@ Of course, you might want to schedule the task in a background job **only** if i Task statuses ^^^^^^^^^^^^^ +.. _text2image_statuses: + All tasks always have one of the below statuses: .. code-block:: php