YSP-672: AI: Add media payloads and pipelines #15

Draft
wants to merge 40 commits into
base: main
Changes from 27 commits
Commits
798fac3
feat(pdfs): allow media as allowed content feed type
dblanken-yale Oct 2, 2024
5ee2369
feat(pdfs): retrieve the entity type we're targeting for use later
dblanken-yale Oct 2, 2024
4921e6d
feat(pdfs): abstract content data creation based on entityType
dblanken-yale Oct 2, 2024
bfa07ad
feat(pdf): allow media as possible entityType
dblanken-yale Oct 2, 2024
d58ee43
feat(pdfs): construct unique identifier based on type
dblanken-yale Oct 2, 2024
a262224
chore(pdfs): sort use statements
dblanken-yale Oct 2, 2024
87076b9
feat(content): create plugins for feed output
dblanken-yale Oct 2, 2024
392c61f
feat(YSP-672): new media does not ai index by default
dblanken-yale Oct 2, 2024
cecb5a8
chore(YSP-672): add comments
dblanken-yale Oct 2, 2024
e15ad4c
feat(YSP-672): upsert all sends media as well
dblanken-yale Oct 2, 2024
dc61214
feat(YSP-672): pass through the type to append to payloads
dblanken-yale Oct 2, 2024
cb6446f
feat(YSP-672): create a way to retrieve file-based data
dblanken-yale Oct 2, 2024
a7c8b11
feat(YSP-672): Set up each media type for content export
dblanken-yale Oct 2, 2024
4764994
feat(YSP-672): add ability to select media to include
dblanken-yale Oct 3, 2024
8f6d55a
feat(YSP-672): consider selected media types when using ai
dblanken-yale Oct 3, 2024
06be214
fix(YSP-672): use media URL for documentUrl
dblanken-yale Oct 3, 2024
ae1a5d4
chore(YSP-672): add comments
dblanken-yale Oct 3, 2024
11b81e2
chore(YSP-672): reference constant correctly
dblanken-yale Oct 3, 2024
c3eb39a
refactor(YSP-672): reuse sendJsonPost across calls
dblanken-yale Oct 3, 2024
de852f6
chore(YSP-672): allow config to be nullable
dblanken-yale Oct 3, 2024
f8811b9
feat(YSP-672): add enable/disable ai bulk actions
dblanken-yale Oct 7, 2024
4e88876
feat(YSP-672): bulk update AI state
dblanken-yale Oct 8, 2024
df00a1f
chore(YSP-672): fix misspelling
dblanken-yale Oct 8, 2024
2791996
feat(YSP-672): bulk operation depends on ai_engine permission level
dblanken-yale Oct 8, 2024
1f2f4a2
feat(YSP-672): add bulk ai operations for media
dblanken-yale Oct 8, 2024
98e666c
refactor(YSP-672): define entities globally
dblanken-yale Oct 8, 2024
0d2801b
refactor(YSP-672): generalize actions for code reuse
dblanken-yale Oct 9, 2024
d7c9387
chore(YSP-672): remove unneeded comment
dblanken-yale Oct 11, 2024
aec2f45
chore(YSP-672): satisfy linter
dblanken-yale Oct 11, 2024
bc53acc
refactor(YSP-672): refactor new media evaluation for easier reading
dblanken-yale Oct 11, 2024
5f904fd
fix(YSP-672): default media types to empty array for in_array
dblanken-yale Oct 11, 2024
eb5782e
fix(YSP-672): use only one default value
dblanken-yale Oct 11, 2024
d870513
docs(YSP-672): add metatag setup instructions
dblanken-yale Oct 15, 2024
3656122
feat(YSP-672): bypass draft state updates
dblanken-yale Oct 15, 2024
8730ef1
chore(YSP-672): fix awful code linting issues
dblanken-yale Oct 15, 2024
22b41a1
chore(YSP-672): include ContianerInterface reference
dblanken-yale Oct 15, 2024
f5e1fed
chore(YSP-672): use ClientFaactoryInterface reference
dblanken-yale Oct 15, 2024
a868bb9
chore(YSP-672): add more references
dblanken-yale Oct 15, 2024
90ef0c2
docs(YSP-672): comment constructor
dblanken-yale Oct 15, 2024
a4be970
chore(YSP-672): remove dpm
dblanken-yale Oct 17, 2024
2 changes: 1 addition & 1 deletion ai_engine.permissions.yml
@@ -1,7 +1,7 @@
# Create permission for managin AI Engine settings.
administer ai engine:
title: 'Administer AI Engine'
description: 'Enable services and change sensative settings.'
description: 'Enable services and change sensitive settings.'
manage ai engine settings:
title: 'Manage AI Engine Settings'
description: 'Set and update AI Engine content and settings.'
Original file line number Diff line number Diff line change
@@ -2,9 +2,11 @@

namespace Drupal\ai_engine_embedding\Form;

use Drupal\ai_engine_embedding\Service\EntityUpdate;
use Drupal\Core\Entity\EntityTypeManagerInterface;
use Drupal\Core\Form\ConfigFormBase;
use Drupal\Core\Form\FormStateInterface;
use Drupal\ai_engine_embedding\Service\EntityUpdate;
use Symfony\Component\DependencyInjection\ContainerInterface;

/**
* Setting form for the AI Engine Embedding module.
@@ -18,6 +20,13 @@ class AiEngineEmbeddingSettings extends ConfigFormBase {
*/
const CONFIG_NAME = 'ai_engine_embedding.settings';
Contributor

This is not a part of your current work, but I'm just now realizing that we should have a config/install for this settings file.

Contributor Author

Ah that makes sense. I can add that as part of this.


/**
* Entity type manager.
*
* @var \Drupal\Core\Entity\EntityTypeManagerInterface
*/
protected $entityTypeManager;

/**
* {@inheritdoc}
*/
@@ -32,6 +41,21 @@ protected function getEditableConfigNames() {
return [self::CONFIG_NAME];
}

/**
* Constructs the settings form with the entity type manager.
*/
public function __construct(
EntityTypeManagerInterface $entityTypeManager,
) {
$this->entityTypeManager = $entityTypeManager;
}

/**
* {@inheritdoc}
*/
public static function create(ContainerInterface $container) {
return new static(
$container->get('entity_type.manager'),
);
}

/**
* {@inheritdoc}
*/
@@ -67,6 +91,13 @@ public function buildForm(array $form, FormStateInterface $form_state) {
'#description' => $this->t('The chunk size to split each document into'),
'#default_value' => $config->get('azure_chunk_size') ?? 3000,
];
$form['included_media_types'] = [
'#type' => 'checkboxes',
'#title' => $this->t('Included Media Types'),
'#options' => $this->getMediaTypes(),
'#default_value' => array_keys($config->get('included_media_types') ?? []),
'#default_value' => $config->get('included_media_types') ?? [],
];
$form['actions'] = [
'#type' => 'details',
'#title' => $this->t('Embedding Operations'),
@@ -112,6 +143,7 @@ public function submitForm(array &$form, FormStateInterface $form_state) {
->set('azure_search_service_index', $form_state->getValue('azure_search_service_index'))
->set('azure_embedding_service_url', $form_state->getValue('azure_embedding_service_url'))
->set('azure_chunk_size', $azure_chunk_size)
->set('included_media_types', $form_state->getValue('included_media_types'))
->save();
parent::submitForm($form, $form_state);
}
@@ -124,4 +156,19 @@ public function actionUpsertAllDocuments(array &$form, FormStateInterface $form_
$service->addAllDocuments();
}

/**
* Retrieves the list of media types.
*
* @return array
* An array of media type labels.
*/
protected function getMediaTypes() {
$media_types = [];
foreach ($this->entityTypeManager->getStorage('media_type')->loadMultiple() as $media_type) {
$media_types[$media_type->id()] = $media_type->label();
}

return $media_types;
}

}
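For context on the two `#default_value` variants in the hunk above: a Drupal `checkboxes` element submits a map keyed by option, where a checked option carries its own key as the value and an unchecked option carries 0. The following is a minimal standalone sketch of reducing that map to the list of checked media types before it is stored or compared. The helper name `checked_media_types()` and the sample bundle names are hypothetical and are not part of this PR.

```php
<?php

/**
 * Reduces a raw checkboxes submission to the checked option keys.
 *
 * Hypothetical helper for illustration; not part of the module.
 */
function checked_media_types(?array $submitted): array {
  // array_filter() drops the 0 entries left behind by unchecked boxes,
  // and array_keys() keeps only the machine names.
  return array_keys(array_filter($submitted ?? []));
}

// Assumed shape of $form_state->getValue('included_media_types').
$submitted = ['document' => 'document', 'image' => 0, 'video' => 'video'];
print_r(checked_media_types($submitted));
```

If the raw map is stored unfiltered, as `submitForm()` does here, a later non-strict `in_array()` call also compares the bundle name against the 0 entries, and that string-to-integer comparison behaves differently in PHP 7 and PHP 8. Filtering on save or comparing strictly avoids relying on it.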
172 changes: 101 additions & 71 deletions modules/ai_engine_embedding/src/Service/EntityUpdate.php
@@ -2,20 +2,32 @@

namespace Drupal\ai_engine_embedding\Service;

use Drupal\ai_engine_feed\Service\Sources;
use Drupal\Core\Config\ConfigFactoryInterface;
use Drupal\Core\Entity\EntityInterface;
use Drupal\Core\Entity\EntityPublishedInterface;
use Drupal\Core\Http\ClientFactory;
use Drupal\Core\Logger\LoggerChannelInterface;
use Drupal\ai_engine_feed\Service\Sources;
use Drupal\metatag\MetatagManager;

/**
* Service for updating the vector database as content is updated.
*/
class EntityUpdate {
/**
* The default chunk size for sending data to the AI Embedding service.
*
* @var int
*/
const CHUNK_SIZE_DEFAULT = 3000;

/**
* The allowed entity types for indexing.
*
* @var array
*/
const ALLOWED_ENTITIES = ['node', 'media'];

/**
* The configuration factory.
*
@@ -150,36 +162,56 @@ public function delete(EntityInterface $entity) {
* a cleanup routine to find and delete out of date chunks.
*/
public function addAllDocuments() {
$docTypes = ['node' => 'text', 'media' => 'media'];
$config = $this->configFactory->get('ai_engine_embedding.settings');
$data = $this->getData("upsert", $config, [], "");
$httpClient = $this->httpClientFactory->fromOptions([
'headers' => [
'Content-Type' => 'application/json',
],
]);
$endpoint = $config->get('azure_embedding_service_url') . '/api/upsert';

try {
$response = $httpClient->post($endpoint, ['json' => $data]);
// Loop through entityTypesToSend and send.
foreach (self::ALLOWED_ENTITIES as $entityType) {
$data = $this->getData("upsert", $config, ['entityType' => $entityType], "", $docTypes[$entityType]);
$endpoint = $config->get('azure_embedding_service_url') . '/api/upsert';
$response = $this->sendJsonPost($endpoint, $data);

if ($response->getStatusCode() === 200) {
$responseData = json_decode($response->getBody()->getContents(), TRUE);
$this->logger->notice(
'Removed node @id from vector database. Service response: @response',
'Upserted node @id to vector database. Service response: @response',
['@response' => print_r($responseData, TRUE)]
);
}
else {
$this->logger->notice(
'Unable to remove node @id from vector database. POST failed with status code: @code',
'Unable to upsert node @id to vector database. POST failed with status code: @code',
['@code' => $response->getStatusCode()]
);
return NULL;
}
}
}
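One thing worth checking in the refactor above: `sendJsonPost()` returns NULL when the request throws, while the callers in this diff call `getStatusCode()` on the result without a guard. The `return NULL` inside the loop also ends the run before any remaining entity types are sent. Below is a minimal standalone sketch of one way to guard both cases, assuming PHP 8. `StubResponse`, `upsert_all()` and the result strings are hypothetical stand-ins, not the module's code, and whether to continue or stop on failure is a design choice for the authors.

```php
<?php

/**
 * Stand-in for a PSR-7 response; only the status code is modelled.
 */
final class StubResponse {

  public function __construct(private int $status) {}

  public function getStatusCode(): int {
    return $this->status;
  }

}

/**
 * Sends one request per entity type and records an outcome for each.
 *
 * $send is a hypothetical callable that returns a response, or NULL when
 * the request failed, mirroring sendJsonPost() in the diff.
 */
function upsert_all(array $entity_types, callable $send): array {
  $results = [];
  foreach ($entity_types as $entity_type) {
    $response = $send($entity_type);
    if ($response === NULL) {
      // The request itself failed: record it and move on to the next type.
      $results[$entity_type] = 'request failed';
      continue;
    }
    $code = $response->getStatusCode();
    $results[$entity_type] = $code === 200 ? 'ok' : 'status ' . $code;
  }
  return $results;
}

$send = fn (string $type): ?StubResponse => $type === 'node' ? new StubResponse(200) : NULL;
print_r(upsert_all(['node', 'media'], $send));
```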

/**
* Sends a post request to an endpoint with data.
*
* @param string $endpoint
* The endpoint to send the data to.
* @param array $data
* The data to send.
*
* @return \Psr\Http\Message\ResponseInterface
* The response from the post request.
*/
protected function sendJsonPost($endpoint, $data) {
$httpClient = $this->httpClientFactory->fromOptions([
'headers' => [
'Content-Type' => 'application/json',
],
]);

try {
return $httpClient->post($endpoint, ['json' => $data]);
}
catch (\Exception $e) {
$this->logger->error(
'An error occurred while upserting document: @error',
'An error occurred while posting document: @error',
['@error' => $e->getMessage()]
);
return NULL;
@@ -198,41 +230,26 @@ public function addAllDocuments() {
*/
public function upsertDocument(EntityInterface $entity) {
$config = $this->configFactory->get('ai_engine_embedding.settings');
$chunk_size = $config->get('azure_chunk_size') || CHUNK_SIZE_DEFAULT;
$entityTypeId = $entity->getEntityTypeId();
$route_params = [
'entityType' => $entity->getEntityTypeId(),
'entityType' => $entityTypeId,
'id' => $entity->id(),
];
$data = $this->getData("upsert", $config, $route_params, "");
$httpClient = $this->httpClientFactory->fromOptions([
'headers' => [
'Content-Type' => 'application/json',
],
]);
$endpoint = $config->get('azure_embedding_service_url') . '/api/upsert';
$response = $this->sendJsonPost($endpoint, $data);

try {
$response = $httpClient->post($endpoint, ['json' => $data]);

if ($response->getStatusCode() === 200) {
$responseData = json_decode($response->getBody()->getContents(), TRUE);
$this->logger->notice(
'Removed node @id from vector database. Service response: @response',
['@id' => $entity->id(), '@response' => print_r($responseData, TRUE)]
);
}
else {
$this->logger->notice(
'Unable to remove node @id from vector database. POST failed with status code: @code',
['@id' => $entity->id(), '@code' => $response->getStatusCode()]
);
return NULL;
}
if ($response->getStatusCode() === 200) {
$responseData = json_decode($response->getBody()->getContents(), TRUE);
$this->logger->notice(
'Upserted node @id to vector database. Service response: @response',
['@id' => $entity->id(), '@response' => print_r($responseData, TRUE)]
);
}
catch (\Exception $e) {
$this->logger->error(
'An error occurred while upserting document: @error',
['@error' => $e->getMessage()]
else {
$this->logger->notice(
'Unable to upsert node @id to vector database. POST failed with status code: @code',
['@id' => $entity->id(), '@code' => $response->getStatusCode()]
);
return NULL;
}
@@ -253,35 +270,20 @@ protected function removeDocument(EntityInterface $entity) {
"id_list" => [],
"id_filter_list" => [$this->sources->getSearchIndexId($entity)],
];
$httpClient = $this->httpClientFactory->fromOptions([
'headers' => [
'Content-Type' => 'application/json',
],
]);
$endpoint = $config->get('azure_embedding_service_url') . '/api/deletebyid';
$response = $this->sendJsonPost($endpoint, $data);

try {
$response = $httpClient->post($endpoint, ['json' => $data]);

if ($response->getStatusCode() === 200) {
$responseData = json_decode($response->getBody()->getContents(), TRUE);
$this->logger->notice(
'Removed node @id from vector database. Service response: @response',
['@id' => $entity->id(), '@response' => print_r($responseData, TRUE)]
);
}
else {
$this->logger->notice(
'Unable to remove node @id from vector database. POST failed with status code: @code',
['@id' => $entity->id(), '@code' => $response->getStatusCode()]
);
return NULL;
}
if ($response->getStatusCode() === 200) {
$responseData = json_decode($response->getBody()->getContents(), TRUE);
$this->logger->notice(
'Removed node @id from vector database. Service response: @response',
['@id' => $entity->id(), '@response' => print_r($responseData, TRUE)]
);
}
catch (\Exception $e) {
$this->logger->error(
'An error occurred while deleting document: @error',
['@error' => $e->getMessage()]
else {
$this->logger->notice(
'Unable to remove node @id from vector database. POST failed with status code: @code',
['@id' => $entity->id(), '@code' => $response->getStatusCode()]
);
return NULL;
}
@@ -330,7 +332,29 @@ protected function isIndexable(EntityInterface $entity) {
* TRUE if the entity should be embedded, FALSE otherwise.
*/
protected function isSupportedEntityType(EntityInterface $entity) {
return $entity->getEntityTypeId() === 'node';
$entity_type_id = $entity->getEntityTypeId();

if ($entity_type_id === 'media') {
return $this->isSupportedMediaType($entity);
}
else {
return in_array($entity_type_id, self::ALLOWED_ENTITIES, TRUE);
}
}

/**
* Checks if a media entity's bundle is selected for embedding.
*
* @param \Drupal\Core\Entity\EntityInterface $entity
* The entity to check.
*
* @return bool
* TRUE if the entity should be embedded, FALSE otherwise.
*/
protected function isSupportedMediaType(EntityInterface $entity) {
$config = $this->configFactory->get('ai_engine_embedding.settings');
$allowed_media_types = $config->get('included_media_types');
return in_array($entity->bundle(), $allowed_media_types);
Contributor

I'm guessing this will throw an error on sites that already have this module enabled but have not re-saved the form: in_array() cannot take NULL as its haystack.

Contributor Author

Ah good catch; I've added ?? [] for $allowed_media_types now.

}
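The fix discussed in this thread can be sketched on its own, outside Drupal. This is a minimal sketch, assuming the config value is NULL until the settings form has been saved once. The function name `is_supported_media_bundle()` is hypothetical, and the strict third argument to `in_array()` is an addition for illustration rather than something the thread confirms.

```php
<?php

/**
 * Reports whether a media bundle is in the configured allow list.
 *
 * Hypothetical standalone version of the check under review.
 */
function is_supported_media_bundle(string $bundle, ?array $allowed_media_types): bool {
  // Config that was never saved comes back as NULL; fall back to an empty
  // list so that in_array() always receives an array.
  $allowed = $allowed_media_types ?? [];
  return in_array($bundle, $allowed, TRUE);
}

var_dump(is_supported_media_bundle('document', NULL));
var_dump(is_supported_media_bundle('document', ['document', 'image']));
```

With the fallback in place, a site that has not re-saved the form treats every media bundle as excluded instead of raising an error.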

/**
@@ -386,7 +410,7 @@ public function isIndexingEnabled(EntityInterface $entity) {
* @return array
* An array of data to send to the AI Embedding service.
*/
protected function getData($action = 'upsert', $config, $route_params = [], $data = ""): array {
protected function getData($action = 'upsert', $config = NULL, $route_params = [], $data = "", $doctype = 'text'): array {
$allowed_actions = ['upsert'];
if (!$config) {
throw new \Exception('Missing configuration object.');
@@ -396,7 +420,13 @@ protected function getData($action = 'upsert', $config, $route_params = [], $dat
throw new \Exception('Invalid action provided.');
}

$chunk_size = $config->get('azure_chunk_size') ?? CHUNK_SIZE_DEFAULT;
$allowed_doctypes = ['text', 'media'];

if (!in_array($doctype, $allowed_doctypes)) {
throw new \Exception('Invalid doctype provided.');
}

$chunk_size = $config->get('azure_chunk_size') ?? self::CHUNK_SIZE_DEFAULT;

$data_endpoint = "";
if ($data == "") {
@@ -405,7 +435,7 @@ protected function getData($action = 'upsert', $config, $route_params = [], $dat

return [
"action" => $action,
"doctype" => "text",
"doctype" => $doctype,
"service_name" => $config->get('azure_search_service_name'),
"index_name" => $config->get('azure_search_service_index'),
"data" => $data,
Expand Down
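Two details in the `getData()` changes are easy to miss: the doctype is now validated against an allow list, and the chunk size default moves from `||` to `??`. The standalone sketch below illustrates both under stated assumptions. `build_payload()` and the `chunk_size` payload key are hypothetical, since the full payload is not visible in this part of the diff, and the constants are declared globally here only so the example runs outside the class.

```php
<?php

// Global stand-ins for the class constants used in the diff.
const CHUNK_SIZE_DEFAULT = 3000;
const ALLOWED_DOCTYPES = ['text', 'media'];

/**
 * Builds a reduced payload after validating the doctype.
 *
 * Hypothetical helper; the 'chunk_size' key is an assumption.
 */
function build_payload(string $action, string $doctype, ?int $configured_chunk_size): array {
  if (!in_array($doctype, ALLOWED_DOCTYPES, TRUE)) {
    throw new InvalidArgumentException('Invalid doctype provided.');
  }
  return [
    'action' => $action,
    'doctype' => $doctype,
    // ?? keeps a configured integer and falls back only when it is NULL.
    'chunk_size' => $configured_chunk_size ?? CHUNK_SIZE_DEFAULT,
  ];
}

print_r(build_payload('upsert', 'media', NULL));

// By contrast, || evaluates to a boolean rather than to either operand.
var_dump(NULL || CHUNK_SIZE_DEFAULT); // bool(true)
```

Inside a class, a bare `CHUNK_SIZE_DEFAULT` is looked up as a global constant rather than a class constant, which is presumably why the new line switches to `self::CHUNK_SIZE_DEFAULT`.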
5 changes: 4 additions & 1 deletion modules/ai_engine_feed/ai_engine_feed.services.yml
@@ -2,4 +2,7 @@ services:
# Service to query and prepare content for the feed.
ai_engine_feed.sources:
class: Drupal\ai_engine_feed\Service\Sources
arguments: ['@entity_type.manager', '@logger.channel.default', '@renderer', '@request_stack', '@ai_engine_metadata.manager', '@entity_field.manager', '@config.factory']
arguments: ['@entity_type.manager', '@logger.channel.default', '@renderer', '@request_stack', '@ai_engine_metadata.manager', '@entity_field.manager', '@config.factory', '@plugin.manager.ai_engine_feed.content_feed_manager']
plugin.manager.ai_engine_feed.content_feed_manager:
class: Drupal\ai_engine_feed\ContentFeedManager
parent: default_plugin_manager
20 changes: 20 additions & 0 deletions modules/ai_engine_feed/src/Annotation/ContentFeedPlugin.php
@@ -0,0 +1,20 @@
<?php

namespace Drupal\ai_engine_feed\Annotation;

use Drupal\Component\Annotation\Plugin;

/**
* Defines a content feed plugin annotation object.
*
* @Annotation
*/
class ContentFeedPlugin extends Plugin {
/**
* The plugin ID.
*
* @var string
*/
public $id;

}
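The annotation above only carries a plugin ID, and the plugin classes are not shown in this part of the diff. As a rough illustration of what an ID-keyed lookup gives the feed code, here is a plain PHP stand-in that does not use the Drupal plugin API. Every name in it (`FeedBuilder`, `NodeFeedBuilder`, `MediaFeedBuilder`, `FeedBuilderRegistry`) and the identifier format are hypothetical, chosen only to mirror the commit messages about building content data and unique identifiers per entity type.

```php
<?php

/**
 * Hypothetical contract for something that turns an entity into feed data.
 */
interface FeedBuilder {

  public function build(array $entity): array;

}

final class NodeFeedBuilder implements FeedBuilder {

  public function build(array $entity): array {
    // Assumed identifier format: entity type prefix plus entity ID.
    return ['id' => 'node-' . $entity['id'], 'doctype' => 'text'];
  }

}

final class MediaFeedBuilder implements FeedBuilder {

  public function build(array $entity): array {
    return ['id' => 'media-' . $entity['id'], 'doctype' => 'media'];
  }

}

/**
 * Minimal registry keyed by plugin ID, standing in for a plugin manager.
 */
final class FeedBuilderRegistry {

  /** @var array<string, FeedBuilder> */
  private array $builders = [];

  public function register(string $plugin_id, FeedBuilder $builder): void {
    $this->builders[$plugin_id] = $builder;
  }

  public function get(string $plugin_id): FeedBuilder {
    if (!isset($this->builders[$plugin_id])) {
      throw new InvalidArgumentException("Unknown feed plugin: $plugin_id");
    }
    return $this->builders[$plugin_id];
  }

}

$registry = new FeedBuilderRegistry();
$registry->register('node', new NodeFeedBuilder());
$registry->register('media', new MediaFeedBuilder());
print_r($registry->get('media')->build(['id' => 7]));
```

Keying the lookup on the entity type ID keeps callers such as the feed service free of per-type branching. In the module itself that role would fall to the plugin manager registered in `ai_engine_feed.services.yml`.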