本项目是利用Oracle数据库的能力,通过自定义的函数实现了AI功能,仅供学习和研究使用。使用者需自行承担风险,作者不对任何使用本项目导致的损失负责。
此项目使用自定义PL/SQL构建类似于SELECT AI的功能。
- 20240523,增加配置ollama模型的文档
- 20240523,修复execute immediate字符串过长的问题
- 20240522,修复SET_WALLET问题
- 20240520,增加生成SQL后的处理函数
- 20240517,增加本地数据库连接本地模型的验证
- 20240516,增加了对本地数据库的支持,需要配置证书。
- 20240515,完善文档,优化json存储和解析
- 20240508,初次上传
- 表
CUSTOM_CLOUD_AI_REGEST_MODELS
用于存储模型端点信息 - 表
CUSTOM_CLOUD_AI_PROFILES
用于存储prfile信息 - 自定义包
CUSTOM_CLOUD_AI
用于处理请求、调用模型
CUSTOM_CLOUD_AI
包含以下方法:
-
管理模型:
- procedure
REGIST_MODEL
: 注册模型,把LLM API的端点信息存储到数据表 - procedure
UNREGIST_MODEL
: 删除端点信息 - procedure
CREATE_PROFILE
: 创建Profile,存储模型调用信息 - procedure
DROP_PROFILE
: 删除Profile,删除模型调用信息 - procedure
UPDATE_PROFILE_PROMPT
: 更新Profile中的提示词模板
- procedure
-
调用模型:
- function
CHAT
: 直接向LLM提问,获取文本答案; - function
SHOWPROMPT
: 根据提示词模板,组装提示词 - function
SHOWSQL
: 把提示词发送给LLM,获得SQL - function
VALIDSQL
: 验证SQL是否能执行,如果能返回OK
,否则返回报错信息 - function
RUNSQL
: 执行SQL语句,返回数据
- function
注意:表名、列名保持大写。
创建表CUSTOM_CLOUD_AI_REGEST_MODELS
用于存储大模型的调用端点信息
-- DROP TABLE CUSTOM_CLOUD_AI_REGEST_MODELS;
CREATE TABLE CUSTOM_CLOUD_AI_REGEST_MODELS (
id NUMBER GENERATED BY DEFAULT AS IDENTITY,
model_provider VARCHAR2(256) UNIQUE,
model_endpoint VARCHAR2(1024),
model_auth VARCHAR2(1024),
model_response_parse_path VARCHAR2(1024),
model_request_template VARCHAR2(4096),
https_wallet_path VARCHAR2(1024),
https_wallet_password VARCHAR2(1024),
CONSTRAINT pk_custom_cloud_ai_regest_models PRIMARY KEY (id)
);
创建表CUSTOM_CLOUD_AI_PROFILES
用于存储大模型的调用端点信息
-- DROP TABLE CUSTOM_CLOUD_AI_PROFILES;
CREATE TABLE CUSTOM_CLOUD_AI_PROFILES (
id NUMBER GENERATED BY DEFAULT AS IDENTITY,
ai_profile_name VARCHAR2(256) UNIQUE,
ai_description VARCHAR2(4096),
attributes CLOB,
provider VARCHAR2(256),
model VARCHAR2(256),
temperature NUMBER,
max_tokens NUMBER,
stop_tokens VARCHAR2(256),
object_list CLOB,
prompt_template VARCHAR2(32767),
prompt_ddl VARCHAR2(32767),
CONSTRAINT pk_custom_cloud_ai_profiles PRIMARY KEY (id)
);
ALTER TABLE CUSTOM_CLOUD_AI_PROFILES ADD CONSTRAINT chk_json CHECK (attributes IS JSON);
以下步骤基于模型 deepseek-code 进行演示
注意:以管理员用户身份执行
BEGIN
DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
host => 'api.deepseek.com',
ace => xs$ace_type(privilege_list => xs$name_list('http'),
principal_name => 'DEMOUSER',
principal_type => xs_acl.ptype_db)
);
END;
/
对于自治数据库,或者访问http链接,以上步骤即可打通网络。对于本地数据库访问https链接,需要配置证书。配置步骤:本地数据库访问https配置
访问链接以验证连通性。输出结果可能为空,不报错就算成功了。
SELECT UTL_HTTP.REQUEST('https://api.deepseek.com') RESPONSE from dual;
把模型的调用信息存储到数据表,一个模型接口注册一次即可,不用重复注册。
- provider:模型提供商名称,自定义的字符串
- endpoint:模型调用的API接口
- auth:调用API的密钥字符串(注意:这里会明文存储,注意保护数据安全)
- https_wallet_path:访问https的证书wallet地址,本地数据库访问https需要配置此项
- https_wallet_password:访问https的证书wallet的密码,本地数据库访问https需要配置此项(注意:这里会明文存储,注意保护数据安全)
- response_parse_path:请根据AI服务的API文档来指定。指定如何解析API响应的json数据来获取回答,形式是json解析路径,参考文档:SQL/JSON Path Expressions
- request_template:请根据AI服务的API文档来指定。请求数据的json模板,占位符如
<MODEL>
应为大写,在处理时会被变量替换。此例中deepseek的API与OpenAI的API格式一致,可替换的参数包括:- MODEL:模型型号
- TEMPERATURE:温度,大语言模型的生成参数,不填时默认值为0
- MAX_TOKENS:最大token数,大语言模型的生成参数
- STOP:停止符,大语言模型的生成参数
BEGIN
-- CUSTOM_CLOUD_AI.UNREGIST_MODEL(provider => 'deepseek');
CUSTOM_CLOUD_AI.REGIST_MODEL(
provider => 'deepseek',
endpoint => 'https://api.deepseek.com/chat/completions',
auth => 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx',
https_wallet_path => '/home/oracle/admin/mydb/my_wallet',
https_wallet_password => 'mypwd_1234',
response_parse_path => '$.choices[0].message.content',
request_template => '{
"model": "<MODEL>",
"temperature": <TEMPERATURE>,
"max_tokens": <MAX_TOKENS>,
"stop": <STOP>,
"stream": false,
"messages": [
{"role": "system","content": "You are a data analyst who is proficient in Oracle SQL."},
{"role": "user","content": "<CONTENT>"}
]
}'
);
END;
/
创建模型配置,以指定如何使用 LLM 以及包含哪些数据表。
可以在1个模型上创建多个配置,以指定不同的数据、提示词等灵活变更的信息。
- profile_name:配置名称,自定义字符串
- description:文字描述,记录此配置的描述信息
- attributes:json格式指定的配置属性,其中:
- provider:必填,注册模型时指定的模型供应商
- model:必填,模型型号,将替换上文中的
MODEL
- object_list:必填,查询的数据范围,指定表或视图
- temperature:非必填,将替换上文中的
TEMPERATURE
- max_tokens:非必填,将替换上文中的
MAX_TOKENS
- stop_tokens:非必填,将替换上文中的
STOP
BEGIN
-- CUSTOM_CLOUD_AI.DROP_PROFILE(profile_name => 'DEEPSEEK_LIANG');
CUSTOM_CLOUD_AI.CREATE_PROFILE(
profile_name =>'DEEPSEEK_LIANG',
description => 'Use public LLM AI API in Oracle Database',
attributes => '{
"provider": "deepseek",
"model" : "deepseek-coder",
"object_list": [{"owner": "DEMO", "name": "SALES"},
{"owner": "DEMO", "name": "CUSTOMERS"}
]
}');
-- below variables are option
-- "temperature": 0,
-- "max_tokens":512,
-- "stop_tokens": [";"],
END;
/
这是一个可选操作,程序包中已包含默认的提示词模板,执行create_profile
时会创建默认模板
- profile_name:上文创建的配置名称
- template_name:可选参数:
prompt_template
:向AI提问的提示词模板prompt_ddl
:构造数据表信息的文本模板
- template_value:提示词模板的内容,注意要保持占位符不变,否则后续替换会出问题。
BEGIN
CUSTOM_CLOUD_AI.UPDATE_PROFILE_PROMPT(
profile_name =>'DEEPSEEK_LIANG',
template_name => 'prompt_template',
template_value => '....');
END;
/
默认prompt_template
提示词模板,占位符包括:
<prompt>
:用户提出的问题<table_infos>
:从元数据中组合出的表信息<Example SQL pitch>
:示例查询,此项目的是根据用户提问搜索相似问题的答案,暂时没有起作用
You are a data analyst who is proficient in Oracle SQL.
Task Description: Generate Oracle SQL queries based on the provided context table schema (DDL).
Context:
Table Names and Table DDL: [Provide the Data Definition Language (DDL) for the table, including column names, data types, and comments. Column comments explain the meaning of each column.]
Input:
User Question: [Specify the question the user asks, which requires a SQL query response]
Output:
Executable clean SQL Query without text explaination: [Generated SQL query based on the given task]
Additional Instructions/Notes:
1. Ensure that the generated SQL query is syntactically correct and applicable to the provided table schema.
2. Only generate SELECT SQL queries, never answer INSERT, UPDATE, DELETE etc.
3. Generate plain text without markdown formate. Do not write anything else except the SQL query.
4. Always use English column names and Chinese column alias.
5. Select proper columns. Keep the column sort order as above.
6. If the question is too complex to answer, you can think it step by step. For example, the question "How many times do customers who spend more than 1000 spend" can be decomposed into "1. Who are the customers who spend more than 1000, 2. What are the times these users spend" .
Context:Oracle databse tables with their properties
<table_infos>
<Example SQL pitch>
Input:
User Question:<prompt>
Output:
默认prompt_ddl
提示词模板,占位符包括:
<table_comment>
:表注释,用于描述表的信息<schema>
:schema名称<table_name>
:表名称<列信息>
:由列信息组合成的字符串,以列名 数据类型, -- 列注释
形式组合
### Table meaning: <table_comment>
CREATE TABLE <schema>.<table_name> (
column_name data_type,
<column_info>
- 首先应该测试大语言模型的响应,看是否返回预期内容,如果结果出现错误,请查看前述步骤排错
使用英文提问:
SELECT
CUSTOM_CLOUD_AI.CHAT(
profile_name =>'DEEPSEEK_LIANG',
prompt => 'Introduce yourself'
) as RESPONSE
FROM DUAL;
预期结果:
As an AI developed by Deepseek, I''m here to assist with computer
science-related queries......
使用中文提问:
SELECT
CUSTOM_CLOUD_AI.CHAT(
profile_name =>'DEEPSEEK_LIANG',
prompt => '简要介绍一下你自己'
) as RESPONSE
FROM DUAL;
预期结果:
作为一个AI编程助手,我被设计用来回答计算机科学相关的问题......
- 测试提示词是否正确生成
SELECT
CUSTOM_CLOUD_AI.SHOWPROMPT(
profile_name =>'DEEPSEEK_LIANG',
prompt => '告诉我销售总金额'
) as RESPONSE
FROM DUAL;
预期结果:
You are a data analyst who is proficient in Oracle SQL.
Task Description: Generate Oracle SQL queries based on the provided context table schema (DDL).
Context:
Table Names and Table DDL: [Provide the Data Definition Language (DDL) for the table, including column names, data types, and comments. Column comments explain the meaning of each column.]
Input:
User Question: [Specify the question the user asks, which requires a SQL query response]
Output:
Executable clean SQL Query without text explaination: [Generated SQL query based on the given task]
Additional Instructions/Notes:
1. Ensure that the generated SQL query is syntactically correct and applicable to the provided table schema.
2. Only generate SELECT SQL queries, never answer INSERT, UPDATE, DELETE etc.
3. Generate plain text without markdown formate. Do not write anything else except the SQL query.
4. Always use English column names and Chinese column alias.
5. Select proper columns. Keep the column sort order as above.
6. If the question is too complex to answer, you can think it step by step. For example, the question "How many times do customers who spend more than 1000 spend" can be decomposed into "1. Who are the customers who spend more than 1000, 2. What are the times these users spend" .
Context:Oracle databse tables with their properties
### Table meaning: facts table, without a primary key; all rows are uniquely identified by the combination of all foreign keys
column_name data_type,
CREATE TABLE SH.SALES (
PROD_ID NUMBER, -- FK to the products dimension table
CUST_ID NUMBER, -- FK to the customers dimension table
TIME_ID DATE, -- FK to the times dimension table
CHANNEL_ID NUMBER, -- FK to the channels dimension table
PROMO_ID NUMBER, -- promotion identifier, without FK constraint (intentionally) to show outer join optimization
QUANTITY_SOLD NUMBER, -- product quantity sold with the transaction
AMOUNT_SOLD NUMBER, -- invoiced amount to the customer
);
### Table meaning: dimension table
column_name data_type,
CREATE TABLE SH.CUSTOMERS (
CUST_CREDIT_LIMIT NUMBER, -- customer credit limit
CUST_EMAIL VARCHAR2, -- customer email id
CUST_TOTAL VARCHAR2,
CUST_TOTAL_ID NUMBER,
CUST_SRC_ID NUMBER,
CUST_EFF_FROM DATE,
CUST_EFF_TO DATE,
CUST_VALID VARCHAR2,
CUST_ID NUMBER, -- primary key
CUST_FIRST_NAME VARCHAR2, -- first name of the customer
CUST_LAST_NAME VARCHAR2, -- last name of the customer
CUST_GENDER CHAR, -- gender; low cardinality attribute
CUST_YEAR_OF_BIRTH NUMBER, -- customer year of birth
CUST_MARITAL_STATUS VARCHAR2, -- customer marital status; low cardinality attribute
CUST_STREET_ADDRESS VARCHAR2, -- customer street address
CUST_POSTAL_CODE VARCHAR2, -- postal code of the customer
CUST_CITY VARCHAR2, -- city where the customer lives
CUST_CITY_ID NUMBER,
CUST_STATE_PROVINCE VARCHAR2, -- customer geography: state or province
CUST_STATE_PROVINCE_ID NUMBER,
COUNTRY_ID NUMBER, -- foreign key to the countries table (snowflake)
CUST_MAIN_PHONE_NUMBER VARCHAR2, -- customer main phone number
CUST_INCOME_LEVEL VARCHAR2, -- customer income level
);
Input:
User Question:告诉我销售总金额是多少
Output:
- 测试生成SQL
SELECT
CUSTOM_CLOUD_AI.SHOWSQL(
profile_name =>'DEEPSEEK_LIANG',
prompt => '告诉我销售总金额'
) as RESPONSE
FROM DUAL;
预期结果:
SELECT SUM(AMOUNT_SOLD) AS "销售总金额" FROM SH.SALES
- 验证SQL
正确示例:
SELECT
CUSTOM_CLOUD_AI.VALIDSQL('SELECT SUM(AMOUNT_SOLD) AS "销售总金额" FROM SH.SALES'
) as RESPONSE
FROM DUAL;
预期结果:
RESPONSE
OK
返回错误的示例:
SELECT
CUSTOM_CLOUD_AI.VALIDSQL('SELECT SUM(AMOUNT_SOLD_错误) AS "销售总金额" FROM SH.SALES'
) as RESPONSE
FROM DUAL;
预期结果:
RESPONSE
ORA-00904: "AMOUNT_SOLD_错误": invalid identifier
- 执行SQL
第二个参数nrow
指定返回行数,默认是10
SELECT CUSTOM_CLOUD_AI.RUNSQL('SELECT CUST_ID,SUM(AMOUNT_SOLD) AS "销售总金额" FROM SH.SALES GROUP BY CUST_ID') as RESPONSE
FROM DUAL;
指定返回2行
SELECT CUSTOM_CLOUD_AI.RUNSQL('SELECT CUST_ID,SUM(AMOUNT_SOLD) AS "销售总金额" FROM SH.SALES GROUP BY CUST_ID', 2) as RESPONSE
FROM DUAL;
预期结果:
[{"cust_id":12783,"销售总金额":97573.55},{"cust_id":10747,"销售总金额":99578.09}]