Commit b4f6684

ctrlz526 and c121914yu authored
New pic (#4858)
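* Update dataset-related types, adding support for image file IDs and preview URLs; improve dataset import and add an image dataset handling component; fix some i18n text; update the file upload logic to support the new features.
* Differences from the original code.
* Add the V4.9.10 release notes: support setting the `systemEnv.hnswMaxScanTuples` parameter for PG, improve LLM stream call timeouts, and fix multi-knowledge-base ordering in full-text search. Also update the dataset indexes, removing the datasetId field to simplify queries.
* Switch to the fileId_image logic and add training-queue matching logic.
* Add an image-collection check and streamline preview URL generation so that preview URLs are generated only when the dataset is an image collection; add related log output for debugging.
* Refactor Docker Compose configuration to comment out exposed ports for production environments, update image versions for pgvector, fastgpt, and mcp_server, and enhance Redis service with a health check. Additionally, standardize dataset collection labels in constants and improve internationalization strings across multiple languages.
* Enhance TrainingStates component by adding internationalization support for the imageParse training mode and update defaultCounts to include imageParse mode in trainingDetail API.
* Enhance dataset import context by adding additional steps for image dataset import process and improve internationalization strings for modal buttons in the useEditTitle hook.
* Update DatasetImportContext to conditionally render MyStep component based on data source type, improving the import process for non-image datasets.
* Refactor image dataset handling by improving internationalization strings, enhancing error messages, and streamlining the preview URL generation process.
* Upload images to the newly created dataset_collection_images table and adjust the related logic accordingly.
* Fix issues in the parts other than the controller.
* Consolidate the image dataset logic into the controller.
* Add missing i18n strings.
* Add missing i18n strings.
* Resolve review comments: mainly upload logic changes and component reuse.
* Show an icon next to image names.
* Fix naming issues that caused compile errors.
* Remove the unneeded collectionId parts.
* Clean up redundant files and change a delete button.
* Resolve everything except loading and the unified imageId.
* Fix icon errors.
* Reuse MyPhotoView and rename imageFileId to imageId via replace-all.
* Revert unnecessary file changes.
* Fix errors and field changes.
* Add logic to delete temporary files after a successful upload, and revert some changes.
* Remove the path field, store images in GridFS, and update the create/delete and related code.
* Fix compile errors.
---------
Co-authored-by: archer <[email protected]>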
1 parent 8ed35ff commit b4f6684

60 files changed, +2529 -283 lines changed

packages/global/common/file/icon.ts

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,8 @@ export const fileImgs = [
   { suffix: '(doc|docs)', src: 'file/fill/doc' },
   { suffix: 'txt', src: 'file/fill/txt' },
   { suffix: 'md', src: 'file/fill/markdown' },
-  { suffix: 'html', src: 'file/fill/html' }
+  { suffix: 'html', src: 'file/fill/html' },
+  { suffix: '(jpg|jpeg|png|gif|bmp|webp|svg|ico|tiff|tif)', src: 'image' }

   // { suffix: '.', src: '/imgs/files/file.svg' }
 ];
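
The new entry routes common image extensions to the `image` icon. The diff does not show how `getFileIcon` consumes this table, so the following is only a minimal sketch of one plausible lookup, assuming each `suffix` is treated as a regex fragment matched against the file extension. The names `iconRules`, `resolveIcon`, and the fallback icon `'file/fill/file'` are hypothetical.

```typescript
// Hypothetical stand-in for a suffix-to-icon table shaped like `fileImgs`.
type FileIconRule = { suffix: string; src: string };

const iconRules: FileIconRule[] = [
  { suffix: 'md', src: 'file/fill/markdown' },
  { suffix: 'html', src: 'file/fill/html' },
  { suffix: '(jpg|jpeg|png|gif|bmp|webp|svg|ico|tiff|tif)', src: 'image' }
];

// Assumption: `suffix` is a regex fragment anchored to the end of the file name,
// matched case-insensitively. The real matching strategy is not shown in the diff.
function resolveIcon(name: string, fallback = 'file/fill/file'): string {
  const rule = iconRules.find((item) => new RegExp(`\\.${item.suffix}$`, 'i').test(name));
  return rule?.src ?? fallback;
}
```

Under these assumptions, `resolveIcon('photo.PNG')` resolves to `'image'`, while a name with an unlisted extension falls back to the default icon.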

packages/global/core/dataset/api.d.ts

Lines changed: 3 additions & 0 deletions
@@ -58,6 +58,7 @@ export type CreateDatasetCollectionParams = DatasetCollectionStoreDataType & {
   hashRawText?: string;

   tags?: string[];
+  imageIdList?: string[];

   createTime?: Date;
   updateTime?: Date;
@@ -127,6 +128,7 @@ export type PgSearchRawType = {
 export type PushDatasetDataChunkProps = {
   q: string; // embedding content
   a?: string; // bonus content
+  imageId?: string; //file id preview
   chunkIndex?: number;
   indexes?: Omit<DatasetDataIndexItemType, 'dataId'>[];
 };
@@ -151,4 +153,5 @@ export type PushDatasetDataProps = {
 };
 export type PushDatasetDataResponse = {
   insertLen: number;
+  message?: string;
 };

packages/global/core/dataset/constants.ts

Lines changed: 14 additions & 3 deletions
@@ -77,7 +77,8 @@ export enum DatasetCollectionTypeEnum {
   file = 'file',
   link = 'link', // one link
   externalFile = 'externalFile',
-  apiFile = 'apiFile'
+  apiFile = 'apiFile',
+  images = 'images'
 }
 export const DatasetCollectionTypeMap = {
   [DatasetCollectionTypeEnum.folder]: {
@@ -97,6 +98,9 @@ export const DatasetCollectionTypeMap = {
   },
   [DatasetCollectionTypeEnum.apiFile]: {
     name: i18nT('common:core.dataset.apiFile')
+  },
+  [DatasetCollectionTypeEnum.images]: {
+    name: i18nT('dataset:core.dataset.Image collection')
   }
 };

@@ -120,6 +124,7 @@ export const DatasetCollectionSyncResultMap = {
 export enum DatasetCollectionDataProcessModeEnum {
   chunk = 'chunk',
   qa = 'qa',
+  imageParse = 'imageParse',
   backup = 'backup',

   auto = 'auto' // abandon
@@ -133,6 +138,10 @@ export const DatasetCollectionDataProcessModeMap = {
     label: i18nT('common:core.dataset.training.QA mode'),
     tooltip: i18nT('common:core.dataset.import.QA Import Tip')
   },
+  [DatasetCollectionDataProcessModeEnum.imageParse]: {
+    label: i18nT('dataset:training.Image mode'),
+    tooltip: i18nT('common:core.dataset.import.Chunk Split Tip')
+  },
   [DatasetCollectionDataProcessModeEnum.backup]: {
     label: i18nT('dataset:backup_mode'),
     tooltip: i18nT('dataset:backup_mode')
@@ -172,14 +181,16 @@ export enum ImportDataSourceEnum {
   fileCustom = 'fileCustom',
   externalFile = 'externalFile',
   apiDataset = 'apiDataset',
-  reTraining = 'reTraining'
+  reTraining = 'reTraining',
+  imageDataset = 'imageDataset'
 }

 export enum TrainingModeEnum {
   chunk = 'chunk',
   qa = 'qa',
   auto = 'auto',
-  image = 'image'
+  image = 'image',
+  imageParse = 'imageParse'
 }

 /* ------------ search -------------- */
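
Each new enum member here comes with a matching entry in its label map. The pairing can be sketched as follows, using a trimmed, hypothetical copy of the process-mode enum and an identity stand-in for `i18nT` (the real helper is not shown in this diff). Typing the map as `Record<ProcessMode, ...>` is an assumption of this sketch, not something the commit does; it makes the compiler reject an enum member that lacks a map entry.

```typescript
// Trimmed, hypothetical copy of the process-mode enum from the diff.
enum ProcessMode {
  qa = 'qa',
  imageParse = 'imageParse',
  backup = 'backup'
}

// Stand-in for the project's i18nT helper: returns the key unchanged (assumption).
const i18nT = (key: string): string => key;

// Record<ProcessMode, ...> forces one entry per enum member at compile time.
const processModeMap: Record<ProcessMode, { label: string; tooltip: string }> = {
  [ProcessMode.qa]: {
    label: i18nT('common:core.dataset.training.QA mode'),
    tooltip: i18nT('common:core.dataset.import.QA Import Tip')
  },
  [ProcessMode.imageParse]: {
    label: i18nT('dataset:training.Image mode'),
    tooltip: i18nT('common:core.dataset.import.Chunk Split Tip')
  },
  [ProcessMode.backup]: {
    label: i18nT('dataset:backup_mode'),
    tooltip: i18nT('dataset:backup_mode')
  }
};

// Look up the display strings for a mode.
function describeProcessMode(mode: ProcessMode): string {
  const { label, tooltip } = processModeMap[mode];
  return `${label} (${tooltip})`;
}
```

Note that, as in the diff, the `imageParse` entry reuses the chunk-split tooltip key rather than a dedicated one.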

packages/global/core/dataset/controller.d.ts

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,7 @@ export type CreateDatasetDataProps = {
   chunkIndex?: number;
   q: string;
   a?: string;
+  imageId?: string;
   indexes?: Omit<DatasetDataIndexItemType, 'dataId'>[];
 };

@@ -19,6 +20,7 @@ export type UpdateDatasetDataProps = {
   indexes?: (Omit<DatasetDataIndexItemType, 'dataId'> & {
     dataId?: string; // pg data id
   })[];
+  imageId?: string;
 };

 export type PatchIndexesProps =
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+export type DatasetImageSchema = {
+  _id: string;
+  teamId: string;
+  datasetId: string;
+  collectionId?: string;
+  name: string;
+  contentType: string;
+  size: number;
+  metadata?: Record<string, any>;
+  expiredTime?: Date;
+  createdAt: Date;
+  updatedAt: Date;
+};
+
+// API request parameter types
+export type UploadDatasetImageProps = {
+  datasetId: string;
+  collectionId?: string;
+};

packages/global/core/dataset/type.d.ts

Lines changed: 3 additions & 1 deletion
@@ -143,6 +143,7 @@ export type DatasetDataSchemaType = {
   updateTime: Date;
   q: string; // large chunks or question
   a: string; // answer or custom content
+  imageId?: string;
   history?: {
     q: string;
     a: string;
@@ -179,6 +180,7 @@ export type DatasetTrainingSchemaType = {
   dataId?: string;
   q: string;
   a: string;
+  imageId?: string;
   chunkIndex: number;
   indexSize?: number;
   weight: number;
@@ -254,10 +256,10 @@ export type DatasetDataItemType = {
   sourceId?: string;
   q: string;
   a: string;
+  imageId?: string;
   chunkIndex: number;
   indexes: DatasetDataIndexItemType[];
   isOwner: boolean;
-  // permission: DatasetPermission;
 };

 /* --------------- file ---------------------- */

packages/global/core/dataset/utils.ts

Lines changed: 8 additions & 1 deletion
@@ -1,4 +1,8 @@
-import { TrainingModeEnum, DatasetCollectionTypeEnum } from './constants';
+import {
+  TrainingModeEnum,
+  DatasetCollectionTypeEnum,
+  DatasetCollectionDataProcessModeEnum
+} from './constants';
 import { getFileIcon } from '../../common/file/icon';
 import { strIsLink } from '../../common/string/tools';

@@ -15,6 +19,9 @@ export function getCollectionIcon(
   if (type === DatasetCollectionTypeEnum.virtual) {
     return 'file/fill/manual';
   }
+  if (type === DatasetCollectionTypeEnum.images) {
+    return 'image';
+  }
   return getFileIcon(name);
 }
 export function getSourceNameIcon({
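
The added branch means an image collection gets the `image` icon from its type alone, before any name-based lookup runs. A minimal sketch of that precedence, assuming a simplified signature: the full parameter list of `getCollectionIcon` is truncated in the hunk, so `pickCollectionIcon`, `CollectionType`, and `getFileIconStub` are hypothetical stand-ins.

```typescript
// Hypothetical, trimmed stand-in for DatasetCollectionTypeEnum.
enum CollectionType {
  file = 'file',
  virtual = 'virtual',
  images = 'images'
}

// Stub for the name-based lookup; the real getFileIcon is not shown in the diff.
const getFileIconStub = (name: string): string =>
  name.toLowerCase().endsWith('.md') ? 'file/fill/markdown' : 'file/fill/file';

// Type-based icons take precedence; the file name is consulted only as a fallback.
function pickCollectionIcon(type: CollectionType, name = ''): string {
  if (type === CollectionType.virtual) return 'file/fill/manual';
  if (type === CollectionType.images) return 'image';
  return getFileIconStub(name);
}
```

With this ordering, an image collection that happens to be named `notes.md` still shows the image icon.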

packages/service/core/ai/model.ts

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,10 @@ export const getVlmModel = (model?: string) => {
     ?.find((item) => item.model === model || item.name === model);
 };

+export const getVlmModelList = () => {
+  return Array.from(global.llmModelMap.values())?.filter((item) => item.vision) || [];
+};
+
 export const getDefaultEmbeddingModel = () => global?.systemDefaultModel.embedding!;
 export const getEmbeddingModel = (model?: string) => {
   if (!model) return getDefaultEmbeddingModel();
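
`getVlmModelList` filters the registered LLM models down to those flagged with `vision`. The same filter can be sketched in isolation as below, using a local map instead of `global.llmModelMap`. The `ModelItem` shape and the model names are hypothetical; only the `model`, `name`, and `vision` fields appear in the diff.

```typescript
// Hypothetical minimal model shape; the real type has more fields.
type ModelItem = { model: string; name: string; vision?: boolean };

// Local stand-in for global.llmModelMap, with made-up model names.
const llmModelMap = new Map<string, ModelItem>([
  ['text-only-model', { model: 'text-only-model', name: 'Text only' }],
  ['vision-model-a', { model: 'vision-model-a', name: 'Vision A', vision: true }],
  ['vision-model-b', { model: 'vision-model-b', name: 'Vision B', vision: true }],
  ['vision-off-model', { model: 'vision-off-model', name: 'Vision off', vision: false }]
]);

// Keep only the models whose `vision` flag is truthy, preserving insertion order.
const listVisionModels = (models: Map<string, ModelItem>): ModelItem[] =>
  Array.from(models.values()).filter((item) => Boolean(item.vision));
```

`Array.from` always returns an array and `filter` always returns one too, so this sketch omits the optional chaining and the `|| []` fallback used in the commit; neither guards against the map itself being undefined.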
