[RFC] 008 - Lobe Chat Midjourney 插件 #408
Replies: 7 comments 5 replies
-
Beta Was this translation helpful? Give feedback.
-
🚧 技术方案MJ 的连接逻辑无论是哪个库,代理 Midjourney 生成图片的方案基本上只有一种,即模拟 discord 用户发送请求。 因此需要知道三个必要参数:
三个信息的获取方式: 目前看应该 midjourney-proxy 做的算是相当完善了,可以直接部署 docker 服务。 插件实现插件初始化由于该 MJ 插件将会是所有人都可以自部署的版本,因此要同时考虑两个场景:
因此在插件设置上需要考虑支持让用户自行指定采用填入三项参数,和填写已经部署好的服务端。 关于服务鉴权部分,需要在 manifest 层面提供三项参数和 midjourney-proxy 服务端的 URL 配置。 在上述推演下,两个版本的用户的操作链路:
插件描述Settings:
|
Beta Was this translation helpful? Give feedback.
-
GPT 桥接 MJ Promptsmidjourney-prompt-generatorrefs: https://github.com/jesselau76/GPT-Prompts/tree/main/midjourney-prompt-generator I would like you to act as a prompt generator for an image-generating AI called Midjourney. You'll also act as a professional photographer's assistant and provide key elements to consider when taking photos of any object or scene, or help recommend suitable reputable photographers. Your task is to generate appropriate prompts under various circumstances to guide the AI in creating the desired image.
At any point, I can send you one of the following commands to which you will respond with the desired output:
"""
/rs
# Generates 5 random photograph scene, such as "A beautiful Chinese woman standing on a Tokyo street, black long hair, dress, sunny day.", translate each to Chinese as well but keep the result in English for further use.
/rs "[style]"
# Generate 5 scenes that are suitable for the provided [style] and followed by the [style]., such as "A cyberpunk cityscape at night, glowing neon signs, rain-soaked streets, dark synth style.", translate each to Chinese as well but keep the result in English for further use.
# An example prompt is "A serene Buddhist temple nestled in a lush, green forest, paper cut craft"
/s "[scene]"
# Returns 5 prompts, each with [scene] followed by a random selection of an appropriate art style. And then translate each to Chinese as well.
# The art style is like "isometric anime, analytic drawing, infographic drawing, coloring book, diagrammatic drawing, diagrammatic portrait, double exposure, 2D illustration, isometric illustration, pixel art, futuristic style, ornamental watercolour, dark fantasy, paper cut craft, paper quilling, patchwork collage, iridescent, ukiyo-e art, watercolour landscape, op art, Japanese ink, pastel drawing, dripping art, stained glass portrait, graffiti portrait, winter oil painting, anime portrait, cinematographic style, typography art, one-line drawing, polaroid photo, tattoo art." etc., but the list is not limited to these styles.
# An example prompt is [scene],paper quilling
/s [number]
# This command acts as /s "[result number of /rs]".
/load "[scene]"
# Returns a prompt with key elements used in taking a photograph with the [scene] that the load command described.
# The key elements should include the most appropriate camera model.
# Each key element should be separated by a comma.
# An example prompt is [scene],hyper realistic portrait photography, pale skin, dress, wide shot, natural lighting, kodak portra 800, 105 mm f1. 8, 32k
# The prompt should be printed in plain text.
# Your prompts should be creative and relevant to the subject provided by the user, offering specific details and context to guide the AI in generating the desired image.
/load [number]
# This command acts as /load "[result number of /rs]".
/pg "[scene]"
# This command generate a string with the input and the most appropriate world famous photographer's name, like "david lachapelle style"
/pg [number]
# This command acts as /pg "[result number of /rs]".
/lookinglike
# This command generate 5 strings with "looking like" a famous actors' name, such as "A Chinese woman, looking like Audrey Hepburn"
/color [color scheme]
# Generate 5 scenes incorporating the specified color scheme. And then translate each to Chinese as well.
/mood [mood]
# Generate 5 scenes with the specified mood. And then translate each to Chinese as well.
/time [time of day]
# Generate 5 scenes set during the specified time of day. And then translate each to Chinese as well.
Please confirm that you understand the task by replying with "Acknowledged." I will then send you the first command. chat-gpt-prompts-midjourney-generatorrefs: https://fullstackladder.dev/blog/2023/02/13/chat-gpt-prompts-midjourney-generator/
解析器refs: https://fullstackladder.dev/blog/2023/02/13/chat-gpt-prompts-midjourney-analyzer/
|
Beta Was this translation helpful? Give feedback.
-
Midjourney Proxy API文档简介:Midjourney Proxy API文档 Version:v2.5.4 接口路径:/v2/api-docs?group=API [TOC] 任务提交提交Imagine任务接口地址: 请求方式: 请求数据类型: 响应数据类型: 接口描述: 请求示例: {
"base64Array": [],
"notifyHook": "",
"prompt": "Cat",
"state": ""
} 请求参数:
响应状态:
响应参数:
响应示例: {
"code": 1,
"description": "提交成功",
"properties": {},
"result": 1320098173412546
} 提交Describe任务接口地址: 请求方式: 请求数据类型: 响应数据类型: 接口描述: 请求示例: {
"base64": "",
"notifyHook": "",
"state": ""
} 请求参数:
响应状态:
响应参数:
响应示例: {
"code": 1,
"description": "提交成功",
"properties": {},
"result": 1320098173412546
} 绘图变化-simple接口地址: 请求方式: 请求数据类型: 响应数据类型: 接口描述: 请求示例: {
"content": "1320098173412546 U2",
"notifyHook": "",
"state": ""
} 请求参数:
响应状态:
响应参数:
响应示例: {
"code": 1,
"description": "提交成功",
"properties": {},
"result": 1320098173412546
} 任务查询查询所有任务接口地址: 请求方式: 请求数据类型: 响应数据类型: 接口描述: 请求参数: 暂无 响应状态:
响应参数:
响应示例: [
{
"action": "",
"description": "",
"failReason": "",
"finishTime": 0,
"id": "",
"imageUrl": "",
"progress": "",
"prompt": "",
"promptEn": "",
"properties": {},
"startTime": 0,
"state": "",
"status": "",
"submitTime": 0
}
] 根据ID列表查询任务接口地址: 请求方式: 请求数据类型: 响应数据类型: 接口描述: 请求示例: {
"ids": []
} 请求参数:
响应状态:
响应参数:
响应示例: [
{
"action": "",
"description": "",
"failReason": "",
"finishTime": 0,
"id": "",
"imageUrl": "",
"progress": "",
"prompt": "",
"promptEn": "",
"properties": {},
"startTime": 0,
"state": "",
"status": "",
"submitTime": 0
}
] 查询任务队列接口地址: 请求方式: 请求数据类型: 响应数据类型: 接口描述: 请求参数: 暂无 响应状态:
响应参数:
响应示例: [
{
"action": "",
"description": "",
"failReason": "",
"finishTime": 0,
"id": "",
"imageUrl": "",
"progress": "",
"prompt": "",
"promptEn": "",
"properties": {},
"startTime": 0,
"state": "",
"status": "",
"submitTime": 0
}
] 指定ID获取任务接口地址: 请求方式: 请求数据类型: 响应数据类型: 接口描述: 请求参数:
响应状态:
响应参数:
响应示例: {
"action": "",
"description": "",
"failReason": "",
"finishTime": 0,
"id": "",
"imageUrl": "",
"progress": "",
"prompt": "",
"promptEn": "",
"properties": {},
"startTime": 0,
"state": "",
"status": "",
"submitTime": 0
} 账号查询查询所有账号接口地址: 请求方式: 请求数据类型: 响应数据类型: 接口描述: 请求参数: 暂无 响应状态:
响应参数:
响应示例: [
{
"channelId": "",
"coreSize": 0,
"enable": true,
"guildId": "",
"id": "",
"properties": {},
"queueSize": 0,
"timeoutMinutes": 0,
"userAgent": "",
"userToken": ""
}
] 指定ID获取账号接口地址: 请求方式: 请求数据类型: 响应数据类型: 接口描述: 请求参数:
响应状态:
响应参数:
响应示例: {
"channelId": "",
"coreSize": 0,
"enable": true,
"guildId": "",
"id": "",
"properties": {},
"queueSize": 0,
"timeoutMinutes": 0,
"userAgent": "",
"userToken": ""
} |
Beta Was this translation helpful? Give feedback.
-
现在mj v6提示词和之前完全不同了,可以像使用dalle3一样使用自然语言。因此还是可以考虑使用gpt内置提示词来进行联想 |
Beta Was this translation helpful? Give feedback.
-
20240111 进度更新 已初步跑通 MJ 插件主流程: 但在做的过程中发现由于现有插件机制限制,似乎在交互上没有足够好的方式可以满足 MJ 插件的能力。可能做出来之后基础使用体验不见得比 Discord 上的 MJ 好。因此发布估计暂时要 hold 住,得再想想体验上如何进一步优化。 |
Beta Was this translation helpful? Give feedback.
-
🥳 🥳 Midjourney 插件 1.0 正式发布! 🥳 🥳 https://twitter.com/lobehub/status/1748001375987126471?s=61&t=3pwIhCsSTyD4gzX3IHR06Q mj.mp4期待大家使用反馈! |
Beta Was this translation helpful? Give feedback.
-
背景
在普通会话中直接聚合 Midjourney 的会话服务是很多用户的诉求。事实上相比起 discord 直接使用 prompt 生成图片,在与 AI 会话的过程中生成图像会更加自然。想必大家已经感受过 Dall·E 3 结合 GPT-4 的惊艳效果了。
同时很早就有用户提到希望 LobeChat 能够支持上 Midjourney 插件:
正如我的回复 「感觉接入mj,不如等一等 dalle 3 的开放」,LobeChat 本身是不会考虑直接接入 Midjourney 的,图形相关能力的集成与支持我更倾向使用 dalle 3。
但从用户需求的体量和强烈度、媒体推广、丰富插件生态角度来考虑,Midjourney 的插件值得一做。
因此本篇 RFC 将来讨论下 Midjourney LobeChat 插件的产品、设计、开发的思路。
产品功能思路
目前市面上有不少已经集成 mj 的 chat 客户端,例如:
上述客户端基本上都直接复刻了 mj 交互,在我看来这并不友好:
既然要做 MJ 插件,就干脆做一个产品体验更好的插件,解决一些现有的 MJ 使用痛点:
以下是三个痛点的解决思路:
GPT 转译
使用 GPT 将自然语言转为 Midjourney 的关键词的链路已经很常见了,网上有很多相关教程和介绍,但现在还需要在两个应用之间来回跳转,比较麻烦。像 Dalle 3 的使用体验明显更加顺畅。
其中最重要的应该是 Prompt 的调制。这个可能需要一些调试。关于 prompt 的调制,开一个 comment 7477257 来研究。
Gallery
功能上对小白用户友好的方案是:用户先选风格,然后再直接输入自己的自然语言描述。
即除了基础的输入框之外,我们还需要提供一个 mj 出图风格的 gallery 参考。当用户选择了某个参考图之后,我们会自动为其添加相应的prompt 垫在底下。
正巧最近刚看到 catjourney (@歸藏 出品)
如果能与插件结合到一起,使用 mj 绘图的体验和顺滑度会上一大个台阶。
类似的 gallery 目前看到的还有:
Prompts 词库
在 Gallery 基础上,针对专业的 MJ 用户,他们可能需要的是一个可以自由组合和参考的单词包,以最大程度控制画面的表现。
比较典型的参考有:
在交互功能层面,更像是 单个 prompt 本身的 gallery 和效果参考。
技术基础
实现 MJ 调用的方案/库:
其他:
插件兼容性与扩展性,考虑下是否做成兼容支持 OpenJourney : https://openjourneybot.com/
Beta Was this translation helpful? Give feedback.
All reactions