Figure out how inference will work #7

Open
0x000011b opened this issue Jan 23, 2023 · 2 comments
Labels
planning Stuff we need to think about

Comments

@0x000011b
Contributor

I don't plan on shelling out money for inference at the moment, so the initial plan is to have users bring their own "inference back-end" with them - likely Colab for now. Some points about this though:

  • Who will be responsible for creating the prompt, sending it off to the inference backend and parsing the resulting generation?
    • The initial plan is to implement that here: the front-end will simply POST user messages to an endpoint and receive responses (maybe via WebSockets? Not sure holding on to a connection for 10+ seconds is a good idea). A rough sketch of this flow is at the end of this comment.
    • Pros:
      • We'll have real-world data on inference requests, which we can use to calculate how much $ it would cost to actually run inference ourselves (I've gotten many users suggesting I open a Patreon to cover hosting expenses - I'm unsure how well that'd pan out, but with real data the decision could be made a little more clearly)
      • We can automatically push new prompting code by just updating the server
    • Cons:
      • Increased server load, since we'll be acting as a proxy for inference requests.
  • How will inference work for group chats? How do we decide which characters should speak and when?
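
To make the first bullet concrete, here's a very rough sketch of what the "server as inference proxy" flow could look like, assuming users bring their own back-end. Flask, the endpoint name, and the response shape of the user's back-end are all placeholders, not decisions:

```python
# Rough sketch of the "server as inference proxy" idea.
# Flask, /chat, and the back-end's {"text": ...} response shape are placeholders.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical URL of the user-supplied inference back-end (e.g. a Colab tunnel).
INFERENCE_BACKEND_URL = "https://example-colab-tunnel.example/generate"


def build_prompt(persona: str, history: list[dict]) -> str:
    """Assemble the character persona and chat history into a single prompt."""
    lines = [persona] + [f"{m['speaker']}: {m['text']}" for m in history]
    lines.append("Bot:")
    return "\n".join(lines)


@app.post("/chat")
def chat():
    body = request.get_json()
    prompt = build_prompt(body["persona"], body["history"])

    # Forward the prompt to the user's back-end and wait for the generation.
    resp = requests.post(INFERENCE_BACKEND_URL, json={"prompt": prompt}, timeout=60)
    resp.raise_for_status()
    generation = resp.json()["text"]

    # Parse the raw generation: keep only the first line (the bot's reply),
    # dropping any hallucinated follow-up turns.
    reply = generation.split("\n")[0].strip()
    return jsonify({"reply": reply})
```
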
@0x000011b 0x000011b added the planning Stuff we need to think about label Jan 23, 2023
@TearGosling

Adding onto the last bullet point, the naive route would be to just let the LLM handle deciding which characters should speak. Of course, this would cause the model to forget who is speaking very quickly. It may be worth looking into datasets with multiple speakers at once, so as to get the model more used to the idea. You could even have some sort of control/sentinel token which dictates whether the LLM should be in "one-on-one" mode or "group chat" mode, which would double as a testing ground for future modules.
If, even after this, the LLM still forgets to bring characters back, you could have a system that keeps track of how many responses have passed since a given character last spoke (counting characters, not users, so that multi-user chats don't break the system), and if that count is above a particular number, "force" the character to speak in the next reply. If multiple bots have the same count, pick randomly from a list.
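
For illustration, that "how long has this character been silent" bookkeeping could be as simple as the sketch below (the threshold and message format are made up, Python just for the example):

```python
import random

# Illustrative sketch of the "force a silent character to speak" heuristic.
# SILENCE_THRESHOLD and the message format are assumptions for the example.
SILENCE_THRESHOLD = 6  # replies since a character last spoke


def pick_forced_speaker(history: list[dict], characters: list[str]) -> str | None:
    """Return a character to force into the next reply, or None if nobody is overdue.

    `history` is the chat log, oldest first; each entry has a "speaker" field.
    Only bot characters are tracked (users are never in `characters`), so
    multi-user chats don't trip the rule.
    """
    silence = {}
    for name in characters:
        # Count how many messages have passed since this character last spoke
        # (the whole history if they never have).
        since = 0
        for msg in reversed(history):
            if msg["speaker"] == name:
                break
            since += 1
        silence[name] = since

    overdue = [name for name in characters if silence[name] >= SILENCE_THRESHOLD]
    if not overdue:
        return None  # let the model decide who speaks next

    # If multiple characters are equally overdue, pick randomly from the list.
    longest = max(silence[name] for name in overdue)
    return random.choice([name for name in overdue if silence[name] == longest])
```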

My biggest concern, though, is not the bot forgetting characters, but it introducing new ones out of nowhere. We may need some sort of system to assist in this regard. Perhaps RLHF would make the model better at keeping track of who is and isn't in the conversation? Unsure.

@conanak99

conanak99 commented Feb 22, 2023

Can I suggest we use the same approach as TavernAI (https://github.com/TavernAI/TavernAI) for inference?

  1. Users need to prepare their own KoboldAI back-end and paste its API URL into a textbox. (They can use Colab or run the back-end locally.)

  2. Paphos will create the prompt, send it off to the Kobold API endpoint provided by the user, and parse the resulting generation (a rough sketch of this call is included after this list).

  3. The front end will simply POST user messages to an endpoint and receive responses. No WebSocket is needed; the front end can just poll every 2-4 seconds to see whether the inference has finished.
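
For reference, step 2 boils down to a single HTTP call against the user-provided Kobold URL. The payload and response fields below follow KoboldAI's /api/v1/generate route as far as I know - worth double-checking against the Kobold API docs:

```python
import requests

# Rough sketch of step 2: calling a user-supplied KoboldAI back-end.
# Field names follow KoboldAI's /api/v1/generate route as I understand it;
# treat them as assumptions to verify against the Kobold API docs.

def generate_via_kobold(kobold_url: str, prompt: str, max_length: int = 120) -> str:
    """Send the assembled prompt to the user's Kobold back-end and return the raw text."""
    resp = requests.post(
        f"{kobold_url.rstrip('/')}/api/v1/generate",
        json={"prompt": prompt, "max_length": max_length},
        timeout=120,
    )
    resp.raise_for_status()
    # Kobold returns a list of generations; take the first one.
    return resp.json()["results"][0]["text"]
```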
