running as a service #131

lfoppiano · 2024-12-05T12:54:05Z

Hi, first of all, thank you for this tool; it's a very useful and interesting approach to running models on limited resources.

I was wondering whether you have any plans to add a way to run it as a service, so that the whole model is not reloaded every time a new prompt is provided. Something like llama-server?

I tried running a model quantized for BitNet with llama-server, but the two do not seem to be compatible. Do you have any comments or suggestions?

Thank you in advance
Luca

celsowm · 2025-04-16T10:57:56Z

Any news about it?

caramdache · 2025-04-23T19:46:41Z

Look here: #206 (comment)
