Inferless
Popular repositories
- triton-co-pilot Public
Generate glue code in seconds to simplify your NVIDIA Triton Inference Server deployments.
- qwq-32b-preview Public template
A 32B experimental reasoning model for advanced text generation and robust instruction following. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
- whisper-large-v3 Public template
State-of-the-art speech recognition model for English, delivering accurate transcription across diverse audio scenarios. <metadata> gpu: T4 | collections: ["CTranslate2"] </metadata>
- deepseek-r1-distill-qwen-32b Public template
A distilled DeepSeek-R1 variant built on Qwen2.5-32B, fine-tuned with curated data for enhanced performance and efficiency. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
Repositories
- qwen3-embedding-0.6b Public template
A 600M-parameter embedding model covering 100 languages that turns inputs of up to 32k tokens into instruction-aware vectors. <metadata> gpu: A10 | collections: ["HF_Transformers"] </metadata>
- devstral-small Public template
An agentic LLM for software engineering tasks that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. <metadata> gpu: A100 | collections: ["HF_Transformers"] </metadata>
- deepseek-r1-qwen3-8b Public template
A distilled 8B-parameter reasoning model that leverages deep chain-of-thought from DeepSeek R1-0528, delivering SOTA open-source performance. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
- nanonets-ocr-s Public template
Nanonets-OCR-s turns images or PDFs into structured Markdown, capturing tables, LaTeX, captions, and tags for fast, human-readable OCR. <metadata> gpu: A10 | collections: ["HF_Transformers"] </metadata>
- Open-NotebookLM Public
- yolo11m-detect Public
- kokoro Public template
A lightweight 82M-parameter text-to-speech (TTS) model that delivers high-quality voice synthesis. <metadata> gpu: T4 | collections: ["SSE Events"] </metadata>
- qwen3-14b Public template
A 14B model with a hybrid approach to problem-solving and two distinct modes: "thinking mode," which enables step-by-step reasoning, and "non-thinking mode," designed for rapid, general-purpose responses. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
- qwen2.5-omni-7b Public template
An advanced end-to-end multimodal model that processes text, images, audio, and video inputs and generates real-time text and natural speech responses. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
- qwen3-8b Public template
Qwen3-8B is a language model that supports seamless switching between “thinking” mode (for advanced math, coding, and logical inference) and “non-thinking” mode (for fast, natural conversation). <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>