Fine-tune Llama 4, Qwen 3.5, DeepSeek-R1, and StarCoder2 on your documents. Get a deployable GGUF or a hosted API endpoint — no infrastructure required.
Use documents already in your CaveauAI corpus, upload new files (PDF, DOCX, TXT), or paste Q&A pairs in ShareGPT JSONL format. Enable Auto Q&A Generation and we extract pairs for you.
Choose a proven model card or paste a Hugging Face model ID. We map it to the right GPU class, VRAM, tuning method, and license path before the job starts.
Download the GGUF file when complete. Optionally deploy to our hosted LiteLLM API with one click — your custom endpoint, ready in minutes.
All models fine-tuned with QLoRA on Unsloth — the fastest open-source training stack. Output: GGUF Q5_K_M for Ollama.
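For the curious, here is a minimal sketch of what a QLoRA fine-tune on the Unsloth stack can look like. The base model ID, file names, and hyperparameters below are illustrative assumptions rather than our production pipeline settings, and the exact trainer arguments vary with Unsloth and TRL versions.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the base model with 4-bit quantized weights (the "Q" in QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",  # illustrative base model
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach small trainable LoRA adapters; the 4-bit base weights stay frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumes train.jsonl already holds one formatted prompt/response string
# per record under a "text" field.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()

# Export merged weights as a quantized GGUF for Ollama and other runtimes.
model.save_pretrained_gguf("model", tokenizer, quantization_method="q5_k_m")
```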
Qwen2.5 7B / 14B Instruct
Alibaba Cloud / Qwen Team · China
Hardware: 7B: 12-24GB VRAM. 14B: 24GB+ VRAM.
Processing: ~2-5h
DeepSeek AI · China
Hardware: 8B: 24GB VRAM recommended. 14B: 24GB+ VRAM.
Processing: ~2-6h
Meta · United States
Hardware: Requires large-GPU planning; often served via quantized or multi-GPU routes.
Processing: Scoped after access review
BigCode / Hugging Face collaboration · International
Hardware: 24GB+ VRAM recommended; quantized routes available.
Processing: ~4-6h
Google DeepMind · US / UK
Hardware: Gemma uses local GPU. Gemini uses API, no local VRAM.
Processing: Scoped by route
You choose · Global
Hardware: Estimated from parameter count, precision, context length, and tuning method.
Processing: Quoted before job submission
Use the selector to estimate model family, license path, processor class, VRAM, and processing time before you order.
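For a rough intuition behind those hardware numbers, a crude lower bound is parameter count times bytes per parameter, plus headroom for adapters, optimizer state, activations, and KV cache. The sketch below is only a sanity check under simplified assumptions; the selector's actual estimate also accounts for context length and tuning method, which is why the card recommendations above run higher.

```python
def vram_lower_bound_gb(params_billions: float, weight_bits: int = 4,
                        overhead_gb: float = 4.0) -> float:
    """Crude lower bound on VRAM for a QLoRA fine-tune.

    params_billions: model size in billions of parameters.
    weight_bits: precision of the frozen base weights (4 for QLoRA, 16 for bf16).
    overhead_gb: flat allowance for LoRA adapters, optimizer state, activations
                 and KV cache; in practice this grows with context length and batch size.
    """
    weights_gb = params_billions * weight_bits / 8  # bytes per parameter = bits / 8
    return weights_gb + overhead_gb

# A 7B model in 4-bit: ~3.5 GB of weights plus overhead -> ~7.5 GB minimum,
# which is why 12-24 GB cards are comfortable for the 7B tier.
print(vram_lower_bound_gb(7))    # 7.5
print(vram_lower_bound_gb(14))   # 11.0 (real 14B jobs want 24 GB+ for headroom)
```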
Buy once, use anytime. Credits never expire.
Credits are non-refundable after use. Jobs deduct credits on submission. Cancelled pending jobs are refunded in full.
Currently: Qwen 3.5 (7B, 14B), DeepSeek-R1 (8B, 14B), Llama 4 Scout & Maverick, Qwen 3.5 Coder 7B, DeepSeek-Coder-V2 Lite 16B, StarCoder2 15B, and Qwen 3.5 72B (Enterprise). More models are added regularly. You can also enter a custom HuggingFace model ID if it's a text-generation model.
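If you are unsure whether a custom model qualifies, one quick check you can run yourself with the huggingface_hub library is to look at the repo's pipeline tag (the model ID below is only an example):

```python
from huggingface_hub import model_info

# Replace with the model ID you plan to submit.
info = model_info("Qwen/Qwen2.5-7B-Instruct")
print(info.pipeline_tag)  # "text-generation" means it meets the requirement
```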
A GGUF Q5_K_M file you can download and load into Ollama, LM Studio, or any GGUF-compatible runtime. Optionally, for 50 credits, we deploy your model to a hosted LiteLLM API endpoint, compatible with the OpenAI API format and active for 30 days.
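Because the hosted endpoint speaks the OpenAI API format, any OpenAI-compatible client can call it. Here is a minimal sketch using the official openai Python package; the base URL, key, and model name are placeholders you would replace with the values shown on your deployment page.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.example.com/v1",  # placeholder endpoint URL
    api_key="YOUR_ENDPOINT_KEY",                      # placeholder key
)

response = client.chat.completions.create(
    model="my-finetuned-model",  # placeholder deployed model name
    messages=[{"role": "user", "content": "Summarise our refund policy in two sentences."}],
)
print(response.choices[0].message.content)
```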
Quick jobs (7–8B models, up to 1,000 Q&A pairs) finish in 2–4 hours on RTX 4090. Standard jobs (14B models, 5,000 pairs) take 4–6 hours. Enterprise jobs (72B on RTX 6000 96GB) run 8–16 hours. You'll receive an email notification when your model is ready.
You can use documents already in your CaveauAI corpus (we extract Q&A pairs automatically with the Auto Q&A Generation add-on), upload new PDF/DOCX/TXT files, or provide a ShareGPT JSONL file directly. Minimum 10 Q&A pairs required; 500–5,000 is the sweet spot for most use cases.
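If you bring your own JSONL, each line is one JSON object in the ShareGPT conversation layout. Below is a small Python sketch that appends one Q&A pair in that shape; the question and answer text are invented purely for illustration.

```python
import json

# One training example: a "conversations" list of alternating human/gpt turns.
pair = {
    "conversations": [
        {"from": "human", "value": "What warranty does the X200 pump carry?"},
        {"from": "gpt", "value": "The X200 carries a two-year limited warranty covering parts and labour."},
    ]
}

# A JSONL file is simply one such object per line.
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```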
For most models, no. A few models, such as Llama 4, are gated on Hugging Face and require you to accept Meta's license and add your HF token in your account settings. We store it securely in our vault and use it only to download that model.
Start with a Starter pack for $29 — 50 credits is enough for a full 7B text model fine-tune on your data.
Open Model Studio