Fine-Tuning as a Service · GPU-backed · Hours, not weeks

Train a model
on your data.
In hours.

Fine-tune Llama 4, Qwen2.5, DeepSeek-R1, and StarCoder2 on your documents. Get a deployable GGUF or a hosted API endpoint — no infrastructure required.

9 · Base models supported · including Llama 4 Scout
Vast · Hugging Face reach · bring a model ID and we check the path
2h · Fastest job time · small text model on a focused dataset
Process

Three steps from data to model

01

Upload your data

Use documents already in your CaveauAI corpus, upload new files (PDF, DOCX, TXT), or paste Q&A pairs in ShareGPT JSONL format. Enable Auto Q&A Generation and we extract pairs for you.
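If you prefer to prepare pairs yourself, a ShareGPT-style JSONL record looks roughly like this — one JSON object per line, with a "conversations" list of alternating human/gpt turns (field names follow the common ShareGPT convention; the question and answer shown are illustrative):

```python
import json

# One training example per JSONL line: a "conversations" list of
# alternating human/gpt turns (common ShareGPT convention).
record = {
    "conversations": [
        {"from": "human", "value": "What is our refund window?"},
        {"from": "gpt", "value": "Refunds are accepted within 30 days of purchase."},
    ]
}

# Serialize one compact JSON object per line.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

Write one such line per Q&A pair; a file with at least 10 lines meets the upload minimum.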

02

Pick your model & hardware

Choose a proven model card or paste a Hugging Face model ID. We map it to the right GPU class, VRAM, tuning method, and license path before the job starts.

03

Deploy your model

Download the GGUF file when complete. Optionally deploy to our hosted LiteLLM API with one click — your custom endpoint, ready in minutes.
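Once the GGUF is downloaded, loading it into Ollama takes only a minimal Modelfile (the filename and system prompt below are illustrative):

```
FROM ./my-model.Q5_K_M.gguf
PARAMETER temperature 0.2
SYSTEM "You are the company knowledge assistant."
```

Then `ollama create my-model -f Modelfile` registers the model locally, and `ollama run my-model` starts a chat session against it.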

Catalog

Choose from the model room

All models are fine-tuned with QLoRA on Unsloth, a fast open-source training stack. Output: GGUF Q5_K_M for Ollama.


Qwen2.5 7B / 14B Instruct

Alibaba Cloud / Qwen Team · China

Source
Parameters: 7.61B or 14.7B dense
Active: All parameters active
Best at: Text, multilingual chat, structured output, light coding
Media: Text in / text out

Hardware: 7B: 12–24GB VRAM. 14B: 24GB+ VRAM.

Processing: ~2-5h

Apache 2.0 · Model card

DeepSeek-R1 Distill

DeepSeek AI · China

Source
Parameters: 8B / 14B distill classes
Active: All parameters active
Best at: Reasoning, math, careful answers
Media: Text in / text out

Hardware: 8B: 24GB VRAM recommended. 14B: 24GB+ VRAM.

Processing: ~2-6h

MIT · Model card

Llama 4 Scout / Maverick

Meta · United States

Source
Parameters: 109B or 400B total MoE
Active: 17B active per token
Best at: Long-context assistants, image understanding, multilingual work
Media: Text + image in / text + code out

Hardware: Requires large-GPU planning; jobs typically run quantized or across multiple GPUs.

Processing: Scoped after access review


StarCoder2 15B

BigCode / Hugging Face collaboration · International

Source
Parameters: 15B dense
Active: All parameters active
Best at: Code completion, repository patterns, developer tools
Media: Code/text in / code/text out

Hardware: 24GB+ VRAM recommended; quantized routes available.

Processing: ~4-6h

BigCode OpenRAIL-M · Model card

Gemma / Gemini routes

Google DeepMind · US / UK

Source
Parameters: Gemma open weights; Gemini API models
Active: Depends on selected model
Best at: Gemma for open tuning; Gemini for multimodal API workflows
Media: Gemini supports text, image, video, audio inputs via API

Hardware: Gemma uses local GPU. Gemini uses API, no local VRAM.

Processing: Scoped by route


Custom Hugging Face model

You choose · Global

Source
Parameters: 1B adapters to 70B+ models
Active: Dense or MoE
Best at: Specialized domains, languages, codebases, embeddings, vision-language
Media: Depends on the model card

Hardware: Estimated from parameter count, precision, context length, and tuning method.

Processing: Quoted before job submission

Model-specific · Model card

Not sure which model fits?

Use the selector to estimate model family, license path, processor class, VRAM, and processing time before you order.

Open model selector

📝 Text General · For Q&A, summarisation, support, research

Qwen2.5 7B
RTX 3060 · ~2–3h
Apache 2.0 · 50 cr
DeepSeek-R1 8B
RTX 4090 · ~2–4h
MIT · 50 cr
Llama 4 Scout
RTX 4090 · ~3–5h
Llama Comm. · 100 cr
Qwen2.5 14B
RTX 4090 · ~3–5h
Apache 2.0 · 150 cr
DeepSeek-R1 14B
RTX 4090 · ~4–6h
MIT · 150 cr
Llama 4 Maverick
RTX 5090 · ~5–8h
Llama Comm. · 300 cr

💻 Code Specialist · For code review, completion, and dev tools

Qwen2.5 Coder 7B
RTX 4090 · ~2–4h
Apache 2.0 · 75 cr
DeepSeek-Coder-V2 Lite
RTX 4090 · ~4–6h
DeepSeek · 150 cr
StarCoder2 15B
RTX 4090 · ~4–6h
BigCode RAIL · 125 cr
Available Add-ons
Auto Q&A Generation
Extract training pairs from your docs
+20 cr
Priority Queue
Start within 15 minutes
+25 cr
30-Day API Hosting
Hosted LiteLLM endpoint
+50 cr
Monthly Auto-Refresh
Retrain as data grows
+60 cr/mo
Credits

Simple credit-based pricing

Buy once, use anytime. Credits never expire.

Starter
50 cr
$29
$0.58/cr
Good for 1 quick job
Buy Credits →
BEST VALUE
Professional
200 cr
$79
$0.40/cr
3–4 standard jobs
Buy Credits →
Studio
600 cr
$199
$0.33/cr
4+ professional jobs
Buy Credits →
Enterprise
1,800 cr
$499
$0.28/cr
Team usage, best rate
Buy Credits →

Credits are non-refundable after use. Jobs deduct credits on submission; cancelled pending jobs are refunded in full.

FAQ

Common questions

What models can I fine-tune?

Currently: Qwen2.5 (7B, 14B), DeepSeek-R1 (8B, 14B), Llama 4 Scout & Maverick, Qwen2.5 Coder 7B, DeepSeek-Coder-V2 Lite 16B, StarCoder2 15B, and Qwen2.5 72B (Enterprise). More models are added regularly. You can also enter a custom Hugging Face model ID if it's a text-generation model.

What do I get when the job completes?

A GGUF Q5_K_M file you can download and load into Ollama, LM Studio, or any GGUF-compatible runtime. Optionally, for 50 credits, we deploy your model to a hosted LiteLLM API endpoint, compatible with the OpenAI API format and active for 30 days.
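Because the hosted endpoint follows the OpenAI API format, any OpenAI-compatible client can call it. A minimal stdlib-only sketch, assuming a hypothetical endpoint URL, API key, and model name (substitute the values shown in your dashboard after deployment):

```python
import json
import urllib.request

# Hypothetical values: replace with the endpoint URL, API key, and
# model name issued for your deployment.
BASE_URL = "https://llm.example.com/v1"
API_KEY = "sk-your-key"

def build_payload(messages, model="my-fine-tune"):
    """OpenAI chat-completions request body."""
    return {"model": model, "messages": messages}

def chat(messages, model="my-fine-tune"):
    """POST the payload to the hosted endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(messages, model)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

For example, `chat([{"role": "user", "content": "Summarise our onboarding policy."}])` would return the model's reply as a string.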

How long does fine-tuning take?

Quick jobs (7–8B models, up to 1,000 Q&A pairs) finish in 2–4 hours on RTX 4090. Standard jobs (14B models, 5,000 pairs) take 4–6 hours. Enterprise jobs (72B on RTX 6000 96GB) run 8–16 hours. You'll receive an email notification when your model is ready.

What training data format do you accept?

You can use documents already in your CaveauAI corpus (we extract Q&A pairs automatically with the Auto Q&A Generation add-on), upload new PDF/DOCX/TXT files, or provide a ShareGPT JSONL file directly. Minimum 10 Q&A pairs required; 500–5,000 is the sweet spot for most use cases.

Do I need a Hugging Face account?

For most models, no. A few models (Llama 4) are gated on Hugging Face and require you to accept Meta's license and add your HF token in your account settings. We store it securely in our vault and use it only for that model download.

Ready to train your model?

Start with a Starter pack for $29 — 50 credits is enough for a full 7B text model fine-tune on your data.

Open Model Studio