Together AI offers transparent pricing for AI infrastructure, including token-based serverless inference and dedicated GPU instances. Rates vary by model and hardware, with discounts for reserved capacity. Additional services cover code sandboxes, interpreters, and model fine-tuning.
Highlights
Serverless inference is billed per token at model-specific rates, and cached tokens are often billed at a lower rate.
Dedicated single-tenant GPU instances are available on hardware such as NVIDIA H100 and HGX B200.
Reserved GPU capacity is billed at discounted hourly rates for commitments ranging from 7 days to 180+ days.
Code sandboxes and interpreters are priced per hour or per session, with separate fees for filesystem storage.
Supervised fine-tuning is priced per 1M tokens, with tiers for standard and specialized needs.
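Because these services are quoted per unit (per 1M tokens or per GPU-hour), estimating a bill comes down to simple arithmetic. The following is a minimal sketch under stated assumptions: every rate, the `discount` fraction, and the names `TokenRates`, `serverless_cost`, `fine_tuning_cost`, and `gpu_cost` are hypothetical placeholders, not Together AI's actual prices or API. Cached-token pricing is left out for simplicity.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # token rates are quoted per 1M tokens


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical per-1M-token prices (USD) for one serverless model."""
    input_per_m: float
    output_per_m: float


def serverless_cost(rates: TokenRates, input_tokens: int, output_tokens: int) -> float:
    """Per-token billing: (tokens / 1M) * per-1M rate, summed over input and output."""
    return (input_tokens * rates.input_per_m
            + output_tokens * rates.output_per_m) / TOKENS_PER_UNIT


def fine_tuning_cost(training_tokens: int, price_per_m: float) -> float:
    """Supervised fine-tuning billed per 1M tokens processed (placeholder price)."""
    return training_tokens * price_per_m / TOKENS_PER_UNIT


def gpu_cost(hourly_rate: float, gpus: int, hours: float, discount: float = 0.0) -> float:
    """Dedicated GPU cost; `discount` is a hypothetical fraction off the hourly
    rate, standing in for a reserved-capacity commitment."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return hourly_rate * (1.0 - discount) * gpus * hours


if __name__ == "__main__":
    # All numbers are illustrative placeholders, not published rates.
    rates = TokenRates(input_per_m=0.50, output_per_m=1.50)
    print(serverless_cost(rates, input_tokens=2_000_000, output_tokens=500_000))
    print(fine_tuning_cost(training_tokens=10_000_000, price_per_m=3.0))
    # 8 GPUs for a 7-day reservation at a hypothetical 25% discount.
    print(gpu_cost(hourly_rate=2.0, gpus=8, hours=24 * 7, discount=0.25))
```

Keeping the rates as explicit parameters makes it easy to plug in the current figures for a specific model or GPU type instead of hard-coding them.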