Build what's next on the AI Native Cloud. Full-stack AI platform for inference, fine-tuning, and GPU clusters — powered by cutting-edge research.
⚡️ FlashAttention-4: up to 1.3× faster than cuDNN on NVIDIA Blackwell
Introducing Together AI's new look
🔎 ATLAS: runtime-learning accelerators delivering up to 4x faster LLM inference
⚡ Together GPU Clusters: self-service NVIDIA GPUs, now generally available
📦 Batch Inference API: process billions of tokens at 50% lower cost for most models
🪛 Fine-Tuning Platform Upgrades: larger models, longer contexts

The full-stack platform for production AI, powered by cutting-edge systems research. We design a full-stack AI platform powered by cutting-edge systems research, helping teams ship faster, scale reliably, and achieve superior unit economics.

Open and responsible development: everything works best when we help the open-source community work better together. Our wonder, curiosity, and hope drive us to find ways to make everyone's lives better. We are optimizers, making the most of what we have and not taking more than we need. We build everything with the purpose of benefiting society.

Featured partners that help us scale.

Meet our leaders, researchers, and engineers building the systems behind Together AI: Senior Director of People Ops, SVP of Engineering Infrastructure, VP of Technical Program Management.
Mentions (30d): 0
Reviews: 0
Platforms: 3
Sentiment: 0% (0 positive)
Industry: information technology & services
Employees: 380
Funding Stage: Series B
Total Funding: $533.5M
Introducing Mamba-3 🐍 Inference speeds are more important than ever, driven by the rise in agents and inference-heavy RL rollouts. Linear models are fast in FLOPs but memory-bound during decode. Mamba-3's MIMO (multi-input, multi-output) variant fixes this: swap the recurrence from vector outer-product to matrix multiply, and you get a stronger model at the same decode speed. Fastest prefill+decode at 1.5B. Beats Mamba-2, GDN, and Llama-3.2-1B. Kernels open-sourced. #mamba3 #togetherresearch Congratulations to the team leading this research: @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9 @tri_dao @_albertgu
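To make the "outer product to matrix multiply" swap concrete, here is a minimal NumPy sketch of the idea, not the open-sourced kernels: a rank-1 outer-product state update versus a MIMO update where a small matrix multiply does more arithmetic per byte of state traffic. The state size, scalar decay, and rank value are illustrative assumptions.

```python
# Minimal sketch of the recurrence swap, not the released Mamba-3 kernels.
# Shapes, the scalar decay, and rank=4 are assumptions for illustration.
import numpy as np

d_state, d_head, rank = 128, 64, 4
decay = 0.95  # scalar stand-in for the learned decay term

def step_outer(S, b, x, c):
    """Outer-product step (Mamba-2 style): memory-bound during decode."""
    S = decay * S + np.outer(b, x)      # (d_state, d_head) rank-1 update
    return S, c @ S                     # (d_head,) readout

def step_mimo(S, B, X, c):
    """MIMO step: same state size, but the update is a small matmul
    doing `rank` times more useful work per byte of state read."""
    S = decay * S + B @ X               # (d_state, rank) @ (rank, d_head)
    return S, c @ S

S = np.zeros((d_state, d_head))
b, x, c = np.random.randn(d_state), np.random.randn(d_head), np.random.randn(d_state)
B, X = np.random.randn(d_state, rank), np.random.randn(rank, d_head)

S, y = step_outer(S, b, x, c)
S, y = step_mimo(S, B, X, c)
print(y.shape)  # (64,)
```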
Is "live AI video generation" a meaningful technical category or just a marketing term? [R]
Asking from a technical standpoint because I feel like the term is doing a lot of work in coverage of this space right now. Genuine real-time video inference, where a model is generating or transforming frames continuously in response to a live input stream, is a fundamentally different problem from fast video generation. Different architecture, different latency constraints, different everything. But in most coverage and most vendor positioning they get lumped together under "live" or "real-time", and I'm not sure the field has converged on a shared definition. Is there a cleaner way to think about the taxonomy here? And which orgs do people think are actually doing the harder version of the problem? submitted by /u/Tall_Bumblebee1341
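One way to pin the distinction down is the per-frame wall-clock budget: a live system has to finish every frame inside it, while a fast offline generator only needs good aggregate throughput. A rough sketch of the arithmetic, with the fixed overhead number as an illustrative assumption rather than a measurement of any particular system:

```python
# Frame-budget arithmetic for "live" generation. The 15 ms of fixed per-frame
# overhead (capture, pre/post-processing, encode, network) is an illustrative
# assumption, not a measurement of any particular system.
def model_budget_ms(fps: float, overhead_ms: float = 15.0) -> float:
    """Milliseconds left for the model itself at a given frame rate."""
    return 1000.0 / fps - overhead_ms

for fps in (12, 24, 30):
    print(f"{fps:>2} fps -> ~{model_budget_ms(fps):.1f} ms/frame for the model")
# A generator that needs even 1 s/frame can be "fast" for offline clips while
# missing every one of these live budgets by more than an order of magnitude.
```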
See where you can catch us next: https://t.co/6X11GmtPI0
That’s a wrap on HumanX. Custom comics, hats, a happy hour with @getmetronome & @nvidia, and two sessions on what actually matters for AI-native builders. #HumanX #TogetherAI #AINativeCloud https://t.co/lOTNY8bhDp
Most Claude integrations work on text. This one works on the living code editor.

What it does that CLI/Desktop can't:
- Real-time diagnostics: the bridge gets a live push from the language server the moment an error appears. Claude reacts as it happens, not when you remember to ask.
- Authoritative code intelligence: "What calls this function?" goes to the actual TypeScript engine, not grep. It catches dynamic dispatch, generics, and re-exports that grep would miss.
- Editor context awareness: the bridge knows which files are open and what text is selected. "Explain this" means this exact thing, not whatever you copied into chat.
- Inline annotations: it draws highlights, underlines, and hover messages directly in your editor, like a linter. Claude can mark suspicious lines during a review, then clear them when done.
- True semantic refactoring: rename a symbol across 40 files via the language server's rename protocol. It understands scope, shadowing, and module boundaries. Find-and-replace would break things; this doesn't.
- Live debugging: set breakpoints, pause execution, and evaluate expressions against actual memory. "What is the value of this object right now?" is answered from the running process, not inferred from source.
- Autonomous event hooks: these fire without being asked, on save, on commit, on test failure, on branch switch. CLI and Desktop only act when prompted. The bridge watches and responds on its own.

The common thread: each surface contributes something the others can't.
- CLI: runs autonomously, needs no UI, and works in scripts and schedules.
- Desktop/Dispatch: receives human intent in natural language from anywhere, even a phone.
- Cowork: writes and tests code in isolation, never touching your working branch.
- Bridge: has live awareness of types, errors, references, runtime state, and editor focus. It stops being a tool you invoke and becomes a system with continuous situational awareness of your codebase: its history, its structure, its runtime state, and your own habits.

None of them alone can close the loop. Together they form a system where human intent enters at one end, gets grounded in real codebase knowledge in the middle, and produces tested, committed, reviewed output at the other, with a human needed only at the decision points they actually want to own.

I built claude-ide-bridge, an open-source MCP bridge that gives Claude live access to your IDE's language server, debugger, and editor state. Free and open source: github.com/Oolab-labs/claude-ide-bridge

submitted by /u/wesh-k
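For readers who want the post's "autonomous event hooks" idea in code rather than prose, here is a plain-Python sketch of an event layer that fires handlers on editor/VCS events with no prompt in the loop. The event names and the triage handler are illustrative assumptions, not the claude-ide-bridge implementation; see the linked repo for the real thing.

```python
# Conceptual sketch only: a tiny event-hook dispatcher. A real bridge wires
# these events to the language server, VCS, and test runner instead of print().
from collections import defaultdict
from typing import Callable

class EventHooks:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def on(self, event: str):
        """Register a handler for an event such as 'save' or 'test_failure'."""
        def register(fn: Callable[[dict], None]):
            self._handlers[event].append(fn)
            return fn
        return register

    def emit(self, event: str, payload: dict) -> None:
        """Called by the watcher; handlers run without any user prompt."""
        for fn in self._handlers[event]:
            fn(payload)

hooks = EventHooks()

@hooks.on("test_failure")
def triage(payload: dict) -> None:
    # In a real bridge this would hand diagnostics to the model for analysis.
    print(f"triage requested for {payload['test']}: {payload['message']}")

hooks.emit("test_failure", {"test": "test_login", "message": "AssertionError"})
```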
Gemma 4 31B brings dense multimodal reasoning to Together AI. Try Now: https://t.co/Xx1rbOe7m4
Highlights:
👉 Configurable thinking mode for step-by-step reasoning
👉 Multimodal understanding with text and image input, including document parsing and OCR
👉 Native function calling with structured tool use for agent workflows
👉 Production-ready on the AI Native Cloud: 99.9% SLA, 256K context, and support for 140+ languages
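A hedged sketch of what the document-parsing case might look like, assuming the OpenAI-compatible `together` Python client; the model identifier and image URL below are placeholders, so check the model page for the exact string.

```python
# Illustrative call shape only: the model string and image URL are placeholders.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    model="google/gemma-4-31b-it",  # hypothetical identifier for illustration
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the invoice number and total from this document."},
            {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```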
Introducing Gemma 4 31B from @GoogleDeepMind on Together AI. AI natives can now use Gemma 4 31B on Together and benefit from reliable inference for multimodal reasoning, tool use, and agentic workflows. https://t.co/g9oyqiG56C
GLM-5.1 gives teams a stronger model for coding, tool use, and sustained agent performance on Together AI. Learn more: https://t.co/GJlBvGVRWC
Highlights:
👉 28% coding improvement over GLM-5 with refined RL post-training
👉 Better long-horizon execution across hundreds of rounds and thousands of tool calls
👉 Thinking mode, tool calling, and structured JSON output for agent pipelines
👉 Production-ready on the AI Native Cloud: 99.9% SLA, serverless and dedicated options
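For the agent-pipeline bullet, a hedged sketch of OpenAI-style tool calling through the `together` Python client; the model string and the `get_build_status` tool are illustrative assumptions, not part of the announcement.

```python
# Illustrative tool-calling shape only: model string and tool are placeholders.
import json
from together import Together

client = Together()

tools = [{
    "type": "function",
    "function": {
        "name": "get_build_status",
        "description": "Return the CI status for a branch.",
        "parameters": {
            "type": "object",
            "properties": {"branch": {"type": "string"}},
            "required": ["branch"],
        },
    },
}]

response = client.chat.completions.create(
    model="zai-org/GLM-5.1",  # hypothetical identifier for illustration
    messages=[{"role": "user", "content": "Is main green? If not, summarize the failure."}],
    tools=tools,
)

call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```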
Introducing GLM-5.1 from @Zai_org on Together AI. AI natives can now use GLM-5.1 on Together and benefit from reliable inference for production-scale agentic engineering and long-horizon coding workflows. https://t.co/8vSsGhciqg
https://t.co/OK7Qf267hX
New from Together Research: LLMs can fix query plans your database optimizer gets wrong. Up to 4.78x faster.
Cost estimators fail when they miss semantic correlations: wrong join order, wrong access path, cascading errors. DBPlanBench feeds DataFusion's physical operator graph to an LLM, which patches the plan directly instead of regenerating it from scratch.
On TPC-H / TPC-DS:
→ 4.78x peak speedup
→ 60.8% of queries improved >5%
→ Build memory: 3.3 GB → 411 MB
Optimize on small-scale data, transfer to production.
Blog: https://t.co/6GF8qCUeV4 Paper: https://t.co/oRMiQAzAts Code: https://t.co/Vffm57gMIV
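A conceptual sketch of the patch-then-verify loop the thread describes, not the DBPlanBench code (see the paper and repo linked above): the `engine` and `llm` objects and their methods are hypothetical stand-ins for DataFusion-specific plumbing.

```python
# Conceptual sketch only: explain_plan, apply_patch, benchmark, and complete
# are hypothetical helpers, not DataFusion or DBPlanBench APIs.
def optimize_with_llm(query: str, llm, engine) -> str:
    plan_text = engine.explain_plan(query)            # physical operator graph as text
    patch = llm.complete(
        "Here is a physical query plan. Propose a minimal edit "
        "(join order, access path) as a unified diff against the plan text:\n"
        + plan_text
    )
    candidate = engine.apply_patch(plan_text, patch)  # reject malformed patches upstream
    # Optimize on small-scale data, then transfer the winning plan to production.
    baseline_ms = engine.benchmark(plan_text, sample_scale=0.01)
    patched_ms = engine.benchmark(candidate, sample_scale=0.01)
    return candidate if patched_ms < baseline_ms else plan_text
```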
Highlights:
👉 Text-to-video available now with 720P/1080P output, 2–15 second duration, and optional audio input
👉 More workflow control: continue scenes, steer outputs with references, and revise without restarting from scratch
👉 More of the suite coming soon: image-to-video, reference-to-video, and video edit
👉 Production-ready on the AI Native Cloud: 99.9% SLA, serverless inference, and enterprise deployment options
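A hedged sketch of the request shape those knobs imply (resolution, 2–15 s duration, optional audio). The endpoint path, field names, and model string below are assumptions for illustration, not Together's documented video API; consult the Wan 2.7 model page for the real parameters.

```python
# Hypothetical request shape only: endpoint, fields, and model id are assumptions.
import os
import requests

payload = {
    "model": "wan-2.7-t2v",                       # hypothetical identifier
    "prompt": "A slow dolly shot through a rain-lit night market",
    "resolution": "1080p",                        # or "720p"
    "duration_seconds": 8,                        # within the 2-15 s range
    "audio_url": None,                            # optional audio conditioning
}
resp = requests.post(
    "https://api.together.xyz/v1/videos",         # hypothetical path
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    timeout=30,
)
print(resp.json())
```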
Introducing Wan 2.7 from @alibaba_cloud on Together AI. AI natives can now build with Wan 2.7 on Together AI and get a clearer path from first-generation video to continuation, reference-driven control, and editing on one production platform. https://t.co/BXJPCaiyWM
Yes, Together Inference offers a free tier. Pricing found: $0.30, $0.06, $1.20, $0.50, $2.80
Based on user reviews and social mentions, the most common pain point is API costs.
Based on 62 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.