We create the world’s fastest supercomputer and largest gaming platform.
Users generally praise NVIDIA for its impressive performance, particularly with AI and robotics applications, as highlighted by the excitement around projects using NVIDIA technology like the Jetson Orin Nano. However, there are concerns regarding the reliance on certain technologies like DLSS, which can sometimes produce misleading visual data. Users view the pricing of NVIDIA products as high but often justified by their cutting-edge capabilities. Overall, NVIDIA enjoys a strong reputation for innovation and technological leadership in the GPU and AI spaces.
Mentions (30d): 30 (4 this week)
Avg Rating: 4.5 (14 reviews)
Platforms: 2
Sentiment: 15% (19 positive)
Industry: computer hardware
Employees: 36,000
npm packages: 20
HuggingFace models: 40
g2
What do you like best about Nvidia AI Enterprise?
NVIDIA AI Enterprise is a robust end-to-end software suite designed to help organizations and individuals accelerate AI adoption with enterprise-grade security and scalability. A key strength is its versatility: it supports a wide range of use cases, from NLP and computer vision to generative AI. It accelerates both AI development and deployment, and it is easy to use and implement. It integrates seamlessly with VMware and cloud-native environments.
What do you dislike about Nvidia AI Enterprise?
It requires investment in NVIDIA-certified infrastructure for maximum efficiency, and there is a steep learning curve for teams entirely new to AI workflows.
Review collected by and hosted on G2.com.
What do you like best about Nvidia AI Enterprise?
Nvidia AI Enterprise enables us to communicate with our environment using AI. It lets us do all of our work with ease.
What do you dislike about Nvidia AI Enterprise?
In my use of Nvidia AI Enterprise so far, I have not found anything to dislike. Using an AI tool like this lets me interact with a new world.
What do you like best about Nvidia AI Enterprise?
It's like having a full toolbox for AI development, with everything you need from data preparation to model deployment. Plus, the performance boost you get from NVIDIA GPUs is fantastic! It's like having a turbocharger for your AI projects.
What do you dislike about Nvidia AI Enterprise?
It's a comprehensive platform with a lot of features, but that also means it comes with a higher price tag. Additionally, while it's designed to be user-friendly, it might still have a learning curve for those who are new to AI or deep learning. So, while I appreciate its power and features, the cost and potential learning curve might be factors to consider for some users.
What do you like best about Nvidia AI Enterprise?
Nvidia AI Enterprise is an easy-to-use, more accurate, and time-saving AI tool.
What do you dislike about Nvidia AI Enterprise?
Nvidia AI Enterprise pricing is a little high.
What do you like best about Nvidia AI Enterprise?
The graphics used for creating a new enterprise and moving the slides. It is really smooth and understands your requirements.
What do you dislike about Nvidia AI Enterprise?
Customer support and services need improvement, as reaching them to get help is tough.
What do you like best about Nvidia AI Enterprise?
It was well crafted to harness the data based on the inputs we provide to get the desired outcome.
What do you dislike about Nvidia AI Enterprise?
NVIDIA is all set with the relevant features; there is not much to improve.
What do you like best about Nvidia AI Enterprise?
Optimized Performance: Leverages NVIDIA GPUs for faster AI training and inference. Comprehensive Toolset: Includes essential tools, libraries, and pre-trained models. Enterprise Support: Offers technical support and regular updates. Scalability: Flexible deployment across various environments. Framework Integration: Compatible with popular AI frameworks.
What do you dislike about Nvidia AI Enterprise?
High Cost: Expensive hardware and licensing fees. Complexity: Requires specialized knowledge and can have a steep learning curve.
What do you like best about Nvidia AI Enterprise?
It is very helpful for preparing and cleaning data for training, and for performance improvement.
What do you dislike about Nvidia AI Enterprise?
Pricing is somewhat high, and setting up and managing the platform involves some complexity.
What do you like best about Nvidia AI Enterprise?
What stands out most about NVIDIA AI Enterprise is its optimized GPU performance, comprehensive AI tools, enterprise-grade support, and seamless integration with existing IT infrastructure.
What do you dislike about Nvidia AI Enterprise?
Some potential downsides of NVIDIA AI Enterprise include: High Cost: The licensing and hardware requirements can be expensive, which might be a barrier for smaller businesses. Complexity: Setting up and managing the platform can be complex, especially for teams without in-depth AI or IT expertise. Hardware Dependence: The platform is heavily optimized for NVIDIA GPUs, which can limit flexibility if you want to use other hardware. Learning Curve: While it offers many powerful tools, the extensive feature set can have a steep learning curve for new users.
What do you like best about Nvidia AI Enterprise?
I am using an NVIDIA RTX 3070 GPU and can easily use it as a mainstream server because it is a certified server from NVIDIA. Most importantly, they share a public cloud server through Google Cloud, which is very helpful, and their support is available through these channels. Implementation is very handy through a GPU server, and it is convenient to use daily whenever required. There is no limit on daily use, which is a plus. Their AI model has a lot of AI features I can use, and I work on multiple ideas through their AI. Integration is very easy; I already have a GPU, so it required little effort.
What do you dislike about Nvidia AI Enterprise?
If you don't have an NVIDIA GPU or DPU, you need some extra online resources to configure and use it; hardware with powerful resources is a must.
Is AI Worth the Cost? The ROI Reckoning and the Coming Market Correction
Prof G Markets (Live)
Episode Title: Is AI Worth the Cost? The ROI Reckoning and the Coming Market Correction
Location: The Castro Theatre, San Francisco, CA
Hosts: Scott Galloway & Ed Nelson

ED: We're going to talk about a topic not enough people talk about called AI. Nearly 50,000 workers have been laid off this year supposedly because of AI — that's almost as many as in all of 2025. For companies adopting AI, the thesis is simple: AI is supposed to do much of the work that humans do. In recent weeks, however, that thesis has hit a roadblock. More and more companies are reporting that despite the enormous power of AI, the technology is actually more expensive than the humans it is supposed to replace. Uber, for example, just blew through its entire 2026 AI budget in just four months. According to the COO, it is now getting harder to justify AI costs within the company. Microsoft is cancelling its Claude Code licenses across multiple divisions because it's simply gotten too expensive. And over at Nvidia, one executive said that the cost of compute is now "far beyond the cost of employees." Which all raises a crucial question for the AI industry: at what point does AI actually stop being worth it?

This has blown up basically in the last 48 hours, with many companies coming out and saying they're not as confident about this whole AI thing as they used to be. ServiceNow is another company that just blew through their entire Anthropic budget. Technical staff at Stripe are reportedly spending nearly $100,000 on AI tokens every day. Salesforce is on track to spend $300 million on Anthropic tokens this year. Shopify said their earnings were "partially offset by increased LLM costs." We heard similar things from Meta, Spotify, and Pinterest. One Anthropic employee said his Claude Code bill came out to $150,000 in a single month. In some cases, it's getting very, very expensive.

We've also seen an incentive — especially among tech companies — to use AI as much as possible. There was this idea that employees would engage in what we call "token maxing," where you use as many tokens as possible from your AI API. Companies like Meta and Amazon have even created internal leaderboards tracking how many AI tokens employees are using. The people using the most tokens are seen as the most AI-forward, the most AI-deployed — the ones who are going to get recognized, maybe even promoted. And this has resulted in extraordinary costs on the AI front. Now we're starting to see the next phase of this, Scott, where companies and their executives are beginning to realize: this is a little expensive. So the question becomes — at what point will AI actually pay off? I'll pose that question to you: at what point is it too much?

SCOTT: I think we're already seeing hints of it, and I think it comes down to incentives. You were talking about how companies are trying to incentivize people to use AI more — and that's kind of an interesting part of the ecosystem right now. The adoption layer is trying to get people to use it, and companies have put in place the incentives to do that. But there was a recent survey by a professor at MIT who found that about 5% of the projects people are using tokens for can actually be connected by CFOs to some sort of return. So while I think they're really intoxicated by it — and talking about AI as much as you can in your earnings call is like adding "dot-com" back in the '90s — I think you're already starting to see some fatigue. And I think the AI companies are trying to get public as quickly as possible to raise that cheap capital before things start to — I don't want to say unwind, but... You can see how the string gets pulled here. A large company, a CEO who has a lot of credibility in the industry, just comes out and says: "We're dramatically scaling back our AI investment. Let's be honest, folks — we're just not seeing the return we'd initially hoped." And then Nvidia reports its first miss.

Nvidia has beaten its estimates 15 quarters in a row. Nvidia's first miss probably takes the entire market down five or ten percent. You are seeing some productivity gains from this and quite frankly, they look as dramatic, if not more dramatic, than the internet. But look what happened in 2000. This definitely does feel like '99. And I'm waiting for the first CEO to come out and say we have to get procurement involved and dramatically scale back our expenses. I don't think it's that romantic, honestly. I think it's just going to be a traditional Fortune 500 company that starts the narrative: okay, this has been fun, but we have to dramatically decrease our AI investment because we're not seeing the ROI we'd anticipated.

ED: Yeah. I mean, we heard a quote this week from the CEO of Match Group — not a huge company — but he said AI is costing them $5 to $10 million a year, and his exact words were: "I think we're benefiting from it, but it's hard to feel." So that's not great if we're supposed
Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA [R]
New preprint. A Mixture-of-Experts inference kernel (TritonMoE) written entirely in OpenAI Triton, targeting portability across NVIDIA and AMD without vendor-specific code.
Highlights:
- A fused gate+up GEMM computes both SwiGLU projections from shared tile loads, eliminating 35% of global memory traffic.
- 89-131% of Megablocks throughput at inference batch sizes (up to 512 tokens) on A100; the same kernel runs on MI300X unchanged.
Limitations: falls behind at 2048+ tokens, and degrades with 64+ experts under extreme routing skew.
Paper: https://arxiv.org/abs/2605.23911
Code: https://github.com/bassrehab/triton-kernels
Writeup with benchmarks: https://subhadipmitra.com/blog/2026/fused-moe-dispatch-triton/
submitted by /u/bassrehab
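The fused gate+up idea can be illustrated outside Triton. The sketch below is not the TritonMoE kernel: it is a plain-Python stand-in, with made-up function names (`swiglu_unfused`, `swiglu_fused`) and toy weights, that only shows why reading each activation once and feeding both accumulators gives the same SwiGLU output as two separate projections. The real kernel works on GPU tiles, and the 35% traffic figure comes from the preprint, not from this example.

```python
import math


def silu(z: float) -> float:
    """SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def swiglu_unfused(x, w_gate, w_up):
    """Reference: two separate passes over x, one per projection."""
    n_in, n_out = len(x), len(w_gate[0])
    gate = [sum(x[k] * w_gate[k][j] for k in range(n_in)) for j in range(n_out)]
    up = [sum(x[k] * w_up[k][j] for k in range(n_in)) for j in range(n_out)]
    return [silu(g) * u for g, u in zip(gate, up)]


def swiglu_fused(x, w_gate, w_up):
    """Fused: each x[k] is read once and updates both accumulators."""
    n_out = len(w_gate[0])
    gate = [0.0] * n_out
    up = [0.0] * n_out
    for k, xk in enumerate(x):  # one "load" of the activation
        for j in range(n_out):
            gate[j] += xk * w_gate[k][j]
            up[j] += xk * w_up[k][j]
    return [silu(g) * u for g, u in zip(gate, up)]


# Toy data (hypothetical): 3 input features, 2 output features.
x = [0.5, -1.0, 2.0]
w_gate = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]
w_up = [[-0.2, 0.1], [0.4, 0.3], [0.2, -0.5]]

fused = swiglu_fused(x, w_gate, w_up)
reference = swiglu_unfused(x, w_gate, w_up)
print(fused, reference)
```

The two results agree up to floating-point rounding; the saving in a real kernel comes from not re-reading the activation tile from global memory for the second projection.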
AI-generated CUDA kernels silently break training and inference [R]
Last month NVIDIA released SOL-ExecBench, a new benchmark of 235 production CUDA kernels lifted from DeepSeek, Qwen, Gemma, and Kimi. We took several top-ranked AI-generated submissions and tried using them in production workloads. Many of them broke, sometimes in surprising ways.

One of those kernels is the fused embedding-gradient + RMSNorm backward pass, which runs at the end of every transformer training step. We took the fastest submission on the benchmark for it, and dropped it into the training loop of a small transformer. The kernel had passed the benchmark's verifier with room to spare. But in our training run, the loss diverged and never recovered.

We started debugging. Replace the dataset distribution with uniformly sampled tokens, the divergence vanishes. Swap SGD for AdamW, also vanishes. This is the worst kind of bug for research. Symptoms and masks both look exactly like "the idea didn't work". It's the type of bug that can make researchers spend a long time debugging without knowing what's at fault: the dataset? the research idea? the architecture? or the implementation itself?

Turns out, the actual bug is that the embedding-gradient half of the kernel accumulates in bf16 instead of fp32. Embedding backward sums many small gradient contributions into each token's row of the embedding matrix. With uniform random tokens the contributions spread evenly and bf16 precision is enough. In real text, a handful of token IDs end up with thousands of contributions: the small ones round to zero against the growing accumulator, and the high-frequency rows drift. AdamW's per-parameter normalization absorbs the resulting multiplicative bias, so under AdamW the same drift is invisible in the loss.

The other broken submissions had different bug shapes (all interesting). More examples in our blogpost.
submitted by /u/laginimaineb
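The rounding effect described in the post can be reproduced on a CPU. The following is a minimal sketch, not the benchmark kernel: it simulates bf16 and fp32 accumulators in pure Python via `struct`, and the gradient value and contribution counts are made-up numbers chosen only to show the effect for a rare token versus a very frequent one.

```python
import struct


def round_to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_bf16(x: float) -> float:
    """Round to bfloat16 (fp32 with the low 16 bits dropped, nearest-even).

    Only meant for finite, moderately sized values as used below.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even bias
    bits &= 0xFFFF0000                   # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def accumulate(n_contribs: int, grad: float, round_fn) -> float:
    """Sum n_contribs copies of grad, rounding the accumulator after each add."""
    acc = 0.0
    for _ in range(n_contribs):
        acc = round_fn(acc + grad)
    return acc


GRAD = 1e-3  # hypothetical per-occurrence gradient contribution

# A rare token (few contributions) vs. a very frequent token (many).
rare_bf16 = accumulate(4, GRAD, round_to_bf16)
rare_fp32 = accumulate(4, GRAD, round_to_fp32)
hot_bf16 = accumulate(4000, GRAD, round_to_bf16)
hot_fp32 = accumulate(4000, GRAD, round_to_fp32)

print(f"rare row: bf16={rare_bf16:.6f} fp32={rare_fp32:.6f}")
print(f"hot row:  bf16={hot_bf16:.6f} fp32={hot_fp32:.6f}")
```

For the rare row both accumulators stay close to the true sum. For the frequent row the bf16 accumulator stops growing once each contribution is smaller than half a bf16 step at the accumulator's magnitude, so it ends far below the true total of 4.0, while the fp32 accumulator tracks it closely.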
OpenAI and ElevenLabs are adopting Google's SynthID watermarking
submitted by /u/Adi4x4
[P] Built a portable GPU ISA after reading too many architecture manuals [P]
I’ve been reading GPU architecture docs in my free time. NVIDIA PTX, AMD ISA reference guides, Intel Xe, reverse-engineered Apple GPU stuff. Over 5,000 pages across 16 microarchitectures. After a while you notice all four vendors are doing the same 11 things with different names. So I wrote a spec that covers all of them and built a toolchain around it.

It’s called WAVE. You write a kernel once, it compiles to a portable binary, then thin backends translate it to Metal, PTX, HIP, or SYCL. Same binary verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X. My co-author Onyinye built PyTorch integration and got identical training results across all backends.

Please star on GitHub: https://github.com/Oabraham1/wave
Preprint: https://arxiv.org/abs/2603.28793
Read full docs and how I built everything: https://wave.ojima.me
pip install wave-gpu
submitted by /u/not-your-typical-cs
Cerebras Chip Sets Appear to be Optimized for LLM Use Cases
One distinction I think is getting lost in the Cerebras hype cycle is that Cerebras is primarily an LLM / generative AI infrastructure story, not a universal “all AI” chip story. That is not necessarily a criticism of Cerebras. Their wafer-scale approach is genuinely interesting, and for large model training and inference the design is compelling. Cerebras’ own public inference materials discuss applications mostly centered on open LLMs such as Llama, Qwen, GLM, and GPT-OSS. The inference metrics are expressed in tokens per second, which is fundamentally a language-model / generative inference framing rather than a robotics or industrial-control framing.

What Kind of AI Compute?
But “AI compute” is not one undifferentiated market. LLM inference is one class of AI compute. Robotics, autonomous vehicles, drones, industrial controls, real-time vision, embedded perception, video pipelines, and sensor-fusion systems are very different classes of AI compute. Thus, it appears from Cerebras’ own materials that their chip sets are not optimized for what comes after LLMs, such as JEPA-style World Models or other post-transformer architectures. Those systems are not merely asking, “How fast can I generate tokens?” They often care about power envelope, edge deployment, ruggedization, latency determinism, camera/radar/lidar integration, feedback loops, safety certification, and real-time physical control. Cerebras’ own CS-3 messaging, by contrast, frames the system around accelerating “the latest large AI models,” and the testing data is from the likes of Llama 2, Falcon 40B, MPT-30B, and multimodal models, again measured through tokens/second style throughput.

The Chip Hierarchy
This is also where the hardware distinction matters. Specialized ASICs are usually the narrowest bet: if the workload matches the chip, they can be extremely efficient, but that efficiency comes from specialization. Cerebras appears broader than a narrow single-use ASIC, but still much more concentrated around datacenter large-model training and inference. NVIDIA GPUs, by contrast, are less specialized but much more broadly useful across AI workloads, including LLMs, vision, robotics, simulation, autonomous systems, edge AI, and industrial applications. So the question is not merely whether Cerebras is “better” or “worse” than NVIDIA. The question is what part of the AI hardware market we are talking about.

Challenge NVIDIA?
This is why I think people should be careful when saying Cerebras is going to “challenge Nvidia” without specifying the battlefield. Challenge Nvidia in what? High-speed LLM inference? Large model training? Datacenter generative AI workloads? That is a much more plausible and specific claim. Cerebras has even published and promoted work specifically on training large language models, and independent benchmarking literature also evaluates Cerebras WSE in terms of LLM training and inference performance.

The Distinction that's Necessary
The point is not that Cerebras is overhyped. The point is that it is important in a specific part of AI, and that distinction should be made clear. Cerebras may become a very serious player in LLM infrastructure, especially if the market continues to reward faster and cheaper LLM inference. But that does not mean it is positioned the same way across non-LLM AI. The current hype cycle tends to conflate “LLMs” with general “AI” compute, and that makes the hardware discussion less useful and clear. So ultimately, an investment in Cerebras looks more like a bet on current LLM infrastructure than a broad bet on the future form of AI. It may be a good bet, but people should understand what kind of bet it is.
submitted by /u/RazzmatazzAccurate82
Are we nearly there?
Implying tech companies besides Anthropic, Google, and Nvidia have any money left over by 2027 after they all ran through cash on hand for tokens. I feel like there are reasonable people, like the guy behind the "ijustvibecodedthis" newsletter, who are realistic and help you ACTUALLY become a better dev with ai, but then there are people like dario who lie out of their mouths
submitted by /u/irelatetolevin
If you use NVIDIA Isaac Sim for reinforcement learning, do you use Isaac Lab with it? Just want to get a sense of what the status quo is. [D]
The reason for this query is that I am in the process of shifting to Isaac Sim / Isaac Lab since that is what seems to be in use nowadays. However, Isaac Lab is proving to be somewhat difficult to handle. While it handles the logging and the creation of multi-actor systems for algorithms like PPO beautifully (with, say, hundreds of actors), its documentation leaves much to be desired. I am also concerned about the ease of setting up new robotic environments, actions, rewards, policies and possibly even custom algorithms. So, what is it that you do at your lab?

In my mind there's a trade-off. On the one hand, I can use the Isaac Lab scaffolding but run into its idiosyncrasies very frequently until I document everything I need. On the other, I can interface directly with Isaac Sim, but then I need to write my own handlers for interfacing Isaac Sim with the RL agent.
submitted by /u/StayingUp4AFeeling
Memory
Your explanation is largely correct. The reason “memory” has become the dominant systems problem for LLMs is that modern transformers are increasingly memory-bandwidth bound, not compute-bound.

The key shift is this:
- Training large models was mostly about FLOPs.
- Serving large models at scale is increasingly about moving KV cache data around fast enough.

A single token generation step only performs a relatively modest amount of math compared to the amount of KV data that must be fetched from memory every step.

Why this happens
During inference, every new token attends to all prior tokens. So for token t, the model needs access to all prior K/V tensors:

\[ \text{KV Cache Size} \propto 2 \times L \times S \times H \times d \]

Where:
- L = layers
- S = sequence length
- H = attention heads
- d = head dimension

The killer is the S term. As context grows:
- 8K → manageable
- 128K → huge
- 1M → infrastructure problem

A 70B model with long context can require hundreds of GBs of KV cache across concurrent users.

Why bandwidth matters more than raw compute
Modern GPUs like the NVIDIA H100 or NVIDIA Blackwell can perform enormous amounts of compute. But every generated token requires:
- Loading KV cache from memory
- Running attention
- Writing updated KV back

That means inference speed often depends more on HBM bandwidth, memory locality, and cache management than on tensor core throughput. This is why HBM3E, NVLink, unified memory, and memory compression have become strategic bottlenecks.

Why the KV cache can exceed model weights
Model weights are static. KV cache is dynamic and scales with:
- users
- context length
- output length
- batch size

Example intuition:
- 70B model weights might occupy ~140 GB FP16
- But serving thousands of users with long contexts can require multiple TBs of KV cache

So operators increasingly optimize cache reuse, eviction, paging, and quantization instead of just model size.

Why vLLM and PagedAttention mattered so much
Before systems like vLLM, memory fragmentation was catastrophic. PagedAttention essentially borrowed ideas from operating systems:
- divide KV into pages
- allocate dynamically
- avoid contiguous memory assumptions

That dramatically improved utilization, batching, and throughput. This was one of the biggest inference infrastructure breakthroughs of the last few years because it improved economics without changing the model itself.

The deeper issue: transformers scale poorly with context
Standard attention fundamentally has a retrieval problem: each token potentially references every prior token. Even though compute optimizations exist, the architecture still requires huge memory movement. That’s why researchers are exploring:
- Grouped Query Attention (GQA)
- Multi-Query Attention (MQA)
- sliding window attention
- recurrent memory
- state-space models
- hybrid retrieval systems

The industry increasingly believes: infinite-context transformers using naive KV scaling are economically unsustainable.

Why inference economics are now the focus
Training frontier models is expensive. But operating them continuously at global scale is potentially even larger economically. For many providers:
- inference cost dominates
- memory dominates inference cost

That’s why companies across the stack are racing on memory:
- NVIDIA → HBM + NVLink + Grace
- AMD → MI300 unified memory
- Cerebras → wafer-scale SRAM
- Groq → deterministic low-latency SRAM-heavy architecture
- Marvell Technology → custom memory fabrics

The bottleneck has shifted from “Can we train bigger models?” to “Can we serve them cheaply and fast enough?”
submitted by /u/Annual_Judge_7272
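The KV cache proportionality in the post can be turned into a rough size estimate. This is a back-of-the-envelope sketch under stated assumptions: the layer count, head counts, and head dimension below are illustrative values for a 70B-class model, the cache is assumed to be FP16 (2 bytes per element), and `kv_cache_bytes` is a made-up helper name. With grouped-query attention the relevant H is the number of KV heads, which is why the two cases differ.

```python
def kv_cache_bytes(layers, seq_len, kv_heads, head_dim,
                   bytes_per_elem=2, batch=1):
    """KV cache size: 2 (K and V) x L x S x H x d x element size x batch."""
    return 2 * layers * seq_len * kv_heads * head_dim * bytes_per_elem * batch


GIB = 2**30

# Illustrative shape for a 70B-class model (assumed values, FP16 cache).
LAYERS, HEAD_DIM = 80, 128
CONTEXT = 128 * 1024  # 128K tokens

mha = kv_cache_bytes(LAYERS, CONTEXT, kv_heads=64, head_dim=HEAD_DIM)
gqa = kv_cache_bytes(LAYERS, CONTEXT, kv_heads=8, head_dim=HEAD_DIM)

print(f"64 KV heads: {mha / GIB:.0f} GiB per 128K-token sequence")
print(f" 8 KV heads: {gqa / GIB:.0f} GiB per 128K-token sequence")
```

Under these assumptions a single 128K-token sequence needs 320 GiB of cache with 64 KV heads and 40 GiB with 8, which is consistent with the post's point that long contexts and many concurrent users push the cache past the size of the weights.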
pipeline is really slow - consulting [D]
Hi, after a long debugging process and many discussions, I wanted to ask for advice from people who may have encountered similar training bottlenecks. My goal is imitation learning for robotics.

Model / Pipeline
Observation space:
- 4 RGB robot cameras
- image resolution: 128x128x3
- small vector of robot joint velocities (14 dims)
Pipeline:
- Shared ResNet18 encoder processes each image
- Each image embedding dimension is 128
- Final input to policy: 4 * 128 image embedding concatenated with 14-dim state vector
Policy backbone:
- DiT (Diffusion Transformer)
- ~8 layers
- hidden dim: 512
- 8 attention heads
- total params: ~50M
Diffusion setup:
- predict action chunks of length ~50
- diffusion timesteps: 4

Dataset / Storage
- Dataset stored in Zarr
- Data access is indexed/reference-based (not loading huge chunks into RAM)
- train/val split is contiguous
- no shuffling

Current encoder setup
- Initially trained end-to-end
- During debugging I switched to ImageNet pretrained ResNet18
- Encoder is currently frozen

Hardware / Software
- GPU: NVIDIA A4500
- RAM: 48GB
- Storage: SSD
- CUDA: 12.8
- PyTorch: 2.9
- Precision: bf16 mixed precision (also tested fp32)

Dataloader
- batch size: 2
- 8 persistent workers
- pinned memory enabled

Preprocessing
- preprocessing is minimal
- normalization + float conversion only
- preprocessing happens inside the multimodal encoder on GPU

Profiler results (PyTorch profiler)
Current workload split:
- train_dataloader_next: 4.41s / 41.84s = 10.5%
- batch_to_device: 0.32s / 41.84s = 0.77%
- training_step: 12.78s = 30.5%
- backward: 10.83s = 25.9%
- optimizer_step (wrapper total): 26.09s = 62.4%

Problem
The training is much slower than I expected.
Current behavior:
- CPU utilization: ~100%
- GPU utilization: ~20–30%
- GPU utilization can even become LOWER with synthetic data
- VRAM usage is relatively low
- Throughput is around 10 iterations/sec
- Epoch of ~50k samples takes around 30 minutes

Additional observations
- Increasing batch size does NOT reduce epoch wall-clock time
- Sometimes larger batches make things slower
- Freezing the encoder did not improve throughput much
- Replacing dataset samples with synthetic/random tensors improved throughput by only ~50%
- Synthetic dataset was initialized directly in memory

I do not believe this setup should be this slow. At this rate, training takes multiple days. For comparison, I saw papers with somewhat similar architectures mentioning ~10 hour training times on RTX 4090. With my setup, 10 hours is nowhere near enough.

Does anyone see something obviously wrong or have suggestions for where I should investigate next? Please help, I don't know what to do!
submitted by /u/Potential_Hippo1724
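The profiler figures quoted above can be sanity-checked with a few lines of arithmetic. This is only a sketch of that bookkeeping, with two assumptions flagged in the comments: that the optimizer_step wrapper total includes training_step and backward, and that the three top-level buckets do not overlap. Neither is confirmed by the post.

```python
TOTAL_S = 41.84  # profiled wall-clock window from the post

# Timings quoted in the post, in seconds.
timings = {
    "train_dataloader_next": 4.41,
    "batch_to_device": 0.32,
    "training_step": 12.78,
    "backward": 10.83,
    "optimizer_step (wrapper total)": 26.09,
}

shares = {name: 100.0 * t / TOTAL_S for name, t in timings.items()}

# ASSUMPTION: the optimizer_step wrapper includes training_step and backward.
optimizer_only_s = (timings["optimizer_step (wrapper total)"]
                    - timings["training_step"] - timings["backward"])

# ASSUMPTION: the three top-level buckets do not overlap.
unaccounted_s = TOTAL_S - (timings["train_dataloader_next"]
                           + timings["batch_to_device"]
                           + timings["optimizer_step (wrapper total)"])

# Epoch time implied by the quoted throughput and batch size.
samples, batch_size, iters_per_s = 50_000, 2, 10
epoch_minutes = samples / batch_size / iters_per_s / 60

for name, pct in shares.items():
    print(f"{name:32s} {pct:5.1f}%")
print(f"optimizer update alone: {optimizer_only_s:.2f} s")
print(f"outside listed buckets: {unaccounted_s:.2f} s")
print(f"implied epoch time:     {epoch_minutes:.1f} min")
```

If those assumptions hold, the optimizer update itself accounts for only about 2.5 s, while roughly a quarter of the window falls outside the listed buckets, which would point at per-iteration overhead rather than GPU math. The implied epoch time of about 42 minutes also differs from the reported 30, so the quoted throughput figures are probably approximate.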
i think flat-rate ai is dying.
tldr: longer one, but the point is simple: i think flat-rate ai is dying because the compute economics are starting to leak into the user experience.

i think flat-rate ai is dying. and i don’t mean “ai is over” or whatever. i mean the $20/$200 subscription thing is starting to break.

i’m on claude max. i use claude code a laaawt (actually can’t remember the last time my laptop was open without a terminal). and the thing that feels different lately is not just “claude got dumber” or “claude got slower”. maybe it did. maybe it didn’t. in the annoying daily way, you start thinking about usage, context, model choice, cache, tools, and whether this next prompt is going to burn half your session.

that’s not really a chatbot subscription anymore. it’s some weird middle thing where i pay monthly but still have to think about burn rate. and that kinda pisses me off. not because i expect infinite compute for $20, but because the product is still sold like a simple subscription while the actual experience is turning into metered infra.

i also checked my own spend and it’s ugly. i’ve burned through around 11k since january because of heavy coding. and yeah, i haven’t had the time to properly audit this, so take it as “what it feels like” not a clean spreadsheet claim. but for roughly the same amount, i feel like i could code an entire year before. now it disappears in a few months if i’m really using the thing hard. that’s the part that made this click for me.

look at anthropic’s own pricing chart: current sonnet is $3/$15 per million tokens. current opus is $5/$25. fast mode for opus 4.6/4.7 is $30/$150.
https://platform.claude.com/docs/en/about-claude/pricing

then look at the compute announcement: anthropic says the spacex deal gives them 220,000+ nvidia gpus, and that this lets them raise claude code limits.
https://www.anthropic.com/news/higher-limits-spacex

sorry but that’s the tell.
if new compute capacity changes how much your $200 subscription can do, then you didn’t buy “ai access”. you bought a slice of scarce inference capacity.

and the docs basically say it out loud now. usage depends on model choice, conversation length, tools, complexity, extended thinking, and all your claude surfaces sharing the same budget. claude code carries old context unless you clear or compact. tools eat tokens. opus eats limits faster. long sessions quietly become expensive sessions.

my guess is 2027 looks way less like netflix and way more like aws. the good model costs more. speed costs more. deep thinking probably costs more. agents probably get their own meter. teams get pools. serious users get reserved capacity or whatever they end up calling it. basically all the boring cloud pricing stuff, but now inside a chat product.

and honestly, maybe that’s fine. maybe that’s the only business model that survives. but then say that.

so when people say “claude got worse”, i think part of that is real. but part of it is probably this: i think the cheap phase is ending. and nobody really wants to say out loud what the normal price is going to be.
submitted by /u/tikkivolta
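To make the metered-pricing point concrete, here is a small sketch that prices one session at the per-million-token rates quoted in the post. The token counts are made up for illustration, `session_cost` is a hypothetical helper, and the calculation ignores anything the post does not mention, such as caching discounts or subscription limits.

```python
# USD per million tokens (input, output), as quoted in the post.
PRICES = {
    "sonnet": (3, 15),
    "opus": (5, 25),
    "opus fast mode": (30, 150),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Metered cost in USD for one session at the quoted per-million prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical heavy coding session: token counts are made up.
INPUT_TOKENS, OUTPUT_TOKENS = 2_000_000, 200_000

costs = {m: session_cost(m, INPUT_TOKENS, OUTPUT_TOKENS) for m in PRICES}
for model, usd in costs.items():
    print(f"{model:15s} ${usd:.2f}")
```

At these assumed volumes the same session costs $9 on sonnet, $15 on opus, and $90 in fast mode, which shows how model choice alone can move a metered bill by an order of magnitude.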
Anthropic and OpenAI don't want better models, they want to sell more tokens
There is a saying in auto racing that describes the current state of AI providers: “Go as slow as you can to win”, which translates as “Spend as little as you can on R&D to stay slightly better than average”. Let’s put our tin foil hats on and look at it from the business perspective of an AI provider.

Follow the money

AI providers do not make money on training models but on selling inference. From a business perspective, that means if OpenAI could keep selling GPT-3 forever, they would not spend money on training a better model but would keep milking the cow they already have. But they couldn’t, because it was still “cheap” ($80–$100 million for GPT-4) to train a better model, and there was a risk someone else would. That fear of losing to a better model got us where we are. Makes sense.

But let’s look at modern times. Training a model is not “cheap” anymore, it’s mega expensive (estimated at $1.5–$2 billion for GPT-5). Only a handful of companies can afford such an affair. And a new model will not necessarily be better (and so sell more inference). An expensive gamble.

What it means for the business:

- Training a new model is mega expensive, and raising money for it is getting harder
- Training a new model is not a revenue stream, selling inference is
- Having somewhat capable models that don’t one-shot prompts but need “prolonged thinking” (self-prompting) is actually better for the business of selling tokens than a great model that one-shots

SCREW NEW MODELS, SELL MORE INFERENCE!

Better model is not a goal anymore

Is that what’s happening? Did Anthropic and OpenAI accept their niche and unspokenly (or spokenly, we don’t know) decide to “go as slow as they can” with creating new models, as they both are winning anyway? That would sound reasonable if the goal is to make money (which is why commercial companies are created).

Let’s look back 6 months (an eternity in the AI world) at Anthropic’s release history:

- Nov 2025, Opus 4.5 released: the last model that felt like an improvement compared to its predecessor.
- Feb 2026, Opus 4.6: no shockwave, some users reverted to 4.5. Maybe it got slightly better, but only because it was “thinking for longer” (i.e. burning more tokens without extra prompting).
- April 2026, Opus 4.7: the same underwhelming release. The biggest improvement is that the model now thinks even longer and prompts the user less, i.e. it burns even more of your tokens without you asking it to.

To sum up: in the last 6 months we have seen no quality improvements, only better token burn without bothering the user.

On the other side, they also squeeze developers into using Claude Code (their AI harness):

- End of 2025: forbade usage of Claude subscriptions in 3rd party harnesses (OpenCode, etc.)
- Start of 2026: blocked subscription usage of OpenClaw, Hermes and other agents
- From June 2026: programmatic usage of their Claude Code (for example in scripts) will be forbidden as well

They force you into their harness, where they do as much as they can to keep the tokens flowing.

The cherry on top: Boris Cherny, the head of Claude Code, stated he sees the AI coding future in “agent loops”, where an agent keeps prompting itself until the task is completed. Have you noticed the difference? The goal is not to “one-shot” the answer anymore (that needs improving models) but “a loop” that keeps going until the problem is solved. And that loop is a money-making machine for Anthropic, great for the business.

That approach also makes money for the whole AI supply chain:

- AI providers making margin on selling tokens
- Data centers selling GPU hours
- NVIDIA selling GPUs

What does that mean? Lots of tech companies benefit financially from models that are somewhat intelligent but not intelligent enough to one-shot all questions. And those models are already here. So it’s likely we won’t see massive model improvements in the near future. There is no point in it. Top LLMs are on more or less the same level, and the competition is miles behind. Time to make money on inference, or go IPO.

submitted by /u/kgoncharuk
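The post's claim that an agent loop bills more tokens than a one-shot answer can be illustrated with a rough back-of-the-envelope sketch. Everything here is an assumption made for illustration, not a provider's actual billing logic: the function name `billed_tokens` is hypothetical, the token counts are made up, and the model assumes each loop step re-sends the full accumulated context with no prompt caching or discounts.

```python
def billed_tokens(prompt_tokens: int, output_tokens_per_step: int, steps: int) -> int:
    """Total tokens billed when each step re-sends the whole context so far.

    Hypothetical model: no caching, and input and output tokens are counted alike.
    """
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        # This step is billed for the context it reads plus the output it writes.
        total += context + output_tokens_per_step
        # The next step has to re-read everything produced so far.
        context += output_tokens_per_step
    return total


# Hypothetical numbers: a 1,000-token prompt and 500 output tokens per step.
one_shot = billed_tokens(1000, 500, steps=1)
five_step_loop = billed_tokens(1000, 500, steps=5)
print(one_shot, five_step_loop, round(five_step_loop / one_shot, 2))
```

Under these assumptions the billed total grows faster than the number of steps, because each step pays again for the context built up by the earlier ones. Real bills depend on caching, separate input and output prices, and context management, none of which are modeled here.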
Rethinking AI Bubble
For those worried about the AI bubble bursting: it's not happening, at least for now, and not until at least OpenAI and Anthropic are listed (later this year). And if you discount Nvidia and check the PE of AI companies right now, OpenAI (35x) and Anthropic (13x), these valuations do not really seem unsustainable. Not to mention that, unlike in the dot-com bubble, they have massive data centre infrastructure, so this is not all up in the air. AI is here to stay: it's already altering our lives, taking up workspaces and transforming work. There is a massive upfront cost, but that does not immediately signal a bubble unfolding. If any bubble bursts, it would not be solely the AI bubble; it would be the government bonds and the dollar bubble. Edit: I wrote the post hastily, sorry for writing Valuation/Revenue as PE. submitted by /u/Upstair_Speaker
AI models
Fresh from Bloomberg today: the Pentagon is actively evaluating multiple frontier AI models, especially those from OpenAI and Google’s Gemini, across military theater commands as it moves away from relying heavily on Anthropic’s Claude in classified environments.

The backdrop is a major dispute earlier this year between Anthropic and the Pentagon over contract language tied to “lawful operational use.” Anthropic reportedly pushed back on terms that could permit domestic mass surveillance or fully autonomous weapons without meaningful human oversight. After negotiations collapsed, the Pentagon designated Anthropic a “supply-chain risk” and accelerated efforts to onboard rival models instead.

That triggered a rapid shift toward a multi-vendor AI strategy: OpenAI, Google, Microsoft, Amazon Web Services, NVIDIA, xAI, and others have signed agreements for classified or operational military AI deployments. Google’s Gemini models were recently added to the Pentagon’s internal AI portal, while OpenAI expanded access to models inside classified defense networks.

The Pentagon is now testing how different models respond to identical prompts, especially in ambiguous or high-stakes military workflows. Officials noted the systems “respond differently,” highlighting a major real-world challenge with LLM deployment.

Why this matters:

- Defense agencies increasingly view frontier AI as critical infrastructure, similar to cloud or semiconductors.
- Moving from a single preferred model to multiple vendors improves resilience and bargaining power, but creates major integration and reliability challenges.
- The episode exposed growing tension between commercial AI safety policies and government/national-security priorities.

So far, the biggest beneficiaries appear to be OpenAI and Google, both of which have expanded defense relationships while Anthropic fights the designation in court.

submitted by /u/Annual_Judge_7272
NVIDIA uses a tiered pricing model. Visit their website for current pricing details.
NVIDIA has an average rating of 4.5 out of 5 stars based on 14 reviews from G2, Capterra, and TrustRadius.
Key features include: NVIDIA GTC, Data Center, Artificial Intelligence, Agentic AI, NVIDIA Nemotron 3 Omni.
NVIDIA is commonly used for: accelerating power-flexible AI deployment with Emerald AI; building autonomous agents that perceive, reason, and act on enterprise knowledge; enhancing security in autonomous agents using NVIDIA OpenShell; deploying self-evolving agents with control and governance; using NVIDIA Dynamo 1.0 for large-scale inference; developing robotics and vision AI agents for autonomous vehicles.
NVIDIA integrates with: NVIDIA DGX Station™, NVIDIA DGX Spark™, NVIDIA CUDA-X™, NVIDIA Omniverse™, NVIDIA ALCHEMI, NVIDIA CloudXR 6.0, NVIDIA Dynamo integration with vLLM, NVIDIA integration with Synopsys engineering solutions, Collaboration with T-Mobile and Nokia for 5G edge AI, Partnership with Dassault Systèmes for industrial transformation.
Mistral AI
Company at Mistral AI
2 mentions
Based on user reviews and social mentions, the most common pain points are: LLM costs, cost per token, API costs.
Based on 124 social mentions analyzed, 15% of sentiment is positive, 81% neutral, and 3% negative.
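As a quick sanity check on these figures, the sketch below recomputes the positive share from the 19 positive mentions reported on this page and adds up the three published percentages. The helper name `share_percent` is hypothetical, and whether the page rounds or truncates its percentages is not stated, so nearest-integer rounding is an assumption.

```python
def share_percent(count: int, total: int) -> int:
    """Share of `count` in `total` as a whole-number percentage (assumes rounding)."""
    return round(100 * count / total)


# 19 positive mentions out of 124 analyzed, as reported on this page.
positive_share = share_percent(19, 124)

# The three published shares; each is a whole number, so they need not sum to 100.
reported_shares = {"positive": 15, "neutral": 81, "negative": 3}
print(positive_share, sum(reported_shares.values()))
```

The recomputed positive share matches the published 15%, and the one-point gap in the total is consistent with each share being reduced to a whole number independently.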