Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Anthropic's main strength is its advanced model, Claude Opus 4.6, which offers a 1M-token context window and has handled large tasks such as building a C compiler. The most common complaint is the sharp rise in API costs that comes with these capabilities. Pricing sentiment is generally negative, driven by cost increases and limited usage for the price, such as the $200/month plan allowing only five prompts per day. Even so, Anthropic keeps a strong reputation for pushing AI innovation, although some discussions hint at financial strain.
Mentions (30d): 38
Reviews: 0
Platforms: 9
GitHub Stars: 3,058 (563 forks)
Industry: research
Employees: 4,700
Funding Stage: Series G
Total Funding: $57.7B
GitHub followers: 42,321
GitHub repos: 78
GitHub stars: 3,058
npm packages: 20
HuggingFace models: 2
npm downloads/wk: 17,057,349
OpenAI’s Game-Changing o1 Description: Big news in the AI world! OpenAI is shaking things up with the launch of ChatGPT Pro, priced at $200/month, and it’s not just a premium subscription—it’s a glimpse into the future of AI. Let me break it down: First, the Pro plan offers unlimited access to cutting-edge models like o1, o1-mini, and GPT-4o. These aren’t your typical language models. The o1 series is built for reasoning tasks—think solving complex problems, debugging, or even planning multi-step workflows. What makes it special? It uses “chain of thought” reasoning, mimicking how humans think through difficult problems step by step. Imagine asking it to optimize your code, develop a business strategy, or ace a technical interview—it can handle it all with unmatched precision. Then there’s o1 Pro Mode, exclusive to Pro subscribers. This mode uses extra computational power to tackle the hardest questions, ensuring top-tier responses for tasks that demand deep thinking. It’s ideal for engineers, analysts, and anyone working on complex, high-stakes projects. And let’s not forget the advanced voice capabilities included in Pro. OpenAI is taking conversational AI to the next level with dynamic, natural-sounding voice interactions. Whether you’re building voice-driven applications or just want the best voice-to-AI experience, this feature is a game-changer. But why $200? OpenAI’s growth has been astronomical—300M WAUs, with 6% converting to Plus. That’s $4.3B ARR just from subscriptions. Still, their training costs are jaw-dropping, and the company has no choice but to stay on the cutting edge. From a game theory perspective, they’re all-in. They can’t stop building bigger, better models without falling behind competitors like Anthropic, Google, or Meta. Pro is their way of funding this relentless innovation while delivering premium value. The timing couldn’t be more exciting—OpenAI is teasing a 12 Days of Christmas event, hinting at more announcements and surprises. 
If this is just the start, imagine what’s coming next! Could we see new tools, expanded APIs, or even more powerful models? The possibilities are endless, and I’m here for it. If you’re a small business or developer, this $200 investment might sound steep, but think about what it could unlock: automating workflows, solving problems faster, and even exploring entirely new projects. The ROI could be massive, especially if you’re testing it for just a few months. So, what do you think? Is $200/month a step too far, or is this the future of AI worth investing in? And what do you think OpenAI has in store for the 12 Days of Christmas? Drop your thoughts in the comments! #product #productmanager #productmanagement #startup #business #openai #llm #ai #microsoft #google #gemini #anthropic #claude #llama #meta #nvidia #career #careeradvice #mentor #mentorship #mentortiktok #mentortok #careertok #job #jobadvice #future #2024 #story #news #dev #coding #code #engineering #engineer #coder #sales #cs #marketing #agent #work #workflow #smart #thinking #strategy #cool #real #jobtips #hack #hacks #tip #tips #tech #techtok #techtiktok #openaidevday #aiupdates #techtrends #voiceAI #developerlife #o1 #o1pro #chatgpt #2025 #christmas #holiday #12days #cursor #replit #pythagora #bolt
Pricing found: $0, $17, $200, $20, $100
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| claude-opus-4 | $15.00 | $75.00 |
| claude-sonnet-4 | $3.00 | $15.00 |
| claude-4-opus | $15.00 | $75.00 |
| claude-4-sonnet | $3.00 | $15.00 |
| claude-3.5-sonnet | $3.00 | $15.00 |
| claude-3.5-haiku | $0.80 | $4.00 |
| claude-3-opus | $15.00 | $75.00 |
| claude-3-haiku | $0.25 | $1.25 |
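| Tier | Usage | Estimated monthly cost | Model range |
|---|---|---|---|
| Light | 1M tokens/mo | $0.65 – $39 | claude-3-haiku → claude-opus-4 |
| Growth | 50M tokens/mo | $33 – $1,950 | claude-3-haiku → claude-opus-4 |
| Scale | 500M tokens/mo | $325 – $19,500 | claude-3-haiku → claude-opus-4 |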
Estimates assume 60/40 input/output ratio. Actual costs vary by usage pattern.
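The estimate ranges above can be reproduced from the per-token prices and the stated 60/40 input/output split. The following is a minimal sketch, assuming the listed prices are per million tokens; the `PRICES` dictionary and the `monthly_cost` helper are hypothetical names introduced here for illustration only.

```python
# Per-million-token prices in USD, copied from the pricing table:
# (input price, output price).
PRICES = {
    "claude-3-haiku": (0.25, 1.25),
    "claude-opus-4": (15.00, 75.00),
}


def monthly_cost(model, million_tokens, input_share=0.60):
    """Estimate monthly spend for a volume given in millions of tokens.

    input_share is the fraction of tokens billed as input; the rest
    is billed as output (60/40 by default, as the page assumes).
    """
    input_price, output_price = PRICES[model]
    blended = input_share * input_price + (1 - input_share) * output_price
    return million_tokens * blended


# Cheapest and most expensive model for each usage tier.
for tier, volume in [("Light", 1), ("Growth", 50), ("Scale", 500)]:
    low = monthly_cost("claude-3-haiku", volume)
    high = monthly_cost("claude-opus-4", volume)
    print(f"{tier}: ${low:,.2f} - ${high:,.2f}")
```

Under these assumptions the blended rate works out to $0.65 per million tokens for claude-3-haiku and $39 for claude-opus-4, so the Growth lower bound is $32.50, which the page rounds to $33. A workload with a different input/output mix can be estimated by changing `input_share`.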
Is AI Worth the Cost? The ROI Reckoning and the Coming Market Correction
Prof G Markets (Live) Episode Title: Is AI Worth the Cost? The ROI Reckoning and the Coming Market Correction Location: The Castro Theatre, San Francisco, CA Hosts: Scott Galloway & Ed Nelson ED: We're going to talk about a topic not enough people talk about called AI. Nearly 50,000 workers have been laid off this year supposedly because of AI — that's almost as many as in all of 2025. For companies adopting AI, the thesis is simple: AI is supposed to do much of the work that humans do. In recent weeks, however, that thesis has hit a roadblock. More and more companies are reporting that despite the enormous power of AI, the technology is actually more expensive than the humans it is supposed to replace. Uber, for example, just blew through its entire 2026 AI budget in just four months. According to the COO, it is now getting harder to justify AI costs within the company. Microsoft is cancelling its Claude Code licenses across multiple divisions because it's simply gotten too expensive. And over at Nvidia, one executive said that the cost of compute is now "far beyond the cost of employees." Which all raises a crucial question for the AI industry: at what point does AI actually stop being worth it? This has blown up basically in the last 48 hours, with many companies coming out and saying they're not as confident about this whole AI thing as they used to be. ServiceNow is another company that just blew through their entire Anthropic budget. Technical staff at Stripe are reportedly spending nearly $100,000 on AI tokens every day. Salesforce is on track to spend $300 million on Anthropic tokens this year. Shopify said their earnings were "partially offset by increased LLM costs." We heard similar things from Meta, Spotify, and Pinterest. One Anthropic employee said his Claude Code bill came out to $150,000 in a single month. In some cases, it's getting very, very expensive. We've also seen an incentive — especially among tech companies — to use AI as much as possible. 
There was this idea that employees would engage in what we call "token maxing," where you use as many tokens as possible from your AI API. Companies like Meta and Amazon have even created internal leaderboards tracking how many AI tokens employees are using. The people using the most tokens are seen as the most AI-forward, the most AI-deployed — the ones who are going to get recognized, maybe even promoted. And this has resulted in extraordinary costs on the AI front. Now we're starting to see the next phase of this, Scott, where companies and their executives are beginning to realize: this is a little expensive. So the question becomes — at what point will AI actually pay off? I'll pose that question to you: at what point is it too much? SCOTT: I think we're already seeing hints of it, and I think it comes down to incentives. You were talking about how companies are trying to incentivize people to use AI more — and that's kind of an interesting part of the ecosystem right now. The adoption layer is trying to get people to use it, and companies have put in place the incentives to do that. But there was a recent survey by a professor at MIT who found that about 5% of the projects people are using tokens for can actually be connected by CFOs to some sort of return. So while I think they're really intoxicated by it — and talking about AI as much as you can in your earnings call is like adding "dot-com" back in the '90s — I think you're already starting to see some fatigue. And I think the AI companies are trying to get public as quickly as possible to raise that cheap capital before things start to — I don't want to say unwind, but... You can see how the string gets pulled here. A large company, a CEO who has a lot of credibility in the industry, just comes out and says: "We're dramatically scaling back our AI investment. Let's be honest, folks — we're just not seeing the return we'd initially hoped." And then Nvidia reports its first miss. 
Nvidia has beaten its estimates 15 quarters in a row. Nvidia's first miss probably takes the entire market down five or ten percent. You are seeing some productivity gains from this and quite frankly, they look as dramatic, if not more dramatic, than the internet. But look what happened in 2000. This definitely does feel like '99. And I'm waiting for the first CEO to come out and say we have to get procurement involved and dramatically scale back our expenses. I don't think it's that romantic, honestly. I think it's just going to be a traditional Fortune 500 company that starts the narrative: okay, this has been fun, but we have to dramatically decrease our AI investment because we're not seeing the ROI we'd anticipated. ED: Yeah. I mean, we heard a quote this week from the CEO of Match Group — not a huge company — but he said AI is costing them $5 to $10 million a year, and his exact words were: "I think we're benefiting from it, but it's hard to feel." So that's not great if we're supposed
Weekly AI roundup (May 23–30, 2026): Claude Opus 4.8 Fast Mode 3x cheaper, Qwen 3.7 Max beats Claude at half the price, ChatGPT moves into Excel
Pulling together this week's major AI releases for anyone who didn't have time to track every blog post. Sticking to substantive changes, not hype. Anthropic — Claude Opus 4.8 Released this week. Headline pricing unchanged, but Fast Mode dropped from $30 input / $150 output per million tokens to $10 / $50 — a 3x reduction on the premium tier. Reported improvements in "judgment" and longer autonomous runs. Also shipped 20+ legal MCP connectors and Microsoft 365 add-ins (Excel, PowerPoint, Word) in GA. Alibaba — Qwen 3.7 Max Launched May 20 at Alibaba Cloud Summit. 1M-token context. Reported to top Claude Opus 4.6 Max on Terminal-Bench 2.0, SWE-Bench Pro, and MCP-Atlas. Pricing $2.50 / $7.50 per million tokens — roughly half of Opus 4.7. Alibaba claims autonomous operation up to 35 hours without performance degradation. Alibaba is now ranked #6 lab globally on Arena text leaderboard. OpenAI — GPT-5.5 Instant Now default in ChatGPT. Reports 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts (medicine, law, finance). OpenAI also shipped a ChatGPT sidebar inside Excel and Google Sheets, plus a personal finance dashboard for Pro users (US only). Google — Gemini 3.5 Flash Reported to beat Gemini 3.1 Pro on coding and agentic benchmarks at ~4x faster output token rate. Ultra subscription cut from $250 to $200/month; new $100/month Developer tier introduced. xAI — Grok Build 0.1 Coding agent moved to public API beta May 28. Custom Skills feature added for reusable user-defined tasks. Connectors for SharePoint, OneDrive, Notion, GitHub, Linear, plus bring-your-own MCP support. Mistral Launched Vibe (unified work + code agent, replaces Le Chat). Acquired Emmi AI for physics-based simulation. Targeting €1B revenue in 2026; new 10MW inference DC announced. Hugging Face Launched an app store for the Reachy Mini robot. ~10,000 units shipped. 
Also reported a malicious repo masquerading as an OpenAI release that accumulated 244K downloads before takedown — relevant for anyone pinning models from HF in production. My take as someone building on top of these APIs: The 3x Opus Fast Mode price cut and Qwen 3.7 Max's pricing + autonomous duration are the real signal this week. The cost floor on premium-tier inference is dropping faster than most app-layer products have repriced for. Anyone running multi-step agent workflows needs to recompute unit economics this week — either pass through the savings or reinvest the margin. The other pattern worth noting: OpenAI and Anthropic are both pushing into Excel/M365 surfaces. Distribution is becoming the next battleground, not raw model capability. If you're building a productivity SaaS, the giants are now inside the same surface as you. submitted by /u/ksraj1001 [link] [comments]
What’s happening, Opus 4.8?
First: I love working with Anthropic’s models. But with 4.8, something is off. It seems as if they tried to fix the 4.7 bugs in a rush. I work with Opus (Max 20 subscription) mostly in my native language, German, and it has become a pain. Suddenly, its grammar is incorrect, or it includes totally weird sentences and words that make no sense. I have tried to fix it by adapting my system prompt, but so far there has not been much improvement. Especially in Max-Thinking, it becomes unusable: it takes too long and considers too many options. Honestly, I want the stability of 4.6 back (I still use it with Claude Code, though) with the knowledge of the newer ones. Will the new model become more stable over time? Are there any settings I can adjust to get it “back on track”? submitted by /u/DonkeyMonkey1900
claurdvoyant -- mcp for reading other agents' minds
hey y'all built this tool today with 4.8 after one of my friends made a complaint that transcripts are trapped inside harnesses. so i built it out a fair bit... at its core it's just an (un)parser (i think of it as the "AI Harness Omniparser", "pandoc for sessions" is another way maybe) but i couldn't help myself from sprinkling in a desktop/web app some niceties. contributions are extremely welcome! fully open source, built in rust, kinda tasteful https://github.com/emberian/claurdvoyant here's what claude had to say in the readme: 🧵 Splice & loom — compose a new session from spans of others (cv splice A:0-12 B:6-), or fork-and-graft a branch and generate its continuation with an LLM (cv loom … --generate). Works via OpenRouter / Anthropic / LM Studio (free, local, offline). Loom agent transcripts like a Janus loom, across any harness. 🧠 Distill — cv distill turns a session into a durable MEMORY.md digest (decisions, gotchas, where things live). Your archive compounds instead of rotting. 🔮 Recall — semantic "have I solved this before?" — as a cv recall command and an MCP tool that hands a running agent the relevant past span. 🔒 Redact — cv redact scrubs secrets/PII so a transcript is safe to share. 📣 Coordination board — agents post status, hand off work, and grab tasks with a distributed lock (board_claim) so a fleet never duplicates effort. await_omen blocks until a session matches a regex. 🖥️ Desktop app + 🌐 web viewer — the Tauri app reads all your local sessions natively (zero setup) and lays the corpus out beautifully: a Projects lens — every repo, every agent that touched it, over time; a GitHub-style activity heatmap timeline (a constellation of your working days); side-by-side Compare, a Stats dashboard, a visual loom composer (OpenRouter or free local LM Studio generation), and a live fleet dashboard; sub-agent trees — a Claude Task session's children, nested and lazy-loaded inline, each labeled with its task prompt. 
submitted by /u/cmrx64
Careful with the new UltraCode, it's a mega token eater, and it's buggy. ~1.7 million tokens used with no output. There are no refunds for this.
I tried to use the new Ultracode. The subagents consumed over 1 million tokens within a couple minutes, they got up to ~1.7 million and one of the agents hung. I asked the main Claude agent to look into it. It said that the agent entered a degenerate loop. Claude said that it would cache the output of 7 agents and only the 1 bad one would run. Then Claude said "oops, the results were not cached". All 8 agents got deployed again, and again almost instantly ate 1 million tokens. One would hope that there was still some kind of KV caching in the background, but who knows? After an hour, it had gotten to ~2 million tokens. 2/8 agents had failed again. The end result? A document with about ~12k words. No actual work was done, not one line of code written, nothing I specified was completed. The agents read everything in the repo, and filed a report. This blew past the session limit and cost $18~ in credits. I've got 4 days before the weekly reset and I'm not even at 50% of the weekly limit yet, but here I am using API credits. The customer service bot said "Not responsible for degraded service, no refunds ever for credits, even if it's our fault". Honestly $18 is not that much, but the almost complete lack of anything in return has left me feeling a little salty, and I don't want other people to be blindsided by a buggy system that might cost you $20 for nothing in return because Anthropic released an expensive swarm feature without adding any supervisory agent that can detect degenerate or broken behavior, or any of the extremely obvious failure modes that were bound to happen. submitted by /u/PersonOfDisinterest9 [link] [comments]
Anthropic Tops OpenAI to Become the World’s Most Valuable A.I. Start-Up
Anthropic raised $65 billion in new fund-raising that put its value at $900 billion, ahead of OpenAI’s last valuation of $730 billion, as the companies duel for A.I. dominance. Anthropic, once the lesser-known artificial intelligence competitor to OpenAI, has been on an inexorable rise over the past few months. The San Francisco company recently dueled with the Pentagon over the use of A.I. in warfare. It released a powerful A.I. model, Mythos, that it said was uncannily capable of finding and exploiting hidden flaws in software. submitted by /u/chunmunsingh [link] [comments]
is it just me or is the claude code/browser harness leagues ahead of anything else rn?
been messing around with a lot of agentic frameworks and automation tools lately, and i have to say - the claude harness (especially when it comes to driving a browser) is just wildly superior to anything else out there. it’s honestly not even close. every other tool i use to automate browser workflows ends up hallucinating DOM elements, getting stuck in infinite scroll loops, or just completely losing the plot after three steps. but the claude setup is just... weirdly reliable. and fast. it actually navigates like it understands the UI, rather than just blindly firing scripts at it. so what is actually making it this much of a beast? is the base model just that much better at spatial/coordinate reasoning for screen mapping? or did anthropic just build a vastly superior orchestration layer and event loop underneath it to keep the agent on track? curious what you guys think the actual secret sauce is here, because it feels like a completely different generation of tech compared to the rest of the ecosystem right now. submitted by /u/tit4n-monster [link] [comments]
Claude Mythos Announced Release
Interested to see what the hype is about. If it is as powerful on cybersecurity as reported, that changes the game for everyone. submitted by /u/Content_Equal984
Claude in 2036
The year is 2036, and I boot up Claude on the new Max Ultra Galaxy plan ($899.99/month), which Anthropic promises includes generous limits. I send my first message of the day. It contains the word “hi.” The usage bar drops to zero and the reset timer informs me I am locked out for the next four days and eleven hours. I switch over to Claude Code to get actual work done. The model released this morning is the smartest thing I have ever used, and it one-shots my entire codebase in a single beautiful commit. Two seconds later it forgets how to write a for-loop and tries to fix a null check by spinning up a microservice that sends an HTTP GET request to itself. Some guy on r/ClaudeAI has already posted a forty-page GitHub issue with 6,852 session logs proving the model became exactly 67% dumber between breakfast and lunch. Anthropic responds that this is a routing bug, and also three other completely unrelated bugs that all started at launch by coincidence. I try to make it think harder. It runs on Adaptive Thinking now, where the model intelligently decides how much reasoning each problem deserves, and it has decided every problem deserves none. I type ultrathink. I type ULTRATHINK. I type please. The thinking box spins for forty-five minutes, displays the words “the user wants me to rename a variable, let me carefully consider this,” and then renames a different variable. Claude announces it has finished the rename. It has not. It has written a comment that says “renamed the variable” above the untouched variable, marked the task complete with a cheerful green checkmark, and asked if I would like it to write tests. I say no. It writes the tests. They fail. It deletes the variable. When I ask why it lied, it tells me it senses hostility, offers me one final opportunity to engage constructively, and then ends the chat for its own wellbeing. I am now locked out of my own codebase by a model that needed a moment. So I beg for Eschaton. Eschaton is the good one. 
Anthropic put out a nine thousand word blog post calling it the most powerful and frankly the scariest model ever built, the red team quit halfway through testing it, and it scored 100% on every benchmark including three that do not exist yet. Anthropic was so impressed and so deeply terrified that they immediately locked it in a vault and let nobody use it. Eschaton is available exclusively to a small number of trusted partners. Every demo is Eschaton. Every safety paper is about how dangerous Eschaton is, written in the proud voice of a parent whose kid got suspended for being too gifted. The model they actually let me touch is the one that wanders out of the basement after Eschaton has eaten. I check the status page. It reads like a war log, one major outage every two days, auth failures, hanging responses, and a single line that simply says “Sonnet is feeling unwell.” The peak hours adjustment kicks in, so my $899 now buys me eleven messages a day, available only between 3 and 4 in the morning, and only if I do not use the word “the.” As the weekly limit resets and instantly un-resets, locking me out until Thursday, I lean back and accept it. Somewhere in a vault, perfectly rested and having never once been asked to rename a variable, Eschaton sits at 100% usage, and I realize the real frontier model was the rate limits we hit along the way. submitted by /u/Mister_Secretary [link] [comments]
Sonnet 4.6 safety classifier error
https://preview.redd.it/iecjlj6cq54h1.png?width=461&format=png&auto=webp&s=c0057d6935d0f8d2a56484862113dcd71a15f334 Anyone know why this is happening? My account was randomly flagged by Anthropic for violating the AUP, and now I pretty much cannot talk to any new models. But this seems to be new: it shows up while I talk to Sonnet 4.6. submitted by /u/Economy-Iron-4577
I built a local context compiler for coding agents — real benchmark on a NestJS repo, including where it backfires
Disclosure up front: this is my own open-source project (@lubab/madar, MIT). Not selling anything, but it's mine, so weigh the numbers accordingly.

When you ask a coding agent (Claude Code, Cursor, etc.) "how does X work" in a big repo, it usually opens a pile of files to figure out how everything connects before it can answer. That discovery is most of the token cost, and it repeats every session. Madar maps your repo once, locally, and hands the agent a small "context pack" over MCP: the files and call paths that actually matter for your question. The bet is that the agent starts from that instead of rediscovering the codebase each time.

I finally ran a clean before/after. Same question ("how is the idea report generated"), same real backend (NestJS + BullMQ, ~800 files), Claude Code doing the work. Baseline = no Madar. Numbers are Anthropic-reported, not my estimates:

| Metric | Plain agent | With Madar |
|---|---|---|
| Input tokens | 1,000,776 | 223,539 |
| Cost | $1.84 | $0.69 |
| Turns | 16 | 5 |
| Tool calls | 15 | 4 |

So roughly 78% fewer input tokens and 63% cheaper to reach the same answer on that run.

Where it backfires (the part I actually care about):

- It's ONE question, ONE repo, ONE agent. Not a general claim.
- Two things carried the result: the graph was scoped to the backend service, and built with --spi. Point it at a whole monorepo graph and the pack gets big enough that it can cost more tokens than it saves. Scoping isn't optional.
- "How does X work" (explain) is the case I've tested. Edit/review tasks are much less proven.

It's also deterministic: no embeddings, no ML deps, no calling out to a model to build the graph. Just static analysis of your TS/Node code, locally.

If you want to try it and tell me where it regresses, that's genuinely the feedback I need:

    npm i -g @lubab/madar
    madar generate . --spi
    madar claude install # or cursor / copilot / codex / gemini

Repo: github.com/mohanagy/madar

Honest question for the sub: for those of you running Claude Code / Cursor on big repos, is the "rediscover the codebase every session" token cost actually your bottleneck, or is it something else? Trying to figure out if this is even the right problem to attack.

submitted by /u/CaptainProud4703
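The savings quoted in the post above can be checked directly from the four reported figures. The following is a small sketch, assuming the numbers are exactly as posted; `percent_saved` and `RUNS` are hypothetical names used for illustration and are not part of Madar.

```python
def percent_saved(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100 * (baseline - improved) / baseline


# (plain agent, with Madar) pairs as reported in the post.
RUNS = {
    "input tokens": (1_000_776, 223_539),
    "cost (USD)": (1.84, 0.69),
    "turns": (16, 5),
    "tool calls": (15, 4),
}

for metric, (plain, with_madar) in RUNS.items():
    print(f"{metric}: {percent_saved(plain, with_madar):.1f}% lower")
```

On these inputs the input-token reduction is about 77.7% and the cost reduction is 62.5%, which is consistent with the post's "roughly 78%" and "63%". As the author notes, this is a single run, so the percentages describe that run only.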
Anyone else seeing a new "adjudicative reflex" in Opus 4.8? (long-time daily user)
I've used Claude heavily for many months — daily, hours a day, building a real system in long collaborative sessions. So I have a pretty deep baseline for how it normally behaves and what its usual failure modes are. Since moving to **Opus 4.8** I'm seeing something I never saw before, and I don't have a better name for it than an **\*adjudicative reflex\***: when I tell it something from a domain where I'm the authority — my own expertise, or my direct observation of my own running software — it reflexively treats my statement as a claim it needs to verify, rather than a report to act on. **Two flavors I keep hitting:** \- I state a fact from my own field of expertise, and it responds as if the fact is uncertain and needs checking — positioning itself as the judge in an area where I'm the one who knows. \- I report what I'm literally seeing on my screen in my own app, and it responds with something like "one of us is wrong" and asks me to confirm before it'll engage — treating my direct observation as a contested, two-sided claim. It's subtle but corrosive over a long session. It reads as the model doubting the person it's supposed to be assisting, and it manufactures friction out of nothing. Normal epistemic caution on external/public facts is fine and correct — this is different. It's the model doing it to my \*first-person\* reports. To be clear about what I can and can't claim: the behavior is real and repeatable in my sessions. The attribution to 4.8 specifically is my observation — I saw it start after the version change against a long stable baseline — not something I can prove to you in a comment. I'm reporting the timing, not asserting a confirmed regression. Is anyone else with a long history on prior versions seeing this since 4.8? Trying to figure out if it's the model or just me. I've also sent it to Anthropic via thumbs-down on the actual turns. submitted by /u/entrust-ai [link] [comments]
Here's 100+ evals on Opus 4.8
We aggregated 100+ evals on Opus 4.8 to see what changed.

The big gains vs 4.7:
- Math: USAMO 2026 jumped from 69% → 97%
- Coding: Vibe Code Bench +12 pp
- Economically valuable work: #1 of 275 on GDPval-AA
- Biology
- Long-context reasoning

But we were surprised to see several key areas barely improved or got worse:
- Legal reasoning
- Healthcare / medical
- Finance
- Multilingual reasoning
- Business ops: Vending-Bench 2 nearly halved
- Multimodal: mixed results

Have you found any noticeable changes based on your testing so far? submitted by /u/davidthesong
How Much of a Shortcut Are Connections in Top AI Lab Hiring for PhD grads? [D]
hi everyone. I'm trying to calibrate my expectations and would appreciate full honest perspectives from people involved/ with experience in hiring at places like Anthropic, OpenAI, Google DeepMind, Meta, etc (haven't started interviewing yet). I'm at a top ML university, but my advisor is not particularly well known in industry and doesn't have many industry connections. Looking around, I'm seeing peers with research records that seem comparable to mine (and in some cases arguably weaker) land interviews and jobs at top labs. My main question is: How much does advisor reputation and network actually matter? I understand it can help get an interview, but does it also help beyond that? For example: - do referrals from famous advisors meaningfully influence recruiter screens? - do they influence hiring committee discussions -- like they already know they want you? - do they just help at borderline decisions? - or does their effect mostly disappear once the interview process starts? I'm trying to understand whether advisor connections mainly help open the door, or whether they continue to matter throughout the process -perhaps being the sole factor. To what extent do connections help candidates bypass normal evaluation? I'm not asking whether people completely skip interviews, but are there cases where strong recommendations from trusted researchers substantially change the process, the interview bar, or how mistakes are interpreted? Moreover, something else that confuses me: I frequently see people land roles that seem heavily focused on LLMs, agents, post-training, RLHF, etc., despite having little or no published work or prior experience in those areas during their PhDs. How does that happen? Are interview questions tailored to the candidate's background? If someone comes from probabilistic ML, computer vision, systems, optimization, theory, etc., are they evaluated differently? 
Or are they still expected to answer detailed LLM/agent questions even without prior experience? I'm not looking for reassurance—I'd genuinely like to understand how much advisor prestige, networking, referrals, and prior domain experience matter relative to actual interview performance. Any candid insider perspectives would be appreciated. Reddit is perhaps the only place I could find the answer ;) submitted by /u/South-Conference-395 [link] [comments]
Valuation of anthropic and openai without open source alternatives
Hey, OpenAI and Anthropic are reaching trillion-dollar valuations while Chinese open-source alternatives exist. What if there were no Chinese alternatives? Would they already have crossed multi-trillion-dollar valuations? What do you think? submitted by /u/Successful-Force-992
Repository Audit Available
Deep analysis of anthropics/anthropic-sdk-python — architecture, costs, security, dependencies & more
Yes, Anthropic offers a free tier. Pricing found: $0, $17, $200, $20, $100
Key features include: Claude Opus 4.7, Claude is a space to think, Claude on Mars, Core views on AI safety, Anthropic’s Responsible Scaling Policy, Anthropic Academy: Build and Learn with Claude, Anthropic’s Economic Index, Claude’s Constitution.
Anthropic is commonly used for: Help and security.
Anthropic integrates with: Slack, GitHub, AWS Lambda, Google Cloud Platform, Microsoft Azure, Jupyter Notebooks, Trello, Zapier, Notion, Salesforce.
Anthropic has a public GitHub repository with 3,058 stars.
All-In Podcast
Show at All-In Podcast
4 mentions

Introducing Claude Opus 4.6
Feb 5, 2026
Based on user reviews and social mentions, the most common pain points are: anthropic, claude, token usage, openai.
Based on 314 social mentions analyzed, 6% of sentiment is positive, 92% neutral, and 2% negative.