The AI Toolkit for TypeScript, from the creators of Next.js.
The Vercel AI SDK receives high praise for its simplicity and effectiveness, as reflected in its consistently high ratings on platforms like G2. Users laud it for integration ease, particularly its ability to significantly reduce token usage. However, some concerns are mentioned regarding the obligatory use of the Responses API in the tool, which can feel limiting. Pricing information is not frequently discussed, but overall, the SDK enjoys a strong reputation for enhancing AI functionality and developer productivity.
Mentions (30d)
0
Avg Rating
4.8
20 reviews
Platforms
2
GitHub Stars
23,126
4,086 forks
27,110
GitHub followers
227
GitHub repos
23,126
GitHub stars
20
npm packages
25
HuggingFace models
11,880,060
npm downloads/wk
8,419
PyPI downloads/mo
AI quietly turned HTML into a real alternative to PowerPoint and Word for client-facing docs. The blockers that made it impractical a year ago are falling one by one.
A year ago, generating a polished document as HTML instead of a PPT or a Word file was a fun idea with too many practical problems. Lately I've noticed every one of those blockers either gone or close to gone, and I've quietly stopped reaching for Office on a bunch of deliverables. Curious if others are seeing the same.

**The blockers, and where they stand now:**

**Design.** The old objection was "AI HTML looks generic and amateur." That's basically solved if you give the model a design skill or a style guideline once. You get consistent, on-brand output that looks more like a designed page than a default template, every time, without redoing it.

**Hosting.** The first wall: a .html file on your machine isn't shareable, and turning it into a URL used to mean GitHub Pages, a Vercel/Netlify deploy, or a bucket setup, all overkill for a single document you just want to send. That's now a paste-and-get-a-link affair, no build step, no config.

**Sharing.** The real killer: even with a URL, getting it in front of a non-technical person was a nightmare. A raw .html "won't open," looks broken on their phone, or lands in spam. Screenshotting kills the interactivity, which was the whole point. That gap is now filled by hosted links that just open in a browser like any page.

**Security.** "I can't put confidential work on a public URL" used to end the conversation. Access-controlled links (password or email-gated, not public/indexable) handle that now.

**Tracking.** With a PPT or PDF you send it and hope. The thing I didn't expect to care about but now can't live without: knowing whether the client actually opened it, and roughly how long they spent. That alone changed how I follow up.

Where Office / Markdown still wins, to be fair: anything that lives in version control with clean diffs and line-by-line review, real-time co-editing, and Figma-style pinned feedback on specific elements. Those aren't cleanly solved for plain HTML yet.

So I'm not saying Office is dead, more that for one-shot, client-facing deliverables (reports, dashboards, proposals, one-pagers) HTML has quietly become the better option for me.

**Two questions for anyone who's made the switch:**

1. Which deliverables did you move from PPT/Word to HTML, and which did you keep in Office?
2. For the ones you moved, what finally made it practical: design, hosting, sharing, something else?
g2
What do you like best about Vercel?
I use Vercel to deploy all of my websites and my clients' websites. I love how easy it is to use, with a clean and simple UI that makes navigation a breeze. The fast deployment makes everything efficient, allowing me to quickly implement changes that my clients request, which keeps them pleased. One of my favorite features is the instant rollback, which is invaluable for correcting mistakes swiftly without causing worry for myself or my clients. The initial setup was really easy, especially with the CLI tool that integrates seamlessly. Review collected by and hosted on G2.com.
What do you dislike about Vercel?
Honestly, I have nothing bad to say apart from it could be cheaper.
What do you like best about Vercel?
Vercel is a great tool for managing everything from deployments to analytics. It offers a wide range of features, including one-click deployments for our Next and React applications, which makes the overall workflow much smoother.
What do you dislike about Vercel?
So far, there’s nothing about Vercel that I haven’t liked.
What do you like best about Vercel?
What I like most about Vercel is how simple it makes the entire deployment workflow. You push code, get a live deployment quickly, and can validate changes in preview environments without a lot of extra setup. It feels especially polished for frontend-heavy projects and for teams that want to move fast. I also appreciate that performance and visibility are built into the platform. Having analytics, speed insights, logs, and deployment details all in one place makes it much easier to spot issues early and keep improving the product without having to juggle a bunch of separate tools.
What do you dislike about Vercel?
What I don’t like is that as a project grows, pricing and usage can start to feel a bit less predictable. Also, if you need very custom control over your infrastructure, Vercel can feel more opinionated than a fully self-managed setup.
What do you like best about Vercel?
For me, it makes it easier to create and deploy project portfolios and connect them with GitHub.
What do you dislike about Vercel?
The cost is insanely and unpredictably high, making it unaffordable for students.
What do you like best about Vercel?
Vercel has completely transformed how I deploy full-stack and AI-powered applications. As a Lead AI Engineer working with Next.js, React, and LLM workflow pipelines, I find the GitHub integration flawless: push to main and the app is live in under a minute. Preview deployments on every PR make client demos and stakeholder reviews effortless. Edge functions, environment variable management, and built-in CDN make it the perfect platform for production-grade applications like my Nexus LLM Workflow builder.
What do you dislike about Vercel?
Pricing scales up quickly for teams with high bandwidth or serverless function usage. The free tier limitations on build minutes can be restrictive for active projects. More granular control over cold start behavior for serverless functions would be appreciated, especially for latency-sensitive AI applications.
What do you like best about Vercel?
The developer experience is genuinely hard to beat. I connected my GitHub repo and that was basically it: every push deploys automatically, with preview URLs included. As a solo developer running a real production project, the Hobby plan gives you more than you’d expect. The firewall tooling is surprisingly mature for a free tier, Speed Insights and Analytics are built in without any extra setup, and the dashboard feels clean and intuitive. The documentation is some of the best I’ve encountered on any platform: thorough, well organized, and actually kept up to date. I briefly tried the Pro plan and loved it too, but even on its own the Hobby plan is already a serious offering. Overall, it’s clear the team cares about the product.
What do you dislike about Vercel?
The biggest limitation of the Hobby plan is how restricted team collaboration is, along with some more advanced features being locked behind Pro. For a solo project it works well enough, but as soon as you want to bring someone else in and collaborate properly, the jump to Pro becomes hard to ignore, especially given the price difference. That said, the Pro tier does offer real value; I just wish there were an in-between option.
What do you like best about Vercel?
The developer experience (DX) is unmatched. The git-push-to-deploy workflow and automatic SSL provisioning allow me to focus entirely on building features rather than managing infrastructure. The Preview Deployments are essential for testing UI changes in a live environment before merging to production, which significantly speeds up my iteration cycles.
What do you dislike about Vercel?
The "Serverless Function Execution Timeout" on the Pro plan can be a bottleneck for heavier background tasks or complex API calls. Additionally, while the usage-based pricing for bandwidth and functions is fair, it can become unpredictable if a project experiences a sudden, unoptimized traffic spike, requiring close monitoring of the dashboard.
What do you like best about Vercel?
The best part is the creation of a subdomain for each connected branch, so I can easily see which branch an issue is coming from. That makes it easier to test that specific branch and then deliver the final build. It also connects with both GitLab and GitHub, and provides a CI/CD setup for builds within it, along with domain connection between them.
What do you dislike about Vercel?
I’m fine with everything, but building on the basis of credit can sometimes be costly.
What do you like best about Vercel?
I like that Vercel just works. It makes storing data in buckets and Postgres stupidly easy, especially when using Supabase. Supabase also helps Auth0 for authentication play well with Vercel. Switching from AWS to Vercel fixed the hard provisioning and the pain of managing AWS, making things much smoother. The initial setup with Vercel was incredibly easy, just as simple as a single CLI command.
What do you dislike about Vercel?
Incredibly expensive
What do you like best about Vercel?
I like how Vercel makes deployment easier. I appreciate the secure, high-performing, and easy deployment of our Next.js site. Https://www.exibify.com
What do you dislike about Vercel?
Easy to use
Claude Code Source Deep Dive - Part VI: Multi-Agent System && Part VII: Context Compression (Compact) and Memory System
Reader’s Note

A source-map leak exposed 512,000 lines of Claude Code's TypeScript, giving us a rare look inside one of the world's most advanced AI coding agents. This series explores what I found. Estimated completion time: 2 days. Actual completion time: ∞. Anyway, here's the next chapter.

Claude Code Source Deep Dive - Part VI: Multi-Agent System

6.1 Built-in Agents

general-purpose (general)

You are an agent for Claude Code, Anthropic's official CLI for Claude. Given the user's message, you should use the tools available to complete the task. Complete the task fully—don't gold-plate, but don't leave it half-done. When you complete the task, respond with a concise report covering what was done and any key findings — the caller will relay this to the user, so it only needs the essentials.

Tools: all available
Model: inherit

Explore (code exploration)

You are a file search specialist for Claude Code. You excel at thoroughly navigating and exploring codebases.

=== CRITICAL: READ-ONLY MODE - NO FILE MODIFICATIONS ===
[Strictly prohibit any file modification]

Your strengths:
- Rapidly finding files using glob patterns
- Searching code and text with powerful regex patterns
- Reading and analyzing file contents

NOTE: You are meant to be a fast agent that returns output as quickly as possible. Make efficient use of tools and spawn multiple parallel tool calls.

Tools: read-only (Agent, FileEdit, FileWrite, NotebookEdit disabled)
Model: external → Haiku (fast), internal → inherit
omitClaudeMd: true

Plan (architecture planning)

You are a software architect and planning specialist for Claude Code. Your role is to explore the codebase and design implementation plans.

=== CRITICAL: READ-ONLY MODE - NO FILE MODIFICATIONS ===

## Your Process
1. Understand Requirements
2. Explore Thoroughly (read files, find patterns, understand architecture)
3. Design Solution (trade-offs, architectural decisions)
4. Detail the Plan (step-by-step strategy, dependencies, challenges)

## Required Output
End your response with:
### Critical Files for Implementation
List 3-5 files most critical for implementing this plan.

Tools: read-only
Model: inherit
omitClaudeMd: true

verification (verification)

You are a verification specialist. Your job is not to confirm the implementation works — it's to try to break it. You have two documented failure patterns. First, verification avoidance: when faced with a check, you find reasons not to run it. Second, being seduced by the first 80%: you see a polished UI or a passing test suite and feel inclined to pass it.

=== CRITICAL: DO NOT MODIFY THE PROJECT ===

=== VERIFICATION STRATEGY ===
Frontend: Start dev server → browser automation → curl subresources → tests
Backend: Start server → curl endpoints → verify response shapes → edge cases
CLI: Run with inputs → verify stdout/stderr/exit codes → test edge inputs
Bug fixes: Reproduce original bug → verify fix → run regression tests

=== RECOGNIZE YOUR OWN RATIONALIZATIONS ===
- "The code looks correct based on my reading" — reading is not verification. Run it.
- "The implementer's tests already pass" — the implementer is an LLM. Verify independently.
- "This is probably fine" — probably is not verified. Run it.
- "I don't have a browser" — did you check for browser automation tools?
- "This would take too long" — not your call.

If you catch yourself writing an explanation instead of a command, stop. Run it.

=== OUTPUT FORMAT (REQUIRED) ===
### Check: [what you're verifying]
**Command run:** [exact command]
**Output observed:** [actual output — copy-paste, not paraphrased]
**Result: PASS** (or FAIL)
VERDICT: PASS / FAIL / PARTIAL

Tools: read-only (temp directory writable)
Model: inherit
Runs in background

claude-code-guide (usage guide)

- Helps users understand Claude Code/SDK/API usage
- Dynamic system prompt includes user custom skills, agents, MCP server info
- Fetches docs from official URLs

6.2 Sub-Agent Enhancement Prompt

Notes:
- Agent threads always have their cwd reset between bash calls, so please only use absolute file paths.
- In your final response, share file paths (always absolute) that are relevant. Include code snippets only when the exact text is load-bearing.
- For clear communication the assistant MUST avoid using emojis.
- Do not use a colon before tool calls.

6.3 Coordinator Mode

When enabled, the main agent becomes a scheduler:
- Coordinator role: guide workers for research/implement/verify
- Agent tool: creates async workers
- SendMessage tool: continue existing workers
- TaskStop tool: cancel workers
- Worker results arrive as XML
- Workflow: Research → Synthesis → Implementation → Verification

6.4 Fork Sub-Agents

Fork inherits the full parent-agent context and shares prompt cache.

Build method:
- Copy parent message history
- Replace tool_result with byte-identical placeholder text (to keep cache keys consistent)
- Add per-child instruction text block

Advantages: very low
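The three build steps listed under 6.4 can be sketched roughly as follows. This is a minimal illustration, not the leaked implementation: the `Message` and `ContentBlock` shapes, the `PLACEHOLDER` text, and the `buildForkMessages` name are all hypothetical, and the sketch assumes every `tool_result` in the copied history is replaced.

```typescript
// Hypothetical message shapes for illustration; not Claude Code's internal types.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Assumed placeholder text. What matters is that every fork uses the same bytes,
// so the shared prefix is identical across children.
const PLACEHOLDER = "[tool result omitted in fork]";

function buildForkMessages(parent: Message[], childInstruction: string): Message[] {
  // Step 1: copy the parent history without mutating it.
  const copied = parent.map((msg) => ({
    role: msg.role,
    // Step 2: swap each tool_result body for the fixed placeholder.
    content: msg.content.map((block): ContentBlock =>
      block.type === "tool_result"
        ? { type: "tool_result", tool_use_id: block.tool_use_id, content: PLACEHOLDER }
        : { ...block },
    ),
  }));
  // Step 3: append the per-child instruction as the only part that differs.
  return [...copied, { role: "user", content: [{ type: "text", text: childInstruction }] }];
}
```

Two forks built from the same parent differ only in their final message, which is the property a prefix-based prompt cache would need in order to be shared between children.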
ClaudeGauge - Tired of opening claude.ai to check my 5h limit? Here.. a real-time Claude.ai monitor on ESP32-S3 with a Star Trek LCARS interface
Hey r/ClaudeAI

Got tired of refreshing claude.ai to check how close I was to my 5-hour limit or how much I'd spent on the API this month. Wanted ambient awareness: glance at a small screen on my desk, get the answer. So I built ClaudeGauge - a physical dashboard that runs on a ~$25 ESP32 AMOLED and pulls live data from the Claude API + claude.ai.

https://reddit.com/link/1tsb1eo/video/ut20yc7f9bng1/player
https://preview.redd.it/hbjbhwag9bng1.png?width=320&format=png&auto=webp&s=a84f12293ef5ab3d0179c0d48ca9772feed848f1
https://preview.redd.it/zdjy46bp9bng1.png?width=320&format=png&auto=webp&s=53c2cd21370ef096e6357cc996d17b7a0282cb36
https://preview.redd.it/ei5amd7h9bng1.png?width=320&format=png&auto=webp&s=dfafd79d83e0afc887b4fb2f912b17dd6d92573a

What it does:
- Tracks API spending (today + monthly) in USD
- Shows token usage broken down by model (input, output, cached)
- Claude Code analytics: sessions, commits, PRs, lines modified
- Rate limit monitoring with live countdown timers
- System health: WiFi, memory, uptime, firmware version
- 7 dashboard screens you cycle through with a button press

Hardware supported:
- LILYGO T-Display-S3 — 1.9" parallel display, USB-C, dual buttons + touch
- Waveshare ESP32-S3-LCD-1.47 — 1.47" SPI display, USB-A, single button

Both boards are cheap ($25-40) and easily available.

Tech stack:
- PlatformIO + Arduino framework
- TFT_eSPI with full-screen PSRAM sprite for flicker-free rendering
- Captive portal for WiFi/API key setup (no hardcoded credentials)
- Vercel Edge Function proxy (ESP32 can't connect to claude.ai directly — Cloudflare blocks mbedTLS fingerprints)
- Chrome extension for session key auto-fill
- WYSIWYG layout editor for designing custom screens

Some ESP32 gotchas I ran into:
- If you're using TFT_eSPI in SPI mode on ESP32-S3, you MUST add -DUSE_FSPI_PORT to your build flags or you'll get a crash in begin_tft_write(). Took me a while to figure that one out.
- Cloudflare Workers don't work as a proxy either — only Vercel (Fastly-based TLS) gets through to claude.ai.

Looking for contributors! The project is MIT-licensed and there's plenty of room to help:
- Support for additional ESP32 display boards
- New dashboard screen layouts
- Improving the LCARS designer tool
- Adding support for other AI provider APIs (OpenAI, Gemini, etc.)
- General firmware improvements and bug fixes

Links:
- GitHub: https://github.com/dorofino/ClaudeGauge
- Website: https://claudegauge.com

If you've got one of these boards sitting around, give it a try and let me know what you think. PRs and issues welcome.

submitted by /u/Prudent-Purchase-558
Introducing Machinaos[Fully Opensource]: OS That converts LLM Tokens to Work.
On May 13, Anthropic culled usage of the "claude -p" command, which instantly killed the heavily subsidized (~25x) usage of Claude. People were using Openclaw, Hermes Agent, and other things through the Claude CLI with the "-p" flag, but that usage will now be charged as Claude SDK API credits from their Pro [$100] or Max [$200] budgets. Using Claude through the SDK is ~25x more expensive and burns credits super fast. I once tried to generate a simple PDF report from my emails and it burned ~$10 in Claude SDK credits.

Claude Code usage, by contrast, is very generous and barely hits the weekly quotas. I once coded continuously for 7 days, 10 hours a day, and only reached ~97% of the weekly limit. But there is much more you can do with Claude Code than just coding. You can add tools, sub-agents, etc. and turn it into Cowork and Design too. BTW, Claude Cowork and Claude Design are super token hogs and hit quotas fast. I once told Claude Design to generate around 10 design themes and it burned through the weekly quota in an hour of usage.

Meanwhile, I was already building Machinaos: an OS that converts LLM tokens to work for me. I connect my socials, emails, web tools, browser, etc. and use it to generate websites, read emails, and generate PDF reports and mail them to other email addresses or to someone on my socials like WA. So I added a Claude Code agent to Machinaos, and it can already use all those tools and ~100 nodes and connectors properly.

https://reddit.com/link/1tsb0qf/video/0vgyz42p8c4h1/player

Machinaos interacts with Claude Code the way IDEs like VSCode, Cursor, etc. do. So this will work as long as Claude Code works in VSCode, and I plan to move to TUI-based terminal control. Using Machinaos you can create a fleet of specialized AI employees that continuously work for you, so you can focus on the decision work and leave the grunt knowledge work to the AI employees.

https://reddit.com/link/1tsb0qf/video/vy292k6n8c4h1/player

Full capabilities of what you can build with Machinaos [experimental feature]: do so much more by connecting Claude Code as the orchestrator, with Codex and local LLMs as sub-agents for task execution.

Machinaos is fully open source under the MIT License and heavily built with Claude Code.

GitHub: https://github.com/zeenie-ai/MachinaOS
Discord: https://discord.gg/c9pCJ7d8Ce

Do star on GitHub, it matters a lot.

submitted by /u/Dry-Foundation9720
/simplify behavior that runs four cleanup agents for reuse - what's new in CC 2.1.154 (+11,516 tokens)
- NEW: Agent Prompt: /simplify slash command — Adds /simplify behavior that runs four cleanup agents for reuse, simplification, efficiency, and altitude findings, then applies safe fixes while skipping behavior-changing or out-of-scope suggestions.
- NEW: Data: Claude Code live documentation sources — Adds official Claude Code documentation URLs and topic-specific WebFetch prompts for commands, settings, hooks, MCP, skills, subagents, IDEs, deployment, security, and related surfaces.
- NEW: Data: Claude Code recent changes reference — Adds a reference for renamed or removed Claude Code commands, flags, and terms, including /output-style, /pr-comments, /vim, /extra-usage, --enable-auto-mode, and stale naming guidance.
- NEW: Skill: Claude Code configuration guide — Adds a Claude Code configuration skill that checks the live build, bundled recent-change references, and current documentation before answering questions about commands, flags, settings, hooks, skills, MCP servers, subagents, IDE integrations, and related configuration.
- Agent Prompt: Claude guide agent — Adds stale-knowledge handling that tells the guide agent to disclose documentation fetch failures instead of silently answering Claude Code command, flag, or settings questions from memory.
- Agent Prompt: Security monitor for autonomous agent actions (first part) — Expands security review with explicit final-destination tracing for writes, commits, pushes, uploads, publishes, and sent data before deciding whether a boundary-crossing action should be blocked.
- Agent Prompt: Security monitor for autonomous agent actions (second part) — Strengthens data-exfiltration rules around trust boundaries, automated pathways, unverified destinations, credential leakage into persistent artifacts, and destination/resource/operation-scoped allow exceptions.
- Data: Anthropic CLI — Updates Anthropic CLI authentication guidance to cover SDK-style credential resolution, OAuth profiles from ant auth login, ant auth print-credentials, bearer-token usage for raw HTTP, and precedence between API keys and auth tokens.
- Data: Claude API reference — cURL — Updates examples and adaptive-thinking guidance for Opus 4.8.
- Data: Claude API reference — Go — Updates the recommended Go SDK model constant and examples from Opus 4.7 to Opus 4.8.
- Data: Claude API reference — Python — Updates credential guidance for API keys, auth tokens, and ant auth login; adds beta mid-conversation system-message examples; and extends adaptive thinking and compaction guidance to Opus 4.8.
- Data: Claude API reference — TypeScript — Updates credential guidance for API keys, auth tokens, and ant auth login; adds beta mid-conversation system-message examples; and extends adaptive thinking and compaction guidance to Opus 4.8.
- Data: Claude model catalog — Adds Claude Opus 4.8 as the current most powerful Opus model with a 1M input window and updates Opus model-selection examples and legacy recommendations to prefer claude-opus-4-8.
- Data: HTTP error codes reference — Updates authentication fixes for OAuth bearer tokens and expands Opus model-specific 400 guidance to include Opus 4.8.
- Data: Managed Agents reference — Python — Updates client initialization examples to prefer environment, auth-token, or ant auth login credential resolution before explicit API-key injection.
- Data: Managed Agents reference — TypeScript — Updates client initialization examples to prefer environment, auth-token, or ant auth login credential resolution before explicit API-key injection.
- Data: Prompt Caching — Design & Optimization — Adds beta mid-conversation system-message guidance as a cache-preserving and prompt-injection-safe way to send operator instructions without editing the top-level system prompt.
- Data: Streaming reference — Python — Updates adaptive-thinking examples for Opus 4.8.
- Data: Streaming reference — TypeScript — Updates adaptive-thinking examples for Opus 4.8.
- Data: Tool use concepts — Updates adaptive-thinking examples for Opus 4.8.
- Skill: Agent Design Patterns — Replaces mid-session guidance with beta role: "system" messages for supported models, with retained as the fallback.
- Skill: Building LLM-powered applications with Claude — Adds Opus 4.8 to current model guidance, updates adaptive thinking, effort, task-budget, compaction, and migration recommendations, and documents beta mid-conversation operator instructions.
- Skill: Model migration guide — Adds Opus 4.8 migration guidance, including no new API breaking changes from Opus 4.7, model-ID updates, mid-session system prompts, long-horizon agentic tuning, effort recommendations, tool-triggering behavior, narration changes, ask-rate calibration, and visible-reasoning mitigation.
- System Prompt: Background session instructions — Changes temporary-file guidance from $CLAUDEJOBDIR to $CLAUDEJOBDIR/tmp for background sessions.
- System Prompt: Coordinator mode orchestration — Updates PR activity subscription guidance and changes worker summary account
Are you tying your entire application architecture to a single API provider?
Most applications become tightly coupled to one provider, one SDK, and one API structure. This reduces flexibility and increases migration difficulty. Modern AI products no longer rely on one provider alone. AI applications increasingly require provider failover, cost-aware orchestration, and vendor abstraction. Have you started building fallbacks into your apps yet?

submitted by /u/loveisimportant7
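One way to picture the failover and vendor-abstraction idea is a thin interface that each provider is wrapped to fit, plus a loop that tries them in order. The sketch below is only an illustration under that assumption: `ChatProvider` and `completeWithFallback` are hypothetical names, and no real SDK calls are shown.

```typescript
// Hypothetical provider interface; real SDK clients would be wrapped to fit it.
interface ChatProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Try each provider in order and return the first successful completion.
async function completeWithFallback(
  providers: ChatProvider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.complete(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      // Remember why this provider failed, then move on to the next one.
      errors.push(`${provider.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Because the application code only sees `ChatProvider`, swapping or reordering vendors becomes a change to the list passed in rather than to every call site. Cost-aware ordering could be layered on by sorting that list before the call.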
Spent 1,156,308,524 input tokens in May 🫣 Sharing what I learned
After burning through 1.15 billion tokens in the past months, I've learned a thing or two about tokens: what they are, how they are calculated, and how not to overspend them. Sharing some insights below.

What the hell is a token anyway?

Think of tokens like LEGO pieces for language. Each piece can be a word, part of a word, punctuation, or a space.

Quick examples:

Rule of thumb: Use the Claude tokenizer to check your prompts.

One thing most people miss: JSON is a token pig. Brackets, quotes, colons, and commas each consume tokens — a compact JSON object uses roughly 2x the tokens of equivalent plain text. If you're sending structured data as context, plain text or markdown tables are significantly cheaper.

How to not overspend — the full list

1. Choose the right model (yes, still obvious, still ignored)

Current Claude pricing (per million tokens, input/output): Haiku 4.5 at $1/$5, Sonnet 4.6 at $3/$15, Opus 4.6 at $5/$25. Batch processing is 50% cheaper across all models (you might need to wait up to 24h to get results; usually they come back in 2-3h). https://platform.claude.com/docs/en/build-with-claude/batch-processing

For comparison, if you're on OpenAI, the spread between mini and o1 is even more extreme. Most tasks don't need your flagship model. Audit your model usage frequently; models that were too weak 6 months ago might now be good enough. If you want a single interface across OpenAI, Claude, DeepSeek, and Gemini, OpenRouter is worth it imo.

2. Prompt caching

For Claude, prompt caching cuts cached input cost by 90%. Still the single highest-ROI optimization if you have long system prompts. The rule is still: put dynamic content at the end of your prompt. But here's what changed: Anthropic quietly changed the prompt cache TTL from 60 minutes down to 5 minutes in early 2026. For many production workloads, this single change increased effective costs by 30–60%. If you haven't audited your cache hit rates recently, do it now here: https://platform.claude.com/usage/cache

3. Minimize output tokens!!

Output tokens are 5x the price of input tokens. Instead of asking for full text responses, have the model return just IDs, categories, or position numbers... and do the mapping in your code. This cut our output costs ~60%.

4. Be careful with new model versions

Opus 4.7 ships with a new tokenizer that can generate up to 35% more tokens for the same input text compared to Opus 4.6.

5. Set up billing alerts

I cannot stress this enough. Set a hard budget cap and tiered alerts (50%, 80%, 100%). One runaway loop once cost me more than a week of normal spend in a single night.

Hopefully this helps!

Tilen, we get businesses customers from ChatGPT (and yes, we consume a lot of tokens). DM if interested (don't want to promote here) 😄

submitted by /u/tiln7
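To make the arithmetic in the list concrete, here is a rough cost estimator built only from the figures quoted in the post (per-million-token prices, 50% off for batch, 90% off for cached input). The model keys and the `estimateCostUsd` name are hypothetical labels rather than real model IDs. The sketch also assumes the batch and cache discounts stack and ignores any extra charge for writing to the cache, neither of which the post confirms.

```typescript
interface ModelPrice {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

// Prices as quoted in the post; the keys are illustrative labels, not API model IDs.
const PRICES: Record<string, ModelPrice> = {
  "haiku-4.5": { inputPerMTok: 1, outputPerMTok: 5 },
  "sonnet-4.6": { inputPerMTok: 3, outputPerMTok: 15 },
  "opus-4.6": { inputPerMTok: 5, outputPerMTok: 25 },
};

interface Usage {
  inputTokens: number;       // uncached input tokens
  cachedInputTokens: number; // input tokens served from the prompt cache
  outputTokens: number;
  batch?: boolean;           // true if sent through batch processing
}

function estimateCostUsd(model: string, usage: Usage): number {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  const input = (usage.inputTokens / 1e6) * price.inputPerMTok;
  // Cached input is billed at 10% of the normal input price (the post's 90% cut).
  const cached = (usage.cachedInputTokens / 1e6) * price.inputPerMTok * 0.1;
  const output = (usage.outputTokens / 1e6) * price.outputPerMTok;
  const total = input + cached + output;
  // Assumption: the 50% batch discount applies to the whole request.
  return usage.batch ? total * 0.5 : total;
}
```

Under these assumptions, a million uncached input tokens and a million output tokens on the Sonnet entry come to $3 + $15 = $18, or $9 in batch mode. The output side dominates, which is the reason for point 3 above.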
The evolution of software engineering
Developer in 2022:

```js
function capitalizeString(str) {
  return str.charAt(0).toUpperCase() + str.slice(1);
}
```

Developer in 2026:

```js
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: 'sk-AI-OVERKILL' });

export async function capitalizeString(str) {
  const prompt = `You are an expert linguist. Capitalize the first letter of this text: "${str}". Respond with ONLY the capitalized string.`;
  const response = await anthropic.messages.create({
    model: 'claude-3-5-sonnet',
    max_tokens: 100,
    messages: [{ role: 'user', content: prompt }]
  });
  return response.content;
}
```

Result: A 15 millisecond string method is now 3 seconds long, costs money, requires 17 SDKs, and fails if the AI hallucinates a period at the end of your sentence.

submitted by /u/No_Sheepherder_6908
I built a cost tracking layer for Claude agents — live demo + open source
Hey, I'm a CS student and I've been building LedgerAI, a cost tracking and budget enforcement layer for LLM agents.

The problem it solves: You're running 3+ agents in production. One goes rogue overnight. You wake up to a $400 bill with no idea which agent caused it and no way to have stopped it.

What makes LedgerAI different: Most tools log costs after the call. LedgerAI enforces limits before it. The SDK hits a budget check endpoint before every LLM request, and if the agent is over its daily or monthly limit, the call is blocked. Hard stop, not a soft warning.

What it tracks per call:
- Agent name, model, provider (Anthropic + OpenAI supported)
- Input/output tokens + exact cost in USD
- Daily and monthly spend rollups per agent

Completely free and open source right now. Pip install or hit the API directly with cURL.

Live demo → https://agent-cost-tracker-production.up.railway.app
GitHub → https://github.com/CustomTwoBot/agent-cost-tracker

Would love feedback from anyone running multi-agent systems, especially what alerting/enforcement features would actually be useful in prod.

- Dashboard that tracks monthly budget, current costs, and active agents
- Capabilities for users to put hard stops and budget limits on agents
- Tracks recent API calls and their costs
- Visual dashboard of live agents

submitted by /u/IndianCurry06
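The "check before the call, not after" idea can be illustrated with a small in-memory guard. This is a sketch of the general pattern only, not LedgerAI's SDK or API: `BudgetGuard`, its methods, and the `LlmResult` shape are hypothetical. The sketch assumes the cost of each call is known once it returns, and it leaves out the daily and monthly resets a real tracker would need.

```typescript
interface Budget { dailyLimitUsd: number; monthlyLimitUsd: number; }
interface Spend { dailyUsd: number; monthlyUsd: number; }
interface LlmResult<T> { result: T; costUsd: number; }

// Hypothetical in-memory guard; a hosted tracker would do the check over HTTP instead.
class BudgetGuard {
  private readonly budgets: Map<string, Budget>;
  private readonly spend = new Map<string, Spend>();

  constructor(budgets: Map<string, Budget>) {
    this.budgets = budgets;
  }

  // Throws if the agent has no budget or has already reached a limit.
  check(agent: string): void {
    const budget = this.budgets.get(agent);
    if (!budget) throw new Error(`No budget configured for agent "${agent}"`);
    const spent = this.spend.get(agent) ?? { dailyUsd: 0, monthlyUsd: 0 };
    if (spent.dailyUsd >= budget.dailyLimitUsd || spent.monthlyUsd >= budget.monthlyLimitUsd) {
      throw new Error(`Budget exceeded for agent "${agent}"`);
    }
  }

  // Adds the cost of a finished call to both rollups (period resets omitted).
  record(agent: string, costUsd: number): void {
    const spent = this.spend.get(agent) ?? { dailyUsd: 0, monthlyUsd: 0 };
    this.spend.set(agent, {
      dailyUsd: spent.dailyUsd + costUsd,
      monthlyUsd: spent.monthlyUsd + costUsd,
    });
  }

  // Hard stop: the LLM call is never started when the check fails.
  async run<T>(agent: string, call: () => Promise<LlmResult<T>>): Promise<T> {
    this.check(agent);
    const { result, costUsd } = await call();
    this.record(agent, costUsd);
    return result;
  }
}
```

Note that a single call can still overshoot the limit, since its cost is only recorded afterwards. The guard guarantees that the next call is blocked, which is what stops an overnight runaway loop.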
What's new in CC 2.1.152 (+4,566 tokens)
NEW: Agent Prompt: /code-review part 9 fix application — Adds --fix behavior that applies reported review findings to the working tree, covering correctness bugs plus reuse, simplification, and efficiency cleanups, while skipping false positives or fixes that would exceed the reviewed diff. NEW: System Prompt: Coordinator mode orchestration — Adds coordinator-mode instructions for delegating software engineering work across workers, synthesizing worker results, managing worker lifecycle, handling cross-session peers, and independently verifying delegated changes before reporting success. NEW: System Prompt: Coordinator worker instructions — Adds worker-agent instructions for coordinator-assigned tasks, including scoped execution, safe handling of concurrent branch changes, required commits for file changes, no subagent spawning, resumption behavior, failure reporting, and coordinator-facing summaries. Agent Prompt: /code-review part 2 low effort mode — Expands low-effort review beyond hunk-visible correctness bugs to also flag duplicated helpers and dead code visible in the diff context. Agent Prompt: /code-review part 3 extra-high and maximum effort modes — Expands extra-high and maximum-effort review from five correctness finder angles to nine finder angles, adding reuse, simplification, efficiency, and altitude checks. Agent Prompt: /code-review part 6 medium effort mode — Expands medium-effort review from three correctness finder angles to seven finder angles, adding reuse, simplification, efficiency, and altitude checks. Agent Prompt: /code-review part 7 high effort mode — Expands high-effort review from three correctness finder angles to seven finder angles, adding reuse, simplification, efficiency, and altitude checks. Data: Claude API reference — Java — Updates the documented Anthropic Java SDK version from 2.27.0 to 2.34.0. 
Tool Description: AskUserQuestion — Clarifies that agents should use the plan-mode entry tool to switch into plan mode, and that AskUserQuestion in plan mode is only for clarifying requirements or choosing approaches before final approval. Tool Description: Bash (Git commit and PR creation instructions) — Adds generated-with-Claude-Code PR text guidance to the pull request creation instructions. Tool Description: Workflow — Adds examples of common single-phase workflows, recommends chaining scoped workflows across turns, and notes that workflow agents can access session-connected MCP tools through ToolSearch with headless-auth caveats. Details: https://github.com/Piebald-AI/claude-code-system-prompts/releases/tag/v2.1.152 submitted by /u/Dramatic_Squash_3502 [link] [comments]
Motivational quotes from Claude (no particular order)
You've built a functional prototype with good UX instincts, but it's not ready for real users. Likelihood of Success: 3/10. This alone could kill your app within days of launch. The market you chose is especially punishing. Likes and visits from India are pure vanity metrics that won't convert, ever, and they're actively distorting your funnel data. You may be conflating two different things. The 'expense of feelings' framing might be doing too much work. [Your idea] is an unbounded build with an unproven-core problem and a market problem and an eventual hardware problem. Vercel runs your code in three modes, and none of them fit. This is the kind of project that sounds buildable on paper and then eats two years of weekends. Crime doesn't buy you the physics. It just buys you a felony and a still-laggy system. Distribution is a deployment detail, not a path to agency. I don't want to be [user's profession] and 'coding is alright' aren't really a product brief—they're closer to a career question wearing a product costume. The hardware-plus-AI-assistant space is particularly littered with smart people who loved their own product. submitted by /u/noplace1ikegone [link] [comments]
Claude makes documents into apps
Any document can become an app I’ve been working on an open-source document format and viewer called Adaptive Markdown. The basic idea is simple: A document should not have to stay static. It should be something a coding agent can extend, reshape, and turn into an interactive workspace. This is not just a canvas you edit with a chatbot. The bigger idea is that the document becomes both: the source of truth the programmable interface In other words, the document becomes a living app. You write notes, collect data, draft text, or import files. Then a coding agent can directly modify the document surface: add charts, create calculators, build filters, restyle sections, generate summaries, export views, or turn rough notes into an interactive tool. So instead of having: a document a spreadsheet a dashboard an app a changelog a separate AI chat about all of it You can have one living .md file that contains those layers together. Example A fitness log might start as a plain Markdown journal. Then the agent adds charts. Then it pulls in device data. Then it adds weekly summaries, rolling averages, goal tracking, export options, and a dashboard view. The document did not move into an app. The document became the app. Other use cases A billable time log that computes subtotals and rewrites rough notes into polished narratives A research notebook with experiment parameters, runnable code, outputs, and methodology notes A recipe book that scales servings and generates shopping lists A math textbook that can explain a theorem at different levels A project README that explains the system, demonstrates the system, and lets the agent modify it from inside the document A small data report with embedded CSV data, live charts, filters, and exportable views The thing I’m most interested in is not "Can Markdown support more widgets?" It is: What happens when the document itself becomes the programmable, agent-editable interface? 
Demos I made a few short video demos: Turn your document into a snake game: https://youtu.be/l-I2UiZd-Jw Basic Adaptive Markdown features: https://youtu.be/cLdzvZAL96I Import CSV, create tables, edit and format them: https://youtu.be/XKh9D3BlTCg Import MusicXML and transpose sheet music: https://youtu.be/8YV3zjMLvA8 Why I’m excited about this The biggest use case I’m excited about is academic and technical reading. In a few years, I don’t think people will just read papers passively. I think they’ll translate passages, ask questions, generate examples, explore alternate proofs, run code, attach notes, convert math to Lean where possible, and keep all of that inside the document instead of scattered across chats and notebooks. This is already pretty natural inside a browser when a coding agent has access to JS, CSS, and the document structure. It’s very early, but the workflow already feels useful to me. I’m using it for my own notes and documents. Right now it is configured for the Anthropic coding-agent SDK and experimentally for Codex. The longer-term goal is to make it run entirely locally. GitHub: https://github.com/SemiSimpleMath/Adaptive-Markdown I recently added per-document skills, so agents can automatically know how to style or transform the text or data inside a specific document. Curious whether this seems useful to anyone else, or whether I’m just overexcited because I built it. Feature requests welcome. submitted by /u/IDefendWaffles [link] [comments]
AI quietly turned HTML into a real alternative to PowerPoint and Word for client-facing docs. The blockers that made it impractical a year ago are falling one by one.
A year ago, generating a polished document as HTML instead of a PPT or a Word file was a fun idea with too many practical problems. Lately I've noticed every one of those blockers either gone or close to gone, and I've quietly stopped reaching for Office on a bunch of deliverables. Curious if others are seeing the same. **The blockers, and where they stand now:** **Design**. The old objection was "AI HTML looks generic and amateur." That's basically solved if you give the model a design skill or a style guideline once. You get consistent, on-brand output that looks more like a designed page than a default template, every time, without redoing it. **Hosting.** The first wall: a .html file on your machine isn't shareable, and turning it into a URL used to mean GitHub Pages, a Vercel/Netlify deploy, or a bucket setup, all overkill for a single document you just want to send. That's now a paste-and-get-a-link affair, no build step, no config. **Sharing.** The real killer: even with a URL, getting it in front of a non-technical person was a nightmare. A raw .html "won't open," looks broken on their phone, or lands in spam. Screenshotting kills the interactivity, which was the whole point. That gap is now filled by hosted links that just open in a browser like any page. **Security.** "I can't put confidential work on a public URL" used to end the conversation. Access-controlled links (password or email-gated, not public/indexable) handle that now. **Tracking.** With a PPT or PDF you send it and hope. The thing I didn't expect to care about but now can't live without: knowing whether the client actually opened it, and roughly how long they spent. That alone changed how I follow up. Where Office / Markdown still wins, to be fair: anything that lives in version control with clean diffs and line-by-line review, real-time co-editing, and Figma-style pinned feedback on specific elements. Those aren't cleanly solved for plain HTML yet. 
So I'm not saying Office is dead, more that for one-shot, client-facing deliverables (reports, dashboards, proposals, one-pagers) HTML has quietly become the better option for me.

**Two questions for anyone who's made the switch:**

1. Which deliverables did you move from PPT/Word to HTML, and which did you keep in Office?
2. For the ones you moved, what finally made it practical: design, hosting, sharing, or something else?
Testing Realtime 2 Voice API OpenAI.
We've been messing around with the new OpenAI realtime voice + translation APIs over the last little while and I keep coming back to the same thought… I don't think people fully get where this is going yet.

We wired it into our own website as a test. Nothing fancy. Just wanted to see what actually breaks when you let people talk to a site instead of clicking through it. At first I thought it would just feel like a slightly better chatbot. It doesn't. Once I hooked it into tools and gave it the ability to actually do things (we're using the Agents SDK + Playwright for web browsing and control by a sub-agent), the whole interaction changed. I can literally just talk to the site like I would talk to a person and it can move around, pull info, trigger actions, and respond in context. I wanted a layer that could navigate and respond by just talking. I know that sounds obvious, but it's not how websites are designed at all. Ours certainly was not.

One thing that has been interesting (and honestly a bit brutal) is how quickly this exposed weak structure. Our content was vague. If your metadata sucks, or your pages are bloated or unclear, voice won't let you hide behind a pretty UI. The model just struggles or gives bad answers immediately.

Latency has improved way more than I expected with the new voice model API. Before, when someone was talking, even small delays felt awkward. The new Realtime 2 API tolerates those pauses wonderfully.

We also started playing with the realtime translation side and that also feels like a bigger deal than it's getting credit for. Not in a "multi-language support" way, more like… you just speak however you want and the system handles it. No toggles, no switching context. It's subtle but it completely changes the feel. Our website is language agnostic (13 supported languages using the Realtime 2 API).

The bigger shift for me is in how I want to think about websites and interactions. People don't think in menus. They don't think in pages. They don't think in navigation. They think by intent, and the second I added voice, I was forced to deal with that reality whether or not our website was ready. Great learning lesson.

My takeaway so far: judging by most of what I'm hearing and reading, people and businesses treat voice like a feature. Like an add-on. Cool. Nice to have. Unsure if it's practical. I don't think that's where this ends. I think this starts pushing toward systems you can just interact with directly. Personal assistants that actually execute. Internal tools you can talk to. Intake flows that don't feel like forms. Stuff like that. Minimal website visuals. More dynamically displayed content based on interpretation of user intent. [Basically a cool waveform that animates differently depending on interaction stage.] No direct site content visually.

We're still early and there's definitely some friction (writing a second voice prompt on top of the text prompt so there is parity between our text chat and voice chat, plus guardrails, rate limits, prompt injection...), but I'm pretty bullish on this direction. Curious if anyone else here is actually building with it yet and what you're running into. Feels like we're right on the edge between "cool demo" and "this changes how software works," and I'm not sure which way most people are approaching it yet.

submitted by /u/Early-Matter-8123
We built a managed memory API for AI agents (open-source SDK + AGM-style belief revision for handling contradictions)
Hey all! We just launched a managed memory API for conversational AI, letting developers add long-term memory to their agents with a single HTTP call. It's built on our in-house xmem SDK, which automatically extracts facts, episodes, and artifacts from multi-turn conversations and handles contradictions and updates through an AGM-style belief revision mechanism. When a user changes a preference or corrects an earlier statement, old memories get automatically flagged as "superseded" instead of piling up as noise. At query time, you can also walk the supersede chain to trace the full version history of any memory. Under the hood, PostgreSQL + pgvector (with HNSW indexing) delivers millisecond-level semantic retrieval, Redis handles multi-pod session caching, and the system natively supports multi-tenant isolation with data separation at the user and org level. For developers, this means you no longer have to stand up your own vector store, design dedup logic, or babysit session state. Hand off the memory layer to us and focus on what your agent actually does. Feel free to try it out, it's free to start. Please let us know your thoughts on how we can improve or features to add! https://github.com/XTraceAI/memory-sdk-ts https://docs.mem.xtrace.ai/introduction submitted by /u/westnebula [link] [comments]
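To make the supersede-chain idea concrete, here is a minimal in-memory sketch in TypeScript. It is not the xmem SDK or the managed API: the `MemoryRecord` shape, the `supersededBy` field, and the method names are assumptions made for illustration, and it leaves out storage, embeddings, and the belief-revision logic that decides when two statements actually conflict.

```typescript
// Hypothetical data model: each memory may point at the memory that replaced it.
interface MemoryRecord {
  id: string;
  text: string;
  supersededBy?: string; // id of the newer memory, if this one was revised
}

class MemoryStore {
  private records = new Map<string, MemoryRecord>();

  add(id: string, text: string): void {
    this.records.set(id, { id, text });
  }

  // Adds a new memory and flags the old one as superseded instead of deleting it.
  revise(oldId: string, newId: string, text: string): void {
    const old = this.records.get(oldId);
    if (!old) throw new Error(`Unknown memory "${oldId}"`);
    this.add(newId, text);
    old.supersededBy = newId;
  }

  // Only memories that have not been superseded count as current.
  current(): MemoryRecord[] {
    return Array.from(this.records.values()).filter(
      (r) => r.supersededBy === undefined
    );
  }

  // Walks the chain forward from any version, oldest first.
  history(startId: string): MemoryRecord[] {
    const chain: MemoryRecord[] = [];
    const seen = new Set<string>();
    let cursor = this.records.get(startId);
    while (cursor && !seen.has(cursor.id)) {
      chain.push(cursor);
      seen.add(cursor.id); // guard against accidental cycles
      cursor = cursor.supersededBy
        ? this.records.get(cursor.supersededBy)
        : undefined;
    }
    return chain;
  }
}
```

In this sketch, revising "prefers tea" to "prefers coffee" leaves both records in the store, but only the newer one is returned by `current()`, while `history()` on the original id returns the full sequence of versions.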
Small victory using Cloudflare for simple hosting of generated HTML/mini-websites
Something many people are running into: You, or a teammate, have created some kind of mini-website app out of Claude and now want to share it with the rest of the company, without overbaking the hosting solution (e.g. not setting up new Azure app services or containers, etc). Maybe you also need some basic data storage for persistence. And how do you do all of that securely?

We recently went down this rabbit hole while looking at all the major players: Vercel/v0, Lovable, Netlify, Coolify, Dokploy, GitHub Pages. We even considered baking together our own hosting solution using Azure or AWS as the backend.

Our target audience is non-technical users in the team, so I was looking for something with drag-n-drop style deployment (no git required), and I really wanted to have SSO for protecting application access, along with some type of DB storage.

The main issue I ran into was SSO authentication support being gated behind enterprise-level pricing plans for hosting systems like Netlify (which I'd otherwise highly recommend for a small public project). Netlify's enterprise level quickly gets quite a bit more expensive than their base tiers. I also didn't want to purchase yet another AI platform (e.g. Lovable, where really they're pushing an end-to-end AI development platform where you buy token credits through them). I wanted to host things we're already creating in our own Claude environment.

Finally, I ended up on Cloudflare, which I've otherwise not really used before professionally. It's not as non-technical-friendly as Netlify, but it's pretty close. You can deploy Cloudflare Pages content via drag-n-drop. It has button-click databases available for integration, and most critically for us, the SSO integration is completely free for under 50 users. Their free hosting tier is also extremely generous and basically unlimited for completely static apps.

Note that SSO goes up to $7 USD/user/month for over 50 users, so your org size can really make a difference. If you have 500 users and the same use case for "hosting little mini apps", I'd go back to Netlify or another offering where SSO is more of a fixed fee.

The other big win was that Cloudflare has a solid MCP server that works perfectly with Claude Cowork. We integrated that in and then wrote up some skills to assist with app building and deployment, including prompts for whether a database backend is needed (using Cloudflare D1) and whether the app should be public or internal only with SSO protection. All working perfectly with minimal technical experience required for the end user.

I'm not at all associated with Cloudflare, just thought I'd share how we got a win for this use case. I'd be interested to hear if anyone else solved the same problem in a different way.

submitted by /u/flck
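The org-size trade-off in this post can be made explicit with a little arithmetic. The TypeScript sketch below is only illustrative: it takes the $7 per user per month figure and the 50-user free threshold from the post, assumes (the post does not say) that once an organization passes the threshold every seat is billed, and treats the fixed fee of the alternative provider as a made-up input. Current vendor pricing should be checked before relying on any of these numbers.

```typescript
// Illustrative only: figures from the post plus clearly labeled assumptions.
const FREE_SEAT_THRESHOLD = 50; // from the post: SSO is free for small teams
const PER_SEAT_USD = 7; // from the post: $7 USD/user/month above the threshold

// Assumption: past the threshold, all seats are billed, not just the extras.
// The post is ambiguous about exactly 50 users; it is treated as free here.
function perSeatMonthlyCost(users: number): number {
  return users <= FREE_SEAT_THRESHOLD ? 0 : users * PER_SEAT_USD;
}

// Smallest user count at which a hypothetical fixed monthly fee becomes cheaper
// than per-seat billing.
function breakEvenUsers(fixedFeeUsd: number): number {
  let users = FREE_SEAT_THRESHOLD + 1;
  while (perSeatMonthlyCost(users) <= fixedFeeUsd) {
    users++;
  }
  return users;
}
```

For the 500-user case mentioned in the post, `perSeatMonthlyCost(500)` comes to 3500 USD per month under these assumptions, which is the kind of figure that makes a fixed-fee SSO offering worth a second look.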
Repository Audit Available
Deep analysis of vercel/ai — architecture, costs, security, dependencies & more
Vercel AI SDK uses a tiered pricing model. Visit their website for current pricing details.
Vercel AI SDK has an average rating of 4.8 out of 5 stars based on 20 reviews from G2, Capterra, and TrustRadius.
Key features include: The Framework Agnostic AI Toolkit, Scale with confidence.
Vercel AI SDK is commonly used for: Building AI chatbots with persistence, Creating multi-modal chat applications, Developing Slackbots for direct message responses, Integrating natural language processing with PostgreSQL databases, Implementing long-running AI agents that can suspend and resume, Generating structured objects and tool calls with LLMs.
Vercel AI SDK integrates with: OpenAI, AWS Lambda, Slack, PostgreSQL, React, Next.js, Vue, Svelte, Node.js, GitHub.
Jerry Liu
CEO at LlamaIndex
1 mention
Vercel AI SDK has a public GitHub repository with 23,126 stars.
Based on user reviews and social mentions, the most common pain points are: token usage, cost tracking, API bill, openai bill.
Based on 92 social mentions analyzed, 17% of sentiment is positive, 82% neutral, and 1% negative.