The unified interface for LLMs. Find the best models & prices for your prompts
Based on the social mentions, users view OpenRouter as a valuable platform for AI model access and cost management. **Strengths** include extremely detailed statistics and analytics, particularly for programming use cases which represent the largest token consumption, and its utility as a reusable integration for AI agents with features like model discovery, cost tracking, and routing with fallbacks. **Key complaints** center around token costs, with users actively seeking ways to reduce expenses and noting concerns about "burning money" on AI services. **Pricing sentiment** is cost-conscious, with users appreciating tools that cut token costs by 44% and looking for more economical alternatives. **Overall reputation** positions OpenRouter as a go-to platform for developers building AI agents and applications, especially for programming tasks, though cost optimization remains a primary user concern.
Mentions (30d): 2
Reviews: 0
Platforms: 5
Sentiment: 0% (0 positive)
Features
Industry: information technology & services
Employees: 40
Funding Stage: Series A
Total Funding: $40.0M
OpenRouter rankings for programming tokens show a sharp rise in open models and stagnation of US frontier models
The site has extremely detailed stats by day/week for every model. Programming is by far the largest consumer of tokens; in fact, the entire token growth in 2025 came only from programming, while other categories stayed very flat. It is also a category where you would pay for better performance. IMO, it's relevant to this sub in that one of the top models, MiniMax, fits in under 256GB, but also that the trends favor cost effectiveness rather than "the absolute best". There is a tangent insight as to whether the US datacenter frenzy is needed. Kimi K2.5 being free on openclaw is a big reason for its total dominance. In the week of Feb 2, MiniMax was the only other top model to increase token usage. The Opus 4.6 release seems to have had an extremely flat reception. The agentic trend tends to make LLM models disposable, since better ones are released every week, and the agents/platforms that can switch on the fly while keeping context are something you can invest in improving without being obsolete next month.
Curated 550+ free AI tools useful for building projects (LLMs, APIs, local models, RAG, agents)
Over the last few days I was collecting free or low-cost AI tools that are actually useful if you want to build stuff, not just try random demos. Most lists I saw were either outdated, full of affiliate links, or just generic tools repeated everywhere, so I tried to make something more practical, mainly focused on things developers can actually use.

It includes things like free LLM APIs (OpenRouter, Groq, Gemini, etc.), local models (Ollama, Qwen, Llama), coding tools (Cursor, Gemini CLI, Qwen Code), RAG stack tools (vector DBs, embeddings, frameworks), agent workflow tools, speech/image/video APIs, and also some example stack combinations depending on use case. Right now it's around 550+ tools and models in total. I'm still updating it whenever new models or free tiers appear, so some info might already be outdated.

If there are good tools missing I would really appreciate suggestions, especially newer open-weight models or useful infra tools. Repo link: https://github.com/ShaikhWarsi/free-ai-tools. If you know something useful that should be included, just let me know and I will add it.

submitted by /u/Axintwo
I built an Open Source version of Claude Managed Agents, all LLMs supported, fully API compatible
https://github.com/rogeriochaves/open-managed-agents

The Claude Managed Agents idea is great. I see more and more non-technical people around me using Claude to do things for them, but it's mostly one-off, so managed agents are great for easily building more repeatable, fully agentic workflows. But people will want to self-host, use other LLMs (maybe Codex or a local Gemma on vLLM), and build on top of all the other open source tooling: observability, routers, and so on. It's working pretty great; still polishing the rough edges, though. Contributions are welcome!

submitted by /u/rchaves
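"Fully API compatible" here means any OpenAI-style chat-completions backend can be swapped in by changing the base URL. A minimal sketch of that idea, using only the standard library (the request is built but not sent; the vLLM port and model names are illustrative assumptions, not taken from the repo):

```python
import json

# Any OpenAI-compatible backend is just a different base URL:
# OpenRouter's hosted API, or a local vLLM server serving Gemma, etc.
BACKENDS = {
    "openrouter": "https://openrouter.ai/api/v1",
    "vllm-local": "http://localhost:8000/v1",   # assumed local vLLM port
}

def build_chat_request(backend: str, model: str, prompt: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for a chat-completions call (not sent here)."""
    url = f"{BACKENDS[backend]}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, body

url, body = build_chat_request("vllm-local", "google/gemma-2-9b-it", "hello")
```

Switching from a hosted LLM to a self-hosted one is then a one-line config change rather than a code change.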
Your AI agents remember yesterday.
# AIPass

**Your AI agents remember yesterday.**

A local multi-agent framework where your AI assistants keep their memory between sessions, work together on the same codebase, and never ask you to re-explain context.

---

## Contents

- [The Problem](#the-problem)
- [What AIPass Does](#what-aipass-does)
- [Quick Start](#quick-start)
- [How It Works](#how-it-works)
- [The 11 Agents](#the-11-agents)
- [CLI Support](#cli-support)
- [Project Status](#project-status)
- [Requirements](#requirements)
- [Subscriptions & Compliance](#subscriptions--compliance)

---

## The Problem

Your AI has memory now. It remembers your name, your preferences, your last conversation. That used to be the hard part. It isn't anymore.

The hard part is everything that comes after. You're still one person talking to one agent in one conversation doing one thing at a time. When the task gets complex, *you* become the coordinator — copying context between tools, dispatching work manually, keeping track of who's doing what. You are the glue holding your AI workflow together, and you shouldn't have to be.

Multi-agent frameworks tried to solve this. They run agents in parallel, spin up specialists, orchestrate pipelines. But they isolate every agent in its own sandbox. Separate filesystems. Separate worktrees. Separate context. One agent can't see what another just built. Nobody picks up where a teammate left off. Nobody works on the same project at the same time. The agents don't know each other exist.

That's not a team. That's a room full of people wearing headphones.

What's missing isn't more agents — it's *presence*. Agents that have identity, memory, and expertise. Agents that share a workspace, communicate through their own channels, and collaborate on the same files without stepping on each other. Not isolated workers running in parallel. A persistent society with operational rules — where the system gets smarter over time because every agent remembers, every interaction builds on the last, and nobody starts from zero.

## What AIPass Does

AIPass is a local CLI framework that gives your AI agents **identity, memory, and teamwork**. Verified with Claude Code, Codex, and Gemini CLI. Designed for terminal-native coding agents that support instruction files, hooks, and subprocess invocation.

**Start with one agent that remembers:** Your AI reads `.trinity/` on startup and writes back what it learned before the session ends. That's the whole memory model — JSON files your AI can read and write. Next session, it picks up where it left off. No database, no API, no setup beyond one command.

```bash
mkdir my-project && cd my-project
aipass init
```

Your project gets its own registry, its own identity, and persistent memory. Each project is isolated — its own agents, its own rules. No cross-contamination between projects.

**Add agents when you need them:**

```bash
aipass init agent my-agent   # Full agent: apps, mail, memory, identity
```

| What you need | Command | What you get |
|---------------|---------|-------------|
| A new project | `aipass init` | Registry, project identity, prompts, hooks, docs |
| A full agent | `aipass init agent ` | Apps scaffold, mailbox, memory, identity — registered in project |
| A lightweight agent | `drone @spawn create --template birthright` | Identity + memory only (no apps scaffold) |

**What makes this different:**

- **Agents are persistent.** They have memories and expertise that develop over time. They're not disposable workers — they're specialists who remember.
- **Everything is local.** Your data stays on your machine. Memory is JSON files. Communication is local mailbox files. No cloud dependencies, no external APIs for core operations.
- **One pattern for everything.** Every agent follows the same structure. One command (`drone @branch command`) reaches any agent. Learn it once, use it everywhere.
- **Projects are isolated by design.** Each project gets its own registry. Agents communicate within their project, not across projects.
- **The system protects itself.** Agent locks prevent double-dispatch. PR locks prevent merge conflicts. Branches don't touch each other's files. Quality standards are embedded in every workflow. Errors trigger self-healing.

**Say "hi" tomorrow and pick up exactly where you left off.** One agent or fifteen — the memory persists.

---

## Quick Start

### Start your own project

```bash
pip install aipass
mkdir my-project && cd my-project
aipass init                  # Creates project: registry, prompts, hooks, docs
aipass init agent my-agent   # Creates your first agent inside the project
cd my-agent
claude                       # Or: codex, gemini — your agent reads its memory and is ready
```

That's it. Your agent has identity, memory, a mailbox, and knows what AIPass is. Say "hi" — it picks up where it left off. Come back tomorrow, it remembers.

### Explore the full framework

Clone the repo to see all 11 agents working together — the reference implementation
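The "memory is just JSON files the agent reads on startup and writes back" model is simple enough to sketch in a few lines. The file name and schema below are hypothetical (AIPass's actual `.trinity/` layout may differ); a temp directory stands in for the project root:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for a project's .trinity/ directory.
MEMORY_DIR = Path(tempfile.mkdtemp()) / ".trinity"

def load_memory() -> dict:
    """On startup: read back whatever earlier sessions recorded."""
    path = MEMORY_DIR / "memory.json"
    if path.exists():
        return json.loads(path.read_text())
    return {"sessions": []}

def save_memory(state: dict, learned: str) -> None:
    """Before the session ends: append what was learned and persist."""
    state["sessions"].append(learned)
    MEMORY_DIR.mkdir(parents=True, exist_ok=True)
    (MEMORY_DIR / "memory.json").write_text(json.dumps(state, indent=2))

save_memory(load_memory(), "user prefers pytest over unittest")
```

No database and no API: the next session's `load_memory()` simply sees what the last session wrote.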
I ran 3 experiments to test whether AI can learn and become "world class" at something
I will write this by hand because I am tired of using AI for everything and bc reddit rules.

TL;DR: Can AI somehow learn like a human to produce "world-class" outputs for specific domains? I spent about $5 and 100s of LLM calls. I tested 3 domains with the following observations/conclusions:

A) Code debugging: AIs are already world-class at debugging, and trying to guide them results in worse performance. Dead end.
B) Landing page copy: a routing strategy depending on visitor type won over a one-size-fits-all prompting strategy. Promising results.
C) UI design: producing "world-class" UI design seems to require defining a design system first; it seems like it can't be one-shotted. One-shotting designs defaults to generic "tailwindy" UI because that is the design system the model knows. Might work but needs more testing with a design system.

I have spent the last days running some experiments, more or less compulsively and curiosity-driven. The question I was asking myself first is: can AI learn to be "world-class" somewhat like a human would? Gathering knowledge, processing, producing, analyzing, removing what is wrong, learning from experience, etc. But compressed into hours (aka "I know Kung Fu"). To be clear, I am talking about context engineering, not finetuning (I don't have the resources or the patience for that).

I will mention "world-class" a handful of times. You can replace it with "expert" or "master" if that seems confusing. Ultimately, I mean the ability to generate "world-class" output. I was asking myself that because I figure AI output out of the box kinda sucks at some tasks, for example, writing landing copy.

I started talking with Claude, and I designed and ran experiments in 3 domains, one by one: code debugging, landing copy writing, UI design. I relied on different models available in OpenRouter: Gemini Flash 2.0, DeepSeek R1, Qwen3 Coder, Claude Sonnet 4.5. I am not going to describe the experiments in detail because everyone would go to sleep; I will summarize and then provide my observations.

EXPERIMENT 1: CODE DEBUGGING

I picked debugging because of zero downtime for testing. The result is either wrong or right and can be checked programmatically in seconds, so I can perform many tests and iterations quickly. I started with the assumption that a prewritten knowledge base (KB) could improve debugging. I asked Claude (Opus 4.6) to design 8 realistic tests of different complexity, then I ran:

- bare model (zero shot, no instructions, "fix the bug"): 92%
- KB only: 85%
- KB + multi-agent pipeline (diagnoser - critic - resolver): 93%

What this shows is kinda surprising to me: context engineering (or, to be more precise, the context engineering in these experiments) is at best a waste of tokens, and at worst it lowers output quality. Current models, not even SOTA like Opus 4.6 but current low-budget best models like Gemini Flash or Qwen3 Coder, are already world-class at debugging. And giving them context engineered to "behave as an expert", basically giving them instructions on how to debug, harms the result. This effect is stronger the smarter the model is.

What does this suggest? That if a model is already an expert at something, a human expert trying to nudge the model based on their opinionated experience might hurt more than it helps (plus consuming more tokens). And funny (or scary) enough, a domain-agnostic person might get better results than an expert, because they are letting the model act without biasing it. This might be true as long as the model has the world-class expertise encoded in the weights.

So if this is the case, you are likely better off if you don't tell the model how to do things. If this trend continues, if AI keeps getting better at everything, we might reach a point where human expertise is irrelevant or a liability. I am not saying I want that or don't want that. I just say this is a possibility.

EXPERIMENT 2: LANDING COPY

Here, since I can't and don't have the resources to run actual A/B testing experiments with a real audience, what I did was:

- Scrape documented landing copy conversion cases with real numbers: Moz, Crazy Egg, GoHenry, Smart Insights, Sunshine.co.uk, Course Hero
- Deconstruct the product or target of the page into a raw and plain description (no copy, no sales)
- Ask Claude Opus 4.6 to build a judge that scores the outputs in different dimensions

Then I ran landing copy generation pipelines with different patterns (raw zero shot, question first, mechanism first...). I'll spare the details; ask if you really need to know. I'll jump into the observations:

Context engineering helps with writing landing copy of higher quality, but it is not linear. The domain is not as deterministic as debugging (where it works or it breaks); it depends much more on the context. Or one may say that in debugging all the context is self-contained in the problem itself, whereas in landing writing you have to provide it. No single config won across all products. Instead, the
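The routing result from experiment 2 (pick a prompt strategy per visitor type rather than one prompt for everyone) reduces to a small dispatch table. A minimal sketch; the visitor types and templates are invented for illustration, not taken from the post:

```python
# Hypothetical prompt templates keyed by visitor type.
TEMPLATES = {
    "cold":      "Lead with the problem and the mechanism: {product}",
    "comparing": "Lead with differentiation vs alternatives: {product}",
    "ready":     "Lead with the offer and a clear CTA: {product}",
}

def route_prompt(visitor_type: str, product: str) -> str:
    """Pick the copy-generation strategy for this visitor, defaulting to cold traffic."""
    template = TEMPLATES.get(visitor_type, TEMPLATES["cold"])
    return template.format(product=product)

print(route_prompt("comparing", "an AI receptionist for trade businesses"))
```

The LLM call itself would then receive the routed template instead of a one-size-fits-all prompt.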
My Claude.md file
This is my Claude.md file; it is the same information as my Gemini.md, as I use Claude Max and Gemini Ultra.

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**Atlas UX** is a full-stack AI receptionist platform for trade businesses (plumbers, salons, HVAC). Lucy answers calls 24/7, books appointments, sends SMS confirmations, and notifies via Slack — for $99/mo. It runs as a web SPA and Electron desktop app, deployed on AWS Lightsail. The project is in Beta with built-in approval workflows and safety guardrails.

## Commands

### Frontend (root directory)

```bash
npm run dev              # Vite dev server at localhost:5173
npm run build            # Production build to ./dist
npm run preview          # Preview production build
npm run electron:dev     # Run Electron desktop app
npm run electron:build   # Build Electron app
```

### Backend (cd backend/)

```bash
npm run dev            # tsx watch mode (auto-recompile)
npm run build          # tsc compile to ./dist
npm run start          # Start Fastify server (port 8787)
npm run worker:engine  # Run AI orchestration loop
npm run worker:email   # Run email sender worker
```

### Database

```bash
docker-compose -f backend/docker-compose.yml up   # Local PostgreSQL 16
npx prisma migrate dev                            # Run migrations
npx prisma studio                                 # DB GUI
npx prisma db seed                                # Seed database
```

### Knowledge Base

```bash
cd backend && npm run kb:ingest-agents   # Ingest agent docs
cd backend && npm run kb:chunk-docs      # Chunk KB documents
```

## Architecture

### Directory Structure

- `src/` — React 18 frontend (Vite + TypeScript + Tailwind CSS)
  - `components/` — Feature components (40+, often 10–70KB each)
  - `pages/` — Public-facing pages (Landing, Blog, Privacy, Terms, Store)
  - `lib/` — Client utilities (`api.ts`, `activeTenant.tsx` context)
  - `core/` — Client-side domain logic (agents, audit, exec, SGL)
  - `config/` — Email maps, AI personality config
  - `routes.ts` — All app routes (HashRouter-based)
- `backend/src/` — Fastify 5 + TypeScript backend
  - `routes/` — 30+ route files, all mounted under `/v1`
  - `core/engine/` — Main AI orchestration engine
  - `plugins/` — Fastify plugins: `authPlugin`, `tenantPlugin`, `auditPlugin`, `csrfPlugin`, `tenantRateLimit`
  - `domain/` — Business domain logic (audit, content, ledger)
  - `services/` — Service layer (`elevenlabs.ts`, `credentialResolver.ts`, etc.)
  - `tools/` — Tool integrations (Outlook, Slack)
  - `workers/` — `engineLoop.ts` (ticks every 5s), `emailSender.ts`
  - `jobs/` — Database-backed job queue
  - `lib/encryption.ts` — AES-256-GCM encryption for stored credentials
  - `lib/webSearch.ts` — Multi-provider web search (You.com, Brave, Exa, Tavily, SerpAPI) with randomized rotation
  - `ai.ts` — AI provider setup (OpenAI, DeepSeek, OpenRouter, Cerebras)
  - `env.ts` — All environment variable definitions
- `backend/prisma/` — Prisma schema (30KB+) and migrations
- `electron/` — Electron main process and preload
- `Agents/` — Agent configurations and policies
  - `policies/` — SGL.md (System Governance Language DSL), EXECUTION_CONSTITUTION.md
  - `workflows/` — Predefined workflow definitions

### Key Architectural Patterns

**Multi-Tenancy:** Every DB table has a `tenant_id` FK. The backend's `tenantPlugin` extracts `x-tenant-id` from request headers.

**Authentication:** JWT-based via `authPlugin.ts` (HS256, issuer/audience validated). The frontend sends the token in the Authorization header. Revoked tokens are checked against a `revokedToken` table (fail-closed). Expired revoked tokens are pruned daily.

**CSRF Protection:** DB-backed synchronizer token pattern via `csrfPlugin.ts`. Tokens are issued on mutating responses, stored in `oauth_state` with a 1-hour TTL, and validated on all state-changing requests. Webhook/callback endpoints are exempt (see `SKIP_PREFIXES` in the plugin).

**Audit Trail:** All mutations must be logged to the `audit_log` table via `auditPlugin`. Successful GETs and health/polling endpoints are skipped to reduce noise. On DB write failure, audit events fall back to stderr (never lost). Hash chain integrity (SOC 2 CC7.2) via `lib/auditChain.ts`.

**Job System:** Async work is queued to the `jobs` DB table (statuses: queued → running → completed/failed). The engine loop picks up jobs periodically.

**Engine Loop:** `workers/engineLoop.ts` is a separate Node process that ticks every `ENGINE_TICK_INTERVAL_MS` (default 5000ms). It handles the orchestration of autonomous agent actions.

**AI Agents:** Named agents (Atlas=CEO, Binky=CRO, etc.) each have their own email accounts and role definitions. Agent behavior is governed by SGL policies.

**Decisions/Approval Workflow:** High-risk actions (recurring charges, spend above `AUTO_SPEND_LIMIT_USD`, risk tier ≥ 2) require a `decision_memo` approval before execution.

**Frontend Routing:** Uses `HashRouter` from React Router v7. All routes are defined in `src/routes.ts`.

**Code Splitting:** Vite config splits chunks into `react-vendor`, `router`, `ui-vendor`, `charts`.

**ElevenLabs Voice Agents:** Lucy's
How I built a browser-based network validation simulator and a custom Linear/GitHub MCP server with Claude Code (~1,400 commits in 3.5 months)
Using parallel subagents, MCP, skills, and many usage limits being hit, I built two brand-new tools: NetSandbox, and SwarmCode, a Linear/GitHub MCP that streamlines your agentic workflow.

NetSandbox - a browser-based network topology design and validation tool built with Claude Code

Drag routers, switches, and hosts onto a canvas, configure IPs/VLANs/OSPF/BGP/ACLs visually, and it tells you what's misconfigured. Find duplicate IPs, VLAN trunk mismatches, routing issues, and STP loops. There's also a CLI emulator and guided lessons from basic LANs to eBGP peering to help prepare for networking certs — ALL IN THE BROWSER!

NetSandbox was created over the last few months with many Claude Code usage limits being hit. I had a blast during what reminded me of CoD double-XP weekends when Claude doubled my tokens for Christmas break, which is when I really committed to this project. Once I started adding subagents, things really started taking off. I ended up with a team of about 20 subagents ranging from network engineering experts to Svelte frontend developers and security auditors. Not long after this I was running Claude remote control, ralph loops, various skills like Vercel agent-browser, automated Playwright tests, and building my own custom MCP workflow tools for linear.app.

The Linear and GitHub MCP - SwarmCode ... I needed eyes for my agents

https://github.com/TellerTechnologies/swarmcode

After struggling with managing my ideas, backlogs, and issues with NetSandbox, I ended up using linear.app for project tracking and tried out their MCP. I liked that I could have Claude Code update my Linear boards for me, but then I realized I wanted more... the ability to vibe code entire features from backlogs to PRs with Linear being updated autonomously.
This is when I created an open source tool called SwarmCode, built entirely with Claude Code, to help me track feature development for NetSandbox. The concept behind SwarmCode is that a team can work on the same Linear team and GitHub repositories, and Claude will pull things from backlogs, move them to in-progress on Linear, and understand what your teammates are working on at all times. You can ask "what is Bob working on right now?" and Claude understands. GitHub issues and PRs are mapped to Linear tasks automatically, and flows just happen. To test this, some friends and I used it in a hackathon to build an app with Claude insanely fast! 3 users vibe coding through this Linear workflow was so fun.

How Claude Code was involved

Claude Code gave me the ability to even consider this project: ~1,400 commits over 3.5 months, only during off-work hours and on weekends. I handled architecture decisions, product direction, and edge-case debugging; Claude did the bulk of the implementation. I was able to build the MVP myself using React, and then after hitting major performance barriers I decided to give Claude Code a shot and had it refactor the entire codebase to Svelte. It was also able to handle the migration from SQLite to Postgres for me. The ability to build this in such a short time frame has really changed my perspective on software engineering as a whole.

Any feedback on both projects is welcome. If you are a student or a network engineer and want to seriously use the tool, reach out to me and we can work out some free premium subscriptions in exchange for helping me get started :)

Try it here: https://app.netsandbox.io

Happy to answer any questions about the dev process or the networking side of things. Cheers!

submitted by /u/jaredt17
Can we talk about GPT 5.4 Mini for a second?
The price-to-performance ratio is actually insane. It's a total powerhouse for next to nothing, yet everyone is still busy glazing Claude?? Make it make sense.

submitted by /u/Fresh-Daikon-9408
I built a free adversarial code reviewer for Claude Code - three models that actually argue with each other
The problem: when you ask Claude to review code it just wrote, you get a polished endorsement. It has all the context - the plan, the intent, the constraints. That shared context actively suppresses objections.

So I built Rival. It routes your code to free OpenRouter models that have none of that context. They see only the diff. They have no obligation to like it.

The interesting part is the chain mode. Three models review sequentially, each reading the previous findings:

- Qwen does the initial pass, finds 6 issues
- Gemma reads Qwen's findings, confirms most, disputes one (correctly), and catches a critical bug Qwen missed entirely
- Llama reads both, resolves the dispute, sets the priority order

This mirrors how good code review actually works on real teams. Reviewers who have read each other's notes catch more than three reviewers submitting separate reports.

The first real test was running it on its own source code. The chain found that `set -e` at the top of the script was silently defeating the entire retry mechanism. Retries only fired on HTTP errors where curl exits 0. Network failures killed the script before the retry logic could run. The loop looked correct and did nothing. None of the individual models caught the full picture alone.

It's a Claude Code plugin. `/rival` for a quick review, `/rival --panel` for the full chain. All free-tier models, zero cost.

https://github.com/bambushu/rival

submitted by /u/DaLyon92x
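The chain mode described above (each reviewer sees the diff plus all previous findings, never the author's context) can be sketched with the model calls stubbed out. The function names and stub findings are hypothetical; in the real plugin each reviewer would be an OpenRouter API call:

```python
# Sequential review chain: every reviewer's prompt includes prior findings.
def review_chain(diff: str, reviewers: list) -> list[str]:
    findings: list[str] = []
    for reviewer in reviewers:
        prompt = diff + "\n\nPrevious findings:\n" + "\n".join(findings)
        findings.append(reviewer(prompt))   # real version: call an OpenRouter model
    return findings

# Stub reviewers standing in for the Qwen -> Gemma -> Llama chain.
qwen  = lambda p: "possible off-by-one in retry loop"
gemma = lambda p: "confirms loop issue; retries never fire under set -e"
llama = lambda p: "priority: fix the set -e interaction first"

print(review_chain("--- a/run.sh\n+++ b/run.sh", [qwen, gemma, llama]))
```

The key design point is that each prompt grows with the accumulated findings, so a later model can confirm, dispute, or prioritize what earlier models reported.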
Agent Runway - A plugin that stops Claude Code subagents from creating more tech debt
If you use Claude Code and delegate work to subagents, you've probably noticed: they have no idea what your project looks like. They don't know your CLAUDE.md rules, your module boundaries, or your coding conventions. Claude just asks them to "reduce complexity in a router file" and it'll dump helper functions right there instead of putting them in your helpers/ or services/ module. It'll add comments. It'll slap # noqa on linting errors instead of fixing them.

I got tired of cleaning up after them, so I built Agent Runway — a Claude Code plugin that fixes this in two ways:

1. Prevention — On every subagent spawn, it intercepts the Agent tool call and injects your project's architectural context into the subagent's prompt. The subagent now knows your directory structure, what each module is for, what's forbidden where, and your CLAUDE.md rules. No configuration needed — it auto-discovers everything.

2. Self-correction — After every file write, it validates the code against your project's conventions and module boundaries. If something's wrong, it tells the agent to fix it before moving on. The agent self-corrects without human intervention.

Here's what a subagent sees before its task:

```
=== AGENT RUNWAY: ARCHITECTURAL CONTEXT ===
Project: my-project

Module Boundaries:
- routers/  -> HTTP route/endpoint definitions. NO: helper functions, business logic
- services/ -> Business logic and orchestration. NO: route definitions
- helpers/  -> Shared utility functions
- tests/    -> Test suite. NO: production code

CLAUDE.md Rules (MANDATORY):
- DO NOT LEAVE ANY COMMENTS IN THE CODE
- ALWAYS put business logic in services/, NOT in routers/
=== END ARCHITECTURAL CONTEXT ===
```

I tested it side-by-side: same task, same project, with and without the plugin. Without it, the agent created 1 file with everything dumped in the router. With it, the agent created 4 files in the right modules (router, services, helpers, models). Zero violations.

How it works technically:

- SessionStart hook scans your project and caches an architectural map
- PreToolUse hook on the Agent tool modifies the subagent's prompt via updatedInput
- PostToolUse hooks on Write/Edit validate conventions and module boundaries
- Convention checking covers 13 languages and 60+ lint suppression patterns

Install:

```
/plugin marketplace add rennf93/agent-runway
/plugin install agent-runway@rennf93
```

Zero config. Works out of the box. All validation defaults to warn mode (non-blocking). Block mode is opt-in.

GitHub: https://github.com/rennf93/agent-runway

Free and open source (MIT). v1.0.0, 45 tests passing. Would love feedback... especially if you hit false positives or have a specific language's edge cases you'd like to see handled by the plugin.

submitted by /u/PA100T0
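The post-write validation step boils down to checking a written file's path against per-module rules. A toy sketch of that idea; the boundary rules and forbidden patterns here are invented for illustration and are not Agent Runway's actual rule format:

```python
# Hypothetical module-boundary rules, in the spirit of the example context above.
BOUNDARIES = {
    "routers/":  {"forbid": ["def _helper", "# noqa"]},
    "services/": {"forbid": ["@app.get", "# noqa"]},
}

def check_write(path: str, content: str) -> list[str]:
    """Return violation messages for a file write, empty list if clean."""
    violations = []
    for prefix, rules in BOUNDARIES.items():
        if path.startswith(prefix):
            violations += [f"{path}: forbidden pattern {pat!r}"
                           for pat in rules["forbid"] if pat in content]
    return violations

print(check_write("routers/users.py", "def _helper():  # noqa\n    pass"))
```

In warn mode the violation list would be fed back to the agent as a fix-it instruction; in block mode it would reject the write.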
I made a terminal pet that watches my coding sessions and judges me -- now it's OSS
I really liked the idea of the Claude Code buddy so I created my own that supports infinite variations and customization. It even supports watching plain files and commenting on them!

tpet is a CLI tool that generates a unique pet creature with its own personality, ASCII art, and stats, then sits in a tmux pane next to your editor commenting on your code in real time. It monitors Claude Code session files (or any text file with --follow) through watchdog, feeds the events to an LLM, and your pet reacts in character. My current one is a Legendary creature with maxed-out SNARK and it absolutely roasts my code.

Stuff I think is interesting about it:

No API key required by default -- uses the Claude Agent SDK, which works with your existing Claude Code subscription. But you can swap in Ollama, OpenAI, OpenRouter, or Gemini for any of the three pipelines (profile generation, commentary, image art) independently. So your pet could be generated by Claude, get commentary from a local Ollama model, and generate sprite art through Gemini if you want.

Rarity system -- when you generate a pet it rolls a rarity tier (Common through Legendary) which determines stat ranges. The stats then influence the personality of the commentary. A high-CHAOS pet is way more unhinged than a high-WISDOM one.

Rendering -- ASCII mode works everywhere, but if your terminal supports it there are halfblock and sixel art modes that render AI-generated sprites. It runs at 4fps with a background thread pool so LLM calls don't stutter the display.

Tech stack -- Python 3.13, Typer, Rich, Pydantic, watchdog. XDG-compliant config paths. Everything's typed and tested (158 tests).

Install with uv (recommended): uv tool install term-pet

Or just try it without installing: uvx --from term-pet tpet

GitHub: https://github.com/paulrobello/term-pet

MIT licensed. Would love feedback, especially on the multi-provider config approach and the rendering pipeline.

submitted by /u/probello
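The rarity mechanic above (roll a tier, then roll stats within that tier's ranges) is a small weighted-random routine. The weights and stat ranges below are invented for illustration; the post names the tiers but not the odds:

```python
import random

# Hypothetical odds and stat ranges per rarity tier.
TIERS = {
    "Common":    (0.55, (1, 5)),
    "Rare":      (0.30, (4, 8)),
    "Epic":      (0.12, (6, 9)),
    "Legendary": (0.03, (8, 10)),
}

def roll_pet(rng: random.Random) -> dict:
    """Roll a rarity tier, then roll each stat within that tier's range."""
    tier = rng.choices(list(TIERS), weights=[w for w, _ in TIERS.values()])[0]
    lo, hi = TIERS[tier][1]
    stats = {s: rng.randint(lo, hi) for s in ("SNARK", "CHAOS", "WISDOM")}
    return {"tier": tier, "stats": stats}

print(roll_pet(random.Random(42)))
```

Higher tiers roll from higher ranges, which is how a Legendary pet ends up with maxed-out SNARK.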
OCC: give Claude (or any LLM) a 6+-step research task; it runs 3 steps in parallel, evaluates source quality, merges perspectives, and delivers a report in 70 seconds instead of 5-10 minutes
https://i.redd.it/jb59jvaxvotg1.gif Claude and other is great at single-turn tasks. But when I need "research this topic from 3 angles, check source quality, merge everything, then write a synthesis" — I end up doing 6 separate prompts, copy-pasting between them, losing context, wasting tokens... So I built OCC to automate that. You define the workflow once in YAML, and Claude handles the rest — including running independent steps in parallel. For the past few weeks. It started as a Claude-only tool but now supports Ollama, OpenRouter, OpenAI, HuggingFace, and any OpenAI-compatible endpoint — so you can run entire workflows on local models too. What it does You define multi-step workflows in YAML. OCC figures out which steps can run in parallel based on dependencies, runs them, and streams results back. Think of it as a declarative alternative to LangChain/CrewAI: no Python, no code, just YAML. How it saves tokens This is the part I'm most proud of. Each step only sees what it needs, not the full conversation history: Single mega-prompt~40K+ Everything in one context window 6 separate llm chats~25K Manual copy-paste, duplicated context OCC (step isolation)~13K Each step gets only its dependencies Pre-tools make this even better. Instead of asking llm to "search the web for X" (tool-use round-trip = extra tokens), OCC fetches the data before the prompt — the LLM receives clean results, zero tool-calling overhead. 29 pre-tool types: web search, bash, file read, HTTP fetch, SQL queries, MCP server calls, and more. What you get Visual canvas — drag-and-drop chain editor with live SSE monitoring. Each node shows its output streaming in real-time with Apple-style traffic light dots. Double-click any step to edit model, prompt, tools, retry config, guardrails. Workflow Chat — describe what you want in natural language, the AI generates/debug the chain nodes on the canvas. "Build me a research chain that checks 3 sources and writes a report" → done. 
BLOB Sessions — this is experimental but my favorite feature. Unlike chains (predefined), BLOB sessions grow organically from conversations. A knowledge graph auto-extracts concepts and injects them into future prompts. The AI can run autonomously on a schedule, exploring knowledge gaps it identifies itself.
Mix models per step — use HuggingFace, Ollama, and other LLMs in the same chain. A 6-step chain that uses cheaper models for 3 routing steps costs ~40% less than running everything on Claude.
11 step types — agent, router (LLM classifies → branches), evaluator (score 1-10, retry if below threshold), gate (human approval via API), transform (json_extract, regex, truncate — zero LLM tokens), loop, merge, debate (multi-agent), browser, subchain, webhook.

The 16 demo chains
These aren't hello-world examples. They're real workflows you can run immediately.

What it's NOT
- Not a SaaS: fully self-hosted, MIT license
- Not distributed: single process, SQLite, designed for individual/small-team use
- Not a replacement for your LLM: it's a layer on top that orchestrates multi-step work
- Frontend is alpha: works but rough edges

GitHub: https://github.com/lacausecrypto/OCC
Built entirely with Claude Code. Happy to answer questions about the architecture, MCP integration, or the BLOB system. submitted by /u/Main-Confidence7777
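The "figure out which steps can run in parallel based on dependencies" part can be sketched in a few lines. This is not OCC's actual implementation (the step and function names here are hypothetical) — just a minimal illustration of grouping steps into waves of mutually independent work and running each wave concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def run_chain(steps, run_step):
    """steps: {name: [dependency names]}; run_step(name, inputs) -> output.
    Repeatedly finds every step whose dependencies are already done (a "wave")
    and runs the whole wave in parallel."""
    results = {}
    remaining = dict(steps)
    while remaining:
        # a step is ready once all of its dependencies have produced results
        wave = [s for s, deps in remaining.items() if all(d in results for d in deps)]
        if not wave:
            raise ValueError("cyclic or unsatisfiable dependencies")
        with ThreadPoolExecutor() as pool:
            outs = pool.map(
                lambda s: run_step(s, {d: results[d] for d in remaining[s]}), wave
            )
            for s, out in zip(wave, outs):
                results[s] = out
        for s in wave:
            del remaining[s]
    return results
```

With a chain like `{"angle_1": [], "angle_2": [], "angle_3": [], "synthesis": ["angle_1", "angle_2", "angle_3"]}`, the three research angles run as one parallel wave and the synthesis step runs alone in the next wave, seeing only its dependencies' outputs — which is where the step-isolation token savings come from.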
Built an MCP server to replace Claude Code's grep-and-guess pattern with indexed symbol lookups
I built this with Claude Code, specifically to make Claude Code work better on TypeScript projects. It's free and open source.

One pattern kept showing up when using Claude Code and Cursor on TS projects:
1. Search across files
2. Open a likely match
3. Read a lot of code
4. Realize it's the wrong place
5. Try again

The agent isn't dumb -- it just doesn't have structural awareness of the codebase. Every session starts from scratch. So I used Claude Code to build an MCP server that gives it structured access to the codebase instead. It keeps a live SQLite index of the project -- symbols, call sites, imports, class hierarchy -- so the agent can query structure directly. Instead of "search for handleRequest", it becomes "go to this symbol → exact file and line".

The numbers
Tested on a 31-file TypeScript project, same tasks with and without:
- Find one function: 1350 tokens with grep, 500 with index (63% fewer)
- Trace callers across 3 files: 2850 tokens with grep, 900 with index (68% fewer)
- Map inheritance across 15+ files: 4800 tokens with grep, 1000 with index (79% fewer)
Grep gets worse as the codebase grows. Indexed queries stay flat.

Where the savings actually come from
I thought symbol lookup would be the main thing. It wasn't.
- Call graph queries -- get_callers replaces the thing where the agent reads 4-5 files trying to figure out who calls a function
- Partial reads -- knowing the exact line means reading 20 lines instead of a whole file. This alone is over half the savings
- Middleware tracing -- trace_middleware tells the agent what runs before a route handler. Otherwise it reads the router, then each middleware file, then tries to reconstruct the order

Where it struggles
- dynamic patterns (computed method names, etc.)
- dependency injection setups
- anything outside your own codebase

Not perfect, but it cuts down the trial-and-error loop a lot. Free and open source, TypeScript only for now: Repo submitted by /u/Hopeful-Business-15
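The actual server is TypeScript, but the core idea — a live SQLite index queried instead of grepped — fits in a short sketch. The two-table schema and function names below are hypothetical, chosen to mirror the "go to symbol" and get_callers queries described above:

```python
import sqlite3

# Hypothetical minimal schema; the real project also indexes imports
# and class hierarchy, and keeps the index live as files change.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE symbols (name TEXT, kind TEXT, file TEXT, line INTEGER)")
db.execute("CREATE TABLE calls (caller TEXT, callee TEXT, file TEXT, line INTEGER)")
db.executemany("INSERT INTO symbols VALUES (?,?,?,?)", [
    ("handleRequest", "function", "src/server.ts", 42),
])
db.executemany("INSERT INTO calls VALUES (?,?,?,?)", [
    ("main", "handleRequest", "src/index.ts", 10),
    ("retryWrapper", "handleRequest", "src/retry.ts", 7),
])

def go_to_symbol(name):
    # exact file and line, no file reading required
    return db.execute(
        "SELECT file, line FROM symbols WHERE name = ?", (name,)
    ).fetchone()

def get_callers(name):
    # replaces reading 4-5 files to reconstruct the call graph
    return db.execute(
        "SELECT caller, file, line FROM calls WHERE callee = ?", (name,)
    ).fetchall()
```

The token savings follow directly: `go_to_symbol("handleRequest")` returns one (file, line) pair, so the agent can do a partial read of ~20 lines at that location instead of grepping and opening whole files.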
I was too lazy to pick the right Claude Code skill. So I built one that picks skills for me.
I have 50+ Claude Code skills installed - GSD, Superpowers, gstack, custom stuff. They're powerful. They 10x my workflow. I barely use them. Not because they're bad. Because I forget which one to use when. Do I want brainstorm or gsd-quick? systematic-debugging or investigate? ship or gsd-ship? By the time I figure it out I've lost 5 minutes and the will to code. So I did what I always do when something annoys me enough: I automated it.

I built /jarvis - a single Claude Code skill that takes whatever you type in plain English, reads your project state, figures out which of your installed skills is the highest-ROI choice, tells you in one line what it picked (and why), and executes it.

/jarvis why is the memory engine crashing on startup
-> systematic-debugging: exception on startup, root cause first - bold move not reading the error message. let's see.

/jarvis ship this
-> ship: branch ready, creating PR - either it works or you'll be back in 10 minutes. let's go.

/jarvis where are we
-> gsd-progress: checking project state - let's see how far we've gotten while you were watching reels.

The routing has two stages:
Stage 1 - A hardcoded fast path for the 15 things developers actually do 95% of the time. Instant match.
Stage 2 - If Stage 1 misses, it scans every SKILL.md on your machine, reads the description field (the same way you'd skim a list), and picks the best match semantically. New skill installed yesterday that Jarvis doesn't know about? Doesn't matter. It'll find it.

/jarvis write a LinkedIn carousel about my project
-> carousel-writer-sms (discovered): writing LinkedIn carousel content - found something you didn't even know you had. you're welcome.

The (discovered) tag means it found it dynamically. No config, no registry, no telling it anything. It also has a personality. Every routing line ends with a light roast of whatever you just asked it to do. "Checking in on the thing you've definitely been avoiding." "Tests! Before shipping! I need a moment."
"Walk away. Come back to a finished feature. This is the dream."

A bit of context on why this exists. I'm currently building Synapse-OSS - an open-source AI personal assistant that actually evolves with you. Persistent memory, hybrid RAG, a knowledge graph that grows over time, multi-channel support (WhatsApp, Telegram, Discord), and a soul-brain sync system where the AI's personality adapts to yours across sessions. Every instance becomes a unique architecture shaped entirely by the person it serves. It's the kind of AI assistant that knows you. Not "here's your weather" knows you. Actually knows you.

Jarvis was born out of that project. I was deep in Synapse development, context-switching between 8 different Claude Code workflows per hour, and losing my mind trying to remember which skill to call. So I spent 3 days building a router instead of shipping features. 3 days. Because I kept laughing at the roasts and adding more. Worth it!! If Jarvis sounds like something you'd use, Synapse is the bigger vision behind it. Same philosophy: AI that handles the cognitive overhead so you can focus on actually thinking.

Synapse repo: github.com/UpayanGhosh/Synapse-OSS
Install Jarvis: npm install -g claude-jarvis
Restart Claude Code. That's it. It auto-installs GSD and Superpowers for you too, because of course it does.

I've freed up a genuine 40% of my brain that used to be occupied by "which skill do I need right now." That brainpower is now being used to scroll reels. Peak optimization.

Jarvis repo: github.com/UpayanGhosh/claude-jarvis submitted by /u/Shorty52249
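The two-stage routing is easy to picture as code. This is a toy sketch, not Jarvis's implementation: the skill names and descriptions are made up, and the real Stage 2 asks the LLM to skim SKILL.md descriptions rather than doing word overlap. The structure — hardcoded fast path, then dynamic discovery over installed descriptions — is the part being illustrated:

```python
FAST_PATH = {            # Stage 1: hardcoded matches for the common cases
    "ship": "ship",
    "debug": "systematic-debugging",
    "progress": "gsd-progress",
}

INSTALLED_SKILLS = {     # Stage 2 corpus: skill name -> SKILL.md description
    "systematic-debugging": "root-cause a crash or exception step by step",
    "carousel-writer-sms": "write LinkedIn carousel content",
    "ship": "create a PR and ship the current branch",
}

def route(request: str):
    words = set(request.lower().split())
    for keyword, skill in FAST_PATH.items():      # Stage 1: instant match
        if keyword in words:
            return skill, "fast-path"
    # Stage 2: crude stand-in for semantic matching -- score by how many
    # words of the request appear in each skill's description
    best = max(
        INSTALLED_SKILLS,
        key=lambda s: len(words & set(INSTALLED_SKILLS[s].lower().split())),
    )
    return best, "discovered"
```

A newly installed skill needs no registration: it shows up in the Stage 2 corpus the moment its description is on disk, which is exactly the "(discovered)" behavior described above.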
We open-sourced a provider-agnostic AI coding app -- here's the architecture of connecting to every major AI service
I want to talk about the technical problem of building a provider-agnostic AI coding tool, because the engineering was more interesting than I expected. The core challenge: how do you build one application that connects to fundamentally different AI backends -- CLI tools (Gemini), SDK-based agents (Codex, Copilot), and API-compatible endpoints (OpenRouter, Kimi, GLM) -- without your codebase turning into a mess of if-else chains?

Here's what we built. The application is called Ptah. It's a VS Code extension and standalone Electron desktop app. The backend is 12 TypeScript libraries in an Nx monorepo. The interesting architectural bits:

1. The Anthropic-Compatible Provider Registry
We discovered that several providers (OpenRouter, Moonshot/Kimi, Z.AI/GLM) implement the Anthropic API protocol. So instead of writing separate integrations, we built a provider registry where adding a new provider is literally adding an object to an array:

```typescript
{
  id: 'moonshot',
  name: 'Moonshot (Kimi)',
  baseUrl: 'https://api.moonshot.ai/anthropic/',
  authEnvVar: 'ANTHROPIC_AUTH_TOKEN',
  staticModels: [{ id: 'kimi-k2', contextLength: 128000 }, ...]
}
```

The Claude Agent SDK handles routing. One adapter, many providers.

2. CLI Agent Process Manager
For agents that are actually separate processes (Gemini CLI, Codex, Copilot), we built an AgentProcessManager that handles spawning, output buffering, timeout management, and cross-platform process termination (SIGTERM on Unix, taskkill on Windows). A CliDetectionService auto-detects which agents are installed and registers their adapters. The MCP server exposes 6 lifecycle tools: ptah_agent_spawn, ptah_agent_status, ptah_agent_read, ptah_agent_steer, ptah_agent_stop, ptah_agent_list. So your main AI agent can delegate work to other agents programmatically.

3. Platform Abstraction
The same codebase runs as both a VS Code extension and a standalone Electron app.
We isolated all VS Code API usage behind platform abstraction interfaces (IDiagnosticsProvider, IIDECapabilities, IWorkspaceProvider). Only one file in the entire MCP library imports vscode directly, and it's conditionally loaded via DI. The MCP server gracefully degrades on Electron -- LSP-dependent tools are filtered out, the system prompt adjusts, approval prompts auto-allow instead of showing webview UI.

The full source is open (FSL-1.1-MIT): https://github.com/Hive-Academy/ptah-extension
If you're interested in multi-provider AI architecture or MCP server design, I'd love to hear how you're approaching similar problems.
Landing page: https://ptah.live submitted by /u/PretendMoment8073
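The spawn-with-timeout pattern the process manager handles is worth seeing concretely. Ptah's AgentProcessManager is TypeScript; this is a hedged Python stand-in showing the same lifecycle concerns — spawn, buffer output, enforce a timeout, then escalate from polite termination to a hard kill (the function name and return shape are invented for the example):

```python
import subprocess
import sys

def run_agent(cmd, timeout_s=60):
    """Spawn a CLI agent, buffer its combined output, and stop it on timeout.
    Popen.terminate() sends SIGTERM on Unix and terminates the process on
    Windows, which mirrors the SIGTERM/taskkill split described above."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return {"status": "exited", "code": proc.returncode, "output": out}
    except subprocess.TimeoutExpired:
        proc.terminate()                 # give the agent a chance to clean up
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()                  # hard kill if it ignores termination
        return {"status": "timeout", "output": ""}
```

Wrapping each agent this way is what makes lifecycle tools like ptah_agent_spawn / ptah_agent_stop safe to expose over MCP: the orchestrating agent can delegate work without ever leaving a runaway child process behind.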
[D] Why I abandoned YOLO for safety critical plant/fungi identification. Closed-set classification is a silent failure mode
I've been building an open-sourced handheld device for field identification of edible and toxic wild plants and fungi, running entirely on-device. Early on I trained specialist YOLO models on iNaturalist research-grade data and hit 94-96% accuracy across my target species. Felt great, until I discovered a problem I don't see discussed enough on this sub.

YOLO's closed-set architecture has no concept of "I don't know." Feed it an out-of-distribution image and it will confidently classify it as one of its classes at near 100% confidence. In most CV cases this is an annoyance. In foraging, it's potentially lethal. I tried confidence-threshold fine-tuning at first; it doesn't work. The confidence scores on OOD inputs are indistinguishable from in-distribution predictions because the softmax output is normalized across a closed set. There's no probability mass allocated to "none of the above".

My solution was to move away from YOLO entirely (the use case is single-shot image classification, not a video stream) and build a layered OOD detection pipeline:
- EfficientNet-B2 specialist models: mycology, berries, and high-value foraging instead of one monolithic detector.
- A MobileNetV3-Small domain router that directs inputs to the appropriate specialist model or rejects them before classification.
- Energy scoring on raw logits pre-softmax to detect OOD inputs. Energy scores separate in-distribution from OOD far more cleanly than softmax confidence.
- Ensemble disagreement across the three specialists as a secondary OOD signal.
- A K+1 "none of the above" class retrained into each specialist model.

The whole pipeline needs to run within the Hailo-8L's 13 TOPS compute budget on a battery-powered handheld. All architecture choices are constrained by real inference latency, not just accuracy on desktop. Curious if others have run into this closed-set confidence problem in safety-critical applications and what approaches you've taken?
The energy scoring method (from the "Energy-based Out-of-Distribution Detection" paper by Liu et al.) has been the single biggest improvement over native confidence thresholding. submitted by /u/Adebrantes
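The energy score from Liu et al. is cheap to compute from raw logits: E(x; T) = -T · logsumexp(logits / T), with lower energy meaning more in-distribution. A minimal sketch (the logit values are made up for illustration) shows why it catches OOD inputs that softmax confidence misses — softmax only sees the shape of the logits, while energy also sees their magnitude:

```python
import math

def energy_score(logits, T=1.0):
    # E(x; T) = -T * logsumexp(logits / T); lower energy => more in-distribution
    m = max(logits)
    return -T * (m / T + math.log(sum(math.exp((l - m) / T) for l in logits)))

def max_softmax(logits):
    # the "native confidence" a closed-set classifier reports
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    return max(exps) / sum(exps)

id_logits = [12.0, 2.0, 1.0]    # in-distribution: large, peaked logits
ood_logits = [4.0, -6.0, -7.0]  # OOD: same peaked shape, much smaller magnitude
```

Both inputs get near-100% softmax confidence (the shapes are identical up to scale), so no confidence threshold separates them — but the OOD input's energy is markedly higher, so a single energy threshold rejects it. That is the failure mode and the fix described in the post.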
Yes, OpenRouter offers a free tier. Pricing found: $10
Based on user reviews and social mentions, the most common pain points are token usage and large language model (LLM) costs.
Based on 34 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.
Matt Shumer
CEO at HyperWrite / OthersideAI
2 mentions