The coding agent built for on-premise | Cosine AI
Cosine is a Human Reasoning Lab. What this means is that we're researching how to codify exactly how a human performs a task, then teaching AI to mimic, excel at, and expand on the same jobs. This is the opposite of throwing spaghetti at the wall. Our vision is to codify human reasoning and apply it to hard problems; the first such problem is software engineering.

Cosine started because three friends were curious to see how far we could push Davinci-2. Through each iteration we were candid about the knowns and unknowns, but we remained maximally enthusiastic, optimistic, and curious about what comes next. Finally, the technology has matured enough for us to ship our vision.

We're a team of only five. Between us we've scaled and exited multiple unicorns, managed huge global teams, and been coding since as early as age eight. We remain laser-focused on our goals and work closely and collaboratively on hard problems. We hire selectively, like all great startups, but in particular we look for obsession (about anything, even a hobby!), optimism, and antifragile traits.

We truly believe we can codify human reasoning for any job and industry. Software engineering is just the most intuitive starting point, and we can't wait to show you everything else we're working on. If you're passionate and obsessive about anything, we'd like to talk and convince you to join us. See our open roles for the positions we're most actively hiring for.
Mentions (30d): 0
Reviews: 0
Platforms: 2
Sentiment: 0% (0 positive)
Industry: Information Technology & Services
Employees: 33
Funding Stage: Seed
Total Funding: $3.0M
Prism MCP — I gave my AI agent a research intern. It does not require a desk
So I got tired of my coding agent having the long-term memory of a goldfish and the research skills of someone who only reads the first Google result. I figured — what if the agent could just… go study things on its own? While I sleep? Turns out you can build this, and it's slightly cursed.

Here's what happens: on a schedule, a background pipeline wakes up, checks what you're actively working on, and goes full grad student. Brave Search for sources, Firecrawl to scrape the good stuff, Gemini to synthesize a report, then it quietly files the report into memory at an importance level high enough that it's guaranteed to show up next time you talk to your agent. No "maybe the cosine similarity gods will bless us today." It's just there.

The part I'm unreasonably proud of: it's task-aware. Running multiple agents? The researcher checks what they're all doing and biases toward that. Your dev agent is knee-deep in auth middleware refactoring? The researcher starts reading about auth patterns. It even joins the group chat — registers on a shared bus, sends heartbeats ("Searching...", "Scraping 3 articles...", "Synthesizing..."), and announces when it's done. It's basically the intern who actually takes notes at standups.

No API keys? It doesn't care. It falls back to Yahoo Search and local parsing. Zero cloud required.

I also added a reentrancy guard, because the first time I manually triggered it during a scheduled run, two synthesis pipelines started arguing with each other, and I decided that was a problem for present-me, not future-me. There's a rough sketch of the whole loop below.

Other recent rabbit holes:

- Ported Google's TurboQuant to pure TypeScript — my laptop now stores millions of memories instead of "a concerning number that was approaching my disk limit"
- Built a correction system. You tell the agent it's wrong, it remembers. Forever. It's like training a very polite dog that never forgets where you hid the treats
- One command reclaims 90% of old memory storage. Dry-run by default, because I am a coward who previews before deleting

Local SQLite, pure TypeScript, works with Claude/Cursor/Windsurf/Gemini/any MCP client. Happy to nerd out on architecture if anyone's building agents with persistent memory.

https://github.com/dcostenco/prism-mcp

submitted by /u/dco44
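For anyone who wants to picture the loop: here's a rough sketch of the scheduled pipeline, reentrancy guard included. To be clear, this is not the actual Prism code; every helper name below is a made-up stand-in for the real Brave Search / Firecrawl / Gemini / memory-store modules.

```typescript
// Hypothetical helper signatures -- stand-ins, not the real Prism modules.
interface ResearchReport {
  topic: string;
  summary: string;
  sources: string[];
}

declare function getActiveAgentTasks(): Promise<string[]>;
declare function braveSearch(query: string): Promise<string[]>;
declare function yahooSearchFallback(query: string): Promise<string[]>;
declare function firecrawlScrape(url: string): Promise<string>;
declare function synthesizeWithGemini(topic: string, pages: string[]): Promise<ResearchReport>;
declare function announce(status: string): void; // heartbeat on the shared agent bus
declare const memoryStore: {
  save(report: ResearchReport, opts: { importance: "high" | "normal" }): Promise<void>;
};

let running = false; // reentrancy guard: one synthesis pipeline at a time

export async function researchCycle(): Promise<void> {
  if (running) return; // a manual trigger during a scheduled run becomes a no-op
  running = true;
  try {
    // Task-awareness: bias research toward whatever the agents are working on.
    const topics = await getActiveAgentTasks();

    for (const topic of topics) {
      announce(`Searching: ${topic}...`);
      // Keyless fallback so a zero-cloud setup still works.
      const urls = process.env.BRAVE_API_KEY
        ? await braveSearch(topic)
        : await yahooSearchFallback(topic);

      announce(`Scraping ${urls.length} articles...`);
      const pages = await Promise.all(urls.map((url) => firecrawlScrape(url)));

      announce("Synthesizing...");
      const report = await synthesizeWithGemini(topic, pages);

      // File at high importance so it is guaranteed to surface next session,
      // instead of hoping similarity search recalls it.
      await memoryStore.save(report, { importance: "high" });
    }
    announce("Done.");
  } finally {
    running = false;
  }
}
```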
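The correction system is simple enough to sketch too. This assumes a local SQLite table via better-sqlite3 (one convenient driver choice); the real Prism schema may look nothing like this.

```typescript
import Database from "better-sqlite3";

// Hypothetical correction store -- not the real Prism schema.
const db = new Database("memory.db");
db.exec(`CREATE TABLE IF NOT EXISTS corrections (
  id INTEGER PRIMARY KEY,
  wrong TEXT NOT NULL,     -- what the agent claimed
  right TEXT NOT NULL,     -- what the user said instead
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
)`);

// You tell the agent it's wrong once; the row persists forever.
export function recordCorrection(wrong: string, right: string): void {
  db.prepare("INSERT INTO corrections (wrong, right) VALUES (?, ?)").run(wrong, right);
}

// Loaded on every session start, so the fix always resurfaces.
export function loadCorrections(): { wrong: string; right: string }[] {
  return db
    .prepare("SELECT wrong, right FROM corrections ORDER BY created_at")
    .all() as { wrong: string; right: string }[];
}
```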
Reducing AI agent token consumption by 90% by fixing the retrieval layer
Quick insight from building retrieval infrastructure for AI agents:

Most agents stuff 50,000 tokens of context into every prompt. They retrieve 200 documents by cosine similarity, hope the right answer is somewhere in there, and let the LLM figure it out. When it doesn't, and it often doesn't, the agent re-retrieves. Every retry burns more tokens and money.

We built a retrieval engine called Shaped that gives agents 10 ranked results instead of 200. The results are scored by ML models trained on actual interaction data, not just embedding similarity. In production, this means ~2,500 tokens per query instead of 50,000. The agent gets it right the first time, so no retry loops. (There's a sketch of the idea below.)

The most interesting part: the ranking model retrains on agent feedback automatically. When a user rephrases a question or the agent has to re-retrieve, that signal trains the model. The model on day 100 is measurably better than on day 1, without any manual intervention.

We also shipped an MCP server, so it works natively with Cursor, Claude Code, Windsurf, VS Code Copilot, Gemini, and OpenAI.

If anyone's working on agent retrieval quality, I'd love to hear what approaches you've tried. Wrote up the full technical approach here: https://www.shaped.ai/blog/your-agents-retrieval-is-broken-heres-what-we-built-to-fix-it

submitted by /u/skeltzyboiii
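The core move is "rank hard, then truncate hard" instead of "stuff everything". A rough sketch of the difference, with invented names (`embedSearch` and `shapedRank` below are stand-ins, not Shaped's actual client API):

```typescript
// Invented stand-ins, not Shaped's real client API.
interface Doc {
  id: string;
  text: string;
}

declare function embedSearch(query: string, k: number): Promise<Doc[]>;
// ML-scored ranking trained on interaction data, not just embedding similarity.
declare function shapedRank(query: string, candidates: Doc[]): Promise<Doc[]>;

export async function buildContext(query: string): Promise<string> {
  // Old pattern: dump all 200 embedding hits into the prompt (~50k tokens).
  const candidates = await embedSearch(query, 200);

  // New pattern: let a learned ranker score them, keep only the top 10 (~2.5k tokens).
  const ranked = await shapedRank(query, candidates);
  return ranked
    .slice(0, 10)
    .map((doc) => doc.text)
    .join("\n---\n");
}
```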
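And the feedback loop can be as simple as logging implicit negatives. Again, these are invented names rather than Shaped's API; the point is the signal, not the exact call:

```typescript
// Hypothetical feedback capture. A rephrase or a re-retrieval is an
// implicit "the ranking missed" label for the previously shown results;
// the ranker retrains on these events with no hand-labeling.
type FeedbackLabel = "accepted" | "rephrased" | "re_retrieved";

declare function logTrainingSignal(event: {
  query: string;
  shownDocIds: string[];
  label: FeedbackLabel;
}): Promise<void>;

export async function onAgentRetry(query: string, shownDocIds: string[]): Promise<void> {
  await logTrainingSignal({ query, shownDocIds, label: "re_retrieved" });
}
```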
Cosine uses a tiered pricing model. Visit their website for current pricing details.