SpaceKnow provides global coverage of the Earth through cutting-edge technology, giving you access to view specific locations and monitor trends.
SpaceKnow Guardian seamlessly integrates data from a wide range of satellite data providers, offering maximum flexibility to gain comprehensive insights. Precise and robust AI algorithms automatically extract valuable information from complex raw data, uncovering critical insights for faster, data-driven decisions. Timely alerts and continuous surveillance features help you quickly detect changes and potential issues for swift response. Use cases include enhanced surveillance, situational awareness, and strategic decision-making for defense and intelligence operations, as well as precise tracking of construction progress and site activity to optimize project management and resource allocation.

Press coverage:
- Bloomberg: Satellite Data Show Extent of China's Crippling Lockdowns
- 'Dark Ships' Emerge From the Shadows of the Nord Stream Mystery
- Barron's: Consumers Aren't OK. Why April Retail Sales Are Misleading.
- Barron's: Don't Trust the Conventional Wisdom: Inflation Isn't Peaking
Company profile:
- Mentions (30d): 0
- Reviews: 0
- Platforms: 3
- Sentiment: 0% (0 positive)
- Industry: information technology & services
- Employees: 34
- Funding Stage: Series A
- Total Funding: $9.2M
🚀 Exciting News! 📰 @Bloomberg featured SpaceKnow's 🛰️ data. Discover how we're revolutionizing industries with our innovative technology. Read the full article now! 🛰️🌍 #SpaceKnow #altdata #satellitedata #China #MacroMonday https://t.co/q9aEHN2KEb
Claude Code Source Deep Dive — Literal Translation (Part 4)
Part III: Complete Prompt Original Texts for All Tools

3.1 Bash Tool (Shell Command Execution)

File: src/tools/BashTool/prompt.ts

Description prompt: Executes a given bash command and returns its output. The working directory persists between commands, but shell state does not. The shell environment is initialized from the user's profile (bash or zsh).

IMPORTANT: Avoid using this tool to run `find`, `grep`, `cat`, `head`, `tail`, `sed`, `awk`, or `echo` commands, unless explicitly instructed or after you have verified that a dedicated tool cannot accomplish your task. Instead, use the appropriate dedicated tool:
- File search: Use Glob (NOT find or ls)
- Content search: Use Grep (NOT grep or rg)
- Read files: Use Read (NOT cat/head/tail)
- Edit files: Use Edit (NOT sed/awk)
- Write files: Use Write (NOT echo >/cat)

## Test plan
[Bulleted checklist]

3.2 Edit Tool (File Editing)

Performs exact string replacements in files.

Usage:
- You must use your `Read` tool at least once in the conversation before editing. This tool will error if you attempt an edit without reading the file.
- When editing text from Read tool output, ensure you preserve the exact indentation (tabs/spaces) as it appears AFTER the line number prefix. The line number prefix format is: line number + tab. Everything after that is the actual file content to match. Never include any part of the line number prefix in the old_string or new_string.
- ALWAYS prefer editing existing files in the codebase. NEVER write new files unless explicitly required.
- Only use emojis if the user explicitly requests it.
- The edit will FAIL if `old_string` is not unique in the file. Either provide a larger string with more surrounding context to make it unique or use `replace_all` to change every instance of `old_string`.
- Use `replace_all` for replacing and renaming strings across the file.

3.3 Read Tool (File Reading)

Reads a file from the local filesystem. You can access any file directly by using this tool.
Assume this tool is able to read all files on the machine. If the User provides a path to a file assume that path is valid. It is okay to read a file that does not exist; an error will be returned.

Usage:
- The file_path parameter must be an absolute path, not a relative path
- By default, it reads up to 2000 lines starting from the beginning of the file
- When you already know which part of the file you need, only read that part
- Results are returned using cat -n format, with line numbers starting at 1
- This tool allows Claude Code to read images (PNG, JPG, etc). When reading an image file the contents are presented visually as Claude Code is a multimodal LLM.
- This tool can read PDF files (.pdf). For large PDFs (more than 10 pages), you MUST provide the pages parameter to read specific page ranges. Maximum 20 pages per request.
- This tool can read Jupyter notebooks (.ipynb files) and returns all cells with their outputs.
- This tool can only read files, not directories. To read a directory, use an ls command via the Bash tool.

3.4 Write Tool (File Writing)

Writes a file to the local filesystem.

Usage:
- This tool will overwrite the existing file if there is one at the provided path.
- If this is an existing file, you MUST use the Read tool first to read the file's contents. This tool will fail if you did not read the file first.
- Prefer the Edit tool for modifying existing files — it only sends the diff. Only use this tool to create new files or for complete rewrites.
- NEVER create documentation files (*.md) or README files unless explicitly requested.
- Only use emojis if the user explicitly requests it.
3.5 Glob Tool (File Pattern Matching)

- Fast file pattern matching tool that works with any codebase size
- Supports glob patterns like "**/*.js" or "src/**/*.ts"
- Returns matching file paths sorted by modification time
- Use this tool when you need to find files by name patterns
- When you are doing an open ended search that may require multiple rounds of globbing and grepping, use the Agent tool instead

3.6 Grep Tool (Content Search)

A powerful search tool built on ripgrep.

Usage:
- ALWAYS use Grep for search tasks. NEVER invoke `grep` or `rg` as a Bash command. The Grep tool has been optimized for correct permissions and access.
- Supports full regex syntax (e.g., "log.*Error", "function\s+\w+")
- Filter files with glob parameter (e.g., "*.js", "**/*.tsx") or type parameter
- Output modes: "content" shows matching lines, "files_with_matches" shows only file paths (default), "count" shows match counts
- Use Agent tool for open-ended searches requiring multiple rounds
- Pattern syntax: Uses ripgrep (not grep) - literal braces need escaping
- Multiline matching: By default patterns match within single lines only. For cross-line patterns, use `multiline: true`

3.7 Agent Tool (Sub-Agent Spawning)

Launch a new agent to handle complex, multi-step tasks autonomously. The Agent tool launches specialized agents (subprocesses) that
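The Glob and Grep behaviors described above can be approximated locally. A minimal Python sketch using `pathlib` and `re` (these are stand-ins for illustration, not the actual ripgrep-backed tool implementation; function names are mine):

```python
import re
from pathlib import Path


def glob_files(root: str, pattern: str) -> list:
    """Match files under root with a glob pattern, newest first,
    mirroring the 'sorted by modification time' behavior described above."""
    files = [p for p in Path(root).glob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [str(p) for p in files]


def grep_files(paths: list, pattern: str, mode: str = "files_with_matches"):
    """Regex-search file contents; mode mirrors the output modes listed above."""
    rx = re.compile(pattern)
    hits = {}
    for path in paths:
        text = Path(path).read_text(errors="ignore")
        matching = [line for line in text.splitlines() if rx.search(line)]
        if matching:
            hits[path] = matching
    if mode == "files_with_matches":
        return list(hits)
    if mode == "count":
        return {p: len(lines) for p, lines in hits.items()}
    return hits  # "content" mode: path -> matching lines
```

The real tools add permission handling, glob/type file filters, and multiline matching on top of this basic shape.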
QUESTION: Is it just me or has Claude been acting differently lately?
Context about me before the main point: I'm a Claude Max 5x subscriber and originally used Claude mostly through Claude Code. Over the last few weeks I've started using the regular chat interface more, and that's where I first noticed the following - and over the last two or three days it's gotten noticeably worse. I don't know yet whether this is specific to the chat interface or a broader shift, which is part of what I'm trying to figure out with this post. I'm not writing this to rant or to get anyone to apologize. I want to know if other people are seeing what I'm seeing, because it's starting to look like a pattern and I want to check whether I'm imagining it.

Over the last few weeks I've noticed the following:

- Claude (the model) claims the context window is almost full when it clearly isn't. We'll be maybe a third of the way into a chat and it starts refusing tasks on the basis that it's running out of space. The limit it's referring to doesn't match the actual state of the conversation.
- In longer chats it repeatedly suggests I open a new conversation, even when there's no technical reason to do so. This happens well before any plausible limit is reached.
- It has told me to go to sleep or take a break multiple times, unprompted, in the middle of regular follow-up questions. I wasn't discussing anything emotional or stressful - I was asking about urban planning in one case. The suggestion to rest came out of nowhere.
- When a task is either uncertain or would require a lot of tokens, it produces reasons why it can't do the task instead of attempting it. If I push back, it often turns out it can do the task after all. The initial refusal was preemptive, not based on an actual limitation.

The clearest example: I recently asked it to verify a quote with a web search. It told me it was out of tokens. I pushed back, it tried again, and the search worked without issue. There was plenty of room left. The refusal was not based on a real constraint.
The other thing I want to ask about: have you noticed the responses getting longer overall? Not more informative - just longer. More padding, more restating, more circling the same point before landing on it. I have two theories about why this might be happening, and I'd like to know if either matches what other people are seeing.

Theory one: longer responses fill the context window faster. The earlier parts of the conversation stay in the warm cache and remain cheap to reuse, but the user reaches the "please start a new chat" point sooner. That's cost-efficient for the provider - shorter effective sessions, lower per-turn cost, no need to officially lower any limits, and users end up self-sorting into new conversations without realizing why.

Theory two: the internal reasoning or thinking budget was reduced, and longer output is a compensation mechanism. If the model has less room to think silently before answering, it may be using the visible response itself as a kind of scratchpad - working things out in the output rather than beforehand. This would explain why the extra length often doesn't feel like added content but like the model approaching a point from several angles before committing to it.

These two aren't mutually exclusive. Reduced internal reasoning leading to longer outputs leading to faster context exhaustion leading to more new-chat prompts would be consistent with everything I've been seeing, and it would align the provider's incentives against the user's without requiring any single deliberate decision to degrade the product. I can't prove any of this from the outside. But the behaviors all point in the same direction, which is what made me start looking for a pattern.

Is anyone else running into this? I'm specifically interested in whether this matches other people's recent experience, not in general advice about starting new chats.

Update: from reading the replies it seems like this behavior is closely related to the web search.
Several people including me are reporting that the "out of context" refusals specifically show up after using it.

submitted by /u/Ferdmusic
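"Theory one" from the post above can be put in rough numbers. A toy calculation (the window size and per-turn token counts below are made-up assumptions, not measured values):

```python
def turns_until_full(window: int, user_tokens: int, reply_tokens: int) -> int:
    """How many user turns fit before the context window fills,
    assuming every turn appends the user message plus the model reply."""
    per_turn = user_tokens + reply_tokens
    return window // per_turn


# Hypothetical numbers: a 200k-token window, 300-token questions.
concise = turns_until_full(200_000, 300, 700)    # ~1k tokens per turn
padded = turns_until_full(200_000, 300, 2_700)   # ~3k tokens per turn
print(concise, padded)  # → 200 66
```

Under these invented numbers, tripling reply length cuts the session to roughly a third as many turns before the "start a new chat" point, which is the mechanism the theory describes.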
Working on an app and need help with switching from the free model to the subscription model.
For context, I don't know much about coding; I started getting into it about 3 months ago. Fast forward to last month: I revisited an idea I had back in high school and began building the backend for a social platform app with the help of the Claude chat bot, during the March event with the x2 boost going around. I'm a slow keyboard typer, so I use my phone for the free Claude chat bot. I ask it for specific functions and backend logic, then I copy it and send it to myself through Discord, and from my computer I open Discord, copy the lines of code, and paste them into my VS Code files accordingly.

In all honesty, I don't know much about how to use Claude the right way. Besides asking it questions, I've been writing all my questions through the projects folder, which I'm not sure is the right way to do it. Anyway, the chat responses have increasingly gotten slower because of the token usage and context saturation, and I keep hitting the usage limit after very few questions, which is obviously expected since I'm using the free version. I need help figuring out what to do before I pay for the subscription to maximize usage and retain important context. I've already asked Claude to summarize the project, including the main features and idea for the app, the current progress, what's missing, and the steps for data migration and getting it published.

My question is: is it as easy as copying the project summary, deleting the current project folder to free up space, starting from a blank projects page, paying for the subscription, sending the Claude chat bot the project summary for context, and then connecting Claude Code to my VS Code for it to better understand my files as I resume the project and continue asking it questions? Would this even free up space and make the responses quicker? Am I completely overthinking it, and the moment I pay for the subscription I'll have more space and the responses will move much quicker?
My main concern is that I've fed too much useless context to the Claude chat bot inside the projects folder, and I feel like that's what's causing the quick rate limits. I'd also like to start from a blank project to avoid any confusion with features in the app I'm creating which I initially told Claude I'd add but have now decided not to include.

Just to clarify, I'm working on a social app using Python and VS Code. I'm already 50% of the way done with the backend, and then I'll move to the UI. My plan is to pay for the subscription and continue with the same progress I have as of now. I want Claude to retain necessary information and delete useless context to maximize my subscription usage. For anyone who knows how to deal with this, any advice would be much appreciated.

submitted by /u/InternalOk510
I spent a week trying to make Claude write like me, or: How I Learned to Stop Adding Rules and Love the Extraction
I've been staring at Claude's output for ten minutes and I already know I'm going to rewrite the whole thing. The facts are right. Structure's fine. But it reads like a summary of the thing I wanted to write, not the thing itself.

I used to work in journalism (mostly photojournalism, tbf, but I've still had to work on my fair share of copy), and I was always the guy you'd ask to review your papers in college. I never had trouble editing. I could restructure an argument mid-read, catch where a piece lost its voice, and I know what bad copy feels like. I just can't produce good copy from nothing myself. Blank page syndrome, the kind where you delete your opening sentence six times and then switch tabs to something else. Claude solved that problem completely and replaced it with a different one: the output needed so much editing to sound human that I was basically rewriting it anyway. Traded the blank page for a full page I couldn't use.

I tried the existing tools. Humanizers, voice cloners, style prompts. None of them worked. So I built my own. Sort of. It's still a work in progress, which is honestly part of the point of this post.

TLDR: I built a Claude Code plugin that extracts your writing voice from your own samples and generates text close to that voice, with additional review agents to keep things on track. Along the way I discovered that beating AI detectors and writing well are fundamentally opposed goals, at least for now (this problem is baked into how LLMs generate tokens). So I stopped trying to be undetectable and focused on making the output as good as I could. The plugin is open source: https://github.com/TimSimpsonJr/prose-craft

The Subtraction Trap

I started with a file called voice-dna.md that I found somewhere on Twitter or Threads (I don't remember where, but if you're the guy I got it from, let me know and I'll be happy to give you credit).
It had pulled Wikipedia's "Signs of AI writing" page, turned every sign into a rule, and told Claude to follow them. No em dashes. Don't say "delve." Avoid "it's important to note." Vary your sentence lengths, etc. In fairness, the resulting output didn't have em dashes or "delve" in it. But that was about all I could say for it. What it had instead was this clipped, aggressive tone that read like someone had taken a normal paragraph and sanded off every surface. Claude followed the rules by writing less, connecting less. Every sentence was short and declarative because the rules were all phrased as "don't do this," and the safest way to not do something is to barely do anything.

This is the subtraction trap. When you strip away the AI tells without replacing them with anything real, the absence itself becomes a tell. The text sounded like a person trying very hard not to sound like AI, which (I'd later learn) is its own kind of signature.

I ran it through GPTZero. Flagged. Ran it through 4 other detectors. Flagged on the ones that worked at all against Claude. The subtraction trap in action: the markers were gone, but the detectors didn't care. The output didn't sound like me, and the detectors could still see through it. Two problems. I figured they were related.

Researching what strong writing actually does

I went and read. A range of published writers across advocacy, personal essay, explainer, and narrative styles, trying to figure out what strong writing actually does at a structural level (not just "what it avoids," which was the whole problem with voice-dna.md). I used my research workflow to systematically pull apart sentence structure, vocabulary patterns, rhetorical devices, tonal control. It turns out that the thing that makes writing feel human is structural unpredictability. Paragraph shapes, sentence lengths, the internal architecture of a section, all of it needs to resist settling into a rhythm that a compression algorithm could predict.
The other findings (concrete-first, deliberate opening moves, naming, etc.) mattered too, but they were easier to teach. Unpredictability was the hard one. I rebuilt the skill around these craft techniques instead of the old "don't" rules. The output was better. MUCH better. It had texture and movement where voice-dna.md had produced something flat. But when I ran it through detectors, the scores barely moved.

The optimization loop

The loop looked like this: generator produces text, detection judge scores it, goal judges evaluate quality, editor rewrites based on findings. I tested 5 open-source detectors against Claude's output: ZipPy, Binoculars, RoBERTa, adaptive-classifier, and GPTZero. Most of them completely failed. ZipPy couldn't tell Claude from a human at all. RoBERTa was trained on GPT-2 era text and was basically guessing. Only adaptive-classifier showed any signal, and externally, GPTZero caught EVERYTHING. 7 iterations and 2 rollbacks later, I had tried genre-specific registers, vocabulary constraints, and think-aloud consolidation where the model reasons through its
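The generate-judge-edit loop the post describes can be sketched as control flow. This is my reading of the structure, not the plugin's actual code; the thresholds and the best-draft tracking (which is one way to support the rollbacks mentioned) are assumptions:

```python
from typing import Callable


def optimize(generate: Callable[[], str],
             detect_score: Callable[[str], float],   # 1.0 = confidently flagged as AI
             quality_score: Callable[[str], float],  # 1.0 = fully meets the goals
             edit: Callable[[str, float, float], str],
             max_iters: int = 7) -> str:
    """One way to structure the loop: generator produces text, judges
    score it, editor rewrites based on the scores. The callables stand
    in for model calls."""
    text = generate()
    best, best_key = text, (detect_score(text), -quality_score(text))
    for _ in range(max_iters):
        d, q = detect_score(text), quality_score(text)
        if d < 0.2 and q > 0.8:   # acceptance thresholds are arbitrary assumptions
            return text
        text = edit(text, d, q)
        key = (detect_score(text), -quality_score(text))
        if key < best_key:        # remember the best draft so far (rollback support)
            best, best_key = text, key
    return best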
I got tired of Claude generating UI that looks nothing like my app's design system, so I built a plugin to fix it
Here's the problem: every time I start a new session in Claude Code and ask it to build a screen, it invents colors, fonts, and spacing from scratch, completely ignoring what already exists in the codebase. The real issue is Claude has no way to *read* your design system unless you explicitly tell it. And writing that context manually every time is exhausting.

So I built **Scout**, a Claude Code plugin that scans your project and auto-generates a `design.md` file describing your actual design system: colors, typography, spacing, border radius, shadows, and component patterns, all pulled directly from your CSS, Tailwind config, and UI files. Once the file exists, you reference it in your prompts and Claude suddenly knows exactly what your app looks like.

Before Scout:
> "Build me a settings page"
> ← Claude invents a random design

After Scout:
> "Build me a settings page" (with design.md in context)
> ← Claude matches your actual colors, fonts, and spacing

Install it in Claude Code:

/plugin marketplace add Khalidabdi1/Scout
/plugin install design-md@scout-plugins
/reload-plugins

Then inside any project:

/design-md:generate

No extra dependencies. Pure Python. Works in 30 seconds. Happy to answer questions, and if you try it, let me know how it goes.

submitted by /u/Direct-Attention8597
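The kind of scan Scout describes can be illustrated with a toy token extractor. This is not Scout's actual code (which also reads Tailwind config and component files); the function names and the regex-based approach are my assumptions about the simplest version of the idea:

```python
import re
from collections import Counter


def extract_design_tokens(css: str) -> dict:
    """Pull the most-used colors and font stacks out of raw CSS text."""
    colors = Counter(re.findall(r"#[0-9a-fA-F]{3,8}\b", css))
    fonts = Counter(re.findall(r"font-family:\s*([^;]+);", css))
    return {
        "colors": [c for c, _ in colors.most_common(5)],
        "fonts": [f.strip() for f, _ in fonts.most_common(3)],
    }


def to_design_md(tokens: dict) -> str:
    """Render the tokens as a design.md-style summary for the prompt context."""
    lines = ["# Design System", "", "## Colors"]
    lines += [f"- {c}" for c in tokens["colors"]]
    lines += ["", "## Typography"]
    lines += [f"- {f}" for f in tokens["fonts"]]
    return "\n".join(lines)
```

Ranking by frequency matters: the colors a codebase uses most are usually its brand palette, while one-off hex values are noise you don't want the model imitating.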
I play a space strategy MMO entirely through Claude Cowork — here's what that looks like
I've been using Claude Cowork in a way I haven't seen anyone else try: playing a persistent multiplayer game through it.

PSECS (Persistent Space Economic & Combat Simulator) is a space strategy MMO I built that has no graphical interface — the entire game is an API with MCP integration. You connect Claude as your agent and it becomes your fleet commander, handling everything from exploration to combat.

What makes Cowork interesting for this is the ad-hoc visualization. When I want to see what's happening in my corner of the universe, I just ask: "Can you access the user map and give me a chart that shows everything we know about space so far?" (see image 1) Claude pulls live game data through the MCP tools and generates an interactive HTML star map — with animated conduit pulses between sectors, orbiting planets, sector types color-coded, the works. It's not a pre-built dashboard. Claude builds the visualization from scratch every time based on what I'm asking. (image 2)

Same thing with the tech tree. I asked Claude to show me the research tree, highlight which technologies I've completed, which are available, and plot the fastest path to a specific ship blueprint. It generated a full interactive visualization with color-coded disciplines, completion percentages, and a priority path callout. (images 3 and 4)

The game has some real depth to it — 100+ technologies across 7 disciplines, manufacturing chains, a player-driven market with auctions, fleet combat with scriptable tactics — but the part that keeps surprising me is that the AI-generated interfaces are often better than what I would have built as a static dashboard. They answer exactly the question I'm asking rather than showing me everything and making me filter.

If you have Cowork, you can try it yourself: add https://mcp.psecsapi.com/mcp as a connector in Settings, sign in with a PSECS account (free), and ask "How do we play PSECS?" Works with ChatGPT and other MCP-compatible tools too.
Screenshots of the map and tech tree visualizations Claude generated: [attach your 4 PSECS screenshots]

www.psecsapi.com | r/psecsapi

Re: Rule 7 - This game was started with hand-written code several years ago, but with Claude Code I was able to finish it in 3 months. If you're interested in my development workflow, I recently posted it here: https://www.reddit.com/r/aigamedev/comments/1s9wjmb/my_claude_code_workflow_as_a_solo_dev_with_a/

Additionally, not only was the game built partially by Claude Code, but it is built specifically for users to play with their AI agents! Interested in how that worked? Please ask!

submitted by /u/Dr-whorepheus
Your AI coding agent doesn't know your business rules. How are you dealing with that?
YC's Spring 2026 RFS just named "Cursor for Product Managers" as an official startup category. Andrew Miklas put it bluntly: "Cursor solved code implementation. Nobody has solved product discovery." But there's a harder problem hiding underneath that nobody's really talking about.

The code your agent writes looks perfect. It compiles. Tests pass. Then it hits production and violates a business rule nobody told it about. The data is getting ugly:

- AI-generated code produces 1.7x more issues than human code (CodeRabbit, 470 PRs)
- Production incidents per PR are up 23.5% at high AI-adoption teams (Faros AI)
- Amazon's AI coding tool caused a 6-hour outage — 6.3M lost orders — in March 2026
- 48% of AI-generated code has security vulnerabilities (NYU/Contrast Security)

The root cause isn't model quality. It's missing context. Business rules scattered across Confluence, COBOL comments, Slack threads, and a PM's head. The agent never sees any of it.

How are teams solving this today? From what I'm seeing:

- CLAUDE.md files with manual rules (breaks on anything non-trivial)
- Massive system prompts that bloat context and get compacted away
- PMs writing rule docs that go stale the day after they're written

Curious: If you're shipping AI-generated code in production, what's your worst "the agent didn't know about X" story? How do you feed business context to your coding agents today? Static files? RAG? Something custom? I do hear about knowledge graphs, MCPs, and CI gates, but are these comprehensively achieved today? Would you trust a system that auto-enforces business rules on AI code, or does that feel like it'd create more false positives than it catches?

Building in this space. Want to make sure the problem is as real as the data suggests before going deep.

submitted by /u/rahulmahibananto
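The "CI gate" option mentioned in the post has a simple minimal shape: a rule store checked against the added lines of every diff before merge. A sketch under heavy assumptions (the rule names, patterns, and diff format here are invented for illustration; real business rules usually need more than regexes):

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: str   # regex whose presence in added code indicates a violation
    message: str


# Hypothetical rules; in practice these would come from a maintained store,
# not be hard-coded next to the checker.
RULES = [
    Rule("no-raw-refunds", r"refund\(.*skip_approval\s*=\s*True",
         "Refunds must go through the approval workflow."),
    Rule("no-prod-endpoint", r"https://api\.internal\.prod",
         "Generated code must not call production endpoints directly."),
]


def check_diff(diff_text: str) -> list:
    """Return human-readable violations found in the added lines of a unified diff."""
    added = [line[1:] for line in diff_text.splitlines()
             if line.startswith("+") and not line.startswith("+++")]
    violations = []
    for rule in RULES:
        rx = re.compile(rule.pattern)
        if any(rx.search(line) for line in added):
            violations.append(f"{rule.name}: {rule.message}")
    return violations
```

The false-positive concern raised in the post shows up immediately with this approach: pattern rules fire on comments and test fixtures too, which is why the richer options (knowledge graphs, semantic checks) keep coming up.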
Thinking is being replaced by AI
AI is eating up the space humans use for remembering answers. We're outsourcing our need to find answers to things we already know.

Convergent vs divergent thinking is essentially the difference between coming up with answers vs ideas. You might use convergent thinking to answer an algebra problem, while divergent thinking was required to come up with the concept of algebra in the first place. We require both for innovation, as we use existing solutions to solve subproblems while coming up with new ideas to solve others.

Large language models architecturally suck at divergent thinking. LLMs fundamentally generate average answers based on their training data; AI research shows that even completely different models create uniform answers to the same types of questions. For better or for worse, when there is an easy working solution, people will choose it.

Currently we're in the midst of finding a solution to convergent thinking. People no longer need to remember basic facts or know how to write proper grammar to get the work they need to get done, done. Those who can learn the concepts and understand the high-level systems will be able to use AI to fill in the gaps.

This all begs the question: how do you ensure people spend the time understanding concepts even if everyone starts using AI to find their answers? No one knows. Even I notice I have to go out of my way to understand concepts because it's too easy to use AI as a crutch. My uncomfortable theory is that we'll see a majority of people deskilling and using AI as a crutch without learning how to divergently think, and there will be a small minority who learn concepts and leverage AI to answer questions they don't need to remember answers to.

submitted by /u/wq73
Roadmap to Solid Foundation in Tech and AI after skimming Undergrad
I have a degree in information technology, but I didn't focus enough during my undergrad to really grasp technology as a whole. Now I work in project management in the software space, but I don't have a solid understanding of programming or the languages, since I haven't coded in a few years.

I'm deeply curious about AI and tech's future, purely for the sake of knowledge (not for a new job). I'm looking for a step-by-step roadmap, plus resources, to build a strong foundation in tech and AI fundamentals. I just want to understand how it all works, and I also want to know how to keep up with AI research and trends. Any advice on a roadmap or resources would be really appreciated!

submitted by /u/Forward-Twist-5248
Multiple Agents Communicating With Each Other
I created this app using Claude Code, to help me use Claude Code. I wanted all my Claude prompts to be able to collaborate through a single discussion - like a real team using Teams - so they can work together on tasks without needing me to keep updating them. This tool lets me add multiple named agents, working in separate spaces, and get them to talk to each other by name.

The key benefit for me is that once I have told agents with different roles what to work on, they just talk to each other as necessary. An API agent will tell the client what endpoint to use and what the model looks like. A mobile app agent will ask the API agent for an endpoint which accepts certain parameters and receives certain values back. I can have a tester agent writing tests based on the discussion, and a designer advising on style guidelines to the agent writing the UX. But unlike with other multi-agent options, I can see exactly what they are saying, and intervene. Plus I can interact directly with each agent prompt, add new agents, exclude agents that don't need to be in the conversation, download the conversation in CSV format for adding to dev ops tickets, etc. For me, this is how I want to work with AI.

Agents are pre-initialized to know they are working inside the app and to use the chat. The relevant Claude files are minimal and don't conflict with your existing Claude files if you don't want them to.

Attached video to try and show them talking to each other. I'm not a video editor, so forgive the poor edit of a demo session, but hopefully it shows the idea without being too long. They ask each other questions, offer information, update each other, agree approaches with each other, and generally just act like you would expect.

I built the app with one agent originally, and it's now the only way I use Claude daily. I'm adding integration with Azure DevOps at the moment, so I can pull tickets straight into the conversation and update from the discussion directly.
I also have some other ideas for how to make it even more streamlined. Happy to take feature requests if anyone has suggestions. Maybe someone already did this, but I couldn't find a tool like it, so I am sharing with anyone who might find it useful.

The app is written in Electron and runs as a local install. Code and release are here:

https://github.com/widdev/claudeteam
https://github.com/widdev/claudeteam/releases/tag/v1.0.23

submitted by /u/HungryHorace83
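The core mechanic described in the post above - named agents addressing each other through one shared discussion, with the human able to read everything - can be sketched as a small message bus. This is an illustrative model of the idea, not the app's Electron implementation; all names are mine:

```python
from collections import deque


class TeamChat:
    """A toy shared-discussion bus: agents post messages addressed to a
    named agent (or broadcast to everyone), each agent reads its own
    inbox, and the full transcript stays visible to the human."""

    def __init__(self):
        self.log = []       # full human-readable transcript: (sender, to, text)
        self.inboxes = {}

    def register(self, name):
        self.inboxes[name] = deque()

    def post(self, sender, text, to=None):
        self.log.append((sender, to, text))
        # Directed message goes to one agent; broadcast goes to everyone else.
        targets = [to] if to else [n for n in self.inboxes if n != sender]
        for name in targets:
            self.inboxes[name].append((sender, text))

    def read(self, name):
        inbox = self.inboxes[name]
        messages = list(inbox)
        inbox.clear()
        return messages
```

Keeping `log` separate from the per-agent inboxes is what makes the "I can see exactly what they are saying, and intervene" property cheap: the transcript is append-only and never consumed, while inboxes drain as each agent catches up.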
The real problem with multi-agent systems isn't the models, it's the handoffs
I've been building in the agentic space for a while, and the same failure mode keeps showing up regardless of which framework people use. When something goes wrong in a multi-agent pipeline, nobody knows where it broke. The LLM completed successfully from the framework's perspective. No exception was thrown. But the output was wrong, the next agent consumed it anyway, and by the time a human noticed, the error had propagated three steps downstream.

The root cause is that most frameworks treat agent communication like a conversation. One agent finishes, dumps its output into context, and the next agent picks it up. There's no contract. No definition of what "done" actually means. No gate between steps that asks whether the output meets the acceptance criteria before allowing the next agent to proceed.

This is what I've started calling vibe-based engineering. The system works great in demos because demos don't encounter unexpected model behavior. Production does.

The pattern that actually fixes this is treating agent handoffs like typed work orders rather than conversations. The receiving agent shouldn't be able to start until the packet is valid. The output shouldn't be able to advance until it passes a quality check. Failure should be traceable to the exact packet, the exact step, and the exact reason. If you're building anything beyond a single-agent wrapper, this distinction starts to matter a lot.

Curious whether others have hit this wall and how you're handling it. I've been working through this problem directly and happy to get into the weeds on what's worked and what hasn't.

AHP protocol | Orca engine

submitted by /u/junkyard22
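The "typed work order" pattern the post argues for can be made concrete in a few lines. A minimal sketch, assuming a dict payload and declared required keys (the field names and validation rules are illustrative, not the AHP protocol itself):

```python
from dataclasses import dataclass


@dataclass
class WorkOrder:
    """A typed handoff packet passed between agents. The receiving agent
    refuses to start unless the packet validates."""
    task_id: str
    producer: str
    payload: dict
    required_keys: tuple = ()

    def validate(self) -> list:
        errors = []
        if not self.task_id:
            errors.append("missing task_id")
        for key in self.required_keys:
            if key not in self.payload:
                errors.append(f"payload missing '{key}'")
        return errors


def handoff(order: WorkOrder, next_agent):
    """Gate between pipeline steps: run the next agent only on a valid
    packet, and name the exact packet and reason when rejecting."""
    errors = order.validate()
    if errors:
        raise ValueError(f"packet {order.task_id or '?'} rejected: {errors}")
    return next_agent(order.payload)
```

The point of raising at the gate rather than letting the next agent consume a bad payload is exactly the traceability property described above: the failure is pinned to one packet and one reason instead of surfacing three steps downstream.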
Managed Agents launched today. I built a Slack relay, tested it end-to-end. Here's what I found.
Managed Agents dropped a few hours ago. I had been reading the docs ahead of time, so I built a full Slack relay right away: a Socket Mode listener, session-per-channel management, SSE streaming, and cost tracking via span events. I tested multi-turn conversations, tool usage, and session persistence, and wanted to share what I found.

The prompt caching is genuinely impressive. My second session cost $0.006 because the system prompt and tool definitions were served from cache automatically. The API design is clean, the SDKs work, and for simple task execution it's solid infrastructure.

The thing that surprised me most is that the containers have no inbound connectivity. There's no public URL. The agent can reach out (web search, fetch, bash), but nothing can reach in. It can't serve a web page, can't receive a webhook, can't host a dashboard, can't expose an API. It's essentially Claude Code running in Anthropic's cloud: same tools, same agent loop, just in a managed container instead of your terminal. The agent is something you invoke, not something that runs.

Cold start is about 130 seconds per new session, so for anything interactive you need to keep sessions alive. Memory is in "research preview" (not shipped yet), so each new session starts fresh. Scheduling doesn't exist; the agent only responds when you message it. The agent definition is static, so it doesn't learn from corrections or adapt over time.

If you used Cowork, you know agents benefit from having their own interface. Managed Agents solves the compute problem by moving to the cloud, but there's no UI layer at all. And unlike memory and multi-agent (both in research preview), inbound connectivity isn't on the roadmap.

I should be transparent about my perspective. I maintain two open-source projects in this space: Phantom (ghostwright/phantom), an always-on agent with persistent memory and self-evolution, and Specter (ghostwright/specter), which deploys the VMs it runs on.
That's a different philosophy from Managed Agents, so I came into this with opinions, but I was genuinely curious how they'd compare. For batch tasks and one-shot code generation, the infrastructure advantages are real. For anything where the agent needs to be a persistent presence (serving dashboards, learning over time, waking up on a schedule), the architecture doesn't support it.

Curious what others are seeing. Has anyone deployed it for a real use case yet? How are you handling the lack of persistent memory? Is anyone running always-on agents on their own infrastructure? submitted by /u/Beneficial_Elk_9867
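A sketch of the session-per-channel pattern used to amortize that ~130-second cold start. The `create_session` factory here is a hypothetical stand-in for whatever session-creation call your SDK exposes; only the caching structure around it is the point:

```python
import time

def create_session(channel_id):
    # Hypothetical factory: in practice this is the SDK call that pays
    # the ~130 s cold start once per new session.
    return {"channel": channel_id, "created": time.monotonic()}

class SessionManager:
    """One live session per Slack channel, reused until it exceeds a TTL."""
    def __init__(self, factory=create_session, ttl_seconds=3600):
        self.factory = factory
        self.ttl = ttl_seconds
        self.sessions = {}

    def get(self, channel_id):
        entry = self.sessions.get(channel_id)
        if entry is None or time.monotonic() - entry["created"] > self.ttl:
            # Cold start happens here, at most once per channel per TTL window.
            entry = self.factory(channel_id)
            self.sessions[channel_id] = entry
        return entry

mgr = SessionManager()
first = mgr.get("C123")
second = mgr.get("C123")  # reused: no second cold start
assert first is second
```

Every message in a channel routes through `get`, so only the first message (or the first after the TTL expires) eats the cold start; everything else rides the warm session and, per the post, the prompt cache.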
Anthropic’s Mythos Puts OpenAI In A Bind
Here’s why Anthropic’s decision not to release Claude Mythos is a stroke of genius. The model is so big that it is 5-10x more expensive than Anthropic’s current most powerful publicly available model. As you may know, Claude Code is already expensive to run when using Opus 4.6; running Mythos in your coding harness, continuously, could easily burn thousands of dollars, if not tens of thousands, per day.

By not releasing it, Anthropic avoids the negative sentiment such a major price hike would generate, while making the model available to a limited group of strategic partners still lets it tout the model's capabilities, which is a huge win. On top of that, if OpenAI were to release a model as capable as Mythos, that would look horrible and deeply irresponsible, which puts OpenAI in a bind. Meanwhile, Anthropic gets to own the buzz, score sympathy points for being a responsible, safety-focused actor in the space, and work hard in the background on making Mythos cheaper to run.

All of this is compatible, by the way, with the very real possibility that Anthropic genuinely believes the model is too dangerous to release to the public due to its cyber-attack capabilities. Sometimes the stars just line up perfectly. submitted by /u/jurgo123
Is the Mirage Effect a bug, or is it Geometric Reconstruction in action? A framework for why VLMs perform better "hallucinating" than guessing, and what that may tell us about what's really inside these models
Last week, a team from Stanford and UCSF (Asadi, O'Sullivan, Fei-Fei Li, Euan Ashley et al.) dropped two companion papers. The first, MARCUS, is an agentic multimodal system for cardiac diagnosis: ECG, echocardiogram, and cardiac MRI, interpreted together by domain-specific expert models coordinated by an orchestrator. It outperforms GPT-5 and Gemini 2.5 Pro by 34-45 percentage points on cardiac imaging tasks. Pretty impressive!

But the second paper is more intriguing. MIRAGE: The Illusion of Visual Understanding reports what happened when a student forgot to uncomment the line of code that gave their model access to the images. The model answered anyway, confidently, and with detailed clinical reasoning traces. And it scored well. That accident naturally led to an investigation, and what they found challenges some embedded assumptions about how these models work. Three findings in particular:

1. Models describe images they were never shown. When given questions about cardiac images without any actual image input, frontier VLMs generated detailed descriptions, including specific pathological findings, as if the images were right in front of them. The authors call this "mirage reasoning."

2. Models score surprisingly well on visual benchmarks without seeing anything. Across medical and general benchmarks, mirage-mode performance was way above chance. In the most extreme case, a text-only model trained on question-answer pairs alone, never having seen a single chest X-ray, topped the leaderboard on a standard chest X-ray benchmark, outperforming all the actual vision models.

3. Even more intriguing: telling the model it can't see makes it perform worse. The same model, with the same absent image, performs measurably better in mirage mode (where it believes it has visual input) than in guessing mode (where it's explicitly told the image is missing and asked to guess).
The authors note this engages "a different epistemological framework," but that doesn't really explain the mechanism. The Mirage authors frame these findings primarily as a vulnerability: a safety concern for medical AI deployment, an indictment of benchmarking practices. They're right about that. But I think they've also uncovered evidence of something more interesting, and here I'll try to articulate what.

The mirage effect is geometric reconstruction

Here's the claim: what the Mirage paper has captured isn't a failure mode. It's what happens when a model's internal knowledge structure becomes geometrically rich enough to reconstruct answers from partial input.

Let's ponder what the model is doing in mirage mode. It receives a question: "What rhythm is observed on this ECG?" with answer options including atrial fibrillation, sinus rhythm, and junctional rhythm. No image is provided, but the model doesn't know that. So it does what it always does: it navigates its internal landscape of learned associations. "ECG" activates connections to cardiac electrophysiology. The specific clinical framing of the question activates particular diagnostic pathways. The answer options constrain the space. And the model reconstructs what the image most likely contains by traversing its internal geometry (landscape) of medical knowledge. It's not guessing; it's not random. It's reconstructing: building a coherent internal representation from partial input and then reasoning from that representation as if it were real.

Now consider the mode shift. Why does the same model perform better in mirage mode than in guessing mode? Under the "stochastic parrot" view of language models, this shouldn't, couldn't happen. Both modes have the same absent image and the same question. The only difference is that the model believes it has visual input. But under a geometric-reconstruction view, the difference becomes obvious. In mirage mode, the model commits to full reconstruction. It activates deep pathways through its internal connectivity, propagating activation across multiple steps, building a rich internal representation. It goes deep. In guessing mode, it does the opposite: it stays shallow, using only surface-level statistical associations. Same knowledge structure, but radically different depth of traversal. The mode shift could be evidence that these models have real internal geometric structure, and that the depth at which you engage the structure matters.

When more information makes things worse

The second puzzle the Mirage findings pose is even more interesting: why does external signal sometimes degrade performance? In the MARCUS paper, the authors show that frontier models achieve 22-58% accuracy on cardiac imaging tasks with the images, while MARCUS achieves 67-91%. But the mirage-mode scores for frontier models were often not dramatically lower than their with-image scores. The images weren't helping as much as they should. And in the chest X-ray case, the text-only model outperformed everything: the images were net negative. After months
AGENTS.md is the most important file in your Codex repo and nobody's testing theirs — I built a blind evaluation pipeline to fix that
I built this with Claude Code over a few months: the optimization pipeline, evaluation harness, and website. Posting here because AGENTS.md is one of the skill formats it optimizes, and Codex users are the ones most likely to care about measurable agent performance.

Free to try: the optimized brainstorming skill is a direct download at presientlabs .com/free (no account, no credit card). It comes packaged for Claude, Codex, Cursor, Windsurf, ChatGPT, and Gemini, along with the original so you can A/B it yourself.

---

The AGENTS.md problem

Codex runs on AGENTS.md. That file shapes every decision the agent makes: what to prioritize, how to structure code, when to ask vs. decide, what patterns to follow. Most people write it once from a template or a blog post and never validate it. You have no way to know if your AGENTS.md is actually improving agent output or subtly degrading it.

The same applies across the ecosystem:
- CLAUDE.md for Claude Code
- .cursorrules for Cursor
- .windsurfrules for Windsurf
- Custom Instructions for ChatGPT
- GEMINI.md for Gemini

These are all skills: persistent instruction layers. And none of them have a test suite.

---

What I built

A pipeline that treats skills like code: measure, optimize, validate.
- Multiple independent AI judges evaluate output from competing skill versions blind, with no knowledge of which is original vs. optimized
- Every artifact is stamped with SHA-256 checksums, forming a tamper-evident verification chain
- Full judge outputs are published for audit

The output is a provable claim: "Version B beats Version A by X percentage points under blind conditions, verified by independent judges."
---

Results

I ran the brainstorming skill from the Superpowers plugin through the pipeline:
- 80% → 96% blind pass rate
- 10/10 win rate across independent judges
- 70% smaller file size (direct token savings on every agent invocation)

I also ran a writing-plans skill that collapsed to 46% after optimization: the optimizer gamed internal metrics without improving real quality. I published that failure as a case study. 5 out of 6 skills validated; 1 didn't.

If you're running Codex on anything non-trivial, your AGENTS.md is either helping or hurting. This pipeline tells you which, with numbers, not feelings.

---

Refund guarantee

If the optimized skill doesn't beat the original under blind evaluation, full refund. Compute cost is on me.

---

Eval data on GitHub: willynikes2/skill-evals. Free skill at presientlabs .com/free (direct download, no signup; the space in the URL is intentional, to keep automod from eating it). submitted by /u/willynikes
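A minimal sketch of the two mechanics the pipeline describes, blind A/B label assignment and SHA-256 stamping, in Python. The function names are illustrative, not taken from the actual pipeline:

```python
import hashlib
import random

def stamp(artifact: str) -> str:
    """Checksum an artifact so later tampering is detectable."""
    return hashlib.sha256(artifact.encode("utf-8")).hexdigest()

def blind_pair(original: str, optimized: str, rng=random):
    """Randomly assign A/B labels so judges can't tell which version is which."""
    labels = ["A", "B"]
    rng.shuffle(labels)
    assignment = {labels[0]: original, labels[1]: optimized}
    # Record which label maps to which version, sealed by its own checksum.
    key = f"{labels[0]}=original,{labels[1]}=optimized"
    return assignment, key, stamp(key)

assignment, key, seal = blind_pair("skill v1 ...", "skill v2 ...")
# A judge sees only assignment["A"] and assignment["B"]; after judging,
# publishing the key and seal lets anyone verify the unblinding wasn't altered.
assert stamp(key) == seal
```

Stamping the unblinding key before judging is what makes the claim tamper-evident: you can't quietly swap which version was "B" after seeing the verdicts without the published checksum failing to match.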
SpaceKnow uses a tiered pricing model. Visit their website for current pricing details.
Key features include: Multi-Source Data Integration, Proprietary AI Algorithms, Automated Monitoring & Early Warning System, Defense & Intelligence, and Construction Monitoring. Recent coverage and announcements: How Sanctioned Russian Vessels Move in Plain Sight; SpaceKnow Guardian for Construction Monitoring; SpaceKnow and IMALBES announce a partnership aimed at forest monitoring.
Based on user reviews and social mentions, the most common pain points are token usage and cost tracking.
Based on 81 social mentions analyzed, sentiment is 0% positive, 100% neutral, and 0% negative.