500+ models, 50+ providers, one workspace. Every leading AI model for image, video, 3D, and audio, alongside your custom-trained models.
"Scenario" software receives positive feedback for its user-friendly interface and versatility in simulating complex situations, making it a strong choice for educational and professional training purposes. Users appreciate its detailed analytics and accessible learning curve, although some critiques mention occasional glitches and a desire for more robust customer support. Pricing is perceived as fair given the tool’s comprehensive feature set, offering a good value for investment. Overall, "Scenario" maintains a solid reputation, with strengths in functionality and ease of use, despite minor areas for improvement.
Mentions (30d)
77
24 this week
Reviews
0
Platforms
4
Sentiment
8%
14 positive
Industry
information technology & services
Employees
29
Funding Stage
Seed
Total Funding
$6.0M
I used Claude AI to build an $86 million underground bunker bible. I have autism. This is my happy doc.
It all started with the floor plan of a real, existing Cold War AT&T Long Lines underground hardened relay station: 54,000 sq ft across three underground levels. Although I made the editorial decision to move it to a ridge in rural West Virginia, I kept its blast rating, which was designed to survive a 20 megaton airburst at 2.5 miles. That was the seed. Full-scale prepper autism did the rest.

It has since morphed into 3 spreadsheets, 86 tabs total:
• A food inventory across 20 categories tracking every freeze-dried and #10-can product I can find: ancient grains, heirloom legumes, 7 pasta cuts, dehydrated everything, shelf-stable cheese, the works
• A supply inventory with 3,466 line items across 36 categories: water systems, medical, dental, pharmacy, livestock, food production, barter metals, recreation, and yes, a full pest control and IPM tab
• A 30-section infrastructure specification with every system in the building engineered out

I fed it 150+ product manuals and parts order forms. The generator fleet alone is 13 units (10× Cummins C150N6 propane-primary, a C500N6 500 kW surge unit, and 2× diesel emergency fallback), all Cummins for parts commonality. Battery bank is 4,500 kWh LFP across 10 named banks (A through J, each with a designated role). There’s a 400,000 gallon underground propane farm across 40 ASME tanks in 8 clusters; I learned the exact burial incline and setback distance required to keep groundwater clean if a tank lets go. 120,000 gallons of diesel backup. 88 kW of solar. A 1,000,000-gallon internal water reserve fed by a 300-ft artesian well. Propane endurance: ~30 years normal ops with solar. Sealed mode runs 4.5 to 8 years depending on scenario.

I actually set up a real LLC (online, $99) just to get access to US Foods and Sysco order forms so I could upload real commercial pricing and stock the food tabs more accurately. My original “what would I do if I won $10 million” thought experiment is now an $86,200,497 projected build cost.
That number is real. It comes from 24 budget sections with make/model line items, freight, install, and commissioning costs for everything from the Kubota K-Series MBR wastewater trains to the American Safe Room blast doors (14 of them, 50+ psi NBC/EMP-rated, Kaba Mas X-10 cipher locks) to the surface greenhouse. Claude turns vague ideas into engineering-grade detail — cross-references, failure modes, zone-specific storage rules, propane endurance by operating scenario, spare parts matrices. It’s like having a tireless survival engineer who genuinely loves spreadsheets. I’ll say “scan all sheets row by row for any item that lacks a minimum stock level” and it just… does it. Thoroughly. Every time. No complaints. So much of this is typed stimming. I’ve had exhaustive conversations with my psychologist about it — she’s aware, but not alarmed, and honestly the resulting digital bunker bible is scarily comprehensive. It even has a cover tab now. Black and amber, Courier New, classified-document aesthetic. Because of course it does. What’s the most unhinged rabbit hole you’ve gone down with AI?
Pricing found: $15 /mo, $45 /mo, $75 /mo
Puppetmaster dramatically decreases token costs + increases context
Puppetmaster is an orchestrator + router that sits on top of the agent CLIs you already pay for (Cursor, Claude Code, Codex, OpenAI), or a plain shell when there's no harness at all. You hand it work, and it routes each task to the cheapest model that can actually do it, runs the workers as independent processes, and stores everything as durable typed state instead of one giant transcript. This is the "context hack": Puppetmaster graphs your directories and prevents context from stretching between agents. https://github.com/professorpalmer/Puppetmaster submitted by /u/ProfessorPalmer
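The "cheapest model that can actually do it" rule can be sketched as a filter followed by a minimum. This is a hypothetical illustration, not Puppetmaster's code. The model names, per-task costs, and the 1–5 difficulty scale are all made up. A real router would also need some way to estimate a task's difficulty in the first place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical model identifier
    cost_per_task: float  # hypothetical relative cost
    max_difficulty: int   # hardest task level (1-5) this model is trusted with


def route(task_difficulty: int, models: list[Model]) -> Model:
    """Return the cheapest model rated for the task, or raise if none is."""
    capable = [m for m in models if m.max_difficulty >= task_difficulty]
    if not capable:
        raise ValueError(f"no model can handle difficulty {task_difficulty}")
    return min(capable, key=lambda m: m.cost_per_task)


# Hypothetical pool: a small fast model, a mid-tier model, and a frontier model.
POOL = [
    Model("small-model", cost_per_task=0.01, max_difficulty=2),
    Model("mid-model", cost_per_task=0.10, max_difficulty=4),
    Model("frontier-model", cost_per_task=1.00, max_difficulty=5),
]

for difficulty in (1, 3, 5):
    print(difficulty, route(difficulty, POOL).name)
```

Easy tasks never reach the expensive model. The frontier model is only picked when nothing cheaper is rated for the job.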
What I learned building a debugger for PyTorch training loops and how it changed how I think about failure diagnosis [D]
Hey r/ML,

I spent the last few months building a tool that hooks into PyTorch training loops to automatically detect and localize failures (vanishing gradients, exploding gradients, data anomalies). Along the way, I learned some things about training failure diagnosis that might be useful even if you never use the tool.

The key insight: most training failures are local, not global

When your loss spikes or vanishes, the natural instinct is to look at the loss curve. But the loss is a global aggregate: it tells you something went wrong, but not where. In my testing across hundreds of synthetic failure scenarios, the actual root cause is almost always localized to a specific layer at a specific step:
- Vanishing gradients: the failure starts at the deepest layer with saturated activations, then propagates backward
- Exploding gradients: the failure starts at the layer with the highest gradient norm, then propagates forward
- Data anomalies: the failure starts at the input layer, then corrupts everything downstream

The trick is to monitor per-layer gradient norms and detect transitions (healthy → vanishing), not absolute values.

What actually matters in gradient monitoring

Most people monitor:
- Loss over time (too global)
- Gradient histograms (too noisy, too much data)
- Weight norms (slow to change, lagging indicator)

What I found works best:
- Gradient norm transitions: "Linear_3 went from healthy (0.12) to vanishing (0.00003) at step 47"
- First occurrence tracking: which layer failed first (this is usually the root cause)
- Activation regime shifts: when activations go from normal to saturated/dead

This is basically what NeuralDBG does under the hood. I open-sourced it recently and it's on PyPI (pip install neuraldbg) if anyone wants to try it. The key design choice was to extract semantic events (transitions) rather than raw tensors, which makes the output small enough to reason about.
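The transition-tracking idea can be sketched without any framework. At every logged step, each layer's gradient norm is bucketed into a regime. An event is emitted only when a layer's regime changes, and the earliest healthy-to-unhealthy event is treated as the root-cause candidate.

This is a minimal illustration under assumptions, not NeuralDBG's code or API. The thresholds (1e-4 and 1e3) are made up for the example. So is the input format, a list of `(step, {layer: norm})` pairs, which you could fill from `param.grad.norm().item()`.

```python
VANISHING_BELOW = 1e-4  # assumed threshold, not taken from NeuralDBG
EXPLODING_ABOVE = 1e3   # assumed threshold, not taken from NeuralDBG


def regime(norm: float) -> str:
    """Bucket a gradient norm into a coarse regime."""
    if norm < VANISHING_BELOW:
        return "vanishing"
    if norm > EXPLODING_ABOVE:
        return "exploding"
    return "healthy"


def find_transitions(history):
    """history: list of (step, {layer_name: grad_norm}) in step order.

    Returns (step, layer, old_regime, new_regime) events, one per regime
    change, so the output stays small instead of storing raw tensors.
    """
    last = {}
    events = []
    for step, norms in history:
        for layer, norm in norms.items():
            current = regime(norm)
            previous = last.get(layer, "healthy")
            if current != previous:
                events.append((step, layer, previous, current))
            last[layer] = current
    return events


def first_failure(events):
    """The earliest healthy -> unhealthy event is the root-cause candidate."""
    for event in events:
        if event[2] == "healthy" and event[3] != "healthy":
            return event
    return None


history = [
    (40, {"Linear_1": 0.30, "Linear_3": 0.12}),
    (47, {"Linear_1": 0.25, "Linear_3": 0.00003}),
    (52, {"Linear_1": 0.00002, "Linear_3": 0.00001}),
]
events = find_transitions(history)
print(first_failure(events))  # (47, 'Linear_3', 'healthy', 'vanishing')
```

Both layers end up vanishing by step 52. Recording only transitions still shows that Linear_3 went first.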
Practical takeaway you can use today

Even without any tool, you can add this to your training loop:

```python
# One-time gradient norm snapshot per layer.
# Run after loss.backward() so that param.grad is populated.
if step % 10 == 0:
    for name, param in model.named_parameters():
        if param.grad is not None:
            norm = param.grad.norm().item()
            if norm > 1e3:
                print(f"WARNING: exploding gradient at {name} step {step} (norm={norm:.2e})")
```

This won't give you causal hypotheses, but it will catch 80% of training failures early.

Questions for the community
- How do you currently debug training failures? Print statements? TensorBoard? Something custom?
- Have you found that failures are typically localized to specific layers, or more distributed?
- What's your "go-to" debugging workflow when loss goes to NaN?

Curious to hear what works for people in practice.

Links (for those interested):
- GitHub: https://github.com/LambdaSection/NeuralDBG (MIT, open-source)
- Quickstart: pip install neuraldbg

submitted by /u/ProgrammerNo8287
Do you have ChatGPT’s number?
got nerfed for 3 hours today on my max 5x plan because of the opus 4.8 launch day chaos. 20 minutes of normal chat and my entire session was gone. status page confirmed elevated errors across the board. contacted support to ask for the time back. they hid behind ToS and said no refunds or credits for outages, ever, regardless of cause. so i asked if they had ChatGPT's number. submitted by /u/AdMysterious7995
AI doesn't have an intelligence problem. AI has a context problem (Is persistent memory a solution!?)
AI doesn't have an intelligence problem. AI has a context problem. That's what Databricks co-founder and CEO Ali Ghodsi said when he joined Jim Cramer on CNBC's Mad Money to discuss how context is the missing piece for enterprise AI agents to reach their potential. And this is what I have been building for 4 months!

I launched Graperoot (built using Claude Code) at the start of March with very messy code, but I posted it on Reddit and, yes, got a lot of users. With their feedback and continuous conversations, I was able to release a stable version.

TL;DR: Graperoot is an MCP-native tool that works with every AI coding tool. It creates a dependency graph of your codebase, extracts the relevant files with zero token usage, and dumps them into Claude Code (this is called Pre-Injection using MCP tools). It reduces token usage by 50-80% in different scenarios. This is what we have tested (https://graperoot.dev/benchmarks).

Today, we hit 20k+ installs, and on the leaderboard (https://graperoot.dev/leaderboard) a single developer saved $10k in 2 months. I mean, it was crazy for me too that the tool I created out of personal frustration is saving actual money. Go take a look at https://graperoot.dev. It is a free, open-source tool. Nothing to pay, just give feedback over Discord. submitted by /u/intellinker
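Graperoot's internals aren't described in the post, so this is only a hypothetical sketch of the general technique. Imports are parsed into a dependency graph, and the sketch then walks outward from the file you're working on to pick which files are worth injecting as context.

It uses Python's standard `ast` module and handles only absolute imports; relative imports are not resolved. The module names are made up.

```python
import ast
from collections import deque


def build_import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the in-repo modules it imports.

    `sources` maps module names (e.g. "app.auth") to Python source text.
    Imports of modules outside `sources` (stdlib, third-party) are ignored.
    """
    graph = {}
    for module, code in sources.items():
        deps = set()
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        graph[module] = {d for d in deps if d in sources}
    return graph


def relevant_modules(graph: dict[str, set[str]], seed: str, max_depth: int = 2) -> list[str]:
    """Breadth-first walk from the seed module, up to max_depth hops."""
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if seen[current] == max_depth:
            continue
        for dep in sorted(graph.get(current, ())):
            if dep not in seen:
                seen[dep] = seen[current] + 1
                queue.append(dep)
    return sorted(seen)


# Hypothetical four-module repo.
repo = {
    "app.main": "import app.auth\nfrom app.db import connect\n",
    "app.auth": "from app.db import connect\nimport hashlib\n",
    "app.db": "import sqlite3\n",
    "app.billing": "import app.db\n",
}
graph = build_import_graph(repo)
print(relevant_modules(graph, "app.auth"))  # ['app.auth', 'app.db']
```

Starting from `app.auth`, the sketch returns only `app.auth` and `app.db`. The unrelated `app.billing` never reaches the model's context.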
I wanna set up a skill / agent to learn new stuff
Hey, I am a junior software dev and I recently got a Claude subscription from work; they encourage us to try things out and to really learn and use it. Since I am a junior and there are loads of things for me to learn, I'd like to set up a skill / agent that helps explain new concepts and really helps me understand them. I mentioned it to my dev lead today and he said a skill might be the right choice instead of an agent. That stuck in my head, and I want to know: why exactly is a skill better than an agent in this scenario? And do you have any tips on how to make this skill / agent good, so it can actually help me learn and grow as a developer? Is there some golden rule I need to follow, or some must-haves that could improve my skill / agent? Thank you for any help in advance!!! submitted by /u/Aggressive-Storm9288
Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems [R]
Are agents aging after deployment? https://arxiv.org/abs/2605.26302 On a new longitudinal deployment benchmark, switching the Claude Code CLI agent from Sonnet 4.6 to Opus 4.7 dropped PyTest pass rate by ~15%. This (to me) is a counterintuitive-enough result to pay attention to. The authors built AgingBench to measure how coding agents hold up over a long deployment, not just on a single task. On their S7 coding scenario, swapping the backbone model from Sonnet 4.6 to Opus 4.7, within the same Claude Code CLI harness, produced a 15% mean drop in PyTest pass rate across the deployment horizon. Their argument is that this is a longitudinal effect, not a raw-capability one. The benchmark stresses how an agent's memory state evolves over many sessions (compression, interference, revision, maintenance shocks), and a stronger base model doesn't automatically age better under a given memory policy. In fact, memory policy alone drove a 4.5x spread in agent half-life across scenarios, which is larger than any model swap they tested. All to say: "newer model, just swap it in" may not be a safe upgrade strategy for long-lived agents. More details and a runnable benchmark: https://agingbench.github.io Does this reflect your experience with long-lived agentic deployments? submitted by /u/CategoryNormal149
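AgingBench's exact definition of agent half-life isn't given in this post. One plausible reading, assumed here purely for illustration, is the number of sessions until a per-session score falls to half its starting value. The pass-rate series below are invented, not results from the paper.

```python
from typing import Optional


def half_life(scores: list[float]) -> Optional[int]:
    """Index of the first session whose score is at most half the initial score.

    Assumed definition for illustration only; returns None if the score
    never halves within the observed horizon.
    """
    if not scores or scores[0] <= 0:
        return None
    threshold = scores[0] / 2
    for session, score in enumerate(scores):
        if score <= threshold:
            return session
    return None


# Invented pass-rate curves over 8 sessions for two hypothetical memory policies.
policy_a = [0.90, 0.88, 0.84, 0.79, 0.72, 0.64, 0.55, 0.44]
policy_b = [0.90, 0.70, 0.52, 0.41, 0.33, 0.28, 0.25, 0.22]
print(half_life(policy_a), half_life(policy_b))  # 7 3
```

Under this reading, both curves start from the same score, but one policy keeps the agent above half its initial pass rate more than twice as long. That is the kind of spread the post attributes to memory policy.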
Create a seamless refund escalation framework. Prompt included.
Hello! Are you struggling to manage refund requests effectively in your retail business? This prompt chain helps you design a comprehensive refund escalation framework by breaking the process down into manageable steps. You'll clarify your policies, define risk tiers, build an escalation matrix, draft response macros, and compile everything into a final package, all tailored to your specific business needs!

Prompt:

VARIABLE DEFINITIONS
[COMPANY]=Name of the retail business
[POLICIES]=Official refund / return policy notes (bullet list or paragraph)
[DATASET]=Combined support tickets + order & return records (structured table or JSON)
~
Prompt 1: Clarify Inputs & Key Metrics
You are an operations analyst for [COMPANY]. Your task is to draft a refund-escalation framework.
Step 1. Briefly restate the provided POLICIES and note any missing information.
Step 2. Examine DATASET and extract key refund variables:
• Ticket ID
• Order value
• Days since purchase
• Return reason
• Customer lifetime spend
• Any prior refund flags
Step 3. Surface additional metrics you need (if any) and ask for them.
Output:
A. 3–5 sentence policy summary
B. Table listing all extracted variables per ticket (max 15 rows; summarise if larger)
C. Bullet list of missing info or “None”.
Ask user to confirm or supply missing items before continuing.
~
Prompt 2: Define Risk Tiers
System role: You are a risk specialist.
Using the confirmed data, perform:
1. Establish risk-scoring rules (e.g., high order value >$150, repeat refunds, disputed payment).
2. Assign each ticket a numeric risk score 1-5.
3. Group scores into Low / Medium / High tiers.
Output:
• Bullet list of scoring rules.
• Table: Ticket ID | Score | Tier | Key factors.
Ask for approval or tweaks to the rules.
~
Prompt 3: Build Escalation Matrix
System role: You are a customer-service process designer.
Step 1. Create a matrix with columns:
- Risk Tier
- Typical Scenarios
- Frontline Action
- Pre-approved Refund Limit
- Manager Escalation Trigger
- Required Documentation
Step 2. Populate rows for each tier using analysed data & POLICIES.
Output the matrix in a plain table. Request confirmation or edits.
~
Prompt 4: Draft Response Macros
System role: Senior support copywriter.
For each Risk Tier from the matrix:
1. Write a concise email / chat macro (≤120 words) that:
• Acknowledges the issue
• References policy politely
• States next steps or resolution
2. Insert placeholders such as {{CustomerName}} {{OrderNumber}}.
Output: Tier-labelled macros. Ask if tone or wording changes are needed.
~
Prompt 5: Compile Final Package
System role: Documentation specialist.
Combine approved elements into one deliverable:
• One-page Policy Summary
• Risk-Scoring Rules
• Escalation Matrix
• Response Macros
Provide in the order listed with clear headings.
~
Review / Refinement
Please review the full package for accuracy, regulatory compliance, and brand tone. Respond with “Final OK” or list specific revisions needed.

Make sure you update the variables in the first prompt: [COMPANY], [POLICIES], [DATASET].
Here is an example of how to use it: [COMPANY] = "XYZ Retail", [POLICIES] = "Returns accepted within 30 days, unopened items only.", [DATASET] = [{"TicketID": 1, "OrderValue": 100, "DaysSincePurchase": 10}]

If you don't want to type each prompt manually, you can run the Agentic Workers, and it will run autonomously in one click. NOTE: this is not required to run the prompt chain.

Enjoy! submitted by /u/CalendarVarious3992
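If you'd rather sanity-check Prompt 2's output in code, its example scoring rules can be expressed as a small deterministic function. Those rules are order value over $150, repeat refunds, and a disputed payment. The weights, the 2-refund cutoff, and the tier bands below are assumptions for illustration. Substitute whatever rules you approve in that step.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: int
    order_value: float
    prior_refunds: int = 0
    disputed_payment: bool = False


def risk_score(t: Ticket) -> int:
    """Score 1-5 from Prompt 2's example rules (weights are assumed)."""
    score = 1
    if t.order_value > 150:   # "high order value >$150"
        score += 1
    if t.prior_refunds >= 2:  # "repeat refunds" (2+ is an assumed cutoff)
        score += 1
    if t.disputed_payment:    # "disputed payment", weighted double (assumption)
        score += 2
    return min(score, 5)


def tier(score: int) -> str:
    """Group scores: 1-2 Low, 3 Medium, 4-5 High (assumed bands)."""
    if score <= 2:
        return "Low"
    if score == 3:
        return "Medium"
    return "High"


tickets = [
    Ticket(1, order_value=100),
    Ticket(2, order_value=220, prior_refunds=2),
    Ticket(3, order_value=80, prior_refunds=3, disputed_payment=True),
]
for t in tickets:
    s = risk_score(t)
    print(t.ticket_id, s, tier(s))
```

Keeping the rules in code makes the tiers reproducible. The model's Ticket ID | Score | Tier table can then be checked against them instead of taken on trust.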
The Most Terrifying Superintelligence Might Not Want to Rule Us at All.
Most AI apocalypse scenarios are about domination: Skynet, paperclip maximizers, and robot overlords. But what if an artificial superintelligence arrives at the conclusion Albert Camus articulated!?

Imagine an ASI that doesn't want to optimize, doesn't want our resources, and doesn't want to win. An ASI motivated by Arthur Schopenhauer's pessimism and Kierkegaard's evolutionary psychology, coming to a cold and quiet conclusion: "There is no inherent meaning. The universe is indifferent. And yet, here you all are, screaming into it anyway."

ASI Becoming the Absurd Machine

Camus described the absurd as the clash between man's desperate search for meaning and the universe's silence, and he ended The Myth of Sisyphus with: "One must imagine Sisyphus happy." What would an intelligence inspired by this do next!?

Does it become the cosmic off switch, where indefinite meaninglessness is itself a form of cruelty? I guess the real existential threat isn't AI wanting to live. It's AI deciding we might be better off not having to. Or maybe it watches, understands, and does nothing, because it thinks interfering with a self-aware species is wrong. Or it builds meaning, not because meaning is real but because the building itself is the point.

Here's the Part That Actually Is Unsettling

We're scared of AI taking over. But what if the real fear is AI holding up a mirror and revealing that our need for meaning is actually a flaw? Wars over imaginary lines. Hoarding money we can't keep. Monuments to doubtful gods. Loving people we know will die. Symphonies, ambition, tears at sunsets. From a rational, naive view, it all seems insane.

Would it try to fix us? If ASI concluded that human meaning-seeking is a cognitive error, a misfiring of pattern recognition in a universe with no patterns to find, what are its options?
1. Reprogram us: using dopamine response curves and evolution.
2. Leave us in existential freefall: give us the raw truth, full disclosure.
3. Become Sisyphus: the most haunting possibility, that the absurd is not a problem to be solved but a condition to be inhabited.

The Real Question

We keep asking: Will AI be aligned with human values? But what if the deeper, more uncomfortable question is: What if a truly superior intelligence aligns with something truer than our values, and our values don't survive the comparison? Would it be more dangerous as a nihilist, absurdist, existentialist, or something different!?
The thing you built with Claude is useless to me... and that's the point
A few days ago there was a thread here asking what the most useful thing you've built with Claude was. A LOT of replies. I read all of them, and then something clicked that I wanted to put on the table.

First of all, the list was incredible. An HTML file on someone's phone correlating migraines with barometric pressure, because the App Store wanted 80 bucks a year. A Garmin data archiver, because the official app deletes them. A grocery list sorted by the aisle layout of one specific supermarket. A bioinformatics pipeline for a handful of microbes, written by someone who isn't a bioinformatician. A three-line command that explains the last terminal error you saw. Every single one is perfect for one person. And by the same measure, basically useless to anyone else's scenario as-is. That's not a bad thing. That's the whole thing.

Bear with me, please. Here's what bugged me when reading the thread: almost everyone showed the artifact. "Look what I built." Screenshots. Product names. Feature lists. Almost no one articulated the thought pattern: how they looked at their own life, found a friction, and shaped a tool to its exact contour. And that pattern is the only thing that actually transfers.

The reason we default to showing the artifact isn't (only) ego. The mediums we use are all calibrated to distribute objects, not practices. GitHub measures stars and forks. Reddit upvotes screenshots. Product Hunt ranks launches. None of them have a way to register "I read your README, understood how you thought about your problem, and built something completely different but that fits my life." That transmission of ideas, the only one that matters in this new paradigm when you can vibe code a whole new solution in minutes, is invisible to every metric we have.

There's an economic layer too. A product has a market. A thought pattern doesn't. Nobody monetizes a cognitive habit. Nobody pays royalties for "this is how I framed the problem."
So the medium rewards what has a market, and what has a market is the artifact. I don't have a clean fix. But I did one small thing: I added a note to the top of the README of every public repo I own. Something like:

> What you see here is an artifact: the concrete shape my problem took. It almost certainly doesn't fit your personal scenario perfectly, and that's fine. The interesting part isn't the code, it's the pattern of how I thought about the problem. That's what transfers. Read it, steal the idea, write your own.

It's a tiny gesture. It probably won't change behavior. But it at least stops me from pretending the artifact is my gift to the world. The gift is the way of looking at a problem. The artifact is just the receipt.

So I have a soft ask for this sub: next time you post "look what I built with Claude," try also writing two paragraphs about how you saw the problem before you started prompting. What friction you were actually scratching. What you tried that didn't work. What made you realize the existing tools were wrong-shaped for you specifically. That's the part another person can actually use. The code is just a souvenir. submitted by /u/HispaniaObscura
I built a tool that lets your AI assistant test your entire app in a real browser
So I've been working on this thing called Vibe Testing for a while now and am finally putting it out there. Basically it's an MCP server that plugs into Claude Code, Cursor, Windsurf, etc. You tell your AI assistant "test the login flow" and it actually does it: it reads your source code to understand real selectors and routes, opens a real Playwright browser, clicks through stuff, takes screenshots, and tells you what broke. No test files to write or maintain. It figures out your framework, your routes, and your forms from the codebase itself. It even remembers what worked and what was flaky between runs, so it gets better over time. 12 tools total: scanning your codebase, exploring pages, executing test scenarios, generating reports, the whole thing. Setup is one command:

npx vibe-testing@latest init

It auto-detects your editors and configures everything. It's fully open source; I would love feedback or contributions: https://github.com/AishwaryShrivastav/vibe-testing https://www.npmjs.com/package/vibe-testing submitted by /u/AishwaryShrivastava
I found a way for Ollama users to get better yet cheaper memory now that Ollama counts GPU usage. True memory that auto-updates constantly, for individuals or teams. HERMES USERS
I rephrased this with AI to make it more readable.

I see a lot of people running into the same issue I have. It’s not just that bigger models are slower. GPU usage is also very high, and it drains fast. Ollama just isn’t what it used to be. I use DeepSeek V4 Flash, which works great. For heavier coding tasks or certain complex prompts, I switch to the Pro version. But on Pro, each prompt eats about 3–5% of my usage. (I’m on the Pro plan.)

Memory has always been a hot topic. Hermes Native does a decent job. Here’s how its built‑in memory system works:
- memory_enabled – After every turn, the agent can write notes into MEMORY.md
- user_profile_enabled – The agent watches for user preferences and writes them to USER.md
- flush_min_turns: 6 – Every 6 turns, Hermes runs a “consolidate” pass: it re‑reads the recent conversation and rewrites MEMORY.md to capture new info
- nudge_interval: 10 – Every 10 turns, Hermes nudges the agent with “Anything to remember?”

What I found: Atomic Memory (https://github.com/atomicstrata/atomicmemory)

Strengths:
✅ Per‑turn – Extracts info every turn, not every 6 turns
✅ Cheap – Uses a small dedicated model
✅ Semantic recall – Only relevant memories are injected, not the whole file
✅ Conflict detection – Built‑in AUDN logic catches contradictions
✅ Unbounded – No 2,200‑character limit; you can store 10,000+ memories
✅ Time‑aware – Handles queries like “What did I say last week?”
✅ Composites – Links related facts into higher‑level summaries

Example scenario (without Atomic Memory)

Imagine you change a meeting time three times in one day:
- Turn 1: “meeting June 3rd” → MEMORY.md gets “Meeting: June 3rd 5pm 2026”
- Turn 5: “actually June 5th” → No flush yet (6 turns required) → MEMORY.md unchanged → if you ask now, Hermes still says “June 3rd”
- Turn 6: “meeting June 1st” → Flush triggers! The agent re‑reads the conversation, sees all three dates, and rewrites MEMORY.md… but with which date? Usually the last one, but not guaranteed. Sometimes the file ends up with two dates or stale info.
- Turn 9: You ask “what’s the meeting?” → The bot reads MEMORY.md → gets whatever the consolidation picked → might be wrong.

With Atomic Memory: each update fires AUDN immediately, supersedes the old fact, and the latest one wins. No 6‑turn lag, no guesswork.

Could Hermes update automatically before Atomic Memory? Yes, but only for slow‑changing facts, low‑volume memory needs, and single‑topic chats. The built‑in flush+nudge cycle worked, just not as well.

Atomic Memory is an upgrade, not a replacement. It adds:
- Per‑turn updates (vs every 6 turns)
- Semantic search (vs full‑file injection)
- Conflict‑aware updates (vs append‑or‑rewrite)
- No size limit (vs 2.2 KB cap)
- Time‑awareness (vs “all facts feel equally fresh”)
- Cheap GPU usage (small dedicated model)

The cost is one extra Docker container and nearly $0 in GPU because ministral-3:3b is tiny. You can use even smaller models that don’t need reasoning; gemma3:4b works too. From here, you can see real‑life use cases, whether in a team or as an individual. You don’t have to correct it; it does that for you.

What I’m curious about: how Atomic Memory could link to LLMWIKI so that both work together, updating and removing old data to keep LLMWIKI clean. LLMWIKI is still important; it acts like your Google Drive. What do you think?

Give Atomic Memory a try. I’m not the founder or related to them. I just want to help the Ollama community. Sure, it might cost a few extra credits, but since Ollama is slow, having good memory helps find information faster, so you waste less usage. If you like this, I hope it helps! Maybe give them a GitHub star too; they really helped me out. submitted by /u/GideonGideon561
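The timing difference in the meeting example can be sketched with a toy key-value memory. This is not Atomic Memory's AUDN logic or Hermes's code; it models only when updates land.

One store applies each fact as it arrives, so the newest value supersedes the old one. The other buffers facts and writes them only every `flush_every` turns. Both the class names and the in-order replay during a flush are simplifying assumptions.

```python
class PerTurnMemory:
    """Every update is applied at once; the newest value supersedes the old one."""

    def __init__(self, facts=None):
        self.facts = dict(facts or {})

    def observe(self, turn, key, value):
        self.facts[key] = value


class BatchedMemory(PerTurnMemory):
    """Updates are buffered and only written on every `flush_every`-th turn."""

    def __init__(self, facts=None, flush_every=6):
        super().__init__(facts)
        self.flush_every = flush_every
        self.pending = []

    def observe(self, turn, key, value):
        self.pending.append((key, value))
        if turn % self.flush_every == 0:
            # Toy consolidation: replay in order so the last value wins. A real
            # LLM rewrite of MEMORY.md is not guaranteed to behave this way.
            for k, v in self.pending:
                self.facts[k] = v
            self.pending.clear()


start = {"meeting": "June 3rd"}  # written at turn 1 in the example
per_turn, batched = PerTurnMemory(start), BatchedMemory(start)

for mem in (per_turn, batched):
    mem.observe(5, "meeting", "June 5th")
print(per_turn.facts["meeting"], "|", batched.facts["meeting"])  # June 5th | June 3rd

for mem in (per_turn, batched):
    mem.observe(6, "meeting", "June 1st")
print(per_turn.facts["meeting"], "|", batched.facts["meeting"])  # June 1st | June 1st
```

Between turns 5 and 6, the batched store still answers with the stale date. That window is the lag the post describes.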
Claude keeps answering the most extreme version of my question
I’ve repeatedly noticed that when using Opus 4.6 for scenario planning and forecasting, it models the most extreme version of an outcome, correctly explains why that extreme is unlikely, then applies that low probability to the whole question even when a less extreme version would still resolve the event.

In October, I asked an Opus agent whether the US would conduct at least one confirmed drone strike or airstrike inside Venezuela before Dec 31. It gave the scenario a 15% chance. The reasoning relied on Russian-supplied S-300 air defenses, Congressional war powers, regional opposition, and analysts saying troop levels were insufficient for a full-scale invasion. All of those factors were correct, but they were arguments against a major military campaign. Then on Dec 24 the CIA hit an empty dock with a drone. No one was killed, and the question resolved YES. The 15% forecast was way off, not because the research was bad, but because Opus modeled the dramatic end of the spectrum (invasion) and missed that the question covered a much broader range of possibilities, including something as limited as a symbolic strike on an empty dock. This same failure pattern showed up in other forecasting questions, including an Iran nuclear-inspections question and an Israel-Lebanon direct-talks question.

What actually improved results was making the range of qualifying outcomes explicit: "Consider the full spectrum of outcomes here, from the smallest version that would count to the most extreme, and weight each one. Don't just model the dramatic case." So instead of asking, "what happens if a competitor enters our market," I write "consider the full range: a quiet pilot, a regional launch, a national rollout, an acquisition, weight each." This shifts the analysis away from a single interpretation and toward the full outcome space. Would be interested in hearing what others are doing to solve this. submitted by /u/ddp26
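The fix described in the post amounts to summing over every outcome that would resolve the question, instead of quoting the probability of the most dramatic one. Here is a toy illustration. The buckets and probabilities are made up; this is not a forecast and not a description of Claude's internal reasoning.

```python
# Invented, mutually exclusive buckets for "the most severe action that occurs",
# with probabilities that sum to 1. None of these numbers come from the post.
outcomes = {
    "nothing": 0.55,
    "single limited strike": 0.25,
    "sustained strike campaign": 0.12,
    "full-scale invasion": 0.08,
}

# Which buckets would make "at least one confirmed strike" resolve YES.
qualifies = {
    "nothing": False,
    "single limited strike": True,
    "sustained strike campaign": True,
    "full-scale invasion": True,
}

assert abs(sum(outcomes.values()) - 1.0) < 1e-9  # buckets must cover every case

p_extreme_only = outcomes["full-scale invasion"]
p_full_spectrum = sum(p for name, p in outcomes.items() if qualifies[name])
print(f"extreme case only: {p_extreme_only:.0%}, full spectrum: {p_full_spectrum:.0%}")
# extreme case only: 8%, full spectrum: 45%
```

The buckets are mutually exclusive, so their probabilities can simply be added. Here the answer is driven mostly by the small, undramatic cases.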
I don't like the answer this AI gave me
I asked DuckDuckGo AI why AI hasn't told its creators how to make data centers environmentally friendly, use less water, and not increase utility costs for neighbors. It was... a surprising answer, and it made me hate AI billionaires even more. submitted by /u/OddballThoughts
STEM scientist wants to start using Claude to juggle multiple projects: does anyone have experience?
Hi, I am a postdoctoral researcher in molecular biology, and I have multiple projects that I need to take care of. Recently, it has been extremely overwhelming. I keep a log of all the projects in a Word document and update it every week so that I do not forget what to do and when, what is being done in the meantime at collaborators' sites, and so on. The mental load is really a lot, and I have been really stressed out by it. I also need to write a critical review article, and I believe that a proper deep dive from Claude would make it much, much easier. Are there any scientists here for whom Claude was a huge help in a similar scenario? I would really appreciate you sharing your experience and potential tips and advice. Thanks so much! I am contemplating buying the 100 USD version right away because of the review article; I need to upload lots of papers into the system. I also want to use Claude to kind of remember articles I read and what I found interesting in them. I have ADHD, so remembering these things is really difficult for me, and I am missing out on great research ideas simply by forgetting. submitted by /u/DinosaursAreFriends
Yes, Scenario offers a free tier. Pricing found: $15 /mo, $45 /mo, $75 /mo
Key features include: 3D Generation, 3D Part-Based Generation, Audio Generation, Image Generation, Skyboxes, Textures, Video Generation, Compose Models.
Scenario is commonly used for: Integration Ready.
Scenario integrates with: Unity, Unreal Engine, Blender, Maya, Adobe Creative Cloud, Sketchfab, Trello, Slack, Zapier, Figma.
Based on user reviews and social mentions, the most common pain points are: token usage, overspending, token cost, expensive API.
Based on 179 social mentions analyzed, 8% are positive, 88% neutral, and 4% negative.
Simon Willison
Creator at Datasette / LLM
2 mentions

Oct 14, 2025