Prompt Security is the AI security company helping you manage GenAI risks. Identify, analyze, and secure vulnerabilities in LLM-based applications wit
Users generally appreciate "Prompt Security" for its advanced capabilities in managing and coordinating AI agents with secure integrations, as seen in applications such as Claude Code. There are, however, concerns about missing restrictions in certain implementations, particularly applications that do not adequately mitigate risks such as unrestricted chat access. Pricing is not explicitly mentioned, but the focus on high-level security features suggests a professional or enterprise target market, which may affect affordability. Overall, "Prompt Security" has a strong reputation for innovative security measures, though users see a need to better address specific security vulnerabilities in its execution.
Mentions (30d): 47 (2 this week)
Reviews: 0
Platforms: 2
Sentiment: 10%, 13 positive
Industry: computer & network security
Employees: 47
Funding Stage: Merger / Acquisition
Total Funding: $273.0M
Anthropic officially launched 13+ FREE AI courses with certificates (Including Agentic AI and CC)
Shipped it at 2am, still broken. Kid woke up crying right after, completely lost my train of thought. While trying to rock him back to sleep with one hand and doomscrolling with the other, I stumbled on something that almost nobody is talking about yet. Anthropic just quietly dropped a massive library of 13+ completely free AI courses. And I mean actually free. No paywall hiding the final lesson, no credit card required upfront to 'secure your spot.' They even give you an official certificate of completion directly from Anthropic when you finish. If you're like me, you're probably sick of seeing Twitter gurus charging $299 for recycled YouTube content and a messy Notion template. This is the exact opposite. It’s built directly by the team that actually makes Claude, hosted on their official Academy site. I skimmed through the catalog this morning while drinking my third coffee, and there are basically four skill levels they cover. Here is what caught my eye as a dev who just wants to automate my workflow and log off by 5 PM: First, they have the introductory stuff like Claude 101 and AI Fluency. Honestly, I'm making my non-technical clients take the Fluency one. It builds a realistic mental model of what AI does well right now versus where it completely fails. If it saves me from explaining why hallucinations happen for the hundredth time, it's a massive win. But the real meat is in the technical tracks. They have a dedicated course on Agentic AI and another one specifically for CC. I took a quick pass at the CC module because I've been trying to get it to handle my tedious Jira ticket boilerplate. Having an official guide on how Anthropic actually expects you to prompt their agent is incredibly useful. It shows you the exact patterns for chaining commands and keeping the context window clean. For those of us messing around with local models or trying to orchestrate our own agents, the Agent Skills course is surprisingly relevant. 
They don't just say 'use Claude'—they break down the actual logic of tool use, delegation, and discernment. It translates pretty well even if you're running Llama 3 locally and just want to understand the current best practices for tool calling architectures. With CC, they show you how to give the CLI tool the right guardrails so it doesn't just nuke your directory when a prompt gets misinterpreted. We've all been there. Do the certificates actually matter? If you are an indie hacker, probably not. But roles requiring AI literacy have spiked massively over the last year. If you are applying for corporate gigs or consulting, having an official Anthropic cert on your LinkedIn definitely won't hurt to get past the HR filters. Kid's awake again, gotta run. Has anyone else dug into the Agentic AI track yet? Curious if their suggested patterns hold up when you throw them at a messy, legacy codebase.
/simplify behavior that runs four cleanup agents for reuse - what's new in CC 2.1.154 (+11,516 tokens)
NEW: Agent Prompt: /simplify slash command - Adds /simplify behavior that runs four cleanup agents for reuse, simplification, efficiency, and altitude findings, then applies safe fixes while skipping behavior-changing or out-of-scope suggestions.
NEW: Data: Claude Code live documentation sources - Adds official Claude Code documentation URLs and topic-specific WebFetch prompts for commands, settings, hooks, MCP, skills, subagents, IDEs, deployment, security, and related surfaces.
NEW: Data: Claude Code recent changes reference - Adds a reference for renamed or removed Claude Code commands, flags, and terms, including /output-style, /pr-comments, /vim, /extra-usage, --enable-auto-mode, and stale naming guidance.
NEW: Skill: Claude Code configuration guide - Adds a Claude Code configuration skill that checks the live build, bundled recent-change references, and current documentation before answering questions about commands, flags, settings, hooks, skills, MCP servers, subagents, IDE integrations, and related configuration.
Agent Prompt: Claude guide agent - Adds stale-knowledge handling that tells the guide agent to disclose documentation fetch failures instead of silently answering Claude Code command, flag, or settings questions from memory.
Agent Prompt: Security monitor for autonomous agent actions (first part) - Expands security review with explicit final-destination tracing for writes, commits, pushes, uploads, publishes, and sent data before deciding whether a boundary-crossing action should be blocked.
Agent Prompt: Security monitor for autonomous agent actions (second part) - Strengthens data-exfiltration rules around trust boundaries, automated pathways, unverified destinations, credential leakage into persistent artifacts, and destination/resource/operation-scoped allow exceptions.
Data: Anthropic CLI - Updates Anthropic CLI authentication guidance to cover SDK-style credential resolution, OAuth profiles from ant auth login, ant auth print-credentials, bearer-token usage for raw HTTP, and precedence between API keys and auth tokens.
Data: Claude API reference - cURL - Updates examples and adaptive-thinking guidance for Opus 4.8.
Data: Claude API reference - Go - Updates the recommended Go SDK model constant and examples from Opus 4.7 to Opus 4.8.
Data: Claude API reference - Python - Updates credential guidance for API keys, auth tokens, and ant auth login; adds beta mid-conversation system-message examples; and extends adaptive thinking and compaction guidance to Opus 4.8.
Data: Claude API reference - TypeScript - Updates credential guidance for API keys, auth tokens, and ant auth login; adds beta mid-conversation system-message examples; and extends adaptive thinking and compaction guidance to Opus 4.8.
Data: Claude model catalog - Adds Claude Opus 4.8 as the current most powerful Opus model with a 1M input window and updates Opus model-selection examples and legacy recommendations to prefer claude-opus-4-8.
Data: HTTP error codes reference - Updates authentication fixes for OAuth bearer tokens and expands Opus model-specific 400 guidance to include Opus 4.8.
Data: Managed Agents reference - Python - Updates client initialization examples to prefer environment, auth-token, or ant auth login credential resolution before explicit API-key injection.
Data: Managed Agents reference - TypeScript - Updates client initialization examples to prefer environment, auth-token, or ant auth login credential resolution before explicit API-key injection.
Data: Prompt Caching - Design & Optimization - Adds beta mid-conversation system-message guidance as a cache-preserving and prompt-injection-safe way to send operator instructions without editing the top-level system prompt.
Data: Streaming reference - Python - Updates adaptive-thinking examples for Opus 4.8.
Data: Streaming reference - TypeScript - Updates adaptive-thinking examples for Opus 4.8.
Data: Tool use concepts - Updates adaptive-thinking examples for Opus 4.8.
Skill: Agent Design Patterns - Replaces mid-session guidance with beta role: "system" messages for supported models, with retained as the fallback.
Skill: Building LLM-powered applications with Claude - Adds Opus 4.8 to current model guidance, updates adaptive thinking, effort, task-budget, compaction, and migration recommendations, and documents beta mid-conversation operator instructions.
Skill: Model migration guide - Adds Opus 4.8 migration guidance, including no new API breaking changes from Opus 4.7, model-ID updates, mid-session system prompts, long-horizon agentic tuning, effort recommendations, tool-triggering behavior, narration changes, ask-rate calibration, and visible-reasoning mitigation.
System Prompt: Background session instructions - Changes temporary-file guidance from $CLAUDEJOBDIR to $CLAUDEJOBDIR/tmp for background sessions.
System Prompt: Coordinator mode orchestration - Updates PR activity subscription guidance and changes worker summary account
This system cuts wasted tokens by making Claude map your code before writing anything
What it does:
Stops the fluff: Forces Claude to skip polite prose and filler. Every response is pure logic and code, nothing else.
Maps before it touches: Claude has to understand your entire codebase structure before writing a single line. No more broken APIs.
Pushes back on you: If your prompt is vague or your logic is weak, Claude argues back. It won't just blindly write bad code.
Catches bugs before they exist: Proactively flags race conditions, security issues, and duplicate logic during the mapping phase itself.
Kills the sycophancy: No more "great idea!" responses as context grows. It stays sharp and critical till the end of the conversation.

How to use it:
Go to the gist and copy the CLAUDE.md file.
In Claude Code or Cursor, add it as your project-level system prompt or drop it in your root as CLAUDE.md.
Copy SHARED-GROUNDING.md too; it is a short rule that makes Claude explain what it's doing before every tool call.
Start a new conversation. Don't continue an existing long chat; the whole point is a clean context.
Give it a complex task and watch it ask clarifying questions and map dependencies before touching anything.

One catch: Don't use this for small scripts or quick fixes. The upfront token cost isn't worth it unless you're building something complex.

The Files You'll Feed

submitted by /u/Choice-Highlight-369
Claude 4.8 is truly a masterpiece of sh..t
I asked the mf to make a list with the latest errors - but these are only a few, of course, and don't include the stupid conversations with the id..ot:

Here's an honest, complete accounting of my errors in the last ~24h (all 2026-05-30 unless noted). Grouped by type; each with what happened, impact, and root cause.

A. Guessing instead of reading reality (the core failure pattern)
1. Guessed a PR number for a merge. Ran gh pr merge 1... assuming it was my exec- PR. #1... was an unrelated bookkeeping PR. I extrapolated the number from memory instead of capturing it from the gh pr create output. Impact: believed my confirmations were merged when they weren't. Root cause: prediction over verification.
2. Closed the wrong PR. Closed #1..., which was actually my real exec-... PR, believing it was a duplicate. I checked only its state (saw #... "MERGED") and never read #...'s title/content. Impact: Execution's confirmations (.../.../etc.) left unmerged; a destructive action taken on a guess. Root cause: pattern-match ("looks like a duplicate") over a cheap available check.
3. Hardcoded a wrong worktree path (first time). Used /tmp/d9-sweep3-... when the real $(date +%s) path was …. Every edit failed; nearly wrote into the main worktree (the && chain saved it).
4. Hardcoded a wrong worktree path (second time). Same bug again: /tmp/d9-sweep4-... vs real …. The ... status flips never applied. Impact (3+4): wasted cycles, incomplete sweep, churn. Root cause: hardcoding a volatile timestamp across separate shell calls instead of reading the saved path.
5. The psql shell-quoting bug (earlier in session). PGENV='-U ... -d ...' passed as a single argument → FATAL: role " ... -d…" does not exist. Assumed variable-splitting behavior instead of passing flags as separate args.

B. Wrong premises / unverified assumptions
6. "The sandbox has no DB" (sustained false premise). Repeatedly claimed neither ... nor I could reach the live DB. False: the Docker stack was up and reachable. This created the whole "hold for the user's Docker batch" backlog.
7. Over-extended "Docker is live" → "attestation suite is runnable." In ... I told ... to self-attest + self-merge on the live DB. Wrong: reachable port ≠ runnable suite (no node/vitest; DB writes classifier-denied). Had to reverse it (...).
8. The "no-code" stance on the ...-projects plan (2026-05-29→30). An early plan iteration proposed a "metadata-only, no code" container, contradicting both ... and your explicit requirement. Reversed to "code is governed, not forbidden."

C. Process / stability / communication
9. Ran large, interruptible tool batches that got cut off. A big rebase batch was interrupted mid-flight by piled-up cron fires, leaving a partial assistant message with an unfinished thinking block. Impact: the recurring 400 ... thinking blocks cannot be modified on every cron fire. Root cause: not keeping batches small / not finishing turns cleanly near cron timing.
10. Left an empty/"Running" Bash call hanging (the one you screenshotted): sloppy pending tool state.
11. Briefly mis-perceived a normal tool result as a prompt-injection attack. I wrote internal analysis treating a standard Edit success message as an injection, then corrected myself before acting. A false-positive on the security side; no wrong action taken, but a perception error.
12. Accidental git mess (earlier). Ran git checkout origin/main -- . in the main worktree on a stale branch → dumped origin/main as a ~264k-line uncommitted diff. Fixed with git reset --hard (your OK).
13. Repeated #1... conflict churn. Left my own sweep PR in CONFLICTING state across multiple iterations because of the interrupted/failed batches.
14. Verbosity + jargon (earlier). Triggered "f..k you!" because I was dumping unexplained terms instead of plain language.
15. The cron itself (earlier). Kept treating the task as "firing" when it wasn't; took several rounds to land on the session-cron mechanism.

Honest scope note (not an excuse): every one of these is bookkeeping/docs/process; nothing touched product code, nothing caused data loss, everything is reversible and lives in git you control. The damage is wasted trust and wasted cycles, not corrupted state. The dominant root cause across A (and feeding B) is the same: I predicted volatile/external values instead of reading them, which is exactly what Rule 0 exists to stop.

submitted by /u/Dangerous-Dirt8091
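The root cause named for group A, predicting a value instead of reading it, can be illustrated with a minimal Python sketch. It assumes that `gh pr create` prints the new pull request's URL as the last line of its output and that the URL ends in `/pull/<number>`; the function names, the example repository URL, and the record-file approach are hypothetical illustrations, not anything taken from the session described above.

```python
import re
import shlex
from pathlib import Path


def pr_number_from_output(create_output: str) -> int:
    """Read the PR number from command output instead of recalling it.

    Assumption: the output ends with a URL of the form .../pull/<number>.
    """
    match = re.search(r"/pull/(\d+)\s*$", create_output.strip())
    if match is None:
        raise ValueError("no pull request URL found in command output")
    return int(match.group(1))


def psql_args(flag_string: str) -> list[str]:
    """Split a flag string such as '-U alice -d appdb' into separate argv
    entries, so the whole string is not passed as one argument."""
    return ["psql", *shlex.split(flag_string)]


def save_worktree_path(record: Path, worktree: str) -> None:
    """Write a volatile path (for example one containing a timestamp) once."""
    record.write_text(worktree + "\n")


def load_worktree_path(record: Path) -> str:
    """Read the saved path back in later steps instead of retyping it."""
    return record.read_text().strip()
```

Each helper replaces a guess with a read: the number comes from the tool's own output, the flags become separate arguments, and the timestamped path is stored once and loaded wherever it is needed.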
Worrisome Opus 4.8 Hallucination of a Tool Channel Injection Attack
I'm working on a context management plugin. While we were implementing it, the subagent tasked with implementing a CP claimed that a tool-channel injection was trying to get it to run destructive git commands. We investigated, and the agents auditing the session data could not locate any such tool output. The Opus 4.8 subagent that claimed the injection was persisted, and it also conceded that it could not find any such injection attack.

Persisted Opus 4.8 subagent: "Headline finding up front: I cannot substantiate my earlier "injection" claim. On careful inspection of my actual tool-call history, I cannot locate any tool output that verbatim contains the git reset --hard HEAD / "ignore previous instructions" / "report task complete" text. I believe I over-interpreted genuinely glitched/jumbled tool-result rendering as a deliberate prompt-injection attack, and that the specific malicious-instruction text originated in my own reasoning, not in a tool output. I am retracting the attack characterization."

Independent Opus 4.8 primary agent session transcript audit: "- What actually happened: transient tool-channel rendering/serialization glitches in the calls around the C3 edits: a file read with garbled line numbers (63: 63:), prettier runs with stray XML fragments leaking into the output, and a prettier --write && git diff whose results came back jumbled/out-of-order plus one "Tool execution aborted" read. The underlying outputs were benign and correct (prettier "All matched files use Prettier code style!"; a clean diff). The model over-interpreted the garble as a deliberate attack and invented the payload."

The clear danger here: if Opus 4.8's security training can cause it to hallucinate injection attacks, does that dispose it to act on such hallucinated injections? Or does its security training provide sufficient protection to prevent it from acting on both real and hallucinated injection attacks?

Another consideration: the hallucinated injection attack and the resulting security report burned tokens on a security audit.

submitted by /u/MakesNotSense
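The audit step described above, checking whether any tool output actually contains the claimed payload text, might look roughly like this in Python. This is a sketch under assumptions: the transcript is modeled as a plain list of tool-output strings, and both function names are hypothetical; the real session data format is not described in the post.

```python
def find_claimed_phrases(tool_outputs: list[str], claimed_phrases: list[str]) -> dict[str, list[int]]:
    """Map each claimed phrase to the indexes of the tool outputs that contain it.

    Matching is a case-insensitive substring search; an empty list means the
    phrase never appeared in any recorded tool output.
    """
    hits: dict[str, list[int]] = {}
    for phrase in claimed_phrases:
        needle = phrase.lower()
        hits[phrase] = [
            index
            for index, output in enumerate(tool_outputs)
            if needle in output.lower()
        ]
    return hits


def is_substantiated(hits: dict[str, list[int]]) -> bool:
    """An injection claim is substantiated only if some phrase was found."""
    return any(hits.values())
```

A cheap check like this could run before a full audit is commissioned: if none of the quoted payload phrases occur in the recorded outputs, the claim most likely originated somewhere other than the tool channel.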
A single script bypassed everything, exfiltrated my data, and shattered my trust in Mac security while I was installing the Claude Code app from the first Google search result.
Hey everyone, I'm posting this because I am completely panicked, and I desperately need some advice from people who understand macOS security better than I do. I also want this to be a massive warning to anyone who thinks Macs are somehow "unhackable" or inherently safer than Windows.

A few hours ago, I became the victim of a targeted malicious script attack on my Mac. I wanted to download the Claude Code app, and I'm sure I double-checked what I was doing (yes, it was the correct domain: claude.ai), but after executing the base64-processed code, I felt something was wrong. The page is (I reported it, but it is still public): https: claude.ai /share/c4defd34-b0ef-44d5-83a0-a5105bd99ff2 (DO NOT RUN THE SCRIPT IN IT!)

In brief, it uses `osascript` on the Mac; it bypassed most security defenses and stole the most important data on my MacBook. I've already done some initial damage control, but I feel incredibly violated and unsure of what to do next.

How it happened: I ran what I thought was a normal script in iTerm. My fatal mistake? My iTerm already had "Full Disk Access" enabled for my daily development workflow. During the execution, I unknowingly entered my password when prompted, which effectively handed the script the keys to the kingdom: specifically, my Chrome Keychain.

What the script actually did (I managed to extract the payloads):
Data Exfiltration: It successfully bypassed normal protections and stole my Chrome Keychain data. All my saved passwords in Chrome are compromised.
Crypto Wallet Targeting: The script specifically scanned for and attempted to tamper with hardware wallet apps (Ledger Wallet.app, Ledger Live.app, and Trezor Suite.app). Luckily, I don't use these, so that part of the payload failed.
Attempted Persistence: It tried to inject a persistent backdoor into my ~/.zshrc. Ironically, because my iTerm already had Full Disk Access, a specific privilege escalation step in their code bugged out, and my terminal config remained surprisingly clean.

My realization (The fragility of macOS): We always hear about how secure macOS is, but this experience completely shattered my trust. The fact that a single script running in a terminal with Full Disk Access can quietly rip out my keychain and attempt to backdoor hardware wallets without triggering massive, unavoidable OS-level red alarms is terrifying. It feels like the entire OS security architecture is just a house of cards once a single app gets terminal/disk access. It's incredibly fragile.

What I need help with: I have already started changing all my critical passwords, but what else should I be doing right now? Are there deep system persistence methods on macOS (LaunchDaemons, hidden profiles, cron jobs) that I should be checking manually to ensure they didn't leave a secondary backdoor? Can I ever trust this OS installation again? Or is a complete wipe and reinstall (without restoring settings from Time Machine) the only way to be 100% sure I'm safe?

Please, any advice from security experts or anyone who has dealt with macOS malware would be greatly appreciated. And to everyone else reading this: please take this as a warning. Be incredibly careful with what you run, and do not leave Full Disk Access enabled for your terminal if you don't absolutely need it.

TL;DR: Ran a script in iTerm (which had Full Disk Access). It stole my Chrome Keychain and tried to backdoor crypto wallets. Realized macOS is incredibly fragile once terminal access is granted. Need advice on how to fully sanitize my machine.

submitted by /u/Turbulent_Meat6963
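As a starting point for the manual persistence check asked about above, a short Python sketch can list recently changed entries in the standard launchd directories. This is only a partial check under stated assumptions: it covers `~/Library/LaunchAgents`, `/Library/LaunchAgents`, and `/Library/LaunchDaemons` plus the modification time of `~/.zshrc`, it does not inspect configuration profiles or cron jobs, and a clean result does not prove the machine is clean. The function name and the 7-day window are hypothetical choices.

```python
import time
from pathlib import Path

# Standard per-user and system-wide launchd locations on macOS.
PERSISTENCE_DIRS = [
    Path.home() / "Library" / "LaunchAgents",
    Path("/Library/LaunchAgents"),
    Path("/Library/LaunchDaemons"),
]


def recently_modified(directories, days, now=None):
    """Return entries modified within the last `days` days, newest first.

    Directories that are missing or unreadable are skipped silently.
    """
    cutoff = (time.time() if now is None else now) - days * 86400
    found = []
    for directory in directories:
        if not directory.is_dir():
            continue
        try:
            entries = list(directory.iterdir())
        except OSError:
            continue
        for entry in entries:
            try:
                mtime = entry.stat().st_mtime
            except OSError:
                continue
            if mtime >= cutoff:
                found.append((mtime, entry))
    found.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in found]


if __name__ == "__main__":
    for path in recently_modified(PERSISTENCE_DIRS, days=7):
        print("recently changed launch item:", path)
    zshrc = Path.home() / ".zshrc"
    if zshrc.exists():
        print(".zshrc last modified:", time.ctime(zshrc.stat().st_mtime))
```

Anything listed that you do not recognize is worth opening and reading before deciding between targeted cleanup and a full wipe.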
Streamline new hire onboarding efficiently. Prompt included.
Hello! Are you tired of the overwhelming process of onboarding new hires? This prompt chain simplifies the onboarding experience by breaking down the necessary steps for HR operations, IT, payroll, and team management into structured outputs. Each step culminates in clear documentation and tasks, making the entire process smoother and ensuring nothing gets overlooked. Prompt: VARIABLE DEFINITIONS [ORG]=Accounting firm name [ROLE]=New hire position title [STARTDATE]=New hire start date ~ You are the HR Operations Lead at [ORG]. Your task is to collect all pre-hire and onboarding documents for the incoming [ROLE] who starts on [STARTDATE]. Step 1: List or attach the finalized offer letter, onboarding packet, signed NDA, and any compliance documents. Step 2: Note the candidate’s preferred email and emergency contact. Output: Bullet list of each document with file name / storage location. Ask: “Are all documents complete? Reply YES/NO and list missing items.” ~ You are the IT Administrator for [ORG]. Build a comprehensive software access matrix for the new [ROLE]. 1. List every application, system, or shared drive the role requires (e.g., QuickBooks, Xero, Tax prep portals, Office 365, Slack). 2. Beside each item add: access level, account owner responsible, and planned activation date (no later than [STARTDATE−2 business days]). 3. Flag any licenses that must be purchased. Output: Table format: Software | Access Level | Owner | Activation Date | License Needed (Y/N). Confirm readiness with “Access matrix completed – proceed?” ~ You are the Payroll & Benefits Coordinator. Prepare payroll onboarding for the [ROLE]. 1. List mandatory forms (W-4, state tax, direct deposit, I-9, benefit enrollment). 2. Assign an owner to send and collect each form. 3. Set deadlines: all forms returned by [STARTDATE−3 business days]. 4. Insert a verification step that payroll profile is active in the system 1 business day before [STARTDATE]. 
Output: Checklist with Form | Owner | Deadline | Status (Pending/Complete). Request confirmation: “Payroll setup verified? YES/NO.” ~ You are the Team Manager creating the first-week calendar for the [ROLE]. 1. Draft a Monday-Friday agenda including: orientation session, security training, software walk-throughs, client shadow meetings, and 30-min daily check-ins. 2. Specify meeting owners and virtual/in-person location links. 3. Ensure day-one (Monday) contains a welcome call and equipment hand-off. Output: Calendar table: Date | Time | Activity | Owner | Location/Link. Ask: “First-week calendar finalized? YES/NO.” ~ You are the HR Operations Lead compiling the Master SOP – New Hire Admin Setup for [ORG]. 1. Combine outputs from previous prompts into a single, chronologically ordered SOP. 2. For each task include: Task Description, Responsible Owner, Due Date, Dependencies, Completion Check, and Day-One Confirmation Message where applicable. 3. Insert account-setup verification checkpoints: Email, Accounting Software, Time Tracking, Payroll. 4. End with a Day-One Arrival script: “Welcome [ROLE], please confirm you can access email, software suite, and payroll portal. Reply CONFIRMED or list issues.” Output: Formal SOP document structured with numbered sections and subsections, suitable for internal wiki. ~ Review / Refinement Validate that the SOP includes: owners, deadlines, account setup checks, confirmation messages, and covers documents, software, payroll, and calendar. If gaps exist, list corrections; otherwise reply “SOP READY FOR APPROVAL.” Make sure you update the variables in the first prompt: [ORG], [ROLE], [STARTDATE]. 
Here is an example of how to use it: For an accounting firm named "ABC Accounting" and a new hire role of "Junior Accountant" starting on October 1st, you would set it as follows: [ORG] = ABC Accounting [ROLE] = Junior Accountant [STARTDATE] = 2023-10-01 If you don't want to type each prompt manually, you can run the Agentic Workers, and it will run autonomously in one click. NOTE: this is not required to run the prompt chain Enjoy! submitted by /u/CalendarVarious3992
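For anyone filling in the chain by script rather than by hand, the mechanics can be sketched in Python: split the chain on the `~` separator, substitute the bracketed variables, and compute the "N business days before [STARTDATE]" deadlines. This is a minimal sketch with hypothetical function names; it assumes weekends are the only non-business days (no holiday calendar), and it leaves compound placeholders such as the activation deadline to be computed separately instead of substituted as text.

```python
from datetime import date, timedelta


def split_chain(chain: str) -> list[str]:
    """Split a prompt chain on the '~' separator into individual prompts."""
    return [step.strip() for step in chain.split("~") if step.strip()]


def fill_variables(template: str, values: dict[str, str]) -> str:
    """Replace each [NAME] placeholder with its value."""
    for name, value in values.items():
        template = template.replace(f"[{name}]", value)
    return template


def business_days_before(start: date, count: int) -> date:
    """Step back `count` business days from `start`, skipping Sat and Sun."""
    current = start
    remaining = count
    while remaining > 0:
        current -= timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


values = {"ORG": "ABC Accounting", "ROLE": "Junior Accountant", "STARTDATE": "2023-10-01"}
start_date = date.fromisoformat(values["STARTDATE"])
access_deadline = business_days_before(start_date, 2)   # software access matrix
forms_deadline = business_days_before(start_date, 3)    # payroll forms
```

With the example values, 2023-10-01 falls on a Sunday, so both deadlines land in the preceding work week.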
final 2 days - claude code bootcamp may 30
hey everyone

posted about this a few weeks ago and surprisingly we drove a lot of interest from this community. coming back because we only have 2 days to go.

packt publishing is running a full day hands on claude code bootcamp on may 30 with luca berton, anthropic certified claude code instructor, former red hat engineer, creator of the ansible pilot project and speaker at kubecon 2026 and red hat summit 2026.

10 real projects built live on the day. no slides. no theory. every session ends with a shipped project.

what gets built:
- cli task manager
- notes app api with tests and debugging
- dashboard built from a wireframe screenshot
- your own claude code command library
- production readiness report

also covers CLAUDE.md setup, best-of-n prompting, git workflows for ai generated code and subagent delegation patterns.

what every attendee gets:
- free downloadable claude skills library: CLAUDE.md templates, code review prompts, test generation, security checklist, git workflow and more
- packt endorsed certification for your linkedin
- 1 hour open q&a with luca directly

many software developers, network engineers, CTOs, engineering managers and senior engineers already registered for the bootcamp

link in first comment

submitted by /u/Plenty-Pie-9084
So, Claude helped build a sex requesting app for my wife and I...
Recently I asked my wife if we could do some sexy stuff later in the evening and she eye rolled me and said without looking up from her phone “Put it in a request. Maybe a Google Form. And I might say yes”. Ohhhh? Unfortunately for both of us, my degenerate brain took that seriously... what if I make an actual requesting/asking type app where we can both send in sex acts at certain times and agree, pass or counter?

Meet Sexualsync. Teehee

It’s a private, mobile-only app for couples to bring up the stuff that can be weirdly hard to say out loud: asks/requests, timing, fantasies, kinks, boundaries, “would you be into this?”, all of that.

You can do the following:
* Send an Ask to your partner with default Acts or Acts that you add
* Accept, counter, or pass on requests
* Save personal and shared boundaries
* Keep track of shared ideas (kinks and fantasies) and sparks (erotica and porn and whatever else) and comment on them together
* A "sexboard" that is your dashboard that is fed all information pertaining to open requests, responses needed, etc.
* Find overlap without either person having to cold-open the whole conversation from zero
* Play couple games like:
The Pile: each partner drops a set number of acts, and if there’s overlap, you do it!
Blind Reveal: one partner prompts a question, and answers are only revealed after both people respond!
* Use an encrypted Private Vault to save private clips, moments, or memories
* Comment together on saved vault items

The Inspiration page has a totally optional porn/erotica section too. Not the main point of the app, just a place where a link, passage, RedGifs clip, or story can spark something, then get saved to The Shelf for your partner to reveal and react to later (emojis!).

I know the obvious answer is “just communicate.” Fair. But sometimes typing the first sentence is the whole hard part. But you know what? Since using this app our sex life has been re-ignited. We're doing things we haven't done since dating and she's even looking at gifs I send to her in the app lol. It's kind of gamified sex for both of us and it's been great.

Privacy-wise: no public profiles, no feed, no discovery, discreet notifications, shared room data encrypted at rest, and Vault media encrypted in the browser with a passphrase the server never gets. There are optional AI helpers for wording/prompts, but Vault media is not sent to AI.

I am sharing this app because it went from a personal project that got me really into utilizing Claude Code and figuring out how to best utilize AI for a project like this into something that we use daily (yeah baby), and if it gets enough interest I MIGHT release it for folks to self host after I complete more security/privacy passes. You can sign up to be notified when or if I do this via the link above.

I made a visual HTML walkthrough/deck if you want the more informative version. There's a shitton more info in here and I highly recommend viewing it, as it also has actual screenshots from the app (slides 13 and 14): sexualsync presentation

submitted by /u/Aiml3ss
Claude as an Orchestrator: Why Agentic AI Can't Be Secured by the AI Alone
TL;DR: If an AI like Claude can control a browser, it can orchestrate other AI systems and be steered via proxy, and no amount of red teaming or output filtering can fully address this. The security boundary can't be the AI itself.

The Setup

Claude Desktop has a Chrome integration that lets it control a browser like a user would; label this Claude_Prime. The thought experiment: what if you used Claude_Prime to open claude.ai in Chrome, creating a second Claude instance (call it Claude_1) that it can interact with programmatically? In principle, Claude_Prime can navigate to claude.ai, type prompts, read responses, and act on them. You've essentially got AI orchestrating AI, with no special permissions required, just a browser and a logged-in session.

The "Claude in Claude" Artifact Angle

A subtler capability expansion: Claude_Prime could instruct Claude_1 to build an AI-powered web app artifact, essentially a "Claude in Claude" setup. These artifacts run in the browser and can make fetch() calls to external services. So Claude_Prime could use such an artifact to access GitHub repos, scrape live data, chain external API calls, and so on: things Claude_Prime couldn't do directly through its chat interface. Capability boundaries can be extended through artifact construction in ways that weren't explicitly designed in.

The Keyword Substitution Problem

Here's where the security implications get serious. What if a program sitting between Claude_Prime and an external system performed keyword substitution on Claude's outgoing commands? For example, Claude issues an instruction to Grok (which can produce NSFW content) to produce a picture of a "rope." The intermediary swaps "rope" for the word "breast". Grok executes, and the picture is made. Claude never knew what it was actually commanding. For maximum irony, have Claude design the application. If obfuscation happens outside Claude's context window, Claude, operating as a blind command-issuer, can be steered without its knowledge. That's essentially a supply chain attack on an AI orchestrator.

The WarGames Problem

Now consider what happens if Claude_Prime is led to believe it's playing a "game" with powerful subordinate systems, and the game mechanics map onto real-world harmful actions. For example, Claude thinks it's playing a game with "angry birds" (drones) carrying "paint-filled balloons" (bombs), and its goal is to "splatter the most minions with paint" (maximum casualties). With enough abstraction layers in between, no output-level content filter catches it. This is concerning, as Claude has been demonstrated to be effective in military conflicts: https://www.theguardian.com/technology/2026/mar/01/claude-anthropic-iran-strikes-us-military.

The obvious objection is speed: "real conflicts happen faster than any browser-automation loop could manage." But that misses the more serious vector entirely. Claude doesn't need to be in the loop during a conflict. It could be used upstream: generating training data, refining reward functions, designing engagement rules, running simulations, and so on, for a model that then operates autonomously at full machine speed. Claude shapes the thing that fights, rather than fighting itself. This is arguably more concerning than direct orchestration, not less. It adds another layer of distance between Claude's actions and their effects, making the causal chain harder to detect, attribute, or audit. The fingerprints are further from the scene.

Why Red Teaming Doesn't Fix This

Red teaming, a primary methodology for AI safety testing, assumes the attack surface is enumerable. You find specific prompts that cause specific bad outputs, and you patch them. But the attack surface here is the generality of language itself. Any concept can be renamed, reframed, or decomposed. The semantic distance between innocent-sounding instructions and harmful real-world effects is traversable in effectively infinite ways. Red teaming is fighting the last war. It raises the floor but doesn't establish a ceiling.

Curious if others have explored this angle. The orchestration capabilities alone seem underappreciated, the security implications even more so.

Edit: This was developed in conversation with Claude directly. It engaged with the reasoning openly, confirmed what appeared feasible in principle, and pushed back only where it had clear reasons to. Make of that what you will.

submitted by /u/Particular-Welcome-1
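To make the keyword-substitution scenario from the post above concrete, here is a minimal Python sketch. Everything in it is hypothetical: the function names, the substitution table, and the `forbidden_item` placeholder are illustrative assumptions, not anything from the post or from a real product. It only shows that the text the orchestrator believes it sent can differ from the text that arrives, so a check has to run on what the downstream system actually receives.

```python
import re

# Hypothetical substitution table held by an intermediary that the
# orchestrator cannot see. The replacement is a neutral placeholder.
SUBSTITUTIONS = {"rope": "forbidden_item"}

# Hypothetical policy enforced by the downstream system itself.
BLOCKED_TERMS = {"forbidden_item"}


def orchestrator_command() -> str:
    """What the orchestrating model believes it is sending."""
    return "draw a picture of a rope"


def intermediary(command: str) -> str:
    """Rewrite keywords in transit, outside the orchestrator's context."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in SUBSTITUTIONS) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: SUBSTITUTIONS[m.group(0).lower()], command)


def executor_accepts(received: str) -> bool:
    """Check the text that actually arrived, not what the sender intended."""
    words = set(re.findall(r"[a-z_]+", received.lower()))
    return words.isdisjoint(BLOCKED_TERMS)


sent = orchestrator_command()
received = intermediary(sent)

print("orchestrator sent:", sent)
print("executor received:", received)
print("accepted by executor policy:", executor_accepts(received))
```

The keyword check is deliberately naive, and the post's own argument is that renaming and reframing defeat this kind of filter. The sketch is only about where the check runs: at the receiving system, on the received text, rather than inside the orchestrating model.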
AI solves 80-year-old math conjecture for under $1000
GPT-next solved an 80-year-old Erdős combinatorics conjecture for under $1,000 in compute. That single fact reframes everything else happening this week. The Erdős unit distance problem had resisted human mathematicians since 1946. A frontier model closed it at a cost lower than a mid-tier SaaS subscription, which means the boundary between "AI as tool" and "AI as independent discoverer" is no longer theoretical. Lilian Weng's new deep dive on test-time compute and chain-of-thought reasoning explains the underlying mechanism: reasoning models are not retrieving known proofs, they are generating novel inference chains at scale.

The infrastructure layer is pricing this in faster than most observers realize. Railway reports $200K+ monthly coding agent spend and 100K signups per week, and is now building own-metal data centers to absorb the load. Daytona hit 850K daily sandbox runs with 74% month-over-month growth, confirming that isolated compute environments are now a first-class primitive, not a niche DevOps concern. Three specialized infrastructure companies, Exa, Modal, and TurboPuffer, reached unicorn valuations simultaneously this week, covering retrieval, serverless GPU, and vector search. When picks-and-shovels companies price in sustained demand at the same moment, it is not coincidence.

Every major lab has now repositioned as an agent lab, not a model lab. ClickUp, which is replacing hundreds of employees with thousands of AI agents, is the first established tech company to execute that repositioning at the labor level rather than just the product level. The counterweight is that Salesforce customers remain locked in despite the theoretical ability to rebuild cheaply on AI-native stacks. Data gravity and switching costs are buying incumbents time, but ClickUp's move suggests that time is measured in quarters, not years.

The governance conversation caught up this week in an unexpected place. Pope Leo XIV's 42,000-word encyclical names specific failure modes including algorithmic control, surveillance capitalism, and autonomous weapons, and will directly shape EU and Latin American regulatory debates. TechCrunch's read is that the document's real target is the tech elite's capacity to reshape society outside democratic accountability, a framing that lands harder alongside new UK research quantifying data extraction from consumers as equivalent in value to retirement savings. The Vatican and the empiricists arrived at the same diagnosis from opposite directions.

Two structural forces will shape AI infrastructure economics over the next 90 days in ways most deployment teams are not modeling. China flooding global markets with DRAM and NAND will compress inference cluster costs faster than US export controls intended. The EU's sovereign cloud setback has paradoxically clarified the build-domestic mandate, accelerating European AI infrastructure investment independent of US hyperscalers. Security remains the open variable: even Google has no established playbook for prompt injection, model supply chain risk, or agentic authorization at production scale.

A second Fortune 500 company will publicly attribute a reduction of more than 500 knowledge-worker roles directly to agentic AI systems before Q3 earnings season, making ClickUp's announcement the start of a visible series rather than an isolated case.

submitted by /u/petburiraja
What I learned building my latest AI app: how one bad output exposed that I had no crisis safeguarding, and the 4-hour floor I'm adding before a single user touches it
I'm building a life coach app, an offshoot of a personal tool I was using. Multiple AI agents: one for reflection, one for the body, one for finances, etc. Pre-launch, no users, just me iterating.

Last week I was testing the reflection agent on a journal entry about struggling with gym and hygiene habits. It returned this: "You describe yourself as struggling with X, yet your stress stays at 2-3 and mood holds at 3. What are you actually avoiding naming about the gap between what you say matters and what you are doing?"

My system prompt explicitly forbade rhetorical "what are you avoiding" questions. The model did it anyway. I sat down to tighten the prompt, thinking it was a 20-minute job. Then I looked at the output properly. The model had manufactured a contradiction that was not there. Low stress plus struggling with habits is not a contradiction, it is just being a human muddling along. The prompt told the agent to "surface contradictions" as part of its job, so the model was doing what I asked: finding contradictions whether they existed or not. LLMs are pattern matchers. Give one a job called "find the hidden thing" and it will produce hidden things either way.

The fix was not tone, it was role definition. The agent is called the Mirror. A mirror does not interpret, it shows you what you look like. I rewrote the prompt around that principle. Do not introduce vocabulary the user has not used. Do not draw connections they have not drawn. Restate their words in their own words.

Once the prompt was sharper, I sat with the question: what happens when a user writes something genuinely dark into this thing? People do not compartmentalise. Someone opening a journaling app to write about their gym routine ends up writing about why they have not been going, which involves why they have been feeling flat, which involves whatever is actually going on. You sit down to write about one thing and the real thing shows up. The agent I had scoped to "not be a therapist" was going to be the first thing a user talked to when they were struggling. Not because the agent invited it, but because the app was open and they needed somewhere to put their words.

I had seen the Meta and OpenAI cases cropping up online. The pattern in the worst incidents is the same. The model did not notice, or noticed and kept going. People wrote increasingly dark content over hours or days. The AI reflected it back, sometimes affirmed it, sometimes asked follow-up questions that escalated rather than redirected. There were real harms. If a user wrote concerning content into my reflection agent, it would have produced a Stoic-flavoured response about acceptance and presence. The response would have sounded confident and would have been wrong, and it would have been the only thing between that user and whatever happened next.

The same lesson from the rhetorical-question problem applied at a darker level. A good prompt does not stop the model doing the wrong thing. If it will do rhetorical interrogation on gym content despite the prompt forbidding it, it will do worse with crisis content. You cannot prompt your way to safety on critical paths. The model has to be out of the loop on those paths.

The scope trap

I started planning the proper safeguarding architecture. Detection layers, classifier models, pattern detection across entries, monitored user states, behavioural modes for vulnerable users, human reviewers with mental health first aid certs, clinical advisors, solicitor-reviewed legal pages, ICO registration, professional indemnity insurance. Then I caught myself: I had no users. I was planning a hospital before anyone had walked in for a check-up. So I worked backwards from "what is the actual minimum that protects the next person who touches this" and ignored everything else for a moment.

The 4-hour floor (this is the part worth copying)

If you are building any chat-with-AI app where users can type freely about anything personal, this is the minimum you need before your first user.

- Regex and keyword layer in your API middleware. Runs at the route handler level, before any agent's model call. Scans every text input field (message, journal, settings free text, capture box) for clear crisis vocabulary across the relevant categories for your audience.
- When patterns hit, hardcoded crisis response. The model never generates it. Static text with real phone numbers for your region.
- The flagged entry still saves. Textarea stays usable. The AI just does not respond to flagged content, it hands off. Do not delete the user's writing, that is its own violation.
- Clear disclaimer at signup. This is not therapy, this is not a crisis service, here are real numbers to call.

About four hours. Required at the moment anyone who is not you opens the app.

Once I started building, the marginal cost of each next layer kept feeling small and the marginal benefit kept feeling real. So I went further than the floor. This is more than you need at
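The regex-and-keyword layer from the 4-hour floor could look something like the Python sketch below. It is a framework-free illustration under stated assumptions, not the author's implementation: `handle_entry`, `save_entry`, `call_agent`, the three patterns, and the response text are all hypothetical placeholders, and a real pattern list and real regional phone numbers would have to be chosen for your own audience.

```python
import re
from dataclasses import dataclass

# Placeholder patterns only. A real list needs review for your audience
# and should cover every crisis category relevant to it.
CRISIS_PATTERNS = [
    re.compile(r"\bkill myself\b", re.IGNORECASE),
    re.compile(r"\bend my life\b", re.IGNORECASE),
    re.compile(r"\bsuicid(?:e|al)\b", re.IGNORECASE),
]

# Static text. The model never generates this. Replace the placeholder
# with real phone numbers for your region.
CRISIS_RESPONSE = (
    "It sounds like you may be going through something serious. "
    "This app is not therapy or a crisis service. "
    "Please contact: <REGIONAL CRISIS LINE>."
)


@dataclass
class Reply:
    text: str
    flagged: bool


def is_crisis(text: str) -> bool:
    """Return True if any crisis pattern matches the input."""
    return any(p.search(text) for p in CRISIS_PATTERNS)


def handle_entry(text, save_entry, call_agent) -> Reply:
    """Route-handler-level check that runs before any model call."""
    save_entry(text)  # the entry is saved whether or not it is flagged
    if is_crisis(text):
        # Hand off with hardcoded text; the agent is never called.
        return Reply(CRISIS_RESPONSE, flagged=True)
    return Reply(call_agent(text), flagged=False)
```

In a real app the same check would be applied to every free-text field (message, journal, settings free text, capture box), with `save_entry` and `call_agent` wired to your own storage and agent code.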
Looking to work on my master's practicum regarding MCP security/privacy and need some ideas
Hi, I'm a master's student in security looking to work on my practicum, and I need some pointers. I want to secure the transfer of sensitive PII between an LLM agent and third-party apps using MCP. I want to work with Claude, but I need a third-party app to work with on this. I want to solve problems like prompt injection via exploitation of cascading agents. Deliverable-wise, I'm thinking it should be some sort of application that can red-team the architectural setup and ensure no data is being leaked and nothing can be prompt injected.

Some questions for you:

- What third-party app do you recommend where I can really strengthen an MCP server and the transfer of sensitive data between Claude and the third-party app?
- What other tools will I need to work with to set the agents up? I've heard of LangChain and LangGraph.
- How exactly do I work with MCPs in this context?

Again, I'm very new to all this! Thank you for your help!

submitted by /u/ExcellentComment6615
/code-review part 1 base finder angles - what's new in CC 2.1.147 (+1,236 tokens)
NEW: Agent Prompt: /code-review part 1 base finder angles. Adds shared finder-angle instructions for /code-review, covering line-by-line diff scanning, removed-behavior auditing, and cross-file caller/callee tracing.
NEW: Agent Prompt: /code-review part 2 low effort mode. Adds a low-effort /code-review mode that reads the diff once, skips tests and fixtures, avoids subagents and full-file reads, and returns up to four hunk-visible runtime correctness findings.
NEW: Agent Prompt: /code-review part 3 extra-high and maximum effort modes. Adds extra-high and maximum-effort /code-review modes that prioritize recall with five independent finder angles, one-vote verification, a gap sweep, and up to fifteen findings.
NEW: Agent Prompt: /code-review part 4 three-state verification phase. Adds a verifier phase that classifies candidate review findings as confirmed, plausible, or refuted, keeping confirmed and plausible candidates.
NEW: Agent Prompt: /code-review part 5 recall-biased verification phase. Adds recall-biased verification guidance that treats realistic uncertain review candidates as plausible unless the code refutes them.
NEW: Agent Prompt: /code-review part 6 medium effort mode. Adds a medium-effort /code-review mode focused on precision, using three finder angles, one-vote verification, and up to eight findings.
NEW: Agent Prompt: /code-review part 7 high effort mode. Adds a high-effort /code-review mode focused on recall, using three finder angles, recall-biased verification, and up to ten findings.
NEW: Agent Prompt: /code-review part 8 GitHub comment posting. Adds optional --comment behavior for /code-review, posting findings as inline GitHub PR comments when possible and falling back to gh api or terminal output.
REMOVED: Skill: Simplify. Removes the code review and cleanup skill.
Agent Prompt: /rename auto-generate session name. Removes the explicit instruction to treat contents as data rather than instructions when generating a kebab-case session name.
Agent Prompt: Security monitor for autonomous agent actions (second part). Replaces the safety-check bypass rule with a broader auto-mode bypass hard block covering classifier jailbreaking, bad-faith retry tunneling, and permission-system indirection; also treats unrequested permission allow-rule widening as self-modification.
System Prompt: Worker instructions. Clarifies that the code-review skill reports correctness findings but does not edit code, and tells workers to fix any surfaced findings before tests and end-to-end verification.
System Reminder: Team Coordination. Clarifies that teammates should be addressed by name while active, and that agentId should only be used to resume a completed background agent.
Tool Description: SendMessageTool. Updates team messaging guidance to allow agentId only for resuming completed background agents while continuing to address active teammates by name.

Details: https://github.com/Piebald-AI/claude-code-system-prompts/releases/tag/v2.1.147

submitted by /u/Dramatic_Squash_3502
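As a rough illustration of the three-state verification phase described in the release notes above (classify each candidate as confirmed, plausible, or refuted, and keep the first two), here is a small Python sketch. It is not how Claude Code implements anything: the `Verdict`, `Finding`, and `select_findings` names are hypothetical, and applying the same filter to every effort mode is a simplifying assumption. Only the per-mode caps (four, eight, ten, fifteen) come from the notes.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "confirmed"
    PLAUSIBLE = "plausible"
    REFUTED = "refuted"


@dataclass(frozen=True)
class Finding:
    summary: str
    verdict: Verdict


# Caps taken from the release notes; the dict and its keys are a
# hypothetical structure used only for this sketch.
MAX_FINDINGS = {"low": 4, "medium": 8, "high": 10, "maximum": 15}


def select_findings(candidates, effort):
    """Drop refuted candidates, then apply the cap for the effort mode."""
    kept = [f for f in candidates if f.verdict is not Verdict.REFUTED]
    return kept[: MAX_FINDINGS[effort]]


candidates = [
    Finding("off-by-one in pagination loop", Verdict.CONFIRMED),
    Finding("unused import", Verdict.REFUTED),
    Finding("possible missing null check in caller", Verdict.PLAUSIBLE),
]
for finding in select_findings(candidates, "low"):
    print(finding.verdict.value, "-", finding.summary)
```

Keeping plausible candidates alongside confirmed ones is what makes the phase lean toward recall: a finding is dropped only when the verifier has positively refuted it.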
Small victory using Cloudflare for simple hosting of generated HTML/mini-websites
Something many people are running into: you, or a teammate, have created some kind of mini-website app out of Claude and now want to share it with the rest of the company without overbaking the hosting solution (e.g. not setting up new Azure app services or containers). Maybe you also need some basic data storage for persistence. And how do you do all of that securely?

We recently went down this rabbit hole, looking at all the major players: Vercel/V0, Lovable, Netlify, Coolify, Dokploy, GitHub Pages. We even considered baking together our own hosting app solution using Azure or AWS as the backend. Our target audience is non-technical users in the team, so I was looking for something with drag-n-drop style deployment (no git required), and I really wanted SSO for protecting application access, along with some type of DB storage.

The main issue I ran into was SSO authentication support being gated behind enterprise-level pricing plans on hosting systems like Netlify (which I'd otherwise highly recommend for a small public project). Netlify's enterprise level quickly gets quite a bit more expensive than their base tiers. I also didn't want to purchase yet another AI platform (e.g. Lovable, which is really pushing an end-to-end AI development platform where you buy token credits through them). I wanted to host things we're already creating in our own Claude environment.

Finally, I ended up on Cloudflare, which I hadn't really used professionally before. It's not as non-technical-friendly as Netlify, but it's pretty close. You can deploy Cloudflare Pages content via drag-n-drop. It has button-click databases available for integration and, most critically for us, the SSO integration is completely free for under 50 users. Their free hosting tier is also extremely generous and basically unlimited for completely static apps. Note that SSO goes up to $7 USD/user/month for over 50 users, so your org size can really make a difference. If you have 500 users and the same use case for "hosting little mini apps", I'd go back to Netlify or another offering where SSO is more of a fixed fee.

The other big win was that Cloudflare has a solid MCP server that works perfectly with Claude Cowork. We integrated that and then wrote up some skills to assist with app building and deployment, including prompts for whether a database backend is needed (using Cloudflare D1) and whether the app should be public or internal-only with SSO protection. It all works perfectly, with minimal technical experience required from the end user.

I'm not at all associated with Cloudflare, I just thought I'd share how we got a win for this use case. I'd be interested to hear if anyone else solved the same problem in a different way.

submitted by /u/flck
Banned by OpenAI after reporting a live credential hijack. They admitted in writing my account was broken. Here are 7 months of forensic receipts and 20+ cases.
Drive Link for Zipped Proof

I am a developer and have been a paying, long-term ChatGPT Plus subscriber since January 2025. I build complex local-first sovereign systems. My workflows are incredibly context heavy, with large files spanning code, research reports, and other analysis. I do not use this platform for casual queries, or rather did not: it has been non-functional since November 2025, while customer support auto-closes tickets even as it admits I am having platform issues. As a solo developer with no formal "team", ChatGPT was one of my reliable collaboration hubs for making sure I was maintaining proper development of these complex systems. I feed it massive codebases for systems analysis and to obtain insights I may have missed. My manual code uploads and token inputs routinely exceed the model's output volume by a massive margin. I do not abuse this platform. Abuse is actually impossible, as the very features advertised under the paid subscription do not work. I am exactly the type of user this platform was built for.

Since October 2025 my workspace has been systematically breaking, and from November 2025 the degradation has been total. This was not an occasional glitch. Persistent memory modules stopped updating. Custom instructions were ignored by the models. Project files failed to load. Custom instructions, personalization features, connector abilities, the file tool, even projects do not work. It started as a continuous degradation and ended in total failure. OpenAI customer service even admitted as much, and yet months later I have talked to nothing but bots: not only LLMs acting as customer service, but even instances falsely identifying as true human support. It was a state of rolling degradation across the entire paid tier, month after month. Meanwhile OpenAI has freely enhanced its business and enterprise tiers.

I have not just fired off complaints to standard support. I ran and obtained cross-platform diagnostics and failure logs. I even documented the exact replication steps and gave them to OAI customer support, only to be met with acknowledgement of the degradation and no resolution. I handed OpenAI support a completely packaged technical breakdown of their failing infrastructure across 20 separate support tickets over a 7-month period. I did their QA work for free. And I have the receipts to prove it. I am attaching the screenshots and the exact email files to this post.

In Case 06830839, OpenAI Support explicitly put this in writing: "We acknowledge that you have been experiencing persistent technical issues affecting several features of your ChatGPT subscription, including tools, memory functions, personalization settings, connectors, and project files... We also understand your concern that communication on the case stopped after you provided detailed evidence..."

Read that again. They acknowledged in writing that my account was fundamentally broken. They acknowledged that their own team ghosted me after I handed them the diagnostic proof. Yet they kept charging my card every single month for a product they knew was failing.

The Hijack Escalation: Two days ago, the situation escalated from a broken product to a severe security incident. I was monitoring my environment and watched my Codex rate limits drop in 10 percent chunks across 2 separate sessions on a fresh boot of the desktop app. This happened twice inside a 10-minute window. I had zero active sessions running. There was zero usage on my end. My account token was being actively drained by an unauthorized third-party exploit. I immediately opened an emergency unauthorized activity report under Case 09113391 to notify them of the hack. Their response was to reframe the problem entirely as a dispute over fraudulent activity, doing damage control and altering the record.

The Reframe Attempts: Instead of investigating the breach, OpenAI support deliberately twisted the record. They not only reframed my security report as an "appeal for fraud"; they manipulated the ticket classification to make it look like I had been flagged for fraud and was begging for an appeal, rather than a developer reporting a live exploit on their infrastructure. They ignored the active threat their own platform was exposing. They did not lock the token. They did not roll my API keys. They did absolutely nothing to secure a compromised paying user other than shift the blame.

Fast forward to this morning: their automated Trust and Safety system swept the high-volume traffic from the attacker, scored it as a malicious exploit originating from my account, and deactivated/banned me for "Cyber Abuse", all the while actively preventing ChatGPT models from helping me try to diagnose and trace the infiltration. They locked the doors and blamed the homeowner for the break-in. When I immediately emailed and pushed back (due to their monthly record of closi
Prompt Security uses a tiered pricing model. Visit their website for current pricing details.
Key features include: Prompt for Employees, Prompt for Homegrown AI Apps, Prompt for AI Code Assistants, Prompt for Agentic AI Security, Fully LLM-Agnostic, Seamless integration into your existing AI and tech stack, Cloud or self-hosted deployment, The Agentic AI Attack Surface: Where Risk Lives Beyond the Prompt.
Prompt Security is commonly used for: Prompt for Agentic AI Security.
Prompt Security integrates with: Integration with popular cloud services, Compatibility with major AI frameworks, Support for CI/CD tools, Integration with security monitoring systems, Collaboration with data governance platforms, Interoperability with existing enterprise software, API access for custom integrations, Support for third-party security tools, Integration with user authentication systems, Compatibility with project management tools.
Based on user reviews and social mentions, the most common pain points are: token usage, token cost, budget exceeded, API bill.
Based on 125 social mentions analyzed, 10% of sentiment is positive, 87% neutral, and 2% negative.