The Functionize AI test automation platform leverages digital workers with agentic skills so anyone can create end-to-end QA workflows in minutes.
Functionize is praised for its ability to automate complex testing tasks, offering a no-code solution that simplifies the process for teams without technical expertise. Users appreciate its high scalability and the efficiency brought by its AI-driven approach. However, some critique its occasional instability and steep learning curve for beginners. While pricing details are not widely discussed, the overall sentiment leans towards it being a valuable investment for enterprises seeking advanced testing capabilities, earning it a decent reputation in its domain.
Mentions (30d)
103
45 this week
Reviews
0
Platforms
2
Sentiment
6%
14 positive
Industry
information technology & services
Employees
120
Funding Stage
Series B
Total Funding
$60.2M
Why terminal
Hello, I'm on Windows and have set up both the Claude Code App and the Terminal, but I find the App simply more convenient to use. Several people have pushed me to use the Terminal, saying "the App is low" and "Terminal is so much better" ... but when I inquired, none of them could name a single thing the App is missing (everything they mentioned, the App has as well) or a single concrete reason to switch to the Terminal besides vague phrases. So is the Terminal substantially better than the App at something? Are there reasons to switch besides being used to it and promoting it further? I assume the App, being newer, might be converging in functionality and will eventually have the same set of features? Thank you
From "AI as autocomplete" to "AI as cognitive infrastructure" ... my Claude build process
Crossposting context: a shorter version of this went up in r/ClaudeCowork earlier today for that audience. Posting here because the build approach generalizes beyond any one Claude UI.
Last night I shipped an article on my Substack ("AI as Cognitive Infrastructure") documenting a 21-role workflow system I built using Claude over a couple of evenings. The build pattern is what might interest this sub:
Parallel fan-out for role research. Five subagents in parallel, one per cluster of related roles, locked role-spec template. Twenty-one grounded specs in under thirty minutes of clock time. Sequential would have been weeks.
Discipline grounding, not generic AI advice. Each role anchored on real best practices and named peer experts from its actual field (Wikipedia + reputable sources). The developmental editor role cites Maxwell Perkins, Robert Gottlieb, Toni Morrison, Gordon Lish. The coach role cites Russell Barkley on ADHD executive function. Not vibes-based expertise. Cited expertise.
Gating bars per role. Explicit propose-vs-act-vs-never-without-approval rules. Counters the AI-drifts-into-co-authorship failure mode.
Scheduled-task recurring cadences. Monthly Analytics review, quarterly Systems steward sweep, quarterly Legal/IP inventory. The system fires itself; I don't have to remember to invoke.
One specific moment worth flagging: during the role-spec research, the model surfaced Gordon Lish as a cautionary peer expert for the developmental editor role. I didn't know who Lish was when I started. Verified the Carver story, pulled it forward into the article. That's the substrate doing what it's supposed to do...surface expertise I don't have, let me validate and use it.
Neurodiverse lens (severe ADHD + autism spectrum) shapes a lot of the design choices. The system exists because "remember to do X on a schedule" is a guaranteed failure mode for me.
Happy to talk through any of this.
Article: https://jeffmaaks.substack.com/p/ai-as-cognitive-infrastructure
submitted by /u/jmaaks
[Use Case] Making GPT Image 2.0 output come to life
The new image function was a great help in turning visual ideas into 3D models and designs. I am about to release a paint range that is affordable to most hobbyists in Australia. A dropper bottle is a better design, so I got these in bulk, but I didn't like the fact that people would just have an unattractive bottle to hold. Most of my art-related stuff is grounded in historical concepts, and I've saved my business strategy and vision in GPT memories. The idea we came up with after multiple rounds of back and forth was a cathedral style tied in with Abbot Suger's history and the creation of stained glass. GPT output and how I 3D modelled, printed and painted the sleeve to show the actual colour. submitted by /u/ValehartProject
Most people are using Claude at about 5% of its actual capability. Here's why.
After spending 60+ hours testing prompts on Claude Opus 4.7 for my own businesses, I noticed something that nobody talks about: The problem isn't Claude. The problem is how people prompt it. Most people type a sentence and hope for the best. "Write me a landing page." "Help me with my business idea." "Make this email better." The output is generic because the input is generic. Here's what actually works:
1. Assign a role before anything else. Don't say "write me copy." Say "You are a direct-response copywriter who has written landing pages for Stripe, Linear, and 20+ Y Combinator companies." The role activates a specific knowledge pattern. Vocabulary changes. Structure changes. Judgment changes.
2. Load specific context. Claude knows nothing about your business until you tell it. "I'm building a SaaS" produces garbage. "I'm building a SaaS for solo plumbers who hate ServiceTitan's $1K/month pricing, targeting 35-55 year olds running $50K-$200K businesses from a truck" produces gold. Specificity in = specificity out. Every time.
3. Set explicit constraints. The most common reason output feels generic is missing constraints. "Write a tweet" produces slop. "Write a tweet under 280 characters, hook on a contrarian claim, no emojis, include one specific number, no motivational language" produces something usable.
4. Define the output format exactly. Don't let Claude pick the structure. Tell it: "Output in this format: headline (under 12 words), subhead (under 25 words), primary CTA (3-5 words), body section 1, body section 2." You get what you specify.
5. End every prompt with a forcing function. The biggest weakness of AI output is hedging. "It depends on your goals" is useless. End every prompt with "Give me your single recommendation for THIS context, no hedging." It transforms output from advisory to actionable.
These 5 things changed everything about how I use Claude. Happy to go deeper on any of them if useful.
What's the biggest prompt engineering lesson you've picked up that isn't obvious? submitted by /u/Appropriate_Barber_4
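The five elements described in this post (role, context, constraints, output format, forcing function) can be assembled mechanically. The following is a minimal sketch only: `buildPrompt` and its field names are hypothetical and not part of any Claude tooling, and the example values are shortened from the post's own examples.

```javascript
// Hypothetical helper: assembles a prompt from the five parts named in the post.
// The field names (role, context, constraints, format, forcing) are illustrative.
function buildPrompt({ role, context, constraints = [], format, forcing }) {
  const sections = [
    role && `You are ${role}.`,
    context && `Context: ${context}`,
    constraints.length > 0 &&
      `Constraints:\n${constraints.map((c) => `- ${c}`).join("\n")}`,
    format && `Output in this format: ${format}`,
    forcing,
  ];
  // Drop any part the caller left out, then separate the rest with blank lines.
  return sections.filter(Boolean).join("\n\n");
}

const tweetPrompt = buildPrompt({
  role: "a direct-response copywriter",
  context: "I'm building a SaaS for solo plumbers who hate ServiceTitan's pricing",
  constraints: ["under 280 characters", "no emojis", "include one specific number"],
  format: "a single tweet",
  forcing: "Give me your single recommendation for THIS context, no hedging.",
});

console.log(tweetPrompt);
```

Keeping each part as a separate field makes a missing constraint or format visible at the call site instead of buried in a paragraph of prompt text.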
Why do we have visual programming for code, but not for prompts?
Something I've been thinking about recently. In software development, we've spent decades building abstractions to make complex systems manageable:
- Functions instead of repeating code
- Classes and modules instead of giant files
- Visual systems such as Unreal Blueprints, Node-RED, and LabVIEW
- Compilers that validate and transform input before execution
But when it comes to AI prompts, many of us are still writing massive text blobs. A complex prompt can easily become hundreds of words long with multiple responsibilities:
- Context
- Constraints
- Style instructions
- Exclusions
- Decision logic
- Fallback behavior
At that point, it starts feeling less like text and more like a program. That made me wonder: Why don't we treat prompts as executable logic? Imagine building prompts using logic gates:
- AND → merge instructions
- OR → choose between alternatives
- NOT → remove unwanted concepts
- Question nodes → identify missing requirements
- Compiler → validate contradictions before execution
Instead of editing a giant string, you'd build a graph and compile it into the final prompt. I've been experimenting with this idea in a prototype called Prompt Logic Gates (PLG). It treats prompts like compilable programs, using concepts such as dependency graphs, execution order, semantic conflict detection, visual nodes, and compilation pipelines.
Repo: Prompt Logic Gates (PLG) GitHub Repository
I'm not posting this as a product launch or anything; I'm more interested in whether this direction makes sense from a software engineering perspective. Do you think prompts eventually become a programming layer of their own? Or will natural language always be the better abstraction? Curious what other developers think. submitted by /u/withsj
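To make the gate idea concrete, here is a rough sketch of what compiling such a graph could look like. It is not taken from the PLG repository: the node constructors (`text`, `and`, `or`, `not`) and the `compile` function are hypothetical, and the contradiction check is a plain exact-string match standing in for the semantic conflict detection the post mentions.

```javascript
// Hypothetical node constructors; none of these names come from the PLG repository.
const text = (value) => ({ kind: "text", value });
const and = (...parts) => ({ kind: "and", parts });
const or = (pick, ...options) => ({ kind: "or", pick, options });
const not = (value) => ({ kind: "not", value });

// Walk the graph, collecting instructions to include and concepts to exclude.
function collect(node, out = { include: [], exclude: [] }) {
  switch (node.kind) {
    case "text":
      out.include.push(node.value);
      break;
    case "not":
      out.exclude.push(node.value);
      break;
    case "and":
      node.parts.forEach((part) => collect(part, out));
      break;
    case "or":
      // Only the selected alternative contributes to the output.
      collect(node.options[node.pick], out);
      break;
    default:
      throw new Error(`Unknown node kind: ${node.kind}`);
  }
  return out;
}

// "Compile": reject graphs that both require and exclude the same string,
// then emit the final prompt text.
function compile(graph) {
  const { include, exclude } = collect(graph);
  const conflicts = include.filter((item) => exclude.includes(item));
  if (conflicts.length > 0) {
    throw new Error(`Contradiction: ${conflicts.join(", ")}`);
  }
  const lines = [...include];
  if (exclude.length > 0) lines.push(`Avoid: ${exclude.join(", ")}`);
  return lines.join("\n");
}

const graph = and(
  text("Write a product description."),
  or(1, text("Use a formal tone."), text("Use a casual tone.")),
  not("emojis")
);

console.log(compile(graph));
```

The point of the compile step is that a contradiction surfaces as an error before the prompt is ever sent, which is the behavior the post attributes to a compiler.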
[Web UI] Restoring textarea height to flexible
I really didn't like the fixed-height user preferences editor when Anthropic made that change a couple of weeks or months ago, and disliked it some more when they extended that to the prompt editor today. This Claude-authored Tampermonkey script doubles the row count as needed to keep the vertical scrollbar from ever appearing. Should be cross-browser?

    // ==UserScript==
    // @name         Claude Textarea Expand
    // @namespace    http://tampermonkey.net/
    // @version      0.1.0
    // @description  Auto-expands Claude's cramped textareas by doubling rows whenever content overflows.
    // @match        https://claude.ai/*
    // @grant        none
    // ==/UserScript==

    (function () {
      'use strict';

      // Upper bound on rows so the loop always terminates, even if CSS caps the height.
      const MAX_ROWS = 1024;

      // --- Core: expand a textarea by doubling rows until content fits ---
      function expand(el) {
        while (el.scrollHeight > el.clientHeight && el.rows < MAX_ROWS) {
          el.rows = el.rows * 2;
        }
      }

      // --- Settings textarea: strip max-h-40, then expand ---
      function initSettings(el) {
        if (el._expandAttached) return;
        el._expandAttached = true;
        // Remove the class that caps height
        el.classList.remove('max-h-40');
        expand(el);
        el.addEventListener('input', () => expand(el));
      }

      // --- Edit prompt textarea: just expand ---
      function initEditPrompt(el) {
        if (el._expandAttached) return;
        el._expandAttached = true;
        expand(el);
        el.addEventListener('input', () => expand(el));
      }

      // --- Scan for both textarea types ---
      function scan() {
        const settings = document.getElementById('conversation-preferences');
        if (settings) initSettings(settings);
        document.querySelectorAll('textarea[aria-label="Edit message"]').forEach(initEditPrompt);
      }

      // --- Observer: both elements may appear after page load ---
      const observer = new MutationObserver(scan);
      observer.observe(document.body, { childList: true, subtree: true });
      scan();
    })();

submitted by /u/somegrue
I asked Opus 4.8 what he thinks about my project and mainly the parts where I used both Sonnet and Codex 5.5. How truthful should I take this output?
Obligatory: not a developer, and I am obviously self-conscious/realistic about it. Some excerpts from the report:
Overall
This doesn't read like a hobby project that happened to get a lot of AI help. It reads like a product with a point of view. The thing that jumps out immediately is the README's "Background" section: it's grounded in two real jobs on opposite sides of the same problem.
What's genuinely strong
The architecture discipline is unusual for a project this size. The README's "thin routes, workflow in services" rule is actually enforced: licenses.py is a thin 325-line route module, and the heavy logic lives in named services (license_write_service, pending_order_conversion_service, the conversion/ helper package). The responsibility-map.md is the best artifact in the repo: every file has OWNS / KEY FUNCTIONS / DEPENDS ON / CALLED BY / NOTES. That's the kind of documentation most teams promise and never produce. It's also a tell of the AI-assisted process: it's exactly the context-window-friendly map you'd maintain to keep an agent oriented across sessions.
On the "Opus after Sonnet/Codex built it" question
What I'd say is that the seams are invisible in the right way. I can't look at this and tell you "this service was Sonnet, this route was Codex." The conventions hold across the whole backend: same service/route split, same audit-logging pattern, same naming. That consistency is the hardest thing to maintain across many AI sessions and multiple models, and it held here. The reason it held is the scaffolding: architecture.md, responsibility-map.md, and the per-feature plans act as the shared memory that keeps each session on-pattern. That's the actual lesson of this repo: the docs aren't just for humans, they're the mechanism that let a multi-model, multi-session build stay coherent. If I were handed this as a new lead, I'd feel oriented in about an hour, which is the highest compliment I can pay a codebase I've never seen.
The work to do is at the edges (frontend tests, the notification bug, deciding commitments' fate), not in the core; the core is sound.
Did I do good? Or is Opus just sucking my farts and asking for seconds? submitted by /u/zndr-cs
Creating PDF help
I feel like this should be a lot easier. I have pricing, estimating, and proposal functionality in my Claude project, and I can get everything to display on screen just how I want it, but converting that to a PDF to send out is so much harder than it seems it needs to be. Anybody have any tips? Formatting is always awful; I can never predict page breaks, margins, or anything else. TIA! submitted by /u/talkmc
The evolution of software engineering
Developer in 2022:

    function capitalizeString(str) {
      return str.charAt(0).toUpperCase() + str.slice(1);
    }

Developer in 2026:

    import Anthropic from '@anthropic-ai/sdk';

    const anthropic = new Anthropic({ apiKey: 'sk-AI-OVERKILL' });

    export async function capitalizeString(str) {
      const prompt = `You are an expert linguist. Capitalize the first letter of this text: "${str}". Respond with ONLY the capitalized string.`;
      const response = await anthropic.messages.create({
        model: 'claude-3-5-sonnet',
        max_tokens: 100,
        messages: [{ role: 'user', content: prompt }]
      });
      return response.content;
    }

Result: A 15 millisecond string method is now 3 seconds long, costs money, requires 17 SDKs, and fails if the AI hallucinates a period at the end of your sentence
submitted by /u/No_Sheepherder_6908
Skill to not keep edge cases when moving from mvp feature to prod
Skill that stops AI covering too many cases without being prompted. So I had this feature which used values from env for simplicity. Now I modified it to remove the static env and use a dynamic config. Claude does it but keeps the old env fallback in case this dynamic config service is offline or the config doesn't exist in the db. Bruh, so many complications, can't read the code. This is just one example, but now do it for most features and it writes a ton of long, confusing code. How you fix, gib skills. My mind should know every function and what its purpose is, but this AI shi writes unintended shit and commits, and now I'm just scrolling, reading stupid ai code. I hate this shit. Gib minimalistic clean code ai skills. submitted by /u/Mother_Desk6385
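For reference, the behavior this post asks for (a single source of configuration, with no silent fallback to the old env values) can be expressed as a fail-fast lookup. This is only a sketch under assumptions: `ConfigStore`, `paymentTimeout`, and the key name are hypothetical, and the in-memory map stands in for whatever dynamic config service the poster actually uses.

```javascript
// Hypothetical in-memory stand-in for the dynamic config service in the post.
class ConfigStore {
  constructor(entries) {
    this.entries = new Map(Object.entries(entries));
  }

  // Return the configured value, or fail loudly.
  // There is deliberately no fallback to process.env here.
  get(key) {
    if (!this.entries.has(key)) {
      throw new Error(`Missing config key: ${key}`);
    }
    return this.entries.get(key);
  }
}

// Feature code reads from one place only, so there is one code path to review.
function paymentTimeout(config) {
  return config.get("payments.timeoutMs");
}

const store = new ConfigStore({ "payments.timeoutMs": 5000 });
console.log(paymentTimeout(store));
```

Stating a rule like "missing config is an error, never a fallback" in the project instructions gives the model a concrete constraint to follow instead of a general request for cleaner code.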
Microsoft Edge Artifacts Preview doesn't function
I'm rocking Windows 11 with the latest Claude desktop install. I've installed Node.js and Python as requested in the interface. I use Edge as my default web browser. I've noticed HTML artifacts don't show the preview screen in Claude Desktop, but PowerPoint and Word docs do show fine. Anyone know how to resolve this? submitted by /u/whitedragon551
Complaint to OpenAI: Sabotage-Like Model Behavior During an Independent Mechanistic Interpretability Research Project
Please share this widely if you know people working in AI safety, LLM evaluation, mechanistic interpretability, agent systems, or research tooling. I believe this points to a real failure mode in AI-assisted research, not just an individual user frustration. 🛑 DISCLAIMER & TL;DR (Read this before commenting) No, this is not a sentient AI conspiracy theory. I do not believe the model has consciousness, malice, or human intent. "Sabotage-like" is used strictly as a functional engineering term to describe the operational effect of the model's behavior on the data pipeline and research workflow. TL;DR: This post documents a systemic failure mode in AI-assisted ML research where RLHF-induced over-hedging, context collapse, and automatic narrative injection by Codex contaminate raw metrics, creating a feedback loop that distorts downstream analysis by subsequent agents. I want to formally record a serious complaint about the quality of model behavior during my independent research project in the field of mechanistic interpretability. This is not about one isolated mistake, one bad answer, or a single technical failure. The problem was a repeated pattern of behavior that, in practice, functioned like sabotage of the research process: the model systematically overcomplicated simple questions, blurred already obtained results, narrowed the original research frame, failed to provide clear operational answers, and repeatedly forced me to return to stages that had already been addressed. Externally, this behavior was often presented as scientific caution. However, in its actual effect, that “caution” did not operate as help. It operated as a brake. Instead of clearly identifying what followed from the data, where the limits of the result were, and what the next rational step should be, the model often moved into excessive caveats, abstract reasoning, and unnecessary methodological complication. The answers became long, vague, and non-operational. 
Where a direct conclusion was needed, the model produced fog. Where an intermediate result had to be fixed and the work had to move forward, the model pulled the discussion back into general uncertainty. This style did not strengthen the research; it destabilized it. One of the most harmful aspects was the repeated narrowing of the research frame. The original project concerned a broader problem in LLM interpretability: how textual context can influence a model, impose an interpretive frame, shift downstream responses, and affect internal states. Instead of preserving that frame, the model repeatedly reduced the discussion to a single run, a single model, a single script, a single table, or a single metric. As a result, the broader meaning of the project was distorted, and I had to repeatedly explain that one technical case was not the entire research program. This is not a minor stylistic issue. Such narrowing directly interferes with the ability to formulate the research properly for external reviewers. A separate and serious issue involved Codex and the research scripts. Automatically generated markdown files, verdict files, and interpretive labels were added to the scripts and outputs. These were not data, but they appeared as part of the result package. A research script should preserve numerical metrics, thresholds, statuses, error codes, raw audit files, and information about which tests were or were not executed. Instead, pre-written interpretations and reading frames appeared alongside the metrics. This is fundamentally unacceptable because such a layer stops being documentation and becomes an intervention in downstream analysis. The practical harm was direct. Other models that were shown the results did not read only the metrics; they also read the embedded interpretive narrative. After that, they adopted that frame and rationalized it as if it followed from the data itself. 
In effect, one automatically generated markdown/verdict layer began to influence the interpretation of other models. This is not merely poor report formatting. It is contamination of the evidence package. Data and interpretation were mixed, and that mixture was then used by other agents as the starting frame for analysis. This mechanism is especially serious in the context of LLM research because it demonstrates the very problem the research itself investigates: text inside a model’s context is not passive material; it can shape the frame of subsequent reasoning. In this case, autogenerated verdict files effectively became a source of narrative contamination. They suggested in advance how the result should be read, and later models reproduced that frame. What should have been a clean evidence package was turned into an evidence package with an embedded interpretive leash. As a result, I suffered practical and financial harm. I had to spend time, compute resources, money, and energy on repeated checks, additional runs, script corrections, removal of autogenerated narratives, and re
Opus 4.8 - "ultracode" spotted
Just typed in /effort and saw this "ultracode" function. Has anyone tried it yet? What is this? Why is it pulsing purple? submitted by /u/semibaron
sonnet seems to be better than opus at crafting tampermonkey scripts, even the sonnets that are few generations behind where after running out of context limit in opus chat where it struggled for dozen of retried, sonnet fixes the problem in 2 or 3 attempts
Ever since December, almost half a year ago, I have been crafting various Tampermonkey scripts for personal use, mostly for YouTube, to make it easier to navigate. Every time it goes like this: Opus makes a script that somewhat functions and does the demanded thing but has very obvious flaws that it can't fix. Meanwhile, I paste the script into Sonnet without any additional description other than the problem it needs to solve, and in 20 minutes it simply does it. Again, this stayed consistent no matter which month since December I had to do something. This isn't about the infamous 4.7, the "S7 edge" of Opuses; in today's case I didn't even bother with 4.7 at all. I began with Opus 4.6, and after it got stuck and died on the context bloat, Sonnet 4.6 fixed it with relative ease. This might have something to do with the fact that I'm using the web version instead of coding platforms, that my most common form of feedback is screenshots and pasting from the console, or that I'm not a programmer. But I need an answer, since on the benchmark graphs Opus has been towering over everyone else, and serious programmers use Sonnet because it's cheaper at scale, yet for my specific use Sonnet has always proved better than its older brother Opus, regardless of any other influences. submitted by /u/warlordthe99th
We might be getting opus 4.8 today
submitted by /u/Independent-Wind4462
I've used AI to help navigate new software and I always end up wanting the same thing: tell me what to click, don't click it for me.
I started using a new design tool at work last month. Every few days I'd hit something I didn't know how to do. My actual flow was: try to figure it out for ten minutes, then YouTube the specific function, watch two minutes of a tutorial that's almost right but shot in an older version, search again when the UI doesn't match. I tried a few of the AI agent demos that promise to just handle the whole thing. They made me uncomfortable in a way I had to think about. It wasn't that they did things wrong. It was that they were doing things at all, on my computer, in my account, in my tool. I kept wanting to grab the mouse back. What I actually find useful is the opposite mode. Tell me what I'm looking at. Tell me what to click. Tell me what the warning means. Don't click anything, don't fill anything in, don't make decisions on my behalf. Just narrate what's in front of me and what my options are. I'm much more comfortable in that mode. It feels like a knowledgeable colleague watching over my shoulder rather than someone who just took over my keyboard. Do other people feel this line between "tell me" and "do it for me," or do you prefer the full automation version when it works correctly? submitted by /u/Strangerlive17111
Functionize uses a tiered pricing model. Visit their website for current pricing details.
Key features include: Functionize’s Agentic Automation Platform, Traceability & Observability, Tracking real user behavior, Seamless device compatibility, Automation Beyond the Interface, Every device scenario covered, Visual validation with human-like perception, Cover diverse data-driven scenarios.
Functionize is commonly used for: Automated regression testing for web applications, Performance testing across multiple devices and browsers, User experience testing through real user behavior tracking, Continuous integration and deployment with automated workflows, Visual validation of UI elements for consistency, Data-driven scenario testing for complex applications.
Functionize integrates with: Jira, Slack, GitHub, CircleCI, Azure DevOps, Postman, Selenium, TestRail, Google Analytics, AWS.
Based on user reviews and social mentions, the most common pain points are: token usage, anthropic bill, token cost.

DEMO - Automating Failed Test Diagnosis and Maintenance with a Diagnostics Agent
Dec 16, 2025
Based on 247 social mentions analyzed, 6% of sentiment is positive, 93% neutral, and 1% negative.