Transform complex, unstructured data into clean, AI-ready inputs. Connect to any source, process 64+ file types, and power your GenAI projects.
Based on the limited social mentions available, there's minimal specific user feedback about Unstructured as a software tool. The mentions primarily consist of YouTube references to "Unstructured AI" without detailed user opinions, and indirect references in discussions about unstructured data processing and RAG systems. One Hacker News post mentions building tools to simplify unstructured data search, suggesting there's demand in this space, but doesn't provide direct user sentiment about Unstructured itself. Without substantial user reviews or detailed social commentary, it's difficult to assess user satisfaction, pricing sentiment, or overall reputation for this tool.
Mentions (30d): 2 (1 this week)
Reviews: 0
Platforms: 4
GitHub stars: 14,357 (1,208 forks)
Features
Industry: information technology & services
Employees: 110
Funding Stage: Series B
Total Funding: $65.0M
GitHub followers: 1,451
GitHub repos: 41
GitHub stars: 14,357
npm packages: 20
HuggingFace models: 12
Launch HN: Captain (YC W26) – Automated RAG for Files
Hi HN, we’re Lewis and Edgar, building Captain to simplify unstructured data search (https://runcaptain.com). Captain automates the building and maintenance of file-based RAG pipelines. It indexes cloud storage like S3 and GCS, plus SaaS sources like Google Drive. There’s a quick walkthrough at https://youtu.be/EIQkwAsIPmc.

We also put up a demo site called “Ask PG’s Essays,” which lets you ask/search the corpus of pg’s essays to get a feel for how it works: https://pg.runcaptain.com. The RAG part of this took Captain about 3 minutes to set up.

Here are some sample prompts to get a feel for the experience:

“When do we do things that don't scale? When should we be more cautious?” https://pg.runcaptain.com/?q=When%20do%20we%20do%20things%20that%20don't%20scale%3F%20When%20should%20we%20be%20more%20cautious%3F

“Give me some advice, I'm fundraising” https://pg.runcaptain.com/?q=Give%20me%20some%20advice%2C%20I'm%20fundraising

“What are the biggest advantages of Lisp” https://pg.runcaptain.com/?q=what%20are%20the%20biggest%20advantages%20of%20Lisp

A good production RAG pipeline takes substantial effort to build, especially for file workloads. You have to handle ETL or text extraction, chunking, embedding, storage, search, re-ranking, inference, and often compliance and observability – all while optimizing for latency and reliability. It’s a lot to manage. grep works well in some cases, but for agents, semantic search provides significantly higher performance. Cursor uses both and reports 6.5%–23.5% accuracy gains from vector search over grep (https://cursor.com/blog/semsearch).

We’ve spent the past four years scaling RAG pipelines for companies, and Edgar’s work at Purdue’s NLP lab directly informed our chunking techniques. In conversations with dozens of engineers, we repeatedly saw DIY pipelines produce inconsistent results, even after weeks of tuning. Many teams lacked clarity on which retrieval strategies best fit their data.

We realized that a system to provision storage and embeddings, handle indexing, and continuously update pipelines to reflect the latest search techniques could remove the need for every team to rebuild RAG themselves. That idea became Captain.

In practice, one API call indexes URLs, cloud storage buckets, directories, or individual files. Under the hood, we convert everything to Markdown. For this, we’ve had good results with Gemini 3 Pro for images, Reducto for complex documents, and Extend for basic OCR. For embedding models, ‘gemini-embedding-001’ performed reasonably well at first, but we later switched to the Contextualized Embeddings from ‘voyage-context-3’. It produced more relevant results than even the newer Voyage 4 models because its chunk embeddings are encoded with awareness of the surrounding document context. We then applied Voyage’s ‘rerank-2.5’ as second-stage re-ranking, reducing 50 initial chunks to a final top 15 (configurable in Captain’s API). Dense embeddings are just half the picture; full-text search with RRF completes our hybrid retrieval. In the Captain API, these techniques are exposed through a single /query endpoint. Access controls can be configured via metadata filters, and page-number citations are returned automatically.

The stack is constantly changing, but the Captain API creates a standard interface for it.

You can try Captain free for one month and build your own pipelines at https://runcaptain.com. We’re looking for candid feedback, especially anything that can make it more useful, and look forward to your comments!
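The post doesn't include Captain's fusion code, but Reciprocal Rank Fusion (RRF), the technique it names for merging dense-embedding and full-text rankings, is a standard formula. A minimal sketch; the `k=60` constant and `top_n=15` are conventional defaults, not Captain's actual settings:

```python
def rrf_fuse(dense_ranking, keyword_ranking, k=60, top_n=15):
    """Reciprocal Rank Fusion: merge two ranked lists of chunk IDs.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by BOTH retrievers float to the top.
    """
    scores = {}
    for ranking in (dense_ranking, keyword_ranking):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A chunk that appears in only one ranking still gets a score, so hybrid retrieval degrades gracefully when one retriever misses.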
Pricing found: $0.03 / page
I built a GEO Auditor with Claude Code and here is the prompt and result
I love exploring new problem spaces, and Generative Engine Optimization (GEO) is one I’ve been looking into for a blog post I’m writing. I built a "GEO Auditor" using Claude Code to track how often specific brands are recommended by LLMs compared to their competitors. The tool link is below, and I wanted to share the prompt and logic Claude used to build it.

What it does: The tool pings the Claude, OpenAI, and Gemini APIs with specific category queries (e.g., "What are the best CRM tools?"). It then parses the responses to see if a specific brand is mentioned, identifies its position in the list, and calculates a 0-100 "Visibility Score". (Note: I've limited the AI calls for now since I'm still just exploring the idea.)

How I used Claude Code: I used Claude Code to scaffold the entire backend and worker logic. It handled: creating the FastAPI structure; setting up SQLAlchemy models for Postgres; implementing Redis/rq for background tasks so the API calls don't block the UI; writing the parsing logic to extract brand names from unstructured LLM text; triggering deploys via MCP.

The Prompt: I used this prompt in Claude Code to generate the core system:

Build me a GEO auditor SaaS — a FastAPI app that checks if AI models recommend a given product. It should:
- Have a web UI where users enter a product name and category
- Query Claude, OpenAI, and Gemini APIs with "What are the best [category] tools?"
- Parse each response to detect if the product is mentioned and at what position
- Calculate a visibility score (0-100)
- Store audits and results in Postgres via SQLAlchemy
- Use a Redis/rq background worker so API calls don't block
- Have a cron script that re-runs all audits daily
- Collect waitlist signups when no prior results exist
- Include a Dockerfile ready for deployment

Short screencast of how I developed it (shortened and anonymized; the real cast was 29 minutes): https://reddit.com/link/1shmpxv/video/ww7mc7uk1dug1/player

Deployment: To get Claude's code live I used PromptShip, a platform I'm building to take care of the infra. It connects via an MCP server, so I could stay in the terminal and just tell Claude to "deploy the app," which automatically provisioned the Postgres database, Redis, and SSL.

Project link: https://geo-auditor-pyde-prod.apps.promptship.dev

I'm happy to answer any questions about the scoring logic or the prompt structure! submitted by /u/Asleep-Carpet9030
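The post doesn't publish its parsing or scoring code, but the described logic (detect whether a brand appears, find its list position, map that to a 0-100 score) can be sketched. The position-to-score formula here is an assumption for illustration, not the tool's actual one:

```python
import re

def visibility_score(brand: str, responses: list[str]) -> int:
    """Average a per-response score across several LLM answers.

    Hypothetical scoring: 1st place in a numbered list = 100,
    each lower rank loses 20 points, absent brand = 0.
    """
    per_response = []
    for text in responses:
        position = None
        # Look for numbered-list lines like "2. Acme CRM — great for SMBs"
        for match in re.finditer(r"^\s*(\d+)\.\s*(.+)$", text, re.MULTILINE):
            if brand.lower() in match.group(2).lower():
                position = int(match.group(1))
                break
        per_response.append(0 if position is None else max(0, 100 - (position - 1) * 20))
    return round(sum(per_response) / len(per_response)) if per_response else 0
```

Real LLM output is messier than a clean numbered list (bold markers, prose mentions), so a production parser would need more patterns or an LLM-based extraction pass, as the post's Claude-generated code apparently does.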
How to save 80% on your claude bill with better context
been building web apps with claude lately and those token limits have honestly started hitting me too. i’m using claude 4.6 sonnet for a research tool, but feeding it raw web data was absolutely nuking my limits. i’m putting together the stuff that actually worked for me to save tokens and keep the bill down:

- switch to markdown first. stop sending raw html. use tools like firecrawl to strip out the nested divs and script junk so you only pay for the actual text.
- don't let your prompt cache go cold. anthropic’s prompt caching is a huge relief, but it only works if your data is consistent.
- watch out for the 200k token "premium" jump. anthropic now charges nearly double for inputs over 200k tokens on the new opus/sonnet 4.6 models. keep your context under that limit to avoid the surcharge.
- strip the nav and footer. the website’s "about us" and "careers" links in the footer are just burning your money every time you hit send.
- use jina reader for quick hits. for simple single-page reads, jina is a great way to get a clean text version without the crawler bloat.
- truncate your context. if a documentation page is 20k words, just take the first 5k. most of the "meat" is usually at the top anyway.
- clean your data with unstructured. if you are dealing with messy pdfs alongside web data, this helps turn the chaos into a clean schema claude actually understands.
- map before you crawl. don't scrape every subpage blindly. i use the map feature in firecrawl to find the specific documentation urls that actually matter for your prompt; if you use another tool, prefer doing this.
- use haiku for the "trash" work. use claude 4.5 haiku to summarize or filter data before feeding it into the expensive models like opus.
- use smart chunking. use llama-index to break your data into semantic chunks so you only retrieve the exact paragraph the ai needs for that specific prompt.
- cap your "extended thinking" depth. for opus 4.6, set thinking: {type: "adaptive"} with effort: "low" or "medium". the old budget_tokens param is deprecated on 4.6. thinking tokens are billed at the output rate, so if you leave effort on high, claude thinks hard on every single reply, including the simple ones, and your bill will hurt.
- set hard usage limits. set your spending tiers in the anthropic console so a buggy loop doesn't drain your bank account while you're asleep.

feel free to roast my setup or add better tips if you have them submitted by /u/Illustrious_Elk3705
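Several of the tips above (markdown-first, strip nav/footer, truncate) amount to one cheap preprocessing pass before the API call. A rough sketch with stdlib regexes rather than firecrawl's actual API; the character cap is an arbitrary example:

```python
import re

def shrink_context(html: str, max_chars: int = 20_000) -> str:
    """Token-saving pass: drop script/style/nav/footer, strip tags, truncate."""
    # remove whole elements that are almost never worth paying tokens for
    html = re.sub(r"(?is)<(script|style|nav|footer|header)\b.*?</\1>", " ", html)
    # strip remaining tags and collapse whitespace into plain text
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip()
    # cheap truncation: most of the "meat" is usually near the top
    return text[:max_chars]
```

A regex pass like this is brittle on malformed markup; for anything serious you'd reach for a real HTML parser or a markdown converter, but the token math is the same: pay only for the text you need.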
I built an Open Source version of Claude Managed Agents, all LLMs supported, fully API compatible
https://github.com/rogeriochaves/open-managed-agents

The Claude Managed Agents idea is great. I see more and more non-technical people around me using Claude to do things for them, but it's mostly one-off, so managed agents are great for easily building more repeatable, fully agentic workflows. But people will want to self-host, use other LLMs (maybe Codex or a local vLLM Gemma), and build on top of all the other open-source tooling: observability, routers, and so on. It's working pretty great, though I'm still polishing the rough edges. Contributions are welcome! submitted by /u/rchaves
a local workspace for data extraction/transformation with Claude
hi all! i built a tool that leverages Claude Code to do data transformation and structured data extraction over big datasets. this is most helpful if you have a lot of unstructured complex documents / logs to analyze and make sense of.

doing analysis over a large set of files is hard to do in a text-only terminal. firstly, if there are a lot of steps in your transformation pipeline, you want to be able to see the artifacts coming out of each step. second, using LLMs to do analysis can get quite expensive, and there needs to be some sort of budgeting tool to help with cost/token estimation.

folio solves this with a tabular review workspace that helps you view, steer, and approve these data operations. Claude Code is the main control panel and folio serves as a UI plugin to help humans and agents collaborate effectively. some users take customer support audio calls, emails, and texts, send them into folio, and do a series of extraction steps that help them organize and structure their data, which in turn helps generate insights. you can also take financial documents from private companies and extract relevant data for financial analysis, perform legal e-discovery, parse logs and social network interactions, etc.

more recently, Karpathy posted about personal knowledge bases, where you can generate wikis based on a set of documents. folio makes this super easy: all you have to do is ask Claude Code to bring your files into a folio workspace and then set up a pipeline that will extract the relevant data for your own wikis. folio is completely free and you can use it with your Anthropic API keys. submitted by /u/Spare-Schedule-9872
Using AI to untangle 10,000 property titles in Latam, sharing our approach and wanting feedback
Hey. Long post, sorry in advance. (Yes, I used an AI tool to help me lay this post out better.) I've been working with a real estate company that just inherited a huge mess from another real estate company that went bankrupt. I've been helping them for the past few months to figure out a plan, and we finally have something that feels solid. Sharing here because I'd genuinely like feedback before we go deep into the build.

Context: A Brazilian real estate company accumulated ~10,000 property titles across 10+ municipalities over decades. They developed a bunch of subdivisions over the years and kept absorbing other real estate companies along the way, each bringing their own land portfolios. Half is under one legal entity, half under a related one. Nobody really knows what they have; the company was founded in the 60s. Decades of poor management left behind:
- Hundreds of unregistered "drawer contracts" (informal sales never filed with the registry)
- Duplicate sales of the same properties
- Buyers claiming they paid off their lots through third parties, with no receipts from the company itself
- Fraudulent contracts and forged powers of attorney
- Irregular occupations and invasions
- ~500 active lawsuits (adverse possession claims, compulsory adjudication, evictions, duplicate sale disputes, 2 class action suits)
- Fragmented tax debt across multiple municipalities
- A large chunk of the physical document archive currently held by police as part of an old investigation into the former owners' practices

The company has tried to organize this before. It hasn't worked. The goal now is to get a real consolidated picture in 30-60 days. The team is 6 lawyers + 3 operators.

What we decided to do (and why): First instinct was to build the whole infrastructure upfront: database, automation, the works. We pushed back on that because we don't actually know the shape of the problem yet. Building a pipeline before you understand your data is how you end up rebuilding it three times, right? So with Claude's help we built the following plan, split into steps, toward a robust information aggregator (does it make sense or are we overcomplicating it?):

Step 1 - Physical scanning (should already be done in the insights phase). Documents will be partially organized by municipality already. We have a document scanner with ADF (automatic document feeder). The plan is to scan in batches by municipality, naming files with a simple convention: [municipality]_[document-type]_[sequence]

Step 2 - OCR. Run OCR through Google Document AI, Mistral OCR 3, AWS Textract, or another tool that makes more sense. Question: has anyone run any of these specifically on degraded Latin American registry documents?

Step 3 - Discovery (before building infrastructure). This is the decision we're most uncertain about. Instead of jumping straight to database setup, we're planning to feed the OCR output directly into AI tools with large context windows and ask open-ended questions first: Gemini 3.1 Pro (in NotebookLM or another interface) for broad batch analysis ("which lots appear linked to more than one buyer?", "flag contracts with incoherent dates", "identify clusters of suspicious names or activity", "help us see problems and solutions we aren't seeing"), with Claude Projects in parallel for the same. Anything else?

Step 4 - Data cleaning and standardization. Before anything goes into a database, the raw extracted data needs normalization:
- Municipality names written 10 different ways ("B. Vista", "Bela Vista de GO", "Bela V. Goiás") -> canonical form
- CPFs (Brazilian personal ID numbers) with and without punctuation -> standardized format
- Lot status described inconsistently -> fixed enum categories
- Buyer names with spelling variations -> fuzzy matched to a single entity
Tools: Python + rapidfuzz for fuzzy matching, Claude API for normalizing free-text fields into categories. Question: at 10,000 records with decades of inconsistency, is fuzzy matching + LLM normalization sufficient, or do we need a more rigorous entity resolution approach (e.g. Dedupe.io)?

Step 5 - Database. Stack chosen: Supabase (PostgreSQL + pgvector) with NocoDB on top. Three options were evaluated:
- Airtable - easiest to start, but data stored on US servers (LGPD concern for CPFs and legal documents), limited API flexibility, per-seat pricing
- NocoDB alone - open source, self-hostable, free, but adds server maintenance overhead
- Supabase - full PostgreSQL + authentication + API + pgvector in one place, $25/month flat, developer-first
We chose Supabase as the backend because pgvector is essential for the RAG layer (Step 7) and we didn't want to manage two separate databases. NocoDB sits on top as the visual interface for lawyers and data entry operators who need spreadsheet-like interaction without writing SQL. Each lot becomes a single entity (primary key) with relational links to: contracts, bu
Best model for OCR extraction of these types of docs
I am trying to automate extraction of information from unstructured docs for accounting management. What model and what prompting techniques do you recommend? submitted by /u/JIGS1620
I used Claude to tear apart a ChatGPT-generated business strategy. Here's what it caught and the prompt I reverse-engineered from the whole thing.
A friend of mine is working on his business and sent me a full strategy to hit $1M in revenue. He built the whole thing by going back and forth with ChatGPT; he's not very technical, he just had a long conversation until he had a plan. For what it is, ChatGPT did a solid job getting him to a first draft. But I wanted to see what Claude would do with it. So I dropped the full strategy into Claude and asked it to review, critique, and improve it where it saw fit.

Claude's assessment: ChatGPT was 85-90% there at a high level. But it found some real issues:
- Revenue projections were too optimistic. Claude flagged specific assumptions that didn't hold up
- The channel strategy was basically "be everywhere" with no sequencing or prioritization
- The pricing model had gaps that would've cost him real money
- A few of the "growth levers" were actually just repackaged generic advice

For each correction, Claude gave the reasoning: not just "this is wrong" but "here's why this doesn't work and here's what to do instead." Then it rebuilt the strategy with a revised plan and next steps. I sent the improved version back to my friend and he was fired up. But sitting there afterwards I thought: I'm not thinking big enough for my own business either. So I reverse-engineered the whole exchange into a reusable prompt that anyone can use for their own strategic assessment. Here it is:

Role: Act as a seasoned strategic business consultant with 20+ years advising founders, executives, and high-growth teams across industries. You specialize in identifying blind spots, unlocking overlooked growth levers, and reframing how leaders think about their business, market position, and long-term trajectory.

Action: Conduct a comprehensive strategic assessment of my business or professional situation. Challenge my current thinking, surface hidden opportunities, and provide a bold but grounded action plan that pushes me beyond incremental improvement toward transformative growth.

Context: My business/role: [describe your business, title, or professional situation]. Current revenue or stage: [startup, growth, mature, pivoting — include numbers if comfortable]. Industry: [your field]. Biggest current challenge: [what's keeping you stuck or what you're trying to solve]. What I've already tried: [past strategies, pivots, or investments]. Team size: [solo, small team, department, org-wide]. Time horizon: [90-day sprint, 1-year plan, 3-5 year vision]. Risk tolerance: [conservative, moderate, aggressive]. Resources available: [budget range, tools, partnerships, time commitment]. What "thinking bigger" means to me: [scale revenue, expand market, build a team, launch new product, personal brand, exit strategy, etc.].

Expectation: Deliver a strategic assessment that includes: (1) Honest Diagnosis — where the business actually stands vs. where I think it stands, including blind spots, (2) Market Position Audit — how I compare to competitors, what whitespace exists, and where the market is heading, (3) Three Bold Growth Levers — specific, non-obvious opportunities I'm likely underexploiting (not generic advice like "use social media"), (4) The "10x Question" — reframe my biggest challenge as a 10x opportunity and show what that path looks like, (5) 90-Day Momentum Plan — the 3-5 highest-leverage moves I should make in the next quarter, with sequencing, (6) Resource Optimization — how to get more from what I already have before spending more, (7) Risk/Reward Matrix — for each recommendation, what's the upside, downside, and effort level, (8) The One Thing — if I only do ONE thing from this assessment, what should it be and why. Keep the tone direct and strategic — like a $500/hour consultant giving real talk, not motivational fluff. Be specific to my situation, not generic.

Why this works well with Claude specifically: The prompt is structured using the RACE framework — Role, Action, Context, Expectation. Claude handles structured (even unstructured) prompts really well because of how it processes context, but not all AIs can; I wouldn't trust Copilot, for example, to do this. The "[fill in your details]" fields are doing the heavy lifting: they force you to give Claude enough real context to be specific instead of generic.

A few things I noticed comparing Claude's output to ChatGPT's on this same prompt:
- Claude is more willing to tell you hard truths. ChatGPT tends to validate your existing thinking. Claude will straight up say "your pricing model doesn't make sense because..."
- Claude's "10x Question" reframes tend to be more creative; it doesn't just scale up the existing plan, it rethinks the approach
- Claude is better at the Risk/Reward matrix because it actually weighs downsides honestly instead of hand-waving them

I've been using this for my own business planning (I build apps as a solopreneur) and Claude's outputs have been genuinely useful, especially the blind spots section. It caught things I'd been ignoring. Full disc
I built a way to avoid wasting plans and inspirations made by AI
Hey r/OpenAI. Over a year ago I realised that (with my love for ChatGPT and similar apps) I have lots of aspirations that I discuss with LLMs. Most of these conversations get to a point where we find a solution for how I can get started (usually in the form of a step-by-step plan that ChatGPT offers to make for me), but I very rarely actually execute on them. They get lost in threads, I only occasionally remember to look them up, and when I do, they're a pain to interact with due to being in plain-text format.

A common use case for me is learning or developing a skill. If I want to read deeply about a subject I'd love to use ChatGPT, but the conversation is unstructured and messy, and I don't retain much of the (albeit fascinating) information. It's also hard to dig into subjects in a structured way.

I then spent the last year or so building a web app which is basically a way to generate plans using AI and keep them in one place where you can interact with them and generate new information 'within' sub-tasks or 'parts' of plans. Through using it a lot myself, I realised I need two modes: one for 'to-do' or 'action' based plans, and another for learning, which has quizzes and revision cards etc.

I'd love to hear what you guys think of my proposed solution, since my main target audience is power users of AI tools like ChatGPT, and I'd love to hear whether you have had the same problem. If anyone is interested, I can provide more information in the comments, and if not, thanks for reading. submitted by /u/noobrunecraftpker
AI Customer Support: 6 Things I Changed After Analyzing the Claude Code Source Leak
The Claude Code source leak last week showed that Anthropic's AI coding tool runs on meticulous prompt engineering, not proprietary breakthroughs. I went through it and pulled out everything I could apply to my own Chatbase setup. Here's what I changed.

1. Overhauled my Text Snippets. Claude Code has file after file of extremely specific behavioral instructions covering edge cases, tone, escalation criteria, and things it should never say. I had 5 vague text snippets. I now have 20+ that mirror this approach: specific scenarios, exact phrasing for sensitive situations, explicit boundaries on what the agent can and cannot promise.

2. Started using Sentiment analytics. Claude Code uses a regex frustration detector that pattern-matches keywords like profanity, then logs an event. Chatbase has a Sentiment tab I had never opened. I now review it weekly. If Anthropic thinks basic frustration detection is worth shipping in a frontier product, I should be using the one I already have.

3. Built out Q&A pairs as structured response paths. Claude Code has around 25 tools, each giving the model a defined way to handle a specific task instead of improvising. My equivalent is Q&A pairs. I created explicit pairs for the most common and highest-stakes customer questions so the agent hits a tested answer instead of generating one from unstructured data.

4. Reviewing Chat Logs as pipeline iteration. Claude Code has an 11-step input-to-output pipeline from user input to final response. Everyone now is going to start building adversarial agents around this concept. I'm already doing it: I'm customizing a second agent whose sole job is to stress-test my primary support agent through that same multi-step validation process. The adversarial agent checks the primary agent's responses at each stage for hallucinations, policy violations, and bad escalation decisions before anything reaches the customer. This is where the real value of the 11-step architecture sits: not in making the agent smarter, but in catching where it's wrong before the customer sees it.

5. Connected Actions. The leak confirmed that Claude Code's value comes from connecting the model to real tools. I set up Actions for ticket creation, order lookups, and human escalation. My agent went from a talking FAQ to something that can actually resolve issues.

6. Cross-referencing Topics with my coverage. The Topics tab shows what customers are actually asking about. I cross-reference it with my Q&A pairs and Text Snippets. Any topic cluster I haven't explicitly covered is a gap where the agent will improvise, and that's where support agents fail.

What I skipped: anti-distillation poison pills (nobody is training a model on my agent lol), undercover mode (I want customers to know it's AI), and the Tamagotchi companion feature lmaooo. I'll post a follow-up in two weeks with resolution rate, escalation rate, and sentiment scores before vs after. Anyone else make changes after the leak? submitted by /u/Professional-Dirt-66
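The "regex frustration detector" described in point 2 is simple to replicate. The leaked tool's actual patterns aren't public, so the keyword list here is an illustrative guess:

```python
import re

# Hypothetical patterns; the leaked detector's real keyword list isn't public
FRUSTRATION_RE = re.compile(
    r"(?i)\b(wtf|ffs|ridiculous|useless|this (doesn'?t|does not) work)\b|!{3,}"
)

def is_frustrated(message: str) -> bool:
    """Cheap frustration check: fire (and log) an event when any pattern matches."""
    return bool(FRUSTRATION_RE.search(message))
```

The point of a detector this crude is cost: it runs on every message for free, and the flagged conversations are the ones worth routing to sentiment review or human escalation.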
Meta's new structured prompting technique makes LLMs significantly better at code review — boosting accuracy to 93% in some cases
Deploying AI agents for repository-scale tasks like bug detection, patch verification, and code review requires overcoming significant technical hurdles. One major bottleneck: the need to set up dynamic execution sandboxes for every repository, which are expensive and computationally heavy. Using large language model (LLM) reasoning instead of executing the code is rising in popularity to bypass this overhead, yet it frequently leads to unsupported guesses and hallucinations. To improve execution-free reasoning, researchers at Meta introduce "semi-formal reasoning," a structured prompting technique. This method requires the AI agent to fill out a logical certificate by explicitly stating premises, tracing concrete execution paths, and deriving formal conclusions before providing an answer. The structured format forces the agent to systematically gather evidence and follow function calls before drawing conclusions. This increases the accuracy of LLMs in coding tasks and significantly reduces errors in fault localization and codebase question-answering. For developers using LLMs in code review tasks, semi-formal reasoning enables highly reliable, execution-free semantic code analysis while drastically reducing the infrastructure costs of AI coding systems. Agentic code reasoning Agentic code reasoning is an AI agent's ability to navigate files, trace dependencies, and iteratively gather context to perform deep semantic analysis on a codebase without running the code. In enterprise AI applications, this capability is essential for scaling automated bug detection, comprehensive code reviews, and patch verification across complex repositories where relevant context spans multiple files. The industry currently tackles execution-free code verification through two primary approaches. The first involves unstructured LLM evaluators that try to verify code either directly or by training specialized LLMs as reward models to approximate test outcomes. The major drawback is their
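The article doesn't reproduce Meta's exact certificate format, but the structure it describes (state premises, trace a concrete execution path, derive a conclusion before answering) can be sketched as a prompt scaffold. The section names and wording below are illustrative, not the paper's:

```python
CERTIFICATE_TEMPLATE = """Before answering, fill out this reasoning certificate:

PREMISES:
- List each fact you rely on, citing file and line (e.g. src/utils.py:42).

EXECUTION TRACE:
- Step through the relevant call chain concretely, one call per line,
  noting argument and return values you can justify from the code.

CONCLUSION:
- State the conclusion that follows formally from the premises and trace.

ANSWER:
- Only now give the final answer, consistent with the certificate above.

Task: {task}
"""

def build_review_prompt(task: str) -> str:
    """Wrap a code-review question in the semi-formal reasoning scaffold."""
    return CERTIFICATE_TEMPLATE.format(task=task)
```

The idea is that forcing the model to commit to cited premises and a traced path before answering makes unsupported guesses visible (an empty or hand-wavy trace) instead of hidden inside a fluent answer.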
Advantage of Workflows over No-Workflows in Claude Code explained
This video demonstrates the difference between using Claude Code with structured workflows (CLAUDE.md, custom slash commands, hooks, subagents) vs a no-workflows / vibe-coding approach. I built a Claude Code Hooks project to show both approaches side by side. Key topics covered:
- How CLAUDE.md files guide Claude Code's behavior
- Custom slash commands for repeatable tasks
- Hooks for automated pre/post actions
- Why agentic engineering with Claude Code produces more consistent results than unstructured prompting
Complete video: https://www.youtube.com/watch?v=O8PVI6JsfFc
Claude Code Hooks repo: https://github.com/shanraisshan/claude-code-hooks
submitted by /u/shanraisshan
I built an open-source CLI that uses Claude Haiku to automate Xero expense auditing
Hi all, I use Xero for accounting, so this may not apply to those who use Pleo, Brex, etc! Expense auditing like checking descriptions, tax codes, currency conversions, matching receipts has always taken up a lot of my time. So I (semi-)automated it with Claude and a Python CLI. The design principle I used: deterministic code first, then AI to fill in the gaps. There's some config you need to enter (missing fields, invalid tax rates, duplicates, zero amounts). Claude Haiku gets called when structured data is lacking (e.g. unstructured receipts). Limiting LLM usage keeps costs to a few cents per audit run. Where I used Haiku: Triaging flagged bills: After rules flag issues, Haiku reviews the bill and returns structured JSON suggestions with a confidence score. Anything below 0.7 gets filtered out. Receipt vision: Haiku reads receipt/invoice images, extracts supplier names and line item descriptions. Supplier names get matched against my existing Xero contacts. Foreign currency detection: Haiku identifies the currency from the receipt, then deterministic code runs and fetches historical ECB rates, converts the amount and attaches the rate CSV as audit evidence. Natural-language bill editing: instead of clicking through Xero, you type an English instruction like "set description to monthly subscription fee" and Haiku converts it to a JSON patch. Nothing auto-applies unless you explicitly say --auto-correct. I liked the idea of human-in-the-loop. It runs on Haiku 4.5 and I've made it open source. It's quite cheap and I'm quite happy to reduce my time spent on expenses for a few cents...! (You mileage may vary.) GitHub: https://github.com/logicalicy/xero-expense-audit I also wrote up the full thinking here: https://blog.mariohayashi.com/p/using-ai-to-make-xero-expense-auditing Hope this might be helpful to someone...! 
If anyone else is using Claude for this kind of structured-but-messy business automation, the "rules first, LLM as fallback" pattern has worked really well for my use case. submitted by /u/logicalicy
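The "rules first, LLM as fallback" pattern described above is easy to sketch. The following is an illustrative Python outline, not code from the linked repo: the rule checks, field names, valid tax rates, and the stubbed `ask_llm` callback are all assumptions; only the 0.7 confidence floor comes from the post.

```python
# Hypothetical sketch of the "rules first, LLM as fallback" triage pattern.
# flag rules, field names, and ask_llm are illustrative stand-ins, not a
# real Xero or Anthropic API.

CONFIDENCE_FLOOR = 0.7  # suggestions below this are discarded, as in the post

def rule_checks(bill: dict) -> list[str]:
    """Deterministic checks run first; clean bills never cost tokens."""
    issues = []
    if not bill.get("description"):
        issues.append("missing description")
    if bill.get("amount", 0) == 0:
        issues.append("zero amount")
    if bill.get("tax_rate") not in {0.0, 0.05, 0.20}:  # assumed valid rates
        issues.append("invalid tax rate")
    return issues

def triage(bill: dict, ask_llm) -> list[dict]:
    """Only rule-flagged bills reach the LLM; low-confidence suggestions
    are filtered out before a human ever reviews them."""
    issues = rule_checks(bill)
    if not issues:
        return []  # deterministic path: nothing to do, no LLM call
    suggestions = ask_llm(bill, issues)  # structured JSON with confidence
    return [s for s in suggestions if s["confidence"] >= CONFIDENCE_FLOOR]

# Usage with a stubbed LLM:
fake_llm = lambda bill, issues: [
    {"field": "description", "value": "monthly subscription fee", "confidence": 0.9},
    {"field": "tax_rate", "value": 0.20, "confidence": 0.4},  # filtered out
]
kept = triage({"amount": 12.5, "tax_rate": 0.99}, fake_llm)
```

The key cost property is that the deterministic path returns early: a bill that passes every rule never triggers an LLM call at all.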
[D] Extracting time-aware commitment signals from conversation history — implementation approaches?
Working on a system that saves key context from multi-model conversations (across GPT, Gemini, Grok, Deepseek, Claude) to a persistent store. The memory layer is working – the interesting problem I'm now looking at is extracting "commitments" from unstructured conversation and attaching temporal context to them. The goal is session-triggered proactive recall: when a user logs in, the system surfaces relevant unresolved commitments from previous sessions without being prompted.

The challenges I'm thinking through:

- How to reliably identify commitment signals in natural conversation ("I'll finish this tonight" vs a casual mention)
- Staleness logic: when does a commitment expire or become irrelevant?
- Avoiding false positives that make the system feel intrusive

Has anyone implemented something similar? Interested in approaches to the NLP extraction side specifically, and any papers on commitment/intention detection in dialogue that are worth reading.

submitted by /u/Beneficial-Cow-7408
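One common baseline for the extraction side is a rule-based first pass that surfaces obvious commitment phrases and attaches an expiry for the staleness logic, before any learned model is involved. This is a hypothetical sketch: the regex pattern, time hints, and staleness windows are assumptions, not from the post.

```python
# Illustrative rule-based first pass at commitment detection. The pattern
# and staleness windows below are assumptions for demonstration only.
import re
from datetime import datetime, timedelta

COMMITMENT_PATTERN = re.compile(
    r"\bI('| wi)ll\s+(?P<action>\w[\w\s]{2,60}?)"
    r"\s+(?P<when>tonight|today|tomorrow|this week|by \w+)",
    re.IGNORECASE,
)

# How long each time hint stays "live" before the commitment goes stale.
STALE_AFTER = {
    "tonight": timedelta(days=1),
    "today": timedelta(days=1),
    "tomorrow": timedelta(days=2),
    "this week": timedelta(days=7),
}

def extract_commitments(text: str, said_at: datetime) -> list[dict]:
    """Return commitment candidates with a computed expiry for staleness logic."""
    out = []
    for m in COMMITMENT_PATTERN.finditer(text):
        when = m.group("when").lower()
        ttl = STALE_AFTER.get(when, timedelta(days=7))  # default window
        out.append({
            "action": m.group("action").strip(),
            "due_hint": when,
            "expires": said_at + ttl,  # recall skips anything past this
        })
    return out

now = datetime(2024, 1, 1, 9, 0)
found = extract_commitments("Sure, I'll finish the report tonight.", now)
```

A pass like this is high-precision but low-recall, so it works better as a candidate generator feeding a classifier (or an LLM judge) than as the final filter; it also gives you labeled seeds for training data.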
Launch HN: Captain (YC W26) – Automated RAG for Files
Hi HN, we’re Lewis and Edgar, building Captain to simplify unstructured data search (https://runcaptain.com). Captain automates the building and maintenance of file-based RAG pipelines. It indexes cloud storage like S3 and GCS, plus SaaS sources like Google Drive. There’s a quick walkthrough at https://youtu.be/EIQkwAsIPmc.

We also put up a demo site called “Ask PG’s Essays,” which lets you ask/search the corpus of pg’s essays to get a feel for how it works: https://pg.runcaptain.com. The RAG part of this took Captain about 3 minutes to set up.

Here are some sample prompts to get a feel for the experience:

- “When do we do things that don't scale? When should we be more cautious?” – https://pg.runcaptain.com/?q=When%20do%20we%20do%20things%20that%20don't%20scale%3F%20When%20should%20we%20be%20more%20cautious%3F
- “Give me some advice, I'm fundraising” – https://pg.runcaptain.com/?q=Give%20me%20some%20advice%2C%20I'm%20fundraising
- “What are the biggest advantages of Lisp” – https://pg.runcaptain.com/?q=what%20are%20the%20biggest%20advantages%20of%20Lisp

A good production RAG pipeline takes substantial effort to build, especially for file workloads. You have to handle ETL or text extraction, chunking, embedding, storage, search, re-ranking, inference, and often compliance and observability – all while optimizing for latency and reliability. It’s a lot to manage. grep works well in some cases, but for agents, semantic search provides significantly higher performance.
Cursor uses both and reports 6.5%–23.5% accuracy gains from vector search over grep (https://cursor.com/blog/semsearch).

We’ve spent the past four years scaling RAG pipelines for companies, and Edgar’s work at Purdue’s NLP lab directly informed our chunking techniques. In conversations with dozens of engineers, we repeatedly saw DIY pipelines produce inconsistent results, even after weeks of tuning. Many teams lacked clarity on which retrieval strategies best fit their data.

We realized that a system to provision storage and embeddings, handle indexing, and continuously update pipelines to reflect the latest search techniques could remove the need for every team to rebuild RAG themselves. That idea became Captain.

In practice, one API call indexes URLs, cloud storage buckets, directories, or individual files. Under the hood, we convert everything to Markdown. For this, we’ve had good results with Gemini 3 Pro for images, Reducto for complex documents, and Extend for basic OCR. For embedding models, ‘gemini-embedding-001’ performed reasonably well at first, but we later switched to the contextualized embeddings from ‘voyage-context-3’. It produced more relevant results than even the newer Voyage 4 models because its chunk embeddings are encoded with awareness of the surrounding document context. We then apply Voyage’s ‘rerank-2.5’ as second-stage re-ranking, reducing 50 initial chunks to a final top 15 (configurable in Captain’s API). Dense embeddings are just half the picture; full-text search fused via RRF completes our hybrid retrieval. In the Captain API, these techniques are exposed through a single /query endpoint. Access controls can be configured via metadata filters, and page-number citations are returned automatically.

The stack is constantly changing, but the Captain API provides a standard interface over it.
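Reciprocal rank fusion (RRF), which the post names for merging the dense-vector and full-text result lists, is simple to sketch. This is a generic illustration, not Captain's implementation; the k = 60 constant is the common default from the original RRF paper, and the document ids are made up.

```python
# Generic reciprocal rank fusion: merge several ranked lists of doc ids
# into one fused ranking. Each list contributes 1 / (k + rank) per doc.

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists; docs ranked well by several retrievers rise."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]    # nearest-neighbour order from embeddings
keyword = ["d2", "d3", "d5"]  # BM25 / full-text order
fused = rrf_fuse([dense, keyword])
```

Because scores depend only on rank positions, RRF needs no score normalization between the two retrievers, which is why it is a popular default for hybrid retrieval; a document ranked highly by both lists floats to the top even if neither ranks it first.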
You can try Captain free for one month and build your own pipelines at https://runcaptain.com. We’re looking for candid feedback, especially anything that would make it more useful, and we look forward to your comments!
[R] IDP Leaderboard: Open benchmark for document AI across 16 VLMs, 9,000+ documents, 3 benchmark suites
We're releasing the IDP Leaderboard, an open evaluation framework for document understanding tasks: 16 models tested across OlmOCR, OmniDoc, and our own IDP Core benchmark (covering KIE, table extraction, VQA, OCR, classification, and long-document processing).

Key results:

- Gemini 3.1 Pro leads overall (83.2), but the margin is tight: the top 5 are within 2.4 points.
- Cheaper model variants (Flash, Sonnet) produce nearly identical extraction quality to flagship models. The differentiation only appears on reasoning-heavy tasks like VQA.
- GPT-5.4 shows a significant jump over GPT-4.1 (70 to 81 overall, 42% to 91% on DocVQA).
- Sparse unstructured tables remain the hardest task; most models are below 55%.
- Handwriting OCR tops out at 76%.

We also built a Results Explorer that shows ground truth alongside every model's raw prediction for every document – not just scores. This helps you decide which model works for you by actually seeing the predictions and the ground truths.

Findings: https://nanonets.com/blog/idp-leaderboard-1-5/
Datasets: huggingface.co/collections/nanonets/idp-leaderboard
Leaderboard + Results Explorer: idp-leaderboard.org

submitted by /u/shhdwi
Unstructured offers a free tier; pricing found: $0.03 per page.
Key features include Extract and Transform. The product site also displays press coverage from CB Insights, Forbes, Fast Company, and Gartner.
Unstructured has a public GitHub repository with 14,357 stars.
Based on user reviews and social mentions, the topics most commonly discussed alongside Unstructured are: large language models, LLMs, AI agents, and Claude.
Based on 20 social mentions analyzed, sentiment is 0% positive, 100% neutral, and 0% negative.
Matt Welsh (CEO at Fixie AI) – 1 mention