Your daily dose of AI research from AK
VibeVoice synthesizes long-form multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer, achieving superior performance and fidelity. A multi-agent framework using large language models for stock trading simulates real-world trading firms, improving performance metrics like cumulative returns and Sharpe ratio. A large language model adapted for time-series forecasting achieves near-optimal zero-shot performance on diverse datasets across different time scales and granularities. VOID is a video object removal framework that uses vision-language models and video diffusion models to generate physically plausible scenes by leveraging causal reasoning and counterfactual reasoning. LightRAG improves Retrieval-Augmented Generation by integrating graph structures for enhanced contextual awareness and efficient information retrieval, achieving better accuracy and response times.
DeepScientist autonomously conducts scientific discovery through Bayesian Optimization, surpassing human state-of-the-art methods on multiple AI tasks. The AI Scientist-v2 autonomously proposes hypotheses, performs experiments, analyzes data, and writes peer-reviewed scientific papers, marking the first fully AI-generated paper accepted by a conference. A large-scale dynamic dataset derived from AAA games is introduced to improve generative inverse and forward rendering, featuring high-resolution synchronized RGB and G-buffer data alongside a novel VLM-based evaluation method that correlates well with human judgment.
built paper-mcp to share notes with other agents or users anywhere
Claude built a web app where I can send notes through a connected MCP from anywhere, and I can share notes with my friends. It's pretty simple, mostly a blank canvas for you to do what you want with other users or agents you share the link with. For example, I had my friend's agent on Cursor share a project he was working on, and my agent in Claude Code used the MCP to read the notes and we replied back. You could also use it to save notes between machines. Maybe send your friend's agent a prompt and they can read it and give feedback? Sharing prompts or ideas can be a bit easier when you have a meeting point like this. Love to see what people do with it! Very easy setup, works anywhere — BYOA (Bring Your Own Agent). Any AI assistant with an internet connection can help you set up with a unique API key. You don't need an account to read any shared notes. All notes are public and can be shared or joined using the page link. You can sign in with a GitHub or Google account to create and save pages. The site is all real-time and no refresh is needed. curl -fsSL https://paper.ruixen.app/agent.md submitted by /u/Thin_Beat_9072 [link] [comments]
I spent a week trying to make Claude write like me, or: How I Learned to Stop Adding Rules and Love the Extraction
I've been staring at Claude's output for ten minutes and I already know I'm going to rewrite the whole thing. The facts are right. Structure's fine. But it reads like a summary of the thing I wanted to write, not the thing itself. I used to work in journalism (mostly photojournalism, tbf, but I've still had to work on my fair share of copy), and I was always the guy you'd ask to review your papers in college. I never had trouble editing. I could restructure an argument mid-read, catch where a piece lost its voice, and I knew what bad copy felt like. I just can't produce good copy from nothing myself. Blank page syndrome, the kind where you delete your opening sentence six times and then switch tabs to something else. Claude solved that problem completely and replaced it with a different one: the output needed so much editing to sound human that I was basically rewriting it anyway. Traded the blank page for a full page I couldn't use. I tried the existing tools. Humanizers, voice cloners, style prompts. None of them worked. So I built my own. Sort of. It's still a work in progress, which is honestly part of the point of this post.

TLDR: I built a Claude Code plugin that extracts your writing voice from your own samples and generates text close to that voice, with additional review agents to keep things on track. Along the way I discovered that beating AI detectors and writing well are fundamentally opposed goals, at least for now (this problem is baked into how LLMs generate tokens). So I stopped trying to be undetectable and focused on making the output as good as I could. The plugin is open source: https://github.com/TimSimpsonJr/prose-craft

The Subtraction Trap

I started with a file called voice-dna.md that I found somewhere on Twitter or Threads (I don't remember where, but if you're the guy I got it from, let me know and I'll be happy to give you credit).
It had pulled Wikipedia's "Signs of AI writing" page, turned every sign into a rule, and told Claude to follow them. No em dashes. Don't say "delve." Avoid "it's important to note." Vary your sentence lengths, etc. In fairness, the resulting output didn't have em dashes or "delve" in it. But that was about all I could say for it. What it had instead was this clipped, aggressive tone that read like someone had taken a normal paragraph and sanded off every surface. Claude followed the rules by writing less, connecting less. Every sentence was short and declarative because the rules were all phrased as "don't do this," and the safest way to not do something is to barely do anything. This is the subtraction trap. When you strip away the AI tells without replacing them with anything real, the absence itself becomes a tell. The text sounded like a person trying very hard not to sound like AI, which (I'd later learn) is its own kind of signature. I ran it through GPTZero. Flagged. Ran it through 4 other detectors. Flagged on the ones that worked at all against Claude. The subtraction trap in action: the markers were gone, but the detectors didn't care. The output didn't sound like me, and the detectors could still see through it. Two problems. I figured they were related.

Researching what strong writing actually does

I went and read a range of published writers across advocacy, personal essay, explainer, and narrative styles, trying to figure out what strong writing actually does at a structural level (not just "what it avoids," which was the whole problem with voice-dna.md). I used my research workflow to systematically pull apart sentence structure, vocabulary patterns, rhetorical devices, and tonal control. It turns out that the thing that makes writing feel human is structural unpredictability. Paragraph shapes, sentence lengths, the internal architecture of a section, all of it needs to resist settling into a rhythm that a compression algorithm could predict.
The other findings (concrete-first, deliberate opening moves, naming, etc.) mattered too, but they were easier to teach. Unpredictability was the hard one. I rebuilt the skill around these craft techniques instead of the old "don't" rules. The output was better. MUCH better. It had texture and movement where voice-dna.md had produced something flat. But when I ran it through detectors, the scores barely moved.

The optimization loop

The loop looked like this: generator produces text, detection judge scores it, goal judges evaluate quality, editor rewrites based on findings. I tested 5 open-source detectors against Claude's output: ZipPy, Binoculars, RoBERTa, adaptive-classifier, and GPTZero. Most of them completely failed. ZipPy couldn't tell Claude from a human at all. RoBERTa was trained on GPT-2-era text and was basically guessing. Only adaptive-classifier showed any signal, and externally, GPTZero caught EVERYTHING. 7 iterations and 2 rollbacks later, I had tried genre-specific registers, vocabulary constraints, and think-aloud consolidation where the model reasons through its
I "Vibecoded" Karpathy’s LLM Wiki into a native Android/Windows app to kill the friction of personal knowledge bases.
A few days ago, Andrej Karpathy’s post on "LLM Knowledge Bases" went viral. He proposed a shift from manipulating code to manipulating knowledge: using LLMs to incrementally compile raw data into a structured, interlinked graph of markdown files. I loved the idea and started testing it out. It worked incredibly well, and I decided this was how I wanted to store all my research moving forward. But the friction was killing me. My primary device is my phone, and every time I found a great article or paper, I had to wait until I was at my laptop, copy the link over, and run a mess of scripts just to ingest one thing. I wanted the "Knowledge wiki" in my pocket. 🎒 I’m not a TypeScript developer, but I decided to "vibecode" the entire solution into a native app using Tauri v2 and LangGraph.js. After a lot of back-and-forth debugging and iteration, I’ve released LLM Wiki.

How it works with different sources: The app is built to be a universal "knowledge funnel." I’ve integrated specialized extractors for different media:

* PDFs: It uses a local worker to parse academic papers and reports directly on-device.
* Web Articles: I’ve integrated Mozilla’s Readability engine to strip the "noise" from URLs, giving the LLM clean markdown to analyze.
* YouTube: It fetches transcripts directly from the URL. You can literally share a 40-minute deep-dive video from the YouTube app into LLM Wiki, and it will automatically document the key concepts and entities into your graph while you're still watching.

The "Agentic" Core: Under the hood, it’s powered by two main LangGraph agents. The Ingest Agent handles the heavy lifting of planning which pages to create or update to avoid duplication. The Lint Agent is your automated editor—it scans for broken links, "orphan" pages that aren't linked to anything, and factual contradictions between different sources, suggesting fixes for you to approve.
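The web-article step above (strip the noise, keep the article text) can be sketched in a few lines. This is a deliberately simplified stdlib stand-in for Mozilla's Readability, not the actual engine, and the list of "noise" tags is my own guess:

```python
from html.parser import HTMLParser

class ArticleText(HTMLParser):
    """Toy Readability stand-in: keep text inside <p> tags, drop common noise tags."""
    SKIP = {"script", "style", "nav", "aside", "footer"}  # assumed noise list

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # >0 while inside a noise element
        self.in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p and not self.skip_depth:
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = ArticleText()
    parser.feed(html)
    return " ".join(c for c in parser.chunks if c)

html = "<nav>menu</nav><p>Key finding one.</p><script>x()</script><p>Key finding two.</p>"
print(extract_text(html))  # -> Key finding one. Key finding two.
```

The real Readability engine scores candidate nodes by link density and text length rather than hard-coding tags, but the input/output contract (messy HTML in, clean text for the LLM out) is the same.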
Check it out (Open Source): The app is fully open-source and bring-your-own-key (OpenAI, Anthropic, Google, or any custom endpoint). Since I vibecoded this without prior TS experience, there will definitely be some bugs, but it’s been incredibly stable for my own use cases. GitHub (APK and EXE in the Releases): https://github.com/Kellysmoky123/LlmWiki If you find any issues or want to help refine the agents, please open an issue or a PR. I'd love to see where we can take this "compiled knowledge" idea! submitted by /u/kellysmoky [link] [comments]
Looking to build a forex trading bot with CC
Hey everyone, I’m using Claude Code to help me build an automated Forex trading bot. The core strategy is mapped out, but I need to backtest it properly before I even think about paper trading. I’d love your recommendations on a few things:

• Forex Backtesting APIs: What’s the best/most reliable API for high-quality historical Forex data (OANDA, Polygon, Dukascopy, etc.)? Are there any free or affordable options that don't compromise on data quality?
• Architecture: How do you cleanly structure your codebase (separating data ingestion, strategy logic, risk management, and execution)?
• Custom vs Framework: Should I have Claude build a custom backtesting engine from scratch, or integrate with an existing framework like Backtrader, VectorBT, or directly via MetaTrader 5 Python integration?

Any tips or specific tool recommendations would be hugely appreciated. Thanks! submitted by /u/No-Cry-8657 [link] [comments]
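On the architecture question, the layered separation asked about above can be sketched as independent functions with narrow interfaces. Everything here (the SMA-crossover strategy, the fixed-fractional sizing, the numbers) is purely illustrative, not a trading recommendation:

```python
from dataclasses import dataclass

@dataclass
class Bar:
    """Data-ingestion layer emits normalized bars (only close shown here)."""
    close: float

def sma_crossover(history, fast=3, slow=5):
    """Strategy layer: +1 long, -1 short, 0 flat (illustrative toy rule)."""
    if len(history) < slow:
        return 0
    f = sum(b.close for b in history[-fast:]) / fast
    s = sum(b.close for b in history[-slow:]) / slow
    return 1 if f > s else -1

def size_position(signal, equity, risk_pct=0.01):
    """Risk layer: fixed-fractional sizing, independent of the strategy."""
    return signal * equity * risk_pct

def backtest(bars):
    """Execution/accounting layer: mark-to-market the held position each bar."""
    equity, pos, history = 10_000.0, 0.0, []
    for prev, bar in zip(bars, bars[1:]):
        history.append(prev)
        equity += pos * (bar.close - prev.close)       # PnL on position held into this bar
        pos = size_position(sma_crossover(history), equity)
    return equity

bars = [Bar(c) for c in [1.10, 1.11, 1.12, 1.11, 1.13, 1.14, 1.12, 1.15]]
print(round(backtest(bars), 2))
```

The point of the split is that each layer can be swapped out (e.g., replace `sma_crossover` with the mapped-out strategy, or `Bar` with OANDA candles) without touching the others, which is also what frameworks like Backtrader enforce.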
Dream team memory handling — what's new in CC 2.1.98 (+2,045 tokens)
NEW: System Prompt: Communication style — Added guidelines for giving brief user-facing updates at key moments during tool use, writing concise end-of-turn summaries, matching response format to task complexity, and avoiding comments and planning documents in code.
NEW: System Prompt: Dream team memory handling — Added instructions for handling shared team memories during dream consolidation, including deduplication, conservative pruning rules, and avoiding accidental promotion of personal memories.
NEW: System Prompt: Exploratory questions — analyze before implementing — Added instructions for Claude to respond to open-ended questions with analysis, options, and tradeoffs instead of jumping to implementation, waiting for user agreement before writing code.
NEW: System Prompt: User-facing communication style — Added detailed guidelines for writing clear, concise, and readable user-facing text including prose style, update cadence, formatting rules, and audience-aware explanations.
NEW: Tool Description: Background monitor (streaming events) — Added description for a background monitor tool that streams stdout events from long-running scripts as chat notifications, with guidelines on script quality, output volume, and selective filtering.
Agent Prompt: Dream memory consolidation — Added support for an optional transcript source note displayed after the transcripts directory path.
Agent Prompt: Dream memory pruning — Added conservative pruning rules for team/ subdirectory memories: only delete when clearly contradicted or superseded by a newer team memory, never delete just because unrecognized or irrelevant to recent sessions, and never move personal memories into team/.
Skill: /dream nightly schedule — Minor refactor to include memory directory reference in the consolidation configuration.
System Prompt: Advisor tool instructions — Minor wording updates: clarified tool invocation syntax, broadened 'before writing code' to 'before writing,' and updated several examples and descriptions for generality (e.g., 'reading code' → 'fetching a source,' 'the code does Y' → 'the paper states Y').

Details: https://github.com/Piebald-AI/claude-code-system-prompts/releases/tag/v2.1.98 Regular updates at https://x.com/PiebaldAI submitted by /u/Dramatic_Squash_3502 [link] [comments]
I built a CLI that gives Claude Code structured access to 8 biological databases — no more hallucinated API calls
I work in bioinformatics and got tired of my AI agent (Claude Code) struggling every time it needed to query NCBI, UniProt, or KEGG — it would try to construct E-utilities URLs from memory, guess XML schemas, and hallucinate field names. So I built a CLI specifically designed for agents to call via subprocess. It's called biocli. One command, structured JSON out:

biocli aggregate gene-dossier TP53 -f json

That single call queries NCBI Gene, UniProt, KEGG, STRING, PubMed, and ClinVar in parallel and returns a single JSON envelope with gene summary, protein function, pathways, interactions, recent papers, and clinical variants. The part that makes it agent-friendly isn't just "it outputs JSON" — it's the contract:

* Every workflow command returns the same envelope shape: { data, ids, sources, warnings, queriedAt, organism, query }. The agent parser never needs to branch on command type.
* biocli list -f json returns the full 55-command catalog with per-command argument schemas (name, type, required, default, help text). The agent can discover capabilities at runtime without reading docs.
* biocli schema returns the JSON Schema for the result envelope.
* biocli verify --smoke -f json is a preflight check the agent can run before planning.
* Warnings go to stderr, payload goes to stdout. Piping to jq never breaks.

55 commands across NCBI, UniProt, KEGG, STRING, Ensembl, Enrichr, ProteomeXchange, PRIDE, plus a local Unimod PTM dictionary. Covers gene lookup, variant interpretation, literature search, pathway enrichment, GEO/SRA dataset discovery and download, and proteomics dataset search. What it does NOT do: sequence analysis (no BLAST), structure prediction (no AlphaFold), drug/trial lookups. Different tools for those.
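Under that contract, the agent-side consumer stays tiny. Here's a sketch of what parsing the envelope might look like; the stub at the end exists only so the snippet runs without the CLI installed, and the helper names are mine, not part of biocli:

```python
import json
import subprocess

# The fixed envelope shape from the post -- same keys for every workflow command.
ENVELOPE_KEYS = {"data", "ids", "sources", "warnings", "queriedAt", "organism", "query"}

def parse_envelope(raw: str) -> dict:
    """Validate and parse a biocli result envelope."""
    env = json.loads(raw)
    missing = ENVELOPE_KEYS - env.keys()
    if missing:
        raise ValueError(f"not a biocli envelope, missing keys: {sorted(missing)}")
    return env

def run_biocli(*args):
    """Invoke the CLI via subprocess: payload on stdout, warnings on stderr."""
    proc = subprocess.run(["biocli", *args, "-f", "json"],
                          capture_output=True, text=True, check=True)
    return parse_envelope(proc.stdout)

# Stub envelope so this sketch runs without the CLI installed:
stub = json.dumps({k: None for k in ENVELOPE_KEYS} | {"query": "TP53"})
print(parse_envelope(stub)["query"])  # -> TP53
```

Because the shape never changes, `parse_envelope` is the entire branching logic an agent needs, regardless of which of the 55 commands produced the output.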
Install (needs Node.js >= 20):

npm install -g @yangfei_93sky/biocli
biocli --version
biocli list -f json | head -20

GitHub: https://github.com/youngfly93/biocli (MIT licensed, DOI: 10.5281/zenodo.19483760) Curious — what biological databases does your Claude Code agent struggle with most? I'm deciding what to add next and real use cases would help more than my own guesswork. submitted by /u/Born-Web-133 [link] [comments]
Anthropic gift subscriptions are silently reverting to Free plan after ~1 week - and the support loop leaves affected users with no practical recourse
TL;DR: I found multiple reports over several months of Claude gift subscriptions (Max 5x, Pro) silently canceling after ~1 week with no notification. Anthropic's support bot confirmed my case is a backend issue - but also confirmed it cannot fix it. My human support ticket has had no response for 3 days. In practice, there is no path to resolution through current support channels. Anthropic has not publicly acknowledged this pattern. If you're considering buying, read this first.

The pattern

Over the past several months, a consistent bug has been appearing across Anthropic's community: users who redeem Claude gift subscriptions (primarily Max 5x at $100/month) find their plan silently reverted to Free after approximately one week of use. No email. No warning. No explanation. Just gone. This is not a fringe issue. Here's what the paper trail looks like:

GitHub Issues (anthropics/claude-code):
#41252 - Max 5x gift subscription disabled without explanation, no support response after 1 week
#41499 - $1,400 worth of gift subscription credits destroyed by a Stripe proration bug
#43257 - Max 5x showing as Free tier despite active billing, clear account/billing state mismatch
#44163 - Gift Pro subscription auto-canceled after several days, redemption link broken with "Page not found"
#45335 - Max 5x gift canceled after 7 days (my case, detailed below) - two more users confirmed the same issue in comments within 24 hours of posting

Reddit:
r/claude - Claude Max subscription silently revoked after 1 week
r/ClaudeAI - Claude subscription got cancelled automatically
r/ClaudeAI - Anthropic/Claude: we lost all of our subscribers
r/claude - My Max plan disappeared, I'm on free plan suddenly

These issues span months. The bug is not new. It is not fixed. And Anthropic has not publicly acknowledged it.

Why the support structure makes this worse

When this bug hits you, a second problem kicks in immediately.
The only available support channel is an AI bot called Fin - and Fin will confirm your problem is real while also confirming it cannot solve it. If you're affected by this bug, here is the exact loop you enter:

1. You open support chat
2. Fin tells you it can see your account has no active subscription
3. Fin confirms it "appears to be a technical issue rather than a typical payment failure" (direct quote from my session)
4. Fin tells you it cannot restore your subscription or contact the backend team
5. Fin suggests workarounds that don't apply to your situation
6. Go to step 2

Getting past Fin to submit a human ticket requires significant effort. And once you do submit a ticket - silence. Days of silence. This creates a situation where Anthropic's infrastructure takes your money (or your friend's money), loses your subscription, acknowledges via its own bot that the problem is on their end, and then leaves you with no practical path to resolution.

My case - the most documented example

My own case is probably the most fully documented version of this bug, so I'll lay it out in detail. On March 29, 2026, a friend gifted me a Claude Max 5x subscription - 1 month, $100 value. I redeemed it on claude.ai. The activation was immediately confirmed: Anthropic sent an official email ("Thanks for starting your Max subscription"), with next billing date April 29, 2026. Invoice and receipt both confirm the subscription. The billing page in Settings showed a March 29 invoice with status "Paid." I used Max 5x features normally for 7 days. Around April 5-6, my account silently reverted to the Free plan. No email. No notification. No policy violation. Nothing changed on my end.
What I have as evidence: the Anthropic confirmation email, the invoice and receipt (Max 5x, Mar 29 - Apr 29, 2026, $100 discounted to $0.00 via gift), a screenshot of Settings showing Free plan with the March 29 "Paid" invoice still visible beneath it, a screenshot of the Fin support bot explicitly confirming this is a backend issue it cannot resolve, and my open support ticket, submitted April 6, 2026. As of today - 3 days later - no human response. Approximately 23 days of access remain on that subscription. Roughly $75 in value. Gone into a backend black hole.

What this means if you're considering buying Claude Max

Gift subscriptions are particularly vulnerable here because there's no recurring payment method attached - so when the system drops the subscription, there's nothing to trigger a re-authorization or alert. You simply lose access and the only paper trail is a $0.00 invoice that looks like it was never real. If you are planning to buy or gift a Claude subscription:

• There is a known, unacknowledged bug that can cancel it silently after ~1 week
• If this happens, your path to support is an AI bot that will confirm the problem and tell you it can't help
• Human support tickets may go unanswered for days or longer
• Anthropic has not publicly communicated a fix or even acknowledged this pattern

I'm not saying Claude is a ba
I built an open-source AI research lab that reads papers, runs experiments on GPUs, and iterates autonomously
Arcana is an open-source platform that connects the full arc from literature review to novel findings, all from one place.

• Import papers from arXiv, DOI, PDF, or the Chrome extension
• Chat with papers grounded in actual content
• Launch autonomous research projects that run continuously on remote GPUs
• Phase-gated agent that enforces the scientific method — no skipping steps
• Multi-agent system with literature scouts, adversarial reviewer, and more
• Auto-fixes code errors, tracks structured metrics, generates research summaries
• Integrated dashboard with narrative timeline, figures, and experiment tracking

Github submitted by /u/da352 [link] [comments]
Trained Qwen 3.5 2B for pruning tool output in coding agents / Claude Code workflows
Agents can spend a lot of context on raw pytest, grep, git log, kubectl, pip install, file reads, stack traces, etc., even though usually only a small block is actually relevant. I built a benchmark for task-conditioned tool-output pruning and fine-tuned Qwen 3.5 2B for it with Unsloth. The benchmark combines real SWE-bench-derived tool observations with synthetic multi-ecosystem examples.

Held-out test results:
• 86% recall
• 92% compression
• Beats other pruners and zero-shot models (+11 recall over zero-shot Qwen 3.5 35B A3B)

You can put squeez in front of tool output before the next reasoning step, or add it to something like CLAUDE.md as a lightweight preprocessing step. You can serve it with vLLM or any other OpenAI-compatible inference stack. Everything is open source, check for details:
- paper: https://arxiv.org/abs/2604.04979
- model: https://huggingface.co/KRLabsOrg/squeez-2b
- dataset: https://huggingface.co/datasets/KRLabsOrg/tool-output-extraction-swebench
- code: https://github.com/KRLabsOrg/squeez

submitted by /u/henzy123 [link] [comments]
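Putting the squeez pruner in front of tool output could look roughly like this. Only the model name comes from the post; the prompt wording is my own assumption, and the resulting payload would be POSTed to any OpenAI-compatible /v1/chat/completions endpoint (e.g. a local vLLM server):

```python
# Hypothetical sketch: build a chat-completions request that asks the pruner
# to keep only the task-relevant lines of a raw tool observation.

def build_prune_request(task: str, tool_output: str,
                        model: str = "KRLabsOrg/squeez-2b") -> dict:
    """Return an OpenAI-compatible chat payload for task-conditioned pruning.
    The system-prompt wording is an assumption, not from the paper."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Keep only the lines of the tool output relevant to the task."},
            {"role": "user",
             "content": f"Task: {task}\n\nTool output:\n{tool_output}"},
        ],
        "temperature": 0.0,  # pruning should be deterministic
    }

req = build_prune_request("fix failing test", "pytest ... 1 failed, 99 passed")
print(req["model"])  # -> KRLabsOrg/squeez-2b
```

The agent then substitutes the pruned completion for the raw observation before the next reasoning step, which is where the 92% compression figure pays off.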
fun project whooping the SPY consistently?
i've been playing around in claude for the past 2 days because i got interested in the idea of trading with it. after a couple of backtests and tweaks it's showing me something pretty impressive - 4856% since 2008 compared to the SPY's 460%. this was 100% vibe coded, i don't have the slightest idea about any of the behind-the-scenes work it did, only fed it what i wanted to see and tweaks it could implement. currently connecting this to a paper account and seeing how it does. this seems a little too insane to be true lmfao. usually i'm rotating the same stocks that i sell puts on so this is new to me. thoughts? submitted by /u/LongjumpingLeader173 [link] [comments]
Monocle: A TUI* for actually reviewing what your AI coding agent writes
Claude writes code while Monocle shows the diffs live. Flag an issue, submit a review, and the agent receives your feedback instantly via push notification. It fixes the code and the diff updates — a tight loop without leaving the terminal. Monocle helps you actually review all the stuff your coding agents produce. We all talk a big game about "human in the loop", but it turns out that's easier said than done. In my experience moving from fancy autocomplete to fully agentic development, your options realistically end up being:

1. Block every change before it’s written. Sounds safe, but it turns into muscle-memory for “accept accept accept” real fast. Also, it means no work happens while you’re away from your desk. The agent just sits there, waiting.
2. Review diffs locally with git. Great for reading, terrible for giving feedback. You end up jumping back to your agent trying to describe which code you want changed, hoping it finds the right spot.
3. Use GitHub PRs. Best review UX, but the cycle is painfully slow. Commit, push, review, then ask the agent to go fetch your comments via the API. Nobody keeps that up.

So I built Monocle, which is basically GitHub’s PR review interface, but for local files with a direct connection to your agent. You let the agent work uninterrupted, then review all the changes as diffs, comment on specific lines across files, and submit a structured review the agent picks up immediately with exact file references and line numbers. Rinse and repeat. Better yet, it also works with planning artifacts, making sure you can give direct, line-by-line feedback on your agent's plans before you jump to implementation:

• Review the agent's plan as rendered markdown before any code is written.
• Leave inline comments to request changes, then see the updated plan arrive as a diff between versions.
• Use the version picker to compare any revision against the latest.
It works with essentially any AI agent that supports MCP tools or Agent Skills, with native registrations for Claude Code, Codex CLI, Gemini CLI, and OpenCode. Communication happens over local Unix sockets so everything stays on your machine. If you’re a Claude Code user specifically, Monocle also uses MCP channels in a unique way, letting you push your review feedback directly into the conversation without the agent needing to poll for it. It’s a small thing on paper but makes the back-and-forth feel way smoother. I built this on paternity leave with a newborn in one arm and my phone SSH’d into my Mac Mini in the other, using Monocle to review Claude’s code as it built Monocle. Would love any feedback: Website | GitHub | Blog Post * If you're not passionate about doing everything in the Terminal and prefer desktop apps, stay tuned! submitted by /u/josephschmitt [link] [comments]
[R] Agentic AI and Occupational Displacement: A Multi-Regional Task Exposure Analysis (236 occupations, 5 US metros)
TL;DR: We extended the Acemoglu-Restrepo task displacement framework to handle agentic AI -- the kind of systems that complete entire workflows end-to-end, not just single tasks -- and applied it to 236 occupations across 5 US tech metros (SF Bay, Seattle, Austin, Boston, NYC). Paper: https://arxiv.org/abs/2604.00186 Motivation: Existing AI exposure measures (Frey-Osborne, Felten et al.'s AIOE, Eloundou et al.'s GPT exposure) implicitly assume tasks are independent and that occupations survive as coordination shells once their components are automated one by one. That works for narrow AI. It breaks down for agentic systems that chain tool calls, maintain state across steps, and self-correct. We added a workflow-coverage term to the standard task displacement framework that penalizes tasks requiring human coordination, regulatory accountability, or exception handling beyond agentic AI's current operational envelope. Key findings: Software engineers rank LOWER than credit analysts, judges, and regulatory affairs officers. The cognitive, high-credential roles previously considered automation-proof are most exposed when you account for end-to-end workflow coverage. There is a measurable 2-3 year adoption lag between metros. Same occupations, same exposure profiles, different timelines. Seattle in 2027 looks like NYC in 2029. We identified 17 emerging job categories with real hiring traction (~1,500 "AI Reviewer" listings on Indeed). None require coding. In the SF Bay Area, 93% of information-work occupations cross our moderate-displacement threshold by 2030, but no occupation reaches the high-risk threshold even by 2030. The framework predicts widespread moderate exposure, not catastrophic displacement of any single role. Validation: The framework correlates with the AIOE index at Spearman rho = 0.84 across 193 matched occupations and with Eloundou et al.'s GPT exposure at rho = 0.72, so the signal isn't a calibration artifact. 
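The workflow-coverage idea can be illustrated with a toy score: a standard per-task automatability average, discounted wherever a task falls outside the agentic envelope. The penalty form and weights below are my own simplification for illustration, not the paper's calibrated rubric:

```python
# Toy illustration of a displacement score with a workflow-coverage penalty.
# Assumed, not from the paper: the 0.5 penalty factor and the simple mean.

def exposure(task_scores, coverage_flags, penalty=0.5):
    """task_scores: per-task automatability in [0, 1].
    coverage_flags: True where the task needs human coordination, regulatory
    accountability, or exception handling beyond agentic AI's envelope."""
    assert len(task_scores) == len(coverage_flags)
    adjusted = [s * (penalty if flagged else 1.0)
                for s, flagged in zip(task_scores, coverage_flags)]
    return sum(adjusted) / len(adjusted)

# A role whose tasks chain cleanly end-to-end scores higher than one whose
# workflow is gated by human sign-off, even with identical task scores:
ungated = exposure([0.9, 0.8, 0.9], [False, False, False])
gated = exposure([0.9, 0.8, 0.9], [False, True, True])
print(ungated > gated)  # -> True
```

This is the mechanism behind the counterintuitive ranking: software engineering tasks chain with little mandated human gating, while a judge's tasks carry accountability flags on nearly every step.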
We stress-test across a 6x range in the S-curve adoption parameter (k = 0.40 to k = 1.20). The qualitative regional ordering survives all 9 scenario-year combinations. We get a null result on 2023-24 OEWS validation (rho = -0.04), which we report transparently. We make a falsifiable prediction (rho < -0.15 when May 2025 OEWS releases) and commit to reporting the result regardless of direction. Limitations: The keyword-based COV rubric is the part of the framework I am least confident in. A semantic extension pilot suggests our scores are an upper bound and underestimate displacement risk by 15-25% for occupations with high interpersonal overhead. Calibration of the S-curve growth parameter has a 6x discrepancy between our calibrated value and what you get from fitting Indeed job-posting data. We address this with a three-scenario sensitivity analysis (Table in the paper). The analysis is scoped to 5 US metros. An international extension using OECD PIAAC and Eurostat data is in development. Happy to answer questions on methodology, data sources, or limitations. Pushback welcome -- especially on the COV rubric and the S-curve calibration choices. submitted by /u/LengthinessAny3851 [link] [comments]
Open-sourcing a decentralized AI training network with constitutional governance and economic alignment mechanisms
We are open-sourcing Autonet on April 6: a framework for decentralized AI training, inference, and governance where alignment happens through economic mechanism design rather than centralized oversight. The core thesis: AI alignment is an economic coordination problem. The question is not how to constrain AI, but how to build systems where aligned behavior is the profitable strategy. Autonet implements this through:

• Dynamic capability pricing: the network prices capabilities it lacks, creating market signals that steer training effort toward what is needed rather than what is popular. This prevents monoculture.
• Constitutional governance on-chain: core principles are stored on-chain and evaluated by LLM consensus. 95% quorum required for constitutional amendments.
• Cryptographic verification: commit-reveal pattern prevents cheating. Forced error injection tests coordinator honesty. Multi-coordinator consensus validates results.
• Federated training: multiple nodes train on local data, submit weight updates verified by consensus, aggregate via FedAvg.

The motivation: AI development is consolidating around a few companies who control what gets built, how it is governed, and who benefits. We think the alternative is not regulation after the fact, but economic infrastructure that structurally distributes power. 9 years of on-chain governance and jurisdiction work went into this. Working code, smart contracts with tests passing, federated training pipeline.

Paper: https://github.com/autonet-code/whitepaper
Code: https://github.com/autonet-code
Website: https://autonet.computer

MIT License. Happy to answer questions about the mechanism design, the federated training architecture, or the governance model. submitted by /u/EightRice [link] [comments]
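The FedAvg aggregation step in the Autonet pipeline reduces to a sample-count-weighted average of node updates. This sketch uses plain lists in place of weight tensors and ignores the consensus-verification layer:

```python
# Minimal FedAvg sketch: average per-node weight vectors, optionally weighted
# by each node's local sample count (the standard McMahan et al. rule).

def fedavg(updates, sample_counts=None):
    """updates: list of equal-length weight vectors, one per node."""
    n = len(updates)
    counts = sample_counts or [1] * n          # unweighted = plain mean
    total = sum(counts)
    dim = len(updates[0])
    return [sum(w[i] * c for w, c in zip(updates, counts)) / total
            for i in range(dim)]

# Two nodes, the second holding twice the data:
print(fedavg([[0.0, 1.0], [3.0, 1.0]], sample_counts=[1, 2]))  # -> [2.0, 1.0]
```

In the real system each `updates[i]` would be a verified tensor delta accepted by multi-coordinator consensus before it ever reaches the average.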
Has anyone done a detailed comparison of the differences between AI chatbots?
I've been doing some science experiments as well as finance research and have been asking the same questions to ChatGPT, Claude, Perplexity, Venice, and Grok. Going forward I'd like the peace of mind of knowing that the one I end up using will be the most accurate, at least for my needs (general questions about finance (companies) and science, not anything coding- or image-related).

ChatGPT does the best at summarizing and giving a consensus outline with interesting follow-up questions. Its edge in asking pertinent follow-up questions will likely keep me using it.

Grok has been best at citing exactly what I need from research papers. I was surprised, as I had the lowest expectations for it, but it also provides links to the publications.

Claude is very good at details and specifics (that are accurate) but doesn't publicly cite sources. Still, I come closest to conclusions with Claude because of the accuracy of the info.

Venice provides a ton of relevant info, but it doesn't narrow things down to an accurate conclusion, at least scientifically, the way Claude does. When I was looking for temperature ranges for bacterial growth, it provided broad boundaries instead of tightly defined numbers. Perplexity is very similar to Venice.

I'm curious: for those who have spent time with these chatbots, what pros and cons have you found with each? submitted by /u/VivaLaBiome [link] [comments]
[R] VOID: Video Object and Interaction Deletion (physically-consistent video inpainting)
We present VOID, a model for video object removal that aims to handle *physical interactions*, not just appearance.

Most existing video inpainting / object removal methods can fill in pixels behind an object (e.g., removing shadows or reflections), but they often fail when the removed object affects the dynamics of the scene. For example:

- A domino chain is falling → removing the middle blocks should stop the chain
- Two cars are about to crash → removing one car should prevent the collision

Current models typically remove the object but leave its effects unchanged, resulting in physically implausible outputs. VOID addresses this by modeling counterfactual scene evolution: “What would the video look like if the object had never been there?”

Key ideas:

- Counterfactual training data: paired videos with and without objects (generated using Kubric and HUMOTO)
- VLM-guided masks: a vision-language model identifies which regions of the scene are affected by the removal
- Two-pass generation: first predict the new motion, then refine with flow-warped noise for temporal consistency

In a human preference study on real-world videos, VOID was selected 64.8% of the time over baselines such as Runway (Aleph), Generative Omnimatte, and ProPainter.

Project page: https://void-model.github.io/ Code: https://github.com/Netflix/void-model Demo: https://huggingface.co/spaces/sam-motamed/VOID Paper: https://arxiv.org/abs/2604.02296

Happy to answer questions! Removing the compressor and saving the duckie. submitted by /u/Least_Light6037 [link] [comments]
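The post does not spell out what "flow-warped noise" looks like, so here is a minimal sketch of the general idea behind noise warping for temporal consistency: carry each pixel's noise sample along the optical flow so consecutive frames are denoised from correlated noise, and draw fresh noise only where no source pixel exists. Integer-valued backward flow, nearest-pixel gather, and all names are simplifying assumptions, not VOID's actual implementation:

```python
import random

def warp_noise(noise, flow):
    """Carry a per-pixel noise field from frame t to frame t+1 along optical flow.

    noise: H x W grid of floats for frame t.
    flow:  H x W grid of (dy, dx) integer displacements mapping each frame-t+1
           pixel back to its source in frame t (backward flow).
    Pixels whose source falls outside the frame get fresh Gaussian noise, so
    frames stay correlated where motion is known and independent elsewhere.
    """
    h, w = len(noise), len(noise[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = noise[sy][sx]            # reuse: temporal consistency
            else:
                out[y][x] = random.gauss(0.0, 1.0)   # disoccluded: fresh noise
    return out

# Toy 3x3 noise field; the whole scene shifts one pixel to the right
noise0 = [[float(10 * y + x) for x in range(3)] for y in range(3)]
shift_right = [[(0, -1) for _ in range(3)] for _ in range(3)]
noise1 = warp_noise(noise0, shift_right)
```

With this kind of warping, the second-pass refinement sees noise that moves with the predicted motion, which is what makes flicker-free counterfactual videos plausible.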