Unlock enterprise-scale AI with ClearML’s AI Infrastructure Platform. Manage GPU clusters, streamline AI/ML workflows, and deploy GenAI models effortlessly.
The ClearML AI Infrastructure Platform is a three-layer solution that delivers a smooth, scalable AI workflow from development to production at enterprise scale. The Infrastructure Control Plane allows you to connect and manage GPU clusters – whether on-premises, in the cloud, or both – ensuring high performance and cost optimization. It offers built-in security features like multi-tenancy, role-based access control, and billing. The AI Development Center provides a robust environment for developing, training, and testing AI models, accessible from anywhere. Finally, the GenAI App Engine effortlessly deploys LLMs onto your clusters, with ClearML handling networking, authentication, and security. Launch any GenAI workload with a single click, and let our scheduler handle the rest. From infrastructure management to AI development and deployment, ClearML streamlines your AI workflows, getting you up and running quickly and efficiently.

- Control and manage AI infrastructure and maximize compute utilization
- Streamline AI/ML workflows from development to production
- Boost GenAI deployment with customizable workflows and managed access
- Drive superior results and lower costs on every AI workload with ClearML

Derive more value from current infrastructure and delay future hardware purchases, with a reduction in compute and human capital costs. Boost efficiency, cut costs, and accelerate time-to-market. Scale AI on your terms with unmatched flexibility from an agnostic solution.

Plans:

- Best for individuals, researchers, academia, and small teams working on projects
- Best for growing AI teams that require enhanced features and more automation
- For organizations with 8-48 GPUs: pay for what you use (*VPC only)
- For organizations with multiple large projects: get in touch with our team and we will assist you with building your business's custom ClearML license

For larger teams with security and compliance needs, see our Scale and Enterprise options.
Welcome to the documentation for ClearML, the end-to-end platform for streamlining AI development and deployment. ClearML consists of three essential layers, each providing distinct functionality to ensure an efficient and scalable AI workflow from development to deployment.

The AI Development Center offers a robust environment for developing, training, and testing AI models. It is designed to be cloud and on-premises agnostic, providing flexibility in deployment.

The GenAI App Engine is designed to deploy large language models (LLMs) onto GPU clusters and manage various AI workloads, including Retrieval-Augmented Generation (RAG) tasks. This layer also handles networking, authentication, and role-based access control (RBAC) for deployed services.

The Platform Management Center provides an administrative dashboard for all tenants across a ClearML deployment. It enables platform administrators to monitor tenant activity, usage, and costs.

To begin using ClearML, follow the detailed instructions…
Mentions (30d): 0
Reviews: 0
Platforms: 2
Sentiment: 0% (0 positive)
Industry: information technology & services
Employees: 54
Funding stage: Venture (round not specified)
Total funding: $11.0M
Pricing found: $0, $15, $0.10/GB, $0.01/MB, $1/100K
Started a video series on building an orchestration layer for LLM post-training [P]
Hi everyone! Context, motivation, a lot of yapping, feel free to skip to TL;DR. A while back I posted here asking [D] What framework do you use for RL post-training at scale?. Since then I've been working with verl, both professionally and on my own time. At first I wasn't trying to build anything new. I mostly wanted to understand verl properly and have a better experience working with it. I started by updating its packaging to be more modern: using `pyproject.toml`, making it easily installable, removing unused dependencies, finding a proper compatibility matrix (especially since vllm and sglang sometimes conflict), removing transitive dependencies that were scattered across the different requirements files, etc. Then I wanted to remove all the code I didn't care about from the codebase, everything related to HF/Nvidia stuff (transformers for rollout, trl code, trtllm for rollout, megatron, etc.), just because it was either inefficient or something I didn't understand and wasn't interested in. But I needed a way to confirm that what I was doing was correct, and their testing is not properly done: many bash files instead of pytest files. I needed to separate tests that can run on CPU, which I can run directly on my laptop, from tests that need a GPU. Then I wrote a scheduler to maximize the utilization of "my" GPUs (well, on providers), and turned the bash tests into proper test files; I had to make fixtures and handle Ray cleanup so that no context spills between tests, etc. But as I worked on it, I found more issues and wanted it to be better, until it hit me that the core of verl is its orchestration layer and single-controller pattern. And, imho, it's badly written: a lot of metaprogramming (nothing against it, but I don't think it was handled well), indirection, and magic that made it difficult to trace what was actually happening. And, especially in a distributed framework, you would like a lot of immutability and clarity.
So, I thought, let me refactor their orchestration layer. But I needed a clear mental model, some kind of draft where I could fix what was bothering me and iteratively make it better, and that's how I came to have a self-contained module for orchestrating LLM post-training workloads. But when I finished, I noticed my fork of verl was about 300 commits behind, or more 💀 And on top of that, I noticed that people didn't care; they didn't even care about what framework they used, let alone whether some parts of it were good or not, and let alone the orchestration layer. At the end of the day, these frameworks are targeted at ML researchers, who care more about the correctness of the algos; maybe some will care about GPU utilization and whether they have good MFU or something, but those are rarer. And I noticed that people just pointed claude code or codex, with the latest model and highest effort, at a framework and asked it to make their experiment work. I don't blame them or anything, it's just that those realizations made me think, what am I doing here? hahaha And I remembered that u/dhruvnigam93 suggested I document my journey through this, and I was thinking, ok, maybe this can be worth it if I write a blog post about it. But how do I write a blog post about work that is mainly code? How do I explain the issues? It stays abstract; you have to run code to show what works, what doesn't, what edge cases are hard to tackle, etc. I was thinking about how to take everything that went through my mind in building my codebase, and why, and turn it into a blog post. Especially since I'm not used to writing blog posts; I mean, I do a little bit, but I do it mostly for myself and the writing is trash 😭 So I thought, maybe putting this into videos would be interesting.
And also, it allows me to go through my codebase again and rethink it, and it does work hahaha. As I was trying to make the next video, a question came to my mind: how do I dispatch or split a batch of data across different DP shards in the most efficient way? Not a simple split across the batch dimension, because you might have a DP shard that gets long sequences while another gets short ones, so it has to take sequence length into account. I don't know why I didn't think about this initially, so I'm trying to implement it now. Fortunately I tried to do a good job initially, especially in terms of where I placed boundaries between the different systems in the codebase, in such a way that modifying it is more or less easy. Anyways. The first two videos are up. I named the first one "The Orchestration Problem in RL Post-Training" and it's conceptual: I walk through the PPO pipeline, map the model roles to hardware, and explain the single-controller pattern. The second one I named "Ray Basics, Workers, and GPU Placement". This one is hands-on: I start from basic Ray tasks / actors, then build the worker layer: worker identity, mesh registry, and placement groups for guaranteed co-location. What I'm working on next is the dispat
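The length-aware split the post asks about can be sketched as a greedy longest-first assignment over token counts (the classic longest-processing-time bin-packing heuristic). This is a minimal sketch of one common approach, not necessarily what the author implemented; the function name is hypothetical.

```python
def dispatch_by_length(seq_lens, num_shards):
    """Assign sequence indices to DP shards so the total token count
    per shard stays balanced, rather than splitting naively along the
    batch dimension. Longest sequences are placed first, each onto the
    currently least-loaded shard."""
    order = sorted(range(len(seq_lens)), key=lambda i: seq_lens[i], reverse=True)
    shards = [[] for _ in range(num_shards)]
    loads = [0] * num_shards
    for i in order:
        s = loads.index(min(loads))   # least-loaded shard so far
        shards[s].append(i)
        loads[s] += seq_lens[i]
    return shards

# Mixed long/short batch: a naive even split would put two long
# sequences on one shard; the greedy pass balances token totals.
assignment = dispatch_by_length([10, 1000, 20, 990, 30, 980], num_shards=3)
```

A naive `len(batch) // num_shards` split ignores that one shard can receive all the long sequences and become the straggler every step; balancing by token count equalizes per-shard work instead.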
Anthropic Leaked 512,000 Lines of Claude Code Source. Here's What the Code Actually Reveals.
On March 31, 2026, Anthropic accidentally published a source map file in their npm package that contained the complete TypeScript source code of Claude Code — 1,900 files, 512,000+ lines of code, including internal prompts, tool definitions, 44 hidden feature flags, and roughly 50 unreleased commands. Developer comments were preserved. Operational data was exposed. A GitHub mirror hit 9,000 stars in under two hours. Anthropic issued DMCA takedowns affecting 8,100+ repository forks within days. This is a breakdown of what the source code actually reveals — not the drama, but the engineering.

How the Leak Happened

The culprit was a .map file — a source map artifact. Source maps contain a sourcesContent array that embeds the complete original source code as strings. The fix is trivial: exclude *.map from production builds or add them to .npmignore. This was the second incident — a similar leak occurred in February 2025. The operational complexity of shipping a tool at this scale appears to have outpaced DevOps discipline.

The Architectural Picture

The most technically honest takeaway from this leak is: the competitive moat in AI coding tools is not the model. It is the harness. Claude Code runs on Bun (not Node.js) — a performance decision. The terminal UI is built with React and Ink — a pragmatic choice allowing frontend engineers to use familiar component patterns. The tool system accounts for 29,000 lines of code just for base tool definitions. Tool schemas are cached for prompt efficiency. Tools are filtered by feature gates, user type, and environment flags. The multi-agent coordinator pattern is production-grade and visible in the code: parallel workers managed by a coordinator, XML-formatted task-notification messages, and a shared scratchpad directory for cross-agent knowledge transfer. This is exactly what developers building multi-agent systems today are trying to implement — and now there's a reference implementation to study.
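The sourcesContent mechanism described above is easy to demonstrate: a source map is just JSON, and its sourcesContent array carries the original files verbatim. The toy map below uses the standard Source Map v3 fields; the file contents are invented for illustration.

```python
import json

# A minimal source map, as a bundler might emit next to cli.js.
# "sourcesContent" embeds the ORIGINAL source as plain strings,
# which is exactly what leaks if *.map files ship in the package.
source_map = json.dumps({
    "version": 3,
    "file": "cli.js",
    "sources": ["src/cli.ts"],
    "sourcesContent": ["const SECRET_FLAG = 'preview';\nexport {};\n"],
    "mappings": "AAAA",
})

def recover_sources(map_text):
    """Return {original_path: original_source} from a source map."""
    m = json.loads(map_text)
    return dict(zip(m.get("sources", []), m.get("sourcesContent") or []))

recovered = recover_sources(source_map)
```

No de-minification or reverse engineering is needed; anyone who downloads the package can read the original source straight out of the map, which is why excluding `*.map` via `.npmignore` (or the build config) is the whole fix.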
The YOLO permission system uses an ML classifier trained on transcript patterns to auto-approve low-risk operations — a production example of using a small, fast model to gate a larger, expensive one.

The Unreleased Features Worth Understanding

Three unreleased capabilities behind feature flags are architecturally significant:

KAIROS is an always-on background agent that maintains append-only daily log files, watches for relevant events, and acts proactively with a 15-second blocking budget to avoid disrupting active workflows. Exclusive tools include SendUserFile, PushNotification, and SubscribePR. KAIROS is the clearest signal available about where AI assistants are heading: from reactive tools that wait for commands to persistent background companions that monitor and act on your behalf. This is not a Claude Code feature. This is a preview of the next generation of all AI assistants.

ULTRAPLAN offloads complex planning to a remote Cloud Container Runtime using Opus 4.6 with 30-minute think time — far beyond any interactive session. A browser-based UI surfaces the plan for human approval. Results transfer via a special ULTRAPLAN_TELEPORT_LOCAL sentinel. This is async deep thinking as a product feature: separate the computationally expensive planning phase, run it at maximum model time, surface results for review.

BUDDY is a Tamagotchi-style companion pet system: 18 species across 5 rarity tiers (Common 60%, Uncommon 25%, Rare 10%, Epic 4%, Legendary 1%), an independent 1% shiny chance, procedural stats (Debugging Skill, Patience, Chaos, Wisdom, Snark), and ASCII sprite rendering with animation frames. It uses the Mulberry32 deterministic PRNG for consistent pet generation. Beneath the novelty: this exercises session persistence, personality modeling, and companion UX — all capabilities Anthropic is building for more serious agent memory systems.
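Mulberry32 is a well-known tiny 32-bit PRNG from the JavaScript ecosystem; a straightforward Python port shows why it suits deterministic pet generation (same seed, same pet, forever). The rarity sampling below uses the tier weights stated above, but how the leaked code actually maps draws to tiers is an assumption.

```python
def mulberry32(seed):
    """Python port of the JS Mulberry32 PRNG: returns a function
    yielding deterministic floats in [0, 1) for a given 32-bit seed."""
    state = seed & 0xFFFFFFFF
    def rand():
        nonlocal state
        state = (state + 0x6D2B79F5) & 0xFFFFFFFF
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & 0xFFFFFFFF
        t = (t ^ ((t + (((t ^ (t >> 7)) * (t | 61)) & 0xFFFFFFFF)) & 0xFFFFFFFF)) & 0xFFFFFFFF
        return ((t ^ (t >> 14)) & 0xFFFFFFFF) / 2**32
    return rand

def roll_rarity(rand):
    """Map one uniform draw onto the stated tier weights:
    Common 60%, Uncommon 25%, Rare 10%, Epic 4%, Legendary 1%."""
    r = rand()
    for tier, cumulative in [("Common", 0.60), ("Uncommon", 0.85),
                             ("Rare", 0.95), ("Epic", 0.99)]:
        if r < cumulative:
            return tier
    return "Legendary"
```

Seeding the PRNG from a stable identifier (e.g. a session or user id) makes the whole pet, stats and rarity, reproducible without storing anything: regenerate instead of persist.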
The Anti-Distillation Contradiction

The source code revealed a system designed to inject fake tool definitions into Claude Code's outputs to poison AI training data scraped from API traffic. The code comment explicitly states this measure is now "useless" — because the leak exposed its existence. This is the most intellectually interesting artifact in the entire codebase. The security mechanism depended entirely on secrecy, not technical robustness. Once the code was visible, the trick stopped working. The same applies to hidden feature flags, internal codenames, and internal roadmap references — many AI product security models are built on "if nobody sees the code, nobody can replicate it." That assumption is now broken. Claude Code's internal codename was also confirmed as "Tengu."

The Code Quality Question

Developer reactions to the code were mixed. Some described the architecture as underwhelming relative to the tool's capabilities. Others noted the detailed internal comments as useful context for understanding agent behavior. The frustration detection system, notably, uses a regex rather than an LLM inference call — likely for latency and cost reasons.
[D] Why does it seem like open source materials on ML are incomplete? this is not enough...
Many times when I try to deeply understand a topic in machine learning — whether it's a new architecture, a quantization method, a full training pipeline, or simply reproducing someone’s experiment — I find that the available open source materials are clearly insufficient. Often I notice:

- Repositories lack complete code needed to reproduce the results
- Missing critical training details (datasets, hyperparameters, preprocessing steps, random seeds, etc.)
- Documentation is superficial or outdated
- Blog posts and tutorials only show the "happy path", while real edge cases, bugs, and production nuances are completely ignored

This creates the feeling that open source in ML is mostly just "weights + basic inference code", rather than fully reproducible science or engineering. The only big exception I see is Andrej Karpathy — his repositories (like nanoGPT, llm.c, etc.) and YouTube lectures are exceptionally clean, educational, and go much deeper. But even he mostly focuses on one specific direction (LLM training from scratch and neural net fundamentals). What bothers me even more is that I don’t just want the code — I want to understand the logic and reasoning behind the decisions: why certain choices were made, what trade-offs were considered, what failed attempts happened along the way, and how the authors actually thought about the problem. Does anyone else feel the same way? In your opinion, what’s the main reason behind this widespread issue?

- Do companies and researchers deliberately hide important details (to protect competitive advantage or because the code is messy)?
- Does everything move so fast that no one has time (or incentive) to properly document their thought process?
- Is it the culture in the community — publishing for citations, hype, and leaderboard scores rather than true reproducibility and deep understanding?
- Or is it simply that “doing it properly (clean code + full reasoning) is hard, time-consuming, and expensive”?
I’d really appreciate opinions from people who have been in the field for a while, especially those working in industry or research. What’s your take on the underlying mindset and motivations? (Translated with AI; English is not my native language) submitted by /u/Kalli_animation
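One concrete piece of the reproducibility complaint above, missing random seeds, is cheap to fix. A minimal sketch: route every source of randomness through one seeded RNG so two runs of the same experiment are bit-identical. In a real pipeline you would also seed numpy, torch, and CUDA; the toy "experiment" here is invented for illustration.

```python
import random

def run_experiment(seed):
    """Toy 'training run' where everything stochastic flows from one
    seed. A real repo would also call numpy.random.seed(seed) and
    torch.manual_seed(seed) here, and document the seed it used."""
    rng = random.Random(seed)              # isolated RNG, no global state
    data = [rng.gauss(0.0, 1.0) for _ in range(100)]
    rng.shuffle(data)                      # e.g. data-loader shuffling
    return sum(data) / len(data)           # stand-in for a final metric

# Same seed -> identical metric: the property most repos never document.
a = run_experiment(1234)
b = run_experiment(1234)
```

Using an isolated `random.Random(seed)` instead of the module-level functions also means imported libraries can't silently perturb your experiment's random stream.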
AI hype burst - yet powerful
I started building an app (that nobody cares about) a long time ago, and I was so impressed that I just kept building, building, building, without realizing the amount of bugs and lazy fallbacks AI was producing. My experience was: I'd spend 3-5 weeks building a full-stack app, and when it was complete, the next stage was 2-3 weeks of debugging just to get the app running, and then the debugging continued. I created agents, commands, and skills to counteract AI's tendency to implement lazy fallbacks, fake information, hallucinations, etc., but AI's persistence with all of these issues is so strong that I learned to live with it and constantly try to spot them as early as possible. I created a skill to run regularly on any of my codebases, published at https://www.reddit.com/r/ClaudeAI/comments/1s1a9tp/i_built_a_codebase_review_skill_that_autodetects/ . This skill was built with a concept learned from ML models: for every bug identified, 3 agents are spawned to run separate validations, the results are put to a vote, and the decision is based on the winning votes, minimizing hallucinations. I was happy to find that the skill was working and fixing lots of issues; however, I then found an article about the power of AI hallucination, mentioning the capacity of AI to identify non-existing bugs and introduce new bugs by fixing those non-existing bugs. Oh dear! I can't find the link to the article, but if I find it again I'll share it. Next, I found another article about an experiment run by an Anthropic developer on harness design for long-running applications, which can be found at https://www.anthropic.com/engineering/harness-design-long-running-apps . This provided really good insights and concepts, including using Generative Adversarial Networks (GANs) and introducing the concept of context anxiety, which results in an expensive run but a codebase less prone to bugs (although not bug-free).
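The three-agent voting gate described above can be sketched in a few lines of stdlib Python. The validator stubs below stand in for independent agent runs and are entirely hypothetical; the point is the majority rule that filters out a single hallucinated verdict.

```python
from collections import Counter

def majority_verdict(candidate_bug, validators):
    """Ask each validator agent for a verdict on a suspected bug and
    accept only findings that win a strict majority - the post's trick
    for minimizing hallucinated 'bugs'."""
    votes = [validate(candidate_bug) for validate in validators]
    verdict, count = Counter(votes).most_common(1)[0]
    return verdict if count > len(votes) // 2 else "no-consensus"

# Stubs standing in for three independently-spawned agent validations.
agent_a = lambda bug: "real"
agent_b = lambda bug: "real"
agent_c = lambda bug: "hallucinated"

result = majority_verdict("null deref in auth.py", [agent_a, agent_b, agent_c])
```

With two honest validations against one hallucinated one, the finding survives; if the agents split evenly, the gate returns "no-consensus" rather than acting on an unverified bug.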
To get an understanding of cost, see the table (shared as an image in the original post) comparing the cost of running the prompt solo vs using the harness system described in the article. I am now trying to build an agentic system similar to the one described in the article, but with some improvements: addressing context management, leveraging the Generative Adversarial Network (GAN) idea during design and implementation, and augmenting functionality so it can generate the system from more detailed high-level functional specs instead of short prompts, so that it produces a more useful system after spending so many tokens. The system is not ready yet, but I might share it on GitHub if I get anywhere half decent. In conclusion, when I started working with AI I was so excited that I didn't realize the level of hallucination AI has. Then I started spending days and weeks fixing bugs in code, and I realized that the bugs would never stop, while also realizing that all the apps I was developing were only useful for gaining experience; other people with much more AI understanding and experience, and organizations investing in AI implementation, can and will surpass any app I'll ever create, which is a bit demoralizing. But I still stick with it, as I can use it to build personal projects and it should keep me professionally relevant (I hope). Finally, I ended up in a state where I realized that AI's full power is yet to come, and what we see today is a good picture of the capabilities AI will be able to provide, as AI companies are working hard to rein in the silent failures and lazy fallbacks currently introduced during design and implementation. Has anybody experienced similar phases with the AI learning curve?
PS: This post was not generated by AI, as AI-generated posts seem to be heavily punished by people, and auto-moderators seem to block posts automatically when AI is detected; hopefully this one is not blocked. I apologize if the grammar or spelling is not correct, or the structure is not clear, but I hope this post does not get blocked or punished by other people for being AI generated, because it is not. Credit to Prithvi Rajasekaran for writing the interesting article about harness design for long-running application development: https://www.anthropic.com/engineering/harness-design-long-running-apps Happy Saturday everyone. submitted by /u/amragl
Which AI skills/tools are actually worth learning for the future?
Hi everyone, I’m feeling a bit overwhelmed by the whole AI space and would really appreciate some honest advice. I want to build an AI-related skill set over the next months that is:

• future-proof
• well-paid
• actually in demand by companies
• and potentially useful for freelancing or building my own business later

Everywhere I look, I see terms like: AI automation, AI agents, prompt engineering, n8n, maker, Zapier, Claude Code, Claude Cowork, AI product manager, agentic AI, etc. My problem is that I don’t have a clear overview of what is truly valuable and what is mostly hype. About me: I’m more interested in business, e-commerce, systems, automation, product thinking, and strategy — not so much hardcore ML research. My questions: Which AI jobs, skills, and tools do you think will be the most valuable over the next 5-10 years? Which path would you recommend for someone like me? And what should I start learning first, i.e. which skill and which tool? Thanks a lot! submitted by /u/RabbitExternal2874
I built a 200+ article knowledge base that makes my AI agents actually useful — here's the architecture
Most AI agents are dumb. Not because the models are bad, but because they have no context. You give GPT-4 or Claude a task and it hallucinates because it doesn't know YOUR domain, YOUR tools, YOUR workflows. I spent the last few weeks building a structured knowledge base that turns generic LLM agents into domain experts. Here's what I learned.

The problem with RAG as most people do it

Everyone's doing RAG wrong. They dump PDFs into a vector DB, slap a similarity search on top, and wonder why the agent still gives garbage answers. The issue:

- No query classification (every question gets the same retrieval pipeline)
- No tiering (governance docs treated the same as blog posts)
- No budget (agent context window stuffed with irrelevant chunks)
- No self-healing (stale/broken docs stay broken forever)

What I built instead

A 4-tier KB pipeline:

- Governance tier — Always loaded. Agent identity, policies, rules. Non-negotiable context.
- Agent tier — Per-agent docs. Lucy (voice agent) gets call handling docs. Binky (CRO) gets conversion docs. Not everyone gets everything.
- Relevant tier — Dynamic per-query. Title/body matching, max 5 docs, 12K char budget per doc.
- Wiki tier — 200+ reference articles searchable via filesystem bridge. AI history, tool definitions, workflow patterns, platform comparisons.

The query classifier is the secret weapon

Before any retrieval happens, a regex-based classifier decides HOW MUCH context the question needs:

- DIRECT — "Summarize this text" → No KB needed. Just do it.
- SKILL_ONLY — "Write me a tweet" → Agent's skill doc is enough.
- HOT_CACHE — "Who handles billing?" → Governance + agent docs from memory cache.
- FULL_RAG — "Compare n8n vs Zapier pricing" → Full vector search + wiki bridge.

This alone cut my token costs ~40% because most questions DON'T need full RAG.
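The classify-before-retrieve idea can be sketched as a first-match regex router. The patterns below are invented examples (the post doesn't publish its actual regexes), but the four routes and the fall-through to FULL_RAG follow its description.

```python
import re

# Hypothetical patterns; the real classifier's regexes aren't public.
ROUTES = [
    ("DIRECT",     re.compile(r"^\s*(summarize|translate|rewrite)\b", re.I)),
    ("SKILL_ONLY", re.compile(r"\b(tweet|subject line|caption)\b", re.I)),
    ("HOT_CACHE",  re.compile(r"\b(who|policy|handles|responsible)\b", re.I)),
]

def classify(query):
    """Route a query to the cheapest context tier that can answer it;
    anything unmatched falls through to full vector search."""
    for route, pattern in ROUTES:
        if pattern.search(query):
            return route
    return "FULL_RAG"
```

Because routes are checked cheapest-first, a question only pays for full retrieval when nothing simpler matches, which is where the ~40% token saving claimed above would come from.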
The KB structure

Each article follows the same format:

- Clear title with scope
- Practical content (tables, code examples, decision frameworks)
- 2+ cited sources (real URLs, not hallucinated)
- 5 image reference descriptions
- 2 video references

I organized into domains:

- AI/ML foundations (18 articles) — history, transformers, embeddings, agents
- Tooling (16 articles) — definitions, security, taxonomy, error handling, audit
- Workflows (18 articles) — types, platforms, cost analysis, HIL patterns
- Image gen (115 files) — 16 providers, comparisons, prompt frameworks
- Video gen (109 files) — treatments, pipelines, platform guides
- Support (60 articles) — customer help center content

Self-healing

I built an eval system that scores KB health (0-100) and auto-heals issues:

- Missing embeddings → re-embed
- Stale content → flag for refresh
- Broken references → repair or remove

The score rose from 71 to 89 after the first heal pass.

What changed

Before the KB: agents would hallucinate tool definitions, make up pricing, give generic workflow advice. After: agents cite specific docs, give accurate platform comparisons with real pricing, and know when to say "I don't have current data on that." The difference isn't the model. It's the context.

Key takeaways if you're building something similar:

- Classify before you retrieve. Not every question needs RAG.
- Budget your context window. 60K chars total, hard cap per doc. Don't stuff.
- Structure beats volume. 200 well-organized articles > 10,000 random chunks.
- Self-healing isn't optional. KBs decay. Build monitoring from day one.
- Write for agents, not humans. Tables > paragraphs. Decision frameworks > prose. Concrete examples > abstract explanations.

Happy to answer questions about the architecture or share specific patterns that worked. submitted by /u/Buffaloherde
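The budget rules in the post (60K chars total, a hard per-doc cap, max doc count) can be sketched as a greedy packer over ranked docs. This is an illustration under the stated numbers, not the author's code; the function name is made up.

```python
def pack_context(docs, per_doc_chars=12_000, max_docs=5, total_chars=60_000):
    """Pack ranked docs (best first) into the prompt under three caps:
    hard per-doc truncation, a doc-count cap, and a total character
    budget. Returns the list of chunks actually sent to the agent."""
    packed, used = [], 0
    for doc in docs[:max_docs]:
        chunk = doc[:per_doc_chars]            # hard per-doc cap
        if used + len(chunk) > total_chars:    # total budget
            chunk = chunk[: total_chars - used]
        if not chunk:
            break
        packed.append(chunk)
        used += len(chunk)
    return packed
```

Because docs arrive ranked, truncation always sacrifices the tail of the least relevant material first, which is the "don't stuff" takeaway in executable form.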
Karpathy's autoresearch applied to debugging – two open-source skills
karpathy's autoresearch runs an AI agent in a loop: modify one file, measure one metric, keep or discard, git checkpoint, repeat. you sleep, it runs 100 experiments overnight. the thing that stuck with me wasn't the ML application - it was why the loop is safe to run unattended. four constraints: one file (bounded scope), one metric (deterministic decision), time-boxed experiments (can't get lost), git checkpoint (always reversible). remove any one and you need supervision. keep all four and you can walk away.

i realized the same pattern works for debugging. the normal way you debug a silent failure: fix the first thing that looks wrong, discover it wasn't the real cause, fix the next layer, repeat. hours chasing symptoms without reaching the bottom. so i built two claude code skills that apply karpathy's loop to bug fixing:

/rootcause - autonomous diagnosis. describe a symptom ("pipeline processed 1000 photos, found zero faces, no error"). it generates hypotheses ranked by probability, investigates the most likely one, confirms or eliminates it, narrows, repeats. max 10 rounds. read-only - never touches your code. i pointed it at a face detection bug. six rounds, found a timeout silently killing the process. i didn't read a single file myself.

/autofix - autonomous fix-and-verify. takes a root cause, designs a fix, writes validation tests before the fix (so the tests stay honest), applies it, runs the tests. if they fail, it re-diagnoses and tries a different approach. max 3 cycles. all on a temporary git branch - if nothing works, your code is exactly where it was.

they chain: rootcause finds the problem, autofix ships the fix. describe a symptom, walk away, come back to a verified fix or a clear report of what was tried. the constraints are the feature. one change at a time, one metric, git as the undo button. the agent has a narrow lane and a clear feedback signal - that's what makes it safe to run without watching.
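the four-constraint loop above can be sketched in a few lines of python. here the "git checkpoint" is modeled by keeping the last accepted state, the experiment is a toy 1-D objective, and the function names are made up - a sketch of the pattern, not karpathy's code.

```python
import random

def autoresearch_loop(state, mutate, score, rounds=100, seed=0):
    """Keep-or-discard loop: one bounded change per round, one
    deterministic metric, revert on regression. `best` plays the role
    of the git checkpoint - a regression is simply never kept."""
    rng = random.Random(seed)
    best, best_score = list(state), score(state)
    for _ in range(rounds):
        candidate = mutate(list(best), rng)   # one bounded change
        s = score(candidate)                  # one metric, deterministic
        if s > best_score:                    # keep it...
            best, best_score = candidate, s
        # ...otherwise discard: best is the checkpoint we revert to
    return best, best_score

# toy instance: maximize -(x - 3)^2 by nudging a single value.
score = lambda st: -(st[0] - 3.0) ** 2
mutate = lambda st, rng: [st[0] + rng.uniform(-0.5, 0.5)]
final, final_score = autoresearch_loop([0.0], mutate, score, rounds=200)
```

remove any constraint and the sketch breaks the same way the post describes: an unbounded mutate can wander anywhere, a fuzzy score makes keep/discard ambiguous, and without `best` there is nothing to revert to.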
open-sourcing both: /rootcause: https://github.com/ecstatic-pirate/rootcause /autofix: https://github.com/ecstatic-pirate/autofix copy SKILL.md to ~/.claude/skills/{name}/SKILL.md and they work as slash commands. submitted by /u/Thin-Currency9867
Repository Audit Available
Deep analysis of allegroai/clearml — architecture, costs, security, dependencies & more
Yes, ClearML offers a free tier.
Key features include: simplified Kubernetes and cloud deployment for hassle-free resource consumption, control and streamlining of AI workflows, maximized ROI, optimized resources, and simplified operations. ClearML is used by 2,100+ forward-thinking organizations worldwide.
Based on user reviews and social mentions, the most common pain point is token cost.
Based on 12 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.