Convert any URL to Markdown for better grounding LLMs.
User mentions and discussions suggest that "Jina Reader" excels at cutting-edge embedding compression and at providing high-quality multilingual embeddings, which makes it well regarded for model efficiency and performance. Users commend its ability to handle on-device and browser-based tasks efficiently with smaller models. However, there are minor concerns that embeddings could be inverted to reconstruct the original text, which raises privacy and security questions. Sentiment around pricing seems neutral, as most social mentions emphasize technical features and improvements rather than costs. Overall, Jina Reader has a strong reputation for innovation and technical performance in its domain.
Mentions (30d): 0
Reviews: 0
Platforms: 3
Sentiment: 19% (5 positive)
Industry: information technology & services
Employees: 43
Funding Stage: Merger / Acquisition
Total Funding: $32.0M
Convert your embeddings to spherical coordinates before compression: this trick cuts embedding storage from 240 GB to 160 GB, 25% better than the best lossless baseline. Reconstruction is near-lossless because the error stays below float32 machine epsilon, so retrieval quality is preserved. Works across text, image, and multi-vector embeddings. No training, no codebooks.
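The post doesn't say exactly how the spherical transform is done or which compressor follows it. As a minimal sketch, assuming the standard n-dimensional (hyperspherical) coordinates, the conversion and its inverse look like this; `to_spherical` and `from_spherical` are hypothetical helper names, and the compression step itself is left out.

```python
import math
from typing import List, Tuple


def to_spherical(x: List[float]) -> Tuple[float, List[float]]:
    """Cartesian vector (n >= 2) -> (radius, n - 1 angles)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least 2 dimensions")
    # tail[i] holds x[i]^2 + ... + x[n-1]^2
    tail = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + x[i] * x[i]
    r = math.sqrt(tail[0])
    # The first n - 2 angles lie in [0, pi]
    angles = [math.atan2(math.sqrt(tail[i + 1]), x[i]) for i in range(n - 2)]
    # The last angle lies in (-pi, pi] and carries the sign of x[n-1]
    angles.append(math.atan2(x[n - 1], x[n - 2]))
    return r, angles


def from_spherical(r: float, angles: List[float]) -> List[float]:
    """Inverse of to_spherical."""
    x = []
    scale = r  # running product r * sin(a0) * ... * sin(a_{i-1})
    for a in angles:
        x.append(scale * math.cos(a))
        scale *= math.sin(a)
    x.append(scale)
    return x


if __name__ == "__main__":
    v = [0.3, -1.2, 0.5, 0.0, 2.0, -0.7]
    r, angles = to_spherical(v)
    back = from_spherical(r, angles)
    print("max round-trip error:", max(abs(a - b) for a, b in zip(v, back)))
```

For unit-normalized embeddings the radius is always 1, so only the n - 1 angles need storing. One plausible reason the trick helps is that bounded angle values have more regular float bit patterns for a generic lossless compressor to exploit, but that is an interpretation, not something the post states. This sketch round-trips in float64 and does not reproduce the post's float32 precision claim.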
Anthropic just confirmed why 90% of non-coding AI agents fail in production
Anthropic recently published an incredibly deep analysis of millions of real human-agent tool calls across their public API, including a breakdown of where these agents are being deployed. They said "Software engineering makes up roughly 50% of all agentic activity on their platform". Everything else (sales, marketing, finance, legal) sits down in the single digits.

A lot of the initial commentary around this has been along the lines of: "Oh, look, AI agents only work for coding. They haven't cracked the rest of the enterprise yet." But if you've tried to build and deploy an autonomous agent in a non-coding environment, you know that is the wrong conclusion. The models are more than capable; the real problem is that software engineering data is clean, while real-world business data is a horrific, unorganized mess. Think about it:

Why Coding is Easy for Agents: Code lives in structured Git repos. It follows strict syntax rules, has clear docs, and runs inside deterministic terminals. If an agent breaks something, the compiler throws a clean error message telling it exactly what went wrong.

Why the Rest of the World is Hard: A sales or marketing agent doesn't get a clean GitHub repo. Instead, it is constantly dealing with changing information like competitor pricing, and with badly formatted data. When a non-coding agent fails, it's almost never because the model lost its ability to reason, but because it gets choked by unstructured web data that fills its context window with thousands of useless tags and tracking scripts until it hallucinates.

The developers getting agents to work in those low-percentage brackets on Anthropic's chart (like automated market research or live CRM routing) usually spend most of their time on the boring infra work behind the scenes, such as clean inputs and reliable scraping, and that's the part that really makes the difference.
If you look at a modern, high-reliability agent stack outside of coding, it usually relies on three things:

The Core Reasoner: Something fast with a massive context window, like Claude Sonnet, to handle the logic.

Data Hygiene at the Gateway: Instead of letting the agent scrape raw web URLs directly (which triggers bot blocks and feeds it raw HTML that has to be cleaned up), developers route internet data through dedicated markdown converters. Tools like Firecrawl or Jina Reader are pretty standard here, and the agent gets pure text, which saves token costs and cuts down on hallucinations.

The Guardrail Layer: Traditional code hooks or rules engines that check the agent's output before it executes an irreversible action (like sending an email or updating a database record).

The low adoption numbers in the rest of the enterprise don't mean agents are overhyped. In most industries the surrounding tooling still kind of sucks, so once the data side gets more reliable, you'll probably see adoption spread a lot faster outside engineering.

What are your thoughts on this? For those building agents in finance, marketing, or operations, I would love to hear from you!

submitted by /u/Loud-Campaign-6312
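The guardrail layer described above can be as simple as a deterministic check that runs between the model's proposed action and its execution. A minimal sketch, assuming the agent emits actions as a name plus string arguments; the action names, the email-domain allowlist, and both rules are hypothetical examples, not part of any specific framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# Hypothetical action names; which ones count as irreversible is a policy choice.
IRREVERSIBLE_ACTIONS = {"send_email", "update_record"}
ALLOWED_EMAIL_DOMAINS = {"example.com"}  # hypothetical allowlist


@dataclass
class Action:
    name: str
    args: Dict[str, str] = field(default_factory=dict)


@dataclass
class Verdict:
    allowed: bool
    violations: List[str] = field(default_factory=list)


# A rule inspects an action and returns zero or more violation messages.
Rule = Callable[[Action], List[str]]


def email_domain_rule(action: Action) -> List[str]:
    if action.name != "send_email":
        return []
    to = action.args.get("to", "")
    domain = to.rsplit("@", 1)[-1].lower() if "@" in to else ""
    if domain not in ALLOWED_EMAIL_DOMAINS:
        return [f"recipient domain {domain!r} is not allowlisted"]
    return []


def record_id_rule(action: Action) -> List[str]:
    if action.name == "update_record" and not action.args.get("record_id"):
        return ["update_record requires an explicit record_id"]
    return []


RULES: Sequence[Rule] = (email_domain_rule, record_id_rule)


def check(action: Action, rules: Sequence[Rule] = RULES) -> Verdict:
    """Gate run before execution; reversible actions pass straight through."""
    if action.name not in IRREVERSIBLE_ACTIONS:
        return Verdict(allowed=True)
    violations = [msg for rule in rules for msg in rule(action)]
    return Verdict(allowed=not violations, violations=violations)
```

The point of keeping these checks in plain code rather than in the prompt is that they behave identically on every call, no matter how the model phrases its output.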
I measured my Claude Code MCP stack on two axes: byte savings AND cache-friendliness. My "best" byte-saver was defeating Anthropic's prompt cache (counter-example + open benchmark)
TL;DR: Single-axis benchmarks for MCPs, compressors, and retrieval layers can recommend a system that's strictly worse in production. The missing axis is cache-friendliness: whether the same input produces byte-identical output across runs, so that Anthropic's prompt cache hits.

In my coding-agent stack, my biggest byte-saver (a retrieval MCP, 60–70% reduction) was defeating the 5-min TTL prompt cache on every call. Two runs of the same query produced different bytes because the output order of rg --files-with-matches leaked through a Map's insertion sequence into the final context. The fix was 2 lines: sort the rg hits before slicing, and sort the Map entries by path. Byte savings were unchanged, and cache_friendly_score went from ~0% to 100%.

Article + open benchmark harness:
Article: https://gregshevchenko.com/research/mcp-stack-token-economy/
Harness (stdlib-only Python, offline): https://github.com/g-shevchenko/mcp-token-savers (see methods/ for formal definitions, cluster-bootstrap CIs, Wilson CIs, preregistration, real-data Cohen's κ).

What the harness measures:
mean_ratio + CV across N≥5 runs per fixture → byte-saving axis
unique_md5_count == 1 check → cache-friendliness axis (0–100%)
12-anti-pattern audit on tool definitions (DSA reference)

What named alternatives publicly disclose: I surveyed the public docs for Cursor codebase index, Sourcegraph Cody, Aider repo-map, Microsoft LLMLingua / LLMLingua-2, Firecrawl / Jina Reader, RouteLLM / Martian (May 2026).

Limitations: I hypothesized that the prep layer triggers more downstream cache hits on subsequent turns. It didn't reach significance: Welch p=0.32, Cohen's d ≈ 0.18, N=137.
Two-judge Cohen's κ on the corpus (cerebras-llama × groq-llama, N=25): κ = 0.5955 (moderate, below the 0.7 substantial threshold). 4 of 5 inter-judge disagreements concentrate on one task with an ambiguous acceptance criterion; sharpening the spec would push κ to ~0.83.

Disclosure: I'm the author. No commercial affiliation with the listed tools. The harness is MIT-licensed and takes any compressor as (str) -> str. Curious what cache_friendly_score looks like on others' Claude Code stacks.

submitted by /u/Level_Credit1535
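The two-line fix can be illustrated with a small Python analogue, where a dict stands in for the Map since both preserve insertion order. This is a sketch, not the harness's code: `build_context`, the `## path` layout, and averaging the per-fixture check into a percentage for `cache_friendly_score` are assumptions; only the `unique_md5_count == 1` criterion comes from the post.

```python
import hashlib
from typing import Dict, List, Sequence


def cache_friendly(outputs: Sequence[str]) -> bool:
    """True when every run hashed to the same digest (unique_md5_count == 1)."""
    digests = {hashlib.md5(o.encode("utf-8")).hexdigest() for o in outputs}
    return len(digests) == 1


def cache_friendly_score(runs: Dict[str, Sequence[str]]) -> float:
    """Share of fixtures, in percent, whose runs were all byte-identical."""
    ok = sum(1 for outputs in runs.values() if cache_friendly(outputs))
    return 100.0 * ok / len(runs)


def build_context_unsorted(hits: List[str], snippets: Dict[str, str], limit: int) -> str:
    # Bug: hit order comes straight from the search tool and may vary
    # between runs, so both the slice and the emitted order can change.
    chosen = {path: snippets[path] for path in hits[:limit]}
    return "\n".join(f"## {p}\n{body}" for p, body in chosen.items())


def build_context(hits: List[str], snippets: Dict[str, str], limit: int) -> str:
    # Fix 1: sort the hits before slicing so the same hit set picks the same files.
    chosen = {path: snippets[path] for path in sorted(hits)[:limit]}
    # Fix 2: emit entries sorted by path instead of insertion order.
    return "\n".join(f"## {p}\n{chosen[p]}" for p in sorted(chosen))
```

Fed the same hits in two different orders, `build_context` yields identical bytes while `build_context_unsorted` does not. Sorting by path before slicing does change which files survive the cut compared with arrival order, which is harmless here because --files-with-matches output carries no relevance ranking.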
How to save 80% on your claude bill with better context
been building web apps with claude lately, and those token limits have honestly started hitting me too. i'm using claude 4.6 sonnet for a research tool, but feeding it raw web data was absolutely nuking my limits. here's the stuff that actually worked for me to save tokens and keep the bill down:

- switch to markdown first. stop sending raw html. use tools like firecrawl to strip out the nested divs and script junk so you only pay for the actual text.
- don't let your prompt cache go cold. anthropic's prompt caching is a huge relief, but it only works if your data is consistent.
- watch out for the 200k token "premium" jump. anthropic now charges nearly double for inputs over 200k tokens on the new opus/sonnet 4.6 models. keep your context under that limit to avoid the surcharge.
- strip the nav and footer. the website's "about us" and "careers" links in the footer are just burning your money every time you hit send.
- use jina reader for quick hits. for simple single-page reads, jina is a great way to get a clean text version without the crawler bloat.
- truncate your context. if a documentation page is 20k words, just take the first 5k. most of the "meat" is usually at the top anyway.
- clean your data with unstructured. if you are dealing with messy pdfs alongside web data, this helps turn the chaos into a clean schema claude actually understands.
- map before you crawl. don't scrape every subpage blindly. i use the map feature in firecrawl to find the specific documentation urls that actually matter for the prompt; if you use another tool, look for an equivalent.
- use haiku for the "trash" work. use claude 4.5 haiku to summarize or filter data before feeding it into the expensive models like opus.
- use smart chunking. use llama-index to break your data into semantic chunks so you only retrieve the exact paragraph the ai needs for that specific prompt.
- cap your "extended thinking" depth. for opus 4.6, set thinking: {type: "adaptive"} with effort: "low" or "medium". the old budget_tokens param is deprecated on 4.6. thinking tokens are billed at the output rate, so if you leave effort on high, claude thinks hard on every single reply, including the simple ones, and your bill will hurt.
- set hard usage limits. set your spending tiers in the anthropic console so a buggy loop doesn't drain your bank account while you're asleep.

feel free to roast my setup or add better tips if you have them.

submitted by /u/No-Writing-334
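Tips like "switch to markdown first", "strip the nav and footer", and "truncate your context" can be approximated without any external service. A minimal sketch using only Python's standard html.parser, assuming that dropping everything inside script, style, noscript, nav, and footer elements is acceptable for your pages. It extracts plain text rather than true Markdown and is not how Firecrawl or Jina Reader work internally; `html_to_text` and `truncate_words` are hypothetical helpers, and word counts stand in for token counts.

```python
from html.parser import HTMLParser
from typing import List

# Elements whose entire contents are dropped.
SKIP_TAGS = {"script", "style", "noscript", "nav", "footer"}


class TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into text
        self._skip_depth = 0  # >0 while inside a skipped element
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible, non-empty text outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def truncate_words(text: str, max_words: int) -> str:
    """Keep the first max_words words (the post's "take the first 5k")."""
    return " ".join(text.split()[:max_words])
```

The skip counter assumes well-formed markup: an unclosed nav or footer would suppress the rest of the page, which is one reason to prefer a dedicated converter in production.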
@ChiragCX Oh man we thought Skills were the dead ones
Our official CLI for agents https://t.co/XLhRvLRuDc https://t.co/wFtN8i9YcA
The trend toward smaller embeddings is a shift. On-device retrieval, browser-based search, and edge deployment all demand models that fit in constrained memory budgets. Learn more about Small & Nano below: - blog post: https://t.co/M8RJp2pczh - 🤗 weights including GGUFs and MLX: https://t.co/IwpUK9SzAV - arXiv: https://t.co/AsTenf1XDt
v5-text uses decoder-only backbones with last-token pooling instead of mean pooling. Four lightweight LoRA adapters are injected at each transformer layer, handling retrieval, text-matching, classification, and clustering independently. Users select the appropriate adapter at inference time. For retrieval, queries get a "Query:" prefix and documents get "Document:". Context length is 32K tokens, a 4x increase over v3.
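To make the pooling difference concrete, here is a minimal pure-Python sketch of mean pooling versus last-token pooling over per-token hidden states, plus the query/document prefixing described above. It does not load the actual model or its LoRA adapters; the function names, the list-of-lists layout for hidden states, and the single space after each prefix are assumptions.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Average the hidden states of all non-padding tokens."""
    kept = [h for h, m in zip(hidden, mask) if m]
    return [sum(col) / len(kept) for col in zip(*kept)]


def last_token_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Return the hidden state of the last non-padding token."""
    last = max(i for i, m in enumerate(mask) if m)
    return list(hidden[last])


def with_task_prefix(text: str, role: str) -> str:
    # Prefix strings follow the description above; the separator is assumed.
    prefixes = {"query": "Query: ", "document": "Document: "}
    return prefixes[role] + text
```

Taking the largest unmasked index works for both left- and right-padded batches. In a decoder-only model, the last token is the only position that has attended to the whole input under the causal mask, which is the usual rationale for last-token pooling.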
MMTEB (131 multilingual tasks): v5-small (677M) hits 67.0, next best sub-1B is 64.3. +2.7pt gap. MTEB English (41 tasks): v5-small leads at 71.7. v5-nano (239M) scores 71.0 -- matching models 2x its size. Retrieval (5 benchmarks): v5-small at 63.28 matches v4 (3.8B) while being 5.6x smaller. The nano model at 239M params has no peer in its weight class.
jina-embeddings-v5-text is here! Our fifth generation of jina embeddings, pushing the quality-efficiency frontier for sub-1B multilingual embeddings. Two versions: small & nano, available today on Elastic Inference Service, vLLM, GGUF and MLX. https://t.co/68GGuBRdy4
@tmztmobile It will be a lossy compression, like impressionist lossy
Check out the live demo https://t.co/W1EXpDFCAL and see it in action. Or read our repo and paper for more technical details on training and decoding.
Text embeddings are widely assumed to be safe, irreversible representations. We show we can reconstruct the original text using conditional masked diffusion. Existing inversions (Vec2Text, ALGEN, Zero2Text) generate tokens autoregressively and require iterative re-embedding through the target encoder. We take a different approach: embedding inversion as conditional masked diffusion. Starting from a fully masked sequence, a denoising model reveals tokens at all positions in parallel, conditioned on the target embedding via adaptive layer normalization (AdaLN-Zero). Each denoising step refines all positions simultaneously using global context, without ever re-embedding the current hypothesis.
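The decoding procedure can be sketched as a loop that starts fully masked, asks a denoiser for predictions at every position in parallel, and reveals the most confident masked positions each step. This is one common schedule (confidence-based unmasking in the style of MaskGIT), not the authors' method: the post doesn't say how many tokens are revealed per step or whether revealed tokens can be re-masked, and the `Denoiser` interface is hypothetical. AdaLN-Zero conditioning would live inside the denoiser and is not modeled.

```python
import math
from typing import Callable, List, Sequence, Tuple

MASK = "[MASK]"

# Hypothetical interface: given the partially revealed sequence and the target
# embedding, propose a (token, confidence) pair for every position at once.
Denoiser = Callable[[List[str], Sequence[float]], List[Tuple[str, float]]]


def invert_embedding(embedding: Sequence[float], length: int,
                     denoise: Denoiser, steps: int = 4) -> List[str]:
    if steps < 1:
        raise ValueError("steps must be >= 1")
    tokens = [MASK] * length  # start from a fully masked sequence
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        proposals = denoise(tokens, embedding)  # all positions in parallel
        # Reveal an even share of what is still masked; the last step reveals the rest.
        k = math.ceil(len(masked) / (steps - step))
        best = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = proposals[i][0]
    return tokens
```

The loop never passes the current hypothesis back through the target encoder; the embedding is only a condition for the denoiser, which is the contrast the post draws with Vec2Text-style iterative re-embedding.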
Most don't know (1) how easy it is to invert embedding vectors back into sentences, and (2) that this is a perfect task for text diffusion models. Here's a 78M-parameter model and live demo that recovers 80% of tokens from Qwen3-Embedding and EmbeddingGemma vectors. Works even on multilingual input.
@Prince_Canuma @liquidai @deepseek_ai @Alibaba_Qwen @allen_ai @TencentHunyuan @PaddlePaddle 🔥
Yes, Jina Reader offers a free tier. The pricing model is freemium + tiered.
Key features include: URL to Markdown conversion, Supports multiple content types (HTML, PDF, etc.), Automatic extraction of images and links, Customizable Markdown templates, Batch processing for multiple URLs, Integration with popular LLMs for enhanced grounding, User-friendly API for developers, Real-time content updates.
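A minimal sketch of the URL-to-Markdown call from Python, assuming Reader's publicly documented pattern of prefixing the page URL with https://r.jina.ai/. The optional `JINA_API_KEY` environment variable and Bearer `Authorization` header are assumptions to check against Jina's current docs, as are free-tier rate limits; `reader_url` and `fetch_markdown` are hypothetical helper names.

```python
import os
import urllib.request

READER_PREFIX = "https://r.jina.ai/"


def reader_url(target: str) -> str:
    """Build the Reader endpoint by prefixing the page URL."""
    return READER_PREFIX + target


def fetch_markdown(target: str, timeout: float = 30.0) -> str:
    req = urllib.request.Request(reader_url(target))
    api_key = os.environ.get("JINA_API_KEY")  # assumed variable name
    if api_key:
        # Assumed auth scheme; check Jina's docs for the current header format.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Calling `fetch_markdown("https://example.com")` needs network access and returns whatever text the service sends back; `reader_url` alone can be used to hand the endpoint to another HTTP client.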
Jina Reader is commonly used for: Creating documentation from web content, Generating blog posts from articles, Enhancing data for AI training models, Building knowledge bases from online resources, Converting research papers into Markdown format, Facilitating content migration to Markdown-based platforms.
Jina Reader integrates with: OpenAI API, Hugging Face Transformers, Slack for team collaboration, GitHub for version control, Zapier for automation workflows, Notion for content management, WordPress for blog publishing, Google Drive for file storage, Microsoft Teams for communication, Trello for project management.
Based on user reviews and social mentions, the most common pain point is token cost.
Based on 27 social mentions analyzed, 19% of sentiment is positive, 81% neutral, and 0% negative.