Zerox
OCR & Document Extraction using vision models. Source repository: getomni-ai/zerox on GitHub.
A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all, and with weird layouts, tables, charts, etc., vision models just make sense. Zerox is available as both a Node and Python package.
Node.js SDK (supports vision models from different providers like OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, Google Gemini, etc.): The maintainFormat option tries to return the markdown in a consistent format by passing the output of a prior page in as additional context for the next page. This requires the requests to run sequentially, one page at a time, so it's a lot slower, but it is valuable if your documents have a lot of tabular data or frequently have tables that cross pages. Zerox supports structured data extraction from documents using a schema. This allows you to pull specific information from documents in a structured format instead of getting the full markdown conversion. Use extractPerPage to extract data per page instead of from the whole document at once. Zerox supports a wide range of models across different providers.
Python SDK (supports vision models from different providers like OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, etc.): The pyzerox.zerox function is an asynchronous API that performs OCR (Optical Character Recognition) to markdown using vision models. It processes PDF files and converts them into markdown format. Make sure to set up the environment variables for the model and the model provider before using this API. Refer to the LiteLLM Documentation for setting up the environment and passing the correct model name. Note the output is manually wrapped for this documentation for better readability.
This project is licensed under the MIT License.
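The trade-off behind maintainFormat (each page receives the previous page's output as context, so pages can no longer be processed in parallel) can be sketched in a few lines of Python. This is only an illustration of the idea, not Zerox's implementation: ocr_concurrent, ocr_maintain_format, and fake_ocr_page are hypothetical names, and fake_ocr_page stands in for a real vision-model call.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical callback shape: (page, prior_page_markdown) -> markdown.
PageOCR = Callable[[str, Optional[str]], Awaitable[str]]


async def ocr_concurrent(pages: List[str], ocr_page: PageOCR) -> List[str]:
    """Send every page at once; no page sees another page's output."""
    return list(await asyncio.gather(*(ocr_page(p, None) for p in pages)))


async def ocr_maintain_format(pages: List[str], ocr_page: PageOCR) -> List[str]:
    """Process pages one at a time, passing the previous output as context."""
    results: List[str] = []
    prior: Optional[str] = None
    for page in pages:
        # Each call must finish before the next starts, which is why
        # this mode is slower than the concurrent one.
        prior = await ocr_page(page, prior)
        results.append(prior)
    return results


async def fake_ocr_page(page: str, prior: Optional[str]) -> str:
    """Stand-in for a vision-model call (assumption, not a real API)."""
    await asyncio.sleep(0)
    context = "no context" if prior is None else "context from previous page"
    return f"# {page} ({context})"


if __name__ == "__main__":
    pages = ["page-1", "page-2", "page-3"]
    print(asyncio.run(ocr_concurrent(pages, fake_ocr_page)))
    print(asyncio.run(ocr_maintain_format(pages, fake_ocr_page)))
```

In the sequential variant only the first page runs without context; every later page can reuse the formatting of the one before it, which is what helps with tables that span pages.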
Dagster
Dagster is the data orchestrator platform that helps you build, schedule, and monitor reliable data pipelines - fast, flexible, and built for teams.
Dagster Labs is the organization behind Dagster, the open-source project, and Dagster Cloud. We’re a small, well-funded, and collegial team with a proven track record of shipping open-source software with global adoption. We are fortunate to be able to partner with some of the best venture capital investors in the business.
We are a team that is intrinsically driven and executes with fierce urgency. We think big, aim high, and are here to be the best at what we do. We value grit and resilience, and we persevere to get to the best outcome. We play to win and we do not mistake motion for progress, striving to quickly focus in on what really matters and avoid work about work.
We hold ourselves to high standards and trust each other to do the same. We do not believe that quality and velocity are at odds with each other, and taking our craft seriously means we can move fast with excellence. We do what we say we’re going to do. We work from first principles and solve fundamental problems. We provide continuous, direct, and thoughtful feedback to one another in order to improve. When failures happen, we learn from them as an opportunity to improve our future outcomes.
Our workplace should reflect the full diversity of interests, backgrounds, and ideas of all of our employees. We invest in creating experiences to foster meaningful connections and encourage everyone to connect genuinely with colleagues. Building is hard, and we believe it will be more sustainable and more fun when we engage authentically and inject some levity into our daily interactions.
We optimize for the group, the company, and not just for the individual. We have a mutual responsibility to support one another to succeed and multiply our impact beyond the sum of our individual parts. We sometimes put aside the work that’s most important within our focus area to help with higher-priority work in other areas.
We empower people to have sufficient context across the company to be able to work cross-functionally. We sometimes operate outside of our defined responsibility and never say that something is “not our job”. We act as owners, roll our sleeves up to pitch in, and fix problems and gaps that we see.
We started off as an OSS project. Our community has been with us the entire journey, and they are the reason Dagster Labs exists. The developer experience at Dagster Labs is everyone’s responsibility. We are dedicated to doing everything we can to improve developers' experience working with data platforms. This means that everyone is invested in our community, their success, and their sentiment towards our products.
Nick is the founder of Dagster Labs. Prior to that, he was a Principal Engineer and Director at Facebook from 2009 to 2017, where he founded the Product Infrastructure team and co-created GraphQL. Pete previously led teams at Twitter, co-founded Smyte, and was a member of the early React team at Facebook. Yuhan was a senior software engineer and tech lead o
Zerox
Pricing found: $50.10, $48.71, $48.71, $48.71, $9.74
Dagster
Pricing found: $10, $100, $120, $1200, $0.005