The announcement stated, “enterprises today work across a fragmented landscape of document formats, including PDFs, JPEGs, and other file types built primarily for human consumption rather than AI interpretation.”
As organizations increasingly rely on generative AI and agentic systems, it said, “this disconnect can introduce complexity, raise costs, and reduce reliability when extracting meaning from business documents.”
Mark Collier, executive director of LF AI & Data, said the goal of the DocLang Specification Working Group is to “develop a vendor-neutral, interoperable standard that helps organizations prepare document data for AI more reliably, transparently, and at scale.”
To that end, an information document released by the group stated, “PDF was built for print, DOCX was built for editors. DocLang is built for what comes next, a machine-readable document standard your models can actually trust.”

