PDF Forms Explained: AcroForms vs XFA

You download a PDF tax form, open it in your browser's built-in viewer, and the fields just work — click, type, save. The next form you open shows a stern message: "Please open this with Adobe Acrobat Reader." Same file extension, same icon, completely different behavior. The reason is that "PDF form" actually refers to two unrelated technologies stuffed inside the same .pdf container, and only one of them is universally supported.

This is the AcroForm vs XFA split. Knowing which one you have changes how you fill, flatten, extract, and automate forms. Here is what each is, why XFA exists at all, and why it is on its way out.

The Two PDF Form Worlds

PDF has had interactive forms since version 1.2 (1996). Adobe called this technology AcroForm, short for "Acrobat Form." Fields are stored as PDF objects (text fields, checkboxes, radio buttons, dropdowns, signatures) attached directly to pages. Virtually every mainstream PDF viewer and library (Preview, Chrome, Firefox, mobile browsers, server-side libraries) supports AcroForms to some degree.

In 2002, Adobe acquired Accelio (formerly JetForm) and inherited a separate form technology called XFA (XML Forms Architecture). XFA is XML-based and supports dynamic layouts, JavaScript-driven validation, and server round-trips. Adobe wedged it into PDF starting with PDF 1.5 (2003) by adding an optional /XFA entry. The PDF wrapper became, in effect, a delivery vehicle for an XML form that only Adobe's full Reader could render.

The catch: ISO 32000-2 (PDF 2.0, published 2017) deprecates XFA, so it is no longer part of the PDF standard going forward. Most browser viewers never supported it. Modern Adobe Reader still does, but Adobe itself has been signaling its end for years.

flowchart TB
  CAT[/PDF Catalog/]
  CAT --> AF[/AcroForm dictionary/]
  AF --> F[/Fields array/]
  F --> W1[Widget annot<br/>FT=Tx text]
  F --> W2[Widget annot<br/>FT=Btn checkbox]
  F --> W3[Widget annot<br/>FT=Ch dropdown]
  F --> W4[Widget annot<br/>FT=Sig signature]
  AF --> XFA[/XFA optional/]
  XFA --> XML1[preamble XML]
  XFA --> XML2[config XML]
  XFA --> XML3[template XML<br/>layout + bindings]
  XFA --> XML4[datasets XML]
  XFA --> XML5[form XML]

What an AcroForm Looks Like Inside the PDF

If you crack open a PDF with AcroForms in a hex editor or a tool like qpdf --qdf, you will see something like this in the catalog:

1 0 obj
<<
  /Type /Catalog
  /Pages 2 0 R
  /AcroForm <<
    /Fields [10 0 R 11 0 R 12 0 R]
    /NeedAppearances true
  >>
>>
endobj

Each field is its own object. A text field looks roughly like:

10 0 obj
<<
  /Type /Annot
  /Subtype /Widget
  /FT /Tx
  /T (full_name)
  /V (Jane Doe)
  /Rect [72 700 300 720]
  /P 3 0 R
>>
endobj

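Plain dictionary entries. /FT /Tx means "field type: text." /T is the field name, /V is the current value. Checkboxes use /FT /Btn with /V set to the on-state name (commonly /Yes) or /Off. Dropdowns use /FT /Ch. In simple forms the field model is a flat list of widgets that any reader can walk and render; larger forms can nest fields under a parent through /Kids, but each node is still an ordinary dictionary.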

That simplicity is why pdf-lib, PyPDF2, iText, PDFBox, and Apple's PDFKit all handle AcroForms cleanly.
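To make the "walk the fields" idea concrete, here is a minimal sketch that flattens a field tree into name, type, and value rows. It assumes the field dictionaries have already been parsed into plain JavaScript objects whose keys mirror the PDF entries (T, FT, V, Kids); that object shape and the listFields helper are illustrative, not part of any library.

```javascript
// Hypothetical plain-object stand-ins for parsed field dictionaries.
// Keys mirror the PDF entries: T (partial name), FT (type), V (value), Kids.
const FIELD_TYPES = { Tx: 'text', Btn: 'button', Ch: 'choice', Sig: 'signature' };

function listFields(fields, parentName = '', inherited = {}) {
  const out = [];
  for (const field of fields) {
    // A child's full name is the parent's name, a period, then its own /T.
    const name = parentName && field.T ? `${parentName}.${field.T}` : (field.T ?? parentName);
    // /FT and /V are inheritable: a child without them uses the parent's.
    const ft = field.FT ?? inherited.FT;
    const value = field.V ?? inherited.V;
    // Kids that carry a /T are child fields; kids without one are just widgets.
    const childFields = (field.Kids ?? []).filter((kid) => kid.T !== undefined);
    if (childFields.length > 0) {
      out.push(...listFields(childFields, name, { FT: ft, V: value }));
    } else {
      out.push({ name, type: FIELD_TYPES[ft] ?? 'unknown', value });
    }
  }
  return out;
}
```

Calling listFields on a parent named address with children city and zip would yield rows named address.city and address.zip, which is the dotted form most libraries expose as the field name.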

What an XFA Form Looks Like Inside the PDF

XFA hijacks the same /AcroForm slot but stuffs an XML payload into it:

1 0 obj
<<
  /Type /Catalog
  /AcroForm <<
    /Fields []
    /XFA [
      (preamble) 20 0 R
      (config) 21 0 R
      (template) 22 0 R
      (datasets) 23 0 R
      (form) 24 0 R
    ]
  >>
>>
endobj

/Fields is empty. The actual form lives in those XML streams. The template stream contains the layout in XFA's own dialect:

<template xmlns="http://www.xfa.org/schema/xfa-template/3.6/">
  <subform name="form1" layout="tb">
    <field name="full_name">
      <ui><textEdit/></ui>
      <bind match="dataRef" ref="$record.fullName"/>
    </field>
  </subform>
</template>

Notice what is happening here: the form is no longer a list of widgets at fixed coordinates. It is a tree of subforms with binding expressions, scripts, and a layout engine. To render it, your reader needs an XFA processor — basically a mini browser. That is why only Adobe Acrobat (and a couple of expensive enterprise products) can open these forms correctly.
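The /XFA array shown earlier alternates packet names and stream references. A small sketch of pairing them up follows; it assumes the array has already been read into plain strings (a real parser would hand back PDF string and reference objects), and pairXfaPackets is a hypothetical helper. Note that /XFA can also be a single stream rather than an array, which this sketch does not handle.

```javascript
// Turn the alternating [name, ref, name, ref, ...] /XFA array into a Map.
// Inputs are plain strings here, which is an assumption of this sketch.
function pairXfaPackets(xfaArray) {
  if (xfaArray.length % 2 !== 0) {
    throw new Error('Expected /XFA array of name/stream pairs');
  }
  const packets = new Map();
  for (let i = 0; i < xfaArray.length; i += 2) {
    packets.set(xfaArray[i], xfaArray[i + 1]);
  }
  return packets;
}

const packets = pairXfaPackets([
  'preamble', '20 0 R', 'config', '21 0 R', 'template', '22 0 R',
  'datasets', '23 0 R', 'form', '24 0 R',
]);
console.log(packets.get('template')); // 22 0 R
```

With the packets keyed by name, code that only cares about submitted values can go straight to datasets and ignore the layout in template.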

There are two flavors of XFA: static (looks like a regular PDF, fixed pages) and dynamic (pages grow and shrink as you fill in repeating sections, like adding rows for dependents on a tax form). Dynamic XFA is the one that absolutely refuses to render anywhere except Adobe Reader.

How to Tell Which Type You Have

Open the PDF in Chrome or Preview. If the fields are clickable and you can type into them, it is an AcroForm. If you see a gray placeholder page that says "To view the full contents of this document, you need a later version of the PDF viewer," it is XFA.

Programmatic detection is straightforward — check the catalog:

import fs from 'node:fs';
import { PDFDocument, PDFDict, PDFName } from 'pdf-lib';

const bytes = await fs.promises.readFile('form.pdf');
const pdf = await PDFDocument.load(bytes);

// Read the catalog directly; lookupMaybe returns undefined when the key is absent.
const acroForm = pdf.catalog.lookupMaybe(PDFName.of('AcroForm'), PDFDict);
const xfa = acroForm?.lookup(PDFName.of('XFA'));

if (xfa) {
  console.log('XFA form (or hybrid): limited tooling support');
} else if (acroForm) {
  console.log('AcroForm: work normally');
} else {
  console.log('No interactive fields');
}

Many "hybrid" forms include both: a basic AcroForm fallback plus a richer XFA layer. Adobe Reader uses the XFA layer; everything else falls back to AcroForm. If you see both, prefer the AcroForm fields, since they are what anyone on a non-Adobe viewer will actually interact with.
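As a dependency-free first pass, before loading any PDF library, you can scan the raw bytes for the name tokens. This is only a heuristic sketch: it assumes the catalog and form dictionary are stored uncompressed, so a PDF that keeps them inside an object stream will not match, and a stray /XFA elsewhere in the file would be a false positive. sniffFormTokens is a hypothetical helper name.

```javascript
// Heuristic pre-check: look for the /AcroForm and /XFA name tokens in raw bytes.
// Assumption: the relevant dictionaries are not inside a compressed object
// stream. A false result therefore means "not seen", not "definitely absent".
function sniffFormTokens(bytes) {
  const text = Buffer.from(bytes).toString('latin1');
  return {
    acroForm: /\/AcroForm(?![A-Za-z0-9])/.test(text),
    xfa: /\/XFA(?![A-Za-z0-9])/.test(text),
  };
}
```

Treat a positive hit as a reason to run the full catalog check, and a miss as inconclusive.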

flowchart TB
  IN[Incoming PDF] --> CHK{Inspect catalog}
  CHK -- "/Fields only" --> AF[AcroForm path]
  CHK -- "/Fields + /XFA" --> HY[Hybrid form]
  CHK -- "/XFA, empty /Fields" --> XF[Pure XFA]
  CHK -- neither --> NO[No fields]
  AF --> WORK[Fill / read / flatten<br/>with pdf-lib or PyPDF2]
  HY --> AF
  XF --> ROUTE{Route}
  ROUTE -- "convert" --> CONV[XFA -> AcroForm<br/>via commercial tool]
  CONV --> WORK
  ROUTE -- "human" --> ADOBE[Adobe Reader workflow]
  WORK --> OUT[Flattened PDF]
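The branching in the flowchart can be sketched as two small pure functions. The input shape (hasAcroForm, fieldCount, hasXfa) and the route labels are assumptions made for illustration; they stand for whatever your catalog inspection step produces.

```javascript
// Classify a form from three facts read off the catalog (hypothetical shape).
function classifyForm({ hasAcroForm, fieldCount, hasXfa }) {
  if (!hasAcroForm) return 'none';
  if (hasXfa) return fieldCount > 0 ? 'hybrid' : 'xfa';
  return fieldCount > 0 ? 'acroform' : 'none';
}

// Map each class to a processing path. Hybrid forms take the AcroForm path.
function routeFor(kind) {
  switch (kind) {
    case 'acroform':
    case 'hybrid':
      return 'fill-with-library';
    case 'xfa':
      return 'convert-or-human';
    default:
      return 'no-fields';
  }
}
```

Keeping classification separate from routing makes it easy to log the detected type even when the routing policy changes.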

Why XFA Was Created (and Why It Failed)

XFA solved real problems. Government agencies and financial institutions wanted forms that could:

Grow dynamically — add a row for each dependent, expand to a third page if needed
Validate complex business rules — Social Security number checksums, cross-field math, conditional sections
Round-trip data — submit XML to a backend, receive prefilled XML back
Embed scripted logic — JavaScript and FormCalc expressions

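AcroForms of that era covered only part of this list: they had JavaScript actions and could submit field data to a URL, but their layout was fixed and anything beyond simple validation had to be hand-scripted. XFA handled all four natively. So tax authorities, insurers, and banks adopted it heavily.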

Then the web caught up. Browser-based forms with HTML, JavaScript, and a backend API do everything XFA does, with no proprietary runtime, no Adobe Reader install, and no vendor lock-in. Mobile browsers never bothered to implement XFA. Apple's PDFKit ignores it. Chromium ignores it. Even Adobe's own modern web-based Acrobat is steering customers toward AcroForms or pure HTML forms.

ISO 32000-2 formalized the divorce in 2017 by deprecating XFA. New PDFs should not use it. Old XFA forms still exist by the millions, and they will keep existing for decades because government bureaucracy moves slowly, but the platform is frozen.

Filling Forms Programmatically

For AcroForms, the developer experience is excellent. Here is a complete fill-and-flatten using pdf-lib:

import fs from 'node:fs';
import { PDFDocument } from 'pdf-lib';

const pdfBytes = await fs.promises.readFile('input.pdf');
const pdf = await PDFDocument.load(pdfBytes);
const form = pdf.getForm();

// Each typed getter throws if the field is missing or has another type.
form.getTextField('full_name').setText('Jane Doe');
form.getCheckBox('subscribe').check();
form.getDropdown('country').select('US');

// Bake values into the page content and drop the interactive widgets.
form.flatten();

const out = await pdf.save();
await fs.promises.writeFile('output.pdf', out);

flatten() bakes the field values into the page content stream and removes the interactive widgets. The resulting PDF is no longer editable, which is what you want for archival, signed contracts, or anything you mail to a regulator.
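Because a fill script fails on the first missing or mistyped field, it can help to validate the input record against the form's field list before touching the PDF. The sketch below is library-independent: the descriptor shape ({ name, type, options }) and the planFill helper are hypothetical, and the op names simply echo the pdf-lib methods used above.

```javascript
// Hypothetical descriptors: one { name, type, options? } per AcroForm field.
// Returns the operations to apply plus every problem found, in one pass.
function planFill(descriptors, values) {
  const byName = new Map(descriptors.map((d) => [d.name, d]));
  const ops = [];
  const errors = [];
  for (const [name, value] of Object.entries(values)) {
    const field = byName.get(name);
    if (!field) {
      errors.push(`unknown field: ${name}`);
    } else if (field.type === 'text' && typeof value === 'string') {
      ops.push({ name, op: 'setText', value });
    } else if (field.type === 'checkbox' && typeof value === 'boolean') {
      ops.push({ name, op: value ? 'check' : 'uncheck' });
    } else if (field.type === 'dropdown' && field.options?.includes(value)) {
      ops.push({ name, op: 'select', value });
    } else {
      errors.push(`bad value for ${name}`);
    }
  }
  return { ops, errors };
}
```

If errors is non-empty, reject the submission before filling; otherwise apply the ops in order and flatten once at the end.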

For XFA, the picture is grim. pdf-lib cannot fill XFA fields. iText supports XFA in its commercial edition. PDFBox has partial support. The realistic options for handling XFA at scale:

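Convert XFA to AcroForm using a commercial server-side library (a tool like ABCpdf). You lose the dynamic layout but gain universal compatibility. Note that qpdf is not a converter: it rewrites PDF structure but does not render XFA, and its --remove-unreferenced-resources option only prunes unused page resources.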
Render and re-fill — render the XFA form to a flat PDF, then add new AcroForm widgets on top. Crude but works.
Replace it with an HTML form and generate a flat PDF from the submission. This is what most modern workflows actually do.

If you need a no-install way to drop values into an AcroForm right now, our PDF Form Filler runs the same pdf-lib pipeline in your browser — upload, fill, flatten, download. To pull data out of either kind of form, the PDF Text Extractor handles the text content, and the PDF Page Count and Metadata tool surfaces the catalog entries that tell you which form type is in play.

Security Gotchas Specific to Forms

Forms add a few attack surfaces that plain PDFs do not have.

JavaScript actions. AcroForm fields can have /AA (additional actions) that run JavaScript on focus, blur, value change, or submission. Most readers prompt before executing or block it entirely, but legacy enterprise readers may run it silently. Strip these on import if you accept user-uploaded forms.
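A minimal sketch of the "strip on import" step, assuming fields have been parsed into plain objects whose keys mirror the PDF entries (A, AA, Kids). That object model and the stripScriptActions helper are hypothetical. Dropping /AA wholesale is deliberately conservative, since it can in principle hold non-script actions too.

```javascript
// Return a copy of a (hypothetical, plain-object) field tree without any /AA
// entries and without /A actions whose type is JavaScript.
function stripScriptActions(field) {
  const { AA, A, Kids, ...rest } = field; // AA is dropped unconditionally
  const cleaned = { ...rest };
  if (A && A.S !== 'JavaScript') cleaned.A = A; // keep non-script actions
  if (Kids) cleaned.Kids = Kids.map(stripScriptActions);
  return cleaned;
}
```

The function returns a new tree and leaves its input untouched, so the original can still be logged or quarantined for review.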

Submit URLs. Both AcroForm and XFA support submit actions that POST data to a URL when the user clicks a button (in AcroForms, an action dictionary with /S /SubmitForm). This can be used for phishing: a "Confirm your address" button that exfiltrates filled data to attacker-controlled servers. Inspect every field and widget /A and /AA entry for /SubmitForm and /URI actions before trusting any uploaded form; they live on the fields, not in the catalog itself.
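One way to act on this is to collect every action target and compare its host with an allowlist. The sketch below assumes a simplified plain-object model in which a SubmitForm action carries its URL directly in F and a URI action in URI; in a real PDF the /F value is a file specification, and actions can also chain through /Next, which this sketch ignores. Both helper names are hypothetical.

```javascript
// Collect URLs from SubmitForm and URI actions in a hypothetical plain-object
// field tree (looks at /A and every entry of /AA, then recurses into /Kids).
function collectActionUrls(field, found = []) {
  const actions = [field.A, ...Object.values(field.AA ?? {})].filter(Boolean);
  for (const action of actions) {
    if (action.S === 'SubmitForm' && action.F) found.push(action.F);
    if (action.S === 'URI' && action.URI) found.push(action.URI);
  }
  for (const kid of field.Kids ?? []) collectActionUrls(kid, found);
  return found;
}

// Keep only the URLs whose host is not on the allowlist.
function untrustedUrls(urls, allowedHosts) {
  return urls.filter((raw) => {
    try {
      return !allowedHosts.includes(new URL(raw).hostname);
    } catch {
      return true; // unparseable targets are treated as untrusted
    }
  });
}
```

An empty result from untrustedUrls means every target points at a host you already expect; anything else is worth rejecting or reviewing by hand.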

XFA external entities. XFA is XML, and older XFA processors are vulnerable to XXE (XML External Entity) attacks. If you parse XFA streams server-side, disable external entity resolution.
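A coarse guard that complements parser configuration is to refuse any XFA packet that declares a DTD or an entity before it reaches the XML parser at all. The sketch assumes the packet has already been decoded to a string and that legitimate packets in your workflow never need a DTD; assertNoDtd is a hypothetical helper, not a substitute for disabling external entities in the parser itself.

```javascript
// Refuse XFA packets that declare a DTD or entities before any XML parsing.
// Assumption: legitimate packets in this workflow never need either.
function assertNoDtd(xml) {
  if (/<!DOCTYPE|<!ENTITY/i.test(xml)) {
    throw new Error('XFA packet declares a DTD or entity; refusing to parse');
  }
  return xml;
}
```

Because it returns its input on success, the check can be dropped in front of an existing parse call without restructuring the code around it.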

Embedded form values containing PII. When you flatten a form for distribution, double-check that fields containing Social Security numbers, signatures, or other sensitive data are properly redacted — flatten() keeps the visible text. If you need to remove visible PII from a flattened PDF, the PDF Redact Text tool does proper content-stream redaction rather than the "black rectangle on top" approach that leaks data on copy-paste. For visible "DRAFT" or "CONFIDENTIAL" overlays without touching the field data underneath, the PDF Watermark tool applies stamps to a flattened copy.

What This Means in Practice

If you are building anything that touches PDF forms, the practical advice is short:

Generate AcroForms, never XFA. Every modern PDF library produces AcroForms by default. Do not go out of your way to use XFA; you are creating a future migration problem.
For incoming forms, detect the type first. Branch your processing logic. AcroForm path is cheap and universal. XFA path either converts to AcroForm or routes to a human.
Flatten before archiving. Once a form is filled and signed off, flatten it. Editable forms in archives are a compliance liability and a footgun.
Trust the spec. ISO 32000-2 deprecated XFA for a reason. The web ecosystem has voted with its feet. Build for AcroForm.
Compress before delivery. Flattened forms with embedded fonts can balloon past 5 MB. Run them through the PDF Compress tool before emailing them around.

The good news is that for anything you control end-to-end, AcroForms cover every case modern users care about. The XFA forms you encounter will mostly be legacy government and enterprise paperwork — handle them defensively, convert when you can, and do not adopt the format for anything new.

For deeper reading, the ISO 32000-2 PDF specification is the authoritative source on field structures. The PDF Association publishes accessible guides on form best practices, and Adobe's XFA reference remains the canonical XFA documentation despite its deprecated status. For a higher-level overview of PDF form mechanics, the Wikipedia PDF Forms section is a reasonable starting point.

FAQ

How do I tell if a PDF form is AcroForm or XFA without opening it?

Run qpdf --qdf input.pdf - and search the output for /XFA inside the /AcroForm dictionary. If you find it, you have an XFA form (or a hybrid). If you only find /AcroForm with /Fields, it's a pure AcroForm. If you do open it in Chrome or Preview, AcroForms are clickable and editable, while XFA shows a placeholder page asking for a different viewer. Programmatic detection is a short catalog lookup in pdf-lib or PyPDF2.

Why do some government forms only work in Adobe Reader?

Because they use dynamic XFA — pages that grow and shrink as you fill in repeating sections (dependents, employment history). XFA is XML-based with its own layout engine, and only Adobe Reader includes the runtime to render it. ISO 32000-2 (2017) deprecated XFA, but tax authorities and government agencies still ship millions of XFA forms because their existing tooling generates them.

Can I convert XFA forms to AcroForm programmatically?

Sort of. Tools like ABCpdf and certain commercial libraries can render the XFA form to a flat PDF and overlay AcroForm widgets on top; you lose the dynamic layout but gain universal compatibility. qpdf cannot do this, because it restructures PDF files without rendering XFA. The process isn't perfect; complex XFA forms with conditional sections may need manual rebuild. For volume work, the realistic path is replacing XFA with HTML forms entirely.

What does flattening a PDF form actually do?

Flattening bakes the field values into the page content stream and removes the interactive widget annotations. The result looks identical visually but is no longer editable — the form is permanent. Always flatten before archival, signed contract distribution, or anything sent to a regulator. A non-flattened form lets recipients edit the data, which can violate compliance requirements.

Why does pdf-lib not support XFA forms?

Because XFA is a separate XML-based standard that requires a full layout engine to render — way beyond the scope of pdf-lib's PDF object manipulation. Supporting XFA would essentially require implementing a mini browser. iText (commercial) and Apache PDFBox (partial) handle XFA, but most open-source PDF libraries skip it. For pure-Python, pdfrw and PyPDF2 also don't support XFA fields.

Are PDF forms accessible to screen readers?

AcroForms are if they're authored correctly — each field needs a tooltip (/TU entry) for the screen reader to announce, and the tab order should match visual order. Most government forms get this wrong. PDF/UA (ISO 14289) is the accessibility standard for PDFs; tools like CommonLook and Acrobat's accessibility checker flag missing labels. XFA forms are notoriously inaccessible — another reason they're being phased out.
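A quick lint for the missing-tooltip problem, assuming fields are available as plain objects with T (name) and TU (tooltip) keys. That shape and the fieldsMissingTooltip helper are hypothetical; a full PDF/UA check covers far more than this.

```javascript
// List the names of fields that have no usable /TU (tooltip) entry.
function fieldsMissingTooltip(fields) {
  return fields
    .filter((f) => typeof f.TU !== 'string' || f.TU.trim() === '')
    .map((f) => f.T);
}
```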

Can I sign an AcroForm field digitally?

Yes. AcroForms support a /FT /Sig field type for digital signatures. The signer hashes a byte range of the file (everything except the slot reserved for the signature itself), signs that hash with a private key, and embeds the result in the signature field's dictionary. Adobe Reader, Foxit, and most enterprise tools support this; pdf-lib and PyPDF2 can read but generally not create signature fields. PAdES (PDF Advanced Electronic Signatures) is the EU-recognized standard.
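The signed range is recorded in the signature dictionary as a /ByteRange array of four integers: two (offset, length) pairs with a gap between them where the signature value sits. A sketch of one sanity check follows: whether the range covers the whole file apart from that single gap. It assumes the array and the file length have already been read; coversWholeFile is a hypothetical helper and is not a cryptographic verification.

```javascript
// Check that a /ByteRange [start1, len1, start2, len2] covers the whole file
// except one gap between the two ranges (where the signature value sits).
function coversWholeFile(byteRange, fileLength) {
  if (byteRange.length !== 4) return false;
  const [start1, len1, start2, len2] = byteRange;
  const gapStart = start1 + len1;
  return start1 === 0 && start2 > gapStart && start2 + len2 === fileLength;
}
```

A range that stops short of the end of the file means bytes were appended after signing, which is worth surfacing even if the signature itself still validates.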

What's the alternative to PDF forms for new applications?

HTML forms with a backend API, then generate a flattened PDF from the submission. This is what most modern workflows do — DocuSign, JotForm, Typeform all work this way. The benefits: universal browser support, no Adobe Reader dependency, easy validation, accessible by default, and you can render to PDF only for archival. Reserve PDF forms for legacy compatibility cases where the recipient must use Adobe.