
Populating PDF Forms with MultiValue Data



Walk into any shop, MultiValue or not, that's been around for years and you are likely to find special form-overlay programs that PRINT data on forms, using either physical printers or virtual printers that overlay raw print data on images. It's a tried-and-true way to get the job done. These programs always work great until the form changes for one reason or another.

The reason each form change becomes an issue is that we aren't really working with the form when we create an overlay. We are working with where we expect the spaces to be. PDF forms address this problem. Once you have a PDF document with form prompts on it, you can merge the data into your form and not worry about where on the form it needs to go. The PDF document will take care of that for you.

What You Need

In order to populate a PDF with data, you will need one third-party program on your system:

PDFtk by PdfLabs

https://www.pdflabs.com/tools/pdftk-server/

This program is included with many Linux distributions but is not limited to Linux. There is also a Windows version of the same program, so this will work for those with Windows-based systems as well.

PDFtk (PDF toolkit) does a number of useful things, even before we add our MultiValue magic. It is designed to merge, encrypt, decrypt, and watermark PDFs, and to split a single PDF into multiple individual files. And, of course, it can fill in PDF form data.

Example Used

I want to keep this article business-practical, so my example will involve filling out a legal form for the payroll department. The sample PDF that I will be using is the IRS W9 Form [ Figure 1 ]. While this form isn't something that is used every day, it is a good example. If you aren't working with an American company, you'll find that there's an equivalent document in most if not all other countries.

Figure 1

You can download your own copy of the original PDF from https://www.irs.gov/pub/irs-pdf/fw9.pdf. Once you have that, you can follow along and build your program as we work through the article together.

Retrieving the Form Data

Like HTML forms, PDF forms are set up with a unique name assigned to each field. This is very much like how we assign dictionaries to individual fields in our database. Unfortunately, the names aren't visible when you look at the unfilled document. Since we don't know what these names are, we have to extract them before we can populate the form.

The following command will extract this information for you:

$ pdftk fw9.pdf dump_data_fields > fw9_fields.txt

This will produce an output file [ Figure 2 ] that contains information about each field in the PDF document. Each PDF input will have 4-7 pieces of information that describe how the field is to be populated. The key piece you need is FieldName . This is the unique identifier that marks each spot that can be filled in. Connect the right data to the right name and the results will make sense.

---
FieldType: Text
FieldName: topmostSubform[0].Page1[0].f1_1[0]
FieldFlags: 8388608
FieldJustification: Left
---
FieldType: Button
FieldName: topmostSubform[0].Page1[0].FederalClassification[0].c1_1[0]
FieldFlags: 0
FieldJustification: Left
FieldStateOption: 1
FieldStateOption: Off

Figure 2
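If you would rather not read the dump by eye, a short script can turn it into a list of records. The following is only a sketch in Python (not part of the original toolset); the function name parse_field_dump is made up for illustration, and it assumes the dump looks like Figure 2, with records separated by lines containing only "---".

```python
def parse_field_dump(text):
    """Parse `pdftk ... dump_data_fields` output into a list of dicts.

    Assumes each record starts with a line containing only '---'.
    A key such as FieldStateOption can repeat within a record, so
    every value is stored in a list.
    """
    fields = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            current = {}
            fields.append(current)
        elif ":" in line and current is not None:
            key, _, value = line.partition(":")
            current.setdefault(key.strip(), []).append(value.strip())
    return fields


# The two records shown in Figure 2
sample = """\
---
FieldType: Text
FieldName: topmostSubform[0].Page1[0].f1_1[0]
FieldFlags: 8388608
FieldJustification: Left
---
FieldType: Button
FieldName: topmostSubform[0].Page1[0].FederalClassification[0].c1_1[0]
FieldFlags: 0
FieldJustification: Left
FieldStateOption: 1
FieldStateOption: Off
"""

for field in parse_field_dump(sample):
    print(field["FieldType"][0], field["FieldName"][0])
```

In practice you would read fw9_fields.txt and pass its contents to the function instead of the inline sample.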

I have found that the input field names aren't always self-explanatory. You may have to do a little bit of homework in order to match the right field to the right input [ Figure 3 ]. The easiest way to do this is to test the tab order. Open the PDF document and tab between the fields to verify which fields are first, second, third, and so on.

Figure 3

You will also need to watch the FieldType information to make sure you are providing valid data. If you look at Figure 2 , you will see a field whose FieldType is Button, with two FieldStateOption values. The first value is the checked (Yes) value and the second value is the unchecked (No) value.

Also watch for a FieldType of Choice, if one is present; it may likewise carry two or more FieldStateOption values. This might be a good time to remind you that I didn't design this methodology; I'm just explaining what PDF forms provide.

If the FieldType is Button, then you need to look at the FieldStateOption field to find out what values are allowed to be assigned to the field.
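As a small illustration of that lookup, here is a hedged Python sketch that picks the value to write for a checkbox. The helper name checkbox_value is hypothetical, and it assumes the field lists one checked state plus Off, as the Button in Figure 2 does; a Choice field with several options would need its own handling.

```python
def checkbox_value(state_options, checked):
    """Return the FieldStateOption to write for a Button field.

    Assumes the dump lists one 'on' state plus 'Off', as in Figure 2.
    """
    on_states = [state for state in state_options if state != "Off"]
    if not on_states:
        raise ValueError("no checked state listed for this field")
    return on_states[0] if checked else "Off"


print(checkbox_value(["1", "Off"], checked=True))   # prints 1
print(checkbox_value(["1", "Off"], checked=False))  # prints Off
```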

Form Data File

Once you know what the field names are, you need to create a Forms Data Format (FDF) file. This is a special file format used by PDFs to populate the data. They made it really easy for us by keeping this file plain text, but it does look a little odd [ Figure 4 ].

%FDF-1.2
1 0 obj << /FDF << /Fields [
<< /T(topmostSubform[0].Page1[0].f1_1[0]) /V(International Spectrum) >>
<< /T(topmostSubform[0].Page1[0].FederalClassification[0].c1_1[1]) /V(2) >>
<< /T(topmostSubform[0].Page1[0].Address[0].f1_7[0]) /V(3691 E 102nd Ct) >>
] >> >>
endobj
trailer
<< /Root 1 0 R >>
%%EOF

Figure 4


If you have read previous articles on generating PDFs from within MultiValue BASIC (http://www.intl-spectrum.com/mag/JULAUG.2009/default.aspx and https://www.intl-spectrum.com/resource/category/168/PDF.aspx), you'll see a similarity in the file formats and structures.

That's a truly ugly layout. If this is your first look at PDF internals, it may be hard to follow. Believe it or not, it is actually pretty simple. This file is basically a key/value pair file. The /T indicates the key and the /V represents the value. The data is wrapped in parentheses, much like you would use quotes. Once again, not my design. The first two lines in Figure 4 are the header of the file, and the last five lines are the footer. Both the header and the footer will always be the same for any FDF-formatted file.

In between the header and footer is where we need to put the data we want to merge into the PDF [ Figure 5 ].

<< /T(topmostSubform[0].Page1[0].f1_1[0]) /V(International Spectrum) >>
<< /T(topmostSubform[0].Page1[0].FederalClassification[0].c1_1[1]) /V(2) >>
<< /T(topmostSubform[0].Page1[0].Address[0].f1_7[0]) /V(3691 E 102nd Ct) >>

Figure 5
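The header, field lines, and footer described above can be assembled by a few lines of code. This is a minimal sketch in Python rather than MultiValue BASIC; the names fdf_escape and build_fdf are made up for illustration. One detail the figures don't show: because values are wrapped in parentheses, the sketch escapes any backslash or parenthesis inside a value with a backslash, which is how PDF literal strings expect them.

```python
FDF_HEADER = "%FDF-1.2\n1 0 obj << /FDF << /Fields [\n"
FDF_FOOTER = "] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n"


def fdf_escape(value):
    """Escape backslashes and parentheses so they survive inside (...)."""
    return (value.replace("\\", "\\\\")
                 .replace("(", "\\(")
                 .replace(")", "\\)"))


def build_fdf(values):
    """Build FDF text from a dict of field name -> value."""
    lines = [
        "<< /T(%s) /V(%s) >>\n" % (fdf_escape(name), fdf_escape(value))
        for name, value in values.items()
    ]
    return FDF_HEADER + "".join(lines) + FDF_FOOTER


# The same three fields used in Figure 5
data = {
    "topmostSubform[0].Page1[0].f1_1[0]": "International Spectrum",
    "topmostSubform[0].Page1[0].FederalClassification[0].c1_1[1]": "2",
    "topmostSubform[0].Page1[0].Address[0].f1_7[0]": "3691 E 102nd Ct",
}
print(build_fdf(data), end="")
```

Run with the three sample fields, this should reproduce the layout of Figure 4.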

Once you have created your FDF file and saved it with the .fdf extension, you can merge the PDF and the data together to create a new PDF document:

$ pdftk fw9.pdf fill_form fw9_data.fdf output fw9_merged.pdf flatten

If you look at this command line, you will see the original PDF is named fw9.pdf , the data is in the FDF file fw9_data.fdf , and the final merged document will be called fw9_merged.pdf . The flatten keyword will create the new PDF document without editable input fields. The original files will remain as-is and can be used again.
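If you drive the merge from a script instead of typing the command, the same arguments can be passed as a list. The sketch below uses Python's standard subprocess module; the function names fill_form_command and fill_form are hypothetical, and it assumes pdftk is installed and on the PATH.

```python
import subprocess


def fill_form_command(pdf_in, fdf_in, pdf_out, flatten=True):
    """Build the pdftk argument list for merging form data into a PDF."""
    cmd = ["pdftk", pdf_in, "fill_form", fdf_in, "output", pdf_out]
    if flatten:
        # flatten produces a PDF without editable input fields
        cmd.append("flatten")
    return cmd


def fill_form(pdf_in, fdf_in, pdf_out, flatten=True):
    """Run pdftk; check=True raises CalledProcessError if pdftk fails."""
    subprocess.run(fill_form_command(pdf_in, fdf_in, pdf_out, flatten),
                   check=True)


# Show the command that would be run for the article's file names
print(" ".join(fill_form_command("fw9.pdf", "fw9_data.fdf", "fw9_merged.pdf")))
```

Passing the arguments as a list avoids shell quoting problems when a file name contains spaces.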

Alternate Form Data Format

There is an alternate FDF format called XFDF, which is XML-based [ Figure 6 ]. Why didn't I cover that format first? Well, depending upon the version of pdftk you have on your system, XFDF may not be supported.

<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <fields>
    <field name="topmostSubform[0].Page1[0].f1_1[0]">
      <value>International Spectrum</value>
    </field>
    <field name="topmostSubform[0].Page1[0].FederalClassification[0].c1_1[1]">
      <value>2</value>
    </field>
    <field name="topmostSubform[0].Page1[0].Address[0].f1_7[0]">
      <value>3691 E 102nd Ct</value>
    </field>
  </fields>
</xfdf>

Figure 6
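Generating this layout is mostly a matter of escaping the XML special characters. Here is a hedged sketch in Python using only the standard library; build_xfdf is a made-up name, and the company name in the example is invented to show the escaping.

```python
from xml.sax.saxutils import escape, quoteattr


def build_xfdf(values):
    """Build XFDF text from a dict of field name -> value."""
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">',
        "  <fields>",
    ]
    for name, value in values.items():
        # quoteattr adds the surrounding quotes and escapes the name;
        # escape handles &, < and > in the value
        parts.append("    <field name=%s>" % quoteattr(name))
        parts.append("      <value>%s</value>" % escape(value))
        parts.append("    </field>")
    parts.append("  </fields>")
    parts.append("</xfdf>")
    return "\n".join(parts) + "\n"


print(build_xfdf({"topmostSubform[0].Page1[0].f1_1[0]": "Smith & Sons"}), end="")
```

When you save the result, write the file as UTF-8 so that it matches the encoding named in the XML declaration.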

Ideally you would have the most up-to-date version of pdftk, but that is not always the case, so I started with the harder format. Besides being easier to understand, XFDF has one more advantage: it supports Unicode, encoded as UTF-8. The FDF format does not support Unicode.

Extended Features Error

Some original PDF documents start with Extended Features enabled. If this is the case with a document you are working with, you'll get an error when you open the merged document in Acrobat:

"This Document enabled extended features in Adobe Reader. This document has been changed since it was created and use of extended features is no longer available."

Sometimes this is due to signed PDFs; other times, it's due to security settings such as those related to page extraction. To remove these errors, you need to run the PDFtk command one more time to strip this information:

$ pdftk fw9_merged.pdf cat output fw9_finished.pdf

Putting This All Together

As you can see, this is all really easy to do. While you can do it yourself, there are subroutines available at the following URL that take all of this into account:

https://www.intl-spectrum.com/resource/category/168/PDF.aspx

Creating Your Own PDF Documents With Form Inputs

You aren't limited to pre-made PDF documents. If your company has documents they regularly fill out, like liens, mortgage forms, tax forms, or credit requests, then you can convert any existing PDF document into a PDF document with form inputs. You just need the right program. Adobe Acrobat Pro is the most commonly used, but also the most expensive. A good open source alternative is OpenOffice.

 
