About PDF Files

February 18, 2011

The goal of PDF is to enable users to exchange and view electronic documents easily and reliably, independent of the environment in which they were created or the environment in which they are viewed or printed.

PDF files can be thought of as self-contained composite documents built from many kinds of components: page contents, images, graphics, fonts, color spaces, metadata, annotations, links, digital signatures and more.

If you open a PDF in a text editor, you will notice that parts of it make sense but that most of it is unreadable. That’s because much of the data in PDF files is stored inside binary streams, in which the data has been compressed or encrypted. This binary data may look like garbage, but every byte matters: the file depends on exact byte positions and lengths, so you can easily break your PDF just by adding a single character. It’s best not to edit PDF files directly in a text editor.
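To get a feel for why stream data looks like garbage, here is a minimal Python sketch. It assumes a stream compressed with the Flate (zlib) method, which is a common choice in PDF files, and the sample text in `content` is made up for illustration rather than taken from a real file.

```python
import zlib

# Hypothetical page content: a few PDF text-drawing operators in plain text.
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET"

# Flate compression in PDF uses the same format that zlib produces.
compressed = zlib.compress(content)

# Roughly what a text editor would show for the stream: mostly unreadable bytes.
print(compressed[:16])

# A PDF reader reverses the compression to recover the original operators.
restored = zlib.decompress(compressed)
print(restored.decode("ascii"))
```

The readable operators only reappear once the stream has been decompressed, which is why the raw file makes so little sense in an editor.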

Under the hood, PDF files are made of numbered objects, which can appear in any order and can refer to each other by number. A cross-reference table links them all together by mapping each object number to an exact byte position within the file.
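The idea of numbered objects located by byte position can be sketched in a few lines of Python. This is a simplified illustration, not a valid PDF: the object bodies are hypothetical, and the offsets are kept in a plain dictionary instead of being written into the file as a real cross-reference table would be.

```python
# Hypothetical, heavily simplified object bodies. A real PDF needs more than
# this (pages, a trailer, and the cross-reference table stored in the file).
objects = {
    1: b"<< /Type /Catalog /Pages 2 0 R >>",  # refers to object 2 by number
    2: b"<< /Type /Pages /Kids [] /Count 0 >>",
}


def build(objects):
    """Concatenate the objects and record the byte offset where each starts."""
    data = b"%PDF-1.4\n"
    offsets = {}
    for num, body in objects.items():
        offsets[num] = len(data)
        data += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    return data, offsets


def offsets_valid(data, offsets):
    """Check that every recorded offset still points at its object header."""
    return all(
        data.startswith(b"%d 0 obj" % num, off) for num, off in offsets.items()
    )


data, offsets = build(objects)
print(offsets)
print(offsets_valid(data, offsets))

# Insert a single space after the header: every object shifts by one byte,
# so the recorded offsets no longer point at the right places.
edited = data[:9] + b" " + data[9:]
print(offsets_valid(edited, offsets))
```

The last check fails after the one-byte edit, which is the same kind of damage a stray character typed in a text editor can cause.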

At a low level PDF combines three technologies:

A subset of the PostScript page description language, for generating the layout and graphics.
A font-embedding/replacement system to allow fonts to travel with the documents.
A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate.
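The PostScript heritage shows in how page content is written: operands come first and the operator that uses them comes last. The Python sketch below groups a made-up content stream into operations. It is only an illustration: it recognizes just the five text operators listed in `OPERATORS`, splits on whitespace, and so would not cope with strings containing spaces or with the rest of the PDF syntax.

```python
# A small, hypothetical subset of PDF text operators.
OPERATORS = {"BT", "ET", "Tf", "Td", "Tj"}


def group_operations(stream):
    """Pair each operator with the operands that precede it."""
    operations, operands = [], []
    for token in stream.split():
        if token in OPERATORS:
            operations.append((token, operands))
            operands = []
        else:
            operands.append(token)
    return operations


# Made-up content: begin text, pick a font, move, show a string, end text.
stream = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET"

for operator, operands in group_operations(stream):
    print(operator, operands)
```

Here `/F1 12 Tf` selects a font resource named `/F1` at size 12, and `(Hello) Tj` shows the string; the font itself would be one of the embedded objects bundled into the file.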

As you can see, PDF files are quite complex and it makes a lot of sense (and saves a lot of time and money) to use products such as Quick PDF Library to keep those complexities out of your life.

However, if you are interested in studying the internals of PDF files further then these resources should be helpful:

By Rowan | Posted in Quick PDF Library, Tips & Tutorials