About PDF Files
The goal of PDF is to enable users to exchange and view electronic documents easily and reliably, independent of the environment in which they were created or the environment in which they are viewed or printed.
PDF files can be thought of as self-contained composite documents built from many kinds of objects: page contents, images, graphics, fonts, color spaces, metadata, annotations, links, digital signatures and more.
If you open a PDF in a text editor, you will notice that some parts of it make sense but most of it is unreadable. That's because much of the data in a PDF file is stored in binary streams, which are usually compressed and sometimes encrypted. This binary data looks like garbage, but the file depends on its exact layout: PDF records the byte offset of each object in a cross-reference table, so adding even a single character can shift those offsets (or corrupt a compressed stream) and break the file. It's best not to edit PDF files directly in a text editor.
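To see why stream data looks like garbage, here is a minimal Python sketch that builds a small stream compressed with the Flate filter (zlib/deflate, which PDF calls `/FlateDecode`) and then decodes it again. It assumes an unencrypted stream, a direct integer `/Length` and no nested dictionaries; `flate_streams` is a hypothetical helper for illustration, not a library API.

```python
import re
import zlib

# Hypothetical sample: one stream object whose page-description data
# has been compressed with the Flate (zlib/deflate) filter.
content = b"BT /F1 24 Tf 72 720 Td (Hello, PDF) Tj ET"
compressed = zlib.compress(content)
sample = (
    b"4 0 obj\n<< /Length " + str(len(compressed)).encode()
    + b" /Filter /FlateDecode >>\nstream\n" + compressed
    + b"\nendstream\nendobj\n"
)

def flate_streams(pdf_bytes):
    """Yield the decoded data of every FlateDecode stream in pdf_bytes.

    Simplified sketch: assumes unencrypted streams, a direct integer
    /Length (not an indirect reference) and no nested dictionaries.
    """
    pattern = re.compile(rb"<<(.*?)>>\s*stream\r?\n", re.S)
    for m in pattern.finditer(pdf_bytes):
        dictionary = m.group(1)
        if b"/FlateDecode" not in dictionary:
            continue  # other filters are not handled here
        length = re.search(rb"/Length\s+(\d+)", dictionary)
        if length is None:
            continue
        start = m.end()  # stream data begins right after "stream" + EOL
        raw = pdf_bytes[start:start + int(length.group(1))]
        yield zlib.decompress(raw)

print(compressed[:16])  # unreadable binary, as seen in a text editor
for data in flate_streams(sample):
    print(data.decode("latin-1"))  # BT /F1 24 Tf 72 720 Td (Hello, PDF) Tj ET
```

The decoded text is a page-content stream: the same kind of readable operators you see elsewhere in the file, hidden behind compression.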
At a low level, PDF combines three technologies:
- A subset of the PostScript page description programming language, for generating the layout and graphics.
- A font-embedding/replacement system to allow fonts to travel with the documents.
- A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate.
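The structured storage layer is also why a single stray character matters. The minimal Python sketch below builds a one-page PDF by hand, records each object's byte offset in a classic cross-reference (`xref`) table, and then checks whether those offsets still point at the right objects. The helper names are hypothetical, and the checker only understands a single uncompressed xref table, not the cross-reference streams or incremental updates found in many real files.

```python
import re

def build_minimal_pdf():
    """Assemble a one-page PDF, recording each object's byte offset."""
    content = b"BT /F1 24 Tf 72 720 Td (Hello, PDF) Tj ET"  # page description
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_pos)
    return bytes(out)

def xref_is_consistent(pdf):
    """Check that startxref and every in-use xref entry hit the right bytes."""
    xref_pos = int(re.search(rb"startxref\s+(\d+)", pdf).group(1))
    if pdf[xref_pos:xref_pos + 4] != b"xref":
        return False
    header = re.match(rb"xref\s+(\d+) (\d+)\s+", pdf[xref_pos:])
    first, count = int(header.group(1)), int(header.group(2))
    entries_start = xref_pos + header.end()
    for i in range(count):
        entry = pdf[entries_start + 20 * i:entries_start + 20 * (i + 1)]
        offset, gen, kind = int(entry[0:10]), int(entry[11:16]), entry[17:18]
        if kind != b"n":
            continue  # free entry, nothing to check
        if not pdf.startswith(b"%d %d obj" % (first + i, gen), offset):
            return False
    return True

pdf = build_minimal_pdf()
print(xref_is_consistent(pdf))     # True
# Add one extra space inside object 2: every later byte shifts by one.
broken = pdf.replace(b"/Count 1", b"/Count  1", 1)
print(xref_is_consistent(broken))  # False
```

The edit looks harmless, but it moves objects 3 to 5 and the `xref` keyword one byte later, so the recorded offsets and the `startxref` pointer no longer line up.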
If you are interested in studying the internals of PDF files further, these resources should be helpful: