Shared Content Streams and Quick PDF Library

February 1, 2012

A lot of PDF tools expect all the pages of a PDF to have individual content streams.

But it’s technically possible for two or more pages to reference the exact same content stream, either in its entirety or just some of its parts.

10 0 obj
<<
/Type /Page
/Contents [ 11 0 R 12 0 R ]
>>

15 0 obj
<<
/Type /Page
/Contents [ 18 0 R 11 0 R ]
>>

In this example, the page defined by object 10 has two content stream parts, stored in objects 11 and 12. The page in object 15 also has two content stream parts, but the second part is the same object 11 used by the first page – this is a shared content stream.

If a PDF is structured that way, any changes to one page will also appear on the other page with the shared content stream.
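One way to spot this situation is to count how many pages reference each content stream object. The sketch below is a minimal Python illustration that models the two pages from the example above as plain dictionaries. It does not read a real PDF, and the names `pages` and `find_shared_streams` are made up for the illustration.

```python
from collections import defaultdict

# Page object number -> object numbers listed in its /Contents array.
# This mirrors the example above: pages 10 and 15 both reference object 11.
pages = {
    10: [11, 12],
    15: [18, 11],
}


def find_shared_streams(pages):
    """Return {stream object: [page objects]} for streams used by more than one page."""
    users = defaultdict(list)
    for page_obj, contents in pages.items():
        for stream_obj in contents:
            if page_obj not in users[stream_obj]:
                users[stream_obj].append(page_obj)
    # Keep only the streams that more than one page refers to.
    return {s: p for s, p in users.items() if len(p) > 1}


shared = find_shared_streams(pages)
print(shared)
```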

The PDF specification doesn’t forbid this at all and Adobe Acrobat and Adobe Reader both seem happy with files like this. It’s quite a useful trick. In fact Quick PDF Library’s ClonePages function uses this exact technique to allow many pages to share a single content stream without increasing the size of the file.

Some PDF software might not be able to read files structured like this, and most PDF tools will have unpredictable results when doing things like extracting or deleting pages.

When deleting a page from a PDF, it makes sense to delete the content streams that describe the page and not just the page dictionary; otherwise unused data would be left in the output PDF, wasting space. Quick PDF Library’s DeletePages function does exactly that: it clears all the content stream parts (sets them to an empty string) and then deletes the page dictionary.

So if a page shares content streams with other pages, and is then deleted with DeletePages, the other pages will be affected too.

To avoid this, call the RemoveSharedContentStreams function before deleting pages. It cycles through all the pages in the document, building up a list of the stream objects in each /Contents array. If any shared content streams are found, they are left intact for one page and copies are made for any other pages using that same content.
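The general idea can be sketched in a few lines of Python. This is only an illustration of the approach described above, not the library's actual implementation. Pages and streams are modelled as dictionaries, and the helper name `remove_shared_streams` is hypothetical.

```python
def remove_shared_streams(pages, streams):
    """Give a page its own copy of any stream that an earlier page already uses.

    pages:   {page object number: [stream object numbers]}
    streams: {stream object number: stream data}
    Both dictionaries are modified in place.
    """
    seen = set()  # streams already claimed by a page
    next_obj = max(list(pages) + list(streams)) + 1  # first free object number
    for page_obj in sorted(pages):
        contents = pages[page_obj]
        for i, stream_obj in enumerate(contents):
            if stream_obj in seen:
                # Shared: copy the data into a new object and point this page at it.
                streams[next_obj] = streams[stream_obj]
                contents[i] = next_obj
                seen.add(next_obj)
                next_obj += 1
            else:
                seen.add(stream_obj)  # the first user keeps the original
    return pages


pages = {10: [11, 12], 15: [18, 11]}
streams = {11: b"stream A", 12: b"stream B", 18: b"stream C"}
remove_shared_streams(pages, streams)
print(pages)
```

In this model, page 15 ends up with its own copy of the data from object 11, so clearing the streams of one page no longer touches the other.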

This process might take a long time on PDFs with thousands of pages.

By Rowan | Comments Off | Posted in Quick PDF Library,Tips & Tutorials

Quick PDF Library 8 API Changes

September 14, 2011

Quick PDF Library 8 is a major new version of Quick PDF Library and contains a few changes to the API that were necessary to provide native support for Unicode and to improve the way secured PDF files are handled by the library. We also took this opportunity to rename some functions which used terms that conflicted with terms used by Acrobat.

This document contains a list of items that developers looking to upgrade to Quick PDF Library 8 should pay attention to. Changes to code may be required in order for your projects to compile using the new version.

1. Changed function definitions

The LoadFromFile, LoadFromString, LoadFromVariant and LoadFromStream functions now have a Password parameter. Note functionality changes below.
The ExtractFilePages and GetFileMetadata functions now have a Password parameter.
All functions that previously returned or accepted 8-bit character strings (either UTF-8 encoded, WinAnsi encoding or unspecified encoding) now use UTF-16 Unicode strings.
Certain functions continue to accept or return 8-bit data, others have been renamed to more easily facilitate working with binary data.
All the text drawing functions (such as DrawText, DrawHTMLText) allow the text to be specified using UTF-16 Unicode strings. The fonts automatically convert the text string’s Unicode characters to the appropriate encoding for storing in the content stream.

2. Renamed functions

The following functions have been renamed due to the changes from 8-bit to 16-bit strings. The change results in two new functions to replace each original function:

GetObjectSource	GetObjectToString (Delphi and DLL) GetObjectToVariant (ActiveX)
SetObjectSource	SetObjectFromString (Delphi and DLL) SetObjectFromVariant (ActiveX)
GetPageContent	GetPageContentToString (Delphi and DLL) GetPageContentToVariant (ActiveX) * Note functionality changes listed below
SetPageContent	SetPageContentFromString (Delphi and DLL) SetPageContentFromVariant (ActiveX) * Note functionality changes listed below
DAGetPageContent	DAGetPageContentToString (Delphi and DLL) DAGetPageContentToVariant (ActiveX)
DAGetObjectSource	DAGetObjectToString (Delphi and DLL) DAGetObjectToVariant (ActiveX)
ExtractFilePageContent	ExtractFilePageContentToString (Delphi and DLL) ExtractFilePageContentToVariant (ActiveX)

In QPL v7 the term “layer” was used to describe a part of a page’s content stream. This conflicted with Adobe’s use of the term “layer” for the PDF feature known as optional content groups. To avoid confusion, all the layer functions have been renamed:

SetLayerOptional	SetContentStreamOptional
LayerCount	ContentStreamCount
CombineLayers	CombineContentStreams
NewLayer	NewContentStream
SelectLayer	SelectContentStream
EditableLayer	EditableContentStream
MoveLayer	MoveContentStream
DeleteLayer	DeleteContentStream
LayerSafe	ContentStreamSafe
UseUnsafeLayers	UseUnsafeContentStreams
EncapsulateLayer	EncapsulateContentStream
RemoveSharedLayers	RemoveSharedContentStreams
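When upgrading a large code base, the layer renames listed above can be applied mechanically. The following Python sketch is one possible approach: it replaces whole-word occurrences of the old names in a string of source code. The function name `rename_layer_functions` is hypothetical, and the result should be reviewed by hand rather than trusted blindly.

```python
import re

# Old QPL v7 name -> new QPL v8 name, taken from the list above.
LAYER_RENAMES = {
    "SetLayerOptional": "SetContentStreamOptional",
    "LayerCount": "ContentStreamCount",
    "CombineLayers": "CombineContentStreams",
    "NewLayer": "NewContentStream",
    "SelectLayer": "SelectContentStream",
    "EditableLayer": "EditableContentStream",
    "MoveLayer": "MoveContentStream",
    "DeleteLayer": "DeleteContentStream",
    "LayerSafe": "ContentStreamSafe",
    "UseUnsafeLayers": "UseUnsafeContentStreams",
    "EncapsulateLayer": "EncapsulateContentStream",
    "RemoveSharedLayers": "RemoveSharedContentStreams",
}

# Match only whole identifiers so longer names are not partially rewritten.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, LAYER_RENAMES)) + r")\b")


def rename_layer_functions(source):
    """Return source with every old layer function name replaced by its new name."""
    return _PATTERN.sub(lambda m: LAYER_RENAMES[m.group(1)], source)


old_code = "QP.NewLayer(); QP.SelectLayer(QP.LayerCount());"
print(rename_layer_functions(old_code))
```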

3. New functions

GetContentStreamToString and GetContentStreamToVariant have been added to replace the functionality previously provided by the GetPageContent function.
SetContentStreamFromString and SetContentStreamFromVariant have been added to replace the functionality previously provided by the SetPageContent function.
SetTextExtractionOptions was added to provide greater control over the text extraction functions.
CheckFileCompliance was added to check for PDF/A compliance along with the GetStringListCount, GetStringListItem functions needed to retrieve the compliance test results.
SetPDFAMode was added to allow the creation of new PDF/A-1b compliant documents.
Other new functions include DrawPDF417Symbol, AddTrueTypeSubsettedFont, SetLineDashEx and SetRenderCropType.

4. Removed functions

The SetAdvancePassword and SetPassword functions have been removed. They are no longer necessary because the LoadFrom* functions now have an additional parameter for specifying the password.

5. Changed functionality

When the LoadFrom* functions are used to open an encrypted document the objects will be automatically decrypted as necessary. Calls to SetPassword, SetAdvancePassword and Decrypt are no longer necessary to access any parts of an encrypted document. The SaveTo* functions can be used to save an encrypted document and the original encryption will remain in place with any new content automatically encrypted to the existing security settings.
The LoadFrom* functions now return 1 on success and 0 on failure. In QPL v7 these functions would return 2 if an encrypted document made use of object streams or cross reference streams. This is no longer necessary as these functions have a new password parameter. If the wrong password is given, the LoadFrom* functions will return 0 and the LastErrorCode function will return error code 404.
The Encrypt and Decrypt functions can still be used to add or remove security and full access to the document is possible even after encryption has been applied.
In QPL v7, the GetPageContent function returned only a portion of the page’s content stream. The replacement functions GetPageContentToString and GetPageContentToVariant now return the entire content stream of the selected page, not just an individual part.
Similarly, the SetPageContentFromString and SetPageContentFromVariant functions now set the entire content stream of the selected page, not just an individual part.
The SecurityInfo function returns the active encryption details even though encrypted objects in documents are internally decrypted as needed.
XMP metadata is now added to new documents and maintained in existing documents. Calling functions like SetInformation will result in the XMP metadata being updated to keep that particular item in the document information dictionary in sync.
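The new return-value convention for the LoadFrom* functions described above can be handled with a small helper like the Python sketch below. It only interprets the two integers mentioned in the text (the function result and the value from LastErrorCode). `describe_load_result` is a hypothetical name, and how you obtain the two values depends on the edition of the library you use.

```python
WRONG_PASSWORD = 404  # error code reported by LastErrorCode, per the notes above


def describe_load_result(result, last_error_code):
    """Turn a LoadFrom* result and the last error code into a readable status."""
    if result == 1:
        return "loaded"
    if last_error_code == WRONG_PASSWORD:
        return "wrong password"
    return "failed (error %d)" % last_error_code


print(describe_load_result(1, 0))
print(describe_load_result(0, 404))
```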

By Rowan | Comments (2) | Posted in Quick PDF Library,Tips & Tutorials

Device context handles and signed/unsigned integers

June 21, 2011

For Quick PDF Library version 7.25 we changed the integer type of the device context handles returned by the GetCanvasDC function and accepted as parameters to the RenderPageToDC and DARenderPageToDC functions.

The reason for this is that the Delphi VCL declares the HDC type as an unsigned 32-bit integer.

The Windows graphics system does return values in the entire unsigned 32-bit range.

However, since making the change in 7.25 we have done further research and found some inconsistencies in various programming environments.

In particular, the Windows SDK defines a device context handle HDC as a PVOID which is a signed 32-bit integer.

The .NET Framework uses System.IntPtr which is also a signed 32-bit integer.

For 7.26 we will be making further changes to how device context handles are processed by Quick PDF Library to properly match different programming languages.

Delphi DCUs – Unsigned – Cardinal
Delphi DLL – Unsigned – HDC
C# ActiveX – Signed – System.IntPtr
C# DLL – Signed – System.IntPtr
C++ DLL – Signed – HDC
C++ ActiveX – Signed – HDC
VB.NET ActiveX – Signed – System.IntPtr
VB.NET DLL – Signed – System.IntPtr
PowerBuilder DLL – Signed – long
PowerBASIC DLL – Unsigned – DWORD
ActiveX Type Library – Signed – long
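Whichever type a language uses, a 32-bit handle is the same 32 bits; only the interpretation differs. Here is a short Python sketch of the conversion between the two interpretations. The helper names are illustrative and not part of the library.

```python
def to_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def to_unsigned32(value):
    """Reinterpret the low 32 bits of value as an unsigned 32-bit integer."""
    return value & 0xFFFFFFFF


handle = 0xF2010A3C  # an example value in the upper half of the unsigned range
signed = to_signed32(handle)
print(signed < 0)                       # the signed view is negative
print(to_unsigned32(signed) == handle)  # the round trip preserves the bits
```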

The information in this article supersedes the information that we posted on this topic a couple of weeks ago (specifically two blog posts: Unsigned Integers and Device Context Functions and Unsigned Integers And Visual Basic).

By Rowan | Comments (1) | Posted in News,Quick PDF Library,Tips & Tutorials

Unsigned Integers and Device Context Functions

June 9, 2011

Update: the information in this post has been superseded by the information from this post: Device context handles and signed/unsigned integers.

In version 7.25 of Quick PDF Library we made a change to the API which made the use of unsigned integers necessary.

Unfortunately this change will break backwards compatibility for code that makes use of the RenderPageToDC, DARenderPageToDC, and GetCanvasDC functions — the only functions which use unsigned integers.

This means that for the RenderPageToDC and DARenderPageToDC functions the DC parameter now requires an unsigned integer to be passed to it and that the GetCanvasDC function now returns an unsigned integer. You will need to update your code to reflect this change prior to being able to successfully compile using version 7.25.

If you do not use any of these functions then you do not need to make any changes.

Most modern programming languages support unsigned integers, but there are a few older languages which do not. So for those languages we will attempt to come up with suitable workarounds — for example, as mentioned on this blog post a couple of days ago, we have created a TLB file to be used with the ActiveX edition of our library and VB6 which resolves the issue of VB6 not supporting unsigned integers. Leave a comment if your programming language does not support unsigned integers.

We do apologize in advance for any inconvenience that this break in backwards compatibility causes. We do our utmost to avoid situations like this but sometimes breaking backwards compatibility is necessary for the future health of the library.

By Rowan | Comments (1) | Posted in News,Quick PDF Library,Tips & Tutorials

Unsigned Integers And Visual Basic

June 7, 2011

Update: the information in this post has been superseded by the information from this post: Device context handles and signed/unsigned integers. It is no longer necessary to use unsigned integers with the ActiveX edition, as this is handled internally by the library.

In version 7.25 of Quick PDF Library we made a change to the API which made the use of unsigned integers necessary. Unfortunately not all versions of programming languages support unsigned integers. In this particular case Visual Basic 6 and earlier versions do not support unsigned integers, while Visual Basic .NET and newer versions do.

Although VB6 was released over 10 years ago it still has a strong following and quite a few VB6 programmers use Quick PDF Library. This being the case we’ve come up with an easy workaround which will enable VB6 programmers to continue using all of the functions in Quick PDF Library.

We’re now providing a TLB file along with the ActiveX edition that handles the unsigned integers. The TLB file is only used at compile time so you won’t need to distribute it with your executable, but you will need to distribute the ActiveX as per usual.

The TLB file for Quick PDF Library 7.25 can be downloaded from here and will be included in all future installers.

Instructions for using the TLB file:

Register the ActiveX on your machine as per usual
Open your VB6 project
Go to Project > References > Browse…
Add the ‘QuickPDFAX0725VB.tlb’ file
Compile

As you are adding the TLB file as the reference it is not necessary to add the ActiveX file as a project reference because the TLB file interfaces with the ActiveX through the registry.

Please note: if you do not use the GetCanvasDC, DARenderPageToDC or RenderPageToDC functions then you won’t run into any issues if you choose not to use the TLB file. The unsigned integers are currently only used with these functions.

By Rowan | Comments (4) | Posted in News,Quick PDF Library,Tips & Tutorials

Use Perl with Quick PDF Library

May 30, 2011

We have just updated the samples page with a package of Perl samples to use with the ActiveX edition of Quick PDF Library.

Perl will only work with Quick PDF Library on Windows, though we hope to provide a cross-platform solution in the future.

If there are any other programming languages that you would like to see some samples for, leave a comment!

By Rowan | Comments (1) | Posted in News,Quick PDF Library,Tips & Tutorials

ARTS PDF Workshop Now 100% Free And Powered By Quick PDF Library

May 11, 2011

ARTS PDF Workshop, a Microsoft Excel add-on that lets you retrieve, manipulate, and print PDF information quickly and easily, has been upgraded to use Quick PDF Library as its core PDF engine. In addition the add-on is now also 100% free.

ARTS PDF and Quick PDF are sister business divisions, both owned by Debenu.

ARTS PDF Workshop lets you view and update the properties of batches of PDF documents, including document info (title, author, subject and keywords), as well as document open options and security. It is a great tool for batch updating multiple PDF files at once.

The Excel add-on can be downloaded from the ARTS PDF Workshop page on www.artspdf.com.

By Rowan | Comments Off | Posted in Announcements,News,Tips & Tutorials

Windows 2000 End-Of-Support

February 21, 2011

We’ve made the decision to end official support for Windows 2000 effective immediately as this operating system has reached the end of its lifecycle and is no longer being supported by Microsoft.

Please note that there are no Windows 2000 specific hacks in the Quick PDF Library code base, so we will not be “removing” support as such. All this decision means in real terms is that Windows 2000 is no longer listed as a supported operating system on our website.

A quick tip for anyone who is going to continue developing applications for Windows 2000 is that if you’re using Visual Studio 2010 with runtime 10.0 this may cause issues on Windows 2000. Specifically, this error message:

“The procedure entry point EncodePointer could not be located in the dynamic link library KERNEL32.DLL”

Head on back to Visual Studio 2008 with runtime 9.0 or earlier and this error message will disappear.

Sayonara Windows 2000, it was nice knowing you.

By Rowan | Comments (1) | Posted in News,Quick PDF Library,Quick PDF Library Lite,Tips & Tutorials

About PDF Files

February 18, 2011

The goal of PDF is to enable users to exchange and view electronic documents easily and reliably, independent of the environment in which they were created or the environment in which they are viewed or printed.

PDF files can be thought of as self-contained composite documents made up of many instances of many things: page contents, images, graphics, fonts, colorspaces, metadata, annotations, links, digital signatures and more.

If you open up a PDF in a text editor you will notice that there are parts of it which make sense, but that the majority is unreadable to the human eye. That’s because much of the data in PDF files is stored inside binary streams, in which data has been encrypted or compressed. This binary data looks like garbage, but you can easily break your PDF just by adding a single character. It’s best not to edit PDF files directly in a text editor.

Underneath the hood PDF files are made from unordered numbered objects which can refer to each other by number and are all linked together by a cross reference table which maps object numbers to very specific places within the file.
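To make the idea of a cross reference table concrete, here is a simplified Python sketch. It is not a PDF writer: it just concatenates a few numbered objects, records the byte offset at which each one starts, and then uses that table to jump straight to an object. The object contents and the `build_body` helper are invented for the example.

```python
def build_body(objects):
    """Concatenate objects and return (body bytes, {object number: byte offset})."""
    body = b"%PDF-1.4\n"
    offsets = {}
    for number, content in objects.items():
        offsets[number] = len(body)  # where this object starts in the file
        body += b"%d 0 obj\n%s\nendobj\n" % (number, content)
    return body, offsets


# Objects refer to each other by number, for example "2 0 R" points at object 2.
objects = {
    1: b"<< /Type /Catalog /Pages 2 0 R >>",
    2: b"<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>",
    3: b"<< /Type /Page /Parent 2 0 R >>",
}
body, xref = build_body(objects)

# Look up object 3 via the table instead of scanning the whole file.
start = xref[3]
print(body[start:start + 7])
```

This also shows why hand-editing is risky: inserting a single byte early in the file shifts every later object, so the recorded offsets no longer point at the right places.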

At a low level PDF combines three technologies:

A subset of the PostScript page description programming language, for generating the layout and graphics.
A font-embedding/replacement system to allow fonts to travel with the documents.
A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate.

As you can see, PDF files are quite complex and it makes a lot of sense (and saves a lot of time and money) to use products such as Quick PDF Library to keep those complexities out of your life.

However, if you are interested in studying the internals of PDF files further then these resources should be helpful:

By Rowan | Comments Off | Posted in Quick PDF Library,Tips & Tutorials

The Basics: Getting Familiar with Quick PDF Library

February 17, 2011

Quick PDF Library is a big library and sometimes it can be a little daunting getting started, so we’ve put together a few basic tips to help you get up to speed quickly. We’ll enhance this document as we think of new tips. If you have any tips you’d like us to add to this document or questions about the intended use of certain features, please leave a comment.

Unlock the library

The UnlockKey function needs to be called and the return value checked otherwise most other functions called later will fail. More info here.

Check functions for return values

Always check the return value of the important functions such as LoadFromFile, because if this function fails the subsequent function calls will fail too. Every single function returns a value. Checking for return values is not a requirement, but it can be immensely useful in ensuring the robustness of your code and in debugging.

If a function is not documented as having a return value then you can assume that a return value of one (1) indicates success. Zero (0) or other values could indicate an error, or the function could be returning a valid handle or ID.

Memory and direct access functions

Function names that start with DA indicate that the function is a direct access function. Functions that do not have this prefix are memory functions. Direct access and memory functions cannot be used together. Combining them in your code will result in your code not working correctly.

The DA functions are primarily used for PDF documents that are very large and contain thousands of pages. They are generally faster for these larger documents because the document does not need to be loaded into memory. For file sizes under 500 MB or under a few thousand pages the speed differences are negligible.

Blank document automatically loaded

When you initialise the library there is always a one page blank document in memory. It is selected and ready to use by default. This is due to the design of the library. There can never not be at least one document in memory, so if you try to delete that document using the DeletePages function, the library will automatically re-create a one page blank document.

The blank one page document uses a Letter page size which is 8.5 x 11 inches or 215.9 mm × 279.4 mm. The page size can be changed using the SetPageSize function.

New documents automatically selected

Whether you load an existing document using the LoadFromFile function or create a new document using the NewDocument function, it will automatically be selected in memory and the document’s ID can be retrieved using the SelectedDocument function.

Multiple documents in memory permitted

You can have more than one document in memory and can swap between them using the ID returned from calls to functions such as NewDocument and SelectedDocument. You can also count all documents in memory using the DocumentCount function and then retrieve each document’s ID or filename using GetDocumentID or GetDocumentFileName. All of the document management related functions can be seen in the document management section in the function reference.

Origin point for drawing operations

The origin has coordinates of 0,0 and is the starting point for finding all other points. The origin point for a page in a PDF typically starts at a page corner. The default origin for Quick PDF Library is the bottom left page corner.

Using the SetOrigin function in Quick PDF Library you can change the point of origin to be any page corner (bottom left, top left, top right, bottom right).

By default, calling QP.DrawText(10, 10, "Test") draws the text 10 points in from the left of the page and 10 points up from the bottom of the page. If you call QP.SetOrigin(1) prior to DrawText, the text is drawn 10 points in from the left of the page and 10 points down from the top of the page, because passing the value 1 to the Origin parameter of the SetOrigin function changes the origin to the top left of the page.
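The relationship between the two origins is a simple flip of the vertical coordinate. The Python sketch below illustrates it for a Letter page measured in points (11 inches × 72 = 792 points high). The function name is hypothetical, and the code does not call the library.

```python
LETTER_HEIGHT = 792  # 11 inches * 72 points per inch


def top_left_to_bottom_left(x, y, page_height):
    """Convert a point measured from the top left corner to bottom left coordinates."""
    return x, page_height - y


# 10 points in from the left and 10 points down from the top of a Letter page.
print(top_left_to_bottom_left(10, 10, LETTER_HEIGHT))
```

So text placed at (10, 10) with a top left origin sits at (10, 782) when expressed in the default bottom left coordinates.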

The default point of origin in Adobe Acrobat was the bottom left page corner up until Acrobat 8, at which point Adobe switched the point of origin to the top left page corner. As mentioned above, you can set Quick PDF Library to use any page corner in a PDF.

Measurement units

In PDF the coordinate system is called default user space. The default for the size of the unit in default user space (1/72 inch) is approximately the same as a point, a unit widely used in the printing industry. It is not exactly the same, however; there is no universal definition of a point.

Using the SetMeasurementUnits function you can change the units for all measurements given to and returned from the library. The available options are default user space, millimetres and inches.
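As a rough guide to how the three units relate, the following Python sketch converts default user space units (1/72 inch) to inches and millimetres. The helper names are made up for this example and the code is independent of the library.

```python
UNITS_PER_INCH = 72.0  # default user space: 1 unit = 1/72 inch
MM_PER_INCH = 25.4


def units_to_inches(units):
    return units / UNITS_PER_INCH


def units_to_mm(units):
    return units / UNITS_PER_INCH * MM_PER_INCH


def mm_to_units(mm):
    return mm / MM_PER_INCH * UNITS_PER_INCH


# A Letter page is 612 x 792 units in default user space.
print(units_to_inches(612), units_to_inches(792))
print(round(units_to_mm(612), 1), round(units_to_mm(792), 1))
```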

Unicode, UTF-8 and the DLL and Delphi Editions

There are many different ways to encode Unicode characters. One way is to use strings with 16-bit characters. COM/ActiveX uses 16-bit characters, so adding Unicode support for the ActiveX edition of the library was easy.

For the Delphi and DLL editions, the strings have always been 8-bit characters. Unfortunately we can’t change the definition of functions as this would cause issues with backwards compatibility.

This means that when using the Delphi and DLL editions and working with Unicode characters, you need to encode your file names with UTF8 encoding, as mentioned in the function reference. Make sure that you pay attention to each function description as it will specifically mention if you need to encode or decode the input or output.

Different languages will have different functions to do the UTF8 encoding.
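As one illustration, this is how the encoding step looks in Python. The file name is just an example, and the snippet only shows the string conversion, not a call into the library.

```python
# "résumé.pdf" written with escapes so the source file itself stays ASCII.
file_name = "r\u00e9sum\u00e9.pdf"

utf8_bytes = file_name.encode("utf-8")       # 8-bit form for the DLL and Delphi editions
utf16_bytes = file_name.encode("utf-16-le")  # 16-bit form, as used by COM/ActiveX

print(len(file_name), len(utf8_bytes), len(utf16_bytes))

# Decoding reverses the step for strings that come back UTF-8 encoded.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == file_name)
```

Each accented character takes two bytes in UTF-8, which is why the byte count is larger than the character count.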

Standard Fonts

The PDF specification outlines 14 fonts that should always be available in all PDF viewers. These 14 fonts are called standard fonts and can be added to your PDF using the AddStandardFont function.

Font embedding, subset font embedding and no font embedding

There are three key ways that fonts can be handled in PDF files. They are:

Full Font Embedding = Larger file size
Recipient doesn’t need the same font to view or edit the file

Subset Font Embedding = Smaller file size
Recipient doesn’t need the same font to view but does need the same font installed in order to edit the file

No Font Embedding = Smallest file size
Recipient needs to have same fonts installed

Each option has its merits. As long as you use option 1 or 2 above when you are building your PDF files, you can be sure that when your PDF is rendered or printed the font you’ve specified will be used. If you use option 3, the PDF viewer will attempt to locate the specified font on the local machine, but if it cannot be found then it will use a substitute font during the viewing/printing process.

When a PDF is displayed on your screen it is rendered in exactly the same fashion as it would be prior to being printed.

Quick PDF Library can only fully embed or subset fonts during the PDF creation process. The AddTrueTypeFont function can be used to add and embed a TrueType font in the document and the AddSubsettedFont function can be used to embed a subset of a font in the document. The AddCJKFont function is also available for CJK fonts.

By Rowan | Comments (3) | Posted in Quick PDF Library,Tips & Tutorials
