Foxit Quick PDF Library

Frequently Asked Question:

Find a specific word in a PDF file

Question

Is it possible to search the content of a PDF file for a string (e.g. "characters")?

For example, I might want to find all references to the PageCount function in the function reference for Quick PDF Library, or all references to an invoice number in a PDF.

I would also want to know which page each match was found on.

Is this possible using Quick PDF Library?

Answer

Yes. The GetPageText function returns the text of each page, and with a little work in the programming language of your choice you can search that text for a specified string and record the page each match was found on.

I've written some sample code below which demonstrates one way to find a word in a PDF file. The code is just a quick prototype, so it's not necessarily the most efficient way to find keyword matches, but it does show what's possible.

The sample code was written in a C# console app using the DLL edition of Quick PDF Library. Leave a comment if you have any feedback.

using System;
using System.IO;
using QuickPDFDLL0718;

namespace QPLConsoleApp
{
    public class QPL
    {
        public static void Main()
        {
            // This example uses the DLL edition of Quick PDF Library.
            // Create an instance of the class and give it the path to the DLL.
            PDFLibrary QP = new PDFLibrary("QuickPDFDLL0718.dll");

            // Check if the DLL was loaded successfully
            if (QP.LibraryLoaded())
            {
                // Insert license key here / Check the license key
                if (QP.UnlockKey("...") == 1)
                {
                    QP.LoadFromFile(@"C:\Program Files\Quick PDF Library\DLL\GettingStarted.pdf");

                    int iPageCount = QP.PageCount();
                    int PageNumber = 1;
                    int MatchesFound = 0;

                    while (PageNumber <= iPageCount)
                    {
                        QP.SelectPage(PageNumber);

                        // Extract the text of the current page in CSV form
                        string PageText = QP.GetPageText(3);

                        // Write the text to a temp file so the raw output can be inspected
                        string TempFileName = QP.GetTempPath() + "temp" + PageNumber + ".txt";
                        using (StreamWriter TempFile = new StreamWriter(TempFileName))
                        {
                            TempFile.Write(PageText);
                        }

                        string[] lines = File.ReadAllLines(TempFileName);
                        string[][] grid = new string[lines.Length][];

                        for (int i = 0; i < lines.Length; i++)
                        {
                            // Split into at most 12 fields so that commas inside
                            // the text (the last field) don't cut it short
                            grid[i] = lines[i].Split(new char[] { ',' }, 12);
                        }

                        foreach (string[] line in grid)
                        {
                            // Skip blank or incomplete rows
                            if (line.Length < 12)
                            {
                                continue;
                            }

                            // The text is read from the 12th field of each row
                            string FindMatch = line[11];

                            // Update this string to the word that you're searching for.
                            // It can be one or more words (e.g. "sunday" or "last sunday").
                            // Note that the comparison is case-sensitive.
                            if (FindMatch.Contains("characters"))
                            {
                                Console.WriteLine("Success! Word match found on page: " + PageNumber);
                                MatchesFound++;
                            }
                        }
                        PageNumber++;
                    }

                    if (MatchesFound == 0)
                    {
                        Console.WriteLine("Sorry! No matches found.");
                    }
                    else
                    {
                        Console.WriteLine();
                        Console.WriteLine("Total: " + MatchesFound + " matches found!");
                    }
                    Console.ReadLine();
                }
            }
        }
    }
}

Please note: writing the text to a text file and then reading it back into memory isn't necessary, as this could all be done in memory. I did it this way in the sample code because I thought it would be useful to see the text output.
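
The in-memory approach mentioned in the note above might look something like the sketch below. It is not part of the original sample: the CountMatches helper, the QPLInMemorySketch namespace and the two sample rows are made up for illustration, and it assumes (as the sample's use of line[11] implies) that each row of the page text has 12 comma-separated fields with the text in the last one. It doesn't call Quick PDF Library at all, so it can be run on its own.

```csharp
using System;

namespace QPLInMemorySketch
{
    public static class InMemorySearch
    {
        // Counts the rows of one page's CSV text whose text field contains searchTerm.
        // Assumption: 12 comma-separated fields per row, with the text as the last field.
        public static int CountMatches(string pageText, string searchTerm)
        {
            int matches = 0;

            // Split the page text into rows without going through a file
            string[] rows = pageText.Split(
                new string[] { "\r\n", "\n" },
                StringSplitOptions.RemoveEmptyEntries);

            foreach (string row in rows)
            {
                // Limit the split to 12 parts so commas inside the text
                // stay in the last element
                string[] fields = row.Split(new char[] { ',' }, 12);

                if (fields.Length < 12)
                {
                    continue; // Not a complete row, so skip it
                }

                // Case-sensitive, like the Contains call in the sample
                if (fields[11].Contains(searchTerm))
                {
                    matches++;
                }
            }

            return matches;
        }

        public static void Main()
        {
            // Hypothetical rows, invented for this sketch only
            string samplePageText =
                "\"Arial\",#000000,12,10,20,90,20,90,32,10,32,\"Find characters, then stop\"\n" +
                "\"Arial\",#000000,12,10,40,90,40,90,52,10,52,\"Nothing to see here\"\n";

            Console.WriteLine(CountMatches(samplePageText, "characters"));
        }
    }
}
```

In the sample code, a helper like this could replace the StreamWriter and File.ReadAllLines steps inside the while loop by passing it the string returned from QP.GetPageText(3) and adding the result to MatchesFound.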


© 2015 Debenu & Foxit. All rights reserved.