help with pdf

pangolin_10

Hi,
I have to read a PDF file and store its contents in a string variable. I do not know the number of characters in the file, and I have to search for a particular string in it. Please help.
 
Do you have to download the file first, or is it already on your machine? Are you looking for the same string in every search, or will the string be different each time? If you could answer these questions, I'll come up with some kind of model.

FYI: as a piece of background information, you'll need to research the System.IO namespace.
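
To illustrate the "unknown number of characters" part, here is a minimal sketch in Python rather than .NET (the function name `file_contains` is made up for this example). It reads the whole file into memory in one call, so the length never has to be known in advance, and then does a plain byte search. It assumes the search string can be encoded as UTF-8 and that the text is stored in the file as-is, which is true for plain text files but usually not for PDFs.

```python
from pathlib import Path


def file_contains(path: str, needle: str) -> bool:
    """Return True if the raw bytes of the file contain the search string."""
    # read_bytes() loads the entire file, whatever its size.
    data = Path(path).read_bytes()
    # Assumption: the text is stored uncompressed and UTF-8/ASCII compatible.
    return needle.encode("utf-8") in data
```

For example, `file_contains("x.pdf", "Hello")` would only return True if "Hello" happens to appear uncompressed in the file.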
 
re:vis

The file is already on my machine. Suppose its name is "x.pdf". When opened in Acrobat Reader, it has the content "Hello x, how are you?"
Now, there is a text box in the form where I enter my search string, say "Hello". The code I am looking for will parse the PDF file and, if the search string is found in the file, return success. I do not know how a PDF file is encoded and decoded, so I really need your help with this. I hope I have explained what I want. Please help.
 
OK, this *is* possible, but it's by no means a piece of cake. The text in a .pdf file is not stored as plain text: page content usually sits in compressed (and sometimes encrypted) streams, and that makes it a real pain to parse. However, it can be done after a fashion. Here's a link to a CodeProject page that contains a C++ class that can extract text from a .pdf file. You'll have to build it as a separate Win32 console application, but all the instructions are on the site. Hope this helps.

http://www.codeproject.com/cpp/ExtractPDFText.asp
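
To give a rough idea of what such a parser has to do, here is a minimal sketch in Python (not the code from the linked article; the names `extract_text` and `pdf_contains` are made up for this example). It assumes the PDF is not encrypted, that its content streams are either uncompressed or Flate (zlib) compressed, and that the text is written as literal strings in parentheses inside `BT` ... `ET` text blocks. Hex strings, custom font encodings, object streams and other filters are not handled, and all strings are simply concatenated, so treat it as a starting point only.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Text objects inside a content stream are bracketed by BT ... ET.
TEXT_BLOCK_RE = re.compile(rb"\bBT\b(.*?)\bET\b", re.DOTALL)
# Literal strings such as (Hello); backslash escapes are allowed inside.
LITERAL_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def extract_text(pdf_bytes: bytes) -> str:
    """Collect the literal strings found in the text blocks of every stream."""
    pieces = []
    for stream in STREAM_RE.finditer(pdf_bytes):
        data = stream.group(1)
        try:
            # Most content streams are Flate-compressed.
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not Flate data; scan the raw bytes instead
        for block in TEXT_BLOCK_RE.finditer(data):
            for literal in LITERAL_RE.finditer(block.group(1)):
                # Undo the \( \) \\ escapes; other escapes are left as-is.
                raw = re.sub(rb"\\([()\\])", rb"\1", literal.group(1))
                # Assumption: a single-byte, Latin-1 compatible encoding.
                pieces.append(raw.decode("latin-1"))
    return "".join(pieces)


def pdf_contains(pdf_bytes: bytes, needle: str) -> bool:
    """Return True if the search string occurs in the extracted text."""
    return needle in extract_text(pdf_bytes)
```

For the example in this thread, the call would look something like `pdf_contains(open("x.pdf", "rb").read(), "Hello")`. Whether it finds the text depends on how the particular PDF was produced.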
 