Extracting Specific Text Lines or Selection from PDF

Hey Guys,

New to the site, but I love the community. Like most of you here, I am new to AppleScript, but I’m picking up steam fast. I am currently writing an AppleScript for printing mailing labels, but I have hit a snag.

I have to pull the mailing address from a PDF form. I have found AppleScripts all over that will extract and copy full pages of text from PDFs, but I need one that selects only the part of the PDF with the mailing address on it and copies it into a Word document.

Any ideas on how to begin, or where I can find an AppleScript that does this?

Are you going to identify (select) the address, or do you expect to parse for it in the document?

Can you do manually what you’d like to integrate into a workflow? (i.e., is the PDF copyable?)

Have you looked at Skim (which is scriptable)?
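
If the text does turn out to be copyable, and the address always lands on the same lines of the page, the parsing side can be very simple. Here is a minimal sketch in Python (Python comes up further down the thread; AppleScript can run a script like this with do shell script, assuming Python is installed). The line positions, the function name, and the sample invoice text are all made up for illustration, so they would have to be checked against the real PDFs.

```python
# Sketch only: ADDRESS_LINES and the sample text below are made-up assumptions.
ADDRESS_LINES = slice(2, 5)  # hypothetical: address is the 3rd to 5th non-blank line


def address_from_page_text(page_text, lines=ADDRESS_LINES):
    """Return the address block from text copied out of one PDF page."""
    # Drop blank lines so stray spacing does not shift the positions.
    rows = [row.strip() for row in page_text.splitlines() if row.strip()]
    return "\n".join(rows[lines])


sample = """Invoice 0001
Ship to:
Jane Example
12 Sample Street
Springfield, ZZ 00000
Item   Qty   Price
"""
print(address_from_page_text(sample))
```

This only works when every invoice really has the same layout; if the address can grow or shrink by a line, searching for a label such as "Ship to:" would be a safer starting point than fixed positions.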

I’m looking for the best option that doesn’t require other programs, such as Automator, to get this accomplished. Since the PDF files will all have the exact same page layout, I just need a way to tell my AppleScript what part of the PDF page to copy. I can do it manually by selecting or highlighting the specific mailing address info, and then copying and pasting it into my label-printing Word document. But again, I’m new to the scripting world, so I don’t know all the technical language yet to give you a more specific answer. My apologies in advance.

I’m actually creating these PDFs from the web. They are invoices. Instead of printing them, I am just saving them as PDFs. I don’t know if that helps the situation.

When you print them from the web, do you not have access to the address information then?
Why not extract the information from the web instead of from the PDF?

Just a suggestion.

And to answer your question, “How would I go about doing that?”

You could use JavaScript, assuming you are using Safari. There are several examples on this forum
for ideas on doing it that way. Firefox has excellent support for JavaScript, just not from AppleScript,
and it is no fun sending information to AppleScript from inside of Firefox.

Or you could use Python, Ruby, Perl, etc., and with each of these you have access to good
libraries designed to extract web data.
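
To give a rough idea of the Python route, here is a minimal sketch that uses only the standard library's html.parser module. It assumes the invoice page wraps the address in an element with id "shipping-address"; that id, the class and function names, and the sample HTML are all hypothetical, so look at the real page source to see how the address is actually marked up.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class AddressParser(HTMLParser):
    """Collect the text inside the element whose id is TARGET_ID."""

    TARGET_ID = "shipping-address"  # hypothetical id; check the real page

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside the target element
        self.lines = []  # text fragments found inside it

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.TARGET_ID:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.lines.append(data.strip())


def extract_address(html):
    """Return the address text, one fragment per line."""
    parser = AddressParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


sample = """<html><body><h1>Invoice</h1>
<div id="shipping-address">Jane Example<br>12 Sample Street<br>Springfield, ZZ 00000</div>
<p>Total: 10.00</p></body></html>"""
print(extract_address(sample))
```

The third-party libraries tend to make this shorter, but the idea is the same: find the element that holds the address and keep only its text.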

hth,

Craig