Sums all values that follow the field called VALOR BASE in all PDFs; in the dummy case, 271,86.
Sums all values that follow the field called IVA (here "IVA : IVA - regime de isenção [art.º 53.º]") in all PDFs; in the dummy case this is 0,00, but it may contain a value. The name of this field may vary, but it always starts with IVA:. Note that the field uses a comma as the decimal separator.
Presents the two sums separately.
My question is, how do I scan the PDF using AppleScript and look for these fields?
Is there a way to obtain a list of text fields, boxes or whatever and enumerate them?
macos_boy, I don’t know how to do exactly what you want. Hopefully another forum member will be able to help.
FWIW, I wrote a script that gets the desired amounts from your sample form using a regular expression. The regex pattern is 1 to 3 digits, followed by a comma, followed by 2 digits. The script returns two lists containing the first and second regex matches in each PDF. If the desired amounts are not the first and second regex matches in each PDF, the script obviously won’t do what you want. I didn’t include the actual calculations, because my locale uses a period as the decimal separator, but a simple repeat loop will do the job.
```applescript
-- revised 2023.01.07
use framework "Foundation"
use framework "Quartz"
use scripting additions

set theFiles to (choose file of type {"pdf"} with multiple selections allowed) -- two PDFs selected in testing
set countOne to current application's NSMutableArray's new()
set countTwo to current application's NSMutableArray's new()

repeat with aFile in theFiles
	try
		-- extraction is inside the try so unreadable or text-less PDFs are reported as well
		set theString to getStringFromPDF(aFile)
		set patternMatches to getPatternMatches(theString)
		(countOne's addObject:(patternMatches's objectAtIndex:0))
		(countTwo's addObject:(patternMatches's objectAtIndex:1))
	on error
		display dialog "The correct amounts could not be extracted from a selected file" buttons {"OK"} cancel button 1 default button 1
	end try
end repeat

set theCounts to {countOne as list, countTwo as list} --> {{"271,86", "271,86"}, {"0,00", "0,00"}}
-- insert code to calculate totals from countOne and countTwo

-- Returns the text layer of a PDF as an NSString (missing value if it has none)
on getStringFromPDF(theFile)
	set theURL to current application's |NSURL|'s fileURLWithPath:(POSIX path of theFile)
	set thePDF to current application's PDFDocument's alloc()'s initWithURL:theURL
	return (thePDF's |string|())
end getStringFromPDF

-- Returns every match of the amount pattern, in the order it appears in the text
on getPatternMatches(theString)
	set thePattern to "\\d{1,3},\\d{2}" --> refine if necessary
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	set theRanges to (regexResults's valueForKey:"range")
	set theMatches to current application's NSMutableArray's new()
	repeat with aRange in theRanges
		(theMatches's addObject:(theString's substringWithRange:aRange))
	end repeat
	return theMatches
end getPatternMatches
```
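For the totals, here is a minimal sketch of the repeat loop mentioned above. It assumes every amount looks like "271,86", optionally with a period as a thousands separator (the sample doesn’t show one, so that part is an assumption). The handlers `amountToCents`, `centsToAmount` and `sumAmounts` are hypothetical names. They split each string at the comma with AppleScript’s text item delimiters and add whole cents as integers, so the result doesn’t depend on the system locale and avoids floating-point rounding.

```applescript
-- Hypothetical helper: "271,86" or "1.234,56" -> integer number of cents.
-- Assumes a comma decimal separator and, optionally, periods as thousands separators.
on amountToCents(amountText)
	set savedTIDs to AppleScript's text item delimiters
	-- remove any periods (assumed to be thousands separators)
	set AppleScript's text item delimiters to "."
	set theParts to text items of amountText
	set AppleScript's text item delimiters to ""
	set digitsOnly to theParts as text
	-- split the remaining text at the decimal comma
	set AppleScript's text item delimiters to ","
	set {wholePart, fractionPart} to text items of digitsOnly
	set AppleScript's text item delimiters to savedTIDs
	return (wholePart as integer) * 100 + (fractionPart as integer)
end amountToCents

-- Hypothetical helper: integer cents -> text with a comma decimal separator
on centsToAmount(totalCents)
	set wholePart to totalCents div 100
	set fractionPart to totalCents mod 100
	-- pad the cents to two digits before rejoining
	return (wholePart as text) & "," & text -2 thru -1 of ("0" & (fractionPart as text))
end centsToAmount

-- Hypothetical helper: sums a list of amount strings, returns the total as text
on sumAmounts(amountList)
	set totalCents to 0
	repeat with anAmount in amountList
		set totalCents to totalCents + (my amountToCents(contents of anAmount))
	end repeat
	return my centsToAmount(totalCents)
end sumAmounts
```

In the script above, `sumAmounts(item 1 of theCounts)` and `sumAmounts(item 2 of theCounts)` could replace the placeholder comment; with the two test PDFs (`{{"271,86", "271,86"}, {"0,00", "0,00"}}`) they return "543,72" and "0,00".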
I was curious whether this could be done with a shortcut using the same approach as my previous script, and it does work. Since the shortcut won’t be used for anything, I wrote it to accept one PDF and return only the matching strings.
As far as I know, Apple’s PDFKit doesn’t have an option to retrieve values of specific fields in PDF forms.
There are 3rd-party frameworks that can do this, but you won’t be able to access them directly from AppleScript (not to mention that they cost hundreds of dollars, although some free open-source libraries that can do this may also exist).
I’ll be happy to be proven wrong if there are actually ways to do this using built-in macOS frameworks.
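Since PDFKit only hands back the page text, one workaround is to anchor the regular expression on the labels from the original question rather than relying on match order. This is a sketch under two assumptions: that in the string returned by `PDFDocument`’s `string` each amount is the first amount appearing after its label (PDFKit’s text order doesn’t always follow the visual layout), and that the IVA label always starts with "IVA" followed by a colon. The handler `amountAfterLabel` is hypothetical. It captures the amount in group 1 and reads it with `rangeAtIndex:`. The lookbehind keeps it from grabbing the tail of a longer number, which the plain `\d{1,3},\d{2}` pattern would do with an amount such as 1234,56 (it would match "234,56").

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical helper: returns the first amount found after labelPattern (a regex),
-- or missing value if there is no such label/amount pair.
on amountAfterLabel(theText, labelPattern)
	set theString to current application's NSString's stringWithString:theText
	-- label, then the shortest run of any characters, then an amount that does not
	-- start in the middle of a longer number; group 1 captures the amount
	set thePattern to labelPattern & "[\\s\\S]*?(?<![\\d.])(\\d+(?:\\.\\d{3})*,\\d{2})"
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set theMatch to theRegex's firstMatchInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	if theMatch is missing value then return missing value
	return (theString's substringWithRange:(theMatch's rangeAtIndex:1)) as text
end amountAfterLabel
```

With the string from `getStringFromPDF`, the calls would be `amountAfterLabel(theString, "VALOR BASE")` and `amountAfterLabel(theString, "IVA\\s*:")`; the `\s*` allows the space seen in "IVA :".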