I have InDesign documents with XML-tagged text inside text frames. An invisible Unicode character delimits each tagged word or phrase, so when I get all the text from a text frame, that character can come through in the string and I need to strip it out. For example, in “This sentence [has tagged] text.” the brackets stand for the invisible Unicode character.
In ASOC I have been doing it like this:
set unicodeTagChar to «data utxtFEFFFEFF»
set unicodeTagChar to unicodeTagChar as Unicode text
set finalXMLString to replaceText_oldChr_newChr_(finalXMLString, unicodeTagChar, "")
It works using this AppleScript handler:
on replaceText_oldChr_newChr_(theText, theBadChar, theReplaceChar)
    set AppleScript's text item delimiters to theBadChar
    set newText to text items of theText
    set AppleScript's text item delimiters to theReplaceChar
    set newText to newText as string
    set AppleScript's text item delimiters to "" -- reset the TIDs to their default
    return newText
end replaceText_oldChr_newChr_
What I really want is to replace the AppleScript handler with an Objective-C method in a class of “helper functions” I’m building. This code works fine on ordinary characters:
+ (NSString *)replaceText:(NSString *)theText oldChr:(NSString *)theBadChar newChr:(NSString *)theReplaceChar
{
    // Replaces unwanted chars with a specific new char
    NSMutableString *mutText = [NSMutableString stringWithString:theText];
    [mutText replaceOccurrencesOfString:theBadChar
                             withString:theReplaceChar
                                options:NSCaseInsensitiveSearch
                                  range:NSMakeRange(0, [mutText length])];
    return (NSString *)mutText;
}
But when I call it like this from my script:
set finalXMLString to CocoaHelpers's replaceText_oldChr_newChr_(finalXMLString, unicodeTagChar, "")
tell me to log "STRING= " & (finalXMLString as string)
RESULT: This sentence ?has tagged? text.
Encoding confuses me, and I don’t know how to write the character in the Objective-C code. Could I use 0xFEFF, U+FEFF, or U0999 format? How do I even find the equivalent of «data utxtFEFFFEFF»? I’m also hesitant to simply strip question marks from the final string, in case my text contains a real question.