Strip Diacriticals

This script converts letters with diacriticals (accent marks) in a string to their unaccented equivalents.

OS version: Any

property letters_dia_uc : (characters of "ÁÀÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜŸ")
property letters_dia_lc : (characters of "áàâäãåçéèêëíìîïñóòôöõúùûüÿ")
property letters_nondia_uc : (characters of "AAAAAACEEEEIIIINOOOOOUUUUY")
property letters_nondia_lc : (characters of "aaaaaaceeeeiiiinooooouuuuy")

set the_string to "This is some text. ÁÀÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜŸáàâäãåçéèêëíìîïñóòôöõúùûüÿ"
return strip_diacriticals(the_string)

on strip_diacriticals(the_string)
	set search_chars to letters_dia_uc & letters_dia_lc
	set replace_chars to letters_nondia_uc & letters_nondia_lc
	set return_string to ""
	repeat with i from 1 to (count of characters of the_string)
		set string_char to (character i of the_string)
		considering case
			if search_chars contains string_char then
				repeat with j from 1 to (count of search_chars)
					if (item j of search_chars) = string_char then
						set return_string to return_string & (item j of replace_chars)
						exit repeat
					end if
				end repeat
			else
				set return_string to return_string & string_char
			end if
		end considering
	end repeat
	return return_string as string
end strip_diacriticals

I noticed this script because it is fast, but two points are worth noting:

  1. The script above is really an example of multiple replacement of arbitrary letters (or substrings) in text, not just diacriticals.

  2. Its speed and compactness can be improved by using AppleScript's text item delimiters.
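
As a minimal sketch of point 2, a single replacement with text item delimiters splits the text at every occurrence of the search string and rejoins the pieces with the replacement. The handler name replaceOne and the sample text are hypothetical and only illustrate the technique.

```applescript
-- Hypothetical helper: replace every occurrence of searchText with replacementText.
on replaceOne(theText, searchText, replacementText)
	set ATID to AppleScript's text item delimiters -- remember the current delimiters
	considering case
		set AppleScript's text item delimiters to searchText
		set theItems to text items of theText -- split at each match
		set AppleScript's text item delimiters to replacementText
		set theText to theItems as text -- rejoin with the replacement
	end considering
	set AppleScript's text item delimiters to ATID -- restore the delimiters
	return theText
end replaceOne

replaceOne("Élan & École", "É", "E")
--> "Elan & Ecole"
```

Repeating this split-and-join once per search string gives a multiple replacement without looping over individual characters.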

I use an associative list (a list of {search, replacement} pairs) to simplify the script. I also added the replacement of “&” with “##” to the list to demonstrate that replacing diacriticals is only a special case of replacing arbitrary letters (or substrings).

Users can of course remove {“&”, “##”} and name the handler strip_diacriticals, as the author of the previous script did.
 

set replaceList to {{"&", "##"}, {"Á", "A"}, {"À", "A"}, {"Â", "A"}, {"Ä", "A"}, {"Ã", "A"}, {"Å", "A"}, {"Ç", "C"}, {"É", "E"}, {"È", "E"}, {"Ê", "E"}, {"Ë", "E"}, {"Í", "I"}, {"Ì", "I"}, {"Î", "I"}, {"Ï", "I"}, {"Ñ", "N"}, {"Ó", "O"}, {"Ò", "O"}, {"Ô", "O"}, {"Ö", "O"}, {"Õ", "O"}, {"Ú", "U"}, {"Ù", "U"}, {"Û", "U"}, {"Ü", "U"}, {"Ÿ", "Y"}, {"á", "a"}, {"à", "a"}, {"â", "a"}, {"ä", "a"}, {"ã", "a"}, {"å", "a"}, {"ç", "c"}, {"é", "e"}, {"è", "e"}, {"ê", "e"}, {"ë", "e"}, {"í", "i"}, {"ì", "i"}, {"î", "i"}, {"ï", "i"}, {"ñ", "n"}, {"ó", "o"}, {"ò", "o"}, {"ô", "o"}, {"ö", "o"}, {"õ", "o"}, {"ú", "u"}, {"ù", "u"}, {"û", "u"}, {"ü", "u"}, {"ÿ", "y"}}
set theText to "This is some text & ÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜŸáàâäãåçéèêëíìîïñóòôöõúùûüÿ"

multipeReplace(theText, replaceList)

on multipeReplace(theText, replaceList)
	script o
		property replaceList : {}
	end script
	set o's replaceList to replaceList
	set ATID to AppleScript's text item delimiters
	considering case
		repeat with searchItem in replaceList of o
			set AppleScript's text item delimiters to item 1 of searchItem
			set theText to text items of theText
			set AppleScript's text item delimiters to item 2 of searchItem
			set theText to theText as text
		end repeat
	end considering
	set AppleScript's text item delimiters to ATID
	return theText
end multipeReplace

 

Ray’s script is pretty old.

Nowadays you can strip diacritics with the help of the Foundation framework:

use AppleScript version "2.5"
use framework "Foundation"
use scripting additions

set theString to current application's NSString's stringWithString:"ÁÀÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜŸ"
set theResult to (theString's stringByFoldingWithOptions:(current application's NSDiacriticInsensitiveSearch) locale:(current application's NSLocale's currentLocale())) as text
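
The same call can be wrapped in a handler so that it applies to any text, as in the sketch below. The handler name foldDiacritics and the sample sentence are assumptions for illustration; the Foundation calls are the ones used above.

```applescript
use AppleScript version "2.5"
use framework "Foundation"
use scripting additions

-- Hypothetical wrapper around the NSString folding call shown above.
on foldDiacritics(theText)
	-- Bridge the AppleScript text to an NSString.
	set theString to current application's NSString's stringWithString:theText
	-- Fold away diacritics, leaving case untouched.
	set folded to theString's stringByFoldingWithOptions:(current application's NSDiacriticInsensitiveSearch) locale:(current application's NSLocale's currentLocale())
	return folded as text
end foldDiacritics

foldDiacritics("This is some text. Çà et là, déjà vu")
```

Because the framework decides which characters carry diacritics, no replacement table has to be maintained by hand.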

An AS alternative, 20 years on from the original, would be:

set theText to "This is some text & ÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜŸáàâäãåçéèêëíìîïñóòôöõúùûüÿ"
return stripDiacriticals(theText)

on stripDiacriticals(theText)
	set ATID to AppleScript's text item delimiters
	repeat with this in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
		set AppleScript's text item delimiters to this's contents
		considering case but ignoring diacriticals
			set theText to theText's text items as text
		end considering
	end repeat
	set AppleScript's text item delimiters to ATID
	
	return theText
end stripDiacriticals

Here is a cleaned-up version:

property letters_dia : "ÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜŸáàâäãåçéèêëíìîïñóòôöõúùûüÿ"
property letters_nondia : "AAAAAACEEEEIIIINOOOOOUUUUYaaaaaaceeeeiiiinooooouuuuy"

set the_string to "This is some text. ÁÀÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜŸáàâäãåçéèêëíìîïñóòôöõúùûüÿ"
return strip_diacriticals(the_string)

on strip_diacriticals(the_string)
	local strOffset
	set return_string to ""
	repeat with i from 1 to (length of the_string)
		set string_char to character i of the_string
		considering case
			set strOffset to offset of string_char in letters_dia
			if strOffset > 0 then
				set return_string to return_string & item strOffset of letters_nondia
			else
				set return_string to return_string & string_char
			end if
		end considering
	end repeat
	return return_string as string
end strip_diacriticals

Also, the original script was missing one diacritical, “Â”, from letters_dia_uc.

EDIT: Here is an even shorter version:

property letters_dia : "ÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜŸáàâäãåçéèêëíìîïñóòôöõúùûüÿ"
property letters_nondia : "AAAAAACEEEEIIIINOOOOOUUUUYaaaaaaceeeeiiiinooooouuuuy"

set the_string to "This is some text. ÁÀÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜŸáàâäãåçéèêëíìîïñóòôöõúùûüÿ"
return strip_diacriticals(the_string)

on strip_diacriticals(the_string)
	local strOffset, return_string, aChar
	set return_string to characters of the_string
	considering case
		repeat with aChar in return_string
			set strOffset to offset of (contents of aChar) in letters_dia
			if strOffset > 0 then set contents of aChar to character strOffset of letters_nondia
		end repeat
	end considering
	return return_string as string
end strip_diacriticals

This is a variation of sorts on a script used to strip diacriticals from email addresses.

It uses offset, but on a reduced character set: only those letters that can be accented. It also ignores case when determining which letters in the text match that set, which reduces the number of comparisons. It also lets the system figure out what all of the diacriticals are so I don’t have to.

use scripting additions

set test to "This is some text. ÁÀÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜŸáàâäãåçéèêëíìîïñóòôöõúùûüÿ"

stripDiacriticals(test)
--> "This is some text. AAAAACEEEEIIIINOOOOOUUUUYaaaaaaceeeeiiiinooooouuuuy"

on stripDiacriticals(test)
	set caseList to "aceinouy"
	set critList to "aceinouyACEINOUY"
	set charList to characters of test as list
	
	repeat with listPos from 1 to length of charList
		set eachChar to item listPos of charList
		ignoring diacriticals
			if eachChar is in caseList then
				considering case
					set oof to offset of eachChar in critList
				end considering -- of case
				set item listPos of charList to item oof of critList
			end if
		end ignoring -- of diacriticals
	end repeat
	
	return charList as text
end stripDiacriticals
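
The script above relies on how string comparisons behave inside ignoring diacriticals and considering case blocks. A minimal sketch of those comparisons follows; the variable names and sample letters are chosen only for illustration.

```applescript
-- By default AppleScript considers diacriticals and ignores case.
set defaultMatch to ("é" = "e") -- false: diacriticals are considered

ignoring diacriticals
	set looseMatch to ("é" = "e") -- true: the accent is ignored
	set caseBlind to ("É" = "e") -- true: case is still ignored by default
	considering case
		set caseAware to ("É" = "e") -- false: case now matters
		set sameCase to ("É" = "E") -- true: same case, accent ignored
	end considering
end ignoring

return {defaultMatch, looseMatch, caseBlind, caseAware, sameCase}
--> {false, true, true, false, true}
```

This is why the membership test can use the short lower-case list, while the offset lookup inside considering case picks the plain letter of the right case.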