Merge text files and save the result in UTF-16 format

Hi Guys,

I am using Mac OS X (Mountain Lion), AppleScript version 2.6.

I have a folder with multiple .doc files and text files. I need to find all the text files, merge them into a single text file, and save that file in UTF-16 format.

Is it possible to do this with AppleScript?

Thanks,
John

As you are using a very old system, you can’t use ASObjC.

Here is a script doing the job.

set p2d to path to desktop as text
set destFile to p2d & "mergedFiles.txt"

--set sourceFolder to choose folder
set sourceFolder to p2d & "origin:" as alias

tell application "System Events"
	set theTextFiles to path of files of sourceFolder whose name extension is "txt"
end tell

set theText to ""
repeat with aPath in theTextFiles
	set theText to theText & (read file aPath as «class utf8»)
end repeat
theText

my writeTo_UTF16(destFile, theText, false)

#=====

# Two handlers borrowed from : http://stackoverflow.com/questions/4981598/how-to-create-and-write-utf-16-text-file-using-applescript

on writeTo_UTF16(targetFile, theText, appendText)
	try
		set targetFile to targetFile as text
		-- check before opening: open for access creates the file if it is missing
		tell application "Finder" to set fileExists to exists file targetFile
		set openFile to open for access file targetFile with write permission
		if appendText is false or fileExists is false then
			set eof of openFile to 0
			write (ASCII character 254) & (ASCII character 255) to openFile starting at eof -- UTF-16 BOM
		end if
		write theText to openFile starting at eof as Unicode text
		close access openFile
		return true
	on error theError
		try
			close access file targetFile
		end try
		return theError
	end try
end writeTo_UTF16

on readFrom_UTF16(targetFile)
	try
		set targetFile to targetFile as text
		targetFile as alias -- if file doesn't exist then you get an error
		set openFile to open for access file targetFile
		set theText to read openFile as Unicode text
		close access openFile
		return theText
	on error
		try
			close access file targetFile
		end try
		return false
	end try
end readFrom_UTF16

As you can see, I assume that the original text files are UTF-8 encoded.

Yvan KOENIG running El Capitan 10.11.0 in French (VALLAURIS, France) jeudi 1 octobre 2015 14:50:39

Hi Yvan KOENIG,

Thanks for your kind help.

It works like a charm with UTF-8 files, which is what I need. But when a file is in "Western (Mac OS Roman)" encoding, it doesn't work. What should I do for files in this encoding?

Thanks,
John

Hello

When I re-tested the script, I put both Western and UTF-8 files in the origin folder.
When I opened the mergedFiles.txt file in TextWrangler, every non-ASCII character was correctly displayed, and at the bottom of the window TextWrangler correctly stated that the document was UTF-16 encoded.
As far as I know, we can’t attach screenshots here, so you will have to trust me on that.

“It’s not working” is the kind of answer that doesn’t help.

Could you give a complete description of what goes wrong for you in this case?

I ran some new tests.
I took a text file encoded as Western (Mac OS Roman) and ran the script below on it.

set aFile to (path to desktop as text) & "origin:Sans titre.txt"

set textWestern to read file aFile
log result
set textUtf8 to read file aFile as «class utf8»
log result
textWestern = textUtf8

It returned true, which means that reading it with no explicit encoding or as UTF-8 gives exactly the same result.
If I apply the same script to a UTF-8 file, it returns false.
When I compare textWestern and textUtf8 for that UTF-8 file, I can immediately see the differences.

Several lines are supposed to start with NO-BREAK SPACE + space + NO-BREAK SPACE + space.
The lines are perfect in textUtf8, while in textWestern they start with: "¬† ¬† "

Yvan KOENIG running El Capitan 10.11.0 in French (VALLAURIS, France) jeudi 1 octobre 2015 16:41:07

It sounds like you need to try UTF-8 and fall back to MacRoman if that fails, something like this:

set theText to ""
repeat with aPath in theTextFiles
	try
		-- try UTF-8 first
		set newText to read file aPath as «class utf8»
	on error
		-- not valid UTF-8: fall back to the default (MacRoman) encoding
		set newText to read file aPath
	end try
	set theText to theText & newText
end repeat

And you can write UTF16 a bit more simply:

set fileRef to (open for access file destFile with write permission)
set eof fileRef to 0
write theText to fileRef as «class ut16»
close access fileRef
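For reference, Shane's four lines can be wrapped in a handler that also closes the file if the write fails, which the short version does not do. This is a minimal sketch: the handler name `writeUTF16` and the test file name are my own, and it relies on Shane's statement that `write ... as «class ut16»` adds the BOM.

```applescript
-- Sketch: Shane's «class ut16» write wrapped in a handler (writeUTF16 is a hypothetical name)
on writeUTF16(destFile, theText)
	-- destFile: HFS path string, e.g. (path to desktop as text) & "mergedFiles.txt"
	set fileRef to open for access file destFile with write permission
	try
		set eof fileRef to 0 -- discard any previous contents
		write theText to fileRef as «class ut16» -- per Shane, the BOM is added for you
		close access fileRef
	on error errMsg number errNum
		close access fileRef -- never leave the file open after a failure
		error errMsg number errNum -- then pass the error on to the caller
	end try
end writeUTF16

-- Usage with a throwaway file in the temporary items folder
writeUTF16((path to temporary items as text) & "utf16Test.txt", "Merged text")
```

Re-raising the error keeps the caller informed, whereas the borrowed `writeTo_UTF16` handler returns the error message as its result.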

Hello Shane

(1) I tested with more than 20 files and never got an error when trying to read text files as UTF-8.

(2) I remembered that a BOM was required for UTF-16 encoding. It appears that I was wrong.

Yvan KOENIG running El Capitan 10.11.0 in French (VALLAURIS, France) vendredi 2 octobre 2015 10:32:17

If you save a file as MacRoman with characters like “ and ”, you will get an error when trying to read it as UTF-8.

When you use «class ut16», a BOM is added for you.
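To find out which files in a folder actually need the MacRoman fallback (Yvan's more than 20 test files never did), the same `read ... as «class utf8»` call can be wrapped in a try block. This is a minimal sketch: the handler names `isReadableAsUTF8` and `nonUTF8Files` are hypothetical, and it relies on Shane's point that invalid UTF-8 makes the read fail. I believe reading an empty file also raises an error, so an empty file would be reported as not readable.

```applescript
-- Sketch: report whether a file can be read as UTF-8 (hypothetical handler name)
on isReadableAsUTF8(aPath)
	-- aPath: HFS path string of the file to check
	try
		read file aPath as «class utf8»
		return true
	on error
		-- invalid UTF-8 (e.g. MacRoman curly quotes) lands here, and possibly an empty file
		return false
	end try
end isReadableAsUTF8

-- List the .txt files of a folder (HFS path ending in ":") that fail the UTF-8 read
on nonUTF8Files(sourceFolder)
	tell application "System Events"
		set thePaths to path of files of folder sourceFolder whose name extension is "txt"
	end tell
	set badPaths to {}
	repeat with aPath in thePaths
		if not isReadableAsUTF8(aPath as text) then set end of badPaths to (aPath as text)
	end repeat
	return badPaths
end nonUTF8Files

-- Usage, assuming the origin folder from the original script:
-- nonUTF8Files((path to desktop as text) & "origin:")
```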

Thanks Shane

Today starts well: I have already learned two new things. ;-)

Yvan KOENIG running El Capitan 10.11.0 in French (VALLAURIS, France) vendredi 2 octobre 2015 11:48:14

Thanks for all the replies.

I learn something new from this forum every day.

Sorry about this! My English is not very good.

Thanks,
John

Hi Guys,

Yvan's script below merges the .txt files and saves the result in UTF-16 format.

But my folder contains the following files:

MD1_Bullet.txt
MD3_Legend.txt
MD4_Chart.txt
MD6_Port.txt
MD15_Invest.txt

But the script inserts them in this order:

MD15_Invest.txt
MD1_Bullet.txt
MD3_Legend.txt
MD4_Chart.txt
MD6_Port.txt

Here is the result, for your reference:

tell application "System Events"
	get path of every file of alias "Macintosh HD:Users:John:Desktop:origin:" whose name extension = "txt"
		--> {"Macintosh HD:Users:John:Desktop:origin:MD15_Invest.txt", "Macintosh HD:Users:John:Desktop:origin:MD1_Bullet.txt", "Macintosh HD:Users:John:Desktop:origin:MD3_Legend.txt", "Macintosh HD:Users:John:Desktop:origin:MD4_Chart.txt", "Macintosh HD:Users:John:Desktop:origin:MD6_Port.txt"}
end tell

I want the files to be inserted in order of modification date.

Where should I change the script to do that?


set p2d to path to desktop as text
set destFile to p2d & "mergedFiles.txt"

--set sourceFolder to choose folder
set sourceFolder to p2d & "origin:" as alias

tell application "System Events"
	set theTextFiles to path of files of sourceFolder whose name extension is "txt"
end tell

set theText to ""
repeat with aPath in theTextFiles
	set theText to theText & (read file aPath as «class utf8»)
end repeat
theText

my writeTo_UTF16(destFile, theText, false)

#=====

# Two handlers borrowed from : http://stackoverflow.com/questions/4981598/how-to-create-and-write-utf-16-text-file-using-applescript

on writeTo_UTF16(targetFile, theText, appendText)
	try
		set targetFile to targetFile as text
		-- check before opening: open for access creates the file if it is missing
		tell application "Finder" to set fileExists to exists file targetFile
		set openFile to open for access file targetFile with write permission
		if appendText is false or fileExists is false then
			set eof of openFile to 0
			write (ASCII character 254) & (ASCII character 255) to openFile starting at eof -- UTF-16 BOM
		end if
		write theText to openFile starting at eof as Unicode text
		close access openFile
		return true
	on error theError
		try
			close access file targetFile
		end try
		return theError
	end try
end writeTo_UTF16

on readFrom_UTF16(targetFile)
	try
		set targetFile to targetFile as text
		targetFile as alias -- if file doesn't exist then you get an error
		set openFile to open for access file targetFile
		set theText to read openFile as Unicode text
		close access openFile
		return theText
	on error
		try
			close access file targetFile
		end try
		return false
	end try
end readFrom_UTF16

Thanks,

John

As I’m not a soothsayer, I answer the question as it is posted, and it annoys me when the asker fails to describe correctly what he needs, because it forces me to do the job twice.

If your original question had been written correctly, I wouldn’t have used System Events, which has no sort feature; I would have used the Finder instead.

Replace:


tell application "System Events"
	set theTextFiles to path of files of sourceFolder whose name extension is "txt"
end tell

set theText to ""
repeat with aPath in theTextFiles
	set theText to theText & (read file aPath as «class utf8»)
end repeat

with:

tell application "Finder"
	-- keep only the .txt files: the folder also contains .doc files
	set theTextFiles to (sort (files of folder sourceFolder whose name extension is "txt") by modification date)
end tell
set theText to ""
repeat with aPath in theTextFiles
	set aPath to aPath as text
	try
		set theText to theText & (read file aPath as «class utf8»)
	on error
		set theText to theText & (read file aPath)
	end try
end repeat

After that, the files will be sorted by modification date.
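If you would rather keep System Events, which has no sort command, another option is to collect each file's path and modification date and sort that list in plain AppleScript. This is a sketch, not Yvan's method: the handler names `sortByDate` and `mergeTextFilesByDate` are hypothetical, the sort is a simple insertion sort (fine for a folder of a few dozen files, slow for thousands), and it keeps the UTF-8 then MacRoman fallback from the Finder version. It puts the oldest file first; use `reverse of` the sorted list for newest first.

```applescript
-- Sketch: System Events + a plain AppleScript insertion sort (hypothetical handler names)

-- Sort a list of {path, date} pairs, oldest date first.
-- Insertion sort is stable, so files with equal dates keep their original order.
on sortByDate(thePairs)
	set sortedPairs to {}
	repeat with aPair in thePairs
		set aPair to contents of aPair
		set inserted to false
		repeat with i from 1 to count sortedPairs
			if (item 2 of aPair) < (item 2 of item i of sortedPairs) then
				if i = 1 then
					set sortedPairs to {aPair} & sortedPairs
				else
					set sortedPairs to (items 1 thru (i - 1) of sortedPairs) & {aPair} & (items i thru -1 of sortedPairs)
				end if
				set inserted to true
				exit repeat
			end if
		end repeat
		if not inserted then set end of sortedPairs to aPair
	end repeat
	return sortedPairs
end sortByDate

-- Merge the .txt files of sourceFolder (an HFS path string ending in ":") by modification date.
on mergeTextFilesByDate(sourceFolder)
	set thePairs to {}
	tell application "System Events"
		set theFiles to files of folder sourceFolder whose name extension is "txt"
		repeat with aFile in theFiles
			set aFile to contents of aFile
			set end of thePairs to {path of aFile, modification date of aFile}
		end repeat
	end tell
	set sortedPairs to sortByDate(thePairs)
	set theText to ""
	repeat with aPair in sortedPairs
		set aPath to item 1 of aPair
		try
			set theText to theText & (read file aPath as «class utf8»)
		on error
			set theText to theText & (read file aPath) -- MacRoman fallback
		end try
	end repeat
	return theText
end mergeTextFilesByDate

-- Usage with the folder and handler from the original script:
-- set p2d to path to desktop as text
-- set theText to mergeTextFilesByDate(p2d & "origin:")
-- my writeTo_UTF16(p2d & "mergedFiles.txt", theText, false)
```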

Yvan KOENIG running El Capitan 10.11.0 in French (VALLAURIS, France) jeudi 8 octobre 2015 21:22:29

Hi Yvan,

I am extremely sorry for the lack of information in my first post.

And thank you for the effort you spent twice on this thread. The script works perfectly.

Thanks,
John