#1 2007-12-17 05:00:10 am

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-19
Posts: 4312

The Ins & Outs of File Read/Write in AppleScript

This is a revised version (November 2007) of the article, correcting a couple of inaccuracies in the original, which appeared in the summer of 2006. I don't yet have Mac OS 10.5 Leopard, but from what I hear, the information given here is still essentially valid. (If any Intel and/or Leopard users discover any significant departures, do let me know!)

The StandardAdditions OSAX -- part of the standard Mac OS installation -- includes a suite of six commands called File Read/Write. These are AppleScript's direct access to the contents of files. They allow data to be passed directly from a script to a file, or vice versa, without the intervention of an application. This article tells you all you need to know about them -- and perhaps a lot that you don't.

The script examples in this article create and fool around with a couple of files on the desktop called "Aardvark.txt" and "Aardvark.dat". If you already have files with these names on your desktop, and they contain data you want to keep, you should move them somewhere else for safety before trying out the examples!

Before we go into the lurid details, here's a line-up of the commands together with a rough idea of what they're for:

open for access: open an access to a file for the other File Read/Write commands below.
close access: close a previously opened access to a file.
get eof: get the length of a file.
set eof: set the length of a file.
write: write content to a file.
read: read content from a file.

I'll be dealing with these in the order they appear above.

open for access

In order for any of the File Read/Write commands to be able to interact with a file, the file has to be open for access. Unsurprisingly, open for access is the command that achieves this.

... having said which, I must immediately appear to contradict myself by saying that it isn't absolutely necessary to use open for access and its counterpart close access, because each of the other commands is perfectly capable of opening its own access, doing its job, and then closing the access again behind it! The point is, though, that if more than one action is carried out on a file, the repeated opening and closing with each command will slow down the script considerably. Also, certain system-maintained information about progress through the file is discarded with each closure. So open for access opens an access and leaves it open. It also returns a reference number for that access so that the other commands can be guided straight to it. More of that anon.

open for access also has the property that if it's told to open a file that doesn't exist, it'll create the file itself. It's the only one of the File Read/Write commands that can do this. The files it creates open by default (when double-clicked) in TextEdit (OS X) or SimpleText (OS 9 or earlier), but of course they only make sense in those applications if they contain data that the applications can display.

If you don't happen to have a file called "Aardvark.txt" on your desktop, the following script will create one, open it for access, and then close it again. If you do already have such a file, this script will simply open an access and close it. You may see the file appear on the desktop when it's created, but opening and closing it for access doesn't have any visible effect.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath)
close access fRef

As you can see, open for access's main parameter is a reference to the file to be opened: in this case, file thePath. The reference can be either a file specification (as here) or an alias. As always in AppleScript, an alias can only be used if the file already exists. With some versions of AppleScript, it's possible to get away with just a path, but this can't be relied upon and it's a good idea in any case to get into the habit of using file specifications or aliases with AppleScript and StandardAdditions commands.

For completeness, I have to mention that open for access will also accept just a file name as its parameter. In this case, the location for the file is taken to be the "current directory". In OS 9 and earlier, the "current directory" was the folder containing the application running the script. In OS X, it's the root directory of the startup disk. Since you're not really supposed to put your own files there, a filename-only parameter is to all intents useless and I won't mention it again.

open for access returns a reference number for the access it's just opened. In the script above, this number is stored in the variable fRef and is then used as the parameter for close access.

It's important to understand a few things about how the File Read/Write commands work.

open for access doesn't so much "open a file" as "open an access to a file." If it's used in a tell statement to a particular application, the access that's opened is granted to that application; otherwise it's given to the "default" or "current" application. (That's the application running the script or another application that's been declared as the script's "parent".) Only the application that "owns" an access can make use of it or close it.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

-- Tell the Finder to open the file for access.
-- (Surprisingly, this works. Presumably, 'file thePath' is compiled as an
-- AppleScript file reference after 'open for access', not as a Finder one.)
tell application "Finder" to set fref to (open for access file thePath)
try
   -- Try closing the access with the current application instead of with the Finder.
   close access fref
on error msg
   -- As far as the current app's concerned, the file's not open.
   display dialog msg buttons {"OK"} default button 1
end try
-- But as far as the Finder's concerned, it is.
tell application "Finder" to close access fref

Since there's no point in telling any application but the current one to execute any of the File Read/Write commands, and since doing so adds to the execution time and to the possibility of error, it's better not to use any of these commands in application tell statements.

It's possible -- though not usually desirable -- to have more than one access open to the same file at the same time, even within the same application. Accesses are handled by the operating system's File Manager, which maintains private information about each one that includes a "file mark". The file mark indicates the byte in the file at which the next read or write will begin by default if that access is used. When an access is opened, its file mark is set for the first byte in the file -- even if there are no bytes in the file at that stage! After a read or write, the file mark is advanced to the byte after the last one just read or written. The next read or write with that access will begin there unless another start byte is specified. It follows that the file mark will sometimes indicate a non-existent byte immediately beyond the end of the file. Trying to read from there will cause an error, but writing from there's just fine.
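As a small illustration of the file mark in action, the following sketch writes some text and then reads it back in two instalments through a single access. It borrows read's for parameter, which is described later in the article. The variable names firstBit and secondBit are just my own labels for the demo.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

-- Put some known text into the file.
set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Hello World" to fRef
on error errMsg number errNum
   close access fRef
   error errMsg number errNum
end try
close access fRef

-- Open a fresh access. Its file mark is set for the first byte.
set fRef to (open for access file thePath)
try
   set firstBit to (read fRef for 5) -- Bytes 1 to 5. The file mark moves on to byte 6.
   set secondBit to (read fRef) -- From the file mark (byte 6) to the end of the file.
on error errMsg number errNum
   close access fRef
   error errMsg number errNum
end try
close access fRef

{firstBit, secondBit}
--> {"Hello", " World"}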

The file parameter for each of the other five File Read/Write commands can be either an access number returned by open for access or an alias or file specification representing the file itself. The execution process differs slightly according to which kind of parameter is used:
• If the parameter's an access number, the command uses the specified access. It's fast and unambiguous.
• If the parameter's an alias or file specification, the command attempts to match the file and application with a previously opened access. This obviously adds a little to the execution time.
      • If a matching access is found, that access is used. If there's more than one matching access, the first one found is the one used -- which might not be the one assumed by the scripter!
      • If there's no matching access (and the command isn't close access) an access is opened, the command is executed with it, and the access is then closed again. This is the slowest possibility of the lot. On the other hand, the combination of open-act-close in one command usually executes a bit faster than the three commands individually, so it's ideal for one-off actions such as reading a file once then leaving it alone. There's also no danger of the access being accidentally left open if an error occurs. But it does depend on there being no existing access open to the file with the application involved.

The ramifications of all this (if you're still with me) are:
• If you've opened a file with open for access, it's both faster and safer to use the returned access number, not an alias or a file specification, with the other File Read/Write commands.
• If you only want to perform a single action on a file, it's faster and (often) safer not to open it for access first, but simply to use the relevant command with an alias or file specification.
• Mixing methods with the same file and the same application requires a little circumspection.

To illustrate how things can go wrong: say, for example, you're trying out two scripts in Script Editor that both read from the same file. The script in one window opens the file for access, reads it through, and then crashes for some reason before it can close the access. The still-open access belongs to Script Editor, not to the script, so if the script in the other window then tries to read the file using an alias or file specification, the attempt will be routed through that access. The result is an end-of-file error, because the file mark for that access was left pointing beyond the end of the file by the first script. You might experience similarly puzzling errors if a script leaves an access open in Script Menu or some other application -- and waste a lot of time trying to debug the wrong script!
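The scenario can be imitated within a single script. This is only a sketch of the behaviour described above, not something I'd expect in real code: the access is deliberately left open after a complete read, and the second read, which uses a file specification, should then be routed through it. The variable theResult is simply somewhere to keep the outcome.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

-- Make sure the file exists and contains something.
set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Hello World" to fRef
on error errMsg number errNum
   close access fRef
   error errMsg number errNum
end try
close access fRef

-- Open an access, read to the end of the file, and "forget" to close the access.
set fRef to (open for access file thePath)
read fRef

-- This read uses a file specification, so it's matched to the access that's still open,
-- whose file mark is now beyond the end of the file. An end-of-file error is the
-- outcome to expect, going by the description above.
try
   set theResult to (read file thePath)
on error errMsg
   set theResult to "Error: " & errMsg
end try

close access fRef -- Tidy up properly this time.
theResult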

And now back to open for access itself.

If you intend to use the write or set eof commands with the access being opened, the changes to the file have to be enabled with open for access's optional write permission parameter.

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
-- Do stuff that physically affects the file.
close access fRef

If you don't need write permission, you could, if you felt like it, open the access explicitly without write permission. But it's less work -- and less confusing for the casual onlooker -- simply not to mention it.

Attempting to use either write or set eof with an access that doesn't have write permission will result in an error. However, if there's no access currently open to the file at all, the access that's opened automatically (as described above) will get the necessary permission.

Although there may be several accesses open to a file at any one time, only one of them can have write permission. If there's an access open with write permission, no other write-permission accesses can be opened to that file, with any application, until it's closed.
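If you'd like to see this restriction for yourself, a sketch along the following lines should do it. It tries to open a second write-permission access while the first is still open and reports what happens. I've avoided hard-coding any particular error message or number; theOutcome is just a variable of my own for the report.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef1 to (open for access file thePath with write permission)
try
   -- Try for a second access with write permission while the first is still open.
   set fRef2 to (open for access file thePath with write permission)
   -- If this line's reached, the second access was opened after all.
   close access fRef2
   set theOutcome to "Second write-permission access opened."
on error errMsg
   set theOutcome to "Second write-permission access refused: " & errMsg
end try
close access fRef1

theOutcome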

close access

A script that opens an access to a file must close it again before terminating. close access is the command that does this. Just supply it with the relevant access number or file reference. It only handles one access at a time, so every access that's opened with open for access will need a corresponding close access command.

It's an obvious possibility that something might go wrong while a file's open for access and stop the script before it reaches the close access line. It's thus a very good idea to enclose everything that happens while the access is open in a try block, so that the script keeps going long enough to tidy up. The geography of this will depend on how you want the script to react to the situation. Here are three possibilities:

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

(* If there's an error, recover and continue from where the access is closed. *)
set fRef to (open for access file thePath)
try
   -- Read/write stuff here.
end try
close access fRef
-- Rest of script.

(* Ditto. Here we think there may even be a problem opening the access. *)
try
   set fRef to missing value
   set fRef to (open for access file thePath)
   -- Read/write stuff here.
end try
if (fRef is not missing value) then close access fRef -- If fRef _is_ missing value, the access wasn't opened.
-- Rest of script.

(* If there's an error, delay it just long enough to close the access. Otherwise continue. *)
set fRef to (open for access file thePath)
try
   -- Read/write stuff here.
on error errMsg number errNum
   close access fRef
   error errMsg number errNum
end try
close access fRef
-- Rest of script.

Another way to close every access a script has opened is to quit the applications associated with them. This is a good way to get yourself out of trouble if you've made a mess while developing the script. However, it's very bad form to write scripts that rely on things like this to do the housekeeping. If a script opens an access to a file, it should close it itself before terminating.

Before tackling the read and write commands, it would be useful to take a quick look at get eof and set eof. "eof" is a common computer acronym for "end of file". More precisely, it's the insertion point after the last byte in the file. I'll deal with this nicety later in the description of write.

get eof

According to an ancient Apple document, AppleScript Scripting Additions Guide, get eof returns "the offset, in bytes, of the end of a specified file from the beginning of the file." What this means in practical terms is that it returns the number of bytes in the file. The StandardAdditions dictionary says that the result is a "double integer", but you don't need to worry about that. To your script, it'll look like a real with a whole-number value. (In OS 10.5 Leopard, the result's apparently now returned as an integer -- presumably as long as there are fewer than 536,870,912 bytes in the file.)

Applescript:


-- The file we created earlier, but to which we haven't yet written anything.
set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

-- Using 'open for access' (best when more than one action is to be performed on the file):
set fRef to (open for access file thePath)
try
   set fileLength to (get eof fRef)
end try
close access fRef

return {fileLength, class of fileLength}
--> {0.0, real}

-- Without using 'open for access' (best when only one action is performed on the file):
set fileLength to (get eof file thePath)

return {fileLength, class of fileLength}
--> {0.0, real}

set eof

set eof sets the length of a file. This affects the file itself and any other accesses that are open to it. The most common (and probably the most sensible) use for it is to "empty" a file before writing fresh data to it. This is achieved by setting the file's length to 0. (Or, more pedantically, I suppose, by setting "the offset, in bytes, of the end of the specified file from the beginning of the file" to 0.)

The reason for emptying the file first, rather than simply overwriting the old data, is that although a file gets longer as more bytes are written to it, it doesn't get shorter when fewer bytes are written to it. If the new content happens to be shorter than the old, the file's length stays the same and the end of the old content remains in the file beyond the end of the new. Setting the length to zero beforehand allows a completely fresh start. If you only want to overwrite part of the file, of course, then you won't do this.

set eof takes two parameters. Besides the obligatory access number or file reference, it also has a parameter labelled to. This is used to specify the file length in bytes. The dictionary says that the value should be a "double integer", but again, don't worry about that. Any representation of a whole number will do: integer, real, or even string. The necessary coercions are performed automatically by the command. (In fact, even fractional values are rounded to the nearest whole number, as happens with AppleScript's real-to-integer coercion. File Read/Write did this rounding-by-coercion long before AppleScript itself could!)

As I mentioned earlier, set eof makes changes to the file and thus requires write permission. In the following script, the first three write commands are performed while the file's not open for access. This is merely to demonstrate that each can open its own temporary access, with write permission if necessary, and that each new access has its file mark set for the beginning of the file.

Applescript:


-- The file we created earlier, but to which we haven't yet written anything.
set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

(* Write some text to the file, overwrite it with shorter text, observe the result. *)
write "Hello World" to file thePath
write "Disney" to file thePath

display dialog (read file thePath) --> "DisneyWorld"

(* Ditto, but set the file's length to 0 before the second write. *)
write "Hello World" to file thePath
set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Disney" to fRef
end try
close access fRef

display dialog (read file thePath) --> "Disney"

It's not a bad idea to set the file's length to 0 -- simply as a formality -- every time you want to replace its contents completely, even if you know that the new data won't be shorter than the old or that the file doesn't contain anything anyway.

Any current file mark that's greater than the new length will be reduced to the new length plus one. If the new length is 0, the next write with any access will be from the beginning of the file.

If the file's shortened and then restored to its original length, the data lost in the shortening are not recovered.
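Here's a brief sketch of shortening and re-lengthening a file with set eof. The lengths are coerced to integer at the end only so that the result looks the same whether get eof returns a real or an integer. The variable names are my own.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Hello World" to fRef
   set eof fRef to 5 -- Shorten the file to its first five bytes.
   set shortLength to (get eof fRef)
   set shortText to (read fRef from 1)
   set eof fRef to 11 -- Restore the original length.
   set restoredLength to (get eof fRef)
on error errMsg number errNum
   close access fRef
   error errMsg number errNum
end try
close access fRef

{shortLength as integer, shortText, restoredLength as integer}
--> {5, "Hello", 11}

Bytes 6 to 11 exist again after the second set eof, but you shouldn't count on finding " World" in them.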

write

This, as you might expect, is the command that puts data into a file. It too needs an access number or file reference, but for the sake of English-like syntax, this is relegated to a labelled parameter: to. The main, unlabelled parameter is the data to be written to the file. Since writing to a file obviously changes it, write permission is needed.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Hello" to fRef
   write " World" to fRef -- Continues from the file mark after the previous write.
end try
close access fRef

read file thePath
--> "Hello World"

The data written can be of virtually any AppleScript class, including string, Unicode text, list, record, integer, real, date, alias, file specification, or data. Even class names themselves can be written, which could be handy with self-contained values like missing value, null, or Wednesday. Special treatment is needed to write true or false as discrete values, but even they are OK in a list or a record:

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.dat"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write {1, "Hello" as Unicode text, {a:"World"}, true, Thursday} to fRef
end try
close access fRef

read file thePath as list
--> {1, "Hello", {a:"World"}, true, Thursday}

References of any kind don't like being written to file (with write at any rate). AppleScript references are fully resolved to the value of the item at the end of the reference chain. Application references and AppleScript references in lists cause errors.

write also has three optional labelled parameters, starting at, for, and as. Being labelled parameters, they can be arranged in any order, but must follow the label-less data parameter. If none of them is used, write writes from the current file mark for as many bytes as it takes to represent the given data in the form presented. This will usually be what you want. (But see the Leopard note at the end of this section.)

If you want to override the file mark, you can use starting at to specify the byte at which you want the write to commence. The value should be between 1 and 1-more-than-the-file's-length. Alternatively, a negative value can be used to index the byte from the end of the file instead of from the beginning. The value 0 appears to be valid too. It has the same effect as 1, but I don't know the point of it.

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Hello World" to fRef
   -- Overwrite the first byte.
   write "Y" to fRef starting at 1
   -- Overwrite the last six bytes and continue.
   write "w Submarine" to fRef starting at -6
end try
close access fRef

set theData to (read file thePath)
--> "Yellow Submarine"

As with set eof's to parameter -- and, indeed, as with any of the File Read/Write parameters that take a number value -- the "double integer" demanded by the dictionary can in fact be any number representation that AppleScript can offer: an integer, a real, a numerical string, a numerical Unicode text, or a single-item list containing any of these. An integer will normally suffice, though!

There's also a useful keyword value that can be used here: eof. This shouldn't be confused with the get eof and set eof commands, which are two-word commands in their own right, not the commands get and set followed by eof. eof as a parameter value is simply a token that write understands to mean "wherever the end of the file happens to be at the moment". Data written starting at eof is appended to the end of the file.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Hello" to fRef
end try
-- For this demo, close this access and open another,
-- so that we start with a fresh file mark.
close access fRef
set fRef to (open for access file thePath with write permission)
try
   -- Start writing from the end of the file.
   write " World" to fRef starting at eof
end try
close access fRef

set theData to (read file thePath)
--> "Hello World"

It sometimes causes confusion that starting at eof causes a write to begin after the last byte in the file, whereas get eof returns the index of the last byte itself. The reason for this is that "eof" is notionally the point where the last byte ends, not the last byte itself. Writing from there appends data to the end of the file. get eof, you may remember, returns the offset, in bytes, of the end of the file ("eof") from the beginning of the file (where the first byte starts). This distance is the same as the number of bytes in the file and thus also the index number of the last byte. But "eof" itself is at the end of the last byte.
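To make the distinction concrete, this sketch appends to a file twice: once by working out the byte index from get eof's result and once with the eof keyword. Both writes should begin immediately after the last byte. The variable fileLength is my own.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Hello" to fRef
   set fileLength to (get eof fRef) -- 5: the byte count and the index of the last byte.
   -- Append by starting at the byte after the last one ...
   write " World" to fRef starting at (fileLength + 1)
   -- ... and again with the 'eof' keyword, which indicates the same kind of position.
   write "!" to fRef starting at eof
   set theData to (read fRef from 1)
on error errMsg number errNum
   close access fRef
   error errMsg number errNum
end try
close access fRef

theData
--> "Hello World!"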

You may sometimes want to initialise the file mark to a certain value before carrying out a series of writes. It's not possible to access it directly, but there's a trick that can be used to set it indirectly: write a zero-length string to the file starting at the byte where you want the later writes to begin:

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Hello World" to fRef
   -- Initialise the file mark to the 7th byte by writing nothing, starting there.
   write "" to fRef starting at 7
   -- Successive writes continue from there..
   write "everyone" to fRef
   write ", everywhere." to fRef
end try
close access fRef

set theData to (read file thePath)
--> "Hello everyone, everywhere."

Writing nothing to a file doesn't need write permission, so this trick could also be used to prepare for a series of reads from an access that doesn't have write permission. However, read has a similar trick.

The for parameter is used to govern how many bytes of the given data are written to the file. With plain text, "bytes" equals "characters". With Unicode text, which has two or more bytes per character, a given number of bytes covers half that number of characters or fewer. for is ignored when the write involves a list or a record, but it can be used with other data types. write 7 to fRef for 2 writes the first two bytes of the four representing the integer 7 to the file. If you can think of a good use for this, the possibility's there!

If you were feeling flamboyant, you could use the for parameter with a value of 0 to engineer the zero-length write for the file mark initialisation trick:

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Hello World" to fRef
   -- Initialise the file mark to the 7th byte by writing none of this text, starting there.
   write "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch" to fRef for 0 starting at 7
   -- Successive writes continue from there..
   write "everyone" to fRef
end try
close access fRef

set theData to (read file thePath)
--> "Hello everyone"

write's as parameter is similar to an AppleScript coercion, in that it causes the data to be written to file as some other type. Without it, data are written in a form that represents whatever they are already. (Except for text in Leopard, apparently. See the Leopard note at the end of this section.) This isn't necessarily the same as the AppleScript format. For example, two of the thirty-two bits in an AppleScript integer are used for the code that identifies it as being an AppleScript integer, so AppleScript integers only have 30-bit signed values. But an integer value written to file is a full 32 bits wide.
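The width of a written integer is easy to check. This sketch writes the integer 7 without an as parameter, measures the file, and reads the value back. It borrows read's as parameter, which hasn't been described yet, and the variable names are my own.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.dat"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write 7 to fRef -- An integer, written as itself.
   set fileLength to (get eof fRef)
   set theNumber to (read fRef from 1 as integer)
on error errMsg number errNum
   close access fRef
   error errMsg number errNum
end try
close access fRef

{fileLength as integer, theNumber}
--> {4, 7}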

With the as parameter, write can mimic some of AppleScript's coercions and can do a few that the core language itself can't. For instance, not only can reals be written to file as integer, or integers as real; but either of these (or their text equivalents) can be written as double integer (eight bytes), as extended real (ten bytes), as short (two bytes), or as small real (four bytes), none of which exist in the AppleScript language itself. (as short can also be rendered as short integer or as small integer.) If a number's written as a type that's too small to represent it, information is lost -- typically the high-order bytes of an integer or the precision of a real. These non-AppleScript number classes are really for specialist use.

When numbers are written to a file as string or as Unicode text, the numeric text produced has greater numeric precision, and absorbs more digits before being rendered as "scientific notation", than the result of the equivalent AppleScript coercion.

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.dat"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write (2 ^ 53) as string to fRef
end try
close access fRef

set applescriptCoercion to (2 ^ 53) as string
set writeCoercion to (read file thePath)

return {applescriptCoercion, writeCoercion}
--> {"9.007199254741E+15", "9007199254740992"}

The AppleScript values true and false can't be written to file as themselves unless they're in a list or a record, but they can be written individually as boolean (!), which in this case is a single-byte value of 1 or 0.

Among the coercions the as parameter can't do, it can't cause lists to be written as string and doesn't recognise as text at all! (Except in Leopard. See the note at the end of this section.) In view of the possibilities for confusion, my own instinct would be to stick with AppleScript coercions and only to use write's as parameter on the rare occasions when double integers, shorts, booleans, etc. were required. (But again, see the note about text in Leopard below.) In the areas where they overlap, the parameter's very slightly faster in operation than the AppleScript coercion, possibly because its effect is combined with the writing to file.

The way to differentiate between an AppleScript coercion and the as parameter in a write command is to parenthesise the coercion.

Applescript:


-- Incomplete script, just to show grammar.

-- 'write's 'as' parameter:
write 12345.5 as integer to fRef
-- or:
write 12345.5 to fRef as integer

-- AppleScript coercion:
write (12345.5 as integer) to fRef

as also accepts four-character strings like "utf8" and "isot" as synonyms for class codes like «class utf8» and «class isot» or any equivalent keywords.

Applescript:


-- Incomplete script, just to show grammar.

-- These all do the same thing.
write "Hello" as Unicode text to fRef
write "Hello" as «class utxt» to fRef
write "Hello" as "utxt" to fRef

Leopard note. According to the release notes, AppleScript 2.0 in Leopard has only one text class, called text, which is "functionally equivalent to the former Unicode text class". The terms string and Unicode text can still be used, but are now synonyms for the new text class. However, they may be implemented as separate types by individual applications.

Prior to Leopard, write would write text data in the form presented -- ie. string/text data as string, Unicode text data as Unicode text -- unless the as parameter was used to specify something else. If I've correctly understood the blurb and a test kindly performed for me by Stefan Klieme, write now writes the new text class as follows:
• Without an as parameter: always like the old as string.
• as string: ditto.
• as text: ditto. (This doesn't error in Leopard.)
• as Unicode text: as Unicode text.
For the best compatibility across systems, it's now recommended that write's as parameter is always used when writing any sort of text to file.
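By way of illustration, here's a sketch that states the text encoding explicitly in both directions, using the class code «class utf8» (the chevron form of "utf8"). UTF-8 is simply my choice for the demo, and the matching as parameter on the read is borrowed from later in the article.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   -- State the form of the text explicitly instead of relying on the default.
   write "Hello World" to fRef as «class utf8»
on error errMsg number errNum
   close access fRef
   error errMsg number errNum
end try
close access fRef

-- Read it back with the matching 'as' parameter.
set theData to (read file thePath as «class utf8»)
--> "Hello World"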

read

Finally, we come to the command that reads data from the file and returns it to the script. read's main parameter is once again either an access number or a reference to a file. There's also a large retinue of eight optional labelled parameters -- from, for, to, before, until, using delimiter, using delimiters, and as -- but these can't all be used at the same time and two of them are the same anyway. In the absence of any of them, read reads from the current file mark to the end of the file and assumes that what it's reading is plain (string) text.

The from parameter is directly equivalent to write's starting at. It overrides the file mark and forces the read to begin at a certain byte in the file. As with starting at, positive or negative indices can be used and there's nothing a script can do to satisfy the dictionary's insistence on a double integer. For reasons that I hope will be obvious by now, trying to read from eof or later is a very silly idea.

for is like the for of write. It specifies how many bytes should be read from the point where the read starts. If there aren't that many bytes before the end of the file, the read simply stops at the end of the file without erroring. With a value of 0, and in conjunction with the from parameter, this for too can be used to initialise the file mark.

Applescript:


-- Incomplete script, just to show grammar.

-- Initialise the file mark to the 7th byte by reading nothing, starting there.
read fRef for 0 from 7

to specifies the number of the byte in the file with which the read should end. The kinds of numeric value accepted are the same as for from and the indexed byte must lie within the file. The eof keyword can be used to read explicitly to the end of the file; but since it can only be used with data where that's the default anyway, there's usually no point.
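The following sketch runs through from, to, and for with a file containing the eleven bytes of "Hello World". The variable names are my own.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Hello World" to fRef
   set byIndices to (read fRef from 7 to 11) -- Bytes 7 to 11.
   set fromTheEnd to (read fRef from -5) -- From the fifth byte from the end to the end of the file.
   set byCount to (read fRef from 1 for 5) -- Five bytes, starting at byte 1.
   set theRest to (read fRef) -- Continues from the file mark, which is now at byte 6.
on error errMsg number errNum
   close access fRef
   error errMsg number errNum
end try
close access fRef

{byIndices, fromTheEnd, byCount, theRest}
--> {"World", "World", "Hello", " World"}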

The before and until parameters are only useful when reading plain (string) text. They both cause the read to stop at the next instance of a particular character in the file. If there's no such instance, the read stops at the end of the file. The parameter value should be the character in question and it's case sensitive. If the parameter has more than one character, only the first is heeded. The difference between before and until is that before omits the specified character (if found) from the end of the returned text, whereas until doesn't. In both cases, the file mark is advanced to the byte after the specified character (if found) or to the end of the file.

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "The Aardventures of Aaron the Aardvark" to fRef
end try
close access fRef

return {read file thePath before "a", read file thePath until "a"}
--> {"The A", "The Aa"}

These two parameters are often used to read a text file, say, a paragraph at a time, or perhaps a database record at a time, using the specified character as a delimiter. It's a stop and start process and every byte has to be tested as it comes off the disk. It's generally much faster, if the computer has enough memory, to read the entire file in one go and examine the text in memory.
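
To make the comparison concrete, here's a rough sketch of both approaches applied to a three-paragraph file. The variable names slowList and fastList are just for illustration, and the loop assumes that the only error it will meet is the end-of-file error raised by the read that follows the last paragraph.

```applescript
set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write ("one" & return & "two" & return & "three") to fRef
end try
close access fRef

-- Stop and start: one read per paragraph until a read fails at the end of the file.
set slowList to {}
set fRef to (open for access file thePath)
try
   repeat
      set end of slowList to (read fRef before return)
   end repeat
on error
   -- Assumed to be the end-of-file error. Nothing more to read.
end try
close access fRef

-- All in one go: read the whole file and split the text in memory.
set fastList to paragraphs of (read file thePath)

{slowList, fastList}
--> {{"one", "two", "three"}, {"one", "two", "three"}}
```

With a file this small the difference in speed won't be noticeable; it's with long files that the second approach pays off.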

The for, to, before, and until parameters can't be used together in any combination in the same command, even if their values don't conflict.

The StandardAdditions dictionary lists both using delimiter and using delimiters as optional parameters for read, the former taking a string value, the latter a list of strings. However, whichever one you type, it'll compile as using delimiter and will accept either a string (or Unicode text) or a list of strings (or Unicode texts). The purpose of using delimiter is to return text from the file as a list of "text items", using the specified character(s) as delimiter(s).

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "The Aardventures of Aaron the Aardvark" to fRef
end try
close access fRef

read file thePath using delimiter "a"
--> {"The A", "rdventures of A", "ron the A", "rdv", "rk"}

This is often likened to AppleScript's text item delimiters; but it's probably more accurate to think of it as the before parameter on steroids. The result it returns is the same as if a list had been filled with the results of one or more consecutive reads with before. Like before, using delimiter only works properly when reading plain text and the delimiters are case sensitive. Apart from the fact that it only applies to text being read from a file, there are other differences between it and AppleScript's text item delimiters:

• Multiple delimiters are allowed.
• Only the initial characters of delimiters are used.
• A delimiter of "" doesn't split the text into individual characters -- or at all, in fact.
• Delimiters aren't cut from between text items so much as from the ends of them. If the text that's read ends with an instance of a delimiter, that character is simply dropped from the end of the last text item. There's no "empty text item" ("") at the end of the list. If "" does appear at the end of the list, it's because both of the last two characters read were delimiters: the final text item consisted of nothing but the last delimiter, which has been dropped from itself to leave ""!

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "The Aardventures of Aaron the Aardvark" to fRef
end try
close access fRef

{read file thePath using delimiter "k", read file thePath using delimiter {"r", "k"}}
--> {{"The Aardventures of Aaron the Aardvar"}, {"The Aa", "dventu", "es of Aa", "on the Aa", "dva", ""}}

Since using delimiter doesn't govern where the reading stops, it can be used in conjunction with any one of the parameters that do: for, to, before, or until.

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "The Aardventures of Aaron the Aardvark" to fRef
end try
close access fRef

read file thePath from 5 using delimiter {"r", "a"} before "f"
--> {"A", "", "dventu", "es o"}

Like before and until -- and for the same reasons -- using delimiter slows down the retrieval of data from the file. If it's feasible to read the entire file (or a large portion of it) in one go and then use AppleScript's text item delimiters, it's worth considering. It's the only option anyway with Unicode text.
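
For comparison, here's a sketch of the read-it-all-then-split approach, producing the same list as the first using delimiter example. The variable names are mine. The considering case block is there because, from AppleScript 2.0 onwards, text item delimiters follow the script's considering and ignoring settings and so ignore case by default, whereas read's delimiters are always case sensitive. On earlier systems the block should simply make no difference.

```applescript
set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "The Aardventures of Aaron the Aardvark" to fRef
end try
close access fRef

-- Read the whole file in one go ...
set theText to (read file thePath)

-- ... and split it in memory, restoring the previous delimiters afterwards.
set astid to AppleScript's text item delimiters
considering case
   set AppleScript's text item delimiters to "a"
   set theItems to theText's text items
end considering
set AppleScript's text item delimiters to astid

theItems
--> {"The A", "rdventures of A", "ron the A", "rdv", "rk"}
```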

read's last optional parameter, as, isn't a coercion. It's an indication of how the data in the file should be understood. Unless told otherwise, read assumes that what it's reading is plain text and returns it to the script as such. But if, say, an AppleScript list has been written to the file, the file will contain data relating to the structure of the list and to the items in it. In this case, you need to specify that the data should be read as list. Thus forewarned, read will return a beautifully reconstituted list instead of a string of mainly unprintable or invisible characters.

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.dat"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write {1, 2, 3, 4, 5} to fRef
end try
close access fRef

-- Read the data as text (the default) ...
set gobbledygook to (read file thePath)
-- ... and 'as list'.
set aList to (read file thePath as list)

{gobbledygook, length of gobbledygook, aList}
--> {"listlonglonglonglonglong", 68, {1, 2, 3, 4, 5}}

The kinds of thing as which it's possible to read data are the same as those that can be written to the file, such as list, record, integer, real, date, Unicode text, or string. (Unlike write, read understands as text too!) There are also File Read/Write's own areas of expertise: double integer, short (or short integer or small integer), extended real, small real, or boolean.

I presume that in Leopard, however text is read -- as string or as Unicode text -- the result you see will be Leopard's text class.

It's reported that Intel machines don't currently (as at November 2007) read correctly as date and (possibly) as double integer. (I don't know if there are issues with any other data types. People tend to report only what's not working for them.) There appear to be two workarounds for dates. Either: write and read them as «class isot» instead and coerce the «class isot» back to date after it's read; or: write and read them as list and extract the date from the result. If your scripts or files might be used on Intel machines, bear this in mind.

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write (current date) as «class isot» to fRef
end try
close access fRef

set readDate to (read file thePath as «class isot») as date

-- Or:

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write (current date) as list to fRef
end try
close access fRef

set readDate to item 1 of (read file thePath as list)

Data can also be read simply as data, which returns a data object of type "rdat". If you know what to do with that, you're laughing.

Another as you might find useful is as type class, which reconstitutes class names and class values such as months, weekdays, and missing value.

Applescript:

set thePath to (path to desktop as Unicode text) & "Aardvark.dat"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write Wednesday to fRef
end try
close access fRef

read file thePath as type class
--> Wednesday

For advanced, specialist, or eccentric users, as also accepts four-character string codes such as "TEXT", "utxt", "PICT", "isot", etc. The four mentioned here are equivalent to string («class TEXT»), Unicode text («class utxt»), picture («class PICT»), and the keywordless «class isot».

There are several constraints and exceptions connected with read's as parameter, depending on the kind of data you're trying to read:

• An obvious one this: the read must begin at the first byte of a unit of the specified data. Reading, say, Unicode text or numbers from their second bytes will give rubbish results. Reading lists or records from their second bytes will cause errors.
• When reading numbers or booleans, read returns as many of the specified items as can be got from the number of bytes read. Reading four bytes as integer returns an integer; reading forty bytes as integer returns a list of ten integers. The length of the read should therefore be an exact multiple of the byte-size of the class involved.
• When reading as double integer, as short, as extended real, or as small real, read interprets the data in the file as the specified type, but returns the numbers to the script in the most appropriate AppleScript integer or real form.
• Reading as boolean returns true for bytes whose value is not zero, false for bytes whose value is zero.
• When reading a list or a record, the length of the read can't be controlled by the scripter, who's not expected to know how many bytes there are in any particular list or record. read reads exactly as many bytes as it needs to complete the list or record and then stops, leaving the file mark at the byte after the list or record. If there's anything else in the file after the list or record, another read is needed to get it. Trying to use the for or to parameter while reading a list or record will cause the script to error.
• Trying to read as list or as record anything that isn't a list or a record will also cause the script to error.
• When reading dates, aliases, file specifications, or type classes, read only returns one of the specified items at a time, but advances the file mark either to the end of the file or to the byte after the specified reading range. (This is probably true with other data too, but these are the ones I've tried.) Knowing that dates are eight bytes long and that type classes are (often) four bytes long makes catering for several of them in a file easy enough: just use the for parameter with each read. But aliases and file specifications are of unforeseeable lengths, so you'd need to be inventive.
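
The point in the list above about reading several numbers at once can be sketched like this. It's a minimal example, assuming that integers are written as four bytes each (as described above) and that "Aardvark.dat" on the desktop is expendable; the variable name readResults is mine.

```applescript
set thePath to (path to desktop as Unicode text) & "Aardvark.dat"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   repeat with n from 1 to 3
      write n to fRef -- Each integer occupies four bytes.
   end repeat
end try
close access fRef

-- The file length, all twelve bytes read as integer, and just bytes 5 to 8 read as integer.
set readResults to {(get eof file thePath), (read file thePath as integer), (read file thePath from 5 for 4 as integer)}
--> {12, {1, 2, 3}, 2}
```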

I've saved the whackiest as exception until last. You may need to concentrate here. Notwithstanding all I've said above, if the as parameter is anything other than string, text, or Unicode text, and it's used in conjunction with any of the "delimiter" parameters (before, until, or using delimiter), read doesn't read the data as the specified type. It reads the data as string in the manner specified by the delimiter parameter and attempts to coerce the string(s) in the result to the type specified by the as parameter.

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "0 1 2 3 4 5 mon " to fRef
end try
close access fRef

read file thePath as list before "3"
-- The string "0 1 2 " is coerced to list, not read as list data.
--> {"0 1 2 "}

read file thePath as real before "m" using delimiter space
-- The space-delimited text items of "0 1 2 3 4 5 " are coerced to reals.
--> {0.0, 1.0, 2.0, 3.0, 4.0, 5.0}

This no doubt seemed very clever when it was introduced -- and may occasionally be useful even today, especially with the using delimiter parameter. But it's not widely known and is a bit inconsistent in operation. Looking at the using delimiter example above, you'd expect read file thePath as list before "m" using delimiter space to return a list of lists: {{"0"}, {"1"}, {"2"}, {"3"}, {"4"}, {"5"}} -- but in fact it returns the same list of strings that would have been produced had the file been read as text before "m" using delimiter space. (Apparently, it returned a list of lists briefly under AppleScript 1.6, but this was considered a bug and was "fixed".) The coercions possible with this usage are largely those that write's as parameter can do with strings -- i.e. to list, integer, double integer, real, etc. -- but double integer and other exotic number coercions are returned as the best AppleScript equivalents. This (mis)use of the as parameter only produces sensible results if the data in the file are plain text. The technique can't be used to coerce plain text to Unicode text. (Actually, it can on my Jaguar machine, but not in Tiger.)

The as parameter is occasionally useful for reading stuff back from a file in a form other than the one in which it was written. This isn't so much a coercion as a reinterpretation of the data. For instance, if you use System Events's Property List Suite (introduced with Tiger, I think) to get a stored alias from a plist file, the bytes that make up the alias might be returned as a data object rather than as a functioning alias. With the File Read/Write commands, you can write the data object to a temporary file and read it back as alias. If you want to see some code for this, this thread in Macscripter.net's OS X forum has a couple of examples.

And that's nearly all I have to say about the read command. As with write, read's labelled parameters, when used, must come after the unlabelled parameter, but otherwise may be given in any order.

Before finishing with read altogether, here's a little bit of fun, setting the to parameter lower than from when reading plain text:

Applescript:


set thePath to (path to desktop as Unicode text) & "Aardvark.txt"

set fRef to (open for access file thePath with write permission)
try
   set eof fRef to 0
   write "Hello World" to fRef
end try
close access fRef

read file thePath from eof to 1
--> "dlroW olleH"

read file thePath from eof to 1 using delimiter "o"
--> {"rld", " W", "Hell"}

There's a similar effect when reading multiple numbers or booleans from a file. This is apparently the read result being reversed rather than the file being read backwards. Afterwards, the file mark points to the byte after the one at the high-index end of the read.

And that concludes not only the section about read, but this overview of the File Read/Write commands. Like using BBEdit to write a Web page, they offer a lot of control that you may wish you didn't have. But for straightforward reads and writes, as long as you understand the basics, they're easy to use, fast acting, and only dangerous if you fool around with the wrong files! Have fun.

Last edited by Nigel Garvey (2010-06-27 01:02:37 pm)


NG
