Reading binary files, byte-wise, in AppleScript 2.x?

JDM · May 10, 2009, 10:01pm

This would seem to be an elementary problem, but I have not been able to find a conclusive discussion about it…

Now that characters are (always) more than one byte in AppleScript 2.x, how does one read binary files byte-wise?

In the past, I would read as text, then use ASCII Number to get a byte value. Surely many many scripters did the same.

Using the new “ID” of a unicode character works of course, but reading a binary file as text results in a character stream that is problematic. I suppose that after reading as UTF-16, the bytes of the two byte characters might be teased apart, but based on my minimal and foggy recollection of unicode encoding, I believe there are certain byte sequences that would not reflect the true contents of the binary file.

The Leopard AS release note describes the change in terms of how it affects dealing with text files, but seems to ignore the fact that some of us would process binary files using “ASCII Number” of some byte (ie. MacRoman single byte character) read from a file.

Writing binary files amounts to the same issue, but I expect the solution for reading will work for writing too.

My current workaround is to read “as unsigned integer” (which is 4 bytes per value), then tease the bytes apart.

Ideally, there would be an “unsigned byte” class, but I can find no such thing… perhaps I should be looking for something with a different name?

I realize that one can read “as data”, but the result seems impervious to manipulation.

I see that the feature list of Satimage’s Smile has “read binary” and “write binary”, and no doubt there is an OSAX that can help (which one?), but I really want to build an applet that can stand-alone.

I suspect I may be missing something obvious… clue-by-fours appreciated.

James

StefanK · May 11, 2009, 12:10pm

Hi,

Just read the bytes.
Assuming you have a MacRoman-encoded plain text file named testFile.txt on the desktop,
this script displays each character and its ASCII decimal value:


set theFile to ((path to desktop as text) & "testFile.txt")
set endOfFile to get eof file theFile
repeat with i from 1 to endOfFile
	set a to read file theFile from i to i
	display dialog a & " (" & id of a & ")"
end repeat

JDM · May 11, 2009, 1:13pm

Thanks for your reply…

My apologies, I see I wasn’t clear about the data… it is not MacRoman text, it is arbitrary byte values.

For instance, if I have a file containing the 256 bytes 0x00 through 0xFF, and run this:

set thefile to choose file
set fileref to open for access thefile
set bytelist to {}
repeat with i from 0 to 255
	set onebyte to read fileref for 1
	set end of bytelist to id of onebyte
end repeat
close access fileref
return bytelist

then the result is this (which is not 0 through 255 that I would have got in AppleScript 1.x):

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 196, 197, 199, 201, 209, 214, 220, 225, 224, 226, 228, 227, 229, 231, 233, 232, 234, 235, 237, 236, 238, 239, 241, 243, 242, 244, 246, 245, 250, 249, 251, 252, 8224, 176, 162, 163, 167, 8226, 182, 223, 174, 169, 8482, 180, 168, 8800, 198, 216, 8734, 177, 8804, 8805, 165, 181, 8706, 8721, 8719, 960, 8747, 170, 186, 937, 230, 248, 191, 161, 172, 8730, 402, 8776, 8710, 171, 187, 8230, 160, 192, 195, 213, 338, 339, 8211, 8212, 8220, 8221, 8216, 8217, 247, 9674, 255, 376, 8260, 8364, 8249, 8250, 64257, 64258, 8225, 183, 8218, 8222, 8240, 194, 202, 193, 203, 200, 205, 206, 207, 204, 211, 212, 63743, 210, 218, 219, 217, 305, 710, 732, 175, 728, 729, 730, 184, 733, 731, 711}

I suppose I could map those IDs back into the expected values, but I wonder if this is truly independent of the byte sequence in the file, and of course I was hoping for something less obscure.
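
The list above looks consistent with each byte having been decoded as a MacRoman character and then reported as a Unicode code point. The sketch below uses Python rather than AppleScript, purely because the table is quick to check there; it assumes Python's built-in mac_roman codec matches the mapping AppleScript applied, which agrees with the values posted above but is not confirmed by the release notes. The names raw, ids and recovered are illustrative only.

```python
# Illustration only (Python, not AppleScript).
# Assumption: Python's "mac_roman" codec matches the table AppleScript used.
raw = bytes(range(256))

# Decode every byte as MacRoman and take the Unicode code point of each
# resulting character, which is what the ID list above appears to contain.
ids = [ord(ch) for ch in raw.decode("mac_roman")]

# Encoding the same characters back as MacRoman recovers the original bytes,
# so for this codec the ID-to-byte mapping is one-to-one.
recovered = "".join(chr(i) for i in ids).encode("mac_roman")

print(ids[0x80], ids[0xA0], ids[0xFF])
print(recovered == raw)
```

If that assumption holds, mapping the IDs back is a fixed 256-entry lookup and does not depend on the byte sequence in the file, provided each byte is read on its own.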

Nigel_Garvey · May 11, 2009, 3:18pm

Hi, JDM. Welcome to the fora.

Although finicky, reading ‘as unsigned integer’ is actually much faster than using ‘ASCII number’ on each byte. The caveats with the handler below are that it assumes that ‘read’ treats integers as big-endian even on Intel machines (which I believe is true) and that it can’t work with files containing fewer than four bytes:

on getByteValues(thisFile) -- thisFile's an alias or a file specifier.
	set integerValues to {}
	set fRef to (open for access thisFile)
	try
		-- The file will be read as a set of 4-byte integers, but does it contain an exact multiple of 4 bytes?
		set oddByteCount to (get eof fRef) mod 4
		set thereAreOddBytes to (oddByteCount > 0)
		-- If the number of bytes isn't a multiple of 4, treat the odd ones as being in the first four, then ...
		if (thereAreOddBytes) then set end of integerValues to (read fRef from 1 for 4 as unsigned integer)
		-- ... read integers from after the odd bytes (if any) to the end of the file.
		set integerValues to integerValues & (read fRef from (oddByteCount + 1) as unsigned integer)
		close access fRef
	on error errMsg number errNum
		close access fRef
		error errMsg number errNum
	end try
	
	set byteValues to {}
	-- Extract the odd-byte values (if any) from the first integer.
	if (thereAreOddBytes) then
		set n to beginning of integerValues
		repeat oddByteCount times
			set end of byteValues to n div 16777216
			set n to n mod 16777216 * 256
		end repeat
	end if
	-- Extract the 4 byte values from each of the remaining integers.
	repeat with i from 1 + ((thereAreOddBytes) as integer) to (count integerValues)
		set n to item i of integerValues
		set end of byteValues to n div 16777216
		set end of byteValues to n mod 16777216 div 65536
		set end of byteValues to n mod 65536 div 256
		set end of byteValues to n mod 256 div 1
	end repeat
	
	return byteValues
end getByteValues
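
The div/mod arithmetic in the handler above can be checked in isolation. This is a minimal sketch in Python (not AppleScript), assuming the integer was read big-endian as the handler expects; split_unsigned_integer and leading_bytes are hypothetical names that mirror the two extraction loops.

```python
import struct

def split_unsigned_integer(n):
    """Return the four byte values of a 32-bit unsigned integer, most significant first."""
    return [
        n // 16777216,           # n div 2^24
        n % 16777216 // 65536,   # (n mod 2^24) div 2^16
        n % 65536 // 256,        # (n mod 2^16) div 2^8
        n % 256,                 # lowest byte
    ]

def leading_bytes(n, count):
    """Return the first `count` bytes of n, as the odd-byte loop does."""
    values = []
    for _ in range(count):
        values.append(n // 16777216)   # take the top byte
        n = n % 16777216 * 256         # shift the next byte into the top position
    return values

raw = bytes([0xDE, 0xAD, 0xBE, 0xEF])
(n,) = struct.unpack(">I", raw)        # big-endian 32-bit unsigned integer
print(split_unsigned_integer(n))       # [222, 173, 190, 239]
print(leading_bytes(n, 3))             # [222, 173, 190]
```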

JDM · May 12, 2009, 9:56pm

Thanks Nigel,

Yes, ‘read’ does treat integers as big-endian on my Intel Mac Pro (OS X 10.5.6, AppleScript 2.0.1), and I expect that is for backwards compatibility and so will (should) never change.

I still think I’m missing something, but not as basic a something as I thought before…

So… dumb question…

If “read as unsigned integer” is an efficient and accepted way to do this, how come there aren’t a bunch of pages here and there about it, not to mention a handler library including routines along the lines of the one Nigel posted? (Or rather… how come I can’t find them? – might be a wetware bug)

And…

how do we get Apple to add an “unsigned byte” class? (Assuming I’m not alone in wanting such a thing… votes?)
is it possible for an OSAX or other do-dad to add a class in a way that StandardAdditions (read/write) can use it, or is implementing complete replacement read/write commands the only way to get this?

James

Bdemers · May 12, 2009, 10:17pm

I’m not sure if this would be of any assistance, but I found it invaluable for opening up a binary file and extracting hexadecimal values from it:

http://listserv.dartmouth.edu/scripts/wa.exe?A2=MACSCRPT;qjUtzQ;19990415153630-0400

I place the SALib at the end of my script and call it this way:

set open_file to (choose file default location def_loc with prompt "Choose the file you wish to parse")
set theContents to read open_file as data
-- convert data to hex string
copy data_to_string(theContents) to data_string

then I can look for a particular hex string (maybe something like 0F0FAA0A, which might be a marker between data that I need to read in a variable length file)

I guess the downside is having to convert numbers from hex to decimal and also into text if needed… but it did the job I wanted and I couldn’t find any other workarounds.
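
One caveat when searching a hex string for a marker: a match can start on an odd character position, in which case it straddles two bytes and is not a real occurrence of the marker. The sketch below illustrates the check in Python (not AppleScript); find_marker is a hypothetical helper and the sample data is made up for the demonstration.

```python
def find_marker(data, marker_hex):
    """Return the 0-based byte offset of marker_hex within data, or -1 if absent."""
    hex_string = data.hex().upper()
    marker_hex = marker_hex.upper()
    start = 0
    while True:
        pos = hex_string.find(marker_hex, start)
        if pos == -1:
            return -1
        if pos % 2 == 0:
            return pos // 2      # match begins on a byte boundary
        start = pos + 1          # odd position: the match straddles two bytes, keep looking

# Made-up sample: the digits 0F0FAA0A appear first at an odd position
# (inside 10 F0 FA A0 A0) and then as four real bytes.
data = bytes([0x10, 0xF0, 0xFA, 0xA0, 0xA0, 0x0F, 0x0F, 0xAA, 0x0A])
print(data.hex().upper().find("0F0FAA0A"))   # 1, a false match
print(find_marker(data, "0F0FAA0A"))         # 5, the real byte offset
```

In AppleScript the same idea would amount to testing whether the offset found in the hex text is odd, since text offsets there are 1-based.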

JDM · May 13, 2009, 1:06am

Hey, that’s nice! A way to extract info from “as data”!

However, the code as written in 1999 is not completely compatible with AppleScript 2.x, in particular:

The crucial routine to convert “as data” to hexadecimal was originally written this way:

to data_to_string(d)
	try
		"" & d
	on error m
	end try
	{m's text 18 thru 21, m's text 22 thru -17}
end data_to_string

However, in AppleScript 2.x, the result has some trailing cruft due to the fact that -17 is no longer the offset of the end of the hex data, ie. the result ends with “» into ty”

That doesn’t matter if you don’t refer to the length of the string, or offset from its end, or try to convert it back, but if you do, your code is likely broken.

I expect a modification such as the one below which works in AppleScript 2.0, would also be backwards compatible:

to data_to_string_2(d)
	-- revised for compatibility with AppleScript 2
	-- d is of class data, result is hexadecimal string
	try
		d as text
	on error m
	end try
	set o1 to offset of "«" in m
	set o2 to offset of "»" in m
	{m's text (o1 + 6) thru (o1 + 9), m's text (o1 + 10) thru (o2 - 1)}
end data_to_string_2

For completeness in the archives, the routines to convert hexadecimal to/from “as data” are thus:


-- adapted/stolen from SALib_0.1 code by arthur j knapp
-- http://listserv.dartmouth.edu/scripts/wa.exe?A2=MACSCRPT;qjUtzQ;19990415153630-0400

-- watch for the limitation in the amount of data that can be successfully converted to/from hex

to string_to_data(t, d)
	-- t is 4 char data type eg. "rdat", d is hexadecimal string eg. "DEADBEEFCAFE", result is of class data
	string_to_variable("z", "«data " & t & d & "»")'s z's contents
end string_to_data

to string_to_variable(n, v)
	-- result is script object containing property n:v
	run script "script
        property " & n & " : " & v & "
end script"
end string_to_variable

to data_to_string_2(d)
	-- revised for compatibility with AppleScript 2
	-- d is of class data, result is hexadecimal string
	try
		d as text
	on error m
	end try
	set o1 to offset of "«" in m
	set o2 to offset of "»" in m
	{m's text (o1 + 6) thru (o1 + 9), m's text (o1 + 10) thru (o2 - 1)}
end data_to_string_2

Hence one can change “as data” to hexadecimal text (or otherwise create the hex data string), edit the text as desired, change it back to “as data”, and write that to a file.

That may not be nearly as fast as the unsigned integer approach, but is quite transparent IMHO, and can be used for reading and writing.

There is a limitation as to how much data can be processed this way. Using the code below, I found the maximum on my current setup to be 42497 bytes.

set len to 42490
set thefile to choose file
repeat until false
	set fileref to open for access thefile
	set thebytes to read fileref for len as data
	close access fileref
	set hex to item 2 of data_to_string_2(thebytes)
	set dat to string_to_data("rdat", hex)
	set hex2 to item 2 of data_to_string_2(dat)
	if length of hex2 ≠ 2 * len then return "failed at " & len
	set len to len + 1
end repeat

Many thanks for everyone’s assistance with this, but if anyone has alternate methods, please describe them!

James

– edited to include Nigel’s improvements

Nigel_Garvey · May 13, 2009, 10:02pm

Although the File Read/Write commands can do quite a few clever things, there hasn’t been much interest shown in them in the fora except as a means of reading and writing text. Apple itself hasn’t exactly been forthcoming about the other possibilities. Some time before 1997, before the “Standard Additions” commands were amalgamated into one file, there was a PDF called “AppleScript Scripting Additions Guide”, which included quite a lot of detail about what the File Read/Write commands could do and why, but there’s not been much since then. I wrote a rather lengthy article about the commands, based on the Scripting Additions Guide and my own experiments, but I didn’t know about ‘unsigned integer’ until I read your query at the top of this thread!

An ‘unsigned short’ would be nice too. I imagine you have to sign up as a developer, then submit a feature request and hope for the best. I lost interest in all that when I found out you had to be an ADC member before you could even submit a bug report.

I’m not sure I understand the question. As I think you know, the Read/Write commands translate between AppleScript objects and file data, the ‘as’ parameters specifying the file side of things. So ‘read myFile as unsigned integer’ interprets each set of four bytes in the file as a 32-bit unsigned integer, returning its value as an AppleScript 30-bit signed integer (if possible) or as an AppleScript real (if not). The ‘as’ parameter keywords don’t directly correspond to AppleScript classes.

I expect a modification such as the one below which works in AppleScript 2.0, would also be backwards compatible:

to data_to_string_2(d)
	-- revised for compatibility with AppleScript 2
	-- d is of class data, result is hexadecimal string
	try
		"" & d
	on error m
	end try
	{m's text 18 thru 21, m's text ((offset of "«" in m) + 10) thru ((offset of "»" in m) - 1)}
end data_to_string_2

It still assumes an Anglophone user. For general distribution, it would be better to use the “«” offset for both parts:

to data_to_string(d)
	try
		d as text
	on error m
	end try
	set o1 to offset of "«" in m
	set o2 to offset of "»" in m
	{m's text (o1 + 6) thru (o1 + 9), m's text (o1 + 10) thru (o2 - 1)}
end data_to_string

JDM · May 13, 2009, 10:46pm

Ok, my knowledge of the terminology (and the technology I suppose) is certainly lacking… the crux of my (possibly stupid) question is this:

Is it possible for someone other than Apple to create something (an OSAX or similar) that will enable Apple’s StandardAdditions to “read as unsigned byte”?

(I realize an OSAX could implement an alternate read command, but say, for example, that one would prefer that Apple did most of the testing.)

Note that the “ADC Online Membership” is free, but I doubt a couple of requests would have much influence…

I presume the requirement is to convince some particular Apple employee (that happens to be a developer of AppleScript) of the value of such a request, so that s/he takes on the task of evangelizing such a change. Does anyone like that frequent this forum? If they do, then it would seem prudent to keep requests to the sorely needed/desired ones, hence my request for votes…

Very good point, thanks. I’ll add a note to the listing above.

StefanK · May 14, 2009, 8:05am

JDM:

Thanks for your reply…

My apologies, I see I wasn’t clear about the data… it is not MacRoman text, it is arbitrary byte values.

For instance, if I have a file containing the 256 bytes 0x00 through 0xFF, and run this:
set thefile to choose file
set fileref to open for access thefile
set bytelist to {}
repeat with i from 0 to 255
	set onebyte to read fileref for 1
	set end of bytelist to id of onebyte
end repeat
close access fileref
return bytelist

ASCII number is deprecated in AppleScript 2.0, but it still works:


set thefile to choose file
set fileref to open for access thefile
set bytelist to {}
repeat with i from 0 to 255
	set onebyte to read fileref for 1
	set end of bytelist to ASCII number onebyte
end repeat
close access fileref
return bytelist

JDM · May 14, 2009, 9:25am

Yes indeed, I think Stefan is quite right…

If one reads one byte at a time, or otherwise takes the value of the first byte only, then it appears one can reliably get its value using ASCII Number instead of ID.

My apologies for missing your point.

James

S21 · May 15, 2009, 1:04am

I might not understand exactly what you want… but it seems like this little Perl program (executed as a shell command) delivers the goods, in record time and for any size file (well, within the limits of RAM):

-- get file path suitable for shell
global in_file_path
set in_file to choose file
set in_file_path to quoted form of POSIX path of in_file

-- get the file as a string of byte values
set output_option to 1 -- 1 is for decimal, 2 is for hex
set str to (do shell script shellcmd(output_option))
return str

-- need a list? Comment out the preceding "return" statement
set TID to AppleScript's text item delimiters
set AppleScript's text item delimiters to space
set byte_list to every text item of str
set AppleScript's text item delimiters to TID
return byte_list

on shellcmd(opt)
	if opt is 1 then -- decimal values delimited by space char
		set theCmd to "perl -e 'local $/=undef;$s=<>;print join(\" \",unpack(\"C*\", $s));' " & in_file_path
	else if opt is 2 then -- hex values delimited by space char; if you don't want a leading "%", delete 2 of the "%" chars
		set theCmd to "perl -e 'local $/=undef;$s=<>;print join(\" \",map{ sprintf (\"%%%02X\", $_)} unpack(\"C*\", $s));' " & in_file_path
	else
		set theCmd to ""
	end if
	return theCmd
end shellcmd

JDM · May 16, 2009, 12:44am

Understand or not, that’s an excellent solution for specific cases, thanks!

I’m ignorant of perl, so these may be requests far too large to even consider, but can you suggest something similar that:

reads a particular range of bytes… eg. say I want 1024 bytes starting at offset 65536. The object being to avoid difficulties processing huge files.

and

writes a particular range of bytes at some offset in a file, overwriting what is already there.
Since we can’t easily (and don’t need to) use a text string equivalent to the data, the data to be written would be provided as either (“your choice”) a string of hex digits, or a list of integer values.

Other than that, should we expect implementation issues with perl, such as its availability on all machines running OSX, or version-specific problems similar to those affecting Java apps (in my experience at least)?

James

Nigel_Garvey · May 21, 2009, 12:50pm

Revisiting this yesterday, I came up with the handler below. It’s faster on my machines than the other suggestions above as it doesn’t involve the parsing or coercion of Unicode text to get the integers. The range to read can (ie. must) be specified and some of the little quirks of ‘read’ in this respect are reproduced. (See opening comment.)

(*
	This handler is equivalent to the imaginary 'read fRef from a to b as unsigned byte'.
	'fRef' may be a file, an alias, or a file access reference number. Unlike 'read', if this parameter's a file or an alias, the handler opens and uses a new access by default.
	The range parameters may be positive, negative, or 'eof'. If either of them is 0, 1 is added to both. If the first is more than the second, the returned list of values is reversed.
	If the range parameters are equal, the single value returned is not in a list.
*)
on readAsUnsignedByte out of fRef from a to b
	-- Script object for fast list accesses.
	script o
		property integerValues : {}
		property byteValues : {}
	end script
	
	-- If fRef's an integer, assume it's a file access reference.
	-- Otherwise assume it's a file or an alias and open an access to it.
	set usingOwnAccess to (fRef's class is not integer)
	if (usingOwnAccess) then set fRef to (open for access fRef)
	try
		-- Analyse the range parameters.
		set fileLength to (get eof fRef)
		if (a is eof) then
			set a to fileLength
		else if ((a < 0) and (fileLength + a > -1)) then
			set a to fileLength + 1 + a
		end if
		if (b is eof) then
			set b to fileLength
		else if ((b < 0) and (fileLength + b > -1)) then
			set b to fileLength + 1 + b
		end if
		if ((a is 0) or (b is 0)) then set {a, b} to {a + 1, b + 1}
		set reversing to (a > b)
		if (reversing) then set {a, b} to {b, a}
		
		set rangeLength to b - a + 1
		set oddByteCount to rangeLength mod 4
		-- Read as much of the range as possible as unsigned integers.
		if (rangeLength > 3) then set o's integerValues to (read fRef from a to (b - oddByteCount) as unsigned integer) as list
		-- Read any odd bytes as data.
		if (oddByteCount > 0) then set oddBytes to (read fRef from (b - oddByteCount + 1) to b as data)
		if (usingOwnAccess) then close access fRef
	on error errMsg number errNum
		if (usingOwnAccess) then close access fRef
		error errMsg number errNum
	end try
	
	-- Extract byte values from the integers (if any).
	repeat with i from 1 to rangeLength div 4
		set n to item i of o's integerValues
		set end of o's byteValues to n div 16777216
		set end of o's byteValues to n mod 16777216 div 65536
		set end of o's byteValues to n mod 65536 div 256
		set end of o's byteValues to n mod 256 div 1
	end repeat
	
	-- If there are any odd bytes to do, write the data object and at least three
	-- more bytes to a temporary file, then read back an unsigned integer's worth.
	if (oddByteCount > 0) then
		set tmpRef to (open for access file ((path to temporary items as Unicode text) & "tmp.dat") with write permission)
		try
			write oddBytes to tmpRef
			write «data rdat000000» to tmpRef
			set n to (read tmpRef from 1 to 4 as unsigned integer)
		end try
		close access tmpRef
		-- Extract the byte values from the integer.
		repeat oddByteCount times
			set end of o's byteValues to n div 16777216
			set n to n mod 16777216 * 256
		end repeat
	end if
	
	if (reversing) then -- First range parameter > second.
		return reverse of o's byteValues
	else if (a = b) then -- Single value.
		return beginning of o's byteValues
	else
		return o's byteValues
	end if
end readAsUnsignedByte

(*
-- Demo of use with an alias.
set myFile to (choose file)
set byteValues to (readAsUnsignedByte out of myFile from -7 to eof)
*)

(*
-- Demo of use with an access reference.
set fRef to (open for access (choose file))
try
	set byteValues to (readAsUnsignedByte out of fRef from -7 to eof)
end try
close access fRef
byteValues
*)

Edit: Negative range index conversion debugged following chrys’s comments below.

chrys · May 22, 2009, 5:34am

Nigel Garvey:

Revisiting this yesterday, I came up with the handler below. It’s faster on my machines than the other suggestions above as it doesn’t involve the parsing or coercion of Unicode text to get the integers. The range to read can (ie. must) be specified and some of the little quirks of ‘read’ in this respect are reproduced. (See opening comment.)
on readAsUnsignedByte out of fRef from a to b

Very nicely done! My first reaction after timing it was that it was a pity that the “oddBytes” path suffered such a speed hit (100 times reading -8 to eof was 14 seconds, 100 times reading -7 to eof was 70 seconds; some other apps were actively using CPU). I tried some other techniques using dd to extract and automatically zero-pad the region of interest. It was faster for the “oddBytes” cases, but it was slower for the ‘fully aligned’ cases (and it would not work for pre-opened fRefs anyway, since it involved giving a pathname to a shell command).

Just before posting this, it occurred to me that it is probably possible to avoid the speed hit in most cases.

If the file is more than four bytes, instead of doing “oddBytes”, read an unsigned integer from b-3 to b and extract the least significant bytes (as opposed to the most significant bytes, like “oddBytes” does). If the file is less than four bytes, then “oddBytes” still seems the best way to go.

I may try to implement this variation, though I will not be upset if someone beats me to it.

chrys · May 22, 2009, 7:09am

It turned out that it was fairly easy to extend (this is often an attribute of nice code):

(* Original script from Nigel Garvey: http://macscripter.net/viewtopic.php?pid=113938#p113938 *)
(*
	This handler is equivalent to the imaginary 'read fRef from a to b as unsigned byte'.
	'fRef' may be a file, an alias, or a file access reference number. Unlike 'read', if this parameter's a file or an alias, the handler opens and uses a new access by default.
	The range parameters may be positive, negative, or 'eof'. If either of them is 0, 1 is added to both. If the first is more than the second, the returned list of values is reversed.
	If the range parameters are equal, the single value returned is not in a list.
*)
on readAsUnsignedByte out of fRef from a to b
	-- Script object for fast list accesses.
	script o
		property integerValues : {}
		property byteValues : {}
	end script
	
	-- If fRef's an integer, assume it's a file access reference.
	-- Otherwise assume it's a file or an alias and open an access to it.
	set usingOwnAccess to (fRef's class is not integer)
	if (usingOwnAccess) then set fRef to (open for access fRef)
	try
		-- Analyse the range parameters.
		set fileLength to (get eof fRef)
		if (a is eof) then
			set a to fileLength
		else if (a < 0) then
			if a < -fileLength then error "Value of parameter \"from\" is out of range." number -40
			set a to fileLength + a + 1
		end if
		if (b is eof) then
			set b to fileLength
		else if (b < 0) then
			if b < -fileLength then error "Value of parameter \"to\" is out of range." number -40
			set b to fileLength + 1 + b
		end if
		if ((a is 0) or (b is 0)) then set {a, b} to {a + 1, b + 1}
		set reversing to (a > b)
		if (reversing) then set {a, b} to {b, a}
		
		set rangeLength to b - a + 1
		set oddByteCount to rangeLength mod 4
		-- Read as much of the range as possible as unsigned integers.
		if (rangeLength > 3) then set o's integerValues to (read fRef from a to (b - oddByteCount) as unsigned integer) as list
		-- Read any odd bytes as data.
		if (oddByteCount > 0) then ¬
			if b < 4 then
				set oddBytes to (read fRef from (b - oddByteCount + 1) to b as data)
			else
				set oddBytes to (read fRef from (b - 3) to b as unsigned integer)
			end if
		if (usingOwnAccess) then close access fRef
	on error errMsg number errNum
		if (usingOwnAccess) then close access fRef
		error errMsg number errNum
	end try
	
	-- Extract byte values from the integers (if any).
	repeat with i from 1 to rangeLength div 4
		set n to item i of o's integerValues
		set end of o's byteValues to n div 16777216
		set end of o's byteValues to n mod 16777216 div 65536
		set end of o's byteValues to n mod 65536 div 256
		set end of o's byteValues to n mod 256 div 1
	end repeat
	
	-- If there are any odd bytes to do, write the data object and at least three
	-- more bytes to a temporary file, then read back an unsigned integer's worth.
	if (oddByteCount > 0) then
		if class of oddBytes is data then
			set tmpRef to (open for access file ((path to temporary items as Unicode text) & "tmp.dat") with write permission)
			try
				write oddBytes to tmpRef
				write «data rdat000000» to tmpRef
				set n to (read tmpRef from 1 to 4 as unsigned integer)
			end try
			close access tmpRef
		else
			set n to oddBytes
			-- We want the least significant bytes,
			-- so skip the most significant bytes
			repeat (4 - oddByteCount) times
				set n to n mod 16777216 * 256
			end repeat
		end if
		-- Extract the byte values from the integer.
		repeat oddByteCount times
			set end of o's byteValues to n div 16777216
			set n to n mod 16777216 * 256
		end repeat
	end if
	
	if (reversing) then -- First range parameter > second.
		return reverse of o's byteValues
	else if (a = b) then -- Single value.
		return beginning of o's byteValues
	else
		return o's byteValues
	end if
end readAsUnsignedByte

(*
-- Demo of use with an alias.
set myFile to (choose file)
set byteValues to (readAsUnsignedByte out of myFile from -7 to eof)
*)

(*
-- Demo of use with an access reference.
set fRef to (open for access (choose file))
try
	set byteValues to (readAsUnsignedByte out of fRef from -7 to eof)
end try
close access fRef
byteValues
*)

Here is a diff (ignoring the changes in indentation):

-- This is not AppleScript, but the "code" tags do not preserve indentation.
diff --git 1/readAsUnsignedByte-v0.applescript 2/readAsUnsignedByte-v1.1.applescript
index ff67b0e..90edb98 100644
--- 1/readAsUnsignedByte-v0.applescript
+++ 2/readAsUnsignedByte-v1.1.applescript
@@ -1,3 +1,4 @@
+(* Original script from Nigel Garvey: http://macscripter.net/viewtopic.php?pid=113938#p113938 *)
 (*
 	This handler is equivalent to the imaginary 'read fRef from a to b as unsigned byte'.
 	'fRef' may be a file, an alias, or a file access reference number. Unlike 'read', if this parameter's a file or an alias, the handler opens and uses a new access by default.
@@ -21,11 +22,13 @@ on readAsUnsignedByte out of fRef from a to b
 		if (a is eof) then
 			set a to fileLength
 		else if (a < 0) then
-			set a to fileLength + 1 + a
+			if a < -fileLength then error "Value of parameter \"from\" is out of range." number -40
+			set a to fileLength + a + 1
 		end if
 		if (b is eof) then
 			set b to fileLength
 		else if (b < 0) then
+			if b < -fileLength then error "Value of parameter \"to\" is out of range." number -40
 			set b to fileLength + 1 + b
 		end if
 		if ((a is 0) or (b is 0)) then set {a, b} to {a + 1, b + 1}
@@ -37,7 +40,12 @@ on readAsUnsignedByte out of fRef from a to b
 		-- Read as much of the range as possible as unsigned integers.
 		if (rangeLength > 3) then set o's integerValues to (read fRef from a to (b - oddByteCount) as unsigned integer) as list
 		-- Read any odd bytes as data.
-		if (oddByteCount > 0) then set oddBytes to (read fRef from (b - oddByteCount + 1) to b as data)
+		if (oddByteCount > 0) then ¬
+			if b < 4 then
+				set oddBytes to (read fRef from (b - oddByteCount + 1) to b as data)
+			else
+				set oddBytes to (read fRef from (b - 3) to b as unsigned integer)
+			end if
 		if (usingOwnAccess) then close access fRef
 	on error errMsg number errNum
 		if (usingOwnAccess) then close access fRef
@@ -56,6 +64,7 @@ on readAsUnsignedByte out of fRef from a to b
 	-- If there are any odd bytes to do, write the data object and at least three
 	-- more bytes to a temporary file, then read back an unsigned integer's worth.
 	if (oddByteCount > 0) then
+		if class of oddBytes is data then
 		set tmpRef to (open for access file ((path to temporary items as Unicode text) & "tmp.dat") with write permission)
 		try
 			write oddBytes to tmpRef
@@ -63,6 +72,14 @@ on readAsUnsignedByte out of fRef from a to b
 			set n to (read tmpRef from 1 to 4 as unsigned integer)
 		end try
 		close access tmpRef
+		else
+			set n to oddBytes
+			-- We want the least significant bytes,
+			-- so skip the most significant bytes
+			repeat (4 - oddByteCount) times
+				set n to n mod 16777216 * 256
+			end repeat
+		end if
 		-- Extract the byte values from the integer.
 		repeat oddByteCount times
 			set end of o's byteValues to n div 16777216

As the diff shows, I also added some error checks in the case of negative values. They cover situations that I found while playing around with the code (a three byte file with hex values 00 01 02; readAsUnsignedByte out of file from -5 to 1 gave {0, 1, 2}; this was surprising since 5 bytes back from the end would have been 2 bytes before the file even started!).

To implement the speedup, I reused the oddBytes variable and differentiated between the two situations (read four bytes, bytes of interest are LSBs; and had to read as data) by checking if class of oddBytes is data. I think this should work OK in general.
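
The least-significant-byte extraction described above can be sketched on its own. This is Python rather than AppleScript, used only to check the arithmetic; trailing_bytes is a hypothetical name, and the input stands for the unsigned integer read from b - 3 to b.

```python
def trailing_bytes(n, odd_byte_count):
    """Return the last odd_byte_count bytes of the 32-bit unsigned integer n."""
    # Skip the most significant bytes that belong to the part already read.
    for _ in range(4 - odd_byte_count):
        n = n % 16777216 * 256
    # The wanted bytes are now at the top, so extract them as usual.
    values = []
    for _ in range(odd_byte_count):
        values.append(n // 16777216)
        n = n % 16777216 * 256
    return values

# Example: a 7-byte file whose bytes are 1..7. Bytes 1-4 are read as one
# integer; the remaining three come from an overlapping read of bytes 4-7.
n = int.from_bytes(bytes([4, 5, 6, 7]), "big")
print(trailing_bytes(n, 3))   # [5, 6, 7]
```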

As far as speed goes, my testing of 100 repetitions gave this data (like last time my CPU was already partially busy for the timing runs, but more or less to the same degree for each run):
reading 8 bytes: 17 seconds
reading 7 bytes (above mods): 28 seconds
reading 7 bytes (Nigel’s original): 80 seconds

I was hoping that there would be very little difference from the 8-byte situation, but at least it is a significant improvement.

Looking at the event log, it seems that the times may be fairly closely correlated with the number of OSAX calls used.
8 bytes: get eof, read; 2 calls, 8.5s/(100 OSAX calls), 17s/(100 read-calls)
7 bytes (above mods): get eof, 2 x read; 3 calls, 9.33s/(100 OSAX calls), 14s/(100 read-calls)
7 bytes (Nigel’s original): get eof, 2 x read, path to, open access, 2 x write, read, close access; 9 calls, 8.89s/(100 OSAX calls), 26.6s/(100 read-calls), 11.43s/(100 write/read-calls)

At first I thought it was only calls to read that would really make the difference, but counting all the calls (or at least read and write) seems to produce a better fit.

Edit History: Added error number to “way too negative” error for b parameter. Changed tags around diff from ‘code’ to ‘applescript’. Changed spelling of “paramter”. Fixed if fileLength < 4 bug that Nigel Garvey pointed out.

chrys · May 22, 2009, 12:41pm

If the number of OSAX calls is actually the main slowdown, then (at least) one more situation could be optimized.

When quantiseUp(rangeLength,4) <= fileLength (quantiseUp from e.g. aRounderRounder), a single read call could extract enough unsigned integers to cover all the requested bytes. To do it, adjust a or b so that the effective read length is quantiseUp(rangeLength,4) and then skip some of the starting bytes from the first integer (if a was adjusted lower) or some of the ending bytes from the last integer (if b was adjusted higher).
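
The range adjustment described above could be sketched roughly as follows. The handler name paddedRange and the layout of its result are made up for illustration, and the rounding is done inline rather than with quantiseUp, whose exact interface is not shown in this thread. It assumes 1 ≤ a ≤ b ≤ fileLength.

```applescript
-- Hypothetical helper: work out a single read range whose length is a
-- multiple of 4 and which covers bytes a thru b of the file.
-- Returns {newA, newB, leftSkip, rightSkip}, or missing value if no
-- single integer read can cover the range.
on paddedRange(a, b, fileLength)
	set rangeLength to b - a + 1
	set paddedLength to (rangeLength + 3) div 4 * 4 -- Round up to a multiple of 4.
	if paddedLength > fileLength then return missing value
	set newB to a + paddedLength - 1
	set leftSkip to 0
	if newB > fileLength then
		-- Extending b would overrun the file, so extend a downwards instead.
		set leftSkip to newB - fileLength
		set newB to fileLength
	end if
	-- leftSkip bytes are discarded from the first integer,
	-- the remaining padding from the last integer.
	return {a - leftSkip, newB, leftSkip, paddedLength - rangeLength - leftSkip}
end paddedRange

paddedRange(4, 10, 10) --> {3, 10, 1, 0}
```

In the example, seven bytes are wanted from a ten-byte file. Reading bytes 3 thru 10 as two unsigned integers and discarding the first byte of the first integer covers the request with one read.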

The other situations are already optimally(?) handled. When quantiseUp(rangeLength,4) > fileLength, either fileLength < 4 or rangeLength > quantiseDown(fileLength,4). The former uses the as data technique. The latter is a request for more bytes than the largest available multiple of four less than the file length. It can be handled with two (partially overlapping) reads (as added in my most recent post).

Nigel_Garvey · May 22, 2009, 3:23pm

Thanks, Chris! I really appreciate your in-depth interest and feedback!

I didn’t find it too bad (ie. no worse than using text methods to analyse the odd bytes) but I did agonise over it for a while. Another idea I considered was to use set eof fref to round up the length of the input file temporarily to an exact multiple of 4, read the whole range as unsigned integer, then restore the original eof. In the end, I felt that, as a single method, the “external file” idea was less questionable. However, with a few extra lines of code, your modification has obviously improved the script’s statistical performance. I think though that the condition for reading the odd bytes as data should be if b < 4 rather than if fileLength < 4.

Oops! Yes. With a three-byte file, my conversion formula converts -5 to -1, so the range read is from -1 to 1. This is of course the entire file (the rangeLength calculation coincidentally gives the correct figure for this) and, since it’s read as a single block of data, ‘read’ doesn’t reverse the result, and neither does the script, because -1 is less than 1. Wow!

Thanks for your catch code. For my own use, I think I’d rephrase it something like this:

if (a is eof) then
	set a to fileLength
else if ((a < 0) and (fileLength + a > -1)) then
	set a to fileLength + a + 1
end if

This only applies the conversion formula to a negative index if the result would be positive. If not, the bad index remains negative, causing a -40 error at the ‘read’ stage later on and emulating the ‘read’ behaviour more closely.
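
To see the effect on particular indices, the same logic can be wrapped in a small handler. This is only a sketch: the handler name normaliseIndex is hypothetical and does not appear in the script itself.

```applescript
-- Hypothetical wrapper around the index conversion above.
on normaliseIndex(a, fileLength)
	if (a is eof) then
		return fileLength
	else if ((a < 0) and (fileLength + a > -1)) then
		-- Negative index that still lands inside the file.
		return fileLength + a + 1
	end if
	-- Positive indices and out-of-range negative ones are left alone.
	return a
end normaliseIndex

normaliseIndex(-1, 3) --> 3
normaliseIndex(-3, 3) --> 1
normaliseIndex(-5, 3) --> -5
```

With a three-byte file, -5 stays negative, so a later 'read' with that index is what raises the error.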

I’ve just noticed your post immediately above when logging on to post this. I didn’t envisage the handler being used hundreds of times in succession, but I’ll think about how the number of reads per read might be reduced.

Nigel_Garvey · May 23, 2009, 12:26pm

OK. Here’s the revision up with which I came. Tested for accuracy, but not to destruction. Not speed tested at all.

(* Original script from Nigel Garvey: http://macscripter.net/viewtopic.php?pid=113938#p113938 *)
(* This version, suggested by chrys, minimises disk accesses. *)
(*
	This handler is equivalent to the imaginary 'read fRef from a to b as unsigned byte'.
	'fRef' may be a file, an alias, or a file access reference number. Unlike 'read', if this parameter's a file or an alias, the handler opens and uses a new access by default.
	The range parameters may be positive, negative, or 'eof'. If either of them is 0, 1 is added to both. If the first is more than the second, the returned list of values is reversed.
	If the range parameters are equal, the single value returned is not in a list.
*)
on readAsUnsignedByte out of fRef from a to b
	-- Script object for fast list access.
	script o
		property integerValues : {}
		property byteValues : {}
	end script
	
	-- If fRef's an integer, assume it's a file access reference.
	-- Otherwise assume it's a file or an alias and open an access to it.
	set usingOwnAccess to (fRef's class is not integer)
	if (usingOwnAccess) then set fRef to (open for access fRef)
	try
		-- Analyse the range parameters.
		set fileLength to (get eof fRef) div 1
		if (a is eof) then set a to fileLength
		if (b is eof) then set b to fileLength
		if ((a is 0) or (b is 0)) then set {a, b} to {a + 1, b + 1}
		if ((a < 1) and (fileLength + a > -1)) then set a to fileLength + a + 1
		if ((b < 1) and (fileLength + b > -1)) then set b to fileLength + b + 1
		set reversing to (a > b)
		if (reversing) then set {a, b} to {b, a}
		set returningSingleValue to (a = b)
		
		-- If a and/or b index outside the file, throw the appropriate 'read' error.
		if ((a < 1) or (b > fileLength)) then read fRef from a to b
		
		-- The range will be read in one go 'as unsigned integer', if possible.
		set rangeLength to b - a + 1 -- Number of byte values to be returned.
		set leftOverlap to 0 -- Number of odd bytes (< 4) expected in the first integer.
		set rightOverlap to rangeLength mod 4 -- Number of odd bytes (< 4) expected in the last integer.
		set intRangeLength to rangeLength + (4 - rightOverlap) mod 4 -- Range length rounded up to a multiple of 4.
		
		if (fileLength > 3) then -- At least four bytes in the file.
			if (fileLength ≥ intRangeLength) then -- Enough headroom to read the range in one go as integers.
				if (a < fileLength - 3) then -- The range starts more than four bytes from the end of the file.
					-- Prepare to read integers from a.
					set b to a + intRangeLength - 1
					-- If this would overrun the end of the file, shift the range leftwards by the excess.
					if (b > fileLength) then
						set leftShift to b - fileLength
						set a to a - leftShift
						set b to fileLength
						set leftOverlap to 4 - leftShift
						set rightOverlap to (rightOverlap + leftShift) mod 4
					end if
					set o's integerValues to (read fRef from a to b as unsigned integer) as list
				else -- The range starts in the last four bytes of the file.
					-- Read the four bytes as an unsigned integer and left-shift the range into its hi bytes.
					set end of o's integerValues to (read fRef from (fileLength - 3) to fileLength as unsigned integer) * (256 ^ (a - (fileLength - 3))) mod 4.294967296E+9
				end if
			else -- Not enough headroom to read the entire range in one go.
				-- Read as many unsigned integers as possible from the range.
				if (rangeLength > 3) then set o's integerValues to (read fRef from a to (b - rightOverlap) as unsigned integer) as list
				-- Read the odd bytes as the end of an unsigned integer and left-shift them into its hi bytes.
				set end of o's integerValues to (read fRef from (b - 3) to b as unsigned integer) * (256 ^ (4 - rightOverlap)) mod 4.294967296E+9
			end if
		else -- Less than four bytes in the file.
			-- Read as data, write with more bytes to a temporary file, read back an unsigned integer's worth.
			set byteData to (read fRef from a to b as data)
			set tmpRef to (open for access file ((path to temporary items as Unicode text) & "tmp.dat") with write permission)
			try
				write byteData to tmpRef
				write «data rdat000000» to tmpRef
				set end of o's integerValues to (read tmpRef from 1 to 4 as unsigned integer)
			end try
			close access tmpRef
		end if
		if (usingOwnAccess) then close access fRef
	on error errMsg number errNum
		if (usingOwnAccess) then close access fRef
		error errMsg number errNum
	end try
	
	set i to 0
	-- If the byte and integer ranges don't start in the same place, extract the odd bytes from the first integer.
	if (leftOverlap > 0) then
		set n to beginning of o's integerValues
		set o's byteValues to items -leftOverlap thru -1 of {n mod 16777216 div 65536, n mod 65536 div 256, n mod 256 div 1}
		set i to 1
	end if
	-- Extract the byte values from the full integers (if any).
	repeat ((intRangeLength - leftOverlap - rightOverlap) div 4) times
		set i to i + 1
		set n to item i of o's integerValues
		set end of o's byteValues to n div 16777216
		set end of o's byteValues to n mod 16777216 div 65536
		set end of o's byteValues to n mod 65536 div 256
		set end of o's byteValues to n mod 256 div 1
	end repeat
	-- If the byte and integer ranges don't end in the same place, extract the odd bytes from the last integer.
	if (rightOverlap > 0) then
		set n to end of o's integerValues
		set o's byteValues to o's byteValues & items 1 thru rightOverlap of {n div 16777216, n mod 16777216 div 65536, n mod 65536 div 256}
	end if
	
	if (reversing) then -- First range parameter > second.
		return reverse of o's byteValues
	else if (returningSingleValue) then
		return beginning of o's byteValues
	else
		return o's byteValues
	end if
end readAsUnsignedByte

(*
-- Demo of use with an alias.
set myFile to (choose file)
set byteValues to (readAsUnsignedByte out of myFile from -7 to eof)
*)

(*
-- Demo of use with an access reference.
set fRef to (open for access (choose file))
try
	set byteValues to (readAsUnsignedByte out of fRef from -7 to eof)
end try
close access fRef
byteValues
*)

Edits: Fixed bugs pointed out by chrys in the following post and a couple more I found myself. Finally, untangled the logic to make the script more followable.
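
The byte-extraction arithmetic used in the handler can be tried on its own. A minimal sketch, with a hypothetical handler name (bytesOfUnsignedInteger is not part of the script above):

```applescript
-- Hypothetical helper: split one 32-bit unsigned integer value into
-- its four bytes, most significant first.
-- 'mod' and 'div' have equal precedence and are evaluated left to right,
-- so 'n mod 65536 div 256' means '(n mod 65536) div 256'.
on bytesOfUnsignedInteger(n)
	return {n div 16777216, n mod 16777216 div 65536, n mod 65536 div 256, n mod 256 div 1}
end bytesOfUnsignedInteger

bytesOfUnsignedInteger(66051) --> {0, 1, 2, 3} (66051 is hex 00010203)
```

The final 'div 1' looks redundant but ensures an integer result even if 'read' hands back a large value as a real.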

chrys · May 23, 2009, 4:23pm

I had not yet started working on a ‘single read when possible’ version, but I had been working on a test suite to automatically test a bunch of scenarios. I ran it against this latest version and found a few failure classes. I made some fixes for a couple of them, and partially analyzed another.

Apparently my speed problems were due to having the Event Log History window logging everything as it went. Things are much more reasonably speedy with that logging deactivated.

set dir_POSIX to (POSIX path of (path to home folder as alias)) & "tmp/readAsUnsignedByte/"
set v1_3 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v1.3.scpt")
set v1_3_1 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v1.3.1.scpt")
set fRef to POSIX file (dir_POSIX & "tmp.dat")
set fileLength to get eof fRef

tell v1_3_1
	-- returns single value list instead of bare value for fileLength > 4
	-- returns extra and duplicate values when a in {-2,-3} and a = b and fileLength >= 4 (same cause as next one?)
	readAsUnsignedByte out of fRef from 1 to 1
	
	-- does not generate an error for fileLength >= 4 and fileLength mod 4 in {0,1}
	readAsUnsignedByte out of fRef from 0 to -(fileLength + 1)
	
	-- returns extra and duplicate byte values for fileLength >= 4
	readAsUnsignedByte out of fRef from -3 to -2
	
	-- does not generate an error for fileLength >= 4
	readAsUnsignedByte out of fRef from (fileLength + 1) to (fileLength + 1)
end tell

(* Diff based on #19 <http://macscripter.net/viewtopic.php?pid=114019#p114019>.
Fixes the first two failure classes.
The duplicate bytes in the third failure class are from extracting bytes from the same integer in both the leftOverlap and rightOverlap finale.
I have not yet researched the fourth failure class.
---
diff --git 1/readAsUnsignedByte-v1.3.applescript 2/readAsUnsignedByte-v1.3.1.applescript
index 8a9847a..67c9da0 100644
--- 1/readAsUnsignedByte-v1.3.applescript
+++ 2/readAsUnsignedByte-v1.3.1.applescript
@@ -30,9 +30,17 @@ on readAsUnsignedByte out of fRef from a to b
 		else if ((b < 0) and (fileLength + b > -1)) then
 			set b to fileLength + 1 + b
 		end if
-		if ((a is 0) or (b is 0)) then set {a, b} to {a + 1, b + 1}
+		if a is 0 then
+			set a to 1
+			if b > -1 then set b to b + 1
+		end if
+		if b is 0 then
+			set b to 1
+			if a > -1 then set a to a + 1
+		end if
 		set reversing to (a > b)
 		if (reversing) then set {a, b} to {b, a}
+		set returningSingleValue to a = b
 		
 		-- The whole range will be read in one go 'as unsigned integer', if possible. Otherwise jiggery pokery.
 		set rangeLength to b - a + 1 -- Number of bytes required.
@@ -103,7 +111,7 @@ on readAsUnsignedByte out of fRef from a to b
 	
 	if (reversing) then -- First range parameter > second.
 		return reverse of o's byteValues
-	else if (a = b) then -- Single value.
+	else if (returningSingleValue) then
 		return beginning of o's byteValues
 	else
 		return o's byteValues
*)

And here is the test suite itself. The code is a bit haphazard, but it seems to work well enough. Some bits were downright experimental (e.g. tell {} . set end of it to . end to avoid naming a list variable!).

Customize the base path, save the byte reading script there, and it should work. Oh, about make, just comment that out and save the scripts as .scpt files. I was saving as .applescript and using a custom Makefile to compile to .scpt files (mostly I just wanted to be able to easily use git diff without having to manually save a plain text version).

on run
	set dir_POSIX to (POSIX path of (path to home folder as alias)) & "tmp/readAsUnsignedByte/"
	do shell script "make -C " & quoted form of dir_POSIX & " all 1>&2"
	
	set alltests to makeAllTests(8, 9)
	
	set v0 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v0.scpt")
	set v0_1 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v0.1.scpt")
	set v1 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v1.scpt")
	set v1_1 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v1.1.scpt")
	set v1_1_1 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v1.1.1.scpt")
	set v1_2 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v1.2.scpt")
	set v1_3 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v1.3.scpt")
	set v1_3_1 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v1.3.1.scpt")
	set v1_3_2 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v1.3.2.scpt")
	set v1_3_3 to load script POSIX file (dir_POSIX & "readAsUnsignedByte-v1.3.3.scpt")
	set scriptUnderTest to v1_3_3
	
	set alltests to prepDataFiles(dir_POSIX, alltests)
	set t0 to current date
	set allFailures to runAllTests(alltests, scriptUnderTest)
	set t1 to current date
	{t1 - t0, allFailures}
	-- Use Log Nothing in Event Log History window!
	(* v0 (Nigel's original)
	* best time: 20 seconds
	* test failure classes:
	* -- (I think) spurious error number mismatches for 0 and 1 byte files
	* --   expected -40 (before beginning err), got -39 (eof error)
	* -- fileLength = 8, reading -9 to -9 produces 0 instead of an error
	* -- fileLength > 1
	* --   reading 0 to -(fileLength+2) produces {1,0} instead of an error
	* --   reading -(fileLength+2) to 0 produces {0,1} instead of an error
	*)
	(* v0.1 (Nigel's original with the endpoint calculation from v1.3.2)
	* best time: 18 seconds
	* no test failures
	*)
	(* v1.3.2 (Nigel's latest)
	* best time: 12 seconds (40% less time than original)
	* no test failures
	*)
end run

to runAllTests(alltests, scriptUnderTest)
	set allFailures to {}
	repeat with test in alltests
		tell runTests(|data file| of test, tests of test, scriptUnderTest) to ¬
			if it is not {} then set end of allFailures to {|failed tests|:it, tests:missing value} & contents of test
	end repeat
	allFailures
end runAllTests

to prepDataFiles(dir_POSIX, testList)
	tell {}
		repeat with test in testList
			set end of it to {|data file|:my createDataFile(dir_POSIX, numBytes of test)} & test
		end repeat
		it
	end tell
end prepDataFiles

to runTests(dataFileAlias, tests, scriptUnderTest)
	local fRef, failures, test, a, b
	set failures to {}
	try
		set fRef to open for access dataFileAlias
		repeat with test in tests
			set a to start of test
			set b to |stop| of test
			try
				readAsUnsignedByte of scriptUnderTest out of fRef from a to b
				set got to makeNormalResult(result)
			on error m number n
				set got to makeErrorResult(n, m)
				try
					close access alias ((path to temporary items as Unicode text) & "tmp.dat")
					log "FAILED TO CLOSE INTERNAL TMP.DAT?"
				end try
				if n is -128 then error m number n
			end try
			set expected to expected of test
			set normalAndMismatch to kind of expected is "normal" and ¬
				got is not expected
			set errorAndMismatch to kind of expected is "error" and (kind of got is not "error" or ¬
				|error number| of value of got is not ¬
				|error number| of value of expected)
			if normalAndMismatch or errorAndMismatch then
				set end of failures to contents of test & {got:got}
			end if
		end repeat
		close access fRef
	on error m number n
		try
			close access fRef
		end try
		error m number n
	end try
	failures
end runTests

to createDataFile(dir_POSIX, n)
	local tmpfile_POSIX
	if class of n is not integer or n < 0 then error "Bad n (" & n & ") for data file."
	set rangeSuffix to ""
	if n > 0 then set rangeSuffix to "-" & n
	set tmpfile_POSIX to dir_POSIX & "0" & rangeSuffix & ".dat"
	do shell script "rm -f " & quoted form of tmpfile_POSIX
	if n > 0 then
		do shell script "jot " & n & " 0 | perl -e 'print pack q(C*), <>' - > " & quoted form of tmpfile_POSIX
	else
		do shell script "cp /dev/null " & quoted form of tmpfile_POSIX
	end if
	alias POSIX file tmpfile_POSIX
end createDataFile

to makeAllTests(maxIndex, maxWindowSize)
	local alltests, eofErrorNum, eofErrorMsg, bofErrorNum, bofErrorMsg, b, a, i, someSize
	
	set alltests to {}
	-- Test out of bounds on zero byte file
	tell {}
		set {eofErrorNum, eofErrorMsg} to {-39, "End of file error."}
		set {bofErrorNum, bofErrorMsg} to {-40, "Tried to position before beginning of file ."}
		repeat with b in {1, 2}
			set end of it to my makeErrorTest(1, contents of b, eofErrorNum, eofErrorMsg)
		end repeat
		repeat with a in {-1, -2}
			set end of it to my makeErrorTest(contents of a, -1, bofErrorNum, bofErrorMsg)
		end repeat
		set end of alltests to {numBytes:0, tests:it}
	end tell
	
	-- Test out of bounds on an eight byte file
	set end of alltests to {numBytes:8, tests:{¬
		makeErrorTest(9, 9, eofErrorNum, eofErrorMsg), ¬
		makeTest(5, 5, 4), ¬
		makeErrorTest(-9, -9, bofErrorNum, bofErrorMsg)}}
	
	-- Test windows near start and end of file for files from 1 to 17 bytes
	--  Window sizes from 1 to (the third argument) are tested.
	--  Windows are tested at offsets from 0 to (the second argument) from both the start and end of the files.
	--    For the smaller files, nearly every combination of range endpoints is tested.
	--    For the larger files, ranges near the start and end are tested along with ranges from "the middle" (well, middle-ish since they will also be fairly close to the start or end).
	-- Test several tricky, zero-based ranges.
	repeat with i from 1 to 17
		tell makeStandardTests(i, maxIndex, maxWindowSize)
			-- Test zero-based endpoint
			set end of it to my makeStandardTest(i, 0, 0)
			set end of it to my makeStandardTest(i, 0, -2)
			set end of it to my makeStandardTest(i, -2, 0)
			set end of it to my makeStandardTest(i, 0, -(i + 1))
			set end of it to my makeStandardTest(i, -(i + 1), 0)
			set end of it to my makeErrorTest(0, -1, eofErrorNum, eofErrorMsg)
			set end of it to my makeErrorTest(-1, 0, eofErrorNum, eofErrorMsg)
			set end of it to my makeErrorTest(0, -(i + 2), bofErrorNum, bofErrorMsg)
			set end of it to my makeErrorTest(-(i + 2), 0, bofErrorNum, bofErrorMsg)
			
			-- Test last byte, and last + past EOF
			set end of it to my makeStandardTest(i, i, i)
			set end of it to my makeErrorTest(i, i + 1, eofErrorNum, eofErrorMsg)
			set end of it to my makeErrorTest(i, i + 2, eofErrorNum, eofErrorMsg)
			set end of alltests to {numBytes:i, tests:it}
		end tell
	end repeat
	
	-- Test zero-based start/stop, mixed sign start/stop, reversed results, eof as -1/fileLength
	tell {}
		set someSize to 12
		repeat with i from 0 to (someSize - 1)
			set end of it to my makeStandardTest(someSize, 0, i)
			set end of it to my makeStandardTest(someSize, i, 0)
		end repeat
		set end of it to my makeStandardTest(someSize, 2, -2)
		set end of it to my makeStandardTest(someSize, -3, 3)
		set end of it to my makeErrorTest(0, eof, eofErrorNum, eofErrorMsg)
		set end of it to my makeErrorTest(eof, 0, eofErrorNum, eofErrorMsg)
		set end of it to my makeStandardTest(someSize, 4, eof)
		set end of it to my makeStandardTest(someSize, eof, 4)
		set end of alltests to {numBytes:someSize, tests:it}
	end tell
	alltests
end makeAllTests

to makeErrorResult(|number|, message)
	{kind:"error", value:{|error number|:|number|, |error message|:message}}
end makeErrorResult
to makeNormalResult(expected)
	{kind:"normal", value:expected}
end makeNormalResult

to makeErrorTest(a, b, n, m)
	{start:a, |stop|:b, expected:makeErrorResult(n, m)}
end makeErrorTest
to makeTest(a, b, expected)
	{start:a, |stop|:b, expected:makeNormalResult(expected)}
end makeTest

to zeroUpToN(n) -- thought I might need to memoize this, but turned out to be fast enough for now
	local i
	tell {}
		repeat with i from 0 to n - 1
			set end of it to i
		end repeat
		it
	end tell
end zeroUpToN

to getStandardResults(n, a, b)
	if a is eof then set a to n
	if b is eof then set b to n
	if a is 0 or b is 0 then set {a, b} to {a + 1, b + 1}
	if a is 0 or b is 0 then error "invalid range? (" & a & "," & b & "); maybe use makeErrorTest() instead" -- can only happen if the original is (0,-1) or (-1,0), which seem like invalid zero-based ranges (plain read throws an EOF error); eof is -2 and bof is -(length+1) if the other end is zero
	if a < -n then error "start is much too negative: " & a & "; maybe use makeErrorTest() instead"
	if b < -n then error "stop is much too negative: " & b & "; maybe use makeErrorTest() instead"
	if a < 0 then set a to a + n + 1
	if b < 0 then set b to b + n + 1
	
	tell items a through b of zeroUpToN(n)
		if length is 1 then return first item
		if a > b then return reverse
		it
	end tell
end getStandardResults

to makeStandardTest(n, a, b)
	makeTest(a, b, getStandardResults(n, a, b))
end makeStandardTest

to makeStandardTests(numBytes, maxStart, maxLen)
	local a, b
	tell {}
		if maxStart > numBytes then set maxStart to numBytes
		repeat with a from 1 to maxStart
			set maxStop to a + maxLen - 1
			if maxStop > numBytes then set maxStop to numBytes
			repeat with b from a to maxStop
				set end of it to my makeStandardTest(numBytes, a, b)
				set end of it to my makeStandardTest(numBytes, -b, -a)
			end repeat
		end repeat
		it
	end tell
end makeStandardTests
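
The expected values in the suite rely on each data file holding the byte values 0 to n - 1 in order, so byte number k has the value k - 1. A stripped-down sketch of that idea, covering only positive indices with a ≤ b (the handler name expectedBytes is hypothetical and is not part of the suite):

```applescript
-- Hypothetical helper: the byte values expected from positions a thru b
-- of an n-byte file whose contents are 0, 1, 2, ... n - 1.
on expectedBytes(n, a, b)
	set allBytes to {}
	repeat with i from 0 to n - 1
		set end of allBytes to i
	end repeat
	return items a thru b of allBytes
end expectedBytes

expectedBytes(8, 3, 5) --> {2, 3, 4}
```

Unlike getStandardResults, this sketch always returns a list and does no handling of zero, negative, or eof endpoints.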

If you want to play with it, here is the Makefile (spaces in the .applescript filenames will probably not work properly with this; mind the TAB characters, they are a critical part of the syntax):

APPLESCRIPTS = $(wildcard *.applescript)
SCPTS = $(patsubst %.applescript,%.scpt,$(APPLESCRIPTS))

all : $(SCPTS)

%.scpt : %.applescript
	osacompile -l AppleScript -d -o $@ $<

clean:
	rm -f $(SCPTS)

Edit History: Corrected some bad tests, added some new ones. Added not-eof to past-eof tests for bug that Nigel found.