Compare Two Text Files for Differences

The AppleScript forum has a thread about the sdiff shell utility, and I thought I’d write something similar with ASObjC just for practice. My first effort does a simple string comparison and uses the logic shown in the following table. The words left and right in this table refer to corresponding lines in files one and two.

| Circumstance | Example |
| :--- | :--- |
| left and right are the same | aa = aa |
| left and right are both blank | = |
| left is blank or does not exist | < aa |
| right is blank or does not exist | aa > |
| left and right are different | aa \| bb |
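The rules in this table can be expressed as one vanilla AppleScript handler. This is a hedged sketch rather than the script below: comparisonCharacter is a hypothetical name, it takes plain AppleScript text, and it treats a missing line (missing value) the same as a blank one. AppleScript ignores case by default, so the equality test is wrapped in considering case to roughly mirror isEqualToString:.

```applescript
-- Sketch: return the comparison character for one pair of lines, following the table above.
-- missing value stands for a line that does not exist in that file.
on comparisonCharacter(leftLine, rightLine)
	if leftLine is missing value then set leftLine to ""
	if rightLine is missing value then set rightLine to ""
	-- AppleScript text comparison ignores case by default; considering case makes it case sensitive
	considering case
		if leftLine is rightLine then return "="
	end considering
	if leftLine is "" then return "<"
	if rightLine is "" then return ">"
	return "|"
end comparisonCharacter

set rowText to "aa" & "  " & comparisonCharacter("aa", "bb") & "  " & "bb" -- "aa  |  bb"
```

Checking "same" first means two blank lines produce "=", which matches the second row of the table.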

I may refine this script in subsequent posts to:

  • display a set number of characters per string;
  • make the string comparisons case insensitive;
  • change specific characters before comparison (e.g. tabs to spaces);
  • remove blank lines; and
  • simplify the code or take a different ASObjC approach.

In testing with Script Geek, the timing result was 14 milliseconds with 84 lines in fileOne and 100 lines in fileTwo with 17 words per line.

use framework "Foundation"
use scripting additions

set fileOne to "/Users/Robert/Working/File One.txt" --set to desired value
set stringOne to current application's NSString's stringWithContentsOfFile:fileOne encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
set arrayOne to (stringOne's componentsSeparatedByString:linefeed)
set arrayOneCount to arrayOne's |count|()

set fileTwo to "/Users/Robert/Working/File Two.txt" --set to desired value
set stringTwo to current application's NSString's stringWithContentsOfFile:fileTwo encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
set arrayTwo to (stringTwo's componentsSeparatedByString:linefeed)
set arrayTwoCount to arrayTwo's |count|()

if arrayOneCount is less than or equal to arrayTwoCount then
	set shortCount to arrayOneCount
	set longCount to arrayTwoCount
else
	set shortCount to arrayTwoCount
	set longCount to arrayOneCount
end if

set emptyString to current application's NSString's stringWithString:""
set theDifference to current application's NSMutableArray's new()
repeat with i from 0 to (shortCount - 1)
	set stringOne to (arrayOne's objectAtIndex:i)
	set stringTwo to (arrayTwo's objectAtIndex:i)
	if (stringOne's isEqualToString:stringTwo) is true then
		set theString to ((stringOne's stringByAppendingString:"  =  ")'s stringByAppendingString:stringTwo)
	else if (stringOne's isEqualToString:emptyString) is true then
		set theString to ((stringOne's stringByAppendingString:"  <  ")'s stringByAppendingString:stringTwo)
	else if (stringTwo's isEqualToString:emptyString) is true then
		set theString to ((stringOne's stringByAppendingString:"  >  ")'s stringByAppendingString:stringTwo)
	else if (stringOne's isEqualToString:stringTwo) is false then
		set theString to ((stringOne's stringByAppendingString:"  |  ")'s stringByAppendingString:stringTwo)
	else
		display dialog "An unexpected error has occurred" buttons {"OK"} cancel button 1 default button 1
	end if
	(theDifference's addObject:theString)
end repeat

if shortCount is equal to arrayOneCount then
	repeat with i from shortCount to (longCount - 1)
		set stringTwo to (arrayTwo's objectAtIndex:i)
		set theString to ((emptyString's stringByAppendingString:"  <  ")'s stringByAppendingString:stringTwo)
		(theDifference's addObject:theString)
	end repeat
else
	repeat with i from shortCount to (longCount - 1)
		set stringOne to (arrayOne's objectAtIndex:i)
		set theString to ((stringOne's stringByAppendingString:"  >  ")'s stringByAppendingString:emptyString)
		(theDifference's addObject:theString)
	end repeat
end if

set theString to (theDifference's componentsJoinedByString:linefeed) as text

This script is the same as that above except that:

  • the data is formatted as a markdown table and saved to a file on the user’s desktop; and

  • the displayed lines are trimmed to the number of characters specified in the getSubstring handler.
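The getSubstring handler trims each line with a regular expression. A related detail: a line that itself contains "|" adds an extra column to the markdown table. As a hedged, vanilla-AppleScript sketch (not the handler used in the script), the hypothetical makeCell handler below trims a line and then escapes any pipe characters. It assumes AppleScript text (coerce an NSString with as text first) and a maxLength of at least 1.

```applescript
-- Hypothetical helper: trim a line to maxLength characters (maxLength must be at least 1),
-- mark the cut with "...", then escape "|" so the text stays in one markdown table cell.
on makeCell(theLine, maxLength)
	if (length of theLine) > maxLength then set theLine to (text 1 thru maxLength of theLine) & "..."
	-- Escape after trimming so an escape sequence is never cut in half
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "|"
	set theItems to text items of theLine
	set AppleScript's text item delimiters to "\\|"
	set theLine to theItems as text
	set AppleScript's text item delimiters to savedDelimiters
	return theLine
end makeCell

set tableRow to "| " & makeCell("a|b", 30) & " | = | " & makeCell("a|b", 30) & " |"
```

Here tableRow becomes | a\|b | = | a\|b |, which a markdown viewer shows as three cells rather than five.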

The following is a simple example:

The timing result was 24 milliseconds with the same files used with my script in post 1.

use framework "Foundation"
use scripting additions

on main()
	set fileOne to "/Users/Robert/Documents/File One.txt" --set to desired value or replace with dialog
	set stringOne to current application's NSString's stringWithContentsOfFile:fileOne encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	set arrayOne to stringOne's componentsSeparatedByString:linefeed
	set arrayOneCount to arrayOne's |count|()
	
	set fileTwo to "/Users/Robert/Documents/File Two.txt" --set to desired value or replace with dialog
	set stringTwo to current application's NSString's stringWithContentsOfFile:fileTwo encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	set arrayTwo to stringTwo's componentsSeparatedByString:linefeed
	set arrayTwoCount to arrayTwo's |count|()
	
	if arrayOneCount is less than or equal to arrayTwoCount then
		set shortCount to arrayOneCount
		set longCount to arrayTwoCount
	else
		set shortCount to arrayTwoCount
		set longCount to arrayOneCount
	end if
	
	set theDifference to current application's NSMutableArray's new()
	set emptyString to current application's NSString's stringWithString:""
	repeat with i from 0 to (shortCount - 1)
		set stringOne to (arrayOne's objectAtIndex:i)
		set substringOne to getSubstring(stringOne)
		set stringTwo to (arrayTwo's objectAtIndex:i)
		set substringTwo to getSubstring(stringTwo)
		if (stringOne's isEqualToString:stringTwo) is true then
			set theString to current application's NSString's stringWithFormat_("| %@ | = | %@ |", substringOne, substringTwo)
		else if (stringOne's isEqualToString:emptyString) is true then
			set theString to current application's NSString's stringWithFormat_("| %@ | < | %@ |", substringOne, substringTwo)
		else if (stringTwo's isEqualToString:emptyString) is true then
			set theString to current application's NSString's stringWithFormat_("| %@ | > | %@ |", substringOne, substringTwo)
		else if (stringOne's isEqualToString:stringTwo) is false then
			set theString to current application's NSString's stringWithFormat_("| %@ | \\| | %@ |", substringOne, substringTwo)
		else
			display dialog "An unexpected error has occurred" buttons {"OK"} cancel button 1 default button 1
		end if
		(theDifference's addObject:theString)
	end repeat
	
	if shortCount is equal to arrayOneCount then
		repeat with i from shortCount to (longCount - 1)
			set stringTwo to (arrayTwo's objectAtIndex:i)
			set substringTwo to getSubstring(stringTwo)
			set theString to current application's NSString's stringWithFormat_("| %@ | < | %@ |", emptyString, substringTwo)
			(theDifference's addObject:theString)
		end repeat
	else
		repeat with i from shortCount to (longCount - 1)
			set stringOne to (arrayOne's objectAtIndex:i)
			set substringOne to getSubstring(stringOne)
			set theString to current application's NSString's stringWithFormat_("| %@ | > | %@ |", substringOne, emptyString)
			(theDifference's addObject:theString)
		end repeat
	end if
	set theDifference to (theDifference's componentsJoinedByString:linefeed)
	
	set theHeader to "| File One | Comparison | File Two |"
	set theFormatter to "| :--- | :---: | :--- |"
	set theLinefeed to linefeed
	set theString to current application's NSString's stringWithFormat_("%@%@%@%@%@", theHeader, theLinefeed, theFormatter, theLinefeed, theDifference)
	set desktopFolder to current application's NSHomeDirectory()'s stringByAppendingPathComponent:"Desktop"
	set theFile to desktopFolder's stringByAppendingPathComponent:"File Compare.txt"
	theString's writeToFile:theFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
end main

on getSubstring(theString)
	set theRange to theString's rangeOfString:".{0,30}" options:1024 --set 30 to desired value
	set patternMatch to (theString's substringWithRange:theRange)
end getSubstring

main()

Hi @peavine
This is an interesting one!
I think you could take advantage of the compare: method.

Update: see the amended script in post #6.
use framework "Foundation"
use framework "AppKit"
use scripting additions

set theSet to current application's NSCharacterSet's newlineCharacterSet()
set fileOne to "/Users/Robert/Documents/File One.txt" --set to desired value or replace with dialog
set stringOne to current application's NSString's stringWithContentsOfFile:fileOne encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
set arrayOne to stringOne's componentsSeparatedByCharactersInSet:theSet
set theCount to arrayOne's |count|()

set fileTwo to "/Users/Robert/Documents/File Two.txt" --set to desired value or replace with dialog
set stringTwo to current application's NSString's stringWithContentsOfFile:fileTwo encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
set arrayTwo to stringTwo's componentsSeparatedByCharactersInSet:theSet
set theCount2 to arrayTwo's |count|()

if theCount2 > theCount then set theCount to theCount2

set theDifference to current application's NSMutableArray's new()
repeat with i from 0 to (theCount - 1)
	set stringOne to (arrayOne's objectAtIndex:i)
	set substringOne to getSubstring(stringOne)
	set stringTwo to (arrayTwo's objectAtIndex:i)
	set substringTwo to getSubstring(stringTwo)
	
	set diff to (substringOne's compare:substringTwo options:129) -- 1 = NSCaseInsensitiveSearch, 128 = NSDiacriticInsensitiveSearch
	
	if diff ≠ 0 and substringOne's |length|() = 0 then set diff to -2
	if diff ≠ 0 and substringTwo's |length|() = 0 then set diff to 2
	set charDiff to item (diff + 3) of {"<<", "|<", "=", "", ">>"}
	set theString to current application's NSString's stringWithFormat_("| %@ | %@ | " & charDiff & " | %@ |", i, substringOne, substringTwo)
	(theDifference's addObject:theString)
end repeat
set theDifference to (theDifference's componentsJoinedByString:linefeed)

set theHeader to "|#| File One | Comparison | File Two |"
set theFormatter to "| :--- | :--- | :---: | :--- |"
set theLinefeed to linefeed
set theString to current application's NSString's stringWithFormat_("%@%@%@%@%@", theHeader, theLinefeed, theFormatter, theLinefeed, theDifference)
set desktopFolder to current application's NSHomeDirectory()'s stringByAppendingPathComponent:"Desktop"
set theFile to desktopFolder's stringByAppendingPathComponent:"File Compare.txt"
theString's writeToFile:theFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)

set theWorkspace to current application's NSWorkspace's sharedWorkspace()
theWorkspace's openFile:theFile

on getSubstring(theString)
	set theRange to theString's rangeOfString:".{0,30}" options:1024 --set 30 to desired value
	return (theString's substringWithRange:theRange)
end getSubstring

[BTW, what markdown app are you using to display the table?]

Jonas. Thanks for looking at my script and for your excellent script suggestion.

I like your use of the compare: method. It simplifies the script and deals with case sensitivity and other possible language issues. Adding line numbers is also a great idea.
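For reference, a small ASObjC sketch of what compare:options: returns. The literal 129 used in these scripts is NSCaseInsensitiveSearch (1) plus NSDiacriticInsensitiveSearch (128); the sketch builds the same value from the named constants through current application, as is usual for Cocoa enums. The sample strings are only illustrations.

```applescript
use framework "Foundation"
use scripting additions

-- The 129 used in these scripts: NSCaseInsensitiveSearch (1) + NSDiacriticInsensitiveSearch (128)
set compareOptions to (current application's NSCaseInsensitiveSearch) + (current application's NSDiacriticInsensitiveSearch)

set stringOne to current application's NSString's stringWithString:"Café"
-- compare:options: returns -1 (NSOrderedAscending), 0 (NSOrderedSame) or 1 (NSOrderedDescending)
set looseResult to (stringOne's compare:"cafe" options:compareOptions) as integer -- 0: equal once case and accents are ignored
set strictResult to (stringOne's compare:"cafe" options:0) as integer -- not 0: "C" and "é" differ literally
```

Because only 0 means "same", a test such as (stringOne's compare:stringTwo options:129) is 0 is the case- and diacritic-insensitive counterpart of isEqualToString:.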

I did encounter one possible issue with your script. If the files do not contain the same number of lines, the following error is reported:

*** -[__NSArrayM objectAtIndex:]: index 3 beyond bounds [0 .. 2]
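One way to avoid this out-of-bounds error, sketched below, is to check the index against the array’s count before calling objectAtIndex:. lineAtIndex is a hypothetical helper, not part of either script; the later revision in this thread takes the other route of padding the shorter array with empty strings.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical helper: the line at a zero-based index, or "" when the index is past the end,
-- so a shorter file yields blank lines instead of an out-of-bounds error.
on lineAtIndex(theArray, theIndex)
	if theIndex < (theArray's |count|()) then return (theArray's objectAtIndex:theIndex) as text
	return ""
end lineAtIndex

set shortArray to current application's NSArray's arrayWithArray:{"line one", "line two", "line three"}
set fourthLine to lineAtIndex(shortArray, 3) -- "" rather than an error
```

With this helper the repeat loop can run to the longer file’s count, and the missing lines simply compare as blank.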

Also, in the following line, shouldn’t stringOne be compared to stringTwo?

set diff to (substringOne's compare:substringTwo options:129) -- 1 = NSCaseInsensitiveSearch, 128 = NSDiacriticInsensitiveSearch

I used the iA Writer app to display the markdown table. It’s a somewhat expensive writing app that happens to support markdown.

To facilitate comparison, it may be desirable to delete leading and trailing whitespace and optionally blank lines (other than the first line) from the text input. To accomplish this, the following handler can be inserted in the script.

set stringOne to cleanString(stringOne) --also for stringTwo

on cleanString(theString)
	set theString to theString's stringByReplacingOccurrencesOfString:"(?m)^\\h+|\\h+$" withString:"" options:1024 range:{0, theString's |length|()} --remove leading and trailing whitespace
	set theString to theString's stringByReplacingOccurrencesOfString:"(\\R)\\s*\\R" withString:"$1" options:1024 range:({0, theString's |length|()}) --also remove blank lines (\s* lets one match cover a run of several blank lines)
end cleanString

Here are my corrections according to your comments:

use framework "Foundation"
use framework "AppKit"
use scripting additions

set desktopFolder to current application's NSHomeDirectory()'s stringByAppendingPathComponent:"Desktop"
set theSet to current application's NSCharacterSet's newlineCharacterSet()

set fileOne to desktopFolder's stringByAppendingPathComponent:"a.txt"
set stringOne to current application's NSString's stringWithContentsOfFile:fileOne encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
set arrayOne to stringOne's componentsSeparatedByCharactersInSet:theSet
set theCount to arrayOne's |count|()

set fileTwo to desktopFolder's stringByAppendingPathComponent:"b.txt"
set stringTwo to current application's NSString's stringWithContentsOfFile:fileTwo encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
set arrayTwo to stringTwo's componentsSeparatedByCharactersInSet:theSet
set theCount2 to arrayTwo's |count|()

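-- pad the shorter array with empty strings, and loop over the longer array's count
if theCount < theCount2 then
	set arrayOne to my equalizeArray:arrayOne |count|:(theCount2 - theCount)
	set theCount to theCount2
else if theCount > theCount2 then
	set arrayTwo to my equalizeArray:arrayTwo |count|:(theCount - theCount2)
end if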

set theDifference to current application's NSMutableArray's new()
repeat with iCount from 0 to (theCount - 1)
	set stringOne to (arrayOne's objectAtIndex:iCount)
	set substringOne to (my getSubstring:stringOne)
	set stringTwo to (arrayTwo's objectAtIndex:iCount)
	set substringTwo to (my getSubstring:stringTwo)
	
	set diff to (substringOne's compare:substringTwo options:(129)) -- 1 = NSCaseInsensitiveSearch, 128 = NSDiacriticInsensitiveSearch
	
	if diff ≠ 0 and substringOne's |length|() < substringTwo's |length|() then set diff to -1
	if diff ≠ 0 and substringOne's |length|() > substringTwo's |length|() then set diff to 1
	if diff ≠ 0 and substringOne's |length|() = 0 then set diff to -2
	if diff ≠ 0 and substringTwo's |length|() = 0 then set diff to 2
	
	set charDiff to item (diff + 3) of {"x<", "<", "=", ">", ">x"}
	set theString to current application's NSString's stringWithFormat_("| %@ | %@ | " & charDiff & " | %@ |", iCount, substringOne, substringTwo)
	(theDifference's addObject:theString)
end repeat
set theDifference to (theDifference's componentsJoinedByString:linefeed)

set theHeader to "|#| File One | Comparison | File Two |"
set theFormatter to "| :--- | :--- | :---: | :--- |"
set theLinefeed to linefeed
set theString to current application's NSString's stringWithFormat_("%@%@%@%@%@", theHeader, theLinefeed, theFormatter, theLinefeed, theDifference)
set theFile to desktopFolder's stringByAppendingPathComponent:"File Compare.txt"
theString's writeToFile:theFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)

tell application "iA Writer"
	activate
	open theFile as «class furl»
end tell

on getSubstring:aString
	set theRange to aString's rangeOfString:".{0,30}" options:1024
	set aString to (aString's substringWithRange:theRange)
end getSubstring:

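on equalizeArray:anArray |count|:aCount
	set anArray to anArray's mutableCopy() -- componentsSeparatedByCharactersInSet: is declared to return an immutable NSArray
	repeat aCount times
		anArray's addObject:""
	end repeat
	return anArray
end equalizeArray:|count|: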

This script includes Jonas’ excellent suggestions plus a few other enhancements. The script:

  • shows line numbers;
  • optionally removes leading/trailing whitespace and blank lines;
  • uses the compare: method, with options set to be case and diacritic insensitive; and
  • formats the text output as a markdown table.

In the following example, the fifth line of File One was truncated at 30 characters from a line that contained 67 characters.

The timing result with the test files was 24 milliseconds.

use framework "Foundation"
use scripting additions

on main()
	set fileOne to "/Users/Robert/Documents/File One.txt" --set to desired value or replace with dialog
	set stringOne to current application's NSString's stringWithContentsOfFile:fileOne encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	set stringOne to cleanString(stringOne) --disable if desired
	set arrayOne to stringOne's componentsSeparatedByString:linefeed
	set arrayOneCount to arrayOne's |count|()
	
	set fileTwo to "/Users/Robert/Documents/File Two.txt" --set to desired value or replace with dialog
	set stringTwo to current application's NSString's stringWithContentsOfFile:fileTwo encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	set stringTwo to cleanString(stringTwo) --disable if desired
	set arrayTwo to stringTwo's componentsSeparatedByString:linefeed
	set arrayTwoCount to arrayTwo's |count|()
	
	if arrayOneCount is less than or equal to arrayTwoCount then
		set shortCount to arrayOneCount
		set longCount to arrayTwoCount
	else
		set shortCount to arrayTwoCount
		set longCount to arrayOneCount
	end if
	
	set theDifference to current application's NSMutableArray's new()
	set emptyString to current application's NSString's stringWithString:""
	
	repeat with i from 0 to (shortCount - 1)
		set lineNumber to (i + 1)
		set stringOne to (arrayOne's objectAtIndex:i)
		set substringOne to getSubstring(stringOne)
		set stringTwo to (arrayTwo's objectAtIndex:i)
		set substringTwo to getSubstring(stringTwo)
		if (stringOne's compare:stringTwo options:129) is 0 then --option 129 is case and diacritic insensitive
			set theString to current application's NSString's stringWithFormat_("| %@ | %@ | = | %@ |", lineNumber, substringOne, substringTwo)
		else if (stringOne's compare:emptyString options:129) is 0 then
			set theString to current application's NSString's stringWithFormat_("| %@ | %@ | < | %@ |", lineNumber, substringOne, substringTwo)
		else if (emptyString's compare:stringTwo options:129) is 0 then
			set theString to current application's NSString's stringWithFormat_("| %@ | %@ | > | %@ |", lineNumber, substringOne, substringTwo)
		else if (stringOne's compare:stringTwo options:129) is not 0 then
			set theString to current application's NSString's stringWithFormat_("| %@ | %@ | \\| | %@ |", lineNumber, substringOne, substringTwo)
		else
			display dialog "An unexpected error has occurred" buttons {"OK"} cancel button 1 default button 1
		end if
		(theDifference's addObject:theString)
	end repeat
	
	if shortCount is equal to arrayOneCount then
		repeat with i from shortCount to (longCount - 1)
			set lineNumber to (i + 1)
			set stringTwo to (arrayTwo's objectAtIndex:i)
			set substringTwo to getSubstring(stringTwo)
			set theString to current application's NSString's stringWithFormat_("| %@ | %@ | < | %@ |", lineNumber, emptyString, substringTwo)
			(theDifference's addObject:theString)
		end repeat
	else
		repeat with i from shortCount to (longCount - 1)
			set lineNumber to (i + 1)
			set stringOne to (arrayOne's objectAtIndex:i)
			set substringOne to getSubstring(stringOne)
			set theString to current application's NSString's stringWithFormat_("| %@ |  %@ | > | %@ |", lineNumber, substringOne, emptyString)
			(theDifference's addObject:theString)
		end repeat
	end if
	
	set theDifference to (theDifference's componentsJoinedByString:linefeed)
	set theHeader to "| Line | File One | Comparison | File Two |"
	set theFormatter to "| :---: | :--- | :---: | :--- |"
	set theLinefeed to linefeed
	set theString to current application's NSString's stringWithFormat_("%@%@%@%@%@", theHeader, theLinefeed, theFormatter, theLinefeed, theDifference)
	set theFolder to current application's NSHomeDirectory()'s stringByAppendingPathComponent:"Desktop"
	set theFile to theFolder's stringByAppendingPathComponent:"File Compare.md"
	theString's writeToFile:theFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
end main

on cleanString(theString) --enable or disable any of the following
	set theString to theString's stringByReplacingOccurrencesOfString:"(?m)^\\h+|\\h+$" withString:"" options:1024 range:{0, theString's |length|()} --remove leading and trailing whitespace every line
	--set theString to theString's stringByReplacingOccurrencesOfString:"^\\s*\\R" withString:"" options:1024 range:{0, theString's |length|()} --remove blank lines at beginning of string
	set theString to theString's stringByReplacingOccurrencesOfString:"\\s*$" withString:"" options:1024 range:{0, theString's |length|()} --remove blank lines at end of string
	--set theString to theString's stringByReplacingOccurrencesOfString:"(\\R)\\s*\\R" withString:"$1" options:1024 range:({0, theString's |length|()}) --remove blank lines except at beginning and end of string
end cleanString

on getSubstring(stringOne)
	set theRange to stringOne's rangeOfString:".{0,50}" options:1024 --set 50 to desired value
	set stringTwo to (stringOne's substringWithRange:theRange)
	if (stringOne's compare:stringTwo options:129) is not 0 then set stringTwo to stringTwo's stringByAppendingString:"..."
	return stringTwo
end getSubstring

main()

Something like that?

-- remove empty line(s) at start of text
set theString to theString's stringByReplacingOccurrencesOfString:"\\A\\s+" withString:"" options:1024 range:{0, theString's |length|()}
-- the same at end
set theString to theString's stringByReplacingOccurrencesOfString:"\\s+\\Z" withString:"" options:1024 range:{0, theString's |length|()}

Thanks Jonas, that works great.

I happened upon an app that does a good job of displaying markdown files with the Quick Look utility. It’s free but occasionally shows a nag screen asking that you buy the developer a coffee. The app is not notarized or signed, which is always a concern. A screenshot of a simple example:

The app’s GitHub site:

I wrote the scripts in this thread just for practice. However, I thought I would investigate how the sdiff utility works to see if I might modify my script to work similarly. Unfortunately, the operation of sdiff is not clear to me.

The sdiff man page describes its operation as follows:

sdiff displays two files side by side, with any differences between the two highlighted as follows: new lines are marked with ‘>’; deleted lines are marked with ‘<’; and changed lines are marked with ‘|’.

The man page does not explain what constitutes a new line, deleted line, or changed line. I thought an example might help, so I created two test files. File One contained:

line one
no match

line three
no match 1
line five

File Two contained:

line one
no match 2
line three
no match

line five

I couldn’t get sdiff to run with the do shell script command. I instead ran the test command in a Terminal window and got the following:

I do not understand the above result and wondered if someone could explain it to me.

In a related question, what comparison logic should be employed with the following files:

The sdiff utility returns the following, which does make sense to me:

I renamed your text files to avoid any noise related to spaces in names, and deleted the spaces at the beginning of each line to simplify posting here.

do shell script "cd ~/Desktop; sdiff fileOne fileTwo"
error "line one						line one
							      >	no match 2
							      >	line three
no match							no match

line three						      <
no match 1						      <
line five							line five" number 1

The key, I think, to understanding the output is that, unlike many shell utilities, sdiff isn’t working on a line-by-line basis. It compares the entire files and tries to line up the lines before performing the diff.

  • The line no match 2 is only in fileTwo, so it is returned with a > indicator.
  • line three in fileTwo is not matched before the next common line, so it also gets >.
  • The no match text is line 2 of fileOne and line 4 of fileTwo, so that’s a match.
  • line three in fileOne comes after that match, while the copy in fileTwo comes before it, so the two cannot be paired and this one is returned with a < indicator.
  • no match 1 is only in fileOne, therefore <.

In essence, the diff hinges on the no match line in each file. Another way to think of it is that there are actually three diffs being performed:

  1. zone above the no match line
  2. no match line
  3. zone below the no match line

Since fileOne has only a single line above while fileTwo has three such lines, there are two > in the above zone. This is reversed in the below zone, where fileOne has three lines and fileTwo has only one; therefore there are two < in the below zone.

NB: error number 1 is sdiff’s exit status, which is non-zero when the files have mismatches; do shell script treats any non-zero status as an error, which is why the output arrives as an error. If you redirect the output to a file by appending >fileThree, the file’s contents will match the Terminal output without the error. Script Editor will still throw an error, which looks more sensible but is actually the same error: error "The command exited with a non-zero status." number 1, just without the returned results.

do shell script "cd ~/Desktop; sdiff fileOne fileTwo >fileThree"

It’s a weird command that way, and I’m not sure how to handle it in Script Editor other than by redirecting to a file.
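Another option, sketched under the assumption that sdiff follows the usual diff convention of exiting with 0 (no differences), 1 (differences found) or more than 1 (trouble): let the shell turn statuses 0 and 1 into success, so do shell script returns the side-by-side output as an ordinary result. The /tmp file names are hypothetical demo files written only for this example.

```applescript
-- Demo files in /tmp (hypothetical names), written only for this example
do shell script "printf 'line one\\nno match\\n' > /tmp/sdiffOne.txt; printf 'line one\\nno match 2\\n' > /tmp/sdiffTwo.txt"

-- sdiff exits with status 1 when the files differ, which do shell script reports as an error.
-- "test $? -le 1" makes statuses 0 and 1 succeed while a real failure (status 2 or more) still errors.
set sdiffOutput to do shell script "sdiff /tmp/sdiffOne.txt /tmp/sdiffTwo.txt; test $? -le 1"
```

Using test $? -le 1 rather than || true keeps genuine errors, such as a missing file, visible.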

You may find this GNU diffutils documentation helpful. It includes an example that demonstrates what I meant above by a ‘file’ versus ‘line’ approach.


I find it helps to think of it not so much as “here are the differences between these two files” as it is “here are the changes needed to turn the first file into the second file”. You take the input from file 1, then insert|delete|change lines as needed to get the output of file2.
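The “changes needed to turn the first file into the second” view can be sketched in AppleScript with a longest-common-subsequence (LCS) table. This is a plain LCS alignment, not the algorithm diff itself uses, so where several alignments are equally good it may choose a different one than sdiff. It also takes time and memory proportional to the product of the two line counts, so it suits small files. alignLines and linesMatch are hypothetical names; the markers follow sdiff: < for a line only in the left list and > for a line only in the right list.

```applescript
-- Hypothetical handlers: a plain LCS alignment, not the algorithm diff itself uses.
-- Markers follow sdiff: "=" same line, "<" only in the left list, ">" only in the right list.

on linesMatch(lineA, lineB)
	considering case
		return lineA = lineB
	end considering
end linesMatch

on alignLines(leftLines, rightLines)
	set m to count leftLines
	set n to count rightLines
	-- item j of item i of lcsTable = length of the LCS of left lines i..m and right lines j..n
	set lcsTable to {}
	repeat (m + 1) times
		set tableRow to {}
		repeat (n + 1) times
			set end of tableRow to 0
		end repeat
		set end of lcsTable to tableRow
	end repeat
	-- Fill the table from the bottom-right corner; the extra last row and column stay 0
	repeat with i from m to 1 by -1
		repeat with j from n to 1 by -1
			if linesMatch(item i of leftLines, item j of rightLines) then
				set item j of item i of lcsTable to (item (j + 1) of item (i + 1) of lcsTable) + 1
			else
				set skipLeft to item j of item (i + 1) of lcsTable
				set skipRight to item (j + 1) of item i of lcsTable
				if skipLeft ≥ skipRight then
					set item j of item i of lcsTable to skipLeft
				else
					set item j of item i of lcsTable to skipRight
				end if
			end if
		end repeat
	end repeat
	-- Walk from the top-left corner, taking whichever step keeps the longest common run
	set theRows to {}
	set i to 1
	set j to 1
	repeat while i ≤ m and j ≤ n
		set leftLine to item i of leftLines
		set rightLine to item j of rightLines
		if linesMatch(leftLine, rightLine) then
			set end of theRows to leftLine & "  =  " & rightLine
			set i to i + 1
			set j to j + 1
		else if (item j of item (i + 1) of lcsTable) ≥ (item (j + 1) of item i of lcsTable) then
			set end of theRows to leftLine & "  <"
			set i to i + 1
		else
			set end of theRows to "  >  " & rightLine
			set j to j + 1
		end if
	end repeat
	-- Whatever is left over exists in only one list
	repeat with k from i to m
		set end of theRows to (item k of leftLines) & "  <"
	end repeat
	repeat with k from j to n
		set end of theRows to "  >  " & (item k of rightLines)
	end repeat
	return theRows
end alignLines

set sideBySide to alignLines({"line one", "no match", "line five"}, {"line one", "no match 2", "no match", "line five"})
```

In this example the extra no match 2 line is reported once with >, and the remaining lines line up with =, rather than every line after the insertion being marked as different.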


Thanks @Mockman and @roosterboy for the great explanations. I think I understand now.

I have some preliminary thoughts on how I might get my script to work in a similar fashion. I’ll work on that.

I tested the script contained below with the two test files in post 11, and the script returned the exact same results as the sdiff utility. For example:

The script needs further testing and optimization, but I was encouraged to make this much progress. The sdiff utility has a “|” comparison character, and I don’t know where that would apply. Also, the sdiff utility considers incomplete lines, but my script doesn’t.

use framework "Foundation"
use scripting additions

on main()
	set fileOne to "/Users/Robert/Documents/File One.txt" --set to desired value or replace with dialog
	set stringOne to current application's NSString's stringWithContentsOfFile:fileOne encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	set stringOne to cleanString(stringOne) --disable if desired
	set arrayOne to stringOne's componentsSeparatedByString:linefeed
	set arrayOneCount to arrayOne's |count|()
	
	set fileTwo to "/Users/Robert/Documents/File Two.txt" --set to desired value or replace with dialog
	set stringTwo to current application's NSString's stringWithContentsOfFile:fileTwo encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	set stringTwo to cleanString(stringTwo) --disable if desired
	set arrayTwo to stringTwo's componentsSeparatedByString:linefeed
	
	set theDifference to current application's NSMutableArray's new()
	set emptyString to current application's NSString's stringWithString:""
	set adjustedArrayTwo to arrayTwo's mutableCopy()
	
	repeat with i from 0 to (arrayOneCount - 1)
		set stringOne to (arrayOne's objectAtIndex:i)
		set substringOne to getSubstring(stringOne)
		try
			set stringTwo to (arrayTwo's objectAtIndex:i)
			set substringTwo to getSubstring(stringTwo)
		on error
			set stringTwo to emptyString
			set substringTwo to emptyString
		end try
		if (stringOne's compare:stringTwo options:129) is 0 then --fileOne line and fileTwo line match
			set theString to current application's NSString's stringWithFormat_("| %@ | = | %@ |", substringOne, substringTwo)
			(adjustedArrayTwo's removeObjectAtIndex:0)
			(theDifference's addObject:theString)
		else if (adjustedArrayTwo's containsObject:stringOne) is false then --fileOne line is not in fileTwo
			set theString to current application's NSString's stringWithFormat_("| %@ | < | %@ |", substringOne, emptyString)
			(theDifference's addObject:theString)
		else --add unmatched fileTwo lines then matched fileTwo lines then exit repeat
			set subtractArray to current application's NSMutableArray's new()
			repeat with anItem in adjustedArrayTwo
				if (stringOne's compare:anItem options:129) is not 0 then
					set theString to current application's NSString's stringWithFormat_("| %@ | > | %@ |", emptyString, anItem)
					(subtractArray's addObject:anItem)
					(theDifference's addObject:theString)
				else
					set theString to current application's NSString's stringWithFormat_("| %@ | = | %@ |", substringOne, anItem)
					(subtractArray's addObject:anItem)
					(theDifference's addObject:theString)
					exit repeat
				end if
			end repeat
			(adjustedArrayTwo's removeObjectsInArray:subtractArray)
		end if
	end repeat
	
	repeat with anItem in adjustedArrayTwo --remaining items in arrayTwo
		set theString to current application's NSString's stringWithFormat_("| %@ | > | %@ |", emptyString, anItem)
		(theDifference's addObject:theString)
	end repeat
	
	set theDifference to (theDifference's componentsJoinedByString:linefeed)
	set theHeader to "| File One | Comparison | File Two |"
	set theFormatter to "| :--- | :---: | :--- |"
	set theLinefeed to linefeed
	set theString to current application's NSString's stringWithFormat_("%@%@%@%@%@", theHeader, theLinefeed, theFormatter, theLinefeed, theDifference)
	set theFolder to current application's NSHomeDirectory()'s stringByAppendingPathComponent:"Desktop"
	set theFile to theFolder's stringByAppendingPathComponent:"File Compare.md"
	theString's writeToFile:theFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
end main

on cleanString(theString) --enable or disable any of the following
	set theString to theString's stringByReplacingOccurrencesOfString:"(?m)^\\h+|\\h+$" withString:"" options:1024 range:{0, theString's |length|()} --remove leading and trailing whitespace every line
	--set theString to theString's stringByReplacingOccurrencesOfString:"^\\s*\\R" withString:"" options:1024 range:{0, theString's |length|()} --remove blank lines at beginning of string
	set theString to theString's stringByReplacingOccurrencesOfString:"\\s*$" withString:"" options:1024 range:{0, theString's |length|()} --remove blank lines at end of string
	--set theString to theString's stringByReplacingOccurrencesOfString:"(\\R)\\s*\\R" withString:"$1" options:1024 range:({0, theString's |length|()}) --remove blank lines except at beginning and end of string
end cleanString

on getSubstring(stringOne)
	set theRange to stringOne's rangeOfString:".{0,50}" options:1024 --set 50 to desired value
	set stringTwo to (stringOne's substringWithRange:theRange)
	if (stringOne's compare:stringTwo options:129) is not 0 then set stringTwo to stringTwo's stringByAppendingString:"..."
	return stringTwo
end getSubstring

main()

Considering…

file docleft

line two  
line three  
line four  
line Five and dimes

file docright

line one
line two
line three
line four
line fives and dime

How would your script handle the fact that docleft is missing a line at the top compared to docright? If your approach is exclusively line by line, then every line will be considered different, even though most of the two documents’ lines are identical.

I included the minor discrepancies in the last line (the ‘F’ and the plurals) to show when a ‘|’ indicator might appear. Either of the discrepancies is sufficient to cause this.


Ken. Thanks for looking at my script and for the helpful example. My script contained an error, which I’ve fixed. In other respects, I won’t be able to make my script functionally equivalent to sdiff. FWIW, the results from my revised script and from sdiff:

I thought I would add a quick final note to this thread.

The sdiff utility uses the diff utility to compare two files, and the page linked by Mockman describes how diff operates:

When comparing two files, diff finds sequences of lines common to both files, interspersed with groups of differing lines called hunks. Comparing two identical files yields one sequence of common lines and no hunks, because no lines differ. Comparing two entirely different files yields no common lines and one large hunk that contains all lines of both files. In general, there are many ways to match up lines between two given files. diff tries to minimize the total hunk size by finding large sequences of common lines interspersed with small hunks of differing lines.
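The matching described in the quoted passage can be illustrated with a longest-common-subsequence (LCS) table. The following is a minimal sketch in plain AppleScript, not how diff itself is implemented (real diff implementations use faster algorithms plus extra heuristics). The handler name lcsAlign is hypothetical; it marks a common line with two spaces and uses sdiff's "< " and "> " for lines found only in the left or right list. It fills an (m + 1) x (n + 1) table in which each cell holds the LCS length of the remaining lines, then walks the table from the top-left corner to emit the alignment.

```applescript
-- Hypothetical sketch: align two lists of lines with a longest-common-subsequence table
on lcsAlign(leftLines, rightLines)
	set m to count leftLines
	set n to count rightLines
	-- (m + 1) x (n + 1) table of zeros; item j of item i will hold the LCS length
	-- of leftLines items i onward and rightLines items j onward
	set theTable to {}
	repeat with i from 1 to m + 1
		set aRow to {}
		repeat with j from 1 to n + 1
			set end of aRow to 0
		end repeat
		set end of theTable to aRow
	end repeat
	set theOutput to {}
	considering case
		-- fill the table starting from the bottom-right corner
		repeat with i from m to 1 by -1
			repeat with j from n to 1 by -1
				if item i of leftLines is item j of rightLines then
					set item j of item i of theTable to (item (j + 1) of item (i + 1) of theTable) + 1
				else
					set downValue to item j of item (i + 1) of theTable
					set rightValue to item (j + 1) of item i of theTable
					if downValue ≥ rightValue then
						set item j of item i of theTable to downValue
					else
						set item j of item i of theTable to rightValue
					end if
				end if
			end repeat
		end repeat
		-- walk the table, keeping common lines and reporting the rest
		set i to 1
		set j to 1
		repeat while i ≤ m and j ≤ n
			if item i of leftLines is item j of rightLines then
				set end of theOutput to "  " & item i of leftLines
				set i to i + 1
				set j to j + 1
			else if (item j of item (i + 1) of theTable) ≥ (item (j + 1) of item i of theTable) then
				set end of theOutput to "< " & item i of leftLines
				set i to i + 1
			else
				set end of theOutput to "> " & item j of rightLines
				set j to j + 1
			end if
		end repeat
	end considering
	-- any leftover lines exist on one side only
	repeat while i ≤ m
		set end of theOutput to "< " & item i of leftLines
		set i to i + 1
	end repeat
	repeat while j ≤ n
		set end of theOutput to "> " & item j of rightLines
		set j to j + 1
	end repeat
	return theOutput
end lcsAlign

lcsAlign({"line two", "line three", "line four", "line Five and dimes"}, {"line one", "line two", "line three", "line four", "line fives and dime"})
--> {"> line one", "  line two", "  line three", "  line four", "< line Five and dimes", "> line fives and dime"}
```

Applied to Ken's example, the three shared lines are matched, and only the missing first line and the differing last lines are reported. Because the table holds roughly m × n cells in an AppleScript list, this sketch is only practical for modest files.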

Obviously, my AppleScript is not going to match this functionality, and I never should have attempted this. However, I did learn a lot, and that’s always a good thing.


The script included below roughly mimics the functionality of the comm utility, which is described in its man page as:

The comm utility reads file1 and file2, which should be sorted lexically, and produces three text columns as output: lines only in file1; lines only in file2; and lines in both files.
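Because comm's inputs are sorted, all three columns can be found in a single pass that always advances through whichever file currently holds the smaller line. The sketch below illustrates that kind of merge in plain AppleScript; it is an assumption about the general technique, not comm's source code. The handler name commColumns is hypothetical, and both lists must already be sorted in the order AppleScript's < operator uses under considering case, which the lowercase test data satisfies. By contrast, the ASObjC script below matches lines with containsObject: and removes each match, so its matching does not depend on sort order; sorting only affects the order of its output.

```applescript
-- Hypothetical sketch of a sorted-merge comparison; both lists must be pre-sorted
on commColumns(leftLines, rightLines)
	set leftOnly to {}
	set rightOnly to {}
	set bothFiles to {}
	set m to count leftLines
	set n to count rightLines
	set i to 1
	set j to 1
	considering case
		-- advance whichever side holds the smaller line; equal lines go to column three
		repeat while i ≤ m and j ≤ n
			set leftLine to item i of leftLines
			set rightLine to item j of rightLines
			if leftLine is rightLine then
				set end of bothFiles to leftLine
				set i to i + 1
				set j to j + 1
			else if leftLine < rightLine then
				set end of leftOnly to leftLine
				set i to i + 1
			else
				set end of rightOnly to rightLine
				set j to j + 1
			end if
		end repeat
	end considering
	-- whatever remains on either side has no partner in the other list
	repeat while i ≤ m
		set end of leftOnly to item i of leftLines
		set i to i + 1
	end repeat
	repeat while j ≤ n
		set end of rightOnly to item j of rightLines
		set j to j + 1
	end repeat
	return {leftOnly, rightOnly, bothFiles}
end commColumns

commColumns({"apple", "banana", "banana", "cherry"}, {"banana", "cherry", "date"})
--> {{"apple", "banana"}, {"date"}, {"banana", "cherry"}}
```

The duplicate "banana" is matched only once, and the extra copy is reported as unique to the left list. comm treats repeated lines the same way, as does the script's removeObjectAtIndex: step.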

The contents of my test files were:

The output of the comm utility was:

The output of my script was:

Note the settings in the script’s cleanString and getSubstring handlers. Using two marginally different versions of the script as test files, the timing result was 38 milliseconds.

--revised 2024.08.25

use framework "Foundation"
use scripting additions

on main()
	set fileOne to "/Users/Robert/Documents/File One.txt" --set to desired value or replace with dialog
	set stringOne to current application's NSString's stringWithContentsOfFile:fileOne encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	if stringOne is (missing value) then display dialog "File One not found" buttons {"OK"} cancel button 1 default button 1
	set stringOne to cleanString(stringOne) --disable if desired
	set arrayOne to (stringOne's componentsSeparatedByString:linefeed)
	set arrayOne to (arrayOne's sortedArrayUsingSelector:"localizedStandardCompare:") --disable if desired and below
	
	set fileTwo to "/Users/Robert/Documents/File Two.txt" --set to desired value or replace with dialog
	set stringTwo to current application's NSString's stringWithContentsOfFile:fileTwo encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	if stringTwo is (missing value) then display dialog "File Two not found" buttons {"OK"} cancel button 1 default button 1
	set stringTwo to cleanString(stringTwo) --disable if desired
	set arrayTwo to (stringTwo's componentsSeparatedByString:linefeed)
	set arrayTwo to (arrayTwo's sortedArrayUsingSelector:"localizedStandardCompare:") --disable if desired and above
	
	set arrayOneUnique to current application's NSMutableArray's new()
	set arrayTwoUnique to arrayTwo's mutableCopy()
	set commonArray to current application's NSMutableArray's new()
	repeat with anItem in arrayOne
		if (arrayTwoUnique's containsObject:anItem) is true then --item in both arrays
			(commonArray's addObject:anItem)
			(arrayTwoUnique's removeObjectAtIndex:(arrayTwoUnique's indexOfObject:anItem))
		else --item in arrayOne only
			(arrayOneUnique's addObject:anItem)
		end if
	end repeat
	
	set theCounts to {arrayOneUnique's |count|(), arrayTwoUnique's |count|(), commonArray's |count|()}
	set theCounts to current application's NSArray's arrayWithArray:theCounts
	set theCount to (theCounts's valueForKeyPath:"@max.self") as integer
	
	set theTable to current application's NSMutableArray's new()
	set emptyString to current application's NSString's stringWithString:""
	repeat with i from 0 to (theCount - 1)
		try
			set columnOne to getSubstring(arrayOneUnique's objectAtIndex:i)
		on error
			set columnOne to emptyString
		end try
		try
			set columnTwo to getSubstring(arrayTwoUnique's objectAtIndex:i)
		on error
			set columnTwo to emptyString
		end try
		try
			set columnThree to getSubstring(commonArray's objectAtIndex:i)
		on error
			set columnThree to emptyString
		end try
		set aRow to current application's NSString's stringWithFormat_("| %@ | %@ | %@ |", columnOne, columnTwo, columnThree)
		(theTable's addObject:aRow)
	end repeat
	set theTable to (theTable's componentsJoinedByString:linefeed)
	
	set theHeader to "| File One Only | File Two Only | Both Files |"
	set theFormatter to "| :--- | :--- | :--- |"
	set theLinefeed to linefeed
	set theString to current application's NSString's stringWithFormat_("%@%@%@%@%@", theHeader, theLinefeed, theFormatter, theLinefeed, theTable)
	set theFolder to current application's NSHomeDirectory()'s stringByAppendingPathComponent:"Desktop"
	set theFile to theFolder's stringByAppendingPathComponent:"File Compare.md"
	theString's writeToFile:theFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
end main

on cleanString(theString) --disable any of the following
	set theString to theString's stringByReplacingOccurrencesOfString:"([*#$>|])" withString:"\\\\$1" options:1024 range:{0, theString's |length|()} --escape markdown characters
	set theString to theString's stringByReplacingOccurrencesOfString:"(?m)^\\h+|\\h+$" withString:"" options:1024 range:{0, theString's |length|()} --remove leading and trailing whitespace every line
	set theString to theString's stringByReplacingOccurrencesOfString:"^\\s*\\R|\\s*$" withString:"" options:1024 range:{0, theString's |length|()} --remove blank lines at beginning and end of string
	set theString to theString's stringByReplacingOccurrencesOfString:"(\\R)\\s*\\R" withString:"$1" options:1024 range:({0, theString's |length|()}) --remove blank lines except at beginning and end of string
	return theString --explicit return, so the handler still works whichever lines above are disabled
end cleanString

on getSubstring(theString) --truncate lines at 50 characters
	set theRange to theString's rangeOfString:".{0,50}" options:1024
	set theSubstring to (theString's substringWithRange:theRange)
	if (theString's compare:theSubstring options:129) is not 0 then set theSubstring to theSubstring's stringByAppendingString:"..."
	return theSubstring
end getSubstring

main()

The comm utility has an option (-3) that suppresses lines common to both files, and the following script mimics this behavior:

use framework "Foundation"
use scripting additions

on main()
	set fileOne to "/Users/Robert/Documents/File One.txt" --set to desired value or replace with dialog
	set stringOne to current application's NSString's stringWithContentsOfFile:fileOne encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	if stringOne is (missing value) then display dialog "File One not found" buttons {"OK"} cancel button 1 default button 1
	set stringOne to cleanString(stringOne) --disable if desired
	set arrayOne to (stringOne's componentsSeparatedByString:linefeed)
	set arrayOne to (arrayOne's sortedArrayUsingSelector:"localizedStandardCompare:") --disable if desired and below
	
	set fileTwo to "/Users/Robert/Documents/File Two.txt" --set to desired value or replace with dialog
	set stringTwo to current application's NSString's stringWithContentsOfFile:fileTwo encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	if stringTwo is (missing value) then display dialog "File Two not found" buttons {"OK"} cancel button 1 default button 1
	set stringTwo to cleanString(stringTwo) --disable if desired
	set arrayTwo to (stringTwo's componentsSeparatedByString:linefeed)
	set arrayTwo to (arrayTwo's sortedArrayUsingSelector:"localizedStandardCompare:") --disable if desired and above
	
	set arrayOneUnique to current application's NSMutableArray's new()
	set arrayTwoUnique to arrayTwo's mutableCopy()
	repeat with anItem in arrayOne
		if (arrayTwoUnique's containsObject:anItem) is true then --item in both arrays
			(arrayTwoUnique's removeObjectAtIndex:(arrayTwoUnique's indexOfObject:anItem))
		else --item in arrayOne only
			(arrayOneUnique's addObject:anItem)
		end if
	end repeat
	
	set theCount to arrayOneUnique's |count|()
	set theOtherCount to arrayTwoUnique's |count|()
	if theOtherCount is greater than theCount then set theCount to theOtherCount
	
	set theTable to current application's NSMutableArray's new()
	set emptyString to current application's NSString's stringWithString:""
	repeat with i from 0 to (theCount - 1)
		try
			set columnOne to getSubstring(arrayOneUnique's objectAtIndex:i)
		on error
			set columnOne to emptyString
		end try
		try
			set columnTwo to getSubstring(arrayTwoUnique's objectAtIndex:i)
		on error
			set columnTwo to emptyString
		end try
		set aRow to current application's NSString's stringWithFormat_("| %@ | %@ |", columnOne, columnTwo)
		(theTable's addObject:aRow)
	end repeat
	set theTable to (theTable's componentsJoinedByString:linefeed)
	
	set theHeader to "| File One Only | File Two Only |"
	set theFormatter to "| :--- | :--- |"
	set theLinefeed to linefeed
	set theString to current application's NSString's stringWithFormat_("%@%@%@%@%@", theHeader, theLinefeed, theFormatter, theLinefeed, theTable)
	set theFolder to current application's NSHomeDirectory()'s stringByAppendingPathComponent:"Desktop"
	set theFile to theFolder's stringByAppendingPathComponent:"File Compare.md"
	theString's writeToFile:theFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
end main

on cleanString(theString) --disable any of the following
	set theString to theString's stringByReplacingOccurrencesOfString:"([*#$>|])" withString:"\\\\$1" options:1024 range:{0, theString's |length|()} --escape markdown characters
	set theString to theString's stringByReplacingOccurrencesOfString:"(?m)^\\h+|\\h+$" withString:"" options:1024 range:{0, theString's |length|()} --remove leading and trailing whitespace every line
	set theString to theString's stringByReplacingOccurrencesOfString:"^\\s*\\R|\\s*$" withString:"" options:1024 range:{0, theString's |length|()} --remove blank lines at beginning and end of string
	set theString to theString's stringByReplacingOccurrencesOfString:"(\\R)\\s*\\R" withString:"$1" options:1024 range:({0, theString's |length|()}) --remove blank lines except at beginning and end of string
	return theString --explicit return, so the handler still works whichever lines above are disabled
end cleanString

on getSubstring(theString) --truncate lines at 50 characters
	set theRange to theString's rangeOfString:".{0,50}" options:1024
	set theSubstring to (theString's substringWithRange:theRange)
	if (theString's compare:theSubstring options:129) is not 0 then set theSubstring to theSubstring's stringByAppendingString:"..."
	return theSubstring
end getSubstring

main()

A screenshot of script output where the test files were scripts: