Comparing files in different folders

Let me try to make this as simple as possible…

How could you compare two files in two different folders to see if they are the same or different? Any help would be greatly appreciated…

Well, do you only care about comparing the data forks of the files in question? In other words, only the part that UNIX, Linux, and Windows machines would see? Examples: text, jpeg, html, etc.

If you need to include the resource fork (e.g. Mac applications, Mac-specific data files), then this won’t work. Otherwise, all you need to do is this:


set someFile1 to "Macintosh HD:Users:YOU:somefile1"
set someFile2 to "Macintosh HD:Users:YOU:somefile2"

dataForkSame(someFile1, someFile2)

on dataForkSame(someFile1, someFile2)
	-- version 1.0, Daniel A. Shockley, http://www.danshockley.com
	-- compares the data forks (not resource forks!) of two files.
	-- It returns the boolean TRUE if they are the same
	
	try
		-- first convert the Mac file path/alias to a quoted Posix path
		set someFile1 to quoted form of POSIX path of (someFile1 as string)
		set someFile2 to quoted form of POSIX path of (someFile2 as string)
		
		do shell script "diff " & someFile1 & " " & someFile2
		return true
	on error errMsg number errNum
		-- diff exits with status 1 when the files differ, so catch that
		-- (checking the status also covers text files, where diff prints the changes instead of "differ")
		if errNum is 1 then
			return false
		else
			error "dataForkSame FAILED: " & errMsg number errNum
		end if
	end try
end dataForkSame

I’m not sure what tool you should use if you need to compare the resource forks as well. Anyone got one? I’m not sure when you’d need to compare those, though. What is it you’re trying to accomplish by this?

I also don’t understand what you are trying to accomplish. If you are just looking for exact duplicates without going into the contents, this may help:

compare_files((choose file with prompt "Choose File 1:"), (choose file with prompt "Choose File 2:"))

on compare_files(file_1, file_2)
	set {info_1, info_2} to {info for (file_1), info for (file_2)}
	if (name of info_1) is not (name of info_2) then
		return false
	else if (creation date of info_1) is not (creation date of info_2) then
		return false
	else if (modification date of info_1) is not (modification date of info_2) then
		return false
	else if (size of info_1) is not (size of info_2) then
		return false
	end if
	return true
end compare_files

Jon

Would md5sum be superior to diff in this exercise? I was wondering if this would also include the resource fork in the comparison.

Andy

Unfortunately, md5 also does not include the resource fork. I modified the resource fork of a copy of my testfile1 and they both still show the same md5sum.
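For anyone who wants to try the checksum route anyway, here is a rough sketch of what I mean. The function names (file_md5, same_md5) are made up for this post, and it assumes either md5sum or the Mac's md5 command is on your path. Like diff, it only sees the data fork.

```shell
# Print the MD5 digest of a file's data fork, using whichever tool is installed.
# (file_md5 and same_md5 are hypothetical helper names, not standard commands.)
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        # md5sum prints "<digest>  -" when reading stdin; keep only the digest
        md5sum < "$1" | awk '{print $1}'
    elif command -v md5 >/dev/null 2>&1; then
        # the BSD/Mac md5 tool prints just the digest with -q
        md5 -q "$1"
    else
        echo "no md5 tool found" >&2
        return 2
    fi
}

# Succeed (exit status 0) only when both files have the same digest.
same_md5() {
    a=$(file_md5 "$1") || return 2
    b=$(file_md5 "$2") || return 2
    [ "$a" = "$b" ]
}
```

From AppleScript you could call this through do shell script, the same way the diff example does.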

Jonn8, I believe he is trying to see “whether two files are the same or different.” If they are the same data, but have different modification dates, most likely he would want to know they are the same. Also, even if they were created simultaneously, modified simultaneously (both are second-resolution, so this is possible), and have the same size, they may have different contents.

In short, it would be helpful for stuel (the OP) to chime in and explain in more detail what particular things he needs to compare (the files’ content, their metadata, etc.) and whether he needs to account for resource forks or can ignore them.

Stuel? Help? :)

Thank you for your replies… Sorry I was too simple and unclear about it. Right now we keep electronic case files. Every file in the Case folder is a scanned PDF. We have a remote employee and someone in the office who each have a copy of the same case folder, and both are making changes to the files in it (adding comments…). The remote employee comes into the office about once a week. He wants to be able to run a script that compares the two copies of a particular case folder (his copy and the office person’s copy) and then updates each person’s folder so that both reflect the changes both people made. Hope this makes more sense…

Good news on one front! If the files you want to compare the contents of are PDFs, then my example using ‘diff’ will work for you.

Now, however, you have to face another problem: what if BOTH users made changes to the SAME file? That gets a lot tougher. You need something that can compare the contents of your PDFs specifically, and give some interface where the user can merge changes together.

Perhaps you’ve already thought of this, and if the two files are different, you just give each user a copy of both, with the user’s name appended to the file name?
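To make that idea concrete, here is a minimal sketch in shell. It is only my assumption of how such a sync could work: the function name sync_case_folders, its arguments, and the name-appending scheme are all hypothetical, it only looks at *.pdf files directly inside each folder, and it does no merging. Files that exist on one side only are copied across; files that differ get copied into both folders with each user's name appended, and the original on each side is left alone.

```shell
# Hypothetical sketch: reconcile two copies of a case folder.
# Usage: sync_case_folders REMOTE_DIR OFFICE_DIR REMOTE_NAME OFFICE_NAME
sync_case_folders() {
    remote_dir=$1 office_dir=$2 remote_name=$3 office_name=$4

    # Pass 1: walk the remote copy.
    for f in "$remote_dir"/*.pdf; do
        [ -e "$f" ] || continue              # the glob matched nothing
        name=$(basename "$f")
        base=${name%.pdf}
        if [ ! -e "$office_dir/$name" ]; then
            # Only the remote user has this file: give the office a copy.
            cp -p "$f" "$office_dir/$name"
        elif ! cmp -s "$f" "$office_dir/$name"; then
            # Both have it but the bytes differ: put both versions,
            # tagged with the user's name, into both folders.
            for dest in "$remote_dir" "$office_dir"; do
                cp -p "$f" "$dest/$base-$remote_name.pdf"
                cp -p "$office_dir/$name" "$dest/$base-$office_name.pdf"
            done
        fi
    done

    # Pass 2: copy files that exist only in the office copy.
    for f in "$office_dir"/*.pdf; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        [ -e "$remote_dir/$name" ] || cp -p "$f" "$remote_dir/$name"
    done
}
```

Try it on throwaway copies of the folders first. Somebody still has to open the two tagged versions and merge the comments by hand.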

cmp may be a better choice than diff as it has less overhead. cmp just reports if the files are the same or not, whereas diff, by default, will output the differences of the two files. But since PDF files are not text files, this may be a moot point. YMMV.
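A quick sketch of the cmp approach, for what it's worth. The wrapper name same_bytes is just something I made up; the -s flag keeps cmp silent so that only the exit status matters (0 for identical, 1 for different, higher for errors such as a missing file).

```shell
# Hypothetical wrapper: succeed (exit status 0) when the two files are byte-identical.
# cmp -s prints nothing; it reports the result through its exit status only.
same_bytes() {
    cmp -s "$1" "$2"
}

# Example use (the paths are placeholders):
#   if same_bytes "/path/to/copy1.pdf" "/path/to/copy2.pdf"; then
#       echo "same"
#   else
#       echo "different (or one file could not be read)"
#   fi
```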

True. It looks as though cmp would be noticeably (?) faster on large files where there are many differences. It sounds as though, in the OP’s case, he will be comparing fairly similar files. It may even be useful to see what changes there are, although as you noted this won’t help much with PDF files.

If you want to actually get quick overviews of changes to the text within a PDF, take a look at pdftotext. It’s a command-line utility that converts PDFs to text, and it has an option that attempts (not always very well) to preserve page layout. The main site for the package pdftotext comes with (xpdf) is http://www.foolabs.com/xpdf/, but the first link is just pdftotext. Xpdf requires a lot more work, including compiling the entire package yourself. I believe the first link is a pre-compiled binary of pdftotext. You could also use fink to install the whole xpdf package.
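Roughly, the idea would look like this in shell. Treat it as a sketch under a few assumptions: pdftotext is installed and on your path, the function name pdf_text_diff is my own invention, and scanned PDFs with no text layer will give little or no text to compare.

```shell
# Hypothetical helper: diff the extracted text of two PDFs.
# Exit status: 0 = same text, 1 = text differs, 2 = could not run the comparison.
pdf_text_diff() {
    command -v pdftotext >/dev/null 2>&1 || {
        echo "pdftotext is not installed" >&2
        return 2
    }
    tmp=$(mktemp -d) || return 2
    # -layout asks pdftotext to try to keep the page layout
    pdftotext -layout "$1" "$tmp/a.txt" &&
        pdftotext -layout "$2" "$tmp/b.txt" || {
        rm -rf "$tmp"
        return 2
    }
    diff "$tmp/a.txt" "$tmp/b.txt"
    status=$?
    rm -rf "$tmp"
    return $status
}
```

The diff output goes to standard output, so from AppleScript you could capture it with do shell script, keeping in mind that a non-zero exit status raises an error there.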