Convert Mac CR to Windows CRLF

This one has been driving me and the web developer crazy:

I have a Mac FileMaker database that exports a text file that is used to feed the display of a web site.

Previously, an Excel macro under Windows created the text file. That text file works fine, and my FileMaker export looks no different cosmetically from the original Windows file.

The problem is, the import routine on the web end needs the Windows-style CRLF (carriage return and line feed) line endings. I’ve tried to get AppleScript to put the codes in via TIDs (ASCII character 13 & ASCII character 10, or “\r\n”), but it doesn’t seem to work (the developer’s script is still not working… it reads my data as one huge record).

When he converts my file manually with Coda, all is well.

My FileMaker-generated file uses “\r” as the line ending.

I tried it via the command line (using do shell script, which is where the AppleScript comes in), but I’m not making any progress there either, probably because I don’t understand what I’m doing:

Got this off the web for UNIX-to-Windows conversions:

awk 'sub("$", "\r")' unixfile.txt > winfile.txt 

(suspicious of this one because I don’t see the “\n” Windows needs)

Thought I was going somewhere with this, but Mac doesn’t seem to have a command-line Perl?

perl -p -e 's/\n/\r\n/' < unixfile.txt > winfile.txt

Found some Perl scripts that can be turned into command-line executables, but I know less about working with Perl than I do about the command line in general.

THE IDEA is that I get this working so FileMaker can kick off the script to “repair” its own exports. It currently does this to convert “vertical tab” (its native record delimiter) to a Mac-style “return” (\r).

Okay, my head hurts just explaining all that. :wink:

SHORT FORM: need a command-line way to convert Mac line endings to Windows-style line endings.

This works for me:


do shell script "cd ~/desktop; awk 'sub(\"$\",\"\\r\")' unixfile.txt > winfile.txt"

As you can see, the biggest problem is often escaping the quotes and backslashes for AppleScript so that they are passed to the actual shell script.

This also worked:


do shell script "cd ~/desktop; perl -p -e 's/\\n/\\r\\n/' < unixfile.txt > winfile.txt"

P.S. I recommend checking [x] Escape tabs and line breaks in strings, in AppleScript Editor’s Preferences, Editing.

hi,

with satimage.osax may be as simple as

convert to Windows string

or you could use


--gives a list of all encodings
do shell script "iconv -l "

--how it works -> from encoding to encoding inputfile
set NewEncodedText to do shell script "iconv -f UTF-8 -t CP1252 " & quoted form of POSIX path of MyTextFile



awk and Perl didn’t work (I mean, they completed without a hitch, but still aren’t doing something the Windows world expects).

I also tried switching FileMaker to export as “Windows (ANSI)”, but that screws-up the French characters in the data, so went back to “Macintosh.”

I need to try Hans-Gerd’s ideas next; I just have to move on to another fire short-term. I’ve downloaded Satimage, and will also look at the file with iconv (which may give me clues by checking out the old Windows-Excel-macro-generated data files compared to the new FileMaker-generated ones).

In the meantime, the dev just has to open my data files in Coda and convert the line endings that way. One step instead of the dozens he had before, at least it’s an improvement. :wink:

Those last two commands will convert line endings from Unix-style to DOS/Windows-style, but it looks like you need a Mac-style to DOS/Windows-style conversion.

Unix line ending is LF. Old-Mac line ending is CR (“old Mac” because many Mac OS X programs/environments use Unix ending). DOS line ending is CR LF.

So, Unix→DOS means “add CR before LF” and old-Mac→DOS means “add LF after CR”. There are probably no LF characters in your old-Mac line-ending file, so a Unix→DOS conversion would induce no changes.
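Before picking a conversion, it may help to check which line endings a file really contains. The following is a minimal sketch, not something from this thread: count_eol is a hypothetical helper name, and it simply counts CR and LF bytes, so it assumes an ASCII-compatible encoding.

```shell
# count_eol is a hypothetical helper name (an assumption, not from this thread).
# It counts CR and LF bytes in a file and guesses the line-ending style.
count_eol() {
    cr=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    lf=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    if [ "$cr" -gt 0 ] && [ "$lf" -eq 0 ]; then
        style="old-Mac (CR)"
    elif [ "$cr" -eq 0 ] && [ "$lf" -gt 0 ]; then
        style="Unix (LF)"
    elif [ "$cr" -gt 0 ] && [ "$cr" -eq "$lf" ]; then
        style="probably DOS (CR LF)"
    else
        style="mixed or no line endings"
    fi
    printf 'CR=%s LF=%s %s\n' "$cr" "$lf" "$style"
}

# Try it on a small sample that uses old-Mac endings.
sample=$(mktemp)
printf 'one\rtwo\r' > "$sample"
count_eol "$sample"   # prints: CR=2 LF=0 old-Mac (CR)
rm -f "$sample"
```

If the counts are ambiguous, od -c on the file shows every \r and \n in place.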

Use Perl to read one old-Mac line at a time and write it as a DOS line (by appending an LF to each):

perl -015 -pe '$_.=qq(\n)' <mac-file >dos-file

In a generalized form to make it easy to change the source and target line endings:

perl -pe 'BEGIN {$/=qq(\r)}; $_.=qq(\r\n) if chomp' <mac-file >dos-file

Use Perl to read one Unix line at a time (which may be the whole file if it exclusively uses Mac line endings!) and write it out after converting any internal old-Mac lines into DOS lines:

perl -pe 's/\r/\r\n/g' <mac-file >dos-file

In a generalized form, to take any/mixed line endings (CR LF, CR, or LF) and convert them all to DOS line endings:

perl -pe 's/(?:\r\n?|\n)/\r\n/g' <mixed-file >dos-file

You can do the conversion in AppleScript, but you have to be careful how you handle the string. If you read text via do shell script at any point, you would want to use the without altering line endings option to make sure it does not do its usual DOS/Unix→old-Mac conversion (AppleScript is an “old Mac” environment, so it expects its strings to use CR line endings). Using read, write, and text item delimiters should work fine (mind the character encoding issues with read and write though).

Hmm, is FileMaker limited to using a single character for the delimiter? If not, could you just set the delimiter to the two character string CR LF?

The encoding issue (iconv, CP1252, UTF-8, MACROMAN, etc.) is a lower level (thus independent) issue than that of line endings (i.e. you have to know how to decode bytes into characters before you can decide where a line of characters breaks). The perl invocations above assume that the character encoding is a superset of ASCII (or at least it uses the same encoding for LF and CR as in ASCII; UTF-8, CP1252, ISO-8859-1 and MacRoman are such supersets of ASCII). If such processing needed to handle encodings that were not supersets of ASCII, then different code would be required.

I like the idea of this one, just to cover all my bases (since I’m not 100% sure what the line ending is at this point), and for future re-use flexibility. However, it made my head hurt turning this into an escaped-out “do shell” line. Also, I don’t know how to use the “without alternate endings”…does that go after the shell command (do shell script “somecode” without alternate endings)?

If you could give me the explicit AppleScript line, that would be helpful.

For reasons that FileMaker gives no explanation for, they use ASCII 11 (“vertical tab”) as the record separator when using Export Records ScriptMaker steps. Truly, no idea why; a carriage return seems more straightforward (and more compatible with other data-handling software). The user doesn’t seem to get any choice in the matter either.

So I already have AppleScript use TIDs to swap “ASCII character 11” for “return”. That part seems to be working. Based on what I understood in your response, that means I have swapped them for the “old Mac” line ending, CR.

Which means technically I can use the CR-swapping perl code, but I like the idea of using a “catches any kind of line ending” code.

As usual, my thanks for holding my hand through this. You chrys always seem to have these UNIX-y things figured out. I was just getting the handle on grep and some awk, now there’s this perl stuff. ;p

Probably just doubling all the backslashes is enough.

(Apple)Script Editor will generate AppleScript syntax string values in its Result pane for the final value of a script. So just copy it out of the Result pane after pasting it into the dialog from running this code:

text returned of (display dialog "Enter your text" default answer "")

After applying this to the perl invocation and adding some variables, etc. you get something like this:

do shell script "perl -pe 's/(?:\\r\\n?|\\n)/\\r\\n/g' <" & quoted form of inputPathname & " >" & quoted form of outputPathname

Yes (except for the spelling): do shell script … without altering line endings. This is not needed in the bit of AppleScript above, since the shell is sending the output of the Perl program to a file. If you left off the output redirection (and thus were collecting the output in AppleScript as the result of do shell script), you would want without altering line endings to prevent do shell script from converting the line endings to old-Mac style in the resulting AppleScript string value.

Let me see if I have your scenario right. In an AppleScript program you get a string value from FileMaker. This string value is a sequence of “lines” that are each terminated by a single VT character (ASCII 11). You want to produce a file that includes these same lines, with characters encoded in UTF-8, but using CR LF to end each line.

If that all sounds right, then you can generate your file completely in AppleScript like this (the VT to CRLF bits should probably feel familiar; the key to “any line ending” in AppleScript is simply paragraphs of):

--  some value from FileMaker
set fmValue to _makeSomeVTTerminatedLines({"line 1", "line 2", "Line Three!", "Line IV?"})

-- supply a file object, alias, or HFS pathname (for use with "open for access")
set outputFile to choose file name


set lineList to listFromVTLines(fmValue)
--set lineList to listFromanyEOLLines(fmValue)
writeUTF8ToFile(lineListToCRLFLines(lineList), outputFile)

to listFromVTLines(theVTString)
    set VT to ASCII character 11
    set text item delimiters to {VT}
    text items of theVTString
end listFromVTLines

to listFromanyEOLLines(theString)
    paragraphs of theString -- breaks on CR, CRLF, or LF
end listFromanyEOLLines

to lineListToCRLFLines(theLines)
    set CRLF to (ASCII character 13) & (ASCII character 10)
    if last item of theLines is not "" then
        display dialog "Hmm, input string was VT separated, not VT terminated. Output will be CRLF terminated anyway." giving up after 10
        set end of theLines to ""
    end if
    
    set text item delimiters to {CRLF}
    theLines as Unicode text
end lineListToCRLFLines

to writeUTF8ToFile(theString, outputFile)
    set fRef to missing value
    try
        set fRef to open for access outputFile with write permission
        set eof of fRef to 0
        write theString to fRef as «class utf8»
        close access fRef
    on error m number n
        if fRef is not missing value then
            close access fRef
        end if
        error m number n
    end try
end writeUTF8ToFile

to _makeSomeVTTerminatedLines(l)
    set text item delimiters to {ASCII character 11}
    (l & {}) as Unicode text -- append an extra empty string to make sure we get VT /terminated/ lines, not just VT /separated/ lines
end _makeSomeVTTerminatedLines

Hello.

This was or is a real great thread, treating a subject in its full depth.
-Quite enlightening. I’m a lot wiser than I believed I could be about line endings after reading this post. :slight_smile:

But this code of yours using multiple text item delimiters will only work under Snow Leopard, won’t it, Chrys? You are using a list of delimiters, and that is only supported in Snow Leopard according to this post in Adam Bell’s text item delimiters tutorial.
This is the handler with multiple text item delimiters I’m thinking of:

to lineListToCRLFLines(theLines)
   set CRLF to (ASCII character 13) & (ASCII character 10)
   if last item of theLines is not "" then
       display dialog "Hmm, input string was VT separated, not VT terminated. Output will be CRLF terminated anyway." giving up after 10
       set end of theLines to ""
   end if
   
   set text item delimiters to {CRLF}
   theLines as Unicode text
end lineListToCRLFLines

Best Regards

McUsr

I developed and tested the code on 10.4 (Tiger). It works at least that far back (and probably even further back). There is a caveat concerning forward compatibility though. As I understand it ASCII character was deprecated in 10.5, but it still works in 10.5 and 10.6. If you replace ASCII character with character id, it should even work on any potential future versions of AppleScript where ASCII character has been removed.

The bit of functionality that was added in 10.6 (Snow Leopard) is the use of multiple items in text item delimiters. See the AppleScript Release Notes for 10.6 id 1186965.

set text item delimiters to {"a", "b"} -- *** Multiple items
text items of "elasmobranchiate"
-- on < 10.6 --> {"el", "smobr", "nchi", "te"}
-- on 10.6 --> {"el", "smo", "r", "nchi", "te"}

set text item delimiters to {"an"} -- *** Single item, multiple characters
text items of "elasmobranchiate"
-- on all versions --> {"elasmobr", "chiate"}

My code uses a single item list (a single string that happens to have two characters), so it neither depends on, nor is affected by, that change in 10.6.

This has been a good thread, because it’s still a common back-end issue with “plain” text files across platforms.

It’s also made my head hurt. :stuck_out_tongue:

Sorry I haven’t checked in; we’ve been too busy massaging the data and getting it to display properly to worry about automating the proper line endings (we just step the files through Coda and have it convert them).

Now that things have settled I’ve provided the developer with a file that was modified using chrys’ Perl…

do shell script "perl -pe 's/(?:\\r\\n?|\\n)/\\r\\n/g' <" & quoted form of inputPathname & " >" & quoted form of outputPathname

…and I should know in a couple days whether it worked or not.

Thanks again for everyone’s help!