Compatibility with CR and LF and CRLF

ldicroce · October 20, 2019, 5:33pm

The script below doesn’t work as expected if I use “return”. If I have a file with paragraphs separated by return, I cannot extract information using grep.
I need to convert CR into LF using “tr '\r/' '\n/'” (see script below).
But if I use linefeed (instead of return) everything works as expected.
BBEdit does not show any difference between a file written with CR and one written with LF.
So how can I enforce LF rather than CR when I write text to a file, to ensure compatibility with further manipulation?

Thanks

set myList to {"19064739	10.1182", "20887948	10.1016", "22094252	10.1182", "19064739	10.1182"}
set MyText to ""
set the_path to "Users:ldicroce:Desktop:Test3.txt"

set myLineBreacker to return -- using linefeed here instead would work
repeat with i in myList
	set MyText to MyText & (i as string) & myLineBreacker
end repeat

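try
	set fileRef to open for access (file the_path) with write permission
	set eof fileRef to 0 -- discard any previous contents before writing
	write MyText to fileRef
	close access fileRef
on error
	try
		close access (file the_path) -- the file may not be open if the error came early
	end try
end try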

# grab  all lines containing a string ("22094252")
set mycommand to "grep " & "'" & "22094252" & "'" & space & (the_path's POSIX path)
set myresult to do shell script mycommand

# convert the CR into LF
set the_pathCovertedFile to "Users:ldicroce:Desktop:Test2.txt"
set myConversion to do shell script "tr '\\r/' '\\n/'  < " & (the_path's POSIX path) & " > " & (the_pathCovertedFile's POSIX path)

# grab  all lines containing a string ("22094252")  
set mycommand to "grep " & "'" & "22094252" & "'" & space & (the_pathCovertedFile's POSIX path)
set myresult2 to do shell script mycommand
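
For what it's worth, the behaviour can be reproduced in the shell without AppleScript. This is only a minimal sketch, assuming a POSIX shell; the data is the list from the script above, and the variable names `cr_text`, `before` and `after` are just for illustration. grep splits its input on linefeeds only, so CR-separated text is one single line to it, and any match returns the whole text:

```shell
#!/bin/sh
# Four records separated by CR only (what `return` produces in AppleScript).
cr_text=$(printf '19064739\t10.1182\r20887948\t10.1016\r22094252\t10.1182\r19064739\t10.1182\r')

# Without conversion there is no linefeed, so grep sees one long line
# and prints all of it when the pattern matches anywhere.
before=$(printf '%s' "$cr_text" | grep '22094252')

# After converting CR to LF, each record is its own line.
after=$(printf '%s' "$cr_text" | tr '\r' '\n' | grep '22094252')

# Show CRs as "|" so they do not scramble the terminal output.
printf 'before tr: %s\n' "$(printf '%s' "$before" | tr '\r' '|')"
printf 'after tr:  %s\n' "$after"
```

The "before" result contains all four records, while the "after" result contains only the record that matches.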

Marc_Anthony · October 20, 2019, 7:43pm

Hi. This can be done without a file object:

set myList to {"19064739    10.1182", "20887948    10.1016", "22094252    10.1182", "19064739    10.1182"}
set MyText to ""
set myLineBreacker to return
repeat with i in myList
	set MyText to MyText & (i as string) & myLineBreacker
end repeat
do shell script "echo " & MyText's quoted form & " | tr  '\\r' '\\n' | grep   '22094252' "

or, more simply, using TIDs:

set text item delimiters to linefeed
set MyText to {"19064739    10.1182", "20887948    10.1016", "22094252    10.1182", "19064739    10.1182"} as text
do shell script "echo " & MyText's quoted form & " | grep   '22094252' "

ldicroce · October 20, 2019, 8:06pm

Thanks Marc Anthony, this is very useful.
I am learning new things every day. I will keep in mind.

But I often rely on passing information among scripts using files (shared in Dropbox among my Macs: work and home).
My question was in this direction: how to prevent cases where I might face this problem.
Is there a general and simple way to prevent this?
Should I use any special format when I save text files, such as UTF-8 encoding or something similar?
I really know very little about this.
Thanks !

Nigel_Garvey · October 20, 2019, 9:31pm

I’m not clear if you’re trying to create a text file with linefeed line endings or convert one which may have other types.

In vanilla AppleScript, list to text is:

set myList to {"19064739    10.1182", "20887948    10.1016", "22094252    10.1182", "19064739    10.1182"}

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set MyText to myList as text
set AppleScript's text item delimiters to astid

MyText

If you already have a text and you don’t know what the line endings are:

set MyText to "19064739    10.1182" & return & ¬
	"20887948    10.1016" & linefeed & ¬
	"22094252    10.1182" & return & linefeed & ¬
	"19064739    10.1182"

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set MyText to MyText's paragraphs as text
set AppleScript's text item delimiters to astid

MyText
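
If the text is already in a file and has to be cleaned up from the shell side, the same normalization can be sketched with sed and tr. This is only an illustration, assuming a POSIX shell; `normalize_eol` is a made-up helper name, not a standard command. CRLF pairs have to be handled first, otherwise each pair would turn into two linefeeds:

```shell
#!/bin/sh
# normalize_eol (hypothetical helper): read stdin, write it to stdout with LF line endings.
normalize_eol() {
    cr=$(printf '\r')
    # Step 1 (sed): drop a CR that sits directly before a linefeed (CRLF -> LF).
    # Step 2 (tr):  turn every remaining lone CR into LF.
    sed "s/${cr}\$//" | tr '\r' '\n'
}

# Sample text with a lone CR, a lone LF and a CRLF pair.
printf 'one\rtwo\nthree\r\nfour\n' | normalize_eol
```

The sample prints four separate lines. One caveat: if the input ends in a lone CR, that final terminator is dropped rather than converted, which is similar to joining paragraphs with linefeed and getting no trailing line break.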

Shane_Stanley · October 20, 2019, 11:09pm

When you use the scripting write (or read) command to write (or read) a text file, it defaults to what it originally used, which was the native encoding being used on the Mac running it – mostly MacRoman. That was the way applications worked when Macs were first introduced, in the days when Unicode was in its infancy and UTF-8 didn’t exist, and it continued to be used until OS X.

Fast-forward 30 years and MacRoman (along with its other-language counterparts) is all but dead. And on Macs, the preferred line-break character is now linefeed, reflecting OS X’s Unix heritage.

If you speak English and never allow accents, or emojis, or full mathematical symbols, or… you can keep living in the MacRoman world. It’s a classic case of it works until it doesn’t. But it’s generally much better to use UTF-8. Unfortunately no AppleScript terminology was ever defined for this, so you have to use «class utf8».

One of the differences with UTF-8 is that if you try to read a file that is not UTF-8 as UTF-8, if it uses any characters beyond the basics, you will get an error. In most circumstances this is a good warning – MacRoman will simply produce garbage characters in the same situation.

Like all transitions, there can be side-effects if you change, especially if you hard-code for particular line-breaks. That’s where AppleScript’s paragraphs property is so useful, being line-break agnostic.
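
The "error instead of garbage" behaviour can also be seen from the shell. The following is a rough sketch, assuming the iconv command-line tool is available; `is_utf8` is a made-up helper name. Asking iconv to convert UTF-8 to UTF-8 changes nothing, but it fails if the input is not valid UTF-8:

```shell
#!/bin/sh
# is_utf8 (hypothetical helper): succeed only if stdin is valid UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# "café" with é stored as the single MacRoman byte 0x8E (octal 216).
if printf 'caf\216\n' | is_utf8; then
    echo "MacRoman bytes: accepted as UTF-8"
else
    echo "MacRoman bytes: rejected as UTF-8"
fi

# The same word with é stored as the UTF-8 sequence 0xC3 0xA9 (octal 303 251).
if printf 'caf\303\251\n' | is_utf8; then
    echo "UTF-8 bytes: accepted as UTF-8"
else
    echo "UTF-8 bytes: rejected as UTF-8"
fi
```

Plain ASCII text passes the check either way, which is why a MacRoman file with no accented characters reads fine as UTF-8.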

ldicroce · October 21, 2019, 6:15am

Thanks to all.

I was confused by the existence of several ways to break lines, where they came from, etc…
It is now clear to me that linefeed is usually a better option than CR (and the history behind that!).
And thanks also for the tools to convert among those.

Ciao
L

peavine · October 21, 2019, 2:24pm

It’s also never been clear to me when to use linefeed and when to use return. Both are defined in the AppleScript Language Guide as text/global constants. Just by watching what Nigel and other knowledgeable forum members do, I’ve begun using linefeed with text item delimiters, as in the following:

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set MyText to myList as text
set AppleScript's text item delimiters to astid

And, I use return with text as in the following:

display dialog "Line 1" & return & "Line 2"

Although, in both of the above scripts, I can substitute return for linefeed and linefeed for return and the scripts still work.

Shane_Stanley · October 22, 2019, 12:25am

The issue of return versus linefeed is sometimes just a matter of preference. NeXT was based on Unix so it used linefeeds, but when it came to combining it with the traditional Mac approach of returns to build OS X, they knew they had to cope with whatever was thrown at them. So the underlying text containers in macOS handle either more-or-less equally (as well as CRLF), and hence lots of apps do too.

But using returns alone as line breaks has never (that I know of) been used outside Macs, so it can take people by surprise.

However, the AppleScript compiler still uses returns. Try typing this in Script Editor:

one
two

Compile, then click just before the “t”. Hit return. Nothing happens – a linefeed gets inserted, but it’s following a return, so the two together get treated as a single CRLF.

It’s the whole CRLF issue that makes me personally prefer to stick to one or the other, and I reckon linefeed makes more sense. But as I said, in some situations it’s simple preference.
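
To find out which of the three a given file actually uses, the line endings can be counted from the shell. This is only a sketch, assuming a POSIX shell with tr, wc, grep and mktemp; `count_eols` is a made-up helper name. A CR counts as part of a CRLF pair only when a linefeed follows it directly:

```shell
#!/bin/sh
# count_eols FILE (hypothetical helper): report CRLF pairs, lone CRs and lone LFs.
count_eols() {
    file=$1
    cr=$(printf '\r')
    total_cr=$(tr -cd '\r' < "$file" | wc -c)
    total_lf=$(tr -cd '\n' < "$file" | wc -c)
    # Lines that end in CR are CRLF pairs. The sentinel line "x" keeps a
    # trailing lone CR (no linefeed after it) from being counted as a pair.
    crlf=$( { cat "$file"; printf 'x\n'; } | grep -c "${cr}\$" || true )
    echo "CRLF=$((crlf)) CR=$((total_cr - crlf)) LF=$((total_lf - crlf))"
}

# Sample file with one lone CR, one CRLF pair and two lone LFs.
sample=$(mktemp)
printf 'one\rtwo\nthree\r\nfour\n' > "$sample"
count_eols "$sample"
rm -f "$sample"
```

For the sample this reports CRLF=1 CR=1 LF=2. A file written with return alone shows up as CR only, with both other counts at zero.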

peavine · October 22, 2019, 3:37pm

Shane. Thanks for the explanation. Linefeed seems a more descriptive term, and I’ll go with that.

kerflooey · October 23, 2019, 4:42pm

Hi. I thought I would bring up Adobe Tagged Text files. These have a header string, followed by a CR, and then the text. The problem is, if the text contains a linefeed to break a line of text (pretty common), this causes many programs to helpfully make all CRs and LFs one or the other if you try to edit the file. Which makes the tagged text file useless. It must contain both individual CRs and individual LFs.

For example, BBEdit will not honor or preserve the distinction between CR and LF (which seems pretty lame for a program meant to give you total control over text). So you can’t just open the tagged text file with BBEdit, make an edit and save the file, and then re-place it in the InDesign document.

Maybe someone can shed some light on this or suggest some workarounds or alternative apps for this scenario? I haven’t looked at this issue for a while. Thanks for any insights!

Shane_Stanley · October 23, 2019, 11:27pm

The real issue there is that the InDesign filters are lame. The Unicode one uses little-endian UTF-16 and itself converts all returns and linefeeds to linefeeds. The ASCII one at least encodes linefeeds as <0x000A>.