Conversions Nightmare Impossible Challenge?

How can I convert an RTF file that contains both hyperlinks and standard text?

The file is about 200 pages and ALWAYS follows this format:

Intro text (max 4 or 5 words)
HYPERLINK 1 (hides my student's link to the web page with his data)
HYPERLINK 2 (hides my student's link to his homework page)
HYPERLINK 3 (hides my student's link to an additional page)
Date in standard text

Example:

Dear Prof (Plain text): [u]Jon Lake[/u] (hides http://www.college.com/FilmSCH/studentID/1234567)
[u]Meaning of Will[/u] (hides http://www.college.com/readmessage.cfm?message=14457847&page=50&orderby=12&display=rec)
[u]Pictorial Charts[/u] (hides http://www.college.com/readmessage.cfm?message=14532147&page=150&orderby=12&display=rec)
25-April-2011

I would like to get:

Dear Prof
Jon Lake
http://www.college.com/FilmSCH/studentID/1234567
Meaning of Will
http://www.college.com/readmessage.cfm?message=14457847&page=50&orderby=12&display=rec
Pictorial Charts
http://www.college.com/readmessage.cfm?message=14532147&page=150&orderby=12&display=rec
25-April-2011

Converting from RTF to TXT loses the links.
Doing it by hand in TextEdit is possible, but every link has to be selected and pasted manually.

For 200 pages that is a nightmare.

I have found this script.

However:

1. It does not return the plain-text part of the file, so I miss the "date" information.
2. It does not write the results to a file I can later import into FileMaker.

set rtfFile to (choose file with prompt "Choose the RTF file.")

set filePath to rtfFile
set startDelimiter to "{HYPERLINK \""

set endDelimiter to "\"}}"

set hyperlinks to {}

set rtfText to read file filePath
set text item delimiters to startDelimiter
set theItems to text items of rtfText
if (count of theItems) is greater than 1 then
	set text item delimiters to endDelimiter
	repeat with i from 2 to count of theItems
		set a to text items of (item i of theItems)
		set end of hyperlinks to item 1 of a
	end repeat
end if
set text item delimiters to ""
return hyperlinks
writeTo(adjustedText, filePath, false, string)



(*==================== SUBROUTINES ===================*)
on writeTo(this_data, target_file, append_data, mode) -- append_data is true or false, mode is string etc. (no quotes around either)
	try
		set target_file to target_file as text
		set target_file to POSIX file target_file as text
		set the open_target_file to open for access file target_file with write permission
		if append_data is false then set eof of the open_target_file to 0
		write this_data to the open_target_file starting at eof as mode
		close access the open_target_file
		return true
	on error
		try
			close access file open_target_file
		end try
		return false
	end try
end writeTo
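
For comparison, here is a minimal sketch in Python (not AppleScript) of the same delimiter idea, extended so that it also keeps the visible text of each link. It assumes Python 3 and the field layout TextEdit writes, {\field{\*\fldinst{HYPERLINK "url"}}{\fldrslt text}}, with straight quotes. The names FIELD, CONTROL_WORD and extract_links are made up for illustration. It does not decode escaped characters such as \'8a or \u8232, and it does not return the plain text outside the links, so the date would still have to come from another step.

```python
import re

# One HYPERLINK field: group 1 is the URL, group 2 is the raw \fldrslt content.
FIELD = re.compile(
    r'\{\\field\{\\\*\\fldinst\{HYPERLINK "([^"]*)"\}\}\{\\fldrslt([^{}]*)\}\}'
)
# An RTF control word such as \f0, \b or \fs24, plus the single space that may end it.
CONTROL_WORD = re.compile(r"\\[a-zA-Z]+-?\d* ?")


def extract_links(rtf_text):
    """Return (visible text, URL) pairs for every HYPERLINK field found."""
    pairs = []
    for url, result in FIELD.findall(rtf_text):
        visible = CONTROL_WORD.sub("", result)
        # Collapse line breaks and repeated spaces left over from the RTF source.
        pairs.append((" ".join(visible.split()), url))
    return pairs
```

A link whose visible text is empty (such as a link that only wraps a picture) comes back with an empty string as its text.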

Thanks to the kind people who will help me solve this problem.

Hi,

I wrote a very primitive command line tool which extracts the links from an RTF text.
The AppleScript usage is

do shell script "/path/to/RTFLinkParser /path/to/file.rtf"

The result is the plain text. If there is a link in a paragraph, the link is put after the plain text, separated by a tab character.
If there is more than one link in a single paragraph, only the first link is considered.

You can download it here: RTFLinkParser
It should work on PPC and Intel, 10.5 or higher.
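
The tab-separated output described above can be rearranged into the "text, then link on the next line" layout from the first post. The following is only a sketch in Python 3, assuming the output looks exactly as described (one paragraph per line, an optional tab, then the link). The function names parse_parser_output and to_lines are hypothetical and not part of RTFLinkParser.

```python
def parse_parser_output(output):
    """Split the tool's output into (text, link) pairs; link is None when absent."""
    records = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip empty paragraphs
        text, sep, link = line.partition("\t")
        records.append((text.strip(), link.strip() if sep else None))
    return records


def to_lines(records):
    """Put each link on its own line directly after its text."""
    lines = []
    for text, link in records:
        if text:
            lines.append(text)
        if link:
            lines.append(link)
    return lines
```

Joining the result of to_lines with line breaks gives a plain text file; keeping the pairs from parse_parser_output instead gives two columns for a tab-delimited import.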

Thanks Stefan, as always.

Where do I need to put the tool on my Mac? Unfortunately I am not familiar with UNIX, and I do not have the developer tools installed, as I do not develop anything.

Also: is there a way to create an AppleScript that keeps the plain text in the resulting document?

Example in plain English:

Find file with prompt
set para to paragraphs in file
repeat
from para to last para
if para is unicode txt
copy para + tab
else
do shell script RTFLinkParser
end repeat

Regards and thanks

Danwan

RTFLinkParser always parses the whole file.
Save the executable wherever you want.

You can easily convert RTF to txt with

do shell script "textutil -convert txt " & quoted form of POSIX path of (choose file of type "rtf")
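
A related route, offered only as a sketch: textutil can also write HTML (-convert html instead of -convert txt), and to my knowledge the HTML it produces keeps each hyperlink as an a element with an href attribute. Assuming that, the Python 3 code below walks such HTML and emits each link's text followed by its URL on the next line, while plain paragraphs such as the date pass through unchanged. The names LinkFlattener and flatten are made up for illustration, and the code has not been tested against the poster's file.

```python
from html.parser import HTMLParser


class LinkFlattener(HTMLParser):
    """Collect visible text and put each link's URL on the line after its text."""

    SKIPPED = {"style", "script", "title"}  # elements whose text is not document content

    def __init__(self):
        super().__init__()
        self.lines = []       # output lines in document order
        self._href = None     # URL of the link currently open, if any
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.lines.append(text)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "a" and self._href:
            self.lines.append(self._href)
            self._href = None


def flatten(html_text):
    """Return the document as a list of lines: text, then URL for each link."""
    parser = LinkFlattener()
    parser.feed(html_text)
    parser.close()
    return parser.lines
```

Unlike the one-link-per-paragraph limit mentioned earlier, this keeps every link in a paragraph, because each closing a tag appends its own URL.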

Thanks Stefan:

I put the parser in the AppleScript MYSCripts folder,

but how do I choose the file to process?


set tFile to choose file with prompt "Choose the tfile" without invisibles

do shell script "/path/to/RTFLinkParser /path/to/file.rtf" & quoted form of POSIX path of tfile


also:


do shell script "/path/to/RTFLinkParser /path/to/file.rtf" & quoted form of POSIX path of (choose file of type "rtf")

I always get the same error:

error “sh: /path/to/RTFLinkParser: No such file or directory” number 127


set tFile to choose file with prompt "Choose the tfile" without invisibles
do shell script "/path/to/RTFLinkParser " & quoted form of POSIX path of tfile

Replace /path/to/RTFLinkParser with the full (POSIX) path to the executable.

If the executable is in the Documents folder in your home directory, the path is

/Users/YOU/Documents/RTFLinkParser

where YOU is your short user name.

→ error “2011-05-05 15:33:11.988 RTFLinkParser[2057:60f] (null): unrecognized selector sent to class 0x7fff70a88698
2011-05-05 15:33:11.990 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100111610 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.990 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100111660 of class NSException autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.990 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x1001159f0 of class _NSCallStackArray autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.991 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100115a50 of class _NSCallStackArray autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.991 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100115d70 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.991 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100116890 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.991 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100115e60 of class NSConcreteMutableData autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.992 RTFLinkParser[2057:60f] *** Terminating app due to uncaught exception ‘NSInvalidArgumentException’, reason: ‘(null): unrecognized selector sent to class 0x7fff70a88698’
*** Call stack at first throw:
(
0 CoreFoundation 0x00007fff824157b4 __exceptionPreprocess + 180
1 libobjc.A.dylib 0x00007fff83f820f3 objc_exception_throw + 45
2 CoreFoundation 0x00007fff8246f1a0 __CFFullMethodName + 0
3 CoreFoundation 0x00007fff823e791f forwarding + 751
4 CoreFoundation 0x00007fff823e3a68 _CF_forwarding_prep_0 + 232
5 RTFLinkParser 0x0000000100000a0a main + 44
6 RTFLinkParser 0x00000001000009bc start + 52
)
terminate called after throwing an instance of ‘NSException’” number 1006

I tested it only with 10.6, no problems

I hope it helps if I post the files as they appear in TextWrangler, as I cannot post the original RTF.

this is the first group:

{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf350
{\fonttbl\f0\fswiss\fcharset0 ArialMT;\f1\fnil\fcharset0 LucidaGrande;}
{\colortbl;\red255\green255\blue255;\red38\green38\blue38;}
{\*\listtable{\list\listtemplateid1\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace360\levelindent0{\*\levelmarker \{none\}}{\leveltext\leveltemplateid1\'00;}{\levelnumbers;}\fi-360\li720\lin720 }{\listname ;}\listid1}}
{\*\listoverridetable{\listoverride\listid1\listoverridecount0\ls1}}
\paperw11900\paperh16840\margl1440\margr1440\vieww9000\viewh8400\viewkind0
\deftab720
\pard\tx220\tx720\pardeftab720\li720\fi-720\sl800\ql\qnatural
\ls1\ilvl0{\field{\*\fldinst{HYPERLINK "http://www.mycollegeexample.com/membermailsystem/readmessage.cfm?message=14562397&page=43&orderby=1&display=sent"}}{\fldrslt
\f0\b\fs24 \cf2 [ No Subject ] }}
\f0\fs24 \cf2
\pard\tx220\tx720\pardeftab720\li720\fi-720\sl800\ql\qnatural
\ls1\ilvl0
\b \cf2 {\field{\*\fldinst{HYPERLINK "http://www.mycollegeexample.com/photoDisplay.cfm?mID=41861FILM&ph1=1&ph2=0&ph3=1&ph4=0&ph5=0&keepThis=true&TB_iframe=true&height=580&width=700"}}{\fldrslt
}}\ls1\ilvl0
\f1 \uc0\u8232 {\field{\*\fldinst{HYPERLINK "http://www.mycollegeexample.com/Member_Profile.cfm?ID=41861FILM"}}{\fldrslt
\f0 J}}
\f0 on
\b0
15-Mar-2011
}

The second part also has a Chinese font set (fcharset128 HiraKakuProN-W3; \f3\fnil\fcharset134 STHeitiSC-Light;o)

but I think the command line tool works with any charset, so this should not be the problem: I did a test with only Latin characters and the errors are the same:

‘NSInvalidArgumentException’, reason: '(null): unrecognized selector sent to class

{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf350
{\fonttbl\f0\fswiss\fcharset0 ArialMT;\f1\fnil\fcharset0 LucidaGrande;\f2\fnil\fcharset128 HiraKakuProN-W3;
\f3\fnil\fcharset134 STHeitiSC-Light;}
{\colortbl;\red255\green255\blue255;\red38\green38\blue38;}
{\*\listtable{\list\listtemplateid1\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace360\levelindent0{\*\levelmarker \{none\}}{\leveltext\leveltemplateid1\'00;}{\levelnumbers;}\fi-360\li720\lin720 }{\listname ;}\listid1}}
{\*\listoverridetable{\listoverride\listid1\listoverridecount0\ls1}}
\paperw11900\paperh16840\margl1440\margr1440\vieww9000\viewh8400\viewkind0
\deftab720
\pard\tx220\tx720\pardeftab720\li720\fi-720\sl800\ql\qnatural
\ls1\ilvl0
\f0\b\fs24 \cf2 {\field{\*\fldinst{HYPERLINK "http://www.mycollegeexample.com/membermailsystem/readmessage.cfm?message=27743&page=50&orderby=1&display=sent"}}{\fldrslt RE: Contact }}
\b0
\ls1\ilvl0
\b {\field{\*\fldinst{HYPERLINK "http://www.mycollegeexample.com/photoDisplay.cfm?mID=41637FILM&ph1=1&ph2=1&ph3=1&ph4=1&ph5=1&keepThis=true&TB_iframe=true&height=580&width=700"}}{\fldrslt
}}\ls1\ilvl0
\f1 \uc0\u8232 {\field{\*\fldinst{HYPERLINK "http://www.mycollegeexample.com/Member_Profile.cfm?ID=41637FILM"}}{\fldrslt
\f2 \'8a\'43
\f3 \'c1\'fa}}
\f0\b0
08-Mar-2011

}

Your sample RTF code works fine on my machine, and of course there is an NSAutoreleasePool.
What system version on what computer are you using?

I uploaded an Intel-only version (same link).

I am using 10.6.7, and AppleScript is

AppleScript 2.1.2

Version 2.3 (118)

I tried copying the same sample from this site, reopening it and saving it as RTF in TextEdit, but I still get the same errors in the result window.

Am I doing something wrong?

error “2011-05-05 17:26:35.652 RTFLinkParser[2500:60f] (null): unrecognized selector sent to class 0x7fff70a88698
2011-05-05 17:26:35.654 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100111610 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.654 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100111660 of class NSException autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.654 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x1001159f0 of class _NSCallStackArray autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.654 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100115a50 of class _NSCallStackArray autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.655 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100115d70 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.655 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100116890 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.655 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100115e60 of class NSConcreteMutableData autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.655 RTFLinkParser[2500:60f] *** Terminating app due to uncaught exception ‘NSInvalidArgumentException’, reason: ‘(null): unrecognized selector sent to class 0x7fff70a88698’
*** Call stack at first throw:
(
0 CoreFoundation 0x00007fff824157b4 __exceptionPreprocess + 180
1 libobjc.A.dylib 0x00007fff83f820f3 objc_exception_throw + 45
2 CoreFoundation 0x00007fff8246f1a0 __CFFullMethodName + 0
3 CoreFoundation 0x00007fff823e791f forwarding + 751
4 CoreFoundation 0x00007fff823e3a68 _CF_forwarding_prep_0 + 232
5 RTFLinkParser 0x0000000100000a0a main + 44
6 RTFLinkParser 0x00000001000009bc start + 52
)
terminate called after throwing an instance of ‘NSException’” number 1006

It seems to be a compiling error.
Please try it again; I changed something in the compiler settings (same link).

Thanks Stefan, you are a hero.

Danwan