Wednesday, June 26, 2019

#1 2019-01-07 12:07:44 pm

danwan
Member
Registered: 2008-03-21
Posts: 205

Extract Hyperlinks from RTF and save as readable plaintext

I have a folder containing only TextEdit RTF files.

Is there a way in AppleScript to choose the folder where they are located, export the various hyperlinks as readable text, and save each file into a folder of my choice?

That is, "website" and "email" should become "http://www.site.com" and "address@site.com".

When I convert to plain text, I lose the hyperlinks.



The files contain some short text, which I'd like to keep, plus these links.

Here is an example:

This Site (with Hyperlink)
Submission Date 
December 12, 2018
Notification Date 
October 1, 2019
Event Date 
October 25, 2019
Tracking Number 
SIFF4964
Email (with Hyperlink)
Website (with Hyperlink)

I'd like to end up with a plain-text file like this:

Site.com
Submission Date 
December 12, 2018
Notification Date 
October 1, 2019
Event Date 
October 25, 2019
Tracking Number 
SIFF4964
address@site.com
http://www.site.com/


Thanks a lot


Filed under: TextEdit


#2 2019-01-07 01:29:28 pm

t.spoon
Member
From:: BFE, Massachusetts
Registered: 2013-01-13
Posts: 397

Re: Extract Hyperlinks from RTF and save as readable plaintext

Yes. I'd just read the RTF file into an AppleScript variable using AppleScript's "read" command, which gives you the raw RTF markup behind the formatted text.

For example, I made this RTF:

Blah Blah web Hyperlink blah blah email



with "Hyperlink" and "email" as hyperlinks, and saved it as RTF.

Then I ran this AppleScript:

Applescript:


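set rtfFile to choose file of type {"public.rtf"} -- or an alias to a specific file
set rawRTFtext to read rtfFile -- raw RTF markup, not the formatted text

Now the rawRTFtext variable contains: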

{\rtf1\ansi\ansicpg1252\cocoartf1504\cocoasubrtf830
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww12600\viewh7800\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0

\f0\fs24 \cf0 Blah Blah web {\field{\*\fldinst{HYPERLINK "http://www.google.com"}}{\fldrslt Hyperlink}} blah blah {\field{\*\fldinst{HYPERLINK "mailto:somebody@somewhere.com"}}{\fldrslt email}}}



From there, you can use vanilla AppleScript (text item delimiters), a regex in the Terminal, ASObjC, or whatever you prefer to pull out the link addresses, using the opening delimiter:

{\*\fldinst{HYPERLINK "


and the closing delimiter:

"}}{\fldrslt



Then format as needed and save as a .txt file wherever you like.
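A minimal vanilla AppleScript sketch of the text-item-delimiter approach described above. It assumes the links are written the way TextEdit wrote them in the sample, `{\field{\*\fldinst{HYPERLINK "..."}}{\fldrslt ...}}`; the handler name `extractLinks` is hypothetical, it only returns the link targets (it does not rebuild the surrounding text), and it does not decode any RTF escape sequences inside a URL.

```applescript
-- Sketch: collect the HYPERLINK targets from raw RTF markup (as returned by "read")
-- using AppleScript's text item delimiters. Assumes TextEdit-style link fields:
-- {\field{\*\fldinst{HYPERLINK "..."}}{\fldrslt ...}}
on extractLinks(rawRTF)
	set startDelim to "{\\*\\fldinst{HYPERLINK \"" -- i.e. {\*\fldinst{HYPERLINK "
	set endDelim to "\"}}{\\fldrslt" -- i.e. "}}{\fldrslt
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to startDelim
	set chunks to text items of rawRTF
	set foundLinks to {}
	-- chunk 1 is everything before the first link, so start at chunk 2
	repeat with i from 2 to count of chunks
		set AppleScript's text item delimiters to endDelim
		set theLink to text item 1 of (item i of chunks)
		-- turn "mailto:someone@example.com" into a bare address
		if theLink starts with "mailto:" and (count of theLink) > 7 then set theLink to text 8 thru -1 of theLink
		set end of foundLinks to theLink
	end repeat
	set AppleScript's text item delimiters to savedTIDs
	return foundLinks
end extractLinks
```

Run on the sample markup above, this should give {"http://www.google.com", "somebody@somewhere.com"}. Joining the returned list with `linefeed` as the text item delimiter gives one address per line, ready to write out as a .txt file.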

Last edited by t.spoon (2019-01-07 01:32:14 pm)


Hackintosh built February, 2012 |  Mac OS Sierra
GIGABYTE GA-Z68X-UD3H-B3 | Core i5 2500k | 16 GB DDR3 | GIGABYTE Geforce 1050 TI 4GB
250 GB Samsung 850 EVO | 4 TB RAID
Dell Ultrasharp U3011 | Dell Ultrasharp 2007FPb


#3 2019-01-07 04:10:21 pm

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4884

Re: Extract Hyperlinks from RTF and save as readable plaintext

Post and script deleted: seeing Shane's script below made me realise I'd misread the query.

Last edited by Nigel Garvey (2019-01-08 02:06:59 am)


NG


#4 2019-01-07 06:10:48 pm

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5708

Re: Extract Hyperlinks from RTF and save as readable plaintext

Here's an approach that works with the text as RTF, reading it into an NSAttributedString and replacing each link's text with its address:

Applescript:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

set thePath to "/Users/shane/Desktop/Test.rtf"
-- read file into attributed string
set theURL to current application's NSURL's fileURLWithPath:thePath
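set {attString, theError} to current application's NSAttributedString's alloc()'s initWithURL:theURL options:(missing value) documentAttributes:(missing value) |error|:(reference)
if attString is missing value then error (theError's localizedDescription() as text) -- file missing or not readable as RTF
-- get length so we can work backwards from the end (replacing a link changes the length, but not the positions of text before it)
set start to (attString's |length|()) - 1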
-- make plain string copy to work on
set theString to attString's |string|()'s mutableCopy()
repeat
   -- find link
   set {aURL, theRange} to attString's attribute:(current application's NSLinkAttributeName) atIndex:start effectiveRange:(reference)
   if aURL is not missing value then
       -- get linked text
       set linkText to theString's substringWithRange:theRange
       if aURL's |scheme|()'s isEqualToString:"mailto" then -- email address
           set newLink to aURL's resourceSpecifier()
       else if linkText's containsString:"This Site" then -- resource specifier, remove //
           set newLink to aURL's resourceSpecifier()'s substringFromIndex:2
       else -- full URL
           set newLink to aURL's absoluteString()
       end if
       -- replace link
       theString's replaceCharactersInRange:theRange withString:newLink
   end if
   set start to (location of theRange) - 1 -- step back to the last character of the preceding run
   if start < 0 then exit repeat
end repeat
return theString as text
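The branching in the script above depends on what `NSURL` reports for each link. As a hedged illustration, the hypothetical handler `readableLink` below isolates the two main branches and omits the "This Site" special case: for a `mailto:` URL, `resourceSpecifier()` is the part after the colon (the bare address); for anything else, `absoluteString()` is the full URL.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper, not part of the thread's script: readable text for one link URL.
on readableLink(urlString)
	set aURL to current application's NSURL's URLWithString:urlString
	if aURL is missing value then return urlString -- not a valid URL, leave it unchanged
	set theScheme to aURL's |scheme|()
	if theScheme is not missing value then
		if (theScheme's isEqualToString:"mailto") as boolean then
			-- for mailto: URLs the resource specifier is just the address
			return aURL's resourceSpecifier() as text
		end if
	end if
	-- anything else: keep the full URL
	return aURL's absoluteString() as text
end readableLink
```

The `missing value` check on the scheme matters because a scheme-less string such as "www.site.com" parses as a relative URL with no scheme, and sending `isEqualToString:` to `missing value` would raise an error.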


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/
latenightsw.com


#5 2019-01-08 01:10:52 pm

danwan
Member
Registered: 2008-03-21
Posts: 205

Re: Extract Hyperlinks from RTF and save as readable plaintext

Thanks Shane,
your script works perfectly.
However, what do I need to add so that the script processes every file in my folder (I have 450 files there):

/Users/danwan/Desktop/RTFFiles

and saves each result to a folder of my choice, keeping each file's original name:

/Users/danwan/Desktop/RTFFilesConverted

Kind regards


Danwan

Browser: Safari 537.36
Operating System: macOS 10.12


#6 2019-01-08 10:20:15 pm

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5708

Re: Extract Hyperlinks from RTF and save as readable plaintext

What code have you written already?



#7 2019-01-09 07:13:21 am

danwan
Member
Registered: 2008-03-21
Posts: 205

Re: Extract Hyperlinks from RTF and save as readable plaintext

Thanks for your time and your reply.

I haven't added anything, as I'm not skilled enough to do the following:

1. have the script ask me to select the folder with my 400+ files
2. create a new file with the same name as each file the script converts to plain text
3. export the result to this new file
4. save the new file in a folder of my choice


Regards

Last edited by danwan (2019-01-09 10:59:04 am)


#8 2019-01-09 05:24:08 pm

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5708

Re: Extract Hyperlinks from RTF and save as readable plaintext

I think perhaps you misunderstand the role of MacScripter. It's a help facility, not a free script writing service. This should help you get started:

Applescript:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

set theFolder to choose folder -- choose the folder containing the .rtf files
tell application id "com.apple.finder" -- Finder
   set theFiles to every file of theFolder as alias list
end tell
repeat with aFile in theFiles
   if (aFile as text) ends with ".rtf" then -- skip anything that isn't an RTF file
       set theURL to (current application's NSURL's fileURLWithPath:(POSIX path of aFile))
       set {attString, theError} to (current application's NSAttributedString's alloc()'s initWithURL:theURL options:(missing value) documentAttributes:(missing value) |error|:(reference))
       -- get length so we can start from the end
       set start to (attString's |length|()) - 1
       -- make plain string copy to work on
       set theString to attString's |string|()'s mutableCopy()
       repeat
           -- find link
           set {aURL, theRange} to (attString's attribute:(current application's NSLinkAttributeName) atIndex:start effectiveRange:(reference))
           if aURL is not missing value then
               -- get linked text
               set linkText to (theString's substringWithRange:theRange)
               if (aURL's |scheme|()'s isEqualToString:"mailto") then -- email address
                   set newLink to aURL's resourceSpecifier()
               else if (linkText's containsString:"This Site") then -- resource specifier, remove //
                   set newLink to (aURL's resourceSpecifier()'s substringFromIndex:2)
               else -- full URL
                   set newLink to aURL's absoluteString()
               end if
               -- replace link
               (theString's replaceCharactersInRange:theRange withString:newLink)
           end if
           set start to (location of theRange) - 1 -- step back to the last character of the preceding run
           if start < 0 then exit repeat
       end repeat
       set newFile to (theURL's URLByDeletingPathExtension()'s URLByAppendingPathExtension:"text")
       (theString's writeToURL:newFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))
   end if
end repeat

Last edited by Shane Stanley (2019-01-11 05:11:51 am)



#9 2019-01-10 06:41:35 am

danwan
Member
Registered: 2008-03-21
Posts: 205

Re: Extract Hyperlinks from RTF and save as readable plaintext

I'm sorry for misunderstanding how to use MacScripter.
It won't happen again.
Regards and thanks a lot for your time


#10 2019-01-11 05:09:37 am

StefanK
Member
From:: St. Gallen, Switzerland
Registered: 2006-10-21
Posts: 11563

Re: Extract Hyperlinks from RTF and save as readable plaintext

Shane, you probably mean

Applescript:

(theString's writeToURL:newFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))


regards

Stefan


#11 2019-01-11 05:12:25 am

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5708

Re: Extract Hyperlinks from RTF and save as readable plaintext

Thanks, Stefan -- I've edited the original accordingly.



#12 2019-01-11 05:58:28 am

danwan
Member
Registered: 2008-03-21
Posts: 205

Re: Extract Hyperlinks from RTF and save as readable plaintext

Thanks to Shane and StefanK for their time and help.
This is the final, fully working script.
Fixed as StefanK suggested, plus another small typo.
This could be useful to other users who, like me, aren't very skilled in Objective-C and AppleScript and need to convert the hyperlinks in RTF files to plain text while keeping the rest of the text as it is.

Applescript:


use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

set theFolder to choose folder -- choose the folder containing the .rtf files
tell application id "com.apple.finder" -- Finder
   set theFiles to every file of theFolder as alias list
end tell
repeat with aFile in theFiles
   if (aFile as text) ends with ".rtf" then -- test the current file, not the whole list
       set theURL to (current application's NSURL's fileURLWithPath:(POSIX path of aFile))
       set {attString, theError} to (current application's NSAttributedString's alloc()'s initWithURL:theURL options:(missing value) documentAttributes:(missing value) |error|:(reference))
       -- get length so we can start from the end
       set start to (attString's |length|()) - 1
       -- make plain string copy to work on
       set theString to attString's |string|()'s mutableCopy()
       repeat
           -- find link
           set {aURL, theRange} to (attString's attribute:(current application's NSLinkAttributeName) atIndex:start effectiveRange:(reference))
           if aURL is not missing value then
               -- get linked text
               set linkText to (theString's substringWithRange:theRange)
               if (aURL's |scheme|()'s isEqualToString:"mailto") then -- email address
                   set newLink to aURL's resourceSpecifier()
               else if (linkText's containsString:"This Site") then -- resource specifier, remove //
                   set newLink to (aURL's resourceSpecifier()'s substringFromIndex:2)
               else -- full URL
                   set newLink to aURL's absoluteString()
               end if
               -- replace link
               (theString's replaceCharactersInRange:theRange withString:newLink)
           end if
           set start to (location of theRange) - 1 -- step back to the last character of the preceding run
           if start < 0 then exit repeat
       end repeat
       set newFile to (theURL's URLByDeletingPathExtension()'s URLByAppendingPathExtension:"txt")
       (theString's writeToURL:newFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))
   end if
end repeat

Browser: Safari 537.36
Operating System: macOS 10.12

Last edited by danwan (2019-01-11 06:04:30 am)
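Both versions of the script write each .txt file next to its source RTF file, whereas the original request was to save into a separate folder such as /Users/danwan/Desktop/RTFFilesConverted. Below is a hedged sketch of one way to do that: a hypothetical `saveAsText` handler that takes the converted string, the source file's POSIX path and a destination folder path, keeps the original base name with a .txt extension, and reports write failures instead of passing `missing value` for the error.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper (not from the thread): write plainText as UTF-8 into destFolderPOSIX,
-- reusing the source file's base name with a .txt extension; raises an error if the write fails.
on saveAsText(plainText, sourcePOSIX, destFolderPOSIX)
	set sourceURL to current application's NSURL's fileURLWithPath:sourcePOSIX
	set baseName to sourceURL's URLByDeletingPathExtension()'s lastPathComponent()
	set destFolderURL to current application's NSURL's fileURLWithPath:destFolderPOSIX isDirectory:true
	set baseURL to destFolderURL's URLByAppendingPathComponent:baseName
	set newFile to baseURL's URLByAppendingPathExtension:"txt"
	set theText to current application's NSString's stringWithString:plainText
	set {didWrite, theError} to theText's writeToURL:newFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(reference)
	if not (didWrite as boolean) then error (theError's localizedDescription() as text)
	return newFile's |path|() as text
end saveAsText
```

In the loop, the `set newFile ...` and `writeToURL:` lines could then be replaced by something like `my saveAsText(theString, POSIX path of aFile, POSIX path of outFolder)`, where `outFolder` (a hypothetical name) is chosen once before the loop, for example with `choose folder`.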
