Convert Microsoft Word to/from LaTeX

Hi,

I write research papers in LaTeX, but I miss the spelling and grammar tools in Microsoft Word.
This leads me to copy and paste between a Word document and a .tex file (with some inconvenience, because certain special characters are not accepted by LaTeX).

With this in mind I decided to (try to) create a couple of AppleScripts that convert some LaTeX commands to their equivalents in Word (e.g., \textbf{} to bold, \section{} to the Heading 1 style, etc.) and vice versa. However, I am really a newbie in AppleScript…

So far, I have only managed search-and-replace for bold, italics and quotes.
There is still much to be done, but I’m sure there’s someone here willing to help. Some LaTeX users might find this especially useful.


tell application "Microsoft Word"
	
	set curly to false -- change to true if curly quotes desired
	set wasSmartQuotes to auto format as you type replace quotes of settings
	set auto format as you type replace quotes of settings to curly
	set myFind to find object of selection
	tell myFind
		clear formatting myFind
		-- replace en dashes and curly quotes with their LaTeX equivalents
		execute find find text "–" replace with "--" replace replace all
		execute find find text "‘" replace with "`" replace replace all
		execute find find text "’" replace with "'" replace replace all
		execute find find text "“" replace with "``" replace replace all
		execute find find text "”" replace with "''" replace replace all
		
		-- format italic
		clear formatting
		set italic of font object to true
		execute find find text "" replace with "\\emph{^&}" replace replace all
		set italic of font object to false
		
		-- format bold
		clear formatting
		set bold of font object to true
		execute find find text "" replace with "\\textbf{^&}" replace replace all
		set bold of font object to false
		
	end tell
	set auto format as you type replace quotes of settings to wasSmartQuotes
end tell

Thanks in advance!

Priority things that I would like to know how to accomplish in AppleScript:
- search with regex and replace with new text and new formatting
e.g., remove the \textbf tag and format the text as bold;
e.g., remove the \section tag and format the text in the Heading 1 style
- replace ``TEXT'' with curly-quoted text

Can anyone help me…?
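For the last item in that list, here is a minimal sketch in plain AppleScript, assuming the text is already in an AppleScript string rather than being edited inside Word. The handler names replaceText and curlyQuotes are made up for illustration. It uses text item delimiters instead of a regex, because AppleScript has no regex support of its own.

```applescript
-- Hypothetical helper: replace every occurrence of searchText in sourceText
-- with replacementText, using only AppleScript's text item delimiters.
on replaceText(sourceText, searchText, replacementText)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchText
	set parts to text items of sourceText
	set AppleScript's text item delimiters to replacementText
	set resultText to parts as text
	set AppleScript's text item delimiters to savedDelims
	return resultText
end replaceText

-- Hypothetical helper: turn LaTeX-style ``double quotes'' into curly quotes.
on curlyQuotes(texText)
	set texText to replaceText(texText, "``", "“")
	set texText to replaceText(texText, "''", "”")
	return texText
end curlyQuotes

curlyQuotes("He said ``hello'' to me.")
--> "He said “hello” to me."
```

This only covers double quotes. Single quotes are harder to do with a plain replace, because an apostrophe and a closing single quote look the same in the .tex source.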

Yes! I have almost the same issue. I just want to achieve some simple conversions into LaTeX mark-up from a Microsoft Word document. I have created my own Word styles called ‘section’, ‘subsection’, ‘quote’, ‘quotation’ with a view to doing this. The following should be simple but I have no idea how to do them in AppleScript:

Find “text” in the ‘section’ style, replace it with “\section{text}” in normal style.

Find “text” in the ‘subsection’ style, replace it with “\subsection{text}” in normal style.

Find “text” in the ‘quote’ style, replace it with “\begin{quote}text\end{quote}” in the normal style.

Find “text” in the ‘quotation’ style, replace it with “\begin{quotation}text\end{quotation}” in the normal style.

At the moment the best I can do, adding a couple of lines to iamtotal’s good work above, is:


tell application "Microsoft Word"
	
	set curly to false -- change to true if curly quotes desired
	set wasSmartQuotes to auto format as you type replace quotes of settings
	set auto format as you type replace quotes of settings to curly
	set myFind to find object of selection
	tell myFind
		clear formatting myFind
		-- replace en dashes, curly quotes and ellipses with their LaTeX equivalents
		execute find find text "–" replace with "--" replace replace all
		execute find find text "‘" replace with "`" replace replace all
		execute find find text "’" replace with "'" replace replace all
		execute find find text "“" replace with "``" replace replace all
		execute find find text "”" replace with "''" replace replace all
		execute find find text "…" replace with "\\ldots\\" replace replace all
		execute find find text "..." replace with "\\ldots\\" replace replace all
		
		
		-- format italic
		clear formatting
		set italic of font object to true
		execute find find text "" replace with "\\emph{^&}" replace replace all
		set italic of font object to false
		

		-- This is what I've added so far
		-- mark-up section headings
		clear formatting
		set style of format to style heading2
		execute find find text "" replace with "\\section{^&}" replace replace all
		
		clear formatting
		set style of format to style heading3
		execute find find text "" replace with "\\subsection{^&}" replace replace all
		
		
		
		-- format bold (commented out because I don't want it to interfere with the bold in headings)
		--clear formatting
		--set bold of font object to true
		--execute find find text "" replace with "\\textbf{^&}" replace replace all
		--set bold of font object to false
		
		
	end tell
	set auto format as you type replace quotes of settings to wasSmartQuotes
end tell

So there are two main problems I have at the moment.

  1. This finds “text” in the ‘Heading 2’ and ‘Heading 3’ styles because I don’t know how to specify my own Word styles. When I try

		-- mark-up section headings
		clear formatting
		set style of format to style section
		execute find find text "" replace with "\\section{^&}" replace replace all

I get an error message. I want the find/replace to work on my own Word styles.

  2. This doesn’t take the found italicized text out of italics. This is important to me because I want to be able to run the same script several times on the same document as I revise and add to it; as it stands it will put markup on an already marked-up bit of italics, so that I end up with several layers of nested \emph{}. The same applies to taking text out of the ‘section’, ‘subsection’, ‘quote’ and ‘quotation’ styles.

Any help with this would be greatly appreciated.

I am part of the way towards solving my own problem here, though it doesn’t help much with the priority issues of the original post.

Here is what I can now do:

  • Find “text” in the ‘section’ style and replace with “\section{text}”

  • Find “text” in the ‘subsection’ style, replace it with “\subsection{text}”

  • Find “text” in the ‘quote’ style, replace it with “\begin{quote}text\end{quote}”

  • Find “text” in the ‘quotation’ style, replace it with “\begin{quotation}text\end{quotation}”

But I can’t yet do the replacements in normal style. The replacement text is still in the same style in which it was found.

I’ll update if I make any progress with this.


tell application "Microsoft Word"
	
	set curly to false -- change to true if curly quotes desired
	set wasSmartQuotes to auto format as you type replace quotes of settings
	set auto format as you type replace quotes of settings to curly
	set myFind to find object of selection
	tell myFind
		clear formatting myFind
		-- replace en dashes, curly quotes and ellipses with their LaTeX equivalents
		execute find find text "–" replace with "--" replace replace all
		execute find find text "‘" replace with "`" replace replace all
		execute find find text "’" replace with "'" replace replace all
		execute find find text "“" replace with "``" replace replace all
		execute find find text "”" replace with "''" replace replace all
		execute find find text "…" replace with "\\ldots\\" replace replace all
		execute find find text "..." replace with "\\ldots\\" replace replace all
		
		
		-- format italic
		clear formatting
		set italic of font object to true
		execute find find text "" replace with "\\emph{^&}" replace replace all
		set italic of font object to false
		
		-- mark-up section headings
		clear formatting
		set style of format to "section"
		execute find find text "" replace with "\\section{^&}" replace replace all
		
		clear formatting
		set style of format to "subsection"
		execute find find text "" replace with "\\subsection{^&}" replace replace all
		
		clear formatting
		set style of format to "subsubsection"
		execute find find text "" replace with "\\subsubsection{^&}" replace replace all
		
		-- mark-up quotes and quotations
		clear formatting
		set style of format to "quote"
		execute find find text "" replace with "\\begin{quote}^&\\end{quote}" replace replace all
		
		clear formatting
		set style of format to "quotation"
		execute find find text "" replace with "\\begin{quotation}^&\\end{quotation}" replace replace all
		
		
		-- format bold
		--clear formatting
		--set bold of font object to true
		--execute find find text "" replace with "\\textbf{^&}" replace replace all
		--set bold of font object to false
		
		
	end tell
	set auto format as you type replace quotes of settings to wasSmartQuotes
end tell

Update.

I’ve now fixed the script to do everything I wanted. It’s possible to run and re-run the script on the same document without worrying about marking up the same piece of text twice.

One warning though: the ‘quote’ style find/replace is a bit hit and miss for no reason I can fathom. I think it has to do with a built-in Word style called ‘Quote’. Working with a user style called ‘latexquote’ instead seems to get around this sometimes; sometimes the latexquote-styled text isn’t marked up (but still taken out of style). If you use this, keep an eye on the ‘latexquote’ style.

(If this seems inefficient: AppleScript didn’t like handling ‘replacement’ inside a ‘tell myFind’ block, so I ended up repeating ‘myFind’ a lot to get around this.)

Script as follows:


tell application "Microsoft Word"
	
	set curly to false -- change to true if curly quotes desired
	set wasSmartQuotes to auto format as you type replace quotes of settings
	set auto format as you type replace quotes of settings to curly
	set myFind to find object of selection
	tell myFind
		clear formatting myFind
		-- replace en dashes, quotation marks and ellipses with their LaTeX equivalents
		execute find find text "–" replace with "--" replace replace all
		execute find find text "‘" replace with "`" replace replace all
		execute find find text "’" replace with "'" replace replace all
		execute find find text "“" replace with "``" replace replace all
		execute find find text "”" replace with "''" replace replace all
		execute find find text "…" replace with "\\ldots\\" replace replace all
		execute find find text "..." replace with "\\ldots\\" replace replace all
	end tell
	
	
	-- mark up italics and take out of italics
	clear formatting of myFind
	set italic of font object of myFind to true
	set content of myFind to ""
	clear formatting replacement of myFind
	set content of replacement of myFind to "\\emph{^&}"
	set italic of font object of replacement of myFind to false
	execute find myFind replace replace all
	
	
	-- mark-up section, subsection, subsubsection headings
	clear formatting of myFind
	set style of myFind to "section"
	set content of myFind to ""
	clear formatting replacement of myFind
	set content of replacement of myFind to "\\section{^&}"
	set style of replacement of myFind to "normal"
	execute find myFind replace replace all
	
	clear formatting of myFind
	set style of myFind to "subsection"
	set content of myFind to ""
	clear formatting replacement of myFind
	set content of replacement of myFind to "\\subsection{^&}"
	set style of replacement of myFind to "normal"
	execute find myFind replace replace all
	
	clear formatting of myFind
	set style of myFind to "subsubsection"
	set content of myFind to ""
	clear formatting replacement of myFind
	set content of replacement of myFind to "\\subsubsection{^&}"
	set style of replacement of myFind to "normal"
	execute find myFind replace replace all
	
	
	-- mark-up block quotes and quotations
	clear formatting of myFind
	set style of myFind to "latexquote" -- wanted to use "quote" as my own style but it seems to interfere with a built-in style of the same name
	set content of myFind to ""
	clear formatting replacement of myFind
	set content of replacement of myFind to "\\begin{quote}^&\\end{quote}"
	set style of replacement of myFind to "normal"
	execute find myFind replace replace all
	
	clear formatting of myFind
	set style of myFind to "quotation"
	set content of myFind to ""
	clear formatting replacement of myFind
	set content of replacement of myFind to "\\begin{quotation}^&\\end{quotation}"
	set style of replacement of myFind to "normal"
	execute find myFind replace replace all
	
	
	set auto format as you type replace quotes of settings to wasSmartQuotes
end tell
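The five near-identical style blocks in the script above could probably be folded into one handler. The following is an untested sketch that reuses only the Word commands already in that script. The handler name markUpStyle and its parameters are made up for illustration, and the styles ‘section’, ‘latexquote’ and so on are assumed to exist in the document.

```applescript
-- Hypothetical helper: find all text in wordStyle, replace it with texReplacement
-- (where ^& stands for the found text), and put the result in the "normal" style.
on markUpStyle(wordStyle, texReplacement)
	tell application "Microsoft Word"
		set myFind to find object of selection
		clear formatting of myFind
		set style of myFind to wordStyle
		set content of myFind to ""
		clear formatting replacement of myFind
		set content of replacement of myFind to texReplacement
		set style of replacement of myFind to "normal"
		execute find myFind replace replace all
	end tell
end markUpStyle

markUpStyle("section", "\\section{^&}")
markUpStyle("subsection", "\\subsection{^&}")
markUpStyle("subsubsection", "\\subsubsection{^&}")
markUpStyle("latexquote", "\\begin{quote}^&\\end{quote}")
markUpStyle("quotation", "\\begin{quotation}^&\\end{quotation}")
```

Because the replacement is put in the ‘normal’ style, a second run should not find the same text again, which is the same re-run behaviour as in the longer script.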

From Microsoft Word Doc to XeLaTeX

I was searching for something similar and I thought I would post a handler I just made to convert Microsoft Word heading styles to TeX sections. I hope this helps someone! It is quite flexible, because it supports custom heading names and custom LaTeX commands too!


texifyHeading("Heading 1", "section")
texifyHeading("Heading 2", "subsection")
texifyHeading("Heading 3", "subsubsection")

on texifyHeading(word_style, tex_style)
	-- This handler finds a heading in Word and replaces it with LaTeX code. It does not use the
	-- replace part of Word's find, because (to my knowledge) that cannot remove the trailing
	-- paragraph character. Instead, it temporarily stores the content of the found text (the
	-- selection), splits off the trailing paragraph character using text item delimiters, and
	-- then sets the content of the selection to the LaTeX command with the closing "}".
	-- Note that "execute find" runs once per call, so each call converts one matching heading.
	tell application "Microsoft Word"
		set myFind to find object of selection
		-- mark-up section, subsection, subsubsection headings
		clear formatting of myFind
		set forward of myFind to true
		set wrap of myFind to find continue
		set style of myFind to word_style --of active document
		set content of myFind to ""
		execute find myFind
		if found of myFind is true then
			set orig_heading to (get content of text object of selection)
			set orig_delims to AppleScript's text item delimiters
			set AppleScript's text item delimiters to ASCII character 13 -- (a carriage return)
			set orig_heading_items to text items of orig_heading
			set sec_text to item 1 of orig_heading_items
			set AppleScript's text item delimiters to orig_delims
			set content of text object of selection to "\\" & tex_style & "{" & sec_text & "}" & (ASCII character 13)
			set style of text object of selection to "normal"
			set bold of text object of selection to false
		end if
	end tell
end texifyHeading
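The string handling inside that handler can be tried on its own, without Word. Here is a minimal sketch, assuming the heading text arrives as a plain AppleScript string that ends in a carriage return. The handler name wrapInTexCommand is made up for illustration.

```applescript
-- Hypothetical helper: drop the trailing paragraph character from headingText
-- and wrap what is left in \texCommand{...}, followed by a fresh carriage return.
on wrapInTexCommand(headingText, texCommand)
	set paragraphMark to character id 13 -- a carriage return
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to paragraphMark
	set headingLine to text item 1 of headingText
	set AppleScript's text item delimiters to savedDelims
	return "\\" & texCommand & "{" & headingLine & "}" & paragraphMark
end wrapInTexCommand

wrapInTexCommand("Introduction" & (character id 13), "section")
--> "\section{Introduction}" followed by a carriage return
```

Taking only the first text item is what keeps the closing brace on the same line as the heading text.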

Model: MacBook Air 13" 2012
AppleScript: 2.2.1
Browser: Firefox 9.0.1
Operating System: Mac OS X (10.7)