Using pandoc to programmatically convert LaTeX to plain text

By: andybuckley on Jun 10th, 2013  |  syntax: Python

def detex(tex):
    """Use pandoc (if available) to convert LaTeX text strings to
    plain text, e.g. for printing to the terminal. Taken from original
    code in Rivet (http://rivet.hepforge.org), hence the particle
    physics bias in the macros used to do the replacements!

    TODO: Maybe group many strings to be processed together, to
    save on system call / pandoc startup?
    """
    import subprocess
    # shutil.which replaces distutils.spawn.find_executable
    # (distutils was removed from the standard library in Python 3.12)
    from shutil import which

    # Without pandoc there is nothing to do: return the input unchanged
    if not which("pandoc"):
        return tex

    # Macro definitions prepended to the input so that pandoc expands
    # common LaTeX commands into ASCII approximations
    texheader = r"""
    \newcommand{\text}[1]{#1}
    \newcommand{\ensuremath}[1]{#1}
    \newcommand{\emph}[1]{_#1_}
    \newcommand{\textrm}[1]{#1}
    \newcommand{\textit}[1]{_#1_}
    \newcommand{\textbf}[1]{*#1*}
    \newcommand{\mathrm}[1]{#1}
    \newcommand{\mathit}[1]{_#1_}
    \newcommand{\mathbf}[1]{*#1*}
    \newcommand{\bm}[1]{*#1*}
    \newcommand{\frac}[2]{#1/#2}
    \newcommand{\sqrt}[1]{sqrt(#1)}
    \newcommand{\hat}[1]{#1hat}
    \newcommand{\bar}[1]{#1bar}
    \newcommand{\d}[1]{d#1}
    \newcommand{\degree}{^\circ }
    \newcommand{\infty}{oo }
    \newcommand{\exp}{exp }
    \newcommand{\log}{log }
    \newcommand{\ln}{ln }
    \newcommand{\sin}{sin }
    \newcommand{\cos}{cos }
    \newcommand{\tan}{tan }
    \newcommand{\sinh}{sinh }
    \newcommand{\cosh}{cosh }
    \newcommand{\tanh}{tanh }
    \newcommand{\ell}{l}
    \newcommand{\varphi}{\phi}
    \newcommand{\varepsilon}{\epsilon}
    \newcommand{\sim}{~}
    \newcommand{\lesssim}{<~ }
    \newcommand{\gtrsim}{>~ }
    \newcommand{\neq}{!= }
    \newcommand{\ge}{>= }
    \newcommand{\gg}{>> }
    \newcommand{\le}{<= }
    \newcommand{\ll}{<< }
    \newcommand{\pm}{+- }
    \newcommand{\mp}{-+ }
    \newcommand{\times}{x }
    \newcommand{\cdot}{. }
    \newcommand{\langle}{<}
    \newcommand{\rangle}{>}
    \newcommand{\gets}{<- }
    \newcommand{\to}{-> }
    \newcommand{\leftarrow}{<- }
    \newcommand{\rightarrow}{-> }
    \newcommand{\leftrightarrow}{<-> }
    \newcommand{\Leftarrow}{<= }
    \newcommand{\Rightarrow}{=> }
    \newcommand{\Leftrightarrow}{<=> }
    \newcommand{\left}{}
    \newcommand{\right}{}
    \newcommand{\!}{}
    \newcommand{\/}{}
    \newcommand{\rm}{}
    \newcommand{\it}{}
    \newcommand{\,}{ }
    \newcommand{\;}{ }
    \newcommand{\ }{ }
    \newcommand{\unit}[2]{#1 #2}
    \newcommand{\pT}{pT }
    \newcommand{\perp}{T}
    \newcommand{\MeV}{MeV }
    \newcommand{\GeV}{GeV }
    \newcommand{\TeV}{TeV }
    """

    # Newer pandoc releases spell the option --wrap=none; older ones
    # (as in the original paste) used --no-wrap instead
    p = subprocess.Popen(["pandoc", "-f", "latex", "-t", "plain", "--wrap=none"],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # Pipes carry bytes in Python 3, so encode on the way in and decode on the way out
    plain, _ = p.communicate((texheader + tex).replace("\n", "").encode("utf-8"))
    plain = plain.decode("utf-8")
    plain = plain.replace(r"\&", "&")
    # TODO: Replace \gamma, \mu, \tau, \Upsilon, \rho, \psi, \pi, \eta, \Delta, \Omega, \omega?
    # TODO: Replace e^+- -> e+-?
    # Drop a single trailing newline; endswith() is also safe on empty output
    return plain[:-1] if plain.endswith("\n") else plain
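
The two replacement TODOs at the end of detex could be handled by a small post-processing step along the following lines. This is only a sketch, not part of the original Rivet code: the names GREEK, spell_greek and drop_charge_carets are hypothetical, and it assumes that unexpanded macros reach the plain-text output with their backslash intact, as \int does in the usage example.

```python
import re

# Greek-letter macros listed in the TODO; the tuple name is hypothetical
GREEK = ("gamma", "mu", "tau", "Upsilon", "rho", "psi", "pi", "eta",
         "Delta", "Omega", "omega")

# Match \name only when it is not followed by another letter, so that a
# longer macro such as \pion is left untouched
_GREEK_RE = re.compile(r"\\(%s)(?![A-Za-z])" % "|".join(GREEK))

# Match a caret followed by a run of charge signs, e.g. ^+ or ^+-
_CHARGE_RE = re.compile(r"\^([+-]+)")


def spell_greek(plain):
    """Replace \\gamma, \\mu, ... by the bare letter names."""
    return _GREEK_RE.sub(r"\1", plain)


def drop_charge_carets(plain):
    """Turn e^+- into e+- by removing the caret before charge signs."""
    return _CHARGE_RE.sub(r"\1", plain)
```

Both helpers are pure string functions, so they could be applied to plain just before the return statement, or to the result of detex by the caller.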

# print(detex(r"Foo \! $\int \text{bar} \d{x} \sim \; \frac{1}{3} \neq \emph{foo}$ \to \gg bar"))
# => 'Foo \int bar dx ~ 1/3 != _foo_ -> >> bar'
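
The docstring TODO about grouping many strings into one pandoc call could be approached roughly as below. This is a sketch under assumptions rather than a tested part of the original: detex_many and the DETEXSEP marker are hypothetical names, the marker is assumed not to occur in any input and to pass through the conversion unchanged, and an input that leaves math mode open would bleed into its neighbours.

```python
def detex_many(texs, convert, sep="DETEXSEP"):
    """Convert several LaTeX strings with a single call to convert.

    convert is any str -> str function, for example detex.
    sep is a hypothetical marker word used to find the pieces again.
    """
    texs = list(texs)
    if not texs:
        return []
    for t in texs:
        if sep in t:
            raise ValueError("separator %r occurs in the input" % sep)

    # One conversion (one pandoc startup) for the whole batch
    plain = convert((" %s " % sep).join(texs))

    # Split on the marker; surrounding whitespace of each piece is dropped
    parts = [part.strip() for part in plain.split(sep)]
    if len(parts) != len(texs):
        # The marker did not survive intact: fall back to one call per string
        return [convert(t) for t in texs]
    return parts
```

With the function above this would be called as detex_many(list_of_strings, detex). Passing the converter as an argument keeps the batching logic independent of pandoc, at the cost of stripping leading and trailing whitespace from each result.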