How to properly use PyPlots in LaTeX

Here's another one in the series “How to waste time with LaTeX, pt. $\infty+1$”. Today's mission: embed plots made in Python into a LaTeX document, keep them scalable, and keep fonts and styling consistent throughout the entire document.

That sounds very easy, right? Well, I soon found out it isn’t.


I've done most of my development work and analyses in Python, so there are already quite a few plots I'd like to just export. Plotting is done with Matplotlib's PyPlot, which even provides a PGF backend. How great is that! You can simply save the plots as PGF files and include them like any other graphic (provided the required packages are loaded). Works great, looks good and is fully supported by PyPlot.

Okay, so where's the problem? I'd like to scale images from within LaTeX. That's where it gets tricky: even though a PGF file is only a textual description of the image, scaling it scales the entire thing, including the fonts. You'll find loads of tricks and hacks to compensate for that, but I found them quite error-prone.

Oh, and one more thing: if you don't need anything too complex, it might be easier to use the vanilla PGF backend and hardcode the size of your graphic, plus all font sizes and styles, exactly as you want them. Everything below only happened because I want LaTeX to do that!

Save PyPlots to TikZ

Here's what I came up with. It's still a work in progress, but it works well for now and the results are very acceptable.

First, let's look at the setup. I have all my LaTeX stuff in one folder, with the subdirectories build (who wants to spam the source folder…) and graphics. Within the latter, there's a data directory with files full of numbers of all sorts (csv-ish) and a scripts directory containing small scripts that “make” plots. Obviously I don't want to run all these scripts by hand, and, as stated in the introduction, I want consistent styling. Both are handled by a wrapper script plus a small “library”.

from os import listdir
import os.path

from matplotlib import pyplot as plt
from matplotlib import style

# that helps you import a module from a variable
from importlib import import_module

# magic library
from matplotlib2tikz import save as tikz_save

scriptfolder = './scripts'
graphicsfolder = './'
datafolder = './data'

# modules in the scripts folder that are not plot scripts (pltutils is the library below)
excluded = {'__init__.py', 'pltutils.py'}

pltscript_names = [os.path.splitext(f)[0] for f in sorted(listdir(scriptfolder))
                   if f.endswith('.py') and f not in excluded]
pltscripts = [os.path.join(scriptfolder, f) for f in pltscript_names]

for name, ps in zip(pltscript_names, pltscripts):
    print('Creating', name, '(', ps, ')')

    fig = plt.figure()  # every script gets a fresh, empty figure
    # you could apply a default styling here (e.g. via style.use)

    # "convert" the file path to a python module name: ./scripts/foo -> scripts.foo
    psmodule = import_module(ps.replace('./', '').replace('/', '.'))

    # here's where the actual plotting happens
    psmodule.plot(fig, plt, os.path.abspath(datafolder))
    # fix boundaries

    plt.savefig(os.path.join(graphicsfolder, name + '.png'))
    tikz_save(os.path.join(graphicsfolder, name + '.tikz'),
              figureheight='\\figureheight',
              figurewidth='\\figurewidth')
    plt.close(fig)  # release the figure before the next script runs

So what does that do? It gathers all Python files in the scripts folder and cycles through them, importing each one as a module, calling its plot function and saving the graphic as *.tikz as well as *.png (the latter just so I can scroll through the folder and look at all of them). You might have noticed the matplotlib2tikz module. That's most of the magic I promised: internally it iterates over all objects in the plot and translates them into tikzpicture code. You'll see later what the height/width arguments are about.
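The path-to-module conversion in the wrapper relies on every path starting with `./` and using forward slashes (on Windows, os.path.join produces backslashes). A slightly sturdier sketch, using pathlib and a hypothetical helper called `module_name` (my own name, not part of any library), could take the place of the two `replace` calls:

```python
from pathlib import Path


def module_name(script_path, root="."):
    """Dotted module name for a plot script, e.g. './scripts/foo.py' -> 'scripts.foo'.

    The path is made relative to `root` (the folder the wrapper runs from),
    the '.py' suffix is dropped and the remaining path parts are joined with dots.
    Raises ValueError if the script does not live below `root`.
    """
    relative = Path(script_path).resolve().relative_to(Path(root).resolve())
    return ".".join(relative.with_suffix("").parts)
```

In the wrapper loop this would read `psmodule = import_module(module_name(ps))`; it accepts paths with or without the `.py` suffix.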

Here's my small library that the plot scripts use. Throughout my thesis I refer to four classes of “things” and compare them. Their plot styles should obviously be the same in every figure, so that eventually I could drop the legend on each individual figure. The convention (for now) is one line style per “thing” and one marker per metric (be it Precision, Recall, F1-Score, …; not implemented yet). I can easily extend the style for my “things”. Here's the library; there will be more in the future.

defaultstyle = {
        'c': 'black'
}
styles = {
    'title': {
        'ls': '-'
    },
    'author': {
        'ls': '--'
    },
    'date': {
        'ls': '-.'
    },
    'unassigned': {
        'ls': ':'
    }
}

def get_style(field):
    style = defaultstyle.copy()
    # field-specific settings (line style) override the defaults
    style.update(styles.get(field, {}))
    return style

def quadplot(plt, x, ys):
    for field, y in zip(['Title','Author','Date','Unassigned'], ys):
        plt.plot(x, y, label=field, **get_style(field.lower()))

def add_grid(plt, axis='y'):
    """
    :param axis: can be 'both', 'x', or 'y' (default) to control which set of gridlines are drawn.
    """
    plt.grid(color='darkgrey', linestyle=':', axis=axis)

That's straightforward, I think. You'll see it in action in a sample plot script:

import pandas as pd
import os.path
import scripts.pltutils as pltutils

class Plot:
    def info(self):
        print('Eval Reduction Size with class weights')

def plot(fig, plt, datafolder):
    frm = pd.read_csv(os.path.join(datafolder, 'eval_reduction_size_weights'), sep='\t')

    pltutils.quadplot(plt,
                      list(frm['reduction']), [list(frm['title.2']), list(frm['author.2']),
                                               list(frm['date.2']), list(frm['unassigned.2'])])
    pltutils.add_grid(plt, axis='both')
    plt.legend()  # deliberately without parameters: LaTeX positions the legend

Standard PyPlot stuff, which I guess doesn't need further explanation. Maybe one thing: note that the legend gets no parameters. That's intentional: the positioning is done in LaTeX, since PyPlot offers no simple way to put the legend outside the plot. Okay, I hear you screaming… Yes, it can be done, but only by fiddling with funny offsets (bbox_to_anchor and friends). Try it; it won't work as expected in LaTeX. When I now run the wrapper script, it executes this and all the other plot scripts, and each of them produces graphics named after the script.

Okay, let's move on to the LaTeX side of things. Or should I say LuaLaTeX? Sorry, I should have mentioned that earlier: I use LuaLaTeX. I think it might work in plain LaTeX too (with some adjustments). In my preamble I have this:

% pgf image stuff
% this makes x/y ticks a bit smaller
\pgfplotsset{every tick label/.append style={font=\footnotesize}}
% position the legends
\pgfplotsset{legend pos=outer north east}
% make gridlines thinner
\pgfplotsset{major grid style={loosely dotted, thin, gray}}

% some pgfkey magic, so I can use named params
% for help, check this:
\pgfkeys{
  /includetikzgraphic/.is family, /includetikzgraphic,
  default/.style = {
    width = \textwidth,
    ratio = 0.61803398875},
  width/.estore in = \itgWidth,
  ratio/.estore in = \itgRatio
}
% remember these from Python?
\newlength\figureheight
\newlength\figurewidth

% make a command, that works like \includegraphics
\newcommand{\includetikzgraphic}[2][]{%
  \pgfkeys{/includetikzgraphic, default, #1}%
  \setlength{\figurewidth}{\itgWidth}%
  \setlength{\figureheight}{\itgRatio\figurewidth}%
  \input{#2}%
}

The comments should explain most of what happens. Height and width are now set dynamically: the width defaults to \textwidth, and the height is the width multiplied by the ratio, which defaults to 0.618, the reciprocal of the golden ratio (so width : height ≈ 1.618 : 1). Both can be changed for individual figures through the optional parameter. Let's put it in action and see how it works:

  \caption{A tikzpicture plot with Python}

Ahhhh, great. All the dirty stuff is left behind, and from within the document it's hardly noticeable what happened earlier. Here's the result:

[Screenshot: the resulting plot in the compiled document, 2016-07-28]

Possible Improvements

Not everything you do in PyPlot is translated (properly) into TikZ, and some things can't be translated at all. Conversely, PyPlot can't do everything TikZ can. Just compare the variety of line styles TikZ offers (Section 4.7.2) with PyPlot's four named line styles. I'm tempted to find some way to make use of this. After all, the goal here is not to match PyPlot exactly; it's to have LaTeX handle all the rendering!

Also, the LaTeX preamble code is messy; it needs cleanup, and it could use more features. For now it's fine, though. The legend also seems to mess with the boundaries used for the width, but since I'm dropping the legends eventually anyway, I don't care.


Okay, I hope you found some of the information helpful. Maybe you want to share your “plot p(y|i)peline” or suggest improvements to what I presented above.

Thanks for reading.

Using BibLaTeX in Texmaker

Just the other day I tried using BibLaTeX instead of BibTeX because I had heard good things about it. My primary motivation was that citing web resources seemed much easier.

The way it is supposed to work is quite straightforward: you include the biblatex package and tell it which backend to use (e.g. Biber or BibTeX); with the BibTeX backend it creates an auxiliary bibliography, which is then referenced in the aux file of your master document.

However, there were two things I really wanted to hold on to: I love Texmaker, so obviously I want the quick build button to work as usual. Furthermore, I hate clutter in my working directory, so a separate build directory is key.

Assume the following example files: master.tex, where all my packages are loaded and usually a lot of other stuff happens (the actual content of my work is usually spread across separate .tex files), and the trusty bibliography file, in this case called bib.bib.

Using the default quick build (pdflatex + bibtex + pdflatex + pdflatex + view) did not work. After some troubleshooting I figured out where the problem lies. Remember I mentioned that biblatex references the auxiliary bibliography file (master-blx.bib) from master.aux? Those lines look something like this:


The bibliography data bibtex looks for consists of the original bibliography and the auxiliary bibliography generated by BibLaTeX. The master-blx.bib sits nicely in the build directory, as it should, but bibtex looks for both files in the same directory. After skimming through a bunch of pages of the package documentation, I found no way to change this reference. Nor does bibtex have a command-line option to specify other locations to search. Part of the solution is to change what we can control: it's possible to use \bibliography{../bib.bib} instead of \bibliography{bib}. If you do that, you need to run the bibtex step from within the build directory.

Problem solved, right? Not quite. Texmaker rejected all my attempts to customize the bibtex command, whether with environment variables, cd commands or anything else I tried. What did end up working was writing a shell script that does what I want, registering it as a custom command, and setting the quick build to run that command and then view the resulting PDF, as shown in the screenshots.

[Screenshots: Texmaker custom command and quick build settings]

The shell script (

#!/bin/sh
pdflatex -output-directory=build -synctex=1 -interaction=nonstopmode master.tex
# bibtex runs inside build/ so it finds master-blx.bib there and ../bib.bib in the project
(cd build && bibtex master.aux)
pdflatex -output-directory=build -synctex=1 -interaction=nonstopmode master.tex
pdflatex -output-directory=build -synctex=1 -interaction=nonstopmode master.tex

The latex document (master.tex):




LaTeX is a high-quality typesetting system \cite{latex}.


The bibliography file with a web reference (bib.bib):

    author = {LaTeX project team},
    title  = {LaTeX - A document preparation system},
    publisher={Frank Mittelbach},
    urldate   = {2016-02-13},
    url    = {}

That's it! A further thought: place the script somewhere else and point the command to its absolute path, which would make it project-independent. The script would then have to take parameters (the master file name and the absolute path to the project directory), which you could pass using Texmaker's special characters.
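To make that idea a bit more concrete, here is a minimal sketch of such a project-independent build script, written in Python rather than shell. The file name (say, texbuild.py) and the function names are my own hypothetical choices. It runs the same four steps as the shell script, with bibtex executed inside build/, and takes the master file name and the project directory as arguments:

```python
import argparse
import subprocess
import sys
from pathlib import Path

# Same pdflatex flags as in the shell script.
LATEX = ["pdflatex", "-output-directory=build", "-synctex=1", "-interaction=nonstopmode"]


def build_steps(master, project_dir):
    """(command, working directory) pairs: pdflatex, bibtex inside build/, pdflatex twice."""
    project = Path(project_dir).resolve()
    stem = Path(master).stem  # accepts "master.tex", "master" or a full path
    tex = stem + ".tex"
    return [
        (LATEX + [tex], project),
        (["bibtex", stem + ".aux"], project / "build"),  # bibtex must run inside build/
        (LATEX + [tex], project),
        (LATEX + [tex], project),
    ]


def make_parser():
    parser = argparse.ArgumentParser(description="pdflatex + bibtex build into <project>/build")
    parser.add_argument("master", help="master file name, e.g. master.tex")
    parser.add_argument("project_dir", help="absolute path to the project directory")
    return parser


def main(argv=None):
    args = make_parser().parse_args(argv)
    # the output directory has to exist before pdflatex writes into it
    (Path(args.project_dir) / "build").mkdir(exist_ok=True)
    status = 0
    for cmd, cwd in build_steps(args.master, args.project_dir):
        # like the shell script, keep going even if a step fails; report the worst exit code
        status = max(status, subprocess.run(cmd, cwd=str(cwd)).returncode)
    return status


if __name__ == "__main__":
    if len(sys.argv) > 1:
        sys.exit(main())
    make_parser().print_usage()
```

Texmaker's custom command would then call something like `python3 /absolute/path/texbuild.py master.tex /absolute/path/to/project`, with both arguments supplied through Texmaker's special characters.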

I hope that helps, thanks for reading!

PhantomJS in a Bash Loop

The problem I recently tackled was quite simple: I have a list of URLs (plus some additional information) in a tab-separated file (.tsv) and want to run a PhantomJS script for each line, with parameters taken from that line.

Initially it seemed obvious to just open the file and loop with while read -r -a line. That reads the next line of the file, splits it on tabs and gives me an array to use in the body of the loop. Works like a charm! However, as soon as I added my PhantomJS call, the loop stopped after the first iteration. After hours of testing different ways to read the file and trying to figure out what happens, it seemed clear that there simply was no line left to read once Phantom was done.

Just to show how messy it ended up:

# header of tsv file
#  0     1       5 
# url	id ... title   ...
{
  #skip header
  read -r
  # set separator to TAB
  IFS=$'\t'
  # read file line by line
  while read -r -a row
  do
    hash=$(echo -n "${row[0]}" | md5sum | awk '{ print $1 }')
    phantomjs phantomscript.js "${row[0]}" "${row[1]}" "${hash}" "${row[5]}"
    echo "(${COUNTER} => ${hash}) ${row[0]}"

    # after that, nothing else happens, script stops
    # (obviously there's one more attempt to read, but the result is empty)
  done
} < "${FILE}"

I'm not too subtle when it comes to this kind of stuff, so frustration kicked in. Just after I had started moving the file parsing and looping into my Phantom script, which would have meant lots of extra work, I decided to give it one more shot with a different approach, this time with no concern for elegance or good practice.

The idea was: instead of reading a file line by line, take a for loop and use sed to read one specific line at a time. The following example worked as expected:

NUM_LINES=$(wc -l < "${FILE}")
for (( COUNTER=2; COUNTER<=NUM_LINES; COUNTER++ ))
do
    LINE=$(sed "${COUNTER}q;d" "${FILE}")
    IFS=$'\t' read -r -a row <<< "${LINE}"
    hash=$(echo -n "${row[0]}" | md5sum | awk '{ print $1 }')
    phantomjs phantomscript.js "${row[0]}" "${row[1]}" "${hash}" "${row[5]}"

    echo "(${COUNTER} => ${hash})"
done

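I guess the key components are quite clear:
First, you need the range of the for loop (starting at 2 to skip the header). Second, get the current line with sed "NUMq;d" FILE, where NUM is the number of the line you want to fetch (counting starts at one, not zero): every earlier line is deleted by d without being printed, and at line NUM, q prints the line and stops sed.
To parse the line into an array, you can still use read -r -a as before, just slightly differently: IFS=$'\t' applies only to that single read, and the line is fed in through a here-string (<<<).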

One tip along the way: if you ever want to force-stop the script (Ctrl+C), you will notice that it continues anyway, because the signal is trapped by PhantomJS.
A simple fix is to set up a trap in your shell script yourself: trap "exit" INT.

Maybe one more: what if there was an error processing the page? You probably want to keep track of those cases to look at them later. Simply call phantom.exit(1) (or any other non-zero exit code) in your Phantom script and, within your loop, right after the phantomjs call, add the magic line (quoting ${LINE} keeps the tabs intact, so the failed file stays a valid TSV):
if [[ $? -ne 0 ]]; then echo "${LINE}" >> "${FAILED_FILE}"; fi
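For comparison, here is a minimal sketch of the same job in Python; `phantom_args` and `run_all` are hypothetical helper names, and the column layout (0, 1 and 5) and `phantomscript.js` are taken from the bash loops. It reads the TSV through its own file handle, so the child process never sees the file on its standard input. That may be exactly what went wrong in the first bash version: a common cause of a `while read` loop ending after one iteration is a command in the loop body that reads stdin, which there is the TSV file itself. I haven't verified that PhantomJS does this, but if it does, redirecting its input (`phantomjs phantomscript.js ... < /dev/null`) should rescue the original loop as well.

```python
import csv
import hashlib
import subprocess


def phantom_args(row, script="phantomscript.js"):
    """phantomjs command line for one TSV row: url, id, md5 of the url, title (columns 0, 1, 5)."""
    url = row[0]
    url_hash = hashlib.md5(url.encode("utf-8")).hexdigest()  # like: echo -n "$url" | md5sum
    return ["phantomjs", script, url, row[1], url_hash, row[5]]


def run_all(tsv_path, failed_path):
    """Run phantomjs once per data row; append rows whose run failed to failed_path."""
    with open(tsv_path, newline="") as tsv, open(failed_path, "a") as failed:
        reader = csv.reader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the header line
        for line_no, row in enumerate(reader, start=2):
            cmd = phantom_args(row)
            # stdin=DEVNULL: phantomjs gets no input at all, so it cannot swallow anything
            result = subprocess.run(cmd, stdin=subprocess.DEVNULL)
            print("({} => {}) {}".format(line_no, cmd[4], row[0]))
            if result.returncode != 0:  # e.g. phantom.exit(1) in the Phantom script
                failed.write("\t".join(row) + "\n")
```

As a side effect, Ctrl+C raises KeyboardInterrupt in Python, and subprocess.run kills the running child before re-raising it, so the loop should stop without an extra trap.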

I hope that helps someone somewhere out there.

Thanks for reading!

Berlin Triathlon 2015

Yesterday it was time for the first triathlon of the year. After very efficient winter training with the Swansea Uni triathlon club, I got very lazy and didn't swim until last Tuesday, just a few days before the race weekend. Surprisingly, that went quite okay… With only a few cycling kilometres and poor run training since the marathon, the sprint triathlon in Berlin Treptow was a big gamble.

Because of massive delays in the race schedule I had plenty of time to watch the other races and walk around to take a few photos.

[Photos: the last wave of Olympic-distance swimmers coming in from their second lap]

Here the last wave of Olympic distance starters came in from their second lap around the “Insel der Jugend”. To get a better perspective and some shade I walked over the bridge to the island.

[Photos: view in the swim direction; panorama towards the start line and the transition area]

In the first image you see the view in the swim direction; you might even spot the first buoy. Looking in the other direction, there's the start line (water start), and on the right, behind all the boats, the transition area.

[Photos: the last athletes making their way to the swim exit]

From the other side I got a nice view of the last athletes making their way to the swim exit. Fortunately, the water was still cold enough for wetsuits, even though it was very warm outside.

IMG 20150607 143955

For me the race started at 3 pm. The swim went surprisingly well, and I wasn't even close to being the last one out of the water, which, considering how little effort I put into training, is also a bit disappointing… Probably it's because I finally got the freestyle technique right and swam through without a break. Okay, to be honest, I had to stop once to clear my goggles, as they had completely fogged up.

The ride went smoothly as well. I rode the first two of the five laps alone and left everyone behind me. Only then did I find another guy of about my ability, and we took turns at the front (note: drafting was allowed). The last lap I rode alone again. All in all a steady performance, I guess. My Garmin said I averaged around 37.6 km/h, which is acceptable.

The run, the last of the three disciplines, was very bad. In the first half I couldn't really push the pace because I was constantly close to vomiting; a normally easy pace of ~5 min/km was the best I could manage. But for some reason that was more than enough to overtake even more competitors. On the way back to the finish things got better, and I managed to slowly increase the pace.

IMG 20150607 134352 IMG 20150607 135622

As always in these sprint-distance triathlons, the finish line shows up before you know it, and it's all over. After a quick shower I checked the result sheets, was surprised by how “good” I was, and found my name in the first half of the second sheet:

[Photos: the result sheets]


In numbers:

Rank overall | Rank men | Rank age group | Total | Swim (rank) | Bike (rank) | Run (rank) | | Finishers
72 | 68 | 8 | 1:20:03 | 0:18:03 (165) | 0:38:26 (51) | 0:23:34 (63) | 0:20:01 | 364
88 | 77 | 18 | 1:22:13 | 0:21:15 (346) | 0:36:28 (37) | 0:24:30 (55) | 0:20:32 | 376
65 | 58 | 8 | 1:20:23 | 0:19:16 (234) | 0:38:31 (30) | 0:22:36 (88) | 0:17:13 | 394
57 | 54 | 12 | 1:10:38 | 0:18:28 (158) | 0:31:07 (24) | 0:21:03 (33) | 0:15:13 | 292
64 | 59 | 7 | 1:20:56 | 0:19:21 (201) | 0:39:18 (37) | 0:22:17 (56) | 0:19:41 | 271

Age group: TM25 since 2014, TM20 before that.



IGA 2017 – Baustellentour im Juni

Hello world!