Chapter 7. All about axes


This chapter covers

Using multiple axes
Selecting plot ranges
Customizing tic marks and tic labels
Plotting time series with gnuplot

In this chapter, we finally come to coordinate axes and their labeling. Treating them last, after discussing plot styles and decorations, may seem surprising, given how critical well-labeled coordinate axes are to achieving an informative graph. On the other hand, gnuplot’s default behavior for axes-related options is perfectly adequate in almost all situations, so that explicit customization is rarely required.

One topic deserves special consideration, namely, the use of multiple axes on the same plot, and this is what we’ll discuss first. Then we move on and describe all the ways that axes and their labels can be customized. Lastly, we treat the special case when one axis (usually the x axis) represents time, in other words, when the plot shows a time series. Time series plots pose special challenges, since now the labels aren’t simply numeric. Instead, we need to worry about things such as the names of months and weekdays, potentially in different languages, too! This has long been a problem for gnuplot users, and so I’ll devote significant space to this application.

But first, let’s talk about multiple axes on the same plot.

7.1. Multiple axes

Gnuplot gives us the ability to plot graphs using two different coordinate systems within the same plot. Typically, these coordinate systems will share one axis (otherwise there’s no good reason to have them on the same plot), but they may also be entirely independent.

Plots involving two different y axes usually make the most sense when we want to compare, side by side, two data sets that have very different units. As a typical example, let’s study figure 7.1, which compares the average ice cream consumption (in some community) over consecutive four-week periods with the mean temperature during the same periods.^[1]

¹ This example was inspired by the “Ice Cream Consumption” story, found on the StatLib’s Data and Story Library (DASL) at http://lib.stat.cmu.edu/DASL/Datafiles/IceCream.html.

Figure 7.1. Using multiple axes on a plot to compare two different quantities side by side. (See listing 7.2 to find out how this plot was made.)

Figure 7.1 is a good example of why we might want to use multiple axes on a plot: the two quantities (ice cream consumption and temperatures) have a different nature, and are also numerically quite different. Yet, once we put them next to each other, the correlation becomes clear (not too surprisingly, in this example).

7.1.1. Terminology

As we’ve just seen, gnuplot can handle two sets of axes on a single plot. The consequence is that all commands and options to manipulate axes-related properties come in two versions—one for each set of axes. In this section, we summarize the naming conventions associated with these commands.

The primary coordinate system is usually plotted along the bottom and left borders of the graph. (This corresponds to the first coordinate system we introduced in section 6.2.) If the secondary system (second) is used, it’s plotted along the top and right borders.

By default, the secondary system isn’t used. Instead, the tic marks (but not the labels) of the primary system are mirrored on the opposite sides of the plot.

All options that modify aspects of the coordinate systems can be applied to any of the axes: either the x or the y axis, in either the primary or the secondary coordinate system. The actual commands and options are prefixed to indicate which specific axis a command should be applied to (see table 7.1). Omitting the prefix applies the option to all axes.

Table 7.1. Prefixes used to indicate the selected coordinate system

	Primary	Secondary
x axis	`x`	`x2`
y axis	`y`	`y2`

In the rest of this chapter, I’ll frequently discuss only one variant of any option—typically the one for the x axis of the primary coordinate system. It should be understood that everything applies to all other axes as well, just by selecting the appropriate prefix per table 7.1.

7.1.2. Plotting with two coordinate systems

The best way to understand how multiple coordinate systems are used in the same plot is through an example. Listing 7.1 shows the beginning of the data file from figure 7.1, and the complete set of commands used to generate the plot from the data file is in listing 7.2.

Listing 7.1. The beginning of the data file from figure 7.1

# Date         Consumption[g]    Temperature[Celsius]
1951-04-01     179.8              6.01
1951-04-29     180.8             13.34
1951-05-27     186.5             17.78
1951-06-24     202.1             19.38
1951-07-22     190.1             19.48
...

The first three lines of listing 7.2 (from set timefmt to set xdata) tell gnuplot how to parse and format the calendar date used along the x axis. We’ll discuss them in section 7.5 later in this chapter.

Next, we switch off the mirroring of the primary axis’s tic marks on the opposite (right) side of the plot (set ytics nomirror) and instead explicitly switch on tic marks for the secondary y axis (set y2tics).

We place axis labels on both y axes (set ylabel and set y2label). This step is crucial when using multiple axes, since otherwise the viewer has no way of figuring out which data set goes with which axis. We also move the key from its default location and change its appearance (see section 6.4 if you need a refresher on any of the options).

Finally comes the actual plot command. The only new elements here are the axes keywords and their arguments. These directives tell the plot command which combination of axes to use for each data set. For example, axes x1y2 means that the data should be plotted according to the primary x axis, but the secondary y axis.

There are four possible combinations of axes that can be used, and they can be selected using x1y1, x1y2, x2y1, and x2y2.

Listing 7.2. The commands used to generate figure 7.1

set timefmt "%Y-%m-%d"
set format x "%d%b%y"
set xdata time
set ytics nomirror  # Switch mirroring of primary system OFF
set y2tics          # Switch secondary system ON
set ylabel  "Mean Icecream Consumption per Head [Grams]"
set y2label "Mean Temperature [Celsius]"
set key top left reverse Left
plot ["1951-03-25":] "icecream" u 1:2 t "Icecream" axes x1y1 w linesp, \
     "" u 1:3 axes x1y2 t "Temperature" w linesp
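The set timefmt and set format x strings in listing 7.2 follow the conversion-specifier conventions of the C library's strptime() and strftime() functions. As a rough analogy only (this is Python's datetime module, not gnuplot's own parser), the following sketch applies the same two strings to the first dates of listing 7.1. It assumes the default C locale, in which %b yields English month abbreviations.

```python
from datetime import datetime

# Same conversion specifiers as in listing 7.2.
TIMEFMT = "%Y-%m-%d"  # input, cf. set timefmt "%Y-%m-%d"
OUTFMT = "%d%b%y"     # output, cf. set format x "%d%b%y"

def relabel(stamp: str) -> str:
    """Parse stamp with TIMEFMT, then re-format it with OUTFMT."""
    return datetime.strptime(stamp, TIMEFMT).strftime(OUTFMT)

for stamp in ["1951-04-01", "1951-04-29", "1951-05-27"]:
    print(stamp, "->", relabel(stamp))
```

Each ISO-style date from the data file comes out in the compact day-month-year form used for the x tic labels of figure 7.1.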

I hope this example convinced you that using multiple axes in gnuplot is really quite simple (we’ll also study a further example in section 7.4). A different question is whether you should do it.

7.1.3. Should you do it?

Multiple axes on a single plot are occasionally frowned upon, because they can easily be abused to manipulate the message of a graph. Look at figure 7.2. The middle panel shows both data sets drawn in a single coordinate system. We can see that both curves grow, but also that one grows more strongly than the other.

Figure 7.2. The malicious effect of not-to-scale graphs. The data in all three panels is the same, but the scales have been changed for both curves independently. The scale for the solid curve is always on the left, the scale for the dashed curve is on the right. Only in the middle panel are both curves drawn to the same scale.

In the other two panels, we show exactly the same data, but how different is the appearance! In the panel on the top, it seems as if both curves are almost identical, while in the panel at the bottom, one seems to be growing much more strongly than the other one. (Look closely—the seemingly strongly growing curve is the one that changed least in the middle panel.)

These dramatically different appearances have been achieved solely by manipulating the plot ranges for each curve individually. Being able to select different plot ranges for the same data on a single plot is what makes dual axes plots open to the kind of abuse you see in figure 7.2.
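The effect comes entirely from the linear mapping an axis applies to data values. The following Python sketch, using made-up numbers (a curve growing from 10 to 12) and two hypothetical y ranges, computes how much of the plot height that growth occupies under each range.

```python
def to_fraction(value: float, lo: float, hi: float) -> float:
    """Fractional height of value in a plot whose y range is [lo:hi] (0 = bottom, 1 = top)."""
    return (value - lo) / (hi - lo)

start, end = 10.0, 12.0  # hypothetical curve growing from 10 to 12

for lo, hi in [(0.0, 20.0), (9.5, 12.5)]:
    rise = to_fraction(end, lo, hi) - to_fraction(start, lo, hi)
    print(f"y range [{lo}:{hi}]: the curve climbs {rise:.0%} of the plot height")
```

With the wide range the curve climbs a tenth of the plot height; with the narrow one it climbs two thirds, although the data is identical.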

Also note that the figure gives no indication of which curve is plotted on which axis, making it impossible to determine the actual meaning of the graph!

I think dual axes plots have their use, in particular when we want to compare two quantities side by side that are entirely different in nature and are measured in different units. (In this case, we couldn’t even plot them “to scale” in a single coordinate system.) Yet, in all such graphs, care must be taken that the selected plot range is chosen suitably for the data at hand. I’ll have more to say about the effect of scales and plot ranges in chapter 14.

7.2. Selecting plot ranges

We’ve already encountered plot ranges in chapter 2, but only in a limited form as an inline specification to the plot command, looking something like this:

plot [-10:10][-2:2] sin(x)

The first pair of numbers in brackets sets the desired x range, while the second (optional) pair of numbers in brackets fixes the y range.

This is enough—most of the time. However, using this syntax, only the plot ranges of the primary coordinate system can be fixed, which is insufficient if we want to use multiple axes on the same plot. Also, the inline syntax doesn’t work well when attempting to change the plot range with the mouse.

The inline syntax is a shorthand for the family of _range commands. (Here and in the following, the underscore is intended as a placeholder for any one of the prefixes from table 7.1.) To adjust plot ranges for the primary and the secondary systems independently, we need to issue separate set _range commands using different prefixes. Listing 7.3 shows how this is being done.

The explicit set _range commands expect a pair of numbers enclosed in square brackets, similar to the syntax for inline range specifications. Besides providing explicit lower and upper boundaries, we can leave one or both of the numbers blank, in which case the corresponding value won’t be changed. Alternatively, we can supply a star (*), which indicates to gnuplot to turn on the autoscale feature for that particular value.

If autoscaling is active, gnuplot chooses the plot range so that all of the data (or function) is visible and then extends the plot range to the next full tic mark position. Turning on autoscaling for the independent variable (that is, for the x axis) isn’t meaningful unless a data file is being plotted, in which case the plot range is extended to the next full tic mark that includes all data points from the input file.^[2]
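The outward rounding to the next tic mark can be sketched in Python as follows. This is a simplified model, not gnuplot's actual autoscale code: it assumes the tic step is already known (gnuplot chooses the step itself) and ignores floating-point corner cases. The sample inputs are the extreme values visible in the rows of listing 7.1, with assumed tic steps of 5 and 10.

```python
import math

def extend_to_tics(data_min: float, data_max: float, step: float) -> tuple[float, float]:
    """Widen [data_min, data_max] outward to the nearest multiples of the tic step."""
    lo = math.floor(data_min / step) * step
    hi = math.ceil(data_max / step) * step
    return lo, hi

print(extend_to_tics(6.01, 19.48, 5.0))    # temperatures, assumed tic step 5
print(extend_to_tics(179.8, 202.1, 10.0))  # consumption, assumed tic step 10
```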

² There is also a way to read out the values chosen by the autoscale feature and use them for further computation. Check the standard gnuplot reference documentation on set _range writeback and set autoscale for more details.

Listing 7.3 shows some examples of the _range commands in action.

Listing 7.3. Examples for the syntax permissible when setting plot ranges

set xrange [-1:5]      # Explicit min and max
set xrange [:10]       # Leave min unaffected, set max explicitly
set yrange [3:*]       # Set min explicitly, use autoscaling for max
set yrange [:sqrt(2)]  # Numeric expressions are legal
set yrange [1:0]       # Inverted axes are possible

7.3. Tic marks

Tic marks are the subdivision markers placed onto the axes of a plot to indicate the scale of the graph. Only if tic marks are present can a viewer infer quantitative information from a graph. Suitably chosen tic marks are therefore of critical importance to any well-constructed graph. Fortunately, gnuplot handles tic marks really well on its own and we rarely need to customize tic mark generation and labeling. But for the few cases when we do have special requests, here’s how to do it.

Gnuplot distinguishes between major and minor tic marks. The difference is that major tic marks also carry a textual label (normally a number), while minor tic marks don’t. By default, only major tic marks are used, except for logarithmic axes, where both major and minor tic marks are drawn by default.

7.3.1. Major tic marks

We can control the appearance of major tic marks using the set xtics family of options. (The usual prefixes for different axes apply.) The command has the following synopsis:

set _tics [ axis | border ]
          [ [no]mirror ]
          [ in | out ]
          [ scale [ default | {flt:major} [,{flt:minor}] ] ]

          [ [no]rotate [by {flt:ang}] ]
          [ offset {pos:offset} | nooffset ]
          [ font "{str:name} [,{int:size}]" ]
          [ textcolor | tc {clr:color} ]

          [ add ]
          [ autofreq
            | {flt:incr}
            | {flt:start}, {flt:incr} [,{flt:end}]
            | ( ["{str:label}"] {flt:pos} [ 0 | 1 ]
                 [, ["{str:label}"] ... ] ) ]

By default, gnuplot draws tic marks on the border of the plot, and mirrors the primary system on the opposite side. Alternatively, tic marks can be drawn along the zero axis (for instance, set xtics axis will draw tic marks along the line of the plot where y equals 0). If the zero axis isn’t within the plot range, the tic marks will always be drawn along the border.

Mirroring of tic marks can be turned off. You probably want to do this when using different coordinate systems for the primary and secondary axes.

Usually, tic marks are drawn on the inside of the border (extending into the plot region), but they can be drawn toward the outside using the out option. This is useful in particular when the tic marks would interfere with the data.

The scale parameter controls the size of both major and minor tic marks. If no size for the minor tic marks is given explicitly, it’s set to half the size of the major marks. The size is given relative to the default size of 1.0 for major tic marks.

The text labels associated with major tic marks can be rotated using rotate and shifted using offset. If rotate is used without an explicit angle (in degrees), the labels are turned by 90 degrees counterclockwise. The shift given to offset can be specified in any of the five usual coordinate systems (see section 6.2). Text font and color can be selected in the usual fashion.

Finally, we can control where tic marks will be drawn. If we choose autofreq, gnuplot will automatically generate tic marks based on the plot range. Alternatively, we can provide an increment. Tic marks will be drawn at integer multiples of the increment. Or we can specify a start point, an increment, and (optionally) an endpoint.

Some examples will clarify:

# pi is predefined in gnuplot, so there is no need to define it
set xtics pi        # Draws tic marks at all integer multiples of pi: ..., -pi, 0, pi, 2*pi, ...
set xtics 1, pi     # Draws tic marks at 1, 1+pi, 1+2pi, ...
set xtics 0,0.1,1   # Draws tic marks at 0, 0.1, 0.2, ... 0.9, 1 only
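The placement rules behind these commands can be modeled in a few lines of Python. The function names are hypothetical, and the small tolerance EPS is an assumption of this sketch that guards against rounding error for increments such as 0.1, which have no exact binary representation.

```python
import math

EPS = 1e-9  # tolerance for steps like 0.1 that are inexact in binary floating point

def tics_start_incr_end(start: float, incr: float, end: float) -> list[float]:
    """Like 'set xtics start, incr, end': start, start+incr, ..., never beyond end."""
    count = math.floor((end - start) / incr + EPS)
    return [start + i * incr for i in range(count + 1)]

def tics_multiples(incr: float, lo: float, hi: float) -> list[float]:
    """Like 'set xtics incr': every integer multiple of incr inside the range [lo:hi]."""
    first = math.ceil(lo / incr - EPS)
    last = math.floor(hi / incr + EPS)
    return [k * incr for k in range(first, last + 1)]

print([round(t, 6) for t in tics_start_incr_end(0, 0.1, 1)])
print([round(t / math.pi) for t in tics_multiples(math.pi, -3 * math.pi, 3 * math.pi)])
```

Note that the multiples of the increment include zero and negative multiples whenever they fall inside the plot range.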

We can also provide a list of explicit labels and locations at which to draw tic marks. The list must be enclosed in regular parentheses, with list entries separated by commas. Each entry in the list consists of the text label for the tic mark (possibly empty), its location, and a third, optional parameter that indicates whether the tic mark should be drawn as a major or minor tic mark: 0 for major and 1 for minor.

Using the add keyword, we can apply additional tic marks, without clobbering previous settings. This can be very useful for adding tic marks for special values to otherwise autogenerated tics, like so:

set xtics autofreq
set xtics add ( "pi" 3.1415 )

These commands draw an additional tic mark at 3.1415, in addition to the automatically generated ones. Had we omitted the add keyword in the previous example, the second line would have clobbered the first, and the only tic mark would have been the one explicitly set at 3.1415.

7.3.2. Minor tic marks

Minor tic marks aren’t labeled, and are typically drawn smaller than the major tic marks. By default, minor tic marks are disabled for linear axes and enabled for logarithmic axes.

Minor tic marks can be switched on using the m_tics family of options, where the underscore is again used as a placeholder for any of the usual prefixes:

set m_tics [ {int:intervals} ]

The optional parameter counts the number of subintervals between major tic marks; the number of minor tic marks generated is one less than this number.
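A quick Python sketch of this counting rule, with a hypothetical helper name: five subintervals between two adjacent major tics produce four minor tics.

```python
def minor_tics(major_lo: float, major_hi: float, intervals: int) -> list[float]:
    """Minor tic positions strictly between two adjacent major tics (cf. set mxtics intervals)."""
    width = (major_hi - major_lo) / intervals
    return [major_lo + i * width for i in range(1, intervals)]

print(minor_tics(0.0, 1.0, 5))  # five subintervals -> four minor tics
```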

Minor tic marks are only drawn when there are regularly spaced major tic marks. If all major tics are individually placed, m_tics will have no effect. Minor tic marks can still be created manually, using set _tics.

7.3.3. Formatting the tic labels

We can change the formatting used for the labels placed at the major tic marks, using set format:

set format [ x|y|xy|x2|y2 ] [ "{str:format}" ]

The format can be chosen for each axis individually. Omitting the axis specifier will apply the format command to all axes at the same time. (Beware: this is a common mistake leading to often mysterious error messages!)

The format string is similar to the format string familiar from the printf() family of functions from the standard C library. In addition, gnuplot uses extra format (or conversion) specifiers, which are listed in table 7.2. These conversion specifiers only apply to numeric arguments; for date/time values, check section 7.5.

Table 7.2. Conversion specifiers understood by the gprintf( ... ) function and used to format numeric values for the set format command. See table 7.3 and table 7.4 for conversion specifiers for date and time values.

Conversion specifier	Description
`%f`	Floating point notation
`%e` or `%E`	Exponential notation, using ‘e’ or ‘E’ (respectively) to indicate exponent
`%g` or `%G`	Uses the shorter of `%f` and `%e` (or `%E`)
`%x` or `%X`	Hexadecimal representation
`%o` (lowercase only)	Octal representation
`%t`	Mantissa to base 10
`%l`	Mantissa to base of current logscale
`%s`	Mantissa to base of current logscale; scientific power (restricts power to multiple of 3)
`%T`	Power to base 10
`%L`	Power to base of current logscale
`%S`	Scientific power (restrict power to multiple of 3)
`%c`	Character replacement for scientific power, such as ‘k’ (kilo) for 1000, and so on
`%P`	Multiple of π

Table 7.3. Alphabetically sorted conversion specifiers for date/time information for the set format and set timefmt commands. See table 7.2 to format numeric values. See table 7.4 for a list sorted by topic.

	Available for ...
Conversion specifier	input: set timefmt	output: set format	Values	Description
`%a`			Sun, Mon, ...	Abbreviated day of week
`%A`			Sunday, Monday, ...	Full day of week
`%b`		(also %h)	Jan, Feb, ...	Abbreviated name of month (3 characters)
`%B`			January, February, ...	Full name of month
`%d`			01–31	Day of month (always two digits on output)
`%D`			e.g. “03/25/08”	Shorthand for “%m/%d/%y” (US date format)
`%H`			00–24	Hour—24-hour clock (always two digits on output)
`%I`			00–12	Hour—12-hour clock (always two digits)
`%j`			001–366	Day of year (always three digits on output)
`%k`			0–24	Hour—24-hour clock (one or two digits on output)
`%l`			0–12	Hour—12-hour clock (one or two digits)
`%m`			01–12	Month (always two digits on output)
`%M`			00–60	Minute (always two digits on output)
`%p`			“am”, “pm”	a.m./p.m. indicator
`%r`			e.g. “10:55:48 pm”	Shorthand for “%I:%M:%S %p” (US time format)
`%R`			e.g. “22:12”	Shorthand for “%H:%M” (24-hour clock time format without seconds)
`%s`			0–...	Unix epoch seconds (input only!)
`%S`			00–60	Seconds (always two digits on output)
`%T`			e.g. “22:12:48”	Shorthand for “%H:%M:%S” (24-hour clock with seconds)
`%U`			00–53	Week of the year (weeks starting on Sunday; always two digits)
`%w`			00–06	Day of the week (0=Sunday; always two digits)
`%W`			00–53	Week of the year (weeks starting on Monday; always two digits)
`%y`			00–99	Year (two-digit; always two digits on output)
`%Y`			0000–9999	Year (four-digit; always four digits on output)

Table 7.4. Conversion specifiers for date/time information for the set format and set timefmt commands, sorted by topic. See table 7.2 to format numeric values. See table 7.3 for a list sorted alphabetically by conversion specifier.

	Available for ...
Conversion specifier	input: set timefmt	output: set format	Values	Description
`%s`			0–...	Unix epoch seconds (input only)
`%S`			00–60	Seconds (always two digits on output)
`%M`			00–60	Minute (always two digits on output)
`%k`			0–24	Hour—24-hour clock (one or two digits on output)
`%H`			00–24	Hour—24-hour clock (always two digits on output)
`%l`			0–12	Hour—12-hour clock (one or two digits)
`%I`			00–12	Hour—12-hour clock (always two digits)
`%p`			“am”, “pm”	a.m./p.m. indicator
`%j`			001–366	Day of year (always three digits on output)
`%d`			01–31	Day of month (always two digits on output)
`%m`			01–12	Month (always two digits on output)
`%b`		(also %h)	Jan, Feb, ...	Abbreviated name of month (3 characters)
`%B`			January, February, ...	Full name of month
`%y`			00–99	Year (two-digit; always two digits on output)
`%Y`			0000–9999	Year (four-digit; always four digits on output)
`%w`			00–06	Day of the week (0=Sunday; always two digits)
`%a`			Sun, Mon, ...	Abbreviated day of week
`%A`			Sunday, Monday, ...	Full day of week
`%W`			00–53	Week of the year (weeks starting on Monday; always two digits)
`%U`			00–53	Week of the year (weeks starting on Sunday; always two digits)
`%R`			e.g. “22:12”	Shorthand for “%H:%M” (24-hour clock time format without seconds)
`%T`			e.g. “22:12:48”	Shorthand for “%H:%M:%S” (24-hour clock with seconds)
`%r`			e.g. “10:55:48 pm”	Shorthand for “%I:%M:%S %p” (US time format)
`%D`			e.g. “03/25/08”	Shorthand for “%m/%d/%y” (US date format)

If the % character is encountered in the format string, it’s interpreted as the beginning of a conversion specifier. It must be followed by one of the characters from table 7.2. Numeric modifiers may be inserted between the % and the conversion character: a plain number (as in %8f) sets the minimum field width, while a number after a period (as in %.3f) sets the precision. For instance, set format "%.3f" prints floating-point values with exactly three decimal places. (Check the documentation for the standard C library’s family of printf() functions for all possible format modifiers.)

The format string can also contain arbitrary characters, which are placed verbatim onto the plot. This makes it possible, for instance, to print the units (such as kg or cm) together with the numerical values.

Finally, providing an empty string as the format specifier to set format suppresses the tic labels, while the tic marks themselves are still drawn.

Let’s look at an interesting example (listing 7.4 and figure 7.3).

Figure 7.3. The graph generated using the commands in listing 7.4. Note the tic marks at multiples of π and the Greek letters used for the tic labels.

Listing 7.4. The commands used to generate figure 7.3

set terminal wxt enhanced
set xtics pi
set format x "%.0P{/Symbol p}"
plot [-3*pi:3*pi][-1:1] cos(x)

Let’s step through this example:

Make sure enhanced text mode is enabled. (You may choose a different terminal, such as x11 if the wxt terminal doesn’t work for you, as long as it supports enhanced mode.)

Turn on major tic marks at all multiples of π.

Choose formatting as a whole multiple of π, suppressing any digits to the right of the decimal point. Also, append the Greek letter for π (namely {/Symbol p}) to the numeric value.

Plot. Note the choice of plot range in multiples of π.

It’s important to understand that the format specifier %P will interpret a value as multiple of π, but by itself does not ensure that tic marks will only be drawn at integer multiples of π. Instead, we must explicitly choose the locations where tic marks will be drawn using set xtics, then use set format x “%P” to format the labels at those positions accordingly. (Try it both ways to fully understand the difference.)
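The following Python sketch, a rough model rather than gnuplot's gprintf() code, shows the difference: %P only divides the value by π for display. Tics placed at multiples of π get whole-number labels, while a tic at any other position still gets labeled, just with an unhelpful fraction. The helper name format_P is hypothetical.

```python
import math

def format_P(x: float, decimals: int) -> str:
    """Label x as a multiple of pi, roughly what '%.<decimals>P' prints."""
    return f"{x / math.pi:.{decimals}f}"

# Tics placed at integer multiples of pi (as with 'set xtics pi') get clean labels:
print([format_P(k * math.pi, 0) for k in range(-3, 4)])
# A tic that is NOT at a multiple of pi is still labeled, just not cleanly:
print(format_P(2.0, 1))
```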

There are some conversion specifiers in table 7.2 that give us access to the power and mantissa individually. They’re intended for situations where you want to build up the combination of power and mantissa yourself; for instance (not using enhanced mode)

set format y "%.1t^%T"

leads to tic labels of the form 1.5^2. If we use enhanced text mode for the terminal, we might want to use a format specification like

set format y "%.1t 10^%T"

In enhanced text mode, the caret character will be interpreted as superscript indicator, so that the tic labels will be plotted properly, with the powers as superscripts to 10.
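The decomposition these specifiers rely on can be sketched in Python. This is an illustrative model under stated assumptions: x must be nonzero, only base 10 is handled (not the base of the current logscale), and the PREFIXES table is a hypothetical subset of SI prefixes, not necessarily the characters that %c emits.

```python
import math

# Illustrative subset of SI prefixes; not necessarily the exact characters gnuplot's %c emits.
PREFIXES = {-9: "n", -6: "u", -3: "m", 0: "", 3: "k", 6: "M", 9: "G"}

def mantissa_power(x: float) -> tuple[float, int]:
    """Split nonzero x into m * 10**p with 1 <= |m| < 10 (compare %t and %T)."""
    p = math.floor(math.log10(abs(x)))
    return x / 10**p, p

def scientific(x: float) -> tuple[float, int, str]:
    """Like mantissa_power, but with p restricted to a multiple of 3 (compare %s, %S, %c)."""
    p = 3 * math.floor(math.log10(abs(x)) / 3)
    return x / 10**p, p, PREFIXES.get(p, "?")

m, p = mantissa_power(150)
print(f"{m:.1f}^{p}")     # the label "%.1t^%T" would produce for 150
print(scientific(1500))
```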

7.3.4. Reading tic labels from file

Finally, we can read the tic labels from the input file, using the xticlabels() and yticlabels() functions (or xtic() and ytic() for short) as part of the using directive to the plot command.

Let’s look at the data file in listing 7.5. We see that the x values are present both in numeric form (column 1) and as strings (column 3). Of course, it would be nice to use the strings for the tic labels. Here’s how we do that:

plot "months" u 1:2:xtic(3) w linesp

Listing 7.5. A data file containing a time series—see listing 7.7 and figure 7.4

# Month  Data     Month Name
1        3        Jan
2        4        Feb
3        2        Mar
4        5        Apr
5        8        May
6        7        Jun
7        4        Jul
8        5        Aug
9        3        Sep
10       2        Oct
11       4        Nov
12       2        Dec

The xtic() function takes as argument the number of a column that will be used for the tic labels. Equivalent functions exist for the other coordinate axes (ytic(), x2tic(), and so forth). When labels for both axes are read from the file, the ytic() entry comes after the xtic() entry in the using directive.

When employing any of the _tic() functions, tic marks and labels will only be drawn at the locations explicitly read from the data file—in other words, autogeneration of tic marks is turned off.
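Conceptually, xtic(3) pairs each x position with the string from the third column. The Python sketch below models that pairing for the first rows of listing 7.5; the function name is hypothetical, and it handles only whitespace-separated columns and # comment lines, not gnuplot's full data-file syntax.

```python
DATA = """\
# Month  Data     Month Name
1        3        Jan
2        4        Feb
3        2        Mar
"""

def read_ticlabels(text: str, xcol: int, labelcol: int) -> list[tuple[float, str]]:
    """Collect (x position, label) pairs, similar in spirit to 'using 1:2:xtic(3)'."""
    tics = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        tics.append((float(fields[xcol - 1]), fields[labelcol - 1]))  # columns are 1-based
    return tics

print(read_ticlabels(DATA, 1, 3))
```

Only these positions receive tic marks, which matches the rule that autogenerated tics are turned off when a _tic() function is used.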

In section 7.5.1, we’ll see yet another way to plot a file like the one in listing 7.5.

7.3.5. Grid and zero axes

In addition to (or as an alternative to) tic marks along the border of the graph, we can overlay a scale grid on the data. Grid lines are drawn at the position of major and, optionally, minor tic marks.

set grid [ [no]_tics ] [ [no]m_tics ]
         [ layerdefault | front | back ]
         [ polar [ {flt:angle} ] ]
         [ [ linetype | lt {idx:majortype} ]
           [ linewidth | lw {flt:majorwidth} ]
           | [ linestyle | ls {idx:majorstyle} ] ]
         [, [ linetype | lt {idx:minortype} ]
            [ linewidth | lw {flt:minorwidth} ]
            | [ linestyle | ls {idx:minorstyle} ] ]

We can switch the grid on to be drawn at major or minor tic marks, for the primary or secondary coordinate system. (The underscore again must be replaced by any one of the prefixes from table 7.1.) Tic marks must be enabled—instructions to draw a grid at nonexistent tic locations will be ignored. The grid is either drawn in front (set grid front) or behind the data (set grid back). The lines to use for the grid can be set separately for grid lines drawn at major and minor tic marks. If no style or type is given, the style ls 0, which draws the least visible lines possible (often using a dotted line), is assumed. The polar option is only relevant for plots using polar coordinates, which we’ll discuss in chapter 10.

Similar to grid lines, but less obtrusive, are zero axes. These are lines drawn across the graph for all the points where one of the coordinates is equal to zero:

set _zeroaxis [ [ linetype | lt {idx:type} ]
                [ linewidth | lw {flt:width} ]
                | [ linestyle | ls {idx:style} ] ]

For example, set xzeroaxis switches on a horizontal line at y=0 (representing the x axis). The default line type is ls 0, same as for the grid.

7.4. A worked example

You may have wondered how I generated the plot in figure 4.1 using two different y axis scales, each covering only part of the plot. Now we have all the information at hand to reveal the secret (shown in listing 7.6).

The plot shows the same data twice, but vertically shifted. I achieve this by adjusting the plot ranges for the primary and secondary coordinate systems. Note that the visible range (from min to max) has the same extent for both systems, but that the two ranges are offset from each other.

Now, the only thing missing are the tic marks. Here, I make sure to specify both a start and an end value for tic mark generation—this way, I achieve the partial labeling of each axis, only for the part of the plot that’s relevant to each curve. It’s all very simple, really...

Listing 7.6. The commands used to generate figure 4.1 in chapter 4

set yrange [9:16]
set y2range [6:13]
set ytics 9,1,12 nomirror
set y2tics 10,1,13
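To check that the tic ranges in listing 7.6 land where intended, the following Python sketch maps each tic position to its fractional height in the plot (0 at the bottom, 1 at the top), using the ranges from the listing. The helper name to_fraction is hypothetical.

```python
def to_fraction(value: float, lo: float, hi: float) -> float:
    """Fractional height of value for an axis range [lo:hi] (0 = bottom, 1 = top)."""
    return (value - lo) / (hi - lo)

left = [round(to_fraction(t, 9, 16), 3) for t in range(9, 13)]    # set ytics 9,1,12 on [9:16]
right = [round(to_fraction(t, 6, 13), 3) for t in range(10, 14)]  # set y2tics 10,1,13 on [6:13]

print(left)    # [0.0, 0.143, 0.286, 0.429] -> lower part of the plot
print(right)   # [0.571, 0.714, 0.857, 1.0] -> upper part of the plot

# The same data value v sits higher on the y2 scale than on the y scale:
v = 11.0
print(to_fraction(v, 6, 13) - to_fraction(v, 9, 16))
```

The left-hand tics cover only the lower 3/7 of the plot and the right-hand tics only the upper 3/7, and any given data value appears 3/7 of the plot height higher on the y2 scale than on the y scale.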

7.5. Special case: time series

Whenever we want to study how some quantity changes over time, we are dealing with a time series. Time series are incredibly common—from stock charts and opinion polls to fever curves. Unfortunately, they pose special challenges, since the tic labels we would like to use for the x axis (such as the names of months or weekdays) aren’t strictly numeric.

Worse, they aren’t even universal, but locale-dependent. If we want to plot time series data, we therefore need to be able to parse arbitrary date/time formats from a file, and we must have the ability to format timestamps in a suitable, locale-dependent format.

Gnuplot offers three different ways to deal with date/time information as part of axes labels:

The “classic” way, using set _data and set timefmt, which allows us to parse and reformat arbitrary date/time information in the input file, and which I’ll describe in detail in section 7.5.2.
The “new” style, which reads fully formatted tic labels directly from the input file using the _ticlabels() functions introduced in section 7.3.4.
For the special cases when we don’t require arbitrary date/time labels, but merely want to use the names of months or weekdays in a plot, gnuplot provides the simplified set _mtics and set _dtics facilities (see section 7.5.1).

In the next section, we first discuss the simpler case of using month or weekday names as tic labels. Afterwards, we’ll tackle the harder problem of dealing with arbitrary date/time information, both for input and for output.

7.5.1. Turning numbers into names: months and weekdays

Gnuplot provides two simple commands to turn numbers into the names of months or days of the week. They offer much less flexibility than the general time series commands discussed in the next section, but are easy to use.

Let’s look back at the data file in listing 7.5. We want to label the x axis with the names of the months, but without using the explicit names in the third column. We can do this using the set xmtics command, which maps numbers to names of months (with 1=“January”, ..., 12=“December”). Don’t confuse this command with the set mxtics command introduced in section 7.3.2, which switches on minor tic marks!

The sequence of commands in listing 7.7 was used to produce the plot in figure 7.4. Note the dual x axis, with the primary axis showing the names of the months and the secondary axis showing the index of the corresponding month.

Figure 7.4. The data from listing 7.5 plotted using the commands in listing 7.7

Listing 7.7. Commands used to plot the file in listing 7.5

set xtics nomirror # switch off mirrored tic marks on secondary axis
set xmtics         # set primary tic mark formatting to Months
set x2tics 1,1     # switch on secondary tics, starting at 1, not 0
plot [][0:10] "months" u 1:2 w linesp

This example demonstrates a general problem when using multiple coordinate systems: the tic marks on the secondary set of axes aren’t properly synchronized with the data read from file—they are merely tic marks distributed uniformly over the range inherited from the primary axis of the plot. If we didn’t specify the starting value for x2tics, gnuplot would distribute 12 units over the range from 0 to 12 (as opposed to 1 to 12), with the result that the primary and secondary tic marks wouldn’t even match up with each other! This is true in general for tics on the secondary axes: the plot isn’t scaled to them; they’re merely aliases for the data in the primary axes, and it’s the user’s responsibility to make sure the range plotted on the secondary axes matches the data properly.

Besides the names of months, we also can use days of the week (such as “Mon”, “Tue”, and so on) as tic labels. We enable them using set xdtics (with 0=“Sunday”, ..., 6=“Saturday”), similar to what we’ve seen for set xmtics. Both set xmtics and set xdtics map overflows back into the legal range through a modulo operation (modulo 12 and modulo 7, respectively), as you would expect.
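The number-to-name mapping, including the wrap-around, can be modeled in Python as follows. Two assumptions apply: the names are English abbreviations, whereas gnuplot takes the actual strings from the current locale, and the helper names are hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

def month_label(x: int) -> str:
    """1 -> Jan, ..., 12 -> Dec; larger values wrap around modulo 12 (cf. set xmtics)."""
    return MONTHS[(x - 1) % 12]

def day_label(x: int) -> str:
    """0 -> Sun, ..., 6 -> Sat; larger values wrap around modulo 7 (cf. set xdtics)."""
    return DAYS[x % 7]

print([month_label(x) for x in (1, 12, 13, 14)])
print([day_label(x) for x in (0, 6, 7, 10)])
```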

Listing 7.9 shows an interesting application, where we adjust the x values on the fly to align the days of the month with the days of the week. The original data file is shown in listing 7.8.

Listing 7.8. Another time series example—see listing 7.9 and figure 7.5

# Day in month     Value
1                  5.080     # First of the month - a WEDNESDAY!
2                  5.310
3                  5.561
4                  5.574
5                  6.008
6                  5.540
7                  5.419
8                  5.519
9                  5.715
...
31                 5.945

Listing 7.9. The commands to plot the file in listing 7.8 to generate figure 7.5

set xtics nomirror
set xdtics
plot "days" u ($1+2):2 w linesp

Figure 7.5. The data from listing 7.8 plotted using the commands in listing 7.9. Note the days of the week as tic labels on the x axis.
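The offset of 2 in listing 7.9 follows from the weekday numbering of set xdtics. The first of the month is a Wednesday, which is weekday 3 (counting 0=Sunday), so day d must be plotted at d + 3 - 1 = d + 2. The following sketch makes this arithmetic explicit. It uses a hypothetical user variable, dow_first, that you would adjust for each month.

```gnuplot
# Weekday of the first of the month, in the set xdtics numbering
# (0=Sunday, 1=Monday, ..., 6=Saturday). Here: Wednesday.
dow_first = 3

set xtics nomirror
set xdtics                # tic labels become weekday names (modulo 7)
# Shift day-of-month so that day 1 lands on weekday dow_first:
# day d is plotted at d + dow_first - 1 (that is, d + 2 for a Wednesday)
plot "days" u ($1 + dow_first - 1):2 w linesp
```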

We can restore normal (numerical) axes labeling through unset xmtics or unset xdtics.

The actual strings used for the tic labels are determined by the current locale. The default is taken from the LANG environment variable, but can be changed using the following command:

set locale ["{str:locale}"]

The choice of available locales is system-dependent. On Unix systems, you can use the shell command locale -a to list the available locales, or check the directory /usr/share/locale/. Note that some locales have country-specific variations (such as en_AU, en_CA, en_GB, and en_US). In such cases it may not be sufficient to set the general locale (such as en), and a more specific locale must be chosen.
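Switching the language of the tic labels takes a single command before plotting. Here is a minimal sketch that reuses listing 7.7. The locale name de_DE.UTF-8 is only an example; it has to appear in the output of locale -a on your system.

```gnuplot
show locale                   # print the locale currently in effect
set locale "de_DE.UTF-8"      # example locale name; must be installed on your system
set xtics nomirror
set xmtics                    # month names are now taken from the German locale
set x2tics 1,1
plot [][0:10] "months" u 1:2 w linesp
set locale ""                 # try to determine the locale from the environment again
```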

Finally, similar commands exist for all other axes, using the usual prefixes per table 7.1.

Old Versus New Style

If the data file contains a column with suitable strings that can be used for tic labels, the new style (see section 7.3.4) is very convenient. Nevertheless, the old style that we introduced in this section still has its uses. Three points stand out:

The old style can be used even when the data file doesn’t contain explicit tic labels.
The old style supports internationalization through the set locale option.
The old style gives better results if data points are missing or irregularly spaced. Remember that the new style plots tic marks only at the locations found in the data file. So if, for example, the entry for the month of May were missing from the file in listing 7.5, the new style would generate no tic mark for May. By contrast, the old style draws tic marks (and labels) for all 12 months.
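To compare the two styles, here is a sketch that plots the file from listing 7.5 both ways. It assumes, as described earlier, that the month names are stored in the third column. The xtic(3) specifier reads the tic labels from that column (the new style of section 7.3.4).

```gnuplot
# Old style: tic labels are computed from the x value (1=January, ...)
set xmtics
plot [][0:10] "months" u 1:2 w linesp

# New style: tic labels are read from column 3 of the data file
unset xmtics                                  # restore normal numeric labeling first
plot [][0:10] "months" u 1:2:xtic(3) w linesp
```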

7.5.2. General time series: the gory details

For more general time series, we face two problems: first, we must be able to read arbitrary timestamps from the input file, and then we must format them again for output.

First, we must enable time series mode (for the primary x axis) using

set xdata time

Issuing set xdata (without an argument) restores normal operation again. Equivalent commands exist for all other axes, distinguished through the usual prefixes.

In time series mode, two commands control input and output. Input (parsing timestamps from files) is handled by set timefmt, and output (formatting timestamps for inclusion in the plot) by set format. Both accept a format string using a syntax similar to the one found in the POSIX strftime() routine. (We already encountered set format in section 7.3.3, but there we only talked about the formatting of plain numbers. Here we add the ability to format date/time values.)
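A time series plot typically needs three settings: time mode for the axis, an input format, and an output format. Here is a minimal sketch. It assumes a hypothetical file data.txt whose first column holds dates such as 2005-01-31 and whose second column holds values.

```gnuplot
set xdata time            # x axis is in time series mode
set timefmt "%Y-%m-%d"    # input: how timestamps are written in the file
set format x "%d %b"      # output: tic labels show day and abbreviated month
plot "data.txt" u 1:2 w linesp
set xdata                 # restore the normal (numeric) x axis
```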

Gnuplot assumes all data to be in Coordinated Universal Time (UTC). It has no facilities to perform time zone conversions, adjust for daylight saving time, or apply similar transformations. If such transformations are required, they must be applied externally, before attempting to plot the data.^[3]

³ According to the gnuplot documentation, timestamps are internally represented as seconds since midnight, January 01, 2000 (UTC). Of course, users should not rely on this particular internal representation. Knowing about it does, however, help explain some of the defaults that appear when tic marks are generated from dates. For instance, suppose you read only month and day (using set timefmt "%d %m", for example) but plot month, day, and year (using set format x "%D" or similar). You’ll find that the year defaults to 2000.

Input

Time/date information is parsed in a way reminiscent of the scanf() family of functions, and shares its familiar challenges.

The expected input format is indicated through a format string to set timefmt. The format string may contain several conversion specifiers, all of which begin with the % character, followed by a letter that indicates how an input value should be interpreted. Check tables 7.3 and 7.4 for a list of all possible conversion specifiers and their meanings.

The input format string may contain other characters besides format specifiers, but input strings must match the format exactly (with some exceptions regarding whitespace we’ll discuss shortly):

set timefmt "%Y-%m-%d" # will match 2000-01-01, but also 2000-1-1
set timefmt "%d%b%y"   # will match 1JAN05, 01Jan05, 1jan05

If there are no characters separating the fields from one another, gnuplot consumes up to a fixed number of characters per field (left to right). The fields must therefore be left-zero-padded as necessary:

set timefmt "%Y%m%d" # will parse 20020101 as Jan 01st,
                     # but will parse 2002101 as Oct 01st,
                     # and will fail to parse 200211
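With such a compact layout, the zero padding has to be present in the data itself. Here is a sketch that assumes a hypothetical file compact.txt. In it, January 11, 2002 must be written as 20020111; written unpadded as 2002111, the same left-to-right rule would read it as November 1st.

```gnuplot
# compact.txt (hypothetical):
#   20020101   3.1
#   20020111   2.9
#   20021001   2.7
set xdata time
set timefmt "%Y%m%d"      # no separators: month and day must be zero-padded
set format x "%b %y"      # abbreviated month and two-digit year
plot "compact.txt" u 1:2 w linesp
```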

Special rules apply when the date format to be parsed contains whitespace. First of all, gnuplot interprets whitespace-separated data as occupying several columns, so a timefmt format containing whitespace consumes several columns in turn. A blank space (not an escaped tab: \t) embedded in a format string matches zero or more blanks (not tabs) in the input file. So, "%H %M" matches 1220, 12␣20, and 12␣␣␣20. (The ␣ symbol indicates a blank.)

An example will help. The following input file

2005-01-01     8:41     3
2005-01-01     9:17     4
2005-01-01    22:46     2
2005-01-02    03:05     5

will be correctly parsed and plotted by the following commands:

set xdata time
set timefmt "%Y-%m-%d %H:%M"
plot "data" u 1:3 w linesp

Note that the column used for the y values is the third, since the time format consumes two columns. Also, the format string contains a single whitespace, but in the data file several blanks separate the date from the time. The file won’t parse correctly if the spaces between date and time are replaced by tabs.

Finally, gnuplot won’t parse strings enclosed in quotes (see section 4.2.1). Therefore, it’s not possible to parse a file that contains date/time information as strings with embedded whitespace:

"2005-01-01 8:41"     3  # will NOT parse
"2005-01-01 9:17"     4

Gnuplot seems to be tolerant with regard to the locale when it comes to parsing %b and %B fields (abbreviated and full name of months), and appears to parse them on a best-effort basis.

Output

Compared to parsing time/date information, formatting it into human-readable tic labels is much easier. Simply specify the desired output format using set format _ "..." (where the underscore again is a placeholder for any of the possible prefixes from table 7.1).

One word of caution: do not omit the axis to which the format should be applied. If the axis is left out, the same format is applied to all axes, which can lead to mysterious error messages. For instance, if the data for the y axis exceeds the legal range of values for the defined format, gnuplot responds with a Bad format character message. (Gnuplot won’t generate a plot in these cases, which makes it difficult to find the location of the error.)^[4]

⁴ If the output formatting routine gets wedged, it may even be necessary to exit gnuplot and restart to reach consistent behavior.
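Here is a minimal sketch of the safe pattern, assuming a hypothetical file data.txt with dates in the first column. The date format is bound to the x axis only, while the y axis keeps a numeric format.

```gnuplot
set xdata time
set timefmt "%Y-%m-%d"
# set format "%d %b"      # risky: no axis given, so every axis (y included) gets the date format
set format x "%d %b"      # date format on the x axis only
set format y "%g"         # plain numeric format on the y axis
plot "data.txt" u 1:2 w linesp
```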

The format string can contain arbitrary text besides the formatting characters. Here’s a useful snippet to stack the time on top of the date (note the embedded newline):

set format x "%T\n%D"     # Time stacked on top of date

The format string can also combine several conversion specifiers, separated by literal characters:

set format x "%Y-%m-%d %H:%M"     # Date, followed by time

and even plain text:

set format x "It happened on %A"     # Full day of week

In particular when used together with string functions, there is almost no limit to the appearance of tic labels for plots displaying time series.
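Input and output formats are independent of each other. As an illustration, the four-line input file shown earlier in this section (saved as data) could be plotted with the time stacked on top of the day and month. The particular output format is just one possible choice.

```gnuplot
set xdata time
set timefmt "%Y-%m-%d %H:%M"     # input: date and time occupy columns 1 and 2
set format x "%H:%M\n%d %b"      # output: time on top, day and month below
plot "data" u 1:3 w linesp       # y values are in column 3
```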

Working in Time Series Mode

Keep in mind that when working in time series mode (after issuing the set xdata time command), all x coordinates are interpreted according to the current setting of the set timefmt option (and likewise for the other axes when they are in time series mode).

In particular, this means that plot ranges must be specified as quoted strings, in the format given by timefmt (the input time format). For example, with set timefmt "%d%b%y" in effect: plot ["01Jan00":"15Jan00"] "data" u 1:2. Similar concerns hold for the coordinates supplied to set arrow or set label.
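Here is a sketch that uses quoted timestamps for the plot range, a label, and an arrow. It assumes a hypothetical file data whose first column holds dates such as 01Jan00 and whose second column holds values. The label text and its y position are arbitrary.

```gnuplot
set xdata time
set timefmt "%d%b%y"                        # every quoted x coordinate below must match this
set format x "%d %b"
set xrange ["01Jan00":"15Jan00"]            # range given as timefmt strings
set label "check" at "08Jan00", 5           # x as a time string, y as a plain number
set arrow from "05Jan00", graph 0 to "05Jan00", graph 1 nohead   # vertical line on Jan 5
plot "data" u 1:2 w linesp
```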

Finally, don’t forget that the currently selected locale (as inherited from the environment when gnuplot was started, or set using set locale) will affect the tic labels (names of months and of days in week).

7.6. Summary

In this chapter, we talked about all the ways we can control the appearance of the axes on a plot. It’s been a long chapter, but axes—or rather, the tic marks and labels placed on them—are important: they enable the viewer to gain quantitative insight from the data displayed in the plot.

Before moving on, let’s summarize the most important points:

Most of the time, gnuplot’s default behavior is just fine. It will place reasonably spaced tic marks along the axes and label them appropriately.
We can put an explanatory label on each axis using the set xlabel and set ylabel commands.
Tic marks are usually autogenerated, but we can exert great control using the set _tics family of commands. We can influence the range and frequency at which tic marks are placed; we can even put individual tic marks onto the plot explicitly.
Using the same family of commands, we can also customize the appearance of tic marks and tic labels.
The visible range of a plot is controlled through the set _range family of commands. Alternatively, plot ranges can be specified inline as part of the plot command.
Gnuplot supports multiple coordinate systems in a single graph. We can switch them on through the set x2tics and set y2tics commands, but need to take care not to generate a confusing graph or a graph that distorts the data inappropriately.
There are several ways to format time and date information for use in tic labels. Numbers can be formatted as names of months or weekdays through the simple set _mtics and set _dtics commands. For more sophisticated labeling tasks, we can use the set xdata time facility, together with the range of formatting options available through the set format command.

This chapter concludes our overview of what I would call “basic” gnuplot. In the following chapters, we’ll look at some exciting but distinctly more advanced topics, such as color in graphs, multidimensional plots, and other special-purpose features.

We’ll also take an in-depth look at ways to script and program gnuplot, and learn everything there is to know about exporting graphs to standard file formats.