svgtools: Manipulate SVG template files of charts.

Purpose

The purpose of svgtools is to manipulate SVG files that serve as templates for charts the user wants to produce. In vector graphics one deals with x/y-coordinates of elements (e.g. lines, rectangles, text). Their scale often depends on the program used to produce the graphic. In applied statistics one usually has numeric values on a fixed scale (e.g. percentage values between 0 and 100) to show in a chart. In essence, svgtools transforms the statistical values into coordinates and widths/heights of the vector graphics elements.

The SVG file format is plain XML. Using the xml2 package, svgtools reads SVG files and then changes certain attributes or even whole elements in the XML document.

For example, an SVG image might look like this:

Fig. 1: SVG example

Its file content contains lines in XML:

...
<g id="myBars">
    <g>
        <rect x="141.732" y="92.126" fill="#C6C6C6" width="94.394" height="14.173"/>
        <rect x="236.126" y="92.126" fill="#878787" width="94.394" height="14.173"/>
        <rect x="330.52" y="92.126" fill="#3C3C3B" width="94.677" height="14.173"/>
        <text transform="matrix(1 0 0 1 183.396 101.8799)" font-family="'ArialMT'" font-size="10">33</text>
        <text transform="matrix(1 0 0 1 277.7612 101.8799)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">33</text>
        <text transform="matrix(1 0 0 1 372.1265 101.8799)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">33</text>
    </g>
    ...
</g>
<rect id="myFrame" x="141.732" y="85.04" fill="none" stroke="#000000" stroke-width="0.5" stroke-miterlimit="10" width="283.464" height="141.732"/>
...

What we see here are three rectangle elements (top bar of the chart) with graphical x-coordinates and widths that lie within a rather arbitrary range. The same holds for three text elements (value labels of top bar) and their coordinates, which are stored within an SVG transformation matrix in attribute ‘transform’. Their text entry is fixed to 33. All of these are grouped together (top bar with value labels) and then grouped again (all the bars). The last line shown corresponds to the rectangle that serves as the outer frame of the data area of the chart.

The following lines of code are enough to set the coordinates, widths and numbers in the bar chart correctly when, for example, percentage values are stored within a dataframe of 5 rows (the groups) and 3 columns (the categories):

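library(svgtools)

# Read the template shown in Fig. 1
svg <- read_svg(file = "images/fig1.svg")
# Proportions for 5 groups (rows) and 3 categories (columns)
myValues <- data.frame(cat1 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                       cat2 = c(0.35, 0.25, 0.35, 0.25, 0.35),
                       cat3 = c(0.55, 0.55, 0.35, 0.35, 0.15))
# Adjust bars and labels in 'myBars' within 'myFrame' on a 0-100 scale
svg <- stackedBar(svg = svg, frame_name = "myFrame", group_name = "myBars",
                  scale_real = c(0, 100), values = myValues * 100)
write_svg(svg = svg, file = "images/fig1_values.svg")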

The result looks like this:

Fig. 2: SVG manipulated to reflect real values

The magic happens at the function call for stackedBar. Here, one argument refers to the named rectangle ‘myFrame’ to define outer limits for the graphical coordinates and one argument to the named group of elements ‘myBars’ containing rectangles (bar segments) and texts (value labels) for the chart. Concerning the values one wants to show in the chart, the “real” scale is defined by a vector ranging from 0 to 100 and a dataframe with values is provided. svgtools can now calculate the corresponding graphical coordinates and widths and change the elements in ‘myBars’ accordingly.
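The transformation described above amounts to a linear rescaling. The following is a minimal base R sketch of that idea, not the package's internal code. It takes the x and width attributes of ‘myFrame’ from the excerpt and the first row of myValues (10, 35, 55), and it assumes that segments are stacked from the left edge of the frame. All variable names are illustrative only.

```r
# Frame geometry copied from the 'myFrame' rect in the excerpt above
frame_x     <- 141.732
frame_width <- 283.464

scale_real <- c(0, 100)      # the "real" (statistical) scale
values     <- c(10, 35, 55)  # first row of myValues * 100

# Graphical units per unit on the real scale
unit <- frame_width / diff(scale_real)

# Widths of the three bar segments and their left edges
# (assumption: segments are stacked from the frame's left edge)
seg_width <- values * unit
seg_x     <- frame_x + c(0, cumsum(seg_width)[-length(seg_width)])

print(data.frame(value = values, x = seg_x, width = seg_width))
```

Because the three values add up to 100, the segment widths add up to the full width of the frame.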

This vignette explains how to set up SVG files so that svgtools can work with them and gives insight into the most common usage of package functions. For detailed information on all functions and arguments see ?stackedBar and others.

Reading, displaying and writing SVG files

Functions for file handling and display are rather straightforward in svgtools. A typical workflow looks like this:

svg <- read_svg(file = "myFile.svg")
summary_svg(svg = svg)
display_svg(svg = svg)
# Code to manipulate the SVG
# ...
display_svg(svg = svg)
write_svg(svg = svg,file = "myFile_out.svg")

read_svg relies on read_xml from the xml2 package, so it can read a file from the file system, a connection or even a raw vector. It defaults to encoding UTF-8, which may be changed with argument enc="latin-1", for example.

Function summary_svg is a convenience function that prints some useful information about the SVG content on the console (or wherever sink is set to). For the SVG in Fig. 1 the output would look like this:

[1] "************************"
[1] "** -- SVG SUMMARY: -- **"
[1] "************************"
[1] "-- NAMED GROUPS:"
[1] "myBars with 5 children"
[1] "-- AVAILABLE FRAMES:"
[1] "myFrame"
[1] "-- USED FONTS:"
[1] "'ArialMT'"
[1] "-- USED FONT SIZES:"
[1] "10"
[1] "-- USED COLORS:"
[1] "#C6C6C6" "#878787" "#3C3C3B" "none"    "#000000"

One can see that there is a named group (‘myBars’) in the SVG. It contains five child elements, which are the five bars of the chart, see further below. Also, there is one “available frame” (a named rectangle) called ‘myFrame’. This information helps with setting the arguments correctly in the function calls that manipulate the SVG. Further information on used fonts, font sizes and colors in the SVG only serves the purpose of validating the consistency of the design. One can invoke summary_svg directly by argument summary=TRUE of read_svg.

To display an SVG on the current viewport one may use display_svg. The standard viewport depends on operating system and IDE. For example, RStudio plots the image under the Viewer tab. By default, width and height of the bitmap (image) are derived from its content and the current DPI setting of the viewport. But one can set the desired width and height with the corresponding function arguments.

Typically, display_svg is used before and after SVG manipulation to get visual proof of the changes. Therefore, argument display=TRUE of read_svg conveniently invokes the function.

Finally, write_svg uses write_xml from the xml2 package to write the (then manipulated) SVG to the file system or an open connection. By default, hidden elements of the SVG are removed in the written file (not in the XML document in the R environment). To change this behavior set remove_hidden=FALSE. If one wants to remove all groupings in the written file (again, not in the XML document itself) it is possible to set flatten=TRUE. This may be beneficial for further layout tasks on the resulting SVG image.

General principles of operation

svgtools relies heavily on naming objects of the SVG. One can always accomplish that with any text editor by inserting an id attribute in the XML element for the object. See the following:

<rect id="myFrame" x="141.732" y="85.04" width="283.464" height="141.732"/>

Naming an object in that way is also possible in almost any vector graphics program. (Check the manuals.) For example, in Adobe Illustrator using the Layers Panel to name objects ultimately leads to XML elements with id attributes when saving as SVG.
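An id attribute can also be added programmatically. The following sketch uses the xml2 package directly on a tiny, made-up SVG fragment. It only illustrates the naming step and is not part of the svgtools interface.

```r
library(xml2)

# A tiny, namespace-free SVG fragment used only for illustration
doc <- read_xml('<svg><rect x="141.732" y="85.04" width="283.464" height="141.732"/></svg>')

# Locate the rectangle and give it the name svgtools should refer to
frame <- xml_find_first(doc, ".//rect")
xml_set_attr(frame, "id", "myFrame")

cat(as.character(frame), "\n")
```

Real SVG files usually declare a default XML namespace, so the XPath expression would have to be adapted for them.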

The following rules apply:

On the side of the values one wants to show in a chart, be mindful that svgtools does not calculate anything apart from the right coordinates and widths/heights of objects. This is relevant in situations like the following:

Adjustment of charts with lines and/or symbols needs a simple vector of numerical values. For bar charts, it is possible to adjust several bars at once. In that case, one needs to provide a dataframe or a matrix (with only numerical values). Then, rows always concern different bars, while columns define the sequence of bar segments to stack.
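As a small base R illustration of the two input shapes for bar charts described above (the numbers and the row and column names are made up):

```r
# One bar with three segments: a plain numeric vector
one_bar <- c(20, 30, 50)

# Three bars with three segments each:
# one row per bar, one column per segment (in stacking order)
several_bars <- rbind(bar1 = c(20, 30, 50),
                      bar2 = c(10, 45, 45),
                      bar3 = c(60, 25, 15))
colnames(several_bars) <- c("seg1", "seg2", "seg3")

print(several_bars)
nrow(several_bars)  # number of bars
ncol(several_bars)  # number of segments per bar
```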

Horizontal and vertical alignment of charts works essentially the same. A corresponding argument is provided in all manipulating functions except changeText. Mind that alignment="horizontal" means adjusting x-coordinates for all chart types while alignment="vertical" always refers to adjustment of y-coordinates. This may be counter-intuitive when it comes to line charts, see below.
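For vertical alignment it helps to remember that the y-axis of SVG points downward and that the y attribute of a rectangle is its top edge. The following base R sketch shows the resulting arithmetic for one stacked bar. The frame geometry and the values are hypothetical, and the code is an illustration of the principle rather than the package's implementation.

```r
# Hypothetical frame: y of the top edge and height, in SVG units
frame_y      <- 85.04
frame_height <- 141.732

scale_real <- c(0, 100)
values     <- c(20, 30, 50)  # stacked bottom to top

unit       <- frame_height / diff(scale_real)
seg_height <- values * unit

# SVG y grows downward and a rect's y is its TOP edge, so start at the
# frame's bottom edge and move up by the cumulative heights
frame_bottom <- frame_y + frame_height
seg_y        <- frame_bottom - cumsum(seg_height)

print(data.frame(value = values, y = seg_y, height = seg_height))
```

The topmost segment ends up with the same y as the frame because the values add up to the upper end of the scale.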

Adjusting bar charts

General bar charts

For a general bar chart one needs to prepare an SVG file that has named (XML attribute ‘id’) groups (XML element ‘g’) of bar segments (XML element ‘rect’) and, optionally, value labels (XML element ‘text’).

svg <- read_svg(file = "images/fig3.svg",summary = TRUE,display = TRUE)

Reading the SVG file with arguments summary=TRUE and display=TRUE conveniently prints information about it on the console and displays the SVG in the current viewport (see section “Reading, displaying and writing SVG files”). It might look like this:

Fig. 3: Bar chart example

In this example the XML structure has two named groups. The first consists of the XML elements (3 rectangles and 3 texts) for the leftmost bar and is named ‘overall’. The second one is a group named ‘subgroups’ that itself contains three groups of XML elements for the three bars to the right. See the following excerpt of the file:

...
<g id="overall">
    <rect x="156.746" y="255.119" fill="#3C3C3B" width="26.667" height="56.693"/>
    <rect x="156.746" y="198.425" fill="#878787" width="26.667" height="56.693"/>
    <rect x="156.746" y="141.732" fill="#C6C6C6" width="26.667" height="56.693"/>
    <text transform="matrix(1 0 0 1 164.5171 306.2188)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">20</text>
    <text transform="matrix(1 0 0 1 164.5171 249.5259)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">20</text>
    <text transform="matrix(1 0 0 1 164.5171 192.833)" font-family="'ArialMT'" font-size="10">20</text>
</g>
<g id="subgroups">
    <g>
        <rect x="213.438" y="255.119" fill="#3C3C3B" width="26.667" height="56.693"/>
        <rect x="213.438" y="198.425" fill="#878787" width="26.667" height="56.693"/>
        <rect x="213.438" y="141.732" fill="#C6C6C6" width="26.667" height="56.693"/>
        <text transform="matrix(1 0 0 1 221.21 306.2188)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">20</text>
        <text transform="matrix(1 0 0 1 221.21 249.5259)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">20</text>
        <text transform="matrix(1 0 0 1 221.21 192.833)" font-family="'ArialMT'" font-size="10">20</text>
    </g>
    <g>
        ...
    </g>
    <g>
        ...
    </g>
</g>
...

The summary on the console will reflect this. Note that the number of child elements depends on whether a named group consists of bar elements or of further (sub)groups:

[1] "-- NAMED GROUPS:"
[1] "overall with 6 children"
[1] "subgroups with 3 children"

Note: It is not allowed to nest the grouping any further!

The bar chart in the example of Fig. 3 needs two separate function calls to be adjusted to values on the statistical scale. The first call to stackedBar refers to the group named ‘overall’, the second one to the group named ‘subgroups’. It is necessary to have a named frame (XML element ‘rect’) for both cases. Since the bar chart will be adjusted in vertical direction (alignment="vertical" makes sure that y-coordinates and heights are changed) the SVG needs to have only one such rectangle for both the overall bar and the bars for subgroups.

The main difference between the two function calls is how one provides the values. In the case that a group refers to only one bar (here: ‘overall’) values are provided as a simple numerical vector. If a matrix or a dataframe were provided, only the first row would be used. The function call will fail with an error message if the number of values does not match the number of bar segments (or value labels).

In the case that a group refers to (sub)groups (here: ‘subgroups’) one needs to provide a matrix or a dataframe. Rows will be used to get values for each bar, while the order of columns defines the values from left to right with alignment="horizontal" or bottom to top with alignment="vertical". The function call will stop with an error message if the number of rows does not match the number of bars and also if the number of values does not match the number of bar segments (or value labels).

svg <- stackedBar(svg = svg,frame_name = "frame",group_name = "overall",
                  scale_real = c(0,160),values = c(9.97,42.42,105.71),
                  alignment = "vertical",has_labels = TRUE,label_position = "end",
                  decimals = 0,display_limits = 10)
df.subgroups <- matrix(1:9*8,nrow=3)
svg <- stackedBar(svg = svg,frame_name = "frame",group_name = "subgroups",
                  scale_real = c(0,160),values = df.subgroups,
                  alignment = "vertical",display_limits = 10)
display_svg(svg = svg)
write_svg(svg = svg,file = "images/fig3_values.svg",
          remove_hidden = FALSE,flatten = TRUE)

The program code above will ultimately display the following chart and also save it to a file:

Fig. 4: Adjusted bar chart example

The first function call to stackedBar in the example above has set every argument there is for this function. has_labels=TRUE and decimals=0 are actually default values, which is why things work the same in the second function call. While the meaning of has_labels=TRUE is obvious, note how values are rounded to the number of decimal digits desired, so that ‘105.71’ becomes ‘106’ in the chart. It is possible to set the rounding of the labels to rounding away from zero by options("svgtools.roundAwayFromZero" = TRUE) such that ‘106.5’ becomes ‘107’ (default: ‘106’).
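The difference between the two rounding modes can be reproduced in base R. In the sketch below, round_away is a hypothetical helper written for this illustration. It is not a function of svgtools, which switches the behavior through the option mentioned above.

```r
# Base R rounds halves to the even digit
round(105.71)  # 106
round(106.5)   # 106

# Hypothetical helper that rounds halves away from zero instead
round_away <- function(x, digits = 0) {
  f <- 10^digits
  sign(x) * floor(abs(x) * f + 0.5) / f
}

round_away(106.5)   # 107
round_away(-106.5)  # -107
```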

label_position="end" puts value labels at the top of the bar segments in vertical alignment and at the right side (or left side for negative values) in horizontal alignment. The default setting used in the second function call puts value labels in the center of the bar segments. Argument display_limits is used to suppress value labels in a range around zero. If only one number is provided it refers to the absolute value. In the example the value ‘8’ of category A in group 1 is no longer shown because it is lower than 10. Note that this is evaluated for the exact value, not the rounded one. So ‘10.1’ would be visible (as ‘10’ with decimals=0) while ‘9.9’ would not.
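Which labels fall under display_limits=10 in the example can be checked with a few lines of base R. This sketch only mirrors the rule as described in the text. The comparison with < is an assumption, since the text does not say how a value exactly equal to the limit is treated.

```r
display_limit <- 10

overall   <- c(9.97, 42.42, 105.71)     # values of the first stackedBar call
subgroups <- matrix(1:9 * 8, nrow = 3)  # same content as df.subgroups

# Assumption: a label is suppressed when the exact (unrounded)
# absolute value lies below the limit
hidden_overall   <- abs(overall) < display_limit
hidden_subgroups <- abs(subgroups) < display_limit

print(hidden_overall)
print(hidden_subgroups)
sum(hidden_overall) + sum(hidden_subgroups)  # labels suppressed in total
```

Only 9.97 in ‘overall’ and the value 8 in the first row and first column of the matrix are affected.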

Note that display options for value labels are rather limited in the current version of svgtools. It is neither possible to set the distance from the edge of bar segments with label_position="start" or label_position="end" nor will stackedBar change any coordinates or alignment of texts that do not concern the x-axis with alignment="horizontal" or y-axis with alignment="vertical". This is why in the example ‘106’ (category C of overall bar) leans to the right: the text element was not aligned to be centered in the SVG template (set XML attribute ‘text-anchor’ to “middle” beforehand, in order to do this).

The example code calls write_svg with optional arguments (see section “Reading, displaying and writing SVG files”). With remove_hidden=FALSE the saved SVG file will still contain the text elements for the two value labels that are not displayed. And flatten=TRUE leads to an SVG file without any groups (XML elements ‘g’). All 60 graphical elements are stored directly beneath the XML document node.

Special bar charts

With the help of hidden bar segments and recalculated values one could already produce a wide range of bar charts using stackedBar. For convenience, svgtools offers three special variants. They work the same as the general case when it comes to naming SVG elements. The following sections describe each one very briefly, focusing on their distinct usage of function arguments.

Reference bar

In svgtools a reference bar is a stacked bar chart that is aligned around a nullvalue. Bar segments up to a so-called reference category (a certain column/position of values) are positioned to the left (with alignment="horizontal") or to the bottom (with alignment="vertical") while further segments lie to the right or top.

The following figure shows a possible template. Note that it is not necessary to adjust bar segments and values (in horizontal direction) beforehand.

Fig. 5: Reference bar example

Example code for the usage of referenceBar looks like this:

values <- matrix(c(1,2,3,4,2,3,4,1,3,4,1,2,4,1,2,3,1,2,3,4)*10,
                 nrow = 5,byrow = TRUE)
svg <- referenceBar(svg = svg,frame_name = "frame",group_name = "group",
                    scale_real = c(-100,100),values = values,
                    reference = 2,nullvalue = 0)

With regard to the x-axis labels, scale_real has to provide an interval with a range of 200 that includes the nullvalue. So even though the percentages in the example are of course positive, defining the range of values from -100 to 100 is feasible. The nullvalue will most often be zero, which is therefore the default. reference=2 defines that the first two values in each bar are shown to the left of wherever the nullvalue lies, while further values (another two in the example) are shown to the right. The result looks like this:

Fig. 6: Adjusted reference bar example
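The positioning around the nullvalue can be sketched in base R for the first row of the example values. This is an illustration of the arithmetic under two assumptions, not the code of referenceBar: segments keep their column order from left to right, and the frame geometry (frame_x, frame_width) is hypothetical.

```r
values     <- c(10, 20, 30, 40)  # first row of the values matrix
reference  <- 2
nullvalue  <- 0
scale_real <- c(-100, 100)

# Segment edges on the real scale: shift the cumulative sums so that the
# boundary after the reference category falls on the nullvalue
# (assumption: segments keep their column order from left to right)
edges <- nullvalue + c(0, cumsum(values)) - sum(values[seq_len(reference)])
print(edges)

# Hypothetical frame used to turn the edges into x-coordinates
frame_x     <- 100
frame_width <- 200
unit   <- frame_width / diff(scale_real)
x_left <- frame_x + (edges[-length(edges)] - scale_real[1]) * unit
width  <- diff(edges) * unit

print(data.frame(value = values, x = x_left, width = width))
```

With these assumptions the first two segments span -30 to 0 on the real scale and the remaining two span 0 to 70, so the boundary between the second and third segment sits at the center of the frame.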

Difference bar

Another specialized bar chart provides the ability to show difference values in regard to a null value (usually zero). Bar segments, and optionally value labels, of values lower than the null value are positioned to the left (with alignment="horizontal") or to the bottom (with alignment="vertical") while higher values are represented to the right or to the top.

The following example makes use of the general behavior of svgtools that elements related to NA values are simply hidden from the chart: