The purpose of svgtools is to manipulate SVG files that
are templates of charts the user wants to produce. In vector graphics
one deals with x-/y-coordinates of elements (e.g. lines, rectangles,
text). Their scale often depends on the program that is used to
produce the graphics. In applied statistics one usually has numeric
values on a fixed scale (e.g. percentage values between 0 and 100) to
show in a chart. Basically, svgtools transforms the
statistical values into coordinates and widths/heights of the vector
graphics.
The SVG file format is an XML dialect. By
means of the xml2 package, svgtools reads SVG
files and then changes certain attributes or even whole elements in the
XML document.
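To make the idea of changing attributes concrete, the following is a minimal sketch that uses xml2 directly on a tiny stand-in SVG. The SVG string and the id 'demoBar' are invented for illustration, and this is not meant to show how svgtools works internally.

```r
library(xml2)

# A tiny stand-in SVG (hypothetical content, not one of the vignette's files)
svg_text <- '<svg xmlns="http://www.w3.org/2000/svg">
  <rect id="demoBar" x="10" y="5" width="50" height="14"/>
</svg>'

doc <- read_xml(svg_text)

# SVG elements live in a default namespace, which xml2 exposes as prefix "d1"
bar <- xml_find_first(doc, "//d1:rect[@id='demoBar']")

width_before <- xml_attr(bar, "width")
xml_set_attr(bar, "width", "75")   # modifies the document in place
width_after <- xml_attr(bar, "width")

c(before = width_before, after = width_after)
```

Attribute values are read and written as character strings, so numeric results have to be converted before they are stored.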
For example, an SVG image might look like this:
Its file content contains lines in XML:
...
<g id="myBars">
<g>
<rect x="141.732" y="92.126" fill="#C6C6C6" width="94.394" height="14.173"/>
<rect x="236.126" y="92.126" fill="#878787" width="94.394" height="14.173"/>
<rect x="330.52" y="92.126" fill="#3C3C3B" width="94.677" height="14.173"/>
<text transform="matrix(1 0 0 1 183.396 101.8799)" font-family="'ArialMT'" font-size="10">33</text>
<text transform="matrix(1 0 0 1 277.7612 101.8799)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">33</text>
<text transform="matrix(1 0 0 1 372.1265 101.8799)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">33</text>
</g>
...
</g>
<rect id="myFrame" x="141.732" y="85.04" fill="none" stroke="#000000" stroke-width="0.5" stroke-miterlimit="10" width="283.464" height="141.732"/>
...
What we see here are three rectangle elements (top bar of the chart) with graphical x-coordinates and widths that lie within a rather arbitrary range. The same holds for three text elements (value labels of top bar) and their coordinates that are stored within an SVG transformation matrix in attribute ‘transform’. Their text entry is fixed to 33. All of these are grouped together (top bar with value labels) and then grouped again (all the bars). The last line shown corresponds to the rectangle that serves as the outer frame of the data area of the chart.
The following lines of code are enough to set the coordinates, widths and numbers in the bar chart correctly when, for example, percentage values are stored in a dataframe of 5 rows (the groups) and 3 columns (the categories):
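library(svgtools)

svg <- read_svg(file = "images/fig1.svg")
# Proportions for 5 groups (rows) and 3 categories (columns)
myValues <- data.frame(cat1 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                       cat2 = c(0.35, 0.25, 0.35, 0.25, 0.35),
                       cat3 = c(0.55, 0.55, 0.35, 0.35, 0.15))
# Multiply by 100 to match the percentage scale given in scale_real
svg <- stackedBar(svg = svg, frame_name = "myFrame", group_name = "myBars",
                  scale_real = c(0, 100), values = myValues * 100)
write_svg(svg = svg, file = "images/fig1_values.svg")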
The result looks like this:
The magic happens at the function call for stackedBar.
Here, one argument refers to the named rectangle ‘myFrame’ to define
outer limits for the graphical coordinates and one argument to the named
group of elements ‘myBars’ containing rectangles (bar segments) and
texts (value labels) for the chart. Concerning the values one wants to
show in the chart, the “real” scale is defined by a vector ranging from
0 to 100 and a dataframe with values is provided. svgtools
can now calculate the corresponding graphical coordinates and widths and
change the elements in ‘myBars’ accordingly.
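The arithmetic behind this transformation can be sketched in base R. The helper scale_row below is hypothetical and not part of svgtools; it assumes a horizontal stacked bar that starts at the left edge of the frame and a real scale that starts at zero. The frame numbers are taken from the ‘myFrame’ rectangle shown above.

```r
# Hypothetical helper (not an svgtools function): maps one row of values
# on the real scale to graphical x-coordinates and widths.
scale_row <- function(values, scale_real, frame_x, frame_width) {
  # graphical units per one unit on the real scale
  units_per_value <- frame_width / diff(scale_real)
  widths <- values * units_per_value
  # each segment starts where the previous one ends
  x <- frame_x + cumsum(c(0, head(widths, -1)))
  data.frame(x = x, width = widths)
}

# First row of the example data (10, 35 and 55 percent)
first_bar <- scale_row(values = c(10, 35, 55), scale_real = c(0, 100),
                       frame_x = 141.732, frame_width = 283.464)
first_bar
```

Because the three values add up to 100, the widths add up to the full frame width of 283.464.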
This vignette explains how to set up SVG files so that
svgtools can work with them and gives insight into the most
common usage of the package functions. For detailed information on all
functions and arguments see ?stackedBar and others.
Functions for file handling and display are rather straightforward
in svgtools. A typical workflow looks like this:
svg <- read_svg(file = "myFile.svg")
summary_svg(svg = svg)
display_svg(svg = svg)
# Code to manipulate the SVG
# ...
display_svg(svg = svg)
write_svg(svg = svg,file = "myFile_out.svg")
read_svg relies on read_xml from the
xml2 package, so it can read a
file from the file system, a connection or even a raw vector. It
defaults to encoding UTF-8, which may be changed with the argument
enc, for example enc="latin-1".
Function summary_svg is a convenience function that
prints some useful information about the SVG content on the console (or
wherever sink is set to). For the SVG in Fig. 1 the output
would look like this:
[1] "************************"
[1] "** -- SVG SUMMARY: -- **"
[1] "************************"
[1] "-- NAMED GROUPS:"
[1] "myBars with 5 children"
[1] "-- AVAILABLE FRAMES:"
[1] "myFrame"
[1] "-- USED FONTS:"
[1] "'ArialMT'"
[1] "-- USED FONT SIZES:"
[1] "10"
[1] "-- USED COLORS:"
[1] "#C6C6C6" "#878787" "#3C3C3B" "none" "#000000"
One can see that there is a named group (‘myBars’) in the SVG. It
contains five child elements, which are the five bars of the chart, see
further below. Also, there is one “available frame” (a named rectangle)
called ‘myFrame’. This information helps with setting the
arguments correctly in the function calls that manipulate the SVG. Further
information on used fonts, font sizes and colors in the SVG only serves
the purpose of validating the consistency of the design. One can invoke
summary_svg directly with the argument summary=TRUE
of read_svg.
To display an SVG on the current viewport one may use
display_svg. The standard viewport depends on the operating system
and IDE. For example, RStudio plots the image under the Viewer tab. By
default, width and height of the bitmap (image) are derived from its
content and the current DPI setting of the viewport. But one can set
the desired width and height with the corresponding function arguments.
Typically, display_svg is used before and after SVG
manipulation to get visual proof of the changes. Therefore, argument
display=TRUE of read_svg conveniently invokes
the function.
Finally, write_svg uses write_xml from the
xml2 package to write the (then manipulated) SVG to file
system or an open connection. By default, hidden elements of the SVG are
removed in the written file (not in the XML document in the R
environment). To change this behavior set
remove_hidden=FALSE. If one wants to remove all groupings
in the written file (again, not in the XML document itself) it is
possible to set flatten=TRUE. This may be beneficial for
further layout work on the resulting SVG image.
svgtools relies heavily on naming objects of the SVG.
One can always accomplish that with any text editor by inserting
id-attributes in the XML element for the object. See the following:
<rect id="myFrame" x="141.732" y="85.04" width="283.464" height="141.732"/>
Naming an object in that way is also possible in almost any vector graphics program (check the manuals). For example, in Adobe Illustrator, naming objects in the Layers panel ultimately leads to XML elements with id attributes when saving as SVG.
The following rules apply:
summary_svg!
On the side of the values one wants to show in a chart, be mindful
that svgtools does not calculate anything apart from the
right coordinates and widths/heights of objects. This is relevant in
situations like the following:
values argument at function calls.
percentileBar takes percentile values and recalculates them into
differences between percentiles to provide widths for bar segments.
Adjustment of charts with lines and/or symbols needs a simple vector of numerical values. For bar charts, it is possible to adjust several bars at once. In that case, one needs to provide a dataframe or a matrix (with only numerical values). Then, rows always concern different bars, while columns define the sequence of bar segments to stack.
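The expected shapes of the values can be illustrated in base R. All numbers and names below are made up for the example. The last lines only show the plain arithmetic of turning percentiles into differences; they are not a description of what percentileBar does internally.

```r
# One bar: a plain numeric vector, one value per bar segment
one_bar <- c(20, 30, 50)

# Several bars: rows are bars, columns are the stacked segments in order
several_bars <- matrix(c(20, 30, 50,
                         10, 40, 50,
                         25, 25, 50),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(c("bar1", "bar2", "bar3"),
                                       c("seg1", "seg2", "seg3")))

# Segment values of the second bar
second_bar <- several_bars["bar2", ]
second_bar

# Differences between consecutive (hypothetical) percentile values
percentiles <- c(p5 = 380, p25 = 450, p75 = 560, p95 = 620)
segment_widths <- diff(percentiles)
segment_widths
```

A dataframe with only numeric columns has the same row and column layout as the matrix shown here.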
Horizontal and vertical alignment of charts works essentially the
same. A corresponding argument is provided in all manipulating functions
except changeText. Mind that
alignment="horizontal" means adjusting x-coordinates for
all chart types while alignment="vertical" always refers to
adjustment of y-coordinates. This may be counter-intuitive when it comes
to line charts, see below.
For a general bar chart one needs to prepare an SVG file that has named (XML attribute ‘id’) groups (XML element ‘g’) of bar segments (XML element ‘rect’) and, optionally, value labels (XML element ‘text’).
svg <- read_svg(file = "images/fig3.svg",summary = TRUE,display = TRUE)
Reading the SVG file with arguments summary=TRUE and
display=TRUE conveniently prints information about it on
the console and displays the SVG in the current viewport. It might look like this:
In this example the XML structure has two named groups. The first consists of the XML elements (3 rectangles and 3 texts) for the leftmost bar and is named ‘overall’. The second one is a group named ‘subgroups’ that itself contains three groups of XML elements for the three bars to the right. See the following excerpt of the file:
...
<g id="overall">
<rect x="156.746" y="255.119" fill="#3C3C3B" width="26.667" height="56.693"/>
<rect x="156.746" y="198.425" fill="#878787" width="26.667" height="56.693"/>
<rect x="156.746" y="141.732" fill="#C6C6C6" width="26.667" height="56.693"/>
<text transform="matrix(1 0 0 1 164.5171 306.2188)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">20</text>
<text transform="matrix(1 0 0 1 164.5171 249.5259)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">20</text>
<text transform="matrix(1 0 0 1 164.5171 192.833)" font-family="'ArialMT'" font-size="10">20</text>
</g>
<g id="subgroups">
<g>
<rect x="213.438" y="255.119" fill="#3C3C3B" width="26.667" height="56.693"/>
<rect x="213.438" y="198.425" fill="#878787" width="26.667" height="56.693"/>
<rect x="213.438" y="141.732" fill="#C6C6C6" width="26.667" height="56.693"/>
<text transform="matrix(1 0 0 1 221.21 306.2188)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">20</text>
<text transform="matrix(1 0 0 1 221.21 249.5259)" fill="#FFFFFF" font-family="'ArialMT'" font-size="10">20</text>
<text transform="matrix(1 0 0 1 221.21 192.833)" font-family="'ArialMT'" font-size="10">20</text>
</g>
<g>
...
</g>
<g>
...
</g>
</g>
...
The summary on the console will reflect this. Note that the number of child elements depends on whether a named group consists of bar elements or of further (sub)groups:
[1] "-- NAMED GROUPS:"
[1] "overall with 6 children"
[1] "subgroups with 3 children"
Note: It is not allowed to nest the grouping any further!
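The child counts reported in the summary can be reproduced with xml2. The sketch below runs on a shortened stand-in SVG with the same grouping as the excerpt above; the SVG string and variable names are invented for the example, and this is not necessarily how summary_svg is implemented.

```r
library(xml2)

# Shortened stand-in for the structure shown in the excerpt (hypothetical)
svg_text <- '<svg xmlns="http://www.w3.org/2000/svg">
  <g id="overall">
    <rect/><rect/><rect/><text>20</text><text>20</text><text>20</text>
  </g>
  <g id="subgroups">
    <g><rect/><text>20</text></g>
    <g><rect/><text>20</text></g>
    <g><rect/><text>20</text></g>
  </g>
</svg>'

doc <- read_xml(svg_text)

# Named groups are "g" elements with an id attribute;
# "d1" is the prefix xml2 assigns to the default SVG namespace
named_groups <- xml_find_all(doc, "//d1:g[@id]")

# Count the direct child elements of each named group
children <- setNames(xml_length(named_groups), xml_attr(named_groups, "id"))
children
```

The group ‘overall’ counts its rectangles and texts, while ‘subgroups’ counts its three nested groups.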
The bar chart in the example of figure 3 needs two separate function
calls to be adjusted to values on the statistical scale. The first call
to stackedBar refers to the group named ‘overall’, the
second one to the group named ‘subgroups’. It is necessary to have a
named frame (XML element ‘rect’) for both cases. Since the bar chart will be adjusted in
vertical direction (alignment="vertical" makes sure that
y-coordinates and heights are changed) the SVG needs to have only one
such rectangle for both the overall bar and the bars for subgroups.
The main difference between the two function calls is how one provides the values. In the case that a group refers to only one bar (here: ‘overall’) values are provided as a simple numerical vector. If a matrix or a dataframe were provided, only the first row would be used. The function call will fail with an error message if the number of values does not match the number of bar segments (or value labels).
In the case that a group refers to (sub)groups (here: ‘subgroups’)
one needs to provide a matrix or a dataframe. Rows will be used to get
values for each bar, while the order of columns defines the values from
left to right with alignment="horizontal" or bottom to top
with alignment="vertical". The function call will stop with
an error message if the number of rows does not match the number of bars
and also if the number of values does not match the number of bar
segments (or value labels).
svg <- stackedBar(svg = svg, frame_name = "frame", group_name = "overall",
                  scale_real = c(0, 160), values = c(9.97, 42.42, 105.71),
                  alignment = "vertical", has_labels = TRUE, label_position = "end",
                  decimals = 0, display_limits = 10)
# 3 x 3 matrix: rows are bars, columns are bar segments
df.subgroups <- matrix(1:9 * 8, nrow = 3)
svg <- stackedBar(svg = svg, frame_name = "frame", group_name = "subgroups",
                  scale_real = c(0, 160), values = df.subgroups,
                  alignment = "vertical", display_limits = 10)
display_svg(svg = svg)
write_svg(svg = svg, file = "images/fig3_values.svg",
          remove_hidden = FALSE, flatten = TRUE)
The program code above will ultimately display the following chart and also save it to a file:
The first function call to stackedBar in the example
above has set every argument there is for this function.
has_labels=TRUE and decimals=0 are actually
default values, which is why things work the same in the second function
call. While the meaning of has_labels=TRUE is obvious, note
how values are rounded to the number of decimal digits desired, so that
‘105.71’ becomes ‘106’ in the chart. It is possible to set the rounding
of the labels to rounding away from zero by
options("svgtools.roundAwayFromZero" = TRUE) such that
‘106.5’ becomes ‘107’ (default: ‘106’).
label_position="end" puts value labels at the top of the
bar segments in vertical alignment and at the right side (or left side
for negative values) in horizontal alignment. The default setting used
in the second function call puts value labels in the center of the bar
segments. Argument display_limits is used to suppress value
labels in a range around zero. If only one number is provided it refers
to the absolute value. In the example the value ‘8’ of category A in
group 1 is no longer shown because it is lower than 10. Note that
this is evaluated for the exact value, not the rounded one. So ‘10.1’
would be visible (as ‘10’ with decimals=0) while ‘9.9’
would not.
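The two rounding behaviors and the display limit check can be illustrated in base R. The helper round_half_away is hypothetical and only mimics the described effect of the option; it is not taken from svgtools. How a value exactly on the limit is treated is an assumption here.

```r
# Base R rounds exact halves to the even neighbor (IEC 60559)
default_label <- round(106.5)

# Hypothetical helper: round halves away from zero instead
round_half_away <- function(x, digits = 0) {
  sign(x) * floor(abs(x) * 10^digits + 0.5) / 10^digits
}
away_label <- round_half_away(106.5)

c(default = default_label, away_from_zero = away_label)

# Display limit: decided on the exact value, not on the rounded label.
# Assumption: a value exactly equal to the limit would still be shown.
values <- c(8, 9.9, 10.1)
limit <- 10
shown <- abs(values) >= limit
data.frame(value = values, label = round(values), shown = shown)
```

Both 9.9 and 10.1 would be labelled ‘10’ with decimals=0, but only 10.1 passes the check on the exact value.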
Note that display options for value labels are rather limited in the
current version of svgtools. It is neither possible to set
the distance from the edge of bar segments with
label_position="start" or label_position="end"
nor will stackedBar change any coordinates or alignment of
texts that do not concern the x-axis with
alignment="horizontal" or y-axis with
alignment="vertical". This is why in the example ‘106’
(category C of overall bar) leans to the right: the text element was not
aligned to be centered in the SVG template (set XML attribute
‘text-anchor’ to “middle” beforehand, in order to do this).
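If one prefers to prepare the template from R instead of a graphics program, setting the attribute could look like the following sketch. It uses xml2 directly on a shortened stand-in SVG; the SVG string is invented for the example. Note that with ‘text-anchor’ set to “middle” the x-position in the transformation matrix becomes the center of the text, so it would still need to sit at the horizontal center of the bar.

```r
library(xml2)

# Shortened stand-in for a template with one value label (hypothetical)
svg_text <- '<svg xmlns="http://www.w3.org/2000/svg">
  <g id="overall">
    <text transform="matrix(1 0 0 1 164.5171 192.833)" font-size="10">20</text>
  </g>
</svg>'

doc <- read_xml(svg_text)

# Select the text elements of the named group;
# "d1" is the prefix xml2 assigns to the default SVG namespace
labels <- xml_find_all(doc, "//d1:g[@id='overall']/d1:text")

# Center the text horizontally on its anchor point
xml_set_attr(labels, "text-anchor", "middle")
anchors <- xml_attr(labels, "text-anchor")
anchors
```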
The example code calls write_svg with optional arguments. With
remove_hidden=FALSE one will still have text elements for
the two value labels that are not displayed in the saved SVG file. And
flatten=TRUE leads to an SVG file without any groups (XML
elements ‘g’). All 60 graphical elements are stored directly beneath the
XML document node.
With the help of hidden bar segments and recalculated
values one could already produce a wide range of bar charts
using stackedBar. For convenience, svgtools
offers three special variants. They work the same as the general case
when it comes to naming SVG elements. The following sections describe
each one briefly, focusing on their distinct usage of function
arguments.
In svgtools a reference bar is a stacked bar chart that
is aligned around a nullvalue. Bar segments up to a so-called reference
category (a certain column/position of values) are
positioned to the left (with alignment="horizontal") or to
the bottom (with alignment="vertical") while further
segments lie to the right or top.
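The placement rule can be sketched on the real scale in base R. This reflects one reading of the description above, not the package's actual implementation; the values are made up, and the variable names reference and nullvalue are chosen to match the corresponding function arguments.

```r
# One bar with four segments (hypothetical percentages)
values <- c(10, 20, 30, 40)
reference <- 2   # the first two segments lie left of (or below) the nullvalue
nullvalue <- 0

# The bar starts far enough before the nullvalue to fit the first segments
left_edge <- nullvalue - sum(values[seq_len(reference)])

# Segment boundaries on the real scale
edges <- left_edge + cumsum(c(0, values))
segments <- data.frame(start = head(edges, -1), end = tail(edges, -1))
segments
```

The segment at position reference ends exactly at the nullvalue, and the next segment starts there.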
The following figure shows a possible template. Note that it is not necessary to adjust bar segments and values (in horizontal direction) beforehand.
An example of code using referenceBar looks like
this:
# 5 bars (rows) with 4 segments each (columns), in percent
values <- matrix(c(1, 2, 3, 4, 2, 3, 4, 1, 3, 4, 1, 2, 4, 1, 2, 3, 1, 2, 3, 4) * 10,
                 nrow = 5, byrow = TRUE)
svg <- referenceBar(svg = svg, frame_name = "frame", group_name = "group",
                    scale_real = c(-100, 100), values = values,
                    reference = 2, nullvalue = 0)
With regard to the x-axis labels, scale_real has to
provide an interval with a range of 200 that includes the
nullvalue. So even though the percentages in the example
are of course positive, defining the range of values from -100 to 100 is
feasible. The nullvalue will most often be at zero, so this
is also set as the default value. reference=2 defines that
the first two values in each bar will be represented to the left of
wherever the nullvalue lies, while further values (another two in the
example) will be shown to the right. The result looks like this:
Another specialized bar chart provides the ability to show difference
values in regard to a null value (usually zero). Bar segments, and
optionally value labels, of values lower than the null value are
positioned to the left (with alignment="horizontal") or to
the bottom (with alignment="vertical") while higher values
are represented to the right or to the top.
The following example makes use of the general behavior of
svgtools that elements related to NA values are simply
hidden from the chart: