2. The GGIR pipeline

The GGIR function

GGIR comes with a large number of functions and optional settings (function arguments also referred to as parameters) that together form a processing pipeline. To ease interacting with GGIR, it has one function to act as interface to all functionality. This function is also named GGIR. You only need to learn how to work with this central function, but it is important to understand that this function interacts with all other functions in GGIR.

Further, it is important to understand that the GGIR package is structured in two complementary ways:

Parts of the pipeline

GGIR has a computational structure of six parts which are applied sequentially to the data:

Milestone data

Each part, when run, stores its intermediate data, which we refer to as the milestone data, in an .RData file. This .RData file is then read by the next part. Therefore, there is no direct interaction between the parts. All exchange of information works via the .RData files. The advantage of this design is that it offers internal modularity to ease re-processing and code development.

The .RData files are stored in sub-folders of the output folder:

  • Part 1: meta/basic
  • Part 2: meta/ms2.out
  • Part 3: meta/ms3.out
  • Part 4: meta/ms4.out
  • Part 5: meta/ms5.out
  • Part 6: meta/ms6.out
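As a rough illustration of the folder layout listed above, the following base R sketch counts the .RData files per part in an output folder. The helper name count_milestones and the example path are hypothetical and not part of GGIR.

```r
# Milestone sub-folders per GGIR part, as listed in this chapter.
milestone_dirs <- c(part1 = "meta/basic",   part2 = "meta/ms2.out",
                    part3 = "meta/ms3.out", part4 = "meta/ms4.out",
                    part5 = "meta/ms5.out", part6 = "meta/ms6.out")

# Hypothetical helper: count the .RData files found for each part.
count_milestones <- function(output_folder) {
  vapply(milestone_dirs, function(d) {
    length(list.files(file.path(output_folder, d), pattern = "\\.RData$"))
  }, integer(1))
}

# Hypothetical path; replace it with your own output folder.
print(count_milestones("D:/myresults/output_mydata"))
```

A part that has not been run yet, or a folder that does not exist, simply shows a count of zero.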

As a user, you are unlikely to ever need to interact with these milestone data because all relevant output for you is stored in csv and pdf files in the output folder results. However, it can be useful to know where the milestone data are stored for the following reasons:

  • By copying these .RData files to a new computer, you may continue your analyses without access to the original data files. This can, for example, be helpful if you process subsets of the same study on different computers: pooling the resulting milestone data allows you to finalise your analysis on a single computer.
  • When you run into a problem, these .RData files may allow you to share a reproducible example of the problem without having to share the original data file.

Parameter themes

The arguments of all GGIR package functions can be used as input arguments to the GGIR function. We will refer to these arguments as input parameters in this documentation. The parameters are structured thematically, independently of the six parts they are used in:

There are a couple of ways to inspect the parameters in each parameter category and their default values:

GGIR function documentation: open the help page by running ?GGIR in R, or visit the .pdf manual and search for function GGIR.

Parameters vignette:

Documentation for all parameters in the parameter objects can be found in the vignette: GGIR configuration parameters.

From the R command line, you can print all parameters with their default values, only the parameters in one specific category such as sleep, or a single parameter such as HASIB.algo from the params_sleep object.
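The three command-line inspections described above could look roughly as follows. This is a minimal sketch that assumes the GGIR package is installed and that load_params() returns a list with one params_ object per category; the call is guarded so that the code also runs without GGIR.

```r
if (requireNamespace("GGIR", quietly = TRUE)) {
  params <- GGIR::load_params()

  # All parameter categories with their default values
  print(params)

  # Only the sleep category
  print(params$params_sleep)

  # Only parameter HASIB.algo from the sleep category
  print(params$params_sleep[["HASIB.algo"]])
} else {
  message("GGIR is not installed; install it to run this example.")
}
```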


All of these parameters are accepted as parameter to function GGIR, because GGIR is a shell around all GGIR functionality. However, the params_ objects themselves cannot be provided as input to GGIR.

Configuration file

GGIR stores all the parameter values as used in a csv-file named config.csv. The file is stored after each run of GGIR in the root of the output folder, overwriting any existing config.csv file. So, if you would like to add annotations in the file, e.g. in the fourth column, then you will need to store it somewhere outside the output folder and explicitly specify the path with parameter configfile. Further, the config.csv file itself is accepted as input to GGIR with parameter configfile to replace the specification of all the parameters, except datadir and outputdir, see example below.

GGIR(datadir = "C:/mystudy/mydata",
     outputdir = "D:/myresults", 
     configfile = "D:/myconfigfiles/config.csv")

The practical value of this is that it eases the replication of the analysis, because instead of having to share your R script with colleagues, sharing your config.csv file will be sufficient.
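To check which value a configuration file holds for a parameter before sharing it, you could read the file with base R. The sketch below uses a toy file written to a temporary location; the three-column layout (argument, value, context), the example values, and the helper lookup_config are assumptions for illustration, so check the header of your own config.csv first.

```r
# Toy stand-in for a config.csv; layout and values are assumed for illustration.
config_path <- tempfile(fileext = ".csv")
write.csv(data.frame(argument = c("mvpathreshold", "HASIB.algo"),
                     value    = c("120", "vanHees2015"),
                     context  = c("params_phyact", "params_sleep")),
          config_path, row.names = FALSE)

# Read everything as text so that numeric and character values are kept as stored.
config <- read.csv(config_path, colClasses = "character")

# Hypothetical helper: return the stored value for one parameter name.
lookup_config <- function(config, name) config$value[config$argument == name]

print(lookup_config(config, "mvpathreshold"))
```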

Parameter extraction order

If parameters are provided in the GGIR call, then the function always uses those. If parameters are not provided in the function call, GGIR checks whether there is a config.csv file either in the output folder or specified via parameter configfile and loads those values. If a parameter is neither specified in the GGIR function call nor available in the config.csv file, then GGIR will use its default values, which can be inspected as discussed in the section above. Here, it is important to realise that a consequence of this logic is that GGIR will not revert to its default parameter values in a repeated analysis unless you remove the parameter from the function call and delete the config.csv file.

To ensure this is clear, here are some examples:

  • If GGIR is used for the first time without specifying parameter mvpathreshold, then it will use the default value, which is 100.
  • If you specify mvpathreshold = 120, then GGIR will use this instead and store it in the config.csv file.
  • If you run GGIR again but this time delete mvpathreshold = 120 from your GGIR call, then GGIR will fall back on the value 120 as now stored in the config.csv file.
  • If you delete the config.csv file and run GGIR again, then the value 100 will be used again.
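The lookup order described above can be mimicked with a small toy function. This is only an illustration of the precedence rule (function call, then config.csv, then default); resolve_param is a hypothetical helper and not GGIR's internal code.

```r
# Toy illustration of the precedence: call arguments > config file > defaults.
resolve_param <- function(name, call_args = list(), config = list(),
                          defaults = list()) {
  if (!is.null(call_args[[name]])) return(call_args[[name]])
  if (!is.null(config[[name]])) return(config[[name]])
  defaults[[name]]
}

defaults <- list(mvpathreshold = 100)

# First run, nothing specified: the default is used.
run1 <- resolve_param("mvpathreshold", defaults = defaults)

# Second run with mvpathreshold = 120 in the call; the value ends up in the config.
run2 <- resolve_param("mvpathreshold", call_args = list(mvpathreshold = 120),
                      defaults = defaults)
config <- list(mvpathreshold = run2)

# Third run without the argument: the config value is used.
run3 <- resolve_param("mvpathreshold", config = config, defaults = defaults)

# Fourth run after deleting the config: back to the default.
run4 <- resolve_param("mvpathreshold", defaults = defaults)

print(c(run1, run2, run3, run4))
```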

Input files

Raw data

  1. GGIR currently works with the following accelerometer brands and formats:

Externally derived epoch-level data

By default, GGIR assumes that the data is raw as discussed at the start of this chapter. However, for some studies raw data is not available and all we have is an epoch-level aggregate, for example produced by external software or inside the accelerometer device. Although this can severely limit the transparency and flexibility of our analysis, GGIR attempts to facilitate the analysis of these aggregations of the raw data. Please find below an overview of the file formats currently facilitated:

| Sensor brand | Data format | Combine with parameter |
| --- | --- | --- |
| Actiwatch | .csv | dataFormat = "actiwatch_csv", extEpochData_timeformat = "%m/%d/%Y %H:%M:%S" |
| Actiwatch | .awd | dataFormat = "actiwatch_awd", extEpochData_timeformat = "%m/%d/%Y %H:%M:%S" |
| ActiGraph | .csv (count files, not to be confused with csv raw data files) | dataFormat = "actigraph_csv", extEpochData_timeformat = "%m/%d/%Y %H:%M:%S" |
| Axivity | UK Biobank csv export of epoch-level data | dataFormat = "ukbiobank_csv" |

Note 1: For Actiwatch and ActiGraph, physical activity description and sleep classification need to be tailored to count-specific algorithms: do.neishabouricounts = TRUE, acc.metric = "NeishabouriCount_x", HASPT.algo = "NotWorn", HASIB.algo = "NotWorn".

Note 2: For UK Biobank csv epoch data, GGIR does not facilitate sleep analysis as arm angle is not exported.
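As an illustration, the parameters for ActiGraph count files could be combined as sketched below. The folder paths and the example timestamp are made up; the parameter names and values are those listed in this section. The GGIR call itself is guarded so that it only runs when GGIR is installed and the data folder exists.

```r
# Arguments for ActiGraph count csv files, combining the settings listed above.
epoch_args <- list(
  datadir = "C:/mystudy/mydata",       # made-up path
  outputdir = "D:/myresults",          # made-up path
  dataFormat = "actigraph_csv",
  extEpochData_timeformat = "%m/%d/%Y %H:%M:%S",
  do.neishabouricounts = TRUE,
  acc.metric = "NeishabouriCount_x",
  HASPT.algo = "NotWorn",
  HASIB.algo = "NotWorn"
)

# Sanity check: does the time format parse a timestamp as found in the file?
# The timestamp below is a made-up example.
parsed <- as.POSIXct("11/06/2023 14:30:00",
                     format = epoch_args$extEpochData_timeformat, tz = "UTC")
print(parsed)

if (requireNamespace("GGIR", quietly = TRUE) && dir.exists(epoch_args$datadir)) {
  do.call(GGIR::GGIR, epoch_args)
}
```

If the parsed value is NA, the time format does not match the timestamps in your file and extEpochData_timeformat needs to be adjusted.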

To process these files, GGIR loads their content and saves it as GGIR part 1 milestone data, essentially fooling the rest of GGIR into thinking that GGIR part 1 created the data. As will be discussed in chapter 2, GGIR does non-wear detection in two steps: The first step is done in part 1 and the second step is done in part 2. For externally derived epoch data, non-wear is either detected by looking for consecutive zeros spanning one hour (Actiwatch, ActiGraph) or derived from the file (UK Biobank csv).
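The idea of flagging one hour of consecutive zeros can be sketched with run-length encoding in base R. This is a toy version under the assumption of 60-second epochs and is not GGIR's actual implementation; detect_zero_runs is a hypothetical helper.

```r
# Flag epochs that belong to a run of zeros lasting at least min_duration_sec.
detect_zero_runs <- function(counts, epoch_sec = 60, min_duration_sec = 3600) {
  min_epochs <- ceiling(min_duration_sec / epoch_sec)
  runs <- rle(counts == 0)
  flagged <- runs$values & runs$lengths >= min_epochs
  rep(flagged, times = runs$lengths)
}

# Made-up counts: 30 active epochs, 60 zeros, 10 active epochs, 20 zeros.
counts <- c(rep(5, 30), rep(0, 60), rep(3, 10), rep(0, 20))
nonwear <- detect_zero_runs(counts)

# Number of epochs flagged as non-wear
print(sum(nonwear))
```

Only the 60-epoch run of zeros is flagged here; the 20-epoch run is shorter than one hour and is left untouched.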

  1. All accelerometer data that need to be analysed should be stored in one folder or subfolders of that folder. Make sure that the folder does not contain any files that are not accelerometer data.
  2. Choose an appropriate name for the folder, preferably with a reference to the study or project it is related to rather than just ‘data’, because the name of this folder will be used as an identifier of the dataset and integrated in the name of the output folder GGIR creates.

How to run your analysis?

The bare minimum input needed for GGIR is:

GGIR(datadir = "C:/mystudy/mydata",
     outputdir = "D:/myresults")

Argument datadir allows you to specify where you have stored your accelerometer data and outputdir allows you to specify where you would like the output of the analyses to be stored. The value of outputdir cannot be equal to datadir. If you copy and paste the above code into a new R script (file ending with .R) and Source it in R(Studio), then the dataset will be processed and the output will be stored in the specified output directory.

Next, we can add argument mode to tell GGIR which part(s) to run, e.g. mode = 1:5 tells GGIR to run parts 1 to 5 and is the default. With argument overwrite, we can tell GGIR whether to overwrite previously produced milestone data or not.

Further, argument idloc tells GGIR where to find the participant ID. The default setting most likely does not work for most data formats, so it is important that you tailor the value of this argument to your study setting.

GGIR stores some of its output in csv file format with a comma as the default column separator. However, this can be modified with argument sep_reports.
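Putting the arguments discussed above together, a call could be prepared as sketched below. The paths are the made-up ones used earlier and the idloc value is only a placeholder; consult the GGIR function documentation for the value that matches your files. The call is guarded so that it only runs when GGIR is installed and the data folder exists.

```r
run_args <- list(
  datadir = "C:/mystudy/mydata",
  outputdir = "D:/myresults",
  mode = 1:5,           # parts to run (the default)
  overwrite = FALSE,    # keep previously produced milestone data
  idloc = 2,            # placeholder; tailor this to where the ID is stored
  sep_reports = ","     # column separator for csv reports
)

# outputdir must differ from datadir
stopifnot(run_args$outputdir != run_args$datadir)

if (requireNamespace("GGIR", quietly = TRUE) && dir.exists(run_args$datadir)) {
  do.call(GGIR::GGIR, run_args)
}
```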

Each chapter in this digital book will highlight the key arguments related to specific topics.

From the R console on your own desktop/laptop

Create an R script and put the GGIR call in it. Next, you can source the R script with the source button in RStudio or with the command: source("pathtoscript/myshellscript.R")

GGIR by default supports multi-thread processing which is used to process one input file per process, speeding up the processing of your data. This can be turned off by setting argument do.parallel = FALSE.

In a cluster

If processing data with GGIR on your desktop/laptop is not fast enough, we advise using GGIR on a computing cluster. The way we did it on a Sun Grid Engine cluster is shown below. Please note that some of these commands are specific to the computing cluster you are working on. Please consult your local cluster specialist to explore how to run GGIR on your cluster. Here, we only share how we did it. We had three files for the SGE setting:


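submit.sh:

#! /bin/bash
# n = number of files per job (assumed value; adjust to your dataset)
n=1
for i in {1..707}; do
    s=$(( n * (i - 1) + 1 ))
    e=$(( i * n ))
    qsub /home/nvhv/WORKING_DATA/bashscripts/run-mainscript.sh $s $e
done

run-mainscript.sh:

#! /bin/bash
#$ -cwd -V
#$ -l h_vmem=12G
/usr/bin/R --vanilla --args f0=$1 f1=$2 < /home/nvhv/WORKING_DATA/test/myshellscript.R

myshellscript.R:

library(GGIR)
args = commandArgs(TRUE)
if (length(args) > 0) {
  for (i in seq_along(args)) {
    eval(parse(text = args[[i]]))
  }
}
GGIR(f0 = f0, f1 = f1, ...)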

You will need to update the ... in the last line with the parameters you used for GGIR. Note that f0=f0,f1=f1 is essential for this to work. The values of f0 and f1 are passed on from the bash script. Once this is all set up, you will need to call bash submit.sh from the command line. With the help of computing clusters, GGIR has successfully been run on some of the world’s largest accelerometer datasets such as UK Biobank and the German NAKO study.
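To see how f0 and f1 arrive in the R session, the sketch below simulates the command-line arguments instead of reading them with commandArgs(). It also reproduces the index arithmetic of the submit loop in R. The helper job_range and the example numbers are hypothetical.

```r
# Simulate what commandArgs(TRUE) returns for: R --vanilla --args f0=1 f1=10
args <- c("f0=1", "f1=10")

# Each argument is a small R expression that assigns a variable.
for (arg in args) eval(parse(text = arg))
print(c(f0 = f0, f1 = f1))

# First and last file index for job i when each job handles n files.
job_range <- function(i, n) c(start = n * (i - 1) + 1, end = i * n)
print(job_range(3, 10))
```

Because the arguments are evaluated as R code, only pass values you control yourself, as in the bash scripts above.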

Processing time

The time to process a typical seven-day recording should be between 3 and 10 minutes, depending on the sample frequency of the recording, the sensor brand, data format, the exact configuration of GGIR, and the specifications of your computer. If you observe processing times of 20 minutes or longer for a seven-day recording, then you are probably being slowed down by other factors.

Some tips on how you may be able to address this:

GGIR output

GGIR always creates an output folder in the location specified with parameter outputdir. The output folder name is constructed as output_ followed by the name of the dataset, which is derived from the most distal folder name of the data directory as specified with datadir. We recommend this approach because it ensures that the output folder and data directory have matching names. However, it is possible to use datadir to specify a vector with paths to individual files, which may be helpful if you want to process a set of files but are not in a position to move them to a new folder. In that scenario, you will need to set parameter studyname to tell GGIR what the dataset name is.
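The naming rule described above can be approximated in base R to predict where the output will appear. This is a simplified sketch and not GGIR's own code; output_folder_path is a hypothetical helper and the paths are made up.

```r
# Predict the output folder: output_ followed by the dataset name.
output_folder_path <- function(datadir, outputdir, studyname = NULL) {
  # Without studyname, use the most distal folder name of datadir.
  dataset <- if (is.null(studyname)) basename(datadir[1]) else studyname
  file.path(outputdir, paste0("output_", dataset))
}

# datadir is a folder
print(output_folder_path("C:/mystudy/mydata", "D:/myresults"))

# datadir is a vector of individual files, so studyname provides the name
files <- c("C:/a/file1.bin", "C:/b/file2.bin")
print(output_folder_path(files, "D:/myresults", studyname = "mystudy"))
```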

Inside the output folder GGIR will create two subfolders: meta and results as discussed earlier in this chapter. Inside results you will find a folder named QC (Quality Checks). This name is possibly somewhat confusing: data quality checks are best started with the files stored in the results folder, while the files in the QC subfolder offer complementary information to help with the quality check.

GGIR generates reports in parts 2, 4, 5, and 6 of the pipeline. With parameter do.report you can specify for which of these parts you want the reports to be generated. For example, do.report = c(2, 5) will only generate the report for parts 2 and 5. For a full GGIR analysis we expect at least the following output files:

Output files in results subfolder:

A detailed discussion of each output can be found in the other chapters.
| File path | Resolution | GGIR part |
| --- | --- | --- |
| part2_summary.csv | one row per recording | 2 |
| part2_daysummary.csv | one row per day | 2 |
| part4_summary_sleep_cleaned.csv | one row per recording | 4 |
| part4_nightsummary_sleep_cleaned.csv | one row per night | 4 |
| visualisation_sleep.pdf | 40 recordings per page | 4 |
| part5_daysummary_WW_L40M100V400_T5A5.csv | one row per day | 5 |
| part5_personsummary_WW_L40M100V400_T5A5.csv | one row per recording | 5 |

Output files in results/QC subfolder:

A detailed discussion of each output can be found in the other chapters.
| File path | Resolution | GGIR part |
| --- | --- | --- |
| data_quality_report.csv | one row per recording | 5 |
| part4_summary_sleep_full.csv | one row per recording | 4 |
| part4_nightsummary_sleep_full.csv | one row per night | 4 |
| part5_daysummary_full_WW_L40M100V400_T5A5.csv | one row per day | 5 |
| plots_to_check_data_quality_1.pdf | one recording per page | 2 |

Output files in meta/ms5.outraw subfolder:

A detailed discussion of each output can be found in the other chapters.
| File path | Resolution | GGIR part |
| --- | --- | --- |
| 40_100_400/ | one file with time series per recording | |
| behavioralcodes2023-11-06.csv | dictionary for behavioural codes as used in the time series file | |
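After a run you may want to verify that the expected reports were created. The base R sketch below checks the results subfolder for the part 2 and part 4 files listed above. The part 5 file names are left out because they contain the threshold configuration. The helper missing_outputs and the example path are hypothetical.

```r
# Report files from parts 2 and 4 expected in the results subfolder.
expected_results <- c("part2_summary.csv",
                      "part2_daysummary.csv",
                      "part4_summary_sleep_cleaned.csv",
                      "part4_nightsummary_sleep_cleaned.csv",
                      "visualisation_sleep.pdf")

# Hypothetical helper: return the expected files that are not present.
missing_outputs <- function(output_folder, files = expected_results) {
  paths <- file.path(output_folder, "results", files)
  files[!file.exists(paths)]
}

# Hypothetical path; replace it with your own output folder.
print(missing_outputs("D:/myresults/output_mydata"))
```

An empty result means all listed files were found.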