This tutorial demonstrates how to use the `SEAGLE`

package when the user inputs a matrix \({\bf G}\). We’ll begin by loading the `SEAGLE`

package.

`library(SEAGLE)`

As an example, we’ll generate some synthetic data for usage in this tutorial. Let’s consider a dataset with \(n=5000\) individuals and \(L=100\) loci, where the first \(40\) are causal.

The `makeSimData`

function generates a covariate matrix \(\widetilde{\bf X} \in \mathbb{R}^{n \times 3}\), where the first column is the all ones vector for the intercept and the second and third columns are \({\bf X} \sim \text{N}(0,1)\) and \({\bf E} \sim \text{N}(0,1)\), respectively. The last two columns are scaled to have \(0\) mean and unit variance.

The `makeSimData`

function additionally generates the genetic marker matrix \({\bf G}\) with synthetic haplotype data from the COSI software. Detailed procedures for generating \({\bf G}\) can be found in the accompanying journal manuscript. Finally, the `makeSimData`

function also generates a continuous phenotype \({\bf y}\) according to the following fixed effects model \[
{\bf y} = \tilde{\bf X} \boldsymbol{\gamma}_{\widetilde{\bf X}} + {\bf G}\boldsymbol{\gamma}_{G} + \text{diag}(E){\bf G}\boldsymbol{\gamma}_{GE} + {\bf e}.
\] Here, \(\boldsymbol{\gamma}_{\tilde{\bf X}}\) is the all ones vector of length \(P=3\), \(\boldsymbol{\gamma}_{G} \in \mathbb{R}^{L}\), \(\boldsymbol{\gamma}_{GE}\in \mathbb{R}^{L}\), and \({\bf e} \sim \text{N}({\bf 0}, \sigma\, {\bf I}_{n})\). The entries of \(\boldsymbol{\gamma}_{G}\) and \(\boldsymbol{\gamma}_{GE}\) pertaining to causal loci are set to be \(\gamma_{G}\) = `gammaG`

and \(\gamma_{GE}\) = `gammaGE`

, respectively. The remaining entries of \(\boldsymbol{\gamma}_{G}\) and \(\boldsymbol{\gamma}_{GE}\) pertaining to non-causal loci are set to \(0\).

`<- makeSimData(H=cosihap, n=5000, L=100, gammaG=1, gammaGE=0, causal=40, seed=1) dat `

Now that we have our data, we can prepare it for use in the SEAGLE algorithm. We will input our \({\bf y}\), \({\bf X}\), \({\bf E}\), and \({\bf G}\) into the `prep.SEAGLE`

function. The `intercept = 1`

parameter indicates that the first column of \({\bf X}\) is the all ones vector for the intercept.

This preparation procedure formats the input data for the `SEAGLE`

function by checking the dimensions of the input data. It also pre-computes a QR decomposition for \(\widetilde{\bf X} = \begin{pmatrix} {\bf 1}_{n} & {\bf X} & {\bf E} \end{pmatrix}\), where \({\bf 1}_{n}\) denotes the all ones vector of length \(n\).

`<- prep.SEAGLE(y=dat$y, X=dat$X, intercept=1, E=dat$E, G=dat$G) objSEAGLE `

Finally, we’ll input the prepared data into the `SEAGLE`

function to compute the score-like test statistic \(T\) and its corresponding p-value. The `init.tau`

and `init.sigma`

parameters are the initial values for \(\tau\) and \(\sigma\) employed in the REML EM algorithm.

```
<- SEAGLE(objSEAGLE, init.tau=0.5, init.sigma=0.5)
res $T
res#> [1] 246.1886
$pv
res#> [1] 0.8441451
```

The score-like test statistic \(T\) for the G\(\times\)E effect and its corresponding p-value can be found in `res$T`

and `res$pv`

, respectively.