missMethyl 1.28.0

- 1 Introduction
- 2 Reading data into R
- 3 Subset-quantile within array normalization (SWAN)
- 4 Filter out poor quality probes
- 5 Extracting \(\beta\) and M-values
- 6 Testing for differential methylation using limma
- 7 Removing unwanted variation when testing for differential methylation
- 8 Testing for differential variability (DiffVar)
- 9 Gene ontology analysis
- 10 Session information
- References

The *missMethyl* package contains functions to analyse
methylation data from Illumina’s HumanMethylation450 and MethylationEPIC
beadchip. These arrays are a cost-effective alternative to whole genome
bisulphite sequencing, and as such are widely used to profile DNA methylation.
Specifically, *missMethyl* contains functions to perform
SWAN normalisation (Maksimovic, Gordon, and Oshlack 2012),perform differential methylation analysis
using **RUVm** (Maksimovic et al. 2015), differential variability analysis
(Phipson and Oshlack 2014) and gene set enrichment analysis (Phipson, Maksimovic, and Oshlack 2016). As our lab’s
research into specialised analyses of these arrays continues we anticipate that
the package will be updated with new functions.

Raw data files are in IDAT format, which can be read into R using the
*minfi* package (Aryee et al. 2014). Statistical analyses are
usually performed on M-values, and \(\beta\) values are used for visualisation,
both of which can be extracted from `MethylSet`

data objects, which is a class
of object created by *minfi*. For detecting
differentially variable CpGs we recommend that the analysis is performed on
M-values. All analyses described here are performed at the CpG site level.

We will use the data in the *minfiData* package to
demonstrate the functions in *missMethyl*.
The example dataset has 6 samples across two slides. There are 3 cancer samples
and 3 normal sample. The sample information is in the targets file. An
essential column in the targets file is the `Basename`

column which tells us
where the idat files to be read in are located. The R commands to read in the
data are taken from the *minfi* User’s Guide. For
additional details on how to read the IDAT files into R, as well as information
regarding quality control please refer to the *minfi*
User’s Guide.

```
library(missMethyl)
library(limma)
library(minfi)
library(minfiData)
```

```
baseDir <- system.file("extdata", package = "minfiData")
targets <- read.metharray.sheet(baseDir)
```

`## [1] "/home/biocbuild/bbs-3.14-bioc/R/library/minfiData/extdata/SampleSheet.csv"`

`targets[,1:9]`

```
## Sample_Name Sample_Well Sample_Plate Sample_Group Pool_ID person age sex
## 1 GroupA_3 H5 <NA> GroupA <NA> id3 83 M
## 2 GroupA_2 D5 <NA> GroupA <NA> id2 58 F
## 3 GroupB_3 C6 <NA> GroupB <NA> id3 83 M
## 4 GroupB_1 F7 <NA> GroupB <NA> id1 75 F
## 5 GroupA_1 G7 <NA> GroupA <NA> id1 75 F
## 6 GroupB_2 H7 <NA> GroupB <NA> id2 58 F
## status
## 1 normal
## 2 normal
## 3 cancer
## 4 cancer
## 5 normal
## 6 cancer
```

`targets[,10:12]`

```
## Array Slide
## 1 R02C02 5723646052
## 2 R04C01 5723646052
## 3 R05C02 5723646052
## 4 R04C02 5723646053
## 5 R05C02 5723646053
## 6 R06C02 5723646053
## Basename
## 1 /home/biocbuild/bbs-3.14-bioc/R/library/minfiData/extdata/5723646052/5723646052_R02C02
## 2 /home/biocbuild/bbs-3.14-bioc/R/library/minfiData/extdata/5723646052/5723646052_R04C01
## 3 /home/biocbuild/bbs-3.14-bioc/R/library/minfiData/extdata/5723646052/5723646052_R05C02
## 4 /home/biocbuild/bbs-3.14-bioc/R/library/minfiData/extdata/5723646053/5723646053_R04C02
## 5 /home/biocbuild/bbs-3.14-bioc/R/library/minfiData/extdata/5723646053/5723646053_R05C02
## 6 /home/biocbuild/bbs-3.14-bioc/R/library/minfiData/extdata/5723646053/5723646053_R06C02
```

`rgSet <- read.metharray.exp(targets = targets)`

```
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
## Warning in readChar(con, nchars = n): truncating string with embedded nuls
```

The data is now an `RGChannelSet`

object and needs to be normalised and
converted to a `MethylSet`

object.

SWAN (subset-quantile within array normalization) is a within-array normalization method for Illumina 450k & EPIC BeadChips. Technical differencs have been demonstrated to exist between the Infinium I and Infinium II assays on a single Illumina HumanMethylation array (Bibikova et al. 2011, @Dedeurwaerder2011). Using the SWAN method substantially reduces the technical variability between the assay designs whilst maintaining important biological differences. The SWAN method makes the assumption that the number of CpGs within the 50bp probe sequence reflects the underlying biology of the region being interrogated. Hence, the overall distribution of intensities of probes with the same number of CpGs in the probe body should be the same regardless of assay type. The method then uses a subset quantile normalization approach to adjust the intensities of each array (Maksimovic, Gordon, and Oshlack 2012).

`SWAN`

can take a `MethylSet`

, `RGChannelSet`

or `MethyLumiSet`

as input. It
should be noted that, in order to create the normalization subset, `SWAN`

randomly selects Infinium I and II probes that have one, two and three
underlying CpGs; as such, we recommend using `set.seed`

before to ensure that
the normalized intensities will be
identical, if the normalization is repeated.

The technical differences between Infinium I and II assay designs can result in aberrant \(\beta\) value distributions (Figure 1, panel “Raw”). Using SWAN corrects for the technical differences between the Infinium I and II assay designs and produces a smoother overall \(\beta\) value distribution (Figure 1, panel “SWAN”).

`mSet <- preprocessRaw(rgSet)`

`mSetSw <- SWAN(mSet,verbose=TRUE)`

```
## [SWAN] Preparing normalization subset
## 450k
## [SWAN] Normalizing methylated channel
## [SWAN] Normalizing array 1 of 6
## [SWAN] Normalizing array 2 of 6
## [SWAN] Normalizing array 3 of 6
## [SWAN] Normalizing array 4 of 6
## [SWAN] Normalizing array 5 of 6
## [SWAN] Normalizing array 6 of 6
## [SWAN] Normalizing unmethylated channel
## [SWAN] Normalizing array 1 of 6
## [SWAN] Normalizing array 2 of 6
## [SWAN] Normalizing array 3 of 6
## [SWAN] Normalizing array 4 of 6
## [SWAN] Normalizing array 5 of 6
## [SWAN] Normalizing array 6 of 6
```

```
par(mfrow=c(1,2), cex=1.25)
densityByProbeType(mSet[,1], main = "Raw")
densityByProbeType(mSetSw[,1], main = "SWAN")
```

Poor quality probes can be filtered out based on the detection p-value. For this example, to retain a CpG for further analysis, we require that the detection p-value is less than 0.01 in all samples.

```
detP <- detectionP(rgSet)
keep <- rowSums(detP < 0.01) == ncol(rgSet)
mSetSw <- mSetSw[keep,]
```

Now that the data has been `SWAN`

normalised we can extract \(\beta\) and
M-values from the object. We prefer to add an offset to the methylated
and unmethylated intensities when calculating M-values, hence we extract
the methylated and unmethylated channels separately and perform our own
calculation. For all subsequent analyses we use a random selection of
20000 CpGs to reduce computation time.

```
set.seed(10)
mset_reduced <- mSetSw[sample(1:nrow(mSetSw), 20000),]
meth <- getMeth(mset_reduced)
unmeth <- getUnmeth(mset_reduced)
Mval <- log2((meth + 100)/(unmeth + 100))
beta <- getBeta(mset_reduced)
dim(Mval)
```

`## [1] 20000 6`

```
par(mfrow=c(1,1))
plotMDS(Mval, labels=targets$Sample_Name, col=as.integer(factor(targets$status)))
legend("topleft",legend=c("Cancer","Normal"),pch=16,cex=1.2,col=1:2)
```

An MDS plot (Figure 2) is a good sanity check to make sure samples cluster together according to the main factor of interest, in this case, cancer and normal.

To test for differential methylation we use the *limma*
package (Smyth 2005), which employs an empirical Bayes framework based on
Guassian model theory. First we need to set up the design matrix.
There are a number of ways to do this, the most straightforward is directly from
the targets file. There are a number of variables, with the `status`

column
indicating **cancer/normal** samples. From the `person`

column of the targets
file, we see that the **cancer/normal** samples are matched, with 3 individuals
each contributing both a **cancer** and **normal** sample. Since the
*limma* model framework can handle any experimental
design which can be summarised by a design matrix, we can take into account the
paired nature of the data in the analysis. For more complicated experimental
designs, please refer to the *limma* User’s Guide.

```
group <- factor(targets$status,levels=c("normal","cancer"))
id <- factor(targets$person)
design <- model.matrix(~id + group)
design
```

```
## (Intercept) idid2 idid3 groupcancer
## 1 1 0 1 0
## 2 1 1 0 0
## 3 1 0 1 1
## 4 1 0 0 1
## 5 1 0 0 0
## 6 1 1 0 1
## attr(,"assign")
## [1] 0 1 1 2
## attr(,"contrasts")
## attr(,"contrasts")$id
## [1] "contr.treatment"
##
## attr(,"contrasts")$group
## [1] "contr.treatment"
```

Now we can test for differential methylation using the `lmFit`

and `eBayes`

functions from *limma*. As input data we use the matrix of
M-values.

```
fit.reduced <- lmFit(Mval,design)
fit.reduced <- eBayes(fit.reduced, robust=TRUE)
```

The numbers of hyper-methylated (1) and hypo-methylated (-1) can be
displayed using the `decideTests`

function in *limma*
and the top 10 differentially methylated CpGs for *cancer* versus *normal*
extracted using `topTable`

.

`summary(decideTests(fit.reduced))`

```
## (Intercept) idid2 idid3 groupcancer
## Down 6969 0 117 563
## NotSig 3370 20000 19878 18966
## Up 9661 0 5 471
```

```
top<-topTable(fit.reduced,coef=4)
top
```

```
## logFC AveExpr t P.Value adj.P.Val B
## cg16969623 4.719302 -1.3338675 18.23269 5.249555e-06 0.03007819 4.423187
## cg24115221 4.280178 -0.4460866 16.40463 9.139024e-06 0.03007819 4.058844
## cg13692446 4.343436 0.3083504 15.71934 1.142652e-05 0.03007819 3.902666
## cg11334818 4.031210 0.9360842 15.35166 1.293182e-05 0.03007819 3.813871
## cg10341556 4.998918 -1.5750732 15.37904 1.395473e-05 0.03007819 3.745930
## cg17815252 3.692069 -2.3613989 14.70146 1.621089e-05 0.03007819 3.647604
## cg24331301 3.842903 -1.4291982 14.47981 1.754761e-05 0.03007819 3.588070
## cg26976732 3.846213 -0.9302984 14.38523 1.815744e-05 0.03007819 3.562204
## cg26328335 3.713983 0.3584330 14.13234 1.991549e-05 0.03007819 3.491644
## cg05621343 4.270335 -0.5591089 14.09695 2.073199e-05 0.03007819 3.457933
```

Note that since we performed our analysis on M-values, the `logFC`

and
`AveExpr`

columns are computed on the M-value scale. For interpretability
and visualisation we can look at the \(\beta\) values. The \(\beta\) values for
the top 4 differentially methylated CpGs shown in Figure 3.

```
cpgs <- rownames(top)
par(mfrow=c(2,2))
for(i in 1:4){
stripchart(beta[rownames(beta)==cpgs[i],]~design[,4],method="jitter",
group.names=c("Normal","Cancer"),pch=16,cex=1.5,col=c(4,2),ylab="Beta values",
vertical=TRUE,cex.axis=1.5,cex.lab=1.5)
title(cpgs[i],cex.main=1.5)
}
```

Like other platforms, 450k array studies are subject to unwanted technical variation such as batch effects and other, often unknown, sources of variation. The adverse effects of unwanted variation have been extensively documented in gene expression array studies and have been shown to be able to both reduce power to detect true differences and to increase the number of false discoveries. As such, when it is apparent that data is significantly affected by unwanted variation, it is advisable to perform an adjustment to mitigate its effects.

*missMethyl* provides a *limma*
inspired interface to functions from the CRAN package
*ruv*, which enable the removal of unwanted variation
when performing a differential methylation analysis (Maksimovic et al. 2015).

`RUVfit`

uses the *RUV-inverse* method by default, as this does not require the
user to specify a \(k\) parameter. The ridged version of *RUV-inverse* is also
available by setting `method = rinv`

. The *RUV-2* and *RUV-4* functions can also
be used by setting `method = ruv2`

or `method = ruv4`

, respectively, and
specifying an appropriate value for *k* (number of components of unwanted
variation to remove) where \(0 \leq k < no. samples\).

All of the methods rely on negative control features
to accurately estimate the components of unwanted variation. Negative
control features are probes/genes/etc. that are known *a priori* to not
truly be associated with the biological factor of interest, but are
affected by unwanted variation. For example, in a microarray gene
expression study, these could be house-keeping genes or a set of
spike-in controls. Negative control features are extensively discussed
in Gagnon-Bartsch and Speed (2012) and Gagnon-Bartsch et al.
(2013). Once the unwanted factors are accurately estimated from
the data, they are adjusted for in the linear model that describes the
differential analysis.

If the negative control features are not known *a priori*, they can be
identified empirically. This can be achieved via a 2-stage approach,
**RUVm**. Stage 1 involves performing a differential methylation analysis using
*RUV-inverse* (by default) and the 613 Illumina negative controls (INCs) as
negative control features. This will produce a list of CpGs ranked by p-value
according to their level of association with the factor of interest.
This list can then be used to identify a set of empirical control probes (ECPs),
which will capture more of the unwanted variation than using the INCs alone.
ECPs are selected by designating a proportion of the CpGs least associated with
the factor of interest as negative control features; this can be done
based on either an FDR cut-off or by taking a fixed percentage of probes
from the bottom of the ranked list. Stage 2 involves performing a second
differential methylation analysis on the original data using *RUV-inverse*
(by default) and the ECPs. For simplicity, we are ignoring the paired
nature of the **cancer** and **normal** samples in this example.

```
# get M-values for ALL probes
meth <- getMeth(mSet)
unmeth <- getUnmeth(mSet)
M <- log2((meth + 100)/(unmeth + 100))
# setup the factor of interest
grp <- factor(targets$status, labels=c(0,1))
# extract Illumina negative control data
INCs <- getINCs(rgSet)
head(INCs)
```

```
## 5723646052_R02C02 5723646052_R04C01 5723646052_R05C02
## 13792480 -0.3299654 -1.0955482 -0.5266103
## 69649505 -1.0354488 -1.4943396 -1.0067050
## 34772371 -1.1286422 -0.2995603 -0.8192636
## 28715352 -0.5553373 -0.7599489 -0.7186973
## 74737439 -1.1169178 -0.8656399 -0.6429681
## 33730459 -0.7714684 -0.5622424 -0.7724825
## 5723646053_R04C02 5723646053_R05C02 5723646053_R06C02
## 13792480 -0.6374299 -1.116598 -0.4332793
## 69649505 -0.8854881 -1.586679 -0.9217329
## 34772371 -0.6895514 -1.161155 -0.6186795
## 28715352 -1.7903619 -1.348105 -1.0067259
## 74737439 -0.8872082 -1.064986 -0.9841833
## 33730459 -1.5623138 -2.079184 -1.0445246
```

```
# add negative control data to M-values
Mc <- rbind(M,INCs)
# create vector marking negative controls in data matrix
ctl1 <- rownames(Mc) %in% rownames(INCs)
table(ctl1)
```

```
## ctl1
## FALSE TRUE
## 485512 613
```

```
rfit1 <- RUVfit(Y = Mc, X = grp, ctl = ctl1) # Stage 1 analysis
rfit2 <- RUVadj(Y = Mc, fit = rfit1)
```

Now that we have performed an initial differential methylation analysis to rank the CpGs with respect to their association with the factor of interest, we can designate the CpGs that are least associated with the factor of interest based on FDR-adjusted p-value as ECPs.

```
top1 <- topRUV(rfit2, num=Inf, p.BH = 1)
head(top1)
```

```
## F.p F.p.BH p_X1.1 p.BH_X1.1 b_X1.1 sigma2
## cg04743961 3.516091e-07 0.01017357 3.516091e-07 0.01017357 -4.838190 0.10749571
## cg07155336 3.583107e-07 0.01017357 3.583107e-07 0.01017357 -5.887409 0.16002329
## cg20925841 3.730375e-07 0.01017357 3.730375e-07 0.01017357 -4.790211 0.10714494
## cg03607359 4.721205e-07 0.01017357 4.721205e-07 0.01017357 -4.394397 0.09636129
## cg10566121 5.238865e-07 0.01017357 5.238865e-07 0.01017357 -4.787914 0.11780108
## cg07655636 6.080091e-07 0.01017357 6.080091e-07 0.01017357 -4.571758 0.11201883
## var.b_X1.1 fit.ctl mean
## cg04743961 0.07729156 FALSE -2.31731496
## cg07155336 0.11505994 FALSE -1.27676413
## cg20925841 0.07703935 FALSE -0.87168892
## cg03607359 0.06928569 FALSE -2.19187147
## cg10566121 0.08470133 FALSE 0.03138961
## cg07655636 0.08054378 FALSE -1.29294851
```

```
ctl2 <- rownames(M) %in% rownames(top1[top1$p.BH_X1.1 > 0.5,])
table(ctl2)
```

```
## ctl2
## FALSE TRUE
## 172540 312972
```

We can then use the ECPs to perform a second differential methylation
with *RUV-inverse*, which is adjusted for the unwanted variation
estimated from the data.

```
# Perform RUV adjustment and fit
rfit3 <- RUVfit(Y = M, X = grp, ctl = ctl2) # Stage 2 analysis
rfit4 <- RUVadj(Y = M, fit = rfit3)
# Look at table of top results
topRUV(rfit4)
```

```
## F.p F.p.BH p_X1.1 p.BH_X1.1 b_X1.1 sigma2 var.b_X1.1 fit.ctl
## cg07155336 1e-24 1e-24 1e-24 1e-24 -5.769286 0.1668414 0.1349771 FALSE
## cg06463958 1e-24 1e-24 1e-24 1e-24 -5.733093 0.1668414 0.1349771 FALSE
## cg00024472 1e-24 1e-24 1e-24 1e-24 -5.662959 0.1668414 0.1349771 FALSE
## cg02040433 1e-24 1e-24 1e-24 1e-24 -5.651399 0.1668414 0.1349771 FALSE
## cg13355248 1e-24 1e-24 1e-24 1e-24 -5.595396 0.1668414 0.1349771 FALSE
## cg02467990 1e-24 1e-24 1e-24 1e-24 -5.592707 0.1668414 0.1349771 FALSE
## cg00817367 1e-24 1e-24 1e-24 1e-24 -5.527501 0.1668414 0.1349771 FALSE
## cg11396157 1e-24 1e-24 1e-24 1e-24 -5.487992 0.1668414 0.1349771 FALSE
## cg16306898 1e-24 1e-24 1e-24 1e-24 -5.466780 0.1668414 0.1349771 FALSE
## cg03735888 1e-24 1e-24 1e-24 1e-24 -5.396242 0.1668414 0.1349771 FALSE
## mean
## cg07155336 -1.2767641
## cg06463958 0.2776252
## cg00024472 -2.4445762
## cg02040433 -1.2918259
## cg13355248 -0.8483387
## cg02467990 -0.4154370
## cg00817367 -0.2911294
## cg11396157 -1.3800170
## cg16306898 -2.1469768
## cg03735888 -1.1527557
```

If the number of samples in your experiment is *greater* than the number of
Illumina negative controls on the array platform used - 613 for 450k, 411 for
EPIC - stage 1 of **RUVm** will not work. In such cases, we recommend performing
a standard *limma* analysis in stage 1.

```
# setup design matrix
des <- model.matrix(~grp)
des
```

```
## (Intercept) grp1
## 1 1 1
## 2 1 1
## 3 1 0
## 4 1 0
## 5 1 1
## 6 1 0
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$grp
## [1] "contr.treatment"
```

```
# limma differential methylation analysis
lfit1 <- lmFit(M, design=des)
lfit2 <- eBayes(lfit1) # Stage 1 analysis
# Look at table of top results
topTable(lfit2)
```

`## Removing intercept from test coefficients`

```
## logFC AveExpr t P.Value adj.P.Val B
## cg07155336 -6.037439 -1.276764 -19.22210 1.175108e-07 0.005755968 7.635736
## cg04743961 -4.887986 -2.317315 -19.21709 1.177367e-07 0.005755968 7.634494
## cg03607359 -4.393946 -2.191871 -18.07007 1.852304e-07 0.005755968 7.334032
## cg13272280 -4.559707 -2.099665 -17.25531 2.599766e-07 0.005755968 7.099628
## cg22263007 -4.438420 -1.010994 -17.12384 2.749857e-07 0.005755968 7.060036
## cg03556069 -5.456754 -1.811718 -17.00720 2.891269e-07 0.005755968 7.024476
## cg08443814 -4.597347 -2.062275 -16.80835 3.151706e-07 0.005755968 6.962907
## cg18672939 -5.159383 -0.705992 -16.65643 3.368597e-07 0.005755968 6.915046
## cg24385334 -4.157473 -1.943370 -16.59313 3.463909e-07 0.005755968 6.894890
## cg18044663 -4.426118 -1.197724 -16.57851 3.486357e-07 0.005755968 6.890216
```

The results of this can then be used to define ECPs for stage 2, as in the previous example.

`topl1 <- topTable(lfit2, num=Inf)`

`## Removing intercept from test coefficients`

`head(topl1)`

```
## logFC AveExpr t P.Value adj.P.Val B
## cg07155336 -6.037439 -1.276764 -19.22210 1.175108e-07 0.005755968 7.635736
## cg04743961 -4.887986 -2.317315 -19.21709 1.177367e-07 0.005755968 7.634494
## cg03607359 -4.393946 -2.191871 -18.07007 1.852304e-07 0.005755968 7.334032
## cg13272280 -4.559707 -2.099665 -17.25531 2.599766e-07 0.005755968 7.099628
## cg22263007 -4.438420 -1.010994 -17.12384 2.749857e-07 0.005755968 7.060036
## cg03556069 -5.456754 -1.811718 -17.00720 2.891269e-07 0.005755968 7.024476
```

```
ctl3 <- rownames(M) %in% rownames(topl1[topl1$adj.P.Val > 0.5,])
table(ctl3)
```

```
## ctl3
## FALSE TRUE
## 199150 286362
```

We can then use the ECPs to perform a second differential methylation
with `RUV-inverse`

as before.

```
# Perform RUV adjustment and fit
rfit5 <- RUVfit(Y = M, X = grp, ctl = ctl3) # Stage 2 analysis
rfit6 <- RUVadj(Y = M, fit = rfit5)
# Look at table of top results
topRUV(rfit6)
```

```
## F.p F.p.BH p_X1.1 p.BH_X1.1 b_X1.1 sigma2 var.b_X1.1 fit.ctl
## cg06463958 1e-24 1e-24 1e-24 1e-24 -5.910598 0.1667397 0.1170589 FALSE
## cg07155336 1e-24 1e-24 1e-24 1e-24 -5.909549 0.1667397 0.1170589 FALSE
## cg02467990 1e-24 1e-24 1e-24 1e-24 -5.841079 0.1667397 0.1170589 FALSE
## cg00024472 1e-24 1e-24 1e-24 1e-24 -5.823529 0.1667397 0.1170589 FALSE
## cg01893212 1e-24 1e-24 1e-24 1e-24 -5.699627 0.1667397 0.1170589 FALSE
## cg11396157 1e-24 1e-24 1e-24 1e-24 -5.699331 0.1667397 0.1170589 FALSE
## cg13355248 1e-24 1e-24 1e-24 1e-24 -5.658606 0.1667397 0.1170589 FALSE
## cg00817367 1e-24 1e-24 1e-24 1e-24 -5.649284 0.1667397 0.1170589 FALSE
## cg16306898 1e-24 1e-24 1e-24 1e-24 -5.610118 0.1667397 0.1170589 FALSE
## cg16556906 1e-24 1e-24 1e-24 1e-24 -5.567659 0.1667397 0.1170589 FALSE
## mean
## cg06463958 0.27762518
## cg07155336 -1.27676413
## cg02467990 -0.41543703
## cg00024472 -2.44457624
## cg01893212 -0.08273355
## cg11396157 -1.38001701
## cg13355248 -0.84833866
## cg00817367 -0.29112939
## cg16306898 -2.14697683
## cg16556906 -0.96821744
```

To visualise the effect that the **RUVm** adjustment is having on the data,
using an MDS plot for example, the `getAdj`

function can be used to extract
the adjusted values from the **RUVm** fit object produced by `RUVfit`

.
NOTE: The adjusted values should only be used for visualisations - it is NOT
recommended that they are used in any downstream analysis.

`Madj <- getAdj(M, rfit5) # get adjusted values`

The MDS plots below show how the relationship between the samples changes with and
without **RUVm** adjustment. **RUVm** reduces the distance between the samples in
each group by removing unwanted variation. It can be useful to examine this
type of plot when trying to decide on the best set of ECPs or to help select the
optimal value of \(k\), if using *RUV-4* or *RUV-2*.

```
par(mfrow=c(1,2))
plotMDS(M, labels=targets$Sample_Name, col=as.integer(factor(targets$status)),
main="Unadjusted", gene.selection = "common")
legend("right",legend=c("Cancer","Normal"),pch=16,cex=1,col=1:2)
plotMDS(Madj, labels=targets$Sample_Name, col=as.integer(factor(targets$status)),
main="Adjusted: RUV-inverse", gene.selection = "common")
```

```
## Warning in plotMDS.default(Madj, labels = targets$Sample_Name, col =
## as.integer(factor(targets$status)), : dimension 2 is degenerate or all zero
```

`legend("topright",legend=c("Cancer","Normal"),pch=16,cex=1,col=1:2)`