Package 'MUVR2'

Title: Multivariate Methods with Unbiased Variable Selection
Description: Predictive multivariate modelling for metabolomics. Types: Classification and regression. Methods: Partial Least Squares, Random Forest ans Elastic Net Data structures: Paired and unpaired Validation: repeated double cross-validation (Westerhuis et al. (2008)<doi:10.1007/s11306-007-0099-6>, Filzmoser et al. (2009)<doi:10.1002/cem.1225>) Variable selection: Performed internally, through tuning in the inner cross-validation loop.
Authors: Carl Brunius [aut], Yingxiao Yan [aut, cre]
Maintainer: Yingxiao Yan <[email protected]>
License: GPL-3
Version: 0.1.0
Built: 2024-11-17 05:41:44 UTC
Source: https://github.com/metabocomp/muvr2

Help Index


PLS biplot

Description

Makes a biplot of a fitted object (e.g. from a MUVR with PLS core).

Usage

biplotPLS(
  fit,
  comps = 1:2,
  xCol,
  labPlSc = TRUE,
  labs,
  vars,
  labPlLo = TRUE,
  pchSc = 16,
  colSc,
  colLo = 2,
  supLeg = FALSE
)

Arguments

fit

A PLS fit (e.g. from MUVRclassObject$Fit[[2]])

comps

Which components to plot

xCol

(Optional) Continuous vector for grey scale gradient of observation (sample) color (e.g. Y vector in regression analysis)

labPlSc

Boolean to plot observation (sample) names (defaults to TRUE)

labs

(Optional) Label names

vars

Which variables to plot (names in rownames(loadings))

labPlLo

Boolean to plot variable names (defaults to TRUE)

pchSc

Plotting character for observation scores

colSc

Colors for observation scores (only if xCol omitted)

colLo

Colors for variable loadings (defaults to red)

supLeg

Boolean for whether to suppress legends

Value

A PLS biplot

Examples

data("freelive2")
nRep <- 2 # Number of MUVR2 repetitions
nOuter <- 3 # Number of outer cross-validation segments
varRatio <- 0.75 # Proportion of variables kept per iteration
method <- 'PLS' # Selected core modeling algorithm
regrModel <- MUVR2(X = XRVIP2,
                  Y = YR2,
                  nRep = nRep,
                  nOuter = nOuter,
                  method = method,
                  modReturn = TRUE)
biplotPLS(regrModel$Fit[[2]],
          comps = 1:2,
          xCol = YR2,
          labPlSc = FALSE,
          labPlLo = FALSE)

Check input

Description

This can be run to test if the command input of parameters contradict each other and check the structure of the data. If something goes wrong, warning messages are given.

Usage

checkinput(
  X,
  Y,
  ML,
  DA,
  method,
  fitness,
  nInner,
  nOuter,
  varRatio,
  scale,
  modReturn,
  logg,
  parallel
)

Arguments

X

The original data of X, not the result after onehotencoding

Y

The original data of Y

ML

ML in MUVR2

DA

DA in MUVR2

method

RF or PLS so far in MUVR2

fitness

fitness in MUVR2

nInner

nInnerin MUVR2

nOuter

nOuter in MUVR2

varRatio

varRatio in MUVR2

scale

scale

modReturn

modReturn in MUVR2

logg

logg in MUVR2

parallel

parallel in MUVR2

Value

correct_input: the original input(call) and the real input used in MUVR2 when you enter your input

Examples

data("freelive2")
checkinput(X = XRVIP2,
           Y = YR2,  ## YR2 a numeric variable
           DA = FALSE,
           fitness="RMSEP")

Confusion matrix

Description

Make a confusion matrix from a MUVR object.

Usage

confusionMatrix(MVObj, model = "mid")

Arguments

MVObj

A MUVR object (classification analysis)

model

min, mid or max model

Value

A confusion matrix of actual vs predicted class

Examples

data("mosquito")
data("crisp")
nRep <- 2 # Number of MUVR2 repetitions
nOuter <- 4 # Number of outer cross-validation segments
varRatio <- 0.6 # Proportion of variables kept per iteration
classModel <- MUVR2_EN(X = Xotu,
                      Y = Yotu,
                      nRep = nRep,
                      nOuter = nOuter,
                      DA = TRUE,
                      modReturn = TRUE)
confusionMatrix(classModel)
MLModel <- MUVR2(X = crispEM,
                 ML = TRUE,
                 nRep = nRep,
                 nOuter = nOuter,
                 varRatio = varRatio,
                 method = "RF",
                 modReturn = TRUE)
 confusionMatrix(MLModel)

Effect matrix for the crisp multilevel tutorial

Description

Effect matrix for the crisp multilevel tutorial

Usage

data(crisp)

Make custom parameters for internal modelling

Description

Make custom parameters for MUVR internal modelling, not rdCV. Please note that, at present, there is no mtryMax for the outer (consensus) loop in effect.

Usage

customParams(
  method = c("RF", "PLS", "SVM", "ANN"),
  robust = 0.05,
  ntreeIn = 150,
  ntreeOut = 300,
  mtryMaxIn = 150,
  compMax = 5,
  nodes = 200,
  threshold = 0.1,
  stepmax = 1e+08,
  neuralMaxIn = 10,
  kernel = "notkernel",
  nu = 0.1,
  gamma = 1,
  degree = 1,
  oneHot,
  NZV,
  rfMethod = c("randomForest", "ranger"),
  svmMethod = c("svm", "ksvm", "svmlight"),
  annMethod = c("nnet", "neuralnet")
)

Arguments

method

PLS or RF (default)

robust

Robustness (slack) criterion for determining min and max knees (defaults to 0.05)

ntreeIn

RF parameter: Number of trees in inner cross-validation loop models (defaults to 150)

ntreeOut

RF parameter: Number of trees in outer (consensus) cross-validation loop models (defaults to 300)

mtryMaxIn

RF parameter: Max number of variables to sample from at each node in the inner CV loop (defaults to 150). Will be further limited by standard RF rules (see randomForest documentation)

compMax

PLS parameter: Maximum number of PLS components (defaults to 5)

nodes

ann parameter:

threshold

ann parameter:

stepmax

ann parameter:

neuralMaxIn

ann parameter: Maximum number of ANN (defaults to 20)

kernel

svm parameter: kernal function to use, which includes sigmoid, radical, polynomial

nu

svm parameter: ratios of errors allowed in the training set range from 0-1

gamma

svm parameters: needed for "vanilladot","polydot","rbfdot" kernel in svm

degree

svm parameter: needed for polynomial kernel in svm

oneHot

TRUE or FALSE using onehot endcoding or not

NZV

TRUE or FALSE using non-zero variance or not

rfMethod

randomforest method, which includes randomForest and ranger

svmMethod

support vector machine method, which includes svm, ksvm, s

annMethod

artificial neural network method which includes 2 different ann methods

Value

a 'methParam' object

Examples

# Standard parameters for random forest
methParam <- customParams() # or
methParam <- customParams('RF')
# Custom ntreeOut parameters for random forest
methParam <- customParams('RF',ntreeOut=50) # or
methParam <- customParams('RF')
methParam$ntreeOut <- 50
methParam

Get RMSEP

Description

Get Root Mean Square Error of Prediction (RMSEP) in classification.

Usage

get_rmsep(actual, predicted)

Arguments

actual

Vector of actual classifications of samples

predicted

Vector of predicted classifications of samples

Value

RMSEP

Examples

data("mosquito")
actual <- YR2
predicted <- sampling_from_distribution(actual)
get_rmsep(actual, predicted)

Get BER

Description

Get Balanced Error Rate (BER) in classification.

Usage

getBER(actual, predicted, weigh_added = FALSE, weighing_matrix)

Arguments

actual

Vector of actual classifications of samples

predicted

Vector of predicted classifications of samples

weigh_added

To add a weighing matrix when it is classification

weighing_matrix

The matrix used to get a misclassification score

Value

BER

Examples

data("mosquito")
actual <- Yotu
predicted <- sampling_from_distribution(actual)
getBER(actual, predicted)

Get number of misclassifications

Description

Get number of misclassifications from classification analysis.

Usage

getMISS(actual, predicted, weigh_added = FALSE, weighing_matrix)

Arguments

actual

Vector of actual classifications of samples

predicted

Vector of predicted classifications of samples

weigh_added

Boolean, add a weighing matrix when it is classification

weighing_matrix

The matrix used to get a misclassification score

Value

number of misclassifications

Examples

data("mosquito")
actual <- Yotu
predicted <- sampling_from_distribution(actual)
getMISS(actual, predicted)

Get min, mid or max model from Elastic Net modelling

Description

Obtain the min, mid, or max number of variables for an object generated from the rdCVnet() function.

Usage

getVar(
  rdCVnetObject,
  option = c("quantile", "fitness"),
  fit_curve = c("loess", "gam"),
  span = 1,
  k = 5,
  outlier = c("none", "IQR", "residual"),
  robust = 0.05,
  quantile = 0.25
)

Arguments

rdCVnetObject

an object obtained from the rdCVnet() function

option

quantile or fitness: which way to perform variable selection

fit_curve

gam or loess method for fitting the curve in the fitness option

span

parameter for using loess to fit curve in the fitness option: how smooth the curve needs to be

k

parameter for using gam to fit curve in the fitness option

outlier

if remove outlier variables or not. There are 3 options: "none","IRQ", "residual"

robust

if the option is fitness, robust parameter decides how much deviation it is allowed from the optimal perdiction performance for min and max variabel selection

quantile

if the option is quantile, this value decides the cut for the first quantile, ranging from 0 to 0.5

Value

a rdCVnet object

Examples

data("mosquito")
nRep <- 2
nOuter <- 4
varRatio <-0.6
classModel <- MUVR2_EN(X = Xotu,
                       Y = Yotu,
                       nRep = nRep,
                       nOuter = nOuter,
                       DA = TRUE,
                       modReturn = TRUE)
classModel<-getVar(classModel)

Get variable importance

Description

Extract autoselected variables from MUVR model object.

Usage

getVIRank(MUVRclassObject, model = "mid", n, all = FALSE)

Arguments

MUVRclassObject

an object of MUVR class

model

which model to use ("min", "mid" (default), or "max")

n

customize values

all

logical, to get the ranks of all variable or not

Value

data frame with order, name and average rank of variables ('order', 'name' & 'rank')

Examples

data("freelive2")
nRep <- 2
nOuter <- 4
varRatio <-0.6
regrModel <- MUVR2(X = XRVIP2,
                   Y = YR2,
                   nRep = nRep,
                   nOuter = nOuter,
                   varRatio = varRatio,
                   method = "PLS",
                   modReturn = TRUE)
getVIRank(regrModel, model="min")

Get reference distribution for resampling tests

Description

Make reference distribution for resampling tests to assess overfitting.

Usage

H0_reference(Y, n = 1000, fitness = c("Q2", "BER", "MISS", "AUROC"), ...)

Arguments

Y

the target variable

n

number of permutations to run

fitness

number of repetitions for each permutation (defaults to value of actual model)

...

additional arguments for sampling from distribution

Value

a histogram of reference distribution

Examples

data("freelive2")
H0_reference(YR2)

Perform permutation or resampling tests

Description

This function will extract data and parameter settings from a MUVR object and run standard permutation or resampling test. This will fit a standard case of multivariate predictive modelling in either a regression, classification or multilevel case. However, if an analysis has a complex sample dependency which requires constrained permutation of your response vector or if a variable pre-selection is performed for decreased computational burden, then permutaion/resampling loops should be constructed manually. In those cases, View(H0_test) can be a first start from which to build custom solutions for permutation analysis.

Usage

H0_test(
  MUVRclassObject,
  n = 50,
  nRep,
  nOuter,
  varRatio,
  parallel,
  type = c("resampling", "permutation")
)

Arguments

MUVRclassObject

a 'MUVR' class object

n

number of permutations to run

nRep

number of repetitions for each permutation (defaults to value of actual model)

nOuter

number of outer validation segments for each permutation (defaults to value of actual model)

varRatio

varRatio for each permutation (defaults to value of actual model)

parallel

whether to run calculations using parallel processing which requires registered backend (defaults to parallelization for the actual model)

type

either permutation or resampling, to decide whether the permutation sampling is performed on original Y values or the probability(If Y categorical)/distributions(If Y continuous) of Y values

Value

permutation_output: A permutation matrix with permuted fitness statistics (nrow=n and ncol=3 for min/mid/max)

Examples

data("freelive2")
nRep <- 2
nOuter <- 4
varRatio <-0.6
regrModel <- MUVR2(X = XRVIP2,
                   Y = YR2,
                   nRep = nRep,
                  nOuter = nOuter,
                   varRatio = varRatio,
                   method = "PLS",
                   modReturn = TRUE)
H0_test(regrModel)

Subject identifiers for the rye metabolomics regression tutorial

Description

Subject identifiers for the rye metabolomics regression tutorial

Usage

data(freelive)

Subject identifiers for the rye metabolomics regression tutorial, using unique individuals

Description

Subject identifiers for the rye metabolomics regression tutorial, using unique individuals

Usage

data(freelive2)

Merge two MUVR class objects

Description

Merge two MUVR class objects that use regression for PLS or RF methods. The resultant MUVR class object has the same indata except that nRep is different.

Usage

mergeModels(MV1, MV2)

Arguments

MV1

a MUVR class Object

MV2

a MUVR class Object

Value

A merged MURV class object

Examples

data("freelive2")
nRep <- 2
nOuter <- 4
varRatio <-0.6
regrModel <- MUVR2(X = XRVIP2,
                   Y = YR2,
                   nRep = nRep,
                  nOuter = nOuter,
                   varRatio = varRatio,
                   method = "PLS",
                   modReturn = TRUE)
mergedModel<-mergeModels(regrModel,regrModel)

MUVR2 with PLS and RF

Description

"Multivariate modelling with Unbiased Variable selection" using PLS and RF. Repeated double cross validation with tuning of variables in the inner loop.

Usage

MUVR2(
  X,
  Y,
  ID,
  scale = TRUE,
  nRep = 5,
  nOuter = 6,
  nInner,
  varRatio = 0.75,
  DA = FALSE,
  fitness = c("AUROC", "MISS", "BER", "RMSEP", "wBER", "wMISS"),
  method = c("PLS", " RF", "ANN", "SVM"),
  methParam,
  ML = FALSE,
  modReturn = FALSE,
  logg = FALSE,
  parallel = TRUE,
  weigh_added = FALSE,
  weighing_matrix = NULL,
  keep,
  ...
)

Arguments

X

Predictor variables. NB: Variables (columns) must have names/unique identifiers. NAs not allowed in data. For multilevel, only the positive half of the difference matrix is specified.

Y

Response vector (Dependent variable). For classification, a factor (or character) variable should be used. For multilevel, Y is calculated automatically.

ID

Subject identifier (for sampling by subject; Assumption of independence if not specified)

scale

If TRUE, the predictor variable matrix is scaled to unit variance for PLS modeling.

nRep

Number of repetitions of double CV. (Defaults to 5)

nOuter

Number of outer CV loop segments. (Defaults to 6)

nInner

Number of inner CV loop segments. (Defaults to nOuter - 1)

varRatio

Ratio of variables to include in subsequent inner loop iteration. (Defaults to 0.75)

DA

Boolean for Classification (discriminant analysis) (By default, if Y is numeric -> DA = FALSE. If Y is factor (or character) -> DA = TRUE)

fitness

Fitness function for model tuning (choose either 'AUROC' or 'MISS' (default) for classification; or 'RMSEP' (default) for regression.)

method

Multivariate method. Supports 'PLS' and 'RF' (default)

methParam

List with parameter settings for specified MV method (see function code for details)

ML

Boolean for multilevel analysis (defaults to FALSE)

modReturn

Boolean for returning outer segment models (defaults to FALSE). Setting modReturn = TRUE is required for making MUVR predictions using predMV().

logg

Boolean for whether to sink model progressions to 'log.txt'

parallel

Boolean for whether to perform 'foreach' parallel processing (Requires a registered parallel backend; Defaults to 'TRUE')

weigh_added

To add a weighing matrix when it is classfication

weighing_matrix

The matrix used for get a miss classfication score

keep

Confounder variables can be added. NB: Variables (columns) must match column names.

...

additional argument

Value

A 'MUVR' object

Examples

data(freelive2)
nRep <- 2 # Number of MUVR2 repetitions
nOuter <- 3 # Number of outer cross-validation segments
varRatio <- 0.6 # Proportion of variables kept per iteration
method <- 'PLS' # Selected core modeling algorithm
regrModel <- MUVR2(X = XRVIP2,
                   Y = YR2,
                   nRep = nRep,
                   nOuter = nOuter,
                   varRatio = varRatio,
                   method = method,
                   modReturn = TRUE)

MUVR2 with EN

Description

"Multivariate modelling with Unbiased Variable selection" using Elastic Net (EN). Repeated double cross validation with tuning of variables using Elastic Net.

Usage

MUVR2_EN(
  X,
  Y,
  ID,
  alow = 1e-05,
  ahigh = 1,
  astep = 11,
  alog = TRUE,
  nRep = 5,
  nOuter = 6,
  nInner,
  NZV = TRUE,
  DA = FALSE,
  fitness = c("AUROC", "MISS", "BER", "RMSEP", "wBER", "wMISS"),
  methParam,
  ML = FALSE,
  modReturn = FALSE,
  parallel = TRUE,
  keep = NULL,
  weigh_added = FALSE,
  weighing_matrix = NULL,
  ...
)

Arguments

X

Predictor variables. NB: Variables (columns) must have names/unique identifiers. NAs not allowed in data. For multilevel, only the positive half of the difference matrix is specified.

Y

Response vector (Dependent variable). For classification, a factor (or character) variable should be used. For multilevel, Y is calculated automatically.

ID

Subject identifier (for sampling by subject; Assumption of independence if not specified)

alow

alpha tuning: lowest value of alpha

ahigh

alpha tuning: highest value of alpha

astep

alpha tuning: number of alphas to try from low to high

alog

alpha tuning: Whether to space tuning of alpha in logarithmic scale (TRUE; default) or normal/arithmetic scale (FALSE)

nRep

Number of repetitions of double CV. (Defaults to 5)

nOuter

Number of outer CV loop segments. (Defaults to 6)

nInner

Number of inner CV loop segments. (Defaults to nOuter-1)

NZV

Boolean for whether to filter out near zero variance variables (defaults to TRUE)

DA

Boolean for Classification (discriminant analysis) (By default, if Y is numeric -> DA=FALSE. If Y is factor (or character) -> DA=TRUE)

fitness

Fitness function for model tuning (choose either 'AUROC' or 'MISS' (default) for classification; or 'RMSEP' (default) for regression.)

methParam

List with parameter settings for specified MV method (see function code for details)

ML

Boolean for multilevel analysis (defaults to FALSE)

modReturn

Boolean for returning outer segment models (defaults to FALSE). Setting modReturn=TRUE is required for making MUVR predictions using predMV().

parallel

Boolean for whether to perform 'foreach' parallel processing (Requires a registered parallel backend; Defaults to 'TRUE')

keep

A group of confounders that you want to manually set as non-zero

weigh_added

weigh_added

weighing_matrix

weighing_matrix

...

Pass additional arguments

Value

A MUVR object

Examples

data("freelive2")
nRep <- 2 # Number of MUVR2 repetitions
nOuter <- 4 # Number of outer cross-validation segments
regrModel <- MUVR2_EN(X = XRVIP2,
                      Y = YR2,
                      nRep = nRep,
                      nOuter = nOuter,
                      modReturn = TRUE)

Identify variables with near zero variance

Description

Adapted and stripped down from mixOmics v 5.2.0 (https://cran.r-project.org/web/packages/mixOmics/).

Usage

nearZeroVar(x, freqCut = 95/5, uniqueCut = 10)

Arguments

x

a numeric vector or matrix, or a data frame with all numeric data.

freqCut

the cutoff for the ratio of the most common value to the second most common value.

uniqueCut

the cutoff for the percentage of distinct values out of the number of total samples.

Value

nzv object

Examples

data("freelive2")
nearZeroVar(XRVIP2)
data("mosquito")
nearZeroVar(Xotu)

One hot encoding

Description

Each factor and character variable with n categories(>2) will be transformed to n variables. Each factor and character variable with 2 categories will be transformed to one 01 numeric dummy variable. Each factor and character variable with 1 categories will be transformed to one numeric variable that only has value 1. Each factor and character variable with 0 categories will be transformed to one numeric variable that only has value -999. Each logical variable will be transformed to one 01 numeric dummy variable.

Usage

onehotencoding(X)

Arguments

X

data frame data with numeric, factor, character and/or logical variables

Value

matrix with all variables transformed to numeric variables

Examples

#To test the scenario when X has factor and character when using PLS
#add one factor and one character variable(freelive data X,
# which originally has 112 numeric samples and 1147 observations)
# factor variable has 3,6,5factors(nearzero variance), character variable has 7,4 categories
factor_variable1<-as.factor(c(rep("33",105),rep("44",3),rep("55",4)))
factor_variable2<-as.factor(c(rep("AB",20),rep("CD",10),rep("EF",30),
                          rep("GH",15),rep("IJ",25),rep("KL",12)))
factor_variable3<-as.factor(c(rep("Tessa",25),rep("Olle",30),rep("Yan",12),
                           rep("Calle",25),rep("Elisa",20)))
factor_variable4<-as.factor(c(rep(NA,112)))
character_variable1<-c(rep("one",16),rep("two",16),rep("three",16),
                      rep("four",16),rep("five",16),rep("six",16),rep("seven",16))
character_variable2<-c(rep("yes",28),rep("no",28),
                         rep("yes",28),rep("no",28))
character_variable3<-c(rep("Hahahah",112))
character_variable4<-as.character(c(rep(NA,112)))
logical_variable1<-c(rep(TRUE,16),rep(FALSE,16),rep(TRUE,16),
rep(FALSE,16),rep(TRUE,16),rep(FALSE,32))
logical_variable2<-c(rep(TRUE,28),rep(FALSE,28),rep(TRUE,28),rep(FALSE,28))

 X<-data.frame(row.names<-1:112)
 X<-cbind(X,XRVIP,
       factor_variable1,factor_variable2,factor_variable3,factor_variable4,
       character_variable1,character_variable2,character_variable3,character_variable4,
        logical_variable1,logical_variable2)
  onehotencoding(X)

Plot permutation analysis

Description

Plot permutation analysis using actual model and permutation result. This is basically a wrapper for the MUVR2::plotPerm() function using model objects to make coding nicer and cleaner.

Usage

permutationPlot(
  MUVRclassObject,
  permutation_result,
  model = "Mid",
  type = "t",
  side = c("greater", "smaller"),
  pos,
  xlab = NULL,
  xlim,
  ylim = NULL,
  breaks = "Sturges",
  main = NULL
)

Arguments

MUVRclassObject

A 'MUVR' class object

permutation_result

A permutation result. It is a list of 1 items: permutation_output

model

'Min', 'Mid', or 'Max'

type

't' (default; for Student's t) or 'non' for "non-parametric" (i.e. rank) studen'ts

side

'smaller' for actual lower than H0 or 'greater' for actual larger than H0 (automatically selected if not specified)

pos

which side of actual to put p-value on

xlab

optional xlabel

xlim

optional x-range

ylim

otional y-range

breaks

optional custom histogram breaks (defaults to 'sturges')

main

optional plot title (or TRUE for autoname)

Value

A permutation plot

Examples

data("freelive2")
nRep <- 2
nOuter <- 4
varRatio <-0.6
regrModel <- MUVR2(X = XRVIP2,
                   Y = YR2,
                   nRep = nRep,
                  nOuter = nOuter,
                   varRatio = varRatio,
                   method = "PLS",
                   modReturn = TRUE)
permutation_result<-H0_test(regrModel,n=10)
permutationPlot(regrModel,permutation_result)

Plot predictions

Description

Plot predicted and actual target variables, with different plots depending on modelling approach.

Usage

plotMV(MUVRclassObject, model = "min", factCols, sampLabels, ylim = NULL)

Arguments

MUVRclassObject

An MUVR class object

model

What type of model to plot ('min', 'mid' or 'max'). Defaults to 'mid'.

factCols

An optional vector with colors for the factor levels (in the same order as the levels)

sampLabels

Sample labels (optional; implemented for classification)

ylim

Optional for imposing y-limits for regression and classification analysis

Value

A plot of results from multivariate predictions

Examples

data("freelive2")
nRep <- 2
nOuter <- 4
varRatio <-0.6
regrModel <- MUVR2(X = XRVIP2,
                   Y = YR2,
                   nRep = nRep,
                  nOuter = nOuter,
                   varRatio = varRatio,
                   method = "PLS",
                   modReturn = TRUE)
plotMV(regrModel, model="min")

PCA score plot

Description

Customised PCA score plots with the possibility to choose PCs, exporting to png and the possibility to add color or different plotting symbols according to variable.

Usage

plotPCA(pca, PC1 = 1, PC2 = 2, file, colVar, symbVar, main = "")

Arguments

pca

A 'prcomp' object

PC1

Principal component on x-axis

PC2

Principal component on y-axis

file

If specified provides the name of a png export file. Otherwise normal plot.

colVar

Continuous variable for coloring observations (40 cuts)

symbVar

Categorical/discrete variable for multiple plot symbols

main

If provided provides a main title of the plot

Value

A PCA score plot. Exported as png if 'file' specified in function call.

Examples

data("freelive2")
pca_object<-prcomp(XRVIP2)
plotPCA(pca_object)

Plot for comparison of actual model fitness vs permutation/resampling

Description

Plots histogram of null hypothesis (permutation/resampling) distribution, actual model fitness and cumulative p-value. Plot defaults to "greater than" or "smaller than" tests and cumulative probability in Student's t-distribution.

Usage

plotPerm(
  actual,
  distribution,
  xlab = NULL,
  side = c("greater", "smaller"),
  type = "t",
  ylab = NULL,
  xlim,
  ylim = NULL,
  breaks = "Sturges",
  pos,
  main = NULL,
  permutation_visual = "none",
  curve = TRUE,
  extend = 0.1,
  multiple_p_shown = NULL,
  show_actual_value = TRUE,
  show_p = TRUE,
  round_number = 4
)

Arguments

actual

Actual model fitness (e.g. Q2, AUROC or number of misclassifications)

distribution

Null hypothesis (permutation) distribution of similar metric as 'actual'

xlab

Label for x-axis (e.g. 'Q2 using real value',"Q2 using distributions","BER" 'AUROC', or 'Misclassifications')

side

Cumulative p either "greater" or "smaller" than H0 distribution (defaults to side of median(H0))

type

c('t','non',"smooth","rank","ecdf")

ylab

label for y-axis

xlim

Choice of user-specified x-limits (if default is not adequate)

ylim

Choice of user-specified y-limits (if default is not adequate)

breaks

Choice of user-specified histogram breaks (if default is not adequate)

pos

Choice of position of p-value label (if default is not adequate)

main

Choice of user-specified plot title

permutation_visual

choice of showing median or mean or none

curve

if add curve or not base on the mid

extend

how many percenrtage of the orignical range do we start

multiple_p_shown

show many p values

show_actual_value

show the actual value on the vertical line or not

show_p

if p value is added to the figure

round_number

How many digits does it keep

Value

Plot

Examples

data("freelive2")
actual <- sample(YR2, 1)
distribution <- YR2
plotPerm (actual, distribution)

Plot predictions for PLS regression

Description

At present, this function only supports predictions for PLS regression type problems.

Usage

plotPred(Ytrue, Ypreds)

Arguments

Ytrue

True value of Y, should be a vector

Ypreds

Predicted value of Y can be a vector or data frame with the same number of rows

Value

A plot, plot the prediction

Examples

data("freelive2")
Ytrue<-YR2
Ypreds<-sampling_from_distribution(YR2)
plotPred(Ytrue,Ypreds)
Ytrue<-YR2
nRep <- 2
nOuter <- 4
varRatio <-0.6
regrModel <- MUVR2(X = XRVIP2,
                   Y = YR2,
                   nRep = nRep,
                   nOuter = nOuter,
                   varRatio = varRatio,
                   method = "PLS",
                   modReturn = TRUE)
Ypreds<-regrModel$yPred
plotPred(Ytrue,Ypreds)

Plot stability

Description

Plot stability of selected variables and prediction fitness as a function of number of repetitions.

Usage

plotStability(MUVRrdCVclassObject, model = "min", VAll, nVarLim, missLim)

Arguments

MUVRrdCVclassObject

MUVR class object or rdCV object

model

'min' (default), 'mid' or 'max'

VAll

Option of specifying which variables (i.e. names) to consider as reference set. Defaults to variables selected from the 'model' of the 'MUVRrdCVclassObject'

nVarLim

Option of specifying upper limit for number of variables

missLim

Option of specifying upper limit for number of misclassifications

Value

Plot of number of variables, proportion of variables overlapping with reference and prediction accuracy (Q2 for regression; MISS otherwise) as a function of number of repetitions.

Examples

data("freelive2")
nRep <- 2
nOuter <- 4
varRatio <-0.6
regrModel <- MUVR2(X = XRVIP2,
                   Y = YR2,
                   nRep = nRep,
                  nOuter = nOuter,
                   varRatio = varRatio,
                   method = "PLS",
                   modReturn = TRUE)
plotStability(regrModel, model = "min")

Plot validation metric

Description

Produces a plot of validation metric vs number of variables in model (inner segment).

Usage

plotVAL(MUVRclassObject, show_outlier = TRUE)

Arguments

MUVRclassObject

An object of class 'MUVR'

show_outlier

Boolean, show outliers

Value

A plot

Examples

data("freelive2")
nRep <- 2
nOuter <- 4
varRatio <-0.6
regrModel <- MUVR2(X = XRVIP2,
                   Y = YR2,
                   nRep = nRep,
                  nOuter = nOuter,
                   varRatio = varRatio,
                   method = "PLS",
                   modReturn = TRUE)
plotVAL(regrModel)

Plot variable importance ranking

Description

Plot variable importance ranking in MUVR object. Regardless of MV core method, variables are sorted by rank, where lower is better. 'plotVIRank' produces boxplots of variable rankings for all model repetitions.

Usage

plotVIRank(
  MUVRclassObject,
  n,
  model = "min",
  cut,
  maptype = c("heatmap", "dotplot"),
  add_blank = 4,
  cextext = 1
)

Arguments

MUVRclassObject

An MUVR class object only applied to PLS, RF not rdCVnet

n

Number of top ranking variables to plot (defaults to those selected by MUVR2)

model

Which model to choose ('min', 'mid' (default) or 'max')

cut

Optional value to cut length of variable names to 'cut' number of characters

maptype

for rdCvnet dot plot or heat map

add_blank

put more blank when the rownames is too long,

cextext

the cex of the text

Value

Barplot of variable rankings (lower is better)

Examples

data("freelive2")
nRep <- 2
nOuter <- 4
varRatio <-0.6
regrModel <- MUVR2(X = XRVIP2,
                   Y = YR2,
                   nRep = nRep,
                  nOuter = nOuter,
                   varRatio = varRatio,
                   method = "PLS",
                   modReturn = TRUE)
plotVIRank(regrModel, n=20)

Calculate permutation p-value Calculate perutation p-value of actual model performance vs null hypothesis distribution. 'pPerm' will calculate the cumulative (1-tailed) probability of 'actual' belonging to 'permutation_distribution'. 'side' is guessed by actual value compared to median(permutation_distribution). Test is performed on original data OR ranked for non-parametric statistics.

Description

Calculate permutation p-value Calculate perutation p-value of actual model performance vs null hypothesis distribution. 'pPerm' will calculate the cumulative (1-tailed) probability of 'actual' belonging to 'permutation_distribution'. 'side' is guessed by actual value compared to median(permutation_distribution). Test is performed on original data OR ranked for non-parametric statistics.

Usage

pPerm(
  actual,
  permutation_distribution,
  side = c("smaller", "greater"),
  type = "t",
  extend = 0.1
)

Arguments

actual

Actual model performance (e.g. misclassifications or Q2)

permutation_distribution

Null hypothesis distribution from permutation test (same metric as 'actual')

side

Smaller or greater than (automatically guessed if omitted) (Q2 and AUC is a "greater than" test, whereas misclassifications is "smaller than")

type

one of ('t','non',"smooth","ecdf","rank")

extend

extend how much it extend

Value

p-value

Examples

data("freelive2")
actual <- sample(YR2, 1)
permutation_distribution <- YR2
pPerm(actual, permutation_distribution)

Predict outcomes Predict MV object using a MUVR class object and a X testing set. At present, this function only supports predictions for PLS regression type problems.

Description

Predict outcomes Predict MV object using a MUVR class object and a X testing set. At present, this function only supports predictions for PLS regression type problems.

Usage

predMV(MUVRclassobject, newdata, model = "min")

Arguments

MUVRclassobject

An 'MUVR' class object

newdata

New data for which to predict outcomes

model

What type of model to plot ('min', 'mid' or 'max'). Defaults to 'mid'.

Value

The predicted result based on the MUVR model and the newdata

Examples

data("freelive2")
nRep <- 2
nOuter <- 4
varRatio <-0.6
regrModel <- MUVR2(X = XRVIP2,
                   Y = YR2,
                   nRep = nRep,
                   nOuter = nOuter,
                   varRatio = varRatio,
                   method = "PLS",
                   modReturn=TRUE)
predMV(regrModel,XRVIP2)

Perform matrix pre-processing

Description

Perform matrix pre-processing

Usage

preProcess(
  X,
  offset = 0,
  zeroOffset = 0,
  trans = "none",
  center = "none",
  scale = "none"
)

Arguments

X

Data matrix with samples in rows and variables in columns

offset

Add offset to all data points (defaults to 0)

zeroOffset

Add offset to zero data (defaults to 0)

trans

Either 'log', 'sqrt' or 'none' (default is 'none')

center

Either 'mean', 'none' or a numeric vector of length equal to the number of columns of X (defaults to 'none').

scale

Either 'UV', 'Pareto', 'none' or a numeric vector of length equal to the number of columns of X (defaults to 'none').

Value

A pre-processed data matrix

Examples

data("freelive2")
preProcess(XRVIP2)

Q2 calculation

Description

Q2 calculation

Usage

Q2_calculation(yhat, y)

Arguments

yhat

prediction values

y

real values

Value

Q2

Examples

data("freelive2")
actual <- YR2
predicted <- MUVR2::sampling_from_distribution(actual)
Q2_calculation(actual, predicted)

Wrapper for speedy access to MUVR2 (autosetup of parallelization)

Description

Wrapper for speedy access to MUVR2 (autosetup of parallelization)

Usage

qMUVR2(
  X,
  Y,
  ML = FALSE,
  method = "RF",
  varRatio = 0.65,
  nCore,
  repMult = 1,
  nOuter = 5,
  ...
)

Arguments

X

X-data

Y

Y-data

ML

Boolean for multilevel

method

'RF' (default) or 'PLS'

varRatio

proportion of variables to keep in each loop of the recursive feature elimination

nCore

Number of threads to use for calculation (defaults to detectCores()-1)

repMult

Multiplier of cores -> nRep = repMult * nCore

nOuter

Number of outer segments

...

Additional arguments(see MUVR)

Value

MUVR object

Examples

data("freelive2")
regrModel <- qMUVR2(X = XRVIP2,
                    Y = YR2,
                    nCore = 1)

Wrapper for repeated double cross-validation without variable selection

Description

Wrapper for repeated double cross-validation without variable selection

Usage

rdCV(
  X,
  Y,
  ID,
  nRep = 5,
  nOuter = 6,
  nInner,
  DA = FALSE,
  fitness = c("AUROC", "MISS", "RMSEP", "BER"),
  method = c("PLS", "RF"),
  methParam,
  ML = FALSE,
  modReturn = FALSE,
  logg = FALSE
)

Arguments

X

Independent variables. NB: Variables (columns) must have names/unique identifiers. NAs not allowed in data. For ML, X is upper half only (X1-X2)

Y

Response vector (Dependent variable). For DA (classification), Y should be factor or character. For ML, Y is omitted. For regression, Y is numeric.

ID

Subject identifier (for sampling by subject; Assumption of independence if not specified)

nRep

Number of repetitions of double CV.

nOuter

Number of outer CV loop segments.

nInner

Number of inner CV loop segments.

DA

Logical for Classification (discriminant analysis) (Defaults do FALSE, i.e. regression). PLS is limited to two-class problems (see 'Y' above).

fitness

Fitness function for model tuning (choose either 'AUROC' or 'MISS'or 'BER' for classification; or 'RMSEP' (default) for regression.)

method

Multivariate method. Supports 'PLS' and 'RF' (default)

methParam

List with parameter settings for specified MV method (defaults to ???)

ML

Logical for multilevel analysis (defaults to FALSE)

modReturn

Logical for returning outer segment models (defaults to FALSE)

logg

Logical for whether to sink model progressions to 'log.txt'

Value

An object containing stuff...

Examples

data("freelive2")
nRep <- 2 # Number of MUVR2 repetitions
nOuter <- 4 # Number of outer cross-validation segments
varRatio <- 0.75 # Proportion of variables kept per iteration
method <- 'RF' # Selected core modeling algorithm
regrModel <- rdCV(X = XRVIP2,
                  Y = YR2,
                  nRep = nRep,
                  nOuter = nOuter,
                  method = method,
                  modReturn = TRUE)

Make custom parameters for rdcvNet internal modelling

Description

Custom parameters can be set in the function call or by manually setting "slots" in the resulting methParam object.

Usage

rdcvNetParams(
  robust = 0.05,
  family = "gaussian",
  nRepInner = 1,
  NZV = TRUE,
  oneHot = TRUE
)

Arguments

robust

Robustness (slack) criterion for determining min and max knees (defaults to 0.05)

family

the options could be "gaussian", "binomial", "poisson", "multinomial", "cox", "mgaussian"

nRepInner

how many nRepInner

NZV

NZV

oneHot

TRUE or FALSE using onehot endcoding or not

Value

a 'methParam' object

Examples

# Standard parameters for rdcvNet
methParam <- rdcvNetParams()

Sampling from the distribution of something

Description

Sampling from the distribution of something

Usage

sampling_from_distribution(X, upperlimit, lowerlimit, extend, n)

Arguments

X

a vector (numeric or factor) where the distribution/probility will be generated

upperlimit

if X is numeric, set upper limit

lowerlimit

if X is numeric, set lower limit

extend

If X is numeric, how much you want to extend from the lower and upper existing X.

n

How many you want to sample

Value

a resampled thing

Examples

data("mosquito")
sampling_from_distribution(Yotu)
data("freelive2")
sampling_from_distribution(YR2,
                           upperlimit=200,
                           lowerlimit=0,
                           n=length(YR2)
                           )

Report variables belonging to different classes

Description

Reports names and numbers of variables: all as well as optimal (min model), redundant (from min up to max) and noisy (the rest).

Usage

varClass(MUVRclassObject)

Arguments

MUVRclassObject

A MUVR class object

Value

A list with names and numbers of variables: all as well as optimal (Corresponding to 'min' or minial-optimal model), redundant (from 'min' up to 'max' or all-relevant ) and noisy (the rest)

Examples

data("mosquito")
nRep <- 2
nOuter <- 4
classModel <- MUVR2_EN(X = Xotu,
                       Y = Yotu,
                       nRep = nRep,
                       nOuter = nOuter,
                       DA = TRUE,
                       modReturn = TRUE)
classModel<-getVar(classModel,option="quantile")
varClass(classModel)

Microbiota composition in mosquitos for the classification tutorial

Description

Microbiota composition in mosquitos for the classification tutorial

Usage

data(mosquito)

Metabolomics data for the rye metabolomics regression tutorial

Description

Metabolomics data for the rye metabolomics regression tutorial

Usage

data(freelive)

Metabolomics data for the rye metabolomics regression tutorial, using unique individuals

Description

Metabolomics data for the rye metabolomics regression tutorial, using unique individuals

Usage

data(freelive2)

Village of capture of mosquitos for the classification tutorial

Description

Village of capture of mosquitos for the classification tutorial

Usage

data(mosquito)

Rye consumption for the rye metabolomics regression tutorial

Description

Rye consumption for the rye metabolomics regression tutorial

Usage

data(freelive)

Rye consumption for the rye metabolomics regression tutorial, using unique individuals

Description

Rye consumption for the rye metabolomics regression tutorial, using unique individuals

Usage

data(freelive2)