Compute correlation-type analysis on mixed-class columns of large data frames with a parallel backend. The data frame may contain columns of four classes: integer, numeric, factor and character. Character columns are treated as categorical variables.
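The class rules above decide which family of measure applies to each column pair. The following is a minimal base-R sketch, not the package's code: the hypothetical helpers `col_kind()` and `pair_type()` treat integer and numeric columns as numeric and factor and character columns as categorical, then label every unordered pair as `nn`, `nc` or `cc`, mirroring the `cor.nn`, `cor.nc` and `cor.cc` arguments.

```r
# Hypothetical helper (not part of corrp): map a column to "n" (numeric)
# or "c" (categorical) following the class rules described above.
col_kind <- function(x) {
  if (is.numeric(x)) return("n")                    # integer or numeric
  if (is.factor(x) || is.character(x)) return("c")  # character -> categorical
  stop("unsupported column class: ", class(x)[1])
}

# Hypothetical helper: label every unordered column pair as "nn", "nc" or "cc".
pair_type <- function(df) {
  kinds <- vapply(df, col_kind, character(1))
  pairs <- utils::combn(names(df), 2, simplify = FALSE)
  types <- vapply(pairs, function(p) {
    # sort so that a numeric/categorical pair is always labeled "nc"
    paste(sort(kinds[p], decreasing = TRUE), collapse = "")
  }, character(1))
  data.frame(x = vapply(pairs, `[`, character(1), 1),
             y = vapply(pairs, `[`, character(1), 2),
             type = types, stringsAsFactors = FALSE)
}

df <- data.frame(a = 1:4, b = c(0.5, 1.2, 3.3, 2.1),
                 f = factor(c("u", "v", "u", "v")),
                 s = c("p", "q", "q", "p"), stringsAsFactors = FALSE)
pair_type(df)
```

With four columns there are six pairs; only `f`/`s` is a categorical pair, and `a`/`b` is the only numeric pair.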
Usage
corrp(
df,
parallel = TRUE,
n.cores = 1,
p.value = 0.05,
verbose = TRUE,
num.s = 1000,
rk = FALSE,
comp = c("greater", "less"),
alternative = c("two.sided", "less", "greater"),
cor.nn = c("pearson", "mic", "dcor", "pps"),
cor.nc = c("lm", "pps"),
cor.cc = c("cramersV", "uncoef", "pps"),
lm.args = list(),
pearson.args = list(),
dcor.args = list(),
mic.args = list(),
pps.args = list(),
cramersV.args = list(),
uncoef.args = list(),
...
)
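The `parallel` and `n.cores` arguments control whether the pairwise computations run concurrently. As an illustration only, the sketch below uses base R's `parallel::mclapply()` with a hypothetical `run_pairs()` helper; this is an assumption and does not claim to be the backend corrp uses. `mclapply()` forks processes, so on Windows `n.cores` must stay at 1.

```r
library(parallel)

# Hypothetical helper: apply `fun` to every column pair, either
# sequentially (lapply) or with forked workers (mclapply).
run_pairs <- function(df, fun, parallel = TRUE, n.cores = 1) {
  pairs <- combn(names(df), 2, simplify = FALSE)
  worker <- function(p) fun(df[[p[1]]], df[[p[2]]])
  if (parallel) {
    mclapply(pairs, worker, mc.cores = n.cores)
  } else {
    lapply(pairs, worker)
  }
}

num <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2, 4, 6, 8, 10),
                  z = c(5, 3, 4, 1, 2))
res <- run_pairs(num, function(a, b) cor(a, b), parallel = TRUE, n.cores = 1)
unlist(res)
```

Returning a list per pair keeps the worker independent of the others, which is what makes the computation easy to distribute across cores.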
Arguments
- df
[data.frame(1)]
Input data frame.
- parallel
[logical(1)]
If TRUE, run the operations on a parallel backend.
- n.cores
[numeric(1)]
The number of cores to use for parallel execution.
- p.value
[numeric(1)]
Cutoff value for the p-value, the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is correct. Default is p.value = 0.05.
- verbose
[logical(1)]
Activate verbose mode.
- num.s
[numeric(1)]
Used in the permutation test: the number of samples with replacement drawn from the numeric vector y.
- rk
[logical(1)]
Used in the permutation test. If TRUE, transform the numeric vectors x and y into their sample ranks.
- comp
[character(1)]
Whether the p.value parameter must be "greater" or "less" than the values estimated in tests and correlations.
- alternative
[character(1)]
A character string specifying the alternative hypothesis for the correlation inference. It must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
- cor.nn
[character(1)]
Correlation type used for integer/numeric pair inference. The options are `pearson` (Pearson Correlation), `mic` (Maximal Information Coefficient), `dcor` (Distance Correlation) and `pps` (Predictive Power Score). Default is `pearson`.
- cor.nc
[character(1)]
Correlation type used for integer/numeric - factor/categorical pair inference. The options are `lm` (Linear Model) and `pps` (Predictive Power Score). Default is `lm`.
- cor.cc
[character(1)]
Correlation type used for factor/categorical pair inference. The options are `cramersV` (Cramer's V), `uncoef` (Uncertainty coefficient) and `pps` (Predictive Power Score). Default is `cramersV`.
- lm.args
[list(1)]
Additional parameters for the `lm` method.
- pearson.args
[list(1)]
Additional parameters for the `pearson` method.
- dcor.args
[list(1)]
Additional parameters for the `dcor` method.
- mic.args
[list(1)]
Additional parameters for the `mic` method.
- pps.args
[list(1)]
Additional parameters for the `pps` method.
- cramersV.args
[list(1)]
Additional parameters for the `cramersV` method.
- uncoef.args
[list(1)]
Additional parameters for the `uncoef` method.
- ...
Additional arguments (TODO).
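The `num.s`, `rk` and `alternative` arguments describe a resampling test for a numeric pair. The following is a hedged sketch, not the package's implementation: a hypothetical `perm_test()` draws `num.s` resamples of `y` with replacement (as the `num.s` description states), replaces `x` and `y` by their ranks when `rk = TRUE`, and computes an empirical p-value for the chosen alternative, using Pearson correlation as the test statistic.

```r
# Hypothetical sketch of a resampling test for the correlation of x and y.
perm_test <- function(x, y, num.s = 1000, rk = FALSE,
                      alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)  # partial matching allows "g", "l", "t"
  if (rk) {                              # use sample ranks instead of raw values
    x <- rank(x)
    y <- rank(y)
  }
  obs <- cor(x, y)
  # Null distribution: break the x-y pairing by resampling y with replacement.
  null <- replicate(num.s, cor(x, sample(y, replace = TRUE)))
  p <- switch(alternative,
    two.sided = mean(abs(null) >= abs(obs), na.rm = TRUE),
    greater   = mean(null >= obs, na.rm = TRUE),
    less      = mean(null <= obs, na.rm = TRUE))
  list(estimate = obs, p.value = p)
}

set.seed(42)
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.5)
perm_test(x, y, num.s = 500, alternative = "greater")
```

Because `x` and `y` are strongly related here, almost no resampled correlation reaches the observed one and the empirical p-value is close to 0.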
Value
A list with two tables: data and index.
- The `$data` table contains all the statistical results;
- The `$index` table contains the pairs of indices used in each inference of the data table.
- All statistical tests are controlled by the confidence interval defined by the
p.value parameter. If a statistical test does not reach a significance greater/less
than p.value (see comp), the variable `isig` is `FALSE`.
- There is no statistical significance test for the pps algorithm, so by default `isig` is `TRUE` for pps results.
- If an error occurs during an operation, the association measure (`infer.value`) is `NA`.
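The rules above for `isig` and `infer.value` can be illustrated with a small sketch. This is our reading of the documentation, not corrp's code: a hypothetical `infer_row()` runs a measure inside `tryCatch()` and returns `NA` on error, marks pps results as always significant, and otherwise compares the estimated p-value with the `p.value` cutoff in the direction given by `comp` (with `comp = "greater"`, the cutoff must exceed the estimated p-value). Setting `isig` to `NA` after an error is an assumption, since the documentation does not say.

```r
# Hypothetical helper (not corrp's code): one result row for a column pair.
# `measure` must return list(estimate = <number>, p.value = <number>).
infer_row <- function(measure, x, y, method, p.value = 0.05,
                      comp = c("greater", "less")) {
  comp <- match.arg(comp)
  res <- tryCatch(measure(x, y), error = function(e) NULL)
  if (is.null(res)) {
    # Any error -> association measure is NA; isig = NA is an assumption.
    return(data.frame(infer.value = NA_real_, isig = NA))
  }
  isig <- if (method == "pps") {
    TRUE                          # no significance test for pps
  } else if (comp == "greater") {
    p.value > res$p.value         # cutoff must be greater than the estimate
  } else {
    p.value < res$p.value         # cutoff must be less than the estimate
  }
  data.frame(infer.value = res$estimate, isig = isig)
}

# Pearson measure built on stats::cor.test().
pearson <- function(x, y) {
  ct <- cor.test(x, y)
  list(estimate = unname(ct$estimate), p.value = ct$p.value)
}

x <- c(1, 2, 3, 4, 5, 6)
infer_row(pearson, x, 2 * x + c(0.1, -0.1, 0.2, 0, -0.2, 0.1), "pearson")
infer_row(pearson, x, letters[1:6], "pearson")  # cor.test() errors -> NA
```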
Details (Types)
- integer/numeric pair
Pearson Correlation using the cor function. The value lies between -1 and 1.
- integer/numeric pair
Distance Correlation using the dcorT.test function. The value lies between 0 and 1.
- integer/numeric pair
Maximal Information Coefficient using the mine function. The value lies between 0 and 1.
- integer/numeric pair
Predictive Power Score using the score function from ppsr. The value lies between 0 and 1.
- integer/numeric - factor/categorical pair
Correlation coefficient, i.e. the square root of the R^2 coefficient of the linear regression of the integer/numeric variable on the factor/categorical variable, using the lm function. The value lies between 0 and 1.
- integer/numeric - factor/categorical pair
Predictive Power Score using the score function from ppsr. The value lies between 0 and 1.
- factor/categorical pair
Cramer's V, computed from the chi-squared test statistic using the cramersV function. The value lies between 0 and 1.
- factor/categorical pair
Uncertainty coefficient using the UncertCoef function. The value lies between 0 and 1.
- factor/categorical pair
Predictive Power Score using the score function from ppsr. The value lies between 0 and 1.
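Three of the measures listed above can be reproduced in base R for intuition. The sketch below uses hypothetical helper names and is not corrp's implementation; it follows the textbook definitions: the square root of R^2 from `lm()` for a numeric - categorical pair, Cramer's V as sqrt(chi^2 / (n (k - 1))) with k the smaller table dimension, and the asymmetric uncertainty coefficient U(b | a) = (H(b) - H(b | a)) / H(b). The cramersV and UncertCoef functions named above may differ in defaults (for example the continuity correction on 2x2 tables, or symmetric versus asymmetric U).

```r
# Hypothetical helpers (not corrp's code) illustrating three measures.

# numeric ~ categorical: square root of R^2 from a linear model.
lm_assoc <- function(num, cat) {
  fit <- lm(num ~ factor(cat))
  sqrt(summary(fit)$r.squared)
}

# categorical x categorical: Cramer's V from the chi-squared statistic.
cramers_v <- function(a, b) {
  tab <- table(a, b)
  # suppress the small-expected-count warning for tiny example tables
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  k <- min(dim(tab))
  unname(sqrt(chi2 / (sum(tab) * (k - 1))))
}

# Shannon entropy of a probability vector (0 * log 0 treated as 0).
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Asymmetric uncertainty coefficient U(b | a) = (H(b) - H(b | a)) / H(b).
uncert_coef <- function(a, b) {
  joint <- prop.table(table(a, b))
  h_a <- entropy(rowSums(joint))
  h_b <- entropy(colSums(joint))
  h_ab <- entropy(as.vector(joint))
  (h_a + h_b - h_ab) / h_b   # mutual information divided by H(b)
}

g <- c("a", "a", "b", "b", "c", "c")
v <- c(1.0, 1.2, 5.0, 5.1, 9.0, 9.3)
h <- c("x", "x", "y", "y", "z", "z")
lm_assoc(v, g)      # close to 1: the groups separate the values well
cramers_v(g, h)     # perfect association
uncert_coef(g, h)   # h is fully determined by g
```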
References
KS Srikanth, sidekicks, cor2, 2020. URL https://github.com/talegari/sidekicks/.
Paul van der Laken, ppsr, 2021. URL https://github.com/paulvanderlaken/ppsr.