Computes a correlation-type analysis on two columns of mixed classes in a given data frame. The data frame may contain columns of four classes: integer, numeric, factor, and character. Character columns are treated as categorical variables.
Usage
corr_fun(
  df,
  nx,
  ny,
  p.value = 0.05,
  verbose = TRUE,
  num.s = 1000,
  rk = FALSE,
  comp = c("greater", "less"),
  alternative = c("two.sided", "less", "greater"),
  cor.nn = c("pearson", "mic", "dcor", "pps"),
  cor.nc = c("lm", "pps"),
  cor.cc = c("cramersV", "uncoef", "pps"),
  lm.args = list(),
  pearson.args = list(),
  dcor.args = list(),
  mic.args = list(),
  pps.args = list(),
  cramersV.args = list(),
  uncoef.args = list(),
  ...
)
Arguments
- df
[data.frame(1)]
Input data frame.
- nx
[character(1)]
Column name of the independent/predictor variable.
- ny
[character(1)]
Column name of the dependent/target variable.
- p.value
[numeric(1)]
Cutoff value for the p-value, i.e. the probability of obtaining the observed results of a test assuming that the null hypothesis is correct. Default is p.value = 0.05.
- verbose
[logical(1)]
Activate verbose mode.
- num.s
[numeric(1)]
Used in the permutation test. The number of samples with replacement created from the y numeric vector.
- rk
[logical(1)]
Used in the permutation test. If TRUE, the x and y numeric vectors are transformed into sample ranks.
- comp
[character(1)]
Whether the param p.value must be greater or less than the values estimated in tests and correlations.
- alternative
[character(1)]
A character string specifying the alternative hypothesis for the correlation inference. It must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
- cor.nn
[character(1)]
Correlation type to be used in integer/numeric pair inference. The options are `pearson`: Pearson Correlation, `mic`: Maximal Information Coefficient, `dcor`: Distance Correlation, `pps`: Predictive Power Score. Default is Pearson Correlation.
- cor.nc
[character(1)]
Correlation type to be used in integer/numeric - factor/categorical pair inference. The options are `lm`: Linear Model, `pps`: Predictive Power Score. Default is Linear Model.
- cor.cc
[character(1)]
Correlation type to be used in factor/categorical pair inference. The options are `cramersV`: Cramer's V, `uncoef`: Uncertainty coefficient, `pps`: Predictive Power Score. Default is Cramer's V.
- lm.args
[list(1)]
Additional parameters for the specific method.
- pearson.args
[list(1)]
Additional parameters for the specific method.
- dcor.args
[list(1)]
Additional parameters for the specific method.
- mic.args
[list(1)]
Additional parameters for the specific method.
- pps.args
[list(1)]
Additional parameters for the specific method.
- cramersV.args
[list(1)]
Additional parameters for the specific method.
- uncoef.args
[list(1)]
Additional parameters for the specific method.
- ...
Additional arguments (TODO).
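The num.s and rk arguments refer to a resampling test. As a rough illustration only, the sketch below resamples y with replacement num.s times and compares the observed statistic against the resampled ones. The helper `perm_test_sketch` and its `stat` argument are hypothetical names introduced here; the actual implementation behind corr_fun may differ.

```r
# Hypothetical helper, not part of the documented API: a resampling test for
# an association statistic between two numeric vectors.
perm_test_sketch <- function(x, y, num.s = 1000, rk = FALSE,
                             stat = function(a, b) abs(stats::cor(a, b))) {
  if (rk) {
    # Rank transformation, as described for rk = TRUE
    x <- rank(x)
    y <- rank(y)
  }
  observed <- stat(x, y)
  # Resample y with replacement num.s times to break its link with x
  resampled <- replicate(num.s, stat(x, sample(y, replace = TRUE)))
  # Share of resampled statistics at least as large as the observed one;
  # the +1 terms keep the estimate away from exactly zero
  p <- (sum(resampled >= observed) + 1) / (num.s + 1)
  list(statistic = observed, p.value = p)
}

set.seed(42)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
res <- perm_test_sketch(x, y, num.s = 500, rk = TRUE)
res$p.value < 0.05
```

The absolute Pearson correlation is used here only as a stand-in statistic; any function of two numeric vectors can be passed through `stat`.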
Value
A list with all statistical results.
- All statistical tests are controlled by the cutoff given in the p.value param. If a statistical test does not obtain a significance greater/less than p.value, the value of the variable `isig` will be `FALSE`.
- There is no statistical significance test for the pps algorithm. By default `isig` is `TRUE`.
- If any error occurs during the operations, the association measure (`infer.value`) will be `NA` by default.
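The notes above suggest checking `infer.value` for `NA` before relying on `isig`. A minimal sketch of that check follows; `describe_result` is a hypothetical helper, and it assumes the returned list exposes `infer.value` and `isig` as top-level elements. The call to corr_fun is guarded so the snippet also runs when the function is not loaded.

```r
# Hypothetical helper: summarise a result list that carries the two fields
# named in the Value section (infer.value and isig).
describe_result <- function(res) {
  if (is.na(res$infer.value)) {
    return("association measure could not be computed")
  }
  sprintf("association = %.3f, significant = %s", res$infer.value, res$isig)
}

# Runs only if corr_fun() is available in the current session
if (exists("corr_fun", mode = "function")) {
  res <- corr_fun(
    iris,
    nx = "Sepal.Length",
    ny = "Species",
    p.value = 0.05,
    cor.nc = "lm",
    verbose = FALSE
  )
  message(describe_result(res))
}
```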
Details (Types)
- integer/numeric pair
Pearson Correlation using the `cor` function. The value lies between -1 and 1.
- integer/numeric pair
Distance Correlation using the `dcorT.test` function. The value lies between 0 and 1.
- integer/numeric pair
Maximal Information Coefficient using the `mine` function. The value lies between 0 and 1.
- integer/numeric pair
Predictive Power Score using the `score` function. The value lies between 0 and 1.
- integer/numeric - factor/categorical pair
Correlation coefficient, or square root of the R^2 coefficient, of the linear regression of the integer/numeric variable on the factor/categorical variable, using the `lm` function. The value lies between 0 and 1.
- integer/numeric - factor/categorical pair
Predictive Power Score using the `score` function. The value lies between 0 and 1.
- factor/categorical pair
Cramer's V value, computed from the chi-squared test using the `cramersV` function. The value lies between 0 and 1.
- factor/categorical pair
Uncertainty coefficient using the `UncertCoef` function. The value lies between 0 and 1.
- factor/categorical pair
Predictive Power Score using the `score` function. The value lies between 0 and 1.
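To make the three pair types concrete, the sketch below computes one measure per type with base R only: Pearson correlation, the square root of R^2 from a linear model, and Cramer's V derived by hand from the chi-squared statistic. This is an illustration on built-in datasets, not the code path used by corr_fun; in particular, the Cramer's V formula here is a stand-in for the `cramersV` function named above.

```r
# Numeric/numeric pair: Pearson correlation, in [-1, 1]
r_nn <- cor(iris$Sepal.Length, iris$Petal.Length, method = "pearson")

# Numeric/categorical pair: square root of R^2 from a linear model, in [0, 1]
fit <- lm(Sepal.Length ~ Species, data = iris)
r_nc <- sqrt(summary(fit)$r.squared)

# Categorical/categorical pair: Cramer's V from the chi-squared statistic,
# V = sqrt(chi2 / (n * (min(rows, cols) - 1))), in [0, 1]
tab <- table(mtcars$cyl, mtcars$gear)
chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
r_cc <- sqrt(as.numeric(chi2) / (sum(tab) * (min(dim(tab)) - 1)))

print(c(numeric_numeric = r_nn,
        numeric_categorical = r_nc,
        categorical_categorical = r_cc))
```

Only the Pearson value can be negative; the other two measures carry no direction, which matches the value ranges listed above.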
References
KS Srikanth, sidekicks, cor2, 2020. URL https://github.com/talegari/sidekicks/.
Paul van der Laken, ppsr, 2021. URL https://github.com/paulvanderlaken/ppsr.