Computes a correlation-type analysis on two columns of a given data frame; the two columns may be of different classes. The data frame may contain columns of these four classes: integer, numeric, factor and character. Character columns are treated as categorical variables.

Usage

corr_fun(
  df,
  nx,
  ny,
  p.value = 0.05,
  verbose = TRUE,
  num.s = 1000,
  rk = FALSE,
  comp = c("greater", "less"),
  alternative = c("two.sided", "less", "greater"),
  cor.nn = c("pearson", "mic", "dcor", "pps"),
  cor.nc = c("lm", "pps"),
  cor.cc = c("cramersV", "uncoef", "pps"),
  lm.args = list(),
  pearson.args = list(),
  dcor.args = list(),
  mic.args = list(),
  pps.args = list(),
  cramersV.args = list(),
  uncoef.args = list(),
  ...
)

Arguments

df

\[data.frame(1)]
Input data frame.

nx

\[character(1)]
Column name of the independent/predictor variable.

ny

\[character(1)]
Column name of the dependent/target variable.

p.value

\[numeric(1)]
Cutoff for the p-value, that is, the probability of obtaining the observed results of a test assuming that the null hypothesis is correct. Default is 0.05.

verbose

\[logical(1)]
Activate verbose mode.

num.s

\[numeric(1)]
Used in the permutation test. The number of samples, drawn with replacement, created from the numeric vector y.

rk

\[logical(1)]
Used in the permutation test. If TRUE, the numeric vectors x and y are transformed to their sample ranks.

comp

\[character(1)]
Whether the `p.value` parameter must be greater or less than the p-values estimated in the tests and correlations. It must be one of "greater" or "less".

alternative

\[character(1)]
A character string specifying the alternative hypothesis for the correlation inference. It must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.

cor.nn

\[character(1)]
Correlation type used for inference on integer/numeric pairs. The options are `pearson: Pearson Correlation`, `mic: Maximal Information Coefficient`, `dcor: Distance Correlation`, `pps: Predictive Power Score`. Default is `Pearson Correlation`.

cor.nc

\[character(1)]
Correlation type used for inference on integer/numeric - factor/categorical pairs. The options are `lm: Linear Model`, `pps: Predictive Power Score`. Default is `Linear Model`.

cor.cc

\[character(1)]
Correlation type used for inference on factor/categorical pairs. The options are `cramersV: Cramer's V`, `uncoef: Uncertainty coefficient`, `pps: Predictive Power Score`. Default is `Cramer's V`.

lm.args

\[list(1)]
Additional parameters for the `lm` method.

pearson.args

\[list(1)]
Additional parameters for the `pearson` method.

dcor.args

\[list(1)]
Additional parameters for the `dcor` method.

mic.args

\[list(1)]
Additional parameters for the `mic` method.

pps.args

\[list(1)]
Additional parameters for the `pps` method.

cramersV.args

\[list(1)]
Additional parameters for the `cramersV` method.

uncoef.args

\[list(1)]
Additional parameters for the `uncoef` method.

...

Additional arguments (TODO).
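
A call can be sketched from the arguments above. The argument names come from the Usage signature; the package name `corrp` and the use of the built-in `iris` data are assumptions made for illustration, and the call is skipped if that package is not installed.

```r
# Build the argument list explicitly so each choice is visible.
# `iris` and the chosen columns are illustrative assumptions.
args <- list(
  df      = iris,
  nx      = "Sepal.Length",  # predictor column (numeric)
  ny      = "Species",       # target column (factor)
  p.value = 0.05,            # significance cutoff
  verbose = FALSE,
  cor.nc  = "lm"             # numeric/categorical pair, so cor.nc applies
)

# Assumption: corr_fun is exported by a package named `corrp`.
if (requireNamespace("corrp", quietly = TRUE)) {
  res <- do.call(corrp::corr_fun, args)
  str(res)  # inspect the returned list without assuming its fields
} else {
  message("corrp is not installed; skipping the call")
}
```

Because `nx` is numeric and `ny` is a factor, only `cor.nc` affects the method choice here; `cor.nn` and `cor.cc` would apply to numeric/numeric and categorical/categorical pairs respectively.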

Value

A list with all statistical results.

- All statistical tests are controlled by the significance level set by the `p.value` parameter. If a statistical test does not reach a significance greater/less than `p.value` (as selected by `comp`), the value of the variable `isig` will be `FALSE`.

- There is no statistical significance test for the pps algorithm. By default `isig` is `TRUE`.

- If any error occurs during the computation, the association measure (`infer.value`) will be `NA` by default.
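
The relation between `p.value`, `comp` and `isig` can be illustrated with a minimal sketch. The helper `is_significant` below is hypothetical and not part of the package; it encodes one reading of the `comp` description (with "greater", the cutoff must be greater than the estimated p-value) and may differ from the actual implementation.

```r
# Hypothetical helper: decide significance from an estimated p-value.
# comp = "greater": significant when the cutoff p.value is greater than p_est.
# comp = "less":    significant when the cutoff p.value is less than p_est.
is_significant <- function(p_est, p.value = 0.05, comp = c("greater", "less")) {
  comp <- match.arg(comp)
  if (is.na(p_est)) return(NA)
  if (comp == "greater") p.value > p_est else p.value < p_est
}

# Estimated p-value from a base R Pearson correlation test.
p_est <- cor.test(mtcars$mpg, mtcars$wt)$p.value
isig <- is_significant(p_est, p.value = 0.05, comp = "greater")
isig
```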

Details (Types)

- integer/numeric pair: Pearson Correlation using the `cor` function. The value lies between -1 and 1.
- integer/numeric pair: Distance Correlation using the `dcorT.test` function. The value lies between 0 and 1.
- integer/numeric pair: Maximal Information Coefficient using the `mine` function. The value lies between 0 and 1.
- integer/numeric pair: Predictive Power Score using the `score` function. The value lies between 0 and 1.
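
For the Pearson case, the quantities involved can be reproduced with base R alone. This is only a sketch on the built-in `mtcars` data (an assumption for illustration); it shows the coefficient and a p-value for a chosen `alternative`, not the package's internal code.

```r
# Two numeric columns from a built-in dataset (illustrative choice).
x <- mtcars$wt
y <- mtcars$mpg

# Pearson correlation coefficient, bounded in [-1, 1].
r <- cor(x, y, method = "pearson")

# Significance test; `alternative` accepts "two.sided", "less" or "greater".
test <- cor.test(x, y, method = "pearson", alternative = "two.sided")

c(estimate = r, p.value = test$p.value)
```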

- integer/numeric - factor/categorical pair: correlation coefficient, i.e. the square root of the R^2 coefficient of the linear regression of the integer/numeric variable over the factor/categorical variable, using the `lm` function. The value lies between 0 and 1.
- integer/numeric - factor/categorical pair: Predictive Power Score using the `score` function. The value lies between 0 and 1.
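
The linear-model measure for a numeric/categorical pair can be sketched in base R as the square root of R^2 from regressing the numeric variable on the factor. The `iris` columns are an illustrative assumption, and taking the p-value from the overall F test is an assumption about how significance might be assessed.

```r
# Regress the numeric variable on the categorical one.
fit <- lm(Sepal.Length ~ Species, data = iris)

# R^2 lies in [0, 1], so its square root does too.
r2 <- summary(fit)$r.squared
assoc <- sqrt(r2)

# P-value of the overall F test for the factor (first row of the ANOVA table).
p_val <- anova(fit)[["Pr(>F)"]][1]

c(assoc = assoc, p.value = p_val)
```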

- factor/categorical pair: Cramer's V, computed from the chi-squared test using the `cramersV` function. The value lies between 0 and 1.
- factor/categorical pair: Uncertainty coefficient using the `UncertCoef` function. The value lies between 0 and 1.
- factor/categorical pair: Predictive Power Score using the `score` function. The value lies between 0 and 1.
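
Cramer's V can also be computed by hand from the chi-squared statistic, which makes its 0 to 1 range explicit. The sketch below uses base R only and the built-in `mtcars` data (an assumption for illustration); it is not the `cramersV` function itself, and details such as continuity correction may differ.

```r
# Contingency table of two categorical variables (illustrative choice).
tab <- table(mtcars$cyl, mtcars$gear)

# Chi-squared test without continuity correction; warnings about small
# expected counts are suppressed for this small example.
chi <- suppressWarnings(chisq.test(tab, correct = FALSE))

n <- sum(tab)                    # number of observations
k <- min(nrow(tab), ncol(tab))   # smaller table dimension

# V = sqrt(chi^2 / (n * (k - 1))), bounded in [0, 1].
v <- sqrt(unname(chi$statistic) / (n * (k - 1)))

c(cramers_v = v, p.value = chi$p.value)
```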

References

KS Srikanth, sidekicks, cor2, 2020. URL https://github.com/talegari/sidekicks/.

Paul van der Laken, ppsr, 2021. URL https://github.com/paulvanderlaken/ppsr.

Author

Igor D.S. Siciliani