Non-Parametric Tests

https://colab.research.google.com/drive/1aFcVpTxy5XzfMx2toB7VRJOLW8uBAZQg?usp=sharing

Parametric methods

  • They are based on means, standard deviations or probabilities.

The Normal distribution is not always appropriate

  • For variables with only a few observations,
  • For non-symmetric (skewed) distributions, or
  • For variables that can take more than two values

Non-parametric methods

They do not rely on the same assumptions as parametric methods, but they still make some assumptions of their own.

| Situation | Non-parametric method | Parametric method |
|---|---|---|
| One sample | Wilcoxon signed rank test | t-test |
| Two independent samples | Mann-Whitney U test | t-test for two independent samples |
| Two paired samples | Wilcoxon signed rank test | Paired t-test |
| One sample, two quantitative variables | Spearman's correlation coefficient | Pearson's correlation coefficient |
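
As a rough illustration of the pairings in the table, the sketch below runs each non-parametric test next to its parametric counterpart using `scipy.stats`. The data and the hypothesized centre `mu0` are invented for the example, and the one-sample Wilcoxon test is applied to the differences from `mu0`, which is one common way to set it up rather than the only one.

```python
from scipy import stats

# Hypothetical measurements, invented purely for illustration.
before = [72.1, 68.4, 75.3, 70.2, 66.9, 74.8, 69.5, 71.7]
after = [70.3, 67.1, 71.2, 69.8, 64.3, 72.9, 66.0, 70.8]
other_group = [65.2, 63.8, 70.1, 61.4, 67.3, 62.5, 68.6, 64.9]
mu0 = 70.0  # hypothesized centre for the one-sample tests (assumption)

results = {
    # One sample: signed rank test on the differences from mu0 vs. one-sample t-test
    "one sample, Wilcoxon": stats.wilcoxon([x - mu0 for x in before]),
    "one sample, t-test": stats.ttest_1samp(before, mu0),
    # Two independent samples
    "independent, Mann-Whitney U": stats.mannwhitneyu(
        before, other_group, alternative="two-sided"
    ),
    "independent, t-test": stats.ttest_ind(before, other_group),
    # Two paired samples
    "paired, Wilcoxon": stats.wilcoxon(before, after),
    "paired, t-test": stats.ttest_rel(before, after),
    # One sample, two quantitative variables
    "correlation, Spearman": stats.spearmanr(before, after),
    "correlation, Pearson": stats.pearsonr(before, after),
}

# Each result unpacks into (statistic, p-value).
for name, (statistic, pvalue) in results.items():
    print(f"{name:30s} statistic={statistic:8.3f}  p={pvalue:.4f}")
```

With samples this small the parametric and non-parametric p-values can differ noticeably, which is exactly the situation where the choice of test matters.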

Wilcoxon Signed Rank Test

Analysis of Variance (ANOVA)

Generalization of the $t$-test to more than two treatments

Given: $n$ experimental treatments, one dependent variable

Assumes:

  1. the variables are normally distributed in each treatment
  2. the variances for the treatments are similar
  3. the sample sizes for the treatments do not differ hugely
    (Okay to deviate slightly from these assumptions for larger sample sizes)

Works by analyzing how much of the total variance is due to differences within groups, and how much is due to differences across groups.

Procedure:

$H_0$: There is no difference in the population means across all treatments
Compute the F-statistic:

$F = \frac{\text{found variation of the group averages (between groups)}}{\text{expected variation of the group averages (estimated from within groups)}}$
(don’t do this by hand!)

If $H_0$ is true, we would expect $F \approx 1$

Note: ANOVA tells you whether there is a significant difference, but does not tell you which treatment(s) are different.
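
To make the within-groups versus across-groups idea concrete, here is a minimal pure-Python sketch of the one-way F-statistic. It is meant to show what a library routine such as `scipy.stats.f_oneway` computes, not to replace it. The function name `f_statistic` and the toy data are assumptions made for this example.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)                          # number of treatments
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group averages around the grand mean (across groups).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group average (within groups).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Toy data (hypothetical): three treatments, three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)
print(f"F = {f_value:.3f}")  # far above 1, so the group means look different
```

For this toy data the between-group mean square is 13 and the within-group mean square is 1, giving F = 13. Whether that is significant still requires comparing it with the F distribution for (2, 6) degrees of freedom.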

$\chi^2$ Test

“ANOVA for non-interval data”

Given: data in an $n \times m$ frequency table (e.g. $n$ treatments, $m$ variables)

Assumes:

  1. Non-parametric, hence no assumption of normality
  2. Reasonable sample size (preferably >50, although some say >20)
  3. Reasonable numbers in each cell (a common rule of thumb is an expected count of at least 5)

Tests whether the data fit a given distribution

Basis: sums the squared differences between the observed and expected values, each divided by the expected value

Calculate an expected value (mean) for each column

In the formula below, $O_i$ is an observed frequency and $E_i$ is the expected frequency asserted by the null hypothesis

Calculate $\chi^2$:

$\chi^2 = \sum_{i=1}^{n}\frac{(O_i-E_i)^2}{E_i}$

Compare with the lookup (critical) value for the given significance level and degrees of freedom
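
The calculation can be sketched in a few lines of pure Python. This version assumes the null hypothesis is independence of rows and columns, so each expected count is (row total × column total) / grand total. The function name `chi_square` and the table values are invented for the example. `scipy.stats.chi2_contingency` performs the same test and also returns a p-value.

```python
def chi_square(table):
    """Return (chi2, dof) for a frequency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns are independent (H0).
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical 2 x 3 frequency table.
observed = [[10, 20, 30],
            [20, 20, 20]]
chi2, dof = chi_square(observed)

critical = 5.991  # chi-square table value for alpha = 0.05 and 2 degrees of freedom
print(f"chi2 = {chi2:.3f}, dof = {dof}, reject H0: {chi2 > critical}")
```

Here $\chi^2 = 16/3 \approx 5.33$, which is below the critical value 5.991, so $H_0$ is not rejected at the 5% level.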

Get to know these and others: https://docs.scipy.org/doc/scipy/reference/stats.html

What about ordinal correlation?

Spearman’s Rank Coefficient $\rho$:

Convert each variable into a ranked list
Compute:

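$\rho=1-\frac{6\sum{(x_i-y_i)^2}}{n(n^2-1)}$

where $x_i$ and $y_i$ are the ranks of observation $i$ in the two lists and $n$ is the number of observations (the formula assumes no tied ranks)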
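
A minimal sketch of these two steps, ranking and then applying the formula, assuming there are no tied values. The helper names `ranks` and `spearman_rho` and the data are made up for illustration. `scipy.stats.spearmanr` is the library routine and also handles ties.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho from the squared rank differences (no-ties formula)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical paired observations.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rho = spearman_rho(x, y)
print(f"rho = {rho:.2f}")
```

The rank differences here are 0, -1, 1, -1, 1, so the sum of squares is 4 and $\rho = 1 - 24/120 = 0.8$.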

Kendall’s $\tau$

$\tau=\frac{(\textrm{num. concordant ranked pairs})-(\textrm{num. discordant ranked pairs})}{\binom{n}{2}}$

From Wikipedia
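
Kendall's $\tau$ can be sketched by counting pairs directly. A pair of observations is concordant when both variables order it the same way and discordant when they disagree. The function name `kendall_tau` and the data are assumptions for this example, and the sketch assumes no ties. `scipy.stats.kendalltau` is the library routine and handles ties.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / (n choose 2); assumes no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order this pair the same way
        elif direction < 0:
            discordant += 1   # the variables disagree on the order
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical paired observations.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
tau = kendall_tau(x, y)
print(f"tau = {tau:.2f}")
```

Of the 10 possible pairs, 8 are concordant and 2 are discordant, so $\tau = (8 - 2)/10 = 0.6$.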