https://colab.research.google.com/drive/1aFcVpTxy5XzfMx2toB7VRJOLW8uBAZQg?usp=sharing
Parametric methods
- They are based on means, standard deviations, or probabilities.
The Normal distribution is not always appropriate
- Variables with only a few observations,
- Non-symmetrical distributions, or
- Variables that can have more than two values
Non-parametric methods
They do not rely on the same assumptions as parametric methods, but they still make some assumptions of their own.
Situation | Non-parametric method | Parametric method |
One sample | Wilcoxon signed rank test | One-sample t-test |
Two independent samples | Mann-Whitney U test | t-test for two independent samples |
Two paired samples | Wilcoxon signed rank test | Paired t-test |
One sample, two quantitative variables | Spearman's correlation coefficient | Pearson's correlation coefficient |
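As a rough illustration of one row of the table, the sketch below runs the parametric and the non-parametric test for two independent samples side by side with `scipy.stats`. The data are randomly generated for illustration only (small, skewed samples), and the variable names are assumptions, not part of the original notes.

```python
import numpy as np
from scipy import stats

# Made-up data: two small, skewed (exponential) independent samples.
rng = np.random.default_rng(0)
group_a = rng.exponential(scale=1.0, size=12)
group_b = rng.exponential(scale=2.0, size=12)

# Parametric choice: t-test for two independent samples.
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Non-parametric counterpart: Mann-Whitney U test (works on ranks).
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test:         t = {t_stat:.3f}, p = {t_p:.3f}")
print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {u_p:.3f}")
```

The other rows map to `stats.ttest_1samp`, `stats.ttest_rel`, `stats.wilcoxon`, `stats.pearsonr` and `stats.spearmanr` in the same way.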
Wilcoxon Signed Rank Test
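A minimal sketch of the paired case, assuming `scipy.stats.wilcoxon` is used as the implementation. The `before` and `after` measurements are invented for illustration. The paired t-test is shown next to it for comparison.

```python
import numpy as np
from scipy import stats

# Made-up paired measurements on the same 10 subjects.
before = np.array([125, 130, 118, 140, 135, 128, 122, 138, 131, 127], dtype=float)
after = np.array([120, 126, 119, 131, 129, 121, 119, 130, 129, 117], dtype=float)

# Wilcoxon signed rank test: ranks the absolute paired differences and
# compares the rank sums of the positive and negative differences.
w_stat, w_p = stats.wilcoxon(before, after)

# Parametric counterpart: paired t-test.
t_stat, t_p = stats.ttest_rel(before, after)

print(f"Wilcoxon: W = {w_stat:.1f}, p = {w_p:.4f}")
print(f"Paired t: t = {t_stat:.3f}, p = {t_p:.4f}")
```

For the one-sample case, the same function can be called on the differences between the observations and the hypothesized median, e.g. `stats.wilcoxon(sample - m0)` with `m0` chosen by the analyst.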
Analysis of Variance (ANOVA)
Generalization of the $t$-test for more than two treatments
Given: $n$ experimental treatments, one dependent variable
Assumes:
- the variables are normally distributed in each treatment
- the variances for the treatments are similar
- the sample sizes for the treatments do not differ hugely
(Okay to deviate slightly from these assumptions for larger sample sizes)
Works by analyzing how much of the total variance is due to differences within groups, and how much is due to differences across groups.
Procedure:
$H_0$: There is no difference in the population means across all treatments
Compute the F-statistic:
$F=\frac{\textrm{found variation of the group averages}}{\textrm{expected variation of the group averages}}$
(don’t do this by hand!)
If $H_0$ is true, we would expect $F$ to be close to 1; values much larger than 1 are evidence against $H_0$
Note: ANOVA tells you whether there is a significant difference, but does not tell you which treatment(s) are different.
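The procedure above can be sketched with `scipy.stats.f_oneway`. The three treatment samples are made up for illustration. The manual calculation uses the usual between-group over within-group mean-square ratio, which is one common way to write the F-statistic and is assumed here to be what the notes intend.

```python
import numpy as np
from scipy import stats

# Made-up measurements of one dependent variable under three treatments.
treatment_a = np.array([20.1, 21.3, 19.8, 20.7, 21.0])
treatment_b = np.array([22.4, 23.1, 21.9, 22.8, 23.5])
treatment_c = np.array([20.3, 19.9, 21.1, 20.6, 20.0])

# One-way ANOVA; H0: all treatment population means are equal.
f_stat, p_value = stats.f_oneway(treatment_a, treatment_b, treatment_c)

# The same F-statistic computed step by step.
groups = [treatment_a, treatment_b, treatment_c]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of treatments
n_total = all_values.size  # total number of observations

# Variation of the group averages around the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group average.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))

print(f"scipy:  F = {f_stat:.3f}, p = {p_value:.5f}")
print(f"manual: F = {f_manual:.3f}")
```

A small p-value only says that at least one treatment mean differs; finding which one requires a follow-up (post-hoc) comparison.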
$\chi^2$ Test
“ANOVA for non-interval data”
Given: data in an $n \times m$ frequency table (e.g. $n$ treatments, $m$ variables)
Assumes:
- Non-parametric, hence no assumption of normality
- Reasonable sample size (preferably >50, although some say >20)
- Reasonable numbers in each cell
Calculates whether the data fit a given distribution
Basis: sums the squared Observed-Expected differences, each divided by the Expected value
Calculate an expected value (mean) for each column
Calculate $\chi^2$:
$\chi^2 = \sum_{i=1}^{n}\frac{(O_i-E_i)^2}{E_i}$
where $O_i$ is an observed frequency and $E_i$ is the expected frequency asserted by the null hypothesis
Compare with lookup value for a given significance level and ded
Get to know these and others: https://docs.scipy.org/doc/scipy/reference/stats.html
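A minimal sketch of the $\chi^2$ calculation on a frequency table, assuming `scipy.stats.chi2_contingency` is used (it tests independence of rows and columns and derives the expected counts from the row and column totals). The counts are invented for illustration.

```python
import numpy as np
from scipy import stats

# Made-up 2 x 3 frequency table (rows: treatments, columns: outcome categories).
observed = np.array([[30, 10, 20],
                     [20, 25, 15]])

# Returns the statistic, the p-value, the degrees of freedom and the
# expected frequencies under H0 (rows and columns independent).
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)

# Same statistic from the formula: sum over all cells of (O - E)^2 / E.
chi2_manual = ((observed - expected) ** 2 / expected).sum()

print(f"chi2 = {chi2_stat:.3f}, dof = {dof}, p = {p_value:.4f}")
print(f"manual chi2 = {chi2_manual:.3f}")
print("expected counts:\n", expected.round(2))
```

The p-value replaces the table lookup: reject $H_0$ when it falls below the chosen significance level. To compare observed counts against a fully specified distribution instead, `stats.chisquare(f_obs, f_exp)` is the analogous call.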
What about ordinal correlation?
Spearman’s Rank Coefficient $\rho$:
Convert each variable into a ranked list
Compute:
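$\rho=1-\frac{6\sum{(x_i-y_i)^2}}{n(n^2-1)}$
where $x_i$ and $y_i$ are the ranks of observation $i$ in the two lists and $n$ is the number of observations (this form assumes no tied ranks)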
Kendall’s $\tau$
$\tau=\frac{(\textrm{num. concordant ranked pairs})-(\textrm{num. discordant ranked pairs})}{\binom{n}{2}}$
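Both rank coefficients can be checked against `scipy.stats.spearmanr` and `scipy.stats.kendalltau`. The sketch below uses made-up data without ties, so the simple formulas above apply directly. With ties, the library functions apply corrections and the manual values would differ.

```python
from itertools import combinations
from math import comb

import numpy as np
from scipy import stats

# Made-up paired observations with no tied values.
x = np.array([86, 97, 99, 100, 101, 103, 106, 110, 112, 113])
y = np.array([2, 20, 28, 27, 50, 29, 7, 17, 6, 12])
n = len(x)

# Spearman: convert each variable to ranks, then apply the formula.
rank_x = stats.rankdata(x)
rank_y = stats.rankdata(y)
rho_manual = 1 - 6 * np.sum((rank_x - rank_y) ** 2) / (n * (n ** 2 - 1))
rho_scipy, rho_p = stats.spearmanr(x, y)

# Kendall: count concordant and discordant pairs over all n-choose-2 pairs.
concordant = 0
discordant = 0
for i, j in combinations(range(n), 2):
    direction = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    if direction > 0:
        concordant += 1
    elif direction < 0:
        discordant += 1
tau_manual = (concordant - discordant) / comb(n, 2)
tau_scipy, tau_p = stats.kendalltau(x, y)

print(f"Spearman rho: manual = {rho_manual:.4f}, scipy = {rho_scipy:.4f}")
print(f"Kendall tau:  manual = {tau_manual:.4f}, scipy = {tau_scipy:.4f}")
```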