https://colab.research.google.com/drive/1aFcVpTxy5XzfMx2toB7VRJOLW8uBAZQg?usp=sharing

**Parametric methods**

- They are based on means, standard deviations, or probabilities.

**The Normal distribution is not always appropriate**

- To study variables with few observations,
- Non-symmetrical distributions, or
- Variables that can have more than two values

## Non-parametric methods

They do not rely on the same assumptions as parametric methods, but they still make some assumptions of their own.

| Situation | Non-parametric method | Parametric method |
|---|---|---|
| One sample | Wilcoxon signed-rank test | $t$-test |
| Two independent samples | Mann-Whitney U test | $t$-test for two independent samples |
| Two paired samples | Wilcoxon signed-rank test | Paired $t$-test |
| One sample, two quantitative variables | Spearman's correlation coefficient | Pearson's correlation coefficient |

## Wilcoxon Signed Rank Test

## Analysis of Variance (ANOVA)

Generalization of the $t$-test for more than two treatments

**Given**: $n$ experimental treatments, one dependent variable

**Assumes**:

- the variables are normally distributed in each treatment
- the variances for the treatments are similar
- the sample sizes for the treatments do not differ hugely

(Okay to deviate slightly from these assumptions for larger sample sizes)

Works by analyzing how much of the total variance is due to differences within groups, and how much is due to differences across groups.

**Procedure**:

$H_0$: There is no difference in the population means across all treatments

Compute the F-statistic:

$F = \frac{\text{found variation of the group averages}}{\text{expected variation of the group averages}}$

(don't do this by hand!)

If $H_0$ is true, we would expect $F \approx 1$

**Note**: ANOVA tells you whether there is a significant difference, but does not tell you which treatment(s) are different.
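To make the ratio concrete, here is a minimal sketch of the F-statistic for a one-way ANOVA. It assumes the usual definition, the between-group mean square divided by the within-group mean square. The function name `one_way_f_statistic` and the toy data are illustrative and not from the notes.

```python
from statistics import mean


def one_way_f_statistic(groups):
    """Return the one-way ANOVA F-statistic for a list of samples."""
    k = len(groups)                          # number of treatments
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Variation of the group averages around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group average.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)        # degrees of freedom: k - 1
    ms_within = ss_within / (n_total - k)    # degrees of freedom: N - k
    return ms_between / ms_within


# Hypothetical data: three treatments with three observations each.
treatments = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_f_statistic(treatments)
print(f"F = {f_stat:.2f}")
```

In practice, `scipy.stats.f_oneway(*treatments)` returns both the F-statistic and its p-value, so a hand-rolled version like this is mainly useful for seeing where the number comes from.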

## $\chi^2$ Test

“ANOVA for non-interval data”

**Given**: data in an $n \times m$ frequency table (e.g. $n$ treatments, $m$ variables)

**Assumes**:

- Non-parametric, hence no assumption of normality
- Reasonable sample size (preferably >50, although some say >20)
- Reasonable numbers in each cell

Calculates whether the data fits a given distribution

**Basis**: sums the squared Observed-Expected differences, each scaled by the expected value

Calculate an expected value (mean) for each column

Calculate $\chi^2$:

$\chi^2 = \sum_{i=1}^{n}\frac{(O_i-E_i)^2}{E_i}$

where $O_i$ is an observed frequency and $E_i$ is the expected frequency asserted by the null hypothesis

Compare with the lookup value for a given significance level and degrees of freedom
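As a worked example of the $\chi^2$ sum, the sketch below computes the statistic for a small frequency table. It assumes the null hypothesis is independence of rows and columns, so each expected count is (row total × column total) / grand total. The helper names and the counts are made up for illustration.

```python
def expected_counts(observed):
    """Expected frequencies under independence of rows and columns."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 frequency table (rows: treatments, columns: outcomes).
observed = [[10, 20], [20, 10]]
expected = expected_counts(observed)
chi2_stat = chi_square_statistic(observed, expected)

# Degrees of freedom for an n x m table: (n - 1) * (m - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi2 = {chi2_stat:.3f}, dof = {dof}")
```

`scipy.stats.chi2_contingency(observed)` performs the same kind of test and also returns a p-value. Note that it applies a continuity correction to 2 × 2 tables by default, so its statistic will differ from the uncorrected sum shown here.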

Get to know these and others: https://docs.scipy.org/doc/scipy/reference/stats.html

## What about ordinal correlation?

#### Spearman’s Rank Coefficient $\rho$:

Convert each variable into a ranked list

Compute:

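$\rho=1-\frac{6\sum{(x_i-y_i)^2}}{n(n^2-1)}$

where $x_i$ and $y_i$ are the ranks of the $i$-th observation in each variable and $n$ is the number of observations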
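The two steps (rank, then apply the formula) can be sketched as follows. This version assumes there are no tied values, which is the case where the formula above is exact. The names `ranks` and `spearman_rho` and the data are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho from the squared differences between paired ranks."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical paired observations.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rho = spearman_rho(x, y)
print(f"rho = {rho:.2f}")
```

For data with ties, or when a p-value is needed, `scipy.stats.spearmanr(x, y)` is the more robust choice.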

#### Kendall’s $\tau$

$\tau=\frac{(\textrm{num. concordant ranked pairs})-(\textrm{num. discordant ranked pairs})}{\binom{n}{2}}$
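A minimal sketch of this count: every pair of observations is compared, and a pair is concordant when both variables order it the same way. The function name `kendall_tau` and the data are illustrative, and ties are assumed to be absent.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables order the pair the same way
        elif direction < 0:
            discordant += 1      # the variables order the pair oppositely
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical paired observations.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
tau = kendall_tau(x, y)
print(f"tau = {tau:.2f}")
```

`scipy.stats.kendalltau(x, y)` computes a tie-adjusted variant by default, which agrees with this formula when there are no ties, and also reports a p-value.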