How to lie with Statistics – Tim Weninger, PhD

P-Hacking

Parametric methods

The Normal distribution is not always appropriate

Non-parametric methods are not based on the same assumptions as parametric methods, but they still have some assumptions of their own.

Situation	Non-parametric method	Parametric method
One sample	Wilcoxon signed-rank test	One-sample t-test
Two independent samples	Mann-Whitney U test	t-test for two independent samples
Two paired samples	Wilcoxon signed-rank test	Paired t-test
One sample, two quantitative variables	Spearman's rank correlation coefficient	Pearson's correlation coefficient

ANOVA: generalization of the $t$-test to more than two treatments

Given: $n$ experimental treatments, one dependent variable

Assumes:

the dependent variable is normally distributed within each treatment
the variances of the treatments are similar
the sample sizes of the treatments do not differ hugely
(Slight deviations from these assumptions are acceptable for larger sample sizes)

Works by analyzing how much of the total variance is due to differences within groups, and how much is due to differences across groups.
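The partition described above can be checked numerically. The sketch below is plain Python; the function name `sum_of_squares` and the toy data are illustrative assumptions, not part of the lecture. It splits the total sum of squares into a between-group part and a within-group part and shows that the two add up to the total.

```python
from statistics import fmean

def sum_of_squares(groups):
    """Split the total sum of squares into between-group and within-group parts."""
    pooled = [x for g in groups for x in g]
    grand_mean = fmean(pooled)
    # Total: every observation's squared distance from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in pooled)
    # Between: each group mean's squared distance from the grand mean, weighted by group size
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within: every observation's squared distance from its own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return ss_total, ss_between, ss_within

# Hypothetical data: three treatments with means 5, 7 and 9
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
total, between, within = sum_of_squares(groups)
print(total, between, within)
```

For these toy groups it prints `30.0 24.0 6.0`: most of the variation (24 of 30) comes from differences between the group means rather than from spread inside each group.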

Procedure:

$H_0$: There is no difference in the population means across all treatments
Compute the F-statistic:

$F = \frac{\textrm{found variation of the group averages}}{\textrm{expected variation of the group averages}}$, i.e. between-group variability divided by within-group variability
(don’t do this by hand!)

If $H_0$ is true, we would expect $F \approx 1$; values of $F$ much larger than 1 are evidence against $H_0$
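As a rough sketch of this procedure for a one-way design, the helper below computes $F$ as the between-group mean square divided by the within-group mean square, then simulates data where $H_0$ holds (three treatments drawn from the same normal distribution) to show that $F$ averages close to 1. The name `f_statistic`, the seed and the simulation sizes are hypothetical choices for illustration; in practice a library routine such as `scipy.stats.f_oneway` does the computation for you.

```python
import random
from statistics import fmean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)                          # number of treatments
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)        # "found" variation of the group averages
    ms_within = ss_within / (n_total - k)    # variation expected from noise alone
    return ms_between / ms_within

# Simulate H0: three treatments of 10 observations, all from the same N(0, 1)
rng = random.Random(42)
f_values = [
    f_statistic([[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(3)])
    for _ in range(2000)
]
print(round(fmean(f_values), 2))
```

For 3 groups of 10 the theoretical mean of $F$ under $H_0$ is $27/25 = 1.08$ (the mean of an $F(2, 27)$ distribution), so the simulated average should land near that. Groups with clearly different means, such as `[[4, 5, 6], [6, 7, 8], [8, 9, 10]]`, give $F = 12$.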

Note: ANOVA tells you whether there is a significant difference, but does not tell you which treatment(s) are different.

Chi-squared ($\chi^2$) test: “ANOVA for non-interval data”

Given: data in an $n \times m$ frequency table (e.g. $n$ treatments, $m$ outcome categories)

Assumes:

Tests whether the observed frequencies fit a given (expected) distribution

Basis: sums the squared differences between observed and expected frequencies, each divided by the expected frequency

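Calculate an expected frequency for each cell under $H_0$; for a contingency table, $E = \frac{(\textrm{row total}) \times (\textrm{column total})}{\textrm{grand total}}$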

Calculate $\chi^2$:

$\chi^2 = \sum_{i}\frac{(O_i-E_i)^2}{E_i}$

where the sum runs over every cell $i$ of the table, $O_i$ is the observed frequency, and $E_i$ is the expected frequency asserted by the null hypothesis

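Compare with the lookup (critical) value for the chosen significance level and degrees of freedom, $(n-1)(m-1)$ for an $n \times m$ table; reject $H_0$ if $\chi^2$ is larger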
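The steps above can be sketched in plain Python, assuming a table of raw counts. The helper names are hypothetical, the example counts are made up, and the critical values are hard-coded from a standard $\chi^2$ table at the 5% level for 1 to 5 degrees of freedom rather than computed from the distribution.

```python
# Upper 5% critical values of the chi-squared distribution, keyed by degrees of freedom
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def contingency_test(table):
    """Chi-squared test of independence for an n x m table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    observed = [o for row in table for o in row]          # row-major order
    # Expected count under H0: row total * column total / grand total (row-major too)
    expected = [r * c / grand for r in row_totals for c in col_totals]
    stat = chi_square(observed, expected)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Made-up 2 x 2 table: 2 treatments, 2 outcome categories
stat, dof = contingency_test([[10, 20], [20, 10]])
print(round(stat, 3), dof, stat > CRITICAL_05[dof])

# Goodness of fit: made-up counts from 60 die rolls vs. 10 expected per face (dof = 6 - 1)
die_stat = chi_square([5, 8, 9, 8, 10, 20], [10] * 6)
print(round(die_stat, 3), die_stat > CRITICAL_05[5])
```

The 2 × 2 example gives $\chi^2 \approx 6.667$ with 1 degree of freedom, above the 3.841 cutoff, so $H_0$ is rejected at the 5% level. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so its statistic for the same table would be smaller.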

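Spearman's $\rho$ and Kendall's $\tau$ (rank correlation)

Convert each variable into a ranked list
Compute:

$\rho=1-\frac{6\sum_{i=1}^{n}(x_i-y_i)^2}{n(n^2-1)}$

$\tau=\frac{(\textrm{num. concordant ranked pairs})-(\textrm{num. discordant ranked pairs})}{\binom{n}{2}}$

where $x_i$ and $y_i$ are the ranks of the $i$-th observation on the two variables and $n$ is the number of observations; the $\rho$ formula assumes there are no tied ranks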
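A minimal sketch of both rank correlations, assuming no tied values (the simplified $\rho$ formula and the plain concordant/discordant count are only exact without ties). The function names and data are illustrative; `scipy.stats.spearmanr` and `scipy.stats.kendalltau` are the usual library routines and also handle ties.

```python
from itertools import combinations
from math import comb

def ranks(values):
    """Rank values 1..n in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n(n^2 - 1)), with d_i the rank difference."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def kendall_tau(x, y):
    """tau = (concordant - discordant) / C(n, 2) over all pairs of observations."""
    rx, ry = ranks(x), ranks(y)
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # A pair is concordant if both rankings order it the same way
        sign = (rx[i] - rx[j]) * (ry[i] - ry[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / comb(len(x), 2)

x = [10, 20, 30, 40, 50]
y = [1.2, 3.4, 2.5, 5.0, 4.1]
print(spearman_rho(x, y), kendall_tau(x, y))
```

Here the ranks of `y` are `[1, 3, 2, 5, 4]`, i.e. two adjacent pairs are swapped relative to `x`: the squared rank differences sum to 4, giving $\rho = 1 - 24/120 = 0.8$, and 8 of the 10 pairs are concordant, giving $\tau = (8 - 2)/10 = 0.6$.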