---
title: "Session B - How do I make inferential statistics?"
author: <INSERT YOUR NAME>
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Throughout this session we will need the tidyverse. We will also continue to work with the `cats` data from the MASS package. Go ahead and load these in.

```{r Exercise0, message=FALSE}
library(tidyverse)
library(MASS)
data("cats")
```

-------------

When we conduct inferential statistics, we ask the same types of questions as in descriptive statistics, but we check the answers in a more mathematically rigorous manner. For example, to see whether the mean body weight differs between the male and female cats, we can conduct a t-test. Have a look at the following t-test for the body weights.

```{r T-TestBody}
cats_male <- cats %>%
  filter(Sex == 'M')

cats_female <- cats %>%
  filter(Sex == 'F')

t.test(cats_male$Bwt, cats_female$Bwt)
```
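
As an aside, `t.test` also has a formula interface, which avoids splitting the data by hand. The chunk below is a minimal sketch of that alternative on the same body weight data; the object names `split_fit` and `formula_fit` are made up for illustration. Because the groups are ordered by factor level (F before M), the sign of the t statistic is reversed relative to the call above, while the p-value is unchanged.

```{r tTestFormulaSketch}
# Sketch only: formula interface to the same Welch t-test on body weight.
data("cats", package = "MASS")

# The two-vector call used above (male first, then female)
split_fit <- t.test(cats$Bwt[cats$Sex == "M"], cats$Bwt[cats$Sex == "F"])

# The formula call: response ~ grouping factor
formula_fit <- t.test(Bwt ~ Sex, data = cats)

# Groups are ordered by factor level (F, then M), so the sign of t flips
c(split = unname(split_fit$statistic), formula = unname(formula_fit$statistic))

# The p-values of the two calls agree
c(split = split_fit$p.value, formula = formula_fit$p.value)
```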

### Exercise 1

Is there a difference between the mean heart weights of the male and female cats? Conduct a t-test to check this.
[Hint: you do not need to recreate the data frames `cats_male` and `cats_female`.]

```{r T-TestHeart}

```

-------------

Did you notice that R performed the Welch t-test by default? Based on what we have learnt, why might this be a good thing?

### Exercise 2

Repeat the test using Student's two-sample t-test instead of the Welch version. To do this you may need to look at the arguments of the `t.test` function using `?t.test`. Compare the confidence intervals of the two tests: which one is more conservative? What can we conclude about the difference in mean heart weights?

```{r T-TestHeart2}

```

-------------

### Exercise 3

Formally write out the hypotheses of the tests above, including a rigorous statistical conclusion.

-------------

We did all this on the assumption that the sample mean of the data is (approximately) normally distributed.
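
To get a feel for this assumption, the chunk below is an informal simulation sketch. It does not use the cats data: it draws many samples from a deliberately skewed distribution and plots the mean of each one. The seed, the number of samples, the sample size, and the choice of an exponential distribution are all arbitrary assumptions made for illustration.

```{r samplingMeanSketch, fig.width=3, fig.height=3}
# Sketch with simulated data (not the cats data): means of skewed samples.
set.seed(1)      # arbitrary seed, for reproducibility
n_reps <- 2000   # number of simulated samples (assumption)
n_obs  <- 50     # observations per sample (assumption)

# Each sample is drawn from a right-skewed exponential distribution (mean 1)
sample_means <- replicate(n_reps, mean(rexp(n_obs, rate = 1)))

# Histogram of the simulated sample means
hist(sample_means, breaks = 30, main = "", xlab = "Sample mean")
```

Try changing `n_obs` and see how the shape of the histogram responds.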

### Exercise 4

Which mathematical theorem allows us to make this assumption? Write it down.

-------------

### Exercise 5

Next, we will try a test of association. We will look at the relationship between cats' body weights and heart weights. First of all, do you expect there to be a relationship between the two?


-------------

### Exercise 6

Take a look at the plot below. How would you describe the relationship between cats' body weights and heart weights? Does this align with your answer above? Specifically, are they positively correlated (if one increases, the other also increases), negatively correlated (if one increases, the other decreases), or neither?

```{r corTestPlot, fig.width=3, fig.height=3}
ggplot(cats, aes(x=Bwt, y=Hwt)) +
  geom_point() +
  labs(x="Cat body weight (kg)", y="Cat heart weight (g)")
```

-------------

### Exercise 7

A commonly used measure of linear correlation is Pearson's correlation coefficient (PCC), or Pearson's r, which ranges from -1 to 1. If r = 1, then the two variables are perfectly positively correlated. If r = -1, then they are perfectly negatively correlated. In R, the function `cor.test` computes r and tests whether it differs from zero. Run `cor.test` on the cats' body weight and heart weight data to perform a correlation test.

```{r corTest}

```
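
To illustrate the two extreme values of Pearson's r mentioned above, here is a small sketch on toy vectors. The vectors `x`, `y_up`, and `y_down` are made up for illustration and are not part of the cats data. It uses `cor`, which returns only the coefficient, rather than `cor.test`.

```{r pearsonToySketch}
# Toy vectors (made up for illustration, not the cats data)
x      <- c(1, 2, 3, 4, 5)
y_up   <- 2 * x + 1    # exact increasing linear function of x
y_down <- 10 - 3 * x   # exact decreasing linear function of x

cor(x, y_up)     # perfectly positively correlated
cor(x, y_down)   # perfectly negatively correlated

# Pearson's r written out: covariance divided by the product of the SDs
cov(x, y_up) / (sd(x) * sd(y_up))
```

Real data such as the cats measurements will almost never give exactly 1 or -1, since the points rarely fall on a perfect straight line.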

-------------

### Exercise 8

Formally write out the hypotheses of the test above (Hint: the alternative hypothesis is formulated for you in the `cor.test` results), including a rigorous statistical conclusion.