---
title: "PSY 392 - HW2"
output: html_document
---
Please read everything carefully!!!

*** Due Friday, September 13th at 5:00 PM via email to Garrett (goday@purdue.edu)

*** Submit this R markdown file but change the name of the file to lastName_FirstName_PSY392_HW2

Rmarkdown:
This document is an R Markdown file and has the file extension ".Rmd".

"R Markdown is a file format for making dynamic documents with R. An R Markdown document is written in markdown (an easy-to-write plain text format) and contains chunks of embedded R code, like the document below."

To add code you can click on 'Code' in the top menu bar and select 'Insert Chunk'. 
Alternatively, you can type `` ```{r} `` to start a code chunk and `` ``` `` to close a code chunk.

```{r}
# this is an example code chunk. 
# Anything between the opening and closing fence lines of the chunk is considered code.
2+2
```

Rmarkdown has three major benefits for this class:
1) I do not have to use comments outside of a code chunk, making the document more readable.
2) The output of a code chunk is presented below it. This allows me to see your exact output.
3) Please write all of your answers within a code chunk. This makes grading easier.


Homework Overview:
The purpose of this homework is to continue supporting your learning of R, expose you to Rmarkdown, and provide an opportunity for you to use and apply your knowledge about the normal distribution. 

Question 1.
What are the unique characteristics of the standard normal distribution?
```{r}
# provide your answer as a comment in this code chunk

# The standard normal distribution has a mean of 0 and an SD of 1 (2 points)

```

The online calculators will help you with this homework.

The Normal Distribution Calculator (https://introstatsonline.com/chapters/calculators/normal_dist.shtml) 

The Inverse Normal Distribution Calculator (https://introstatsonline.com/chapters/calculators/inverse_normal_dist.shtml)
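
If you would rather check your calculator results in R, here is a minimal sketch using base R's `pnorm()` (area below a value) and `qnorm()` (value for a given area below). The variable names `area_below` and `z_cutoff` are just illustrative choices, not something the homework requires.

```{r}
# pnorm() returns the area below a value, like the Normal Distribution Calculator.
# With no mean or sd given, it assumes the standard normal (mean = 0, sd = 1).
area_below <- pnorm(1.96)
area_below   # about .975

# qnorm() goes the other direction, like the Inverse Normal Distribution Calculator:
# give it an area below and it returns the matching value.
z_cutoff <- qnorm(0.975)
z_cutoff     # about 1.96
```

Both functions also accept `mean` and `sd` arguments, so the same approach works for any normal distribution.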

Question 2.
We know that for a standard normal distribution, the shaded area above 0 is .5. 
How does increasing the standard deviation affect the shaded area above 0?
```{r}
# provide your answer as a comment in this code chunk

# (1 point)
# A normal distribution is symmetrical with half of the distribution above the mean and half below. 
# Increasing the standard deviation does not change this.
# You can also just plug this into the online calculator.

```
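
One way to check the Question 2 reasoning in R, as a sketch: `pnorm()` with `lower.tail = FALSE` gives the area above a value. The particular standard deviations in `sds` are arbitrary values picked for illustration.

```{r}
# Try several standard deviations while keeping the mean at 0
sds <- c(1, 2, 5, 10)

# Area above 0 for each standard deviation
area_above_zero <- pnorm(0, mean = 0, sd = sds, lower.tail = FALSE)
area_above_zero   # .5 every time, because 0 is the mean of each distribution
```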


Question 3.
For a standard normal distribution, what is the area between -1 and 1 SDs?
```{r}
# provide your answer as a comment in this code chunk

# (1 point)
# Use the online calculator. Set mean 0 and sd 1. Click between -1 and 1.
# = .6827

```
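
The same area can be found in R by subtracting two `pnorm()` results. This is only a sketch of an alternative to the online calculator; `area_within_1sd` is an illustrative name.

```{r}
# Area below +1 SD minus area below -1 SD on the standard normal
area_within_1sd <- pnorm(1) - pnorm(-1)
round(area_within_1sd, 4)   # .6827
```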

Question 4.
Suppose that an intro to chemistry class has 500 students.
They all take a midterm and their scores are normally distributed.
The professor says anyone who scores at least 1.75 standard deviations above the mean will get an A.
How many students will receive an A on their midterm? (note the unit of interest)
```{r}
# provide your work in this code chunk

# (1 point)
# Find the area above 1.75 = .0401
# So, about 4% of students will be above this critical value 
# = .0401 * 500 = 20.05 = 20 students


```
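
Here is a sketch of the Question 4 arithmetic in R rather than with the online calculator. The names `n_students`, `prop_a`, and `expected_a` are illustrative; because the cutoff is already in standard deviation units, the standard normal can be used directly.

```{r}
n_students <- 500

# Proportion of scores at least 1.75 SDs above the mean (upper tail)
prop_a <- pnorm(1.75, lower.tail = FALSE)
round(prop_a, 4)        # .0401

# Convert the proportion into a number of students (the unit of interest)
expected_a <- prop_a * n_students
round(expected_a)       # 20 students
```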

Question 5.
Suppose you have a normal distribution with a mean of 75 and a standard deviation of 15.
What percentile corresponds to a score of 100?
```{r}
# provide your work and/or explain your reasoning in this code chunk

# (1 point)
# Set mean to 75 and sd to 15 in online calculator
# Find area below 100 = .9522
# 95 percent of scores fall at or below the score of 100.
# Thus, 100 is the 95th percentile
```
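
As a sketch, the Question 5 percentile can also be computed in R by passing the mean and standard deviation to `pnorm()`. The variable name `area_below_100` is an illustrative choice.

```{r}
# Area below a score of 100 when mean = 75 and sd = 15
area_below_100 <- pnorm(100, mean = 75, sd = 15)
round(area_below_100, 4)      # .9522

# Express the area as a whole-number percentile
floor(area_below_100 * 100)   # 95
```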


Answer questions 6-10 using information from the following scenario.

A psychology professor graded a midterm for a class of 100 students on strict cut offs of:
90 - 100% = A
80 - 89.9% = B
70 - 79.9% = C
60 - 69.9% = D
< 60% = F

Imagine that the actual test scores were normally distributed with a mean of 85% and a standard deviation of 10%.

Question 6.
How many students received an A?
```{r}
# provide your answer as a comment in this code chunk
# be sure to explain your reasoning

# (1 point)
# Set mean to 85 and sd to 10
# Find the area above 90 as that is the cut off for As = .3085
# .3085 * 100 students = 30.85 = 31 students would be expected to get As

# You could also have found the area between 90 and 100; I graded both as correct.
# This results in 24 students
```

Question 7.
How many students received a B?
```{r}
# provide your answer as a comment in this code chunk
# be sure to explain your reasoning

# (1 point)
# area between 80 and 90 = .3829 * 100 students = 38.29 = 38 students

```

Question 8.
How many students received a C?
```{r}
# provide your answer as a comment in this code chunk
# be sure to explain your reasoning

# (1 point)
# area between 70 and 80 = .2417 * 100 students = 24.17 = 24 students

```

Question 9.
How many students received a D?
```{r}
# provide your answer as a comment in this code chunk
# be sure to explain your reasoning

# (1 point)
# area between 60 and 70 = .0606 * 100 students = 6 students
```

Question 10.
How many students failed the midterm?
```{r}
# provide your answer as a comment in this code chunk
# be sure to explain your reasoning

# (1 point)
# area below 60 = .0062 * 100 students = .62 = 1 student

# 31 As
# 38 Bs
# 24 Cs
# 6 Ds
# 1 F
```
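
All five grade counts for Questions 6-10 can be checked at once in R. This is a sketch, not the required method: it treats an A as anything at or above 90 and an F as anything below 60, and the names `cutoffs`, `cum_area`, and `grade_prop` are illustrative.

```{r}
# Grade boundaries, with -Inf and Inf standing in for the open-ended F and A ranges
cutoffs <- c(-Inf, 60, 70, 80, 90, Inf)

# Area below each boundary for mean = 85 and sd = 10
cum_area <- pnorm(cutoffs, mean = 85, sd = 10)

# diff() subtracts neighboring areas, giving the area inside each grade band
grade_prop <- diff(cum_area)
names(grade_prop) <- c("F", "D", "C", "B", "A")

# Expected number of students per grade in a class of 100
grade_counts <- round(grade_prop * 100)
grade_counts        # F = 1, D = 6, C = 24, B = 38, A = 31
sum(grade_counts)   # 100
```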

Question 11 - Computing Z scores in R.
R comes with a number of built-in data sets that people can explore.
When people ask for help online, answers often use these built-in data frames because every R user has them.
For this question, you will compute and plot z scores for the mtcars data frame.

```{r}
# create a dataframe (df) to contain mtcars
carsData <- mtcars
head(carsData) # peek at the first 6 rows of this data frame
carsData$mpg   # look at mpg column

# a) compute the mean mpg
mean(carsData$mpg) # 20.09 # (1 point)

# b) compute the sd mpg
sd(carsData$mpg) # 6.03 # (1 point)

# c) create a new column in the data frame that converts each mpg value into a z score
carsData$zscore <- (carsData$mpg - mean(carsData$mpg)) / sd(carsData$mpg) # (1 point)

# d) what are the units of this new column?
# z scores reflect standard deviations from the mean

# e) what is the mean of this new column?
mean(carsData$zscore) # effectively 0; any tiny nonzero value is floating-point rounding error, not a real difference from 0

# f) what is the standard deviation of this new column?
sd(carsData$zscore)

# g) create a visualization of the z score column
plot(carsData$wt, carsData$zscore,
     xlab = "Weight (1000 lbs)", ylab = "mpg (z score)")

# h) provide a brief description of your visualization (You could have plotted whatever you liked as long as it made some sense)
# I plotted the weight of the car on the x-axis and the standardized mpg on the y-axis.
# The majority of cars have mpg that are close to the mean.
# The lightest cars in this data set have MPG roughly 2 SDs above the mean.
# The heaviest cars in this data set have MPG more than 1 SD below the mean.
```
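
As a cross-check on part c, base R's `scale()` function also standardizes a vector. The sketch below compares it with the by-hand formula; `z_manual` and `z_scale` are illustrative names, and `as.numeric()` is used only to drop the matrix structure that `scale()` returns.

```{r}
# z scores by hand: subtract the mean, then divide by the standard deviation
z_manual <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)

# z scores using scale(), converted from a one-column matrix to a plain vector
z_scale <- as.numeric(scale(mtcars$mpg))

# The two approaches agree up to floating-point rounding
all.equal(z_manual, z_scale)   # TRUE

# The mean of the z scores is 0 up to rounding error, and the sd is 1
abs(mean(z_manual)) < 1e-12    # TRUE
all.equal(sd(z_manual), 1)     # TRUE
```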