Data Science Specialization Course
Videos: search Brian Caffo on YouTube
Code: https://github.com/DataScienceSpecialization/courses
Notes: http://datasciencespecialization.github.io/
ISLR Videos and Notes https://www.dataschool.io/15-hours-of-expert-machine-learning-videos/
library(UsingR)
library(ggplot2)
# generate some fake data
set.seed(1234)
beta <- 2
intercept <- 10
n <- 50
m <- 10
s <- 10
# note: the noise has mean m (not 0), so the true intercept of the simulated line is intercept + m = 20
noise <- rnorm(n, mean = m, sd = s)
observed <- runif(n, min = 1, max = 100)
outcome <- beta * observed + intercept + noise
fake <- data.frame(observed, outcome)
# plot the data
plot(fake$observed, fake$outcome,
     xlab = "observed (units)",
     ylab = "outcome (units)",
     bg = "lightblue",
     col = "black", cex = 1.1, pch = 21, frame = FALSE)
# calculate the coefficients of the linear model
fit <- lm(outcome ~ observed, data = fake)
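# A minimal sketch of what lm() computes here, assuming the same seed and data-generating
# steps as above. For a single predictor the least-squares slope is cov(x, y) / var(x) and
# the intercept is mean(y) - slope * mean(x). The names slope_hat, intercept_hat and manual
# are illustrative only and do not appear in the original notes.

```r
# Regenerate the fake data (same seed and same order of random draws as above).
set.seed(1234)
n <- 50
noise <- rnorm(n, mean = 10, sd = 10)
observed <- runif(n, min = 1, max = 100)
outcome <- 2 * observed + 10 + noise
fake <- data.frame(observed, outcome)

# Closed-form least-squares estimates for simple linear regression.
slope_hat <- cov(fake$observed, fake$outcome) / var(fake$observed)
intercept_hat <- mean(fake$outcome) - slope_hat * mean(fake$observed)
manual <- c("(Intercept)" = intercept_hat, observed = slope_hat)

# Compare with lm(); the two should agree up to floating-point error.
fit <- lm(outcome ~ observed, data = fake)
print(manual)
print(all.equal(manual, coef(fit)))
```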
# plot the regression line and the fitted values
abline(fit, lwd = 2)
points(fake$observed, predict(fit), pch = 19, col = "red")
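# predict(fit) with no further arguments returns the fitted values for the training data.
# The sketch below, which assumes the same simulated data as above, shows predictions for
# new predictor values instead. new_obs is a hypothetical data frame invented for this
# example; interval = "prediction" adds lower and upper bounds for a single new outcome.

```r
# Regenerate the fake data (same seed and same order of random draws as above).
set.seed(1234)
n <- 50
noise <- rnorm(n, mean = 10, sd = 10)
observed <- runif(n, min = 1, max = 100)
outcome <- 2 * observed + 10 + noise
fake <- data.frame(observed, outcome)
fit <- lm(outcome ~ observed, data = fake)

# Hypothetical new predictor values; the column name must match the formula.
new_obs <- data.frame(observed = c(10, 50, 90))

# Matrix with columns fit (point prediction), lwr and upr (95% prediction interval).
pred <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)
print(pred)

# The point predictions are just intercept + slope * observed.
manual <- coef(fit)[1] + coef(fit)[2] * new_obs$observed
print(all.equal(unname(pred[, "fit"]), unname(manual)))
```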
# examine the coefficients and the generated model
fit
Call:
lm(formula = outcome ~ observed, data = fake)
Coefficients:
(Intercept)     observed
     14.529        2.018
coef(fit)
(Intercept)    observed
  14.528576    2.017873
summary(fit)
Call:
lm(formula = outcome ~ observed, data = fake)
Residuals:
    Min      1Q  Median      3Q     Max
-18.365  -5.018  -1.018   4.001  28.201
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  14.5286     2.5597   5.676 7.81e-07 ***
observed      2.0179     0.0423  47.703  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.926 on 48 degrees of freedom
Multiple R-squared:  0.9793, Adjusted R-squared:  0.9789
F-statistic:  2276 on 1 and 48 DF,  p-value: < 2.2e-16
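# The numbers in the summary can be recomputed from the residuals. The sketch below assumes
# the same simulated data as above and checks three of them: R-squared = 1 - RSS/TSS, the
# residual standard error = sqrt(RSS / residual df), and t value = Estimate / Std. Error.
# The names fit_summary, rss, tss, r2, sigma_hat and coef_table are illustrative only.

```r
# Regenerate the fake data (same seed and same order of random draws as above).
set.seed(1234)
n <- 50
noise <- rnorm(n, mean = 10, sd = 10)
observed <- runif(n, min = 1, max = 100)
outcome <- 2 * observed + 10 + noise
fake <- data.frame(observed, outcome)
fit <- lm(outcome ~ observed, data = fake)
fit_summary <- summary(fit)

# Residual and total sums of squares.
rss <- sum(residuals(fit)^2)
tss <- sum((fake$outcome - mean(fake$outcome))^2)

# R-squared and residual standard error, computed by hand.
r2 <- 1 - rss / tss
sigma_hat <- sqrt(rss / df.residual(fit))
print(all.equal(r2, fit_summary$r.squared))
print(all.equal(sigma_hat, fit_summary$sigma))

# Each t value is the estimate divided by its standard error.
coef_table <- coef(fit_summary)
print(all.equal(coef_table[, "t value"],
                coef_table[, "Estimate"] / coef_table[, "Std. Error"]))
```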
# show the following six diagnostic plots (numbered as in the `which` argument of plot.lm):
# 1) a plot of residuals against fitted values (residuals should show no systematic pattern against the fitted values),
# 2) a Normal Q-Q plot (residuals (error terms) are assumed to follow a normal distribution for many tests),
# 3) a Scale-Location plot of sqrt(| residuals |) against fitted values,
# 4) a plot of Cook's distances versus row labels,
# 5) a plot of residuals against leverages, and
# 6) a plot of Cook's distances against leverage/(1-leverage).
# see methods(plot), then ?plot.lm for more
plot(fit, which = 1:6)
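# The quantities behind these plots can also be pulled out directly. The sketch below assumes
# the same simulated data as above and uses hatvalues(), rstandard() and cooks.distance()
# from the stats package. It checks that Cook's distance equals
# r^2 * h / ((1 - h) * p), with r the standardized residual, h the leverage and p the number
# of coefficients, then draws all six plots in one 2 x 3 grid. The variable names and the
# grid layout are choices made for this example, not part of the original notes.

```r
# Regenerate the fake data (same seed and same order of random draws as above).
set.seed(1234)
n <- 50
noise <- rnorm(n, mean = 10, sd = 10)
observed <- runif(n, min = 1, max = 100)
outcome <- 2 * observed + 10 + noise
fake <- data.frame(observed, outcome)
fit <- lm(outcome ~ observed, data = fake)

# Leverage, standardized residuals and Cook's distance for every observation.
lev <- hatvalues(fit)
r_std <- rstandard(fit)
cooks <- cooks.distance(fit)

# The leverages sum to the number of estimated coefficients (here 2).
n_coef <- length(coef(fit))
print(sum(lev))

# Cook's distance rebuilt from standardized residuals and leverage.
cooks_manual <- r_std^2 * lev / ((1 - lev) * n_coef)
print(all.equal(cooks, cooks_manual))

# The three observations with the largest Cook's distance.
print(head(sort(cooks, decreasing = TRUE), 3))

# Draw all six diagnostic plots on one page, then restore the old layout.
old_par <- par(mfrow = c(2, 3))
plot(fit, which = 1:6)
par(old_par)
```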