learning resources

proof of the least-squares solution

https://math.stackexchange.com/questions/131590/derivation-of-the-formula-for-ordinary-least-squares-linear-regression
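The closed-form solution derived at that link, beta_hat = (X'X)^(-1) X'y, can be checked numerically against lm(). A minimal sketch (the data and variable names here are illustrative, not from the example below):

```r
# closed-form OLS: solve the normal equations (X'X) beta = X'y
set.seed(42)
x <- runif(20, min = 1, max = 10)
y <- 3 * x + 5 + rnorm(20)
X <- cbind(1, x)                        # design matrix with an intercept column
beta_hat <- solve(t(X) %*% X, t(X) %*% y)
beta_hat                                # should match coef(lm(y ~ x))
```

Using solve() on the normal equations is numerically less stable than the QR decomposition that lm() uses internally, but it follows the derivation directly.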

linear regression using the linear model (lm) function

how to obtain the coefficients (intercept and slope) of a linear model

# neither package is strictly required for the base-graphics code below
library(UsingR)
library(ggplot2)

# generate some fake data: outcome = beta * observed + intercept + noise
set.seed(1234)
beta <- 2
intercept <- 10
n <- 50
m <- 10
s <- 10
# note: the noise has mean m = 10, so the effective intercept is intercept + m = 20
noise <- rnorm(n, mean = m, sd = s)
observed <- runif(n = n, min = 1, max = 100)
outcome <- beta * observed + intercept + noise
fake <- data.frame(observed, outcome)

# plot the data
plot(fake$observed, fake$outcome,
     xlab = "observed (units)",
     ylab = "outcome (units)",
     bg = "lightblue",
     col = "black", cex = 1.1, pch = 21, frame = FALSE)


# calculate the coefficients of the linear model
fit <- lm(outcome ~ observed, data = fake)


# plot the regression line and the fitted (predicted) points
abline(fit, lwd = 2)
points(fake$observed, predict(fit), pch = 19, col = "red") 

# examine the coefficients and the generated model
fit

Call:
lm(formula = outcome ~ observed, data = fake)

Coefficients:
(Intercept)     observed  
     14.529        2.018  

coef(fit)
(Intercept)    observed 
  14.528576    2.017873 

summary(fit)

Call:
lm(formula = outcome ~ observed, data = fake)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.365  -5.018  -1.018   4.001  28.201 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  14.5286     2.5597   5.676 7.81e-07 ***
observed      2.0179     0.0423  47.703  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.926 on 48 degrees of freedom
Multiple R-squared:  0.9793,    Adjusted R-squared:  0.9789 
F-statistic:  2276 on 1 and 48 DF,  p-value: < 2.2e-16
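The quantities printed by summary() can also be extracted programmatically rather than read off the console. A sketch continuing from the fit above (the level and newdata values are arbitrary choices for illustration):

```r
# pull individual pieces out of the fitted model and its summary
confint(fit, level = 0.95)            # confidence intervals for intercept and slope
summary(fit)$coefficients             # matrix of estimate, std. error, t value, p value
summary(fit)$r.squared                # multiple R-squared
summary(fit)$sigma                    # residual standard error

# predicted outcome, with a confidence interval, at a chosen observed value
predict(fit, newdata = data.frame(observed = 50), interval = "confidence")
```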
# show the following 6 diagnostic plots:
# 1) residuals against fitted values (residuals should be uncorrelated with the fitted values),
# 2) a Scale-Location plot of sqrt(|residuals|) against fitted values,
# 3) a Normal Q-Q plot (the residuals (error terms) are assumed to be normally distributed for many tests),
# 4) Cook's distances versus row labels,
# 5) residuals against leverages, and
# 6) Cook's distances against leverage/(1 - leverage)
# see methods(plot) and then ?plot.lm for more
plot(fit, which = c(1:6))
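In an interactive session the call above pages through the six plots one at a time. One way to lay them out in a single device instead (a base-graphics sketch):

```r
# arrange all six diagnostic panels in a 2 x 3 grid, then restore the layout
par(mfrow = c(2, 3))
plot(fit, which = 1:6)
par(mfrow = c(1, 1))
```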