This is a study of which factors influence a newborn's birth weight. The dataset contains the fields listed below; after reading the data into R, we recode several of the variables.
#ID    : Identification code
#LOW   : Low birth weight baby (1 = Yes, under 2500 g; 0 = No)
#AGE   : Mother's age in years
#LWT   : Mother's weight (in pounds) at last menstrual period
#RACE  : 1 = White, 2 = Black, 3 = Other
#SMOKE : Smoked during pregnancy (1 = Yes, 0 = No)
#PTL   : History of premature labour (number of times)
#HT    : History of hypertension (1 = Yes, 0 = No)
#UI    : Presence of uterine irritability (1 = Yes, 0 = No)
#FTV   : Number of physician visits during the first trimester
#BWT   : Baby's birth weight in grams
setwd("D:/Data analysis/R/Blog/Stepwise")
lowbwt <- read.csv("lowbwt.csv", header = T)
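## Assumption, not part of the original post: if lowbwt.csv is not available,
## the same Hosmer-Lemeshow low-birth-weight data ships with the MASS package
## as `birthwt`; renaming its (lowercase) columns gives a drop-in substitute.
# library(MASS)
# lowbwt <- birthwt
# names(lowbwt) <- c("LOW","AGE","LWT","RACE","SMOKE","PTL","HT","UI","FTV","BWT")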
##Recoding
lowbwt = within(lowbwt,{
##Relabel race
race = factor(RACE , levels = 1:3, labels = c("White","Black","Other"))
##Categorize ftv
ftv = cut(FTV, breaks = c(-Inf,0,2,Inf), labels = c("None","Normal","Many"))
ftv = relevel(ftv, ref = "Normal")
##Dichotomize ptl
ptl = factor(PTL >=1, levels = c(FALSE, TRUE), labels = c("0","1+"))
##Categorize smoke, ht, ui
smoke = factor(SMOKE, levels = 0:1, labels = c("NO","YES"))
ht = factor(HT, levels = 0:1, labels = c("NO","YES"))
ui = factor(UI, levels = 0:1, labels = c("NO","YES"))
})
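As a quick sanity check (not in the original post), we can confirm that the recoding produced the intended factor levels before moving on:

##Inspect the recoded factors and their levels
str(lowbwt[, c("race", "ftv", "ptl", "smoke", "ht", "ui")])
table(lowbwt$ftv)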
➤Step 2: Build the linear models
To set up the stepwise regression, we first build two linear models with the newborn's birth weight (BWT) as the response variable: a full model that includes all of the predictors, and a null model that includes none of them. We then plot the effects of the full model to get a sense of how each predictor relates to the response.
##Create the full and null linear models
lm.fullmodel = lm(BWT~AGE + LWT + race + smoke + ptl + ht + ui + ftv, data = lowbwt)
lm.nullmodel = lm(BWT~1,data = lowbwt)
library(effects)  # allEffects() is provided by the effects package
plot(allEffects(lm.fullmodel), ylim = c(2000, 4000))
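Before starting the selection, an overall F-test (an optional check, not in the original post) compares the null and full models and tells us whether the predictors jointly explain any of the variation in BWT:

anova(lm.nullmodel, lm.fullmodel)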
➤Step 3: Manual F-test-based backward selection
For the first stepwise method we use F-test-based backward selection: we start from the full model and drop the least significant predictor one step at a time, as follows.
drop1(lm.fullmodel, test = "F")                              # AGE is the least significant term
drop1(update(lm.fullmodel, ~. -AGE), test = "F")             # next, ftv
drop1(update(lm.fullmodel, ~. -AGE -ftv), test = "F")        # next, ptl
drop1(update(lm.fullmodel, ~. -AGE -ftv -ptl), test = "F")   # all remaining terms are significant
summary(update(lm.fullmodel, ~. -AGE -ftv -ptl))
formula(update(lm.fullmodel, ~. -AGE -ftv -ptl))
[Console output] After dropping AGE, the next least significant predictor is ftv.
[Console output] After also dropping ftv, the next least significant predictor is ptl.
[Console output] All of the non-significant predictors have now been removed!
[Console output] Finally, we inspect the results of the model.
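The same backward procedure can also be automated. The following is a minimal sketch (not from the original post, which works through the steps by hand); it repeatedly calls drop1() and removes the term with the largest p-value until every remaining term is significant at an assumed 0.05 cutoff:

backward_F = function(model, alpha = 0.05) {
  repeat {
    tab = drop1(model, test = "F")                      # F-tests for dropping each term
    pvals = setNames(tab[["Pr(>F)"]], rownames(tab))
    pvals = pvals[!is.na(pvals)]                        # the "<none>" row has NA
    if (length(pvals) == 0 || max(pvals) <= alpha) break
    worst = names(which.max(pvals))                     # least significant term
    model = update(model, as.formula(paste("~ . -", worst)))
  }
  model
}
formula(backward_F(lm.fullmodel))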
➤Step 4: Conclusion
Below is the result of the F-test-based backward selection. The R console shows that the selected model's R-squared is only 0.2403, so its explanatory power is not very high. There may be several contributing factors, such as collinearity or insufficient data collection, that are worth exploring later, but this serves as a simple exercise in running stepwise regression with R!
Original full model: BWT ~ AGE + LWT + race + smoke + ptl + ht + ui + ftv
Stepwise regression result: BWT ~ LWT + race + smoke + ht + ui
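As a follow-up to the collinearity concern, one quick check (a sketch that is not part of the original post) is to compute the generalized variance inflation factors of the selected model with the car package:

install.packages("car")
library(car)
lm.selected = update(lm.fullmodel, ~. -AGE -ftv -ptl)
vif(lm.selected)   # GVIF values close to 1 suggest little collinearity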
➤The R code for several other stepwise model-building approaches is also provided below; you can try the different methods and see how the selected model and its explanatory power change.
- Manual F-test-based backward selection
- Manual F-test-based forward selection
- Automated F-test-based backward selection using rms::fastbw()
- AIC-based backward selection
- AIC-based forward selection
- AIC-based forward-backward selection
##Manual F-test-based backward selection
drop1(lm.fullmodel, test = "F")
drop1(update(lm.fullmodel, ~. -AGE),test ="F")
drop1(update(lm.fullmodel,~.-AGE-ftv),test = "F")
drop1(update(lm.fullmodel,~.-AGE-ftv-ptl),test = "F")
summary(update(lm.fullmodel,~.-AGE-ftv-ptl))
formula(lm.fullmodel)
formula(update(lm.fullmodel,~.-AGE-ftv-ptl))
##Manual F-test-based forward selection
add1(lm.nullmodel,scope = ~AGE + LWT + race + smoke + ptl + ht + ui + ftv,test = "F" )
add1(update(lm.nullmodel,~.+ui),scope = ~AGE + LWT + race + smoke + ptl + ht + ui + ftv,test = "F" )
add1(update(lm.nullmodel,~.+ui+race),scope = ~AGE + LWT + race + smoke + ptl + ht + ui + ftv,test = "F" )
add1(update(lm.nullmodel,~.+ui+race+smoke),scope = ~AGE + LWT + race + smoke + ptl + ht + ui + ftv,test = "F" )
add1(update(lm.nullmodel,~.+ui+race+smoke+ht),scope = ~AGE + LWT + race + smoke + ptl + ht + ui + ftv,test = "F" )
add1(update(lm.nullmodel,~.+ui+race+smoke+ht+LWT),scope = ~AGE + LWT + race + smoke + ptl + ht + ui + ftv,test = "F" )
summary(update(lm.nullmodel,~.+ui+race+smoke+ht+LWT))
formula(update(lm.nullmodel,~.+ui+race+smoke+ht+LWT))
#Automated F-test-based backward selection using rms::fastbw()
install.packages("rms")
library(rms)
ols.fullmodel = ols(BWT~AGE + LWT + race + smoke + ptl + ht + ui + ftv, data = lowbwt)
fastbw(ols.fullmodel, rule = "p" , sls = 0.1)
#AIC-based backward selection
model.AIC.backward = step(lm.fullmodel,direction = "backward",trace = 1)
summary(model.AIC.backward)
formula(model.AIC.backward)
#AIC-based forward selection
model.AIC.forward = step(lm.nullmodel,direction = "forward",trace = 1,scope = ~AGE + LWT + race + smoke + ptl + ht + ui + ftv)
summary(model.AIC.forward)
formula(model.AIC.forward)
#AIC-based forward-backward selection
model.AIC.both = step(lm.nullmodel,direction = "both",trace = 1,scope = ~AGE + LWT + race + smoke + ptl + ht + ui + ftv)
summary(model.AIC.both)
formula(model.AIC.both)
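To see how the different AIC-based procedures compare (an optional summary, not part of the original post), the selected models can be lined up side by side:

models = list(backward = model.AIC.backward,
              forward  = model.AIC.forward,
              both     = model.AIC.both)
sapply(models, function(m) c(AIC = AIC(m), R.squared = summary(m)$r.squared))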