
Steps for Setting Up an R Development Environment

Download and install R

R download

http://codedragon.tistory.com/1300

R install

http://codedragon.tistory.com/1301

Download and install RStudio

RStudio download

http://codedragon.tistory.com/1302

RStudio install

http://codedragon.tistory.com/1303


Run the downloaded installer.

Click Next through the setup wizard, then click Install, and click Finish when installation completes.

Confirm the installation in the program list.

On first launch you will see the RStudio start screen.
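To confirm that RStudio is wired to a working R installation, you can run a quick check in its console. A minimal sketch; the rstudioapi package is an assumption here (it may need install.packages("rstudioapi") first):

# The R interpreter RStudio is currently using
R.version.string

# RStudio's own version (assumes the rstudioapi package is available)
rstudioapi::versionInfo()$version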


RStudio

RStudio supports even more efficient big data analysis with R.

Through RStudio you can use the development tool's convenient features and a wide range of resources.

RStudio homepage

http://www.rstudio.com/

Download

From the top menu, go to Products > RStudio, then click the DOWNLOAD RSTUDIO DESKTOP button in the RStudio Desktop section.

   

   

http://www.rstudio.com/products/rstudio/download/

   

RStudio 0.98.1102 - Windows XP/Vista/7/8


Direct download

RStudio-0.98.1102.zip.001


RStudio-0.98.1102.zip.002


RStudio-0.98.1102.zip.003


RStudio-0.98.1102.zip.004


RStudio-0.98.1102.zip.005



Installing R-3.1.2

Run the downloaded installer.

Click OK at the first prompt, then click Next through each step of the setup wizard and click Finish at the end.

Confirm the new entry in the program menu.
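Once R starts, a quick sanity check from the console confirms the version and platform (a minimal sketch using only base R):

# Installed version string, e.g. "R version 3.1.2 (2014-10-31)"
R.version.string

# Platform, locale, and attached packages for the running session
sessionInfo()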


An Introduction to Statistical Learning

http://www-bcf.usc.edu/~gareth/ISL/

   

   

http://www-bcf.usc.edu/~gareth/ISL/book.html


Direct download

ISLR Fourth Printing.pdf



Error message

Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: inDL(x, as.logical(local), as.logical(now), ...)
  error: unable to load shared object 'C:/Users/yuriyuri/Documents/R/win-library/3.1/rJava/libs/x64/rJava.dll':
  LoadLibrary failure: The specified module could not be found.

Error: package 'rJava' could not be loaded

   

Solution 1

This error typically occurs when a 32-bit Java is installed on 64-bit Windows, i.e. when the Java architecture does not match the R architecture; the snippet below shows how to check both.
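A minimal diagnostic sketch, using only base R (the interpretation comments are the assumption here):

# 8 means 64-bit R, 4 means 32-bit R
.Machine$sizeof.pointer

# Architecture of the running R build, e.g. "x86_64"
R.version$arch

# Currently configured Java home, if any; on 64-bit R, an empty value or a
# path under "C:/Program Files (x86)/..." suggests 32-bit Java is being picked up
Sys.getenv("JAVA_HOME")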

   

   

Install a JRE or JDK whose bitness matches your R build:

http://codedragon.tistory.com/1093

   

Restart RStudio

   

   

   

Solution 2

Run the following command in the RStudio console:

# Point R at the matching JRE for this session (adjust the path and version to your installation)
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jre1.8.0_40")
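Note that Sys.setenv() only affects the current session; Solution 3 below makes it persistent. To verify the setting took effect and retry loading the package (the Java path above is an example):

# Should print the path that was just set
Sys.getenv("JAVA_HOME")

# Should now load without the LoadLibrary error
library(rJava)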

   

   

   

Solution 3

Create a .Rprofile file and add the code below (if the file already exists, edit it accordingly).

   

.Rprofile

# Set JAVA_HOME on every R startup (adjust the path to your Java installation)
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jre1.8.0_40")
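If you prefer to create or extend the file from within R itself, a minimal sketch (the Java path is an example; on Windows, Sys.getenv("HOME") usually points at your Documents folder, where R looks for .Rprofile):

# Append the JAVA_HOME setting to the user-level .Rprofile
cat('Sys.setenv(JAVA_HOME="C:/Program Files/Java/jre1.8.0_40")\n',
    file = file.path(Sys.getenv("HOME"), ".Rprofile"),
    append = TRUE)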

   

   

Restart RStudio


  1. herin 2015.06.16 12:21

    Thank you. Thanks to this post I solved the error.

  2. 노경모 2017.06.13 16:39

    Thank you. I kept wondering what the problem was, and this resolved it.

Up-to-date version:

http://codedragon.tistory.com/4976


   

R homepage

You can download R from The Comprehensive R Archive Network (CRAN), a globally connected network of mirror sites.

http://cran.rstudio.com/

   

Download

Download R for Windows

Click "install R for the first time".

Download R 3.1.2 for Windows

Direct download

R-3.1.2-win.zip.001


R-3.1.2-win.zip.002


R-3.1.2-win.zip.003


R-3.1.2-win.zip.004


R-3.1.2-win.zip.005


R-3.1.2-win.zip.006



R

  • Started by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand
  • A programming language for statistical computing and graphics
  • Licensed under the GNU GPL, so anyone may use it freely
  • Multi-platform support (Windows, Linux, OS X, etc.)
  • Source code can be reused without modification
  • Widely used in statistical research and visualization
  • Performs well not only in statistical computing and package development but anywhere substantial computation is needed (see the short example below)
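As a minimal sketch of that statistics-and-graphics focus, using only base R and a built-in dataset:

# Built-in dataset: car speed vs stopping distance
data(cars)

# Basic descriptive statistics
summary(cars$dist)

# Scatter plot with a fitted regression line (base graphics)
plot(cars, main = "Stopping Distance vs Speed")
abline(lm(dist ~ speed, data = cars), col = "red")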

   

   

R homepage

R, the big data analysis environment, is distributed through the official site of The R Foundation, operated by the R Development Core Team, where downloads and a wide range of information are available.

http://www.r-project.org/


An Introduction to Statistical Learning site

http://www-bcf.usc.edu/~gareth/ISL/

   

 

 

Introduction to Statistical Learning - YouTube

http://youtu.be/St2-97n7atk

 



An Introduction to Statistical Learning

 

 

Table of Contents

Preface vii

1 Introduction 1

2 Statistical Learning 15
2.1 What Is Statistical Learning? 15
2.1.1 Why Estimate f? 17
2.1.2 How Do We Estimate f? 21
2.1.3 The Trade-Off Between Prediction Accuracy and Model Interpretability 24
2.1.4 Supervised Versus Unsupervised Learning 26
2.1.5 Regression Versus Classification Problems 28
2.2 Assessing Model Accuracy 29
2.2.1 Measuring the Quality of Fit 29
2.2.2 The Bias-Variance Trade-Off 33
2.2.3 The Classification Setting 37
2.3 Lab: Introduction to R 42
2.3.1 Basic Commands 42
2.3.2 Graphics 45
2.3.3 Indexing Data 47
2.3.4 Loading Data 48
2.3.5 Additional Graphical and Numerical Summaries 49
2.4 Exercises 52

3 Linear Regression 59
3.1 Simple Linear Regression 61
3.1.1 Estimating the Coefficients 61
3.1.2 Assessing the Accuracy of the Coefficient Estimates 63
3.1.3 Assessing the Accuracy of the Model 68
3.2 Multiple Linear Regression 71
3.2.1 Estimating the Regression Coefficients 72
3.2.2 Some Important Questions 75
3.3 Other Considerations in the Regression Model 82
3.3.1 Qualitative Predictors 82
3.3.2 Extensions of the Linear Model 86
3.3.3 Potential Problems 92
3.4 The Marketing Plan 102
3.5 Comparison of Linear Regression with K-Nearest Neighbors 104
3.6 Lab: Linear Regression 109
3.6.1 Libraries 109
3.6.2 Simple Linear Regression 110
3.6.3 Multiple Linear Regression 113
3.6.4 Interaction Terms 115
3.6.5 Non-linear Transformations of the Predictors 115
3.6.6 Qualitative Predictors 117
3.6.7 Writing Functions 119
3.7 Exercises 120

4 Classification 127
4.1 An Overview of Classification 128
4.2 Why Not Linear Regression? 129
4.3 Logistic Regression 130
4.3.1 The Logistic Model 131
4.3.2 Estimating the Regression Coefficients 133
4.3.3 Making Predictions 134
4.3.4 Multiple Logistic Regression 135
4.3.5 Logistic Regression for >2 Response Classes 137
4.4 Linear Discriminant Analysis 138
4.4.1 Using Bayes' Theorem for Classification 138
4.4.2 Linear Discriminant Analysis for p = 1 139
4.4.3 Linear Discriminant Analysis for p > 1 142
4.4.4 Quadratic Discriminant Analysis 149
4.5 A Comparison of Classification Methods 151
4.6 Lab: Logistic Regression, LDA, QDA, and KNN 154
4.6.1 The Stock Market Data 154
4.6.2 Logistic Regression 156
4.6.3 Linear Discriminant Analysis 161
4.6.4 Quadratic Discriminant Analysis 163
4.6.5 K-Nearest Neighbors 163
4.6.6 An Application to Caravan Insurance Data 165
4.7 Exercises 168

5 Resampling Methods 175
5.1 Cross-Validation 176
5.1.1 The Validation Set Approach 176
5.1.2 Leave-One-Out Cross-Validation 178
5.1.3 k-Fold Cross-Validation 181
5.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation 183
5.1.5 Cross-Validation on Classification Problems 184
5.2 The Bootstrap 187
5.3 Lab: Cross-Validation and the Bootstrap 190
5.3.1 The Validation Set Approach 191
5.3.2 Leave-One-Out Cross-Validation 192
5.3.3 k-Fold Cross-Validation 193
5.3.4 The Bootstrap 194
5.4 Exercises 197

6 Linear Model Selection and Regularization 203
6.1 Subset Selection 205
6.1.1 Best Subset Selection 205
6.1.2 Stepwise Selection 207
6.1.3 Choosing the Optimal Model 210
6.2 Shrinkage Methods 214
6.2.1 Ridge Regression 215
6.2.2 The Lasso 219
6.2.3 Selecting the Tuning Parameter 227
6.3 Dimension Reduction Methods 228
6.3.1 Principal Components Regression 230
6.3.2 Partial Least Squares 237
6.4 Considerations in High Dimensions 238
6.4.1 High-Dimensional Data 238
6.4.2 What Goes Wrong in High Dimensions? 239
6.4.3 Regression in High Dimensions 241
6.4.4 Interpreting Results in High Dimensions 243
6.5 Lab 1: Subset Selection Methods 244
6.5.1 Best Subset Selection 244
6.5.2 Forward and Backward Stepwise Selection 247
6.5.3 Choosing Among Models Using the Validation Set Approach and Cross-Validation 248
6.6 Lab 2: Ridge Regression and the Lasso 251
6.6.1 Ridge Regression 251
6.6.2 The Lasso 255
6.7 Lab 3: PCR and PLS Regression 256
6.7.1 Principal Components Regression 256
6.7.2 Partial Least Squares 258
6.8 Exercises 259

7 Moving Beyond Linearity 265
7.1 Polynomial Regression 266
7.2 Step Functions 268
7.3 Basis Functions 270
7.4 Regression Splines 271
7.4.1 Piecewise Polynomials 271
7.4.2 Constraints and Splines 271
7.4.3 The Spline Basis Representation 273
7.4.4 Choosing the Number and Locations of the Knots 274
7.4.5 Comparison to Polynomial Regression 276
7.5 Smoothing Splines 277
7.5.1 An Overview of Smoothing Splines 277
7.5.2 Choosing the Smoothing Parameter λ 278
7.6 Local Regression 280
7.7 Generalized Additive Models 282
7.7.1 GAMs for Regression Problems 283
7.7.2 GAMs for Classification Problems 286
7.8 Lab: Non-linear Modeling 287
7.8.1 Polynomial Regression and Step Functions 288
7.8.2 Splines 293
7.8.3 GAMs 294
7.9 Exercises 297

8 Tree-Based Methods 303
8.1 The Basics of Decision Trees 303
8.1.1 Regression Trees 304
8.1.2 Classification Trees 311
8.1.3 Trees Versus Linear Models 314
8.1.4 Advantages and Disadvantages of Trees 315
8.2 Bagging, Random Forests, Boosting 316
8.2.1 Bagging 316
8.2.2 Random Forests 320
8.2.3 Boosting 321
8.3 Lab: Decision Trees 324
8.3.1 Fitting Classification Trees 324
8.3.2 Fitting Regression Trees 327
8.3.3 Bagging and Random Forests 328
8.3.4 Boosting 330
8.4 Exercises 332

9 Support Vector Machines 337
9.1 Maximal Margin Classifier 338
9.1.1 What Is a Hyperplane? 338
9.1.2 Classification Using a Separating Hyperplane 339
9.1.3 The Maximal Margin Classifier 341
9.1.4 Construction of the Maximal Margin Classifier 342
9.1.5 The Non-separable Case 343
9.2 Support Vector Classifiers 344
9.2.1 Overview of the Support Vector Classifier 344
9.2.2 Details of the Support Vector Classifier 345
9.3 Support Vector Machines 349
9.3.1 Classification with Non-linear Decision Boundaries 349
9.3.2 The Support Vector Machine 350
9.3.3 An Application to the Heart Disease Data 354
9.4 SVMs with More than Two Classes 355
9.4.1 One-Versus-One Classification 355
9.4.2 One-Versus-All Classification 356
9.5 Relationship to Logistic Regression 356
9.6 Lab: Support Vector Machines 359
9.6.1 Support Vector Classifier 359
9.6.2 Support Vector Machine 363
9.6.3 ROC Curves 365
9.6.4 SVM with Multiple Classes 366
9.6.5 Application to Gene Expression Data 366
9.7 Exercises 368

10 Unsupervised Learning 373
10.1 The Challenge of Unsupervised Learning 373
10.2 Principal Components Analysis 374
10.2.1 What Are Principal Components? 375
10.2.2 Another Interpretation of Principal Components 379
10.2.3 More on PCA 380
10.2.4 Other Uses for Principal Components 385
10.3 Clustering Methods 385
10.3.1 K-Means Clustering 386
10.3.2 Hierarchical Clustering 390
10.3.3 Practical Issues in Clustering 399
10.4 Lab 1: Principal Components Analysis 401
10.5 Lab 2: Clustering 404
10.5.1 K-Means Clustering 404
10.5.2 Hierarchical Clustering 406
10.6 Lab 3: NCI60 Data Example 407
10.6.1 PCA on the NCI60 Data 408
10.6.2 Clustering the Observations of the NCI60 Data 410
10.7 Exercises 413

Index 419



Direct download

ISLR Fourth Printing.pdf





Checking R.id.list information (Android developer reference)

http://developer.android.com/index.html

Develop > Reference >

Click Android in the left pane.

Under Classes, click R.

In the Summary section on the right, click R.id.

Click list.

   
