R Code Example - Data Splitting

CODEDRAGON ㆍDevelopment/AI

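This R code example uses the createDataPartition() function from the caret package to split the iris data, stratified by Species, into 70% training data and 30% validation data (called the test data in the code), and then checks the resulting data sets.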

> library(caret)
Loading required package: lattice
Loading required package: ggplot2
Find out what's changed in ggplot2 at
https://github.com/tidyverse/ggplot2/releases.

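# To split the iris data by Species into 70% training data
# and 30% validation data,
# use createDataPartition() to extract the row indices of the training data.
# createDataPartition(): samples within each class of the outcome and returns the indices of the rows to use as training data.
# With list=FALSE the indices come back as a one-column matrix rather than a list.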

> train.idx <- createDataPartition(iris$Species, p=0.7, list=FALSE)
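
To make "split by Species" concrete, here is a rough base R sketch of a stratified 70% draw: it samples 70% of the row numbers within each species separately. It only illustrates the idea and is not caret's actual implementation. The helper name stratified_idx and the seed value are made up for this example.

```r
# Hypothetical helper (not part of caret): draw a fraction p of the row
# indices within each level of the factor y.
stratified_idx <- function(y, p) {
  idx_by_class <- split(seq_along(y), y)   # row numbers grouped by class
  picked <- lapply(idx_by_class, function(i) {
    # sample positions, then map back to row numbers
    i[sample.int(length(i), ceiling(length(i) * p))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(123)  # arbitrary seed, only so the sketch is repeatable
idx <- stratified_idx(iris$Species, p = 0.7)

length(idx)               # number of training rows
table(iris$Species[idx])  # rows drawn per species
```

Since iris has 50 rows per species, this draw takes 35 rows from each species, 105 in total.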

# Check the extracted indices
> head(train.idx)
     Resample1
[1,]         2
[2,]         3
[3,]         4
[4,]         5
[5,]         7
[6,]        11
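
Because createDataPartition() samples at random, the indices printed above will generally differ from run to run. Below is a minimal sketch of fixing the seed so that the split can be repeated. The seed value 2020 and the names idx_a and idx_b are arbitrary choices for this example, not part of the original code.

```r
library(caret)

set.seed(2020)  # arbitrary seed; any fixed value makes the split repeatable
idx_a <- createDataPartition(iris$Species, p = 0.7, list = FALSE)

set.seed(2020)  # same seed again
idx_b <- createDataPartition(iris$Species, p = 0.7, list = FALSE)

identical(idx_a, idx_b)  # both draws select the same rows
nrow(idx_a)              # number of selected training rows
```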

# Create the training data from the rows selected by train.idx
> iris_train<-iris[train.idx,]

# Check the training data
> head(iris_train)
   Sepal.Length Sepal.Width Petal.Length
2           4.9         3.0          1.4
3           4.7         3.2          1.3
4           4.6         3.1          1.5
5           5.0         3.6          1.4
7           4.6         3.4          1.4
11          5.4         3.7          1.5
   Petal.Width Species
2          0.2 setosa
3          0.2 setosa
4          0.2 setosa
5          0.2 setosa
7          0.3 setosa
11         0.2 setosa

# Create the test data from the rows that are not in iris_train
> iris_test<-iris[-train.idx,]

# Check the test data
> head(iris_test)
   Sepal.Length Sepal.Width Petal.Length
1           5.1         3.5          1.4
6           5.4         3.9          1.7
8           5.0         3.4          1.5
9           4.4         2.9          1.4
10          4.9         3.1          1.5
25          4.8         3.4          1.9
   Petal.Width Species
1          0.2 setosa
6          0.4 setosa
8          0.2 setosa
9          0.2 setosa
10         0.1 setosa
25         0.2 setosa
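
A quick sanity check, sketched on the assumption that the split is done exactly as above: the training and test rows should not overlap, and together they should cover every row of iris. The seed is an arbitrary addition for repeatability and is not in the original code.

```r
library(caret)

set.seed(1)  # arbitrary seed, not in the original post
train.idx  <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
iris_train <- iris[train.idx, ]
iris_test  <- iris[-train.idx, ]   # negative index drops the training rows

# Row names of iris are the original row numbers, so they identify rows.
length(intersect(rownames(iris_train), rownames(iris_test)))  # overlapping rows
nrow(iris_train) + nrow(iris_test) == nrow(iris)              # full coverage
```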

# dim(): returns the dimensions of an object
# as (number of rows, number of columns).
# The training data has 105 rows and 5 columns.
> dim(iris_train)
[1] 105   5

# The test data has 45 rows and 5 columns.
> dim(iris_test)
[1] 45  5
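
dim() only reports the overall size of each set. To see whether the species stayed balanced after the split, table() can be applied to the Species column of each subset. This is a sketch that repeats the split so it runs on its own. The seed is an arbitrary addition.

```r
library(caret)

set.seed(42)  # arbitrary seed, not in the original post
train.idx  <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
iris_train <- iris[train.idx, ]
iris_test  <- iris[-train.idx, ]

# Rows per species in each subset
table(iris_train$Species)
table(iris_test$Species)

# Share of the full data set that went into training
nrow(iris_train) / nrow(iris)
```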
