CodeDragon

Text Data Preprocessing

CODEDRAGON ㆍDevelopment/AI

Category

Description

Normalization

입니닼ㅋㅋ -> 입니다 ㅋㅋ

샤릉해, 따랑해, 싸랑해 -> 사랑해
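
The two normalization examples above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not a real normalizer: `VARIANTS` is a hypothetical lookup table holding only the spellings listed above, and `split_trailing_jamo` is a hypothetical helper. It relies on the Unicode layout of precomposed Hangul syllables, where the final-consonant index is the syllable's offset from U+AC00 modulo 28.

```python
import re

# Compatibility jamo for each final-consonant index (index 0 = no final consonant).
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Hypothetical variant table: only the spellings from the examples above.
VARIANTS = {"샤릉해": "사랑해", "따랑해": "사랑해", "싸랑해": "사랑해"}


def split_trailing_jamo(text):
    """Detach a final consonant that was fused with a following ㅋ/ㅎ run."""
    def fix(match):
        syllable, run = match.group(1), match.group(2)
        final = (ord(syllable) - 0xAC00) % 28
        if final and FINALS[final] == run[0]:
            # Drop the fused final consonant and separate the jamo run.
            return chr(ord(syllable) - final) + " " + run
        return match.group(0)

    return re.sub(r"([가-힣])([ㅋㅎ]+)", fix, text)


def normalize(text):
    # First split fused jamo, then replace known spelling variants word by word.
    text = split_trailing_jamo(text)
    return " ".join(VARIANTS.get(word, word) for word in text.split())


print(normalize("입니닼ㅋㅋ"))  # 입니다 ㅋㅋ
print(normalize("싸랑해"))      # 사랑해
```

The jamo heuristic is deliberately naive: it would also rewrite a legitimate word whose last syllable ends in ㅋ or ㅎ when a matching jamo run follows it, so a practical normalizer would need a dictionary check.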

Tokenization

https://codedragon.tistory.com/7709
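
As a rough illustration of tokenization, the sketch below uses a regular expression to split text into runs of Hangul syllables, standalone jamo, ASCII alphanumerics, and single punctuation marks. The name `tokenize` and the character classes are assumptions made for this example. Practical Korean tokenizers usually split at the morpheme level, which this sketch does not attempt.

```python
import re

# One alternative per token type: Hangul syllables, standalone jamo (e.g. ㅋㅋ),
# ASCII letters/digits, or a single punctuation character.
# Characters outside these classes are silently skipped.
TOKEN_PATTERN = re.compile(r"[가-힣]+|[ㄱ-ㅎㅏ-ㅣ]+|[A-Za-z0-9]+|[^\s\w]")


def tokenize(text):
    return TOKEN_PATTERN.findall(text)


print(tokenize("한국어를 처리하는 예시입니다 ㅋㅋ"))
# ['한국어를', '처리하는', '예시입니다', 'ㅋㅋ']
```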

Stemming

입니다 -> 이다

https://codedragon.tistory.com/7781
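
The stemming example above maps a conjugated form to its dictionary form. A toy version can be written as a suffix table. In the sketch below, `ENDINGS` and `stem` are hypothetical names, and only the 입니다 -> 이다 entry comes from the example; the other entries are assumptions added for illustration. Real Korean stemming generally needs morphological analysis rather than plain suffix replacement.

```python
# Hypothetical suffix table: conjugated ending -> dictionary-form ending.
# Only "입니다" -> "이다" comes from the example above; the rest are illustrative.
ENDINGS = [
    ("했습니다", "하다"),
    ("합니다", "하다"),
    ("입니다", "이다"),
]


def stem(token):
    # Replace the first matching ending; leave the token unchanged otherwise.
    for ending, base in ENDINGS:
        if token.endswith(ending):
            return token[: len(token) - len(ending)] + base
    return token


print(stem("입니다"))      # 이다
print(stem("처리합니다"))  # 처리하다
```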

Phrase extraction

한국어를 처리하는 예시입니다 -> 한국어, 처리, 예시, 처리하는 예시

Stopword removal

https://codedragon.tistory.com/7619
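
Stopword removal amounts to filtering tokens against a list of words that carry little meaning on their own. The sketch below assumes the input has already been split into morpheme-level tokens. `STOPWORDS` is a hypothetical list containing a few common Korean particles, not a complete or standard list.

```python
# Hypothetical stopword list: a few common Korean particles, for illustration only.
STOPWORDS = {"은", "는", "이", "가", "을", "를", "의", "에"}


def remove_stopwords(tokens):
    # Keep every token that is not in the stopword set, preserving order.
    return [token for token in tokens if token not in STOPWORDS]


tokens = ["한국어", "를", "처리", "하는", "예시"]
print(remove_stopwords(tokens))
# ['한국어', '처리', '하는', '예시']
```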

Lemmatization

https://codedragon.tistory.com/7787
