N gram language model
Solved as follows.
When reading a txt file in Java:
1. The usual approach
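File file = new File("/home/smilem/SM-201104021025.txt");
// FileReader(File) decodes with the platform default charset; it cannot be chosen here
BufferedReader br = new BufferedReader(new FileReader(file));
String line = null;
while ((line = br.readLine()) != null) {
    System.out.println(line);
}
br.close();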
2. The approach I used
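File file = new File("/home/smilem/SM-201104021025.txt");
// Name the charset explicitly so the file's bytes are decoded as EUC-KR
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file), "euc-kr"));
String line = null;
// Loop with while, not if, so every line is read rather than only the first
while ((line = br.readLine()) != null) {
    System.out.println(line);
}
br.close();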
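In approach 1, FileReader decodes the file's bytes with the platform's default charset, which may not match the encoding the file was saved in.
By the time the text reaches the line variable it is already corrupted, so no matter how you re-encode
or otherwise wrestle with that line, the original characters cannot be recovered.
In approach 2, the charset is given explicitly at read time, so the bytes are decoded as EUC-KR from the start.
That is why approach 2 works.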
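The difference between the two approaches can be reproduced without a file. The following is a minimal sketch, not the original author's code: the class name CharsetDemo and its helper methods are hypothetical, an in-memory byte array stands in for the EUC-KR file, and UTF-8 is assumed as the "wrong" default charset. It shows that decoding with the wrong charset loses information that re-encoding cannot restore, while naming the charset at read time gives the expected text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: read the first line of raw bytes using the given charset.
    static String readFirstLine(byte[] raw, Charset charset) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(raw), charset))) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: try to undo a wrong decode by re-encoding and decoding again.
    static String attemptRepair(String garbled, Charset wrong, Charset right) {
        return new String(garbled.getBytes(wrong), right);
    }

    public static void main(String[] args) {
        Charset eucKr = Charset.forName("EUC-KR");
        // Two Hangul syllables written as Unicode escapes so the source file's
        // own encoding does not affect the demo.
        String original = "\uD55C\uAE00";
        byte[] raw = original.getBytes(eucKr); // stands in for the file's contents

        // Approach 1 in effect: decode with a charset that does not match the data.
        String garbled = readFirstLine(raw, StandardCharsets.UTF_8);
        // Approach 2: decode with the charset the data was written in.
        String correct = readFirstLine(raw, eucKr);
        // Trying to fix the garbled string after the fact.
        String repaired = attemptRepair(garbled, StandardCharsets.UTF_8, eucKr);

        System.out.println("explicit charset matches original: " + correct.equals(original));
        System.out.println("repair attempt matches original:   " + repaired.equals(original));
    }
}
```

The wrong decode substitutes the replacement character U+FFFD for byte sequences it cannot interpret, and that substitution discards the original bytes, which is why the repair attempt does not get the text back.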
Source: http://shonm.tistory.com/category/JAVA/TEXT파일 읽을 때 한글 깨짐 [정윤재의 정리노트]