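I recently had to read the contents of a .txt file and load them into a TextView, but testing showed that the text came out garbled when the file was GBK-encoded. Most of the material I found identifies the encoding by inspecting the first three bytes of the file. In practice, though, the first three bytes I read carried no encoding information at all; they were simply the first three bytes of the text. That check only works when the file starts with a byte order mark (BOM), and a GBK file has none. I later found a different idea: pass an explicit decoder to the input stream, then use the presence or absence of a decoding error while reading to decide whether the file is in that encoding. Looping over a list of candidate encodings and trying each one in turn yields the first encoding that decodes the whole file cleanly. Note that the order of the candidates matters, since a file can be valid in more than one encoding. The implementation is as follows: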
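// Requires java.io.*, java.nio.charset.*, and android.util.Log.
private String getTextFileCharset(String filePath) {
    // Candidates are tried in order; the first one that decodes the whole file without error wins.
    String[] charsets = {"US-ASCII", "UTF-8", "GB2312", "BIG5", "GBK", "GB18030", "UTF-16BE", "UTF-16LE", "UTF-16", "UNICODE"};
    // Fall back to the platform default (canonical name) if no candidate fits.
    String charset = Charset.defaultCharset().name();
    for (String candidate : charsets) {
        CharsetDecoder decoder;
        try {
            // A new decoder reports malformed or unmappable input by default instead of replacing it.
            decoder = Charset.forName(candidate).newDecoder();
        } catch (IllegalArgumentException e) { // charset name is illegal or not supported on this runtime
            Log.d(TAG, "getTextFileCharset: " + candidate + " unavailable, continue");
            continue;
        }
        // try-with-resources closes the reader on every path, including break.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), decoder))) {
            // Read to the end of the file; the decoder throws on the first invalid byte sequence.
            while (br.readLine() != null) {
                // nothing to do, we only care whether decoding succeeds
            }
            charset = candidate;
            Log.d(TAG, "getTextFileCharset: is " + candidate + ", break");
            break;
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            break;
        } catch (CharacterCodingException e) { // thrown when this charset cannot decode the text
            Log.d(TAG, "getTextFileCharset: not " + candidate + ", continue");
        } catch (IOException e) {
            e.printStackTrace();
            break;
        }
    }
    return charset;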
    /* Earlier attempt, kept for reference: look for a BOM in the file header.
    // It only helps when the file actually starts with a BOM; a GBK file never does.
    if (null == filePath) return charset;
    File file = new File(filePath);
    if (!file.exists() || file.isDirectory()) return charset;
    try (BufferedInputStream bi = new BufferedInputStream(new FileInputStream(file))) {
        byte[] header = new byte[3];
        bi.read(header);
        if (header[0] == (byte) 0xEF && header[1] == (byte) 0xBB
                && header[2] == (byte) 0xBF) { // UTF-8 BOM
            charset = "UTF-8";
        } else if (header[0] == (byte) 0xFF
                && header[1] == (byte) 0xFE) { // UTF-16 little-endian BOM
            charset = "UTF-16LE";
        } else if (header[0] == (byte) 0xFE
                && header[1] == (byte) 0xFF) { // UTF-16 big-endian BOM
            charset = "UTF-16BE";
        } else { // no BOM: this branch is only a guess
            charset = "GBK";
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    Log.d(TAG, "getTextFileCharset: charset = " + charset);
    return charset;*/
}
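The header check in the commented-out block can still serve as a fast path for files that do begin with a BOM. Below is a minimal sketch of that check on its own, assuming the caller has already read the first few bytes of the file. The class name `BomSniffer` and the method `charsetFromBom` are hypothetical names chosen for illustration. The method returns null when no BOM is present, so the caller can fall back to trial decoding instead of guessing.

```java
final class BomSniffer {
    private BomSniffer() {}

    /**
     * Returns the charset implied by a leading byte order mark, or null if
     * the header does not start with one. UTF-32 BOMs are not handled here.
     */
    static String charsetFromBom(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE"; // FF FE: little-endian
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE"; // FE FF: big-endian
        }
        return null; // no BOM, e.g. GBK or BOM-less UTF-8
    }
}
```

A null result is the common case for GBK text, which is why the BOM check alone was not enough for this requirement.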
The trial-decoding approach needs no third-party library, so I am recording it here for reference.
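For experimenting outside Android, the same idea can be sketched on an in-memory byte array with nothing but `java.nio.charset`. This is a minimal sketch rather than a drop-in replacement for the method above: the class name `CharsetSniffer`, the method `detect`, and its `fallback` parameter are hypothetical names introduced for illustration. It assumes the whole file fits in memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

final class CharsetSniffer {
    private CharsetSniffer() {}

    /**
     * Returns the first candidate charset that decodes all of data without
     * a coding error, or fallback if none does. Candidates are tried in order.
     */
    static String detect(byte[] data, String[] candidates, String fallback) {
        for (String name : candidates) {
            try {
                Charset.forName(name).newDecoder()
                        // REPORT is already the default; set explicitly for clarity.
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data));
                return name; // decoded cleanly
            } catch (CharacterCodingException e) {
                // This charset cannot decode the bytes; try the next one.
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // This charset is not available on the current runtime; skip it.
            }
        }
        return fallback;
    }
}
```

The result is a heuristic, not a proof. A single-byte charset such as ISO-8859-1 accepts any byte sequence, and text that is valid GB2312 is also valid GBK and GB18030, so the stricter or more likely candidates should come first in the list.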