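Note: readers other than the author can skip straight to the "Problem analysis and solution" section. Analysis by category:
Non-Spring MVC projects, Spring MVC projects
The Tomcat versions used in this article are 7.0.100 and 8.5.54.
Note: every html file in this article carries the <meta charset="utf-8"> tag. Without that tag, an html file behaves exactly like a txt file!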
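Non-Spring MVC projects
In a non-Spring MVC project, garbled static resources (txt files, for example) usually mean the file is saved as UTF-8 without a BOM. Re-saving the file as UTF-8 with BOM fixes it. The details are in the tables below: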
| Tomcat | File type | File encoding | Displays correctly |
|---|---|---|---|
| 7.0.100 | html | UTF-8 | Yes |
| 7.0.100 | html | UTF-8 BOM | Yes |
| 7.0.100 | txt | UTF-8 | No |
| 7.0.100 | txt | UTF-8 BOM | Yes |

| Tomcat | File type | File encoding | Displays correctly |
|---|---|---|---|
| 8.5.54 | html | UTF-8 | Yes |
| 8.5.54 | html | UTF-8 BOM | Yes |
| 8.5.54 | txt | UTF-8 | No |
| 8.5.54 | txt | UTF-8 BOM | Yes |
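
The workaround of re-saving a file as UTF-8 with BOM can also be scripted. The following is a minimal sketch, assuming the file is already valid UTF-8; the class `Utf8BomWriter` and its methods are hypothetical helpers written for this illustration, not part of Tomcat.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Tomcat): adds a UTF-8 BOM to content that lacks one.
class Utf8BomWriter {
    // The three-byte UTF-8 byte order mark: EF BB BF.
    static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the content starts with the UTF-8 BOM.
    static boolean hasBom(byte[] content) {
        return content.length >= 3
                && content[0] == BOM[0]
                && content[1] == BOM[1]
                && content[2] == BOM[2];
    }

    // Returns the content with a BOM prepended, or the same array if one is already present.
    static byte[] withBom(byte[] content) {
        if (hasBom(content)) {
            return content;
        }
        byte[] out = new byte[BOM.length + content.length];
        System.arraycopy(BOM, 0, out, 0, BOM.length);
        System.arraycopy(content, 0, out, BOM.length, content.length);
        return out;
    }

    // Rewrites the file in place. Assumption: the file is already UTF-8 encoded.
    static void addBom(Path file) throws IOException {
        Files.write(file, withBom(Files.readAllBytes(file)));
    }
}
```

Calling `Utf8BomWriter.addBom(path)` on a txt file would turn a "UTF-8" row of the tables above into a "UTF-8 BOM" row. The method is idempotent, so running it twice does not add a second BOM.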
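Spring MVC projects
If CharacterEncodingFilter is configured and its init parameter forceResponseEncoding or forceEncoding is set to true, the filter adds charset=UTF-8 to the response header on every request for a static resource. When the parameter is false or not set, the behavior is the same as in the non-Spring MVC case above.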
<filter>
    <filter-name>characterEncodingFilter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceResponseEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>characterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
With forceResponseEncoding or forceEncoding set to true, the results are as follows:
| Tomcat | File type | File encoding | Displays correctly |
|---|---|---|---|
| 7.0.100 | html | UTF-8 | Yes |
| 7.0.100 | html | UTF-8 BOM | Yes |
| 7.0.100 | txt | UTF-8 | Yes |
| 7.0.100 | txt | UTF-8 BOM | Yes |

| Tomcat | File type | File encoding | Displays correctly |
|---|---|---|---|
| 8.5.54 | html | UTF-8 | No |
| 8.5.54 | html | UTF-8 BOM | Yes |
| 8.5.54 | txt | UTF-8 | No |
| 8.5.54 | txt | UTF-8 BOM | Yes |
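Problem analysis and solution
To fix the garbled output, add the fileEncoding init parameter to the DefaultServlet declaration, as shown below. (Precondition: your static resource files must be encoded as UTF-8.)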
<servlet>
    <servlet-name>default</servlet-name>
    <servlet-class>org.apache.catalina.servlets.DefaultServlet</servlet-class>
    <init-param>
        <param-name>debug</param-name>
        <param-value>0</param-value>
    </init-param>
    <init-param>
        <param-name>listings</param-name>
        <param-value>false</param-value>
    </init-param>
    <init-param>
        <param-name>fileEncoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
</servlet>
Two parameters mainly determine how DefaultServlet decodes static resources: fileEncoding and useBomIfPresent.
protected String fileEncoding = null;
private transient Charset fileEncodingCharset = null;
private BomConfig useBomIfPresent = null;
Source of init():
fileEncoding = getServletConfig().getInitParameter("fileEncoding");
if (fileEncoding == null) {
    fileEncodingCharset = Charset.defaultCharset();
    fileEncoding = fileEncodingCharset.name();
} else {
    ……
}
String useBomIfPresent = getServletConfig().getInitParameter("useBomIfPresent");
if (useBomIfPresent == null) {
    this.useBomIfPresent = BomConfig.TRUE;
} else {
    ……
}
Source of the BomConfig enum:
static enum BomConfig {
    TRUE("true", true, true),
    FALSE("false", true, false),
    ……
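To sum up: if the fileEncoding init parameter is not specified, Tomcat decodes static resources with the platform default encoding. On a Chinese-locale Windows system that default is GBK, so UTF-8 files come out garbled. Files with a BOM are handled correctly because useBomIfPresent defaults to BomConfig.TRUE. In that mode, when Tomcat encounters a file with a BOM, the encoding inferred from the BOM takes priority over the fileEncoding setting. For example, if the first three bytes are EF BB BF, Tomcat knows the resource is UTF-8 and decodes it as UTF-8 (even if fileEncoding = GBK). As for why html files generally display correctly, my personal guess is that the <meta charset="utf-8"> tag is responsible: it declares the file as UTF-8. Note that this tag is interpreted by the browser rather than by Tomcat, so it can only help when the bytes reach the browser unchanged.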
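
The priority rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Tomcat's actual implementation: the class `BomPriorityDemo` and its methods are hypothetical, only the UTF-8 BOM is checked, and the caller passes in whatever charset stands for the configured fileEncoding (GBK in the scenario above).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the real logic lives in Tomcat's DefaultServlet.
class BomPriorityDemo {
    // Picks the charset for decoding a static file:
    // a UTF-8 BOM (EF BB BF) wins over the configured fileEncoding.
    static Charset pickCharset(byte[] content, Charset fileEncoding) {
        boolean utf8Bom = content.length >= 3
                && (content[0] & 0xFF) == 0xEF
                && (content[1] & 0xFF) == 0xBB
                && (content[2] & 0xFF) == 0xBF;
        return utf8Bom ? StandardCharsets.UTF_8 : fileEncoding;
    }

    // Decodes the content with the chosen charset and drops a leading BOM character.
    static String decode(byte[] content, Charset fileEncoding) {
        String text = new String(content, pickCharset(content, fileEncoding));
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }
}
```

With this sketch, a UTF-8 file that starts with a BOM decodes correctly whatever fileEncoding is passed in, while the same bytes without a BOM are decoded with fileEncoding and come out garbled whenever that charset is not UTF-8.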
On garbled data when reading request parameters
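import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// The class name is illustrative; the original excerpt starts at init().
public class EncodingDemoServlet extends HttpServlet {
    private String messageGET;
    private String messagePOST;

    @Override
    public void init() {
        messageGET = "GET-----你好";
        messagePOST = "POST-----你好";
    }

    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {
        if (null == request.getCharacterEncoding()) {
            System.out.println("charset == null");
        }
        String s = request.getParameter("s");
        if (s != null) {
            // Re-decode the value, assuming the container decoded the query string as ISO-8859-1.
            System.out.println("GET-----s: " + new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8));
        }
        System.out.println("response: " + response.getCharacterEncoding());
        // Set the response encoding before obtaining the writer.
        response.setCharacterEncoding("UTF-8");
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("<h1>" + messageGET + "</h1>");
        out.println("</body></html>");
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        if (null == req.getCharacterEncoding()) {
            System.out.println("charset == null");
        }
        // Must be called before the first getParameter() call to affect body decoding.
        req.setCharacterEncoding("UTF-8");
        if (null != req.getCharacterEncoding()) {
            System.out.println("charset == " + req.getCharacterEncoding());
        }
        String s = req.getParameter("s");
        System.out.println("POST------s: " + s);
        System.out.println("response: " + resp.getCharacterEncoding());
        resp.setCharacterEncoding("UTF-8");
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<html><body>");
        out.println("<h1>" + messagePOST + "</h1>");
        out.println("</body></html>");
    }
}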
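
The re-decoding trick used in doGet, `new String(s.getBytes(ISO_8859_1), UTF_8)`, can be checked in isolation. The sketch below assumes the container decoded the UTF-8 query bytes as ISO-8859-1 (to my knowledge the default URIEncoding in Tomcat 7, whereas Tomcat 8 and later default to UTF-8, in which case the re-decoding is unnecessary and would itself corrupt the value). The class `Latin1RoundTrip` and its methods are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows why the ISO-8859-1 round trip recovers UTF-8 text.
class Latin1RoundTrip {
    // Simulates a container that decodes UTF-8 query bytes as ISO-8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the mis-decoding: ISO-8859-1 maps every byte to exactly one char,
    // so the original UTF-8 bytes can be recovered and decoded properly.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4F60\u597D"; // the two characters used in the servlet messages
        String garbled = misdecode(original);
        System.out.println("garbled length: " + garbled.length());
        System.out.println("repaired equals original: " + repair(garbled).equals(original));
    }
}
```

The round trip is lossless only because ISO-8859-1 assigns a character to every byte value; the same trick with a charset that rejects or replaces some byte sequences would not reliably recover the original text.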
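This article is only a personal study record. It contains subjective opinions and its correctness is not guaranteed. If you find a mistake, corrections are very welcome!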