
Unicode(0xb) error-An invalid XML character (Unicode: 0xb) was found in the element content of the document.

Java ksharpdabu 2251 views 0 comments

Problem description:

While integrating with a partner's interface today, parsing their XML failed with this error: "Unicode(0xb) error-An invalid XML character (Unicode: 0xb) was found in the element content of the document."
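The failure can be reproduced with a few lines of Java. This is a minimal sketch, assuming the JDK's built-in JAXP parser; the class name `VtParseDemo` and the helper `parseError` are hypothetical and only serve the illustration. It parses a tiny document whose element content contains the 0xb character and returns whatever message the parser reports.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class VtParseDemo {
    // Returns the parser's error message, or null if the XML parsed cleanly.
    static String parseError(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // Well-formedness errors, including illegal characters, end up here.
            return e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected in this demo.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        char vt = (char) 0x0B; // the vertical tab that triggers the error
        String xml = "<msg>\u63a5\u53d7" + vt + "\u3002</msg>";
        System.out.println(parseError(xml));
    }
}
```

With the JDK's default parser, the printed message should resemble the one quoted above; other parsers may word it differently.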

Cause analysis:

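The error says that the XML contains an illegal XML character (Unicode: 0xb). Unicode code point 0xb (U+000B) is the vertical tab (VT), which text editors such as Notepad++ display as VT. For details, see: https://en.wikipedia.org/wiki/Tab_key.
Take, for example, the following Unicode-escaped text:
%u63a5%u53d7%0b%u3002
After decoding, Notepad++ shows the VT character between 受 and 。.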
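The same check can be done without an editor. The sketch below assumes the sample uses the JavaScript `escape()`-style encoding (`%uXXXX` for a UTF-16 unit, `%XX` for a single byte value), which `java.net.URLDecoder` does not understand, so it decodes by hand. `PercentUDecoder` and `decode` are hypothetical names, and malformed input is not handled.

```java
class PercentUDecoder {
    // Decodes escape()-style text: %uXXXX becomes one UTF-16 unit, %XX one char.
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 6 <= s.length() && s.charAt(i + 1) == 'u') {
                // Four hex digits follow "%u".
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else if (c == '%' && i + 3 <= s.length()) {
                // Two hex digits follow "%".
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("%u63a5%u53d7%0b%u3002");
        // Print each character as a code point so the invisible VT stands out.
        for (char ch : decoded.toCharArray()) {
            System.out.println(String.format("U+%04X", (int) ch));
        }
    }
}
```

The third line of output is U+000B, the vertical tab sitting between 受 (U+53D7) and 。 (U+3002).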
For XML 1.0, the legal character ranges are as follows (see: https://en.wikipedia.org/wiki/Valid_characters_in_XML#XML_1.0):

XML 1.0

Unicode code points in the following ranges are valid in XML 1.0 documents:

  • U+0009, U+000A, U+000D: these are the only C0 controls accepted in XML 1.0;
  • U+0020–U+D7FF, U+E000–U+FFFD: this excludes some (not all) non-characters in the BMP (all surrogates, U+FFFE and U+FFFF are forbidden);
  • U+10000–U+10FFFF: this includes all code points in supplementary planes, including non-characters.

The preceding code points ranges contain the following controls which are only valid in certain contexts in XML 1.0 documents, and whose usage is restricted and highly discouraged:

  • U+007F–U+0084, U+0086–U+009F: this includes a C0 control character and all but one C1 control.
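0xb falls outside these ranges, so the parser reports an error. The fix is to use a regular expression in Java to replace these illegal characters with an empty string before parsing, so that the XML parses normally.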
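A minimal sketch of such a regex-based cleanup, built directly from the XML 1.0 ranges quoted above. The class name `XmlSanitizer` and the method `stripInvalidXmlChars` are hypothetical and are not taken from the original post. It relies on `java.util.regex` matching by code point, so valid surrogate pairs (U+10000 and above) are kept while lone surrogates are removed.

```java
import java.util.regex.Pattern;

class XmlSanitizer {
    // Matches every code point that is NOT in the XML 1.0 legal ranges:
    // U+0009, U+000A, U+000D, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
    private static final Pattern INVALID_XML10 = Pattern.compile(
        "[^\\x{9}\\x{A}\\x{D}\\x{20}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]");

    // Removes illegal XML 1.0 characters by replacing them with the empty string.
    static String stripInvalidXmlChars(String text) {
        if (text == null) {
            return null;
        }
        return INVALID_XML10.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String dirty = "<msg>\u63a5\u53d7" + (char) 0x0B + "\u3002</msg>";
        // The VT is dropped; the markup and the Chinese text are left untouched.
        System.out.println(stripInvalidXmlChars(dirty));
    }
}
```

Running the cleanup on the raw XML string before handing it to the parser is the simplest placement. Note that it silently drops data, so logging how many characters were removed may be worthwhile if the sender should be told about the problem.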

Reference:
https://stackoverflow.com/questions/14192135/unicode0xb-error-while-parsing-an-xml-file-using-stax

 


Please credit the source when reposting: 大步's Blog » Unicode(0xb) error-An invalid XML character (Unicode: 0xb) was found in the element content of the document.
