[xep-support] RE: Rép. : Re: [xep-support] Invalid UTF-8 byte

From: DESEYNE Jacques <Jacques.DESEYNE@swift.com>
Date: Thu Jun 30 2005 - 05:52:24 PDT

Luc,
 
Bytes are not the same thing as characters! There are several conventions ("encodings") for representing characters as byte
sequences. XML uses the Unicode character set (it contains quite a lot of characters, see the code charts at
http://www.unicode.org) and its default encoding is UTF-8, but other encodings can be used as well.
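
To make the distinction concrete, here is a small Python sketch (Python and the helper name encoded_hex are my own choices for illustration, nothing XEP-specific). It prints the bytes that one and the same character, NO-BREAK SPACE (U+00A0), turns into under three encodings:

```python
def encoded_hex(char, encoding):
    """Return the bytes of `char` in `encoding` as space-separated hex."""
    return " ".join(f"{b:02x}" for b in char.encode(encoding))

nbsp = "\u00a0"  # NO-BREAK SPACE, code point U+00A0

for encoding in ("latin-1", "utf-8", "utf-16-be"):
    print(f"{encoding:10} {encoded_hex(nbsp, encoding)}")
# latin-1    a0
# utf-8      c2 a0
# utf-16-be  00 a0
```

The code point is always U+00A0, but only the UTF-8 line is what a parser expects in a file declared as encoding="UTF-8". A hex view that shows "00A0" is probably displaying the code point or a 16-bit form, not the UTF-8 bytes.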
 
In UTF-8, only characters up to 127 (U+007F) are represented by a single byte. The non-breaking space character U+00A0 is
represented by the byte sequence 'C2 A0'. Your sample document has some of these correctly encoded, for instance within the <Auteur>
tag for <Ouvrage> where <Nuart> contains "9610767":
 
...
000001b0 3c 2f 54 69 74 72 65 3e 3c 41 75 74 65 75 72 3e </Titre><Auteur>
000001c0 c2 a0 3c 2f 41 75 74 65 75 72 3e 3c 50 72 69 78 ..</Auteur><Prix
...
 
Where you see the dodgy 'A0' byte (at file offset 0x00001140, if I'm not mistaken), you should have 'C2 A0', i.e. two bytes instead
of one. You may need to check how these data are generated.
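
If you would rather not hunt for such bytes by hand in a hex editor, something along the following lines could list their offsets. This is only a sketch in Python: the function name find_invalid_utf8 and the sample bytes are made up for illustration, and it relies on the start attribute of Python's UnicodeDecodeError to report where decoding failed.

```python
def find_invalid_utf8(data):
    """Return the offsets in `data` (bytes) at which UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            offsets.append(bad)
            pos = bad + 1  # skip the offending byte and keep scanning
    return offsets

# Hypothetical sample: one correct 'C2 A0' and one stray 'A0'.
sample = b"<Auteur>\xc2\xa0</Auteur><Run>\xa0</Run>"
for offset in find_invalid_utf8(sample):
    print(f"invalid byte 0x{sample[offset]:02x} at offset 0x{offset:08x}")
```

For a real file you would pass in the result of open(path, "rb").read(). Re-slicing the data after each failure is wasteful on large files, but it keeps the sketch short.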
 
Look for an explanation of UTF-8 (and other) encodings on the Web -- you will see that there's more to it than one might
expect.
 
Best regards,

--
Jacques Deseyne
 
  _____  
From: owner-xep-support@renderx.com [mailto:owner-xep-support@renderx.com] On Behalf Of LUC AUDRAIN
Sent: Thursday, June 30, 2005 11:58 AM
To: msulyaev@renderx.com; xep-support@renderx.com
Subject: Rép. : Re: [xep-support] Invalid UTF-8 byte
Hello Michael,
 
I think it is a 0A that I have after the XML declaration, as I have at the end of each line of this file. The invalid UTF-8 byte is
0xA0.
 
Looking a bit more closely, I have found this 'A0' byte: it is in the line beginning with "<Nuart>4776027" inside the element
Run.
 
Now, I still don't understand why it is an invalid UTF-8 byte, because when I open this file in UltraEdit in Hex mode I see "00A0"
and "00A0" is a valid Unicode character! I could filter it out here, but in some cases I may need it, as it is the "NO-BREAK SPACE".
 
What's wrong?
 
 
 
 
 
Best regards
 
Luc AUDRAIN
__________________________________
DSI / Infocube
Informatique Éditoriale
HACHETTE LIVRE
43, quai de Grenelle
75015 PARIS
00 33 1 43 92 38 12
laudrain@hachette-livre.fr
>>> msulyaev@renderx.com 24/06/2005 17:28:42 >>>
Hello, Luc,
Your .xml file is invalid: it has a 0xA0 byte after the xml declaration 
and before anything else, e.g. like here (the last byte shown):
3C 3F 78 6D 6C 20 76 65 ¦ 72 73 69 6F 6E 3D 22 31 <?xml version="1
2E 30 22 20 65 6E 63 6F ¦ 64 69 6E 67 3D 22 55 54 .0" encoding="UT
46 2D 38 22 3F 3E 20 20 ¦ 20 20 20 20 20 20 20 20 F-8"?>
20 20 20 20 20 20 20 20 ¦ 20 20 20 20 20 20 20 20
A0 <
Use any HEX editor to fix.
-- 
Best regards,
Michael Sulyaev mailto:msulyaev@renderx.com 
RenderX.
LUC AUDRAIN wrote:
> Hello,
> 
> On some XML files, I have an error message on validation :
> 
> [error] Error reported by XML parser; SystemID: file:/J:/Traitement 
> BdC/Depot TXT/lg/OPERATION ARTEMIS CHASSE 23 AOUT 2005.xml; Line#: -1; 
> Column#: 949
> [error] javax.xml.transform.TransformerException: Error reported by XML 
> parser error: formatting failed: 
> javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: 
> invalid UTF-8 byte (check the XML declaration) (code: 0xa0)
> 
> I found information on the RenderX Web site in this answer
> *From*: Mike Trotman <mike.trotman@datalucid.com>
> 
> *Date*: Mon May 02 2005 - 08:14:51 PDT
> and tried without success.
> 
> The workaround I found is to save the XML file again from any text or 
> XML editor (such as XMLSpy) and it works fine.
> 
> In order to find what's wrong in my source file, I'd like to know how to 
> use the line and column information in the error message: Line#: -1; 
> Column#: 949.
> 
> Best regards.
> 
> 
> 
> 
> 
> 
> 
> Luc AUDRAIN
> __________________________________
> DSI / Infocube
> Informatique Éditoriale
> HACHETTE LIVRE
> 43, quai de Grenelle
> 75015 PARIS
> 00 33 1 43 92 38 12
> laudrain@hachette-livre.fr
> 
-------------------
(*) To unsubscribe, send a message with words 'unsubscribe xep-support'
in the body of the message to majordomo@renderx.com from the address
you are subscribed from.
(*) By using the Service, you expressly agree to these Terms of Service http://www.renderx.com/terms-of-service.html 

Received on Thu Jun 30 06:24:55 2005

This archive was generated by hypermail 2.1.8 : Thu Jun 30 2005 - 06:24:58 PDT