Well-formed RSS and RFC 3023

We’ve announced that the RSS platform in Vista will permit only well-formed XML. Most people are celebrating, but some of the comments suggest there is confusion about what this means.

To clear things up, this statement is ONLY about well-formed XML. It is NOT a statement about validity, or about conformance to any other spec, including any RSS/Atom spec. The XML spec gives “well-formed” a very specific definition. If a document is not well-formed, it is not XML, period. A document can be invalid yet still be well-formed, but if it’s not well-formed, it is never going to be valid, and it is not RSS or Atom either (since it’s not XML).
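To make the distinction concrete, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree as a stand-in for a conforming parser; it is not MSXML and says nothing about how the RSS platform is implemented. The helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, False otherwise.

    This checks well-formedness only. It does not check validity
    against a DTD or schema, or conformance to any RSS/Atom spec.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Well-formed, and shaped like a feed.
feed = "<rss><channel><title>Example</title></channel></rss>"

# Well-formed, but clearly not RSS or Atom.
not_a_feed = "<foo><bar/></foo>"

# Not well-formed: <channel> is never closed, so this is not XML at all.
broken = "<rss><channel><title>Example</title></rss>"

for doc in (feed, not_a_feed, broken):
    print(is_well_formed(doc), doc)
```

The second document passes this check even though no feed reader could use it, which is the point: well-formedness is a statement about XML syntax and nothing more.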

Sam Ruby (who is not confused) responds positively to the news, and asks “Question: how will IE7 deal with Priorities in the Presence of External Encoding Information?” He’s talking about RFC 3023, which is referenced in appendix F of the XML spec. This is a non-normative reference, and is not a well-formedness constraint (nor a “must”), but it’s worth responding.

The short answer is that we currently do not implement RFC 3023. The RSS platform uses MSXML (in XML conforming mode) to fetch and parse the data, so the behavior is inherited from MSXML. Since MSXML is used by most products that we ship, this keeps the platform consistent. Nearly every other stack in the industry ignores RFC 3023 as well, so it’s not a widely accepted interop point at the moment.

The longer answer is that there are good arguments for having a well-defined standard for handling external encoding information. Without a spec that all of the vendors implement, the matrix of interop for edge cases can be pretty complex. However, I don’t think RFC 3023 in its current form is a good starting point. Implementing it would break a large chunk of the web, and the spec could easily be modified to be just as useful without breaking so many feeds. No vendor is going to unilaterally adopt RFC 3023, because it would mean their products would croak while their competitors’ continue to work. By the same token, the RSS platform is not going to unilaterally deviate from what MSXML does, because that would give you behavior inconsistent with the rest of Microsoft’s products.
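For readers who haven’t looked at RFC 3023, here is a simplified sketch in Python of the precedence it describes. For text/xml, the charset parameter on the Content-Type header is authoritative, and if it is omitted the default is us-ascii no matter what the document declares. For application/xml and application/*+xml, the charset parameter wins if present; otherwise the normal XML rules (byte order mark, then the encoding declaration, then UTF-8) apply. The function names are made up for illustration, the in-document detection is deliberately reduced to the common cases, and none of this reflects what MSXML does.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration in an
# ASCII-compatible document. A simplification of XML appendix F.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(body: bytes) -> str:
    """Encoding according to the document itself (simplified)."""
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = _DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def rfc3023_encoding(content_type: str, body: bytes) -> str:
    """Encoding a strict RFC 3023 consumer would use (simplified)."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        # An explicit charset parameter is authoritative.
        return charset.lower()
    if msg.get_content_type() == "text/xml":
        # No charset on text/xml: us-ascii, ignoring the XML declaration.
        return "us-ascii"
    # application/xml and application/*+xml fall back to the document.
    return declared_encoding(body)


body = b'<?xml version="1.0" encoding="iso-8859-1"?><rss/>'
print(rfc3023_encoding("text/xml", body))                 # us-ascii
print(rfc3023_encoding("text/xml; charset=utf-8", body))  # utf-8
print(rfc3023_encoding("application/xml", body))          # iso-8859-1
```

The first case is the one that hurts: a feed served as text/xml with no charset parameter is treated as us-ascii even though it declares iso-8859-1, so any non-ASCII character in it becomes an error under a strict reading.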

So, “well-formed” is a conservative bet, since practically everyone requires it anyway (it’s very, very difficult to read non-well-formed XML on the Microsoft stack now). Requiring RFC 3023, on the other hand, would be an attempt to make a much bigger change to the current state of the web, and it’s not likely that the RSS platform alone is in a position to force such a sea change. Such a change would need to be made in lockstep across the whole Microsoft platform, and in concert with others in the industry.
