j4: (dodecahedron)
j4 wrote 2007-03-20 09:16 am

This is an ex-HTML

Okay, I think I'm going mad. I put the following into our CMS:
<ul>
<li> Item 1
<ul>
<li> SubItem 1</li>
<li> SubItem 2</li>
</ul>
</li>
<li> Item 2</li>
</ul>
and it (silently, without any notification) 'corrected' it to the following:
<ul>
<li>Item 1
<ul></ul></li>
<li>SubItem 1</li>
<li>SubItem 2</li>
<li>Item 2</li></ul>
I pointed this out to the people who are setting up the new site for us, and they raised it as a support call with the CMS people, and got the following response:
"Could you please use the following schema:

<ul>
<li>Item 1</li>
<ul>
<li>SubItem 1</li>
<li>SubItem 2</li>
</ul>
<li>Item 2</li>
</ul>


Such syntax is formatted correctly."
If such syntax is formatted correctly, why doesn't it validate? I'm not even trying to be a validation Nazi about this (it's not as if anything that comes out of this CMS is ever going to validate anyway), it's more that I don't really want to have to 'correct' all our existing HTML to prevent it being 'corrected' by the CMS.
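A rough way to see the difference between the two versions without a full validator: in HTML 4 and XHTML 1.0 the only element allowed directly inside a ul is li, so a nested ul has to sit inside an li. The sketch below checks just that one rule using Python's standard html.parser. The names ListChildChecker and bad_ul_children are made up for illustration, and it assumes every tag is closed explicitly (as in both snippets above); it is not a substitute for real DTD validation.

```python
from html.parser import HTMLParser


class ListChildChecker(HTMLParser):
    """Record elements that sit directly inside a <ul> but are not <li>."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements, outermost first
        self.problems = []  # tag names found as direct non-<li> children of <ul>

    def handle_starttag(self, tag, attrs):
        if self.stack and self.stack[-1] == "ul" and tag != "li":
            self.problems.append(tag)
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def bad_ul_children(markup):
    """Return the list of offending direct children of any <ul> in markup."""
    checker = ListChildChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


# The markup as originally entered into the CMS (nested list inside the li).
POSTED = (
    "<ul><li> Item 1<ul><li> SubItem 1</li><li> SubItem 2</li></ul></li>"
    "<li> Item 2</li></ul>"
)

# The version suggested in the support response (nested ul directly in the ul).
SUGGESTED = (
    "<ul><li>Item 1</li><ul><li>SubItem 1</li><li>SubItem 2</li></ul>"
    "<li>Item 2</li></ul>"
)

print(bad_ul_children(POSTED))
print(bad_ul_children(SUGGESTED))
```

Under this one rule the posted version comes back clean, while the suggested version is flagged for having a ul as a direct child of a ul.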

barnacle.livejournal.com 2007-03-20 10:52 am (UTC)
Their option is dead wrong. But your option may be failing because they might be doing DTD validation and I don't know enough DTD language to be sure that it's right. There's a condition that might mean that li elements can contain EITHER block content (div, p etc.) or inline content (plain text, span, label, b, i, em, a etc.) but not both.

http://www.w3.org/TR/html4/sgml/dtd.html

states:

<!ENTITY % flow "%block; | %inline;">
...
<!ELEMENT LI - O (%flow;)* ...

The dash and the O mean that the opening LI tag is required and the closing tag is optional. The content model seems ambiguous, depending on your SGML parser's behaviour and how it treats the binding there. As I say, I don't speak DTD or SGML well enough, but that could be interpreted as "any number of either-block-or-inline elements" or "either any number of block elements or any number of inline elements."

The schema also may be tight: http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd creates a complexType called Flow which can be any one of four choices. I think choice means it has to be a single one of them, but then that one can occur multiple times. So multiple elements from the block or inline groups, but not a mixture.

Try wrapping everything within the LI in a DIV and see where that gets you.

j4.livejournal.com 2007-03-20 11:08 am (UTC)
I'm pretty sure "%block;|%inline;" is either-but-not-both, BICBW (I haven't used SGML DTDs in anger since 2003).

The CMS claims that what it's producing is XHTML. <html xmlns="http://www.w3.org/1999/xhtml">

    is block, not inline - isn't it? - so I'm not sure why wrapping the
      in an
    • inside a
      would be different. Or am I missing your point? I'm getting increasingly confused here. :-(

barnacle.livejournal.com 2007-03-20 12:12 pm (UTC)
Me too. Do you think LiveJournal 0.2 will use something like wiki-ish?

If %flow; is %block;|%inline;, then it's either but not both. But then what does (%flow;)* mean, which is the content of LI elements? Does it mean lots of block XOR lots of inline, or does it mean lots of things, each of which can either be block or inline? My answer is a shrug.

It seems that DTD and schema validation could produce different results with your version. Given that in the schema Flow is an extension of a complexType (which can contain #PCDATA), then yours should work there; in the DTD (%flow;)* might match something that itself matches %inline;%block;, or it might not.
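One way to poke at the "(%flow;)*" question: a DTD parser replaces parameter entity references textually before it reads the content model, so "(%flow;)*" becomes "(%block; | %inline;)*", a repeated choice in which each child independently picks an alternative. Here is a minimal sketch of that reading in Python. The ENTITIES table is a cut-down stand-in (the real %block; and %inline; lists in the HTML 4 DTD are much longer and nest further entities), and expand and li_content_ok are hypothetical helper names, not part of any real validator.

```python
import re

# Simplified stand-ins for the real entity definitions (assumption: the
# actual HTML 4 lists are much longer and reference further entities).
ENTITIES = {
    "flow": "%block; | %inline;",
    "block": "P | DIV | UL",
    "inline": "#PCDATA | EM | A",
}


def expand(model):
    """Textually substitute %name; references until none remain."""
    pattern = re.compile(r"%(\w+);")
    while pattern.search(model):
        model = pattern.sub(lambda m: ENTITIES[m.group(1)], model)
    return model


def li_content_ok(children):
    """Check a sequence of child names against the expanded (%flow;)* model."""
    names = [n.strip() for n in expand("%flow;").split("|")]
    allowed = "|".join(re.escape(n) for n in names)
    # (a | b | c)* : zero or more items, each independently any alternative.
    sequence = "".join(child + " " for child in children)
    return re.fullmatch(rf"(?:(?:{allowed}) )*", sequence) is not None


print(expand("(%flow;)*"))
print(li_content_ok(["#PCDATA", "UL"]))  # text followed by a nested list
print(li_content_ok(["TABLE"]))          # not in this cut-down model
```

On this reading, text followed by a nested UL inside an LI matches the model, because the star applies to the whole group rather than to each alternative separately.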

j4.livejournal.com 2007-03-20 12:27 pm (UTC)
Oh, arsebiscuits, sorry about LJ's HTML-eating, there; there really really needs to be a "verbatim" setting where nothing gets auto-interpreted (or rather, bollocksed). (I suppose the standard OSS answer applies: if I want it, I should write it rather than whinging.) Did what I said make any sense in the email notification?

Coming back to yer blocks/inlines, I am pretty sure (after some hasty revision of what SGML knowledge I used to have) that %block;|%inline; means one-or-more block XOR one-or-more inline.

rgl.livejournal.com 2007-03-20 11:20 am (UTC)
I don't think wrapping it in a DIV will help: the definition of %inline; includes #PCDATA, which makes it a Mixed Content Model and means you can have text arbitrarily intermingled with the various options specified in the definition of %inline;.

barnacle.livejournal.com 2007-03-20 12:00 pm (UTC)
Yes, I know. "<li><div> fooItem 1> ... </ul> </div>", would be an option to get round the li's requirements, though. I don't know about the content in div: that might be "Flow" too.

barnacle.livejournal.com 2007-03-20 12:05 pm (UTC)
Rrgh. Bloody LiveJournal's halfway-house HTML comprehension nubbins bollocks ARGH.