Browsers Always Assume TBODY

In 2019, I noticed that I had this unpublished draft from 2011. I made some quick edits and then hit the figurative publish button.

Here’s a fun fact regarding web browsers and your markup: It does not matter if your HTML has a tbody element. If you have a table with at least one table row (tr) that isn’t part of a header or footer section, you will get a tbody in the DOM. Put another way, every tr element will have a parent that is not table, even though it’s perfectly valid to write HTML with tr children of table.

Haven’t heard of tbody? It is a collection of some or all of the rows in a table. Each row must belong to the table’s header (thead), to a tbody, or to the footer (tfoot). While an HTML document can only have one body, an HTML table can have more than one tbody.

The fact that table elements get implied tbody sections is nothing new, but it is rather easy to overlook.

I have confirmed this behavior in major browsers: IE 7/8/9, Firefox 3.6 and 6, Safari 5.1, Opera 11.51, Safari/iOS 4, Android 2.

HTML loves tbody.

A few minutes with the HTML5 section on tbody, and even the HTML 4 table spec, make it clear that, in HTML, a tbody is implied when the browser sees a tr element that is not already in a thead, tfoot, or tbody.

That is, the only elements that can truly[1] be direct children of a table element are, in order:

  • caption – optional.
  • col or colgroup – zero or more.
  • thead – optional.
  • tfoot – optional. HTML 4 places it before tbody so the table footer can be rendered before the entire table is downloaded; HTML5 also permits it after the tbody sections.
  • tbody – one or more. Implied, if not explicit.

The tbody opening and closing tags are optional, if that makes sense.

Your parser might not love tbody.

The sneakiness of tbody can be a problem. A number of server-side parsing libraries, at least, aren’t tbody-savvy.

One such library is Nokogiri, at least as compiled on my systems. (Nokogiri actually uses different XML parsers depending on where it’s used, which has got to introduce incredibly frustrating edge-case bugs.)

This means that the DOM constructed by a tool like Nokogiri and the one constructed by a browser may differ. This is, in fact, how I discovered this behavior. While building Blogic, I was implementing functionality that allowed users to select parts of the DOM from an existing webpage to be removed or replaced in the creation of a template. The selection happened in the browser; our code described the element's position in the DOM; and then the template creation happened server-side based on that description. The browser saw a tbody element where the server-side code didn’t, which was an interesting bug to track down and work around. (See work-arounds, below.)
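To make the mismatch concrete, here is a minimal sketch in Python. It uses the standard library's html.parser as a stand-in for a parser that is not tbody-savvy; it is not Nokogiri, and it says nothing about how Nokogiri behaves today. The class and function names (RowParentRecorder, row_parents) are made up for this illustration. The parser only reports the tags that are literally in the markup, so each tr appears to sit directly under table, whereas a browser would report an implied tbody as the parent.

```python
from html.parser import HTMLParser


class RowParentRecorder(HTMLParser):
    """Records the nearest open ancestor of every <tr> start tag.

    Hypothetical helper for illustration: html.parser does no HTML5 tree
    construction, so it never inserts an implied <tbody>.
    """

    def __init__(self):
        super().__init__()
        self.open_tags = []    # stack of currently open tag names
        self.row_parents = []  # parent tag name seen for each <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            parent = self.open_tags[-1] if self.open_tags else None
            self.row_parents.append(parent)
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def row_parents(markup):
    """Return the apparent parent tag of each <tr> in the markup."""
    recorder = RowParentRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.row_parents


markup = "<table><tr><td>1</td></tr><tr><td>2</td></tr></table>"
parents = row_parents(markup)  # every entry is "table", never "tbody"
```

Any position description generated in the browser (where the path runs table, tbody, tr) will therefore fail to line up with a tree built this way.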

Note: The above information was correct as of 2011. I do not know whether it is correct today.

You might not love tbody when writing CSS.

Another time the sneaky nature of tbody elements might break your code is when you expect child selectors like table > tr > td to match (maybe you wish to avoid selecting nested tables’ cells). Not gonna cut it: in the DOM, the tr is a child of a tbody, not of the table. Consider table > * > tr > td instead.
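The same effect can be sketched outside the browser. The example below is only an analogy: it uses Python's xml.etree.ElementTree and its path syntax as a stand-in for CSS child combinators, on a hand-written tree that mimics what a browser builds (implied tbody written out). The names browser_dom and cell_texts are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The tree a browser would build for a table written without tbody tags,
# with the implied tbody elements spelled out. One cell holds a nested table.
browser_dom = ET.fromstring(
    "<table><tbody><tr><td>outer"
    "<table><tbody><tr><td>inner</td></tr></tbody></table>"
    "</td></tr></tbody></table>"
)


def cell_texts(table, path):
    """Return the leading text of every td matched by a child-only path."""
    return [td.text for td in table.findall(path)]


# Analogous to "table > tr > td": no tr is a direct child, so nothing matches.
direct = cell_texts(browser_dom, "./tr/td")

# Analogous to "table > * > tr > td": steps through whichever section
# (thead, tbody, tfoot) holds the row, and still skips the nested table.
via_section = cell_texts(browser_dom, "./*/tr/td")
```

The wildcard step accepts any row group, so the match does not depend on whether the section was written explicitly or implied.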


Work-arounds

If you don’t want to be surprised by any of this, I recommend explicitly defining thead, tbody, and/or tfoot sections for your tables.

But if you, like me, are dealing with HTML generated by others, you’re just going to have to deal with it.

In my case, that is going to mean massaging the DOM tree that Nokogiri creates by:

  1. Creating a tbody for each table if none exists.

  2. Moving any table > tr elements into the table's tbody element.

Note: Because tables can have multiple implied tbody elements separated by an explicit tbody element, the above algorithm is actually too simple and may cause table content re-ordering. Caveat emptor.
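One way to avoid the re-ordering problem is to wrap each run of consecutive bare rows in its own tbody, in place. The following is a minimal sketch of that idea in Python using xml.etree.ElementTree rather than Nokogiri, so it assumes well-formed, XML-parseable markup; wrap_bare_rows and normalize_tables are hypothetical names, and this is an approximation of what browsers do, not a full implementation of the HTML parsing rules.

```python
import xml.etree.ElementTree as ET


def wrap_bare_rows(table):
    """Wrap each run of consecutive <tr> children of `table` in a new <tbody>.

    Existing children (thead, tbody, tfoot, caption, ...) keep their
    positions, so row order is preserved.
    """
    new_children = []
    current = None  # the tbody collecting the current run of bare rows
    for child in list(table):
        if child.tag == "tr":
            if current is None:
                current = ET.Element("tbody")
                new_children.append(current)
            current.append(child)
        else:
            current = None  # any other element ends the current run
            new_children.append(child)
    # Swap the old child list for the regrouped one.
    for child in list(table):
        table.remove(child)
    table.extend(new_children)


def normalize_tables(root):
    """Apply wrap_bare_rows to every table under `root`, nested ones included."""
    # Collect the tables first so the tree is not modified while iterating.
    for table in list(root.iter("table")):
        wrap_bare_rows(table)


doc = ET.fromstring(
    "<div><table>"
    "<tr><td>a</td></tr>"
    "<tbody><tr><td>b</td></tr></tbody>"
    "<tr><td>c</td></tr>"
    "</table></div>"
)
normalize_tables(doc)
```

In this example the table ends up with three tbody children holding a, b, and c in their original order, rather than a single tbody with the bare rows pulled together.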

At this point, I started to wonder if Nokogiri and/or its component libraries at least create a ghost tr if they see a td directly in a table… but I did not go down this particular rabbit hole. I will leave it as an exercise for a masochistic reader.

HTML. It sure has its surprises, doesn’t it?

  1. What I mean by this is “in the DOM” as opposed to in HTML. If this does not make sense, think about it this way: Browsers construct, show, and allow interaction with a webpage, which is internally represented by the “DOM” (Document Object Model). HTML is a language to describe the initial state of the DOM that the browser will create. HTML allows some shortcuts. For example, you may not need to close your p tags. Omitting tbody is one such shortcut. ↩︎

July 10th, 2019
Alan Hogan (@alanhogan_com)