Why isn't a carriage return rendered as a line break when assigned to textContent, even though the element is formatted with white-space: pre-wrap?

I'm confused about the rendering behavior of a string assigned to textContent when it contains a \r vs \n.

MDN says:

The textContent property of the Node interface represents the text content of the node and its descendants.

Then why is text containing \r, assigned to textContent on an element with a pre-wrap style, not rendered with line breaks, while text containing \n is?

var textWithCR = "line1\rline2\r";
document.getElementById('crWithTextContent').textContent = textWithCR;
document.getElementById('crWithInnerHTML').innerHTML = textWithCR;
document.getElementById('crWithInnerText').innerText = textWithCR;
  
var textWithLF = "line3\nline4\n";
document.getElementById('lfWithTextContent').textContent = textWithLF;
document.getElementById('lfWithInnerHTML').innerHTML = textWithLF;
document.getElementById('lfWithInnerText').innerText = textWithLF;

.formatted {
  white-space: pre-wrap;
}

<div id="crWithTextContent" class="formatted"></div><br/>
<div id="crWithInnerHTML" class="formatted"></div><br/>
<div id="crWithInnerText" class="formatted"></div><br/>

<div id="lfWithTextContent" class="formatted"></div><br/>
<div id="lfWithInnerHTML" class="formatted"></div><br/>
<div id="lfWithInnerText" class="formatted"></div>

I also looked at the spec, which says:

This attribute returns the text content of this node and its descendants. [...]
On getting, no serialization is performed, the returned string does not contain any markup.
No whitespace normalization is performed and the returned string does not contain the white spaces in element content [...]

Well, if "the returned string does not contain the white spaces in element content", then why does it seem, in the following code, that \n is present when reading textContent back (by printing it to the console), while \r is not?

var textWithCR = "line1\rline2\r";
document.getElementById('crWithTextContent').textContent = textWithCR;
var textWithLF = "line3\nline4\n";
document.getElementById('lfWithTextContent').textContent = textWithLF;

console.log(document.getElementById('crWithTextContent').textContent);
console.log(document.getElementById('lfWithTextContent').textContent);

.formatted {
  white-space: pre-wrap;
}

<div id="crWithTextContent" class="formatted"></div><br/>
<div id="lfWithTextContent" class="formatted"></div>

What's the reason for this textContent behavior when it contains a \r?

asked Sep 08 '25 10:09 by OfirD


1 Answer

Your \r (U+000D CR) is there at index 5:

const elem = document.getElementById('test');
elem.textContent = "line1\rline2\r";

console.log( elem.textContent );
console.log( elem.textContent.charCodeAt( 5 ) ); // 13
console.log( "\r".charCodeAt( 0 ) ); // same char

<div id="test"></div>
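The same check can be made on the plain string, outside the browser. The sketch below is a minimal, DOM-free illustration; the helper name `codePoints` is hypothetical. `JSON.stringify` is a handy way to make control characters such as \r visible when logging, since the console otherwise prints them raw.

```javascript
// Minimal sketch: list the code points of a string so invisible
// characters such as CR (13) and LF (10) show up as numbers.
function codePoints(str) {
  return Array.from(str, (ch) => ch.codePointAt(0));
}

const text = "line1\rline2\r";
console.log(codePoints(text)); // CR shows up as 13 at indices 5 and 11
console.log(text.indexOf("\r")); // 5
// JSON.stringify escapes control characters, so the CR is printed as \r
console.log(JSON.stringify(text));
```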

The problem you are facing is that CSS doesn't treat U+000D CR as a segment break (a line feed, U+000A, is one), and neither does HTML.
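To make the distinction concrete, here is a deliberately simplified model, assuming that only LF counts as a segment break. `splitAtSegmentBreaks` is a hypothetical name; it returns the text between breaks and ignores soft wrapping and everything else a real layout engine does.

```javascript
// Simplified, hypothetical model: under white-space: pre-wrap, only LF
// (U+000A) forces a break here; CR (U+000D) is just another character.
function splitAtSegmentBreaks(text) {
  return text.split("\n");
}

console.log(splitAtSegmentBreaks("line3\nline4\n").length); // 3 (the last piece is empty)
console.log(splitAtSegmentBreaks("line1\rline2\r").length); // 1: CR never splits
```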

When HTML normalizes newlines, it converts every \r\n sequence to \n and then every remaining \r to \n, which effectively gets rid of all lone \r characters. However, the Node.textContent setter doesn't run this normalize-newlines algorithm, so the \r characters are not converted to \n and are not interpreted as segment breaks.
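A minimal sketch of the two replacement steps described above, written as plain string operations. The function name `normalizeNewlines` is ours for illustration, not a browser API.

```javascript
// Sketch of newline normalization: first every CR LF pair becomes LF,
// then every remaining lone CR becomes LF.
function normalizeNewlines(input) {
  return input.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}

const normalized = normalizeNewlines("line1\rline2\r");
console.log(normalized === "line1\nline2\n"); // true
console.log(normalized.charCodeAt(5)); // 10
```

The order of the two replacements matters: converting lone \r first would turn each \r\n into \n\n.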

For the \r characters to be converted, you would need to set your element's content by another means that runs the newline normalization (for example innerHTML, which goes through the HTML parser), but doing so, you will lose your original data.

const elem = document.getElementById('test');
elem.innerHTML = "line1\rline2\r";

console.log( elem.textContent );
console.log( elem.textContent.charCodeAt( 5 ) ); // converted to \n (U+000A => 10)

#test { white-space: pre-wrap }

<div id="test"></div>
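The data loss can be illustrated without a DOM. Assuming the normalization described in this answer, different line-ending styles collapse into one string; the `normalize` helper below is a hypothetical stand-in for what the parser does.

```javascript
// Hypothetical helper mirroring newline normalization (CR LF -> LF, then CR -> LF)
const normalize = (s) => s.replace(/\r\n/g, "\n").replace(/\r/g, "\n");

const originals = ["line1\rline2\r", "line1\r\nline2\r\n", "line1\nline2\n"];
const results = originals.map(normalize);

// All three inputs end up as the same string, so the original line
// endings cannot be recovered from the normalized content afterwards.
console.log(new Set(results).size); // 1
console.log(results[0] === "line1\nline2\n"); // true
```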
answered Sep 10 '25 04:09 by Kaiido