I'm confused about the rendering behavior of a string assigned to textContent when it contains a \r vs. a \n.
MDN says:
The textContent property of the Node interface represents the text content of the node and its descendants.
Then why is text assigned to textContent that contains a \r, on an element styled with white-space: pre-wrap, not rendered with line breaks, while text containing \n is?
var textWithCR = "line1\rline2\r";
document.getElementById('crWithTextContent').textContent = textWithCR;
document.getElementById('crWithInnerHTML').innerHTML = textWithCR;
document.getElementById('crWithInnerText').innerText = textWithCR;
var textWithLF = "line3\nline4\n";
document.getElementById('lfWithTextContent').textContent = textWithLF;
document.getElementById('lfWithInnerHTML').innerHTML = textWithLF;
document.getElementById('lfWithInnerText').innerText = textWithLF;
.formatted {
white-space: pre-wrap;
}
<div id="crWithTextContent" class="formatted"></div><br/>
<div id="crWithInnerHTML" class="formatted"></div><br/>
<div id="crWithInnerText" class="formatted"></div><br/>
<div id="lfWithTextContent" class="formatted"></div><br/>
<div id="lfWithInnerHTML" class="formatted"></div><br/>
<div id="lfWithInnerText" class="formatted"></div>
I also looked at the spec, which says:
This attribute returns the text content of this node and its descendants. [...]
On getting, no serialization is performed, the returned string does not contain any markup.
No whitespace normalization is performed and the returned string does not contain the white spaces in element content [...]
Well, if "the returned string does not contain the white spaces in element content", then why does it seem that, in the following code, \n exists when we get textContent (by printing it to the console), while \r doesn't?
var textWithCR = "line1\rline2\r";
document.getElementById('crWithTextContent').textContent = textWithCR;
var textWithLF = "line3\nline4\n";
document.getElementById('lfWithTextContent').textContent = textWithLF;
console.log(document.getElementById('crWithTextContent').textContent);
console.log(document.getElementById('lfWithTextContent').textContent);
.formatted {
white-space: pre-wrap;
}
<div id="crWithTextContent" class="formatted"></div><br/>
<div id="lfWithTextContent" class="formatted"></div>
What's the reason for this textContent behavior when it contains a \r?
Your \r (U+000D CR) is there at index 5:
const elem = document.getElementById('test');
elem.textContent = "line1\rline2\r";
console.log( elem.textContent );
console.log( elem.textContent.charCodeAt( 5 ) ); // 13
console.log( "\r".charCodeAt( 0 ) ); // same char
<div id="test"></div>
The problem you are facing is that neither CSS nor HTML defines U+000D CR as a segment break.
When HTML normalizes newlines, it converts all \r\n sequences to \n and then all remaining \r to \n, effectively getting rid of all lone \r characters. However, Node.textContent doesn't call this normalize-newlines algorithm, so they are not converted to \n and are not interpreted as segment breaks.
For this to happen, you would need to set your element's content by another means that calls this algorithm, but in doing so you will lose your original data.
const elem = document.getElementById('test');
elem.innerHTML = "line1\rline2\r";
console.log( elem.textContent );
console.log( elem.textContent.charCodeAt( 5 ) ); // converted to \n (U+000A => 10)
#test { white-space: pre-wrap }
<div id="test"></div>
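If you would rather keep using textContent, one option is to apply the same two-step normalization yourself before assigning. This is only a minimal sketch: normalizeNewlines is a hypothetical helper name, not a DOM API, and the sample string is made up for illustration.

```javascript
// Hypothetical helper (not part of the DOM): mirrors the two-step
// newline normalization described above.
function normalizeNewlines(text) {
  return text
    .replace(/\r\n/g, "\n") // step 1: each CRLF pair becomes a single LF
    .replace(/\r/g, "\n");  // step 2: any remaining lone CR becomes LF
}

const original = "line1\rline2\r\nline3";
const normalized = normalizeNewlines(original);

console.log(JSON.stringify(normalized)); // "line1\nline2\nline3"

// In a browser, assuming an element with id "test" styled with
// white-space: pre-wrap, you could then write:
// document.getElementById('test').textContent = normalized;
```

Because the normalized copy is a separate string, the original value (with its \r characters) is still available if you need it later.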