Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does "Content-type: application/json; charset=utf-8" really mean?

When I make a POST request with a JSON body to my REST service I include Content-type: application/json; charset=utf-8 in the message header. Without this header, I get an error from the service. I can also successfully use Content-type: application/json without the ;charset=utf-8 portion.

What exactly does charset=utf-8 do ? I know it specifies the character encoding but the service works fine without it. Does this encoding limit the characters that can be in the message body?

like image 504
DenaliHardtail Avatar asked Feb 13 '12 02:02

DenaliHardtail


People also ask

What does content type application JSON mean?

Content-Type. application/json. Indicates that the request body format is JSON. application/xml. Indicates that the request body format is XML.

What is the meaning of charset UTF-8?

UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. This is the meaning of “UTF”, or “Unicode Transformation Format.”

What is JSON UTF-8?

The default encoding is UTF-8, and JSON texts which are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations which cannot successfully read texts in other encodings (such as UTF-16 and UTF-32).

What is UTF-8 and why it is used?

UTF-8 is a character encoding system. It lets you represent characters as ASCII text, while still allowing for international characters, such as Chinese characters. As of the mid 2020s, UTF-8 is one of the most popular encoding systems.


2 Answers

The header just denotes what the content is encoded in. It is not necessarily possible to deduce the type of the content from the content itself, i.e. you can't necessarily just look at the content and know what to do with it. That's what HTTP headers are for, they tell the recipient what kind of content they're (supposedly) dealing with.

Content-type: application/json; charset=utf-8 designates the content to be in JSON format, encoded in the UTF-8 character encoding. Designating the encoding is somewhat redundant for JSON, since the default (only?) encoding for JSON is UTF-8. So in this case the receiving server apparently is happy knowing that it's dealing with JSON and assumes that the encoding is UTF-8 by default, that's why it works with or without the header.

Does this encoding limit the characters that can be in the message body?

No. You can send anything you want in the header and the body. But, if the two don't match, you may get wrong results. If you specify in the header that the content is UTF-8 encoded but you're actually sending Latin1 encoded content, the receiver may produce garbage data, trying to interpret Latin1 encoded data as UTF-8. If of course you specify that you're sending Latin1 encoded data and you're actually doing so, then yes, you're limited to the 256 characters you can encode in Latin1.

like image 67
deceze Avatar answered Sep 23 '22 21:09

deceze


To substantiate @deceze's claim that the default JSON encoding is UTF-8...

From IETF RFC4627:

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

      00 00 00 xx  UTF-32BE       00 xx 00 xx  UTF-16BE       xx 00 00 00  UTF-32LE       xx 00 xx 00  UTF-16LE       xx xx xx xx  UTF-8 
like image 28
Drew Noakes Avatar answered Sep 24 '22 21:09

Drew Noakes



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!