I've been reading around, and it seems there is no coherent, fully accepted terminology for the parts of a URL. Is that true? I'd like to know which standards exist for URL part terminology. Which is the most common? Is there a well-established standard?
I found the following:
     foo://example.com:8042/over/there?name=ferret#nose
     \_/   \______________/\_________/ \_________/ \__/
      |           |            |            |        |
   scheme     authority       path        query   fragment
      |   _____________________|__
     / \ /                        \
     urn:example:animal:ferret:nose
window.location from JavaScript in browsers
protocol://username:password@hostname:port/pathname?search#hash
-----------------------------href------------------------------
                             -----host----
-----------      origin      -------------
protocol - protocol scheme of the URL, including the final ':'
hostname - domain name
port - port number
pathname - /pathname
search - ?parameters
hash - #fragment_identifier
username - username specified before the domain name
password - password specified before the domain name
href - the entire URL
origin - protocol://hostname:port
host - hostname:port
Node.js url module
In the diagram below, above the line with the URL you see the old API of node's url module, whilst under the line you see the new API. It seems node shifted from RFC terminology to more browser-friendly terminology, that is, terminology similar to the browser's window.location.
┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              href                                              │
├──────────┬──┬─────────────────────┬────────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │          host          │           path            │ hash  │
│          │  │                     ├─────────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │    hostname     │ port │ pathname │     search     │       │
│          │  │                     │                 │      │          ├─┬──────────────┤       │
│          │  │                     │                 │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.example.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │    hostname     │ port │          │                │       │
│          │  │          │          ├─────────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │          host          │          │                │       │
├──────────┴──┼──────────┴──────────┼────────────────────────┤          │                │       │
│   origin    │                     │         origin         │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴────────────────────────┴──────────┴────────────────┴───────┤
│                                              href                                              │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
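The lower half of the diagram can be checked against the WHATWG URL class, which is available as a global in both Node.js and browsers. This is a minimal sketch using the diagram's own example URL; the `parts` object is only an illustrative name.

```javascript
// Parse the example URL from the diagram with the WHATWG URL class.
const url = new URL('https://user:pass@sub.example.com:8080/p/a/t/h?query=string#hash');

// Collect the properties named in the lower half of the diagram.
const parts = {
  protocol: url.protocol, // 'https:'
  username: url.username, // 'user'
  password: url.password, // 'pass'
  host: url.host,         // 'sub.example.com:8080'
  hostname: url.hostname, // 'sub.example.com'
  port: url.port,         // '8080' (a string, not a number)
  pathname: url.pathname, // '/p/a/t/h'
  search: url.search,     // '?query=string'
  hash: url.hash,         // '#hash'
  origin: url.origin,     // 'https://sub.example.com:8080'
};

console.log(parts);
```

Note that `protocol`, `search` and `hash` keep their delimiters (':', '?', '#'), unlike the RFC 3986 components scheme, query and fragment.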
URL: http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s
Some of my concerns:
Is window.location a standard or based on a standard?
Shall I call http:// the protocol or the scheme?
Shall I say host or authority?
Why do neither window.location nor node have properties for the TLD or other domain parts, when available?
Is the terminological difference between hostname (example.com) and host (example.com:8080) well established?
For node, origin does not include username:password@, whereas for window.location it appears to.
I'd like to follow a well-established standard or best practices in my code.
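The hostname/host and origin concerns can be tested directly. The sketch below uses the WHATWG URL class in Node.js; to my knowledge browsers follow the same URL Standard, so window.location should behave the same way, but treat that as an assumption to verify in your target browsers. The variable names are illustrative only.

```javascript
// A URL with credentials and an explicit non-default port.
const withCredentials = new URL('https://user:pass@example.com:8080/path');

// A URL that spells out the default port for its scheme (80 for http).
const withDefaultPort = new URL('http://video.google.co.uk:80/videoplay?hl=en#00h02m30s');

const summary = {
  hostname: withCredentials.hostname,    // 'example.com'
  host: withCredentials.host,            // 'example.com:8080'
  origin: withCredentials.origin,        // 'https://example.com:8080' (no credentials)
  defaultPort: withDefaultPort.port,     // '' (the default port is dropped)
  defaultPortHost: withDefaultPort.host, // 'video.google.co.uk'
};

console.log(summary);
```

So host equals hostname whenever the port is absent or is the scheme's default, and origin never carries username:password@.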
According to Google, a canonical URL is the URL of the best representative page from a group of duplicate pages. For example, if you have two URLs for the same page (such as example.com?dress=1234 and example.com/dresses/1234), Google chooses one as canonical.
A canonical URL can be declared with an HTML tag in the <head> section of a web page. It is a way to show the search engine which page URL has the original content.
Use a rel="canonical" link tag. To indicate that a page is a duplicate of another page, you can use a <link> tag in the head section of your HTML. Suppose you want https://example.com/dresses/green-dresses to be the canonical URL, even though a variety of URLs can access this content. Each duplicate page would then carry a <link rel="canonical"> tag whose href points at that URL.
Canonical URLs consolidate links for duplicate content and help manage syndicated content. They help search engines combine information about several URLs into one authoritative URL, and they help consolidate page ranking to your preferred URL.
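As a small illustration of the rel="canonical" tag described above, the sketch below builds the tag for a given URL. `canonicalLinkTag` is a hypothetical helper written for this example, not part of any library.

```javascript
// Hypothetical helper: build a <link rel="canonical"> tag for an absolute URL.
function canonicalLinkTag(canonicalUrl) {
  // new URL() throws a TypeError if the input is not a valid absolute URL.
  const href = new URL(canonicalUrl).href;
  // Escape the characters that matter inside a double-quoted HTML attribute.
  const escaped = href.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  return `<link rel="canonical" href="${escaped}">`;
}

const tag = canonicalLinkTag('https://example.com/dresses/green-dresses');
console.log(tag);
```

The resulting string would go in the head section of every page that duplicates the canonical one.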
The URI standard is STD 66. This is currently mapped to RFC 3986.
So for the generic URI syntax, these terms are authoritative, currently:
scheme
authority
userinfo
host
port
path
query
fragment
Terminology depends on which architectural style/technology you are using.
I prefer the REST style for identifying the different parts of my URL (REST URI Standard).
But I repeat: there is no single universal standard for representing a URL.
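For the RFC 3986 names (scheme, authority, userinfo, host, port, path, query, fragment), the split can be sketched with the regular expression given in Appendix B of RFC 3986. `splitUri` and the authority sub-split are hypothetical helpers written for this example; they do no validation, so take this as a sketch rather than a conforming parser.

```javascript
// Regular expression from RFC 3986, Appendix B. Every group is optional,
// so it matches any input string.
const URI_PATTERN = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: split a URI reference into RFC 3986 components.
function splitUri(uri) {
  const m = URI_PATTERN.exec(uri);
  const result = {
    scheme: m[2],
    authority: m[4],
    userinfo: undefined,
    host: undefined,
    port: undefined,
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
  if (result.authority !== undefined) {
    // authority = [ userinfo "@" ] host [ ":" port ]
    const at = result.authority.lastIndexOf('@');
    if (at !== -1) result.userinfo = result.authority.slice(0, at);
    const hostPort = result.authority.slice(at + 1);
    // A port is a run of digits after the last colon; this leaves
    // bracketed IPv6 literals such as [::1] intact.
    const portMatch = /^(.*):(\d*)$/.exec(hostPort);
    result.host = portMatch ? portMatch[1] : hostPort;
    if (portMatch) result.port = portMatch[2];
  }
  return result;
}

const hierarchical = splitUri('foo://example.com:8042/over/there?name=ferret#nose');
const urn = splitUri('urn:example:animal:ferret:nose');
console.log(hierarchical, urn);
```

Unlike the browser-style properties, these components exclude their delimiters: scheme is 'foo' rather than 'foo:', and query is 'name=ferret' rather than '?name=ferret'. The urn example has no authority at all, only a scheme and a path.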