Extracting the hostname from a URL is generally easier than parsing the domain. The hostname of a URL consists of the domain plus any subdomains. We can parse it with a regular expression that captures everything to the right of the double slash in a URL, up to the start of the path. We also strip a leading “www” (and any associated integers, e.g. www2), as this is typically not needed when parsing the hostname from a URL.
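As a minimal sketch of this approach in Python: the function name `get_hostname` and the exact pattern below are illustrative assumptions, not necessarily the expression the article used. The pattern skips an optional scheme and double slash, skips a leading "www" with optional digits, and captures everything up to the first slash, colon, question mark, or hash.

```python
import re

# Assumed pattern for illustration:
#   (?:(?:scheme:)?//)?  optional scheme and double slash
#   (?:www\d*\.)?        optional "www", "www2", ... prefix
#   ([^/:?#]+)           the hostname itself
_HOSTNAME_RE = re.compile(
    r"^(?:(?:[a-z][a-z0-9+.-]*:)?//)?(?:www\d*\.)?([^/:?#]+)",
    re.IGNORECASE,
)

def get_hostname(url):
    """Return the lowercased hostname without a leading 'www', or None."""
    match = _HOSTNAME_RE.match(url.strip())
    return match.group(1).lower() if match else None

print(get_hostname("https://www2.blog.example.co.uk:8080/a?b=1"))
```

This sketch does not handle URLs that embed credentials (`user:password@host`) or IPv6 literals; for those cases a full URL parser such as `urllib.parse.urlsplit` is a safer choice.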
The above code will successfully parse the hostnames for the following example URLs:
In the above code, we take the hostname from the URL, split it on periods, and reverse the resulting list of parts. We then join the first two parts of the reversed list (that is, the last two parts of the hostname) to form the base domain. Where the TLD rules for the domain require it, as in the case of .co.uk, we prepend an additional part.
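The steps just described can be sketched in Python as follows. The function name `get_domain` and the small `TWO_PART_SUFFIXES` set are assumptions made for illustration; the set is deliberately incomplete, and a real implementation would normally consult the full Public Suffix List. The function expects a hostname that has already been extracted from the URL.

```python
# Illustrative, incomplete set of two-part suffixes (an assumption for this
# sketch; the Public Suffix List is the authoritative source).
TWO_PART_SUFFIXES = {"co.uk", "org.uk", "ac.uk", "com.au", "co.jp", "co.nz"}

def get_domain(hostname):
    """Return the registrable domain for an already-extracted hostname."""
    hostname = hostname.lower()
    parts = hostname.split(".")
    parts.reverse()  # "blog.example.co.uk" -> ["uk", "co", "example", "blog"]

    if len(parts) < 2:
        return hostname  # single-label host such as "localhost"

    # Last two labels of the hostname, restored to their original order.
    domain = parts[1] + "." + parts[0]

    # If those two labels are themselves a suffix, prepend one more label.
    if domain in TWO_PART_SUFFIXES and len(parts) > 2:
        domain = parts[2] + "." + domain

    return domain

print(get_domain("blog.example.co.uk"))
```

Reversing the list means the TLD always sits at index 0, so the suffix check does not depend on how many subdomain labels the hostname carries.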
The above code will successfully parse the domains for the following example URLs: