@@ -50,11 +50,12 @@ URL Parsing
5050 The URL parsing functions focus on splitting a URL string into its components,
5151 or on combining URL components into a URL string.
5252
53- .. function:: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)
53+ .. function:: urlsplit(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)
5454
55- Parse a URL into six components, returning a 6-item :term:`named tuple`. This
56- corresponds to the general structure of a URL:
57- ``scheme://netloc/path;parameters?query#fragment``.
55+ Parse a URL into five components, returning a 5-item :term:`named tuple`
56+ :class:`SplitResult` or :class:`SplitResultBytes`.
57+ This corresponds to the general structure of a URL:
58+ ``scheme://netloc/path?query#fragment``.
5859 Each tuple item is a string, possibly empty, or ``None`` if
5960 *missing_as_none* is true.
6061 Undefined components are represented by an empty string (by default) or
@@ -68,15 +69,15 @@ or on combining URL components into a URL string.
6869 .. doctest::
6970 :options: +NORMALIZE_WHITESPACE
7071
71- >>> from urllib.parse import urlparse
72- >>> urlparse("scheme://netloc/path;parameters?query#fragment")
73- ParseResult(scheme='scheme', netloc='netloc', path='/path;parameters', params='',
72+ >>> from urllib.parse import urlsplit
73+ >>> urlsplit("scheme://netloc/path?query#fragment")
74+ SplitResult(scheme='scheme', netloc='netloc', path='/path',
7475 query='query', fragment='fragment')
75- >>> o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?"
76+ >>> o = urlsplit("http://docs.python.org:80/3/library/urllib.parse.html?"
7677 ... "highlight=params#url-parsing")
7778 >>> o
78- ParseResult(scheme='http', netloc='docs.python.org:80',
79- path='/3/library/urllib.parse.html', params='',
79+ SplitResult(scheme='http', netloc='docs.python.org:80',
80+ path='/3/library/urllib.parse.html',
8081 query='highlight=params', fragment='url-parsing')
8182 >>> o.scheme
8283 'http'
@@ -88,42 +89,42 @@ or on combining URL components into a URL string.
8889 80
8990 >>> o._replace(fragment="").geturl()
9091 'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'
91- >>> urlparse("http://docs.python.org?")
92- ParseResult(scheme='http', netloc='docs.python.org',
93- path='', params='', query='', fragment='')
94- >>> urlparse("http://docs.python.org?", missing_as_none=True)
95- ParseResult(scheme='http', netloc='docs.python.org',
96- path='', params=None, query='', fragment=None)
97-
98- Following the syntax specifications in :rfc:`1808`, urlparse recognizes
92+ >>> urlsplit("http://docs.python.org?")
93+ SplitResult(scheme='http', netloc='docs.python.org', path='',
94+ query='', fragment='')
95+ >>> urlsplit("http://docs.python.org?", missing_as_none=True)
96+ SplitResult(scheme='http', netloc='docs.python.org', path='',
97+ query='', fragment=None)
98+
99+ Following the syntax specifications in :rfc:`1808`, :func:`!urlsplit` recognizes
99100 a netloc only if it is properly introduced by '//'. Otherwise the
100101 input is presumed to be a relative URL and thus to start with
101102 a path component.
102103
103104 .. doctest::
104105 :options: +NORMALIZE_WHITESPACE
105106
106- >>> from urllib.parse import urlparse
107- >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
108- ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
109- params='', query='', fragment='')
110- >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
111- ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
112- params='', query='', fragment='')
113- >>> urlparse('help/Python.html')
114- ParseResult(scheme='', netloc='', path='help/Python.html',
115- params='', query='', fragment='')
116- >>> urlparse('help/Python.html', missing_as_none=True)
117- ParseResult(scheme=None, netloc=None, path='help/Python.html',
118- params=None, query=None, fragment=None)
107+ >>> from urllib.parse import urlsplit
108+ >>> urlsplit('//www.cwi.nl:80/%7Eguido/Python.html')
109+ SplitResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
110+ query='', fragment='')
111+ >>> urlsplit('www.cwi.nl/%7Eguido/Python.html')
112+ SplitResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
113+ query='', fragment='')
114+ >>> urlsplit('help/Python.html')
115+ SplitResult(scheme='', netloc='', path='help/Python.html',
116+ query='', fragment='')
117+ >>> urlsplit('help/Python.html', missing_as_none=True)
118+ SplitResult(scheme=None, netloc=None, path='help/Python.html',
119+ query=None, fragment=None)
119120
120121 The *scheme* argument gives the default addressing scheme, to be
121122 used only if the URL does not specify one. It should be the same type
122123 (text or bytes) as *urlstring* or ``None``, except that the ``''`` is
123124 always allowed, and is automatically converted to ``b''`` if appropriate.
124125
125126 If the *allow_fragments* argument is false, fragment identifiers are not
126- recognized. Instead, they are parsed as part of the path, parameters
127+ recognized. Instead, they are parsed as part of the path
127128 or query component, and :attr:`fragment` is set to ``None`` or the empty
128129 string (depending on the value of *missing_as_none*) in the return value.
129130
@@ -140,12 +141,9 @@ or on combining URL components into a URL string.
140141 +------------------+-------+-------------------------+-------------------------------+
141142 | :attr: `path ` | 2 | Hierarchical path | empty string |
142143 +------------------+-------+-------------------------+-------------------------------+
143- | :attr: `params ` | 3 | Parameters for last | ``None `` or empty string [1 ]_ |
144- | | | path element | |
145- +------------------+-------+-------------------------+-------------------------------+
146- | :attr: `query ` | 4 | Query component | ``None `` or empty string [1 ]_ |
144+ | :attr: `query ` | 3 | Query component | ``None `` or empty string [1 ]_ |
147145 +------------------+-------+-------------------------+-------------------------------+
148- | :attr: `fragment ` | 5 | Fragment identifier | ``None `` or empty string [1 ]_ |
146+ | :attr: `fragment ` | 4 | Fragment identifier | ``None `` or empty string [1 ]_ |
149147 +------------------+-------+-------------------------+-------------------------------+
150148 | :attr: `username ` | | User name | ``None `` |
151149 +------------------+-------+-------------------------+-------------------------------+
@@ -171,26 +169,30 @@ or on combining URL components into a URL string.
171169 ``# ``, ``@ ``, or ``: `` will raise a :exc: `ValueError `. If the URL is
172170 decomposed before parsing, no error will be raised.
173171
172+ Following some of the `WHATWG spec`_ that updates :rfc:`3986`, leading C0
173+ control and space characters are stripped from the URL. ``\n``,
174+ ``\r`` and tab ``\t`` characters are removed from the URL at any position.
175+
174176 As is the case with all named tuples, the subclass has a few additional methods
175177 and attributes that are particularly useful. One such method is :meth:`_replace`.
176- The :meth:`_replace` method will return a new ParseResult object replacing specified
177- fields with new values.
178+ The :meth:`_replace` method will return a new :class:`SplitResult` object
179+ replacing specified fields with new values.
178180
179181 .. doctest::
180182 :options: +NORMALIZE_WHITESPACE
181183
182- >>> from urllib.parse import urlparse
183- >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
184+ >>> from urllib.parse import urlsplit
185+ >>> u = urlsplit('//www.cwi.nl:80/%7Eguido/Python.html')
184186 >>> u
185- ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
186- params='', query='', fragment='')
187+ SplitResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
188+ query='', fragment='')
187189 >>> u._replace(scheme='http')
188- ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
189- params='', query='', fragment='')
190+ SplitResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
191+ query='', fragment='')
190192
191193 .. warning::
192194
193- :func:`urlparse` does not perform validation. See :ref:`URL parsing
195+ :func:`urlsplit` does not perform validation. See :ref:`URL parsing
194196 security <url-parsing-security>` for details.
195197
196198 .. versionchanged :: 3.2
@@ -209,9 +211,17 @@ or on combining URL components into a URL string.
209211 Characters that affect netloc parsing under NFKC normalization will
210212 now raise :exc: `ValueError `.
211213
214+ .. versionchanged:: 3.10
215+ ASCII newline and tab characters are stripped from the URL.
216+
217+ .. versionchanged:: 3.12
218+ Leading WHATWG C0 control and space characters are stripped from the URL.
219+
212220 .. versionchanged:: next
213221 Added the *missing_as_none* parameter.
214222
223+ .. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser
224+
215225
216226 .. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
217227
@@ -306,11 +316,11 @@ or on combining URL components into a URL string.
306316 separator key, with ``& `` as the default separator.
307317
308318
309- .. function:: urlunparse(parts)
310- urlunparse(parts, *, keep_empty)
319+ .. function:: urlunsplit(parts)
320+ urlunsplit(parts, *, keep_empty)
310321
312- Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
313- argument can be any six-item iterable.
322+ Construct a URL from a tuple as returned by :func:`urlsplit`. The *parts*
323+ argument can be any five-item iterable.
314324
315325 This may result in a slightly different, but equivalent URL, if the
316326 URL that was parsed originally had unnecessary delimiters (for example,
@@ -321,97 +331,33 @@ or on combining URL components into a URL string.
321331 This allows rebuilding a URL that was parsed with option
322332 ``missing_as_none=True``.
323333 By default, *keep_empty* is true if *parts* is the result of the
324- :func:`urlparse` call with ``missing_as_none=True``.
334+ :func:`urlsplit` call with ``missing_as_none=True``.
325335
326336 .. versionchanged :: next
327337 Added the *keep_empty * parameter.
328338
329339
330- .. function :: urlsplit(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)
331-
332- This is similar to :func: `urlparse `, but does not split the params from the URL.
333- This should generally be used instead of :func: `urlparse ` if the more recent URL
334- syntax allowing parameters to be applied to each segment of the *path * portion
335- of the URL (see :rfc: `2396 `) is wanted. A separate function is needed to
336- separate the path segments and parameters. This function returns a 5-item
337- :term: `named tuple `::
338-
339- (addressing scheme, network location, path, query, fragment identifier).
340-
341- The return value is a :term: `named tuple `, its items can be accessed by index
342- or as named attributes:
343-
344- +------------------+-------+-------------------------+-------------------------------+
345- | Attribute | Index | Value | Value if not present |
346- +==================+=======+=========================+===============================+
347- | :attr: `scheme ` | 0 | URL scheme specifier | *scheme * parameter or |
348- | | | | empty string [1 ]_ |
349- +------------------+-------+-------------------------+-------------------------------+
350- | :attr: `netloc ` | 1 | Network location part | ``None `` or empty string [2 ]_ |
351- +------------------+-------+-------------------------+-------------------------------+
352- | :attr: `path ` | 2 | Hierarchical path | empty string |
353- +------------------+-------+-------------------------+-------------------------------+
354- | :attr: `query ` | 3 | Query component | ``None `` or empty string [2 ]_ |
355- +------------------+-------+-------------------------+-------------------------------+
356- | :attr: `fragment ` | 4 | Fragment identifier | ``None `` or empty string [2 ]_ |
357- +------------------+-------+-------------------------+-------------------------------+
358- | :attr: `username ` | | User name | ``None `` |
359- +------------------+-------+-------------------------+-------------------------------+
360- | :attr: `password ` | | Password | ``None `` |
361- +------------------+-------+-------------------------+-------------------------------+
362- | :attr: `hostname ` | | Host name (lower case) | ``None `` |
363- +------------------+-------+-------------------------+-------------------------------+
364- | :attr: `port ` | | Port number as integer, | ``None `` |
365- | | | if present | |
366- +------------------+-------+-------------------------+-------------------------------+
367-
368- .. [2 ] Depending on the value of the *missing_as_none * argument.
369-
370- Reading the :attr: `port ` attribute will raise a :exc: `ValueError ` if
371- an invalid port is specified in the URL. See section
372- :ref: `urlparse-result-object ` for more information on the result object.
373-
374- Unmatched square brackets in the :attr: `netloc ` attribute will raise a
375- :exc: `ValueError `.
376-
377- Characters in the :attr: `netloc ` attribute that decompose under NFKC
378- normalization (as used by the IDNA encoding) into any of ``/ ``, ``? ``,
379- ``# ``, ``@ ``, or ``: `` will raise a :exc: `ValueError `. If the URL is
380- decomposed before parsing, no error will be raised.
381-
382- Following some of the `WHATWG spec `_ that updates RFC 3986, leading C0
383- control and space characters are stripped from the URL. ``\n ``,
384- ``\r `` and tab ``\t `` characters are removed from the URL at any position.
385-
386- .. warning ::
387-
388- :func: `urlsplit ` does not perform validation. See :ref: `URL parsing
389- security <url-parsing-security>` for details.
390-
391- .. versionchanged :: 3.6
392- Out-of-range port numbers now raise :exc: `ValueError `, instead of
393- returning ``None ``.
394-
395- .. versionchanged :: 3.8
396- Characters that affect netloc parsing under NFKC normalization will
397- now raise :exc: `ValueError `.
398-
399- .. versionchanged :: 3.10
400- ASCII newline and tab characters are stripped from the URL.
401-
402- .. versionchanged :: 3.12
403- Leading WHATWG C0 control and space characters are stripped from the URL.
340+ .. function :: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)
404341
405- .. versionchanged :: next
406- Added the *missing_as_none * parameter.
342+ This is similar to :func:`urlsplit`, but additionally splits the *path*
343+ component into *path* and *params*.
344+ This function returns a 6-item :term:`named tuple` :class:`ParseResult`
345+ or :class:`ParseResultBytes`.
346+ Its items are the same as for the :func:`!urlsplit` result, except that
347+ *params* is inserted at index 3, between *path* and *query*.
407348
408- .. _WHATWG spec : https://url.spec.whatwg.org/#concept-basic-url-parser
349+ This function is based on the obsolete :rfc:`1738` and :rfc:`1808`, which
350+ listed *params* as a main URL component.
351+ The more recent URL syntax allows parameters to be applied to each segment
352+ of the *path* portion of the URL (see :rfc:`3986`).
353+ :func:`urlsplit` should generally be used instead of :func:`urlparse`.
354+ A separate function is needed to separate the path segments and parameters.
409355
410- .. function:: urlunsplit(parts)
411- urlunsplit(parts, *, keep_empty)
356+ .. function:: urlunparse(parts)
357+ urlunparse(parts, *, keep_empty)
412358
413- Combine the elements of a tuple as returned by :func:`urlsplit` into a
414- complete URL as a string. The *parts* argument can be any five-item
359+ Combine the elements of a tuple as returned by :func:`urlparse` into a
360+ complete URL as a string. The *parts* argument can be any six-item
415361 iterable.
416362
417363 This may result in a slightly different, but equivalent URL, if the
@@ -423,7 +369,7 @@ or on combining URL components into a URL string.
423369 This allows rebuilding a URL that was parsed with option
424370 ``missing_as_none=True``.
425371 By default, *keep_empty* is true if *parts* is the result of the
426- :func:`urlsplit` call with ``missing_as_none=True``.
372+ :func:`urlparse` call with ``missing_as_none=True``.
427373
428374 .. versionchanged :: next
429375 Added the *keep_empty * parameter.
@@ -441,7 +387,7 @@ or on combining URL components into a URL string.
441387 'http://www.cwi.nl/%7Eguido/FAQ.html'
442388
443389 The *allow_fragments* argument has the same meaning and default as for
444- :func:`urlparse`.
390+ :func:`urlsplit`.
445391
446392 .. note ::
447393
@@ -587,7 +533,7 @@ individual URL quoting functions.
587533 Structured Parse Results
588534 ------------------------
589535
590- The result objects from the :func:`urlparse`, :func:`urlsplit` and
536+ The result objects from the :func:`urlsplit`, :func:`urlparse` and
591537 :func:`urldefrag` functions are subclasses of the :class:`tuple` type.
592538 These subclasses add the attributes listed in the documentation for
593539 those functions, the encoding and decoding support described in the
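
For readers of this change, the following is a minimal sketch of the relationship the reordered documentation describes: :func:`urlsplit` leaves ``;params`` inside the path, while :func:`urlparse` splits it out at index 3. The example URL (including the ``;v=1`` parameter) is an assumption made for illustration and is not part of the diff. The sketch deliberately avoids the new *missing_as_none* and *keep_empty* parameters, which are only available in the version this change targets.

```python
from urllib.parse import urlparse, urlsplit, urlunsplit

# Hypothetical URL for illustration; ";v=1" is a parameter on the last path segment.
url = ("http://docs.python.org:80/3/library/urllib.parse.html;v=1"
       "?highlight=params#url-parsing")

split = urlsplit(url)    # 5-item SplitResult
parsed = urlparse(url)   # 6-item ParseResult

# urlsplit keeps the parameter as part of the path.
print(split.path)

# urlparse separates it into the extra "params" field at index 3.
print(parsed.path, parsed.params)

# Both result types expose the derived netloc attributes.
print(split.hostname, split.port)

# Recombining the five components gives back the original URL here,
# because it contains no unnecessary delimiters.
print(urlunsplit(split) == url)
```

Because the two results differ only in the extra ``params`` field, code that does not need per-URL parameters can switch from :func:`urlparse` to :func:`urlsplit` by dropping index 3.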