From b7863f28a5d80a22ad2892c6789db66c5cd55b57 Mon Sep 17 00:00:00 2001 From: Barry Warsaw Date: Tue, 24 Sep 2024 18:05:04 -0700 Subject: [PATCH 1/9] First round of changes --- peps/pep-0694.rst | 389 +++++++++++++++++++++++++++++----------------- 1 file changed, 244 insertions(+), 145 deletions(-) diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst index 30e3a32b8bd..6dc1784ef7c 100644 --- a/peps/pep-0694.rst +++ b/peps/pep-0694.rst @@ -38,26 +38,27 @@ Beyond the above, there are a number of major issues with the current API: not go last, possibly some hard to build packages are attempting to be built from source. -- It has very limited support for communicating back to the user, with no support - for multiple errors, warnings, deprecations, etc. It is limited entirely to the - HTTP status code and reason phrase, of which the reason phrase has been - deprecated since HTTP/2 (:rfc:`RFC 7540 <7540#section-8.1.2.4>`). +- It has very limited support for communicating back to the user, with no + support for multiple errors, warnings, deprecations, etc. It is limited + entirely to the HTTP status code and reason phrase, of which the reason + phrase has been deprecated since HTTP/2 (:rfc:`RFC 7540 + <7540#section-8.1.2.4>`). -- The metadata for a release/file is submitted alongside the file, however this - metadata is famously unreliable, and most installers instead choose to download - the entire file and read that in part due to that unreliability. +- The metadata for a release/file is submitted alongside the file, however + this metadata is famously unreliable, and most installers instead choose to + download the entire file and read that in part due to that unreliability. - There is no mechanism for allowing a repository to do any sort of sanity checks before bandwidth starts getting expended on an upload, whereas a lot of the cases of invalid metadata or incorrect permissions could be checked prior to upload. -- It has no support for "staging" a draft release prior to publishing it to the +- It has no support for "staging" a release prior to publishing it to the repository. - It has no support for creating new projects, without uploading a file. -This PEP proposes a new API for uploads, and deprecates the existing non standard +This PEP proposes a new API for uploads, and deprecates the existing legacy API. @@ -122,10 +123,11 @@ roughly two things: - This is actually fine if used as a pre-check, but it should be validated against the actual ``METADATA`` or similar files within the distribution. -- It supports a single request, using nothing but form data, that either succeeds +- It supports only a single request, using nothing but form data, that either succeeds or fails, and everything is done and contained within that single request. -We then propose a multi-request workflow, that essentially boils down to: +To address these issues, we propose a multi-request workflow, which at a high +level involves these steps: 1. Initiate an upload session. 2. Upload the file(s) as part of the upload session. @@ -136,6 +138,11 @@ All URLs described here will be relative to the root endpoint, which may be located anywhere within the url structure of a domain. So it could be at ``https://upload.example.com/``, or ``https://example.com/upload/``. +Specifically for PyPI, we propose the root URL to be +``https://upload.pypi.org/2.0``. This root URL will be considered provisional +while the feature is being tested, and will be blessed as permanent after +sufficient testing with live projects. 
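For illustration only (the exact header set is not normative, and the content type shown
is defined later in this document), a session creation request against the provisional
PyPI root would begin on the wire as::

    POST /2.0 HTTP/1.1
    Host: upload.pypi.org
    Accept: application/vnd.pypi.upload.v2+json
    Content-Type: application/vnd.pypi.upload.v2+json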
+ Versioning ---------- @@ -152,8 +159,8 @@ Endpoints Create an Upload Session ~~~~~~~~~~~~~~~~~~~~~~~~ -To create a new upload session, you can send a ``POST`` request to ``/``, -with a payload that looks like: +To create a new upload session, you can send a ``POST`` request to ``/`` +(i.e. the root URL), with a payload that looks like: .. code-block:: json @@ -162,23 +169,34 @@ with a payload that looks like: "api-version": "2.0" }, "name": "foo", - "version": "1.0" + "version": "1.0", + "nonce": "" } -This currently has three keys, ``meta``, ``name``, and ``version``. +The request includes the following top-level keys: + +``meta`` (**required**) + Describes information about the payload itself. Currently, the only + defined subkey is ``api-version`` the value of which must be the string ``"2.0"``. + +``name`` (**required**) + The name of the project that this session is attempting to add files to. + +``version`` (**required**) + The version of the project that this session is attempting to add files to. -The ``meta`` key is included in all payloads, and it describes information about the -payload itself. +``nonce`` (**optional**) + An additional client-side string input to the `"session token" `_ + algorithm. Details are provided below, but if this key is omitted, it is equivalent + to passing the empty string. -The ``name`` key is the name of the project that this session is attempting to -add files to. -The ``version`` key is the version of the project that this session is attepmting to -add files to. +Upon successful session creation, the server returns a ``201 Created`` +response. If an error occurs, the appropriate ``4xx`` code will be returned, +as described in the :ref:`session-errors` section. -If creating the session was successful, then the server must return a response -that looks like: +The successful response includes the following JSON content: .. code-block:: json @@ -200,74 +218,99 @@ that looks like: } -Besides the ``meta`` key, this response has five keys, ``urls``, ``valid-for``, -``status``, ``files``, and ``notices``. +Besides the ``meta`` key, which has the same format as the POST JSON, the +success response has the following keys: -The ``urls`` key is a dictionary mapping identifiers to related URLs to this -session. +``urls`` + A dictionary mapping :ref:`"identifiers" ` to related + URLs to this session, the details of which are provided below. -The ``valid-for`` key is an integer representing how long, in seconds, until the -server itself will expire this session (and thus all of the URLs contained in it). -The session **SHOULD** live at least this much longer unless the client itself -has canceled the session. Servers **MAY** choose to *increase* this time, but should -never *decrease* it, except naturally through the passage of time. +``valid-for`` + An integer representing how long, in seconds, until the server itself will + expire this session (and thus all of the URLs contained in it). The + session **SHOULD** live at least this much longer unless the client itself + has canceled the session. Servers **MAY** choose to *increase* this time, + but should never *decrease* it, except naturally through the passage of time. -The ``status`` key is a string that contains one of ``pending``, ``published``, -``errored``, or ``canceled``, this string represents the overall status of -the session. +``status`` + A string that contains one of ``pending``, ``published``, ``error``, or + ``canceled``, this string represents the overall :ref:`status of the + session `. 
-The ``files`` key is a mapping containing the filenames that have been uploaded -to this session, to a mapping containing details about each file. +``files`` + A mapping containing the filenames that have been uploaded to this + session, to a mapping containing details about each :ref:`file referenced + in this session `> -The ``notices`` key is an optional key that points to an array of notices that -the server wishes to communicate to the end user that are not specific to any -one file. +``notices`` + An optional key that points to an array of human-readable informational + notices that the server wishes to communicate to the end user. These + notices are specific to the overall session, not to any particular file in + the session. -For each filename in ``files`` the mapping has three keys, ``status``, ``url``, -and ``notices``. +.. _url-identifiers: -The ``status`` key is the same as the top level ``status`` key, except that it -indicates the status of a specific file. +For the ``urls`` key in the success JSON, the following subkeys are valid: -The ``url`` key is the *absolute* URL that the client should upload that specific -file to (or use to delete that file). +``upload`` + The upload endpoint for this session to initiate :ref:`file uploads + ` for each file that will be part of this upload session. -The ``notices`` key is an optional key, that is an array of notices that the server -wishes to communicate to the end user that are specific to this file. +``stage`` + The endpoint where these files are :ref:`available to be accessed + ` prior to publishing the session. This can be used to + download and verify the not-yet-public files. -The required response code to a successful creation of the session is a -``201 Created`` response and it **MUST** include a ``Location`` header that is the -URL for this session, which may be used to check its status or cancel it. +``publish`` + The endpoint which triggers :ref:`publishing this session `. -For the ``urls`` key, there are currently three keys that may appear: +``status`` + The endpoint that can be used to query the :ref:`current status + ` of this session. -The ``upload`` key, which is the upload endpoint for this session to initiate -a file upload. +``cancel`` + The endpoint that can be used to :ref:`cancel the session `. -The ``draft`` key, which is the repository URL that these files are available at -prior to publishing. +.. _session-files: -The ``publish`` key, which is the endpoint to trigger publishing the session. +The ``files`` key contains a mapping from the names of the files participating +in this session to a sub-mapping with the following keys: +``status`` + A string with the same values and semantics as the same-named + :ref:`session status key `, except that it indicates the + status of the specific referenced file. -In addition to the above, if a second session is created for the same name+version -pair, then the upload server **MUST** return the already existing session rather -than creating a new, empty one. +``url`` + The *absolute* URL that the client should use to reference this specific file. This + URL is used to retrieve, replace or delete the referenced file. If a ``nonce`` was + provided, the URL **MUST** be obfuscated with a non-guessable token as described in + the `session token `_ section. +``notices`` + An optional key with similar format and semantics as the ``notices`` + session key, except that these notices are specific to the referenced file. 
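For illustration only (the filename, URL, and notice text are hypothetical), a session
that has received a single wheel might report a ``files`` mapping such as:

.. code-block:: json

    {
      "files": {
        "foo-1.0-py3-none-any.whl": {
          "status": "pending",
          "url": "https://upload.example.com/upload/7a9b3c/foo-1.0-py3-none-any.whl",
          "notices": [
            "This file is staged and will not be visible until the session is published."
          ]
        }
      }
    }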
+ +If a second session is created for the same name-version pair while an upload +session for that pair is already ``pending``, then the upload server **MUST** +return the already existing session JSON status, along with the ``200 Ok`` +status code rather than creating a new, empty session. + + +.. _file-uploads: Upload Each File ~~~~~~~~~~~~~~~~ -Once you have initiated an upload session for one or more files, then you have -to actually upload each of those files. - -There is no set endpoint for actually uploading the file, that is given to the -client by the server as part of the creation of the upload session, and clients -**MUST NOT** assume that there is any commonality to what those URLs look like from -one session to the next. +Once an upload session has been created, the response provides the URL you can +use to upload files into that session. There is no predetermined endpoint for +uploading files into the session; the upload URL is given to the client by the +server in the session creation response JSON. Clients **MUST NOT** assume +there is any commonality to those URLs from one session to the next. -To initiate a file upload, a client sends a ``POST`` request to the upload URL -in the session, with a request body that looks like: +To initiate a file upload, a client sends a ``POST`` request to the URL given +in the ``upload`` subkey of the ``urls`` key in the session creation response. +The request body has the following format: .. code-block:: json @@ -282,28 +325,35 @@ in the session, with a request body that looks like: } -Besides the standard ``meta`` key, this currently has 4 keys: +Besides the standard ``meta`` key, the request JSON has the following +additional keys: -- ``filename``: The filename of the file being uploaded. -- ``size``: The size, in bytes, of the file that is being uploaded. -- ``hashes``: A mapping of hash names to hex encoded digests, each of these digests - are the digests of that file, when hashed by the hash identified in the name. +``filename`` + The name of the file being uploaded. - By default, any hash algorithm available via `hashlib - `_ (specifically any that can - be passed to ``hashlib.new()`` and do not require additional parameters) can - be used as a key for the hashes dictionary. At least one secure algorithm from - ``hashlib.algorithms_guaranteed`` **MUST** always be included. At the time - of this PEP, ``sha256`` specifically is recommended. +``size`` + The size in bytes of the file that is being uploaded. - Multiple hashes may be passed at a time, but all hashes must be valid for the - file. -- ``metadata``: An optional key that is a string containing the file's - `core metadata `_. +``hashes`` + A mapping of hash names to hex-encoded digests. Each of these digests are + the checksums of the file being uploaded when hashed by the algorithm + identified in the name. -Servers **MAY** use the data provided in this response to do some sanity checking -prior to allowing the file to be uploaded, which may include but is not limited -to: + By default, any hash algorithm available in `hashlib + `_ can be used as a key + for the hashes dictionary [#fn1]_. At least one secure algorithm from + ``hashlib.algorithms_guaranteed`` **MUST** always be included. At the time + of this PEP, ``sha256`` is specifically recommended. + + Multiple hashes may be passed at a time, but all hashes provided **MUST** + be valid for the file. + +``metadata`` + An optional key with a string value containing the file's `core metadata + `_. 
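The following non-normative sketch shows one way a client might assemble this payload
from a wheel on disk. The file path is hypothetical, and reading the core metadata out of
the wheel's ``*.dist-info/METADATA`` member via ``zipfile`` is an implementation choice,
not a requirement of this API:

.. code-block:: python

    import hashlib
    import json
    import zipfile
    from pathlib import Path

    path = Path("dist/foo-1.0-py3-none-any.whl")  # hypothetical artifact
    data = path.read_bytes()

    # Pull the core metadata out of the wheel's *.dist-info/METADATA file.
    with zipfile.ZipFile(path) as wheel:
        metadata_name = next(
            name for name in wheel.namelist() if name.endswith(".dist-info/METADATA")
        )
        core_metadata = wheel.read(metadata_name).decode("utf-8")

    payload = {
        "meta": {"api-version": "2.0"},
        "filename": path.name,
        "size": len(data),
        # sha256 is the recommended algorithm; other hashlib algorithms may be added.
        "hashes": {"sha256": hashlib.sha256(data).hexdigest()},
        "metadata": core_metadata,
    }
    body = json.dumps(payload)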
+ +Servers **MAY** use the data provided in this request to do some sanity checking prior to +allowing the file to be uploaded, which may include but is not limited to: - Checking if the ``filename`` already exists. - Checking if the ``size`` would invalidate some quota. @@ -313,8 +363,8 @@ If the server determines that the client should attempt the upload, it will retu a ``201 Created`` response, with an empty body, and a ``Location`` header pointing to the URL that the file itself should be uploaded to. -At this point, the status of the session should show the filename, with the above url -included in it. +At this point, the status of the session should show the filename, with the above location +URL included in it. Upload Data @@ -328,11 +378,11 @@ as that requires fewer requests and typically has better performance. However for particularly large files, uploading within a single request may result in timeouts, so larger files may need to be uploaded in multiple chunks. -In either case, the client must generate a unique token (or nonce) for each upload -attempt for a file, and **MUST** include that token in each request in the ``Upload-Token`` -header. The ``Upload-Token`` is a binary blob encoded using base64 surrounded by -a ``:`` on either side. Clients **SHOULD** use at least 32 bytes of cryptographically -random data. You can generate it using the following: +In either case, the client **MUST** generate a unique token (or nonce) for each upload for +a file, and **MUST** include that token in each request in the ``Upload-Token`` +header. The ``Upload-Token`` is a binary blob encoded using base64 surrounded by a ``:`` +on either side. Clients **SHOULD** use at least 32 bytes of cryptographically secure +data. For example, the following algorithm can be used: .. code-block:: python @@ -341,47 +391,62 @@ random data. You can generate it using the following: header = ":" + base64.b64encode(secrets.token_bytes(32)).decode() + ":" -The one time that it is permissible to omit the ``Upload-Token`` from an upload -request is when a client wishes to opt out of the resumable or chunked file upload -feature completely. In that case, they **MAY** omit the ``Upload-Token``, and the -file must be successfully uploaded in a single HTTP request, and if it fails, the +The one time that it is permissible to omit the ``Upload-Token`` from an upload request is +when a client wishes to opt out of the resumable or chunked file upload feature +completely. In that case, they **MAY** omit the ``Upload-Token``, and the file must be +successfully uploaded in a single HTTP request. If the non-chunked upload fails, the entire file must be resent in another single HTTP request. -To upload in a single chunk, a client sends a ``POST`` request to the URL from the -session response for that filename. The client **MUST** include a ``Content-Length`` -header that is equal to the size of the file in bytes, and this **MUST** match the -size given in the original session creation. +To upload the file in a single chunk, a client sends a ``POST`` request to the URL from +the session response for that filename. The client **MUST** include a ``Content-Length`` +header that is equal to the size of the file in bytes, and this **MUST** match the size +given in the original session creation. 
As an example, if uploading a 100,000 byte file, you would send headers like::

     Content-Length: 100000
     Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=:

-If the upload completes successfully, the server **MUST** respond with a
-``201 Created`` status. At this point this file **MUST** not be present in the
-repository, but merely staged until the upload session has completed.
+If the upload completes successfully, the server **MUST** respond with a ``201 Created``
+status. The response body has no content.

-To upload in multiple chunks, a client sends multiple ``POST`` requests to the same
-URL as before, one for each chunk.
+To upload the file in multiple chunks, a client sends multiple ``POST`` requests to the
+same URL as before, one for each chunk.

-This time however, the ``Content-Length`` is equal to the size, in bytes, of the
-chunk that they are sending. In addition, the client **MUST** include a
-``Upload-Offset`` header which indicates a byte offset that the content included
-in this request starts at and a ``Upload-Incomplete`` header set to ``1``.
+For chunked uploads, the ``Content-Length`` is equal to the size, in bytes, of the chunk
+that they are sending. The client **MUST** include an ``Upload-Offset`` header which
+indicates a byte offset that the content included in this request starts at and an
+``Upload-Incomplete`` header set to ``1``. For the first chunk, the ``Upload-Offset``
+header **MUST** be set to ``0``.

-As an example, if uploading a 100,000 byte file in 1000 byte chunks, and this chunk
-represents bytes 1001 through 2000, you would send headers like::
+For example, if uploading a 100,000 byte file in 1000 byte chunks, the first chunk's
+headers would be:

     Content-Length: 1000
     Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=:
-    Upload-Offset: 1001
+    Upload-Offset: 0
     Upload-Incomplete: 1

-However, the **final** chunk of data omits the ``Upload-Incomplete`` header, since
-at that point the upload is no longer incomplete.
+And the second chunk, representing bytes 1000 through 1999, would include the following
+headers:
+
+    Content-Length: 1000
+    Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=:
+    Upload-Offset: 1000
+    Upload-Incomplete: 1
+
+.. _complete-the-upload:
+
+The final chunk of data **MUST** omit the ``Upload-Incomplete`` header, since at that
+point the upload is complete.

 For each successful chunk, the server **MUST** respond with a ``202 Accepted``
-header, except for the final chunk, which **MUST** be a ``201 Created``.
+status, except for the final chunk, which **MUST** be a ``201 Created``, and as with
+non-chunked uploads, the body has no content.
+
+With both chunked and non-chunked uploads, once completed successfully, the file **MUST**
+not be publicly visible in the repository, but merely staged until the upload session has
+completed.

 The following constraints are placed on uploads regardless of whether they are
 single chunk or multiple chunks:

@@ -391,18 +456,21 @@ single chunk or multiple chunks:
   **MAY** terminate any ongoing ``POST`` request that utilizes the same
   ``Upload-Token``.
 - If the offset provided in ``Upload-Offset`` is not ``0`` or the next chunk
-  in an incomplete upload, then the server **MUST** respond with a 409 Conflict.
+  in an incomplete upload, then the server **MUST** respond with a ``409 Conflict``. This
+  means that a client **MAY NOT** upload chunks out of order.
- Once an upload has started with a specific token, you may not use another token - for that file without deleting the in progress upload. + for that file without deleting the in-progress upload. - Once a file has uploaded successfully, you may initiate another upload for - that file, and doing so will replace that file. + that file, and doing so will replace that file. This is possible until the entire + session is completed, at which point no further file uploads (either creating or + replacing a session file) is accepted. Resume Upload +++++++++++++ To resume an upload, you first have to know how much of the data the server has -already received, regardless of if you were originally uploading the file as +already received, regardless of whether you were originally uploading the file as a single chunk, or in multiple chunks. To get the status of an individual upload, a client can make a ``HEAD`` request @@ -417,8 +485,9 @@ Once the client has retrieved the offset that they need to start from, they can upload the rest of the file as described above, either in a single request containing all of the remaining data or in multiple chunks. +.. _cancel-an-upload: -Canceling an In Progress Upload +Canceling an In-Progress Upload +++++++++++++++++++++++++++++++ If a client wishes to cancel an upload of a specific file, for instance because @@ -429,14 +498,29 @@ file in the first place. A successful cancellation request **MUST** response with a ``204 No Content``. -Delete an uploaded File -+++++++++++++++++++++++ +Delete a Partial or Fully Uploaded File ++++++++++++++++++++++++++++++++++++++++ Already uploaded files may be deleted by issuing a ``DELETE`` request to the file upload URL without the ``Upload-Token``. A successful deletion request **MUST** response with a ``204 No Content``. +Replacing a Partially or Fully Uploaded File +++++++++++++++++++++++++++++++++++++++++++++ + +To replace a session file, the file upload **MUST** have been previously completed or +deleted. It is not possible to replace a session file if the upload for that file is +incomplete. Clients have two options to replace an incomplete upload: + +- `Cancel the in-progress upload `_ by issuing a ``DELETE`` of that + specific file. After this, the new file upload can be initiated. +- `Complete the in-progress upload `_ by uploading a zero-length + chunk omitting the ``Upload-Incomplete`` header. This effectively truncates and + completes the in-progress upload, after which point the new upload can commence. + + +.. _session-status: Session Status ~~~~~~~~~~~~~~ @@ -451,26 +535,28 @@ they got when they initially created the upload session, except with any changes to ``status``, ``valid-for``, or updated ``files`` reflected. +.. _session-cancellation: + Session Cancellation ~~~~~~~~~~~~~~~~~~~~ -To cancel an upload session, a client issues a ``DELETE`` request to the -same session URL as before. At which point the server marks the session as -canceled, **MAY** purge any data that was uploaded as part of that session, -and future attempts to access that session URL or any of the file upload URLs -**MAY** return a ``404 Not Found``. +To cancel an upload session, a client issues a ``DELETE`` request to the same session URL +as before. The server then marks the session as canceled, **MAY** purge any data that was +uploaded as part of that session, and future attempts to access that session URL or any of +the file upload URLs **MAY** return a ``404 Not Found``. 
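To make the distinction between these ``DELETE`` operations concrete, the following
non-normative sketch uses the third-party ``requests`` library with hypothetical session
and file URLs; authentication is omitted:

.. code-block:: python

    import requests

    # Hypothetical URLs; real values come from the session creation response
    # and the per-file upload responses.
    session_url = "https://upload.example.com/session/7a9b3c"
    file_url = session_url + "/foo-1.0-py3-none-any.whl"
    upload_token = ":nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=:"

    # Cancel an in-progress upload of a single file (expects 204 No Content).
    requests.delete(file_url, headers={"Upload-Token": upload_token})

    # Delete a file that has already been fully uploaded to the session
    # (no Upload-Token; expects 204 No Content).
    requests.delete(file_url)

    # Cancel the entire session, discarding anything staged in it.
    requests.delete(session_url)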
To prevent a lot of dangling sessions, servers may also choose to cancel a session on their own accord. It is recommended that servers expunge their sessions after no less than a week, but each server may choose their own schedule. +.. _publish-session: Session Completion ~~~~~~~~~~~~~~~~~~ To complete a session, and publish the files that have been included in it, -a client **MUST** send a ``POST`` request to the ``publish`` url in the +a client **MUST** send a ``POST`` request to the ``publish`` URL in the session status payload. If the server is able to immediately complete the session, it may do so @@ -483,11 +569,17 @@ In either case, the server should include a ``Location`` header pointing back to the session status url, and if the server returned a ``202 Accepted``, the client may poll that URL to watch for the status to change. +.. _session-errors: + +Session Previewing +~~~~~~~~~~~~~~~~~~ + +XXX TBD - talk about token Errors ------ -All Error responses that contain a body will have a body that looks like: +All error responses that contain content will have a body that looks like: .. code-block:: json @@ -504,22 +596,22 @@ All Error responses that contain a body will have a body that looks like: ] } -Besides the standard ``meta`` key, this has two top level keys, ``message`` -and ``errors``. +Besides the standard ``meta`` key, this has the following top level keys: -The ``message`` key is a singular message that encapsulates all errors that -may have happened on this request. +``message`` + A singular message that encapsulates all errors that may have happened on this + request. -The ``errors`` key is an array of specific errors, each of which contains -a ``source`` key, which is a string that indicates what the source of the -error is, and a ``message`` key for that specific error. +``errors`` + An array of specific errors, each of which contains a ``source`` key, which is a + string that indicates what the source of the error is, and a ``message`` key for that + specific error. The ``message`` and ``source`` strings do not have any specific meaning, and -are intended for human interpretation to figure out what the underlying issue -was. +are intended for human interpretation to aid in diagnosing underlying issue. -Content-Types +Content Types ------------- Like :pep:`691`, this PEP proposes that all requests and responses from the @@ -542,7 +634,7 @@ Unlike :pep:`691`, this PEP does not change the existing ``1.0`` API in any way, so servers will be required to host the new API described in this PEP at a different endpoint than the existing upload API. -Which means that for the new 2.0 API, the content types would be: +Thus for the new 2.0 API, the content type would be: - **JSON:** ``application/vnd.pypi.upload.v2+json`` @@ -553,15 +645,15 @@ that clients be explicit about what versions they support. These content types **DO NOT** apply to the file uploads themselves, only to the other API requests/responses in the upload API. The files themselves should use -the ``application/octet-stream`` content-type. +the ``application/octet-stream`` content type. Version + Format Selection -------------------------- -Again similar to :pep:`691`, this PEP standardizes on using server-driven +Again, similar to :pep:`691`, this PEP standardizes on using server-driven content negotiation to allow clients to request different versions or -serialization formats, which includes the ``format`` url parameter. +serialization formats, which includes the ``format`` URL parameter. 
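As a non-normative example, a client that speaks only this version of the API would send
the ``2.0`` content type in its ``Accept`` header on every request, or alternatively
request that serialization through the ``format`` URL parameter::

    Accept: application/vnd.pypi.upload.v2+json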
Since this PEP expects the existing legacy ``1.0`` upload API to exist at a different endpoint, and it currently only provides for JSON serialization, this @@ -725,6 +817,13 @@ you don't have to try and do any sort of protection against parallel uploads, since they're just supported. That alone might erase most of the server side implementation simplification. +Footnotes +========= +.. [#fn1] Specifically any hash algorithm name that `can be passed to + `_ + ``hashlib.new()`` which does not require additional parameters. + + Copyright ========= From 0ae84ac8e3188003b703fa20afa3364c35ce0fae Mon Sep 17 00:00:00 2001 From: Barry Warsaw Date: Wed, 25 Sep 2024 16:13:48 -0700 Subject: [PATCH 2/9] Complete the major update/rewrite of the PEP --- peps/pep-0694.rst | 165 +++++++++++++++++++++++++++++++++++----------- 1 file changed, 128 insertions(+), 37 deletions(-) diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst index 6dc1784ef7c..d48402dbfb3 100644 --- a/peps/pep-0694.rst +++ b/peps/pep-0694.rst @@ -1,6 +1,6 @@ PEP: 694 Title: Upload 2.0 API for Python Package Repositories -Author: Donald Stufft +Author: Donald Stufft , Barry Warsaw Discussions-To: https://discuss.python.org/t/pep-694-upload-2-0-api-for-python-package-repositories/16879 Status: Draft Type: Standards Track @@ -159,7 +159,7 @@ Endpoints Create an Upload Session ~~~~~~~~~~~~~~~~~~~~~~~~ -To create a new upload session, you can send a ``POST`` request to ``/`` +To create a new upload session, submit a ``POST`` request to ``/`` (i.e. the root URL), with a payload that looks like: .. code-block:: json @@ -187,7 +187,7 @@ The request includes the following top-level keys: The version of the project that this session is attempting to add files to. ``nonce`` (**optional**) - An additional client-side string input to the `"session token" `_ + An additional client-side string input to the :ref:`"session token" ` algorithm. Details are provided below, but if this key is omitted, it is equivalent to passing the empty string. @@ -206,9 +206,12 @@ The successful response includes the following JSON content: }, "urls": { "upload": "...", - "draft": "...", - "publish": "..." + "stage": "...", + "publish": "...", + "status": "...", + "cancel": "..." }, + "preview-token": "", "valid-for": 604800, "status": "pending", "files": {}, @@ -218,13 +221,19 @@ The successful response includes the following JSON content: } -Besides the ``meta`` key, which has the same format as the POST JSON, the +Besides the ``meta`` key, which has the same format as the request JSON, the success response has the following keys: ``urls`` A dictionary mapping :ref:`"identifiers" ` to related URLs to this session, the details of which are provided below. +``preview-token`` + If the index supports :ref:`previewing staged releases `, this key + will contain the unique :ref:`"preview token" ` that can be provided to + installer clients in order to preview the staged release before it's published. If + the index does *not* support stage previewing, this key **MUST** be omitted. + ``valid-for`` An integer representing how long, in seconds, until the server itself will expire this session (and thus all of the URLs contained in it). The @@ -240,7 +249,7 @@ success response has the following keys: ``files`` A mapping containing the filenames that have been uploaded to this session, to a mapping containing details about each :ref:`file referenced - in this session `> + in this session `. 
``notices`` An optional key that points to an array of human-readable informational @@ -257,9 +266,10 @@ For the ``urls`` key in the success JSON, the following subkeys are valid: ` for each file that will be part of this upload session. ``stage`` - The endpoint where these files are :ref:`available to be accessed - ` prior to publishing the session. This can be used to - download and verify the not-yet-public files. + The endpoint where this staged release can be :ref:`previewed ` prior + to publishing the session. This can be used to download and verify the not-yet-public + files. If the index does not support previewing staged releases, this key **MUST** be + omitted. ``publish`` The endpoint which triggers :ref:`publishing this session `. @@ -285,7 +295,7 @@ in this session to a sub-mapping with the following keys: The *absolute* URL that the client should use to reference this specific file. This URL is used to retrieve, replace or delete the referenced file. If a ``nonce`` was provided, the URL **MUST** be obfuscated with a non-guessable token as described in - the `session token `_ section. + the :ref:`session token ` section. ``notices`` An optional key with similar format and semantics as the ``notices`` @@ -296,6 +306,12 @@ session for that pair is already ``pending``, then the upload server **MUST** return the already existing session JSON status, along with the ``200 Ok`` status code rather than creating a new, empty session. +If a session is created for a project which has no previous releases, then the index +**MAY** reserve the project name , however it **MUST NOT** be possible to navigate to that +project using the "regular" (i.e. :ref:`unstaged `) access protocols, +*until* the stage is published. If this first-release stage gets canceled, then the index +**SHOULD** delete the project record, as if it were never uploaded. + .. _file-uploads: @@ -378,11 +394,11 @@ as that requires fewer requests and typically has better performance. However for particularly large files, uploading within a single request may result in timeouts, so larger files may need to be uploaded in multiple chunks. -In either case, the client **MUST** generate a unique token (or nonce) for each upload for -a file, and **MUST** include that token in each request in the ``Upload-Token`` -header. The ``Upload-Token`` is a binary blob encoded using base64 surrounded by a ``:`` -on either side. Clients **SHOULD** use at least 32 bytes of cryptographically secure -data. For example, the following algorithm can be used: +In either case, the client **MUST** generate a unique token for each upload for a file, +and **MUST** include that token in each request in the ``Upload-Token`` header. The +``Upload-Token`` is a binary blob encoded using base64 surrounded by a ``:`` on either +side. Clients **SHOULD** use at least 32 bytes of cryptographically secure data. For +example, the following algorithm can be used: .. code-block:: python @@ -397,10 +413,10 @@ completely. In that case, they **MAY** omit the ``Upload-Token``, and the file m successfully uploaded in a single HTTP request. If the non-chunked upload fails, the entire file must be resent in another single HTTP request. -To upload the file in a single chunk, a client sends a ``POST`` request to the URL from -the session response for that filename. The client **MUST** include a ``Content-Length`` -header that is equal to the size of the file in bytes, and this **MUST** match the size -given in the original session creation. 
+To upload the file in a single chunk, a client sends a ``POST`` request to the +``Location`` header URL from the session response for that filename. The client **MUST** +include a ``Content-Length`` header that is equal to the size of the file in bytes, and +this **MUST** match the size given in the original session creation. As an example, if uploading a 100,000 byte file, you would send headers like:: @@ -422,6 +438,8 @@ header **MUST** be set to ``0``. For example, if uploading a 100,000 byte file in 1000 byte chunks,the first chunk's headers would be: +.. code-block:: email + Content-Length: 1000 Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: Upload-Offset: 0 @@ -430,6 +448,8 @@ headers would be: And the second chunk represents bytes 1000 through 1999 would include the following headers: +.. code-block:: email + Content-Length: 1000 Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: Upload-Offset: 1000 @@ -445,8 +465,8 @@ header, except for the final chunk, which **MUST** be a ``201 Created``, and as non-chunked uploads, the body has not content. With both chunked and non-chunked uploads, once completed successfully, the file **MUST** -not be publicly visible in the repository, but merely staged until the upload session has -completed. +not be publicly visible in the repository, but merely staged until the upload session is +:ref:`completed `. The following constraints are placed on uploads regardless of whether they are single chunk or multiple chunks: @@ -460,7 +480,7 @@ single chunk or multiple chunks: means that a client **MAY NOT** upload chunks out of order. - Once an upload has started with a specific token, you may not use another token for that file without deleting the in-progress upload. -- Once a file has uploaded successfully, you may initiate another upload for +- Once a file upload has completed successfully, you may initiate another upload for that file, and doing so will replace that file. This is possible until the entire session is completed, at which point no further file uploads (either creating or replacing a session file) is accepted. @@ -513,9 +533,9 @@ To replace a session file, the file upload **MUST** have been previously complet deleted. It is not possible to replace a session file if the upload for that file is incomplete. Clients have two options to replace an incomplete upload: -- `Cancel the in-progress upload `_ by issuing a ``DELETE`` of that +- :ref:`Cancel the in-progress upload ` by issuing a ``DELETE`` of that specific file. After this, the new file upload can be initiated. -- `Complete the in-progress upload `_ by uploading a zero-length +- :ref:`Complete the in-progress upload ` by uploading a zero-length chunk omitting the ``Upload-Incomplete`` header. This effectively truncates and completes the in-progress upload, after which point the new upload can commence. @@ -545,17 +565,16 @@ as before. The server then marks the session as canceled, **MAY** purge any data uploaded as part of that session, and future attempts to access that session URL or any of the file upload URLs **MAY** return a ``404 Not Found``. -To prevent a lot of dangling sessions, servers may also choose to cancel a -session on their own accord. It is recommended that servers expunge their -sessions after no less than a week, but each server may choose their own -schedule. +To prevent dangling sessions, servers may also choose to cancel timed-out sessions on +their own accord. 
It is recommended that servers expunge their sessions after no less than +a week, but each server may choose their own schedule. .. _publish-session: Session Completion ~~~~~~~~~~~~~~~~~~ -To complete a session, and publish the files that have been included in it, +To complete a session and publish the files that have been included in it, a client **MUST** send a ``POST`` request to the ``publish`` URL in the session status payload. @@ -569,12 +588,84 @@ In either case, the server should include a ``Location`` header pointing back to the session status url, and if the server returned a ``202 Accepted``, the client may poll that URL to watch for the status to change. -.. _session-errors: +It is an error to publish a session that has no staged files. In this case, a +``400 Bad Request`` is turned and the session is canceled, just as if an +explicit :ref:`session cancellation ` was issued. -Session Previewing -~~~~~~~~~~~~~~~~~~ +.. _session-token: + +Session Token +~~~~~~~~~~~~~ + +When initiating the staged uploads, clients can provide a ``nonce``, essentially a string +with arbitrary content. The ``nonce`` is optional, and if omitted, is equivalent to +providing an empty string. + +In order to support previewing of staged uploads, the package ``name`` and ``version``, +along with this ``nonce`` are used as input into a hashing algorithm to produce a unique +"session token". This session token is valid for the life of the session (i.e., until it +is completed, either by cancellation or publishing), and can be provided to installer +clients such as ``pip`` to gain access to the staged releases. + +The use of the ``nonce`` allows clients to decide whether they want to obscure the +visibility of their staged releases or not, and there can be good reasons for either +choice. + +The `SHA256 algorithm `_ is +used to turn these inputs into a unique token, in the order ``name``, ``version``, +``nonce``, using the following Python code as an example: -XXX TBD - talk about token +.. code-block:: python + + from hashlib import sha256 + + def gentoken(name: bytes, version: bytes, nonce: bytes = b''): + h = sha256() + h.update(name) + h.update(version) + h.update(nonce) + return h.hexdigest() + +It should be evident that if no ``nonce`` is provided in the session initiation request, +then the preview token is easily guessable from the package name and version number alone. +Clients can elect to omit the ``nonce`` (or set it to the empty string themselves) if they +want to allow previewing from anybody without access to the preview token. By providing a +non-empty ``nonce``, clients can elect for security-through-obscurity, but this does not +protect staged files behind any kind of authentication. + +.. _staged-preview: + +Stage Previews +~~~~~~~~~~~~~~ + +The ability to preview staged releases before they are published is an important feature, +enabling an additional level of last-mile testing before the release is available to the +public. Indexes **MAY** provide this functionality in one or both of the following ways. + +* Through the URL provided in the ``stage`` subkey of the :ref:`URL + identifiers ` returned when the session is created. The + ``stage`` URL can be passed to installers such as ``pip`` by setting the + `--extra-index-url + `_ + flag to this value. Multiple stages can even be previewed by repeating this + flag with multiple values. 
+ +* By passing the ``Stage-Token`` header to the `Simple Repository API + `_ + requests or the :pep:`691` JSON-based Simple API, with the value from the + ``preview-token`` subkey of the JSON response to the session creation + request. Multiple ``Stage-Token`` headers are allowed. It is recommended + that installers add a ``--staged `` or similarly named option to set + the ``Stage-Token`` header at the command line. + +In both cases, the index will return views that expose the staged releases to the +installer tool, making them available to download and install into a virtual environment +built for that last-mile testing. The former option allows for existing installers to +preview staged releases with no changes, although perhaps in a less user-friendly way. +The latter option can be a better user experience, but the details of this are left to +installer tool maintainers to decide. + +.. _session-errors: Errors ------ @@ -722,7 +813,7 @@ Multipart Uploads vs tus ------------------------ This PEP currently bases the actual uploading of files on an internet draft -from tus.io that supports resumable file uploads. +from ``tus.io`` that supports resumable file uploads. That protocol requires a few things: @@ -746,7 +837,7 @@ The other benefit is that even if you do want to support resumption, you can still just ``POST`` the file, and unless you *need* to resume the download, that's all you have to do. -Another, possibly theoretical, benefit is that for hashing the uploaded files, +Another, possibly theoretical benefit is that for hashing the uploaded files, the serial chunks requirement means that the server can maintain hashing state between requests, update it for each request, then write that file back to storage. Unfortunately this isn't actually possible to do with Python's hashlib, @@ -807,7 +898,7 @@ It does have its own downsides: - See above about whether this is actually a downside in practice, or if it's just in theory. -I lean towards the tus style resumable uploads as I think they're simpler +I lean towards the ``tus`` style resumable uploads as I think they're simpler to use and to implement, and the main downside is that we possibly leave some multi-threaded performance on the table, which I think that I'm personally fine with? From 14a540238a88d2f0f68446d979d3bbfd3b57527e Mon Sep 17 00:00:00 2001 From: Barry Warsaw Date: Fri, 18 Oct 2024 16:44:11 -0700 Subject: [PATCH 3/9] Add a section on authentication --- peps/pep-0694.rst | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst index d48402dbfb3..ab58ca26f49 100644 --- a/peps/pep-0694.rst +++ b/peps/pep-0694.rst @@ -86,7 +86,6 @@ So in practice, on PyPI, the endpoint is ``https://upload.pypi.org/legacy/?:action=file_upload&protocol_version=1``. - Encoding -------- @@ -111,6 +110,15 @@ of ``content``, and if there is a PGP signature attached, then it will be includ as a ``application/octet-stream`` part with the name of ``gpg_signature``. +Authentication +-------------- + +Upload authentication is also not standardized, but on PyPI, authentication is +through `API tokens `__ or `Trusted Publisher (OpenID +Connect) `__. Other indexes may +support different authentication methods. + + Specification ============= @@ -134,15 +142,6 @@ level involves these steps: 3. Complete the upload session. 4. (Optional) Check the status of an upload session. 
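Putting the whole workflow together, the following non-normative sketch shows what a very
small client might look like. It assumes the third-party ``requests`` library, the
provisional PyPI root URL, a hypothetical ``foo 1.0`` wheel, and the session response keys
described earlier in this PEP (which are still being refined in these revisions);
authentication and error handling are omitted:

.. code-block:: python

    import hashlib
    import requests

    ROOT = "https://upload.pypi.org/2.0/"        # provisional root proposed above
    FILENAME = "foo-1.0-py3-none-any.whl"        # hypothetical artifact
    JSON_CT = "application/vnd.pypi.upload.v2+json"
    HEADERS = {"Accept": JSON_CT, "Content-Type": JSON_CT}

    # 1. Initiate an upload session for the hypothetical release foo 1.0.
    session = requests.post(
        ROOT,
        json={"meta": {"api-version": "2.0"}, "name": "foo", "version": "1.0"},
        headers=HEADERS,
    ).json()

    # 2. Announce one file, then upload its bytes in a single request
    #    (no Upload-Token header, so this upload is not resumable).
    with open(f"dist/{FILENAME}", "rb") as f:
        data = f.read()
    announce = requests.post(
        session["urls"]["upload"],
        json={
            "meta": {"api-version": "2.0"},
            "filename": FILENAME,
            "size": len(data),
            "hashes": {"sha256": hashlib.sha256(data).hexdigest()},
        },
        headers=HEADERS,
    )
    file_url = announce.headers["Location"]
    # requests sets the required Content-Length header automatically.
    requests.post(file_url, data=data,
                  headers={"Content-Type": "application/octet-stream"})

    # 3. Complete the session, publishing all staged files together.
    requests.post(session["urls"]["publish"], headers={"Accept": JSON_CT})

    # 4. Optionally check the overall session status.
    status = requests.get(session["urls"]["status"], headers={"Accept": JSON_CT}).json()
    print(status["status"])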
-All URLs described here will be relative to the root endpoint, which may be -located anywhere within the url structure of a domain. So it could be at -``https://upload.example.com/``, or ``https://example.com/upload/``. - -Specifically for PyPI, we propose the root URL to be -``https://upload.pypi.org/2.0``. This root URL will be considered provisional -while the feature is being tested, and will be blessed as permanent after -sufficient testing with live projects. - Versioning ---------- @@ -156,6 +155,16 @@ that API in any way. Endpoints --------- +All URLs described here will be relative to the root endpoint, which may be +located anywhere within the url structure of a domain. So it could be at +``https://upload.example.com/``, or ``https://example.com/upload/``. + +Specifically for PyPI, we propose the root URL to be +``https://upload.pypi.org/2.0``. This root URL will be considered provisional +while the feature is being tested, and will be blessed as permanent after +sufficient testing with live projects. + + Create an Upload Session ~~~~~~~~~~~~~~~~~~~~~~~~ From b3de19a9a2d68eefd14e2f96789b642ffea30b38 Mon Sep 17 00:00:00 2001 From: Barry Warsaw Date: Wed, 4 Dec 2024 08:14:24 -0800 Subject: [PATCH 4/9] Much updating, checkpointing --- peps/pep-0694.rst | 840 +++++++++++++++++++++++++--------------------- 1 file changed, 464 insertions(+), 376 deletions(-) diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst index ab58ca26f49..cd0142b61b8 100644 --- a/peps/pep-0694.rst +++ b/peps/pep-0694.rst @@ -1,6 +1,6 @@ PEP: 694 -Title: Upload 2.0 API for Python Package Repositories -Author: Donald Stufft , Barry Warsaw +Title: Upload 2.0 API for Python Package Indexes +Author: Barry Warsaw , Donald Stufft Discussions-To: https://discuss.python.org/t/pep-694-upload-2-0-api-for-python-package-repositories/16879 Status: Draft Type: Standards Track @@ -13,163 +13,177 @@ Post-History: `27-Jun-2022 `__. The stage can also be used to simultaneously + and atomically publish all the wheels in a package release. -Beyond the above, there are a number of major issues with the current API: +* artifacts which can be overwritten and replaced, until a stage is published. -- It is a fully synchronous API, which means that we're forced to have a single - request being held open for potentially a long time, both for the upload itself, - and then while the repository processes the uploaded file to determine success - or failure. +* asynchronous and "chunked" uploads, for more efficient use of network bandwidth. Chunked uploads + also enable resumable uploads of individual artifacts. -- It does not support any mechanism for resuming an upload, with the largest file - size on PyPI being just under 1GB in size, that's a lot of wasted bandwidth if - a large file has a network blip towards the end of an upload. +* detailed status on the state of artifact uploads. -- It treats a single file as the atomic unit of operation, which can be problematic - when a release might have multiple binary wheels which can cause people to get - different versions while the files are uploading, and if the sdist happens to - not go last, possibly some hard to build packages are attempting to be built - from source. +* new project creation without requiring the uploading of an artifact. -- It has very limited support for communicating back to the user, with no - support for multiple errors, warnings, deprecations, etc. 
It is limited - entirely to the HTTP status code and reason phrase, of which the reason - phrase has been deprecated since HTTP/2 (:rfc:`RFC 7540 - <7540#section-8.1.2.4>`). +Once this new upload API is adopted, the existing legacy API can be deprecated, however this PEP +does not propose a deprecation schedule for the legacy API. -- The metadata for a release/file is submitted alongside the file, however - this metadata is famously unreliable, and most installers instead choose to - download the entire file and read that in part due to that unreliability. -- There is no mechanism for allowing a repository to do any sort of sanity - checks before bandwidth starts getting expended on an upload, whereas a lot - of the cases of invalid metadata or incorrect permissions could be checked - prior to upload. +Rationale +========= + +There is currently no standardized API for uploading files to a Python package index such as +PyPI. Instead, everyone has been forced to reverse engineer the non-standard, `"legacy" +`__ API. + +The legacy API, while functional, leaks implementation details of the original PyPI code base, +which has been faithfully replicated in the new code base and alternative implementations. + +In addition, there are a number of major issues with the legacy API: + +* It is fully synchronous, which forces requests to be held open both for the upload itself, and + while the index processes the uploaded file to determine success or failure. + +* It does not support any mechanism for resuming an upload. With the largest default file size on + PyPI being just under 1GB in size, requiring the entire upload to complete successfully means + bandwidth is wasted when such uploads experience a network interruption while the request is in + progress. + +* The atomic unit of operation is a single file. This is problematic when a release logically + includes multiple binary wheels, leading to race conditions where consumers get different versions + of the package if they are unlucky enough to require a package before their platform's wheel has + completely uploaded. If the release uploads an sdist first, this may also manifest in some + consumers seeing only the sdist, triggering a local build from source. + +* Status reporting is very limited. There's no support for reporting multiple errors, warnings, + deprecations, etc. Status is limited to the HTTP status code and reason phrase, of which the + reason phrase has been deprecated since HTTP/2 (:rfc:`RFC 7540 <7540#section-8.1.2.4>`). + +* Metadata for a release is submitted alongside the file. However, as this metadata is famously + unreliable, most installers instead choose to download the entire file and read the metadata from + there. + +* There is no mechanism for allowing an index to do any sort of sanity checks before bandwidth gets + expended on an upload. Many cases of invalid metadata or incorrect permissions could be checked + prior to uploading files. -- It has no support for "staging" a release prior to publishing it to the - repository. +* There is no support for "staging" a release prior to publishing it to the index. -- It has no support for creating new projects, without uploading a file. +* Creation of new projects requires the uploading of at least one file, leading to "stub" uploads + to claim a project namespace. -This PEP proposes a new API for uploads, and deprecates the existing legacy -API. 
+The new upload API proposed in this PEP solves all of these problems, providing for a much more +flexible, bandwidth friendly approach, with better error reporting, a better release testing +experience, and atomic and simultaneous publishing of all release artifacts. -Status Quo +Legacy API ========== -This does not attempt to be a fully exhaustive documentation of the current API, but -give a high level overview of the existing API. +The following is an overview of the legacy API. For the detailed description, consult the +`PyPI user guide documentation `__ Endpoint -------- -The existing upload API (and the now removed register API) lives at an url, currently -``https://upload.pypi.org/legacy/``, and to communicate which specific API you want -to call, you add a ``:action`` url parameter with a value of ``file_upload``. The values -of ``submit``, ``submit_pkg_info``, and ``doc_upload`` also used to be supported, but -no longer are. +The existing upload API lives at a base URL. For PyPI, that URL is currently +``https://upload.pypi.org/legacy/``. Clients performing uploads specify the API they want to call +by adding an ``:action`` URL parameter with a value of ``file_upload``. [#fn-action]_ -It also has a ``protocol_version`` parameter, in theory to allow new versions of the -API to be written, but in practice that has never happened, and the value is always -``1``. +The legacy API also has a ``protocol_version`` parameter, in theory allowing new versions of the API +to be defined. In practice this has never happened, and the value is always ``1``. -So in practice, on PyPI, the endpoint is +Thus, the effective upload API on PyPI is: ``https://upload.pypi.org/legacy/?:action=file_upload&protocol_version=1``. Encoding -------- -The data to be submitted is submitted as a ``POST`` request with the content type -of ``multipart/form-data``. This is due to the historical nature, that this API -was not actually designed as an API, but rather was a form on the initial PyPI -implementation, then client code was written to programmatically submit that form. +The data to be submitted is submitted as a ``POST`` request with the content type of +``multipart/form-data``. This reflects the legacy API's historical nature, which was originally +designed not as an API, but rather as a web form on the initial PyPI implementation, with client code +written to programmatically submit that form. Content ------- -Roughly speaking, the metadata contained within the package is submitted as parts -where the content-disposition is ``form-data``, and the name is the name of the -field. The names of these various pieces of metadata are not documented, and they -sometimes, but not always match the names used in the ``METADATA`` files. The casing -rarely matches though, but overall the ``METADATA`` to ``form-data`` conversion is -extremely inconsistent. +Roughly speaking, the metadata contained within the package is submitted as parts where the content +disposition is ``form-data``, and the metadata key is the name of the field. The names of these +various pieces of metadata are not documented, and they sometimes, but not always match the names +used in the ``METADATA`` files for package artifacts. The case rarely matches, and the ``form-data`` +to ``METADATA`` conversion is inconsistent. 
-The file itself is then sent as a ``application/octet-stream`` part with the name -of ``content``, and if there is a PGP signature attached, then it will be included -as a ``application/octet-stream`` part with the name of ``gpg_signature``. +The upload artifact file itself is sent as a ``application/octet-stream`` part with the name of +``content``, and if there is a PGP signature attached, then it will be included as a +``application/octet-stream`` part with the name of ``gpg_signature``. Authentication -------------- -Upload authentication is also not standardized, but on PyPI, authentication is -through `API tokens `__ or `Trusted Publisher (OpenID -Connect) `__. Other indexes may -support different authentication methods. +Upload authentication is also not standardized. On PyPI, authentication is through `API tokens +`__ or `Trusted Publisher (OpenID Connect) +`__. Other indexes may support different authentication +methods. -Specification -============= +Upload 2.0 API Specification +============================ This PEP traces the root cause of most of the issues with the existing API to be roughly two things: - The metadata is submitted alongside the file, rather than being parsed from the - file itself. + file itself. [#fn-metadata]_ - - This is actually fine if used as a pre-check, but it should be validated - against the actual ``METADATA`` or similar files within the distribution. +- It supports only a single request, using only form data, that either succeeds or fails, and all + actions are atomic within that single request. -- It supports only a single request, using nothing but form data, that either succeeds - or fails, and everything is done and contained within that single request. +To address these issues, this PEP proposes a multi-request workflow, which at a high level involves +these steps: -To address these issues, we propose a multi-request workflow, which at a high -level involves these steps: - -1. Initiate an upload session. -2. Upload the file(s) as part of the upload session. -3. Complete the upload session. -4. (Optional) Check the status of an upload session. +#. Initiate an upload session, creating a release stage. +#. Upload the file(s) to that stage as part of the upload session. +#. Complete the upload session, publishing or discarding the stage. +#. Optionally check the status of an upload session. Versioning ---------- -This PEP uses the same ``MAJOR.MINOR`` versioning system as used in :pep:`691`, -but it is otherwise independently versioned. The existing API is considered by -this spec to be version ``1.0``, but it otherwise does not attempt to modify -that API in any way. +This PEP uses the same ``MAJOR.MINOR`` versioning system as used in :pep:`691`, but it is otherwise +independently versioned. The legacy API is considered by this spec to be version ``1.0``, but does +not modify that API in any way. + +The API proposed in this PEP therefor has the version number ``2.0``. -Endpoints ---------- +Root Endpoint +------------- -All URLs described here will be relative to the root endpoint, which may be -located anywhere within the url structure of a domain. So it could be at +All URLs described here will be relative to the root endpoint, which may be located anywhere within +the url structure of a domain. For example, the root endpoint could be ``https://upload.example.com/``, or ``https://example.com/upload/``. -Specifically for PyPI, we propose the root URL to be -``https://upload.pypi.org/2.0``. 
This root URL will be considered provisional -while the feature is being tested, and will be blessed as permanent after -sufficient testing with live projects. +Specifically for PyPI, this PEP proposes to implement the root endpoint URL to be +``https://upload.pypi.org/2.0``. This root URL will be considered provisional while the feature is +being tested, and will be blessed as permanent after sufficient testing with live projects. +.. _session-create: + Create an Upload Session ~~~~~~~~~~~~~~~~~~~~~~~~ -To create a new upload session, submit a ``POST`` request to ``/`` -(i.e. the root URL), with a payload that looks like: +To create a new upload session, submit a ``POST`` request to ``/`` (i.e. the root URL), with a +payload that looks like: .. code-block:: json @@ -186,8 +200,8 @@ To create a new upload session, submit a ``POST`` request to ``/`` The request includes the following top-level keys: ``meta`` (**required**) - Describes information about the payload itself. Currently, the only - defined subkey is ``api-version`` the value of which must be the string ``"2.0"``. + Describes information about the payload itself. Currently, the only defined sub-key is + ``api-version`` the value of which must be the string ``"2.0"``. ``name`` (**required**) The name of the project that this session is attempting to add files to. @@ -200,10 +214,24 @@ The request includes the following top-level keys: algorithm. Details are provided below, but if this key is omitted, it is equivalent to passing the empty string. +Upon successful session creation, the server returns a ``201 Created`` response. If an error +occurs, the appropriate ``4xx`` code will be returned, as described in the :ref:`session-errors` +section. + +If a session is created for a project which has no previous release, then the index **MAY** reserve +the project name before the session is published, however it **MUST NOT** be possible to navigate to +that project using the "regular" (i.e. :ref:`unstaged `) access protocols, *until* +the stage is published. If this first-release stage gets canceled, then the index **SHOULD** delete +the project record, as if it were never uploaded. -Upon successful session creation, the server returns a ``201 Created`` -response. If an error occurs, the appropriate ``4xx`` code will be returned, -as described in the :ref:`session-errors` section. +The session is owned by the user that created it, and all subsequent requests **MUST** be performed +with the same credentials, otherwise a ``403 Forbidden`` will be returned on those subsequent +requests. + +.. _session-response: + +Response body ++++++++++++++ The successful response includes the following JSON content: @@ -213,14 +241,15 @@ The successful response includes the following JSON content: "meta": { "api-version": "2.0" }, - "urls": { - "upload": "...", + "links": { "stage": "...", - "publish": "...", - "status": "...", + "upload": "...", + "status": "xxx-remove-me", + "extend": "...", "cancel": "..." 
+ "publish": "...", }, - "preview-token": "", + "session-token": "", "valid-for": 604800, "status": "pending", "files": {}, @@ -230,112 +259,112 @@ The successful response includes the following JSON content: } -Besides the ``meta`` key, which has the same format as the request JSON, the -success response has the following keys: +Besides the ``meta`` key, which has the same format as the request JSON, the success response has +the following keys: -``urls`` - A dictionary mapping :ref:`"identifiers" ` to related - URLs to this session, the details of which are provided below. +``links`` + A dictionary mapping :ref:`keys to URLs ` related to this session, the details of + which are provided below. -``preview-token`` - If the index supports :ref:`previewing staged releases `, this key - will contain the unique :ref:`"preview token" ` that can be provided to - installer clients in order to preview the staged release before it's published. If - the index does *not* support stage previewing, this key **MUST** be omitted. +``session-token`` + If the index supports :ref:`previewing staged releases `, this key will contain + the unique :ref:`"session token" ` that can be provided to installers in order to + preview the staged release before it's published. If the index does *not* support stage + previewing, this key **MUST** be omitted. ``valid-for`` - An integer representing how long, in seconds, until the server itself will - expire this session (and thus all of the URLs contained in it). The - session **SHOULD** live at least this much longer unless the client itself - has canceled the session. Servers **MAY** choose to *increase* this time, - but should never *decrease* it, except naturally through the passage of time. + An integer representing how long, in seconds, until the server itself will expire this session, + and thus all of its content, including any uploaded files and the URL links related to the + session. This value is roughly relative to the time at which the session was created or + :ref:`extended `. The session **SHOULD** live at least this much longer + unless the client itself has canceled or published the session. Servers **MAY** choose to + *increase* this time, but should never *decrease* it, except naturally through the passage of + time. Clients can query the `session status ` to get time remaining in the + session. ``status`` - A string that contains one of ``pending``, ``published``, ``error``, or - ``canceled``, this string represents the overall :ref:`status of the - session `. + A string that contains one of ``pending``, ``published``, ``error``, or ``canceled``, + representing the overall :ref:`status of the session `. ``files`` - A mapping containing the filenames that have been uploaded to this - session, to a mapping containing details about each :ref:`file referenced - in this session `. + A mapping containing the filenames that have been uploaded to this session, to a mapping + containing details about each :ref:`file referenced in this session `. ``notices`` - An optional key that points to an array of human-readable informational - notices that the server wishes to communicate to the end user. These - notices are specific to the overall session, not to any particular file in - the session. + An optional key that points to an array of human-readable informational notices that the server + wishes to communicate to the end user. These notices are specific to the overall session, not + to any particular file in the session. -.. _url-identifiers: +.. 
_session-links: -For the ``urls`` key in the success JSON, the following subkeys are valid: +Session Links ++++++++++++++ + +For the ``links`` key in the success JSON, the following sub-keys are valid: ``upload`` - The upload endpoint for this session to initiate :ref:`file uploads - ` for each file that will be part of this upload session. + The endpoint for this session clients will use to initiate :ref:`uploads ` for + each file to be included in this session. ``stage`` - The endpoint where this staged release can be :ref:`previewed ` prior - to publishing the session. This can be used to download and verify the not-yet-public - files. If the index does not support previewing staged releases, this key **MUST** be - omitted. + The endpoint where this staged release can be :ref:`previewed ` prior to + publishing the session. This can be used to download and verify the not-yet-public files. If + the index does not support previewing staged releases, this key **MUST** be omitted. ``publish`` The endpoint which triggers :ref:`publishing this session `. ``status`` - The endpoint that can be used to query the :ref:`current status - ` of this session. + The endpoint that can be used to query the :ref:`current status ` of this + session. + +``extend`` + The endpoint that can be used to :ref:`extend ` the current session, *if* the + server supports it. If the server does not support session extension, this key **MUST** be omitted. ``cancel`` - The endpoint that can be used to :ref:`cancel the session `. + The endpoint that can be used to :ref:`cancel and discard the session `. + .. _session-files: -The ``files`` key contains a mapping from the names of the files participating -in this session to a sub-mapping with the following keys: +Session Files ++++++++++++++ + +The ``files`` key contains a mapping from the names of the files uploaded in this session to a +sub-mapping with the following keys: ``status`` - A string with the same values and semantics as the same-named - :ref:`session status key `, except that it indicates the - status of the specific referenced file. + A string with the same values and semantics as the :ref:`session status key `, + except that it indicates the status of the specific referenced file. -``url`` - The *absolute* URL that the client should use to reference this specific file. This - URL is used to retrieve, replace or delete the referenced file. If a ``nonce`` was - provided, the URL **MUST** be obfuscated with a non-guessable token as described in - the :ref:`session token ` section. +``link`` + The *absolute* URL that the client should use to reference this specific file. This URL is used + to retrieve, replace, or delete the :ref:`referenced file `. If a ``nonce`` was + provided, this URL **MUST** be obfuscated with a non-guessable token as described in the + :ref:`session token ` section. ``notices`` - An optional key with similar format and semantics as the ``notices`` - session key, except that these notices are specific to the referenced file. + An optional key with similar format and semantics as the ``notices`` session key, except that + these notices are specific to the referenced file. -If a second session is created for the same name-version pair while an upload -session for that pair is already ``pending``, then the upload server **MUST** -return the already existing session JSON status, along with the ``200 Ok`` -status code rather than creating a new, empty session. 
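+
+As a non-normative illustration of the request and response described above, a client using the
+third-party ``requests`` library might create a session and pick out the pieces it needs roughly as
+follows; the root URL is a placeholder, and authentication (which is index-specific) and error
+handling are omitted:
+
+.. code-block:: python
+
+   import requests
+
+   ROOT = "https://upload.example.com/"  # hypothetical root endpoint
+   CONTENT_TYPE = "application/vnd.pypi.upload.v2+json"
+
+   response = requests.post(
+       ROOT,
+       json={
+           "meta": {"api-version": "2.0"},
+           "name": "foo",
+           "version": "1.0",
+           "nonce": "some client-chosen string",
+       },
+       headers={"Content-Type": CONTENT_TYPE, "Accept": CONTENT_TYPE},
+   )
+   response.raise_for_status()
+   session = response.json()
+
+   upload_url = session["links"]["upload"]     # where each file upload is initiated
+   status_url = session["links"]["status"]     # poll this to watch the session
+   stage_token = session.get("session-token")  # only present if the index supports previews
+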
- -If a session is created for a project which has no previous releases, then the index -**MAY** reserve the project name , however it **MUST NOT** be possible to navigate to that -project using the "regular" (i.e. :ref:`unstaged `) access protocols, -*until* the stage is published. If this first-release stage gets canceled, then the index -**SHOULD** delete the project record, as if it were never uploaded. +If a second session is created for the same name-version pair while a session for that pair is in +the ``pending`` state, then the server **MUST** return the JSON status response for the already +existing session, along with the ``200 Ok`` status code rather than creating a new, empty session. .. _file-uploads: -Upload Each File -~~~~~~~~~~~~~~~~ +File Upload +~~~~~~~~~~~ -Once an upload session has been created, the response provides the URL you can -use to upload files into that session. There is no predetermined endpoint for -uploading files into the session; the upload URL is given to the client by the -server in the session creation response JSON. Clients **MUST NOT** assume -there is any commonality to those URLs from one session to the next. +After creating the session, the ``upload`` endpoint from the response's :ref:`session links +` mapping is used to upload new files into that session. Clients **MUST** use the +provided ``upload`` URL and **MUST NOT** assume there is any pattern or commonality to those URLs +from one session to the next. -To initiate a file upload, a client sends a ``POST`` request to the URL given -in the ``upload`` subkey of the ``urls`` key in the session creation response. -The request body has the following format: +To initiate a file upload, a client sends a ``POST`` request to the ``upload`` URL. The request +body has the following JSON format: .. code-block:: json @@ -350,179 +379,175 @@ The request body has the following format: } -Besides the standard ``meta`` key, the request JSON has the following -additional keys: +Besides the standard ``meta`` key, the request JSON has the following additional keys: -``filename`` +``filename`` (**required**) The name of the file being uploaded. -``size`` - The size in bytes of the file that is being uploaded. +``size`` (**required**) + The size in bytes of the file being uploaded. -``hashes`` - A mapping of hash names to hex-encoded digests. Each of these digests are - the checksums of the file being uploaded when hashed by the algorithm - identified in the name. +``hashes`` (**required**) + A mapping of hash names to hex-encoded digests. Each of these digests are the checksums of the + file being uploaded when hashed by the algorithm identified in the name. By default, any hash algorithm available in `hashlib - `_ can be used as a key - for the hashes dictionary [#fn1]_. At least one secure algorithm from - ``hashlib.algorithms_guaranteed`` **MUST** always be included. At the time - of this PEP, ``sha256`` is specifically recommended. + `_ can be used as a key for the hashes + dictionary [#fn-hash]_. At least one secure algorithm from ``hashlib.algorithms_guaranteed`` + **MUST** always be included. This PEP specifically recommends ``sha256``. - Multiple hashes may be passed at a time, but all hashes provided **MUST** - be valid for the file. + Multiple hashes may be passed at a time, but all hashes provided **MUST** be valid for the file. 
-``metadata`` - An optional key with a string value containing the file's `core metadata +``metadata`` (**optional**) + If given, this is a string value containing the file's `core metadata `_. Servers **MAY** use the data provided in this request to do some sanity checking prior to -allowing the file to be uploaded, which may include but is not limited to: - -- Checking if the ``filename`` already exists. -- Checking if the ``size`` would invalidate some quota. -- Checking if the contents of the ``metadata``, if provided, are valid. +allowing the file to be uploaded. These checks may include, but are not limited to: -If the server determines that the client should attempt the upload, it will return -a ``201 Created`` response, with an empty body, and a ``Location`` header pointing -to the URL that the file itself should be uploaded to. +- Checking if the ``filename`` already exists in a published release +- Checking if the ``size`` would exceed any project or file quota +- Checking if the contents of the ``metadata``, if provided, are valid -At this point, the status of the session should show the filename, with the above location -URL included in it. +If the server determines that upload should proceed, it will return a ``201 Created`` response, with +an empty body, and a ``Location`` header pointing to the URL that the file itself should be uploaded +to. The :ref:`status ` of the session will also include the filename in the +``files`` mapping, with the above ``Location`` URL included in under the ``link`` sub-key. -Upload Data -+++++++++++ +Upload File Contents +++++++++++++++++++++ -To upload the file, a client has two choices, they may upload the file as either -a single chunk, or as multiple chunks. Either option is acceptable, but it is -recommended that most clients should choose to upload each file as a single chunk -as that requires fewer requests and typically has better performance. +The actual file contents are uploaded by issuing a ``POST`` request to this URL location. The +client may either upload the entire file in a single request, or it may opt for "chunked" upload +where the file contents are split into multiple requests, as described below. -However for particularly large files, uploading within a single request may result -in timeouts, so larger files may need to be uploaded in multiple chunks. +In either case, the request **MUST** include both a ``Content-Length`` and a ``Content-Type`` +header. The ``Content-Type`` header **MUST** be ``application/octet-stream``. The body of the +request as unencoded raw binary data. -In either case, the client **MUST** generate a unique token for each upload for a file, -and **MUST** include that token in each request in the ``Upload-Token`` header. The -``Upload-Token`` is a binary blob encoded using base64 surrounded by a ``:`` on either -side. Clients **SHOULD** use at least 32 bytes of cryptographically secure data. For -example, the following algorithm can be used: +For all-in-one requests, where the entire file contents is uploaded in a single request, the +``Content-Length`` size is the size of the entire file in bytes, and this **MUST** match the size +given in the original session creation request. If this single-request upload fails, the entire +file must be resent in another single HTTP request. This is the recommended, preferred format for +file uploads since fewer requests are required. -.. 
code-block:: python - - import base64 - import secrets +As an example, if uploading a 100,000 byte file, you would send headers like:: - header = ":" + base64.b64encode(secrets.token_bytes(32)).decode() + ":" + Content-Length: 100000 + Content-Type: application/octet-stream + Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: -The one time that it is permissible to omit the ``Upload-Token`` from an upload request is -when a client wishes to opt out of the resumable or chunked file upload feature -completely. In that case, they **MAY** omit the ``Upload-Token``, and the file must be -successfully uploaded in a single HTTP request. If the non-chunked upload fails, the -entire file must be resent in another single HTTP request. +If the upload completes successfully, the server **MUST** respond with a ``201 Created`` status. +The response body has no content. -To upload the file in a single chunk, a client sends a ``POST`` request to the -``Location`` header URL from the session response for that filename. The client **MUST** -include a ``Content-Length`` header that is equal to the size of the file in bytes, and -this **MUST** match the size given in the original session creation. +However for large files, uploading the file in a single request may result in timeouts, so clients +can opt to upload the file in multiple chunks. For chunked uploads, the client **MUST** +[#fn-chunk-token]_ generate a unique token which is provided in each request for this file upload. +This token is a binary blob, `base64 `__ encoded, +bracketed by the ``:`` (colon) character, and included in the ``Upload-Token`` header. Clients +**SHOULD** use at least 32 bytes of cryptographically secure data. For example, the following +algorithm can be used: -As an example, if uploading a 100,000 byte file, you would send headers like:: +.. code-block:: python - Content-Length: 100000 - Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: + from base64 import b64encode + from secrets import token_bytes -If the upload completes successfully, the server **MUST** respond with a ``201 Created`` -status. The response body has no content. + header = f':{b64encode(token_bytes(32)).decode()}:' To upload the file in multiple chunks, a client sends multiple ``POST`` requests to the -same URL as before, one for each chunk. +same URL as above, with one request per chunk. -For chunked uploads, the ``Content-Length`` is equal to the size, in bytes, of the chunk -that they are sending. The client **MUST** include a ``Upload-Offset`` header which -indicates a byte offset that the content included in this request starts at and a -``Upload-Incomplete`` header set to ``1``. For the first chunk, the ``Upload-Offset`` -header **MUST** be set to ``0``. +For chunked uploads, the ``Content-Length`` is equal to the size in bytes of the chunk that is +currently being sent. The client **MUST** include a ``Upload-Offset`` header which indicates the +byte offset that the content included in this chunk's request starts at and an ``Upload-Incomplete`` +header with the value ``1``. For the first chunk, the ``Upload-Offset`` header **MUST** be set to +``0``. As with single-request uploads, the ``Content-Type`` header is ``application/octet-stream`` +and the body is the raw, unencoded bytes of the chunk. -For example, if uploading a 100,000 byte file in 1000 byte chunks,the first chunk's +For example, if uploading a 100,000 byte file in 1000 byte chunks, the first chunk's headers would be: .. 
code-block:: email Content-Length: 1000 + Content-Type: application/octet-stream Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: Upload-Offset: 0 Upload-Incomplete: 1 -And the second chunk represents bytes 1000 through 1999 would include the following -headers: +For the second chunk representing bytes 1000 through 1999, include the following headers: .. code-block:: email Content-Length: 1000 + Content-Type: application/octet-stream Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: Upload-Offset: 1000 Upload-Incomplete: 1 .. _complete-the-upload: -The final chunk of data **MUST** omit the ``Upload-Incomplete`` header, since at that -point the upload is complete. +The final chunk of data **MUST** omit the ``Upload-Incomplete`` header, since at that point the +entire file has been uploaded. -For each successful chunk, the server **MUST** respond with a ``202 Accepted`` -header, except for the final chunk, which **MUST** be a ``201 Created``, and as with -non-chunked uploads, the body has not content. +For each successful chunk, the server **MUST** respond with a ``202 Accepted`` header, except for +the final chunk, which **MUST** be a ``201 Created``, and as with non-chunked uploads, the body of +these responses has no content. -With both chunked and non-chunked uploads, once completed successfully, the file **MUST** -not be publicly visible in the repository, but merely staged until the upload session is -:ref:`completed `. +With both chunked and non-chunked uploads, once completed successfully, the file **MUST** not be +publicly visible in the repository, but merely staged until the upload session is :ref:`completed +`. The file **MUST** be visible at the ``stage`` :ref:`URL ` but +only if the server supports :ref:`previews `. Partially uploaded chunked files +**SHOULD NOT** be visible at the ``stage`` URL. -The following constraints are placed on uploads regardless of whether they are -single chunk or multiple chunks: +The following constraints are placed on uploads regardless of whether they are single chunk or +multiple chunks: -- A client **MUST NOT** perform multiple ``POST`` requests in parallel for the - same file to avoid race conditions and data loss or corruption. The server - **MAY** terminate any ongoing ``POST`` request that utilizes the same - ``Upload-Token``. -- If the offset provided in ``Upload-Offset`` is not ``0`` or the next chunk - in an incomplete upload, then the server **MUST** respond with a ``409 Conflict``. This - means that a client **MAY NOT** upload chunks out of order. -- Once an upload has started with a specific token, you may not use another token - for that file without deleting the in-progress upload. -- Once a file upload has completed successfully, you may initiate another upload for - that file, and doing so will replace that file. This is possible until the entire - session is completed, at which point no further file uploads (either creating or - replacing a session file) is accepted. +- A client **MUST NOT** perform multiple ``POST`` requests in parallel for the same file to avoid + race conditions and data loss or corruption. The server **MAY** terminate any ongoing ``POST`` + request that utilizes the same ``Upload-Token`` for chunks of a different file. +- If the offset provided in ``Upload-Offset`` is not ``0`` or correctly specifies the byte offset of + the next chunk in an incomplete upload, then the server **MUST** respond with a ``409 Conflict``. + This means that a client **MAY NOT** upload chunks out of order. 
+ +- Once an upload has started with a specific token, you may not use another token for that file + without deleting the in-progress upload. + +- Once a file upload has completed successfully, you may initiate another upload for that file, + which **once completed**, will replace that file. This is possible until the entire session is + completed, at which point no further file uploads (either creating or replacing a session file) + are accepted. I.e. once a session is published, the files included in that release are immutable + [#fn-immutable]_. -Resume Upload -+++++++++++++ -To resume an upload, you first have to know how much of the data the server has -already received, regardless of whether you were originally uploading the file as -a single chunk, or in multiple chunks. +Resume an Upload +++++++++++++++++ -To get the status of an individual upload, a client can make a ``HEAD`` request -with their existing ``Upload-Token`` to the same URL they were uploading to. +To resume an upload, you first have to know how much of the file's contents the server has already +received. If this is not already known, a client can make a ``HEAD`` request with their existing +``Upload-Token`` to the same URL they were uploading the file to. -The server **MUST** respond back with a ``204 No Content`` response, with an -``Upload-Offset`` header that indicates what offset the client should continue -uploading from. If the server has not received any data, then this would be ``0``, -if it has received 1007 bytes then it would be ``1007``. +The server **MUST** respond back with a ``204 No Content`` response, with an ``Upload-Offset`` +header that indicates what offset the client should continue uploading from. If the server has not +received any data, then this would be ``0``, if it has received 1007 bytes then it would be +``1007``. + +Once the client has retrieved the offset that they need to start from, they can upload the rest of +the file as described above, either in a single request containing all of the remaining bytes, or in +multiple chunks as per the above protocol. -Once the client has retrieved the offset that they need to start from, they can -upload the rest of the file as described above, either in a single request -containing all of the remaining data or in multiple chunks. .. _cancel-an-upload: Canceling an In-Progress Upload +++++++++++++++++++++++++++++++ -If a client wishes to cancel an upload of a specific file, for instance because -they need to upload a different file, they may do so by issuing a ``DELETE`` -request to the file upload URL with the ``Upload-Token`` used to upload the -file in the first place. +If a client wishes to cancel an upload of a specific file, for instance because they need to upload +a different file, they may do so by issuing a ``DELETE`` request to the file upload URL with the +``Upload-Token`` used to upload the file in the first place. A successful cancellation request **MUST** response with a ``204 No Content``. @@ -530,23 +555,25 @@ A successful cancellation request **MUST** response with a ``204 No Content``. Delete a Partial or Fully Uploaded File +++++++++++++++++++++++++++++++++++++++ -Already uploaded files may be deleted by issuing a ``DELETE`` request to the file -upload URL without the ``Upload-Token``. +For files which have already been completely uploaded, clients can delete the file by issuing a +``DELETE`` request to the file upload URL without the ``Upload-Token``. A successful deletion request **MUST** response with a ``204 No Content``. 
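+
+As a non-normative sketch of the resumption flow described above, using the third-party
+``requests`` library; the file URL, token, and filename are placeholders:
+
+.. code-block:: python
+
+   import requests
+
+   file_url = "https://upload.example.com/..."  # the file's ``link`` URL from the session
+   token = ":nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=:"  # the original ``Upload-Token``
+
+   # Ask the server how many bytes it has already received for this upload.
+   head = requests.head(file_url, headers={"Upload-Token": token})
+   head.raise_for_status()
+   offset = int(head.headers["Upload-Offset"])
+
+   # Send everything from that offset onward as one final chunk.  Omitting the
+   # ``Upload-Incomplete`` header tells the server that this completes the file.
+   with open("foo-1.0-py3-none-any.whl", "rb") as fp:
+       fp.seek(offset)
+       remainder = fp.read()
+
+   response = requests.post(
+       file_url,
+       data=remainder,
+       headers={
+           "Upload-Token": token,
+           "Upload-Offset": str(offset),
+           "Content-Type": "application/octet-stream",
+       },
+   )
+   assert response.status_code == 201  # a completed upload returns ``201 Created``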
+ Replacing a Partially or Fully Uploaded File ++++++++++++++++++++++++++++++++++++++++++++ -To replace a session file, the file upload **MUST** have been previously completed or -deleted. It is not possible to replace a session file if the upload for that file is -incomplete. Clients have two options to replace an incomplete upload: +To replace a session file, the file upload **MUST** have been previously completed or deleted. It +is not possible to replace a file if the upload for that file is incomplete. Clients have two +options to replace an incomplete upload: -- :ref:`Cancel the in-progress upload ` by issuing a ``DELETE`` of that - specific file. After this, the new file upload can be initiated. -- :ref:`Complete the in-progress upload ` by uploading a zero-length - chunk omitting the ``Upload-Incomplete`` header. This effectively truncates and - completes the in-progress upload, after which point the new upload can commence. +- :ref:`Cancel the in-progress upload ` by issuing a ``DELETE`` of that specific + file. After this, the new file upload can be initiated. + +- :ref:`Complete the in-progress upload ` by uploading a zero-length chunk + omitting the ``Upload-Incomplete`` header. This effectively truncates and completes the + in-progress upload, after which point the new upload can commence. .. _session-status: @@ -554,14 +581,45 @@ incomplete. Clients have two options to replace an incomplete upload: Session Status ~~~~~~~~~~~~~~ -Similarly to file upload, the session URL is provided in the response to -creating the upload session, and clients **MUST NOT** assume that there is any -commonality to what those URLs look like from one session to the next. +At any time, a client can query the status of the session by issuing a ``GET`` request to the +``status`` URL from the :ref:`initial session response body `. As with other +session requests, clients **MUST NOT** assume that there is any commonality to what :ref:`session +URLs ` look like from one session to the next. + +The server will respond to this ``GET`` request with the same :ref:`response ` +that they got when they initially created the upload session, except with any changes to ``status``, +``valid-for``, or ``files`` reflected. + + +.. _session-extension: + +Session Extension +~~~~~~~~~~~~~~~~~ + +Servers **MAY** allow clients to extend sessions, but the overall lifetime and number of extensions +allowed is left to the server. To extend a session, a client issues a ``POST`` request to the +``extend`` :ref:`session link ` given in the :ref:`session creation ` +request. The JSON body of this request looks like: + +.. code-block:: json + + { + "meta": { + "api-version": "2.0" + }, + "extend-for": 3600 + } + +The number of seconds specified is just a suggestion to the server for the number of additional +seconds to extend the current session. For example, if the client wants to extend the current +session for another hour, ``extend-for`` would be ``3600``. Upon successful extension, the server +will respond with the same :ref:`response ` that they got when they initially +created the upload session, except with any changes to ``status``, ``valid-for``, or ``files`` +reflected. -To check the status of a session, clients issue a ``GET`` request to the -session URL, to which the server will respond with the same response that -they got when they initially created the upload session, except with any -changes to ``status``, ``valid-for``, or updated ``files`` reflected. 
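+
+As a brief, non-normative sketch (assuming ``session`` holds the JSON from session creation and
+using the third-party ``requests`` library), a client might ask for another hour and then check how
+much time it was actually granted:
+
+.. code-block:: python
+
+   import requests
+
+   extend_url = session["links"]["extend"]  # only present if the server supports extension
+
+   response = requests.post(
+       extend_url,
+       json={"meta": {"api-version": "2.0"}, "extend-for": 3600},
+       headers={"Content-Type": "application/vnd.pypi.upload.v2+json"},
+   )
+   response.raise_for_status()
+   remaining = response.json()["valid-for"]  # seconds left; may be less than was requested
+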
+If the server refuses to extend the session for the requested number of seconds, it still returns a +success response, and the ``valid-for`` key will simply include the number of seconds remaining in +the current session. .. _session-cancellation: @@ -569,60 +627,64 @@ changes to ``status``, ``valid-for``, or updated ``files`` reflected. Session Cancellation ~~~~~~~~~~~~~~~~~~~~ -To cancel an upload session, a client issues a ``DELETE`` request to the same session URL -as before. The server then marks the session as canceled, **MAY** purge any data that was -uploaded as part of that session, and future attempts to access that session URL or any of -the file upload URLs **MAY** return a ``404 Not Found``. +To cancel an entire session, a client issues a ``DELETE`` request to the same session URL as +before. The server then marks the session as canceled, and **SHOULD** purge any data that was +uploaded as part of that session. Future attempts to access that session URL or any of the file +upload URLs **MUST** return a ``404 Not Found``. + +To prevent dangling sessions, servers may also choose to cancel timed-out sessions on their own +accord. It is recommended that servers expunge their sessions after no less than a week, but each +server may choose their own schedule. -To prevent dangling sessions, servers may also choose to cancel timed-out sessions on -their own accord. It is recommended that servers expunge their sessions after no less than -a week, but each server may choose their own schedule. .. _publish-session: Session Completion ~~~~~~~~~~~~~~~~~~ -To complete a session and publish the files that have been included in it, -a client **MUST** send a ``POST`` request to the ``publish`` URL in the -session status payload. +To complete a session and publish the files that have been included in it, a client **MUST** send a +``POST`` request to the ``publish`` URL in the session status payload. The body of the request +contains no content. + +If the server is able to immediately complete the session, it may do so and return a ``201 Created`` +response. If it is unable to immediately complete the session (for instance, if it needs to do +processing that may take longer than reasonable in a single HTTP request), then it may return a +``202 Accepted`` response. -If the server is able to immediately complete the session, it may do so -and return a ``201 Created`` response. If it is unable to immediately -complete the session (for instance, if it needs to do processing that may -take longer than reasonable in a single HTTP request), then it may return -a ``202 Accepted`` response. +In either case, the server should include a ``Location`` header pointing back to the session status +URL, and if the server returned a ``202 Accepted``, the client may poll that URL to watch for the +status to change. -In either case, the server should include a ``Location`` header pointing -back to the session status url, and if the server returned a ``202 Accepted``, -the client may poll that URL to watch for the status to change. +If a session is published that has no staged files, the operation is effectively a no-op, except +where a new project name is being reserved. In this case, the new project is created, reserved, and +owned by the user that created the session. -It is an error to publish a session that has no staged files. In this case, a -``400 Bad Request`` is turned and the session is canceled, just as if an -explicit :ref:`session cancellation ` was issued. .. 
_session-token: Session Token ~~~~~~~~~~~~~ -When initiating the staged uploads, clients can provide a ``nonce``, essentially a string -with arbitrary content. The ``nonce`` is optional, and if omitted, is equivalent to -providing an empty string. +When creating a session, clients can provide a ``nonce`` in the :ref:`initial session creation +request ` . This nonce is a string with arbitrary content. The ``nonce`` is +optional, and if omitted, is equivalent to providing an empty string. -In order to support previewing of staged uploads, the package ``name`` and ``version``, -along with this ``nonce`` are used as input into a hashing algorithm to produce a unique -"session token". This session token is valid for the life of the session (i.e., until it -is completed, either by cancellation or publishing), and can be provided to installer -clients such as ``pip`` to gain access to the staged releases. +In order to support previewing of staged uploads, the package ``name`` and ``version``, along with +this ``nonce`` are used as input into a hashing algorithm to produce a unique "session token". This +session token is valid for the life of the session (i.e., until it is completed, either by +cancellation or publishing), and can be provided to supporting installers to gain access to the +staged release. -The use of the ``nonce`` allows clients to decide whether they want to obscure the -visibility of their staged releases or not, and there can be good reasons for either -choice. +The use of the ``nonce`` allows clients to decide whether they want to obscure the visibility of +their staged releases or not, and there can be good reasons for either choice. For example, if a CI +system wants to upload some wheels for a new release, and wants to allow independent validation of a +stage before it's published, the client may opt for not including a nonce. On the other hand, if a +client would like to pre-seed a release which it publishes atomically at the time of a public +announcement, that client will likely opt for providing a nonce. -The `SHA256 algorithm `_ is -used to turn these inputs into a unique token, in the order ``name``, ``version``, -``nonce``, using the following Python code as an example: +The `SHA256 algorithm `_ is used to +turn these inputs into a unique token, in the order ``name``, ``version``, ``nonce``, using the +following Python code as an example: .. code-block:: python @@ -635,44 +697,44 @@ used to turn these inputs into a unique token, in the order ``name``, ``version` h.update(nonce) return h.hexdigest() -It should be evident that if no ``nonce`` is provided in the session initiation request, -then the preview token is easily guessable from the package name and version number alone. -Clients can elect to omit the ``nonce`` (or set it to the empty string themselves) if they -want to allow previewing from anybody without access to the preview token. By providing a -non-empty ``nonce``, clients can elect for security-through-obscurity, but this does not -protect staged files behind any kind of authentication. +It should be evident that if no ``nonce`` is provided in the :ref:`session creation request +`, then the preview token is easily guessable from the package name and version +number alone. Clients can elect to omit the ``nonce`` (or set it to the empty string themselves) if +they want to allow previewing from anybody without access to the preview token. 
By providing a +non-empty ``nonce``, clients can elect for security-through-obscurity, but this does not protect +staged files behind any kind of authentication. + .. _staged-preview: Stage Previews ~~~~~~~~~~~~~~ -The ability to preview staged releases before they are published is an important feature, -enabling an additional level of last-mile testing before the release is available to the +The ability to preview staged releases before they are published is an important feature of this +PEP, enabling an additional level of last-mile testing before the release is available to the public. Indexes **MAY** provide this functionality in one or both of the following ways. -* Through the URL provided in the ``stage`` subkey of the :ref:`URL - identifiers ` returned when the session is created. The - ``stage`` URL can be passed to installers such as ``pip`` by setting the - `--extra-index-url - `_ - flag to this value. Multiple stages can even be previewed by repeating this - flag with multiple values. +* Through the URL provided in the ``stage`` sub-key of the :ref:`links key ` + returned when the session is created. The ``stage`` URL can be passed to installers such as + ``pip`` by setting the `--extra-index-url + `__ flag to this value. + Multiple stages can even be previewed by repeating this flag with multiple values. * By passing the ``Stage-Token`` header to the `Simple Repository API - `_ - requests or the :pep:`691` JSON-based Simple API, with the value from the - ``preview-token`` subkey of the JSON response to the session creation - request. Multiple ``Stage-Token`` headers are allowed. It is recommended - that installers add a ``--staged `` or similarly named option to set - the ``Stage-Token`` header at the command line. - -In both cases, the index will return views that expose the staged releases to the -installer tool, making them available to download and install into a virtual environment -built for that last-mile testing. The former option allows for existing installers to -preview staged releases with no changes, although perhaps in a less user-friendly way. -The latter option can be a better user experience, but the details of this are left to -installer tool maintainers to decide. + `_ requests or the + :pep:`691` JSON-based Simple API, with the value from the ``session-token`` sub-key of the JSON + response to the session creation request. Multiple ``Stage-Token`` headers are allowed. It is + recommended that installers add a ``--staged `` or similarly named option to set the + ``Stage-Token`` header at the command line. + +In both cases, the index will return views that expose the staged releases to the installer tool, +making them available to download and install into virtual environments built for that last-mile +testing. The former option allows for existing installers to preview staged releases with no +changes, although perhaps in a less user-friendly way. The latter option can be a better user +experience, but the details of this are left to installer tool maintainers. + +**XXX verify Stage-Token exists - I think it doesn't** + .. _session-errors: @@ -703,21 +765,21 @@ Besides the standard ``meta`` key, this has the following top level keys: request. ``errors`` - An array of specific errors, each of which contains a ``source`` key, which is a - string that indicates what the source of the error is, and a ``message`` key for that - specific error. 
+ An array of specific errors, each of which contains a ``source`` key, which is a string that + indicates what the source of the error is, and a ``message`` key for that specific error. + +The ``message`` and ``source`` strings do not have any specific meaning, and are intended for human +interpretation to aid in diagnosing underlying issue. -The ``message`` and ``source`` strings do not have any specific meaning, and -are intended for human interpretation to aid in diagnosing underlying issue. +**XXX REWRITTEN TO HERE** Content Types ------------- -Like :pep:`691`, this PEP proposes that all requests and responses from the -Upload API will have a standard content type that describes what the content -is, what version of the API it represents, and what serialization format has -been used. +Like :pep:`691`, this PEP proposes that all requests and responses from this upload API will have a +standard content type that describes what the content is, what version of the API it represents, and +what serialization format has been used. The structure of this content type will be: @@ -813,6 +875,18 @@ does not, that doesn't actually affect us. It would just mean that our support for resumable uploads is an application specific protocol, but is still wholly standards compliant. +Can I use the upload 2.0 API to reserve a project name? +------------------------------------------------------- + +Yes! If you're not ready to upload files to make a release, you can still reserve a project +name (assuming of course that the name doesn't already exist). + +To do this, :ref:`create a new session `, then :ref:`publish the session +` without uploading any files. While the ``version`` key is required in the JSON +body of the create session request, you can simply use the placeholder version number ``"0.0.0"``. + +The user that created the session will become the owner of the new project. + Open Questions ============== @@ -917,11 +991,25 @@ you don't have to try and do any sort of protection against parallel uploads, since they're just supported. That alone might erase most of the server side implementation simplification. -Footnotes -========= -.. [#fn1] Specifically any hash algorithm name that `can be passed to - `_ - ``hashlib.new()`` which does not require additional parameters. +.. rubric:: Footnotes + +.. [#fn-action] Obsolete ``:action`` values ``submit``, ``submit_pkg_info``, and ``doc_upload`` are + no longer supported + + +.. [#fn-metadata] This would be fine if used as a pre-check, but the parallel metadata should be + validated against the actual ``METADATA`` or similar files within the + distribution. + +.. [#fn-hash] Specifically any hash algorithm name that `can be passed to + `_ ``hashlib.new()`` and + which does not require additional parameters. + +.. [#fn-chunk-token] Single request uploads **MAY** include the ``Upload-Token`` header, but it is + not required in that case. + +.. [#fn-immutable] Published files may still be `yanked `__ or + `deleted `__ as normal. Copyright From 70b1bf79da9296b23cd92c3ba1c80542032b271e Mon Sep 17 00:00:00 2001 From: Barry Warsaw Date: Wed, 4 Dec 2024 13:49:08 -0800 Subject: [PATCH 5/9] Complete this phase of the rewrite. One more pass to go! --- peps/pep-0694.rst | 291 +++++++++++++++++++++------------------------- 1 file changed, 133 insertions(+), 158 deletions(-) diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst index cd0142b61b8..d356db762f8 100644 --- a/peps/pep-0694.rst +++ b/peps/pep-0694.rst @@ -415,6 +415,8 @@ to. 
The :ref:`status ` of the session will also include the fil ``files`` mapping, with the above ``Location`` URL included in under the ``link`` sub-key. +.. _upload-contents: + Upload File Contents ++++++++++++++++++++ @@ -772,8 +774,6 @@ The ``message`` and ``source`` strings do not have any specific meaning, and are interpretation to aid in diagnosing underlying issue. -**XXX REWRITTEN TO HERE** - Content Types ------------- @@ -781,48 +781,39 @@ Like :pep:`691`, this PEP proposes that all requests and responses from this upl standard content type that describes what the content is, what version of the API it represents, and what serialization format has been used. -The structure of this content type will be: - -.. code-block:: text - - application/vnd.pypi.upload.$version+format +This standard request content type applies to all requests *except* for :ref:`file upload requests +` which, since they contain only binary data, is ``application/octet-stream``. -Since only major versions should be disruptive to systems attempting to -understand one of these API content bodies, only the major version will be -included in the content type, and will be prefixed with a ``v`` to clarify -that it is a version number. +The structure of the ``Content-Type`` header for all other requests is: -Unlike :pep:`691`, this PEP does not change the existing ``1.0`` API in any -way, so servers will be required to host the new API described in this PEP at -a different endpoint than the existing upload API. - -Thus for the new 2.0 API, the content type would be: +.. code-block:: text -- **JSON:** ``application/vnd.pypi.upload.v2+json`` + application/vnd.pypi.upload.$version+$format -In addition to the above, a special "meta" version is supported named ``latest``, -whose purpose is to allow clients to request the absolute latest version, without -having to know ahead of time what that version is. It is recommended however, -that clients be explicit about what versions they support. +Since minor API version differences should never be disruptive, only the major version is included +in the content type; the version number is prefixed with a ``v``. -These content types **DO NOT** apply to the file uploads themselves, only to the -other API requests/responses in the upload API. The files themselves should use -the ``application/octet-stream`` content type. +Unlike :pep:`691`, this PEP does not change the existing *legacy* `1.0`` upload API in any way, so +servers are required to host the new API described in this PEP at a different endpoint than the +existing upload API. +Since JSON is the only defined request format defined in this PEP, all non-file-upload requests +defined in this PEP **MUST** include a ``Content-Type`` header value of: -Version + Format Selection --------------------------- +- ``application/vnd.pypi.upload.v2+json``. -Again, similar to :pep:`691`, this PEP standardizes on using server-driven -content negotiation to allow clients to request different versions or -serialization formats, which includes the ``format`` URL parameter. +As with :pep:`691`, a special "meta" version is supported named ``latest``, the purpose of which is +to allow clients to request the latest version implemented by the server, without having to know +ahead of time what that version is. It is recommended however, that clients be explicit about what +versions they support. 
-Since this PEP expects the existing legacy ``1.0`` upload API to exist at a -different endpoint, and it currently only provides for JSON serialization, this -mechanism is not particularly useful, and clients only have a single version and -serialization they can request. However clients **SHOULD** be setup to handle -content negotiation gracefully in the case that additional formats or versions -are added in the future. +Similar to :pep:`691`, this PEP also standardizes on using server-driven content negotiation to +allow clients to request different versions or serialization formats, which includes the ``format`` +part of the content type. However, since this PEP expects the existing legacy ``1.0`` upload API to +exist at a different endpoint, and this PEP currently only provides for JSON serialization, this +mechanism is not particularly useful. Clients only have a single version and serialization they can +request. However clients **SHOULD** be prepared to handle content negotiation gracefully in the case +that additional formats or versions are added in the future. FAQ @@ -831,13 +822,11 @@ FAQ Does this mean PyPI is planning to drop support for the existing upload API? ---------------------------------------------------------------------------- -At this time PyPI does not have any specific plans to drop support for the -existing upload API. +At this time PyPI does not have any specific plans to drop support for the existing upload API. -Unlike with :pep:`691` there are wide benefits to doing so, so it is likely -that we will want to drop support for it at some point in the future, but -until this API is implemented, and receiving broad use it would be premature -to make any plans for actually dropping support for it. +Unlike with :pep:`691` there are significant benefits to doing so, so it is likely that support for +the legacy upload API to be (responsibly) deprecated and removed at some point in the future. Such +future deprecation planning is explicitly out of scope for *this* PEP. Is this Resumable Upload protocol based on anything? @@ -845,35 +834,35 @@ Is this Resumable Upload protocol based on anything? Yes! -It's actually the protocol specified in an -`Active Internet-Draft `_, -where the authors took what they learned implementing `tus `_ -to provide the idea of resumable uploads in a wholly generic, standards based -way. - -The only deviation we've made from that spec is that we don't use the -``104 Upload Resumption Supported`` informational response in the first -``POST`` request. This decision was made for a few reasons: - -- The ``104 Upload Resumption Supported`` is the only part of that draft - which does not rely entirely on things that are already supported in the - existing standards, since it was adding a new informational status. -- Many clients and web frameworks don't support ``1xx`` informational - responses in a very good way, if at all, adding it would complicate - implementation for very little benefit. -- The purpose of the ``104 Upload Resumption Supported`` support is to allow - clients to determine that an arbitrary endpoint that they're interacting - with supports resumable uploads. Since this PEP is mandating support for - that in servers, clients can just assume that the server they are +It's actually the protocol specified in an `Active Internet-Draft `_, where the authors +took what they learned implementing `tus `_ to provide the idea of resumable +uploads in a wholly generic, standards based way. + +.. 
_ietf-draft: https://datatracker.ietf.org/doc/draft-ietf-httpbis-resumable-upload/ + +The only deviation we've made from that spec is that we don't use the ``104 Upload Resumption +Supported`` informational response in the first ``POST`` request. This decision was made for a few +reasons: + +- The ``104 Upload Resumption Supported`` is the only part of that draft which does not rely + entirely on things that are already supported in the existing standards, since it was adding a new + informational status. + +- Many clients and web frameworks don't support ``1xx`` informational responses in a very good way, + if at all, adding it would complicate implementation for very little benefit. + +- The purpose of the ``104 Upload Resumption Supported`` support is to allow clients to determine + that an arbitrary endpoint that they're interacting with supports resumable uploads. Since this + PEP is mandating support for that in servers, clients can just assume that the server they are interacting with supports it, which makes using it unneeded. -- In theory, if the support for ``1xx`` responses got resolved and the draft - gets accepted with it in, we can add that in at a later date without - changing the overall flow of the API. -There is a risk that the above draft doesn't get accepted, but even if it -does not, that doesn't actually affect us. It would just mean that our -support for resumable uploads is an application specific protocol, but is -still wholly standards compliant. +- In theory, if the support for ``1xx`` responses got resolved and the draft gets accepted with it + in, we can add that in at a later date without changing the overall flow of the API. + +There is a risk that the above draft doesn't get accepted, but even if it does not, that doesn't +actually affect us. It would just mean that our support for resumable uploads is an application +specific protocol, but is still wholly standards compliant. + Can I use the upload 2.0 API to reserve a project name? ------------------------------------------------------- @@ -891,105 +880,91 @@ The user that created the session will become the owner of the new project. Open Questions ============== - Multipart Uploads vs tus ------------------------ -This PEP currently bases the actual uploading of files on an internet draft -from ``tus.io`` that supports resumable file uploads. +This PEP currently bases the actual uploading of files on an internet draft from ``tus.io`` that +supports resumable file uploads. That protocol requires a few things: -- That the client selects a secure ``Upload-Token`` that they use to identify - uploading a single file. -- That if clients don't upload the entire file in one shot, that they have - to submit the chunks serially, and in the correct order, with all but the - final chunk having a ``Upload-Incomplete: 1`` header. -- Resumption of an upload is essentially just querying the server to see how - much data they've gotten, then sending the remaining bytes (either as a single - request, or in chunks). -- The upload implicitly is completed when the server successfully gets all of - the data from the client. - -This has one big benefit, that if a client doesn't care about resuming their -download, the work to support, from a client side, resumable uploads is able -to be completely ignored. They can just ``POST`` the file to the URL, and if -it doesn't succeed, they can just ``POST`` the whole file again. 
- -The other benefit is that even if you do want to support resumption, you can -still just ``POST`` the file, and unless you *need* to resume the download, -that's all you have to do. - -Another, possibly theoretical benefit is that for hashing the uploaded files, -the serial chunks requirement means that the server can maintain hashing state -between requests, update it for each request, then write that file back to -storage. Unfortunately this isn't actually possible to do with Python's hashlib, -though there are some libraries like `Rehash `_ -that implement it, but they don't support every hash that hashlib does -(specifically not blake2 or sha3 at the time of writing). - -We might also need to reconstitute the download for processing anyways to do -things like extract metadata, etc from it, which would make it a moot point. - -The downside is that there is no ability to parallelize the upload of a single -file because each chunk has to be submitted serially. - -AWS S3 has a similar API (and most blob stores have copied it either wholesale -or something like it) which they call multipart uploading. +- That the client selects a secure ``Upload-Token`` that they use to identify uploading a single + file. + +- That if clients don't upload the entire file in one shot, that they have to submit the chunks + serially, and in the correct order, with all but the final chunk having a ``Upload-Incomplete: 1`` + header. + +- Resumption of an upload is essentially just querying the server to see how much data they've + gotten, then sending the remaining bytes (either as a single request, or in chunks). + +- The upload implicitly is completed when the server successfully gets all of the data from the + client. + +This has the benefit that if a client doesn't care about resuming their download, it can essentially +ignore the protocol. Clients can just ``POST`` the file to the file upload URL, and if it doesn't +succeed, they can just ``POST`` the whole file again. + +The other benefit is that even if clients do want to support resumption, unless they *need* to +resume the download, they can still just ``POST`` the file. + +Another, possibly theoretical benefit is that for hashing the uploaded files, the serial chunks +requirement means that the server can maintain hashing state between requests, update it for each +request, then write that file back to storage. Unfortunately this isn't actually possible to do with +Python's `hashlib `__ standard library module. +There are some libraries third party libraries, such as `Rehash +`__ that do implement the necessary APIs, but they don't +support every hash that ``hashlib`` does (e.g. ``blake2`` or ``sha3`` at the time of writing). + +We might also need to reconstitute the download for processing anyways to do things like extract +metadata, etc from it, which would make it a moot point. + +The downside is that there is no ability to parallelize the upload of a single file because each +chunk has to be submitted serially. + +AWS S3 has a similar API, and most blob stores have copied it either wholesale or something like it +which they call multipart uploading. The basic flow for a multipart upload is: -1. Initiate a Multipart Upload to get an Upload ID. -2. Break your file up into chunks, and upload each one of them individually. -3. Once all chunks have been uploaded, finalize the upload. - - This is the step where any errors would occur. 
- -It does not directly support resuming an upload, but it allows clients to -control the "blast radius" of failure by adjusting the size of each part -they upload, and if any of the parts fail, they only have to resend those -specific parts. - -This has a big benefit in that it allows parallelization in uploading files, -allowing clients to maximize their bandwidth using multiple threads to send -the data. - -We wouldn't need an explicit step (1), because our session would implicitly -initiate a multipart upload for each file. - -It does have its own downsides: - -- Clients have to do more work on every request to have something resembling - resumable uploads. They would *have* to break the file up into multiple parts - rather than just making a single POST request, and only needing to deal - with the complexity if something fails. - -- Clients that don't care about resumption at all still have to deal with - the third explicit step, though they could just upload the file all as a - single part. - - - S3 works around this by having another API for one shot uploads, but - I'd rather not have two different APIs for uploading the same file. - -- Verifying hashes gets somewhat more complicated. AWS implements hashing - multipart uploads by hashing each part, then the overall hash is just a - hash of those hashes, not of the content itself. We need to know the - actual hash of the file itself for PyPI, so we would have to reconstitute - the file and read its content and hash it once it's been fully uploaded, - though we could still use the hash of hashes trick for checksumming the - upload itself. - - - See above about whether this is actually a downside in practice, or - if it's just in theory. - -I lean towards the ``tus`` style resumable uploads as I think they're simpler -to use and to implement, and the main downside is that we possibly leave -some multi-threaded performance on the table, which I think that I'm -personally fine with? - -I guess one additional benefit of the S3 style multi part uploads is that -you don't have to try and do any sort of protection against parallel uploads, -since they're just supported. That alone might erase most of the server side -implementation simplification. +#. Initiate a multipart upload to get an upload ID. +#. Break your file up into chunks, and upload each one of them individually. +#. Once all chunks have been uploaded, finalize the upload. This is the step where any errors would + occur. + +Such multipart uploads do not directly support resuming an upload, but it allows clients to control +the "blast radius" of failure by adjusting the size of each part they upload, and if any of the +parts fail, they only have to resend those specific parts. The trade-off is that it allows for more +parallelism when uploading a single file, allowing clients to maximize their bandwidth using +multiple threads to send the file data. + +We wouldn't need an explicit step (1), because our session would implicitly initiate a multipart +upload for each file. + +There are downsides to this though: + +- Clients have to do more work on every request to have something resembling resumable uploads. They + would *have* to break the file up into multiple parts rather than just making a single POST + request, and only needing to deal with the complexity if something fails. + +- Clients that don't care about resumption at all still have to deal with the third explicit step, + though they could just upload the file all as a single part. 
(S3 works around this by having
+  another API for one shot uploads, but the PEP authors place a high value on having a single API
+  for uploading any individual file.)
+
+- Verifying hashes gets somewhat more complicated. AWS implements hashing multipart uploads by
+  hashing each part, then the overall hash is just a hash of those hashes, not of the content
+  itself. Since PyPI needs to know the actual hash of the file itself anyway, we would have to
+  reconstitute the file, read its content, and hash it once it's been fully uploaded, though it
+  could still use the hash of hashes trick for checksumming the upload itself.
+
+The PEP authors lean towards ``tus`` style resumable uploads, due to them being simpler to use,
+easier to implement, and more consistent, with the main downside being that multi-threaded
+performance is theoretically left on the table.
+
+One other possible benefit of the S3 style multipart uploads is that you don't have to try and do
+any sort of protection against parallel uploads, since they're just supported. That alone might
+erase most of the server side implementation simplification.
 
 .. rubric:: Footnotes
 
From fc65918fe84702f59e12154239a16834e716e9da Mon Sep 17 00:00:00 2001
From: Barry Warsaw 
Date: Wed, 4 Dec 2024 16:52:16 -0800
Subject: [PATCH 6/9] Fix some links and lints

---
 peps/pep-0694.rst | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst
index d356db762f8..4f170d63119 100644
--- a/peps/pep-0694.rst
+++ b/peps/pep-0694.rst
@@ -793,7 +793,7 @@ The structure of the ``Content-Type`` header for all other requests is:
 Since minor API version differences should never be disruptive, only the major version is included
 in the content type; the version number is prefixed with a ``v``.
 
-Unlike :pep:`691`, this PEP does not change the existing *legacy* `1.0`` upload API in any way, so
+Unlike :pep:`691`, this PEP does not change the existing *legacy* ``1.0`` upload API in any way, so
 servers are required to host the new API described in this PEP at a different endpoint than the
 existing upload API.
 
@@ -834,7 +834,7 @@ Is this Resumable Upload protocol based on anything?
 
 Yes!
 
-It's actually the protocol specified in an `Active Internet-Draft `_, where the authors
+It's actually the protocol specified in an `active internet draft `_, where the authors
 took what they learned implementing `tus `_ to provide the idea of resumable
 uploads in a wholly generic, standards based way.
 
@@ -883,8 +883,8 @@ Open Questions
 Multipart Uploads vs tus
 ------------------------
 
-This PEP currently bases the actual uploading of files on an internet draft from ``tus.io`` that
-supports resumable file uploads.
+This PEP currently bases the actual uploading of files on an `internet draft `_
+(originally designed by `tus.io `__) that supports resumable file uploads.
 
 That protocol requires a few things:
 
@@ -983,8 +983,8 @@ erase most of the server side implementation simplification.
 .. [#fn-chunk-token] Single request uploads **MAY** include the ``Upload-Token`` header, but it is
                      not required in that case.
 
-.. [#fn-immutable] Published files may still be `yanked `__ or
-   `deleted `__ as normal.
+.. [#fn-immutable] Published files may still be yanked (i.e. :pep:`592`) or `deleted
+   `__ as normal.
Copyright From 0e366536610b8fbd61e16dd300b7691473d0ca2b Mon Sep 17 00:00:00 2001 From: Barry Warsaw Date: Mon, 9 Dec 2024 10:34:03 -0800 Subject: [PATCH 7/9] Minor updates --- peps/pep-0694.rst | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst index 4f170d63119..c80ca20a125 100644 --- a/peps/pep-0694.rst +++ b/peps/pep-0694.rst @@ -137,8 +137,12 @@ methods. Upload 2.0 API Specification ============================ -This PEP traces the root cause of most of the issues with the existing API to be -roughly two things: +This PEP draws inspiration from the `Resumable Uploads for HTTP `_ internet draft, +however it has some significant differences. This is largely due to the unique nature of Python +package releases (i.e. metadata, multiple related artifacts, etc.), and the support for an upload +session and release stages. Where it makes sense to adopt details of the draft, this PEP does so. + +This PEP traces the root cause of most of the issues with the existing API to be roughly two things: - The metadata is submitted alongside the file, rather than being parsed from the file itself. [#fn-metadata]_ @@ -182,8 +186,8 @@ being tested, and will be blessed as permanent after sufficient testing with liv Create an Upload Session ~~~~~~~~~~~~~~~~~~~~~~~~ -To create a new upload session, submit a ``POST`` request to ``/`` (i.e. the root URL), with a -payload that looks like: +To create a new upload session, submit a ``POST`` request to the root URL, with a payload that looks +like: .. code-block:: json @@ -230,7 +234,7 @@ requests. .. _session-response: -Response body +Response Body +++++++++++++ The successful response includes the following JSON content: @@ -402,8 +406,8 @@ Besides the standard ``meta`` key, the request JSON has the following additional If given, this is a string value containing the file's `core metadata `_. -Servers **MAY** use the data provided in this request to do some sanity checking prior to -allowing the file to be uploaded. These checks may include, but are not limited to: +Servers **MAY** use the data provided in this request to do some sanity checking prior to allowing +the file to be uploaded. These checks may include, but are not limited to: - Checking if the ``filename`` already exists in a published release - Checking if the ``size`` would exceed any project or file quota @@ -838,7 +842,7 @@ It's actually the protocol specified in an `active internet draft `_ took what they learned implementing `tus `_ to provide the idea of resumable uploads in a wholly generic, standards based way. -.. _ietf-draft: https://datatracker.ietf.org/doc/draft-ietf-httpbis-resumable-upload/ +.. _ietf-draft: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html The only deviation we've made from that spec is that we don't use the ``104 Upload Resumption Supported`` informational response in the first ``POST`` request. 
This decision was made for a few From 8e8ebd89f8591879199f6b60a79dc3f88dbfe9e6 Mon Sep 17 00:00:00 2001 From: Barry Warsaw Date: Thu, 12 Dec 2024 18:03:47 -0800 Subject: [PATCH 8/9] Last round of updates --- peps/pep-0694.rst | 375 ++++++++++++++++++++++++++-------------------- 1 file changed, 211 insertions(+), 164 deletions(-) diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst index c80ca20a125..6dc0ccacc1b 100644 --- a/peps/pep-0694.rst +++ b/peps/pep-0694.rst @@ -16,16 +16,16 @@ Abstract This PEP proposes a standard API for uploading files to a Python package index such as PyPI. Along with standardization, the upload API provides additional useful features such as support for: +* an upload session, which can be used to simultaneously publish all wheels in a package release; + * "staging" a release, which can be used to test uploads before publicly publishing them, without the - need for `test.pypi.org `__. The stage can also be used to simultaneously - and atomically publish all the wheels in a package release. + need for `test.pypi.org `__; -* artifacts which can be overwritten and replaced, until a stage is published. +* artifacts which can be overwritten and replaced, until a session is published; -* asynchronous and "chunked" uploads, for more efficient use of network bandwidth. Chunked uploads - also enable resumable uploads of individual artifacts. +* asynchronous and "chunked", resumable file uploads, for more efficient use of network bandwidth; -* detailed status on the state of artifact uploads. +* detailed status on the state of artifact uploads; * new project creation without requiring the uploading of an artifact. @@ -37,7 +37,7 @@ Rationale ========= There is currently no standardized API for uploading files to a Python package index such as -PyPI. Instead, everyone has been forced to reverse engineer the non-standard, `"legacy" +PyPI. Instead, everyone has been forced to reverse engineer the existing `"legacy" `__ API. The legacy API, while functional, leaks implementation details of the original PyPI code base, @@ -49,15 +49,15 @@ In addition, there are a number of major issues with the legacy API: while the index processes the uploaded file to determine success or failure. * It does not support any mechanism for resuming an upload. With the largest default file size on - PyPI being just under 1GB in size, requiring the entire upload to complete successfully means + PyPI being around 1GB in size, requiring the entire upload to complete successfully means bandwidth is wasted when such uploads experience a network interruption while the request is in progress. * The atomic unit of operation is a single file. This is problematic when a release logically - includes multiple binary wheels, leading to race conditions where consumers get different versions - of the package if they are unlucky enough to require a package before their platform's wheel has - completely uploaded. If the release uploads an sdist first, this may also manifest in some - consumers seeing only the sdist, triggering a local build from source. + includes an sdist and multiple binary wheels, leading to race conditions where consumers get + different versions of the package if they are unlucky enough to require a package before their + platform's wheel has completely uploaded. If the release uploads its sdist first, this may also + manifest in some consumers seeing only the sdist, triggering a local build from source. * Status reporting is very limited. 
There's no support for reporting multiple errors, warnings, deprecations, etc. Status is limited to the HTTP status code and reason phrase, of which the @@ -85,7 +85,7 @@ Legacy API ========== The following is an overview of the legacy API. For the detailed description, consult the -`PyPI user guide documentation `__ +`PyPI user guide documentation `__. Endpoint @@ -133,12 +133,13 @@ Upload authentication is also not standardized. On PyPI, authentication is throu `__. Other indexes may support different authentication methods. +.. _spec: Upload 2.0 API Specification ============================ This PEP draws inspiration from the `Resumable Uploads for HTTP `_ internet draft, -however it has some significant differences. This is largely due to the unique nature of Python +however there are significant differences. This is largely due to the unique nature of Python package releases (i.e. metadata, multiple related artifacts, etc.), and the support for an upload session and release stages. Where it makes sense to adopt details of the draft, this PEP does so. @@ -163,8 +164,8 @@ Versioning ---------- This PEP uses the same ``MAJOR.MINOR`` versioning system as used in :pep:`691`, but it is otherwise -independently versioned. The legacy API is considered by this spec to be version ``1.0``, but does -not modify that API in any way. +independently versioned. The legacy API is considered by this PEP to be version ``1.0``, but this +PEP does not modify the legacy API in any way. The API proposed in this PEP therefor has the version number ``2.0``. @@ -172,11 +173,11 @@ The API proposed in this PEP therefor has the version number ``2.0``. Root Endpoint ------------- -All URLs described here will be relative to the root endpoint, which may be located anywhere within +All URLs described here are relative to the "root endpoint", which may be located anywhere within the url structure of a domain. For example, the root endpoint could be ``https://upload.example.com/``, or ``https://example.com/upload/``. -Specifically for PyPI, this PEP proposes to implement the root endpoint URL to be +Specifically for PyPI, this PEP proposes to implement the root endpoint at ``https://upload.pypi.org/2.0``. This root URL will be considered provisional while the feature is being tested, and will be blessed as permanent after sufficient testing with live projects. @@ -186,8 +187,8 @@ being tested, and will be blessed as permanent after sufficient testing with liv Create an Upload Session ~~~~~~~~~~~~~~~~~~~~~~~~ -To create a new upload session, submit a ``POST`` request to the root URL, with a payload that looks -like: +A release starts by creating a new upload session. To create the session, a client submits a ``POST`` request +to the root URL, with a payload that looks like: .. code-block:: json @@ -208,7 +209,7 @@ The request includes the following top-level keys: ``api-version`` the value of which must be the string ``"2.0"``. ``name`` (**required**) - The name of the project that this session is attempting to add files to. + The name of the project that this session is attempting to release a new version of. ``version`` (**required**) The version of the project that this session is attempting to add files to. @@ -232,6 +233,7 @@ The session is owned by the user that created it, and all subsequent requests ** with the same credentials, otherwise a ``403 Forbidden`` will be returned on those subsequent requests. + .. 
_session-response: Response Body @@ -248,10 +250,7 @@ The successful response includes the following JSON content: "links": { "stage": "...", "upload": "...", - "status": "xxx-remove-me", - "extend": "...", - "cancel": "..." - "publish": "...", + "session": "...", }, "session-token": "", "valid-for": 604800, @@ -283,7 +282,7 @@ the following keys: :ref:`extended `. The session **SHOULD** live at least this much longer unless the client itself has canceled or published the session. Servers **MAY** choose to *increase* this time, but should never *decrease* it, except naturally through the passage of - time. Clients can query the `session status ` to get time remaining in the + time. Clients can query the :ref:`session status ` to get time remaining in the session. ``status`` @@ -307,27 +306,19 @@ Session Links For the ``links`` key in the success JSON, the following sub-keys are valid: ``upload`` - The endpoint for this session clients will use to initiate :ref:`uploads ` for - each file to be included in this session. + The endpoint session clients will use to initiate :ref:`uploads ` for each file to + be included in this session. ``stage`` The endpoint where this staged release can be :ref:`previewed ` prior to publishing the session. This can be used to download and verify the not-yet-public files. If the index does not support previewing staged releases, this key **MUST** be omitted. -``publish`` - The endpoint which triggers :ref:`publishing this session `. - -``status`` - The endpoint that can be used to query the :ref:`current status ` of this - session. - -``extend`` - The endpoint that can be used to :ref:`extend ` the current session, *if* the - server supports it. If the server does not support session extension, this key **MUST** be omitted. - -``cancel`` - The endpoint that can be used to :ref:`cancel and discard the session `. +``session`` + The endpoint where actions for this session can be performed, including :ref:`publishing this + session `, :ref:`canceling and discarding the session `, + :ref:`querying the current session status `, and :ref:`requesting an extension + of the session lifetime ` (*if* the server supports it). .. _session-files: @@ -363,12 +354,12 @@ File Upload ~~~~~~~~~~~ After creating the session, the ``upload`` endpoint from the response's :ref:`session links -` mapping is used to upload new files into that session. Clients **MUST** use the -provided ``upload`` URL and **MUST NOT** assume there is any pattern or commonality to those URLs -from one session to the next. +` mapping is used to begin the upload of new files into that session. Clients +**MUST** use the provided ``upload`` URL and **MUST NOT** assume there is any pattern or commonality +to those URLs from one session to the next. -To initiate a file upload, a client sends a ``POST`` request to the ``upload`` URL. The request -body has the following JSON format: +To initiate a file upload, a client first sends a ``POST`` request to the ``upload`` URL. The +request body has the following JSON format: .. code-block:: json @@ -409,14 +400,23 @@ Besides the standard ``meta`` key, the request JSON has the following additional Servers **MAY** use the data provided in this request to do some sanity checking prior to allowing the file to be uploaded. 
These checks may include, but are not limited to: -- Checking if the ``filename`` already exists in a published release -- Checking if the ``size`` would exceed any project or file quota -- Checking if the contents of the ``metadata``, if provided, are valid +- checking if the ``filename`` already exists in a published release; + +- checking if the ``size`` would exceed any project or file quota; + +- checking if the contents of the ``metadata``, if provided, are valid. If the server determines that upload should proceed, it will return a ``201 Created`` response, with -an empty body, and a ``Location`` header pointing to the URL that the file itself should be uploaded -to. The :ref:`status ` of the session will also include the filename in the -``files`` mapping, with the above ``Location`` URL included in under the ``link`` sub-key. +an empty body, and a ``Location`` header pointing to the URL that the file content should be +uploaded to. The :ref:`status ` of the session will also include the filename in +the ``files`` mapping, with the above ``Location`` URL included in under the ``link`` sub-key. + +.. IMPORTANT:: + + The `IETF draft `_ calls this the URL of the `upload resource + `_, and this PEP uses that nomenclature as well. + +.. _ietf-upload-resource: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html#name-upload-creation-2 .. _upload-contents: @@ -424,64 +424,90 @@ to. The :ref:`status ` of the session will also include the fil Upload File Contents ++++++++++++++++++++ -The actual file contents are uploaded by issuing a ``POST`` request to this URL location. The -client may either upload the entire file in a single request, or it may opt for "chunked" upload -where the file contents are split into multiple requests, as described below. +The actual file contents are uploaded by issuing a ``POST`` request to the upload resource URL +[#fn-location]_. The client may either upload the entire file in a single request, or it may opt +for "chunked" upload where the file contents are split into multiple requests, as described below. -In either case, the request **MUST** include both a ``Content-Length`` and a ``Content-Type`` -header. The ``Content-Type`` header **MUST** be ``application/octet-stream``. The body of the -request as unencoded raw binary data. +.. IMPORTANT:: -For all-in-one requests, where the entire file contents is uploaded in a single request, the -``Content-Length`` size is the size of the entire file in bytes, and this **MUST** match the size -given in the original session creation request. If this single-request upload fails, the entire -file must be resent in another single HTTP request. This is the recommended, preferred format for -file uploads since fewer requests are required. + The protocol defined in this PEP differs from the `IETF draft `_ in a few ways: -As an example, if uploading a 100,000 byte file, you would send headers like:: + * For chunked uploads, the `second and subsequent chunks `_ are uploaded + using a ``POST`` request instead of ``PATCH`` requests. Similarly, this PEP uses + ``application/octet-stream`` for the ``Content-Type`` headers for all chunks. - Content-Length: 100000 - Content-Type: application/octet-stream - Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: + * No ``Upload-Draft-Interop-Version`` header is required. + + * Some of the server responses are different. + +.. 
_ietf-upload-append: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html#name-upload-append-2 + + +When uploading the entire file in a single request, the request **MUST** include the following +headers (e.g. for a 100,000 byte file): + +.. code-block:: email + + Content-Length: 100000 + Content-Type: application/octet-stream + Upload-Length: 100000 + Upload-Complete: ?1 + +The body of this request contains all 100,000 bytes of the unencoded raw binary data. + +``Content-Length`` + The number of file bytes contained in the body of *this* request. + +``Content-Type`` + **MUST** be ``application/octet-stream``. + +``Upload-Length`` + Indicates the total number of bytes that will be uploaded for this file. For single-request + uploads this will always be equal to ``Content-Length``, but these values will likely differ for + chunked uploads. This value **MUST** equal the number of bytes given in the ``size`` field of + the file upload initiation request. + +``Upload-Complete`` + A flag indicating whether more chunks are coming for this file. For single-request uploads, the + value of this header **MUST** be ``?1``. If the upload completes successfully, the server **MUST** respond with a ``201 Created`` status. The response body has no content. -However for large files, uploading the file in a single request may result in timeouts, so clients -can opt to upload the file in multiple chunks. For chunked uploads, the client **MUST** -[#fn-chunk-token]_ generate a unique token which is provided in each request for this file upload. -This token is a binary blob, `base64 `__ encoded, -bracketed by the ``:`` (colon) character, and included in the ``Upload-Token`` header. Clients -**SHOULD** use at least 32 bytes of cryptographically secure data. For example, the following -algorithm can be used: +If this single-request upload fails, the entire file must be resent in another single HTTP request. +This is the recommended, preferred format for file uploads since fewer requests are required. -.. code-block:: python +As an example, if the client was to upload a 100,000 byte file, the headers would look like: - from base64 import b64encode - from secrets import token_bytes +.. code-block:: email - header = f':{b64encode(token_bytes(32)).decode()}:' + Content-Length: 100000 + Content-Type: application/octet-stream + Upload-Length: 100000 + Upload-Complete: ?1 -To upload the file in multiple chunks, a client sends multiple ``POST`` requests to the -same URL as above, with one request per chunk. +Clients can opt to upload the file in multiple chunks. Because the upload resource URL provided in +the metadata response will be unique per file, clients **MUST** use the given upload resource URL +for all chunks. Clients upload file chunks by sending multiple ``POST`` requests to this URL, with +one request per chunk. For chunked uploads, the ``Content-Length`` is equal to the size in bytes of the chunk that is currently being sent. The client **MUST** include a ``Upload-Offset`` header which indicates the -byte offset that the content included in this chunk's request starts at and an ``Upload-Incomplete`` -header with the value ``1``. For the first chunk, the ``Upload-Offset`` header **MUST** be set to +byte offset that the content included in this chunk's request starts at, and an ``Upload-Complete`` +header with the value ``?0``. For the first chunk, the ``Upload-Offset`` header **MUST** be set to ``0``. 
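As an illustration of how these chunk requests fit together, the following is a minimal client-side
sketch of a chunked upload. It is not a reference implementation: the ``upload_url`` and ``session``
names are hypothetical, the third-party ``requests`` library is assumed, and error handling is
reduced to a bare status check.

.. code-block:: python

   # Illustrative sketch only: chunks are sent serially and in order to the upload
   # resource URL; the final chunk flips ``Upload-Complete`` to ``?1``.
   from pathlib import Path

   import requests

   CHUNK_SIZE = 1000


   def upload_in_chunks(session: requests.Session, upload_url: str, path: Path) -> None:
       total = path.stat().st_size
       offset = 0
       with path.open("rb") as stream:
           while chunk := stream.read(CHUNK_SIZE):
               is_last = offset + len(chunk) >= total
               headers = {
                   "Content-Type": "application/octet-stream",
                   "Upload-Offset": str(offset),
                   "Upload-Length": str(total),
                   "Upload-Complete": "?1" if is_last else "?0",
               }
               response = session.post(upload_url, data=chunk, headers=headers)
               # Intermediate chunks are answered with ``202 Accepted``; the final
               # chunk is answered with ``201 Created``.
               response.raise_for_status()
               offset += len(chunk)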
As with single-request uploads, the ``Content-Type`` header is ``application/octet-stream`` and the body is the raw, unencoded bytes of the chunk. -For example, if uploading a 100,000 byte file in 1000 byte chunks, the first chunk's -headers would be: +For example, if uploading a 100,000 byte file in 1000 byte chunks, the first chunk's request headers +would be: .. code-block:: email Content-Length: 1000 Content-Type: application/octet-stream - Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: Upload-Offset: 0 - Upload-Incomplete: 1 + Upload-Length: 100000 + Upload-Complete: ?0 For the second chunk representing bytes 1000 through 1999, include the following headers: @@ -489,39 +515,37 @@ For the second chunk representing bytes 1000 through 1999, include the following Content-Length: 1000 Content-Type: application/octet-stream - Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: Upload-Offset: 1000 - Upload-Incomplete: 1 + Upload-Length: 100000 + Upload-Complete: ?0 -.. _complete-the-upload: - -The final chunk of data **MUST** omit the ``Upload-Incomplete`` header, since at that point the -entire file has been uploaded. +These requests would continue sequentially until the last chunk is ready to be uploaded. For each successful chunk, the server **MUST** respond with a ``202 Accepted`` header, except for the final chunk, which **MUST** be a ``201 Created``, and as with non-chunked uploads, the body of these responses has no content. +.. _complete-the-upload: + +The final chunk of data **MUST** include the ``Upload-Complete: ?1`` header, since at that point the +entire file has been uploaded. + With both chunked and non-chunked uploads, once completed successfully, the file **MUST** not be publicly visible in the repository, but merely staged until the upload session is :ref:`completed -`. The file **MUST** be visible at the ``stage`` :ref:`URL ` but -only if the server supports :ref:`previews `. Partially uploaded chunked files -**SHOULD NOT** be visible at the ``stage`` URL. +`. If the server supports :ref:`previews `, the file **MUST** be +visible at the ``stage`` :ref:`URL `. Partially uploaded chunked files **SHOULD +NOT** be visible at the ``stage`` URL. The following constraints are placed on uploads regardless of whether they are single chunk or multiple chunks: - A client **MUST NOT** perform multiple ``POST`` requests in parallel for the same file to avoid - race conditions and data loss or corruption. The server **MAY** terminate any ongoing ``POST`` - request that utilizes the same ``Upload-Token`` for chunks of a different file. + race conditions and data loss or corruption. - If the offset provided in ``Upload-Offset`` is not ``0`` or correctly specifies the byte offset of the next chunk in an incomplete upload, then the server **MUST** respond with a ``409 Conflict``. This means that a client **MAY NOT** upload chunks out of order. -- Once an upload has started with a specific token, you may not use another token for that file - without deleting the in-progress upload. - - Once a file upload has completed successfully, you may initiate another upload for that file, which **once completed**, will replace that file. This is possible until the entire session is completed, at which point no further file uploads (either creating or replacing a session file) @@ -533,13 +557,20 @@ Resume an Upload ++++++++++++++++ To resume an upload, you first have to know how much of the file's contents the server has already -received. 
If this is not already known, a client can make a ``HEAD`` request with their existing -``Upload-Token`` to the same URL they were uploading the file to. +received. If this is not already known, a client can make a ``HEAD`` request to the upload resource +URL. + +The server **MUST** respond with a ``204 No Content`` response, with an ``Upload-Offset`` header +that indicates what offset the client should continue uploading from. If the server has not received +any data, then this would be ``0``, if it has received 1007 bytes then it would be ``1007``. For +this example, the full response headers would look like: + +.. code-block:: email + + Upload-Offset: 1007 + Upload-Complete: ?0 + Cache-Control: no-store -The server **MUST** respond back with a ``204 No Content`` response, with an ``Upload-Offset`` -header that indicates what offset the client should continue uploading from. If the server has not -received any data, then this would be ``0``, if it has received 1007 bytes then it would be -``1007``. Once the client has retrieved the offset that they need to start from, they can upload the rest of the file as described above, either in a single request containing all of the remaining bytes, or in @@ -552,20 +583,24 @@ Canceling an In-Progress Upload +++++++++++++++++++++++++++++++ If a client wishes to cancel an upload of a specific file, for instance because they need to upload -a different file, they may do so by issuing a ``DELETE`` request to the file upload URL with the -``Upload-Token`` used to upload the file in the first place. +a different file, they may do so by issuing a ``DELETE`` request to the upload resource URL of the +file they want to delete. -A successful cancellation request **MUST** response with a ``204 No Content``. +A successful cancellation request **MUST** respond with a ``204 No Content``. + +Once deleting, a client **MUST NOT** assume that the previous upload resource URL can be reused. Delete a Partial or Fully Uploaded File +++++++++++++++++++++++++++++++++++++++ -For files which have already been completely uploaded, clients can delete the file by issuing a -``DELETE`` request to the file upload URL without the ``Upload-Token``. +Similarly, for files which have already been completely uploaded, clients can delete the file by +issuing a ``DELETE`` request to the upload resource URL. A successful deletion request **MUST** response with a ``204 No Content``. +Once deleting, a client **MUST NOT** assume that the previous upload resource URL can be reused. + Replacing a Partially or Fully Uploaded File ++++++++++++++++++++++++++++++++++++++++++++ @@ -574,12 +609,17 @@ To replace a session file, the file upload **MUST** have been previously complet is not possible to replace a file if the upload for that file is incomplete. Clients have two options to replace an incomplete upload: -- :ref:`Cancel the in-progress upload ` by issuing a ``DELETE`` of that specific - file. After this, the new file upload can be initiated. +- :ref:`Cancel the in-progress upload ` by issuing a ``DELETE`` to the upload + resource URL for the file they want to replace. After this, the new file upload can be initiated + by beginning the entire :ref:`file upload ` sequence over again. This means + providing the metadata request again to retrieve a new upload resource URL. Client **MUST NOT** + assume that the previous upload resource URL can be reused after deletion. - :ref:`Complete the in-progress upload ` by uploading a zero-length chunk - omitting the ``Upload-Incomplete`` header. 
This effectively truncates and completes the - in-progress upload, after which point the new upload can commence. + providing the ``Upload-Complete: ?1`` header. This effectively truncates and completes the + in-progress upload, after which point the new upload can commence. In this case, clients + **SHOULD** reuse the previous upload resource URL and do not need to begin the entire :ref:`file + upload ` sequence over again. .. _session-status: @@ -588,9 +628,8 @@ Session Status ~~~~~~~~~~~~~~ At any time, a client can query the status of the session by issuing a ``GET`` request to the -``status`` URL from the :ref:`initial session response body `. As with other -session requests, clients **MUST NOT** assume that there is any commonality to what :ref:`session -URLs ` look like from one session to the next. +``session`` :ref:`link ` given in the :ref:`session creation response body +`. The server will respond to this ``GET`` request with the same :ref:`response ` that they got when they initially created the upload session, except with any changes to ``status``, @@ -604,8 +643,10 @@ Session Extension Servers **MAY** allow clients to extend sessions, but the overall lifetime and number of extensions allowed is left to the server. To extend a session, a client issues a ``POST`` request to the -``extend`` :ref:`session link ` given in the :ref:`session creation ` -request. The JSON body of this request looks like: +``session`` :ref:`link ` given in the :ref:`session creation response body +`. + +The JSON body of this request looks like: .. code-block:: json @@ -613,6 +654,7 @@ request. The JSON body of this request looks like: "meta": { "api-version": "2.0" }, + ":action": "extend", "extend-for": 3600 } @@ -633,14 +675,16 @@ the current session. Session Cancellation ~~~~~~~~~~~~~~~~~~~~ -To cancel an entire session, a client issues a ``DELETE`` request to the same session URL as -before. The server then marks the session as canceled, and **SHOULD** purge any data that was -uploaded as part of that session. Future attempts to access that session URL or any of the file -upload URLs **MUST** return a ``404 Not Found``. +To cancel an entire session, a client issues a ``DELETE`` request to the ``session`` :ref:`link +` given in the :ref:`session creation response body `. The server +then marks the session as canceled, and **SHOULD** purge any data that was uploaded as part of that +session. Future attempts to access that session URL or any of the upload session URLs **MUST** +return a ``404 Not Found``. To prevent dangling sessions, servers may also choose to cancel timed-out sessions on their own accord. It is recommended that servers expunge their sessions after no less than a week, but each -server may choose their own schedule. +server may choose their own schedule. Servers **MAY** support client-directed :ref:`session +extensions `. .. _publish-session: @@ -648,9 +692,21 @@ server may choose their own schedule. Session Completion ~~~~~~~~~~~~~~~~~~ -To complete a session and publish the files that have been included in it, a client **MUST** send a -``POST`` request to the ``publish`` URL in the session status payload. The body of the request -contains no content. +To complete a session and publish the files that have been included in it, a client issues a +``POST`` request to the ``session`` :ref:`link ` given in the :ref:`session creation +response body `. + +The JSON body of this request looks like: + +.. 
code-block:: json
+
+   {
+     "meta": {
+       "api-version": "2.0"
+     },
+     ":action": "publish"
+   }
+
 
 If the server is able to immediately complete the session, it may do so and return a ``201 Created``
 response. If it is unable to immediately complete the session (for instance, if it needs to do
@@ -718,29 +774,27 @@ Stage Previews
 
 The ability to preview staged releases before they are published is an important feature of this
 PEP, enabling an additional level of last-mile testing before the release is available to the
-public. Indexes **MAY** provide this functionality in one or both of the following ways.
-
-* Through the URL provided in the ``stage`` sub-key of the :ref:`links key `
-  returned when the session is created. The ``stage`` URL can be passed to installers such as
-  ``pip`` by setting the `--extra-index-url
-  `__ flag to this value.
-  Multiple stages can even be previewed by repeating this flag with multiple values.
-
-* By passing the ``Stage-Token`` header to the `Simple Repository API
-  `_ requests or the
-  :pep:`691` JSON-based Simple API, with the value from the ``session-token`` sub-key of the JSON
-  response to the session creation request. Multiple ``Stage-Token`` headers are allowed. It is
-  recommended that installers add a ``--staged `` or similarly named option to set the
-  ``Stage-Token`` header at the command line.
-
-In both cases, the index will return views that expose the staged releases to the installer tool,
+public. Indexes **MAY** provide this functionality through the URL provided in the ``stage``
+sub-key of the :ref:`links key ` returned when the session is created. The ``stage``
+URL can be passed to installers such as ``pip`` by setting the `--extra-index-url
+`__ flag to this value.
+Multiple stages can even be previewed by repeating this flag with multiple values.
+
+In the future, it may be valuable to add something like a ``Stage-Token`` header to the `Simple
+Repository API `_
+requests or the :pep:`691` JSON-based Simple API, with the value from the ``session-token`` sub-key
+of the JSON response to the session creation request. Multiple ``Stage-Token`` headers could be
+allowed, and installers could support enabling stage previews by adding a ``--staged `` or
+similarly named option to set the ``Stage-Token`` header at the command line. This feature is not
+currently supported, nor proposed by this PEP, though it could be proposed by a separate PEP in the
+future.
+
+In either case, the index will return views that expose the staged releases to the installer tool,
 making them available to download and install into virtual environments built for that last-mile
 testing. The former option allows for existing installers to preview staged releases with no
 changes, although perhaps in a less user-friendly way. The latter option can be a better user
 experience, but the details of this are left to installer tool maintainers.
 
-**XXX verify Stage-Token exists - I think it doesn't**
-
 
 .. _session-errors:
 
@@ -786,7 +840,7 @@ standard content type that describes what the content is, what version of the AP
 what serialization format has been used. This standard request content type applies to all requests
 *except* for :ref:`file upload requests
-` which, since they contain only binary data, is ``application/octet-stream``.
+` which, since they contain only binary data, is always ``application/octet-stream``.
 
 The structure of the ``Content-Type`` header for all other requests is:
 
@@ -838,15 +892,14 @@ Is this Resumable Upload protocol based on anything?
 
 Yes!
-It's actually the protocol specified in an `active internet draft `_, where the authors -took what they learned implementing `tus `_ to provide the idea of resumable -uploads in a wholly generic, standards based way. +It's actually based on the protocol specified in an `active internet draft `_, where the +authors took what they learned implementing `tus `_ to provide the idea of +resumable uploads in a wholly generic, standards based way. .. _ietf-draft: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html -The only deviation we've made from that spec is that we don't use the ``104 Upload Resumption -Supported`` informational response in the first ``POST`` request. This decision was made for a few -reasons: +This PEP deviates from that spec in several ways, as described in the body of the proposal. This +decision was made for a few reasons: - The ``104 Upload Resumption Supported`` is the only part of that draft which does not rely entirely on things that are already supported in the existing standards, since it was adding a new @@ -863,10 +916,6 @@ reasons: - In theory, if the support for ``1xx`` responses got resolved and the draft gets accepted with it in, we can add that in at a later date without changing the overall flow of the API. -There is a risk that the above draft doesn't get accepted, but even if it does not, that doesn't -actually affect us. It would just mean that our support for resumable uploads is an application -specific protocol, but is still wholly standards compliant. - Can I use the upload 2.0 API to reserve a project name? ------------------------------------------------------- @@ -892,11 +941,8 @@ This PEP currently bases the actual uploading of files on an `internet draft `_ ``hashlib.new()`` and which does not require additional parameters. -.. [#fn-chunk-token] Single request uploads **MAY** include the ``Upload-Token`` header, but it is - not required in that case. - .. [#fn-immutable] Published files may still be yanked (i.e. :pep:`592`) or `deleted `__ as normal. +.. [#fn-location] Or the URL given in the ``Location`` header in the response to the file upload + initiation request, i.e. the metadata upload request; both of these links **MUST** + be the same. + Copyright ========= From d3d2edfe4041946b7b501ac59108e0b490292c27 Mon Sep 17 00:00:00 2001 From: Barry Warsaw Date: Thu, 12 Dec 2024 18:05:13 -0800 Subject: [PATCH 9/9] Add myself to CODEOWNERS for PEP 694 --- .github/CODEOWNERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index a48e9dbc1b0..a5516e95e20 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -574,7 +574,7 @@ peps/pep-0690.rst @warsaw peps/pep-0691.rst @dstufft peps/pep-0692.rst @jellezijlstra peps/pep-0693.rst @Yhg1s -peps/pep-0694.rst @dstufft +peps/pep-0694.rst @dstufft @warsaw peps/pep-0695.rst @gvanrossum peps/pep-0696.rst @jellezijlstra peps/pep-0697.rst @encukou