Add functionality for more descriptive console messages and return of response objects. #69
The first file in the diff is a new messages module (imported later as `onc.modules._Messages`):

@@ -0,0 +1,86 @@
import logging
import re
import requests
import time


REQ_MSG = "Requested: {}"  # get request url
RESPONSE_TIME_MSG = "Response received in {} seconds."  # requests.elapsed value.
RESPONSE_MSG = "HTTP Response: {} ({})"  # Brief description, status code
MULTIPAGE_MSG = ("The requested data quantity is greater than the "
                 "supplied row limit and will be downloaded over multiple requests.")


def setup_logger(logger_name: str = 'onc-client',
                 level: int | str = 'DEBUG') -> logging.Logger:
Collaborator: We are actually still supporting the 3.9 version, which does not have this.

Author: I could switch to …

Collaborator: It is better to keep the `|` syntax. I already used this in the code (and added that import). It is easier to update after I bump the version. Actually I plan to do that and release a new minor version after I merge your PR.
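For readers following the Python 3.9 discussion, here is a minimal sketch (not part of the diff) of the two options mentioned above; `setup_logger_alt` is a hypothetical name used only to show the `typing.Union` variant.

```python
# Sketch only: two ways this signature can stay compatible with Python 3.9.
from __future__ import annotations  # PEP 563: annotations are not evaluated at runtime

import logging
from typing import Union


# Option 1: keep the PEP 604 `int | str` syntax; with the future import above,
# the annotation is never evaluated, so 3.9 accepts it.
def setup_logger(logger_name: str = 'onc-client',
                 level: int | str = 'DEBUG') -> logging.Logger:
    ...


# Option 2 (hypothetical alternative, not what the PR uses): typing.Union,
# which needs no future import on any supported version.
def setup_logger_alt(logger_name: str = 'onc-client',
                     level: Union[int, str] = 'DEBUG') -> logging.Logger:
    ...
```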
    """
    Set up a logger object for displaying verbose messages to console.
    :param logger_name: The unique logger name to use. Can be shared between modules.
    :param level: The logging level to use. Default is 2, which corresponds to DEBUG.
Collaborator: logging.DEBUG is 10, right? https://docs.python.org/3/howto/logging.html#logging-levels

Author: Yes, that is an artifact from a separate project. Will change.
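For reference (not part of the diff), the named levels map to integers, and `setLevel` accepts either the numeric constant or the level name, which is why the string default still works here:

```python
import logging

# The standard levels are integers: DEBUG is 10, INFO is 20, WARNING is 30, ...
assert logging.DEBUG == 10
assert logging.INFO == 20

logger = logging.getLogger('onc-client')
logger.setLevel('DEBUG')                 # same effect as logger.setLevel(logging.DEBUG)
assert logger.level == logging.DEBUG
```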
    :return: The configured logging.Logger object.
    """
    logger = logging.getLogger(logger_name)
    logger.propagate = False
    if not logger.handlers:
        logger.setLevel(logging.DEBUG)
        console = logging.StreamHandler()
        console.setLevel(level)

        # Set the logging format.
        dtfmt = '%Y-%m-%dT%H:%M:%S'
        strfmt = f'%(asctime)s.%(msecs)03dZ | %(name)-12s | %(levelname)-8s | %(message)s'
        # strfmt = f'%(asctime)s.%(msecs)03dZ | %(levelname)-8s | %(message)s'  # Use this if you don't want to include logger name.
        fmt = logging.Formatter(strfmt, datefmt=dtfmt)
        fmt.converter = time.gmtime

        console.setFormatter(fmt)
        logger.addHandler(console)
    return logger


def scrub_token(input: str) -> str:
Collaborator: I believe it is better to use

    import re
    re.sub(r'&token=[a-z0-9-]{8}', r'&token=REDACTED', "foo&token=foo-bar0&foo=bar")

BTW how did you figure out the token pattern is …?

Author: Couldn't tell you how I settled on that. I will change to your regex. Thanks.

Collaborator: I mean your regex works fine (the fact that token generation only uses a subset of alphanumeric is to my surprise). You don't need to change the regex, just change the function call to re.sub.
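To make the suggestion concrete, a small sketch (not part of the diff) of the `re.sub` approach against the PR's own 36-character token pattern; the URL and token value below are made up for illustration:

```python
import re

TOKEN_REGEX = r'&token=[a-f0-9-]{36}'

url = "https://example.org/api/scalardata?foo=bar&token=" + "a1b2c3d4-" * 4
print(re.sub(TOKEN_REGEX, '&token=REDACTED', url))
# https://example.org/api/scalardata?foo=bar&token=REDACTED

# Unlike re.findall(...)[0], re.sub is a no-op when no token is present,
# so a URL without a token passes through instead of raising IndexError.
print(re.sub(TOKEN_REGEX, '&token=REDACTED', "https://example.org/api/locations"))
```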
    """
    Replace a token in a query URL or other string with the string 'REDACTED'
    so that users don't accidentally commit their tokens to public repositories
    if ONC Info/Warnings are too verbose.
    :param query_url: An Oceans 3.0 API URL or string with a token query parameter.
    :return: A scrubbed url.
    """
    token_regex = r'(&token=[a-f0-9-]{36})'
    token_qp = re.findall(token_regex, input)[0]
    redacted_url = input.replace(token_qp, '&token=REDACTED')
    return redacted_url


def build_error_message(response: requests.Response, redact_token: bool) -> str:
    """
    Build an error message from a requests.Response object.
    :param response: A requests.Response object.
    :param redact_token: If true, redact tokens before returning an error message.
    :return: An error message.
    """
    payload = response.json()
    if 'message' in payload.keys():
Collaborator: This if-else clause could be simplified by …
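The comment above is cut off. One common simplification for this kind of if/else, offered here only as a guess at what was meant, is `dict.get`, which returns `None` when the key is missing:

```python
payload = {'errors': []}           # example payload with no 'message' key
message = payload.get('message')   # None when the key is absent, same result as the if/else
```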
        message = payload['message']
    else:
        message = None

    if 'errors' in payload.keys():
        errors = payload['errors']
        error_messages = []
        for error in errors:
            emsg = (f"(API Error Code {error['errorCode']}) "
                    f"{error['errorMessage']} for query parameter(s) "
                    f"'{error['parameter']}'.")
            error_messages.append(emsg)
        error_message = '\n'.join(error_messages)
    else:
        error_message = None
    msg = '\n'.join([m for m in (message, error_message) if m is not None])
    if redact_token is True and 'token=' in msg:
        msg = scrub_token(msg)
    return msg
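Not part of the diff: a rough sketch of what `build_error_message` would return for a typical error payload. The error code, message, and parameter values below are invented, and `requests.Response` is replaced by a tiny stub so the example runs offline (assuming the module is importable as `onc.modules._Messages`, per the import added in the next file):

```python
from onc.modules._Messages import build_error_message


class FakeResponse:
    """Stands in for requests.Response; only .json() is used by build_error_message."""

    def json(self):
        return {
            "errors": [
                {"errorCode": 127,
                 "errorMessage": "Invalid parameter value",
                 "parameter": "locationCode"}
            ]
        }


print(build_error_message(FakeResponse(), redact_token=False))
# (API Error Code 127) Invalid parameter value for query parameter(s) 'locationCode'.
```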
The second file in the diff is the module containing the `_MultiPage` class, which handles multi-page data downloads:

@@ -8,12 +8,23 @@
 from ._util import _formatDuration
+from onc.modules._Messages import (setup_logger, MULTIPAGE_MSG,
+                                   build_error_message,
+                                   scrub_token,
+                                   REQ_MSG,
+                                   RESPONSE_TIME_MSG,
+                                   RESPONSE_MSG)


 # Handles data multi-page downloads (scalardata, rawdata, archivefiles)
 class _MultiPage:
-    def __init__(self, parent: object):
+    def __init__(self, parent: object, verbosity: bool, raise_http_errors: bool):
         self.parent = weakref.ref(parent)
         self.result = None
+        self.raise_http_errors = raise_http_errors
+        self.__log = setup_logger('onc-multi', verbosity)

     def getAllPages(self, service: str, url: str, filters: dict):
         """

@@ -30,48 +41,42 @@ def getAllPages(self, service: str, url: str, filters: dict):
         # download first page
         start = time()
         response, responseTime = self._doPageRequest(url, filters, service, extension)
-        rNext = response["next"]
-
-        if rNext is not None:
-            print(
-                "Data quantity is greater than the row limit and",
-                "will be downloaded in multiple pages.",
-            )
-
-            pageCount = 1
-            pageEstimate = self._estimatePages(response, service)
-            if pageEstimate > 0:
-                # Exclude the first page when calculating the time estimation
-                timeEstimate = _formatDuration((pageEstimate - 1) * responseTime)
-                print(
-                    f"Downloading time for the first page: {humanize.naturaldelta(responseTime)}"  # noqa: E501
-                )
-                print(f"Estimated approx. {pageEstimate} pages in total.")
-                print(
-                    f"Estimated approx. {timeEstimate} to complete for the rest of the pages."  # noqa: E501
-                )
-
-            # keep downloading pages until next is None
-            print("")
-            while rNext is not None:
-                pageCount += 1
-                rowCount = self._rowCount(response, service)
-
-                print(f" ({rowCount} samples) Downloading page {pageCount}...")
-                nextResponse, nextTime = self._doPageRequest(
-                    url, rNext["parameters"], service, extension
-                )
-                rNext = nextResponse["next"]
-
-                # concatenate new data obtained
-                self._catenateData(response, nextResponse, service)
-
-            totalTime = _formatDuration(time() - start)
-            print(
-                f" ({self._rowCount(response, service):d} samples)"
-                f" Completed in {totalTime}."
-            )
-            response["next"] = None
+        if isinstance(response,dict):
+            rNext = response["next"]
+
+            if rNext is not None:
+                self.__log.info("The requested data quantity is greater than the supplied "
+                                "row limit and will be downloaded over multiple requests.")
+
+                pageCount = 1
+                pageEstimate = self._estimatePages(response, service)
+                if pageEstimate > 0:
+                    # Exclude the first page when calculating the time estimation
+                    timeEstimate = _formatDuration((pageEstimate - 1) * responseTime)
+                    self.__log.debug(f'Download time for page {pageCount}: {round(responseTime,2)} seconds')
+                    self.__log.info(f'Est. number of pages remaining for download: {pageEstimate-1}')
+                    self.__log.info(f'Est. number of seconds to download remaining data: {timeEstimate}')
+
+                # keep downloading pages until next is None
+                while rNext is not None:
+                    pageCount += 1
+                    rowCount = self._rowCount(response, service)
+
+                    self.__log.debug(f"Submitting request for page {pageCount} ({rowCount} samples)...")
Collaborator: Here and line 57,58,59 for the …
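The comment above is truncated, so its intent is unclear; if it concerns the f-strings used inside these logging calls (here and in the estimate messages above), the usual alternative is lazy %-style formatting, which defers string interpolation until the record is actually emitted. A minimal, self-contained illustration, not the PR's code:

```python
import logging

log = logging.getLogger('onc-multi')
pageCount, rowCount = 2, 100000

# f-string: the message is built even if DEBUG records are being discarded
log.debug(f"Submitting request for page {pageCount} ({rowCount} samples)...")

# lazy %-style: arguments are interpolated only when the record is emitted
log.debug("Submitting request for page %s (%s samples)...", pageCount, rowCount)
```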
+                    nextResponse, nextTime = self._doPageRequest(
+                        url, rNext["parameters"], service, extension
+                    )
+                    rNext = nextResponse["next"]
+
+                    # concatenate new data obtained
+                    self._catenateData(response, nextResponse, service)
+
+                totalTime = _formatDuration(time() - start)
+                self.__log.info(f"Downloaded {self._rowCount(response, service):d} total samples in {totalTime}.")
+                response["next"] = None

         return response
Collaborator: At line 172 there is also a … I am also thinking: is it better to still keep the single parameter …?

Thanks for the illustration, but probably no need to include it in the PR. I guess it can be put in a temp branch.