A command-line utility script that monitors the availability and response time of a set of endpoints. The script checks the endpoints every 15 seconds and logs the results, calculating the cumulative availability percentage for each domain.
Completed as part of the Fetch Rewards Site Reliability Engineering take-home exercise.
- Must accept a YAML configuration file as a command-line argument.
- YAML format must match that in the provided sample.
- Must accurately determine the availability of all endpoints during every check cycle.
- Endpoints are only considered available if they meet both of the following conditions:
  - HTTP status code is between 200 and 299
  - Response time is less than 500ms
- Must ignore port numbers when determining domains
- Must determine availability cumulatively
- Check cycles should run every 15 seconds, regardless of the number of endpoints and their response times.
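
The availability rules above can be sketched as follows. This is a minimal illustration, not the script's actual code: the names `is_available` and `AvailabilityTracker` are hypothetical, and the sketch assumes Python 3.9 or later.

```python
from collections import defaultdict

MAX_RESPONSE_SECONDS = 0.5  # the 500ms threshold from the requirements


def is_available(status_code: int, elapsed_seconds: float) -> bool:
    """Up only if the status is 2xx AND the response took less than 500ms."""
    return 200 <= status_code <= 299 and elapsed_seconds < MAX_RESPONSE_SECONDS


class AvailabilityTracker:
    """Hypothetical helper: running up/total counts per domain across all cycles."""

    def __init__(self) -> None:
        self._stats: dict[str, dict[str, int]] = defaultdict(
            lambda: {"up": 0, "total": 0}
        )

    def record(self, domain: str, available: bool) -> None:
        # Counts are never reset, so the percentage is cumulative.
        self._stats[domain]["total"] += 1
        if available:
            self._stats[domain]["up"] += 1

    def percentage(self, domain: str) -> float:
        stats = self._stats[domain]
        if stats["total"] == 0:
            return 0.0
        return 100.0 * stats["up"] / stats["total"]
```

Because the counters persist between cycles, one failed check out of two leaves a domain at 50% until later successes raise it again.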

- Clone the repository:

      git clone https://github.com/lskellerm/sre-take-home-exercise-python.git
      cd sre-take-home-exercise-python

- Create a virtual environment (optional but recommended):

      python -m venv venv
      # Activate the virtual environment
      # macOS/Linux
      source venv/bin/activate
      # Windows
      venv\Scripts\activate

- Install the required packages:

      pip install -r requirements.txt

- Run the script:

      python main.py sample.yaml
      python main.py custom_endpoints.yaml

  Logs will be written to /endpoint_monitor_logs and also to stdout.

- No default HTTP method
  - The original code does not specify a default HTTP method for requests when the method is not provided in the YAML file.
  - Fix: Assigned a default value of "GET" to the HTTP method when it is not specified in the YAML file, as per the requirements.
- Missing name handling
  - The original implementation does not handle the case where the free-text name of the endpoint is not provided in the YAML file, even though the requirements make the name key mandatory.
  - Fix: Assigned a default name of "unknown" when none is specified in the YAML file and logged a warning message.
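
The two defaults described above (method and name) could be applied in a single normalization step. The sketch below is an assumption about how that might look: `apply_defaults` and the logger name `endpoint_monitor` are hypothetical and not taken from the script.

```python
import logging

logger = logging.getLogger("endpoint_monitor")  # hypothetical logger name


def apply_defaults(endpoint: dict) -> dict:
    """Return a copy of one YAML endpoint entry with missing fields filled in."""
    normalized = dict(endpoint)  # do not mutate the caller's dict

    # Fall back to GET when the entry has no method.
    normalized["method"] = normalized.get("method") or "GET"

    # The name is required, so warn loudly when it is absent.
    if not normalized.get("name"):
        logger.warning(
            "Endpoint %s has no name; using 'unknown'", normalized.get("url")
        )
        normalized["name"] = "unknown"

    return normalized
```

Entries that already define both keys pass through unchanged.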
- Missing URL handling
  - The original code performs no validation of the URL format. If the URL is not valid, the script fails when trying to make a request.
  - Fix: Added a check that the URL is valid before making a request. If the URL is invalid, the error is logged and the endpoint is marked as unavailable.
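
One possible shape for the URL check is shown below. It is a sketch under assumptions: the function name `is_valid_url` is hypothetical, and treating only `http` and `https` URLs with a hostname as valid is an illustrative rule, not necessarily the one the script uses.

```python
from urllib.parse import urlparse


def is_valid_url(url: object) -> bool:
    """Accept only http(s) URLs that contain a hostname."""
    if not isinstance(url, str):
        return False
    try:
        parsed = urlparse(url)
        hostname = parsed.hostname
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # for example an unterminated IPv6 literal.
        return False
    return parsed.scheme in ("http", "https") and bool(hostname)
```

An endpoint whose URL fails this check can be recorded as unavailable for the cycle without ever sending a request.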
- Missing timeout handling
  - The original code does not explicitly handle timeouts when making requests. If a request takes too long, it may hang indefinitely.
  - Fix: Added a timeout parameter to the HTTP client request method to ensure that the request times out after the specified duration (500ms).
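
The effect of a 500ms budget can be illustrated without a network. The sketch below swaps in `asyncio.wait_for` around a stand-in coroutine instead of the HTTP client's own timeout parameter, so it is not the script's code: `run_with_timeout` and `fake_request` are hypothetical names.

```python
import asyncio

REQUEST_TIMEOUT_SECONDS = 0.5  # the 500ms limit from the requirements


async def fake_request(delay: float) -> int:
    """Stand-in for an HTTP request: waits `delay` seconds, then returns 200."""
    await asyncio.sleep(delay)
    return 200


async def run_with_timeout(request_coro, timeout: float = REQUEST_TIMEOUT_SECONDS):
    """Return (completed, result); completed is False if the budget ran out."""
    try:
        return True, await asyncio.wait_for(request_coro, timeout=timeout)
    except asyncio.TimeoutError:
        # The request was cancelled instead of hanging indefinitely.
        return False, None
```

With aiohttp, the comparable setting is a `ClientTimeout(total=0.5)` passed as the `timeout` argument of the session or request. A timed-out check is then treated as an unavailable endpoint.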
- Incorrect enforcement of 15s interval
  - The original code does not enforce the 15-second interval, so the timing between check cycles is not guaranteed to be correct.
  - Fix: Calculated how much time has passed since the last check started and adjusted the sleep time accordingly, so that a check occurs every 15 seconds.
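
The sleep adjustment described above comes down to subtracting the time a cycle took from the 15-second interval. A minimal sketch, assuming a synchronous loop for brevity; `seconds_until_next_cycle` and `run_cycles` are hypothetical names.

```python
import time

CHECK_INTERVAL_SECONDS = 15.0


def seconds_until_next_cycle(
    cycle_start: float, now: float, interval: float = CHECK_INTERVAL_SECONDS
) -> float:
    """Time left to sleep so the next cycle starts `interval` after this one."""
    elapsed = now - cycle_start
    # Never return a negative sleep if the cycle overran the interval.
    return max(0.0, interval - elapsed)


def run_cycles(check_all, cycles: int, interval: float = CHECK_INTERVAL_SECONDS) -> None:
    """Run `check_all` a fixed number of times, one start per interval."""
    for _ in range(cycles):
        cycle_start = time.monotonic()  # monotonic clock ignores wall-clock jumps
        check_all()
        time.sleep(seconds_until_next_cycle(cycle_start, time.monotonic(), interval))
```

For example, a cycle that takes 3.5 seconds is followed by an 11.5-second sleep. In this sketch a cycle that overruns the interval is followed immediately by the next one.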
- Not extracting port numbers from the endpoint
  - Port numbers were not being extracted from the url key in the YAML configuration.
  - Fix: Parsed out the port number from the URL using urlparse and used it to determine the domain name correctly, ensuring proper per-domain availability calculations.
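
With the standard library's `urlparse`, the `hostname` attribute already excludes the port, which makes the domain lookup short. The function name `extract_domain` below is hypothetical.

```python
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the host part of a URL without the port (empty string if none)."""
    # `netloc` would still contain ":8443"; `hostname` does not.
    hostname = urlparse(url).hostname
    return hostname or ""
```

Two endpoints such as `https://example.com:8443/health` and `https://example.com/` then fall under the same domain, `example.com`, when availability is aggregated.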
- Logging
  - The original code does not log the results of the checks. It only prints them to the console.
  - Fix: Added structured, unified logging that writes the results of each check to a log file and to the console.
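
Writing each record to both a file and the console can be done with two handlers on one logger. This is a sketch rather than the script's configuration: the function name `build_logger`, the file name `monitor.log`, and the message format are all assumptions.

```python
import logging
import sys
from pathlib import Path


def build_logger(log_dir: str, name: str = "endpoint_monitor") -> logging.Logger:
    """Create a logger that writes every record to a file and to stdout."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # Guard against adding duplicate handlers on repeated calls.
    if not logger.handlers:
        formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        handlers = (
            logging.FileHandler(directory / "monitor.log"),  # assumed file name
            logging.StreamHandler(sys.stdout),
        )
        for handler in handlers:
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

Sharing one formatter between both handlers keeps the file and console output identical, which is one reading of "unified" logging.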
- Performance
  - The original code checks each endpoint sequentially, which can be slow when there are many endpoints and may prevent the 15-second interval from being enforced.
  - Fix: Implemented asynchronous requests using the aiohttp library to check multiple endpoints concurrently, improving performance and reducing overall check time.
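
The gain from concurrent checks can be demonstrated with `asyncio.gather`. To stay runnable without aiohttp or network access, the sketch below replaces the HTTP request with `asyncio.sleep`; `check_endpoint` and `check_all` are hypothetical names, and Python 3.9 or later is assumed.

```python
import asyncio
import time


async def check_endpoint(name: str, simulated_latency: float) -> tuple[str, float]:
    """Stand-in for one HTTP check: sleeps instead of sending a request."""
    start = time.monotonic()
    await asyncio.sleep(simulated_latency)
    return name, time.monotonic() - start


async def check_all(endpoints: dict[str, float]) -> list[tuple[str, float]]:
    """Run every check concurrently; results keep the input order."""
    tasks = (check_endpoint(name, latency) for name, latency in endpoints.items())
    return await asyncio.gather(*tasks)
```

Three checks of 0.1 seconds each finish in roughly 0.1 seconds in total rather than 0.3, so a cycle's duration is bounded by the slowest endpoint instead of the sum of all of them.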
- Error Handling
  - Improved error handling by explicitly logging timeout and exception reasons throughout the code.
- Strongly typed
  - Added type annotations to functions and methods to provide better clarity and improve developer experience (DX).
  - Created a TypedDict for Endpoint to ensure keys in the YAML file are correctly typed and validated.
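
A `TypedDict` describes the expected keys to a static type checker but performs no checks at runtime, so a small validation step is usually paired with it. The sketch below is an assumption about the shape of such a type: it lists only the `name`, `url`, and `method` keys mentioned in this document (the sample YAML may define more), and `parse_endpoint` is a hypothetical helper.

```python
from typing import TypedDict, cast


class Endpoint(TypedDict, total=False):
    """Keys of one endpoint entry; total=False makes each key optional."""

    name: str
    url: str
    method: str


def parse_endpoint(raw: dict) -> Endpoint:
    """Check the known keys at runtime, then hand the dict to the type checker."""
    if "url" not in raw:
        raise ValueError("endpoint entry is missing 'url'")
    for key in ("name", "url", "method"):
        if key in raw and not isinstance(raw[key], str):
            raise TypeError(
                f"{key!r} must be a string, got {type(raw[key]).__name__}"
            )
    # cast() changes nothing at runtime; it only informs static analysis.
    return cast(Endpoint, raw)
```

Tools such as mypy can then flag a misspelled key like `endpoint["ur"]` before the script runs.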