From 3420744b222b5022721f6f0bc1580e339b4329b3 Mon Sep 17 00:00:00 2001
From: thodson-usgs
Date: Tue, 26 May 2026 15:49:56 -0500
Subject: [PATCH] docs: migrate documentation and demo notebooks to the Water Data API

Move the user guide, example pages, and demo notebooks off the deprecated
`nwis` module onto the `waterdata` module (USGS Water Data API):

- userguide/timeconventions: rewritten for the Water Data API datetime
  model (`time` is a column; tz-aware UTC for continuous data, tz-naive
  for daily), using the `.dt.tz_convert` idiom.
- examples/readme_examples + siteinfo_examples: ported to get_continuous,
  get_monitoring_locations, and get_time_series_metadata.
- demo notebooks: migrated to the waterdata API (USGS-prefixed monitoring
  location ids; get_daily, get_continuous, get_field_measurements,
  get_peaks, get_ratings, get_stats_*, get_samples), with titles,
  narratives, and argument lists rewritten to match the code. WaterUse
  stays on the legacy nwis.get_water_use with a note (no Water Data API
  equivalent yet).
- renamed the demo notebooks to lead with their module (USGS_WaterData_*,
  USGS_NLDI, USGS_NWIS_WaterUse) and renamed the peak-flow trends demo to
  peak_streamflow_trends.
- nldi: mark the nationwide get_features_by_data_source example
  `# doctest: +SKIP` (it streams every nwissite feature and hangs the
  build).
- fixed pre-existing Sphinx build warnings: get_channel bullet-list RST,
  the duplicated get_monitoring_locations link, the NWIS_Metadata
  duplicate object description, the deprecated display_version theme
  option, the missing _static directory, and a WaterData_demo heading
  level.

Notebooks are kept output-free (nbsphinx executes them at build time).
`sphinx-build -b doctest` passes and every notebook executes against the
live API with no cell errors.

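For reviewers, the recurring argument change in the notebooks (bare site numbers become USGS-prefixed monitoring location ids, and separate start/end dates become a single `time` interval) can be sketched roughly as below. The helper names are hypothetical and are not part of `dataretrieval`; only the keyword names `monitoring_location_id`, `parameter_code`, and `time` are taken from the migrated calls in this patch.

```python
from typing import Iterable, Optional, Union


def to_monitoring_location_id(site_no: str, agency: str = "USGS") -> str:
    """Prefix a bare site number with its agency code (hypothetical helper)."""
    site_no = site_no.strip()
    # Ids that already carry an agency prefix are returned unchanged.
    if "-" in site_no:
        return site_no
    return f"{agency}-{site_no}"


def to_time_interval(start: Optional[str] = None, end: Optional[str] = None) -> str:
    """Join start/end into one interval string; '..' marks an open end."""
    if start is None and end is None:
        raise ValueError("provide at least one of start or end")
    return f"{start or '..'}/{end or '..'}"


def migrate_nwis_kwargs(
    sites: Union[str, Iterable[str]],
    parameterCd: Optional[Union[str, list]] = None,
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> dict:
    """Map legacy nwis-style arguments onto waterdata-style keyword names."""
    if isinstance(sites, str):
        ids = to_monitoring_location_id(sites)
    else:
        ids = [to_monitoring_location_id(s) for s in sites]
    kwargs = {"monitoring_location_id": ids}
    if parameterCd is not None:
        kwargs["parameter_code"] = parameterCd
    if start is not None or end is not None:
        kwargs["time"] = to_time_interval(start, end)
    return kwargs


if __name__ == "__main__":
    # Mirrors the Choptank River daily-values call rewritten in this patch.
    print(migrate_nwis_kwargs("01491000", "00060", "2009-10-01", "2012-09-30"))
```

The resulting dict could then be unpacked into a call such as `waterdata.get_daily(**kwargs)`. This is only an illustration of the mapping: other legacy arguments (for example `statCd`) would need their own translation.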
Co-Authored-By: Claude Opus 4.7 (1M context) --- dataretrieval/nldi.py | 5 +- dataretrieval/nwis.py | 18 +- dataretrieval/waterdata/api.py | 15 +- demos/R Python Vignette equivalents.ipynb | 138 +++---- demos/WaterData_demo.ipynb | 14 +- ...xamples.ipynb => USGS_NLDI_Examples.ipynb} | 0 ...pynb => USGS_NWIS_WaterUse_Examples.ipynb} | 16 +- .../USGS_WaterData_DailyValues_Examples.ipynb | 243 ++++++++++++ ...WaterData_GroundwaterLevels_Examples.ipynb | 261 +++++++++++++ ...USGS_WaterData_Measurements_Examples.ipynb | 182 +++++++++ ...S_WaterData_ParameterCodes_Examples.ipynb} | 25 +- .../USGS_WaterData_Peaks_Examples.ipynb | 214 +++++++++++ .../USGS_WaterData_Ratings_Examples.ipynb | 183 +++++++++ .../USGS_WaterData_Samples_Examples.ipynb | 220 +++++++++++ .../USGS_WaterData_SiteInfo_Examples.ipynb | 203 ++++++++++ ...SGS_WaterData_SiteInventory_Examples.ipynb | 201 ++++++++++ .../USGS_WaterData_Statistics_Examples.ipynb | 250 +++++++++++++ ... USGS_WaterData_UnitValues_Examples.ipynb} | 95 ++--- ...S_dataretrieval_DailyValues_Examples.ipynb | 300 --------------- ...retrieval_GroundwaterLevels_Examples.ipynb | 300 --------------- ..._dataretrieval_Measurements_Examples.ipynb | 201 ---------- .../USGS_dataretrieval_Peaks_Examples.ipynb | 213 ----------- .../USGS_dataretrieval_Ratings_Examples.ipynb | 192 ---------- ...USGS_dataretrieval_SiteInfo_Examples.ipynb | 242 ------------ ...dataretrieval_SiteInventory_Examples.ipynb | 232 ------------ ...GS_dataretrieval_Statistics_Examples.ipynb | 240 ------------ ..._dataretrieval_WaterSamples_Examples.ipynb | 349 ------------------ ...o_1.ipynb => peak_streamflow_trends.ipynb} | 116 +++--- docs/source/_static/.gitkeep | 0 docs/source/conf.py | 1 - .../USGS_NWIS_WaterUse_Examples.nblink | 3 + ...USGS_WaterData_DailyValues_Examples.nblink | 3 + ...aterData_GroundwaterLevels_Examples.nblink | 3 + ...SGS_WaterData_Measurements_Examples.nblink | 3 + ...S_WaterData_ParameterCodes_Examples.nblink | 3 + 
.../USGS_WaterData_Peaks_Examples.nblink | 3 + .../USGS_WaterData_Ratings_Examples.nblink | 3 + .../USGS_WaterData_Samples_Examples.nblink | 3 + .../USGS_WaterData_SiteInfo_Examples.nblink | 3 + ...GS_WaterData_SiteInventory_Examples.nblink | 3 + .../USGS_WaterData_Statistics_Examples.nblink | 3 + .../USGS_WaterData_UnitValues_Examples.nblink | 3 + ..._dataretrieval_DailyValues_Examples.nblink | 3 - ...etrieval_GroundwaterLevels_Examples.nblink | 3 - ...dataretrieval_Measurements_Examples.nblink | 3 - ...taretrieval_ParameterCodes_Examples.nblink | 3 - .../USGS_dataretrieval_Peaks_Examples.nblink | 3 - ...USGS_dataretrieval_Ratings_Examples.nblink | 3 - ...SGS_dataretrieval_SiteInfo_Examples.nblink | 3 - ...ataretrieval_SiteInventory_Examples.nblink | 3 - ...S_dataretrieval_Statistics_Examples.nblink | 3 - ...S_dataretrieval_UnitValues_Examples.nblink | 3 - ...dataretrieval_WaterSamples_Examples.nblink | 3 - ...SGS_dataretrieval_WaterUse_Examples.nblink | 3 - docs/source/examples/index.rst | 26 +- ...1.nblink => peak_streamflow_trends.nblink} | 4 +- docs/source/examples/readme_examples.rst | 81 ++-- docs/source/examples/siteinfo_examples.rst | 90 +++-- docs/source/userguide/timeconventions.rst | 124 +++---- 59 files changed, 2351 insertions(+), 2715 deletions(-) rename demos/hydroshare/{USGS_dataretrieval_NLDI_Examples.ipynb => USGS_NLDI_Examples.ipynb} (100%) rename demos/hydroshare/{USGS_dataretrieval_WaterUse_Examples.ipynb => USGS_NWIS_WaterUse_Examples.ipynb} (81%) create mode 100644 demos/hydroshare/USGS_WaterData_DailyValues_Examples.ipynb create mode 100644 demos/hydroshare/USGS_WaterData_GroundwaterLevels_Examples.ipynb create mode 100644 demos/hydroshare/USGS_WaterData_Measurements_Examples.ipynb rename demos/hydroshare/{USGS_dataretrieval_ParameterCodes_Examples.ipynb => USGS_WaterData_ParameterCodes_Examples.ipynb} (56%) create mode 100644 demos/hydroshare/USGS_WaterData_Peaks_Examples.ipynb create mode 100644 
demos/hydroshare/USGS_WaterData_Ratings_Examples.ipynb create mode 100644 demos/hydroshare/USGS_WaterData_Samples_Examples.ipynb create mode 100644 demos/hydroshare/USGS_WaterData_SiteInfo_Examples.ipynb create mode 100644 demos/hydroshare/USGS_WaterData_SiteInventory_Examples.ipynb create mode 100644 demos/hydroshare/USGS_WaterData_Statistics_Examples.ipynb rename demos/hydroshare/{USGS_dataretrieval_UnitValues_Examples.ipynb => USGS_WaterData_UnitValues_Examples.ipynb} (51%) delete mode 100644 demos/hydroshare/USGS_dataretrieval_DailyValues_Examples.ipynb delete mode 100644 demos/hydroshare/USGS_dataretrieval_GroundwaterLevels_Examples.ipynb delete mode 100644 demos/hydroshare/USGS_dataretrieval_Measurements_Examples.ipynb delete mode 100644 demos/hydroshare/USGS_dataretrieval_Peaks_Examples.ipynb delete mode 100644 demos/hydroshare/USGS_dataretrieval_Ratings_Examples.ipynb delete mode 100644 demos/hydroshare/USGS_dataretrieval_SiteInfo_Examples.ipynb delete mode 100644 demos/hydroshare/USGS_dataretrieval_SiteInventory_Examples.ipynb delete mode 100644 demos/hydroshare/USGS_dataretrieval_Statistics_Examples.ipynb delete mode 100644 demos/hydroshare/USGS_dataretrieval_WaterSamples_Examples.ipynb rename demos/{NWIS_demo_1.ipynb => peak_streamflow_trends.ipynb} (54%) create mode 100644 docs/source/_static/.gitkeep create mode 100644 docs/source/examples/USGS_NWIS_WaterUse_Examples.nblink create mode 100644 docs/source/examples/USGS_WaterData_DailyValues_Examples.nblink create mode 100644 docs/source/examples/USGS_WaterData_GroundwaterLevels_Examples.nblink create mode 100644 docs/source/examples/USGS_WaterData_Measurements_Examples.nblink create mode 100644 docs/source/examples/USGS_WaterData_ParameterCodes_Examples.nblink create mode 100644 docs/source/examples/USGS_WaterData_Peaks_Examples.nblink create mode 100644 docs/source/examples/USGS_WaterData_Ratings_Examples.nblink create mode 100644 docs/source/examples/USGS_WaterData_Samples_Examples.nblink create mode 
100644 docs/source/examples/USGS_WaterData_SiteInfo_Examples.nblink create mode 100644 docs/source/examples/USGS_WaterData_SiteInventory_Examples.nblink create mode 100644 docs/source/examples/USGS_WaterData_Statistics_Examples.nblink create mode 100644 docs/source/examples/USGS_WaterData_UnitValues_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_DailyValues_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_GroundwaterLevels_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_Measurements_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_ParameterCodes_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_Peaks_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_Ratings_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_SiteInfo_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_SiteInventory_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_Statistics_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_UnitValues_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_WaterSamples_Examples.nblink delete mode 100644 docs/source/examples/USGS_dataretrieval_WaterUse_Examples.nblink rename docs/source/examples/{nwisdemo01.nblink => peak_streamflow_trends.nblink} (50%) diff --git a/dataretrieval/nldi.py b/dataretrieval/nldi.py index e54ceb85..a483dc9e 100644 --- a/dataretrieval/nldi.py +++ b/dataretrieval/nldi.py @@ -296,8 +296,9 @@ def get_features_by_data_source(data_source: str) -> gpd.GeoDataFrame: -------- .. doctest:: - >>> # Get features for a feature wqp and feature_id USGS-01031500 - >>> gdf = dataretrieval.nldi.get_features_by_data_source( + >>> # "nwissite" returns every NWIS site nationwide, so this example is + >>> # skipped in the doctest build to avoid the (very large) download. 
+ >>> gdf = dataretrieval.nldi.get_features_by_data_source( # doctest: +SKIP ... data_source="nwissite" ... ) """ diff --git a/dataretrieval/nwis.py b/dataretrieval/nwis.py index cfcdc64e..6a1ed472 100644 --- a/dataretrieval/nwis.py +++ b/dataretrieval/nwis.py @@ -1125,12 +1125,11 @@ class NWIS_Metadata(BaseMetadata): Response headers comments: str | None Metadata comments, if any - site_info: tuple[pd.DataFrame, NWIS_Metadata] | None - Site information if the query included `site_no`, `sites`, `stateCd`, - `huc`, `countyCd` or `bBox`. `site_no` is preferred over `sites` if - both are present. - variable_info: None - Deprecated. Accessing variable_info via NWIS_Metadata is deprecated. + + Notes + ----- + ``site_info`` and ``variable_info`` are exposed as properties (documented + below) rather than plain attributes. """ @@ -1164,7 +1163,12 @@ def __init__(self, response, **parameters) -> None: @property def site_info(self) -> tuple[pd.DataFrame, BaseMetadata] | None: - """ + """Site information for the query. + + Populated when the query included ``site_no``, ``sites``, ``stateCd``, + ``huc``, ``countyCd`` or ``bBox`` (``site_no`` is preferred over + ``sites`` if both are present); ``None`` otherwise. + Return ------ df: ``pandas.DataFrame`` diff --git a/dataretrieval/waterdata/api.py b/dataretrieval/waterdata/api.py index 57fffc88..1ec9ed42 100644 --- a/dataretrieval/waterdata/api.py +++ b/dataretrieval/waterdata/api.py @@ -554,11 +554,11 @@ def get_monitoring_locations( county_code : string or iterable of strings, optional The code for the county or county equivalent (parish, borough, etc.) in which the monitoring location is located. A `list of codes - `_ is available. + `__ is available. county_name : string or iterable of strings, optional The name of the county or county equivalent (parish, borough, etc.) in which the monitoring location is located. A `list of codes - `_ is available. + `__ is available. 
minor_civil_division_code : string or iterable of strings, optional Codes for primary governmental or administrative divisions of the county or county equivalent in which the monitoring location is located. @@ -2751,9 +2751,10 @@ def get_channel( * A date-time: "2018-02-12T23:20:50Z" * A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" * Half-bounded intervals: "2018-02-12T00:00:00Z/.." or - "../2018-03-18T12:31:12Z" - * Duration objects: "P1M" for data from the past month or "PT36H" for - the last 36 hours + "../2018-03-18T12:31:12Z" + * Duration objects: "P1M" for data from the past month or "PT36H" + for the last 36 hours + channel_name : string or iterable of strings, optional The channel name. channel_flow : string or iterable of strings, optional @@ -2799,9 +2800,9 @@ def get_channel( * A date-time: "2018-02-12T23:20:50Z" * A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" * Half-bounded intervals: "2018-02-12T00:00:00Z/.." or - "../2018-03-18T12:31:12Z" + "../2018-03-18T12:31:12Z" * Duration objects: "P1M" for data from the past month or "PT36H" for the - last 36 hours + last 36 hours Only features that have a last_modified that intersects the value of datetime are selected. 
diff --git a/demos/R Python Vignette equivalents.ipynb b/demos/R Python Vignette equivalents.ipynb index 1904e88e..cf5672fd 100644 --- a/demos/R Python Vignette equivalents.ipynb +++ b/demos/R Python Vignette equivalents.ipynb @@ -20,9 +20,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The dataRetrieval package was created as a python equivalent to the R dataRetrieval tool.\n", + "The `dataretrieval` Python package was created as an equivalent to the R `dataRetrieval` package.\n", "\n", - "The following shows python equivalents for methods outlined in the R dataRetrieval Vignette with the equivalent R code in comments" + "The following shows Python equivalents for the methods outlined in the R `dataRetrieval` vignette, with the equivalent R code in comments." ] }, { @@ -36,8 +36,10 @@ "siteNumbers <- c(\"01491000\",\"01645000\")\n", "siteINFO <- readNWISsite(siteNumbers)\n", "\"\"\"\n", - "siteNumbers = [\"01491000\", \"01645000\"]\n", - "siteINFO, md = nwis.get_iv(sites=siteNumbers)" + "siteNumbers = [\"USGS-01491000\", \"USGS-01645000\"]\n", + "siteINFO, md = waterdata.get_monitoring_locations(\n", + " monitoring_location_id=siteNumbers, skip_geometry=True\n", + ")" ] }, { @@ -52,8 +54,9 @@ "dailyDataAvailable <- whatNWISdata(siteNumbers,\n", " service=\"dv\", statCd=\"00003\")\n", "\"\"\"\n", - "\n", - "dailyDataAvailable, md = nwis.get_dv(sites=siteNumbers, statCd=\"00003\")" + "dailyDataAvailable, md = waterdata.get_time_series_metadata(\n", + " monitoring_location_id=siteNumbers, statistic_id=\"00003\", skip_geometry=True\n", + ")" ] }, { @@ -66,20 +69,19 @@ "# Choptank River near Greensboro, MD:\n", "siteNumber <- \"01491000\"\n", "parameterCd <- \"00060\" # Discharge\n", - "startDate <- \"2009-10-01\" \n", - "endDate <- \"2012-09-30\" \n", + "startDate <- \"2009-10-01\"\n", + "endDate <- \"2012-09-30\"\n", "\n", - "discharge <- readNWISdv(siteNumber, \n", - " parameterCd, startDate, endDate)\n", + "discharge <- readNWISdv(siteNumber, 
parameterCd, startDate, endDate)\n", "\"\"\"\n", "# Choptank River near Greensboro, MD:\n", - "siteNumber = \"01491000\"\n", + "siteNumber = \"USGS-01491000\"\n", "parameterCd = \"00060\" # Discharge\n", - "startDate = \"2009-10-01\"\n", - "endDate = \"2012-09-30\"\n", "\n", - "discharge, md = nwis.get_dv(\n", - " sites=siteNumber, parameterCd=parameterCd, start=startDate, end=endDate\n", + "discharge, md = waterdata.get_daily(\n", + " monitoring_location_id=siteNumber,\n", + " parameter_code=parameterCd,\n", + " time=\"2009-10-01/2012-09-30\",\n", ")" ] }, @@ -92,25 +94,21 @@ "\"\"\"\n", "siteNumber <- \"01491000\"\n", "parameterCd <- c(\"00010\",\"00060\") # Temperature and discharge\n", - "statCd <- c(\"00001\",\"00003\") # Mean and maximum\n", + "statCd <- c(\"00001\",\"00003\") # Maximum and mean\n", "startDate <- \"2012-01-01\"\n", "endDate <- \"2012-05-01\"\n", "\n", - "temperatureAndFlow <- readNWISdv(siteNumber, parameterCd, \n", - " startDate, endDate, statCd=statCd)\n", + "temperatureAndFlow <- readNWISdv(siteNumber, parameterCd, startDate, endDate, statCd=statCd)\n", "\"\"\"\n", - "siteNumber = \"01491000\"\n", + "siteNumber = \"USGS-01491000\"\n", "parameterCd = [\"00010\", \"00060\"] # Temperature and discharge\n", - "statCd = [\"00001\", \"00003\"] # Mean and maximum\n", - "startDate = \"2012-01-01\"\n", - "endDate = \"2012-05-01\"\n", + "statisticId = [\"00001\", \"00003\"] # Maximum and mean\n", "\n", - "temperatureAndFlow, md = nwis.get_dv(\n", - " sites=siteNumber,\n", - " parameterCd=parameterCd,\n", - " start=startDate,\n", - " end=endDate,\n", - " statCd=statCd,\n", + "temperatureAndFlow, md = waterdata.get_daily(\n", + " monitoring_location_id=siteNumber,\n", + " parameter_code=parameterCd,\n", + " statistic_id=statisticId,\n", + " time=\"2012-01-01/2012-05-01\",\n", ")" ] }, @@ -122,17 +120,17 @@ "source": [ "\"\"\"\n", "parameterCd <- \"00060\" # Discharge\n", - "startDate <- \"2012-05-12\" \n", - "endDate <- \"2012-05-13\" \n", - 
"dischargeUnit <- readNWISuv(siteNumber, parameterCd, \n", - " startDate, endDate)\n", + "startDate <- \"2012-05-12\"\n", + "endDate <- \"2012-05-13\"\n", + "dischargeUnit <- readNWISuv(siteNumber, parameterCd, startDate, endDate)\n", "\"\"\"\n", - "siteNumber = \"01491000\"\n", + "siteNumber = \"USGS-01491000\"\n", "parameterCd = \"00060\" # Discharge\n", - "startDate = \"2012-05-12\"\n", - "endDate = \"2012-05-13\"\n", - "dischargeUnit, md = nwis.get_iv(\n", - " sites=siteNumber, parameterCd=parameterCd, start=startDate, end=endDate\n", + "\n", + "dischargeUnit, md = waterdata.get_continuous(\n", + " monitoring_location_id=siteNumber,\n", + " parameter_code=parameterCd,\n", + " time=\"2012-05-12/2012-05-13\",\n", ")" ] }, @@ -147,18 +145,17 @@ "parameterCd <- c(\"00618\",\"71851\")\n", "startDate <- \"1985-10-01\"\n", "endDate <- \"2012-09-30\"\n", - "dfLong <- read_USGS_samples(monitoringLocationIdentifier=sprintf(\"USGS-%s\", siteNumber), usgsPCode=parameterCd, \n", - " activityStartDateLower=startDate, activityStartDateUpper=endDate)\n", + "dfLong <- read_USGS_samples(monitoringLocationIdentifier=sprintf(\"USGS-%s\", siteNumber),\n", + " usgsPCode=parameterCd, activityStartDateLower=startDate, activityStartDateUpper=endDate)\n", "\"\"\"\n", - "siteNumber = \"01491000\"\n", + "siteNumber = \"USGS-01491000\"\n", "parameterCd = [\"00618\", \"71851\"]\n", - "startDate = \"1985-10-01\"\n", - "endDate = \"2012-09-30\"\n", + "\n", "dfLong, md = waterdata.get_samples(\n", - " monitoringLocationIdentifier=f\"USGS-{siteNumber}\",\n", + " monitoringLocationIdentifier=siteNumber,\n", " usgsPCode=parameterCd,\n", - " activityStartDateLower=startDate,\n", - " activityStartDateUpper=endDate,\n", + " activityStartDateLower=\"1985-10-01\",\n", + " activityStartDateUpper=\"2012-09-30\",\n", ")" ] }, @@ -172,8 +169,9 @@ "siteNumber <- '01594440'\n", "peakData <- readNWISpeak(siteNumber)\n", "\"\"\"\n", - "siteNumber = \"01594440\"\n", - "peakData, md = 
nwis.get_discharge_peaks(sites=siteNumber)" + "peakData, md = waterdata.get_peaks(\n", + " monitoring_location_id=\"USGS-01594440\", parameter_code=\"00060\"\n", + ")" ] }, { @@ -186,7 +184,11 @@ "ratingData <- readNWISrating(siteNumber, \"base\")\n", "attr(ratingData, \"RATING\")\n", "\"\"\"\n", - "ratings_data, md = nwis.get_ratings(site=\"01594440\", file_type=\"base\")" + "# get_ratings returns a dict keyed by \"..rdb\"\n", + "ratings_data = waterdata.get_ratings(\n", + " monitoring_location_id=\"USGS-01594440\", file_type=\"base\"\n", + ")\n", + "list(ratings_data.keys())" ] }, { @@ -198,10 +200,12 @@ "\"\"\"\n", "discharge_stats <- readNWISstat(siteNumbers=c(\"02319394\"),\n", " parameterCd=c(\"00060\"),\n", - " statReportType=\"annual\") \n", + " statReportType=\"annual\")\n", "\"\"\"\n", - "discharge_stats, md = nwis.get_stats(\n", - " sites=\"02319394\", parameterCd=\"00060\", statReportType=\"annual\", statTypeCd=\"all\"\n", + "discharge_stats, md = waterdata.get_stats_date_range(\n", + " monitoring_location_id=\"USGS-02319394\",\n", + " parameter_code=\"00060\",\n", + " computation_type=\"arithmetic_mean\",\n", ")" ] }, @@ -211,14 +215,14 @@ "metadata": {}, "outputs": [], "source": [ - "# '''\n", - "# dischargeWI <- readNWISdata(service=\"dv\",\n", - "# stateCd=\"WI\",\n", - "# parameterCd=\"00060\",\n", - "# drainAreaMin=\"50\",\n", - "# statCd=\"00003\")\n", - "# '''\n", - "# dischargeWI, md = nwis.get_dv(stateCd=\"WI\", parameterCd=\"00060\", drainAreaMin=\"50\", statCd=\"00003\")" + "# R: readNWISdata(service=\"dv\", stateCd=\"WI\", parameterCd=\"00060\",\n", + "# drainAreaMin=\"50\", statCd=\"00003\")\n", + "#\n", + "# The Water Data API serves daily values per monitoring location. 
To assemble a\n", + "# state-wide set, first find the locations (optionally filtering by drainage\n", + "# area) with waterdata.get_monitoring_locations(state_name=\"Wisconsin\", ...),\n", + "# then pass their ids to waterdata.get_daily(parameter_code=\"00060\",\n", + "# statistic_id=\"00003\")." ] }, { @@ -292,21 +296,21 @@ "source": [ "# Embedded Metadata\n", "\n", - "All service methods return the DataFrame containing requested data and Metadata as a tuple. Note, a call using get_record will only return the DataFrame to remain compatible with previous usage.\n", + "Most `waterdata` and `wqp` service methods return a tuple of the requested data (a pandas DataFrame) and a metadata object.\n", "\n", + "`md` is an object with the following attributes:\n", "\n", "```\n", - "national, md = nwis.get_water_use()\n", + "Metadata\n", + " url # the URL used to query the service\n", + " query_time # how long the query took\n", + " header # the response headers\n", "```\n", "\n", - "md is an object with the following attributes\n", + "Note: USGS *water use* data has no Water Data API equivalent yet, so it remains available only through the deprecated `nwis` module:\n", "\n", "```\n", - "Metadata\n", - " url # the resulting url to query usgs\n", - " query_time # the time it took to query usgs\n", - " site_info # a method to call site_info with the site parameters supplied\n", - " header # any headers attached to the response object\n", + "national, md = nwis.get_water_use()\n", "```" ] }, diff --git a/demos/WaterData_demo.ipynb b/demos/WaterData_demo.ipynb index f7d9e8d1..66eac64d 100644 --- a/demos/WaterData_demo.ipynb +++ b/demos/WaterData_demo.ipynb @@ -21,7 +21,7 @@ "## Prerequisite: Get your Water Data API key\n", "We highly suggest signing up for your own API key [here](https://api.waterdata.usgs.gov/signup/) to afford yourself higher rate limits and more reliable access to the data. 
If you opt not to register for an API key, then the number of requests you can make to the Water Data APIs is considerably lower, and if you share an IP address across users or workflows, you may hit those limits even faster. Luckily, registering for an API key is free and easy.\n", "\n", - "Once you've copied your API key and saved it in a safe place, you can set it as an environment variable in your python script for the current session:\n", + "Once you've copied your API key and saved it in a safe place, you can set it as an environment variable in your Python script for the current session:\n", "\n", "```python\n", "import os\n", @@ -64,7 +64,7 @@ "- `get_time_series_metadata()` - Timeseries metadata across monitoring locations, parameter codes, statistical codes, and more. Can be used to answer the question: what types of data are collected at my site(s) of interest and over what time period are/were they collected? \n", "- `get_latest_continuous()` - Latest instantaneous values for requested monitoring locations, parameter codes, statistical codes, and more.\n", "- `get_latest_daily()` - Latest daily values for requested monitoring locations, parameter codes, statistical codes, and more.\n", - "- `get_field_measurements()` - Physically measured values (a.k.a discrete) of gage height, discharge, groundwater levels, and more for requested monitoring locations.\n", + "- `get_field_measurements()` - Physically measured values (a.k.a. discrete) of gage height, discharge, groundwater levels, and more for requested monitoring locations.\n", "- `get_samples()` - Discrete water quality sample results for monitoring locations, observed properties, and more." ] }, @@ -74,8 +74,8 @@ "metadata": {}, "source": [ "### A few key tips\n", - "- You'll notice that each of the data functions have many unique inputs you can specify. **DO NOT** specify too many! Specify *just enough* inputs to return what you need. 
But do not provide redundant geographical or parameter information as this may slow down your query and lead to errors.\n", - "- Each function returns a Tuple, containing a dataframe and a Metadata class. If you have `geopandas` installed in your environment, the dataframe will be a `GeoDataFrame` with a geometry included. If you do not have `geopandas`, the dataframe will be a `pandas` dataframe with the geometry contained in a coordinates column. The Metadata object contains information about your query, like the query url.\n", + "- You'll notice that each of the data functions has many unique inputs you can specify. **DO NOT** specify too many! Specify *just enough* inputs to return what you need. But do not provide redundant geographical or parameter information as this may slow down your query and lead to errors.\n", + "- Each function returns a Tuple, containing a dataframe and a Metadata class. If you have `geopandas` installed in your environment, the dataframe will be a `GeoDataFrame` with a geometry included. If you do not have `geopandas`, the dataframe will be a `pandas` dataframe with the geometry contained in a coordinates column. The Metadata object contains information about your query, like the query URL.\n", "- If you do not want to return the `geometry` column, use the input `skip_geometry=True`.\n", "- All of these functions (except `get_samples()`) have a `limit` argument, which signifies the number of rows returned with each \"page\" of data. The Water Data APIs use paging to chunk up large responses and send data most efficiently to the requester. The `waterdata` functions collect the rows of data from each page and combine them into one final dataframe at the end. The default and maximum limit per page is 50,000 rows. In other words, if you request 100,000 rows of data from the database, it will return all the data in 2 pages, and each page counts as a \"request\" using your API key. 
If you were to change the argument to `limit=10000`, then each page returned would contain 10,000 rows, and it would take 10 requests/pages to return the total 100,000 rows. In general, there is no need to adjust the `limit` argument. However, if you are working with slow internet speeds, adjusting the `limit` argument may reduce chances of failures due to bandwidth.\n", "- You can find some other helpful tips in the [Water Data API documentation](https://api.waterdata.usgs.gov/docs/ogcapi/)." @@ -123,9 +123,7 @@ "cell_type": "markdown", "id": "406762ab", "metadata": {}, - "source": [ - "#### Reference tables" - ] + "source": "### Reference tables" }, { "cell_type": "code", @@ -243,7 +241,7 @@ "### Monitoring locations\n", "Now that we know which sites have recent discharge data, let's find stream sites and plot them on a map. We will use the `waterdata.get_monitoring_locations()` function to grab more metadata about these sites.\n", "\n", - "We can feed the unique monitoring location IDs from `NE_discharge` into the `get_monitoring_locations()` function to get the metadata for just those sites. However, there is a limit to the number of IDs that can be passed in one call to the API. Further down in this notebook, you'll see an example where we successfully feed all ~100 IDs in one call to the API. However, for demonstration purposes, we will split the list of monitoring location IDs into a few chunks of 50 sent to the API and stitch the resulting dataframes together. A loose rule of thumb is to keep the number of IDs below 200, but this exact number will depend on the typical length of each monitoring location ID (i.e. if your monitoring location IDs are > 13 characters long: \"USGS-XXXXXXXX\"+, you will need to feed in less than 200 at a time)." + "We can feed the unique monitoring location IDs from `NE_discharge` into the `get_monitoring_locations()` function to get the metadata for just those sites. 
However, there is a limit to the number of IDs that can be passed in one call to the API. Further down in this notebook, you'll see an example where we successfully feed all ~100 IDs in one call to the API. However, for demonstration purposes, we will split the list of monitoring location IDs into a few chunks of 50 sent to the API and stitch the resulting dataframes together. A loose rule of thumb is to keep the number of IDs below 200, but this exact number will depend on the typical length of each monitoring location ID (i.e. if your monitoring location IDs are > 13 characters long: \"USGS-XXXXXXXX\"+, you will need to feed in fewer than 200 at a time)." ] }, { diff --git a/demos/hydroshare/USGS_dataretrieval_NLDI_Examples.ipynb b/demos/hydroshare/USGS_NLDI_Examples.ipynb similarity index 100% rename from demos/hydroshare/USGS_dataretrieval_NLDI_Examples.ipynb rename to demos/hydroshare/USGS_NLDI_Examples.ipynb diff --git a/demos/hydroshare/USGS_dataretrieval_WaterUse_Examples.ipynb b/demos/hydroshare/USGS_NWIS_WaterUse_Examples.ipynb similarity index 81% rename from demos/hydroshare/USGS_dataretrieval_WaterUse_Examples.ipynb rename to demos/hydroshare/USGS_NWIS_WaterUse_Examples.ipynb index 4d6eb927..f2e3f94e 100644 --- a/demos/hydroshare/USGS_dataretrieval_WaterUse_Examples.ipynb +++ b/demos/hydroshare/USGS_NWIS_WaterUse_Examples.ipynb @@ -4,9 +4,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# USGS dataretrieval Python Package `get_water_use()` Examples\n", + "# USGS dataretrieval Python Package Water Use Examples\n", "\n", - "This notebook provides examples of using the Python dataretrieval package to retrieve water use data. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." 
+ "> **Note:** USGS water-use data has **no USGS Water Data API equivalent**. The legacy `nwis.get_water_use()` service has been decommissioned and now raises a \"defunct\" error, so the examples below are retained for historical reference only — they document the former interface and are not runnable. There is currently no `waterdata` replacement for water-use data.\n", + "\n", + "This notebook formerly retrieved water use data through the USGS National Water Information System (NWIS)." ] }, { @@ -42,9 +44,7 @@ "source": [ "from IPython.display import display\n", "\n", - "from dataretrieval import nwis\n", - "from dataretrieval import waterdata\n", - "import dataretrieval.waterdata as waterdata\n" + "from dataretrieval import nwis\n" ] }, { @@ -53,7 +53,7 @@ "source": [ "### Basic Usage\n", "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the `get_water_use()` function to retrieve water use data. The following arguments are supported:\n", + "The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the `get_water_use()` function to retrieve water use data. The following arguments are supported:\n", "\n", "Arguments (Additional arguments, if supplied, will be used as query parameters)\n", "\n", @@ -88,7 +88,7 @@ "\n", "The result of calling the `get_water_use()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the water use data.\n", "\n", - "Once you've got the data frame, there's several useful things you can do to explore the data." + "Once you've got the data frame, there are several useful things you can do to explore the data." 
] }, { @@ -129,7 +129,7 @@ "source": [ "#### Example 2: Retrieve data for an entire state for certain years\n", "\n", - "Returns data parsed by county - one row for each county for each year of interest rather than the entire state. Data are included for 5 year periods." + "Returns data parsed by county: one row for each county for each year of interest, rather than for the entire state. Data are included for 5-year periods." ] }, { diff --git a/demos/hydroshare/USGS_WaterData_DailyValues_Examples.ipynb b/demos/hydroshare/USGS_WaterData_DailyValues_Examples.ipynb new file mode 100644 index 00000000..1610d9fd --- /dev/null +++ b/demos/hydroshare/USGS_WaterData_DailyValues_Examples.ipynb @@ -0,0 +1,243 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# USGS dataretrieval Python Package `get_daily()` Examples\n", + "\n", + "This notebook provides examples of using the Python dataretrieval package to retrieve daily streamflow data for a United States Geological Survey (USGS) monitoring location. The dataretrieval package provides a collection of functions to get data from the USGS Water Data API and other online sources of hydrology and water quality data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install the Package\n", + "\n", + "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install dataretrieval" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load the package so you can use it along with other packages used in this notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "from IPython.display import display\n\nimport dataretrieval.waterdata as waterdata" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Basic Usage\n", + "\n", + "The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the `get_daily()` function to retrieve daily streamflow data for a USGS monitoring location from the USGS Water Data API. The following arguments are supported:\n", + "\n", + "Arguments (Additional arguments, if supplied, will be used as query parameters)\n", + "\n", + "* **monitoring_location_id** (string or iterable of strings): A unique identifier representing a single monitoring location, formed by combining the agency code with the site number (e.g. `USGS-10109000`). Accepts a single ID or a list of IDs.\n", + "* **parameter_code** (string or iterable of strings): One or more 5-digit USGS parameter codes identifying the constituent measured and its units of measure (e.g. `00060` for discharge).\n", + "* **statistic_id** (string or iterable of strings): One or more codes corresponding to the statistic an observation represents (e.g. `00001` for maximum, `00003` for mean).\n", + "* **time** (string): The date or interval an observation represents, following RFC 3339. May be a single date, a bounded or half-bounded interval (e.g. `2020-10-01/2021-09-30`), or an ISO 8601 duration (e.g. `P7D` for the past seven days).\n", + "* **skip_geometry** (boolean): If `True`, response geometries are omitted and the returned data frame contains no spatial information." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Example 1: Get daily value data for a specific parameter at a single USGS monitoring location between a begin and end date." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "# Set the parameters needed to retrieve data\nsiteNumber = \"USGS-10109000\" # LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT\nparameterCode = \"00060\" # Discharge\nstartDate = \"2020-10-01\"\nendDate = \"2021-09-30\"\n\n# Retrieve the data\ndailyStreamflow = waterdata.get_daily(\n monitoring_location_id=siteNumber, parameter_code=parameterCode, time=f\"{startDate}/{endDate}\"\n)\nprint(\"Retrieved \" + str(len(dailyStreamflow[0])) + \" data values.\")" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interpreting the Result\n", + "\n", + "The `get_daily()` function returns a tuple containing a pandas data frame and an associated metadata object. The data frame contains the daily values for the observed variable and time period requested. It is a flat table with a default integer index; the dates associated with each observation are held in a `time` column rather than in the index.\n", + "\n", + "Once you've got the data frame, there are several useful things you can do to explore the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Display the data frame as a table\n", + "display(dailyStreamflow[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show the data types of the columns in the resulting data frame." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(dailyStreamflow[0].dtypes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Get summary statistics for the daily streamflow values." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dailyStreamflow[0].describe()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Make a quick time series plot." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "ax = dailyStreamflow[0][[\"time\", \"value\"]].plot(x=\"time\", y=\"value\")\nax.set_xlabel(\"Date\")\nax.set_ylabel(\"Streamflow (cfs)\")" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The other part of the result returned from the `get_daily()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS Water Data API." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "print(\n \"The query URL used to retrieve the data from the Water Data API was: \" + dailyStreamflow[1].url\n)" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Additional Examples\n", + "\n", + "Example 2: Get daily mean and max discharge and temperature values for a monitoring location between a begin and end date.\n", + "\n", + "Parameter Code: 00010 = temperature, 00060 = discharge\n", + "See https://help.waterdata.usgs.gov/codes-and-parameters/parameters\n", + "\n", + "Statistic Code: 00001 = Maximum, 00003 = Mean\n", + "See https://help.waterdata.usgs.gov/stat_code\n", + "\n", + "NOTE: Temperature and discharge are not fully available for both statistics at this monitoring location." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "siteID = \"USGS-04085427\"\ndailyQAndT = waterdata.get_daily(\n monitoring_location_id=siteID,\n parameter_code=[\"00010\", \"00060\"],\n time=f\"{startDate}/{endDate}\",\n statistic_id=[\"00001\", \"00003\"],\n)\ndisplay(dailyQAndT[0])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Example 3: Get daily mean and max discharge and temperature values for multiple monitoring locations between a begin and end date." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "dailyMultiSites = waterdata.get_daily(\n monitoring_location_id=[\"USGS-01491000\", \"USGS-01645000\"],\n parameter_code=[\"00010\", \"00060\"],\n time=\"2012-01-01/2012-06-30\",\n statistic_id=[\"00001\", \"00003\"],\n)\ndisplay(dailyMultiSites[0])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Like all of the `waterdata` getters, `get_daily()` returns a flat data frame with a default integer index regardless of how many monitoring locations are requested. Each row carries its own `monitoring_location_id`, `parameter_code`, `statistic_id`, and `time`, so multi-location results can be filtered or pivoted as needed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "dailyMultiSites = waterdata.get_daily(\n monitoring_location_id=[\"USGS-01491000\", \"USGS-01645000\"],\n parameter_code=[\"00010\", \"00060\"],\n time=\"2012-01-01/2012-06-30\",\n statistic_id=[\"00001\", \"00003\"],\n)\ndisplay(dailyMultiSites[0])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Example 4: Query a monitoring location that has no matching data for the requested period; returns an empty data frame."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "siteID = \"USGS-05212700\"\nnotActive = waterdata.get_daily(\n monitoring_location_id=siteID, parameter_code=\"00060\", time=\"2014-01-01/2014-01-07\"\n)\ndisplay(notActive[0])" + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/demos/hydroshare/USGS_WaterData_GroundwaterLevels_Examples.ipynb b/demos/hydroshare/USGS_WaterData_GroundwaterLevels_Examples.ipynb new file mode 100644 index 00000000..7c3966e3 --- /dev/null +++ b/demos/hydroshare/USGS_WaterData_GroundwaterLevels_Examples.ipynb @@ -0,0 +1,261 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# USGS dataretrieval Python Package `get_field_measurements()` Examples\n", + "\n", + "This notebook provides examples of using the Python dataretrieval package to retrieve groundwater level field measurements for a United States Geological Survey (USGS) monitoring location. The dataretrieval package provides a collection of functions to get data from the USGS Water Data API and other online sources of hydrology and water quality data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install the Package\n", + "\n", + "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install dataretrieval" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load the package so you can use it along with other packages used in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "from IPython.display import display\n\nimport dataretrieval.waterdata as waterdata" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Basic Usage\n", + "\n", + "The dataretrieval package has several functions that allow you to retrieve data from different services of the USGS Water Data API. This example uses the `get_field_measurements()` function to retrieve groundwater level field measurements. Field measurements are physically measured values collected during a visit to a monitoring location and are commonly used to record groundwater levels. The following arguments are commonly used:\n", + "\n", + "* **monitoring_location_id** (string or list of strings): A unique identifier representing one or more monitoring locations. IDs combine the responsible agency code with the location number, separated by a hyphen (e.g. `USGS-434400121275801`).\n", + "* **parameter_code** (string or list of strings): One or more 5-digit codes identifying the constituent measured and its units of measure.\n", + "* **time** (string): The date an observation represents. Accepts a single RFC 3339 date-time, a bounded or half-bounded interval (e.g. `\"1980-01-01/2000-12-31\"`, `\"1980-01-01/..\"`), or an ISO 8601 duration (e.g. `\"P20Y\"`). Only observations whose time intersects this value are returned.\n", + "* **skip_geometry** (boolean): If `True`, response geometries are omitted and the returned data frame contains no spatial information." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Example 1: Get groundwater level field measurements for a single monitoring location." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "# Set the parameters needed to retrieve data\nsite_id = \"USGS-434400121275801\"\n\n# Retrieve the data\ndata = waterdata.get_field_measurements(monitoring_location_id=site_id)\nprint(\"Retrieved \" + str(len(data[0])) + \" data values.\")" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interpreting the Result\n", + "\n", + "The `get_field_measurements()` function returns a tuple of two objects: a pandas data frame containing the requested data and an associated metadata object. The data frame is flat, using a default integer index, and the observation dates are held in its `time` column.\n", + "\n", + "Once you've got the data frame, there are several useful things you can do to explore the data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Display the data frame as a table" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "display(data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show the data types of the columns in the resulting data frame." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(data[0].dtypes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Get summary statistics for the measured groundwater level values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data[0][\"value\"].describe()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Make a quick time series plot." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "ax = data[0][[\"time\", \"value\"]].plot(x=\"time\", y=\"value\", style=\".\")\nax.set_xlabel(\"Date\")\nax.set_ylabel(\"Water Level (feet below land surface)\")" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The other part of the result returned from the `get_field_measurements()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS Water Data API." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "print(\"The query URL used to retrieve the data from the Water Data API was: \" + data[1].url)" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Additional Examples\n", + "\n", + "You can also request data for multiple monitoring locations at the same time.\n", + "\n", + "Example 2: Get data for multiple monitoring locations. The monitoring location ids are passed as a list of strings." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "site_ids = [\"USGS-434400121275801\", \"USGS-375907091432201\"]\ndata2 = waterdata.get_field_measurements(monitoring_location_id=site_ids)\nprint(\"Retrieved \" + str(len(data2[0])) + \" data values.\")\ndisplay(data2[0])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following example requests the same data as the previous example, again passing the monitoring location ids as a list of strings." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "site_ids = [\"USGS-434400121275801\", \"USGS-375907091432201\"]\ndata2 = waterdata.get_field_measurements(monitoring_location_id=site_ids)\nprint(\"Retrieved \" + str(len(data2[0])) + \" data values.\")\ndisplay(data2[0])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Some groundwater level data have dates that include only a year or a month and year, but no day.\n", + "\n", + "Example 3: Retrieve groundwater level data that have dates without a day." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "data3 = waterdata.get_field_measurements(monitoring_location_id=\"USGS-425957088141001\")\nprint(\"Retrieved \" + str(len(data3[0])) + \" data values.\")\n\n# Print the `time` column, which holds the observation dates (the data frame\n# uses a default integer index). Dates that can't be converted to a\n# date/time data type may show up as NaT.\nprint(data3[0][\"time\"])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you want to inspect the request that was sent, you can get the URL for the query that was issued to the USGS Water Data API." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "# Print the URL used to retrieve the data\nprint(\"You can examine the data retrieved from the Water Data API at: \" + data3[1].url)" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also retrieve data for a monitoring location within a specified time window by giving a start and end date.\n", + "\n", + "Example 4: Get groundwater level data for a monitoring location between a start and end date."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data4 = waterdata.get_field_measurements(monitoring_location_id=site_id, time=\"1980-01-01/2000-12-31\")\n", + "print(\"Retrieved \" + str(len(data4[0])) + \" data values.\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/demos/hydroshare/USGS_WaterData_Measurements_Examples.ipynb b/demos/hydroshare/USGS_WaterData_Measurements_Examples.ipynb new file mode 100644 index 00000000..a8b05ded --- /dev/null +++ b/demos/hydroshare/USGS_WaterData_Measurements_Examples.ipynb @@ -0,0 +1,182 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# USGS dataretrieval Python Package `get_field_measurements()` Examples\n", + "\n", + "This notebook provides examples of using the Python dataretrieval package to retrieve surface water field measurement data for a United States Geological Survey (USGS) monitoring location. The dataretrieval package provides a collection of functions to get data from the USGS Water Data API and other online sources of hydrology and water quality data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install the Package\n", + "\n", + "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install dataretrieval" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load the package so you can use it along with other packages used in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "from IPython.display import display\n\nimport dataretrieval.waterdata as waterdata" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Basic Usage\n", + "\n", + "The dataretrieval package has several functions that allow you to retrieve data from the USGS Water Data API. This example uses the `get_field_measurements()` function to retrieve surface water field measurements for a USGS monitoring location. Field measurements are physically measured values (such as gage height and discharge) collected during a visit to a monitoring location, and are primarily used as calibration readings for the automated sensors collecting continuous data. The function accepts the following arguments (all optional):\n", + "\n", + "* **monitoring_location_id** (string or list of strings): A unique identifier representing a single monitoring location, formed by combining the responsible agency code with the location ID number, separated by a hyphen (e.g. `USGS-10109000`). A list may be supplied to query multiple locations.\n", + "* **parameter_code** (string or list of strings): One or more 5-digit parameter codes identifying the constituent measured and its units of measure.\n", + "* **time** (string): The date an observation represents. Accepts an RFC 3339 date-time, a bounded or half-bounded interval (e.g. `2019-01-01/2019-12-31` or `2019-01-01/..`), or an ISO 8601 duration (e.g. 
`P20Y` for the past 20 years).\n", + "* **skip_geometry** (boolean): If `True`, response geometries are omitted and a plain (non-spatial) data frame is returned.\n", + "\n", + "Additional query parameters such as `approval_status`, `qualifier`, `bbox`, and `limit` are also supported; see the `get_field_measurements()` docstring for the full list." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Example 1: Get all of the field measurements for a single monitoring location" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "measurements1 = waterdata.get_field_measurements(monitoring_location_id=\"USGS-10109000\")\nprint(\"Retrieved \" + str(len(measurements1[0])) + \" data values.\")" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interpreting the Result\n", + "\n", + "The `get_field_measurements()` function returns a tuple of two objects: a pandas data frame and an associated metadata object. The data frame is flat, uses a default integer index, and contains a `time` column holding the date of each measurement (along with columns such as `monitoring_location_id`, `parameter_code`, `value`, and `unit_of_measure`).\n", + "\n", + "Once you've got the data frame, there are several useful things you can do to explore the data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Display the data frame as a table" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "display(measurements1[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show the data types of the columns in the resulting data frame." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(measurements1[0].dtypes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The other part of the result returned from the `get_field_measurements()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS Water Data API." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "print(\"The query URL used to retrieve the data from the Water Data API was: \" + measurements1[1].url)" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Additional Examples\n", + "\n", + "Example 2: Get all of the field measurements between a start and end date" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "measurements2 = waterdata.get_field_measurements(\n monitoring_location_id=\"USGS-10109000\", time=\"2019-01-01/2019-12-31\"\n)\nprint(\"Retrieved \" + str(len(measurements2[0])) + \" data values.\")\ndisplay(measurements2[0])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Example 3: Get all of the field measurements for multiple monitoring locations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "measurements3 = waterdata.get_field_measurements(monitoring_location_id=[\"USGS-01594440\", \"USGS-040851325\"])\nprint(\"Retrieved \" + str(len(measurements3[0])) + \" data values.\")\ndisplay(measurements3[0])" + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": 
"text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/demos/hydroshare/USGS_dataretrieval_ParameterCodes_Examples.ipynb b/demos/hydroshare/USGS_WaterData_ParameterCodes_Examples.ipynb similarity index 56% rename from demos/hydroshare/USGS_dataretrieval_ParameterCodes_Examples.ipynb rename to demos/hydroshare/USGS_WaterData_ParameterCodes_Examples.ipynb index 7e0d7e6a..6605fe76 100644 --- a/demos/hydroshare/USGS_dataretrieval_ParameterCodes_Examples.ipynb +++ b/demos/hydroshare/USGS_WaterData_ParameterCodes_Examples.ipynb @@ -4,11 +4,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# USGS dataretrieval Python Package `get_pmcodes()` Examples\n", + "# USGS dataretrieval Python Package `get_reference_table()` Examples\n", "\n", - "This notebook provides examples of using the Python dataretrieval package to retrieve information about USGS parameter codes from NWIS. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).\n", + "This notebook provides examples of using the Python dataretrieval package to retrieve information about USGS parameter codes from the USGS Water Data API. 
The dataretrieval package provides a collection of functions to get data from the USGS Water Data API and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).\n", "\n", - "For more information about USGS NWIS parameter codes, see:\n", + "For more information about USGS parameter codes, see:\n", "https://help.waterdata.usgs.gov/codes-and-parameters/parameters" ] }, @@ -42,12 +42,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "from IPython.display import display\n", - "\n", - "from dataretrieval import nwis\n", - "import dataretrieval.waterdata as waterdata" - ] + "source": "from IPython.display import display\n\nimport dataretrieval.waterdata as waterdata" }, { "cell_type": "markdown", @@ -55,18 +50,18 @@ "source": [ "### Basic Usage\n", "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the `get_pmcodes()` function to retrieve information about parameter codes (i.e., observed variables) from NWIS. The following arguments are supported:\n", - "\n", - "Arguments (Additional arguments, if supplied, will be used as query parameters)\n", + "The dataretrieval package has several functions that allow you to retrieve data from the USGS Water Data API. This example uses the `get_reference_table()` function to retrieve information about parameter codes (i.e., observed variables). The function returns metadata reference tables, which list the range of allowable values for the parameter arguments in the `waterdata` module. The following arguments are supported:\n", "\n", - "* **parameterCd** (string): A string containing the parameter code for which information is to be retrieved." + "* **collection** (string): The name of the reference table to retrieve. To retrieve parameter codes, use `\"parameter-codes\"`. 
Other options include `\"agency-codes\"`, `\"site-types\"`, `\"states\"`, `\"counties\"`, `\"statistic-codes\"`, and several more.\n", + "* **limit** (numeric, optional): Controls the subset of selected features returned in each page. The maximum allowable limit is 50000. The default (`None`) uses the maximum allowable limit for the service.\n", + "* **query** (dictionary, optional): A dictionary of query parameters passed to the collection API call. For example, `{\"id\": \"00400\"}` selects a specific parameter code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Example 1: Retrieve information for a set of USGS NWIS parameter codes." + "Example 1: Retrieve information for a USGS parameter code." ] }, { @@ -87,7 +82,7 @@ "source": [ "### Interpreting the Result\n", "\n", - "The result of calling the `get_pmcodes()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the parameter code information requested.\n", + "Calling `get_reference_table()` returns a tuple containing a pandas data frame and an associated metadata object. The data frame holds the requested parameter code information, while the metadata object includes the request URL and the query time.\n", "\n", "Once you've got the data frame, you can explore the data." ] diff --git a/demos/hydroshare/USGS_WaterData_Peaks_Examples.ipynb b/demos/hydroshare/USGS_WaterData_Peaks_Examples.ipynb new file mode 100644 index 00000000..8560b939 --- /dev/null +++ b/demos/hydroshare/USGS_WaterData_Peaks_Examples.ipynb @@ -0,0 +1,214 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# USGS dataretrieval Python Package Peak Streamflow Examples\n", + "\n", + "This notebook provides examples of using the Python dataretrieval package to retrieve annual peak streamflow data for United States Geological Survey (USGS) monitoring locations using the **USGS Water Data API** via the `waterdata` module. 
The `waterdata` module is the recommended way to access USGS water data and replaces the deprecated `nwis` module." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install the Package\n", + "\n", + "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install dataretrieval" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load the package so you can use it along with other packages used in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display\n", + "\n", + "from dataretrieval import waterdata\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Basic Usage\n", + "\n", + "This example uses the `get_peaks()` function to retrieve annual peak data for a USGS monitoring location. Commonly used arguments include:\n", + "\n", + "* **monitoring_location_id** (string or list of strings): USGS monitoring location id(s), formed as the agency code and site number joined by a hyphen (e.g. `\"USGS-01594440\"`).\n", + "* **parameter_code** (string or list of strings): 5-digit USGS parameter code(s). Peak records include both peak discharge (`00060`) and the corresponding gage height (`00065`); pass `parameter_code=\"00060\"` to retrieve peak discharge only.\n", + "* **time** (string): an ISO-8601 date or interval (e.g. `\"1953-01-01/1960-01-01\"`) restricting the period retrieved." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 1: Retrieve peak discharge for two USGS monitoring locations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "site_ids = [\"USGS-01594440\", \"USGS-040851325\"]\n", + "peak_data = waterdata.get_peaks(\n", + " monitoring_location_id=site_ids, parameter_code=\"00060\"\n", + ")\n", + "print(\"Retrieved \" + str(len(peak_data[0])) + \" peak values.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interpreting the Result\n", + "\n", + "Each `waterdata` function returns a tuple of a pandas data frame and a metadata object. The data frame contains one row per annual peak, including the peak `value`, its `time`, and the `water_year`.\n", + "\n", + "Once you've got the data frame, there are several useful things you can do to explore the data.\n", + "\n", + "Display the data frame as a table." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "display(peak_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show the data types of the columns in the resulting data frame." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(peak_data[0].dtypes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The other part of the result is a metadata object describing the query that was executed. For example, you can access the URL that was assembled to retrieve the requested data from the USGS Water Data API." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"The query URL used to retrieve the data was: \" + peak_data[1].url)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Peak records cover multiple parameters. 
By default `get_peaks()` returns both peak discharge (`00060`) and the gage height at the peak (`00065`); inspect the `parameter_code` column to see which are present." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "all_params = waterdata.get_peaks(monitoring_location_id=\"USGS-01594440\")\n", + "print(all_params[0][\"parameter_code\"].unique())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Additional Examples\n", + "\n", + "#### Example 2: Retrieve peak discharge for a single monitoring location" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "station = \"USGS-06011000\"\n", + "data3 = waterdata.get_peaks(monitoring_location_id=station, parameter_code=\"00060\")\n", + "display(data3[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 3: Retrieve peak discharge for a monitoring location between two dates" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data4 = waterdata.get_peaks(\n", + " monitoring_location_id=station,\n", + " parameter_code=\"00060\",\n", + " time=\"1953-01-01/1960-01-01\",\n", + ")\n", + "display(data4[0])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/demos/hydroshare/USGS_WaterData_Ratings_Examples.ipynb b/demos/hydroshare/USGS_WaterData_Ratings_Examples.ipynb new file mode 100644 index 00000000..2696e2f0 --- /dev/null +++ b/demos/hydroshare/USGS_WaterData_Ratings_Examples.ipynb @@ -0,0 +1,183 
@@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# USGS dataretrieval Python Package Rating Curve Examples\n", + "\n", + "This notebook provides examples of using the Python dataretrieval package to retrieve stage–discharge rating curve data for United States Geological Survey (USGS) streamgages using the **USGS Water Data API** via the `waterdata` module. The `waterdata` module is the recommended way to access USGS water data and replaces the deprecated `nwis` module.\n", + "\n", + "Note: not all active USGS streamflow gages have traditional rating curves relating stage to discharge." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install the Package\n", + "\n", + "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install dataretrieval" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load the package so you can use it along with other packages used in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display\n", + "\n", + "from dataretrieval import waterdata\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Basic Usage\n", + "\n", + "This example uses the `get_ratings()` function to retrieve rating curve data for a monitoring location from the USGS Water Data STAC catalog. Commonly used arguments include:\n", + "\n", + "* **monitoring_location_id** (string or list of strings): USGS monitoring location id(s) in `AGENCY-ID` form (e.g. 
`\"USGS-10109000\"`).\n", + "* **file_type** (string or list): which rating file(s) to request — `\"exsa\"` (expanded, shift-adjusted; the default), `\"base\"`, or `\"corr\"`.\n", + "\n", + "Unlike most `waterdata` functions, `get_ratings()` returns a dictionary mapping each feature id (e.g. `\"USGS-10109000.exsa.rdb\"`) to a parsed pandas data frame." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 1: Get the rating curve for a monitoring location" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Specify the USGS monitoring location id\n", + "site_id = \"USGS-10109000\"\n", + "\n", + "# Get the (expanded, shift-adjusted) rating curve\n", + "ratings = waterdata.get_ratings(monitoring_location_id=site_id, file_type=\"exsa\")\n", + "rating = ratings[f\"{site_id}.exsa.rdb\"]\n", + "print(\"Retrieved \" + str(len(rating)) + \" rating points.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interpreting the Result\n", + "\n", + "`get_ratings()` returns a dictionary keyed by feature id; each value is a pandas data frame holding one rating table. For the `\"exsa\"` file type the columns are:\n", + "\n", + "* **INDEP** — typically the gage height, in feet\n", + "* **SHIFT** — the current shift in the rating for that value of INDEP\n", + "* **DEP** — typically the discharge, in cubic feet per second\n", + "* **STOR** — an `*` indicates the pair is a fixed point of the rating curve\n", + "\n", + "The `\"base\"` and `\"corr\"` file types provide alternative representations of the rating. You can display the data frame as a table." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "display(rating)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show the data types of the columns in the resulting data frame." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(rating.dtypes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each rating data frame carries provenance in its `attrs`: `attrs[\"url\"]` records the catalog asset it was fetched from, and `attrs[\"comment\"]` holds the RDB header lines (rating id, parameter, last-shifted timestamp, etc.)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"The rating was fetched from: \" + rating.attrs[\"url\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 2: Get the rating curve for a different monitoring location" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "site_id = \"USGS-01594440\"\n", + "ratings = waterdata.get_ratings(monitoring_location_id=site_id, file_type=\"exsa\")\n", + "rating = ratings[f\"{site_id}.exsa.rdb\"]\n", + "print(\"Retrieved \" + str(len(rating)) + \" rating points.\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/demos/hydroshare/USGS_WaterData_Samples_Examples.ipynb b/demos/hydroshare/USGS_WaterData_Samples_Examples.ipynb new file mode 100644 index 00000000..450921a9 --- /dev/null +++ b/demos/hydroshare/USGS_WaterData_Samples_Examples.ipynb @@ -0,0 +1,220 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# USGS dataretrieval Python Package `get_samples()` Examples\n", + "\n", + "This notebook provides examples of using the 
Python dataretrieval package to retrieve water quality sample data for United States Geological Survey (USGS) monitoring locations. The dataretrieval package provides a collection of functions to get data from the USGS Samples database and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install the Package\n", + "\n", + "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install dataretrieval" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load the package so you can use it along with other packages used in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display\n", + "\n", + "from dataretrieval import waterdata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": "### Basic Usage\n\nThe dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the `get_samples()` function to retrieve water quality sample data for USGS monitoring locations from Samples. The following arguments are supported:\n\n* **ssl_check** : boolean, optional\n Check the SSL certificate.\n* **service** : string\n One of the available Samples services: \"results\", \"locations\", \"activities\",\n \"projects\", or \"organizations\". Defaults to \"results\".\n* **profile** : string\n One of the available profiles associated with a service. 
Options for each\n service are:\n results - \"fullphyschem\", \"basicphyschem\",\n \"fullbio\", \"basicbio\", \"narrow\",\n \"resultdetectionquantitationlimit\",\n \"labsampleprep\", \"count\"\n locations - \"site\", \"count\"\n activities - \"sampact\", \"actmetric\",\n \"actgroup\", \"count\"\n projects - \"project\", \"projectmonitoringlocationweight\"\n organizations - \"organization\", \"count\"\n* **activityMediaName** : string or list of strings, optional\n Name or code indicating environmental medium in which sample was taken.\n Check the `activityMediaName_lookup()` function in this module for all\n possible inputs.\n Example: \"Water\".\n* **activityStartDateLower** : string, optional\n The start date if using a date range. Takes the format YYYY-MM-DD.\n The logic is inclusive, i.e. it will also return results that\n match the date. If left as None, will pull all data on or before\n activityStartDateUpper, if populated.\n* **activityStartDateUpper** : string, optional\n The end date if using a date range. Takes the format YYYY-MM-DD.\n The logic is inclusive, i.e. it will also return results that\n match the date. If left as None, will pull all data after\n activityStartDateLower up to the most recent available results.\n* **activityTypeCode** : string or list of strings, optional\n Text code that describes type of field activity performed.\n Example: \"Sample-Routine, regular\".\n* **characteristicGroup** : string or list of strings, optional\n Characteristic group is a broad category of characteristics\n describing one or more results. 
Check the `characteristicGroup_lookup()`\n function in this module for all possible inputs.\n Example: \"Organics, PFAS\"\n* **characteristic** : string or list of strings, optional\n Characteristic is a specific category describing one or more results.\n Check the `characteristic_lookup()` function in this module for all\n possible inputs.\n Example: \"Suspended Sediment Discharge\"\n* **characteristicUserSupplied** : string or list of strings, optional\n A user-supplied characteristic name describing one or more results.\n* **boundingBox**: list of four floats, optional\n Filters on the associated monitoring location's point location\n by checking if it is located within the specified geographic area. \n The logic is inclusive, i.e. it will include locations that overlap\n with the edge of the bounding box. Values are separated by commas,\n expressed in decimal degrees, NAD83, and longitudes west of Greenwich\n are negative.\n The format is a list consisting of:\n - Western-most longitude\n - Southern-most latitude\n - Eastern-most longitude\n - Northern-most latitude \n Example: [-92.8,44.2,-88.9,46.0]\n* **countryFips** : string or list of strings, optional\n Example: \"US\" (United States)\n* **stateFips** : string or list of strings, optional\n Check the `stateFips_lookup()` function in this module for all\n possible inputs.\n Example: \"US:15\" (United States: Hawaii)\n* **countyFips** : string or list of strings, optional\n Check the `countyFips_lookup()` function in this module for all\n possible inputs.\n Example: \"US:15:001\" (United States: Hawaii, Hawaii County)\n* **siteTypeCode** : string or list of strings, optional\n An abbreviation for a certain site type. Check the `siteType_lookup()`\n function in this module for all possible inputs.\n Example: \"GW\" (Groundwater site)\n* **siteTypeName** : string or list of strings, optional\n A full name for a certain site type. 
Check the `siteType_lookup()`\n function in this module for all possible inputs.\n Example: \"Well\"\n* **usgsPCode** : string or list of strings, optional\n 5-digit number used in the US Geological Survey computerized\n data system, National Water Information System (NWIS), to\n uniquely identify a specific constituent. Check the \n `characteristic_lookup()` function in this module for all possible\n inputs.\n Example: \"00060\" (Discharge, cubic feet per second)\n* **hydrologicUnit** : string or list of strings, optional\n Max 12-digit number used to describe a hydrologic unit.\n Example: \"070900020502\"\n* **monitoringLocationIdentifier** : string or list of strings, optional\n A monitoring location identifier has two parts: the agency code\n and the location number, separated by a dash (-).\n Example: \"USGS-040851385\"\n* **organizationIdentifier** : string or list of strings, optional\n Designator used to uniquely identify a specific organization.\n Currently only accepting the organization \"USGS\".\n* **pointLocationLatitude** : float, optional\n Latitude for a point/radius query (decimal degrees). Must be used\n with pointLocationLongitude and pointLocationWithinMiles.\n* **pointLocationLongitude** : float, optional\n Longitude for a point/radius query (decimal degrees). Must be used\n with pointLocationLatitude and pointLocationWithinMiles.\n* **pointLocationWithinMiles** : float, optional\n Radius for a point/radius query. Must be used with\n pointLocationLatitude and pointLocationLongitude\n* **projectIdentifier** : string or list of strings, optional\n Designator used to uniquely identify a data collection project. Project\n identifiers are specific to an organization (e.g. USGS).\n Example: \"ZH003QW03\"\n* **recordIdentifierUserSupplied** : string or list of strings, optional\n Internal AQS record identifier that returns 1 entry. Only available\n for the \"results\" service." 
+ }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 1: Get all water quality sample data for a single monitoring site" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "siteID = \"USGS-10109000\"\n", + "wq_data = waterdata.get_samples(monitoringLocationIdentifier=siteID)\n", + "print(\"Retrieved data for \" + str(len(wq_data[0])) + \" samples.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interpreting the Result\n", + "\n", + "Calling `get_samples()` returns a tuple containing a pandas data frame and an associated metadata object. The data frame contains the water quality sample data for the requested monitoring location, observed variables, and time frame.\n", + "\n", + "Once you've got the data frame, there are several useful things you can do to explore the data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Display the data frame as a table. The default data frame for this function is a long, flat table, with a row for each observed variable at a given monitoring location and date/time." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "display(wq_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show the data types of the columns in the resulting data frame." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(wq_data[0].dtypes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": "The other part of the result returned from the `get_samples()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS Samples database."
+ }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\n", + " \"The query URL used to retrieve the data from USGS Samples was: \" + wq_data[1].url\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Additional Examples\n", + "\n", + "#### Example 2: Get water quality sample data for multiple sites for a single parameter" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "site_ids = [\"USGS-04024430\", \"USGS-04024000\"]\n", + "parameter_code = \"00065\"\n", + "wq_multi_site = waterdata.get_samples(\n", + " monitoringLocationIdentifier=site_ids, usgsPCode=parameter_code\n", + ")\n", + "print(\"Retrieved data for \" + str(len(wq_multi_site[0])) + \" samples.\")\n", + "display(wq_multi_site[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 3: Retrieve water quality sample data for multiple sites, including a list of parameters, within a time period defined by start date until present" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "site_ids = [\"USGS-04024430\", \"USGS-04024000\"]\nparameterCd = [\"34247\", \"30234\", \"32104\", \"34220\"]\nstartDate = \"2012-01-01\"\nwq_data2 = waterdata.get_samples(\n monitoringLocationIdentifier=site_ids,\n usgsPCode=parameterCd,\n activityStartDateLower=startDate,\n)\nprint(\"Retrieved data for \" + str(len(wq_data2[0])) + \" samples.\")\ndisplay(wq_data2[0])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 4: Retrieve water quality sample data for one site and convert to a wide format\n", + "\n", + "Note that the USGS Samples database returns multiple parameters in a \"long\" format: each row in the resulting table represents a single observation of a single parameter. Furthermore, every observation has 181 fields of metadata. 
However, if you wanted to place your water quality data into a \"wide\" format, where each column represents a water quality parameter code, the code below details one solution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "siteID = \"USGS-10109000\"\n", + "wq_data, _ = waterdata.get_samples(monitoringLocationIdentifier=siteID)\n", + "print(\"Retrieved data for \" + str(len(wq_data)) + \" sample results.\")\n", + "\n", + "wq_data[\"characteristic_unit\"] = (\n", + " wq_data[\"Result_Characteristic\"] + \", \" + wq_data[\"Result_MeasureUnit\"]\n", + ")\n", + "wq_data_wide = wq_data.pivot_table(\n", + " index=[\"Location_Identifier\", \"Activity_StartDate\", \"Activity_StartTime\"],\n", + " columns=\"characteristic_unit\",\n", + " values=\"Result_Measure\",\n", + " aggfunc=\"first\",\n", + ")\n", + "display(wq_data_wide)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "hyswap-dev-environment", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/demos/hydroshare/USGS_WaterData_SiteInfo_Examples.ipynb b/demos/hydroshare/USGS_WaterData_SiteInfo_Examples.ipynb new file mode 100644 index 00000000..d7cf96a2 --- /dev/null +++ b/demos/hydroshare/USGS_WaterData_SiteInfo_Examples.ipynb @@ -0,0 +1,203 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# USGS dataretrieval Python Package `get_monitoring_locations()` Examples\n", + "\n", + "This notebook provides examples of using the Python dataretrieval package to retrieve information about a United States Geological Survey (USGS) monitoring location. 
The dataretrieval package provides a collection of functions to get data from the USGS Water Data API and other online sources of hydrology and water quality data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install the Package\n", + "\n", + "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install dataretrieval" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load the package so you can use it along with other packages used in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "from IPython.display import display\n\nimport dataretrieval.waterdata as waterdata" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Basic Usage\n", + "\n", + "The dataretrieval package has several functions that allow you to retrieve data from the USGS Water Data API. This example uses the `get_monitoring_locations()` function to retrieve information about USGS monitoring locations. The function accepts many optional query arguments; supplying one or more of them filters the locations that are returned. The arguments used in the examples below are:\n", + "\n", + "* **monitoring_location_id** (string or iterable of strings): A unique identifier for a single monitoring location. A monitoring location id joins the code of the agency responsible for the location (e.g. `USGS`) and the location's id number (e.g. `10109000`) with a hyphen (e.g. `USGS-10109000`).\n", + "* **state_code** (string or iterable of strings): State code. 
A two-digit ANSI code (formerly FIPS code) as defined by the American National Standards Institute, used to define states and equivalents.\n", + "* **state_name** (string or iterable of strings): The name of the state or state equivalent in which the monitoring location is located.\n", + "* **hydrologic_unit_code** (string or iterable of strings): A hydrologic unit code (HUC) of two to eight digits identifying a hydrologic unit (region, sub-region, accounting unit, or cataloging unit). For example, `hydrologic_unit_code=\"16010203\"` selects locations within that cataloging unit.\n", + "* **site_type_code** (string or iterable of strings): A code describing the hydrologic setting of the monitoring location, such as stream, spring, or well. For example, `site_type_code=\"ST\"` returns streams only.\n", + "* **skip_geometry** (boolean): If `True`, response geometries are skipped and the returned object is a plain `pandas.DataFrame` with no spatial information. Otherwise the result is a `geopandas.GeoDataFrame` that includes a `geometry` column.\n", + "\n", + "Many additional filter arguments are available, including `monitoring_location_name`, `county_code`, `county_name`, `site_type`, `aquifer_code`, `bbox`, and `properties`. For the complete list of arguments and their descriptions, see the `get_monitoring_locations()` docstring and the USGS Water Data API documentation." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Example 1: Get information for a USGS monitoring location" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "# Specify the site you want to retrieve information for\nsiteID = \"USGS-10109000\"\n\n# Get the site information\nsiteINFO = waterdata.get_monitoring_locations(monitoring_location_id=siteID)" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interpreting the Result\n", + "\n", + "The `get_monitoring_locations()` function returns a tuple of two items: a pandas data frame (`siteINFO[0]`) and an associated metadata object (`siteINFO[1]`). The data frame contains one row per monitoring location, with a column for each location attribute. Unless `skip_geometry=True` is passed, the data frame is a `geopandas.GeoDataFrame` and includes a `geometry` column holding the location's coordinates.\n", + "\n", + "Once you have the data frame, there are several useful things you can do to explore the information about the location." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Display the data frame as a table\n", + "display(siteINFO[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show the data types of the columns in the resulting data frame." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(siteINFO[0].dtypes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The other part of the result returned from the `get_monitoring_locations()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS Water Data API." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "print(\"The query URL used to retrieve the data from the Water Data API was: \" + siteINFO[1].url)" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Additional Examples\n", + "\n", + "#### Example 2: Get information for multiple monitoring locations in a list" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "# Create a list of the site identifiers you want to retrieve information for\nsiteIDs = [\"USGS-05114000\", \"USGS-09423350\"]\n\n# Get the site information\nsiteINFO_multi = waterdata.get_monitoring_locations(monitoring_location_id=siteIDs)\ndisplay(siteINFO_multi[0])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 3: Get information for all monitoring locations within a state" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get the site information for a state (state_code is the two-digit ANSI/FIPS code; 49 is Utah)\n", + "siteINFO_state = waterdata.get_monitoring_locations(state_code=\"49\")\n", + "display(siteINFO_state[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 4: Get all \"stream\" monitoring locations within a USGS HUC" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create a list of hucs for which to query sites\n", + "huc_list = [\"16010203\"]\n", + "\n", + "# Get the site information - limit to stream sites\n", + "siteINFO_huc = waterdata.get_monitoring_locations(hydrologic_unit_code=huc_list, site_type_code=\"ST\")\n", + "display(siteINFO_huc[0])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": 
"ipython", + "version": 3 + }, + "file_extension": 
".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/demos/hydroshare/USGS_WaterData_SiteInventory_Examples.ipynb b/demos/hydroshare/USGS_WaterData_SiteInventory_Examples.ipynb new file mode 100644 index 00000000..3a11a814 --- /dev/null +++ b/demos/hydroshare/USGS_WaterData_SiteInventory_Examples.ipynb @@ -0,0 +1,201 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# USGS dataretrieval Python Package `get_monitoring_locations()` Examples\n", + "\n", + "This notebook provides examples of using the Python dataretrieval package to search for monitoring locations within a region or with specific characteristics. The dataretrieval package provides a collection of functions to get data from the USGS Water Data API and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install the Package\n", + "\n", + "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install dataretrieval" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load the package so you can use it along with other packages used in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "from IPython.display import display\n\nimport dataretrieval.waterdata as waterdata" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Basic Usage\n", + "\n", + "The dataretrieval package has several functions that allow you to retrieve data from the USGS Water Data API. 
This example uses the `get_monitoring_locations()` function to search for monitoring locations within a region or with specific characteristics. The function has many optional arguments that act as filters on the query; supply the ones that match the locations you want to retrieve.\n", + "\n", + "#### Commonly Used Arguments\n", + "\n", + "* **monitoring_location_id** (string or iterable of strings): A unique identifier for a single monitoring location, formed by joining the agency code (e.g. `USGS`) and the location's ID number with a hyphen (e.g. `USGS-05114000`).\n", + "* **state_name** (string or iterable of strings): The full name of the state or state equivalent in which the monitoring location is located (e.g. `\"Ohio\"`, `\"Utah\"`). Note that this is the spelled-out state name, not a two-letter postal abbreviation.\n", + "* **county_name** (string or iterable of strings): The name of the county (or county equivalent) in which the monitoring location is located.\n", + "* **site_type_code** (string or iterable of strings): A code describing the hydrologic setting of the monitoring location (e.g. stream, spring, or well).\n", + "* **site_type** (string or iterable of strings): A text description of the hydrologic setting of the monitoring location.\n", + "* **hydrologic_unit_code** (string or iterable of strings): A two- to eight-digit hydrologic unit code (HUC) identifying the region, sub-region, accounting unit, or cataloging unit of interest.\n", + "* **bbox** (list of numbers): A bounding box given as `[xmin, ymin, xmax, ymax]` (western-most longitude, southern-most latitude, eastern-most longitude, northern-most latitude). Only locations whose geometry intersects the box are returned.\n", + "\n", + "#### Formatting Arguments\n", + "\n", + "* **properties** (string or iterable of strings): The subset of columns to return from the query (for example `id`, `state_name`, `site_type_code`). 
When omitted, all available columns are returned.\n", + "* **skip_geometry** (boolean): When `True`, the response geometries are skipped and the result is a plain `pandas.DataFrame` with no spatial information. When `False` (the default), the result carries a `geometry` column.\n", + "* **limit** (numeric): Controls how many of the selected features are returned per page (maximum 50000).\n", + "\n", + "For the full list of arguments see the `get_monitoring_locations()` docstring and the USGS Water Data API documentation." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 1: Retrieve all monitoring locations in Ohio" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "siteListPhos = waterdata.get_monitoring_locations(state_name=\"Ohio\")" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interpreting the Result\n", + "\n", + "Calling `get_monitoring_locations()` returns a tuple of two objects: a pandas data frame of the requested monitoring location records and an associated metadata object. The data frame has one row per monitoring location and one column per requested property; unless `skip_geometry=True` was passed, it also carries a `geometry` column describing each location's position.\n", + "\n", + "Once you have the data frame, there are several useful things you can do to explore the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Display the data frame as a table\n", + "display(siteListPhos[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The other part of the result returned from the `get_monitoring_locations()` function is a metadata object that contains information about the query that was executed to return the data. 
For example, you can access the URL that was assembled to retrieve the requested data from the USGS Water Data API." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "print(\"The query URL used to retrieve the data from the Water Data API was: \" + siteListPhos[1].url)" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Additional Examples\n", + "\n", + "#### Example 2: Retrieve information for a single monitoring location" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "oneSite = waterdata.get_monitoring_locations(monitoring_location_id=\"USGS-05114000\")\ndisplay(oneSite[0])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 3: Retrieve information for a single monitoring location" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "oneSite = waterdata.get_monitoring_locations(monitoring_location_id=\"USGS-05114000\")\ndisplay(oneSite[0])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 4: Retrieve information for monitoring locations in Utah" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "UTsites = waterdata.get_monitoring_locations(\n state_name=\"Utah\"\n)\ndisplay(UTsites[0])" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 5: Retrieve the time series metadata for a single site\n", + "\n", + "The `get_time_series_metadata()` function lists the parameters that have been collected at a monitoring location." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "oneSite = waterdata.get_time_series_metadata(monitoring_location_id=\"USGS-05114000\")\ndisplay(oneSite[0])" + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/demos/hydroshare/USGS_WaterData_Statistics_Examples.ipynb b/demos/hydroshare/USGS_WaterData_Statistics_Examples.ipynb new file mode 100644 index 00000000..8474727e --- /dev/null +++ b/demos/hydroshare/USGS_WaterData_Statistics_Examples.ipynb @@ -0,0 +1,250 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# USGS dataretrieval Python Package Statistics Examples\n", + "\n", + "This notebook provides examples of using the Python dataretrieval package to retrieve summary statistics for observed variables at a United States Geological Survey (USGS) monitoring location using the **USGS Water Data API** via the `waterdata` module. The `waterdata` module is the recommended way to access USGS water data and replaces the deprecated `nwis` module.\n", + "\n", + "Two statistics functions are demonstrated:\n", + "\n", + "* `get_stats_date_range()` — monthly, calendar-year, and water-year summaries (the \"observationIntervals\" service).\n", + "* `get_stats_por()` — day-of-year and month-of-year summaries over the full period of record (the \"observationNormals\" service)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install the Package\n", + "\n", + "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install dataretrieval" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load the package so you can use it along with other packages used in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display\n", + "from matplotlib import ticker\n", + "\n", + "from dataretrieval import waterdata\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Basic Usage\n", + "\n", + "This example uses `get_stats_date_range()` to retrieve monthly and annual statistics for an observed variable at a USGS monitoring location. Commonly used arguments include:\n", + "\n", + "* **monitoring_location_id** (string or list of strings): USGS monitoring location id(s), formed as the agency code and site number joined by a hyphen (e.g. `\"USGS-02319394\"`).\n", + "* **parameter_code** (string or list of strings): 5-digit USGS parameter code(s), e.g. `\"00060\"` (discharge).\n", + "* **computation_type** (string or list of strings): the statistic(s) to compute — one or more of `arithmetic_mean`, `maximum`, `median`, `minimum`, `percentile`.\n", + "* **start_date** / **end_date** (string): optionally bound the period summarized, in `YYYY-MM-DD` format." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 1: Get monthly and annual mean discharge for a single monitoring location" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Set the parameters needed to retrieve data\n", + "site = \"USGS-02319394\"\n", + "parameter_code = \"00060\" # Discharge\n", + "\n", + "# Retrieve the statistics (monthly, calendar-year, and water-year means)\n", + "x1 = waterdata.get_stats_date_range(\n", + " monitoring_location_id=site,\n", + " parameter_code=parameter_code,\n", + " computation_type=\"arithmetic_mean\",\n", + ")\n", + "print(\"Retrieved \" + str(len(x1[0])) + \" statistic values.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interpreting the Result\n", + "\n", + "Each `waterdata` function returns a tuple of a pandas data frame and a metadata object. The data frame holds the computed statistics; each row is one interval, identified by the `interval_type` column (`month`, `calendar_year`, or `water_year`), with the computed statistic in the `value` column.\n", + "\n", + "Once you've got the data frame, there are several useful things you can do to explore the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Display the data frame as a table\n", + "display(x1[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show the data types of the columns in the resulting data frame." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(x1[0].dtypes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Make a quick time series plot of the annual (calendar-year) mean values." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# select the annual (calendar-year) means into a plain DataFrame for plotting.\n", + "# The statistics services return a GeoDataFrame carrying a site-point geometry,\n", + "# and report numeric values as strings, so we coerce ``value`` to float.\n", + "annual = x1[0].loc[\n", + " x1[0][\"interval_type\"] == \"calendar_year\", [\"start_date\", \"value\"]\n", + "].copy()\n", + "annual[\"year\"] = annual[\"start_date\"].str[:4].astype(int)\n", + "annual[\"value\"] = annual[\"value\"].astype(float)\n", + "annual = annual.sort_values(\"year\")\n", + "\n", + "ax = annual.plot(x=\"year\", y=\"value\", legend=False)\n", + "ax.xaxis.set_major_formatter(ticker.FormatStrFormatter(\"%d\"))\n", + "ax.set_xlabel(\"Year\")\n", + "ax.set_ylabel(\"Annual mean discharge (cfs)\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The other part of the result is a metadata object describing the query that was executed. For example, you can access the URL that was assembled to retrieve the requested data from the USGS Water Data API." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"The query URL used to retrieve the data was: \" + x1[1].url)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Additional Examples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 2: Get monthly and annual mean statistics for two monitoring locations\n", + "\n", + "Multiple monitoring locations and parameter codes can be requested at once; only the data that are available are returned." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x2 = waterdata.get_stats_date_range(\n", + " monitoring_location_id=[\"USGS-02319394\", \"USGS-02171500\"],\n", + " parameter_code=[\"00010\", \"00060\"],\n", + " computation_type=\"arithmetic_mean\",\n", + ")\n", + "display(x2[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Example 3: Day-of-year mean and median statistics over the period of record\n", + "\n", + "`get_stats_por()` summarizes the full period of record by day of year (and month of year). Here we request both the mean and median daily statistics for discharge at a monitoring location." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x3 = waterdata.get_stats_por(\n", + " monitoring_location_id=\"USGS-02171500\",\n", + " parameter_code=\"00060\",\n", + " computation_type=[\"arithmetic_mean\", \"median\"],\n", + ")\n", + "display(x3[0])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/demos/hydroshare/USGS_dataretrieval_UnitValues_Examples.ipynb b/demos/hydroshare/USGS_WaterData_UnitValues_Examples.ipynb similarity index 51% rename from demos/hydroshare/USGS_dataretrieval_UnitValues_Examples.ipynb rename to demos/hydroshare/USGS_WaterData_UnitValues_Examples.ipynb index 1a734e28..7c1b454f 100644 --- a/demos/hydroshare/USGS_dataretrieval_UnitValues_Examples.ipynb +++ b/demos/hydroshare/USGS_WaterData_UnitValues_Examples.ipynb @@ -4,9 +4,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# USGS 
dataretrieval Python Package `get_iv()` Examples\n", + "# USGS dataretrieval Python Package `get_continuous()` Examples\n", "\n", - "This notebook provides examples of using the Python dataretrieval package to retrieve instantaneous values data for a United States Geological Survey (USGS) monitoring site. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." + "This notebook provides examples of using the Python dataretrieval package to retrieve continuous (instantaneous, or \"unit\") values data for a United States Geological Survey (USGS) monitoring location. The dataretrieval package provides a collection of functions to get data from the USGS Water Data API and other online sources of hydrology and water quality data." ] }, { @@ -39,14 +39,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "from datetime import date\n", - "\n", - "from IPython.display import display\n", - "\n", - "from dataretrieval import nwis\n", - "import dataretrieval.waterdata as waterdata" - ] + "source": "from datetime import date\n\nfrom IPython.display import display\n\nimport dataretrieval.waterdata as waterdata" }, { "cell_type": "markdown", @@ -54,19 +47,18 @@ "source": [ "### Basic Usage\n", "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the `get_iv()` function to retrieve instantaneous streamflow data for a USGS monitoring site from NWIS. The following arguments are supported:\n", + "The dataretrieval package has several functions that allow you to retrieve data from the USGS Water Data API. This example uses the `get_continuous()` function to retrieve continuous (instantaneous) streamflow data for a USGS monitoring location. 
The following arguments are supported:\n", "\n", - "* **sites** (string or list of strings): A list of USGS site identifiers for which to retrieve data.\n", - "* **parameterCd** (string or list of strings): A list of USGS parameter codes for which to retrieve data.\n", - "* **start** (string): The beginning date for a period for which to retrieve data. If the waterdata parameter startDt is supplied, it will overwrite the start parameter.\n", - "* **end** (string): The ending date for a period for which to retrieve data. If the waterdata parameter endDt is supplied, it will overwrite the end parameter." + "* **monitoring_location_id** (string or iterable of strings): One or more unique monitoring location identifiers. An ID combines the agency code with the location number, separated by a hyphen (e.g. `USGS-10109000`).\n", + "* **parameter_code** (string or iterable of strings): One or more 5-digit USGS parameter codes identifying the constituent measured and its units (e.g. `00060` for discharge).\n", + "* **time** (string): The date or time interval for which to retrieve observations, given as an RFC 3339 date-time, a bounded or half-bounded interval (e.g. `2021-09-01/2021-09-30`), or an ISO 8601 duration. Continuous data are requested with `time`, and no more than three years of data may be requested per call. If `time` is omitted, the service returns the most recent year of measurements." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### Example 1: Get unit value data for a specific parameter at a USGS NWIS monitoring site between a begin and end date" + "#### Example 1: Get unit value data for a specific parameter at a USGS monitoring location between a begin and end date" ] }, { @@ -74,19 +66,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "# Set the parameters needed for the web service call\n", - "siteID = \"10109000\" # LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT\n", - "parameterCode = \"00060\" # Discharge\n", - "startDate = \"2021-09-01\"\n", - "endDate = \"2021-09-30\"\n", - "\n", - "# Get the data\n", - "discharge = waterdata.get_continuous(\n", - " monitoring_location_id=siteID, parameter_code=parameterCode, time=f\"{startDate}/{endDate}\"\n", - ")\n", - "print(\"Retrieved \" + str(len(discharge[0])) + \" data values.\")" - ] + "source": "# Set the parameters needed for the web service call\nsiteID = \"USGS-10109000\" # LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT\nparameterCode = \"00060\" # Discharge\nstartDate = \"2021-09-01\"\nendDate = \"2021-09-30\"\n\n# Get the data\ndischarge = waterdata.get_continuous(\n monitoring_location_id=siteID, parameter_code=parameterCode, time=f\"{startDate}/{endDate}\"\n)\nprint(\"Retrieved \" + str(len(discharge[0])) + \" data values.\")" }, { "cell_type": "markdown", @@ -94,9 +74,9 @@ "source": [ "### Interpreting the Result\n", "\n", - "The result of calling the `get_iv()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the values for the observed variable and time period requested. The data frame is indexed by the dates associated with the data values.\n", + "The `get_continuous()` function returns a tuple of two objects: a pandas data frame holding the observed values for the time period requested, and an associated metadata object. 
The data frame is flat, with a default integer index and one row per observation; the observation timestamps are stored in a tz-aware UTC `time` column rather than being used as the index.\n", "\n", - "Once you've got the data frame, there's several useful things you can do to explore the data." + "Once you've got the data frame, there are several useful things you can do to explore the data." ] }, { @@ -129,7 +109,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Get summary statistics for the daily streamflow values." + "Get summary statistics for the streamflow values." ] }, { @@ -163,7 +143,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The other part of the result returned from the `get_iv()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response." + "The other part of the result returned from the `get_continuous()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS Water Data API." 
] }, { @@ -171,9 +151,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "print(\"The query URL used to retrieve the data from NWIS was: \" + discharge[1].url)" - ] + "source": "print(\"The query URL used to retrieve the data from the Water Data API was: \" + discharge[1].url)" }, { "cell_type": "markdown", @@ -181,9 +159,9 @@ "source": [ "### Additional Examples\n", "\n", - "#### Example 2: Get unit values for an individual site and parameter between a start and end date.\n", + "#### Example 2: Get unit values for an individual monitoring location and parameter between a start and end date.\n", "\n", - "NOTE: By default, start and end date are evaluated as local time, and the result is returned with the timestamps in the local time of the monitoring site." + "NOTE: By default, the start and end dates are evaluated as local time. The result is returned with tz-aware UTC timestamps in the `time` column, which can be converted to the local time of the monitoring location with `.dt.tz_convert`." ] }, { @@ -191,23 +169,13 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "site_id = \"05114000\"\n", - "startDate = \"2014-10-10\"\n", - "endDate = \"2014-10-10\"\n", - "\n", - "discharge2 = waterdata.get_continuous(\n", - " monitoring_location_id=site_id, parameter_code=parameterCode, time=f\"{startDate}/{endDate}\"\n", - ")\n", - "print(\"Retrieved \" + str(len(discharge2[0])) + \" data values.\")\n", - "display(discharge2[0])" - ] + "source": "site_id = \"USGS-05114000\"\nstartDate = \"2014-10-10\"\nendDate = \"2014-10-10\"\n\ndischarge2 = waterdata.get_continuous(\n monitoring_location_id=site_id, parameter_code=parameterCode, time=f\"{startDate}/{endDate}\"\n)\nprint(\"Retrieved \" + str(len(discharge2[0])) + \" data values.\")\ndisplay(discharge2[0])" }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### Example 3: Get unit values for an individual site for today" + "#### Example 3: Get unit values for an individual monitoring location for today" ] }, { @@ -230,7 +198,7 @@ 
"source": [ "#### Example 4: Retrieve data using UTC times\n", "\n", - "NOTE: Adding 'Z' to the input time parameters indicates that they are in UTC rather than local time. The time stamps associated with the data returned are still in the local time of the USGS monitoring site." + "NOTE: Adding 'Z' to the input time parameters indicates that they are in UTC rather than local time. The timestamps in the returned `time` column are tz-aware UTC regardless of how the input times are written." ] }, { @@ -252,7 +220,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Example 5: Get unit values for two sites, for a single parameter, between a start and end date" + "#### Example 5: Get unit values for two monitoring locations, for a single parameter, between a start and end date" ] }, { @@ -260,21 +228,13 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "discharge_multisite = waterdata.get_continuous(\n", - " monitoring_location_id=[\"04024430\", \"04024000\"],\n", - " parameter_code=parameterCode,\n", - " time=\"2013-10-01/2013-10-01\",\n", - ")\n", - "print(\"Retrieved \" + str(len(discharge_multisite[0])) + \" data values.\")\n", - "display(discharge_multisite[0])" - ] + "source": "discharge_multisite = waterdata.get_continuous(\n monitoring_location_id=[\"USGS-04024430\", \"USGS-04024000\"],\n parameter_code=parameterCode,\n time=\"2013-10-01/2013-10-01\",\n)\nprint(\"Retrieved \" + str(len(discharge_multisite[0])) + \" data values.\")\ndisplay(discharge_multisite[0])" }, { "cell_type": "markdown", "metadata": {}, "source": [ - "The following example is the same as the previous example but with multi index turned off (multi_index=False)" + "The following example requests the same two-location data as the previous example." 
] }, { @@ -282,16 +242,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "discharge_multisite = waterdata.get_continuous(\n", - " monitoring_location_id=[\"04024430\", \"04024000\"],\n", - " parameter_code=parameterCode,\n", - " time=\"2013-10-01/2013-10-01\",\n", - " \n", - ")\n", - "print(\"Retrieved \" + str(len(discharge_multisite[0])) + \" data values.\")\n", - "display(discharge_multisite[0])" - ] + "source": "discharge_multisite = waterdata.get_continuous(\n monitoring_location_id=[\"USGS-04024430\", \"USGS-04024000\"],\n parameter_code=parameterCode,\n time=\"2013-10-01/2013-10-01\",\n \n)\nprint(\"Retrieved \" + str(len(discharge_multisite[0])) + \" data values.\")\ndisplay(discharge_multisite[0])" } ], "metadata": { diff --git a/demos/hydroshare/USGS_dataretrieval_DailyValues_Examples.ipynb b/demos/hydroshare/USGS_dataretrieval_DailyValues_Examples.ipynb deleted file mode 100644 index 71ea01bf..00000000 --- a/demos/hydroshare/USGS_dataretrieval_DailyValues_Examples.ipynb +++ /dev/null @@ -1,300 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# USGS dataretrieval Python Package `get_dv()` Examples\n", - "\n", - "This notebook provides examples of using the Python dataretrieval package to retrieve daily streamflow data for a United States Geological Survey (USGS) monitoring site. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Install the Package\n", - "\n", - "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install dataretrieval" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load the package so you can use it along with other packages used in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.display import display\n", - "\n", - "from dataretrieval import nwis\n", - "import dataretrieval.waterdata as waterdata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Basic Usage\n", - "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the `get_dv()` function to retrieve daily streamflow data for a USGS monitoring site from NWIS. The following arguments are supported:\n", - "\n", - "Arguments (Additional arguments, if supplied, will be used as query parameters)\n", - "\n", - "* **sites** (string or list of strings): A list of USGS site identifiers for which to retrieve data.\n", - "* **parameterCd** (list of strings): A list of USGS parameter codes for which to retrieve data.\n", - "* **statCd** (list of strings): A list of USGS statistic codes for which to retrieve data.\n", - "* **start** (string): The beginning date for a period for which to retrieve data. If the waterdata parameter startDT is supplied, it will overwrite the start parameter.\n", - "* **end** (string): The ending date for a period for which to retrieve data. If the waterdata parameter endDT is supplied, it will overwrite the end parameter." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Example 1: Get daily value data for a specific parameter at a single USGS NWIS monitoring site between a begin and end date." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Set the parameters needed to retrieve data\n", - "siteNumber = \"10109000\" # LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT\n", - "parameterCode = \"00060\" # Discharge\n", - "startDate = \"2020-10-01\"\n", - "endDate = \"2021-09-30\"\n", - "\n", - "# Retrieve the data\n", - "dailyStreamflow = waterdata.get_daily(\n", - " monitoring_location_id=siteNumber, parameter_code=parameterCode, time=f\"{startDate}/{endDate}\"\n", - ")\n", - "print(\"Retrieved \" + str(len(dailyStreamflow[0])) + \" data values.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Interpreting the Result\n", - "\n", - "The result of calling the `get_dv()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the daily values for the observed variable and time period requested. The data frame is indexed by the dates associated with the data values.\n", - "\n", - "Once you've got the data frame, there's several useful things you can do to explore the data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Display the data frame as a table\n", - "display(dailyStreamflow[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show the data types of the columns in the resulting data frame." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(dailyStreamflow[0].dtypes)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Get summary statistics for the daily streamflow values." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dailyStreamflow[0].describe()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Make a quick time series plot." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ax = dailyStreamflow[0].plot(x=\"time\", y=\"value\")\n", - "ax.set_xlabel(\"Date\")\n", - "ax.set_ylabel(\"Streamflow (cfs)\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The other part of the result returned from the `get_dv()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\n", - " \"The query URL used to retrieve the data from NWIS was: \" + dailyStreamflow[1].url\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Additional Examples\n", - "\n", - "Example 2: Get daily mean and max discharge and temperature values for a site between a begin and end date.\n", - "\n", - "Parameter Code: 00010 = temperature, 00060 = discharge\n", - "See https://help.waterdata.usgs.gov/codes-and-parameters/parameters\n", - "\n", - "Statistic Code: 00001 = Maximum, 00003 = Mean\n", - "See https://help.waterdata.usgs.gov/stat_code\n", - "\n", - "NOTE: There's not full overlap in the availability of data for temperature and discharge for both statistics at this site. When data for one statistic is not available, a \"NaN\" value is returned in the data frame." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "siteID = \"04085427\"\n", - "dailyQAndT = waterdata.get_daily(\n", - " monitoring_location_id=siteID,\n", - " parameter_code=[\"00010\", \"00060\"],\n", - " time=f\"{startDate}/{endDate}\",\n", - " statistic_id=[\"00001\", \"00003\"],\n", - ")\n", - "display(dailyQAndT[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Example 3: Get daily mean and max discharge and temperature values for multiple sites between a begin and end date" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dailyMultiSites = waterdata.get_daily(\n", - " monitoring_location_id=[\"01491000\", \"01645000\"],\n", - " parameter_code=[\"00010\", \"00060\"],\n", - " time=\"2012-01-01/2012-06-30\",\n", - " statistic_id=[\"00001\", \"00003\"],\n", - ")\n", - "display(dailyMultiSites[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following example is the same as the previous example but with multi index turned off (multi_index=False)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dailyMultiSites = waterdata.get_daily(\n", - " monitoring_location_id=[\"01491000\", \"01645000\"],\n", - " parameter_code=[\"00010\", \"00060\"],\n", - " time=\"2012-01-01/2012-06-30\",\n", - " statistic_id=[\"00001\", \"00003\"],\n", - " \n", - ")\n", - "display(dailyMultiSites[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Example 4: Test for a site that is not active - returns an empty DataFrame." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "siteID = \"05212700\"\n", - "notActive = waterdata.get_daily(\n", - " monitoring_location_id=siteID, parameter_code=\"00060\", time=\"2014-01-01/2014-01-07\"\n", - ")\n", - "display(notActive[0])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" - } - }, - "nbformat": 4, - "nbformat_minor": 1 -} diff --git a/demos/hydroshare/USGS_dataretrieval_GroundwaterLevels_Examples.ipynb b/demos/hydroshare/USGS_dataretrieval_GroundwaterLevels_Examples.ipynb deleted file mode 100644 index 46c4edbd..00000000 --- a/demos/hydroshare/USGS_dataretrieval_GroundwaterLevels_Examples.ipynb +++ /dev/null @@ -1,300 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# USGS dataretrieval Python Package `get_gwlevels()` Examples\n", - "\n", - "This notebook provides examples of using the Python dataretrieval package to retrieve groundwater level data for a United States Geological Survey (USGS) monitoring site. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Install the Package\n", - "\n", - "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install dataretrieval" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load the package so you can use it along with other packages used in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.display import display\n", - "\n", - "from dataretrieval import nwis\n", - "import dataretrieval.waterdata as waterdata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Basic Usage\n", - "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the `get_gwlevels()` function to retrieve groundwater level data from USGS NWIS. The following arguments are supported:\n", - "\n", - "Arguments (Additional parameters, if supplied, will be used as query parameters)\n", - "\n", - "* **sites** (string or list of strings): A list of USGS site identifiers for which to retrieve data.\n", - "* **start** (string): The beginning date for a period for which to retrieve data. If the waterdata parameter begin_date is supplied, it will overwrite the start parameter (defaults to '1851-01-01')\n", - "* **end** (string): The ending date for a period for which to retrieve data. If the waterdata parameter end_date is supplied, it will overwrite the end parameter." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Example 1: Get groundwater level data for a single monitoring site." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Set the parameters needed to retrieve data\n", - "site_id = \"434400121275801\"\n", - "\n", - "# Retrieve the data\n", - "data = waterdata.get_field_measurements(monitoring_location_id=site_id)\n", - "print(\"Retrieved \" + str(len(data[0])) + \" data values.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Interpreting the Result\n", - "\n", - "The result of calling the `get_gwlevels()` function is an object that contains a Pandas data frame and an associated metadata object. The Pandas data frame contains the data requested. The data frame is indexed by the dates associated with the data values.\n", - "\n", - "Once you've got the data frame, there's several useful things you can do to explore the data." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Display the data frame as a table" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "display(data[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show the data types of the columns in the resulting data frame." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(data[0].dtypes)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Get summary statistics for the daily streamflow values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data[0][\"value\"].describe()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Make a quick time series plot." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ax = data[0].plot(x=\"time\", y=\"value\", style=\".\")\n", - "ax.set_xlabel(\"Date\")\n", - "ax.set_ylabel(\"Water Level (feet below land surface)\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The other part of the result returned from the `get_gwlevels()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"The query URL used to retrieve the data from NWIS was: \" + data[1].url)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Additional Examples\n", - "\n", - "You can also request data for multiple sites at the same time.\n", - "\n", - "Example 2: Get data for multiple sites. Site numbers are specified using a comma delimited list of strings." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "site_ids = [\"434400121275801\", \"375907091432201\"]\n", - "data2 = waterdata.get_field_measurements(monitoring_location_id=site_ids)\n", - "print(\"Retrieved \" + str(len(data2[0])) + \" data values.\")\n", - "display(data2[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following example is the same as the previous example but with multi index turned off (multi_index=False)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "site_ids = [\"434400121275801\", \"375907091432201\"]\n", - "data2 = waterdata.get_field_measurements(monitoring_location_id=site_ids, )\n", - "print(\"Retrieved \" + str(len(data2[0])) + \" data values.\")\n", - "display(data2[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Some groundwater level data have dates that include only a year or a month and year, but no day.\n", - "\n", - "Example 3: Retrieve groundwater level data that have dates without a day." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data3 = waterdata.get_field_measurements(monitoring_location_id=\"425957088141001\")\n", - "print(\"Retrieved \" + str(len(data3[0])) + \" data values.\")\n", - "\n", - "# Print the date/time index values, which show up as NaT because\n", - "# the dates can't be converted to a date/time data type\n", - "print(data3[0].index)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you want to see the USGS RDB (delimited text) version of the data just retrieved, you can get the URL for the request that was sent to the USGS web service." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Print the URL used to retrieve the data\n", - "print(\"You can examine the data retrieved from NWIS at: \" + data3[1].url)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also retrieve data for a site within a specified time window by specifying a start date and an end date.\n", - "\n", - "Example 4: Get groundwater level data for a site between a startDate and endDate." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data4 = waterdata.get_field_measurements(monitoring_location_id=site_id, time=\"1980-01-01/2000-12-31\")\n", - "print(\"Retrieved \" + str(len(data4[0])) + \" data values.\")" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/demos/hydroshare/USGS_dataretrieval_Measurements_Examples.ipynb b/demos/hydroshare/USGS_dataretrieval_Measurements_Examples.ipynb deleted file mode 100644 index d5b57e89..00000000 --- a/demos/hydroshare/USGS_dataretrieval_Measurements_Examples.ipynb +++ /dev/null @@ -1,201 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# USGS dataretrieval Python Package `get_discharge_measurements()` Examples\n", - "\n", - "This notebook provides examples of using the Python dataretrieval package to retrieve surface water discharge measurement data for a United States Geological Survey (USGS) monitoring site. 
The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Install the Package\n", - "\n", - "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install dataretrieval" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load the package so you can use it along with other packages used in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.display import display\n", - "\n", - "from dataretrieval import nwis\n", - "import dataretrieval.waterdata as waterdata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Basic Usage\n", - "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the `get_discharge_measurements()` function to retrieve surface water discharge measurements for a USGS monitoring site from NWIS. The function has the following arguments:\n", - "\n", - "Arguments (Additional arguments, if supplied, will be used as query parameters)\n", - "\n", - "* **sites** (list of strings): A list of USGS site codes to retrieve data for. If the qwdata parameter site_no is supplied, it will overwrite the sites parameter.\n", - "* **start** (string): The beginning date of a period for which to retrieve measurements. If the qwdata parameter begin_date is supplied, it will overwrite the start parameter.\n", - "* **end** (string): The ending date of a period for which to retrieve measurements. 
If the qwdata parameter end_date is supplied, it will overwrite the end parameter." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Example 1: Get all of the surface water measurements for a single site" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "measurements1 = waterdata.get_field_measurements(monitoring_location_id=\"10109000\")\n", - "print(\"Retrieved \" + str(len(measurements1[0])) + \" data values.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Interpreting the Result\n", - "\n", - "The result of calling the `get_discharge_measurements()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the discharge measurements for the time period requested.\n", - "\n", - "Once you've got the data frame, there's several useful things you can do to explore the data." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Display the data frame as a table" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "display(measurements1[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show the data types of the columns in the resulting data frame." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(measurements1[0].dtypes)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The other part of the result returned from the `get_discharge_measurements()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. 
The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"The query URL used to retrieve the data from NWIS was: \" + measurements1[1].url)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Additional Examples\n", - "\n", - "Example 2: Get all of the surface water measurements between a start and end date" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "measurements2 = waterdata.get_field_measurements(\n", - " monitoring_location_id=\"10109000\", time=\"2019-01-01/2019-12-31\"\n", - ")\n", - "print(\"Retrieved \" + str(len(measurements2[0])) + \" data values.\")\n", - "display(measurements2[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Example 3: Get all of the surface water measurements for multiple sites" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "measurements3 = waterdata.get_field_measurements(monitoring_location_id=[\"01594440\", \"040851325\"])\n", - "print(\"Retrieved \" + str(len(measurements3[0])) + \" data values.\")\n", - "display(measurements3[0])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/demos/hydroshare/USGS_dataretrieval_Peaks_Examples.ipynb b/demos/hydroshare/USGS_dataretrieval_Peaks_Examples.ipynb deleted file mode 100644 index 8a045bad..00000000 --- 
a/demos/hydroshare/USGS_dataretrieval_Peaks_Examples.ipynb +++ /dev/null @@ -1,213 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# USGS dataretrieval Python Package `get_discharge_peaks()` Examples\n", - "\n", - "This notebook provides examples of using the Python dataretrieval package to retrieve streamflow peak data for United States Geological Survey (USGS) monitoring sites. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Install the Package\n", - "\n", - "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install dataretrieval" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load the package so you can use it along with other packages used in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.display import display\n", - "\n", - "from dataretrieval import nwis\n", - "from dataretrieval import waterdata\n", - "import dataretrieval.waterdata as waterdata\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Basic Usage\n", - "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the `get_discharge_peaks()` function to retrieve peak streamflow data for a USGS monitoring site from NWIS. 
The function has the following arguments:\n", - "\n", - "Arguments (Additional parameters, if supplied, will be used as query parameters)\n", - "\n", - "* **sites** (list of strings): A list of USGS site identifiers for which data will be retrieved. If the waterdata parameter site_no is supplied, it will overwrite the sites parameter.\n", - "* **start** (string): A beginning date for the period for which data will be retrieved. If the waterdata parameter begin_date is supplied, it will overwrite the start parameter.\n", - "* **end** (string): An ending date for the period for which data will be retrieved. If the waterdata parameter end_date is supplied, it will overwrite the end parameter." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Example 1: Retrieve streamflow peak data for two USGS monitoring sites" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "site_ids = [\"01594440\", \"040851325\"]\n", - "peak_data = nwis.get_discharge_peaks(site_ids)\n", - "print(\"Retrieved \" + str(len(peak_data[0])) + \" data values.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Interpreting the Result\n", - "\n", - "The result of calling the `get_discharge_peaks()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the discharge peak values for the requested site(s).\n", - "\n", - "Once you've got the data frame, there's several useful things you can do to explore the data.\n", - "\n", - "Display the data frame as a table." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "display(peak_data[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show the data types of the columns in the resulting data frame." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(peak_data[0].dtypes)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The other part of the result returned from the `get_dv()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"The query URL used to retrieve the data from NWIS was: \" + peak_data[1].url)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following example is the same as the previous example but with multi index turned off (multi_index=False)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "site_ids = [\"01594440\", \"040851325\"]\n", - "peak_data = nwis.get_discharge_peaks(site_ids, multi_index=False)\n", - "print(\"Retrieved \" + str(len(peak_data[0])) + \" data values.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Additional Examples\n", - "\n", - "Example 2: Retrieve discharge peaks for a single site." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "stations = \"06011000\"\n", - "data3 = nwis.get_discharge_peaks(stations)\n", - "display(data3[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Example 3: Retrieve peak discharge data for a monitoring site between two dates" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data4 = nwis.get_discharge_peaks(stations, start=\"1953-01-01\", end=\"1960-01-01\")\n", - "display(data4[0])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/demos/hydroshare/USGS_dataretrieval_Ratings_Examples.ipynb b/demos/hydroshare/USGS_dataretrieval_Ratings_Examples.ipynb deleted file mode 100644 index 0daeba44..00000000 --- a/demos/hydroshare/USGS_dataretrieval_Ratings_Examples.ipynb +++ /dev/null @@ -1,192 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# USGS dataretrieval Python Package `get_ratings()` Examples\n", - "\n", - "This notebook provides examples of using the Python dataretrieval package to retrieve rating curve data for a United States Geological Survey (USGS) streamflow gage. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Install the Package\n", - "\n", - "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install dataretrieval" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load the package so you can use it along with other packages used in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.display import display\n", - "\n", - "from dataretrieval import nwis\n", - "from dataretrieval import waterdata\n", - "import dataretrieval.waterdata as waterdata\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Basic Usage\n", - "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the `get_ratings()` function to retrieve rating curve data for a monitoring site from USGS NWIS. The following arguments are available:\n", - "\n", - "Arguments (Additional arguments, if supplied, will be used as query parameters)\n", - "\n", - "* **site** (string): A USGS site number. This is usually an 8 digit number as a string. If the nwis parameter site_no is supplied, it will overwrite the site parameter.\n", - "* **base** (string): Can be \"base\", \"corr\", or \"exsa\"\n", - "* **county** (string): County IDs from county lookup or \"ALL\"\n", - "* **categories** (Listlike): List or comma delimited string of Two-letter category abbreviations\n", - "\n", - "NOTE: Not all active USGS streamflow gages have traditional rating curves that relate stage to flow." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Example 1: Get rating data for an NWIS Site" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Specify the USGS site number/code\n", - "site_id = \"10109000\"\n", - "\n", - "# Get the rating curve data\n", - "ratingData = nwis.get_ratings(site=site_id, file_type=\"exsa\")\n", - "print(\"Retrieved \" + str(len(ratingData[0])) + \" data values.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Interpreting the Result\n", - "\n", - "The result of calling the `get_ratings()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the rating curve data for the requested site.\n", - "\n", - "Once you've got the data frame, there's several useful things you can do to explore the data. You can execute the following code to display the data frame as a table.\n", - "\n", - "If the \"type\" parameter in the request has a value of \"base,\" then the columns in the data frame are as follows:\n", - "* INDEP - typically the gage height in feet\n", - "* DEP - typically the streamflow in cubic feet per second\n", - "* STOR - where an \"*\" indicates that the pair are a fixed point of the rating curve\n", - "\n", - "If the \"type\" parameter is specified as \"exsa,\" then an additional column called SHIFT is included that indicates the current shift in the rating for that value of INDEP.\n", - "\n", - "If the \"type\" parameter is specified as \"corr,\" then the columns are as follows:\n", - "* INDEP - typically gage height in feet\n", - "* CORR - the correction for that value\n", - "* CORRINDEP - the corrected value for CORR" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "display(ratingData[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - 
"Show the data types of the columns in the resulting data frame" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(ratingData[0].dtypes)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The other part of the result returned from the `get_ratings()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"The query URL used to retrieve the data from NWIS was: \" + ratingData[1].url)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Example 2: Get rating data for a different NWIS site by changing the site_id" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "site_id = \"01594440\"\n", - "data = nwis.get_ratings(site=site_id, file_type=\"base\")\n", - "print(\"Retrieved \" + str(len(data[0])) + \" data values.\")" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 2 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython2" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/demos/hydroshare/USGS_dataretrieval_SiteInfo_Examples.ipynb b/demos/hydroshare/USGS_dataretrieval_SiteInfo_Examples.ipynb deleted file mode 100644 index 3e787d57..00000000 --- a/demos/hydroshare/USGS_dataretrieval_SiteInfo_Examples.ipynb +++ 
/dev/null @@ -1,242 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# USGS dataretrieval Python Package `get_info()` Examples\n", - "\n", - "This notebook provides examples of using the Python dataretrieval package to retrieve information about a United States Geological Survey (USGS) monitoring site. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Install the Package\n", - "\n", - "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install dataretrieval" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load the package so you can use it along with other packages used in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.display import display\n", - "\n", - "from dataretrieval import nwis\n", - "import dataretrieval.waterdata as waterdata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Basic Usage\n", - "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the `get_info()` function to retrieve information about USGS monitoring sites. 
The function has several arguments, depending on the result you want to retrieve:\n", - "\n", - "#### Major Arguments (Additional arguments, if supplied, will be used as query parameters)\n", - "\n", - "Note: Must specify one major argument.\n", - "\n", - "* **sites** (string or list of strings): A list of site numbers. Sites may be prefixed with an optional agency code followed by a colon.\n", - "* **stateCd** (string): U.S. postal service (2-digit) state code. Only 1 state can be specified per request.\n", - "* **huc** (string or list of strings): A list of hydrologic unit codes (HUC) or aggregated watersheds. Only 1 major HUC can be specified per request, or up to 10 minor HUCs. A major HUC has two digits.\n", - "* **bBox** (list): A contiguous range of decimal latitude and longitude, starting with the west longitude, then the south latitude, then the east longitude, and then the north latitude with each value separated by a comma. The product of the range of latitude range and longitude cannot exceed 25 degrees. Whole or decimal degrees must be specified, up to six digits of precision. Minutes and seconds are not allowed.\n", - "* **countyCd** (string or list of strings): A list of county numbers, in a 5 digit numeric format. The first two digits of a county's code are the FIPS State Code. (url: https://help.waterdata.usgs.gov/code/county_query?fmt=html)\n", - "\n", - "#### Minor Arguments\n", - "\n", - "* **startDt** (string): Selects sites based on whether data was collected at a point in time beginning after startDt (start date). Dates must be in ISO-8601 Calendar Date format (for example: 1990-01-01).\n", - "* **endDt** (string)\n", - "* **period** (string): Selects sites based on whether or not they were active between now and a time in the past. 
For example, period=P10W will select sites active in the last ten weeks.\n", - "* **modifiedSince** (string): Returns only sites where site attributes or period of record data have changed during the request period.\n", - "* **parameterCd** (string or list of strings): Returns only site data for those sites containing the requested USGS parameter codes.\n", - "* **siteType** (string or list of strings): Restricts sites to those having one or more major and/or minor site types, such as stream, spring or well. For a list of all valid site types see https://help.waterdata.usgs.gov/site_tp_cd. For example, siteType='ST' returns streams only.\n", - "\n", - "#### Formatting Parameters\n", - "\n", - "NOTE: The following parameters are available via the USGS data retrieval services, but are not yet functional in the dataretrieval Python package\n", - "\n", - "* **siteOutput** (string 'basic' or 'expanded'): Indicates the richness of metadata you want for site attributes. Note that for visually oriented formats like Google Map format, this argument has no meaning. Note: for performance reasons, siteOutput='expanded' cannot be used if seriesCatalogOutput=true or with any values for outputDataTypeCd.\n", - "* **seriesCatalogOutput** (boolean): A switch that provides detailed period of record information for certain output formats. 
The period of record indicates date ranges for a certain kind of information about a site, for example the start and end dates for a site's daily mean streamflow.\n", - "\n", - "For additional parameter options see https://waterservices.usgs.gov/docs/site-service/site-service-details" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Example 1: Get site information for a USGS NWIS monitoring site" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Specify the site you want to retrieve information for\n", - "siteID = \"10109000\"\n", - "\n", - "# Get the site information\n", - "siteINFO = waterdata.get_monitoring_locations(monitoring_location_id=siteID)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Interpreting the Result\n", - "\n", - "The result of calling the `get_info()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the site information for the requested site.\n", - "\n", - "Once you've got the data frame, there's several useful things you can do to explore the information about the site." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Display the data frame as a table\n", - "display(siteINFO[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show the data types of the columns in the resulting data frame." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(siteINFO[0].dtypes)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The other part of the result returned from the `get_info()` function is a metadata object that contains information about the query that was executed to return the data. 
For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"The query URL used to retrieve the data from NWIS was: \" + siteINFO[1].url)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Additional Examples\n", - "\n", - "#### Example 2: Get site information for multiple sites in a list" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create a list of the site identifiers you want to retrieve information for\n", - "siteIDs = [\"05114000\", \"09423350\"]\n", - "\n", - "# Get the site information\n", - "siteINFO_multi = waterdata.get_monitoring_locations(monitoring_location_id=siteIDs)\n", - "display(siteINFO_multi[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 3: Get site information for all sites within a state" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the site information for a state\n", - "siteINFO_state = waterdata.get_monitoring_locations(state_code=\"UT\")\n", - "display(siteINFO_state[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 4: Get site information for all \"stream\" sites within a USGS HUC" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create a list of hucs for which to query sites\n", - "huc_list = [\"16010203\"]\n", - "\n", - "# Get the site information - limit to stream sites\n", - "siteINFO_huc = waterdata.get_monitoring_locations(hydrologic_unit_code=huc_list, site_type_code=\"ST\")\n", - 
"display(siteINFO_huc[0])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/demos/hydroshare/USGS_dataretrieval_SiteInventory_Examples.ipynb b/demos/hydroshare/USGS_dataretrieval_SiteInventory_Examples.ipynb deleted file mode 100644 index 645ff1cb..00000000 --- a/demos/hydroshare/USGS_dataretrieval_SiteInventory_Examples.ipynb +++ /dev/null @@ -1,232 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# USGS dataretrieval Python Package `what_sites()` Examples\n", - "\n", - "This notebook provides examples of using the Python dataretrieval package to search NWIS for sites within a region with specific data. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Install the Package\n", - "\n", - "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install dataretrieval" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load the package so you can use it along with other packages used in this notebook." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.display import display\n", - "\n", - "from dataretrieval import nwis\n", - "import dataretrieval.waterdata as waterdata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Basic Usage\n", - "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the `what_sites()` function to search NWIS for sites within a region with specific data. The function has several arguments, depending on the result you want to retrieve.\n", - "\n", - "Note: Must specify one major argument.\n", - "\n", - "#### Major Arguments (Additional arguments, if supplied, will be used as query parameters)\n", - "\n", - "* **sites** (string or list): A list of site numbers. Sites may be prefixed with an optional agency code followed by a colon.\n", - "* **stateCd** (string): U.S. postal service (2-digit) state code. Only 1 state can be specified per request.\n", - "* **huc** (string or list): A list of hydrologic unit codes (HUC) or aggregated watersheds. Only 1 major HUC can be specified per request, or up to 10 minor HUCs. A major HUC has two digits.\n", - "* **bBox** (list): A contiguous range of decimal latitude and longitude, starting with the west longitude, then the south latitude, then the east longitude, and then the north latitude with each value separated by a comma. The product of the range of latitude range and longitude cannot exceed 25 degrees. Whole or decimal degrees must be specified, up to six digits of precision. Minutes and seconds are not allowed.\n", - "* **countyCd** (string or list): A list of county numbers, in a 5 digit numeric format. The first two digits of a county's code are the FIPS State Code. 
(url: https://help.waterdata.usgs.gov/code/county_query?fmt=html)\n", - "\n", - "#### Minor Arguments\n", - "\n", - "* **startDt** (string): Selects sites based on whether data was collected at a point in time beginning after startDt (start date). Dates must be in ISO-8601 Calendar Date format (for example: 1990-01-01).\n", - "* **endDt** (string)\n", - "* **period** (string): Selects sites based on whether or not they were active between now and a time in the past. For example, period=P10W will select sites active in the last ten weeks.\n", - "* **modifiedSince** (string): Returns only sites where site attributes or period of record data have changed during the request period.\n", - "* **parameterCd** (string or list): Returns only site data for those sites containing the requested USGS parameter codes.\n", - "* **siteType** (string or list): Restricts sites to those having one or more major and/or minor site types, such as stream, spring or well. For a list of all valid site types see https://help.waterdata.usgs.gov/site_tp_cd. For example, siteType='ST' returns streams only.\n", - "\n", - "#### Formatting Parameters\n", - "\n", - "* **siteOutput** (string 'basic' or 'expanded'): Indicates the richness of metadata you want for site attributes. Note that for visually oriented formats like Google Map format, this argument has no meaning. Note: for performance reasons, siteOutput='expanded' cannot be used if seriesCatalogOutput=true or with any values for outputDataTypeCd.\n", - "* **seriesCatalogOutput** (boolean): A switch that provides detailed period of record information for certain output formats. 
The period of record indicates date ranges for a certain kind of information about a site, for example the start and end dates for a site's daily mean streamflow.\n", - "\n", - "For additional parameter options see https://waterservices.usgs.gov/docs/site-service/site-service-details" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 1: Retrieve information about sites in Ohio where phosphorus data was collected" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "siteListPhos = waterdata.get_monitoring_locations(state_code=\"OH\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Interpreting the Result\n", - "\n", - "The result of calling the `what_sites()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the requestes site inventory data.\n", - "\n", - "Once you've got the data frame, there's several useful things you can do to explore the data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Display the data frame as a table\n", - "display(siteListPhos[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The other part of the result returned from the `what_sites()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"The query URL used to retrieve the data from NWIS was: \" + siteListPhos[1].url)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Additional Examples\n", - "\n", - "#### Example 2: Retrieve site information for a single site" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "oneSite = waterdata.get_monitoring_locations(monitoring_location_id=\"05114000\")\n", - "display(oneSite[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 3: Retrieve site information for a single site and show the result with expanded output" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "oneSite = waterdata.get_monitoring_locations(monitoring_location_id=\"05114000\")\n", - "display(oneSite[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 4: Retrieve site information for sites in Utah with daily values data falling within a specified date range" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "UTsites = waterdata.get_monitoring_locations(\n", - " state_code=\"UT\"\n", - ")\n", - "display(UTsites[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 5: Retrieve site information for a single site and show the series catalog output\n", - "\n", - "The series catalog output is a list of the parameters that have been collected at that site" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "oneSite = waterdata.get_time_series_metadata(monitoring_location_id=\"05114000\")\n", - "display(oneSite[0])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": 
"Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/demos/hydroshare/USGS_dataretrieval_Statistics_Examples.ipynb b/demos/hydroshare/USGS_dataretrieval_Statistics_Examples.ipynb deleted file mode 100644 index 808d2f9d..00000000 --- a/demos/hydroshare/USGS_dataretrieval_Statistics_Examples.ipynb +++ /dev/null @@ -1,240 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# USGS dataretrieval Python Package `get_stats()` Examples\n", - "\n", - "This notebook provides examples of using the Python dataretrieval package to retrieve statistics for observed variables at a United States Geological Survey (USGS) monitoring site. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Install the Package\n", - "\n", - "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install dataretrieval" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load the package so you can use it along with other packages used in this notebook." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.display import display\n", - "from matplotlib import ticker\n", - "\n", - "from dataretrieval import nwis\n", - "from dataretrieval import waterdata\n", - "import dataretrieval.waterdata as waterdata\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Basic Usage\n", - "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the `get_stats()` function to retrieve statistics for observed variable(s) for a USGS monitoring site from USGS NWIS. The following arguments are available:\n", - "\n", - "Arguments (Additional parameters, if supplied, will be used as query parameters).\n", - "\n", - "* **sites** (string or list of strings): A string or list of strings contining the USGS site identifiers for which to retrive data.\n", - "* **parameterCd** (string or list of strings): A list of USGS parameter codes for which to retrieve data.\n", - "* **statReportType** (string): The aggregation period for which statistics should be reported. Can be specified as 'daily' (default), 'monthly', or 'annual'.\n", - "* **statTypeCd** (string): The type of statistic to be returned in the result. 
Can be specified as 'all', 'mean', 'max', 'min', or 'median'" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 1: Get all of the annual mean discharge data for a single site" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Set the parameters needed to retrieve data\n", - "siteNumber = \"02319394\"\n", - "parameterCode = \"00060\" # Discharge\n", - "\n", - "# Retrieve the statistics\n", - "x1 = nwis.get_stats(\n", - " sites=siteNumber, parameterCd=parameterCode, statReportType=\"annual\"\n", - ")\n", - "print(\"Retrieved \" + str(len(x1[0])) + \" data values.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Interpreting the Result\n", - "\n", - "The result of calling the `get_stats()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the statistics values for the site and observed variable requested.\n", - "\n", - "Once you've got the data frame, there's several useful things you can do to explore the data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Display the data frame as a table\n", - "display(x1[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show the data types of the columns in the resulting data frame." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(x1[0].dtypes)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Make a quick time series plot of the annual mean values." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ax = x1[0].plot(x=\"year_nu\", y=\"mean_va\")\n", - "ax.xaxis.set_major_formatter(ticker.FormatStrFormatter(\"%d\"))\n", - "ax.set_xlabel(\"Year\")\n", - "ax.set_ylabel(\"Annual mean discharge (cfs)\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The other part of the result returned from the `get_stats()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"The query URL used to retrieve the data from NWIS was: \" + x1[1].url)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Additional Examples" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 2: Get all of the annual mean discharge data for two sites\n", - "\n", - "Note: Passing multiple parameters (temperature and flow) looks like it returns only what is available (in this example flow, 00060)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "x2 = nwis.get_stats(\n", - " sites=[\"02319394\", \"02171500\"],\n", - " parameterCd=[\"00010\", \"00060\"],\n", - " statReportType=\"annual\",\n", - ")\n", - "display(x2[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 3: Request daily mean and median values for temperature and discharge for a site for years between 2000 and 2007\n", - "\n", - "NOTE: The startDt and endDt parameters are not directly supported by this function but 
are turned into query parameters in the request to USGS NWIS, which means that they can be used to limit the time window requested." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "x3 = nwis.get_stats(\n", - " sites=\"02171500\",\n", - " parameterCd=[\"00010\", \"00060\"],\n", - " statReportType=\"daily\",\n", - " statTypeCd=[\"mean\", \"median\"],\n", - " startDt=\"2000\",\n", - " endDt=\"2007\",\n", - ")\n", - "display(x3[0])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/demos/hydroshare/USGS_dataretrieval_WaterSamples_Examples.ipynb b/demos/hydroshare/USGS_dataretrieval_WaterSamples_Examples.ipynb deleted file mode 100644 index 378f925b..00000000 --- a/demos/hydroshare/USGS_dataretrieval_WaterSamples_Examples.ipynb +++ /dev/null @@ -1,349 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# USGS dataretrieval Python Package `get_samples()` Examples\n", - "\n", - "This notebook provides examples of using the Python dataretrieval package to retrieve water quality sample data for United States Geological Survey (USGS) monitoring sites. The dataretrieval package provides a collection of functions to get data from the USGS Samples database and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Install the Package\n", - "\n", - "Use the following code to install the package if it doesn't exist already within your Jupyter Python environment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install dataretrieval" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load the package so you can use it along with other packages used in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.display import display\n", - "\n", - "from dataretrieval import waterdata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Basic Usage\n", - "\n", - "The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the `get_samples()` function to retrieve water quality sample data for USGS monitoring sites from Samples. The following arguments are supported:\n", - "\n", - "* **ssl_check** : boolean, optional\n", - " Check the SSL certificate.\n", - "* **service** : string\n", - " One of the available Samples services: \"results\", \"locations\", \"activities\",\n", - " \"projects\", or \"organizations\". Defaults to \"results\".\n", - "* **profile** : string\n", - " One of the available profiles associated with a service. 
Options for each\n", - " service are:\n", - " results - \"fullphyschem\", \"basicphyschem\",\n", - " \"fullbio\", \"basicbio\", \"narrow\",\n", - " \"resultdetectionquantitationlimit\",\n", - " \"labsampleprep\", \"count\"\n", - " locations - \"site\", \"count\"\n", - " activities - \"sampact\", \"actmetric\",\n", - " \"actgroup\", \"count\"\n", - " projects - \"project\", \"projectmonitoringlocationweight\"\n", - " organizations - \"organization\", \"count\"\n", - "* **activityMediaName** : string or list of strings, optional\n", - " Name or code indicating environmental medium in which sample was taken.\n", - " Check the `activityMediaName_lookup()` function in this module for all\n", - " possible inputs.\n", - " Example: \"Water\".\n", - "* **activityStartDateLower** : string, optional\n", - " The start date if using a date range. Takes the format YYYY-MM-DD.\n", - " The logic is inclusive, i.e. it will also return results that\n", - " match the date. If left as None, will pull all data on or before\n", - " activityStartDateUpper, if populated.\n", - "* **activityStartDateUpper** : string, optional\n", - " The end date if using a date range. Takes the format YYYY-MM-DD.\n", - " The logic is inclusive, i.e. it will also return results that\n", - " match the date. If left as None, will pull all data after\n", - " activityStartDateLower up to the most recent available results.\n", - "* **activityTypeCode** : string or list of strings, optional\n", - " Text code that describes type of field activity performed.\n", - " Example: \"Sample-Routine, regular\".\n", - "* **characteristicGroup** : string or list of strings, optional\n", - " Characteristic group is a broad category of characteristics\n", - " describing one or more results. 
Check the `characteristicGroup_lookup()`\n", - " function in this module for all possible inputs.\n", - " Example: \"Organics, PFAS\"\n", - "* **characteristic** : string or list of strings, optional\n", - " Characteristic is a specific category describing one or more results.\n", - " Check the `characteristic_lookup()` function in this module for all\n", - " possible inputs.\n", - " Example: \"Suspended Sediment Discharge\"\n", - "* **characteristicUserSupplied** : string or list of strings, optional\n", - " A user supplied characteristic name describing one or more results.\n", - "* **boundingBox**: list of four floats, optional\n", - " Filters on the the associated monitoring location's point location\n", - " by checking if it is located within the specified geographic area. \n", - " The logic is inclusive, i.e. it will include locations that overlap\n", - " with the edge of the bounding box. Values are separated by commas,\n", - " expressed in decimal degrees, NAD83, and longitudes west of Greenwich\n", - " are negative.\n", - " The format is a string consisting of:\n", - " - Western-most longitude\n", - " - Southern-most latitude\n", - " - Eastern-most longitude\n", - " - Northern-most longitude \n", - " Example: [-92.8,44.2,-88.9,46.0]\n", - "* **countryFips** : string or list of strings, optional\n", - " Example: \"US\" (United States)\n", - "* **stateFips** : string or list of strings, optional\n", - " Check the `stateFips_lookup()` function in this module for all\n", - " possible inputs.\n", - " Example: \"US:15\" (United States: Hawaii)\n", - "* **countyFips** : string or list of strings, optional\n", - " Check the `countyFips_lookup()` function in this module for all\n", - " possible inputs.\n", - " Example: \"US:15:001\" (United States: Hawaii, Hawaii County)\n", - "* **siteTypeCode** : string or list of strings, optional\n", - " An abbreviation for a certain site type. 
Check the `siteType_lookup()`\n", - " function in this module for all possible inputs.\n", - " Example: \"GW\" (Groundwater site)\n", - "* **siteTypeName** : string or list of strings, optional\n", - " A full name for a certain site type. Check the `siteType_lookup()`\n", - " function in this module for all possible inputs.\n", - " Example: \"Well\"\n", - "* **usgsPCode** : string or list of strings, optional\n", - " 5-digit number used in the US Geological Survey computerized\n", - " data system, National Water Information System (NWIS), to\n", - " uniquely identify a specific constituent. Check the \n", - " `characteristic_lookup()` function in this module for all possible\n", - " inputs.\n", - " Example: \"00060\" (Discharge, cubic feet per second)\n", - "* **hydrologicUnit** : string or list of strings, optional\n", - " Max 12-digit number used to describe a hydrologic unit.\n", - " Example: \"070900020502\"\n", - "* **monitoringLocationIdentifier** : string or list of strings, optional\n", - " A monitoring location identifier has two parts: the agency code\n", - " and the location number, separated by a dash (-).\n", - " Example: \"USGS-040851385\"\n", - "* **organizationIdentifier** : string or list of strings, optional\n", - " Designator used to uniquely identify a specific organization.\n", - " Currently only accepting the organization \"USGS\".\n", - "* **pointLocationLatitude** : float, optional\n", - " Latitude for a point/radius query (decimal degrees). Must be used\n", - " with pointLocationLongitude and pointLocationWithinMiles.\n", - "* **pointLocationLongitude** : float, optional\n", - " Longitude for a point/radius query (decimal degrees). Must be used\n", - " with pointLocationLatitude and pointLocationWithinMiles.\n", - "* **pointLocationWithinMiles** : float, optional\n", - " Radius for a point/radius query. 
Must be used with\n", - " pointLocationLatitude and pointLocationLongitude\n", - "* **projectIdentifier** : string or list of strings, optional\n", - " Designator used to uniquely identify a data collection project. Project\n", - " identifiers are specific to an organization (e.g. USGS).\n", - " Example: \"ZH003QW03\"\n", - "* **recordIdentifierUserSupplied** : string or list of strings, optional\n", - " Internal AQS record identifier that returns 1 entry. Only available\n", - " for the \"results\" service." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 1: Get all water quality sample data for a single monitoring site" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "siteID = \"USGS-10109000\"\n", - "wq_data = waterdata.get_samples(monitoringLocationIdentifier=siteID)\n", - "print(\"Retrieved data for \" + str(len(wq_data[0])) + \" samples.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Interpreting the Result\n", - "\n", - "The result of calling the `get_samples()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the water quality sample data for the requested site, and or observed variables and time frame.\n", - "\n", - "Once you've got the data frame, there's several useful things you can do to explore the data." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Display the data frame as a table. The default data frame for this function is a long, flat table, with a row for each observed variable at a given site and date/time." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "display(wq_data[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show the data types of the columns in the resulting data frame." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(wq_data[0].dtypes)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The other part of the result returned from the `get_samples()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\n", - " \"The query URL used to retrieve the data from USGS Samples was: \" + wq_data[1].url\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Additional Examples\n", - "\n", - "#### Example 2: Get water quality sample data for multiple sites for a single parameter" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "site_ids = [\"USGS-04024430\", \"USGS-04024000\"]\n", - "parameter_code = \"00065\"\n", - "wq_multi_site = waterdata.get_samples(\n", - " monitoringLocationIdentifier=site_ids, usgsPCode=parameter_code\n", - ")\n", - "print(\"Retrieved data for \" + str(len(wq_multi_site[0])) + \" samples.\")\n", - "display(wq_multi_site[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 3: Retrieve water quality sample data for multiple sites, including a list of parameters, within a time period defined by start date until present" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "site_ids = [\"USGS-04024430\", \"USGS-04024000\"]\n", - "parameterCd = [\"34247\", \"30234\", \"32104\", \"34220\"]\n", - "startDate = 
\"2012-01-01\"\n", - "wq_data2 = waterdata.get_samples(\n", - " monitoringLocationIdentifier=site_ids,\n", - " usgsPCode=parameterCd,\n", - " activityStartDateLower=startDate,\n", - ")\n", - "print(\"Retrieved data for \" + str(len(wq_multi_site[0])) + \" samples.\")\n", - "display(wq_data2[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Example 4: Retrieve water quality sample data for one site and convert to a wide format\n", - "\n", - "Note that the USGS Samples database returns multiple parameters in a \"long\" format: each row in the resulting table represents a single observation of a single parameters. Furthermore, every observation has 181 fields of metadata. However, if you wanted to place your water quality data into a \"wide\" format, where each column represents a water quality parameter code, the code below details one solution." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "siteID = \"USGS-10109000\"\n", - "wq_data, _ = waterdata.get_samples(monitoringLocationIdentifier=siteID)\n", - "print(\"Retrieved data for \" + str(len(wq_data)) + \" sample results.\")\n", - "\n", - "wq_data[\"characteristic_unit\"] = (\n", - " wq_data[\"Result_Characteristic\"] + \", \" + wq_data[\"Result_MeasureUnit\"]\n", - ")\n", - "wq_data_wide = wq_data.pivot_table(\n", - " index=[\"Location_Identifier\", \"Activity_StartDate\", \"Activity_StartTime\"],\n", - " columns=\"characteristic_unit\",\n", - " values=\"Result_Measure\",\n", - " aggfunc=\"first\",\n", - ")\n", - "display(wq_data_wide)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "hyswap-dev-environment", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" - } - }, - 
"nbformat": 4, - "nbformat_minor": 4 -} diff --git a/demos/NWIS_demo_1.ipynb b/demos/peak_streamflow_trends.ipynb similarity index 54% rename from demos/NWIS_demo_1.ipynb rename to demos/peak_streamflow_trends.ipynb index 6edaa8ab..0a0f4dda 100644 --- a/demos/NWIS_demo_1.ipynb +++ b/demos/peak_streamflow_trends.ipynb @@ -8,7 +8,7 @@ "\n", "## Introduction\n", "\n", - "This notebook demonstrates a slightly more advanced application of data_retrieval.nwis to collect using a national dataset of historical peak annual streamflow measurements. The objective is to use a regression of peak annual streamflow and time to identify any trends. But, not for a singile station," + "This notebook demonstrates a slightly more advanced application of the `dataretrieval.waterdata` module: assembling a dataset of historical annual peak streamflow and regressing peak discharge against time to look for trends — not at a single station, but across many." ] }, { @@ -16,7 +16,7 @@ "metadata": {}, "source": [ "## Setup\n", - "Before we begin any analysis, we'll need to setup our environment by importing any modules." + "Before we begin any analysis, we'll need to set up our environment by importing a few modules." ] }, { @@ -29,7 +29,7 @@ "import pandas as pd\n", "from scipy import stats\n", "\n", - "from dataretrieval import nwis" + "from dataretrieval import waterdata" ] }, { @@ -37,7 +37,7 @@ "metadata": {}, "source": [ "## Basic usage\n", - "Recall that the basic way to download data from NWIS is through through the `nwis.get_record()` function, which returns a user-specified record as a `pandas` dataframe. The `nwis.get_record()` function is really a facade of sorts, that allows the user to download data from various NWIS services through a consistant interface. To get started, we require a few simple parameters: a list of site numbers or states codes, a service, and a start date." 
+ "The `waterdata` module is the recommended interface to USGS water data and replaces the deprecated `nwis` module. Annual peak streamflow is retrieved with `waterdata.get_peaks()`, which returns a `pandas` data frame and a metadata object. To get started we need a monitoring location ID, a parameter code, and (optionally) a time window." ] }, { @@ -46,21 +46,20 @@ "metadata": {}, "outputs": [], "source": [ - "# download annual peaks from a single site\n", - "df = nwis.get_record(sites=\"03339000\", service=\"peaks\", start=\"1970-01-01\")\n", - "df.head()\n", - "\n", - "# alternatively information for the entire state of illiois can be downloaded using\n", - "# df = nwis.get_record(state_cd='il', service='peaks', start='1970-01-01')" + "# download annual peaks (discharge, parameter 00060) from a single site\n", + "df, md = waterdata.get_peaks(\n", + " monitoring_location_id=\"USGS-03339000\",\n", + " parameter_code=\"00060\",\n", + " time=\"1970-01-01/..\",\n", + ")\n", + "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Most of the fields are empty, but no matter. All we require are date (`datetime`), site number (`site_no`), and peak streamflow (`peak_va`).\n", - "\n", - "Note that when multiple sites are specified, `nwis.get_record()` will combine `datetime` and `site_no` fields to create a multi-index dataframe." + "All we require for the trend analysis are the peak date (`time`), the monitoring location ID (`monitoring_location_id`), and the peak streamflow (`value`). The Water Data API returns a flat (single-index) data frame with one row per annual peak." ] }, { @@ -68,8 +67,7 @@ "metadata": {}, "source": [ "## Preparing the regression\n", - "Next we'll define a function that applies ordinary least squares on peak discharge and time.\n", - "After grouping the dataset by `site_no`, we will apply the regression on a per-site basis. 
The results from each site, will be returned as a row that includes the slope, y-intercept, r$^2$, p value, and standard error of the regression." + "Next we'll define a function that applies ordinary least squares to peak discharge versus time. After grouping the dataset by `monitoring_location_id`, we apply the regression per monitoring location. Each location's result is returned as a row with the slope, y-intercept, p value, and standard error of the regression." ] }, { @@ -79,23 +77,15 @@ "outputs": [], "source": [ "def peak_trend_regression(df):\n", - " \"\"\" \"\"\"\n", - " # convert datetimes to days for regression\n", - " peak_date = df.index\n", - " peak_date = pd.to_datetime(df.index.get_level_values(1))\n", - " df[\"peak_d\"] = (peak_date - peak_date.min()) / np.timedelta64(1, \"D\")\n", - " # df['peak_d'] = (df['peak_dt'] - df['peak_dt'].min()) / np.timedelta64(1,'D')\n", + " # convert peak dates to days since the first peak for the regression\n", + " peak_date = pd.to_datetime(df[\"time\"])\n", + " peak_d = (peak_date - peak_date.min()) / np.timedelta64(1, \"D\")\n", "\n", " # normalize the peak discharge values\n", - " df[\"peak_va\"] = (df[\"peak_va\"] - df[\"peak_va\"].mean()) / df[\"peak_va\"].std()\n", + " value = (df[\"value\"] - df[\"value\"].mean()) / df[\"value\"].std()\n", "\n", - " slope, intercept, _r_value, p_value, std_error = stats.linregress(\n", - " df[\"peak_d\"], df[\"peak_va\"]\n", - " )\n", + " slope, intercept, _r_value, p_value, std_error = stats.linregress(peak_d, value)\n", "\n", - " # df_out = pd.DataFrame({'slope':slope,'intercept':intercept,'p_value':p_value},index=df['site_no'].iloc[0])\n", - "\n", - " # return df_out\n", " return pd.Series(\n", " {\n", " \"slope\": slope,\n", @@ -119,36 +109,40 @@ "metadata": {}, "outputs": [], "source": [ - "def peak_trend_analysis(states, start_date):\n", + "def peak_trend_analysis(state_names, start_date):\n", " \"\"\"\n", - " states : list\n", - " a list containing the two-letter 
codes for each state to include in the\n", - " analysis.\n", - "\n", + " state_names : list\n", + " state names to include in the analysis, e.g. [\"Illinois\", \"Indiana\"].\n", " start_date : string\n", - " the date to use a the beginning of the analysis.\n", + " the date to use as the beginning of the analysis (YYYY-MM-DD).\n", " \"\"\"\n", " final_df = pd.DataFrame()\n", "\n", - " for state in states:\n", - " # download annual peak discharge records\n", - " df = nwis.get_record(state_cd=state, start=start_date, service=\"peaks\")\n", + " for state in state_names:\n", + " # find stream gages in the state\n", + " sites, _ = waterdata.get_monitoring_locations(\n", + " state_name=state, site_type_code=\"ST\", skip_geometry=True\n", + " )\n", + " # download annual peak discharge for those sites\n", + " df, _ = waterdata.get_peaks(\n", + " monitoring_location_id=sites[\"monitoring_location_id\"].tolist(),\n", + " parameter_code=\"00060\",\n", + " time=f\"{start_date}/..\",\n", + " )\n", " # group the data by site and apply our regression\n", - " temp = df.groupby(\"site_no\").apply(peak_trend_regression).dropna()\n", + " temp = (\n", + " df.groupby(\"monitoring_location_id\")\n", + " .apply(peak_trend_regression)\n", + " .dropna()\n", + " )\n", " # drop any insignificant results\n", " temp = temp[temp[\"p_value\"] < 0.05]\n", "\n", - " # now download metadata for each site, which we'll use later to plot the sites\n", - " # on a map\n", - " site_df = nwis.get_record(sites=temp.index, service=\"site\")\n", - "\n", - " if final_df.empty:\n", - " final_df = pd.merge(site_df, temp, right_index=True, left_on=\"site_no\")\n", - "\n", - " else:\n", - " final_df = final_df.append(\n", - " pd.merge(site_df, temp, right_index=True, left_on=\"site_no\")\n", - " )\n", + " # join site metadata (for mapping) with the trend results\n", + " merged = pd.merge(\n", + " sites, temp, right_index=True, left_on=\"monitoring_location_id\"\n", + " )\n", + " final_df = pd.concat([final_df, 
merged], ignore_index=True)\n", "\n", " return final_df" ] @@ -156,9 +150,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "To run the analysis for all states since 1970, one would only need to uncomment and run the following lines. However, pulling all that data from NWIS takes time and puts and could put a burden on resoures." - ] + "source": "To run the analysis for several states since 1970, one would only need to uncomment and run the following lines (extend the list of state names to cover more of the country). However, pulling all that data from the Water Data API takes time and could put a burden on resources." }, { "cell_type": "code", @@ -166,20 +158,20 @@ "metadata": {}, "outputs": [], "source": [ - "# Warning these lines will download a large dataset from the web and\n", - "# will take few minutes to run.\n", + "# Warning: these lines download a large dataset from the web and\n", + "# will take a few minutes to run.\n", "\n", - "# start = '1970-01-01'\n", - "# states = codes.state_codes\n", - "# final_df = peak_trend_analysis(states=states, start_date=start)\n", - "# final_df.to_csv('datasets/peak_discharge_trends.csv')" + "# start = \"1970-01-01\"\n", + "# states = [\"Illinois\", \"Indiana\", \"Ohio\"]\n", + "# final_df = peak_trend_analysis(state_names=states, start_date=start)\n", + "# final_df.to_csv(\"datasets/peak_discharge_trends.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Instead, lets quickly load some predownloaded data, which I generated using the code above." + "Instead, let's quickly load some pre-generated results bundled with this notebook. (This example dataset was produced by an earlier run of the analysis and retains the column names from that run.)" ] }, { @@ -196,15 +188,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Notice how the data has been transformed. In addition to statistics about the peak streamflow trends, we've also used the NWIS site service to add latitude and longtitude information for each station."
+ "Notice how the data has been transformed. In addition to statistics about the peak streamflow trends, the analysis joined monitoring-location metadata to add latitude and longitude for each station." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Plotting the results\n", - "Finally we'll use `basemap` and `matplotlib`, along with the location information from NWIS, to plot the results on a map (shown below). Stations with increasing peak annual discharge are shown in red; whereas, stations with decreasing peaks are blue." + "## Plotting the results\n", + "Finally we'll use `basemap` and `matplotlib`, along with the location information from the Water Data API, to plot the results on a map (shown below). Monitoring locations with increasing peak annual discharge are shown in red, and those with decreasing peaks in blue." ] }, { diff --git a/docs/source/_static/.gitkeep b/docs/source/_static/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/docs/source/conf.py b/docs/source/conf.py index 276bbd98..9d478f98 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -107,7 +107,6 @@ html_theme_options = { "logo_only": False, - "display_version": True, } # Add any paths that contain custom static files (such as style sheets) here, diff --git a/docs/source/examples/USGS_NWIS_WaterUse_Examples.nblink b/docs/source/examples/USGS_NWIS_WaterUse_Examples.nblink new file mode 100644 index 00000000..b08fd2b9 --- /dev/null +++ b/docs/source/examples/USGS_NWIS_WaterUse_Examples.nblink @@ -0,0 +1,3 @@ +{ + "path": "../../../demos/hydroshare/USGS_NWIS_WaterUse_Examples.ipynb" +} diff --git a/docs/source/examples/USGS_WaterData_DailyValues_Examples.nblink b/docs/source/examples/USGS_WaterData_DailyValues_Examples.nblink new file mode 100644 index 00000000..a0cf6297 --- /dev/null +++ b/docs/source/examples/USGS_WaterData_DailyValues_Examples.nblink @@ -0,0 +1,3 @@ +{ + "path": 
"../../../demos/hydroshare/USGS_WaterData_DailyValues_Examples.ipynb" +} \ No newline at end of file diff --git a/docs/source/examples/USGS_WaterData_GroundwaterLevels_Examples.nblink b/docs/source/examples/USGS_WaterData_GroundwaterLevels_Examples.nblink new file mode 100644 index 00000000..ae9400f4 --- /dev/null +++ b/docs/source/examples/USGS_WaterData_GroundwaterLevels_Examples.nblink @@ -0,0 +1,3 @@ +{ + "path": "../../../demos/hydroshare/USGS_WaterData_GroundwaterLevels_Examples.ipynb" +} \ No newline at end of file diff --git a/docs/source/examples/USGS_WaterData_Measurements_Examples.nblink b/docs/source/examples/USGS_WaterData_Measurements_Examples.nblink new file mode 100644 index 00000000..2bf26789 --- /dev/null +++ b/docs/source/examples/USGS_WaterData_Measurements_Examples.nblink @@ -0,0 +1,3 @@ +{ + "path": "../../../demos/hydroshare/USGS_WaterData_Measurements_Examples.ipynb" +} \ No newline at end of file diff --git a/docs/source/examples/USGS_WaterData_ParameterCodes_Examples.nblink b/docs/source/examples/USGS_WaterData_ParameterCodes_Examples.nblink new file mode 100644 index 00000000..9e69b23b --- /dev/null +++ b/docs/source/examples/USGS_WaterData_ParameterCodes_Examples.nblink @@ -0,0 +1,3 @@ +{ + "path": "../../../demos/hydroshare/USGS_WaterData_ParameterCodes_Examples.ipynb" +} \ No newline at end of file diff --git a/docs/source/examples/USGS_WaterData_Peaks_Examples.nblink b/docs/source/examples/USGS_WaterData_Peaks_Examples.nblink new file mode 100644 index 00000000..400077bf --- /dev/null +++ b/docs/source/examples/USGS_WaterData_Peaks_Examples.nblink @@ -0,0 +1,3 @@ +{ + "path": "../../../demos/hydroshare/USGS_WaterData_Peaks_Examples.ipynb" +} \ No newline at end of file diff --git a/docs/source/examples/USGS_WaterData_Ratings_Examples.nblink b/docs/source/examples/USGS_WaterData_Ratings_Examples.nblink new file mode 100644 index 00000000..b145eac3 --- /dev/null +++ b/docs/source/examples/USGS_WaterData_Ratings_Examples.nblink @@ -0,0 
+1,3 @@ +{ + "path": "../../../demos/hydroshare/USGS_WaterData_Ratings_Examples.ipynb" +} \ No newline at end of file diff --git a/docs/source/examples/USGS_WaterData_Samples_Examples.nblink b/docs/source/examples/USGS_WaterData_Samples_Examples.nblink new file mode 100644 index 00000000..f03cb32d --- /dev/null +++ b/docs/source/examples/USGS_WaterData_Samples_Examples.nblink @@ -0,0 +1,3 @@ +{ + "path": "../../../demos/hydroshare/USGS_WaterData_Samples_Examples.ipynb" +} \ No newline at end of file diff --git a/docs/source/examples/USGS_WaterData_SiteInfo_Examples.nblink b/docs/source/examples/USGS_WaterData_SiteInfo_Examples.nblink new file mode 100644 index 00000000..43c1069a --- /dev/null +++ b/docs/source/examples/USGS_WaterData_SiteInfo_Examples.nblink @@ -0,0 +1,3 @@ +{ + "path": "../../../demos/hydroshare/USGS_WaterData_SiteInfo_Examples.ipynb" +} \ No newline at end of file diff --git a/docs/source/examples/USGS_WaterData_SiteInventory_Examples.nblink b/docs/source/examples/USGS_WaterData_SiteInventory_Examples.nblink new file mode 100644 index 00000000..e9da45d2 --- /dev/null +++ b/docs/source/examples/USGS_WaterData_SiteInventory_Examples.nblink @@ -0,0 +1,3 @@ +{ + "path": "../../../demos/hydroshare/USGS_WaterData_SiteInventory_Examples.ipynb" +} \ No newline at end of file diff --git a/docs/source/examples/USGS_WaterData_Statistics_Examples.nblink b/docs/source/examples/USGS_WaterData_Statistics_Examples.nblink new file mode 100644 index 00000000..1ab51d1a --- /dev/null +++ b/docs/source/examples/USGS_WaterData_Statistics_Examples.nblink @@ -0,0 +1,3 @@ +{ + "path": "../../../demos/hydroshare/USGS_WaterData_Statistics_Examples.ipynb" +} \ No newline at end of file diff --git a/docs/source/examples/USGS_WaterData_UnitValues_Examples.nblink b/docs/source/examples/USGS_WaterData_UnitValues_Examples.nblink new file mode 100644 index 00000000..606ced5e --- /dev/null +++ b/docs/source/examples/USGS_WaterData_UnitValues_Examples.nblink @@ -0,0 +1,3 @@ +{ + 
"path": "../../../demos/hydroshare/USGS_WaterData_UnitValues_Examples.ipynb" +} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_DailyValues_Examples.nblink b/docs/source/examples/USGS_dataretrieval_DailyValues_Examples.nblink deleted file mode 100644 index 1a5d1603..00000000 --- a/docs/source/examples/USGS_dataretrieval_DailyValues_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_DailyValues_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_GroundwaterLevels_Examples.nblink b/docs/source/examples/USGS_dataretrieval_GroundwaterLevels_Examples.nblink deleted file mode 100644 index 07ae4315..00000000 --- a/docs/source/examples/USGS_dataretrieval_GroundwaterLevels_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_GroundwaterLevels_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_Measurements_Examples.nblink b/docs/source/examples/USGS_dataretrieval_Measurements_Examples.nblink deleted file mode 100644 index e9b01d72..00000000 --- a/docs/source/examples/USGS_dataretrieval_Measurements_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_Measurements_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_ParameterCodes_Examples.nblink b/docs/source/examples/USGS_dataretrieval_ParameterCodes_Examples.nblink deleted file mode 100644 index 5ec041a8..00000000 --- a/docs/source/examples/USGS_dataretrieval_ParameterCodes_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_ParameterCodes_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_Peaks_Examples.nblink b/docs/source/examples/USGS_dataretrieval_Peaks_Examples.nblink deleted file mode 
100644 index 6dbe3ba0..00000000 --- a/docs/source/examples/USGS_dataretrieval_Peaks_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_Peaks_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_Ratings_Examples.nblink b/docs/source/examples/USGS_dataretrieval_Ratings_Examples.nblink deleted file mode 100644 index ae1f1aef..00000000 --- a/docs/source/examples/USGS_dataretrieval_Ratings_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_Ratings_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_SiteInfo_Examples.nblink b/docs/source/examples/USGS_dataretrieval_SiteInfo_Examples.nblink deleted file mode 100644 index b53585d0..00000000 --- a/docs/source/examples/USGS_dataretrieval_SiteInfo_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_SiteInfo_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_SiteInventory_Examples.nblink b/docs/source/examples/USGS_dataretrieval_SiteInventory_Examples.nblink deleted file mode 100644 index 31a2527d..00000000 --- a/docs/source/examples/USGS_dataretrieval_SiteInventory_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_SiteInventory_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_Statistics_Examples.nblink b/docs/source/examples/USGS_dataretrieval_Statistics_Examples.nblink deleted file mode 100644 index f3e2e418..00000000 --- a/docs/source/examples/USGS_dataretrieval_Statistics_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_Statistics_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_UnitValues_Examples.nblink 
b/docs/source/examples/USGS_dataretrieval_UnitValues_Examples.nblink deleted file mode 100644 index d6b0133a..00000000 --- a/docs/source/examples/USGS_dataretrieval_UnitValues_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_UnitValues_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_WaterSamples_Examples.nblink b/docs/source/examples/USGS_dataretrieval_WaterSamples_Examples.nblink deleted file mode 100644 index b0ab7d9a..00000000 --- a/docs/source/examples/USGS_dataretrieval_WaterSamples_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_WaterSamples_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/USGS_dataretrieval_WaterUse_Examples.nblink b/docs/source/examples/USGS_dataretrieval_WaterUse_Examples.nblink deleted file mode 100644 index f6989d90..00000000 --- a/docs/source/examples/USGS_dataretrieval_WaterUse_Examples.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../demos/hydroshare/USGS_dataretrieval_WaterUse_Examples.ipynb" -} \ No newline at end of file diff --git a/docs/source/examples/index.rst b/docs/source/examples/index.rst index edd43beb..6011fc4b 100644 --- a/docs/source/examples/index.rst +++ b/docs/source/examples/index.rst @@ -41,18 +41,18 @@ project repository. .. 
toctree:: :maxdepth: 1 - USGS_dataretrieval_DailyValues_Examples - USGS_dataretrieval_GroundwaterLevels_Examples - USGS_dataretrieval_Measurements_Examples - USGS_dataretrieval_ParameterCodes_Examples - USGS_dataretrieval_Peaks_Examples - USGS_dataretrieval_Ratings_Examples - USGS_dataretrieval_SiteInfo_Examples - USGS_dataretrieval_SiteInventory_Examples - USGS_dataretrieval_Statistics_Examples - USGS_dataretrieval_UnitValues_Examples - USGS_dataretrieval_WaterSamples_Examples - USGS_dataretrieval_WaterUse_Examples + USGS_WaterData_DailyValues_Examples + USGS_WaterData_GroundwaterLevels_Examples + USGS_WaterData_Measurements_Examples + USGS_WaterData_ParameterCodes_Examples + USGS_WaterData_Peaks_Examples + USGS_WaterData_Ratings_Examples + USGS_WaterData_SiteInfo_Examples + USGS_WaterData_SiteInventory_Examples + USGS_WaterData_Statistics_Examples + USGS_WaterData_UnitValues_Examples + USGS_WaterData_Samples_Examples + USGS_NWIS_WaterUse_Examples Using ``dataretrieval`` to obtain nation trends in peak annual streamflow @@ -61,7 +61,7 @@ Using ``dataretrieval`` to obtain nation trends in peak annual streamflow .. 
toctree:: :maxdepth: 2 - nwisdemo01 + peak_streamflow_trends Duplicating the R ``dataRetrieval`` vignettes functionality diff --git a/docs/source/examples/nwisdemo01.nblink b/docs/source/examples/peak_streamflow_trends.nblink similarity index 50% rename from docs/source/examples/nwisdemo01.nblink rename to docs/source/examples/peak_streamflow_trends.nblink index 48f0bc2b..1bf99495 100644 --- a/docs/source/examples/nwisdemo01.nblink +++ b/docs/source/examples/peak_streamflow_trends.nblink @@ -1,6 +1,6 @@ { - "path": "../../../demos/NWIS_demo_1.ipynb", + "path": "../../../demos/peak_streamflow_trends.ipynb", "extra-media": [ "../../../demos/datasets" ] -} \ No newline at end of file +} diff --git a/docs/source/examples/readme_examples.rst b/docs/source/examples/readme_examples.rst index 62cb6eb7..0d7a9099 100644 --- a/docs/source/examples/readme_examples.rst +++ b/docs/source/examples/readme_examples.rst @@ -1,40 +1,49 @@ -Examples from the Readme file on retrieving NWIS data ------------------------------------------------------ +Retrieving USGS water data with the ``waterdata`` module +-------------------------------------------------------- .. note:: - NWIS stands for the National Water Information System - - -.. doctest:: - - >>> # first import the functions for downloading data from NWIS - >>> import dataretrieval.nwis as nwis - - >>> # specify the USGS site code for which we want data. - >>> site = '03339000' - - >>> # get instantaneous values (iv) - >>> df = nwis.get_record(sites=site, service='iv', start='2017-12-31', end='2018-01-01') - - >>> df.head() - 00010 00010_cd site_no 00060 00060_cd ... 63680_ysi), [discontinued 10/5/21_cd 63680_hach 63680_hach_cd 99133 99133_cd - datetime ... - 2017-12-31 06:00:00+00:00 1.0 A 03339000 140.0 A ... A 3.6 A 4.61 A - 2017-12-31 06:15:00+00:00 1.0 A 03339000 138.0 A ... A 3.6 A 4.61 A - 2017-12-31 06:30:00+00:00 1.0 A 03339000 139.0 A ... A 3.4 A 4.61 A - 2017-12-31 06:45:00+00:00 1.0 A 03339000 139.0 A ... 
A 3.4 A 4.61 A - 2017-12-31 07:00:00+00:00 1.0 A 03339000 139.0 A ... A 3.5 A 4.61 A - - [5 rows x 21 columns] - - - >>> # get basic info about the site - >>> df3 = nwis.get_record(sites=site, service='site') - - >>> print(df3) - agency_cd site_no station_nm site_tp_cd lat_va long_va ... aqfr_cd aqfr_type_cd well_depth_va hole_depth_va depth_src_cd project_no - 0 USGS 03339000 VERMILION RIVER NEAR DANVILLE, IL ST 400603 873550 ... NaN NaN NaN NaN NaN 100 - - [1 rows x 42 columns] \ No newline at end of file + The ``waterdata`` module accesses the USGS `Water Data API`_ and is the + recommended way to retrieve USGS water data. The legacy ``nwis`` module + remains available but is deprecated. + +.. _Water Data API: https://api.waterdata.usgs.gov/ + +.. code:: python + + >>> # import the waterdata module + >>> from dataretrieval import waterdata + + >>> # a USGS monitoring location id joins the agency code and the site + >>> # number with a hyphen + >>> site = "USGS-05427718" + + >>> # get continuous (instantaneous) streamflow — parameter code 00060 — + >>> # over a one-day window + >>> df, md = waterdata.get_continuous( + ... monitoring_location_id=site, + ... parameter_code="00060", + ... time="2024-03-01/2024-03-02", + ... ) + + >>> df[["time", "value", "unit_of_measure", "approval_status"]].head() + time value unit_of_measure approval_status + 0 2024-03-01 00:00:00+00:00 18.7 ft^3/s Approved + 1 2024-03-01 00:15:00+00:00 18.5 ft^3/s Approved + 2 2024-03-01 00:30:00+00:00 18.5 ft^3/s Approved + 3 2024-03-01 00:45:00+00:00 18.5 ft^3/s Approved + 4 2024-03-01 01:00:00+00:00 18.3 ft^3/s Approved + + >>> # get descriptive metadata about the monitoring location itself + >>> info, md = waterdata.get_monitoring_locations( + ... monitoring_location_id=site, + ... skip_geometry=True, + ... 
) + + >>> info[["monitoring_location_name", "state_name", "site_type", "drainage_area"]].T + 0 + monitoring_location_name YAHARA RIVER AT WINDSOR, WI + state_name Wisconsin + site_type Stream + drainage_area 73.6 diff --git a/docs/source/examples/siteinfo_examples.rst b/docs/source/examples/siteinfo_examples.rst index 55806721..f514d634 100644 --- a/docs/source/examples/siteinfo_examples.rst +++ b/docs/source/examples/siteinfo_examples.rst @@ -2,49 +2,47 @@ Retrieving site information --------------------------- -By default ``dataretrieval`` fetches the so-called "expanded" site date from -the NWIS web service. However there is an optional keyword parameter called -``seriesCatalogOutput`` that can be set to "True" if you wish to retrieve the -detailed period of record information for a site instead. Refer to the -`NWIS water services documentation`_ for additional information. The below -example illustrates the use of the ``seriesCatalogOutput`` switch and displays -the resulting column names for the output dataframes (example prompted by -`GitHub Issue #34`_). - -.. _NWIS water services documentation: https://waterservices.usgs.gov/docs/site-service/site-service-details/ - -.. _GitHub Issue #34: https://github.com/DOI-USGS/dataretrieval-python/issues/34 - -.. doctest:: - - # first import the functions for downloading data from NWIS - >>> import dataretrieval.nwis as nwis - - # fetch data from a major HUC basin with seriesCatalogOutput set to True - >>> df = nwis.get_record(huc='20', parameterCd='00060', - ... 
service='site', seriesCatalogOutput='True') - - >>> print(df.columns) - Index(['agency_cd', 'site_no', 'station_nm', 'site_tp_cd', 'dec_lat_va', - 'dec_long_va', 'coord_acy_cd', 'dec_coord_datum_cd', 'alt_va', - 'alt_acy_va', 'alt_datum_cd', 'huc_cd', 'data_type_cd', 'parm_cd', - 'stat_cd', 'ts_id', 'loc_web_ds', 'medium_grp_cd', 'parm_grp_cd', - 'srs_id', 'access_cd', 'begin_date', 'end_date', 'count_nu'], - dtype='object') - - # repeat the same query with seriesCatalogOutput set as False - >>> df = nwis.get_record(huc='20', parameterCd='00060', - ... service='site', seriesCatalogOutput='False') - - >>> print(df.columns) - Index(['agency_cd', 'site_no', 'station_nm', 'site_tp_cd', 'lat_va', 'long_va', - 'dec_lat_va', 'dec_long_va', 'coord_meth_cd', 'coord_acy_cd', - 'coord_datum_cd', 'dec_coord_datum_cd', 'district_cd', 'state_cd', - 'county_cd', 'country_cd', 'land_net_ds', 'map_nm', 'map_scale_fc', - 'alt_va', 'alt_meth_cd', 'alt_acy_va', 'alt_datum_cd', 'huc_cd', - 'basin_cd', 'topo_cd', 'instruments_cd', 'construction_dt', - 'inventory_dt', 'drain_area_va', 'contrib_drain_area_va', 'tz_cd', - 'local_time_fg', 'reliability_cd', 'gw_file_cd', 'nat_aqfr_cd', - 'aqfr_cd', 'aqfr_type_cd', 'well_depth_va', 'hole_depth_va', - 'depth_src_cd', 'project_no'], - dtype='object') +The ``waterdata`` module distinguishes a monitoring location's *descriptive* +metadata from the *catalog* of data available at it. + +Use ``get_monitoring_locations`` for descriptive metadata — name, location, +site type, drainage area, hydrologic unit, and so on. + +.. code:: python + + >>> from dataretrieval import waterdata + + >>> info, md = waterdata.get_monitoring_locations( + ... monitoring_location_id="USGS-05427718", + ... skip_geometry=True, + ... 
) + + >>> info[["monitoring_location_name", "site_type", "drainage_area", "hydrologic_unit_code"]].T + 0 + monitoring_location_name YAHARA RIVER AT WINDSOR, WI + site_type Stream + drainage_area 73.6 + hydrologic_unit_code 070900020504 + +To discover *what data are available* at a location — the period-of-record +catalog that the legacy ``seriesCatalogOutput`` switch used to provide — use +``get_time_series_metadata``. Each row is one time series; the ``begin`` and +``end`` columns give its period of record. + +.. code:: python + + >>> series, md = waterdata.get_time_series_metadata( + ... monitoring_location_id="USGS-05427718", + ... skip_geometry=True, + ... ) + + >>> len(series) # number of available time series + 22 + + >>> series[["parameter_code", "parameter_name", "computation_period_identifier"]].head() + parameter_code parameter_name computation_period_identifier + 0 00045 Precipitation Points + 1 91060 Orthophosphate, diss Daily + 2 91057 NH3+orgN, wu as N Daily + 3 00060 Discharge Points + 4 80155 Suspnd sedmnt disch Daily diff --git a/docs/source/userguide/timeconventions.rst b/docs/source/userguide/timeconventions.rst index 8336be51..03b4d890 100644 --- a/docs/source/userguide/timeconventions.rst +++ b/docs/source/userguide/timeconventions.rst @@ -3,78 +3,74 @@ Datetime Information -------------------- -``dataretrieval`` attempts to normalize time data to UTC time when converting -web service data into dataframes. To do this, in-built pandas functions are -used; either :obj:`pandas.to_datetime()` during the initial datetime object -conversion, or :obj:`pandas.DataFrame.tz_localize()` if the datetime objects -exist but are not UTC-localized. In most cases (single-site and multi-site), -``dataretrieval`` assigns the datetime information as the dataframe *index*, -the exception to this is when incomplete datetime information is available, in -these cases integers are used as the dataframe index (see `PR#58`_ for more -details). - -.. 
_PR#58: https://github.com/DOI-USGS/dataretrieval-python/pull/58 +``dataretrieval`` normalizes time data to UTC when converting Water Data API +responses into data frames. Timestamps are returned in the ``time`` column (the +dataframe itself uses a default integer index). For sub-daily data — such as +continuous (instantaneous) values — ``time`` is a timezone-aware +``datetime64[us, UTC]`` column. Daily values represent a whole calendar day, +so their ``time`` column is timezone-naive (dates only). Inspecting Timestamps ********************* -For single sites, the index of the returned dataframe contains pandas -timestamps. +For continuous data, the ``time`` column holds UTC-localized pandas timestamps. + +.. code:: python + + >>> from dataretrieval import waterdata + >>> df, md = waterdata.get_continuous( + ... monitoring_location_id="USGS-05427718", + ... parameter_code="00060", + ... time="2024-03-01/2024-03-02", + ... ) + >>> df["time"].head() + 0 2024-03-01 00:00:00+00:00 + 1 2024-03-01 00:15:00+00:00 + 2 2024-03-01 00:30:00+00:00 + 3 2024-03-01 00:45:00+00:00 + 4 2024-03-01 01:00:00+00:00 + Name: time, dtype: datetime64[us, UTC] + +Each timestamp has the format ``YYYY-MM-DD HH:MM:SS+HH:MM``. Because the values +are localized to UTC, the offset (``+HH:MM``) is ``+00:00``. You can convert +them to a local timezone of your choosing with the pandas ``.dt`` accessor. .. code:: python - >>> import dataretrieval.nwis as nwis - >>> site = '03339000' - >>> df = nwis.get_record(sites=site, service='peaks', - ... 
start='2015-01-01', end='2017-12-31') - >>> print(df) - agency_cd site_no peak_tm peak_va peak_cd gage_ht gage_ht_cd year_last_pk ag_dt ag_tm ag_gage_ht ag_gage_ht_cd - datetime - 2015-06-08 00:00:00+00:00 USGS 03339000 17:30 25100 C 22.83 NaN NaN NaN NaN NaN NaN - 2015-12-29 00:00:00+00:00 USGS 03339000 18:45 37600 C 26.66 NaN NaN NaN NaN NaN NaN - 2017-05-05 00:00:00+00:00 USGS 03339000 04:45 17000 C 18.47 NaN NaN NaN NaN NaN NaN - -Here the index of the dataframe ``df`` is a set of datetime objects. Each has -the format, ``YYYY-MM-DD HH:MM:SS+HH:MM``. Because these timestamps are -localized to be in UTC, the expected offset (``+HH:MM``) is ``+00:00``. -These values can be converted to a local timezone of your choosing using -:obj:`pandas` functionality. + >>> df["time"] = df["time"].dt.tz_convert("America/New_York") + >>> df["time"].head() + 0 2024-02-29 19:00:00-05:00 + 1 2024-02-29 19:15:00-05:00 + 2 2024-02-29 19:30:00-05:00 + 3 2024-02-29 19:45:00-05:00 + 4 2024-02-29 20:00:00-05:00 + Name: time, dtype: datetime64[us, America/New_York] + +After conversion the timestamps carry New York's offset — ``-05:00`` during +standard time, or ``-04:00`` during daylight saving time, since New York is 4 +or 5 hours behind UTC depending on the time of year. Note that the first +midnight-UTC reading rolls back to the previous calendar day (``2024-02-29``) +once shifted into New York time. + + +Daily values +************ + +Daily data summarize a whole calendar day, so the ``time`` column is +timezone-naive — no offset is applied. .. 
code:: python - >>> df.index = df.index.tz_convert(tz='America/New_York') - >>> print(df) - agency_cd site_no peak_tm peak_va peak_cd gage_ht gage_ht_cd year_last_pk ag_dt ag_tm ag_gage_ht ag_gage_ht_cd - datetime - 2015-06-07 20:00:00-04:00 USGS 03339000 17:30 25100 C 22.83 NaN NaN NaN NaN NaN NaN - 2015-12-28 19:00:00-05:00 USGS 03339000 18:45 37600 C 26.66 NaN NaN NaN NaN NaN NaN - 2017-05-04 20:00:00-04:00 USGS 03339000 04:45 17000 C 18.47 NaN NaN NaN NaN NaN NaN - -Above, the index was converted to localize the timestamps to New York. -In the updated dataframe index, the resulting timestamps now have offsets of -``-04:00`` and ``-05:00`` as New York is either 4 or 5 hours behind UTC -depending on the time of year (due to daylight savings). - -When information for multiple sites is requested, ``dataretrieval`` creates a -dataframe with a multi-index, with the first entry containing the site number, -and the second containing the datetime information. - -.. doctest:: - - >>> import dataretrieval.nwis as nwis - >>> sites = ['180049066381200', '290000095192602'] - >>> df = nwis.get_record(sites=sites, service='gwlevels', - ... start='2021-10-01', end='2022-01-01') - >>> df - agency_cd site_tp_cd lev_dt lev_tm lev_tz_cd ... lev_dt_acy_cd lev_acy_cd lev_src_cd lev_meth_cd lev_age_cd - site_no datetime ... - 180049066381200 2021-10-04 19:54:00+00:00 USGS GW 2021-10-04 19:54 +0000 ... m NaN S S A - 2021-11-16 14:28:00+00:00 USGS GW 2021-11-16 14:28 +0000 ... m NaN S S A - 2021-12-09 10:43:00+00:00 USGS GW 2021-12-09 10:43 +0000 ... m NaN S S A - 290000095192602 2021-12-08 19:07:00+00:00 USGS GW 2021-12-08 19:07 +0000 ... m NaN S S P - - [4 rows x 15 columns] - -Here note that the default datetime index information returned is also UTC -localized, and therefore the offset values are ``+00:00``. \ No newline at end of file + >>> df, md = waterdata.get_daily( + ... monitoring_location_id="USGS-05427718", + ... parameter_code="00060", + ... 
time="2024-03-01/2024-03-05", + ... ) + >>> df["time"].head() + 0 2024-03-01 + 1 2024-03-02 + 2 2024-03-03 + 3 2024-03-04 + 4 2024-03-05 + Name: time, dtype: datetime64[us]
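The two conventions documented in the ``timeconventions.rst`` hunk above (tz-aware UTC for continuous data, tz-naive for daily data) can be checked offline. The following is a minimal sketch, not part of the patch. It assumes hand-built pandas frames stand in for the ``get_continuous`` and ``get_daily`` responses. Only the ``time`` column name and the timestamps are taken from the documented output, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for a continuous (sub-daily) response:
# a tz-aware UTC "time" column at 15-minute spacing.
continuous = pd.DataFrame(
    {"time": pd.date_range("2024-03-01", periods=5, freq="15min", tz="UTC")}
)

# Hypothetical stand-in for a daily response: tz-naive calendar dates.
daily = pd.DataFrame(
    {"time": pd.date_range("2024-03-01", periods=5, freq="D")}
)

# A tz-aware column can be shifted to a local zone with the .dt accessor.
local = continuous["time"].dt.tz_convert("America/New_York")

# A tz-naive column has no zone to convert from, so check .dt.tz first
# instead of calling tz_convert on it directly.
continuous_is_aware = continuous["time"].dt.tz is not None
daily_is_naive = daily["time"].dt.tz is None
```

Checking ``.dt.tz`` before converting lets the same downstream code accept both kinds of frame: a ``None`` result marks the column as dates only, so no offset should be applied to it.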