Data Products
CMAR Water Quality data is available through several data products. Summary reports are available on the CMAR Website. Full datasets can be downloaded from the Nova Scotia Open Data Portal and CIOOS Atlantic.
The reports and datasets can also be accessed through the CMAR Station Locations Map.
This page describes the structure of the data that can be downloaded from the Nova Scotia Open Data Portal.
County Datasets
The Water Quality datasets are organized by county. These are large datasets, and it is highly recommended that users filter the data for the station(s) and variable(s) of interest before downloading the data.
Take care when filtering based on the quality control flags, because removing a whole row with a “Fail” observation may also remove good observations. For example, a sensor may record a temperature observation flagged as “Pass” at the same time (i.e., on the same row) as a dissolved oxygen observation flagged as “Fail”. The temperature observation should be included in the analysis, but the dissolved oxygen observation should be excluded. Ideally, data should be pivoted into a “long” format using CMAR’s qaqcmar package and then filtered.
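The pivot-then-filter idea can be sketched in plain Python as follows. This is only an illustration, not the qaqcmar implementation: the `station` column, the dissolved oxygen column name, and the use of text flag labels are assumptions made for the example, and the sketch assumes every measurement column has a matching `qc_flag_` column.

```python
import csv
import io

FLAG_PREFIX = "qc_flag_"
MISSING = (None, "", "NA")


def pivot_long(rows):
    """Yield one record per non-missing measurement, paired with its own flag."""
    for row in rows:
        # A measurement column is any column with a matching qc_flag_ column.
        variables = [c[len(FLAG_PREFIX):] for c in row if c.startswith(FLAG_PREFIX)]
        # Everything else (deployment, sensor, timestamp) is carried along as-is.
        other = {
            key: value
            for key, value in row.items()
            if key not in variables and not key.startswith(FLAG_PREFIX)
        }
        for variable in variables:
            value = row.get(variable)
            if value in MISSING:
                continue  # this sensor did not record this variable
            yield {
                **other,
                "variable": variable,
                "value": float(value),
                "flag": row[FLAG_PREFIX + variable],
            }


# Hypothetical excerpt with two variables; real files have 22 columns.
sample = io.StringIO(
    "station,timestamp_utc,temperature_degree_c,dissolved_oxygen_percent_saturation,"
    "qc_flag_temperature_degree_c,qc_flag_dissolved_oxygen_percent_saturation\n"
    "Station A,2021-07-01 12:00:00,14.2,98.5,Pass,Fail\n"
    "Station A,2021-07-01 12:10:00,14.3,NA,Pass,NA\n"
)

long_records = list(pivot_long(csv.DictReader(sample)))

# Filtering the long records drops only the failed dissolved oxygen value,
# while the temperature value from the same original row is kept.
kept = [record for record in long_records if record["flag"] != "Fail"]
```

Because each long record carries exactly one measurement and one flag, the filter acts on observations rather than on whole rows.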
Note that Excel files (.xlsx) have a limit of 1,048,576 rows per sheet. CSV files can hold more rows, but spreadsheet software may not display them all. Please use caution when downloading and analyzing the data.
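One way to check whether a downloaded CSV would be truncated by a spreadsheet is to count its rows first. The sketch below uses only the Python standard library; the function names are illustrative, and the file name in the closing comment is hypothetical.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # per-sheet limit, header row included


def count_data_rows(lines):
    """Count the records in a CSV source, excluding the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header
    return sum(1 for _ in reader)


def fits_in_excel_sheet(n_data_rows):
    """True if the data rows plus one header row fit on a single sheet."""
    return n_data_rows + 1 <= EXCEL_MAX_ROWS


# Small in-memory example; for a real download, pass an open file instead,
# e.g. open("county_dataset.csv", newline="") (hypothetical file name).
n_rows = count_data_rows(io.StringIO("a,b\n1,2\n3,4\n"))
```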
Data Format
There are 22 columns in each dataset, as described below.
Deployment Columns
The first 7 columns provide information on the deployment, including the location¹, the deployment dates, and the sensor string configuration.
The string configuration indicates how the sensors were deployed, e.g., whether they are at a fixed vertical location or float with the tide. Configuration options are: sub-surface buoy, surface buoy, attached to gear, attached to fixed structure, floating dock, or unknown², as described under Data Collection.
Sensor Columns
The sensor_* columns provide information on the sensor that made the measurement, including the model, serial number, and the estimated depth below the surface at low tide. If sensor depth was measured, the depth_crosscheck column indicates whether the estimated sensor depth aligns with the measured depth.
Measurement Columns
The timestamp_utc column indicates the time the measurement(s) was recorded, in the UTC (Coordinated Universal Time) time zone. This time zone does not observe daylight saving time, so users should take care if required to convert to Atlantic Standard Time (AST; UTC-4 hours) or Atlantic Daylight Time (ADT; UTC-3 hours).
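As a minimal sketch of the conversion, the example below applies the two fixed offsets given above using Python's standard library. The timestamp format `YYYY-MM-DD HH:MM:SS` is an assumption about how the column is written, and the function name is illustrative. The caller must decide which offset applies to a given date.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets as stated above: AST is UTC-4 hours, ADT is UTC-3 hours.
AST = timezone(timedelta(hours=-4), "AST")
ADT = timezone(timedelta(hours=-3), "ADT")


def utc_to_fixed_offset(timestamp_utc, zone):
    """Parse a UTC timestamp string and express it in a fixed-offset zone."""
    # Assumed format; adjust the pattern if the file uses another one.
    naive = datetime.strptime(timestamp_utc, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc).astimezone(zone)


# A winter observation converted to AST and a summer observation to ADT.
winter = utc_to_fixed_offset("2021-01-15 03:30:00", AST)
summer = utc_to_fixed_offset("2021-07-15 03:30:00", ADT)
```

Note that the local calendar date can differ from the UTC date, as in the winter example. Where the IANA time zone database is available, Python's `zoneinfo.ZoneInfo("America/Halifax")` can be passed to `astimezone` instead, and it selects AST or ADT automatically.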
There is a measurement value column for each variable (and unit). The measurement columns are named in the format variable_unit, e.g., temperature_degree_c. If a sensor records more than one variable per timestamp, all of its measurements will be in the same row. For any variable the sensor does not record, there will be an NA value in the corresponding measurement column. This results in many NA values per dataset, and these should be dealt with appropriately prior to analysis.
Summary Flag Columns
The remaining columns are for summary quality control flags. These are named in the format qc_flag_variable_unit, e.g., qc_flag_temperature_degree_c. These columns hold the worst flag value assigned to the corresponding observation. Because measurements are recorded by row, there will be many NA values in these columns.
The Spike Test and Rolling Standard Deviation Test inherently assign a flag of “Not Evaluated” to observations at the beginning and end of each deployment, so many observations are expected to carry this flag. Following QARTOD, “Not Evaluated” corresponds to a numeric value of 2, while “Pass” corresponds to a numeric value of 1. Observations flagged “Not Evaluated” are typically safe to include in an analysis.
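A flag filter that keeps both “Pass” and “Not Evaluated” observations might look like the sketch below. It is an illustration under stated assumptions: this page confirms only the values 1 and 2, so the codes 3 and 4 are taken from the general QARTOD scheme, and the helper accepts either numeric codes or text labels because the encoding used in the downloaded file should be checked first.

```python
# Numeric codes 1 and 2 are stated on this page; 3 and 4 follow the general
# QARTOD scheme and are an assumption here.
QARTOD_LABELS = {1: "Pass", 2: "Not Evaluated", 3: "Suspect", 4: "Fail"}
KEEP = {"Pass", "Not Evaluated"}


def flag_label(raw):
    """Normalize a flag given as a number or a label; None if it is missing."""
    text = str(raw).strip()
    if text in ("", "NA"):
        return None
    if text.isdigit():
        return QARTOD_LABELS.get(int(text))
    return text


def keep_observation(raw_flag):
    """True for observations flagged Pass or Not Evaluated."""
    return flag_label(raw_flag) in KEEP


# Example flag values in both encodings, including a missing flag.
flags = [1, "2", "Not Evaluated", 4, "Fail", "NA"]
decisions = [keep_observation(flag) for flag in flags]
```

A missing flag (NA) is treated as "do not keep" because it belongs to a measurement column that holds no value on that row.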
Additional Flag Columns
Internal CMAR datasets hold a separate flag column for each variable and QC test; however, to keep the datasets more manageable for users, these were not published³. If it is crucial for a user to understand which test resulted in a specific flag value for an observation, the user can contact CMAR using the information in the website footer.