Do you generate, use, or manage ocean time series data? Please take a few minutes to complete this survey.
In preparation for this workshop, we developed this short survey on the data challenges and needs of the ocean time series community. Your feedback will help shape future cyber-infrastructure for ocean time series.
Q1 What are your primary roles/responsibilities? (please check all that apply)
Q2 What is (are) your primary discipline(s)? (please check all that apply)
Q3 What is your career stage?
Q4 How do you use ocean time series data? (please check all that apply)
Q5 Please list your primary sources of funding for ocean time series-related work (e.g., NSF, NOAA, NASA, private foundations, etc.)
- BOEM, NSF
- National Weather Service, Copernicus Marine
- NSF, NASA, NOAA
- national funding
- DOST-funded project
- NIWA SSIF funding (New Zealand Govt funding)
- Currently I am funded by a Simons Fellowship to do work at Station ALOHA, but by the time this workshop happens I will be funded by two grants (one NERC, project: CLASS, the other EU, project: COMFORT) to be doing global climate-relevant biogeochemistry data analysis, using time series & model output.
- NASA, The Gordon and Betty Moore Foundation
- NSF, private foundations
- Instituto Español de Oceanografia (IEO, Spain) Galicia Regional Government (Xunta de Galicia, Spain) European Research Funds (various EU research and innovation programmes and Interreg funds) Spanish Government Research Funds (various research projects and infrastructure funds)
- NASA, NSF
- private foundations, NSF
- NOAA, NASA, EPA, NSF
- NSF, NASA
- NSF, NOAA
- NASA, NOAA, NSF, ONR (in that order) Instituto Nacional de Investigacion y Desarrollo Pesquero (Argentina)
- BOEM, NSF
- Mexican government funding (Conacyt), Nippon Foundation and POGO, IAI
- NASA, private foundations
- Currently no sources of funding
- EU funded projects, i.e. EU COST ACTION Ocean Governance for Sustainability (OceanGov CA 1527) www.oceangov.eu - Working Group Ocean, Climate Change and Acidification
- USGS, state of California
- IMARPE, NOAA and NASA
Q6 We are trying to identify key barriers to working with ocean time series data. How difficult is it for you to:
Q7 Please comment on specific aspects of your difficulties with ocean time series data
- I often have to ask researchers directly where their data are, and then I have to deal with various proprietary sensor formats.
- Finding and downloading time series has been time-consuming and sometimes unsuccessful. Data are often in non-standardised .txt, .dat, or .csv files rather than netCDF.
- Inconsistent access locations, formats, metadata.
- Finding a good inventory of what is available for a given time period is difficult. There are more and more data portals with visual maps to help find data, which is a good direction, but sometimes data that are visualized as available turn out to have only metadata. Some datasets exist on multiple platforms but lack a common reference that makes the overlap easy to see.
- Lack of standardization and lack of technical "best practices" documentation (i.e., what database should I use, what API is preferred).
- As the data are multidisciplinary, first completing the analysis and then producing a merged and formatted file is difficult.
- BCO-DMO / NSF partitions time-series data into different projects based on funding. Therefore we cannot have a continuous project/dataset, leading to numerous requests for data, and to comments about missing data. Maintaining project funding to avoid any data gaps is becoming increasingly difficult.
- accessibility and format
- I am primarily working with my own data, from voyage planning, equipment maintenance, sample analysis, data analysis, data management, and data submission. The data QA/QC submission process generally is time-consuming, particularly dealing with the back data (i.e., data from some years ago). I don't often use data from other time series.
- Mostly I think the challenge is finding time series data that I'm not explicitly aware of but know are out there, and determining to what extent different datasets are comparable.
- There are so many requirements in some databases that for a new user it becomes very difficult to specify the area/location of interest.
- Datasets all have different formats, with very little information on measurement errors; data can be difficult to assess regarding techniques, associated errors, and what the measurement actually means. Techniques vary, with little metadata. There are few repeat measurements for assessing errors, and little to no QA/QC. The government data repositories are archives, not databases, so they are hard to access/filter.
- implementing QARTOD and other QC/QA checks to meet community standards, building trust in time series we acquire
- Find the description of the data, metadata and units
- Finding up to date figures prepared by others, e.g. BATS CO2 time-series data
- HOT and BATS should manage their own data - they both do a good job of making their data available in a timely fashion. BCO-DMO is not set up well for serving these types of data, at least not in a user-friendly way like currently available through systems like HOT-DOGS. NSF should provide funding to the time series programs themselves for data management and service.
- Lack of a specific unit to deal with multidisciplinary ocean data (particularly integrating biology - plankton- data with hydrographic and biogeochemistry data)
- standard products and data layout.
- There is a difference between long-term "time series" data, such as from BATS, HOT, etc., and shorter time series that only have their metadata submitted to a database or data portal. It is the latter that are difficult to find or manage at times; sometimes only the integrated data, rather than the original profiles or rates, are submitted. And I understand why NetCDF, but I hate NetCDF!
- I don't really have any difficulties. The data are easy to access for a modeler, and usually have all the meta-data I need. Time-series data are generally much easier to work with for a modeler than data from individual process studies.
- Not always clear what considerations need to be taken into account when using the data (methodological concerns, gaps, sensitivity, etc).
- Usually the problems arise when trying to aggregate deployment data and/or comparing two different data sets. These problems all come down to the way time is represented (e.g., matching up "hours since ..." with calendar day)
- [FAIR relevance indicated in ( )]: The discovery (F) to access (A) step is problematic, especially for biogeochemical data. Sites like BCO-DMO and MACAN/MARCO offer data searches, but really only deliver metadata about projects and PIs that have data (F, but ~A). They use tedious interactive map interfaces (~I). We have existing (for many years) tools like ERDDAP where geospatial/temporal extent searches are trivial (F), and the download formats are rich and flexible (A,R). If more data sets were made available via ERDDAP, even in a distributed sense, the Advanced Search capability and ability to have virtual catalogs would address all the FAIR issues with these.
- We use data from our own time series, or from other time series obtained directly from the PIs
- Researchers at UAF have worked in isolation for years, and have a collection of MATLAB scripts that generate data in formats we are used to working with. But we don't comply with standards etc so sharing our data is easy for us (hard for those who work with it). Also, our institution doesn't have the resources to support data management.
- Finding the resources to maintain ocean time series data collection
- Getting the data from the original data collectors. Even if they give it, they do not want it shared. So "managing time series data" is not a challenge because there is not much to "manage" (at least in terms of making a public database of it).
- As someone who mostly submits data, I struggle with versioning. I work in part with DNA sequencing, and I've found that the database managers who want to upload species IDs and counts as organismal data don't understand why, once my data are "analyzed," they aren't static and may change with, e.g., newer reference sequences. I have not found a good solution here.
- Ease of access is a challenge, and sometimes the language used on the website is complicated (not user-friendly).
- Every time series dataset is accessed slightly differently, and it can be difficult for a user that doesn't know the system/site to easily access these data. It would be great if we could be more consistent in the platforms used to access time series data so that it could be easier for users to find what they need.
- Gaining timely (within 12-24 months of collection) access to the most updated data sets; data formatting; unit of measurement
- unwillingness to share data; incomplete metadata; uncertainties about data quality; formatting that gives the impression data 'owners' are not highly motivated to share
- It is difficult with wind data
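One respondent above notes that aggregation problems "all come down to the way time is represented" (e.g., matching "hours since ..." with calendar day). A minimal sketch of converting CF-style relative-time offsets to calendar datetimes using only Python's standard library; the function name and reference epochs here are illustrative, not from any particular dataset:

```python
from datetime import datetime, timedelta

def cf_hours_to_datetime(units, values):
    """Convert CF-style 'hours since <reference>' offsets to datetimes.

    `units` is a string like 'hours since 1950-01-01 00:00:00';
    `values` is a sequence of numeric hour offsets.
    """
    # The reference timestamp follows the word "since" in the units string.
    ref_str = units.split("since", 1)[1].strip()
    ref = datetime.strptime(ref_str, "%Y-%m-%d %H:%M:%S")
    return [ref + timedelta(hours=float(v)) for v in values]

# Two deployments with different reference epochs resolve to the same
# calendar instant once converted, which is what makes merging possible.
a = cf_hours_to_datetime("hours since 2000-01-01 00:00:00", [24.0])
b = cf_hours_to_datetime("hours since 1999-12-31 00:00:00", [48.0])
assert a == b == [datetime(2000, 1, 2, 0, 0)]
```

Real datasets also use non-standard calendars (e.g., `noleap`), which libraries such as cftime handle; the point here is only that comparison must happen in calendar time, not in raw offsets.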
Q8 How do you most commonly discover your data?
- Through the ocean time series websites, which do a good job of making the data accessible and "viewable".
- I access ocean related data using ERDDAP servers that solve most of these issues
- I discover existence through papers/metadata searches, but then try various comprehensive catalogs like ERDDAP at Coastwatch or PacIOOS, or some other OOSs, hoping to find the data mirrored there.
Q10 Where do you most commonly access ocean time series data?
- from the WMO GTS
- For coastal time series, EPA and NOAA sites.
- Various ERDDAP servers
Q12 Please indicate the quality of the following aspects of your data access experience with these ocean data portals:
Q13 Please comment on specific issues you’ve had with obtaining data.
- No consistent data product containing all time series data at the same QC level.
- accessibility or the lack thereof
- Usually a formatting issue
- Problems in accessing databases
- Need to develop data-set specific QA/QC and difficult time in trying to filter out repeat observations in different archives as there are no unique data identifiers.
- Information on data quality is not present.
- Synthesizing paleoceanographic time series from NOAA Paleo or PANGAEA is difficult.
- In the past it has been hard to get the most updated BATS time-series data, e.g. within 1 year of collection.
- The biggest issues are those associated with finding and using time series data from secondary sites (like BCO-DMO or Ocean SITES).
- Difficulties when integrating datasets with limited information on specific quality controls applied to each dataset
- Of them all, the NSF Ocean Observatories Initiative (OOI) access to Pioneer Coastal Array data is the worst. It is barely functional, so we have written our own circumvention to the online portal and scrape all the data into a private ERDDAP at http://www.myroms.org:8080/erddap/info/index.html?
- Metadata from individual researchers is non-existent.
- It depends on the type of data but many times the problems are formatting.
- Most data (and/or the most recent or complete version of the data) is only in the hands of the original collectors. Data still in the hands of individuals (the collectors) is in a plethora of different formats. (Each person has a different format.)
- Metadata is key - for example, to not inappropriately compare measurements made with different techniques.
- The language used in the data source is not user-friendly. E.g., if you need data for a specific location, figuring out how to set or locate the area properly in the data source/base is a challenge for new users.
- It can be difficult to find the datasets I'm interested in, again, because they come from different sites that have different ways to access their data so it makes it difficult to integrate the datasets in the end.
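Several responses above and in Q7 single out ERDDAP as the access pattern that already works: its tabledap service answers plain URL requests of the form `datasetID.<format>?variables&constraints`. A minimal sketch of building such a request URL (the server address, dataset ID, and variable names below are hypothetical placeholders; the query grammar follows ERDDAP's documented tabledap syntax):

```python
def erddap_tabledap_url(server, dataset_id, variables, constraints):
    """Build an ERDDAP tabledap request URL for CSV download.

    The query is the comma-separated variable list followed by
    &-joined constraint expressions such as 'time>=2019-01-01T00:00:00Z'.
    """
    query = ",".join(variables) + "".join("&" + c for c in constraints)
    return f"{server}/tabledap/{dataset_id}.csv?{query}"

# Hypothetical server and dataset, for illustration only.
url = erddap_tabledap_url(
    "https://example.org/erddap",
    "station_ts",
    ["time", "sea_water_temperature"],
    ["time>=2019-01-01T00:00:00Z", "time<=2019-12-31T23:59:59Z"],
)
```

Because the same grammar works against any ERDDAP server, a script written for one portal transfers to another by changing only the server string, which is exactly the consistency these respondents are asking for.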
Q14 What analysis and visualization tools and/or platforms do you most frequently use to analyze and/or visualize ocean time series data? (select all that apply)
Q15 How important do you perceive the following for improving ocean time series data systems?
Q16 Please comment further on these and other priorities for improving the experience of ocean time series data users.
- The problem of "getting people to submit data" is best solved by adding value in the form of context-empowered visualization and analysis enabled by similar data.
- As a data provider rather than a data user, I identified the data submission process as the highest priority, and then an online metadata editor. I like the SOCAT portal.
- I think this process would be best served if a specific data product was chosen as a test data set to develop the methods and then once the standard was created and tested it would be used on others. I also think the effort should triage the different data types so that the effort can focus on the higher needs first.
- The data should incorporate a data quality flag.
- It is imperative that data managers be close to data originators. Ideally, one data expert would be required to ensure data flow (including metadata and formatting) between ca. 10-20 data generators and the database or repository.
- Highest priority is getting the actual data into the data portal; if only metadata, ensuring the metadata are current and accurate. And getting all the data in in a timely manner: if it is more than 2 years after collection, then include all the parameters, not only some; otherwise one has to contact the PI, who most often will gladly send the data files personally. As a data generator, ensuring that the users recognize the data originators. This step is essential with the funders, even long after the project data collection phase is over.
- It's most important to allow the users to download the raw data (hopefully as netCDF files) and let us all use our favorite software to plot and analyze the data. We don't need software and visualization online of the data for scientists (perhaps for the public though.)
- 1) I believe we already have a common data model (netCDF) and convention (CF). 2) Metadata is a long, tedious process that is typically ignored by users (especially those accessing data via a service). I find the typical workflow is: download data, look for familiar things like "time", "temperature", etc., then email the p.o.c. (PI) for questions rather than reading metadata (the data equivalent of an appliance manual). 3) The hang-up with submitting data is that it is always changing (delayed-mode QC, refinement, etc.), so it rarely is "final".
- There are already a number of communities who have grappled with these issues and developed excellent solutions, e.g. Climate-Forecast Conventions for metadata (including a vigorous active online community adding to and improving these standards), UNIDATA THREDDS/OPeNDAP services for providing easy access and subsetting, and ERDDAP with Advanced Search capabilities to scan multiple datasets to find occurrences in a requested interval/space (excellent for data discovery) and an active international user community. The panels of GOOS, GCOS and JCOMM regularly debate these issues and are knowledgeable about solutions. If NSF ignores this community of practice and goes it alone on Earthcube, repeating the failures and waste of the OOI CyberInfrastructure effort, that would be unfortunate to say the least.
- We do not need more databases. We need more data ... both in terms of access to existing data and in more sampling programs. Change "publish or perish" to "share [data] or get shunned". If a person's productivity is ONLY weighed on their publications, sharing data (and spending tons of time documenting and reformatting it for submission) takes time away from more publishing. There is no incentive to share data in this model.
- If there is an effort to generate online visualization tools, there also have to be provisions to check them occasionally and make sure they're working as intended.
- A map to locate the area would be of good help to a new user, or simple software where you can enter your GPS locations, etc., in simple language.
- Again, establishing a common data model is key for accessibility and will aid in the other priorities as well.
- easy access and search tools
Q17 What are your pressing needs for short term data management of ocean time series data in the near term?
- Easy use of data from multiple time series stations through consistent QC protocols/handling structures and data formats.
- Guidelines for metadata when submitting data to an archive, to better enable discovery and reuse
- I need a standard time-series API I can connect into grafana. I am currently scraping `csv`s into a graphite database, but haven't seen graphite anywhere else in the community.
- "Small" projects rarely get funding for a data management person, yet submission of data is becoming increasing complex.
- A metadata editor integrated into the data submission process. Standard processing and QA methods.
- Establishing empirical OA data such as pH and alkalinity, as well as SST, as part of climate change monitoring, and comparing with other data sources to improve reliability.
- would love a roll-up capability to something like GitHub that would allow specific data sets to always be updated and refreshed from the various data hubs.
- validation of data, improving data access
- faster availability (right now I see the BATS CO2 time-series figures on-line are updated to 2015, this is too slow).
- The US NSF funded time series programs (HOT and BATS), which are not part of a larger network (like LTER) do a great job with data management - fund them to continue this effort.
- Lack of experts in data management (preparing metadata, formatting, harmonizing quality flags, ...)
- dataset interconnectivity
- Requisite funding and recognition of the time needed to manage the data correctly.
- Access to a comprehensive historical data set of ocean acidification observations for U.S. coastal waters.
- To finalize our institutional database.
- Help standardizing our formats so our data can be shared
- LOL! I need more public data to actually "manage" (e.g., more people willing to share their data).
- Aragonite saturation and alkalinity data in the western Indian Ocean.
- More intuitive data management infrastructure, particularly for students that may be utilizing datasets for class projects.
- apply unique doi, user-friendly visualization and computational tools
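One respondent above describes scraping `csv`s into a Graphite database to feed Grafana. Graphite's plaintext ingestion protocol is simply one `<metric.path> <value> <unix_timestamp>` line per sample, sent over TCP (port 2003 by default). A minimal sketch of formatting CSV rows into that protocol; the metric prefix and column names below are illustrative placeholders, not from any specific deployment:

```python
import csv
import io

from datetime import datetime

def csv_rows_to_graphite(csv_text, metric_prefix):
    """Format (time, temperature) CSV rows as Graphite plaintext lines.

    Each output line is '<metric.path> <value> <unix_timestamp>', the
    format accepted by Graphite's carbon listener.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # fromisoformat (pre-3.11) does not accept a bare 'Z' suffix.
        ts = datetime.fromisoformat(row["time"].replace("Z", "+00:00"))
        epoch = int(ts.timestamp())
        lines.append(f"{metric_prefix}.temperature {row['temperature']} {epoch}")
    return "\n".join(lines)

sample = "time,temperature\n2019-06-01T00:00:00Z,14.2\n"
print(csv_rows_to_graphite(sample, "ocean.station1"))
# The resulting lines would then be written to a TCP socket on the
# carbon port (2003) of the Graphite host.
```

A shared, documented time-series API would make this kind of ad hoc bridge unnecessary, which is the respondent's point.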
Q18 What are your expectations for long-term data management and dissemination of ocean time series data?
- Easy submission and upload (near real-time) of new data and metadata within a consistent framework. Further, to enhance related automation processes (e.g. QC procedures).
- Tools to enable archiving and discovery that don't require the average user to know how to program
- A solution must include federated/distributed hosting with replication, versioning, "permanent" URL & API access, and RESTful subsetting.
- A one-stop shop! For submitting QC'ed data and metadata, in a common format, that is then archived, managed (DOI), and accessible by users. The ability to include additional parameters. And also, somehow compatible with other OTS platforms (floats, repeat glider sections, etc.).
- To have local empirical data that may be modified and then uploaded to any database.
- A closure on the metadata requirements and the QA/QC procedures. For instance, I would argue that the data providers not truncate the observations. In practice this is arbitrary and leads to significant uniform errors in the data sets across a wide range of observations.
- Would like to see things be accommodated as much as possible in common repositories such as NCEI. Would like to provide a way for users to see validation data
- Operation of a network of specific repositories for multidisciplinary ocean data.
- All data will be available in netCDF.
- software solutions will outpace any efforts to standardize data
- I hope that data owners see the value of contributing to unified data portals, rather than holding data in various specialist formats on their own data portals.
- To link our institutional database to other relevant international portals.
- We used to provide data management at the institutional level (websites, ftp, internal databases), but are looking more and more to national repositories.
- We need clear logging/tracking of the original-source identifications and usage. How do "Joe" or "Suzy" report to their bosses/funders who is using their data if the large database entities don't track usage *and* automatically report it back to "Joe" and "Suzy".
- As molecular techniques (eDNA, etc.) increase in popularity, the field needs to figure out a better way to integrate both analyzed results and info about raw sequence accessibility (NCBI SRA accession numbers, for example) into databases. We need to preserve both the results used in published literature and the possibility of re-analysis and re-interpretation within the context of the associated physicochemical time-series data.
- Looking for funding to install loggers or buoys with carbonate, alkalinity and CO2 sensors in the western Indian Ocean to generate data
- A single portal for all time series datasets that is formatted the same and easy to select what datasets and variables you want. I think BCO-DMO is doing a good job with this and it would be useful to follow their model. Also, if the different funding agencies could collaborate to be willing to have their data on the same platform that would be helpful.
- Improve interoperability across different data portals
Q19 Is there anything else you wish to add/comment on related to ocean time series data?
- This is a great initiative - thank you.
- Need scientists to work together and funding too
- Establishment of a simple, focused effort. Remember that the more general the solution, the more complex. One solution will not support all needs.
- Please help make sure the time-series data continue! 🙂
- As I have said, many of these issues are solved by existing ERDDAP servers. It would be a shame to reinvent the wheel.
- The challenge to maintain a time series in a developing country is far higher than in well developed centers. Hence, an organized database is desirable, but it takes longer to achieve.
- Again, if everything is based only on the number-of-publications produced, where is the incentive for spending time on data management/documenting/sharing.
- Welcome scientists to work together in the western Indian Ocean, and Tanzania in particular, so as to generate more data.
- I'm glad you're having this workshop, it's needed!