2024-11-20
Many different types (or levels) of documentation:
What is metadata and why does it matter?
What is geospatial metadata?
How can you make metadata?
Portions of these are adapted or inspired by a fall 2022 lecture by Reina Chano Murray (repo / Google Slides).
Metadata is data about data. It describes information like who collected it, when, for what purpose, and the level of quality… You can think of metadata as little messengers to the future.
– Sarah Wakamiya (Inventory & Monitoring Program Data Manager), National Park Service, from What in the World is Metadata and Why Should I Care?
“the metadata is collected so that it can fulfill a useful purpose, and sorted into known categories. It is this notion of structure that turns raw information into actionable metadata.”
– Jenn Riley, NISO, from Understanding Metadata: What is Metadata, and What is it For?: A Primer, 2017
Metadata makes your data more discoverable and understandable
Good metadata helps others trust, validate, reuse and build upon your data.
A metadata standard is a set of rules, or an agreement, that sets the minimum amount of information that should be documented about a dataset (and how).
A metadata syntax is a set of rules for the structure and format of the metadata.
Standardizing content and syntax makes it easier for both humans and computers to find and understand your data!
So many standards. See Jenn Riley’s Metadata Map.
Metadata standards usually include schemas.
Schemas provide the overall structure for the metadata: a set of elements that should be used to describe a dataset.
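As a minimal sketch of what "a set of elements" means in practice, the Python below checks a metadata record against a small list of required elements. The element names and the `missing_elements` helper are hypothetical, chosen only for illustration; they are not taken from ISO 19115, CSDGM, or any other standard.

```python
import json

# Hypothetical minimal schema: element name -> expected Python type.
# These names are illustrative assumptions, not drawn from a real standard.
REQUIRED_ELEMENTS = {
    "title": str,
    "abstract": str,
    "creator": str,
    "date_created": str,  # e.g. "2024-11-20"
    "keywords": list,
}


def missing_elements(record):
    """Return required elements that are absent, empty, or of the wrong type."""
    problems = []
    for name, expected_type in REQUIRED_ELEMENTS.items():
        value = record.get(name)
        if not isinstance(value, expected_type) or not value:
            problems.append(name)
    return problems


# Invented sample record that forgets to say when the data was created.
record = {
    "title": "Street trees",
    "abstract": "Point locations of street trees.",
    "creator": "Example Agency",
    "keywords": ["trees", "urban forestry"],
}

print(missing_elements(record))
print(json.dumps(record, indent=2))
```

Because the record follows a known structure, the same check works for any dataset described with the same elements, which is the practical payoff of standardizing content and syntax.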
Geospatial metadata is just metadata about geospatial datasets, projects, or workflows.
Important: Some elements can be generated by a GIS application, but some require manual entry.
MD iMAP Data Submission Policy (Jan. 2015): “organizations can submit data for inclusion in MD iMAP using the guidelines in this document.”
Policy and Standards for Esri ArcGIS Online (Mar. 2021): “policy and guidance on standards and the use of ArcGIS Online for Maryland (AGOL) by State agencies.”
MD iMAP Data Management Plan (Jan. 2015): “standards and specifications” to improve “data consistency and availability of information.”
EPA Metadata Specifications. See also Ecological Metadata Language (EML)
Federal agencies are encouraged to use ISO 19115: Geographic information - Metadata (a standard developed from 1999 to 2003 to make the 1998 Content Standard for Digital Geospatial Metadata (CSDGM) work with “other formal and defacto standards that support the documentation of geospatial data and services.”)
Varied tools can support the creation of different elements of metadata.
How to FAIR: “how you can make your research data more FAIR by taking you through six FAIRification practices:”
READMEs and data dictionaries are your best friends
Document your data along the way: it saves you time at the end!
Use descriptive file names
Use GIS applications to make metadata
If you’re using geospatial desktop software or web GIS, create your metadata in the platform or software you start in, so that later outputs can inherit it. You can usually export the metadata as XML.
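The "descriptive file names" tip above is easier to follow when the naming scheme is written down and checked automatically. The sketch below assumes a hypothetical convention, `<topic>_<place>_<YYYYMMDD>_v<version>.<extension>`; the pattern, the `parse_file_name` helper, and the sample names are all invented for illustration, and your project may reasonably choose a different scheme.

```python
import re

# Hypothetical naming convention (an assumption for this sketch):
#   <topic>_<place>_<YYYYMMDD>_v<version>.<extension>
# e.g. streettrees_baltimore_20241120_v1.gpkg
PATTERN = re.compile(
    r"^(?P<topic>[a-z0-9]+)_"
    r"(?P<place>[a-z0-9]+)_"
    r"(?P<date>\d{8})_"
    r"v(?P<version>\d+)"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_file_name(name):
    """Return the parts of a file name that follows the convention, else None."""
    match = PATTERN.match(name)
    return match.groupdict() if match else None


for name in ["streettrees_baltimore_20241120_v1.gpkg", "final FINAL (2).shp"]:
    print(name, "->", parse_file_name(name))
```

A name that parses cleanly tells a future reader the topic, place, date, and version without opening the file; a name that returns `None` is a prompt to rename it.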
A README file is a text file containing key information about your data which gives the reader a general understanding of the purpose and history of your data set, how it is organized, and how it can be used. You can think of a README file as a manual for your data.
You can use it to capture components of your data that are not adequately captured in the metadata contained within your geoprocessing application.
While some of the information contained within your README file may overlap with the content you entered in the metadata within your geoprocessing tool, it is still a good idea to create a separate file that lives outside of your geospatial file.
This helps not only when you share your data with others but also when you need to revisit your data yourself in the future.
Furthermore, some geospatial formats (e.g. GeoJSON) and other formats you might save your file as (e.g. CSV) cannot store metadata the way a shapefile or GeoPackage can.
Cover all your bases.
Use templates and checklists!
High-level and essential information, e.g. purpose of the data set, where the files can be found
Geospatial-specific elements, e.g. coordinate reference system, geometry type
Workflow/software environment, e.g. software version, data manipulations which occurred outside of your geoprocessing tool, data version history
File naming and organization, e.g. purpose of key files and file naming scheme
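The four groups of README content listed above can be turned into a reusable template. The following is a minimal sketch, not a required format: the section headings, field names, the `build_readme` helper, and the sample values are all placeholders invented for this example.

```python
# Section headings follow the four groups listed above; the field names and
# sample values are placeholders invented for this sketch.
SECTIONS = [
    ("Overview", ["purpose", "file_location"]),
    ("Geospatial details", ["coordinate_reference_system", "geometry_type"]),
    ("Workflow and software",
     ["software_version", "processing_steps", "version_history"]),
    ("Files and naming", ["key_files", "naming_scheme"]),
]


def build_readme(title, info):
    """Render a plain-text README; fields not yet documented are marked TODO."""
    lines = [title, "=" * len(title), ""]
    for heading, fields in SECTIONS:
        lines.append(heading)
        lines.append("-" * len(heading))
        for field in fields:
            label = field.replace("_", " ").capitalize()
            lines.append(f"{label}: {info.get(field, 'TODO')}")
        lines.append("")
    return "\n".join(lines)


info = {
    "purpose": "Street tree locations for a class exercise",
    "coordinate_reference_system": "EPSG:4326",
    "geometry_type": "Point",
}
readme_text = build_readme("Street trees README", info)
print(readme_text)
```

Marking unfilled fields as TODO makes the template double as a checklist; the resulting text can be saved next to the data, for example with `pathlib.Path("README.txt").write_text(readme_text, encoding="utf-8")`.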
{labelled} package
Practice using the {labelled} package to label variables with labelled::set_variable_labels() and generate a data dictionary with labelled::generate_dictionary().
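To show what a data dictionary contains, independent of any one tool, here is a small Python sketch using only the standard library. It is not the {labelled} workflow and does not reproduce that package's output; the sample CSV, the labels, and the `build_data_dictionary` helper are invented for illustration.

```python
import csv
import io

# Tiny inline CSV standing in for a real file (invented sample data).
SAMPLE_CSV = """tree_id,species,dbh_cm
1,Acer rubrum,23.5
2,Quercus alba,
3,Acer rubrum,41.0
"""

# Human-readable variable labels (hypothetical wording).
LABELS = {
    "tree_id": "Unique tree identifier",
    "species": "Scientific name",
    "dbh_cm": "Diameter at breast height (cm)",
}


def build_data_dictionary(csv_text, labels):
    """Return one entry per column: name, label, missing count, example value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    dictionary = []
    for column in columns:
        values = [row[column] for row in rows]
        non_missing = [value for value in values if value != ""]
        dictionary.append({
            "variable": column,
            "label": labels.get(column, ""),
            "n_missing": len(values) - len(non_missing),
            "example": non_missing[0] if non_missing else "",
        })
    return dictionary


for entry in build_data_dictionary(SAMPLE_CSV, LABELS):
    print(entry)
```

Each entry pairs a terse column name with a plain-language label, which is the core of a data dictionary; counts of missing values and an example value are optional extras that help a reader judge quality at a glance.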
The International Open Data Charter, created in October 2015, sets out six principles for publishing data that can be freely used, reused, and redistributed by anyone, anytime, anywhere:
CARE Principles for Indigenous Data Governance (2019):
Take a look at the prompt in our running course notes document and try evaluating whether a dataset is FAIR (Findable, Accessible, Interoperable, and Reusable).
Persistent identifiers are not just for datasets. They can also help to identify individual researchers or developers. The ORCID iD is one commonly used persistent identifier for researchers: register to get your own ORCID iD.
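One reason persistent identifiers are dependable is that many are machine-checkable. An ORCID iD is 16 characters long, and its final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The sketch below validates the format and check digit only; it does not confirm that an iD is actually registered, and the helper names are made up for this example.

```python
import re


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the shape and check digit of an iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the iD ORCID uses for its fictional sample researcher.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A check digit catches most typing mistakes, such as a single wrong digit, before a broken identifier ends up in your metadata.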