I have a folder containing multiple geodatabases, which I am calling “Multiplegdb”. There are 3 geodatabases in the folder.
Each geodatabase has several feature datasets, and each feature dataset has several feature classes.
I would like to list all the feature datasets and feature classes in each geodatabase in the folder and write them to a CSV file. I would also like to write the metadata from each feature class to the CSV file. The columns of the CSV file will be: GDB Name, Feature Dataset Name, Feature Class Name, Summary, Description, Credits, Date Modified, File Path.
If I were undertaking this task I would first break it into two "halves":
- Reading geodatabases and printing results
- Writing results to CSV
I think this question should focus on just the first of these, and within that start by writing a code snippet that uses ListWorkspaces, ListFeatureDatasets and ListFeatureClasses to print the name of every feature class, in every feature dataset, in every geodatabase, in a single folder.
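As a rough illustration of that first step, the sketch below loops over the three listing functions named above. It assumes file geodatabases (the "FileGDB" workspace type); the function name iter_feature_classes is made up for this example, and arcpy is imported inside the function only so that the file can be loaded on a machine without ArcGIS.

```python
import os


def iter_feature_classes(folder):
    """Yield (gdb_name, feature_dataset_name, feature_class_name) tuples.

    Hypothetical helper, not part of arcpy. An empty dataset name marks a
    feature class stored at the root of the geodatabase.
    """
    import arcpy  # requires ArcGIS; imported lazily on purpose

    arcpy.env.workspace = folder
    # Assumption: only file geodatabases are of interest.
    for gdb in arcpy.ListWorkspaces("*", "FileGDB") or []:
        arcpy.env.workspace = gdb
        # "" lists feature classes that are not inside any feature dataset.
        datasets = [""] + list(arcpy.ListFeatureDatasets() or [])
        for dataset in datasets:
            for fc in arcpy.ListFeatureClasses(feature_dataset=dataset) or []:
                yield os.path.basename(gdb), dataset, fc
```

Calling `for row in iter_feature_classes(r"C:\Multiplegdb"): print(row)` would then print one tuple per feature class; the folder path here is only a placeholder.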
Once that is working, if your geodatabases are not all in one folder then begin to write a code snippet that uses arcpy.da.Walk to walk through your folder structure looking for them.
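A possible shape for that second snippet is sketched here. It assumes that arcpy.da.Walk reports a feature dataset as a directory directly beneath its .gdb folder, which is how file geodatabases are normally walked; the helper name walk_feature_classes is invented for illustration.

```python
import os


def walk_feature_classes(top_folder):
    """Yield (gdb_path, feature_dataset_name, feature_class_name) tuples.

    Hypothetical helper. Feature classes outside a geodatabase (for example
    shapefiles in ordinary folders) are skipped.
    """
    import arcpy  # requires ArcGIS; imported lazily on purpose

    walker = arcpy.da.Walk(top_folder, datatype="FeatureClass")
    for dirpath, dirnames, filenames in walker:
        parent = os.path.dirname(dirpath)
        if dirpath.lower().endswith(".gdb"):
            # Feature classes at the geodatabase root.
            gdb_path, dataset = dirpath, ""
        elif parent.lower().endswith(".gdb"):
            # dirpath is a feature dataset inside a geodatabase.
            gdb_path, dataset = parent, os.path.basename(dirpath)
        else:
            continue
        for name in filenames:
            yield gdb_path, dataset, name
```

Unlike ListWorkspaces, this approach does not need the geodatabases to sit in a single folder, at the cost of having to infer the feature dataset from the path.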
Writing results to CSV is more of a Stack Overflow (Python) question than a question for this site (ArcPy), so while there may be snippets here to do that, if you do not find them, then the place to ask is Stack Overflow.
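For orientation only, the CSV half can be handled with Python's standard csv module. The sketch below assumes each result is a dict keyed by column name; the column names are adapted from the question, and write_inventory is a made-up helper name. Columns missing from a row (such as the metadata fields) are written as empty cells.

```python
import csv

# Column names adapted from the question; adjust as needed.
FIELDNAMES = [
    "GDB Name", "Feature Dataset Name", "Feature Class Name", "Summary",
    "Description", "Credits", "Date Modified", "File Path",
]


def write_inventory(rows, csv_path):
    """Write an iterable of dicts to csv_path; return the number of data rows."""
    count = 0
    # newline="" is what the csv module expects for files it writes.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, restval="")
        writer.writeheader()
        for row in rows:
            writer.writerow(row)  # keys absent from row become empty cells
            count += 1
    return count
```

A row such as `{"GDB Name": "A.gdb", "Feature Dataset Name": "Water", "Feature Class Name": "Lines"}` is enough to produce a valid line; the remaining columns stay blank until metadata is available.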
An overview of working with feature datasets
A feature dataset is a collection of related feature classes that share a common coordinate system. Feature datasets are used to spatially or thematically integrate related feature classes. Their primary purpose is for organizing related feature classes into a common dataset for building a topology, a network dataset, a terrain dataset, or a geometric network.
Using feature datasets
- To add a topology
- To add a network dataset
- To add a geometric network
- To add a terrain dataset
- To add a parcel fabric
There are additional situations in which users apply feature datasets in their geodatabases:
- To organize thematically related feature classes
Sometimes, users will organize a collection of feature classes for a common theme into a single feature dataset. For example, users might have a feature dataset for Water that contains Hydro Points, Hydro Lines, and Hydro Polygons.
Sometimes, users organize data access privileges using feature datasets. All feature classes contained within a feature dataset have the same access privileges. For example, users might need to use more than one feature dataset to segment a series of related feature classes to account for differing access privileges between users. Each group has editing access to one of the feature datasets and its feature classes, but no editing access for the others.
In some data sharing situations, collaborating organizations might agree on a data sharing schema for sharing datasets with other users. In these situations, people might use feature datasets as folders to organize collections of simple feature classes for sharing with others.
Listing all feature datasets and classes from multiple geodatabase into CSV file? - Geographic Information Systems
Creating Feature Datasets & Vector Editing
Most of the projects you will encounter will include data that have already been developed. However, sometimes you may need to create new datasets or alter existing datasets. This section covers making edits to features in the coordinate databases used within ArcGIS.
When you create or alter feature (vector) datasets in ArcGIS, you will be using shapefiles. Shapefiles are the preferred native dataset for ArcGIS. Shapefiles are fully editable within ArcGIS, which means that they can be altered in both their spatial and attribute features. In addition, any other valid vector data sources in ArcGIS projects can be easily converted to shapefiles or geodatabase feature classes, at which time their features can be edited.
The most common "legacy" method of getting data into a GIS has been through the use of digitizing tablets.
Digitizing tablets have been used since the early days of GIS to capture coordinate map data. A digitizer is a special table with a series of wires embedded below the surface. The wires are arranged in tightly spaced horizontal rows and vertical columns. These wires receive signals from the digitizer cursor (which behaves like a mouse), and allow map features to be traced and saved as coordinate data. GIS software is used to transform the table coordinate values to real-world coordinate values.
Typically, a map is taped to the tablet and registered with points of known location ("tics"). Then features on the map are traced as the software "listens" to the communications port to which the digitizer is connected. Special keys on the cursor are used to control the functionality of the digitizer.
Most software applications that have been developed as complete GIS solutions have included support for both tablet and on-screen ("heads-up") digitizing.
ArcGIS supports digitizing in both modes, with a few exceptions. Shapefiles and geodatabase feature classes are the only type of spatial data source files that can be modified by digitizing. Heads-up digitizing of shapefiles and geodatabase feature classes is fully supported. Tablet digitizing is only supported on MS-Windows systems, and is only supported if the Windows drivers are installed for the brand of digitizing tablet which is connected to the machine. Due to time constraints and the lack of enough digitizing tablets, we will not cover tablet digitizing in this course. However, the help files for tablet digitizing in ArcGIS are clear and extensive.
Working with shapefiles & geodatabases
Shapefiles are the most easily managed ArcGIS vector data format. A single shapefile represents a group of points, lines, or polygons. Whereas other data sources (e.g., ArcInfo coverages, CAD drawings) may be composed of multiple feature types, shapefiles are composed of only points, lines, or polygons.
The shapefile is actually a collection of files, rather than a single file. A single shapefile is composed of at least 3 files (where in this example, the name of the shapefile is roads).
- roads.shp: feature geometry (shape and location)
- roads.shx: feature geometry index
- roads.dbf: feature attribute table
In addition to the 3 basic files, there may also be other files:
- roads.sbn: feature spatial index
- roads.sbx: feature spatial index
- roads.ain: feature attribute index
- roads.aix: feature attribute index
- roads.prj: projection and coordinate data
Index files are used to cross-reference spatial features or attributes, and speed up query, processing, and display.
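Because a shapefile is a set of sibling files, a quick check of which parts are present can catch an incomplete copy. The following sketch uses only the extensions listed above; the function name shapefile_parts is an assumption for this example, and it compares extensions exactly as written (lowercase), which may not suit every file system.

```python
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".sbn", ".sbx", ".ain", ".aix", ".prj")


def shapefile_parts(shp_path):
    """Return (present, missing_required) extension lists for one shapefile."""
    path = Path(shp_path)
    # with_suffix swaps only the final extension, e.g. roads.shp -> roads.dbf
    present = [ext for ext in REQUIRED + OPTIONAL if path.with_suffix(ext).exists()]
    missing = [ext for ext in REQUIRED if ext not in present]
    return present, missing
```

An empty second list means the three basic files are all there; anything in it points to a shapefile that most software will refuse to open.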
Shapefiles are useful in ArcGIS because they
- draw quickly (compared to other feature data sources)
- can be created in the application
- can be fully edited in the application
- can be created from other vector data sources
- can be moved across the file structure easily and without corruption
Geodatabases are special types of database files that contain feature geometry, attribute tables, and other tables storing rules and relationships among feature datasets. Geodatabases can store multiple different feature datasets within the same database file, so this makes the geodatabase a convenient and powerful method of storing data. Also, it is possible to store relationships, such as multi-layer topology within the geodatabase. The basic data model for feature layers (point, line, polygon) is used in the geodatabase model. Vector data stored in geodatabases are referred to as feature classes or feature datasets (which are groups of individual feature classes). Rasters can also be stored in geodatabases.
In ArcGIS, there are two types of geodatabase file formats, the personal geodatabase, which is stored as a Microsoft Access MDB file, and the file geodatabase, which is stored in a special ESRI file format.
Creating a new shape layer
In addition to converting shapefiles or geodatabase feature classes from other feature data sources, it is also possible to create shapefiles or feature classes from scratch, using other feature data layers or images only as a visual guide for positional reference. For the rest of this lecture, shapefiles and feature classes will be referred to simply as "feature classes."
When a new feature class is created, the user must decide whether the feature class will represent point, line, or polygon features. You need to determine in advance what the feature type will be for your dataset. The feature class must also be given a name and a place in the file system.
The feature class is then added to the current map document, and is open for editing.
The coordinates of the new features are determined by the extent of the data frame to which the features are added and by the coordinate system of the new dataset. If you are using a new data frame without other layers, the features you add will be placed near the data frame's origin (by default, a new data frame's extent is roughly [(0,0), (1,1)]).
Here, a new point feature class is created:
The new layer is ready for editing, but contains no features or tabular attributes. This is similar to a new spreadsheet or word-processed document: when it has just been created, it is empty. In order to add features to the new shapefile, it needs to be added to an ArcMap document and opened for editing.
Adding shape layer features
Once the new layer is added to the map document and open for editing, you can add features. The Editor toolbar in ArcMap needs to be enabled. Within the Editor toolbar there are a number of different tools for creating and editing features. There are also a number of different editing tasks to choose from. We will cover the most common tools and tasks, but will not have the time to cover all editing tools and tasks.
The different editing tools are on a dropdown list of icons, each of which performs a different editing function. Depending on the application's state, one or more of the tools may be unavailable (grayed-out). The list of tools and their functions is listed here:
sketch: basic drawing
midpoint: create a point at the midpoint of a drawn line
distance-distance: create a point at known distance from 2 other locations
intersection: create a point at the intersection of two existing vectors
endpoint arc: create a circular section with endpoints defined
direction-distance: create a point at a known distance and direction from another location
arc: create a circular section by defining start, midpoint, and end of curve
tangent: extend a segment with a line tangent to the existing segment
trace: create a new feature that traces existing features from the same or another layer
When a new feature class is created, a "bare bones" attribute table is also created. This table will initially contain a single record for each feature and three fields: FID, Shape, and Id. In the following table, one point has been created, and the attribute table is displayed.
The user can add fields to the attribute table (or to any table in the project, for that matter). Fields are added to represent properties of the spatial features. When fields are added, the field name, data type (e.g., short integer, text, blob), length (number of characters), and/or decimal precision must be specified. The new field is appended after the last existing field in the table.
Once the fields are added to the table, values can be populated.
Editing feature classes
Feature classes that have been created from scratch, or from other sources, can be edited. When a new feature class has been created, it will automatically be placed in edit mode. However, any feature class can be edited, assuming the user has write permission to the files and directories on the disk that store the feature class. The following topics illustrate some of the specific edits that can be made to the spatial features of layers.
Before any edits can be made, the feature class must be placed in edit mode. In the Editor toolbar, select Editor > Start Editing from the menu. Any of the data sources in the current data frame that are editable will be open for editing. To decide what layer to edit, select the Target:
You can switch back and forth between different data sources within the same editing session.
Also, it is possible to switch back and forth between different editing tasks. The different editing tasks are mostly self-explanatory.
Once a layer is in edit mode, use the Edit tool to select individual features. Keep the <SHIFT> key pressed to select more than one feature. When features are selected, they will appear in a thick cyan symbol. Here you can see two selected polygons. If a feature is selected, it can be deleted using the <DELETE> key on the keyboard.
To see vertices of individual shapes, press and hold the v key. The shape currently under the pointer and any surrounding shapes will have their vertices exposed. This way you can understand how the shape is constructed.
Use the Reshape Features task and the Sketch tool to draw a new edge for polygonal or linear features. Here are a few simple diagrams showing how polygons and lines are reshaped.
To change the location of individual vertices of a line or polygon, select the Modify Features task and, using the Edit tool, click on the feature whose vertices you want to move. All vertices in the line will be marked with a small square.
Click and drag a vertex to a new position.
If you want to simultaneously edit shared edges, it is necessary to create a topological relationship. For shapefile editing, the topological relationship persists only for a given editing session. For geodatabases, it is possible to have topological rules saved in the geodatabase, so the topological rules are restored each time you edit the feature classes that participate in the topology.
When a topology is active, it is possible to use topology tools to select shared features, then to use the sketch tools and the Reshape Edge or Modify Edge tasks. Here is the same area but a common edge is selected (this simultaneously selects both sets of vertices for both adjacent shapes).
Reshaping the edge alters both adjacent polygons.
Vertices can be deleted by using the Modify Features task. Here one of the vertices is deleted to show that the previous task of modifying the shared edge did indeed change both polygons simultaneously:
Setting the snapping environment
Snapping is used to assure that new features share the common location at endpoints or nodes. Snapping will make the end of a new line join an existing line, either end-to-end, or end-to-side. Snapping is set, either interactively, or in the layer properties, to a certain tolerance. If a new line's endpoint is within the tolerance distance of an existing line, the new line will snap and join the existing line. Features that are being added or modified are subject to snapping rules. The Snapping Environment sets the rules and priorities for snapping.
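The tolerance rule can be illustrated with a few lines of code. This is a deliberately simplified sketch of vertex snapping in planar coordinates, not ArcGIS's own implementation: snap_point is a hypothetical function that moves a point onto the nearest candidate vertex if one lies within the tolerance, and otherwise leaves it where it is.

```python
import math


def snap_point(point, candidates, tolerance):
    """Return the nearest candidate within tolerance, else the original point."""
    best, best_dist = point, tolerance
    for candidate in candidates:
        dist = math.dist(point, candidate)  # straight-line distance
        if dist <= best_dist:
            best, best_dist = candidate, dist
    return best
```

With a tolerance of 1.0, a new endpoint at (0, 0) would snap to an existing vertex at (0.3, 0.4), which is 0.5 units away, but not to one at (3, 4), which is 5 units away.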
Here are two lines being added to a shapefile without snapping:
Splitting lines and polygons
Existing lines can be split using the Line Split tool. Polygons can be split by using the Cut Polygons task.
When splitting lines, click the location on the line where you want the split to occur.
Polygon splitting is similar to line splitting, except that existing polygons are split by a line rather than a single location. To split polygons, it is necessary for the splitting line to start and end outside of the polygon that is to be split.
A single splitting line may split more than one existing line or polygon at a time.
When an existing line or polygon is split, the original feature's attribute record is deleted, and new attribute records are added for each new feature. For geodatabase feature classes there are various policies that can be specified for what happens to attributes when features are split. For shapefiles, new records for split features duplicate the original values from the parent shape.
Here, one of the forest stand polygons is split into two separate polygons.
Here, one of the lines that was added previously is split into two segments.
Updating attributes with Split
When features are split, you can specify how the attributes of the new features are derived from the original features. Each field in the attribute table can be assigned rules of behavior for splitting. Should numeric fields be copied, or given their proportion of the original value? Should string (character) fields be copied, or should the fields be blank for the new features?
ArcGIS provides rules for updating attribute values for features that have been split:
Default value: values in new records are the default value for the field in the feature class attribute domain settings
Duplicate: values in new records are copied from the parent record
Geometry ratio: numeric values are proportional to the original area or length of the feature
Each field in the layer attribute table can have split policies applied.
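To make the three policies concrete, here is a small worked sketch of how one field value could be carried over to the pieces of a split feature. The function split_attribute and the policy labels are invented for this illustration; ratios holds each piece's share of the original area or length and is assumed to sum to 1.

```python
def split_attribute(value, policy, ratios, default=None):
    """Return one attribute value per new feature produced by a split."""
    if policy == "default":
        # Every piece gets the field's default value.
        return [default for _ in ratios]
    if policy == "duplicate":
        # Every piece gets a copy of the parent value.
        return [value for _ in ratios]
    if policy == "geometry_ratio":
        # Numeric value is shared out by each piece's share of area or length.
        return [value * ratio for ratio in ratios]
    raise ValueError(f"unknown split policy: {policy}")
```

For a stand with a timber volume of 100 split into pieces covering 25% and 75% of the area, the geometry ratio policy gives 25 and 75, while the duplicate policy would give 100 to both.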
Merging features with Union
In addition to splitting features, ArcGIS allows more than one feature to be merged. Features to be merged must be part of a selected set.
Lines that meet at the same point are joined into a single line with a single attribute record.
Polygons that overlap or share a common boundary are joined into a single polygon with a single attribute record. Polygons that do not overlap and are not contiguous may also be merged into a single polygon with a single record. In this way, the feature class differs from other vector datasets: feature classes support single polygons consisting of more than one spatial object.
Here, two forest stand polygons are unioned. The new polygon has a single attribute record.
If the polygons to be unioned are not adjacent, the features can still be unioned. Before:
Updating attributes with Merge
Union simply joins the geometries of the selected set and generates a new blank record. When features are merged, the original attribute records are deleted and a new attribute record is created. As with split, policies can be used for setting values for the new record's attributes.
Default value: values in new records are the default value for the field in the feature class attribute domain settings
Duplicate: values in new records are copied from the parent record
Geometry weighted: numeric values are proportional to the original area or length of the feature
When features are merged, you need to select which feature will set the attribute values for the new feature.
The geometry of merge is identical to that of union, but in this merge, you can see the record has obtained values from one of the parent features, rather than being blank as in the union above.
More editing operations
A few more editing operations are available for polygon features. Here are some generalized features used to illustrate the operations. A single shapefile is composed of a circle overlapping a rectangle.
- Clip (discarding the area that intersects): clipping removes the area of overlap between two polygons. The polygon that is selected at the time acts as the "cookie cutter," removing area from the overlapping polygon(s). Here, after the clipping operation, the rectangle has been moved to show the effect of the operation. The image on the left shows the rectangle as the clipper; the image on the right shows the circle as the clipper.
Any edits made to feature classes can be reverted by selecting Edit > Undo from the menu or using the keystroke combination <CTRL-Z>. All edits are able to be undone, up until the last save was performed, or up to the creation of the feature class if the feature class is new and has never been saved.
If you have finished editing a feature class, you can choose to save edits. It is also a good idea to save edits on a frequent basis in case of system problems. Any edits that are saved are written to the disk as part of the feature class's structure.
You will be prompted to save edits if you attempt to stop editing.
You will also be prompted to save changes if you attempt to close the map document, open another project, or close ArcGIS.
National Hydrography Dataset
The National Hydrography Dataset (NHD) represents the water drainage network of the United States with features such as rivers, streams, canals, lakes, ponds, coastline, dams, and streamgages. The NHD is the most up-to-date and comprehensive hydrography dataset for the Nation.
National Hydrography Dataset (NHD)
The most current version of the National Hydrography Dataset, the NHD High Resolution, is mapped at a scale of 1:24,000 or larger scale (1:63,360 or larger scale in Alaska). These data are updated and maintained through Stewardship partnerships with states and other collaborative bodies. The NHD High Resolution, along with the Watershed Boundary Dataset (WBD) and 3D Elevation Program (3DEP) data, is used to create the NHDPlus High Resolution.
The file geodatabase download maintains the richness of the NHD complex database model, including multiple feature datasets, feature classes, event feature classes, attribute tables, relationship classes, domains, and feature-level metadata. The shapefile download simplifies this structure by containing all of the feature classes as separate shapefiles and tables as separate data files.
NHD Data Model Overview
The NHD file geodatabase download contains NHD data in the Hydrography feature dataset. It also includes the WBD in a second feature dataset.
NHD Line features
NHDFlowline is the fundamental flow network consisting predominantly of stream/river and artificial path vector features. It represents the spatial geometry, carries the attributes, and contains linear referencing measures for locating features or “events” on the network. Additional NHDFlowline features are canal/ditch, pipeline, connector, underground conduit, and coastline.
NHDLine contains linear features not core to the network.
NHD Area features
Waterbodies such as lake/pond features are represented in NHDWaterbody. They portray the spatial geometry and the attributes of the feature. These water polygons may have NHDFlowline artificial paths drawn through them to allow the representation of water flow direction. Other NHDWaterbody features are swamp/marsh, reservoir, playa, estuary, and ice mass.
NHDArea contains many additional water-polygon features. One of the more important is the stream/river feature. It represents the areal extent of the water in a wide stream/river with a basic set of attributes. These polygons typically encompass NHDFlowline artificial paths that represent the stream network. Artificial path carries the critical attributes of the stream/river, whereas NHDArea represents the geometric extent.
NHD Point features
NHDPoint contains hydrography related point features.
NHDPointEventFC, NHDLineEventFC, and NHDAreaEventFC represent point, line, and area data events that behave as map features and linearly referenced events. Streamgages, which are point features, can be displayed and identified in the network through linear referencing with a network address.
Information about the NHD also can be obtained in a series of associated tables. This includes metadata stored in NHDFeaturetoMetadata and NHDMetadata, sources given in NHDSourceCitation, identification of model and data version given in NHDProcessingParameters, flow relations given in NHDFlow, reach code histories given in NHDReachCrossReference, the domain of feature codes given in NHDFCode, and others.
Legacy Medium Resolution NHD (1:100,000)
In the late 1990s, the USGS and the US EPA collaborated to produce the medium resolution National Hydrography Dataset at 1:100,000 scale for the conterminous U.S. In the early 2000s, the US EPA assumed the role of primary custodian for the NHD Medium Resolution to support their applications and those of other medium resolution users, while the USGS, U.S. Forest Service, and additional partners initiated the production of the NHD at 1:24,000 scale or better. More background regarding the development history of NHD and related datasets may be found in the document Making the Digital Water Flow: The Evolution of Geospatial Surfacewater Frameworks.
Today, the US EPA manages the maintenance and distribution of NHD Medium Resolution as part of the NHDPlus Version 2 suite of products, which can be downloaded from the EPA NHDPlus website.
Mineral Resources Data System (MRDS)
MRDS is a collection of reports describing metallic and nonmetallic mineral resources throughout the world. Included are deposit name, location, commodity, deposit description, geologic characteristics, production, reserves, resources, and references. It subsumes the original MRDS and MAS/MILS. MRDS is large, complex, and somewhat problematic. This service provides a subset of the database comprised of those data fields deemed most useful and which most frequently contain some information, but full reports of most records are available as well.
Current status: As of 2011, USGS has ceased systematic updates to MRDS, and is working to create a new database, focused primarily on the conterminous US. For locations outside the United States, MRDS remains the best collection of reports that USGS has available. For locations in Alaska, the Alaska Resource Data File remains the most coherent collection of such reports and is in continuing development.
Resource descriptions here include an indication of the overall quantity and diversity of information they contain. Many records in this database are simple reports of commodity at some location, but some records provide substantial detail of the geological setting and industrial exploitation of the resource. To help users find these more thorough records, a map interface and search form are provided that rank results by overall quality, records graded A having more information about more aspects of the resource, records graded D having only summary information about the resource. Records graded B and C are intermediate between these, and records graded E generally lack bibliographic references.
GETTING STARTED - The HydroLink Tool aids in generating a linear reference for point data to the NHDPlusV2.1 Medium Resolution and NHD High Resolution hydrography layers. The tool employs the researcher’s local, field-based knowledge of the geospatial features where data were collected to improve both the spatial accuracy of the research data and assist in the linkage to these national surface water datasets.
Once the point location data are verified and linear referencing is complete, the output dataset includes the verified latitude and longitude coordinates and additional attributes to document the linkage to the hydrography datasets. The dataset can then be exported as a shapefile (SHP) from the tool, exported in multiple formats through ArcGIS Online [e.g. comma separated values (CSV), file geodatabase (FGDB), and GeoJSON], or published as a web service (e.g. map services with WMS enabled).
- Create a shapefile from your data. Zip the shapefile. Alternatively create a CSV or Excel file without zipping the data. When creating files for upload follow the requirements under the "Add Shapefile" or "Add CSV/Excel" buttons.
- Click the "Add Shapefile" or "Add CSV/Excel" button to upload your file. ArcGIS Online will create a feature service from your file. When the upload is complete, you will see the new feature service appear on the HydroLink dashboard.
- Click "Edit". This opens the dataset in edit view.
- Zoom in until the NHD Hydrography Layers become visible.
- Verify spatial location using imagery as a guide. If needed, move a point by clicking it once to highlight it, then hover over the highlighted point. When the cursor changes to a hand, click on the point without releasing, drag it to the correct location, and release.
- Link data to the NHD. Click “Locate”. Holding Control, hover over the High Resolution flowline where the point should be referenced and click. The tool shows reference points on both NHD networks using boxes (Red for High Resolution, Yellow for Medium Resolution). Verify locations of both reference points and select “Update” to insert data into the attribute table. The tool automatically advances to the next point. Note: As a default “Locate” identifies reference points for both versions of the NHD. If needed, uncheck a version of the NHD to only “Locate” data against one version of NHD at a time. Also the options for “no reach for this point” can be selected for either or both of the NHD networks to represent a point that was sampled in a location where the NHD networks do not have modeled flowlines.
- When all points are verified, click “Return to Dashboard” in the upper right.
- Click "Export" to download the data in shapefile form to a local machine. This dataset now has the value-added attributes of verified coordinates and linkage to the NHD datasets.
MY ARCGIS ONLINE FEATURE SERVICES - HydroLink only functions using correctly formatted Feature Services. Clicking “Add Shapefile” will automatically create a feature service in your ArcGIS Online account with the correct data structure to work against. Below is a list of all feature services in your personal ArcGIS Online account. Click the "Edit" button to begin editing a dataset, click the "View" button to view your dataset in ArcGIS Online, or click the “Export” button to download your final data in shapefile format.
Specification Name: DCAT-US Schema v1.1 (Project Open Data Metadata Schema)
Latest version: This version
Publication date: November 6th 2014
This section contains guidance to support the use of the Project Open Data metadata to list agency datasets and application programming interfaces (APIs) as hosted at agency.gov/data. Additional technical information about the schema can be found on the Metadata Resources page.
Standard Metadata Vocabulary
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource (NISO 2004, ISBN: 1-880124-62-9). The challenge is to define and name standard metadata fields so that a data consumer has sufficient information to process and understand the described data. The more information that can be conveyed in a standardized regular format, the more valuable data becomes. Metadata can range from basic to advanced, from allowing one to discover the mere fact that a certain data asset exists and is about a general subject all the way to providing detailed information documenting the structure, processing history, quality, relationships, and other properties of a dataset. Making metadata machine readable greatly increases its utility, but requires more detailed standardization, defining not only field names, but also how information is encoded in the metadata fields.
Establishing a common vocabulary is the key to communication. The metadata schema specified in this memorandum is based on DCAT, a hierarchical vocabulary specific to datasets. This specification defines three types of metadata elements: Required, Required-if (conditionally required), and Expanded fields. These elements were selected to represent information that is most often looked for on the web. To assist users of other metadata standards, field mappings to equivalent elements in other standards are provided.
What to Document – Datasets and Web APIs
A dataset is an identifiable collection of structured data objects unified by some criteria (authorship, subject, scope, spatial or temporal extent…). A catalog is a collection of descriptions of datasets; each description is a metadata record. The intention of a data catalog is to facilitate data access by users who are first interested in a particular kind of data, and upon finding a fit-for-purpose dataset, will next want to know how to get the data.
A Web API (Application Programming Interface) allows computer programs to dynamically query a dataset using the World Wide Web. For example, a dataset of farmers markets may be made available for download as a single file (e.g., a CSV), or may be made available to developers through a Web API, such that a computer program could use a ZIP Code to retrieve a list of farmers markets in the ZIP Code area.
The catalog file for each agency should list all of the agency’s datasets that can be made public, regardless of whether they are distributed by a file download or a Web API. Please also see the extended guidance on documenting Web APIs in your data.json files.
Metadata File Format – JSON
The Implementation Guidance available as a part of Project Open Data describes Agency requirements for the development of metadata as per the Open Data Policy. A quick primer on the file format involved:
Where optional fields are included in a catalog file but are unpopulated, they may be represented by a null value. They should not be represented by an empty string ( "" ).
When a record has an accessURL or downloadURL, they should be contained as objects within a distribution. Any object may be described by title, description, format, or mediaType, though when an object contains downloadURL, it must be accompanied by mediaType.
The Project Open Data schema is case sensitive. The schema uses a camel case convention where the first letter of some words within a field are capitalized (usually all words but the first one). While it may seem subtle which characters are uppercase and lowercase, it is necessary to follow the exact same casing as defined in the schema documented here. For example:
Links to downloadable examples of metadata files developed in this and other formats are in the metadata resources. Tools to help agencies produce and maintain their data inventories are available on Labs.Data.gov.
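To illustrate the rule above that unpopulated optional fields should be null rather than an empty string, here is a minimal Python sketch. The helper name `blank_to_null` and the sample record are hypothetical; the only library behaviour relied on is that the standard `json` module serializes `None` as `null`.

```python
import json


def blank_to_null(record):
    """Return a copy of record with empty-string values replaced by None."""
    return {key: (None if value == "" else value) for key, value in record.items()}


# Hypothetical record with an unpopulated optional field.
record = {"title": "Farmers Markets", "issued": ""}
cleaned = blank_to_null(record)

# None is written out as JSON null, not as "".
print(json.dumps(cleaned))
```

The helper only looks at top-level values; nested objects such as distribution entries would need the same treatment applied recursively.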
Catalog Fields
These fields describe the entire Public Data Listing catalog file. Publishers can also use the describedBy field to reference the default JSON Schema file used to define the schema (https://project-open-data.cio.gov/v1.1/schema/catalog.json) or they may refer to their own JSON Schema file if they have extended the schema with additional schema definitions. Similarly, @context can be used to reference the default JSON-LD Context used to define the schema (https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld) or publishers can refer to their own if they have extended the schema with additional linked data vocabularies. See the Catalog section under Further Metadata Field Guidance for more details.
|Field||Label||Definition||Required|
|@context||Metadata Context||URL or JSON object for the JSON-LD Context that defines the schema used.||No|
|@id||Metadata Catalog ID||IRI for the JSON-LD Node Identifier of the Catalog. This should be the URL of the data.json file itself.||No|
|@type||Metadata Type||IRI for the JSON-LD data type. This should be dcat:Catalog for the Catalog.||No|
|conformsTo||Schema Version||URI that identifies the version of the Project Open Data schema being used.||Always|
|describedBy||Data Dictionary||URL for the JSON Schema file that defines the schema used.||No|
|dataset||Dataset||A container for the array of Dataset objects. See Dataset Fields below for details.||Always|
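The catalog-level fields in the table above can be assembled as follows. This is a sketch only: the function name `build_catalog` is hypothetical, the dataset passed in is deliberately incomplete, and the URLs are the version 1.1 values quoted in this guidance.

```python
import json

# Version 1.1 schema URI as quoted in this guidance.
SCHEMA_BASE = "https://project-open-data.cio.gov/v1.1/schema"


def build_catalog(datasets):
    """Wrap a list of dataset dicts in the catalog-level fields."""
    return {
        "@context": SCHEMA_BASE + "/catalog.jsonld",   # optional
        "@type": "dcat:Catalog",                       # optional
        "conformsTo": SCHEMA_BASE,                     # always required
        "describedBy": SCHEMA_BASE + "/catalog.json",  # optional
        "dataset": list(datasets),                     # always required
    }


# Hypothetical, incomplete dataset used only to show the container shape.
catalog = build_catalog([{"title": "Farmers Markets"}])
print(json.dumps(catalog, indent=2))
```

The optional @id field is left out here because it should be the URL of the agency's own data.json file, which only the publisher knows.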
Dataset Fields
See the Further Metadata Field Guidance section to learn more about the use of each element, including the range of valid entries where appropriate. Consult the field mappings to find the equivalent v1.0, DCAT, Schema.org, and CKAN fields.
|Field||Label||Definition||Required|
|@type||Metadata Type||IRI for the JSON-LD data type. This should be dcat:Dataset for each Dataset.||No|
|title||Title||Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery.||Always|
|description||Description||Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest.||Always|
|keyword||Tags||Tags (or keywords) help users discover your dataset; please include terms that would be used by technical and non-technical users.||Always|
|modified||Last Update||Most recent date on which the dataset was changed, updated or modified.||Always|
|publisher||Publisher||The publishing entity and optionally their parent organization(s).||Always|
|contactPoint||Contact Name and Email||Contact person’s name and email for the asset.||Always|
|identifier||Unique Identifier||A unique identifier for the dataset or API as maintained within an Agency catalog or database.||Always|
|accessLevel||Public Access Level||The degree to which this dataset could be made publicly-available, regardless of whether it has been made available. Choices: public (Data asset is or could be made publicly available to all without restrictions), restricted public (Data asset is available under certain use restrictions), or non-public (Data asset is not available to members of the public).||Always|
|bureauCode USG||Bureau Code||Federal agencies, combined agency and bureau code from OMB Circular A-11, Appendix C (PDF, CSV) in the format of 015:11.||Always|
|programCode USG||Program Code||Federal agencies, list the primary program related to this data asset, from the Federal Program Inventory. Use the format of 015:001.||Always|
|license||License||The license or non-license (i.e. Public Domain) status with which the dataset or API has been published. See Open Licenses for more information.||If-Applicable|
|rights||Rights||This may include information regarding access or restrictions based on privacy, security, or other policies. This should also serve as an explanation for the selected “accessLevel” including instructions for how to access a restricted file, if applicable, or explanation for why a “non-public” or “restricted public” data asset is not “public,” if applicable. Text, 255 characters.||If-Applicable|
|spatial||Spatial||The range of spatial applicability of a dataset. Could include a spatial region like a bounding box or a named place.||If-Applicable|
|temporal||Temporal||The range of temporal applicability of a dataset (i.e., a start and end date of applicability for the data).||If-Applicable|
|distribution||Distribution||A container for the array of Distribution objects. See Dataset Distribution Fields below for details.||If-Applicable|
|accrualPeriodicity||Frequency||The frequency with which the dataset is published.||No|
|conformsTo||Data Standard||URI used to identify a standardized specification the dataset conforms to.||No|
|dataQuality USG||Data Quality||Whether the dataset meets the agency’s Information Quality Guidelines (true/false).||No|
|describedBy||Data Dictionary||URL to the data dictionary for the dataset. Note that documentation other than a data dictionary can be referenced using Related Documents ( references ).||No|
|describedByType||Data Dictionary Type||The machine-readable file format (IANA Media Type also known as MIME Type) of the dataset’s Data Dictionary ( describedBy ).||No|
|isPartOf||Collection||The collection of which the dataset is a subset.||No|
|issued||Release Date||Date of formal issuance.||No|
|language||Language||The language of the dataset.||No|
|landingPage||Homepage URL||This field is not intended for an agency’s homepage (e.g. www.agency.gov), but rather if a dataset has a human-friendly hub or landing page that users can be directed to for all resources tied to the dataset.||No|
|primaryITInvestmentUII USG||Primary IT Investment UII||For linking a dataset with an IT Unique Investment Identifier (UII).||No|
|references||Related Documents||Related documents such as technical information about a dataset, developer documentation, etc.||No|
|systemOfRecords USG||System of Records||If the system is designated as a system of records under the Privacy Act of 1974, provide the URL to the System of Records Notice related to this dataset.||No|
|theme||Category||Main thematic category of the dataset.||No|
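One way to use the dataset table above is as a checklist. The following sketch assumes a dataset is held as a plain Python dict; the function name `dataset_problems` and the sample record are hypothetical, and the check covers only presence of the "Always" fields and the allowed accessLevel values, not the full JSON Schema.

```python
# Fields marked "Always" in the dataset table above.
ALWAYS_REQUIRED = (
    "title", "description", "keyword", "modified", "publisher",
    "contactPoint", "identifier", "accessLevel",
)
# Additionally required for U.S. Federal Government agencies.
FEDERAL_REQUIRED = ("bureauCode", "programCode")
ACCESS_LEVELS = ("public", "restricted public", "non-public")


def dataset_problems(dataset, federal=False):
    """Return a list of human-readable problems found in one dataset dict."""
    required = ALWAYS_REQUIRED + (FEDERAL_REQUIRED if federal else ())
    problems = []
    for field in required:
        value = dataset.get(field)
        if value is None or value == "" or value == []:
            problems.append("missing required field: " + field)
    level = dataset.get("accessLevel")
    if level is not None and level not in ACCESS_LEVELS:
        problems.append("invalid accessLevel: " + repr(level))
    return problems


# Hypothetical, deliberately incomplete record.
sample = {"title": "Farmers Markets", "accessLevel": "open"}
for problem in dataset_problems(sample):
    print(problem)
```

Field names are compared exactly as written, which matches the case-sensitive nature of the schema: a key spelled `accesslevel` would be reported as missing.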
Dataset Distribution Fields
Within a dataset, distribution is used to aggregate the metadata specific to a dataset’s resources (accessURL and downloadURL), which may be described using the following fields. Each distribution should contain one accessURL or downloadURL. A downloadURL should always be accompanied by mediaType.
|Field||Label||Definition||Required|
|@type||Metadata Type||IRI for the JSON-LD data type. This should be dcat:Distribution for each Distribution.||No|
|accessURL||Access URL||URL providing indirect access to a dataset, for example via API or a graphical interface.||If-Applicable|
|conformsTo||Data Standard||URI used to identify a standardized specification the distribution conforms to.||No|
|describedBy||Data Dictionary||URL to the data dictionary for the distribution found at the downloadURL . Note that documentation other than a data dictionary can be referenced using Related Documents as shown in the expanded fields.||No|
|describedByType||Data Dictionary Type||The machine-readable file format (IANA Media Type or MIME Type) of the distribution’s describedBy URL.||No|
|description||Description||Human-readable description of the distribution.||No|
|downloadURL||Download URL||URL providing direct access to a downloadable file of a dataset.||If-Applicable|
|format||Format||A human-readable description of the file format of a distribution.||No|
|mediaType||Media Type||The machine-readable file format (IANA Media Type or MIME Type) of the distribution’s downloadURL .||If-Applicable|
|title||Title||Human-readable name of the distribution.||No|
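The two distribution rules stated above (each distribution should contain an accessURL or a downloadURL, and a downloadURL should always be accompanied by mediaType) can be checked with a few lines of Python. The function name `distribution_problems` is hypothetical and the sample entry reuses the vegetables example URL from this guidance.

```python
def distribution_problems(distribution):
    """Return a list of rule violations for one distribution dict."""
    problems = []
    has_access = bool(distribution.get("accessURL"))
    has_download = bool(distribution.get("downloadURL"))
    if not (has_access or has_download):
        problems.append("needs an accessURL or a downloadURL")
    if has_download and not distribution.get("mediaType"):
        problems.append("downloadURL must be accompanied by mediaType")
    return problems


# A download entry that is missing its mediaType.
entry = {"downloadURL": "http://www.agency.gov/vegetables/listofvegetables.csv"}
print(distribution_problems(entry))

# Adding the media type resolves the problem.
entry["mediaType"] = "text/csv"
print(distribution_problems(entry))
```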
Extending the Schema
“Extensional” and/or domain-specific metadata can easily be added using other vocabularies, even if it is not a term (entity/property) that will get indexed by the major search engines; it could still be indexed by other custom search engines and by Data.gov. Publishers are encouraged to extend their metadata descriptions using elements from the “Expanded Fields” list shown below, or from any well-known vocabulary (including Dublin Core, Schema.org, FGDC, ISO 19115, and NIEM) as long as they are properly assigned. It’s also recommended that these extensions be defined through the describedBy and @context fields at the top of the Catalog metadata.
Further Metadata Field Guidance
Additional details for each field are provided here broken down into sections for the overarching Catalog, each dataset, and each dataset’s distribution. Consult the field mappings to find the equivalent v1.0, DCAT, Schema.org, and CKAN fields.
- Required if Applicable
- Expanded (optional)
Field: @context
Cardinality: (0,1)
Required: No
Accepted Values: String (URL)
Usage Notes: The URL or JSON object for the JSON-LD Context that defines the schema used. The URL for version 1.1 of the schema is https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Example: {"@context": "https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld"}

Field: @id
Cardinality: (0,1)
Required: No
Accepted Values: String (IRI)
Usage Notes: A unique identifier for the Catalog as defined by JSON-LD Node Identifiers. This should be the URL of the data.json file itself.
Example: {"@id": "https://www.agency.gov/data.json"}

Field: @type
Cardinality: (0,1)
Required: No
Accepted Values: String (IRI)
Usage Notes: The metadata type as defined by JSON-LD data types. This should be dcat:Catalog for the Catalog.
Example:

Field: conformsTo
Cardinality: (1,1)
Required: Yes, always
Accepted Values: String (URI)
Usage Notes: This is used to identify the schema version using a URI. The URI for version 1.1 of the schema is https://project-open-data.cio.gov/v1.1/schema
Example: {"conformsTo": "https://project-open-data.cio.gov/v1.1/schema"}

Field: describedBy
Cardinality: (0,1)
Required: No
Accepted Values: String (URL)
Usage Notes: This is used to specify a JSON Schema file that defines all fields. By default, it is recommended that the canonical JSON Schema file is referenced (https://project-open-data.cio.gov/v1.1/schema/catalog.json) but if the schema had been extended, publishers may reference a file that defines those extensions.
Example: {"describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json"}

Field: dataset
Cardinality: (1,n)
Required: Yes, always
Accepted Values: Array of Objects
Usage Notes: This field is a container for an array of Dataset objects. See Dataset Fields below for details.
Example:
Field: @type
Cardinality: (0,1)
Required: No
Accepted Values: String (IRI)
Usage Notes: The metadata type as defined by JSON-LD data types. This should be dcat:Dataset for the Dataset.
Example:

Field: accessLevel
Cardinality: (1,1)
Required: Yes, always
Accepted Values: Must be one of the following: “public”, “restricted public”, “non-public”
Usage Notes: This field refers to the degree to which this dataset could be made available to the public, regardless of whether it is currently available to the public. For example, if a member of the public can walk into your agency and obtain a dataset, that entry is public even if there are no files online. A restricted public dataset is one only available under certain conditions or to certain audiences (such as researchers who sign a waiver). A non-public dataset is one that could never be made available to the public for privacy, security, or other reasons as determined by your agency.
Example:

Field: accrualPeriodicity
Cardinality: (0,1)
Required: No
Accepted Values: ISO 8601 Repeating Duration (or irregular)
Usage Notes: Must be an ISO 8601 repeating duration unless this is not possible because the accrual periodicity is completely irregular, in which case the value should simply be irregular. The value should not include a start or end date but rather simply express the duration of time between data publishing. For example, a dataset which is published on an annual basis would be R/P1Y; every three months would be R/P3M; weekly would be R/P1W; and daily would be R/P1D. Further examples and documentation can be found here.
Example: {"accrualPeriodicity":"R/P1Y"}

Field: bureauCode
Cardinality: (0,n)
Required: Yes, for United States Federal Government agencies
Accepted Values: Array of Strings
Usage Notes: Represent each bureau responsible for the dataset according to the codes found in OMB Circular A-11, Appendix C (PDF, CSV). Start with the agency code, then a colon, then the bureau code.
Example: The Office of the Solicitor (86) at the Department of the Interior (010) would be: {"bureauCode":["010:86"]}. If a second bureau was also responsible, the format would look like this: {"bureauCode":["010:86","010:04"]}.

Field: conformsTo
Cardinality: (0,1)
Required: No
Accepted Values: String (URI)
Usage Notes: This is used to identify a standardized specification the dataset conforms to. If this is a technical specification associated with a particular serialization of a distribution, this should be specified with conformsTo at the distribution level. It’s recommended that this be a URI that serves as a unique identifier for the standard. The URI may or may not also be a URL that provides documentation of the specification.
Example: {"conformsTo": "http://www.agency.gov/common-vegetable-analysis-model/"}

Field: contactPoint
Cardinality: (1,1)
Required: Yes, always
Accepted Values: vCard object
Usage Notes: This is a container for two fields that together make up the contact information for the dataset. contactPoint should always contain both the person’s appropriately formatted full name (fn) and email (hasEmail).
Example: See below

Field: contactPoint → @type
Cardinality: (0,1)
Required: No
Accepted Values: String (IRI)
Usage Notes: The metadata type as defined by JSON-LD data types. This should be vcard:Contact for contactPoint.
Example:

Field: contactPoint → fn
Cardinality: (1,1)
Required: Yes, always
Accepted Values: String
Usage Notes: This should be included with hasEmail as part of a record’s contactPoint (see above example).
Example:

Field: contactPoint → hasEmail
Cardinality: (1,1)
Required: Yes, always
Accepted Values: String
Usage Notes: This should be formatted per vCard specifications (see example below) and included with fn as part of a record’s contactPoint (see above example).
Example:

Field: dataQuality
Cardinality: (0,1)
Required: No
Accepted Values: Must be a boolean value of true or false (not contained within quote marks)
Usage Notes: Indicates whether a dataset conforms to the agency’s information quality guidelines.
Example:

Field: describedBy
Cardinality: (0,1)
Required: No
Accepted Values: String (URL)
Usage Notes: This is used to specify a data dictionary or schema that defines fields or column headings in the dataset. If this is a machine readable file, it’s recommended to be specified with describedBy at the distribution level along with the associated describedByType. At the dataset level it’s assumed to be a human readable HTML webpage or PDF document. Documentation that is not specifically a data dictionary belongs in “references”.
Example: {"describedBy": "http://www.agency.gov/vegetables/definitions.pdf"}

Field: describedByType
Cardinality: (0,1)
Required: No
Accepted Values: String (IANA Media Type)
Usage Notes: This is used to identify the media type (IANA Media Type also known as MIME Type) of the URL used for the dataset’s describedBy field. This should be specified if describedBy is not an HTML webpage.
Example: {"describedByType": "application/pdf"}

Field: description
Cardinality: (1,1)
Required: Yes, always
Accepted Values: String
Usage Notes: This should be human-readable and understandable to an average person.
Example:

Field: distribution
Cardinality: (0,n)
Required: Yes, if the dataset has an accessURL or downloadURL.
Accepted Values: Array of Objects
Usage Notes: This is a container for one or multiple distribution objects which group together the fields: accessURL, conformsTo, downloadURL, describedBy, describedByType, description, format, mediaType, and title.
Example: See below

Field: distribution → @type
Cardinality: (0,1)
Required: No
Accepted Values: String (IRI)
Usage Notes: The metadata type as defined by JSON-LD data types. This should be dcat:Distribution for each distribution.
Example:

Field: distribution → accessURL
Cardinality: (0,1)
Required: Yes, if the file is accessible indirectly, through means other than direct download.
Accepted Values: String (URL)
Usage Notes: This should be the URL for an indirect means of accessing the data, such as API documentation, a ‘wizard’ or other graphical interface which is used to generate a download, feed, or a request form for the data. When accessLevel is “restricted public” but the dataset is available online indirectly, this field should be the URL that provides indirect access. This should not be a direct download URL. It is usually assumed that accessURL is an HTML webpage.
Example: {"accessURL":"http://www.agency.gov/api/vegetables/"}

Field: distribution → conformsTo
Cardinality: (0,1)
Required: No
Accepted Values: String (URI)
Usage Notes: This is used to identify a standardized specification the distribution conforms to. It’s recommended that this be a URI that serves as a unique identifier for the standard. The URI may or may not also be a URL that provides documentation of the specification.
Example: {"conformsTo": "http://www.agency.gov/vegetables-data-standard/"}

Field: distribution → downloadURL
Cardinality: (0,1)
Required: Yes, if the file is available for public download.
Accepted Values: String (URL)
Usage Notes: This must be the direct download URL. Other means of accessing the dataset should be expressed using accessURL. This should always be accompanied by mediaType.
Example: {"downloadURL":"http://www.agency.gov/vegetables/listofvegetables.csv"}

Field: distribution → describedBy
Cardinality: (0,1)
Required: No
Accepted Values: String (URL)
Usage Notes: This is used to specify a data dictionary or schema that defines fields or column headings in the distribution. If this is a machine readable file the media type should be specified with describedByType; otherwise it’s assumed to be a human readable HTML webpage.
Example: {"describedBy": "http://www.agency.gov/vegetables/schema.json"}

Field: distribution → describedByType
Cardinality: (0,1)
Required: No
Accepted Values: String (IANA Media Type)
Usage Notes: This is used to identify the media type (IANA Media Type also known as MIME Type) of the URL used for the distribution’s describedBy field. This is especially important if describedBy is a machine readable file.
Example: {"describedByType": "application/schema+json"}

Field: distribution → description
Cardinality: (0,1)
Required: No
Accepted Values: String
Usage Notes: This should be a human-readable description of the distribution.
Example:

Field: distribution → format
Cardinality: (0,1)
Required: No
Accepted Values: String
Usage Notes: This should be a human-readable description of the file format of the dataset that provides useful information that might not be apparent from mediaType. Note that API should always be used to distinguish web APIs.
Example:

Field: distribution → mediaType
Cardinality: (0,1)
Required: Yes, if the file is available for public download.
Accepted Values: String (IANA Media Type)
Usage Notes: This must describe the exact files available at downloadURL using a media type (IANA Media Type also known as MIME Type). For common Microsoft Office files, see Office Open XML MIME types.
Example: {"mediaType":"text/csv"}

Field: distribution → title
Cardinality: (0,1)
Required: No
Accepted Values: String
Usage Notes: This should be a useful title for the distribution. Acronyms should be avoided.
Example:

Field: identifier
Cardinality: (1,1)
Required: Yes, always
Accepted Values: String
Usage Notes: This field allows third parties to maintain a consistent record for datasets even if title or URLs are updated. Agencies may integrate an existing system for maintaining unique identifiers. Each identifier must be unique across the agency’s catalog and remain fixed. It is highly recommended that a URI (preferably an HTTP URL) be used to provide a globally unique identifier. Identifier URLs should be designed and maintained to persist indefinitely regardless of whether the URL of the resource itself changes.
Example: {"identifier":"http://dx.doi.org/10.7927/H4PZ56R2"}

Field: isPartOf
Cardinality: (0,1)
Required: No
Accepted Values: String
Usage Notes: This field allows the grouping of multiple datasets into a “collection”. This field should be employed by the individual datasets that together make up a collection. The value for this field should match the identifier of the parent dataset.
Example: {"isPartOf":"http://dx.doi.org/10.7927/H4PZ56R2"}

Field: issued
Cardinality: (0,1)
Required: No
Accepted Values: ISO 8601 Date
Usage Notes: Dates should be ISO 8601 of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset.
Example:

Field: keyword
Cardinality: (1,n)
Required: Yes, always
Accepted Values: Array of strings
Usage Notes: Surround each keyword with quotes. Separate keywords with commas. Avoid duplicate keywords in the same record.
Example:

Field: landingPage
Cardinality: (0,1)
Required: No
Accepted Values: String (URL)
Usage Notes: This field is not intended for an agency’s homepage (e.g. www.agency.gov), but rather if a dataset has a human-friendly hub or landing page that users can be directed to for all resources tied to the dataset.
Example: {"landingPage":"http://www.agency.gov/vegetables"}

Field: language
Cardinality: (0,n)
Required: No
Accepted Values: Array of strings
Usage Notes: This should adhere to the RFC 5646 standard. This language subtag lookup provides a good tool for checking and verifying language codes. A language tag is comprised of either one or two parts, the language subtag (such as en for English, es for Spanish, wo for Wolof) and the regional subtag (such as US for United States, GB for Great Britain, MX for Mexico), separated by a hyphen. Regional subtags should only be provided when needed to distinguish a language tag from another one (such as American vs. British English).
Example: {"language":["en-US"]} or if multiple languages,

Field: license
Cardinality: (0,1)
Required: Yes, if applicable
Accepted Values: String (URL)
Usage Notes: See list of license-free declarations and licenses.
Example: {"license":"http://creativecommons.org/publicdomain/zero/1.0/"}

Field: modified
Cardinality: (1,1)
Required: Yes, always
Accepted Values: ISO 8601 Date
Usage Notes: Dates should be ISO 8601 of highest resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. If there is a need to reflect that the dataset is continually updated, ISO 8601 formatting can account for this with repeating intervals. For instance, R/P1D for daily, R/P2W for every two weeks, and R/PT5M for every five minutes.
Example: {"modified":"2012-01-15"} or {"modified":"R/P1D"}

Field: primaryITInvestmentUII
Cardinality: (0,1)
Required: No
Accepted Values: String
Usage Notes: Use to link a given dataset with its related IT Unique Investment Identifier, which can often be found in Exhibit 53 documents.
Example:

Field: programCode
Cardinality: (0,n)
Required: Yes, for United States Federal Government Agencies
Accepted Values: Array of strings
Usage Notes: Provide an array of programs related to this data asset, from the Federal Program Inventory.
Example: {"programCode":["015:001"]} or if multiple programs,

Field: publisher
Cardinality: (1,1)
Required: Yes, always
Accepted Values: Object
Usage Notes: This is a container for a publisher object which groups together the fields: name and subOrganizationOf. The subOrganizationOf field can also contain a publisher object, which allows one to describe an organization’s hierarchy. Where greater specificity is desired, include as many levels of publisher as is useful, in ascending order, using the below format.
Example: See below

Field: publisher → @type
Cardinality: (0,1)
Required: No
Accepted Values: String (IRI)
Usage Notes: The metadata type as defined by JSON-LD data types. This should be org:Organization for each publisher.
Example:

Field: publisher → name
Cardinality: (1,1)
Required: Yes, always
Accepted Values: String
Usage Notes: The plaintext name of the entity publishing this dataset.
Example:

Field: publisher → subOrganizationOf
Cardinality: (0,1)
Required: No
Accepted Values: publisher object
Usage Notes: A parent organizational entity described using the same publisher object fields.
Example: "subOrganizationOf": {"name": "General Services Administration", "subOrganizationOf": {"name": "U.S. Government"}}

Field: references
Cardinality: (0,n)
Required: No
Accepted Values: Array of strings (URLs)
Usage Notes: Enclose each URL within strings. Separate multiple URLs with a comma.
Example: {"references":["http://www.agency.gov/legumes/legumes_data_documentation.html"]} or if multiple URLs, {"references":["http://www.agency.gov/legumes/legumes_data_documentation.html","http://www.agency.gov/fruits/fruit_data_documentation.html"]}

Field: rights
Cardinality: (0,1)
Required: Yes, if accessLevel is “restricted public” or “non-public”
Accepted Values: String
Usage Notes: This may include information regarding access or restrictions based on privacy, security, or other policies. This should also serve as an explanation for the selected “accessLevel” including instructions for how to access a restricted file, if applicable, or explanation for why a “non-public” or “restricted public” data asset is not “public,” if applicable. If the dataset can be made available through a website indirectly, use accessURL for the URL that provides such access.
Example:

Field: spatial
Cardinality: (0,1)
Required: Yes, if the dataset is spatial
Accepted Values: See Usage Notes
Usage Notes: This field should contain one of the following types of content: (1) a bounding coordinate box for the dataset represented in latitude / longitude pairs where the coordinates are specified in decimal degrees and in the order of: minimum longitude, minimum latitude, maximum longitude, maximum latitude; (2) a latitude / longitude pair (in decimal degrees) representing a point where the dataset is relevant; (3) a geographic feature expressed in Geography Markup Language using the Simple Features Profile; or (4) a geographic feature from the GeoNames database.
Example:

Field: systemOfRecords
Cardinality: (0,1)
Required: No
Accepted Values: String (URL)
Usage Notes: This field should be a URL to the System of Records Notice (SORN) that relates to the dataset, specifically from FederalRegister.gov.
Example: {"systemOfRecords":"https://www.federalregister.gov/articles/2002/04/08/02-7376/privacy-act-of-1974-publication-in-full-of-all-notices-of-systems-of-records-including-several-new#p-361"}

Field: temporal
Cardinality: (0,1)
Required: Yes, if applicable
Accepted Values: ISO 8601 Date
Usage Notes: This field should contain an interval of time defined by the start and end dates for which the dataset is applicable. Dates should be formatted as pairs of start and end datetimes in the ISO 8601 format. ISO 8601 specifies that datetimes can be formatted in a number of ways, ranging from a simple four-digit year (e.g., 2013) to a much more specific YYYY-MM-DDTHH:MM:SSZ, where the T specifies a separator between the date and time and time is expressed in 24 hour notation in the UTC (Zulu) time zone (e.g., 2011-02-14T12:00:00Z/2013-07-04T19:34:00Z). Use a solidus (“/”) to separate start and end times. If there is a need to define the start or end of applicability using a duration rather than a date, ISO 8601 formatting can account for this with duration based intervals. For instance, applicability starting in January 2010 and continuing for one month could be represented as 2010-01/P1M or 2010-01/2010-02. However, when possible, full dates are preferred for both start and end times.
Example: {"temporal":"2000-01-15T00:45:00Z/2010-01-15T00:06:00Z"} or {"temporal":"2000-01-15T00:45:00Z/P1W"}

Field: theme
Cardinality: (0,n)
Required: No
Accepted Values: Array of strings
Usage Notes: Separate multiple categories with a comma. Could include ISO Topic Categories.
Examples: {"theme":["vegetables"]} or if multiple categories,

Field: title
Cardinality: (1,1)
Required: Yes, always
Accepted Values: String
Usage Notes: Acronyms should be avoided.
Example:
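Several of the value formats described in the field guidance above lend themselves to simple automated checks. The sketch below is an assumption-laden illustration rather than part of the schema: the regular expressions are inferred from the examples given here (015:11, 015:001, R/P1Y), the accrualPeriodicity pattern is deliberately simplified and does not cover every ISO 8601 repeating duration, and all names are hypothetical.

```python
import re

# Patterns inferred from the examples in this guidance (assumptions).
BUREAU_CODE = re.compile(r"\d{3}:\d{2}")    # e.g. 010:86
PROGRAM_CODE = re.compile(r"\d{3}:\d{3}")   # e.g. 015:001
REPEATING = re.compile(r"R/P\d+[YMWD]")     # simplified: R/P1Y, R/P3M, R/P1W, R/P1D


def valid_accrual(value):
    """Accept the literal 'irregular' or a simple repeating duration."""
    return value == "irregular" or REPEATING.fullmatch(value) is not None


def split_temporal(value):
    """Split a temporal interval on the solidus into (start, end)."""
    start, sep, end = value.partition("/")
    if not (sep and start and end):
        raise ValueError("temporal must look like start/end: " + repr(value))
    return start, end


print(BUREAU_CODE.fullmatch("010:86") is not None)
print(valid_accrual("R/P3M"))
print(split_temporal("2000-01-15T00:45:00Z/2010-01-15T00:06:00Z"))
```

The helper only separates the two halves of a temporal value; validating that each half is a well-formed ISO 8601 date or duration would need a dedicated parser.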
Federal Government Fields
USG: Fields specific to the U.S. Federal Government have been denoted with the USG superscript. The Project Open Data schema has been developed as part of a U.S. Federal Government open data policy. However, every attempt has been made to align the schema with existing international standards and to provide opportunities for re-use and interoperability with state and local government as well as non-profits, academic institutions, and businesses. There are, however, some fields that have been introduced specifically for use by the U.S. Federal Government and have special meaning in that context. These fields are: bureauCode, programCode, dataQuality, primaryITInvestmentUII, and systemOfRecords. Non-federal data publishers are encouraged to make use of this schema, but these fields should not be seen as required and may not be relevant for those entities.
Rationale for Metadata Nomenclature
We sought to be platform-independent and to align as much as possible with existing open standards.
To that end, our JSON key names are directly drawn from DCAT, with a few exceptions.
We added the accessLevel field to help easily sort datasets into our three existing categories: public, restricted public, and non-public. This field means an agency can run a basic filter against its enterprise data catalog to generate a public-facing list of datasets that are, or could one day be, made publicly available (or, in the case of restricted data, available under certain conditions). This field also makes it easy for anyone to generate a list of datasets that could be made available but have not yet been released by filtering accessLevel to public and accessURL to blank.
We added the rights field (formerly accessLevelComment) for data stewards to explain how to access restricted public datasets, and for agencies to have a place to record (even if only internally) the reason for not releasing a non-public dataset.
We added the systemOfRecords field for data stewards to optionally link to a relevant System of Records Notice URL. A System of Records is a group of any records under the control of any agency from which information is retrieved by the name of the individual or by some identifying number, symbol, or other identifier assigned to the individual.
We added the bureauCode field to ensure every dataset is connected in a standard way with an agency bureau.
We added the programCode field to ensure that when applicable, every dataset is connected in a standard way with an agency program office.
We added the dataQuality field to indicate whether or not the data meets an agency’s Information Quality Guidelines.
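The accessLevel filter mentioned above (accessLevel set to public, accessURL blank) can be sketched in a few lines of Python. This is only an illustration: the catalog entries, titles, and the function name `unreleased_public` are made up, and a real enterprise data catalog may store these fields differently.

```python
# Hypothetical catalog entries; only accessLevel and accessURL come from the schema text.
catalog = [
    {"title": "A", "accessLevel": "public", "accessURL": "https://example.gov/a.csv"},
    {"title": "B", "accessLevel": "public", "accessURL": ""},
    {"title": "C", "accessLevel": "restricted public", "accessURL": ""},
    {"title": "D", "accessLevel": "non-public"},
]


def unreleased_public(datasets):
    """Return datasets marked public that have no accessURL yet."""
    return [
        d for d in datasets
        if d.get("accessLevel") == "public" and not d.get("accessURL")
    ]


titles = [d["title"] for d in unreleased_public(catalog)]
print(titles)  # ['B']
```

Using `d.get(...)` treats a missing accessURL the same as a blank one, which seems to match the intent of "could be made available but have not yet been released".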
This spreadsheet lists all of the post-fire aerial photography projects flown by the Southwestern Region. With projects dating back to 1972, this imagery serves as an important record of fire extent and damage. Geospatial features describing the photo centers and footprints for these projects can be accessed in the Historical Aerial Photography feature service above, and the imagery can be requested from the Aerial Photography Field Office.
Constructed Features (Southwestern Region)
Ecological Response Units (Southwestern Region)
Ecological Response Units (Version 5.3)
Ecological Response Units - The purpose of this feature class is to be an ecosystem mapping tool across all of Arizona and New Mexico.
Ecological Response Units (ERUs) facilitate landscape analyses and planning. The framework represents all major ecosystem types of the southwest region, and represents a stratification of biophysical themes. ERUs are used to define historic/reference conditions within a mapping unit by integrating site potential (soil physical and chemical properties, geology, geomorphology, aspect, slope, climate variables, and geographic location), fire regime (historic and contemporary), neighboring vegetation communities, and seral state sequence.
The shapefile data is tiled into four tiles: Arizona North, Arizona South, New Mexico North, New Mexico South.
Climate Change Vulnerability Assessment
Forest Health - Insect Disease (Southwestern Region)
Forest Orders (Southwestern Region)
Posted as a map service Daily
Forest order map. Forest orders are areas of the forest where entry or use is restricted for safety and/or resource protection.
Title 36, Code of Federal Regulations, Chapter II, Subpart B, may close an area to entry or may restrict the use of an area by applying any or all of the prohibitions authorized by the code of regulations.
Forest Planning (Individual Forest)
Apache Sitgreaves Management Area 2015
The purpose of this feature class is to depict and label a spatially contiguous land area identified within a planning area (the administrative boundary). The feature class will be used to provide planning information for any necessary analysis.
General Terrestrial Ecosystem Survey (Southwestern Region)
General Terrestrial Ecosystem Survey
General Terrestrial Ecosystem Survey - (United States Forest Service Southwest Region)
The data set is composed of polygon features denoting soil condition, erosion hazard, revegetation potential and vegetation cover. The scale is 1:250000. This data set was created for the USFS General Terrestrial Ecosystem Survey. Its purpose is to delineate the locations and areas of varying GTES characteristics.
Invasive (Southwestern Region)
The Invasive Plants, Invertebrate, Pathogen, and Vertebrate (Invasive) feature class contains all the Invasive Infestation polygons collected by the National Invasive Plant Inventory Protocol. Includes most recent as well as historic observations. Includes Site ID, Plant code, status etc. for the infesting species, date, area and other basic data.
At this time, this dataset consists mostly of plants.
Land (Southwestern Region)
Other National Designated Area
PLSS - Public Land Survey System
Section - An area defined by the Public Lands Survey System Grid. Normally, 36 sections make up a township.
Township - An area defined by the Public Lands Survey System grid that is referenced by its tier and range numbers, and is normally a rectangle approximately 6 miles on a side with boundaries conforming to meridians and parallels.
Special Interest Management Area
Surface Ownership Dissolve
Recreation (Southwestern Region)
Recreation Site Point and Recreation Site Polygon
Depicts developed recreation sites such as trailheads, fishing sites, picnic areas, and campgrounds.
Recreation Opportunity Settings
Depicts the spatial location of areas showing the type of Recreation Opportunity Settings that exist (Existing Condition) either without over-snow travel or uses (summer season) or year-round when no seasonal variation exists.
RMU - Range Management Units (Southwestern Region)
RMU_Unit, RMU_SubUnit, RMU_WHB
RMU_Unit (Allotments) - Depicts the gross grazing management area (allotment) boundaries, range general resource area boundaries, and wild horse territories boundaries.
RMU_SubUnit (Pastures) - Depicts grazing implementation monitoring area boundaries within each Pasture.
RMU_WHB (Wild Horse and Burro Areas) - Depicts Wild Horse and Burro Areas
Terrestrial Ecological Unit Inventory (Southwestern Region)
TEU, Terrestrial Ecological Unit
Terrestrial Ecological Unit Inventory. Potential Natural Vegetation and Soil Class. The Land Type Map Component Vegetation, Soil, Geology (LT_MapCompVegSoilGeology) feature class has classification information for Vegetation (potential and existing), soil, geology, geomorphology, ecological types and miscellaneous classifications. Classifications are displayed in order of dominance for each type (PNV, Soil, etc.) along with the percentage based on aggregating component actual percents (null values are treated as 0). If there are multiple classifications per component, that component percent is divided by the number of classifications that are attached to that component, and then that percent is aggregated up to the map unit.
Map Symbol | Comp | Pct | Vegetation Class
22 | 1 | 45 | VegClass1
22 | 2 | 30 | VegClass2
22 | 3 | 25 | VegClass1
Aggregated Vegetation Class:
VegClass1 - 70 pct
VegClass2 - 30 pct
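The aggregation rule described above can be sketched in Python as follows. This is a minimal reading of the text, not the agency's implementation: the function name `aggregate_classes` and the input layout (one tuple of percent and class list per component) are assumptions, and the class names are written with consistent capitalization.

```python
from collections import defaultdict


def aggregate_classes(components):
    """Aggregate component percents up to one map unit.

    components: list of (component_pct, [class names]) tuples.
    A null (None) percent is treated as 0; a component with several
    classifications splits its percent evenly among them.
    """
    totals = defaultdict(float)
    for pct, classes in components:
        pct = pct or 0
        if not classes:
            continue
        share = pct / len(classes)
        for name in classes:
            totals[name] += share
    # Order of dominance: largest aggregated percent first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Map symbol 22 from the example: three components, one class each.
map_unit_22 = [(45, ["VegClass1"]), (30, ["VegClass2"]), (25, ["VegClass1"])]
result = aggregate_classes(map_unit_22)
print(result)  # [('VegClass1', 70.0), ('VegClass2', 30.0)]
```

The output reproduces the 70 pct / 30 pct split given for map symbol 22.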
Transportation (Southwestern Region)
MVUM (Motor Vehicle Use Map) Data
Motor Vehicle Use Map (MVUM) data. These MVUM maps are valid as of the publication date and show the routes that have been designated as open to motorized vehicles under the Travel Management Rule (36 CFR 212, Subpart B, Designation of Roads, Trails, and Areas for Motor Vehicle Use). Routes not designated for motor vehicle use (such as non-motorized trails, single-purpose roads and trails, unauthorized roads and trails, and temporary roads and trails) are not included.
The data also identify the specific types of motorized vehicles allowed on the designated roads and trails and their seasons of use. These data represent the following symbol classes of roads and trails: Open to All Vehicles, Open to Highway Legal Vehicles, and Roads with Seasonal Designations.
MVUM (Motor Vehicle Use Map) Downloadable Maps
Vegetation (Southwestern Region)
INREV (OSU Institute Natural Resources Existing Vegetation)
Existing vegetation mapping provides basic information on the current condition of vegetation structure and composition. Beginning in 2004 the Southwestern Region developed Mid-Scale Existing Vegetation Mapping on all National Forests and Grasslands (Mellin et al. 2008). The Southwestern Region collaborated with OSU's Institute of Natural Resources to develop new mid-scale mapping with the INREV project. Mid-scale mapping is compliant with agency technical guidance for existing vegetation (Brohman and Bryant 2005, Nelson et al. 2015). For business needs of natural resource organizations, existing vegetation mapping represents an important component in an overall inventory, monitoring, and analysis framework.
Mid-Scale Existing Vegetation Canopy Cover Map Units
Canopy cover map units of trees for tree life form polygons or shrubs for shrub life form polygons. The Southwestern Region existing vegetation mapping program (R3-EVP) is intended to meet the needs of forest plan revisions, national fire planning, and other landscape-level analyses by providing a consistent, region-wide dataset that depicts existing vegetation at the mid-scale level.
Mid-Scale Existing Vegetation Dominance Type Map Units
Polygons of dominance type in the map units. Dominance types are defined by the species or genera of greatest abundance, usually of the uppermost canopy of the plant community. The Southwestern Region existing vegetation mapping program (R3-EVP) is intended to meet the needs of forest plan revisions, national fire planning, and other landscape-level analyses by providing a consistent, region-wide dataset that depicts existing vegetation at the mid-scale level.
Mid-Scale Existing Vegetation Life Form
Polygons representing the dominant life form. The Southwestern Region existing vegetation mapping program (R3-EVP) is intended to meet the needs of forest plan revisions, national fire planning, and other landscape-level analyses by providing a consistent, region-wide dataset that depicts existing vegetation at the mid-scale level.
Mid-Scale Existing Vegetation Size Map Units
Diameter class map unit of dominant trees for tree life form polygons or shrub height class for shrub life form polygons. The Southwestern Region existing vegetation mapping program (R3-EVP) is intended to meet the needs of forest plan revisions, national fire planning, and other landscape-level analyses by providing a consistent, region-wide dataset that depicts existing vegetation at the mid-scale level.
Riparian Existing Vegetation
This feature class contains attribute information from four different vegetation maps (Lifeform Type, Leaf Retention Type, Canopy Cover, and Size Class) and statistics on NDVI and lidar.
For this project canopy height and cover data were derived from lidar data found within the Prescott National Forest and used in mapping tree size. The lidar and other predictor variables from imagery were used to segment the study area into objects with similar characteristics for use in vegetation mapping. Vegetation mapping attributes were then added to each segment as classified by Random Decision Forest classifier. A 20 meter buffer from the RMAP boundary was created as a study area. Two final products are presented, this product that has been clipped to the RMAP boundary with map features designed to meet a minimum size of .25 hectares and a second that includes the 20 meter buffer and has no minimum map unit.
Riparian Potential Vegetation
Riparian Potential Vegetation - Potential Riparian plant communities across Forests and Grasslands of the US Forest Service Southwestern Region. Riparian Potential Vegetation is derived from the Ecological Response Units (ERUs) layer.
Ecological Response Units (ERUs) facilitate landscape analyses and planning. The framework represents all major ecosystem types of the southwest region, and represents a stratification of biophysical themes. ERUs are used to define historic/reference conditions within a mapping unit by integrating site potential (soil physical and chemical properties, geology, geomorphology, aspect, slope, climate variables, and geographic location), fire regime (historic and contemporary), neighboring vegetation communities, and seral state sequence.
Wild Land Fire Perimeters (Southwestern Region)
Fire History Occurrences (Points) and Perimeters (Polygons)
The FireOccurrence point layer represents ignition points from which individual wildland fires started. Data are maintained at the Forest/District level, or their equivalent, to track the occurrence and the origin of individual wildland fires. Records in FireOccurrence include historical fire point records from a variety of sources. Since 1986, FIRESTAT, the Fire Statistics System computer application, has been the authoritative data source for all wildland fire occurrences on National Forest System Lands or National Forest-Protected Lands. FIRESTAT is used by the USFS to enter and maintain information from the Individual Wildland Fire Report (FS-5100-29).
The FirePerimeter polygon layer represents final mapped wildland fire perimeters. Incidents of 10 acres or greater in size are expected. Incidents smaller than 10 acres in size may also be included. Data are maintained at the Forest/District level, or their equivalent, to track the area affected by wildland fire. Records in FirePerimeter include perimeters for wildland fires that have corresponding records in FIRESTAT, which is the authoritative data source for all wildland fire reports. FIRESTAT, the Fire Statistics System computer application, required by the USFS for all wildland fire occurrences on National Forest System Lands or National Forest-protected lands, is used to enter and maintain information from the Individual Fire Report (FS-5100-29).
Wildfire Perimeters > 100 acres
Wildland Urban Interface (WUI) Areas (Southwestern Region)
Wildland Urban Interface (WUI) Areas Adjacent to Forest Service Lands in the Southwestern Region of the Forest Service
Wildlife (Southwestern Region)
Legacy Forest Datasets
Links to the legacy GIS datasets on the individual Forest web pages. We are in the process of updating our website and transferring data to this Regional page from the Forest pages.
Links to global, national and FAO legacy maps (scans), also soil profiles and reports, soil degradation, management, biodiversity.
Scans of legacy maps over the whole world, downloadable (and on CD-ROM) and with metadata. To be usable in GIS these must be georeferenced, the linework digitized, and a linked attribute database created. Reference: Panagos, P., Jones, A., Bosco, C., Senthil Kumar P.S. (2011): European digital archive on soil maps (EuDASM): preserving important soil data for public free access. International Journal of Digital Earth 4(5): 434-443. DOI:10.1080/17538947.2011.596580. An expanded collection of scanned soil maps is available on-line at ISRIC.
Collects datasets from World Data Centres and others. ISRIC is a contributor. Querying for "soil map" returns > 2600 datasets.
A data broker to find datasets. Search terms include soil properties, soil moisture and temperature, soil chemistry, soil C. Can filter search by country/geography and data access conditions.
- Catalogue of soil parameter data sets from the Global Change Master Directory hosted by Goddard Space Flight Center (NASA). A very large catalogue (over 1000 links) of soil parameters (e.g. C, CEC, heat budget, macrofauna ...) with links to the actual data and metadata.
from the Distributed Active Archive Center for Biogeochemical Dynamics of the Oak Ridge National Laboratory (USA), intended for global modelling of biogeochemistry.
from the International Soil Modeling Consortium (ISMC).
Collections relevant to modelling soil processes (hydrology, soil physics, pedogenesis) at scales from macropore to continental.
data access, from ISCN
"a science-based network that facilitates data sharing, assembles databases, identifies gaps in data coverage, and enables spatially explicit assessments of soil carbon in context of landscape, climate, land use, and biotic variables".
(NCSCDv2), a spatial dataset created for the purpose of quantifying storage of organic carbon in soils of the northern circumpolar permafrost region, from the Bolin Centre of Stockholm University. Includes downloadable points (to 3 m), grids and polygons of C stocks.
Method is explained in this paper.
Includes properties from Open Land Map, the Soil and Landscape Grid of Australia, the NASA-USDA Enhanced SMAP Global Soil Moisture Data. More datasets are regularly added.
Unofficial GEE datasets, contributed by "the community" of open data. See the "Geophysical" tab. Includes Geomorpho90, Soil Grids 250 v2.0, iSDAsoil Predicted Soil Properties for Africa 30m, Polaris 30m Probabilistic Soil Properties US, HiHydroSoil v2.0, Soil Organic Carbon Stocks & Trends South Africa. More datasets are regularly added.
"Access and interactive visualizations of the Terabytes of high resolution data (1 km, 250 m or better) produced by the OpenGeoHub Foundation and/or contributing organizations."
Search in the "Themes and Datasets" panel for "Soil Properties and Classes". These are produced by machine-learning methods based on a large number of point observations and covariate layers, as explained here.
from FAO, originally developed by IIASA with contributions from many partners, notably ISRIC and the Chinese Academy of Sciences.
The Harmonized World Soil Database is a 30 arc-second raster database with over 15000 different soil mapping units that combines existing regional and national updates of soil information worldwide (SOTER, ESD, Soil Map of China, WISE) with the information contained within the 1:5 000 000 scale FAO-UNESCO Soil Map of the World (FAO, 1971--1981).
Mostly USA, but includes analytical data for about 1,100 pedons from other countries. The above link describes the dataset; it can be downloaded from the NCSS Soil Characterization Database page.
From the OpenLandMap project. Points collected from various sources, provided under open data license, with detailed instructions on how to use the data. Data sources and formats explained here. Two sets: chemical and physical properties.
HC27: Generic/Prototypical Soil Profiles
27 soil profiles generated in formats compatible with the DSSAT and APSIM crop simulation models. Not a georeferenced set; modellers have to decide which of the generic profiles to use, based on soil maps or field survey. Developed by IFPRI.
Scanned maps, with metadata, from ISRIC's collection, processed by the European Soil Bureau. Uneven quality, uneven coverage, but a gold mine of historical maps. Separate DVD-ROM and online access for Africa, Asia, Canada, Caribbean Islands, Europe, Latin America, the USA. Reference: Panagos, P., Jones, A., Bosco, C., Senthil Kumar P.S. (2011): European digital archive on soil maps (EuDASM): preserving important soil data for public free access. International Journal of Digital Earth 4(5): 434-443. DOI:10.1080/17538947.2011.596580
from Cranfield University (England)
"[A]n archive and catalogue of all substantial soil surveys, reports and maps made overseas, with particular reference to those by British companies and personnel, to provide a safe repository for endangered copies, and to make the accrued information widely available for consultation by interested parties." Materials are scanned as requested and, once scanned, are free for download. Excellent metadata and search interface; about 23k items catalogued.
From Tom Hengl at OpenGeoHub. "A public compendium of global, regional, national and sub-national soil samples and/or soil profile datasets (points with Observations and Measurements of soil properties and characteristics)."
Finding datasets for current events can be tricky. Fortunately, some publications have started releasing the datasets they use in their articles.
- FiveThirtyEight is a news and sports site with data-driven articles. They make their datasets openly available on GitHub.
- BuzzFeed became (in)famous for their listicles and superficial pieces, but they've since expanded into investigative journalism. Their datasets are available on GitHub.