Access Data

TERRA-REF is producing reference datasets that include direct measurements and sensor observations, derived plant phenotypes, and genetic and genomic data.

See our documentation for more information.

Published, Public Domain Data: 
We publish data based on user demand. Our first publication included data from Seasons 4 and 6. Phenotype data and metadata have been curated on Dryad. The Dryad record also references larger sensor and genomics files that can be found at the ncsa#terra-public endpoint on Globus. These can be cited as:

LeBauer, D.S., Burnette, M.A., Demieville, J., Fahlgren, N., French, A.N., Garnett, R., Hu, Z., Huynh, K., Kooper, R., Li, Z., Maimaitijiang, M., Mao, J., Mockler, T.C., Morris, G.S., Newcomb, M., Ottman, M., Ozersky, P., Paheding, S., Pauli, D., Pless, R., Qin, W., Riemer, K., Rohde, S., Rooney, W.L., Sagan, V., Shakoor, N., Stylianou, A., Thorp, K., Ward, R., White, J.W., Willis, C., and Zender, C.S. (2020). TERRA-REF, An Open Reference Data Set From High Resolution Genomics, Phenomics, and Imaging Sensors. Dryad Digital Repository. http://doi.org/10.5061/dryad.4b8gtht99

See the Dryad README for a more comprehensive description of this dataset.

Datasets

TERRA-REF has produced the following types of data:

  1. Sensor Data from five thermal, light, and shape imaging sensors.
  2. Phenotypes include both sensor-derived and hand collected plot-level field measurements. These can be used to validate, calibrate, and train algorithms.
  3. Environmental data include time series of meteorological variables including temperature, relative humidity, precipitation, wind direction and speed, photosynthetically active radiation, and downwelling spectral radiance.
  4. Genomics data include whole-genome resequencing data for 384 varieties from the sorghum Bioenergy Association Panel (BAP) and genotyping-by-sequencing data for 768 sorghum Recombinant Inbred Lines (RILs). Raw data are DNA sequence files in compressed FASTQ format; derived data are available in Variant Call Format (VCF) and HapMap files.

Sensor Data

Data generated by the following sensors are included in the public domain data. Additional sensors not represented in the first data release are listed in the section on additional sensors below.

Sensor Name | Model | Technical Specifications
Imaging Sensors
Stereo RGB Camera | Allied Vision Prosilica GT3300C |
Laser Scanner | Custom Fraunhofer 3D | Spatial resolution: 0.3 to 0.9 mm
Thermal Infrared | FLIR A615 | Thermal sensitivity: < 50 mK @ 30 °C
PS II Camera | LemnaTec PS II Fluorescence Prototype | Illumination: 635 nm at 4000 μmol/m²/s; camera: 50 fps
Environmental Sensors
Environmental Sensors | Thies Clima 4.9200.00.000 |
VNIR Spectrometer | Spectral Evolution PSR+3500 | Range: 350 to 800 nm
PAR Sensor | Quantum SQ–300 | Spectral range: 410 to 655 nm

Sensor Data Products

Over 500 TB of sensor data are available and are categorized as raw, Level 1, and Level 2. This size could be substantially reduced by removing duplicate information and through compression.

Sensor data are stored on the Storage Condo at the National Center for Supercomputing Applications in Urbana, Illinois, and we make them available for download with the Globus file transfer system. To access them:

  1. Get an account at globus.org.
  2. Search for the terra-public endpoint.
  3. Install the Globus Connect Personal application and transfer the data.

Further information is provided in the data access chapter of the TERRA-REF documentation. As an alternative, the data can be provided on hard drives for the cost of supplies, labor, and shipping.

Below is a summary of the sensor data products included in the first release of TERRA-REF data. Sensor-derived phenotypes described in the Phenotype Data section were generated from the 3D laser scanner and RGB camera sensors as described in metadata/methods.csv.

Data Product | Sensor | Algorithm | File Format | Plot Clip | Full Field
Environment | Thies Clima | envlog2netcdf | netcdf | NA | NA
Thermal Image | FLIR | ir_geotiff | geotiff | + |
Point Cloud | Fraunhofer Laser 3D | laser3d_las | las | + |
Point Cloud | Fraunhofer Laser 3D | scanner3DTop | ply | |
Images Time-Series | PSII Camera | ps2png | png | |
Color Images | RGB Stereo | bin2tiff | geotiff | + | +
Plant Mask | RGB Stereo | rgb_mask | geotiff | | x
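The Plant Mask product (rgb_mask) stores images in which soil appears as black pixels, as described for the /rgb_mask directory. One simple way to summarize such a mask is a canopy cover fraction: the share of pixels that are not black. The Python sketch below assumes the mask has already been decoded into rows of (R, G, B) tuples (reading the GeoTIFF itself needs a raster library, not shown) and treats every non-black pixel as plant; fill or no-data pixels, if present, would need separate handling. The function name is hypothetical, and this is an illustration only, not necessarily how the canopy cover CSV files in /rgb_fullfield were produced.

```python
from typing import Iterable, Sequence, Tuple

# Assumption: a decoded mask pixel is an (R, G, B) tuple with 0-255 channels.
Pixel = Tuple[int, int, int]
SOIL = (0, 0, 0)  # soil is stored as black pixels in the plant mask


def canopy_cover(mask_rows: Iterable[Sequence[Pixel]]) -> float:
    """Return the fraction of mask pixels that are not soil (not pure black)."""
    total = 0
    plant = 0
    for row in mask_rows:
        for pixel in row:
            total += 1
            if tuple(pixel) != SOIL:
                plant += 1
    if total == 0:
        raise ValueError("mask contains no pixels")
    return plant / total


# Usage with a tiny hypothetical 2 x 2 mask: one plant pixel out of four.
example = [[(0, 0, 0), (34, 139, 34)], [(0, 0, 0), (0, 0, 0)]]
print(canopy_cover(example))  # 0.25
```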

Sensor Data Directory Contents on Globus

The following list describes the organization and contents of the Storage Condo server that can be accessed at the ncsa#terra-public endpoint on Globus. Directory names have a leading / while file names do not.

  • Environment Logger
    • /envlog_netcdf
      • Daily aggregated files named envlog_netcdf_L1_ua-mac_[YYYY-MM-DD].nc.
      • There are also 24 hourly files for each day named [YYYY-MM-DD_HH-MM-SS]_environmentlogger.nc.
  • Laser3D
    • /laser3d_las
      • One merged file per scan across the short (E-W) axis with names ending in _merged.las. There are typically 50-100 of these each day.
    • /laser3d_las_plot
      • Each directory has the name of one plot, and there is one LAS file clipped to the plot boundaries for each scan (there may be more than one scan per day).
  • RGB Stereo:
    • /rgb_geotiff
      • File names ending in _left.tif and _right.tif represent simultaneous images from left and right stereo pair cameras.
    • /rgb_mask
      • These images represent soil as black pixels. For each left-camera image in the RGB Geotiff dataset, there is a mask image, with a file name ending in _left_mask.tif, in which black pixels mark areas that contain soil rather than plants.
    • /rgb_geotiff_plots
      • For each RGB Geotiff image, a Geotiff file with the same dimensions as the plot. It contains the image clipped to the plot boundaries as well as fill values for parts of the plot not in the image.
    • /rgb_fullfield
      • The key data product is one full-resolution, full-field image per scan.
      • Other files include: lower resolution versions of the full field (files with names ending in _10pct.tif, _thumb.tif and .png); CSV files containing canopy cover values for each plot; a JSON file listing images contained in the fullfield mosaic; a VRT file that is a “virtual geotiff” that was used to generate the full-field mosaic.
      • These full-field Geotiff images are RGB images and image masks tiled together into a full-field view. They are not orthomosaics: the images are tiled rather than stitched, because stitching causes geometric aberrations.
  • PSII Camera:
    • /ps2_png:
      • 101 .png files per folder. The order of the images is indicated by the last four digits of the file name, i.e. _0000.png to _0100.png.
      • 101 georeferenced Geotiff files otherwise identical to the PNG counterparts.
      • These files represent a time series of images captured at a rate of 50 frames per second.
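The naming conventions above can be turned into small helpers when scripting against a local copy of the data. The following Python sketch (standard library only) builds daily Environment Logger file names, parses the timestamp in hourly file names, picks the merged LAS files out of a /laser3d_las listing, and orders PSII frames by their four-digit suffix, converting the frame index to seconds at 50 frames per second (so frame _0100 falls 2.0 s after frame _0000). The function names are hypothetical, and the regular expressions encode only the patterns stated above; accepting one or two underscores between date and time in hourly names is an assumption made to tolerate naming variants.

```python
import re
from datetime import datetime

PSII_FPS = 50  # PSII capture rate stated above

# Hourly Environment Logger files: [YYYY-MM-DD_HH-MM-SS]_environmentlogger.nc.
# One or two underscores are accepted between date and time (an assumption).
ENVLOG_HOURLY = re.compile(
    r"^(\d{4}-\d{2}-\d{2})_{1,2}(\d{2}-\d{2}-\d{2})_environmentlogger\.nc$"
)
# PSII frames end in a four-digit index (_0000 to _0100) before .png or .tif.
PSII_FRAME = re.compile(r"_(\d{4})\.(?:png|tif)$")


def envlog_daily_name(day):
    """Daily aggregated file name for a YYYY-MM-DD date string."""
    datetime.strptime(day, "%Y-%m-%d")  # raises ValueError for a malformed date
    return f"envlog_netcdf_L1_ua-mac_{day}.nc"


def envlog_hourly_start(name):
    """Timestamp embedded in an hourly Environment Logger file name."""
    m = ENVLOG_HOURLY.match(name)
    if m is None:
        raise ValueError(f"not an hourly environmentlogger file: {name}")
    return datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y-%m-%d %H-%M-%S")


def merged_las_files(names):
    """Per-scan merged point clouds from a /laser3d_las directory listing."""
    return sorted(n for n in names if n.endswith("_merged.las"))


def psii_frames(names):
    """(frame index, seconds after first frame, name) tuples in capture order.

    Pass either the PNG or the GeoTIFF names from one folder, not both.
    """
    frames = []
    for n in names:
        m = PSII_FRAME.search(n)
        if m is not None:
            index = int(m.group(1))
            frames.append((index, index / PSII_FPS, n))
    return sorted(frames)
```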

Phenotype Data

Tables of phenotypes can be found in the traits/season_[n]_traits/ folders inside the compressed file trait_data.zip. There is one folder for each of Seasons 4 and 6. Once uncompressed, each folder contains one CSV file for each combination of trait and measurement method. The file names follow the pattern season_[n]_[trait]_[measurement_type].csv, which helps identify their contents. For example, the file season_6_aboveground_biomass_manual.csv contains manual measurements of aboveground biomass taken during Season 6.
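Because trait names can themselves contain underscores (as in aboveground_biomass), the measurement type is best read as the last underscore-separated token of the file name. A minimal Python sketch, with hypothetical function names, that filters an archive listing down to one season's CSV files and splits each name into season, trait, and measurement type. It assumes only that paths contain the traits/season_[n]_traits/ folder described above, possibly under a top-level directory.

```python
import re

# season_[n]_[trait]_[measurement_type].csv; the greedy (.+) leaves the final
# underscore-free token as the measurement type.
TRAIT_FILE = re.compile(r"^season_(\d+)_(.+)_([^_]+)\.csv$")


def trait_files_for_season(names, season):
    """Filter an archive listing down to one season's trait CSV files."""
    folder = f"traits/season_{season}_traits/"
    return sorted(n for n in names if folder in n and n.endswith(".csv"))


def parse_trait_filename(path):
    """Split season_[n]_[trait]_[measurement_type].csv into its parts."""
    name = path.rsplit("/", 1)[-1]
    m = TRAIT_FILE.match(name)
    if m is None:
        raise ValueError(f"unexpected trait file name: {name}")
    season, trait, measurement_type = m.groups()
    return {"season": int(season), "trait": trait, "measurement_type": measurement_type}


# Usage (assumes trait_data.zip is in the working directory):
#   import zipfile
#   with zipfile.ZipFile("trait_data.zip") as zf:
#       for path in trait_files_for_season(zf.namelist(), 6):
#           print(parse_trait_filename(path))
```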

Each row of these CSV files holds one measurement for a specific date, location, genotype, and trait. The first line is a header that contains the names of the fields:

  • plot: (text) Plot name, using the format <field site> Season <n> Range <m> Column <k>.
  • scientificname: (text) Latin name of the crop species. This will always be Sorghum bicolor until future versions with data from additional crops are published.
  • genotype: (text) Genotype or accession identifier.
  • treatment: (text) Name of the experimental treatment.
  • date: (YYYY-MM-DD) Date of measurement.
  • trait: (text) Name of the trait measured, as defined in the file metadata/variables.csv.
  • method: (text) Method used to measure the trait, as defined in the file metadata/methods.csv.
  • mean: (numeric) Value of the phenotype measurement.
  • checked: (boolean) Whether the data have been independently reviewed: 0 = unchecked, 1 = checked.
  • author: (text) Name of the scientist who collected the data or who wrote the algorithm used to derive phenotypes from sensor data.
  • season: (text) Name of the season: one of ‘Season 4’ or ‘Season 6’.
  • method_type: (text) Type of measurement: one of ‘manual’ or ‘sensor’.
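As a sketch of how these fields might be loaded, the Python helpers below (hypothetical names, standard library only) split a plot name according to the documented <field site> Season <n> Range <m> Column <k> format and read one trait CSV, converting date, mean, and checked to Python types and optionally keeping only independently reviewed rows. They assume every row follows the formats listed above; plot names with extra suffixes or non-numeric mean values would need additional handling.

```python
import csv
import io
import re
from datetime import date

# Documented plot format: <field site> Season <n> Range <m> Column <k>
PLOT_NAME = re.compile(
    r"^(?P<site>.+?) Season (?P<season>\d+) Range (?P<range>\d+) Column (?P<column>\d+)$"
)


def parse_plot(plot: str) -> dict:
    """Split a plot name into field site, season, range, and column."""
    m = PLOT_NAME.match(plot)
    if m is None:
        raise ValueError(f"plot name does not match the documented format: {plot}")
    return {
        "site": m.group("site"),
        "season": int(m.group("season")),
        "range": int(m.group("range")),
        "column": int(m.group("column")),
    }


def read_trait_rows(text: str, checked_only: bool = False) -> list:
    """Parse one trait CSV, converting date, mean, and checked to Python types."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["date"] = date.fromisoformat(row["date"])
        row["mean"] = float(row["mean"])
        row["checked"] = row["checked"].strip() == "1"
        if checked_only and not row["checked"]:
            continue
        rows.append(row)
    return rows


# Usage (the path is hypothetical):
#   with open("season_6_aboveground_biomass_manual.csv", newline="") as fh:
#       rows = read_trait_rows(fh.read(), checked_only=True)
```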

Environmental Data

Weather

We have data from two weather stations: AZMet and the field scanner system.

Field Scanner System “Environment Logger”

These data are from the Thies Clima weather station mounted on the field scanner. The Environment Logger has more sensors and provides higher temporal resolution data than the AZMet weather station: in the NetCDF files, data are recorded at five-second intervals. This logger also records downwelling solar radiation every five seconds at a spectral resolution of 0.5 nm. The sensors and data collection frequency are described in the section on sensor data. We describe both the full-resolution data, provided as NetCDF files in the sensor data product named “envlog_netcdf”, and the 5-minute aggregated data, provided here as JSON files that were accessed using the Geostreams API as described in the TERRA-REF tutorials. These time series are not continuous, as can be seen in Figure 3.2.
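At a five-second sampling interval, each 5-minute window holds at most 300 s / 5 s = 60 readings. The Python sketch below illustrates that kind of aggregation by averaging (timestamp, value) readings into clock-aligned 5-minute windows, optionally dropping windows with too few readings because the series contain gaps. The arithmetic mean, the window alignment, and the min_fraction parameter are assumptions for illustration; the aggregation actually used to produce the Geostreams 5-minute data may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean

WINDOW = timedelta(minutes=5)
SAMPLE_INTERVAL = timedelta(seconds=5)
SAMPLES_PER_WINDOW = WINDOW // SAMPLE_INTERVAL  # 300 s / 5 s = 60


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its clock-aligned 5-minute window."""
    return ts - timedelta(minutes=ts.minute % 5, seconds=ts.second, microseconds=ts.microsecond)


def aggregate_5min(readings, min_fraction=0.0):
    """Average (timestamp, value) readings into 5-minute windows.

    Windows with fewer than min_fraction * SAMPLES_PER_WINDOW readings are
    dropped, since the time series are not continuous.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[window_start(ts)].append(value)
    return {
        start: fmean(values)
        for start, values in sorted(buckets.items())
        if len(values) >= min_fraction * SAMPLES_PER_WINDOW
    }
```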

AZMET

These gap-filled and corrected AZMet data are provided for convenience. When using these data, users must cite Brown and Russell (1996): Brown, P. W., and B. Russell. 1996. “AZMET, the Arizona Meteorological Network. Arizona Cooperative Extension.” https://cals.arizona.edu/AZMET/.

Soils Data

A comprehensive analysis of soil physical properties will be published alongside the TERRA-REF datasets (Babaeian et al. 2020). Please contact Markus Tuller (mtuller@cals.arizona.edu) for access to these data.

Genomics Data

The genomics dataset includes raw and derived sorghum genome sequencing data from the TERRA-REF project. Raw data include DNA sequence files in compressed FASTQ format. Derived data are available for whole-genome resequencing and genotyping-by-sequencing.

The Bioenergy Association Panel (BAP) planted in Seasons 4 and 6 was described by Brenton et al. (2016). These genotypes have been sequenced, and the sequence data and SNPs are available on Globus as well as on the CyVerse Data Store at https://datacommons.cyverse.org/browse/iplant/home/shared/terraref/genomics.
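Derived genotype calls are distributed as VCF (and HapMap) files. As a rough, standard-library-only Python sketch of working with the VCF files, the helpers below read VCF text line by line and convert each sample's GT call into an alternate-allele dosage (0, 1, or 2 for diploid calls). They assume uncompressed or gzip-decoded text, biallelic sites, and GT as the first FORMAT key (which the VCF specification requires whenever GT is present); function and file names are hypothetical. For large files, established tools such as bcftools, pysam, or cyvcf2 are likely a better fit.

```python
def gt_dosage(gt):
    """Count non-reference alleles in a GT string such as 0/1 or 1|1.

    Returns None when any allele is missing ('.'), e.g. './.'.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")


def read_vcf_dosages(lines):
    """Yield (chrom, pos, {sample: dosage}) for each data line of a VCF text stream."""
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the FORMAT column
            continue
        if fields[8].split(":")[0] != "GT":
            raise ValueError(f"GT is not the first FORMAT key at {fields[0]}:{fields[1]}")
        calls = {s: gt_dosage(entry.split(":")[0]) for s, entry in zip(samples, fields[9:])}
        yield fields[0], int(fields[1]), calls


# Usage (hypothetical file name; compressed VCFs can be read with gzip):
#   import gzip
#   with gzip.open("genotypes.vcf.gz", "rt") as fh:
#       for chrom, pos, calls in read_vcf_dosages(fh):
#           print(chrom, pos, calls)
```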

 

Unpublished Data

Sample Data: Find some previously prepared data samples. These have not been validated, but they allow you to browse and download example datasets. See terraref.org/sample-data.

Early Access to Even More Data: Users may request early access to additional data; please contact us with your needs. We are working hard to improve documentation, search interfaces, and tutorials so that end users can understand, access, and analyze these diverse datasets. You can learn how to access these data by following our Quick Start Tutorials. We also have a few videos on YouTube to help you get started.

 

Future Releases

For the first public release, we have focused on a subset of the data that we collected, processed, and subjected to quality assurance and quality control. We expect that this initial data release and subsets of these data curated for specific research projects will provide users and funders with sufficient information to justify processing, validating, and storing data from additional seasons and sensors.

Here we provide a description of additional data that can be made available for future use.

This data release does not contain all of the data that were collected during the TERRA-REF project. The field scanner was operated as part of the TERRA-REF program, with the mission of generating open-access data, from April 2016 to September 2019. Although this release focuses on two seasons of data, our first priority is to make all of the raw data and metadata available in the public domain. This raw data archive is being transferred to tape storage at the Texas Advanced Computing Center, and we expect it to be available in late 2020.

Additional derived products from the project are also available for use but are not in the public domain, because they have not been consistently curated, processed, and validated. Researchers interested in using these data or creating new datasets are invited to contact the authors for more information. The primary constraint on processing and publishing these datasets is the availability of scientists and engineers to process and validate the data.

Any use of these unpublished derived datasets must adhere to the data use and authorship guidelines outlined in the TERRA-REF documentation at docs.terraref.org and the file documentation/docs.terraref.org_2020_04_06.pdf.

Additional Sensors Not Included in the Current Data Release

At this point, we have not sufficiently validated or curated data from the following sensors. For the VNIR and SWIR hyperspectral imaging cameras, this reflects challenges in calibration; other sensors have not yet been prioritized.

Sensor Name | Model | Technical Specifications
Multi-spectral Radiometers
Dedicated NDVI Multispectral Radiometer | Skye Instruments SKR 1860D/A | 650 nm, 800 nm +/- 5 nm; 1 down, 1 up
Dedicated PRI Multispectral Radiometer | Skye Instruments SKR 1860ND/A | 531 nm +/- 3 nm; PRI = Photochemical Reflectance Index
Active Reflectance | Holland Scientific Crop Circle ACS-430 | 670 nm, 730 nm, 780 nm
VNIR Spectroradiometer | Ocean Optics STS-Vis | Range: 337-824 nm @ 1/2 nm
Hyper-spectral Cameras
VNIR Hyperspectral Imager | Headwall Inspector VNIR | 380-1000 nm @ 2/3 nm resolution
SWIR Hyperspectral Imager | Headwall Inspector SWIR | 900-2500 nm @ 2/3 nm resolution
Environmental
SWIR Spectrometer | Spectral Evolution PSR+ | Range: 800-2500 nm; installed 2018
Open Path CO2 Sensor | Vaisala CARBOCAP Carbon Dioxide Probe GMP343 | Range: 0-1000 ppm

Additional Seasons Not Included in the Current Data Release

Season | Crop | Experiments | Populations | Planting Date | Harvest
1 | Sorghum | Density | BAP, RIL | 2016-04-20 | 2016-07-16
2 | Sorghum | Uniformity Trials | Stay Green RILs F10 | 2016-07-27 | 2016-12-02
3 | Durum Wheat | | Diversity Panel | 2016-12-15 | 2017-04-05
4 | Sorghum | Late Season Drought | | 2017-04-13 | 2017-09-21
5 | Durum Wheat | | Diversity Panel | 2017-11-20 | 2018-04-05
6 | Sorghum | | BAP | 2018-04-20 | 2018-08-02
7 | Sorghum | Hybrid Uniformity Blocks | Stay Green RILs, Mutants, F2 families | 2018-08-23 | 2018-11-01
8 | Durum Wheat | Uniformity Trials | Diversity Panel | 2019-01-01 | 2019-03-31
9 S | Sorghum | | GRASSL x RIO RILs | 2019-05-01 | 2019-07-28
9 N | Sorghum | | SAP | 2019-04-29 | 2019-09-05
           

Babaeian, Ebrahim, Juan R. Gonzalez-Cena, Mohammad Gohardoust, Xiaobo Hou, Scott A. White, and Markus Tuller. 2020. “Physicochemical and Hydrologic Characterization Terra-Ref South Field.” In Prep.

Brenton, Zachary W., Elizabeth A. Cooper, Mathew T. Myers, Richard E. Boyles, Nadia Shakoor, Kelsey J. Zielinski, Bradley L. Rauh, William C. Bridges, Geoffrey P. Morris, and Stephen Kresovich. 2016. “A genomic resource for the development, improvement, and exploitation of sorghum for bioenergy.” Genetics. https://doi.org/10.1534/genetics.115.183947.

Brown, P. W., and B. Russell. 1996. “AZMET, the Arizona Meteorological Network. Arizona Cooperative Extension.” https://cals.arizona.edu/AZMET/.

Burnette, Max, David LeBauer, Solmaz Hajmohammadi, Zongyang Li, Craig Willis, Wei Qin, Patrick, and JD Maloney. 2019. terraref/extractors-multispectral: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406311.

Burnette, Max, David LeBauer, Zongyang Li, Wei Qin, Solmaz Hajmohammadi, Craig Willis, Sidke Paheding, and Nick Heyek. 2019. terraref/extractors-stereo-rgb: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406304.

Burnette, Max, David LeBauer, Wei Qin, and Yan Liu. 2019. terraref/extractors-metadata: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406329.

Burnette, Max, Jerome Mao, David LeBauer, Charlie Zender, and Harsh Agrawal. 2019. terraref/extractors-environmental: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406318.

Burnette, Maxwell, Gareth S. Rohde, Noah Fahlgren, Vasit Sagan, Paheding Sidike, Rob Kooper, Jeffrey A. Terstriep, et al. 2018. “TERRA-REF data processing infrastructure.” In ACM International Conference Proceeding Series. https://doi.org/10.1145/3219104.3219152.

Burnette, Max, Craig Willis, Chris Schnaufer, David LeBauer, Nick Heyek, Wei Qin, Solmaz Hajmohammadi, and Kristina Riemer. 2019. terraref/terrautils: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406335.

Burnette, Max, Charlie Zender, Jerome Mao, David LeBauer, Rachel Shekar, Noah Fahlgren, Craig Willis, et al. 2020. terraref/computing-pipeline: Season 4 & 6 Data Publication (2019) (version S46_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3635849.

Burnette, Max, Zongyang Li, Solmaz Hajmohammadi, David LeBauer, Nick Heyek, and Craig Willis. 2019. terraref/extractors-3dscanner: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406332.

Chamberlain, Scott, Zachary Foster, Ignasi Bartomeus, David LeBauer, Chris Black, and David Harris. 2019. Traits: Species Trait Data from Around the Web. https://CRAN.R-project.org/package=traits.

LeBauer, David, Nick Heyek, Rachel Shekar, Katrin Leinweber, JD Maloney, and Tino Dornbusch. 2020. terraref/reference-data: Season 4 & 6 Data Publication (2019) (version S46_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3635863.

LeBauer, David, Craig Willis, Rachel Shekar, Max Burnette, Ting Li, Scott Rohde, Yan Liu, et al. 2020. terraref/documentation: Season 6 Data Publication (2019) (version v0.9). Zenodo. https://doi.org/10.5281/zenodo.3661373.

Mao, Jerome, Max Burnette, Henry Butowsky, Charlie Zender, David LeBauer, and Sidke Paheding. 2019. terraref/extractors-hyperspectral: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406312.

Marini, Luigi, Rob Kooper, Indira Gutierrez, Constantinos Sophocleous, Max Burnette, Todd Nicholson, Michal Ondrejcek, et al. 2019. Clowder: Open Source Data Management for Long Tail Data (version v1.7.1). Zenodo. https://doi.org/10.5281/zenodo.3300953.

Rohde, Scott, Carl Crott, Patrick Mulroony, Jeremy Kemball, David LeBauer, Rob Kooper, Jimmy Chen, et al. 2016. Bety: BETYdb 4.6. Zenodo. https://doi.org/10.5281/zenodo.48661.

Selby, Peter, Rafael Abbeloos, Jan Erik Backlund, Martin Basterrechea Salido, Guillaume Bauchet, Omar E Benites-Alfaro, Clay Birkett, et al. 2019. “BrAPI—an application programming interface for plant breeding applications.” Bioinformatics 35 (20): 4147–55. https://doi.org/10.1093/bioinformatics/btz190.

Willis, Craig, David LeBauer, Max Burnette, and Rachel Shekar. 2020. terraref/sensor-metadata: Season 4 & 6 Data Publication (2019) (version S46_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3635853.

 

Sensor Data

Sensor data include spectral reflectance, 3D point clouds, fluorescence, hyperspectral imagery, multispectral, stereo RGB, and infrared heat imaging collected from multiple platforms.

Clowder Sensor Database

Weather Data

Environmental data, including weather, irrigation, and solar radiation measurements, are available through the TERRA-REF sensor data portal.

How to Access Weather Data

Trait Data

TERRA-REF includes agronomic and trait data from both manual collection and automated phenotyping of images generated by the LemnaTec Scanalyzer 3D platform at the Donald Danforth Plant Science Center using PlantCV.

Access Trait Data (Quick Start in R)

Genomics Data

Genomic data include whole-genome resequencing data for 384 accessions of the sorghum Bioenergy Association Panel (BAP) and genotyping-by-sequencing (GBS) data for 768 sorghum recombinant inbred lines (RILs).

Genomics Data on CyVerse Data Store