Access Data
TERRA-REF is producing reference data sets that include direct measurements and sensor observations, derived plant phenotypes, and genetic and genomic data.
See our documentation for more information.
Published, Public Domain Data:
We are publishing data based on demand. Our first publication included data from Seasons 4 and 6. Phenotype data and metadata have been curated on Dryad. The Dryad record also references larger sensor and genomics files that can be found at the #terra-public endpoint on Globus. These can be cited as:
LeBauer, D.S., Burnette, M.A., Demieville, J., Fahlgren, N., French, A.N., Garnett, R., Hu, Z., Huynh, K., Kooper, R., Li, Z., Maimaitijiang, M., Mao, J., Mockler, T.C., Morris, G.S., Newcomb, M., Ottman, M., Ozersky, P., Paheding, S., Pauli, D., Pless, R., Qin, W., Riemer, K., Rohde, S., Rooney, W.L., Sagan, V., Shakoor, N., Stylianou, A., Thorp, K., Ward, R., White, J.W., Willis, C., and Zender C.S. (2020). TERRA-REF, An Open Reference Data Set From High Resolution Genomics, Phenomics, and Imaging Sensors. Dryad Digital Repository. http://doi.org/10.5061/dryad.
Datasets
TERRA-REF has produced the following types of data:
- Sensor Data from five thermal, light, and shape imaging sensors.
- Phenotypes include both sensor-derived and hand-collected plot-level field measurements. These can be used to validate, calibrate, and train algorithms.
- Environmental data include time series of meteorological variables: temperature, relative humidity, precipitation, wind direction and speed, photosynthetically active radiation, and downwelling spectral radiance.
- Genomics data include whole-genome resequencing data for 384 varieties from the sorghum Bioenergy Association Panel (BAP) and genotyping-by-sequencing data for 768 sorghum Recombinant Inbred Lines (RILs). The dataset contains raw and derived sorghum genome sequencing data. Raw data include DNA sequence files in compressed FASTQ format. Derived data are available in Variant Call Format (VCF) and Hapmap files.
Sensor Data
Data generated by the following sensors are included in the public domain data. Sensors not represented in the first data release are listed in the section on additional sensors.
Sensor Name | Model | Technical Specifications |
---|---|---|
Imaging Sensors | | |
Stereo RGB Camera | Allied Vision Prosilica GT3300C | |
Laser Scanner | Custom Fraunhofer 3D | Spatial Resolution: 0.3 to 0.9 mm |
Thermal Infrared | FLIR A615 | Thermal Sensitivity < 50 mK @ 30 °C |
PS II Camera | LemnaTec PS II Fluorescence Prototype | Illumination 635 nm x 4000 μmol/m²/s, Camera 50 fps |
Environmental Sensors | | |
Environmental Sensors | Thies Clima 4.9200.00.000 | |
VNIR Spectrometer | Spectral Evolution PSR+3500 | Range 350 to 800 nm |
PAR Sensor | Quantum SQ–300 | Spectral Range 410 to 655 nm |
Sensor Data Products
Over 500 TB of sensor data are available and are categorized as raw, Level 1, and Level 2. This size could be substantially reduced by removing duplicate information and through compression.
Sensor data are stored on the Storage Condo at the National Center for Supercomputing Applications in Urbana, Illinois. We make them available for download with the Globus file transfer system. The following steps are required to access them: 1) get an account at globus.org; 2) search for the terra-public endpoint; 3) install the Globus Connect Personal application and transfer data. Further information is provided in the data access chapter of the TERRA-REF documentation. As an alternative, the data can be provided on hard drives for the cost of supplies, labor, and shipping.
Below is a summary of the sensor data products included in the first release of TERRA-REF data. Sensor-derived phenotypes described in the Phenotype Data section were generated from the 3D laser scanner and RGB camera sensors as described in metadata/methods.csv.
Data Product | Sensor | Algorithm | File Format | Plot Clip | Full Field |
---|---|---|---|---|---|
Environment | Thies Clima | envlog2netcdf | netcdf | NA | NA |
Thermal Image | FLIR | ir_geotiff | geotiff | + | |
Point Cloud | Fraunhofer Laser 3D | laser3d_las | las | + | |
Point Cloud | Fraunhofer Laser 3D | scanner3DTop | ply | ||
Images Time-Series | PSII Camera | ps2png | png | ||
Color Images | RGB Stereo | bin2tiff | geotiff | + | + |
Plant Mask | RGB Stereo | rgb_mask | geotiff | x |
Sensor Data Directory Contents on Globus
The following list describes the organization and contents of the Storage Condo server that can be accessed at the ncsa#terra-public endpoint on Globus. Directory names have a leading / while file names do not.
- Environment Logger
  - /envlog_netcdf
    - Daily aggregated files named envlog_netcdf_L1_ua-mac_[YYYY-MM-DD].nc.
    - There are also 24 hourly files for each day named [YYYY-MM-DD_HH-MM-SS]_environmentlogger.nc.
- Laser3D
  - /laser3d_las
    - One merged file per scan across the short (E-W) axis with names ending in _merged.las. There are typically 50-100 of these each day.
  - /laser3d_las_plot
    - Each directory has the name of one plot, and there is one LAS file clipped to the plot boundaries for each scan (there may be more than one scan per day).
- RGB Stereo
  - /rgb_geotiff
    - File names ending in _left.tif and _right.tif represent simultaneous images from left and right stereo pair cameras.
  - /rgb_mask
    - These images have the soil represented as black pixels. Files end in _left_mask.tif; each corresponds to an image in the RGB Geotiff dataset, with black pixels marking areas that contain soil and not plants.
  - /rgb_geotiff_plots
    - For each RGB Geotiff image, a Geotiff file with the same dimensions as the plot. It contains the image clipped to the plot boundaries as well as fill values for parts of the plot not in the image.
  - /rgb_fullfield
    - The key data product is one full resolution full-field image per scan.
    - Other files include: lower resolution versions of the full field (files with names ending in _10pct.tif, _thumb.tif, and .png); CSV files containing canopy cover values for each plot; a JSON file listing images contained in the full-field mosaic; a VRT file that is a “virtual geotiff” that was used to generate the full-field mosaic.
    - These full-field Geotiff images are RGB images and image masks tiled together to make up a full-field view. They are not orthomosaics: the images are not stitched together, because doing so causes geometric aberrations.
- PSII Camera
  - /ps2_png
    - 101 .png files per folder. The order of the images is indicated by the last four digits of the file name, i.e. _0000.png to _0100.png.
    - 101 georeferenced Geotiff files otherwise identical to the PNG counterparts.
    - These files represent a time series of images captured at a rate of 50 frames per second.
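The naming conventions listed above can be turned into a few small helpers. The following is a minimal Python sketch, not part of the TERRA-REF tooling: the function names are hypothetical, and the code assumes only the file name patterns described in this list.

```python
from datetime import date


def daily_envlog_name(day: date) -> str:
    """Name of the daily aggregated Environment Logger file for one day."""
    return f"envlog_netcdf_L1_ua-mac_{day.isoformat()}.nc"


def ps2_frame_suffixes(n_frames: int = 101) -> list[str]:
    """File name suffixes of one PSII image series, in capture order."""
    return [f"_{i:04d}.png" for i in range(n_frames)]


def stereo_pairs(names: list[str]) -> list[tuple[str, str]]:
    """Match each *_left.tif file with its *_right.tif partner, if present."""
    available = set(names)
    pairs = []
    for name in sorted(available):
        if name.endswith("_left.tif"):
            right = name[: -len("_left.tif")] + "_right.tif"
            if right in available:
                pairs.append((name, right))
    return pairs


# Hypothetical file names, used only to exercise the helpers.
print(daily_envlog_name(date(2018, 5, 1)))
print(ps2_frame_suffixes()[:2], ps2_frame_suffixes()[-1])
print(stereo_pairs(["a_left.tif", "a_right.tif", "b_left.tif"]))
```

Helpers like these can be useful for checking that a Globus transfer is complete, for example by confirming that every left image has a right partner.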
Phenotype Data
Tables of phenotypes can be found in the traits/season_[n]_traits/ folders inside the trait_data.zip file. There is one subdirectory for each of Seasons 4 and 6. Once uncompressed, each directory will contain one CSV file for each combination of trait and measurement method. The names of these CSV files help identify the contents because they follow the pattern season_[n]_[trait]_[measurement_type].csv. For example, the file season_6_aboveground_biomass_manual.csv contains manual measurements of above-ground biomass taken during Season 6.
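As an illustration of the naming pattern, the sketch below splits a trait file name into its parts. It is a hypothetical helper rather than project code, and it assumes that the measurement type is the last underscore-separated token and contains no underscore itself.

```python
import re

# season_[n]_[trait]_[measurement_type].csv
# Assumption: the measurement type is the final token and has no underscore.
TRAIT_FILE = re.compile(
    r"^season_(?P<season>\d+)_(?P<trait>.+)_(?P<measurement_type>[^_]+)\.csv$"
)


def parse_trait_filename(name: str) -> dict:
    """Split a trait CSV file name into season, trait, and measurement type."""
    match = TRAIT_FILE.match(name)
    if match is None:
        raise ValueError(f"not a trait file name: {name!r}")
    return {
        "season": int(match["season"]),
        "trait": match["trait"],
        "measurement_type": match["measurement_type"],
    }


print(parse_trait_filename("season_6_aboveground_biomass_manual.csv"))
```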
These CSV files have one measurement per row for a specific date, location, genotype, and measurement. The first line is a header that contains the names of the fields:
- plot: (text) Plot name, using the format <field site> Season <n> Range <m> Column <k>.
- scientificname: (text) Latin name for the crop species. This will always be Sorghum bicolor until future versions with data from additional crops are published.
- genotype: (text) Genotype or accession identifier.
- treatment: (text) Name of experimental treatment.
- date: (YYYY-MM-DD) Date of measurement.
- trait: (text) Name of the trait measured. Defined in the file metadata/variables.csv.
- method: (text) The method used to measure the trait. Defined in the file metadata/methods.csv.
- mean: (numeric) Value of the phenotype data.
- checked: (boolean) Whether the data have been independently reviewed: 0 = unchecked, 1 = checked.
- author: (text) Name of the scientist who collected the data or who wrote the algorithm used to derive phenotypes from sensor data.
- season: (text) Name of season: one of ‘Season 4’ or ‘Season 6’.
- method_type: (text) Type of measurement: one of ‘manual’ or ‘sensor’.
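Because every trait file shares this header, a single reader works for all of them. The sketch below uses only the Python standard library and averages the mean column by genotype for one trait. The sample rows, including the site, genotype, treatment, method, and author values, are invented for illustration and are not TERRA-REF data.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_by_genotype(csv_file, trait: str) -> dict:
    """Average the 'mean' column per genotype for one trait."""
    values = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["trait"] == trait and row["mean"] != "":
            values[row["genotype"]].append(float(row["mean"]))
    return {genotype: mean(vals) for genotype, vals in values.items()}


# Hypothetical rows that follow the documented header.
sample = io.StringIO(
    "plot,scientificname,genotype,treatment,date,trait,method,mean,checked,author,season,method_type\n"
    "Example Site Season 6 Range 1 Column 1,Sorghum bicolor,G1,control,2018-07-01,aboveground_biomass,example method,10.0,1,A. Author,Season 6,manual\n"
    "Example Site Season 6 Range 1 Column 2,Sorghum bicolor,G1,control,2018-07-01,aboveground_biomass,example method,14.0,1,A. Author,Season 6,manual\n"
    "Example Site Season 6 Range 2 Column 1,Sorghum bicolor,G2,control,2018-07-01,aboveground_biomass,example method,8.0,0,A. Author,Season 6,manual\n"
)

result = mean_by_genotype(sample, "aboveground_biomass")
print(result)
```

To read a real file, replace the in-memory sample with a file opened using open(path, newline="").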
Environmental Data
Weather
We have data from two stations, AZMet and the field scanner system.
Field Scanner System “Environment Logger”
These data are from the Thies Clima weather station mounted on the field scanner. The Environment Logger has more sensors and provides higher temporal resolution data than the AZMet weather station. In the NetCDF files, data are recorded at five-second intervals. This logger also collects the downwelling solar radiation every five seconds at a spectral resolution of 0.5 nm. The sensors and data collection frequency are described in the section on sensor data. We describe both the full resolution data provided as NetCDF files in the sensor data product named “envlog_netcdf” and the 5-minute aggregated data provided here as JSON files that were accessed using the Geostreams API as described in the TERRA-REF tutorials. These time series are not continuous, as can be seen in Figure 3.2.
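To make the relationship between the five-second and 5-minute series concrete, the sketch below bins timestamped readings into 5-minute windows and averages each window. This is an illustration only: the source does not state how the published aggregates were computed, so the use of a simple mean is an assumption, and the readings are synthetic.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def five_minute_means(readings):
    """Average (timestamp, value) readings within 5-minute windows.

    Windows that contain no readings are absent from the result,
    so gaps in the input remain gaps in the output.
    """
    bins = defaultdict(list)
    for ts, value in readings:
        # Round the timestamp down to the start of its 5-minute window.
        start = ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)
        bins[start].append(value)
    return {start: mean(vals) for start, vals in sorted(bins.items())}


# Synthetic readings: one value every five seconds for ten minutes.
t0 = datetime(2018, 5, 1, 12, 0, 0)
readings = [(t0 + timedelta(seconds=5 * i), float(i)) for i in range(120)]
aggregated = five_minute_means(readings)
for start, value in aggregated.items():
    print(start.isoformat(), value)
```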
AZMET
These gap-filled and corrected data have been provided for convenience. When using these data, users must cite Brown and Russell (1996):
> Brown, P. W., & Russell, B. (1996). AZMET, The Arizona Meteorological Network. Arizona Cooperative Extension. https://cals.arizona.edu/AZMET/.
Soils Data
A comprehensive analysis of soil physical properties will be published alongside the TERRA-REF datasets (Babaeian et al. 2020). Please contact Markus Tuller (mtuller@cals.arizona.edu) for access to these data.
Genomics Data
The genomics dataset includes raw and derived sorghum genome sequencing data from the TERRA-REF project. Raw data includes DNA sequence files in compressed FASTQ format. Derived data is available for whole-genome resequencing and genotyping-by-sequencing.
The Bioenergy Association Panel (BAP) planted in Seasons 4 and 6 was described by Brenton et al. (2016). These genotypes have been sequenced, and the sequence data and SNPs are available on Globus as well as on the CyVerse Data Store: https://datacommons.cyverse.org/browse/iplant/home/shared/terraref/genomics.
Unpublished Data
Sample Data: Find some previously prepared data samples. These have not been validated but allow you to browse and download example datasets. See terraref.org/sample-data.
Early Access to Even More Data: Users may request early access to data by contacting us with their needs. We are working hard to improve documentation, search interfaces, and tutorials in order to enable end users to understand, access, and analyze these diverse datasets. You can learn how to access these data by following our Quick Start Tutorials. We have a few Videos on YouTube to help you get started.
Future Releases
For the first public release of data we have focused on a subset of the data that we collected, processed, and subjected to quality assurance and control. We expect that this initial data release and subsets of these data curated for specific research projects will provide users and funders with sufficient information to justify processing, validating, and storing data from additional seasons and sensors.
Here we provide a description of additional data that can be made available for future use.
This data release does not contain all of the data that were collected during the TERRA-REF project. The field scanner was operated as part of the TERRA-REF program, with the mission of generating open access data, from April 2016 to September 2019. Although this release focuses on two seasons of data, the first priority is to make all of the raw data and metadata available in the public domain. This raw data archive is in the process of being transferred to tape storage at the Texas Advanced Computing Center, and we expect it will be available in late 2020.
Additional derived products from the project are also available for use, but are not in the public domain because they have not been consistently curated, processed, and validated. Researchers interested in using these data or creating new datasets are invited to contact the authors for more information. The primary constraint on processing and publishing these datasets is the availability of scientists and engineers to process and validate the data.
Any use of these unpublished derived datasets must adhere to the data use and authorship guidelines outlined in the TERRA-REF documentation at docs.terraref.org and the file documentation/docs.terraref.org_2020_04_06.pdf.
Additional Sensors Not Included in the Current Data Release
At this point we have not sufficiently validated or curated data from the following sensors. For the VNIR and SWIR hyperspectral imaging cameras this reflects challenges faced in calibration. Other sensors have not been prioritized.
Sensor Name | Model | Technical Specifications |
---|---|---|
Multi-spectral Radiometers | ||
Dedicated NDVI Multispectral Radiometer | Skye Instruments SKR 1860D/A | 650 nm, 800 nm +/- 5 nm; 1 down, 1 up |
Dedicated PRI Multispectral Radiometer | Skye Instruments SKR 1860ND/A | 531nm +/- 3nm; PRI = Photochemical Reflectance Index |
Active Reflectance | Holland Scientific Crop Circle ACS-430 | 670 nm, 730 nm, 780 nm |
VNIR Spectroradiometer | Ocean Optics STS-Vis | Range: 337-824 nm @ 1/2 nm |
Hyper-spectral Cameras | ||
VNIR Hyperspectral Imager | Headwall Inspector VNIR | 380-1000 nm @ 2/3 nm resolution |
SWIR Hyperspectral Imager | Headwall Inspector SWIR | 900-2500 nm @ 2/3 nm resolution |
Environmental | ||
SWIR Spectrometer | Spectral Evolution PSR+ | Range 800-2500nm; Installed 2018 |
Open Path CO2 Sensor | Vaisala CARBOCAP Carbon Dioxide Probe GMP343 | Range: 0-1000 ppm |
Additional Seasons Not Included in the Current Data Release
Season | Crop | Experiments | Populations | Planting Date | Harvest |
---|---|---|---|---|---|
1 | Sorghum | Density | BAP, RIL | 2016-04-20 | 2016-07-16 |
2 | Sorghum | Uniformity Trials | Stay Green RILs F10 | 2016-07-27 | 2016-12-02 |
3 | Durum Wheat | | Diversity Panel | 2016-12-15 | 2017-04-05 |
4 | Sorghum | Late Season Drought | | 2017-04-13 | 2017-09-21 |
5 | Durum Wheat | | Diversity Panel | 2017-11-20 | 2018-04-05 |
6 | Sorghum | | BAP | 2018-04-20 | 2018-08-02 |
7 | Sorghum | Hybrid Uniformity Blocks | Stay Green RILs, Mutants, F2 families | 2018-08-23 | 2018-11-01 |
8 | Durum Wheat | Uniformity Trials | Diversity Panel | 2019-01-01 | 2019-03-31 |
9 S | Sorghum | | GRASSL x RIO RILs | 2019-05-01 | 2019-07-28 |
9 N | Sorghum | | SAP | 2019-04-29 | 2019-09-05 |
Babaeian, Ebrahim, Juan R. Gonzalez-Cena, Mohammad Gohardoust, Xiaobo Hou, Scott A. White, and Markus Tuller. 2020. “Physicochemical and Hydrologic Characterization Terra-Ref South Field.” In Prep.
Brenton, Zachary W., Elizabeth A. Cooper, Mathew T. Myers, Richard E. Boyles, Nadia Shakoor, Kelsey J. Zielinski, Bradley L. Rauh, William C. Bridges, Geoffrey P. Morris, and Stephen Kresovich. 2016. “A genomic resource for the development, improvement, and exploitation of sorghum for bioenergy.” Genetics. https://doi.org/10.1534/genetics.115.183947.
Brown, P. W., and B. Russell. 1996. “AZMET, the Arizona Meteorological Network. Arizona Cooperative Extension.” https://cals.arizona.edu/AZMET/.
Burnette, Max, David LeBauer, Solmaz Hajmohammadi, Zongyang Li, Craig Willis, Wei Qin, Patrick, and JD Maloney. 2019. terraref/extractors-multispectral: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406311.
Burnette, Max, David LeBauer, Zongyang Li, Wei Qin, Solmaz Hajmohammadi, Craig Willis, Sidke Paheding, and Nick Heyek. 2019. terraref/extractors-stereo-rgb: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406304.
Burnette, Max, David LeBauer, Wei Qin, and Yan Liu. 2019. terraref/extractors-metadata: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406329.
Burnette, Max, Jerome Mao, David LeBauer, Charlie Zender, and Harsh Agrawal. 2019. terraref/extractors-environmental: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406318.
Burnette, Maxwell, Gareth S. Rohde, Noah Fahlgren, Vasit Sagan, Paheding Sidike, Rob Kooper, Jeffrey A. Terstriep, et al. 2018. “TERRA-REF data processing infrastructure.” In ACM International Conference Proceeding Series. https://doi.org/10.1145/3219104.3219152.
Burnette, Max, Craig Willis, Chris Schnaufer, David LeBauer, Nick Heyek, Wei Qin, Solmaz Hajmohammadi, and Kristina Riemer. 2019. terraref/terrautils: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406335.
Burnette, Max, Charlie Zender, Jerome Mao, David LeBauer, Rachel Shekar, Noah Fahlgren, Craig Willis, et al. 2020. terraref/computing-pipeline: Season 4 & 6 Data Publication (2019) (version S46_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3635849.
Burnette, Max, Zongyang Li, Solmaz Hajmohammadi, David LeBauer, Nick Heyek, and Craig Willis. 2019. terraref/extractors-3dscanner: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406332.
Chamberlain, Scott, Zachary Foster, Ignasi Bartomeus, David LeBauer, Chris Black, and David Harris. 2019. Traits: Species Trait Data from Around the Web. https://CRAN.R-project.org/package=traits.
LeBauer, David, Nick Heyek, Rachel Shekar, Katrin Leinweber, JD Maloney, and Tino Dornbusch. 2020. terraref/reference-data: Season 4 & 6 Data Publication (2019) (version S46_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3635863.
LeBauer, David, Craig Willis, Rachel Shekar, Max Burnette, Ting Li, Scott Rohde, Yan Liu, et al. 2020. terraref/documentation: Season 6 Data Publication (2019) (version v0.9). Zenodo. https://doi.org/10.5281/zenodo.3661373.
Mao, Jerome, Max Burnette, Henry Butowsky, Charlie Zender, David LeBauer, and Sidke Paheding. 2019. terraref/extractors-hyperspectral: Season 6 Data Publication (2019) (version S6_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3406312.
Marini, Luigi, Rob Kooper, Indira Gutierrez, Constantinos Sophocleous, Max Burnette, Todd Nicholson, Michal Ondrejcek, et al. 2019. Clowder: Open Source Data Management for Long Tail Data (version v1.7.1). Zenodo. https://doi.org/10.5281/zenodo.3300953.
Rohde, Scott, Carl Crott, Patrick Mulroony, Jeremy Kemball, David LeBauer, Rob Kooper, Jimmy Chen, et al. 2016. Bety: BETYdb 4.6. Zenodo. https://doi.org/10.5281/zenodo.48661.
Selby, Peter, Rafael Abbeloos, Jan Erik Backlund, Martin Basterrechea Salido, Guillaume Bauchet, Omar E Benites-Alfaro, Clay Birkett, et al. 2019. “BrAPI—an application programming interface for plant breeding applications.” Bioinformatics 35 (20): 4147–55. https://doi.org/10.1093/bioinformatics/btz190.
Willis, Craig, David LeBauer, Max Burnette, and Rachel Shekar. 2020. terraref/sensor-metadata: Season 4 & 6 Data Publication (2019) (version S46_Pub_2019). Zenodo. https://doi.org/10.5281/zenodo.3635853.
Sensor Data
Sensor data include spectral reflectance, 3D point cloud, fluorescence, hyperspectral imagery, multispectral, stereo RGB, and infrared heat imaging collected from multiple platforms.
Clowder Sensor Database
Weather Data
Environmental data, including weather, irrigation, and solar radiation measurements, are available through the TERRA-REF sensor data portal.
How to Access Weather Data
Trait Data
TERRA-REF includes agronomic and trait data from both manual collection and automated phenotyping from images generated by the LemnaTec Scanalyzer 3D platform at the Donald Danforth Plant Science Center using PlantCV.
Access Trait Data (Quick Start in R)
Genomics Data
Genomic data include whole-genome resequencing data for 384 accessions of the sorghum Bioenergy Association Panel (BAP) and genotyping-by-sequencing (GBS) data for 768 sorghum recombinant inbred lines (RILs).
Genomics Data on CyVerse Data Store