Field Data First Look

Image credit: Pedro Andrande-Sanchez

By David LeBauer; ARPA-E TERRA Reference Data and Computing Team

Data Sources

Data from phenotyping platforms at two locations are now available. One data set is the raw data from the Lemnatec field phenotyping platform. The other includes images from the Lemnatec controlled-environment system at the Danforth Center.

To follow the development of the phenomics data pipelines, please see links to GitHub below and sign up for our mailing list.

Lemnatec Field Scanner Sensors

Winter wheat was grown in February and March, and sorghum was planted in mid-April. The data stream is live, so raw data products become available as soon as they are collected and transferred from the field in Arizona to computers in Illinois. Currently, only the raw binary data and text metadata generated by the field phenotyping platform are available.

Imaging sensors on the field scanner include imaging spectrometers and dedicated multispectral sensors capturing NDVI (Normalized Difference Vegetation Index) and PRI (Photochemical Reflectance Index).
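For readers unfamiliar with these two indices, here is a minimal sketch of their standard definitions: NDVI compares near-infrared and red reflectance, and PRI compares reflectance at 531 nm and 570 nm. The function names are illustrative assumptions and are not taken from the field scanner software, which may compute these values differently.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), the form shared by NDVI and PRI."""
    total = a + b
    if total == 0:
        raise ValueError("reflectances sum to zero; the index is undefined")
    return (a - b) / total


def ndvi(nir, red):
    """NDVI from near-infrared and red reflectance."""
    return normalized_difference(nir, red)


def pri(r531, r570):
    """PRI from reflectance at 531 nm and 570 nm."""
    return normalized_difference(r531, r570)
```

Both indices fall between -1 and 1 for non-negative reflectances. For example, ndvi(nir=0.5, red=0.1) is positive because the sample reflects more near-infrared than red light; these input values are made up for illustration.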

Environmental sensors include a meteorological station that measures wind speed and direction, temperature, downwelling photosynthetically active radiation (PAR), downwelling radiances (350-800 nm), specific humidity, precipitation, air pressure, and atmospheric CO2.

For more information see this previous article on sensors, with links to sensor data sheets.

Lemnatec Controlled Environment Scanner Sensors

The Bellwether Phenotyping Facility at the Danforth Center has a conveyor system that moves plants grown in a fully climate-controlled growth house through a multi-camera digital imaging system.

Imaging sensors include RGB to quantify plant color and structural morphology, near-infrared (NIR) to estimate water content, and fluorescence imaging to visualize chlorophyll fluorescence. The system is equipped with a dark adaptation tunnel preceding the fluorescence imaging chamber, allowing the analysis of photosystem II efficiency.
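Photosystem II efficiency in dark-adapted plants is commonly summarized as Fv/Fm = (Fm - F0) / Fm, where F0 is the minimal and Fm the maximal fluorescence. The sketch below applies that textbook formula to a single pair of values. The function name is an illustrative assumption, and this is not a description of how the PlantCV pipeline computes the quantity.

```python
def fv_fm(f0, fm):
    """Maximum quantum efficiency of photosystem II: (Fm - F0) / Fm."""
    if fm <= 0:
        raise ValueError("Fm must be positive")
    if f0 < 0 or f0 > fm:
        raise ValueError("F0 must lie between 0 and Fm")
    return (fm - f0) / fm
```

The result is a unitless ratio between 0 and 1. In an imaging context the same arithmetic would be applied per pixel to the dark-adapted fluorescence images.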

Image files and metadata from a sorghum pilot experiment conducted at the Danforth Center are available. Output from the PlantCV pipeline will be available shortly.

Data Products

Data products will be expanded, revised, and released annually in November. The alpha (2016) and beta (2017) versions will be released to TERRA researchers, and the version 1.0 release (2018) is planned to be fully open access. Before the official November releases, we will provide access to data as early and openly as possible.

Version updates will depend on user feedback. Data are not publicly available at this point, but are available to TERRA scientists and collaborators. Other academic and commercial users are encouraged to contact us to discuss how they would use the data.

How to Find and Use Data

There are two ways to access data. To search, select, and compute on data in the cloud, use Clowder. Clowder simplifies the use of large datasets. You can select data and then open RStudio and Jupyter interfaces for analysis. You can also annotate, contribute, and share data. To download data directly to your computer or server, use Globus. Both of these options are described below.

Please note that Clowder and the available data are under development. This is pre-alpha access.

Online Data Access and Analysis

TERRA REF uses Clowder to organize, annotate, and process data generated by phenotyping platforms. To access data from the University of Arizona Field Scanner or the Danforth Indoor Phenotyping system, request an account on the TERRA REF Clowder interface by clicking “Sign up” in the upper right corner.

The video below demonstrates how to set up and use Clowder locally as well as how to search for data and launch analysis tools. This video illustrates the TERRA REF infrastructure for processing Sorghum test data from Danforth’s controlled environment phenotyping system.

Data Organization

You can explore collections and datasets through the Clowder web interface or the Clowder API. Data is organized into datasets, collections and spaces.

  • Datasets consist of one or more files with associated metadata collected by one sensor at one time point. Users can annotate, download, and use these sensor datasets.
  • Collections consist of one or more datasets. Currently, we use collections to organize datasets by collection date and sensor. Users can create their own collections.
  • Spaces contain collections and datasets. TERRA REF uses one space for each of the phenotyping platforms.
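The hierarchy described above can be pictured with a small data model. This is only a sketch of the relationships between datasets, collections, and spaces; the class and field names are illustrative assumptions and do not reflect Clowder's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """Files plus metadata from one sensor at one time point."""
    name: str
    sensor: str
    files: List[str]
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Collection:
    """A named group of one or more datasets."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Space:
    """A container for collections and datasets, one per platform."""
    name: str
    collections: List[Collection] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)

    def all_datasets(self):
        """Datasets attached directly plus those reachable via collections."""
        found = list(self.datasets)
        for collection in self.collections:
            found.extend(collection.datasets)
        return found


# Hypothetical example: the file names below are invented for illustration.
scan = Dataset(name="2016-04-07__16-47-22-087", sensor="stereoTop",
               files=["left.bin", "right.bin", "metadata.json"])
by_date = Collection(name="stereoTop 2016-04-07", datasets=[scan])
field_scanner = Space(name="Field Scanner", collections=[by_date])
```

Modeling a collection as a plain list of datasets mirrors the description above, where the same kind of dataset can be grouped by date, by sensor, or by a user's own criteria.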

Data Analysis

After selecting a dataset, the Tool sessions menu on the lower right sidebar allows users to launch analysis tools. Currently, users can choose between launching RStudio or Jupyter. These tools support R and Python as well as many other familiar programming languages. Additional tools can be added based on user demand.

If you are willing to share source or compiled code that you have written to process data or metadata, please let us know. The easiest way is to open a new issue or submit a pull request to github.com/terraref/computing-pipeline with your code in a new folder in the scripts/ directory. The basic syntax is script [inputs] [outputs].
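As a rough illustration of the script [inputs] [outputs] convention, the skeleton below takes one input path and one output path on the command line. It assumes a single input and a single output, and the processing step is a placeholder that copies the file unchanged. None of the names here come from the computing-pipeline repository.

```python
import argparse
import shutil


def process(input_path, output_path):
    """Placeholder step: copy the input unchanged to the output."""
    shutil.copyfile(input_path, output_path)


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Example processing script (placeholder logic).")
    parser.add_argument("input", help="path to the input file")
    parser.add_argument("output", help="path where the result is written")
    args = parser.parse_args(argv)
    process(args.input, args.output)
    return 0

# When saved as a standalone script, finish the file with:
#     if __name__ == "__main__":
#         raise SystemExit(main())
```

Keeping the argument parsing in main() and the work in process() lets the same code be called from the command line or imported from a notebook session.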

See our Clowder Documentation for more details.

Downloading Data

Globus provides a way for users to transfer large amounts of data. This is how to configure Globus to transfer data to your computer or server:

  1. sign up for Globus at globus.org
  2. send your Globus ID (or university email) to David LeBauer (dlebauer@illinois.edu) with ‘TERRAREF Globus Access Request’ in the subject.
  3. you will be notified once you have been granted access
  4. log in to Globus at https://www.globus.org
  5. add an endpoint for the destination (e.g. your local computer) at https://www.globus.org/app/endpoints/create-gcp
  6. download and set up Globus Connect (on that page)
  7. go to the ‘transfer files’ page: https://www.globus.org/app/transfer
  8. select source
    • Endpoint: Terraref
    • Path: Navigate to sensor you want under /MovingSensor/
    • select (click) a folder, e.g., at time of writing the latest stereo sensor data is in /MovingSensor/stereoTop/2016-04-07/2016-04-07__16-47-22-087/
    • select (highlight) files that you want to download
  9. at destination
    • select the endpoint that you set up above for your local computer or server
    • select the destination folder (e.g. /~/Downloads/)
  10. click ‘go’
  11. files should be on your computer
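The example folder in step 8 suggests a layout of /MovingSensor/<sensor>/<date>/<timestamp>/. The sketch below splits such a path into its parts once files have been downloaded. It assumes that layout holds for other sensors and that the trailing three digits of the timestamp folder are milliseconds; both are inferred from the single example path and are not documented guarantees. The function name is hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_scan_path(path):
    """Split a path like /MovingSensor/<sensor>/<date>/<timestamp>/ into parts."""
    parts = PurePosixPath(path).parts
    # Expected parts: ('/', 'MovingSensor', sensor, date, timestamp)
    if len(parts) != 5 or parts[1] != "MovingSensor":
        raise ValueError(f"unexpected path layout: {path}")
    sensor, date_dir, stamp_dir = parts[2:]
    # Assumption: the final field is a fraction of a second (087 -> 87 ms).
    scan_time = datetime.strptime(stamp_dir, "%Y-%m-%d__%H-%M-%S-%f")
    if scan_time.strftime("%Y-%m-%d") != date_dir:
        raise ValueError("date folder and timestamp folder disagree")
    return {"sensor": sensor, "date": date_dir, "scan_time": scan_time}


info = parse_scan_path(
    "/MovingSensor/stereoTop/2016-04-07/2016-04-07__16-47-22-087/")
```

Parsing the folder name into a datetime makes it straightforward to sort scans or select a time window without opening any files.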

Globus Documentation: https://docs.globus.org/how-to/get-started/

Tell us what data and software you want

We are designing and building a substantial set of data products over the next few years. The volume of data generated by the Field Scanner is unprecedented and we would like to prioritize data collection from sensors that provide the most scientifically valuable information. Your feedback will help us identify and prioritize the suite of data products we produce and which interfaces and standards we support. We are open to working directly with academic and industry partners interested in using our data and software.

Provide Input on Specific Issues

Here are a few that we are currently working on

Ask Questions, Request Features, and Give Feedback

For questions and comments about the reference data products, including support for standard formats, databases, and software, please visit the Reference Data GitHub repository or our chat room.

For questions and comments about the computing pipeline, including data access and analysis, please visit the Reference Data GitHub repository and our computing pipeline chat room.
