
Welcome to the WhoseEgg App

WhoseEgg is an R Shiny app for predicting the taxonomic identity of fish eggs, with the objective of detecting invasive carp (Bighead, Grass, and Silver Carp) in the Upper Mississippi River basin. Users provide the required fish egg characteristics to the app, and the predicted family, genus, and species are returned. The predictions are made using random forests that are based on the models developed in Camacho et al. (2019) and validated in Goode et al. (2021).

See the first tab below for information on how to use the app. The other tabs below describe the locations where eggs were collected for training the random forests and the species present in the training data. We caution against using WhoseEgg with eggs collected in other locations or when species absent from the training data are believed to be present.

See the help page for information about the random forest models used by WhoseEgg, the definitions of the egg characteristics, and recommendations for how to handle data from different locations or containing different species.



Follow the steps below to obtain predictions. Additional instructions are included on the page corresponding to each step.


Funding for WhoseEgg was provided by the U.S. Fish and Wildlife Service through Grant #F20AP11535-00.

Data privacy statement: Data uploaded to WhoseEgg will not be saved by WhoseEgg or distributed.

Input of Egg Characteristics


Overview

This page contains the tools for providing the fish egg characteristics that will be used by the random forests to predict the fish taxonomies. To provide the egg characteristics, follow the instructions in the sidebar panel to the left.

The egg characteristic data must be formatted appropriately to work with WhoseEgg and correctly obtain predictions. Follow the guidelines in the Spreadsheet Specifications tabs below. Once the egg characteristic spreadsheet is uploaded, several additional variables will be computed based on the input values to be used by the random forests: Julian_Day, Membrane_SD, Membrane_CV, Embryo_SD, Embryo_CV, and Embryo_to_Membrane_Ratio. The uploaded variables of Year and Day are only used by WhoseEgg to compute Julian_Day and are excluded from the processed data.
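The Julian_Day computation mentioned above can be illustrated with a short sketch. It is written in Python for illustration only (WhoseEgg itself is an R app) and assumes that Julian_Day is the ordinal day of the year derived from the uploaded Year, Month, and Day; the function name `julian_day` is hypothetical and not part of WhoseEgg.

```python
from datetime import date


def julian_day(year: int, month: int, day: int) -> int:
    """Return the ordinal day of the year (1 = January 1).

    Assumption: this mirrors how Julian_Day is derived from the
    uploaded Year, Month, and Day columns.
    """
    # date() raises ValueError for impossible dates such as April 31
    return date(year, month, day).timetuple().tm_yday


print(julian_day(2021, 4, 23))  # 113
print(julian_day(2021, 8, 31))  # 243
```

In a non-leap year, these two dates correspond to the smallest and largest Julian days reported for the training data on the help page (113 and 243).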

Under Egg Characteristics, see the 'Input Data' tab to view data in the uploaded spreadsheet and the 'Processed Data' tab for the set of predictor variables to be used by the random forest plus the Egg ID.

See the 'Random Forest Details' tab on the help page for a full list of the predictor variables used by the random forests in WhoseEgg.

Spreadsheet Specifications


  • Fill in all variables (egg_ID and the 13 egg characteristics)

  • Use the helpers in the template to correctly enter the variable values (see the 'Template Helpers' tab for more info)

  • See the help page for detailed definitions of the egg characteristics (includes example photos)

  • Variable names must be exactly as they appear in the template
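The specifications above can be checked before uploading with a small script. The following is a minimal sketch in Python (WhoseEgg itself is an R app), not the validation WhoseEgg performs. The function `check_spreadsheet` and the column list `EXAMPLE_COLUMNS` are hypothetical; the list holds only the ID column and the environmental variables named on the help page, so the full set of names should be taken from the template.

```python
import csv
import io

# Illustrative subset only: the real template has egg_ID plus 13 characteristics.
EXAMPLE_COLUMNS = ["egg_ID", "Month", "Day", "Year", "Temperature", "Conductivity"]


def check_spreadsheet(csv_text, template_columns):
    """Return a list of problems found in a CSV export of the spreadsheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = []
    # Names must match the template exactly (case-sensitive)
    for name in template_columns:
        if name not in header:
            problems.append(f"missing column: {name}")
    for name in header:
        if name not in template_columns:
            problems.append(f"unexpected column: {name}")
    # Every template variable must be filled in for every egg
    for row_number, row in enumerate(reader, start=2):
        for name in template_columns:
            if name in header and (row[name] or "").strip() == "":
                problems.append(f"row {row_number}: {name} is empty")
    return problems


upload = "egg_ID,Month,Day,Year,temperature,Conductivity\n1,6,15,2021,22.5,\n"
for problem in check_spreadsheet(upload, EXAMPLE_COLUMNS):
    print(problem)
```

In this example the lowercase `temperature` header is reported as both a missing and an unexpected column, and the blank Conductivity cell is reported as empty.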


Egg Characteristics

Results from Random Forests


Overview

This page provides the ability to compute and display the random forest predictions for the egg data provided via the 'Data Input' tab. To obtain the predictions, follow the instructions in the sidebar panel to the left. The sections below provide tools for viewing and exploring the predictions.

See the Table of Predictions below for the random forest predictions and corresponding probabilities for each fish egg. The columns Family Pred, Genus Pred, and Species Pred contain the taxonomic level with the highest random forest probability for the corresponding egg. The columns Family Prob, Genus Prob, and Species Prob contain the corresponding random forest probabilities. A random forest probability is the proportion of trees in the random forest that predict a certain level. See the 'Random Forest Details' tab on the help page for information on how random forest predictions and probabilities are determined.
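To make the definition of a random forest probability concrete, here is a minimal sketch in Python (WhoseEgg itself is an R app). It assumes the per-tree votes for one egg are available as a list of labels; the function names and the ten example votes are hypothetical and are not WhoseEgg output.

```python
from collections import Counter


def forest_probabilities(tree_votes):
    """Proportion of trees voting for each level."""
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    return {level: count / n_trees for level, count in counts.items()}


def forest_prediction(tree_votes):
    """Level with the highest probability, together with that probability."""
    probabilities = forest_probabilities(tree_votes)
    # Simplification: ties go to the level that was seen first
    level = max(probabilities, key=probabilities.get)
    return level, probabilities[level]


# Hypothetical votes from a forest of 10 trees for a single egg
votes = ["Silver Carp"] * 6 + ["Grass Carp"] * 3 + ["Bighead Carp"] * 1
print(forest_probabilities(votes))
print(forest_prediction(votes))  # ('Silver Carp', 0.6)
```

Here the predicted level corresponds to a Pred column and its probability to the matching Prob column.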

See the Visualizations of Predictions below for various visualizations of the random forest predictions.


Table of Predictions


Visualizations of Predictions


Frequency of Predictions per Taxonomic Level

Each plot shows the levels of family, genus, or species included in the predictions. The length of each bar represents the total number of eggs classified within that level by the random forest.
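The bar lengths described above are simple counts of predicted levels. A minimal sketch in Python (WhoseEgg itself is an R app), assuming the predictions are available as rows keyed by the column names of the Table of Predictions; the rows below are made-up illustrative values and `prediction_counts` is a hypothetical helper.

```python
from collections import Counter

# Made-up predictions for five eggs (illustrative only)
rows = [
    {"Family Pred": "Cyprinidae"},
    {"Family Pred": "Cyprinidae"},
    {"Family Pred": "Sciaenidae"},
    {"Family Pred": "Cyprinidae"},
    {"Family Pred": "Sciaenidae"},
]


def prediction_counts(rows, column):
    """Count how many eggs were classified into each level of a column."""
    return Counter(row[column] for row in rows)


counts = prediction_counts(rows, "Family Pred")
# Text version of the bar chart: one '#' per egg
for level, n_eggs in counts.most_common():
    print(f"{level:<12} {'#' * n_eggs} ({n_eggs})")
```

The same function can be applied to the Genus Pred and Species Pred columns to obtain the counts for the other two plots.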


Download Data with Predictions


Overview

This page is used to download a spreadsheet containing the input egg data and the random forest results. To download the data, follow the instructions in the sidebar panel to the left.

The Download Preview Table below contains the data that will be included in the spreadsheet when downloaded. The spreadsheet includes:

  • all initial variables uploaded to WhoseEgg,
  • variables computed to generate random forest predictions,
  • the random forest predictions, and
  • the random forest probabilities for all taxonomic levels.

Download Preview Table

Help Page

This page contains additional information to assist with the use of WhoseEgg. The tabs of Environmental Variables and Morphological Variables contain information about the egg characteristics used in WhoseEgg including their definitions and required spreadsheet formats. The Random Forest Details tab contains information on random forests in general and the random forests used for prediction in WhoseEgg. The FAQ tab contains answers to common questions users may have such as how to handle data collected at locations outside the region where the training data were collected. For questions that are not answered by the content provided here, please email whoseegg@iastate.edu.



Day

Definition: Day of the month when the fish egg is collected

Spreadsheet Variable Name: Day

Format: Integer between 1 and 31 that is a valid day for the given month

Required Upload or Computed After Upload: Required upload

Random Forest Predictor Variable: No


Conductivity

Definition: Conductivity (μS/cm) of the water where the egg is collected

Spreadsheet Variable Name: Conductivity

Format: Continuous variable greater than 0

Required Upload or Computed After Upload: Required upload

Random Forest Predictor Variable: Yes

Additional Information: Training data conductivity values range between 274 μS/cm and 781 μS/cm


Julian Day

Definition: Julian day when the fish egg is collected

Spreadsheet Variable Name: Julian_Day

Format: Integer between 1 and 365

Required Upload or Computed After Upload: Computed after upload

Random Forest Predictor Variable: Yes

Additional Information: Julian days in training data range between 113 and 243


Month

Definition: Month when the fish egg is collected

Spreadsheet Variable Name: Month

Format: Integer between 1 and 12

Required Upload or Computed After Upload: Required upload

Random Forest Predictor Variable: Yes

Additional Information: Months in training data are 4, 5, 6, 7, 8


Temperature

Definition: Temperature (degrees Celsius) of the water where the egg is collected

Spreadsheet Variable Name: Temperature

Format: Continuous variable

Required Upload or Computed After Upload: Required upload

Random Forest Predictor Variable: Yes

Additional Information: Training data temperature values range between 11 °C and 30.7 °C


Year

Definition: Year when the fish egg is collected

Spreadsheet Variable Name: Year

Format: YYYY

Required Upload or Computed After Upload: Required upload

Random Forest Predictor Variable: No
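The training-data ranges listed in the entries above can be used to flag eggs whose environmental values fall outside what the random forests were trained on. The following is a minimal sketch in Python (WhoseEgg itself is an R app), not a feature of WhoseEgg; the dictionary `TRAINING_RANGES`, the function `outside_training_range`, and the example egg are hypothetical.

```python
# Ranges copied from the 'Additional Information' lines above
TRAINING_RANGES = {
    "Conductivity": (274.0, 781.0),  # μS/cm
    "Julian_Day": (113, 243),
    "Month": (4, 8),
    "Temperature": (11.0, 30.7),     # degrees Celsius
}


def outside_training_range(egg):
    """Return the names of variables whose values lie outside the training range."""
    flagged = []
    for name, (low, high) in TRAINING_RANGES.items():
        if name in egg and not (low <= egg[name] <= high):
            flagged.append(name)
    return flagged


# Made-up egg record for illustration
egg = {"Conductivity": 850, "Julian_Day": 150, "Month": 5, "Temperature": 9.5}
print(outside_training_range(egg))  # ['Conductivity', 'Temperature']
```

Predictions for flagged eggs should be treated with extra caution, in line with the guidance on data from different locations.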


References

Camacho, Carlos A., Christopher J. Sullivan, Michael J. Weber, and Clay L. Pierce. 2019. “Morphological Identification of Bighead Carp, Silver Carp, and Grass Carp Eggs Using Random Forests Machine Learning Classification.” North American Journal of Fisheries Management 39 (6): 1373–1384. https://doi.org/10.1002/nafm.10380.
Cutler, D. Richard, Thomas C. Edwards, Karen H. Beard, Adele Cutler, Kyle T. Hess, Jacob Gibson, and Joshua J. Lawler. 2007. “Random Forests for Classification in Ecology.” Ecology 88 (11): 2783–2792. https://doi.org/10.1890/07-0539.1.
Goode, Katherine, Michael J. Weber, Aaron Matthews, and Clay L. Pierce. 2021. “Evaluation of a Random Forest Model to Identify Invasive Carp Eggs Based on Morphometric Features.” North American Journal of Fisheries Management. https://doi.org/10.1002/nafm.10616.
Kelso, W. E., and D. A. Rutherford. 1996. “Collection, Preservation, and Identification of Fish Eggs and Larvae.” In Fisheries Techniques, edited by B. R. Murphy and D. W. Willis, 2nd ed., 255–302. Bethesda, Maryland: American Fisheries Society.
Liaw, Andy, and Matthew Wiener. 2002. “Classification and Regression by randomForest.” R News 2 (3): 18–22. https://CRAN.R-project.org/doc/Rnews/.