Using Bayesian Neural Networks (BNNs) for regression and classification tasks on the oceanographic dataset described below.
The experiments use the Kaggle dataset "CalCOFI: Over 60 years of oceanographic data".
The CalCOFI data set represents the longest (1949-present) and most complete (more than 50,000 sampling stations) time series of oceanographic and larval fish data in the world. It includes abundance data on the larvae of over 250 species of fish; larval length frequency data and egg abundance data on key commercial species; and oceanographic and plankton data. The physical, chemical, and biological data collected at regular time and space intervals quickly became valuable for documenting climatic cycles in the California Current and a range of biological responses to them. CalCOFI research drew world attention to the biological response to the dramatic Pacific-warming event in 1957-58 and introduced the term “El Niño” into the scientific literature.
The California Cooperative Oceanic Fisheries Investigations (CalCOFI) are a unique partnership of the California Department of Fish & Wildlife, NOAA Fisheries Service and Scripps Institution of Oceanography. The organization was formed in 1949 to study the ecological aspects of the sardine population collapse off California. Its focus has since shifted to the study of the marine environment off the coast of California, the management of its living resources, and monitoring the indicators of El Niño and climate change. CalCOFI conducts quarterly cruises off southern & central California, collecting a suite of hydrographic and biological data on station and underway.
Data collected at depths down to 500 m include: temperature, salinity, oxygen, phosphate, silicate, nitrate and nitrite, chlorophyll, transmissometer, PAR, C14 primary productivity, phytoplankton biodiversity, zooplankton biomass, and zooplankton biodiversity.
The goal of the project is to use a Bayesian Neural Network to predict the dissolved oxygen content of seawater, in milliliters of oxygen per liter, which is a key indicator of the condition of the marine ecosystem. Using water temperature (degrees Celsius) and salinity (Practical Salinity Scale 1978) as input features, I predict this target variable (a supervised regression task).
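To illustrate how a BNN regressor of this kind produces a prediction together with an uncertainty estimate, here is a minimal NumPy sketch of a mean-field Gaussian network: every weight has a learned mean and standard deviation, and each forward pass samples fresh weights via the reparameterization trick. The class `BayesianLinear`, the function `predict`, the layer sizes, the initial values and the example inputs are all hypothetical and not taken from the project; the network is untrained, and in practice the inputs would be standardized before use.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps every standard deviation positive
    return np.logaddexp(0.0, x)


class BayesianLinear:
    """Hypothetical mean-field Gaussian linear layer (mean and scale per weight)."""

    def __init__(self, in_features, out_features, rng):
        self.rng = rng
        # Variational parameters; the initial values here are illustrative assumptions
        self.w_mu = rng.normal(0.0, 0.1, size=(in_features, out_features))
        self.w_rho = np.full((in_features, out_features), -3.0)
        self.b_mu = np.zeros(out_features)
        self.b_rho = np.full(out_features, -3.0)

    def forward(self, x):
        # Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, 1)
        w = self.w_mu + softplus(self.w_rho) * self.rng.standard_normal(self.w_mu.shape)
        b = self.b_mu + softplus(self.b_rho) * self.rng.standard_normal(self.b_mu.shape)
        return x @ w + b


def predict(layers, x, n_samples=100):
    """Monte Carlo predictive mean and std: each pass draws a new set of weights."""
    draws = []
    for _ in range(n_samples):
        h = x
        for i, layer in enumerate(layers):
            h = layer.forward(h)
            if i < len(layers) - 1:
                h = np.maximum(h, 0.0)  # ReLU between layers, none on the output
        draws.append(h)
    draws = np.stack(draws)  # shape: (n_samples, batch, 1)
    return draws.mean(axis=0), draws.std(axis=0)


rng = np.random.default_rng(0)
# Two inputs (temperature, salinity) -> one output (oxygen in ml/L)
model = [BayesianLinear(2, 16, rng), BayesianLinear(16, 1, rng)]
x = np.array([[10.5, 33.4], [15.0, 33.6]])  # illustrative inputs, not dataset rows
mean, std = predict(model, x)
```

Averaging over many weight samples gives the predictive mean, and the spread across samples is the model's uncertainty for each input, which is what distinguishes a BNN from a deterministic network.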
On the test set, the model reaches an MSE of approximately 10.56 and an MAE of approximately 2.
These errors indicate that the model has not yet captured the underlying patterns in the data; more training epochs or changes to the model hyperparameters may improve it.
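The MSE and MAE figures reported above follow the standard definitions. A minimal pure-Python sketch is below; the function names and the toy values in the usage lines are illustrative, not project data.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: mean of squared residuals (target units squared)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of absolute residuals (same units as the target)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy oxygen values in ml/L (illustrative only)
y_true = [5.0, 3.0, 1.0]
y_pred = [4.0, 3.5, 2.0]
print(mse(y_true, y_pred))  # 0.75
print(mae(y_true, y_pred))  # 0.8333333333333334
```

Because RMSE can never be smaller than MAE, comparing the square root of the reported MSE (about 3.25) with the reported MAE of about 2 hints that a minority of large residuals contributes much of the squared error.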
The loss function used to train the model combines the BKL loss (the KL-divergence term of the Bayesian network) with a cross-entropy loss. To evaluate the model on the test data, I use two metrics: Mean Squared Error (MSE) and Mean Absolute Error (MAE).
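The combined objective described above can be sketched without any framework: the KL term measures how far each weight's Gaussian posterior N(mu, sigma^2) is from a Gaussian prior, and it is added to the data-fit loss with a small weight. The prior N(0, 1), `kl_weight=0.01`, and the function names are assumptions, since the project's exact settings are not stated; `data_loss` stands for whichever data term is computed separately (the cross-entropy named in the text, or a regression loss).

```python
import math


def gaussian_kl(mu, sigma, prior_mu=0.0, prior_sigma=1.0):
    """Closed-form KL(N(mu, sigma^2) || N(prior_mu, prior_sigma^2)), summed over weights."""
    total = 0.0
    for m, s in zip(mu, sigma):
        if s <= 0:
            raise ValueError("posterior standard deviations must be positive")
        total += (math.log(prior_sigma / s)
                  + (s ** 2 + (m - prior_mu) ** 2) / (2 * prior_sigma ** 2)
                  - 0.5)
    return total


def combined_loss(data_loss, mu, sigma, kl_weight=0.01):
    """Data-fit term plus a weighted KL penalty pulling the posterior toward the prior."""
    return data_loss + kl_weight * gaussian_kl(mu, sigma)


# Posterior parameters for three hypothetical weights
mu = [0.0, 0.5, -0.2]
sigma = [1.0, 0.3, 0.8]
print(combined_loss(data_loss=2.0, mu=mu, sigma=sigma))
```

A small `kl_weight` lets the data term dominate while still penalizing posteriors that drift far from the prior; tuning it is one of the hyperparameter changes that could affect the reported errors.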