ijoel/predictive-modeling-forecasting-western-monarch-count

Predictive Modeling for Forecasting Western Monarch Count

Overview

This notebook provides an analysis of Western Monarch Butterfly overwintering data using a modeling approach that combines machine learning techniques with time series forecasting to understand population dynamics and predict future trends.

The Western Monarch Count is a citizen science program coordinated by the Xerces Society for Invertebrate Conservation. This annual census represents the most comprehensive standardized monitoring effort for Western Monarch populations and serves as the primary data source for conservation decision-making.

Species Importance

Monarchs are important pollinators that support plant and animal life in western North America. Their annual journey is one of nature's most fascinating navigational feats. Unfortunately, their numbers have dropped dramatically:

  • In the 1980s: Millions of butterflies
  • Today: Around 30,000 (a 95% decline)
  • Status: Listed as "Endangered" since 2022

Analysis Goal:

The goal is to understand how census totals change over time and use predictive modeling to forecast population counts and inform conservation strategies.

Dataset Characteristics

Strengths:

  • Nearly 3 decades of continuous monitoring (1997-2025)
  • Multiple geographic regions spanning diverse habitats
  • Rich feature set including temporal, spatial, and environmental variables
  • Site-level granularity enabling detailed spatial analysis

Data Quality Observations:

  • Missingness patterns: Site coverage varies across years, with some sites having sporadic monitoring
  • Temporal gaps: Not all sites monitored annually; coverage varies by region and year
  • Implications: Missing data reflects real-world constraints in volunteer-based ecological monitoring
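The coverage gaps described above can be quantified per site before modeling. The following is a minimal sketch, not the notebook's actual code: the record layout (site, year, count), the site names, and the `site_coverage` helper are all hypothetical. It treats a recorded zero as a real observation and only `None` as a missing survey, since the two mean different things in a census.

```python
from collections import defaultdict

# Hypothetical (site, year, count) records; None marks a year with no survey.
records = [
    ("site_a", 2020, 120), ("site_a", 2021, None), ("site_a", 2022, 95),
    ("site_b", 2020, 0),   ("site_b", 2021, 3),    ("site_b", 2022, 7),
    ("site_c", 2022, 40),
]

def site_coverage(records, first_year, last_year):
    """Return the fraction of years in [first_year, last_year] each site was surveyed."""
    n_years = last_year - first_year + 1
    surveyed = defaultdict(set)
    for site, year, count in records:
        # A count of 0 is a valid survey result; only None means "not monitored".
        if count is not None and first_year <= year <= last_year:
            surveyed[site].add(year)
    sites = {site for site, _, _ in records}
    return {site: len(surveyed[site]) / n_years for site in sorted(sites)}

coverage = site_coverage(records, 2020, 2022)
for site, fraction in coverage.items():
    print(f"{site}: surveyed in {fraction:.0%} of years")
```

Sites with low coverage can then be handled explicitly, for example by excluding them or by adding a coverage feature, rather than being silently mixed with well-monitored sites.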

Feature Engineering

  • Temporal features: Year, month, time since first observation
  • Lagged features: Historical values as predictors
  • Aggregated features: Max, min, mean of past observations
  • Derived features: Ratios, differences, log transformations
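As an illustration of the feature families listed above, the sketch below builds lagged, aggregated, and derived features for a single site. It assumes a hypothetical layout (a dict mapping year to count) and hypothetical feature names; the notebook's actual columns may differ. Only years before the target year are used, so the target value never leaks into its own predictors.

```python
import math

def build_features(counts_by_year, year):
    """Build lagged, aggregated, and derived features for one site and target year.

    counts_by_year: dict mapping year -> count for a single site (hypothetical layout).
    Returns None when the site has no observations before `year`.
    """
    past = sorted((y, c) for y, c in counts_by_year.items() if y < year)
    if not past:
        return None
    past_counts = [c for _, c in past]
    first_year = past[0][0]
    # "lag1" is the most recent prior observation, which may be more than
    # one year back if the site was not monitored every year.
    lag1 = past_counts[-1]
    lag2 = past_counts[-2] if len(past_counts) > 1 else None
    return {
        "year": year,                                   # temporal
        "years_since_first_obs": year - first_year,     # temporal
        "lag1": lag1,                                   # lagged
        "lag2": lag2,                                   # lagged
        "past_max": max(past_counts),                   # aggregated
        "past_min": min(past_counts),                   # aggregated
        "past_mean": sum(past_counts) / len(past_counts),
        "lag1_minus_lag2": None if lag2 is None else lag1 - lag2,  # derived
        "log1p_lag1": math.log1p(lag1),                 # derived; log1p tolerates zeros
    }

site_counts = {2019: 200, 2020: 50, 2021: 0, 2022: 80}
features = build_features(site_counts, 2022)
print(features)
```

Using `log1p` rather than a plain logarithm keeps zero counts, which are common in this kind of census, well defined.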

Analytical Methods

Several methods were used to analyze the data:

  • Random Forest Regressor: Best-performing model for site-level population predictions
  • TabPFN (Tabular Prior-data Fitted Network): A pretrained neural network designed for tabular data
  • XGBoost: Gradient boosting model included for comparison
  • ARIMA Time Series: Monthly population forecasting with seasonal decomposition
  • Feature Engineering: Selected features from temporal, spatial, and environmental variables

Summary of the Western Monarch Butterfly Analysis

The analysis leveraged machine learning and time series forecasting to study 28+ years of Western Monarch butterfly population data (1997–2025), aiming to predict trends and guide conservation.

Key Findings:

  • Best Model: Random Forest achieved an R² of 0.96, explaining 96% of population variation and demonstrating high accuracy.

  • Key Predictors: Historical site totals, seasonal stability, and temporal trends were most influential in predicting current populations.

  • Data Challenges: Zero-inflation, missing data, and uneven geographic coverage were noted. However, these issues did not compromise the model's accuracy or utility.

  • Forecast Outlook: A hybrid ARIMA model projected a slight upward trend through 2030, though wide confidence intervals highlight variability and uncertainty. Conservation interventions remain urgently needed.
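For readers unfamiliar with the R² figure quoted above, the following sketch shows how the coefficient of determination is computed from observed and predicted counts. The numbers are made up for illustration and are not taken from the notebook; the `r_squared` helper is hypothetical (in practice a library routine such as scikit-learn's `r2_score` would typically be used).

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are identical")
    return 1 - ss_res / ss_tot

# Illustrative values only, not results from the analysis.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
score = r_squared(observed, predicted)
print(f"R² = {score:.3f}")  # 0.948 for these made-up values
```

An R² of 0.96 therefore means the squared prediction errors sum to about 4% of the total squared variation of the counts around their mean, on whatever data the score was evaluated on.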

Conclusion:

The Random Forest model provides a robust tool for forecasting and prioritizing conservation efforts.

Current Model Limitations:

  • Lack of detailed weather and microclimate data limits model precision for environmental factors.
  • Complex ecological interactions may require more advanced modeling techniques to address non-linear responses.

Future Enhancements:

  • Integrate models with real-time weather and remote sensing data to improve accuracy and responsiveness.
  • Review data collection practices and consider a mobile application with an intuitive interface to reduce sparsity and improve feature completeness.

Conservation Recommendations:

The analysis underscores the utility of data science in conservation, but success depends on continued efforts by researchers, volunteers, and policymakers to protect the species.

  • Prioritize high-capacity sites for habitat restoration, targeting locations with historical maximum butterfly counts to support population recovery.
  • Increase monitoring frequency at sites with high seasonal variability to detect threats early.
  • Deploy the Random Forest model as an early warning system that flags sites whose counts fall significantly below predictions.
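One way the early warning idea above could work is sketched below. This is an assumption about a possible design, not the notebook's implementation: "significantly below" is interpreted here as more than `k` standard deviations of held-out residuals below the prediction, and the function names, site names, and numbers are all hypothetical.

```python
import statistics

def residual_sd(validation_observed, validation_predicted):
    """Sample standard deviation of (observed - predicted) on held-out data."""
    residuals = [o - p for o, p in zip(validation_observed, validation_predicted)]
    return statistics.stdev(residuals)

def flag_shortfalls(observed, predicted, sd, k=2.0):
    """Return sites whose observed count is more than k * sd below the prediction.

    observed / predicted: dicts mapping site -> count (hypothetical layout).
    """
    return sorted(
        site
        for site in observed
        if site in predicted and observed[site] < predicted[site] - k * sd
    )

# Illustrative held-out data used to estimate typical prediction error.
sd = residual_sd([100, 80, 60, 40], [90, 85, 65, 35])

# Illustrative new-season counts and model predictions.
new_observed = {"site_a": 100, "site_b": 30, "site_c": 55}
new_predicted = {"site_a": 110, "site_b": 60, "site_c": 50}
flagged = flag_shortfalls(new_observed, new_predicted, sd)
print(flagged)
```

The threshold `k` trades off sensitivity against false alarms; a single global standard deviation is a simplification, and per-site or count-dependent error estimates may be more appropriate for zero-inflated data.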

Acknowledgements:

Sincere thanks to the Xerces Society for providing access to the Western Monarch Count database, which was essential to this study. Their dataset enabled critical insights into population trends and conservation needs, supporting robust analysis and informed decision-making.

Notebook Source:

All analysis and modeling steps are documented in predictive_modeling_for_forecasting_western_monarch_count.ipynb
