# AI-VarDA

**Repository Path**: AI4EarthLab/AI-VarDA

- **Default Branch**: main
- **Created**: 2025-11-11
- **Last Updated**: 2025-11-11

## README

# AI-VarDA: A Unified Variational Data Assimilation Framework for AI and Traditional Methods

This framework builds on our recent works, [FengWu-4DVar](https://openreview.net/forum?id=Y2WorV5ag6) and [VAE-Var](https://openreview.net/forum?id=utz99dx2RN), aiming to bridge traditional data assimilation techniques with modern deep generative modeling.

**AI-VarDA** is a modular and extensible Python-based framework that supports both classic variational methods (e.g., 3DVar, 4DVar, GEN_BE) and AI-driven approaches (e.g., VAE-Var, LoRA-EnVar). It offers a unified interface for integrating forecast models, background error representations, and observation operators, enabling flexible configuration, comparative experimentation, and reproducible research.

## 🌟 Key Features

- ✅ **Hybrid support** for AI-based and traditional DA algorithms
- 🔌 **Pluggable architecture**: swap decoders, models, and observation operators
- 📈 **Integrated support** for ensemble-based flow error propagation
- 🧠 **Learnable background error models** via VAE or GEN_BE-style approaches
- 💡 Designed for **weather forecasting**, but extensible to other geophysical applications

## 🗂️ Project Structure Overview

```
assimilation_framework/
├── assimilation_core/               # Core data assimilation logic
│   ├── assimilation_models/         # All DA algorithm implementations (3DVar, 4DVar, VAE-Var, etc.)
│   ├── bg_decoder_models/           # Control-to-physical variable decoders (linear, VAE, GEN_BE)
│   ├── data_reader/                 # Physical field and observation data loaders
│   ├── flow_models/                 # Forecast model for 4DVar flow dependencies
│   ├── observation_operator/        # Observation operators (identity, interpolation, etc.)
│   ├── ensemble_generator.py        # Ensemble background propagation
│   ├── forecast_model.py            # Forecast model runner interface
│   ├── init_states_constructor.py   # Initial ensemble state generator
│   ├── evaluator.py                 # Evaluation: error computation, diagnostics
│   └── runner.py                    # Main assimilation process controller
├── bgerr_learning/                  # Static background error training modules
│   ├── dataset/                     # ERA5-based datasets and normalization
│   ├── learning_algorithms/         # GEN_BE and VAE-BE learners
│   ├── utils/                       # Dataset builder utilities
│   └── runner.py                    # Training entry point
├── networks/                        # Deep learning architectures
│   ├── bg_vae/                      # Background VAE model
│   ├── fengwu_hr/                   # FengWu high-resolution UNet + attention variants
│   └── fengwu_lr/                   # Transformer-based FengWu low-resolution model
└── utils/                           # Logging, metrics, helper functions
    ├── logger.py
    ├── metrics.py
    └── misc.py
scripts/                             # Python entry scripts
├── run_assimilation_loop.py         # Online assimilation cycle
└── learn_background_error.py        # Offline background error model training
bin/                                 # Shell launch scripts (e.g., SLURM)
├── run_assimilation_loop.sh
├── run_genbe_learn_error.sh
└── run_vaebe_learn_error.sh
config/                              # YAML-based experiment and model configs
├── assimilation_loop/               # Configs for DA cycle
└── bgerr_learning/                  # Configs for GEN_BE and VAE training
checkpoints/                         # Pretrained models and intermediate outputs
├── bgerr_models/                    # Trained background error models (GEN_BE, VAE)
├── forecast_models/                 # FengWu model checkpoints
├── flow_error/                      # Learned ensemble perturbation statistics
└── observation_masks/               # Observation availability masks
experiments/                         # Output directory for experiment logs and results
├── assimilation/
└── learning_background/
README.md
```

## 🚀 Quick Start

This framework supports both **static background error learning** and **online data assimilation** via configurable YAML files.

We provide three runnable shell scripts in `bin/` for common workflows:

### 1. Online Data Assimilation

Run the end-to-end data assimilation loop (e.g., 3DVar, VAE-4DVar, LoRA-EnVar):

```
bash bin/run_assimilation_loop.sh
```

This script launches a sequence of predefined experiments defined in:

- `config/assimilation_loop/exp_*.yaml`

The results and logs will be saved under:

```
experiments/assimilation//
```

------

### 2. Static Background Error Learning

#### GEN_BE-style learning:

```
bash bin/run_genbe_learn_error.sh
```

#### VAE-based learning:

```
bash bin/run_vaebe_learn_error.sh
```

Both use the configurations in:

- `config/bgerr_learning/`

And save their output to:

```
experiments/learning_background//
```

### 3. Customization

You can create or modify your own YAML config files to define:

- Assimilation algorithm (`gaussian_var`, `vae_var`, `lora_envar`, etc.)
- Forecast model and resolution
- Observation scenarios (random, gridded, interpolated)
- Background error representation (GEN_BE, VAE)
- Ensemble settings, time window size, training parameters, etc.

> 📁 See `config/assimilation_loop/` and `config/bgerr_learning/` for examples.

Then, update the corresponding shell script or launch manually:

```
python scripts/run_assimilation_loop.py --config your_config.yaml --prefix your_run_name
```

## 📄 Related Publications

This framework implements and extends the following research works:

- **FengWu-4DVar**: Xiao et al., *FengWu-4DVar: Coupling the Data-Driven Weather Forecasting Model with 4D Variational Assimilation*.
  [[conference link](https://openreview.net/forum?id=Y2WorV5ag6)] [[arxiv link](https://arxiv.org/abs/2312.12455)]

  > Introduces a differentiable AI-forecast-driven 4DVar system capable of stable 1-year cycling.

- **VAE-Var**: Xiao et al., *VAE-Var: Variational-Autoencoder-Enhanced Variational Assimilation in Meteorology*.
  [[conference link](https://openreview.net/forum?id=utz99dx2RN)]

  > Proposes a novel method to model background error distributions using deep generative models, enabling assimilation of off-grid observations.

## 📌 Citation

If you use this framework or build upon FengWu-4DVar or VAE-Var, please cite:

```
@article{xiao2023fengwu,
  title={Fengwu-4dvar: Coupling the data-driven weather forecasting model with 4d variational assimilation},
  author={Xiao, Yi and Bai, Lei and Xue, Wei and Chen, Kang and Han, Tao and Ouyang, Wanli},
  journal={arXiv preprint arXiv:2312.12455},
  year={2023}
}

@inproceedings{xiao2024towards,
  title={Towards a self-contained data-driven global weather forecasting framework},
  author={Xiao, Yi and Bai, Lei and Xue, Wei and Chen, Hao and Chen, Kun and Han, Tao and Ouyang, Wanli and others},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024}
}

@inproceedings{xiao2025vae,
  title={VAE-Var: Variational autoencoder-enhanced variational methods for data assimilation in meteorology},
  author={Xiao, Yi and Jia, Qilong and Chen, Kun and Bai, Lei and Xue, Wei},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```

## ✅ TODO

We are actively developing and expanding the framework. Below is a roadmap of planned features:

### 🔜 Short-Term Goals

- **Support for real-world station observations**
  Integration of non-gridded, irregularly located surface station data into the assimilation pipeline (via configurable observation operators).
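
To make the station-observation goal concrete, here is a minimal sketch of what an observation operator for irregularly located stations could look like: bilinear interpolation from a regular latitude-longitude grid to station coordinates. The function name `bilinear_obs_operator` and its arguments are hypothetical and are not part of AI-VarDA's API. The sketch assumes strictly increasing 1-D grid axes and stations that lie inside the grid, and it does not handle longitude wrap-around.

```python
import numpy as np


def bilinear_obs_operator(field, grid_lat, grid_lon, obs_lat, obs_lon):
    """Interpolate a gridded 2-D field to station locations.

    Hypothetical helper for illustration only, not AI-VarDA code.
    Assumes grid_lat and grid_lon are strictly increasing 1-D arrays
    and that every station lies inside the grid.
    """
    field = np.asarray(field, dtype=float)
    grid_lat = np.asarray(grid_lat, dtype=float)
    grid_lon = np.asarray(grid_lon, dtype=float)
    obs_lat = np.asarray(obs_lat, dtype=float)
    obs_lon = np.asarray(obs_lon, dtype=float)

    # Index of the lower-left corner of the grid cell containing each station.
    i = np.clip(np.searchsorted(grid_lat, obs_lat, side="right") - 1,
                0, len(grid_lat) - 2)
    j = np.clip(np.searchsorted(grid_lon, obs_lon, side="right") - 1,
                0, len(grid_lon) - 2)

    # Fractional position of each station inside its cell, in [0, 1].
    wy = (obs_lat - grid_lat[i]) / (grid_lat[i + 1] - grid_lat[i])
    wx = (obs_lon - grid_lon[j]) / (grid_lon[j + 1] - grid_lon[j])

    # Weighted sum of the four surrounding grid values.
    return ((1 - wy) * (1 - wx) * field[i, j]
            + (1 - wy) * wx * field[i, j + 1]
            + wy * (1 - wx) * field[i + 1, j]
            + wy * wx * field[i + 1, j + 1])


grid_lat = np.array([0.0, 1.0, 2.0])
grid_lon = np.array([0.0, 1.0, 2.0, 3.0])
# Test field that is linear in lat and lon, so bilinear interpolation is exact.
field = 2.0 * grid_lat[:, None] + 3.0 * grid_lon[None, :]
stations_lat = np.array([0.5, 1.25])
stations_lon = np.array([2.5, 0.75])
y = bilinear_obs_operator(field, grid_lat, grid_lon, stations_lat, stations_lon)
print(y)  # values of the field at the two station locations
```

Because the interpolation weights depend only on the station coordinates, this operator is linear in `field`, which keeps its gradient simple inside a variational cost function. A real station operator would also need vertical interpolation, quality control, and unit handling, none of which are shown here.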

### 🛠️ Long-Term Plans

- **Implement additional traditional DA algorithms**
  Including hybrid EnKF-Var, weak-constraint 4DVar, and flow-dependent covariance models.
- **Integrate with recent AI-based assimilation methods**
  Coupling with models like [DiffDA](https://arxiv.org/abs/2401.05932), [APPA](https://arxiv.org/abs/2504.18720), and other diffusion-based or neural inverse solvers.
- **Assimilation of satellite and radar observations**
  Building generalized observation operators and preprocessing pipelines for satellite radiances and radar reflectivity.

*Feel free to open an issue or pull request if you'd like to contribute to any of these features!*

Maintainer: [Yi Xiao](mailto:xiaoyi200018@gmail.com)