# PaddleFormers

**Repository Path**: paddlepaddle/PaddleFormers

## Basic Information

- **Project Name**: PaddleFormers
- **Description**: No description available
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: develop
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-15
- **Last Updated**: 2025-12-08

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

------------------------------------------------------------------------------------------

News | Highlights | Installation | Quickstart | Community

**PaddleFormers** is a Transformer model library built on the [PaddlePaddle](https://www.paddlepaddle.org.cn) deep learning framework that combines **ease of use** with **high performance**. It provides a unified model definition interface, modular training components, and comprehensive distributed training strategies designed for large language model (LLM) development pipelines. This lets developers train large models efficiently with minimal complexity, in scenarios ranging from academic research to industrial applications.

## News

[2025/06/28] 🎉 **PaddleFormers 0.1** is officially released! This initial version supports the SFT/DPO training paradigms and configurable distributed training via the unified Trainer API, and integrates the PEFT, MergeKit, and Quantization APIs for diverse LLM applications.

## Highlights

### ⚙️ Simplified Distributed Training

Implements 4D parallel strategies through the unified Trainer API, lowering the barrier to distributed LLM training.

### 🛠 Efficient Post-Training

Integrates the Packing dataflow and [FlashMask](https://arxiv.org/abs/2410.01359) operators for SFT/DPO training, eliminating the overhead of padding tokens and boosting throughput.

### 💾 Industrial Storage Solution

Features **Unified Checkpoint** storage tools for LLMs, enabling training resumption and dynamic resource scaling. It also implements asynchronous storage (up to 95% faster) and Optimizer State Quantization (78% storage reduction), so that industrial training meets both efficiency and stability requirements.

## Installation

Requires Python 3.8+ and [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick) 3.1+.
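Because PaddlePaddle is installed separately from PaddleFormers, it can help to confirm both stated minimums before running the install commands. The script below is a minimal sketch and is not part of PaddleFormers: it assumes PaddlePaddle was installed from a distribution named `paddlepaddle-gpu` or `paddlepaddle`, and the helper names (`major_minor`, `installed_paddle_version`, `check_environment`) are hypothetical. It uses `importlib.metadata`, which itself requires Python 3.8, so on an older interpreter the import error is already the signal.

```python
import re
import sys
from importlib import metadata
from typing import List, Optional, Tuple

# Minimum versions stated in this README.
MIN_PYTHON = (3, 8)
MIN_PADDLE = (3, 1)

# Assumed distribution names: CUDA builds vs. CPU builds of PaddlePaddle.
PADDLE_DISTS = ("paddlepaddle-gpu", "paddlepaddle")


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '3.1.0' or '3.2.0.dev20250101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_paddle_version() -> Optional[str]:
    """Return the installed PaddlePaddle version, or None if no known distribution is found."""
    for dist in PADDLE_DISTS:
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


def check_environment() -> List[str]:
    """Return human-readable problems; an empty list means both minimums are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {sys.version.split()[0]}"
        )
    paddle_version = installed_paddle_version()
    if paddle_version is None:
        problems.append("PaddlePaddle is not installed")
    elif major_minor(paddle_version) < MIN_PADDLE:
        problems.append(
            f"PaddlePaddle {MIN_PADDLE[0]}.{MIN_PADDLE[1]}+ required, found {paddle_version}"
        )
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("-", issue)
    if not issues:
        print("Environment meets the stated minimums.")
```

Once PaddlePaddle is installed, PaddlePaddle's own `paddle.utils.run_check()` can additionally confirm that the framework runs on the available devices.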
```bash
# Install via pip (choose the command matching your CUDA version)
# CUDA 12.6
pip install paddleformers --extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu126/
# CUDA 12.9
pip install paddleformers --extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu129/
# CUDA 13.0
pip install paddleformers --extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu130/

# Install the development version from source
git clone https://github.com/PaddlePaddle/PaddleFormers.git
cd PaddleFormers
# CUDA 12.6
pip install -e . --extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu126/
# CUDA 12.9
pip install -e . --extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu129/
# CUDA 13.0
pip install -e . --extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu130/
```

## Quickstart

### Text Generation

This example shows how to load a Qwen model for text generation with the PaddleFormers `Auto API`:

```python
from paddleformers.transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and a bfloat16 model from a Hugging Face-format checkpoint
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base", dtype="bfloat16", convert_from_hf=True).eval()

# Tokenize the prompt into Paddle tensors ("pd") and generate up to 128 new tokens
input_features = tokenizer("Give me a short introduction to large language model.", return_tensors="pd")
outputs = model.generate(**input_features, max_new_tokens=128)

# Decode the generated token IDs back into text
print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
```

### SFT Training

Getting started with supervised fine-tuning (SFT) using PaddleFormers:

```python
from paddleformers.trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Load the demo dataset from the Hugging Face Hub
dataset = load_dataset("ZHUI/alpaca_demo", split="train")

# Results are written to output_dir; model_init_kwargs is applied when the model is loaded by name
training_args = SFTConfig(output_dir="Qwen/Qwen3-0.6B-SFT", device="gpu", model_init_kwargs={"convert_from_hf": True})
trainer = SFTTrainer(
    args=training_args,
    model="Qwen/Qwen3-0.6B-Base",
    train_dataset=dataset,
)
trainer.train()
```

## Community

We welcome all contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## License

This repository's source code is available under the [Apache 2.0 License](LICENSE).