# serving

**Repository Path**: zshellzhang/serving

## Basic Information

- **Project Name**: serving
- **Description**: A flexible, high-performance serving system for machine learning models
- **Primary Language**: C++
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-06
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# TensorFlow Serving

[![Ubuntu Build Status](https://storage.googleapis.com/tensorflow-serving-kokoro-build-badges/ubuntu.svg)](https://storage.googleapis.com/tensorflow-serving-kokoro-build-badges/ubuntu.html)
![Docker CPU Nightly Build Status](https://storage.googleapis.com/tensorflow-serving-kokoro-build-badges/docker-cpu-nightly.svg)
![Docker GPU Nightly Build Status](https://storage.googleapis.com/tensorflow-serving-kokoro-build-badges/docker-gpu-nightly.svg)

TensorFlow Serving is an open-source software library for serving machine learning models. It deals with the *inference* aspect of machine learning, taking models after *training* and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table. Multiple models, or indeed multiple versions of the same model, can be served simultaneously (see the model config sketch under Examples below). This flexibility facilitates canarying new versions, non-atomically migrating clients to new models or versions, and A/B testing experimental models.

The primary use-case is high-performance production serving, but the same serving infrastructure can also be used in bulk-processing (e.g. map-reduce) jobs to pre-compute inference results or analyze model performance. In both scenarios, GPUs can substantially increase inference throughput. TensorFlow Serving comes with a scheduler that groups individual inference requests into batches for joint execution on a GPU, with configurable latency controls (see the batching sketch under Examples below).

TensorFlow Serving has out-of-the-box support for TensorFlow models (naturally), but at its core it manages arbitrary versioned items (*servables*) with pass-through to their native APIs. In addition to trained TensorFlow models, servables can include other assets needed for inference, such as embeddings, vocabularies and feature transformation configs, or even non-TensorFlow-based machine learning models.

The architecture is highly modular. You can use some parts individually (e.g. batch scheduling) or use all the parts together. There are numerous plug-in points; perhaps the most useful ways to extend the system are: (a) [creating a new type of servable](tensorflow_serving/g3doc/custom_servable.md); (b) [creating a custom source of servable versions](tensorflow_serving/g3doc/custom_source.md).

**If you'd like to contribute to TensorFlow Serving, be sure to review the [contribution guidelines](CONTRIBUTING.md).**

**We use [GitHub issues](https://github.com/tensorflow/serving/issues) for tracking requests and bugs.**

# Download and Setup

See [install instructions](tensorflow_serving/g3doc/setup.md).

## Tutorials

* [Basic tutorial](tensorflow_serving/g3doc/serving_basic.md)
* [Advanced tutorial](tensorflow_serving/g3doc/serving_advanced.md)

## For more information

* [Serving architecture overview](tensorflow_serving/g3doc/overview.md)
* [TensorFlow website](http://tensorflow.org)
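## Examples

As noted above, multiple models, or multiple versions of the same model, can be served side by side. A minimal sketch of a model config file that pins two versions of one model; the model name, base path, and version numbers are illustrative, not part of this repository:

```
model_config_list {
  config {
    name: "my_model"                # illustrative model name
    base_path: "/models/my_model"   # illustrative path on the serving host
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 2                 # keep version 2 live for existing clients
        versions: 3                 # canary version 3 alongside it
      }
    }
  }
}
```

Passed to the model server via `--model_config_file`, a config like this keeps both versions loaded at once, which is the mechanism behind the canarying and A/B-testing workflows described above.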
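The batch scheduler mentioned earlier is driven by a small set of tunable parameters. A sketch of a batching parameters file, assuming batching is enabled with `--enable_batching` and the file is passed via `--batching_parameters_file`; the values are illustrative starting points, not recommendations:

```
max_batch_size { value: 32 }          # cap on requests merged into one batch
batch_timeout_micros { value: 1000 }  # max wait to fill a batch before running it
max_enqueued_batches { value: 100 }   # queue depth before rejecting requests
num_batch_threads { value: 4 }        # parallelism for executing batches
```

`batch_timeout_micros` is the configurable latency control: a larger value trades per-request latency for bigger, more GPU-efficient batches.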
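Finally, a sketch of the versioned client access described above, as a Python gRPC client. The server address, model name, signature name, and input tensor name are all assumptions for illustration:

```python
# Minimal Predict client sketch. Assumes a model server on localhost:8500
# serving a model named "my_model" whose "serving_default" signature takes a
# float input tensor "x" -- adapt these names to your own SavedModel.
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"
request.model_spec.version.value = 2  # pin a specific servable version
request.model_spec.signature_name = "serving_default"
request.inputs["x"].CopyFrom(tf.make_tensor_proto([[1.0, 2.0, 5.0]]))

response = stub.Predict(request, 5.0)  # 5-second deadline
print(response.outputs)
```

Omitting `model_spec.version` asks the server for the latest loaded version, which is how clients can migrate non-atomically: old clients pin the old version while new clients take the default.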