AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
PyTorch Extension Library of Optimized Scatter Operations
TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs)
Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
A CPU+GPU profiling library that provides access to timeline traces and hardware performance counters.
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify
Data manipulation and transformation for audio signal processing, powered by PyTorch