# MIPP

**Repository Path**: eric3495/MIPP

## Basic Information

- **Project Name**: MIPP
- **Description**: MIPP (Multiple Instruction Per Packet) is a high-performance packet processing framework designed to improve network processing speed and the efficiency of network applications. The project is developed in C++ and takes advantage of the SIMD (Single Instruction, Multiple Data) instruction sets of modern CPUs, using parallel processing to accelerate packet processing.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-12-12
- **Last Updated**: 2025-12-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# MyIntrinsics++ (MIPP)

[![pipeline status](https://gitlab.com/aff3ct/MIPP/badges/master/pipeline.svg)](https://gitlab.com/aff3ct/MIPP/pipelines)
[![coverage report](https://gitlab.com/aff3ct/MIPP/badges/master/coverage.svg)](https://aff3ct.gitlab.io/MIPP/)

![](mipp.jpg)

## Purpose

MIPP is a portable, open-source wrapper (MIT license) for vector intrinsic functions (SIMD) written in C++11. It works for SSE, AVX, AVX-512, ARM NEON and SVE (work in progress) instructions. The MIPP wrapper supports single- and double-precision floating-point numbers as well as signed/unsigned integer arithmetic (64-bit, 32-bit, 16-bit and 8-bit).

With the MIPP wrapper you no longer need to write architecture-specific intrinsic code. Just use the provided functions and the wrapper will automatically generate the right intrinsic calls for your target architecture.

If you are interested in the ARM SVE development status, [please follow this link](#arm-sve).

## Short Documentation

### Supported Compilers

At this time, MIPP has been tested on the following compilers:
- Intel: `icpc` >= `16`,
- GNU: `g++` >= `4.8`,
- Clang: `clang++` >= `3.6`,
- Microsoft: `msvc` >= `14`.

On `msvc` `14.10` (Microsoft Visual Studio 2017), performance is reduced compared to the other compilers because the compiler is not able to fully inline all the MIPP methods. This has been fixed in `msvc` `14.21` (Microsoft Visual Studio 2019), and you can now expect high performance.
### Install and Configure your Code

You don't have to install MIPP because it is made of plain C++ header files. The headers are located in the `include` folder (note that this location has changed since commit `6795891`; before that, they were located in the `src` folder).

Just include the header in your source files when the wrapper is needed.

```cpp
#include "mipp.h"
```

`mipp.h` uses a C++ `namespace`: `mipp`. If you do not want to prefix all the MIPP calls with `mipp::`, you can do this:

```cpp
#include "mipp.h"
using namespace mipp;
```

Before trying to compile, remember to tell the compiler which kind of vector instructions you want to use. For instance, if you are using the GNU compiler (`g++`), you simply have to add the `-march=native` option on SSE- and AVX-capable CPUs. For ARMv7 CPUs with NEON instructions, you have to add the `-mfpu=neon` option (since most of the current NEONv1 instructions are not IEEE-754 compliant). This is no longer the case on ARMv8 processors, so the `-march=native` option works there too. MIPP also uses some features provided by C++11, so you have to add the `-std=c++11` flag to compile the code. You are now ready to run your code with the MIPP wrapper.

If MIPP is installed on the system, it can be integrated into a CMake project in a standard way.
Example:

```sh
# install MIPP
cd MIPP/
export MIPP_ROOT=$PWD/build/install
cmake -B build -DCMAKE_INSTALL_PREFIX=$MIPP_ROOT
cmake --build build -j5
cmake --install build
```

In your `CMakeLists.txt`:

```cmake
# find the installation of MIPP on the system
find_package(MIPP REQUIRED)

# define your executable
add_executable(gemm gemm.cpp)

# link your executable to MIPP
target_link_libraries(gemm PRIVATE MIPP::mipp)
```

```sh
cd your_project/
# if MIPP is installed in a system standard path: MIPP will be found automatically with cmake
cmake -B build
# if MIPP is installed in a non-standard path: use CMAKE_PREFIX_PATH
cmake -B build -DCMAKE_PREFIX_PATH=$MIPP_ROOT
```

#### Generate Sources & Compile the Static Library

MIPP is mainly a header-only library. However, some macro operations require compiling a small library. This is particularly true for the `compress` operation, which relies on generated LUTs stored in the static library.

To generate the source files containing these LUTs, you need to install Python 3 with the Jinja2 package:

```bash
sudo apt install python3 python3-pip
pip3 install --user -r codegen/requirements.txt
```

Then you can call the generator as follows:

```bash
python3 codegen/gen_compress.py
```

Finally, you can compile the MIPP static library:

```bash
cmake -B build -DMIPP_STATIC_LIB=ON
cmake --build build -j4
```

Note that **the compilation of the static library is optional**. You can choose not to compile it. In that case, only some macro operations will be missing.

### Sequential Mode

By default, MIPP tries to recognize the instruction set from the preprocessor definitions. If MIPP can't match the instruction set (for instance, when MIPP does not support the targeted instruction set), it falls back on standard sequential instructions. In this mode, vectorization is no longer guaranteed, but the compiler can still perform auto-vectorization.
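To make "recognizing the instruction set from the preprocessor definitions" concrete, here is a minimal sketch of the general pattern. It is not MIPP's code. It picks a code path from macros that GCC and Clang predefine according to flags such as `-march=native` (`__AVX512F__`, `__AVX2__`, `__AVX__`, `__SSE2__`, `__ARM_NEON`). If none applies, it falls back to a plain loop. The `DEMO_NO_INTRINSICS` switch is hypothetical and only mimics the role of MIPP's `-DMIPP_NO_INTRINSICS` definition. MSVC does not define `__SSE2__`, so on that compiler the sketch simply takes the scalar path.

```cpp
#include <cstddef>
#include <string>

#if defined(__SSE2__) && !defined(DEMO_NO_INTRINSICS)
#include <emmintrin.h> // SSE2 header (also provides _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps)
#endif

// Reports which instruction set the compiler advertises through its predefined macros.
std::string detected_isa()
{
#if defined(DEMO_NO_INTRINSICS)
	return "SEQUENTIAL (forced)";
#elif defined(__AVX512F__)
	return "AVX-512F";
#elif defined(__AVX2__)
	return "AVX2";
#elif defined(__AVX__)
	return "AVX";
#elif defined(__SSE2__)
	return "SSE2";
#elif defined(__ARM_NEON)
	return "NEON";
#else
	return "SEQUENTIAL";
#endif
}

// Adds two float arrays: a 4-wide SSE path when the macros allow it, then a
// scalar loop for the remaining elements (or for all of them in sequential mode).
void add_arrays(const float* a, const float* b, float* c, std::size_t n)
{
	std::size_t i = 0;
#if defined(__SSE2__) && !defined(DEMO_NO_INTRINSICS)
	for (; i + 4 <= n; i += 4)
		_mm_storeu_ps(c + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
#endif
	for (; i < n; i++) // sequential fallback, still open to compiler auto-vectorization
		c[i] = a[i] + b[i];
}
```

Compiling the same file with and without `-march=native`, or with `-DDEMO_NO_INTRINSICS`, changes the string returned by `detected_isa()` but not the results of `add_arrays`.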
It is possible to force MIPP to use the sequential mode with the following compiler definition: `-DMIPP_NO_INTRINSICS`. This can be useful for debugging or for benchmarking a code.

If you want to check the MIPP mode configuration, you can print the following global variable: `mipp::InstructionFullType` (`std::string`).

### Vector Register Declaration

Just use the `mipp::Reg<T>` type.

```cpp
mipp::Reg<float> r1, r2, r3; // we have declared 3 vector registers
```

However, this does not tell us the number of elements per register. That number can be obtained by calling the `mipp::N<T>()` function (`T` is a template parameter; it can be `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t` or `uint8_t`).

```cpp
for (int i = 0; i < n; i += mipp::N<float>()) {
	// ...
}
```

The number of elements per register directly depends on the size of the data type we are working on. For instance, a 256-bit AVX register holds 8 `float` or 4 `double` elements.

### Register `load` and `store` Instructions

Loading memory from a vector into a register:

```cpp
int n = mipp::N<float>() * 10;
std::vector<float> myVector(n);
int i = 0;
mipp::Reg<float> r1;
r1.load(&myVector[i*mipp::N<float>()]);
```

The last two lines can be shortened as follows, where the `load` call becomes implicit:

```cpp
mipp::Reg<float> r1 = &myVector[i*mipp::N<float>()];
```

Storing can be done with the `store(...)` method:

```cpp
int n = mipp::N<float>() * 10;
std::vector<float> myVector(n);
int i = 0;
mipp::Reg<float> r1 = &myVector[i*mipp::N<float>()];

// do something with r1

r1.store(&myVector[(i+1)*mipp::N<float>()]);
```

By default, loads and stores work on **unaligned memory**. It is possible to control this behavior with the `-DMIPP_ALIGNED_LOADS` definition: when it is specified, loads and stores work on **aligned memory** by default. In the **aligned memory** mode, it is still possible to perform unaligned memory operations with the `mipp::loadu` and `mipp::storeu` functions. However, it is not possible to perform aligned loads and stores in the **unaligned memory** mode.
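Two quantities from this section can be checked with plain C++: the number of elements returned by `mipp::N<T>()`, and the alignment that aligned loads and stores require. The following sketch is a model, not MIPP's implementation. `lanes<T>()` and `is_aligned()` are hypothetical helpers. The register width in bytes is passed by hand (16 for SSE/NEON, 32 for AVX, 64 for AVX-512) instead of being detected.

```cpp
#include <cstddef>
#include <cstdint>

// Number of T elements that fit in a SIMD register of reg_bytes bytes,
// i.e. the quantity that mipp::N<T>() reports for the active instruction set.
template <typename T>
constexpr std::size_t lanes(std::size_t reg_bytes)
{
	return reg_bytes / sizeof(T);
}

static_assert(lanes<float>(32) == 8, "AVX (256-bit): 8 floats");
static_assert(lanes<double>(32) == 4, "AVX (256-bit): 4 doubles");
static_assert(lanes<std::int8_t>(16) == 16, "SSE/NEON (128-bit): 16 int8_t");
static_assert(lanes<std::int16_t>(64) == 32, "AVX-512 (512-bit): 32 int16_t");

// On x86, an aligned load/store needs an address that is a multiple of the register size.
inline bool is_aligned(const void* ptr, std::size_t reg_bytes)
{
	return reinterpret_cast<std::uintptr_t>(ptr) % reg_bytes == 0;
}
```

Buffers declared with `alignas(32)`, or allocated through an aligned allocator such as the `mipp::vector` class, pass `is_aligned(ptr, 32)`. A pointer offset by one `float` does not.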
To allocate aligned data, you can use the MIPP aligned memory allocator wrapped in the `mipp::vector` class. `mipp::vector` provides the same interface as the standard `std::vector` class (it is a `std::vector` with an aligned allocator), so it can be used as a drop-in replacement in most code.

```cpp
mipp::vector<float> myVector(n);
```

### Register Initialization

You can initialize a vector register from a scalar value:

```cpp
mipp::Reg<float> r1; // r1 = | unknown | unknown | unknown | unknown |
r1 = 1.0;            // r1 = |    +1.0 |    +1.0 |    +1.0 |    +1.0 |
```

Or from an initializer list (`std::initializer_list`):

```cpp
mipp::Reg<float> r1;       // r1 = | unknown | unknown | unknown | unknown |
r1 = {1.0, 2.0, 3.0, 4.0}; // r1 = |    +1.0 |    +2.0 |    +3.0 |    +4.0 |
```

### Computational Instructions

**Add** two vector registers:

```cpp
mipp::Reg<float> r1, r2, r3;

r1 = 1.0; // r1 = | +1.0 | +1.0 | +1.0 | +1.0 |
r2 = 2.0; // r2 = | +2.0 | +2.0 | +2.0 | +2.0 |

r3 = r1 + r2; // r3 = | +3.0 | +3.0 | +3.0 | +3.0 |
```

**Subtract** two vector registers:

```cpp
mipp::Reg<float> r1, r2, r3;

r1 = 1.0; // r1 = | +1.0 | +1.0 | +1.0 | +1.0 |
r2 = 2.0; // r2 = | +2.0 | +2.0 | +2.0 | +2.0 |

r3 = r1 - r2; // r3 = | -1.0 | -1.0 | -1.0 | -1.0 |
```

**Multiply** two vector registers:

```cpp
mipp::Reg<float> r1, r2, r3;

r1 = 1.0; // r1 = | +1.0 | +1.0 | +1.0 | +1.0 |
r2 = 2.0; // r2 = | +2.0 | +2.0 | +2.0 | +2.0 |

r3 = r1 * r2; // r3 = | +2.0 | +2.0 | +2.0 | +2.0 |
```

**Divide** two vector registers:

```cpp
mipp::Reg<float> r1, r2, r3;

r1 = 1.0; // r1 = | +1.0 | +1.0 | +1.0 | +1.0 |
r2 = 2.0; // r2 = | +2.0 | +2.0 | +2.0 | +2.0 |

r3 = r1 / r2; // r3 = | +0.5 | +0.5 | +0.5 | +0.5 |
```

**Fused multiply and add** of three vector registers:

```cpp
mipp::Reg<float> r1, r2, r3, r4;

r1 = 2.0; // r1 = | +2.0 | +2.0 | +2.0 | +2.0 |
r2 = 3.0; // r2 = | +3.0 | +3.0 | +3.0 | +3.0 |
r3 = 1.0; // r3 = | +1.0 | +1.0 | +1.0 | +1.0 |

// r4 = (r1 * r2) + r3
r4 = mipp::fmadd(r1, r2, r3); // r4 = | +7.0 | +7.0 | +7.0 | +7.0 |
```

**Fused negative multiply and add** of three vector registers:

```cpp
mipp::Reg<float> r1, r2, r3, r4;

r1 = 2.0; // r1 = | +2.0 | +2.0 | +2.0 | +2.0 |
r2 = 3.0; // r2 = | +3.0 | +3.0 | +3.0 | +3.0 |
r3 = 1.0; // r3 = | +1.0 | +1.0 | +1.0 | +1.0 |

// r4 = -(r1 * r2) + r3
r4 = mipp::fnmadd(r1, r2, r3); // r4 = | -5.0 | -5.0 | -5.0 | -5.0 |
```

**Square root** of a vector register:

```cpp
mipp::Reg<float> r1, r2;

r1 = 9.0; // r1 = | +9.0 | +9.0 | +9.0 | +9.0 |

r2 = mipp::sqrt(r1); // r2 = | +3.0 | +3.0 | +3.0 | +3.0 |
```

**Reciprocal square root** of a vector register (be careful: this intrinsic exists only for single-precision floating-point numbers):

```cpp
mipp::Reg<float> r1, r2;

r1 = 9.0; // r1 = | +9.0 | +9.0 | +9.0 | +9.0 |

r2 = mipp::rsqrt(r1); // r2 = | +0.3 | +0.3 | +0.3 | +0.3 |
```

### Selections

Select the **minimum** between two vector registers:

```cpp
mipp::Reg<float> r1, r2, r3;

r1 = 2.0; // r1 = | +2.0 | +2.0 | +2.0 | +2.0 |
r2 = 3.0; // r2 = | +3.0 | +3.0 | +3.0 | +3.0 |

r3 = mipp::min(r1, r2); // r3 = | +2.0 | +2.0 | +2.0 | +2.0 |
```

Select the **maximum** between two vector registers:

```cpp
mipp::Reg<float> r1, r2, r3;

r1 = 2.0; // r1 = | +2.0 | +2.0 | +2.0 | +2.0 |
r2 = 3.0; // r2 = | +3.0 | +3.0 | +3.0 | +3.0 |

r3 = mipp::max(r1, r2); // r3 = | +3.0 | +3.0 | +3.0 | +3.0 |
```

### Permutations

The `rrot(...)` method allows you to perform a **right rotation** (a cyclic permutation) of the elements inside the register:

```cpp
mipp::Reg<float> r1, r2;

r1 = {3.0, 2.0, 1.0, 0.0}; // r1 = | +3.0 | +2.0 | +1.0 | +0.0 |

r2 = mipp::rrot(r1); // r2 = | +0.0 | +3.0 | +2.0 | +1.0 |
r1 = mipp::rrot(r2); // r1 = | +1.0 | +0.0 | +3.0 | +2.0 |
r2 = mipp::rrot(r1); // r2 = | +2.0 | +1.0 | +0.0 | +3.0 |
r1 = mipp::rrot(r2); // r1 = | +3.0 | +2.0 | +1.0 | +0.0 |
```

Of course, many more instructions are available in the MIPP wrapper. You can find them at the [end of this page](#list-of-mipp-functions).
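The rotation diagram above can be reproduced with a scalar model on `std::array`, which may help when reasoning about the rotation direction. This is a hypothetical reading aid, not MIPP's implementation. `rrot_model` and `lrot_model` are illustrative names, and element 0 is taken to be the leftmost slot of the diagrams. `lrot` is assumed to be the inverse of `rrot`; the function list only describes them as left and right cyclic rotations.

```cpp
#include <array>
#include <cstddef>

// Right rotation as drawn above: every element moves one slot to the right,
// and the last element wraps around to slot 0.
template <typename T, std::size_t N>
std::array<T, N> rrot_model(const std::array<T, N>& r)
{
	std::array<T, N> out{};
	for (std::size_t i = 0; i < N; i++)
		out[(i + 1) % N] = r[i];
	return out;
}

// Left rotation (assumed inverse of rrot_model): every element moves one slot
// to the left, and the first element wraps around to the last slot.
template <typename T, std::size_t N>
std::array<T, N> lrot_model(const std::array<T, N>& r)
{
	std::array<T, N> out{};
	for (std::size_t i = 0; i < N; i++)
		out[i] = r[(i + 1) % N];
	return out;
}
```

Applying `rrot_model` four times to a four-element array returns the original array, as in the last line of the `rrot` example.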
### Addition of Two Vectors

```cpp
#include <cstdlib> // rand()
#include "mipp.h"

int main()
{
	// data allocation
	const int n = 32000; // size of the vA, vB, vC vectors
	mipp::vector<float> vA(n); // in
	mipp::vector<float> vB(n); // in
	mipp::vector<float> vC(n); // out

	// data initialization
	for (int i = 0; i < n; i++) vA[i] = rand() % 10;
	for (int i = 0; i < n; i++) vB[i] = rand() % 10;

	// declare 3 vector registers
	mipp::Reg<float> rA, rB, rC;

	// compute rC with the MIPP vectorized functions
	// (32000 is a multiple of mipp::N<float>() on SSE, AVX, AVX-512 and NEON, so no tail loop is needed)
	for (int i = 0; i < n; i += mipp::N<float>()) {
		rA.load(&vA[i]); // unaligned load by default (use the -DMIPP_ALIGNED_LOADS
		rB.load(&vB[i]); // macro definition to force aligned loads and stores).
		rC = rA + rB;
		rC.store(&vC[i]);
	}

	return 0;
}
```

### Vectorizing an Existing Code

#### Scalar Code

```cpp
// ...
for (int i = 0; i < n; i++) {
	out[i] = 0.75f * in1[i] * std::exp(in2[i]);
}
// ...
```

#### Vectorized Code

```cpp
// ...
// compute the vectorized loop size, which is a multiple of 'mipp::N<float>()'.
auto vecLoopSize = (n / mipp::N<float>()) * mipp::N<float>();
mipp::Reg<float> rout, rin1, rin2;
for (int i = 0; i < vecLoopSize; i += mipp::N<float>()) {
	rin1.load(&in1[i]); // unaligned load by default (use the -DMIPP_ALIGNED_LOADS
	rin2.load(&in2[i]); // macro definition to force aligned loads and stores).
	// the '0.75f' constant will be broadcast into a vector, but it has to be on
	// the right of a 'mipp::Reg'; this is why it has been moved to the right
	// of the 'rin1' register. Notice that 'std::exp' has been replaced by
	// 'mipp::exp'.
	rout = rin1 * 0.75f * mipp::exp(rin2);
	rout.store(&out[i]);
}

// scalar tail loop: compute the remaining elements that can't be vectorized.
for (int i = vecLoopSize; i < n; i++) {
	out[i] = 0.75f * in1[i] * std::exp(in2[i]);
}
// ...
```

### Masked Instructions

MIPP comes with two generic, templated masked functions (`mask` and `maskz`). These functions allow you to benefit from the AVX-512 and SVE masked instructions. The `mask` and `maskz` functions are also backward-compatible with older instruction sets.
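As a reading aid for the example that follows, here is a scalar model of the element-wise semantics given in its comments: `k ? op(a, b) : src` for `mask` and `k ? op(a, b) : 0` for `maskz`. It is a sketch, not MIPP's implementation. `mask_model` and `maskz_model` are hypothetical names, `std::multiplies` stands in for the multiplication operation, and four lanes are assumed as in the example.

```cpp
#include <array>
#include <cstddef>
#include <functional>

// mask: active lanes (k[i] == true) receive op(a[i], b[i]); inactive lanes keep src[i].
template <typename T, std::size_t N, typename Op>
std::array<T, N> mask_model(const std::array<bool, N>& k, const std::array<T, N>& src,
                            const std::array<T, N>& a, const std::array<T, N>& b, Op op)
{
	std::array<T, N> out{};
	for (std::size_t i = 0; i < N; i++)
		out[i] = k[i] ? op(a[i], b[i]) : src[i];
	return out;
}

// maskz: same as mask_model, but inactive lanes are set to zero.
template <typename T, std::size_t N, typename Op>
std::array<T, N> maskz_model(const std::array<bool, N>& k,
                             const std::array<T, N>& a, const std::array<T, N>& b, Op op)
{
	std::array<T, N> out{};
	for (std::size_t i = 0; i < N; i++)
		out[i] = k[i] ? op(a[i], b[i]) : T(0);
	return out;
}
```

The example uses `k1 = {false, true, false, false}`, `{40, -30, 60, 80}` and a broadcast `0.1`. With these inputs and `std::multiplies<float>()`, the models yield `[40, -3, 60, 80]` and `[0, -3, 0, 0]` (the `-3` is approximate in floating point), matching the outputs the example reports.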
```cpp
#include <iostream>
#include "mipp.h"

// this example assumes 4 float elements per register (e.g. SSE or NEON)
mipp::Reg<float> ZMM1 = { 40, -30, 60, 80};
mipp::Reg<float> ZMM2 = 0.1; // broadcast
mipp::Msk<mipp::N<float>()> k1 = {false, true, false, false};

// ZMM3 = k1 ? ZMM1 * ZMM2 : ZMM1;
auto ZMM3 = mipp::mask<float, mipp::mul>(k1, ZMM1, ZMM1, ZMM2);
std::cout << ZMM3 << std::endl;
// output: "[40, -3, 60, 80]"

// ZMM4 = k1 ? ZMM1 * ZMM2 : 0;
auto ZMM4 = mipp::maskz<float, mipp::mul>(k1, ZMM1, ZMM2);
std::cout << ZMM4 << std::endl;
// output: "[0, -3, 0, 0]"
```

## List of MIPP Functions

This section presents an exhaustive list of all the available functions in MIPP. Of course, the MIPP wrapper does not cover all the possible intrinsics of each instruction set, but it tries to provide the most important and useful ones.

In the following tables, `T`, `T1` and `T2` stand for data types (`double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t` or `uint8_t`). In the gather/scatter prototypes, `TD` is the data type and `TI` the index type. `N` stands for the number of elements in a mask or in a register. `N` is a strictly positive integer and can easily be deduced from the data type: `constexpr int N = mipp::N<T>()`. When `T` and `N` are mixed in a prototype, `N` has to satisfy the previous constraint (`N = mipp::N<T>()`).

Some terms used in this documentation need clarification:
- **register element**: a SIMD register is composed of multiple scalar elements; those elements are built-in data types (`double`, `float`, `int64_t`, ...),
- **register lane**: modern instruction sets can have multiple implicit sub-parts in an entire SIMD register; those sub-parts are called lanes (SSE has one lane of 128 bits, AVX has two lanes of 128 bits, AVX-512 has four lanes of 128 bits).

### Memory Operations

| **Short name** | **Prototype** | **Documentation** | **Supported types** |
| :--- | :--- | :--- | :--- |
| `load` | `Reg<T> load (const T* mem)` | Loads aligned data from `mem` to a register. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `loadu` | `Reg<T> loadu (const T* mem)` | Loads unaligned data from `mem` to a register. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `store` | `void store (T* mem, const Reg<T> r)` | Stores the `r` register in the `mem` aligned data. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `storeu` | `void storeu (T* mem, const Reg<T> r)` | Stores the `r` register in the `mem` unaligned data. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `maskzld` | `Reg<T> maskzld (const Msk<N> m, const T* mem)` | Loads elements according to the mask `m` (puts zero when the mask value is false). | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `maskzlds` | `Reg<T> maskzlds (const Msk<N> m, const T* mem)` | Loads elements according to the mask `m` (puts zero when the mask value is false). Safe version: only reads the masked elements in memory. | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `maskst` | `void maskst (const Msk<N> m, T* mem, const Reg<T> r)` | Stores elements from the `r` register according to the mask `m` in the `mem` memory. | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `masksts` | `void masksts (const Msk<N> m, T* mem, const Reg<T> r)` | Stores elements from the `r` register according to the mask `m` in the `mem` memory. Safe version: only writes the masked elements in memory. | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `gather` | `Reg<TD> gather (const TD* mem, const Reg<TI> idx)` | Gathers elements from `mem` to a register. Selects elements according to the indices in `idx`. | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `scatter` | `void scatter (TD* mem, const Reg<TI> idx, const Reg<TD> r)` | Scatters elements into `mem` from the `r` register. Writes elements at the `idx` indices in `mem`. | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `maskzgat` | `Reg<TD> maskzgat (const Msk<N> m, const TD* mem, const Reg<TI> idx)` | Gathers elements from `mem` to a register (according to the mask `m`). Selects elements according to the indices in `idx` (puts zero when the mask value is false). | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `masksca` | `void masksca (const Msk<N> m, TD* mem, const Reg<TI> idx, const Reg<TD> r)` | Scatters elements into `mem` from the `r` register (according to the mask `m`). Writes elements at the `idx` indices in `mem`. | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `set` | `Reg<T> set (const T[N] vals)` | Sets a register from the values in `vals`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `set` | `Msk<N> set (const bool[N] bits)` | Sets a mask from the bits in `bits`. | |
| `set1` | `Reg<T> set1 (const T val)` | Broadcasts `val` in a register. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `set1` | `Msk<N> set1 (const bool bit)` | Broadcasts `bit` in a mask. | |
| `set0` | `Reg<T> set0 ()` | Initializes a register to zero. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `set0` | `Msk<N> set0 ()` | Initializes a mask to false. | |
| `get` | `T get (const Reg<T> r, const size_t index)` | Gets a specific element from the register `r` at the `index` position. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `get` | `T get (const Reg_2<T> r, const size_t index)` | Gets a specific element from the register `r` at the `index` position. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `get` | `bool get (const Msk<N> m, const size_t index)` | Gets a specific element from the mask `m` at the `index` position. | |
| `getfirst` | `T getfirst (const Reg<T> r)` | Gets the first element from the register `r`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `getfirst` | `T getfirst (const Reg_2<T> r)` | Gets the first element from the register `r`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `getfirst` | `bool getfirst (const Msk<N> m)` | Gets the first element from the mask `m`. | |
| `low` | `Reg_2<T> low (const Reg<T> r)` | Gets the low part of the `r` register. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `high` | `Reg_2<T> high (const Reg<T> r)` | Gets the high part of the `r` register. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `combine` | `Reg<T> combine (const Reg_2<T> r1, const Reg_2<T> r2)` | Combines two half registers into a full register: `r1` will be the low part and `r2` the high part. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `combine` | `Reg<T> combine<S> (const Reg<T> r1, const Reg<T> r2)` | `S` elements of `r1` are shifted to the left, `(S - N) + N` elements of `r2` are shifted to the right. Shifted `r1` and `r2` are combined to give the result. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `compress` | `Reg<T> compress (const Reg<T> r1, const Msk<N> m)` | Packs the elements of `r1` at the beginning of the register according to the bitmask `m` (if the bit is 1, the element is picked; otherwise it is not). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `cmask` | `Reg<T> cmask (const uint32_t[N] ids)` | Creates a cmask from an index list (indices have to be between 0 and N-1). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `cmask2` | `Reg<T> cmask2 (const uint32_t[N/2] ids)` | Creates a cmask2 from an index list (indices have to be between 0 and (N/2)-1). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `cmask4` | `Reg<T> cmask4 (const uint32_t[N/4] ids)` | Creates a cmask4 from an index list (indices have to be between 0 and (N/4)-1). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `shuff` | `Reg<T> shuff (const Reg<T> r, const Reg<T> cm)` | Shuffles the elements of `r` according to the cmask `cm`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `shuff2` | `Reg<T> shuff2 (const Reg<T> r, const Reg<T> cm2)` | Shuffles the elements of `r` according to the cmask2 `cm2` (the same shuffle is applied in both lanes). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `shuff4` | `Reg<T> shuff4 (const Reg<T> r, const Reg<T> cm4)` | Shuffles the elements of `r` according to the cmask4 `cm4` (the same shuffle is applied in the four lanes). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `interleave` | `Regx2<T> interleave (const Reg<T> r1, const Reg<T> r2)` | Interleaves `r1` and `r2`: `[r1_1, r2_1, r1_2, r2_2, ..., r1_n, r2_n]`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `deinterleave` | `Regx2<T> deinterleave (const Reg<T> r1, const Reg<T> r2)` | Reverts the previously defined interleave operation. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `interleave2` | `Regx2<T> interleave2 (const Reg<T> r1, const Reg<T> r2)` | Interleaves `r1` and `r2` considering two lanes. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `interleave4` | `Regx2<T> interleave4 (const Reg<T> r1, const Reg<T> r2)` | Interleaves `r1` and `r2` considering four lanes. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `interleavelo` | `Reg<T> interleavelo (const Reg<T> r1, const Reg<T> r2)` | Interleaves the low part of `r1` with the low part of `r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `interleavelo2` | `Reg<T> interleavelo2 (const Reg<T> r1, const Reg<T> r2)` | Interleaves the low part of `r1` with the low part of `r2` (considering two lanes). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `interleavelo4` | `Reg<T> interleavelo4 (const Reg<T> r1, const Reg<T> r2)` | Interleaves the low part of `r1` with the low part of `r2` (considering four lanes). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `interleavehi` | `Reg<T> interleavehi (const Reg<T> r1, const Reg<T> r2)` | Interleaves the high part of `r1` with the high part of `r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `interleavehi2` | `Reg<T> interleavehi2 (const Reg<T> r1, const Reg<T> r2)` | Interleaves the high part of `r1` with the high part of `r2` (considering two lanes). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `interleavehi4` | `Reg<T> interleavehi4 (const Reg<T> r1, const Reg<T> r2)` | Interleaves the high part of `r1` with the high part of `r2` (considering four lanes). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `lrot` | `Reg<T> lrot (const Reg<T> r)` | Rotates the `r` register from the left (cyclic permutation). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `rrot` | `Reg<T> rrot (const Reg<T> r)` | Rotates the `r` register from the right (cyclic permutation). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `blend` | `Reg<T> blend (const Reg<T> r1, const Reg<T> r2, const Msk<N> m)` | Combines the `r1` and `r2` registers following the `m` mask values (`m_i ? r1_i : r2_i`). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `select` | `Reg<T> select (const Msk<N> m, const Reg<T> r1, const Reg<T> r2)` | Alias for the previous `blend` function. The parameter order is slightly different. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |

### Bitwise Operations

The `pipe` keyword stands for the "|" binary operator.

| **Short name** | **Operator** | **Prototype** | **Documentation** | **Supported types** |
| :--- | :--- | :--- | :--- | :--- |
| `andb` | `&` and `&=` | `Reg<T> andb (const Reg<T> r1, const Reg<T> r2)` | Computes the bitwise AND: `r1 & r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `andb` | `&` and `&=` | `Msk<N> andb (const Msk<N> m1, const Msk<N> m2)` | Computes the bitwise AND: `m1 & m2`. | |
| `andnb` | | `Reg<T> andnb (const Reg<T> r1, const Reg<T> r2)` | Computes the bitwise AND NOT: `(~r1) & r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `andnb` | | `Msk<N> andnb (const Msk<N> m1, const Msk<N> m2)` | Computes the bitwise AND NOT: `(~m1) & m2`. | |
| `orb` | `pipe` and `pipe=` | `Reg<T> orb (const Reg<T> r1, const Reg<T> r2)` | Computes the bitwise OR: `r1 pipe r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `orb` | `pipe` and `pipe=` | `Msk<N> orb (const Msk<N> m1, const Msk<N> m2)` | Computes the bitwise OR: `m1 pipe m2`. | |
| `xorb` | `^` and `^=` | `Reg<T> xorb (const Reg<T> r1, const Reg<T> r2)` | Computes the bitwise XOR: `r1 ^ r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `xorb` | `^` and `^=` | `Msk<N> xorb (const Msk<N> m1, const Msk<N> m2)` | Computes the bitwise XOR: `m1 ^ m2`. | |
| `lshift` | `<<` and `<<=` | `Reg<T> lshift (const Reg<T> r, const uint32_t n)` | Computes the bitwise LEFT SHIFT: `r << n`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `lshiftr` | `<<` and `<<=` | `Reg<T> lshiftr (const Reg<T> r1, const Reg<T> r2)` | Computes the bitwise LEFT SHIFT: `r1 << r2`. | `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `lshift` | `<<` and `<<=` | `Msk<N> lshift (const Msk<N> m, const uint32_t n)` | Computes the bitwise LEFT SHIFT: `m << n`. | |
| `rshift` | `>>` and `>>=` | `Reg<T> rshift (const Reg<T> r, const uint32_t n)` | Computes the bitwise RIGHT SHIFT: `r >> n`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `rshiftr` | `>>` and `>>=` | `Reg<T> rshiftr (const Reg<T> r1, const Reg<T> r2)` | Computes the bitwise RIGHT SHIFT: `r1 >> r2`. | `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `rshift` | `>>` and `>>=` | `Msk<N> rshift (const Msk<N> m, const uint32_t n)` | Computes the bitwise RIGHT SHIFT: `m >> n`. | |
| `notb` | `~` | `Reg<T> notb (const Reg<T> r)` | Computes the bitwise NOT: `~r`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `notb` | `~` | `Msk<N> notb (const Msk<N> m)` | Computes the bitwise NOT: `~m`. | |

### Logical Comparisons

| **Short name** | **Operator** | **Prototype** | **Documentation** | **Supported types** |
| :--- | :--- | :--- | :--- | :--- |
| `cmpeq` | `==` | `Msk<N> cmpeq (const Reg<T> r1, const Reg<T> r2)` | Compares if equal to: `r1 == r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `cmpneq` | `!=` | `Msk<N> cmpneq (const Reg<T> r1, const Reg<T> r2)` | Compares if not equal to: `r1 != r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `cmpge` | `>=` | `Msk<N> cmpge (const Reg<T> r1, const Reg<T> r2)` | Compares if greater than or equal to: `r1 >= r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `cmpgt` | `>` | `Msk<N> cmpgt (const Reg<T> r1, const Reg<T> r2)` | Compares if strictly greater than: `r1 > r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `cmple` | `<=` | `Msk<N> cmple (const Reg<T> r1, const Reg<T> r2)` | Compares if less than or equal to: `r1 <= r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `cmplt` | `<` | `Msk<N> cmplt (const Reg<T> r1, const Reg<T> r2)` | Compares if strictly less than: `r1 < r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |

### Conversions and Packing

| **Short name** | **Prototype** | **Documentation** | **Supported types** |
| :--- | :--- | :--- | :--- |
| `toReg` | `Reg<T> toReg (const Msk<N> m)` | Converts the mask `m` into a register of type `T`; the number of elements `N` has to be the same for the mask and the register. If the mask is `false`, all the bits of the corresponding element are set to 0; if the mask is `true`, all the bits are set to 1 (be careful: for floating-point data types, `true` is interpreted as NaN!). | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `cvt` | `Reg<T2> cvt (const Reg<T1> r)` | Converts the elements of `r` into another representation (the new representation and the original one must have the same size). | `float -> int32_t`, `float -> uint32_t`, `int32_t -> float`, `uint32_t -> float`, `double -> int64_t`, `double -> uint64_t`, `int64_t -> double`, `uint64_t -> double` |
| `cvt` | `Reg<T2> cvt (const Reg_2<T1> r)` | Converts the elements of `r` into bigger elements (in bits). | `int8_t -> int16_t`, `uint8_t -> uint16_t`, `int16_t -> int32_t`, `uint16_t -> uint32_t`, `int32_t -> int64_t`, `uint32_t -> uint64_t` |
| `pack` | `Reg<T2> pack (const Reg<T1> r1, const Reg<T1> r2)` | Packs the elements of `r1` and `r2` into smaller elements (some information can be lost in the conversion). | `int32_t -> int16_t`, `uint32_t -> uint16_t`, `int16_t -> int8_t`, `uint16_t -> uint8_t` |

### Arithmetic Operations

| **Short name** | **Operator** | **Prototype** | **Documentation** | **Supported types** |
| :--- | :--- | :--- | :--- | :--- |
| `add` | `+` and `+=` | `Reg<T> add (const Reg<T> r1, const Reg<T> r2)` | Performs the arithmetic addition: `r1 + r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `sub` | `-` and `-=` | `Reg<T> sub (const Reg<T> r1, const Reg<T> r2)` | Performs the arithmetic subtraction: `r1 - r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `mul` | `*` and `*=` | `Reg<T> mul (const Reg<T> r1, const Reg<T> r2)` | Performs the arithmetic multiplication: `r1 * r2`. | `double`, `float`, `int32_t`, `int16_t`, `int8_t` |
| `div` | `/` and `/=` | `Reg<T> div (const Reg<T> r1, const Reg<T> r2)` | Performs the arithmetic division: `r1 / r2`. | `double`, `float` |
| `fmadd` | | `Reg<T> fmadd (const Reg<T> r1, const Reg<T> r2, const Reg<T> r3)` | Performs the fused multiplication and addition: `r1 * r2 + r3`. | `double`, `float` |
| `fnmadd` | | `Reg<T> fnmadd (const Reg<T> r1, const Reg<T> r2, const Reg<T> r3)` | Performs the negative fused multiplication and addition: `-(r1 * r2) + r3`. | `double`, `float` |
| `fmsub` | | `Reg<T> fmsub (const Reg<T> r1, const Reg<T> r2, const Reg<T> r3)` | Performs the fused multiplication and subtraction: `r1 * r2 - r3`. | `double`, `float` |
| `fnmsub` | | `Reg<T> fnmsub (const Reg<T> r1, const Reg<T> r2, const Reg<T> r3)` | Performs the negative fused multiplication and subtraction: `-(r1 * r2) - r3`. | `double`, `float` |
| `min` | | `Reg<T> min (const Reg<T> r1, const Reg<T> r2)` | Selects the minimum: `r1_i < r2_i ? r1_i : r2_i`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `max` | | `Reg<T> max (const Reg<T> r1, const Reg<T> r2)` | Selects the maximum: `r1_i > r2_i ? r1_i : r2_i`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `div2` | | `Reg<T> div2 (const Reg<T> r)` | Performs the arithmetic division by two: `r / 2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `div4` | | `Reg<T> div4 (const Reg<T> r)` | Performs the arithmetic division by four: `r / 4`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `abs` | | `Reg<T> abs (const Reg<T> r)` | Computes the absolute value of `r`. | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `sqrt` | | `Reg<T> sqrt (const Reg<T> r)` | Computes the square root of `r`. | `double`, `float` |
| `rsqrt` | | `Reg<T> rsqrt (const Reg<T> r)` | Computes the reciprocal square root of `r`: `1 / sqrt(r)`. | `double`, `float` |
| `sat` | | `Reg<T> sat (const Reg<T> r, const T minv, const T maxv)` | Saturates the register values: `min(max(r, minv), maxv)`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `neg` | | `Reg<T> neg (const Reg<T> r, const Msk<N> m)` | Negates the register elements following the mask values: `m_i ? -r_i : r_i`. | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `neg` | | `Reg<T> neg (const Reg<T> r1, const Reg<T> r2)` | Negates the register elements following the sign of the second register: `r2_i < 0 ? -r1_i : r1_i`. | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `sign` | | `Msk<N> sign (const Reg<T> r)` | Returns the sign: `r < 0`. | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` |
| `round` | | `Reg<T> round (const Reg<T> r)` | Rounds the register values: `fractional_part(r) >= 0.5 ? integral_part(r) + 1 : integral_part(r)`. | `double`, `float` |
| `trunc` | | `Reg<T> trunc (const Reg<T> r)` | Truncates the register values: `integral_part(r)`. | `double`, `float` |

### Arithmetic Operations on Complex Numbers

The complex operations are exclusively performed on `Regx2<T>` objects (one `Regx2<T>` object contains two `Reg<T>` hardware registers). Each `Regx2<T>` object contains `mipp::N<T>()` complex numbers. If we declare a `Regx2<T> cmplx` object, the `cmplx[0]` register will contain the real parts of the complex numbers and `cmplx[1]` will contain the imaginary parts.

Depending on how you stored your complex numbers in memory, you may need to reorder them before calling a complex operation. For instance, if you choose to store the complex numbers in an interleaved format like `r0, i0, r1, i1, r2, i2, ..., rn, in`, you will need to call the `mipp::deinterleave` operation before the complex operation and the `mipp::interleave` operation after it.

| **Short name** | **Operator** | **Prototype** | **Documentation** | **Supported types** |
| :--- | :--- | :--- | :--- | :--- |
| `cadd` | `+` and `+=` | `Regx2<T> cadd (const Regx2<T> r1, const Regx2<T> r2)` | Performs the complex addition: `r1 + r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `csub` | `-` and `-=` | `Regx2<T> csub (const Regx2<T> r1, const Regx2<T> r2)` | Performs the complex subtraction: `r1 - r2`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` |
| `cmul` | `*` and `*=` | `Regx2<T> cmul (const Regx2<T> r1, const Regx2<T> r2)` | Performs the complex multiplication: `r1 * r2`. | `double`, `float`, `int32_t`, `int16_t`, `int8_t` |
| `cdiv` | `/` and `/=` | `Regx2<T> cdiv (const Regx2<T> r1, const Regx2<T> r2)` | Performs the complex division: `r1 / r2`. | `double`, `float` |
| `cmulconj` | | `Regx2<T> cmulconj (const Regx2<T> r1, const Regx2<T> r2)` | Performs the complex multiplication with conjugate: `r1 * conj(r2)`. | `double`, `float`, `int32_t`, `int16_t`, `int8_t` |
| `conj` | | `Regx2<T> conj (const Regx2<T> r)` | Computes the conjugate: `conj(r)`.
| `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` | | `norm` | | `Reg norm (const Regx2 r)` | Computes the squared magnitude: `norm(r)`. | `double`, `float`, `int32_t`, `int16_t`, `int8_t` | ### Reductions (Horizontal Functions) | **Short name** | **Prototype** | **Documentation** | **Supported types** | | :--- | :--- | :--- | :--- | | `hadd` or `sum` | `T hadd (const Reg r)` | Sums all the elements in the register `r`: `r_1 + r_2 + ... + r_n`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` | | `hmul` | `T hmul (const Reg r)` | Multiplies all the elements in the register `r` : `r_1 * r_2 * ... * r_n`. | `double`, `float`, `int64_t`, `int32_t`, `int16_t`, `int8_t` | | `hmin` | `T hmin (const Reg r)` | Selects the minimum element in the register `r` : `min(min(min(..., r_1), r_2), r_n)`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` | | `hmax` | `T hmax (const Reg r)` | Selects the maximum element in the register `r` : `max(max(max(..., r_1), r_2), r_n)`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` | | `testz` | `bool testz (const Reg r1, const Reg r2)` | Mainly tests if all the elements of the registers are zeros: `r = (r1 & r2); !(r_1 OR r_2 OR ... OR r_n)`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` | | `testz` | `bool testz (const Msk m1, const Msk m2)` | Mainly tests if all the elements of the masks are zeros: `m = (m1 & m2); !(m_1 OR m_2 OR ... OR m_n)`. | | | `testz` | `bool testz (const Reg r)` | Tests if all the elements of the register are zeros: `!(r_1 OR r_2 OR ... OR r_n)`. 
| `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` | | `testz` | `bool testz (const Msk m)` | Tests if all the elements of the mask are zeros: `!(m_1 OR m_2 OR ... OR m_n)`. | | | `Reduction` | `T Reduction::sapply (const Reg r)` | Generic reduction operation, can take a user defined operator `OP` and will performs the reduction with it on `r`. | `double`, `float`, `int64_t`, `uint64_t`, `int32_t`, `uint32_t`, `int16_t`, `uint16_t`, `int8_t`, `uint8_t` | ### Math Functions | **Short name** | **Prototype** | **Documentation** | **Supported types** | | :--- | :--- | :--- | :--- | | `exp` | `Reg exp (const Reg r)` | Computes the exponential of `r`. | `double` (only on `icpc`), `float` | | `log` | `Reg log (const Reg r)` | Computes the logarithm of `r`. | `double` (only on `icpc`), `float` | | `sin` | `Reg sin (const Reg r)` | Computes the sines of `r`. | `double` (only on `icpc`), `float` | | `cos` | `Reg cos (const Reg r)` | Computes the cosines of `r`. | `double` (only on `icpc`), `float` | | `tan` | `Reg tan (const Reg r)` | Computes the tangent of `r`. | `double` (only on `icpc`), `float` | | `sincos` | `void sincos (const Reg r, Reg& s, Reg& c)` | Computes at once the sines (in `s`) and the cosines (in `c`) of `r`. | `double` (only on `icpc`), `float` | | `sincos` | `Regx2 sincos (const Reg r)` | Computes and returns at once the sines and the cosines of `r`. | `double` (only on `icpc`), `float` | | `cossin` | `Regx2 cossin (const Reg r)` | Computes and returns at once the cosines and the sines of `r`. | `double` (only on `icpc`), `float` | | `sinh` | `Reg sinh (const Reg r)` | Computes the hyperbolic sines of `r`. | `double` (only on `icpc`), `float` | | `cosh` | `Reg cosh (const Reg r)` | Computes the hyperbolic cosines of `r`. | `double` (only on `icpc`), `float` | | `tanh` | `Reg tanh (const Reg r)` | Computes the hyperbolic tangent of `r`. 
| `double` (only on `icpc`), `float` | | `asinh` | `Reg asinh (const Reg r)` | Computes the inverse hyperbolic sines of `r`. | `double` (only on `icpc`), `float` | | `acosh` | `Reg acosh (const Reg r)` | Computes the inverse hyperbolic cosines of `r`. | `double` (only on `icpc`), `float` | | `atanh` | `Reg atanh (const Reg r)` | Computes the inverse hyperbolic tangent of `r`. | `double` (only on `icpc`), `float` | ## ARM SVE ### SVE Length Specific An ARM SVE version is under construction. This version uses *SVE length specific* which is more appropriated to the MIPP architecture. This way, the size of the *MIPP registers* is defined at the compilation time. As a reminder, the vector length can vary from a minimum of 128 bits up to a maximum of 2048 bits, at 128-bit increments. On GNU and Clang compilers, it is specified at the compilation time with the `-msve-vector-bits=` flag. ### Supported MIPP Operations - **Memory operations:** `load`, `store`, `blend`, `set`, `set1`, `gather`, `scatter`, `maskzld`, `maskst`, `maskzgat`, `masksca` - **Logical comparisons:** `cmpeq`, `cmneq` - **Bitwise operations:** `andb`, `notb` (msk) - **Arithmetic operations:** `fmadd`, `add`, `sub`, `mul`, `div` - **Reductions:** `testz` (msk), `Reduce` *Byte* and *word* operations are not yet implemented. ## How to cite MIPP We recommend you to cite the following article: - Adrien Cassagne, Olivier Aumage, Denis Barthou, Camille Leroux and Christophe Jégo, [**MIPP: a Portable C++ SIMD Wrapper and its use for Error Correction Coding in 5G Standard**](https://doi.org/10.1145/3178433.3178435), *The 5th International Workshop on Programming Models for SIMD/Vector Processing (WPMVP 2018), February 2018.*
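To make the split complex layout described in *Arithmetic Operations on Complex Numbers* concrete, here is a plain scalar C++ sketch (not MIPP code) of what `deinterleave`, `cmul`, `cmulconj`, `norm` and `interleave` compute lane by lane. The `SplitComplex` type and the `*_ref` helpers are hypothetical names used only for illustration: `re[i]` stands for lane `i` of `cmplx[0]` and `im[i]` for lane `i` of `cmplx[1]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a Regx2: re ~ cmplx[0], im ~ cmplx[1].
struct SplitComplex {
    std::vector<float> re;
    std::vector<float> im;
};

// Interleaved memory r0, i0, r1, i1, ... -> split {r0, r1, ...}, {i0, i1, ...}
SplitComplex deinterleave_ref(const std::vector<float>& mixed) {
    assert(mixed.size() % 2 == 0);
    SplitComplex out;
    for (std::size_t k = 0; k < mixed.size(); k += 2) {
        out.re.push_back(mixed[k]);
        out.im.push_back(mixed[k + 1]);
    }
    return out;
}

// Split {r0, r1, ...}, {i0, i1, ...} -> interleaved r0, i0, r1, i1, ...
std::vector<float> interleave_ref(const SplitComplex& c) {
    assert(c.re.size() == c.im.size());
    std::vector<float> mixed;
    mixed.reserve(2 * c.re.size());
    for (std::size_t i = 0; i < c.re.size(); ++i) {
        mixed.push_back(c.re[i]);
        mixed.push_back(c.im[i]);
    }
    return mixed;
}

// Lane-wise (a + bi) * (c + di) = (ac - bd) + (ad + bc)i, i.e. what cmul computes.
// With conj_b = true the second operand is conjugated first, i.e. what cmulconj computes.
SplitComplex cmul_ref(const SplitComplex& a, const SplitComplex& b, bool conj_b = false) {
    assert(a.re.size() == b.re.size() && a.im.size() == b.im.size());
    SplitComplex out;
    for (std::size_t i = 0; i < a.re.size(); ++i) {
        const float bre = b.re[i];
        const float bim = conj_b ? -b.im[i] : b.im[i];
        out.re.push_back(a.re[i] * bre - a.im[i] * bim);
        out.im.push_back(a.re[i] * bim + a.im[i] * bre);
    }
    return out;
}

// Lane-wise squared magnitude re^2 + im^2, i.e. what norm computes.
std::vector<float> norm_ref(const SplitComplex& a) {
    assert(a.re.size() == a.im.size());
    std::vector<float> out;
    for (std::size_t i = 0; i < a.re.size(); ++i)
        out.push_back(a.re[i] * a.re[i] + a.im[i] * a.im[i]);
    return out;
}
```

For example, deinterleaving `{1, 2, 3, -1}` yields the two numbers `1+2i` and `3-i`; multiplying them by their own conjugates gives the purely real values `5` and `10`, which are exactly their squared magnitudes as returned by the `norm` model.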
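The `sat` row and the horizontal reductions (`hadd`, `hmul`, `hmin`, generic `Reduction` with a user-defined operator) can likewise be pinned down with a scalar reference model. The sketch below is plain C++, not MIPP code, and `sat_ref` and `reduce_ref` are hypothetical helpers: a `std::vector` stands in for the lanes of one register, and the reduction is a simple left fold, whereas a SIMD implementation would typically combine lanes in a tree (the result is identical for associative operators on integers).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Scalar model of sat: clamp every lane into [minv, maxv] with min(max(r, minv), maxv).
template <typename T>
std::vector<T> sat_ref(std::vector<T> r, T minv, T maxv) {
    assert(minv <= maxv);
    for (T& x : r)
        x = std::min(std::max(x, minv), maxv);
    return r;
}

// Scalar model of a horizontal reduction with a user-supplied binary operator:
// std::plus mirrors hadd, std::multiplies mirrors hmul, a min functor mirrors hmin.
template <typename T, typename Op>
T reduce_ref(const std::vector<T>& r, Op op) {
    assert(!r.empty());
    T acc = r[0];
    for (std::size_t i = 1; i < r.size(); ++i)
        acc = op(acc, r[i]);
    return acc;
}
```

As a usage example, saturating the `int16_t` lanes `{-300, -5, 0, 42, 1000}` into `[-128, 127]` gives `{-128, -5, 0, 42, 127}`, and reducing the original lanes with `std::plus<std::int16_t>()` gives `737`.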