diff --git a/db-operation/cluster-settings-config.md b/db-operation/cluster-settings-config.md
index 729406db088b9248999fd9500765330b510e09b5..98c4bbc82c9fc19f9418aba8165471bfd4f4919e 100644
--- a/db-operation/cluster-settings-config.md
+++ b/db-operation/cluster-settings-config.md
@@ -311,8 +311,7 @@ KWDB supports modifying cluster settings with the `SET CLUSTER SETTING` statement; once set, they take effect
 | `ts.count.use_statistics.enabled` | Specifies whether `count(*)` queries on time-series data use the already written row count to reduce query time. Enabled by default and can be disabled; disabling it may affect `count(*)` query performance. | `true` | bool |
 | `ts.compact.max_limit` | Controls the maximum number of last segments processed in a single compaction operation. When compaction is triggered, the system merges data from multiple last segments into entity segments.
This parameter controls the resource consumption of compaction. Decreasing the value reduces the workload of a single compaction and lowers peak CPU and I/O pressure, but more compaction runs are needed to process all the data. Increasing the value makes a single compaction more efficient and reduces the number of compactions, but raises the resource consumption and execution time of each run. | `10` | int |
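For illustration only, a hedged sketch of how the compaction limit above might be adjusted with the `SET CLUSTER SETTING` statement this page documents; the value `20` is a hypothetical example, not a recommendation:

```sql
-- Allow more last segments to be merged per compaction run
-- (higher per-run cost, fewer runs overall); 20 is illustrative.
SET CLUSTER SETTING ts.compact.max_limit = 20;

-- Revert to the documented default.
SET CLUSTER SETTING ts.compact.max_limit = 10;
```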
 | `ts.compression.last_segment.enabled` | Controls whether compression is enabled for the last segment. Supported values:
- `true`: enable compression. Last segment data is compressed as it is written, which reduces memory and disk usage but increases CPU consumption during writes.
- `false`: disable compression. Last segment data is stored in raw format, which gives higher write performance but uses more memory and disk space.
This parameter trades off write performance against storage efficiency. For write-intensive workloads, setting it to `false` is recommended to improve write throughput; for storage-sensitive workloads, setting it to `true` is recommended to reduce resource usage. | `true` | boolean |
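A minimal, hedged example of switching this boolean for a write-intensive workload, assuming the plain `SET CLUSTER SETTING` syntax shown on this page applies to boolean settings as well:

```sql
-- Favor write throughput: store last segment data uncompressed.
SET CLUSTER SETTING ts.compression.last_segment.enabled = false;

-- Favor storage efficiency: compress last segment data (the default).
SET CLUSTER SETTING ts.compression.last_segment.enabled = true;
```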
-| `ts.compression.level` | Controls the level of secondary compression; it takes effect only when `ts.compression.stage=2`. Supported values:
- `low`: low compression level; fast compression with a relatively low compression ratio.
- `medium`: medium compression level; balances compression ratio and performance.
- `high`: high compression level; highest compression ratio, but slower compression and decompression.
**Note**: The current version does not yet support configuring this parameter. | `medium` | string |
-| `ts.compression.stage` | Controls the compression stage for time-series data. Supported values:
- `0`: no compression; data is stored in raw format.
- `1`: primary compression; columns are encoded according to their column types.
- `2`: secondary compression; a general-purpose compression algorithm is applied, which further reduces storage space but increases CPU consumption.
This parameter trades off storage space against computational resources. Higher compression stages can significantly reduce disk usage but increase CPU overhead when writing and reading data. | `2` | int |
+| `ts.compress.stage` | Controls the compression stage for time-series data. Supported values:
- `0`: no compression; data is stored in raw format.
- `1`: primary compression; columns are encoded according to their column types.
- `2`: secondary compression; a general-purpose compression algorithm is applied, which further reduces storage space but increases CPU consumption.
This parameter trades off storage space against computational resources. Higher compression stages can significantly reduce disk usage but increase CPU overhead when writing and reading data. | `2` | int |
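As a sketch for the renamed parameter above (illustrative values, assuming the standard `SET CLUSTER SETTING` syntax applies):

```sql
-- Keep only type-aware column encoding; skip the general-purpose
-- pass to trade some disk space for lower CPU overhead.
SET CLUSTER SETTING ts.compress.stage = 1;

-- Restore the default two-stage compression.
SET CLUSTER SETTING ts.compress.stage = 2;
```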
 | `ts.dedup.rule` | Data deduplication strategy. Supported values:
- `override`: full-row deduplication; later-written data overwrites previously written data with the same timestamp.
- `merge`: deduplicate and consolidate data with the same timestamp. When data with the same timestamp is written multiple times, later-written non-NULL column values overwrite the previously written values of the corresponding columns, and the data is consolidated into a single row. This mode suits scenarios where different fields for the same timestamp are written in separate batches.
- `discard`: ignore newly written duplicate data and keep the existing data. When duplicate data fails to be written, the client receives the number of successfully inserted rows, and the number of rows that were not inserted is reported as a notice.
- `keep`: allow duplicate data to be written without deduplication. Currently supported only in single-node deployments. | `override` | string |
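A hedged example of choosing a deduplication strategy; `merge` here is illustrative, and the values are quoted on the assumption that this setting takes a string literal:

```sql
-- Consolidate rows that share a timestamp: later non-NULL column
-- values fill in or overwrite the previously written ones.
SET CLUSTER SETTING ts.dedup.rule = 'merge';

-- Return to whole-row overwrite behavior (the default).
SET CLUSTER SETTING ts.dedup.rule = 'override';
```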
 | `ts.last_cache_size.max_limit` | Sets the memory limit for the `last_row` read cache of time-series data, that is, the cache memory each vgroup allocates for each device's latest data.
The valid range is [0, 1,073,741,824] bytes; the default is 1 GB (1,073,741,824 bytes).
Setting it to `0` disables the read cache. When cache memory usage exceeds the configured value, the system automatically evicts some cached data. If the read cache is enabled and then disabled, the system evicts all cached data on the first subsequent write, and later writes are no longer cached.
This parameter suits scenarios with a moderate number of writing devices and frequent queries for the latest data. With the read cache enabled, `last` and `last_row` queries can read data directly from memory, significantly improving query response time.
**Note**: Using this feature with a very large number of devices may affect write performance; increasing the value appropriately lowers the cache eviction frequency and reduces the impact on writes. It is recommended to adjust this parameter dynamically based on actual business needs to balance query performance and write performance. | `1073741824` | byte size |
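A sketch, with illustrative numbers, of tuning or disabling the last_row read cache described above; the table does not show whether unit suffixes are accepted, so plain byte counts are used:

```sql
-- Shrink the per-vgroup last_row cache to 268435456 bytes (256 MiB).
SET CLUSTER SETTING ts.last_cache_size.max_limit = 268435456;

-- Disable the read cache entirely.
SET CLUSTER SETTING ts.last_cache_size.max_limit = 0;
```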
 | `ts.mem_segment_size.max_limit` | Controls the maximum size of a mem segment within a single VGroup. The mem segment is the in-memory buffer for written data; when its size reaches this limit, a persistence operation is triggered that writes the in-memory data to the last segment on disk.
This parameter balances memory usage against flush frequency. Decreasing the value makes data persist more often and lowers memory usage, but increases the number of disk I/O operations; increasing it reduces the persistence frequency and disk I/O overhead, but raises memory usage and the potential data-loss risk (more unpersisted data is at stake during a failure). | `536870912` | byte size |
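A hedged sketch for the mem segment limit above; 268435456 bytes (256 MiB) is purely illustrative, chosen to show a more aggressive flush policy than the 512 MiB default:

```sql
-- Flush in-memory data to the last segment sooner: less memory held
-- and less data at risk on failure, at the cost of more disk I/O.
SET CLUSTER SETTING ts.mem_segment_size.max_limit = 268435456;
```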
diff --git a/en/db-operation/cluster-settings-config.md b/en/db-operation/cluster-settings-config.md
index e6e4c666eaeea108091f861fffc4d1f726269a41..71384d84683edfd2ce60b079d7af62b993939a6d
--- a/en/db-operation/cluster-settings-config.md
+++ b/en/db-operation/cluster-settings-config.md
@@ -308,6 +308,8 @@ The table below lists all cluster parameters supported by KWDB along with their
 | `ts.block.lru_cache.max_limit` | Sets the maximum memory size for the node's time-series block LRU (Least Recently Used) cache, in bytes. This cache optimizes time-series data query performance by caching hot data blocks to reduce disk I/O operations. When the cache reaches the maximum limit, the least recently used data blocks will be evicted according to the LRU policy.
Default is `1073741824` (1 GB). Setting it to `0` disables the block cache.
It is recommended to adjust based on the node's actual available memory. Increasing the value can improve query performance, but excessively large values may lead to out-of-memory (OOM) errors. | `1073741824` | int |
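For the block LRU cache row above, a hedged example with illustrative sizes, again assuming plain byte counts are accepted:

```sql
-- Grow the block cache to 2 GiB on a memory-rich node; watch for OOM.
SET CLUSTER SETTING ts.block.lru_cache.max_limit = 2147483648;

-- Disable block caching.
SET CLUSTER SETTING ts.block.lru_cache.max_limit = 0;
```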
 | `ts.count.use_statistics.enabled` | Enables query optimization for `count(*)` operations on time-series data by using the count of previously written rows. Enabled by default. Disabling this option may reduce `count(*)` query performance. | `true` | bool |
 | `ts.compact.max_limit` | Controls the maximum number of last segments processed in a single compaction operation. When a compaction operation is triggered, the system merges data from multiple last segments into entity segments.
This parameter controls the resource consumption of compaction operations. Reducing this value decreases the workload of a single compaction and lowers peak CPU and I/O pressure, but requires more compaction operations to process all data. Increasing this value improves single-compaction efficiency and reduces the number of compactions, but increases the resource consumption and execution time of a single operation. | `10` | int |
+| `ts.compression.last_segment.enabled` | Controls whether compression is enabled for the last segment. Supports the following values:
- `true`: Enable compression. Last segment data is compressed when written, which reduces memory and disk usage but increases CPU consumption during write operations.
- `false`: Disable compression. Last segment data is stored in raw format, which provides higher write performance but consumes more memory and disk space.
This parameter trades off between write performance and storage efficiency. In write-intensive scenarios, set this to `false` to improve write throughput; in storage-sensitive scenarios, set this to `true` to reduce resource consumption. | `true` | boolean | +| `ts.compress.stage` | Controls the compression stage for time-series data. Supports the following values:
- `0`: No compression, data is stored in raw format.
- `1`: Primary compression, compresses columns using algorithms suited to their data types.
- `2`: Secondary compression, applies additional compression algorithms to further reduce storage space but increases CPU consumption.
This parameter trades off between storage space and computational resources. Higher compression stages can significantly reduce disk usage but will increase CPU overhead for both read and write operations. | `2` | int | | `ts.dedup.rule` | Data deduplication strategy. Supports the following parameters:
- `override`: Full row deduplication; later written data overwrites existing data with the same timestamp.
- `merge`: Deduplicate and consolidate data with the same timestamp. When data with the same timestamp is written multiple times, later written non-NULL column values overwrite the previously written corresponding column values, ultimately consolidating into a single row. This mode is suitable for scenarios where different fields with the same timestamp are written in batches.
- `discard`: Ignore newly written duplicate data and keep existing data. When duplicate data fails to be written, the client receives the number of successfully inserted rows, and the number of rows that were not inserted is reported as a notice.
- `keep`: Allow duplicate data to be written without deduplication. Currently only supported in single-node deployment. | `override` | string | | `ts.last_cache_size.max_limit` | Sets the memory limit for the last_row read cache of time-series data—that is, the cache memory allocated by each vgroup for storing the most recent data of each device.
The valid range is [0, 1,073,741,824] bytes, with a default of 1 GB (1,073,741,824 bytes).
Setting this to `0` disables the last_row read cache feature. When cache memory usage exceeds the configured limit, the system automatically evicts cached data. If you disable the read cache after enabling it, the system will evict all cached data on the first write operation, and subsequent writes will no longer be cached.
This parameter is best suited for scenarios with a moderate number of devices and frequent queries for the latest data. When the read cache is enabled, `last` and `last_row` queries can read data directly from memory, greatly reducing query response time.
**Note**: Using this feature in scenarios with a large number of devices may impact write performance. Increasing this parameter value can reduce cache eviction frequency and minimize the impact on write performance. It is recommended to adjust this parameter dynamically based on your actual workload to balance query performance and write performance. | `1073741824` | byte size | | `ts.mem_segment_size.max_limit` | Controls the maximum size of a mem segment within a single VGroup. The mem segment is the buffer for data written to memory. When its size reaches this limit, it triggers a data persistence operation that writes the in-memory data to the last segment on disk.
This parameter balances memory usage and flush frequency. Decreasing this value accelerates data persistence and reduces memory usage, but increases disk I/O operations. Increasing this value reduces persistence frequency and disk I/O overhead, but increases memory usage and potential data loss risk (more unwritten data at risk during failures). | `536870912` | byte size | diff --git a/en/sql-reference/other-sql-statements/show-distribution-sql.md b/en/sql-reference/other-sql-statements/show-distribution-sql.md new file mode 100644 index 0000000000000000000000000000000000000000..581acd5c3094dd446551247405bf0734493984b4 --- /dev/null +++ b/en/sql-reference/other-sql-statements/show-distribution-sql.md @@ -0,0 +1,66 @@ +--- +title: Storage and Compression +id: show-distribution-sql +--- + +# Storage and Compression + +## Viewing Storage and Compression Information + +The `SHOW DISTRIBUTION` statement displays the storage space and compression ratio of a specified time-series database or time-series table. + +::: warning Note +The compression ratio is a rough calculation and may differ from the actual compression performance. When calculating the compression ratio, the uncompressed data length is calculated based on the column width specified when creating the table. Therefore, for variable-length columns (such as `VARCHAR` type), if the defined column width is much larger than the actual written data length (for example, defining `VARCHAR(1000)` but only writing a few characters), the compression ratio will be overestimated. +::: + +### Privileges + +None + +### Syntax + +![](../../../static/sql-reference/show-distribution.png) + +### Parameters + +| Parameter | Description | +| --- | --- | +| `db_name` | The name of the time-series database to view. | +| `table_name` | The name of the time-series table to view. Supports specifying tables in other time-series databases using the `db_name.table_name` format. When no database is specified, it refers to tables in the current database. | + +### Return Fields + +**Return fields when viewing a database:** + +| Field | Description | +| --- | --- | +| `node` | Node identifier. | +| `blocks_num` | Number of data blocks. | +| `blocks_size` | Disk space used. | +| `avg_size` | Average data block size. | +| `compression_ratio` | Roughly calculated compression ratio. | + +**Return fields when viewing a table:** + +| Field | Description | +| --- | --- | +| `node_id` | Node identifier. | +| `level` | Compression level, including `last segment` (latest segment), `entity segment` (entity segment), and `total` (total). | +| `blocks_num` | Number of data blocks. | +| `blocks_size` | Disk space used. | +| `avg_size` | Average data block size. | +| `compression_ratio` | Roughly calculated compression ratio. | + +### Examples + +- View storage and compression information for a specified database. + + ```sql + SHOW DISTRIBUTION FROM DATABASE iot; + ``` + +- View storage and compression information for a specified table. 
+ + ```sql + SHOW DISTRIBUTION FROM TABLE sensors; + ``` \ No newline at end of file diff --git a/en/sql-reference/overview.md b/en/sql-reference/overview.md index abbb9fb084b42360dd97b9ef064e2218f2333da1..1fa600a4e414d02aa5b7346319cbce3d4a82a0f7 100644 --- a/en/sql-reference/overview.md +++ b/en/sql-reference/overview.md @@ -81,6 +81,7 @@ This section describes data types, functions, operators, and SQL statements supp - [Cursors](./other-sql-statements/cursor-sql.md) - [Delimiter](./other-sql-statements/delimiter-sql.md) - [Stream](./other-sql-statements/stream-sql.md) + - [Storage and Compression](./other-sql-statements/show-distribution-sql.md) - [Vacuum](./other-sql-statements/vacuum.md) - System Views - [kwdb_internal](./system-view/kwdb_internal.md) diff --git a/sql-reference/other-sql-statements/show-distribution-sql.md b/sql-reference/other-sql-statements/show-distribution-sql.md index e9a8a65f3fbe74c90a365de5b626be11ed35a293..524338ca5f2f3d1722398fa290324840f37242f1 100644 --- a/sql-reference/other-sql-statements/show-distribution-sql.md +++ b/sql-reference/other-sql-statements/show-distribution-sql.md @@ -45,7 +45,7 @@ id: show-distribution-sql | 字段 | 说明 | | --- | --- | | `node_id` | 节点标识。| -| `level` | 压缩层级,包括 `last segment`(最新段)、`entity segment`(实体段)和 `total`(总计)。| +| `level` | 压缩层级,包括 `last segment` (最新段)、`entity segment` (实体段)和 `total` (总计)。| | `blocks_num` | 数据块个数。| | `blocks_size` | 磁盘占用空间大小。| | `avg_size` | 平均数据块占用空间大小。|