LowCardinality(T)
Description
LowCardinality is a dictionary-encoded column type in ClickHouse, designed to reduce storage for columns with many repeated values and to speed up queries on them. It is particularly suitable for tags, status codes, types, and other enumeration or categorical fields. LowCardinality(String) dictionary-encodes a string column: repeated values are replaced with integer indices into a dictionary of distinct values, which saves storage and improves performance.
LowCardinality(data_type)
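As a quick check of the syntax above, the following minimal sketch wraps a literal in the type and asks ClickHouse for the resulting type name. It uses the built-in functions toLowCardinality and toTypeName; the alias names are illustrative only.

```sql
-- Wrap a plain String value in LowCardinality and inspect the resulting type.
SELECT
    toLowCardinality('GET') AS method,
    toTypeName(method) AS method_type;

-- An explicit CAST to the type is an equivalent way to write the conversion.
SELECT toTypeName(CAST('GET' AS LowCardinality(String))) AS method_type;
```

Both queries should report the wrapped type rather than plain String.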
Principle
Original column: ["GET", "POST", "GET", "GET", "PUT"]
Dictionary:      ["GET", "POST", "PUT"]
Indices:         [0, 1, 0, 0, 2]
- A dictionary stores every distinct value once (the deduplicated original values)
- Each row stores an integer ID that points to its dictionary entry
- The array of integer IDs is smaller than the original values stored directly (especially for long strings)
- Filters and grouping can operate on the integer IDs first; the actual values are restored only when needed
- This makes the type well suited to operations such as GROUP BY and WHERE col = 'xxx'
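The encoding described above can be inspected directly. The sketch below assumes a throwaway table named lc_demo (a hypothetical name) and uses the ClickHouse function lowCardinalityIndices, which returns each row's position in the column's dictionary. The positions ClickHouse reports may not match the conceptual 0-based indices shown earlier, and dictionaries are kept per data part, so the same value can map to different positions in different parts.

```sql
-- Hypothetical demo table holding the five values from the illustration.
CREATE TABLE lc_demo
(
    method LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO lc_demo VALUES ('GET'), ('POST'), ('GET'), ('GET'), ('PUT');

-- Show each value next to its position in the dictionary.
SELECT
    method,
    lowCardinalityIndices(method) AS dict_position
FROM lc_demo;

-- Clean up the demo table.
DROP TABLE lc_demo;
```

Rows holding the same value should report the same position, which is the integer the engine works with when filtering or grouping.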
Example
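-- LowCardinality over small fixed-size types such as UInt16 is rejected by default
SET allow_suspicious_low_cardinality_types = 1;
CREATE TABLE access_logs (
    method LowCardinality(String),
    status_code LowCardinality(UInt16),
    path String
) ENGINE = MergeTree
-- MergeTree requires a sorting key; this one is illustrative
ORDER BY (method, status_code);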
In this design, method and status_code are fields with a limited set of discrete values. Using LowCardinality for them can significantly reduce storage and speed up aggregation. path, however, is not suited to this optimization because it varies widely (for example, it contains query parameters and many distinct paths).
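To make this concrete, here is a hedged sketch of the kind of queries that benefit, followed by a way to compare per-column storage. It assumes the access_logs table from the example has been created and populated. The second query reads the system.columns table, whose data_compressed_bytes and data_uncompressed_bytes columns report on-disk sizes. Actual savings depend on the data.

```sql
-- Filter and group on the LowCardinality columns.
SELECT
    method,
    status_code,
    count() AS requests
FROM access_logs
WHERE method = 'GET'
GROUP BY method, status_code
ORDER BY requests DESC;

-- Compare stored size per column to see the effect of the encoding.
SELECT
    name,
    type,
    data_compressed_bytes,
    data_uncompressed_bytes
FROM system.columns
WHERE database = currentDatabase()
  AND table = 'access_logs';
```

Running the size query before and after changing a column's type is one simple way to judge whether LowCardinality pays off for that column.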