LowCardinality(T)

Description

LowCardinality is a ClickHouse data type wrapper that stores a column in dictionary-encoded form. It is designed to reduce storage for columns with many repeated values and to speed up queries on them. It is particularly suitable for enumeration-like or categorical fields such as tags, status codes, and types.

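LowCardinality(String) dictionary-encodes a string column: each repeated value is replaced by an integer index into a dictionary of the distinct values, which saves storage and improves performance. As a rule of thumb from the ClickHouse documentation, the encoding usually pays off when a column has fewer than about 10,000 distinct values and can perform worse than the plain type above about 100,000.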

Syntax
```sql
LowCardinality(data_type)
```
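
As a quick check of the syntax, the wrapper can be applied to a single value either with a cast or with the built-in `toLowCardinality` function. This is a minimal sketch that assumes a running ClickHouse server; the column aliases `via_cast` and `via_function` are illustrative only.

```sql
-- Wrap a String value in LowCardinality in two equivalent ways
-- and inspect the resulting type names.
SELECT
    toTypeName(CAST('GET' AS LowCardinality(String))) AS via_cast,
    toTypeName(toLowCardinality('GET'))               AS via_function;
```

Both expressions should report the type `LowCardinality(String)`.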

Principle

```txt
Original data column: ["GET", "POST", "GET", "GET", "PUT"]
Dictionary:           ["GET", "POST", "PUT"]
Indices:              [  0,     1,      0,     0,     2  ]
```
  • A dictionary stores every unique value once (the deduplicated original values)
  • Each row stores an integer ID that points to its dictionary entry
  • The index array is smaller than the original values stored directly, especially for long strings
  • Filters and grouping can operate on the integer IDs first and map them back to the actual values only at the end
  • This makes the type well suited to operations such as GROUP BY and WHERE col = 'xxx'
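
The mapping described above can be observed on a live column. The following is a minimal sketch, assuming a recent ClickHouse version that provides the `lowCardinalityIndices` introspection function; the table name `lc_demo` is hypothetical.

```sql
-- Hypothetical scratch table holding the values from the example above.
CREATE TABLE lc_demo (method LowCardinality(String)) ENGINE = Memory;

INSERT INTO lc_demo VALUES ('GET'), ('POST'), ('GET'), ('GET'), ('PUT');

-- For each row, show the stored value next to its dictionary position.
SELECT method, lowCardinalityIndices(method) AS dict_index
FROM lc_demo;

-- Clean up the scratch table.
DROP TABLE lc_demo;
```

ClickHouse reports dictionary positions starting at 1, so the numbers differ from the zero-based schematic. Dictionaries are kept per data part, which means the same value may receive different positions in different parts.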

Example

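```sql
CREATE TABLE access_logs (
    method LowCardinality(String),  -- few distinct values: GET, POST, PUT, ...
    status_code UInt16,             -- small fixed-size type, kept as a plain integer
    path String                     -- highly variable, stored as-is
) ENGINE = MergeTree
ORDER BY (method, status_code);
```

In this design, method takes only a handful of distinct values, so LowCardinality(String) significantly reduces its storage and speeds up filtering and aggregation. status_code is also a field with limited discrete values, but it is left as a plain UInt16: by default ClickHouse rejects LowCardinality over fixed-size types of 8 bytes or less (this is controlled by the allow_suspicious_low_cardinality_types setting), because a dictionary index saves little or nothing over such a small value. path is not suitable for this optimization because of its high variability (query parameters and many distinct paths). The ORDER BY clause is included because MergeTree tables need a sorting key.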
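
The queries below sketch the kind of filtering and grouping that benefits from the dictionary-encoded method column, followed by one way to check how much space each column actually takes. They assume the access_logs table above exists in the current database and already contains data; the alias `requests` is illustrative.

```sql
-- Filter on the dictionary-encoded column.
SELECT count()
FROM access_logs
WHERE method = 'GET';

-- Group by the dictionary-encoded column.
SELECT method, count() AS requests
FROM access_logs
GROUP BY method
ORDER BY requests DESC;

-- Compare per-column size on disk (populated once data parts are written).
SELECT name, type, data_compressed_bytes, data_uncompressed_bytes
FROM system.columns
WHERE database = currentDatabase() AND table = 'access_logs';
```

Comparing the byte counts against a copy of the table that uses plain String for method is a simple way to confirm that the encoding helps for a given dataset.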