Skip to main content

OPTIMIZE Statement

https://clickhouse.com/docs/sql-reference/statements/optimize


Introduction

Merge Multiple Parts into Fewer or Single Parts

  • Merges multiple data parts (files) into one, reducing fragmentation and improving query performance.
  • Only affects parts within the same partition .

Apply TTL Deletes and Moves

  • During a merge, expired rows (per TTL rules) are removed.
  • TTL MOVE operations (e.g. to different disks/volumes) may also be executed.

Re-Aggregate in SummingMergeTree

  • When using SummingMergeTree, the merge will:
    • Detect rows with the same primary key
    • Recompute sums during the merge
  • This reduces multiple rows with the same key to a single aggregated row .

Trigger Deduplication (DEDUPLICATE)

  • In ReplicatedMergeTree, OPTIMIZE ... DEDUPLICATE removes duplicate rows based on primary key + version.
  • Useful when manually inserted duplicate data needs cleanup.

Specify Partition

OPTIMIZE TABLE table_name PARTITION '2025-05-30' FINAL;

DEDUPLICATE

Syntax
OPTIMIZE TABLE table DEDUPLICATE; -- all columns
OPTIMIZE TABLE table DEDUPLICATE BY *; -- excludes MATERIALIZED and ALIAS columns
OPTIMIZE TABLE table DEDUPLICATE BY colX,colY,colZ;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT (colX, colY);
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex');
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT (colX, colY);
Example Schema
CREATE TABLE default.example
(
    `primary_key` Int32,
    `secondary_key` Int32,
    `value` UInt32,
    `partition_key` UInt32,
    `materialized_value` UInt32 MATERIALIZED 12345,
    `aliased_value` UInt32 ALIAS 2
)
ENGINE = MergeTree
PARTITION BY partition_key
PRIMARY KEY primary_key
ORDER BY (primary_key, secondary_key);
Insert Data
INSERT INTO example (primary_key, secondary_key, value, partition_key)
VALUES (0, 0, 0, 0), (0, 0, 0, 0), (1, 1, 2, 2), (1, 1, 2, 3), (1, 1, 3, 3);
Table Rows
   ┌─primary_key─┬─secondary_key─┬─value─┬─partition_key─┐
1. │           1 │             1 │     2 │             2 │
2. │           0 │             0 │     0 │             0 │
3. │           0 │             0 │     0 │             0 │
4. │           1 │             1 │     2 │             3 │
5. │           1 │             1 │     3 │             3 │
   └─────────────┴───────────────┴───────┴───────────────┘

All Columns

Performs deduplication on all columns.

OPTIMIZE TABLE example FINAL DEDUPLICATE;
   ┌─primary_key─┬─secondary_key─┬─value─┬─partition_key─┐
1. │           0 │             0 │     0 │             0 │
2. │           1 │             1 │     2 │             2 │
3. │           1 │             1 │     2 │             3 │
4. │           1 │             1 │     3 │             3 │
   └─────────────┴───────────────┴───────┴───────────────┘

Exclude MATERIALIZED/ALIAS

Performs deduplication on all columns except MATERIALIZED and ALIAS.

OPTIMIZE TABLE example FINAL DEDUPLICATE BY *;
   ┌─primary_key─┬─secondary_key─┬─value─┬─partition_key─┐
1. │           0 │             0 │     0 │             0 │
2. │           1 │             1 │     2 │             3 │
3. │           1 │             1 │     3 │             3 │
4. │           1 │             1 │     2 │             2 │
   └─────────────┴───────────────┴───────┴───────────────┘

Specific Column

Performs deduplication only on specified columns.

OPTIMIZE TABLE table FINAL DEDUPLICATE BY colX,colY,colZ;

Exclude Single Column

Performs deduplication on all columns except the specified single column.

OPTIMIZE TABLE example FINAL DEDUPLICATE BY * EXCEPT value;
   ┌─primary_key─┬─secondary_key─┬─value─┬─partition_key─┐
1. │           0 │             0 │     0 │             0 │
2. │           1 │             1 │     2 │             3 │
3. │           1 │             1 │     2 │             2 │
   └─────────────┴───────────────┴───────┴───────────────┘

Exclude Multiple Columns

Performs deduplication on all columns except the specified multiple columns.

OPTIMIZE TABLE example FINAL DEDUPLICATE BY primary_key, secondary_key, partition_key;
   ┌─primary_key─┬─secondary_key─┬─value─┬─partition_key─┐
1. │           1 │             1 │     2 │             2 │
2. │           1 │             1 │     2 │             3 │
3. │           0 │             0 │     0 │             0 │
   └─────────────┴───────────────┴───────┴───────────────┘

Regex Column Matching

Uses regular expression to match column names for deduplication.

OPTIMIZE TABLE example FINAL DEDUPLICATE BY COLUMNS('.*_key');
   ┌─primary_key─┬─secondary_key─┬─value─┬─partition_key─┐
1. │           1 │             1 │     2 │             3 │
2. │           1 │             1 │     2 │             2 │
3. │           0 │             0 │     0 │             0 │
   └─────────────┴───────────────┴───────┴───────────────┘