Upgrading ClickHouse
This article is the Day 2 entry of the Standby Advent Calendar 2021.
Yesterday's entry was @kencom2400's article on handy ways to use variables in MySQL.
Upgrading ClickHouse is easy. As described in the documentation, you just install the latest version and restart the server.
Backward compatibility is essentially preserved: changed or removed features are marked as deprecated and emit warnings for several releases, so as long as you adapt during that window, there is no problem.
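With a package-based installation, the whole procedure is typically just a couple of commands (a minimal sketch, assuming an Ubuntu host with the official APT repository already configured):
> sudo apt-get update
> sudo apt-get install clickhouse-server clickhouse-client
> sudo systemctl restart clickhouse-server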
True, with regular maintenance that usually goes fine, but what happens after several years have passed?
Here we consider upgrading from 1.1.54343 to 21.10, verifying the steps locally with Docker along the way.
Four years is a long time, so quite a lot may have changed. Will it still work?
Local verification
We set up ClickHouse locally with Docker for the verification. Verifying a whole cluster would make this article too long, so we test against a single server.
During the upgrade, we confirm the following:
- The created user can still be used as-is
- The created database, tables, and data can still be read
- Data can be inserted into the tables created before the upgrade
Setting up the old version
The old image is available on Docker Hub, so it is easy to set up with Docker.
> mkdir $HOME/clickhouse_upgrade
> docker run -d --name clickhouse-upgrade-server \
--ulimit nofile=262144:262144 \
--volume=$HOME/clickhouse_upgrade:/var/lib/clickhouse \
yandex/clickhouse-server:1.1.54343
Creating a user and changing the server configuration
The old version has no user-management statements such as CREATE USER. To create a user or change global server settings, you have to edit the configuration files and restart the server.
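(As an aside, newer releases support SQL-driven access control, so roughly the following would do the same job there, assuming access_management is enabled for the administering user; the old version, however, has to go through the configuration files as described next.)
:) CREATE USER test IDENTIFIED WITH sha256_password BY 'test'
:) GRANT ALL ON *.* TO test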
The default configuration files are easy to obtain from the container started locally with Docker: once the server is up, enter it and copy the files into the mounted directory, where they can then be edited.
> docker exec -it clickhouse-upgrade-server bash
> cp -p /etc/clickhouse-server/config.xml /var/lib/clickhouse/.
> cp -p /etc/clickhouse-server/users.xml /var/lib/clickhouse/.
> exit
Here we only add a user; the default profiles and quotas are reused.
- ID: test
- Password: test
<yandex>
<profiles></profiles>
<users>
<default></default>
<test>
<password_sha256_hex>9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08</password_sha256_hex>
<networks incl="networks" replace="replace">
<ip>::/0</ip>
</networks>
<profile>default</profile>
<quota>default</quota>
</test>
</users>
<quotas></quotas>
</yandex>
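Incidentally, password_sha256_hex is just the SHA-256 digest of the plaintext password, so the value can be generated like this:
> echo -n 'test' | sha256sum
9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08  -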
Restart the container so that the configuration files are loaded.
> docker rm -f clickhouse-upgrade-server
> docker run -d --name clickhouse-upgrade-server \
--ulimit nofile=262144:262144 \
--volume=$HOME/clickhouse_upgrade:/var/lib/clickhouse \
--volume=$HOME/clickhouse_upgrade/config.xml:/etc/clickhouse-server/config.xml \
--volume=$HOME/clickhouse_upgrade/users.xml:/etc/clickhouse-server/users.xml \
yandex/clickhouse-server:1.1.54343
To log in to the client as the created user, run the following:
> docker run -it --rm --link clickhouse-upgrade-server:clickhouse-server \
yandex/clickhouse-client:1.1.54343 \
--host clickhouse-server \
--user test \
--password test
User creation works, so let's move on to loading data.
Loading test data
:) CREATE DATABASE sample
:) USE sample
:) CREATE TABLE sample_table_20211201(id Int32, val String, d Date, dt DateTime)
ENGINE = MergeTree(d, (id, d), 8192)
:) CREATE TABLE sample_table_20211202(id Int32, val String, d Date, dt DateTime)
ENGINE = MergeTree(d, (id, d), 8192)
:) CREATE TABLE sample_table AS sample_table_20211201 ENGINE = Merge(sample, '^sample_table_')
:) INSERT INTO sample_table_20211201 VALUES
(1, 'sample1', '2021-12-01', '2021-12-01T00:00:00')
,(2, 'sample2', '2021-12-01', '2021-12-01T00:00:00')
,(3, 'sample3', '2021-12-01', '2021-12-01T00:00:00')
,(4, 'sample4', '2021-12-01', '2021-12-01T00:00:00')
,(5, 'sample5', '2021-12-01', '2021-12-01T00:00:00')
:) INSERT INTO sample_table_20211202 VALUES
(6, 'sample6', '2021-12-02', '2021-12-02T00:00:00')
,(7, 'sample7', '2021-12-02', '2021-12-02T00:00:00')
,(8, 'sample8', '2021-12-02', '2021-12-02T00:00:00')
,(9, 'sample9', '2021-12-02', '2021-12-02T00:00:00')
,(10, 'sample10', '2021-12-02', '2021-12-02T00:00:00')
:) SELECT * FROM sample_table
┌─id─┬─val─────┬──────────d─┬──────────────────dt─┐
│ 1 │ sample1 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 2 │ sample2 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 3 │ sample3 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 4 │ sample4 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 5 │ sample5 │ 2021-12-01 │ 2021-12-01 00:00:00 │
└────┴─────────┴────────────┴─────────────────────┘
┌─id─┬─val──────┬──────────d─┬──────────────────dt─┐
│ 6 │ sample6 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 7 │ sample7 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 8 │ sample8 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 9 │ sample9 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 10 │ sample10 │ 2021-12-02 │ 2021-12-02 00:00:00 │
└────┴──────────┴────────────┴─────────────────────┘
10 rows in set. Elapsed: 0.019 sec.
Upgrading ClickHouse
Stop the container and start version 21.10.
> docker rm -f clickhouse-upgrade-server
> docker run -d --name clickhouse-upgrade-server \
--ulimit nofile=262144:262144 \
--volume=$HOME/clickhouse_upgrade:/var/lib/clickhouse \
--volume=$HOME/clickhouse_upgrade/config.xml:/etc/clickhouse-server/config.xml \
--volume=$HOME/clickhouse_upgrade/users.xml:/etc/clickhouse-server/users.xml \
clickhouse/clickhouse-server:21.10
The created user can be used as-is
After the server starts, confirm that we can connect as the created user.
> docker run -it --rm --link clickhouse-upgrade-server:clickhouse-server \
clickhouse/clickhouse-client:21.10 \
--host clickhouse-server \
--user test \
--password test
ClickHouse client version 21.10.2.15 (official build).
Connecting to clickhouse-server:9000 as user test.
Connected to ClickHouse server version 21.10.2 revision 54449.
:)
The created database, tables, and data are accessible
Confirm that the tables created on the old version can still be queried.
:) USE sample
:) SELECT * FROM sample_table
┌─id─┬─val─────┬──────────d─┬──────────────────dt─┐
│ 1 │ sample1 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 2 │ sample2 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 3 │ sample3 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 4 │ sample4 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 5 │ sample5 │ 2021-12-01 │ 2021-12-01 00:00:00 │
└────┴─────────┴────────────┴─────────────────────┘
┌─id─┬─val──────┬──────────d─┬──────────────────dt─┐
│ 6 │ sample6 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 7 │ sample7 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 8 │ sample8 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 9 │ sample9 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 10 │ sample10 │ 2021-12-02 │ 2021-12-02 00:00:00 │
└────┴──────────┴────────────┴─────────────────────┘
10 rows in set. Elapsed: 0.022 sec.
Data can be inserted into the created tables
Finally, let's confirm that data can be inserted. We will also add a new table to the Merge table to check that old and new tables can be mixed.
:) INSERT INTO sample_table_20211202 VALUES
(11, 'sample11', '2021-12-02', '2021-12-02T00:00:00')
,(12, 'sample12', '2021-12-02', '2021-12-02T00:00:00')
,(13, 'sample13', '2021-12-02', '2021-12-02T00:00:00')
,(14, 'sample14', '2021-12-02', '2021-12-02T00:00:00')
,(15, 'sample15', '2021-12-02', '2021-12-02T00:00:00')
:) SELECT * FROM sample_table_20211202
┌─id─┬─val──────┬──────────d─┬──────────────────dt─┐
│ 6 │ sample6 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 7 │ sample7 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 8 │ sample8 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 9 │ sample9 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 10 │ sample10 │ 2021-12-02 │ 2021-12-02 00:00:00 │
└────┴──────────┴────────────┴─────────────────────┘
┌─id─┬─val──────┬──────────d─┬──────────────────dt─┐
│ 11 │ sample11 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 12 │ sample12 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 13 │ sample13 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 14 │ sample14 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 15 │ sample15 │ 2021-12-02 │ 2021-12-02 00:00:00 │
└────┴──────────┴────────────┴─────────────────────┘
10 rows in set. Elapsed: 0.013 sec.
The data was inserted successfully. Next, let's create a new table.
:) CREATE TABLE sample_table_20211203(id Int32, val String, d Date, dt DateTime) ENGINE = MergeTree(d, (id, d), 8192)
:) INSERT INTO sample_table_20211203 VALUES
(16, 'sample16', '2021-12-03', '2021-12-03T00:00:00')
,(17, 'sample17', '2021-12-03', '2021-12-03T00:00:00')
,(18, 'sample18', '2021-12-03', '2021-12-03T00:00:00')
,(19, 'sample19', '2021-12-03', '2021-12-03T00:00:00')
,(20, 'sample20', '2021-12-03', '2021-12-03T00:00:00')
:) SELECT * FROM sample_table
┌─id─┬─val──────┬──────────d─┬──────────────────dt─┐
│ 11 │ sample11 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 12 │ sample12 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 13 │ sample13 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 14 │ sample14 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 15 │ sample15 │ 2021-12-02 │ 2021-12-02 00:00:00 │
└────┴──────────┴────────────┴─────────────────────┘
┌─id─┬─val──────┬──────────d─┬──────────────────dt─┐
│ 16 │ sample16 │ 2021-12-03 │ 2021-12-03 00:00:00 │
│ 17 │ sample17 │ 2021-12-03 │ 2021-12-03 00:00:00 │
│ 18 │ sample18 │ 2021-12-03 │ 2021-12-03 00:00:00 │
│ 19 │ sample19 │ 2021-12-03 │ 2021-12-03 00:00:00 │
│ 20 │ sample20 │ 2021-12-03 │ 2021-12-03 00:00:00 │
└────┴──────────┴────────────┴─────────────────────┘
┌─id─┬─val──────┬──────────d─┬──────────────────dt─┐
│ 6 │ sample6 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 7 │ sample7 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 8 │ sample8 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 9 │ sample9 │ 2021-12-02 │ 2021-12-02 00:00:00 │
│ 10 │ sample10 │ 2021-12-02 │ 2021-12-02 00:00:00 │
└────┴──────────┴────────────┴─────────────────────┘
┌─id─┬─val─────┬──────────d─┬──────────────────dt─┐
│ 1 │ sample1 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 2 │ sample2 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 3 │ sample3 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 4 │ sample4 │ 2021-12-01 │ 2021-12-01 00:00:00 │
│ 5 │ sample5 │ 2021-12-01 │ 2021-12-01 00:00:00 │
└────┴─────────┴────────────┴─────────────────────┘
20 rows in set. Elapsed: 0.023 sec.
Inserting data works without problems. Tables can still be created with the old CREATE TABLE syntax, but since that syntax has been deprecated for years now, it would be wise to migrate to the current one.
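For reference, a rough equivalent of the last table in the current syntax looks like the following (PARTITION BY toYYYYMM(d) reproduces the implicit monthly partitioning of the old engine definition):
:) CREATE TABLE sample_table_20211203(id Int32, val String, d Date, dt DateTime)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY (id, d)
SETTINGS index_granularity = 8192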
As far as I can tell, as long as only the MergeTree engine is used, upgrading from 1.1.54343 to 21.10 appears to work without any problems.
Backward-incompatible changes and upgrade notes
Below is a list of only the backward-incompatible changes, extracted from the official changelogs. It is a personal memo so that I don't have to dig through the huge changelogs item by item.
2023-11-02, 23.10
- There is no longer an option to automatically remove broken data parts. This closes #55174. #55184 (Alexey Milovidov). #55557 (Jihyuk Bok).
- The obsolete in-memory data parts can no longer be read from the write-ahead log. If you have configured in-memory parts before, they have to be removed before the upgrade. #55186 (Alexey Milovidov).
- Remove the integration with Meilisearch. Reason: it was compatible only with the old version 0.18. The recent version of Meilisearch changed the protocol and does not work anymore. Note: we would appreciate it if you help to return it back. #55189 (Alexey Milovidov).
- Rename directory monitor concept into background INSERT. All the settings directory_monitor had been renamed to distributed_background_insert*. Backward compatibility should be preserved (since old settings had been added as an alias). #55978 (Azat Khuzhin).
- Do not interpret the send_timeout set on the client side as the receive_timeout on the server side and vice versa. #56035 (Azat Khuzhin).
- Comparison of time intervals with different units will throw an exception. This closes #55942. You might have occasionally relied on the previous behavior when the underlying numeric values were compared regardless of the units. #56090 (Alexey Milovidov).
- Rewrote the experimental S3Queue table engine completely: changed the way we keep information in zookeeper which allows to make less zookeeper requests, added caching of zookeeper state in cases when we know the state will not change, improved the polling from s3 process to make it less aggressive, changed the way ttl and max set for tracked files is maintained, now it is a background process. Added system.s3queue and system.s3queue_log tables. Closes #54998. #54422 (Kseniia Sumarokova).
2023-09-28, 23.9
- Remove the status_info configuration option and dictionaries status from the default Prometheus handler. #54090 (Alexey Milovidov).
- The experimental parts metadata cache is removed from the codebase. #54215 (Alexey Milovidov).
2023-08-31, 23.8 LTS
- If a dynamic disk contains a name, it should be specified as disk = disk(name = ‘disk_name’, …) in disk function arguments. In previous version it could be specified as disk = disk_(…), which is no longer supported. #52820 (Kseniia Sumarokova).
- clickhouse-benchmark will establish connections in parallel when invoked with --concurrency more than one. Previously it was unusable if you ran it with 1000 concurrent connections from Europe to the US. Correct calculation of QPS for connections with high latency. Backward incompatible change: the option for JSON output of clickhouse-benchmark is removed. If you've used this option, you can also extract data from the system.query_log in JSON format as a workaround. #53293 (Alexey Milovidov).
- The microseconds column is removed from the system.text_log, and the milliseconds column is removed from the system.metric_log, because they are redundant in the presence of the event_time_microseconds column. #53601 (Alexey Milovidov).
- Deprecate the metadata cache feature. It is experimental and we have never used it. The feature is dangerous: #51182. Remove the system.merge_tree_metadata_cache system table. The metadata cache is still available in this version but will be removed soon. This closes #39197. #51303 (Alexey Milovidov).
- Disable support for 3DES in TLS connections. #52893 (Kenji Noguchi).
2023-07-27, 23.7
- Add NAMED COLLECTION access type (aliases USE NAMED COLLECTION, NAMED COLLECTION USAGE). This PR is backward incompatible because this access type is disabled by default (because a parent access type NAMED COLLECTION ADMIN is disabled by default as well). Proposed in #50277. To grant use GRANT NAMED COLLECTION ON collection_name TO user or GRANT NAMED COLLECTION ON * TO user, to be able to give these grants named_collection_admin is required in config (previously it was named named_collection_control, so will remain as an alias). #50625 (Kseniia Sumarokova).
- Fixing a typo in the system.parts column name last_removal_attemp_time. Now it is named last_removal_attempt_time. #52104 (filimonov).
- Bump version of the distributed_ddl_entry_format_version to 5 by default (enables opentelemetry and initial_query_idd pass through). This will not allow to process existing entries for distributed DDL after downgrade (but note, that usually there should be no such unprocessed entries). #52128 (Azat Khuzhin).
- Check projection metadata the same way we check ordinary metadata. This change may prevent the server from starting in case there was a table with an invalid projection. An example is a projection that created positional columns in PK (e.g. projection p (select * order by 1, 4) which is not allowed in table PK and can cause a crash during insert/merge). Drop such projections before the update. Fixes #52353. #52361 (Nikolai Kochetov).
- The experimental feature hashid is removed due to a bug. The quality of implementation was questionable at the start, and it didn’t get through the experimental status. This closes #52406. #52449 (Alexey Milovidov).
2023-06-29, 23.6
- Delete feature do_not_evict_index_and_mark_files in the fs cache. This feature was only making things worse. #51253 (Kseniia Sumarokova).
- Remove ALTER support for experimental LIVE VIEW. #51287 (Alexey Milovidov).
- Decrease the default values for http_max_field_value_size and http_max_field_name_size to 128 KiB. #51163 (Mikhail f. Shiryaev).
- CGroups metrics related to CPU are replaced with one metric, CGroupMaxCPU for better usability. The Normalized CPU usage metrics will be normalized to CGroups limits instead of the total number of CPUs when they are set. This closes #50836. #50835 (Alexey Milovidov).
2023-06-08, 23.5
- Compress marks and primary key by default. It significantly reduces the cold query time. Upgrade notes: the support for compressed marks and primary key has been added in version 22.9. If you turned on compressed marks or primary key or installed version 23.5 or newer, which has compressed marks or primary key on by default, you will not be able to downgrade to version 22.8 or earlier. You can also explicitly disable compressed marks or primary keys by specifying the compress_marks and compress_primary_key settings in the <merge_tree> section of the server configuration file (see the config sketch after this list). Upgrade notes: If you upgrade from versions prior to 22.9, you should either upgrade all replicas at once or disable the compression before upgrade, or upgrade through an intermediate version, where the compressed marks are supported but not enabled by default, such as 23.3. #42587 (Alexey Milovidov).
- Make local object storage work consistently with s3 object storage, fix problem with append (closes #48465), make it configurable as independent storage. The change is backward incompatible because the cache on top of local object storage is not compatible to previous versions. #48791 (Kseniia Sumarokova).
- The experimental feature “in-memory data parts” is removed. The data format is still supported, but the settings are no-op, and compact or wide parts will be used instead. This closes #45409. #49429 (Alexey Milovidov).
- Changed default values of settings parallelize_output_from_storages and input_format_parquet_preserve_order. This allows ClickHouse to reorder rows when reading from files (e.g. CSV or Parquet), greatly improving performance in many cases. To restore the old behavior of preserving order, use parallelize_output_from_storages = 0, input_format_parquet_preserve_order = 1. #49479 (Michael Kolupaev).
- Make projections production-ready. Add the optimize_use_projections setting to control whether the projections will be selected for SELECT queries. The setting allow_experimental_projection_optimization is obsolete and does nothing. #49719 (Alexey Milovidov).
- Mark joinGet as non-deterministic (so as dictGet). It allows using them in mutations without an extra setting. #49843 (Azat Khuzhin).
- Revert the “groupArray returns cannot be nullable” change (due to binary compatibility breakage for groupArray/groupArrayLast/groupArraySample over Nullable types, which likely will lead to TOO_LARGE_ARRAY_SIZE or CANNOT_READ_ALL_DATA). #49971 (Azat Khuzhin).
- Setting enable_memory_bound_merging_of_aggregation_results is enabled by default. If you update from version prior to 22.12, we recommend to set this flag to false until update is finished. #50319 (Nikita Taranov).
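Regarding the compressed marks and primary key item above, a sketch of the server configuration for explicitly disabling the compression before such an upgrade (assuming a modern config with the <clickhouse> root tag) could look like this:
<clickhouse>
    <merge_tree>
        <compress_marks>false</compress_marks>
        <compress_primary_key>false</compress_primary_key>
    </merge_tree>
</clickhouse>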
2023-04-26, 23.4
- Formatter '%M' in function formatDateTime() now prints the month name instead of the minutes. This makes the behavior consistent with MySQL. The previous behavior can be restored using setting formatdatetime_parsedatetime_m_is_month_name = 0 (see the example after this list). #47246 (Robert Schulze).
- This change makes sense only if you are using the virtual filesystem cache. If path in the virtual filesystem cache configuration is not empty and is not an absolute path, then it will be put in /caches/. #48784 (Kseniia Sumarokova).
- Primary/secondary indices and sorting keys with identical expressions are now rejected. This behavior can be disabled using setting allow_suspicious_indices. #48536 (凌涛).
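Regarding the '%M' formatter change above, the difference is easy to check from the client (a sketch; on 23.4 and later the query prints the month name, on older releases the minutes):
:) SELECT formatDateTime(toDateTime('2021-12-02 10:30:00'), '%M')
:) SET formatdatetime_parsedatetime_m_is_month_name = 0 -- restores the old, minutes-based behavior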
2023-03-30, 23.3 LTS
- Lightweight DELETEs are production ready and enabled by default. The DELETE query for MergeTree tables is now available by default.
- The behavior of domainRFC and netloc functions is slightly changed: relaxed the set of symbols that are allowed in the URL authority for better conformance. #46841 (Azat Khuzhin).
- Prohibited creating tables based on KafkaEngine with DEFAULT/EPHEMERAL/ALIAS/MATERIALIZED statements for columns. #47138 (Aleksandr Musorin).
- An “asynchronous connection drain” feature is removed. Related settings and metrics are removed as well. It was an internal feature, so the removal should not affect users who had never heard about that feature. #47486 (Alexander Tokmakov).
- Support 256-bit Decimal data type (more than 38 digits) in arraySum/Min/Max/Avg/Product, arrayCumSum/CumSumNonNegative, arrayDifference, array construction, IN operator, query parameters, groupArrayMovingSum, statistical functions, min/max/any/argMin/argMax, PostgreSQL wire protocol, MySQL table engine and function, sumMap, mapAdd, mapSubtract, arrayIntersect. Add support for big integers in arrayIntersect. Statistical aggregate functions involving moments (such as corr or various TTests) will use Float64 as their internal representation (they were using Decimal128 before this change, but it was pointless), and these functions can return nan instead of inf in case of infinite variance. Some functions were allowed on Decimal256 data types but returned Decimal128 in previous versions – now it is fixed. This closes #47569. This closes #44864. This closes #28335. #47594 (Alexey Milovidov).
- Make backup_threads/restore_threads server settings (instead of user settings). #47881 (Azat Khuzhin).
- Do not allow const and non-deterministic secondary indices #46839 (Anton Popov).
2023-02-23, 23.2
- Extend function toDayOfWeek() (alias: DAYOFWEEK) with a mode argument that encodes whether the week starts on Monday or Sunday and whether counting starts at 0 or 1. For consistency with other date time functions, the mode argument was inserted between the time and the time zone arguments. This breaks existing usage of the (previously undocumented) 2-argument syntax toDayOfWeek(time, time_zone). A fix is to rewrite the function into toDayOfWeek(time, 0, time_zone) (see the example after this list). #45233 (Robert Schulze).
- Rename setting max_query_cache_size to filesystem_cache_max_download_size. #45614 (Kseniia Sumarokova).
- The default user will not have permissions for access type SHOW NAMED COLLECTION by default (e.g. default user will no longer be able to grant ALL to other users as it was before, therefore this PR is backward incompatible). #46010 (Kseniia Sumarokova).
- If the SETTINGS clause is specified before the FORMAT clause, the settings will be applied to formatting as well. #46003 (Azat Khuzhin).
- Remove support for setting materialized_postgresql_allow_automatic_update (which was by default turned off). #46106 (Kseniia Sumarokova).
- Slightly improve performance of countDigits on realistic datasets. This closed #44518. In previous versions, countDigits(0) returned 0; now it returns 1, which is more correct, and follows the existing documentation. #46187 (Alexey Milovidov).
- Disallow creation of new columns compressed by a combination of codecs “Delta” or “DoubleDelta” followed by codecs “Gorilla” or “FPC”. This can be bypassed using setting “allow_suspicious_codecs = true”. #45652 (Robert Schulze).
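Regarding the toDayOfWeek change above, a previously valid 2-argument call now needs an explicit mode argument (a sketch):
:) SELECT toDayOfWeek(now(), 'UTC')    -- accepted before 23.2
:) SELECT toDayOfWeek(now(), 0, 'UTC') -- the equivalent rewrite for 23.2 and later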
2023-01-26, 23.1
- The SYSTEM RESTART DISK query becomes a no-op. #44647 (alesapin).
- The PREALLOCATE option for HASHED/SPARSE_HASHED dictionaries becomes a no-op. #45388 (Azat Khuzhin). It does not give significant advantages anymore.
- Disallow Gorilla codec on columns of non-Float32 or non-Float64 type. #45252 (Robert Schulze). It was pointless and led to inconsistencies.
- Parallel quorum inserts might work incorrectly with *MergeTree tables created with the deprecated syntax. Therefore, parallel quorum inserts support is completely disabled for such tables. It does not affect tables created with a new syntax. #45430 (Alexander Tokmakov).
- Use the GetObjectAttributes request instead of the HeadObject request to get the size of an object in AWS S3. This change fixes handling endpoints without explicit regions after updating the AWS SDK, for example. #45288 (Vitaly Baranov). AWS S3 and Minio are tested, but keep in mind that various S3-compatible services (GCS, R2, B2) may have subtle incompatibilities. This change also may require you to adjust the ACL to allow the GetObjectAttributes request.
- Forbid paths in timezone names. For example, a timezone name like /usr/share/zoneinfo/Asia/Aden is not allowed; the IANA timezone database name like Asia/Aden should be used. #44225 (Kruglov Pavel).
2022-12-15, 22.12
Fixed a backward incompatibility in the serialization and deserialization of states of the min, max, any*, argMin, argMax aggregate functions with String arguments. The incompatibility affects the 22.9, 22.10 and 22.11 branches (fixed since 22.9.6, 22.10.4 and 22.11.2 respectively). Some minor releases of the 22.3, 22.7 and 22.8 branches are also affected: 22.3.13...22.3.14 (fixed since 22.3.15), 22.8.6...22.8.9 (fixed since 22.8.10), 22.7.6 and newer (will not be fixed in 22.7; we recommend upgrading to 22.8.10 or newer). This release note does not concern users who have never used the affected versions. Incompatible versions append an extra '\0' to the end of strings when reading states of the aggregate functions mentioned above. For example, if an older version saved the state of anyState('foobar') to state_column, the incompatible version will print 'foobar\0' on anyMerge(state_column). Incompatible versions also write states of the aggregate functions without the trailing '\0'. Newer versions (with the fix) can correctly read data from all versions, including incompatible ones, with one exception: if an incompatible version saved a state with a string that actually ends with a null character, the newer version will trim the trailing '\0' when reading the state of the affected aggregate function. For example, if an incompatible version saved the state of anyState('abrac\0dabra\0') to state_column, newer versions will print 'abrac\0dabra' on anyMerge(state_column). The issue also affects distributed queries when an incompatible version works in a cluster together with older or newer versions. #43038 (Alexander Tokmakov, Raúl Marín). NOTE: all official ClickHouse builds already include the patches; this is not necessarily true for unofficial third-party builds, which should be avoided.
2022-11-17, 22.11
The JSONExtract family of functions will now attempt to coerce to the requested type. #41502 (Márcio Martins).
2022-10-25, 22.10
Rename cache commands: show caches -> show filesystem caches, describe cache -> describe filesystem cache. #41508 (Kseniia Sumarokova).
Remove support for the WITH TIMEOUT section for LIVE VIEW. This closes #40557. #42173 (Alexey Milovidov).
Remove the {database} macro from the client prompt. It was displayed incorrectly if the database was unspecified and was not updated on USE statements. This closes #25891. #42508 (Alexey Milovidov).
2022-09-22, 22.9
Upgrading from 20.3 and older to 22.9 and newer must be done through an intermediate version if there are any ReplicatedMergeTree tables; otherwise the server with the new version will not start. #40641 (Alexander Tokmakov).
Remove the functions accurate_Cast and accurate_CastOrNull (they differ from accurateCast and accurateCastOrNull by the underscore in the name and are not affected by the cast_keep_nullable setting). These functions were undocumented, untested, unused, and unneeded. They appear to have survived due to code generalization. #40682 (Alexey Milovidov).
Add a test to ensure that every new table function is documented. See #40649. Rename the table function MeiliSearch to meilisearch. #40709 (Alexey Milovidov).
Add a test to ensure that every new function is documented. See #40649. The functions lemmatize, synonyms, stem were case-insensitive by mistake. Now they are case-sensitive. #40711 (Alexey Milovidov).
Make the interpretation of YAML configs more conventional. #41044 (Vitaly Baranov).
2022-08-18, 22.8
Extended the range of Date32 and DateTime64 to support dates from the year 1900 to 2299. In previous versions, the supported interval was only from 1925 to 2283. The implementation uses the proleptic Gregorian calendar (conformant with ISO 8601:2004, clause 3.2.1 The Gregorian calendar) instead of accounting for the historical transition from the Julian to the Gregorian calendar. This change affects implementation-specific behavior for out-of-range arguments: for example, if in previous versions the value 1899-01-01 was clamped to 1925-01-01, in the new version it is clamped to 1900-01-01. It also changes the rounding behavior of toStartOfInterval when passing something like INTERVAL 3 QUARTER, because intervals are counted from an implementation-specific point in time. Closes #28216, improves #38393. #39425 (Roman Vasin).
Now, all the relevant dictionary sources
A fix for previously existing unwanted behavior. Direct SELECT from Kafka/RabbitMQ/FileLog is no longer allowed. It can be enabled with the setting stream_like_engine_allow_direct_select. Even when enabled by the setting, direct SELECT is not allowed if a materialized view is attached. For Kafka and RabbitMQ, allowed direct SELECTs do not commit messages by default. To enable commits with direct SELECT, the user must use the storage-level setting kafka{rabbitmq}_commit_on_select=1 (default 0). #31053 (Kseniia Sumarokova).
A slight change in the behavior of a new feature. Return unquoted strings in JSON_VALUE. Closes #27965. #31008 (Kseniia Sumarokova).
A settings rename. Added custom null representation support for TSV/CSV input formats. Fixed deserialization of Nullable(String) in TSV/CSV/JSONCompactStringsEachRow/JSONStringsEachRow input formats. Renamed output_format_csv_null_representation and output_format_tsv_null_representation to format_csv_null_representation and format_tsv_null_representation accordingly. #30497 (Kruglov Pavel).
Further deprecation of already unused code. This is relevant only for users of ClickHouse versions older than 20.6. The "leader election" mechanism in ReplicatedMergeTree has been removed, because multiple leaders are supported since 20.6. If you are upgrading from an older version and some replica with an old version is a leader, the server will fail to start after the upgrade. Stop replicas with the old version to let the new one start. Afterwards it will not be possible to downgrade to a version older than 20.6. #32140 (tavplubix).
Enable use_compact_format_in_distributed_parts_names by default (see the documentation). #16728 (Azat Khuzhin).
Accept user settings related to file formats (e.g. format_csv_delimiter) in the SETTINGS clause when creating a table that uses the File engine, and use these settings in all INSERTs and SELECTs. File format settings changed in the current user session, or in the SETTINGS clause of the DML query itself, no longer affect the query. #16591 (Alexander Kuzmenkov).
v20.11.2.1, 2020-11-11
If a profile was specified in the distributed_ddl config section, it could override the settings of the default profile on server startup. This is fixed: the settings of distributed DDL queries no longer affect global server settings. #16635 (tavplubix).
Restrict the use of non-comparable data types (like AggregateFunction) in keys (sorting key, primary key, partition key, and so on). #16601 (alesapin).
Remove the ANALYZE and AST queries, and make the enable_debug_queries setting obsolete, since this is now part of the full-featured EXPLAIN query. #16536 (Ivan).
The aggregate functions boundingRatio, rankCorr, retention, timeSeriesGroupSum, timeSeriesGroupRateSum, windowFunnel were erroneously made case-insensitive. Their names are now case-sensitive, as designed. Only functions specified in the SQL standard, or made for compatibility with other DBMSs, or similar to such functions, should be case-insensitive. #16407 (alexey-milovidov).
Make the rankCorr function return nan on insufficient data. #16124. #16135 (hexiaoting).
When upgrading from versions older than 20.5
For unification, the HTTP header Query-Id was renamed to X-ClickHouse-Query-Id. #4972 (Mikhail)
2019-02-13, 19.3.3
Removed the allow_experimental_low_cardinality_type setting. LowCardinality data types are production ready. #4323 (alexey-milovidov)
Reduce the mark cache size and the uncompressed cache size according to the available amount of memory. #4240 (Lopatin Konstantin)
Added the INDEX keyword in CREATE TABLE queries. A column named index must now be quoted with backticks or double quotes: `index`. #4143 (Nikita Vasilev)
sumMap now promotes the result type instead of overflowing. The old sumMap behavior can be obtained by using the sumMapWithOverflow function. #4151 (Léo Ercolanelli)
2019-01-24, 19.1.6
Removed the undocumented feature ALTER MODIFY PRIMARY KEY, because it was superseded by the ALTER MODIFY ORDER BY command. #3887 (Alex Zatelepin)
Removed the shardByHash function. #3833 (alexey-milovidov)
Forbid using scalar subqueries with a result of AggregateFunction type. #3865 (Ivan)
2018-12-14, 18.16.0
Comparing the Date type with a number is no longer allowed. Instead of toDate('2018-12-18') = 17883, you must use an explicit type conversion = toDate(17883). #3687
2018-10-16, 18.14.9
Removed the allow_experimental_decimal_type option. The Decimal data type is available by default. #3329
2018-09-16, 18.12.17
The enable_optimize_predicate_expression option is enabled by default (which is rather optimistic). If query analysis errors related to searching for column names occur, set enable_optimize_predicate_expression to 0. Winter Zhang
2018-09-10, 18.12.13
In queries with JOIN, the asterisk expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting asterisk_left_columns_only to 1 on the user configuration level.
2018-08-13, 18.10.3
The CHECK TABLE query is no longer supported for Distributed tables.
2018-07-28, 18.4.0
The parameters of the Kafka engine changed from Kafka(kafka_broker_list, kafka_topic_list, kafka_group_name, kafka_format[, kafka_schema, kafka_num_consumers]) to Kafka(kafka_broker_list, kafka_topic_list, kafka_group_name, kafka_format[, kafka_row_delimiter, kafka_schema, kafka_num_consumers]). If your tables use the kafka_schema or kafka_num_consumers parameters, you have to manually edit the metadata files path/metadata/database/table.sql and add the kafka_row_delimiter parameter with the '' value.
2018-07-23, 18.1.0
Strings containing the number zero can no longer be converted to DateTime. Example: SELECT toDateTime('0'). This is also the reason why DateTime DEFAULT '0' does not work in tables, as well as 0 in dictionaries. Solution: replace 0 with 0000-00-00 00:00:00.
2018-06-28, 1.1.54388
Removed escaping in the Vertical and Pretty* formats and deleted the VerticalRaw format.
If servers with version 1.1.54388 (or newer) and servers with an older version are used simultaneously in a distributed query, and the query has the cast(x, 'Type') expression without the AS keyword and doesn't have the word cast in uppercase, an exception will be thrown with a message like Not found column cast(0, 'UInt8') in block. Solution: update the server on the entire cluster.
2018-04-21, 1.1.54380
Expressions like (a, b) IN (SELECT (a, b)) are no longer supported (you can use the equivalent expression (a, b) IN (SELECT a, b)). In previous releases, these expressions led to undetermined WHERE filtering or caused errors.
2018-04-16, 1.1.54378
Removed the special interpretation of an IN expression when an array is specified on its left side. Previously, the expression arr IN (set) was interpreted as "at least one element of arr belongs to the set". To get the same behavior in the new version, write arrayExists(x -> x IN (set), arr).
Disabled the incorrect use of the SO_REUSEPORT socket option, which was incorrectly enabled by default in the Poco library. Note that on Linux there is no longer any reason to simultaneously specify the addresses :: and 0.0.0.0 for listen; use just ::, which allows listening to connections both over IPv4 and IPv6 (with the default kernel config settings). You can also revert to the behavior from previous versions by specifying <listen_reuse_port>1</listen_reuse_port> in the config.
2018-03-11, 1.1.54362
Removed the distributed_ddl_allow_replicated_alter option. This behavior is enabled by default.
Removed the strict_insert_defaults setting. If you were using this functionality, write to clickhouse-feedback@yandex-team.com.
Removed the UnsortedMergeTree engine.
2018-01-18, 1.1.54337
A backward-incompatible change to the mark format in Log-type tables that contain Nullable columns. If you have such tables, convert them to the TinyLog type before starting the new server version: replace ENGINE = Log with ENGINE = TinyLog in the corresponding .sql file in the metadata directory. If your table has no Nullable columns, or its type is not Log, no action is needed.
Removed the experimental_allow_extended_storage_definition_syntax setting. This functionality is now enabled by default.
Renamed the runningIncome function to runningDifferenceStartingWithFirstValue to avoid confusion.
Removed the BlockTabSeparated format, which was used solely for demonstration purposes.
Changed the state format for the aggregate functions varSamp, varPop, stddevSamp, stddevPop, covarSamp, covarPop, corr. If you have stored states of these aggregate functions in tables (using the AggregateFunction data type or materialized views with the corresponding states), write to clickhouse-feedback@yandex-team.com.
In previous server versions there was an undocumented feature: if an aggregate function depends on parameters, it could still be specified without parameters in the AggregateFunction data type, e.g. AggregateFunction(quantiles, UInt64) instead of AggregateFunction(quantiles(0.5, 0.9), UInt64). This feature was lost. Although it was undocumented, we plan to support it again in future releases.
Enum data types cannot be used in min/max aggregate functions. This ability will return in the next release.
2017-11-01, 1.1.54310
Temporary tables cannot be created with engines other than Memory.
Tables cannot be created explicitly with the View or MaterializedView engine.
During table creation, a new check verifies that the sampling key expression is included in the primary key.
2017-08-16, 1.1.54276
Changed the binary format of aggregation states of groupArray(array_column) functions.
2017-07-04, 1.1.54245
Removed SET GLOBAL
In closing
ClickHouse, Inc. has been founded. Congratulations!
With the company established and fundraising going well, I think the product's future looks brighter than ever.