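Configuring and Repairing a Cassandra Cluster
Introduction
We use Cassandra as our distributed big-data database. This article covers the operations we perform on a cluster: adding nodes, removing nodes, changing the replication factor, and running repairs.
What is Cassandra?
Website: http://cassandra.apache.org/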

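Source: https://db-engines.com/en/ranking
Cluster replication strategies
Reference: http://cassandra.apache.org/doc/latest/architecture/dynamo.html#复制策略
NetworkTopologyStrategy
Choose NetworkTopologyStrategy if the cluster needs to span multiple data centers.
SimpleStrategy
Choose SimpleStrategy if the cluster only ever runs in a single data center.
Check a keyspace's current strategy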
SELECT * FROM system_schema.keyspaces WHERE keyspace_name = 'my_keyspace';

Change a keyspace's strategy
ALTER KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};
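To make the difference between the two strategies concrete, here is a minimal Python sketch that renders the replication map for each one. The helper name `alter_keyspace_cql` and the data center names `dc1` and `dc2` are assumptions made for illustration; real data center names come from your snitch configuration. The helper does no escaping, so it is only suitable for trusted, hand-written input.

```python
def alter_keyspace_cql(keyspace, replication):
    """Render an ALTER KEYSPACE statement from a replication map (a dict).

    String values are quoted, integer values are written as-is.
    Illustrative only: no escaping or validation is performed.
    """
    parts = []
    for key, value in replication.items():
        rendered = "'" + value + "'" if isinstance(value, str) else str(value)
        parts.append("'" + key + "': " + rendered)
    body = ", ".join(parts)
    return "ALTER KEYSPACE " + keyspace + " WITH replication = {" + body + "};"


# Single data center: one replication_factor for the whole cluster.
simple = alter_keyspace_cql(
    "my_keyspace",
    {"class": "SimpleStrategy", "replication_factor": 3},
)

# Multiple data centers: one replica count per data center.
# 'dc1' and 'dc2' are hypothetical names.
multi_dc = alter_keyspace_cql(
    "my_keyspace",
    {"class": "NetworkTopologyStrategy", "dc1": 3, "dc2": 2},
)

print(simple)
print(multi_dc)
```

With NetworkTopologyStrategy the map carries one entry per data center instead of a single `replication_factor`.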
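Tuning JVM parameters
Configuration file: /etc/cassandra/conf/jvm.options
This file holds the heap settings, garbage collection settings, and other JVM options.
For example: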
#################
# HEAP SETTINGS #
#################
# Heap size is automatically calculated by cassandra-env based on this
# formula: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
# That is:
# - calculate 1/2 ram and cap to 1024MB
# - calculate 1/4 ram and cap to 8192MB
# - pick the max
#
# For production use you may wish to adjust this for your environment.
# If that's the case, uncomment the -Xmx and Xms options below to override the
# automatic calculation of JVM heap memory.
#
# It is recommended to set min (-Xms) and max (-Xmx) heap sizes to
# the same value to avoid stop-the-world GC pauses during resize, and
# so that we can lock the heap in memory on startup to prevent any
# of it from being swapped out.
-Xms8G
-Xmx8G
See this article for details: https://docs.datastax.com/en/ddac/doc/datastax_enterprise/operations/opsTuneJVM.html
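The automatic heap calculation described in the jvm.options comments can be worked through with a short Python sketch. The function name `default_max_heap_mb` is hypothetical; the code only reproduces the formula quoted above, max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB)), and is not the cassandra-env script itself.

```python
def default_max_heap_mb(system_memory_mb):
    """Apply the formula from the jvm.options comment, in megabytes."""
    half_capped = min(system_memory_mb // 2, 1024)     # 1/2 RAM, capped at 1024 MB
    quarter_capped = min(system_memory_mb // 4, 8192)  # 1/4 RAM, capped at 8192 MB
    return max(half_capped, quarter_capped)


for ram_mb in (2048, 16384, 32768, 65536):
    print(ram_mb, "MB RAM ->", default_max_heap_mb(ram_mb), "MB heap")
```

Under this formula a machine with 32 GB of RAM or more already gets an 8 GB heap, which matches the explicit `-Xms8G` / `-Xmx8G` values in the example.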
Adding a node to the cluster
Edit cassandra.yaml
Configuration file: /etc/cassandra/conf/cassandra.yaml
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'MyCluster'
# Whether to start the thrift rpc server.
start_rpc: true
# For security reasons, you should not expose this port to the internet. Firewall it if needed.
rpc_address: <Local ip>
# any class that implements the SeedProvider interface and has a
# constructor that takes a Map<String, String> of parameters will do.
seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring. You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "<Local ip>,<Node1 ip>"
# Address or interface to bind to and tell other Cassandra nodes to connect to.
# You _must_ change this if you want multiple nodes to be able to communicate!
#
# Set listen_address OR listen_interface, not both.
#
# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be).
#
# Setting listen_address to 0.0.0.0 is always wrong.
#
listen_address: <Local ip>
Reference: http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html
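Before starting a new node, it can help to sanity-check the values you intend to write. The Python sketch below checks two rules stated in the cassandra.yaml comments: `seeds` is a comma-delimited list, and `listen_address` must not be 0.0.0.0. The function names, the flat dict layout, and the IP addresses are assumptions for illustration; the sketch does not read cassandra.yaml itself.

```python
def parse_seeds(seeds):
    """Split the comma-delimited seeds string into a clean list."""
    return [item.strip() for item in seeds.split(",") if item.strip()]


def check_node_settings(settings):
    """Return a list of problems found in a dict of intended settings."""
    problems = []
    if settings.get("listen_address", "") == "0.0.0.0":
        problems.append("listen_address must not be 0.0.0.0")
    if not parse_seeds(settings.get("seeds", "")):
        problems.append("seeds must list at least one address")
    return problems


# Hypothetical values for a node joining an existing cluster.
new_node = {
    "cluster_name": "MyCluster",
    "listen_address": "10.0.0.5",
    "seeds": "10.0.0.5, 10.0.0.6",
}
print(check_node_settings(new_node))
```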
Restart the node so that it joins the cluster.
rm -rf /var/lib/cassandra/data/system/*
service cassandra start
Check the cluster status
nodetool status

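With multiple nodes, every node appears in the list together with the share of the data it owns.
If there is only one node, it owns 100%. With a `replication_factor` of 3 and 3 nodes, `Owns` is 100% for every node.
With 4 or more nodes, `Owns` falls below 100%.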
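The `Owns` figures above follow from simple arithmetic, sketched here in Python. The function name `effective_ownership` is hypothetical, and the calculation assumes a single data center with tokens spread evenly across nodes; on a real cluster `nodetool status` may report slightly different values per node.

```python
def effective_ownership(node_count, replication_factor):
    """Fraction of the data each node holds, assuming an evenly balanced ring."""
    if node_count < 1 or replication_factor < 1:
        raise ValueError("node_count and replication_factor must be at least 1")
    # Each row is stored on replication_factor nodes (at most every node),
    # so each node holds that many shares out of node_count.
    return min(1.0, replication_factor / node_count)


for nodes in (1, 3, 4, 6):
    share = effective_ownership(nodes, replication_factor=3)
    print(nodes, "nodes:", format(share, ".0%"))
```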
Removing a node from the cluster
nodetool --host <IP of the node to remove> decommission -f
Alternatively, log in to the server you want to remove and run:
nodetool decommission -f
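Repair after changing the replication factor
Full repair
Run this whenever replication_factor has been changed. Be aware that it takes a while to complete.
Log in to any one of the nodes and run the following command (once only).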
nodetool repair --full
Partitioner range repair
nodetool repair -pr
Repairing a single keyspace
nodetool repair mykeyspace
Repairing a specific table
Pass the keyspace name followed by the table name to nodetool repair to repair only that table.
nodetool repair mykeyspace mytable

Reference: http://cassandra.apache.org/doc/latest/operating/repair.html?highlight=repair
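The repair variants above differ only in their arguments, which the following Python sketch collects in one place. The helper name `repair_command` is hypothetical; it only builds the argument list for the commands shown in this article and does not run nodetool. To execute the result you could pass it to `subprocess.run` on a host where nodetool is installed.

```python
def repair_command(keyspace=None, table=None, full=False, primary_range=False):
    """Build the argument list for one of the nodetool repair variants."""
    if table and not keyspace:
        raise ValueError("a table can only be given together with its keyspace")
    command = ["nodetool", "repair"]
    if full:
        command.append("--full")  # full repair
    if primary_range:
        command.append("-pr")     # partitioner range repair
    if keyspace:
        command.append(keyspace)
    if table:
        command.append(table)
    return command


print(" ".join(repair_command(full=True)))
print(" ".join(repair_command(primary_range=True)))
print(" ".join(repair_command("mykeyspace")))
print(" ".join(repair_command("mykeyspace", "mytable")))
```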
That's all.