ClickHouse replicas: replicas improve data reliability and guard against data loss, and they can also share the read/write load.
ClickHouse shards: sharding mainly solves the problem of a single node running out of capacity by spreading the data across multiple nodes, playing a role similar to database/table sharding in MySQL.
Suppose we have four service nodes, 10.100.0.1 through 10.100.0.4, and we want to build a cluster with 2 shards and 1 replica (that is, each shard's data is stored on two nodes). The full configuration is as follows:
Node 10.100.0.1, shard 1:
<yandex>
    <zookeeper>
        <node index="1">
            <host>zk1</host>
            <port>2181</port>
        </node>
    </zookeeper>
    <macros>
        <replica>10.100.0.1</replica>
        <shard>1</shard>
    </macros>
    <remote_servers>
        <twoShardOneReplica>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.100.0.1</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>10.100.0.2</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.100.0.3</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>10.100.0.4</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
        </twoShardOneReplica>
    </remote_servers>
</yandex>
Node 10.100.0.2, shard 1:
<yandex>
    <zookeeper>
        <node index="1">
            <host>zk1</host>
            <port>2181</port>
        </node>
    </zookeeper>
    <macros>
        <replica>10.100.0.2</replica>
        <shard>1</shard>
    </macros>
    <remote_servers>
        <twoShardOneReplica>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.100.0.1</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>10.100.0.2</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.100.0.3</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>10.100.0.4</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
        </twoShardOneReplica>
    </remote_servers>
</yandex>
Node 10.100.0.3, shard 2:
<yandex>
    <zookeeper>
        <node index="1">
            <host>zk1</host>
            <port>2181</port>
        </node>
    </zookeeper>
    <macros>
        <replica>10.100.0.3</replica>
        <shard>2</shard>
    </macros>
    <remote_servers>
        <twoShardOneReplica>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.100.0.1</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>10.100.0.2</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.100.0.3</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>10.100.0.4</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
        </twoShardOneReplica>
    </remote_servers>
</yandex>
Node 10.100.0.4, shard 2:
<yandex>
    <zookeeper>
        <node index="1">
            <host>zk1</host>
            <port>2181</port>
        </node>
    </zookeeper>
    <macros>
        <replica>10.100.0.4</replica>
        <shard>2</shard>
    </macros>
    <remote_servers>
        <twoShardOneReplica>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.100.0.1</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>10.100.0.2</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.100.0.3</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>10.100.0.4</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
        </twoShardOneReplica>
    </remote_servers>
</yandex>
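After each node has been restarted with its configuration file, the cluster should be visible from any of them. A minimal sanity check (just a sketch, run from clickhouse-client on any node) could be:

SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'twoShardOneReplica';

SELECT * FROM system.macros;

The first query should list all four hosts grouped into two shards, and the second should return the shard and replica values defined in the <macros> section of the node you are connected to.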
With that, our cluster configuration is complete: 10.100.0.1 and 10.100.0.2 are replicas of each other for shard 1, and likewise 10.100.0.3 and 10.100.0.4 are replicas of each other for shard 2.

Create the local table:
CREATE TABLE default.local_table ON CLUSTER twoShardOneReplica
(
    `id` Int32,
    `name` String,
    `create_time` DateTime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/local_table', '{replica}')
PARTITION BY toYYYYMM(create_time)
ORDER BY id;
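The {shard} and {replica} placeholders in the ZooKeeper path are filled in from each node's <macros> section, so on 10.100.0.1 the replica registers itself under /clickhouse/tables/1/local_table as replica 10.100.0.1. To confirm that the replicas have found each other (again just a sketch), you can query system.replicas on any node:

SELECT database, table, zookeeper_path, replica_name, total_replicas, active_replicas
FROM system.replicas
WHERE table = 'local_table';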
This creates the local table on all four service nodes.

Create the distributed table:
CREATE TABLE default.distribute_table ON CLUSTER twoShardOneReplica
(
    `id` Int32,
    `name` String,
    `create_time` DateTime
)
ENGINE = Distributed('twoShardOneReplica', 'default', 'local_table', rand());
This creates the distributed table on all four service nodes. As a rule, if all reads and writes go through the distributed table, it handles the following situations for us:

a. Suppose node 10.100.0.1 fails. When data is written, the distributed table automatically picks the other replica of shard 1, 10.100.0.2, so writes are not affected at all; when data is queried, it likewise routes automatically to the surviving 10.100.0.2, so reads are not affected either.

b. Suppose both 10.100.0.1 and 10.100.0.2 fail, i.e. every node of shard 1 is down. Distributed writes always involve a degree of write amplification: the data destined for shard 1 is first written as temporary data on the node that received the insert, for example 10.100.0.3. So once 10.100.0.1 or 10.100.0.2 recovers, the distributed table forwards that staged data to it, guaranteeing eventual consistency. If a query arrives while the whole shard is down, whether it throws an exception or returns only shard 2's data is decided by a configuration setting, as sketched below.
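As an illustration (sample values only), day-to-day use would look like the following; the last statement shows the kind of query-level setting referred to above. To my knowledge skip_unavailable_shards is the relevant one: it makes a SELECT return whatever the live shards hold instead of failing when an entire shard is down.

-- Writes go only to the distributed table; it routes each row to a shard via rand()
INSERT INTO default.distribute_table VALUES (1, 'alice', now()), (2, 'bob', now());

-- Reads fan out to one healthy replica of every shard and merge the results
SELECT count() FROM default.distribute_table;

-- With a whole shard down, skip the dead shard instead of throwing an exception
SELECT count() FROM default.distribute_table SETTINGS skip_unavailable_shards = 1;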