1. Sharding Concepts

Sharding refers to the process of splitting data across multiple machines; the term partitioning is sometimes used as well. MongoDB can be sharded manually and also supports automatic sharding.

2. Understanding Cluster Components

One of the goals of sharding is to make a cluster of multiple shards look like a single server to the application. To achieve this, one or more routing processes called mongos are run in front of the shards. A mongos maintains a "table of contents" that tells it which shard contains which data.
Applications connect to this router and issue requests as usual. Because the router knows which data lives on which shard, it can forward each request to the appropriate shard. If there are responses to a request, the router collects them, merges them if necessary, and then sends the result back to the application.
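For example, a client simply points at a mongos instead of at any individual shard; connecting looks exactly like connecting to a single mongod. This is only a sketch, and the hostname and port below are hypothetical placeholders:

# connect the shell through a mongos router; to the client it behaves like one server
# (hostname/port are hypothetical)
mongo "mongodb://mongos1.example.net:27017/accounts"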

3. Sharding on a Single-Machine Cluster
# mongo --nodb --norc
MongoDB shell version v4.2.6
> st = ShardingTest({
    name: "one-min-shards",       // prefix used for the replica set names and data paths
    chunkSize: 1,                 // chunk size in MB, kept small so chunks split quickly
    shards: 2,                    // two shards, each backed by a replica set
    rs: {
        nodes: 3,                 // three members per shard replica set
        oplogSize: 10             // small oplog (in MB) for a test environment
    },
    other: {
        enableBalancer: true      // start the balancer so chunks are distributed automatically
    }
})

Once ShardingTest has finished setting up the cluster, 10 processes will be up and running: two replica sets with three nodes each (the two shards), one config server replica set with three nodes, and one mongos.

Running ps -ef | grep mongo in a new terminal window shows output like the following:

00:00:09 /usr/bin/mongod --oplogSize 10 --port 20000 --replSet one-min-shards-rs0 --dbpath /data/db/one-min-shards-rs0-0 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root       105    66  1 21:57 pts/0    00:00:10 /usr/bin/mongod --oplogSize 10 --port 20001 --replSet one-min-shards-rs0 --dbpath /data/db/one-min-shards-rs0-1 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root       140    66  1 21:57 pts/0    00:00:10 /usr/bin/mongod --oplogSize 10 --port 20002 --replSet one-min-shards-rs0 --dbpath /data/db/one-min-shards-rs0-2 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root       293    66  1 21:58 pts/0    00:00:09 /usr/bin/mongod --oplogSize 10 --port 20003 --replSet one-min-shards-rs1 --dbpath /data/db/one-min-shards-rs1-0 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root       328    66  1 21:58 pts/0    00:00:09 /usr/bin/mongod --oplogSize 10 --port 20004 --replSet one-min-shards-rs1 --dbpath /data/db/one-min-shards-rs1-1 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root       363    66  1 21:58 pts/0    00:00:09 /usr/bin/mongod --oplogSize 10 --port 20005 --replSet one-min-shards-rs1 --dbpath /data/db/one-min-shards-rs1-2 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root       525    66  2 21:58 pts/0    00:00:11 /usr/bin/mongod --oplogSize 40 --port 20006 --replSet one-min-shards-configRS --dbpath /data/db/one-min-shards-configRS-0 --journal --configsvr --storageEngine wiredTiger --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root       567    66  1 21:58 pts/0    00:00:11 /usr/bin/mongod --oplogSize 40 --port 20007 --replSet one-min-shards-configRS --dbpath /data/db/one-min-shards-configRS-1 --journal --configsvr --storageEngine wiredTiger --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root       609    66  1 21:58 pts/0    00:00:10 /usr/bin/mongod --oplogSize 40 --port 20008 --replSet one-min-shards-configRS --dbpath /data/db/one-min-shards-configRS-2 --journal --configsvr --storageEngine wiredTiger --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root       792    66  0 21:58 pts/0    00:00:00 /usr/bin/mongos -v --port 20009 --configdb one-min-shards-configRS/2bffe09ec303:20006,2bffe09ec303:20007,2bffe09ec303:20008 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true

The entire cluster will be dumping its logs into the current shell, so open a second terminal window and start another mongo shell there:

# mongo --port 20009
mongos> use accounts
switched to db accounts
mongos> for(var i=0; i<100000;i++){db.users.insert({'username': 'user'+i, 'created_at': new Date()})}
mongos> db.users.count()
100000
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("62d2c414601aeb7a88c169f7")
  }
  shards:
        {  "_id" : "one-min-shards-rs0",  "host" : "one-min-shards-rs0/2bffe09ec303:20000,2bffe09ec303:20001,2bffe09ec303:20002",  "state" : 1 }
        {  "_id" : "one-min-shards-rs1",  "host" : "one-min-shards-rs1/2bffe09ec303:20003,2bffe09ec303:20004,2bffe09ec303:20005",  "state" : 1 }
  active mongoses:
        "4.2.6" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                No recent migrations
  databases:
        {  "_id" : "accounts",  "primary" : "one-min-shards-rs1",  "partitioned" : false,  "version" : {  "uuid" : UUID("ba658ebe-bbd1-44cd-b2c9-cb48d3810551"),  "lastMod" : 1 } }
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                one-min-shards-rs0      1
                        { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : one-min-shards-rs0 Timestamp(1, 0)

To shard a particular collection, you first need to enable sharding on the collection's database, like this:

mongos> sh.enableSharding("accounts")

When sharding a collection, you need to choose a shard key. The shard key is the field or fields that MongoDB uses to split up the data. As the collection grows, the shard key becomes the most important index on the collection. Only fields with an index on them can be used as a shard key, so create the index first:

mongos> db.users.createIndex({'username':1})

Now the collection can be sharded by the username shard key:

mongos> sh.shardCollection("accounts.users", {"username":1})
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("62d2c414601aeb7a88c169f7")
  }
  shards:
        {  "_id" : "one-min-shards-rs0",  "host" : "one-min-shards-rs0/2bffe09ec303:20000,2bffe09ec303:20001,2bffe09ec303:20002",  "state" : 1 }
        {  "_id" : "one-min-shards-rs1",  "host" : "one-min-shards-rs1/2bffe09ec303:20003,2bffe09ec303:20004,2bffe09ec303:20005",  "state" : 1 }
  active mongoses:
        "4.2.6" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                6 : Success
  databases:
        {  "_id" : "accounts",  "primary" : "one-min-shards-rs1",  "partitioned" : true,  "version" : {  "uuid" : UUID("ba658ebe-bbd1-44cd-b2c9-cb48d3810551"),  "lastMod" : 1 } }
                accounts.users
                        shard key: { "username" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                one-min-shards-rs0      6
                                one-min-shards-rs1      7
                        { "username" : { "$minKey" : 1 } } -->> { "username" : "user17256" } on : one-min-shards-rs0 Timestamp(2, 0)
                        { "username" : "user17256" } -->> { "username" : "user24515" } on : one-min-shards-rs0 Timestamp(3, 0)
                        { "username" : "user24515" } -->> { "username" : "user31775" } on : one-min-shards-rs0 Timestamp(4, 0)
                        { "username" : "user31775" } -->> { "username" : "user39034" } on : one-min-shards-rs0 Timestamp(5, 0)
                        { "username" : "user39034" } -->> { "username" : "user46294" } on : one-min-shards-rs0 Timestamp(6, 0)
                        { "username" : "user46294" } -->> { "username" : "user53553" } on : one-min-shards-rs0 Timestamp(7, 0)
                        { "username" : "user53553" } -->> { "username" : "user60812" } on : one-min-shards-rs1 Timestamp(7, 1)
                        { "username" : "user60812" } -->> { "username" : "user68072" } on : one-min-shards-rs1 Timestamp(1, 7)
                        { "username" : "user68072" } -->> { "username" : "user75331" } on : one-min-shards-rs1 Timestamp(1, 8)
                        { "username" : "user75331" } -->> { "username" : "user82591" } on : one-min-shards-rs1 Timestamp(1, 9)
                        { "username" : "user82591" } -->> { "username" : "user89851" } on : one-min-shards-rs1 Timestamp(1, 10)
                        { "username" : "user89851" } -->> { "username" : "user9711" } on : one-min-shards-rs1 Timestamp(1, 11)
                        { "username" : "user9711" } -->> { "username" : { "$maxKey" : 1 } } on : one-min-shards-rs1 Timestamp(1, 12)
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                one-min-shards-rs0      1
                        { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : one-min-shards-rs0 Timestamp(1, 0)

As you can see, the collection has been split into 13 chunks, with 6 chunks on shard one-min-shards-rs0 and 7 chunks on shard one-min-shards-rs1.
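
If you only want the per-shard chunk counts rather than the full sh.status() output, one way to get them (assuming MongoDB 4.2, where config.chunks still identifies chunks by the ns field) is to aggregate over the config database:

mongos> db.getSiblingDB("config").chunks.aggregate([
            { $match: { ns: "accounts.users" } },                 // chunks belonging to the sharded collection
            { $group: { _id: "$shard", chunks: { $sum: 1 } } }    // count chunks per shard
        ])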

Now that the data is distributed across the shards, let's query for a specific username:

mongos> db.users.find({username: "user12345"})
{ "_id" : ObjectId("62d2ca068dbf17063ba4d62f"), "username" : "user12345", "created_at" : ISODate("2022-07-16T14:24:06.891Z") }

Using explain(), we can see that this data lives on shard one-min-shards-rs0:

mongos> db.users.find({username: "user12345"}).explain()
{
        "queryPlanner" : {
                "mongosPlannerVersion" : 1,
                "winningPlan" : {
                        "stage" : "SINGLE_SHARD",
                        "shards" : [
                                {
                                        "shardName" : "one-min-shards-rs0",
                                        "connectionString" : "one-min-shards-rs0/2bffe09ec303:20000,2bffe09ec303:20001,2bffe09ec303:20002",
                                        "serverInfo" : {
                                                "host" : "2bffe09ec303",
                                                "port" : 20001,
                                                "version" : "4.2.6",
                                                "gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8"
                                        },
                                        "plannerVersion" : 1,
                                        "namespace" : "accounts.users",
                                        "indexFilterSet" : false,
                                        "parsedQuery" : {
                                                "username" : {
                                                        "$eq" : "user12345"
                                                }
                                        },
                                        "queryHash" : "379E82C5",
                                        "planCacheKey" : "965E0A67",
                                        "winningPlan" : {
                                                "stage" : "FETCH",
                                                "inputStage" : {
                                                        "stage" : "SHARDING_FILTER",
                                                        "inputStage" : {
                                                                "stage" : "IXSCAN",
                                                                "keyPattern" : {
                                                                        "username" : 1
                                                                },
                                                                "indexName" : "username_1",
                                                                "isMultiKey" : false,
                                                                "multiKeyPaths" : {
                                                                        "username" : [ ]
                                                                },
                                                                "isUnique" : false,
                                                                "isSparse" : false,
                                                                "isPartial" : false,
                                                                "indexVersion" : 2,
                                                                "direction" : "forward",
                                                                "indexBounds" : {
                                                                        "username" : [
                                                                                "[\"user12345\", \"user12345\"]"
                                                                        ]
                                                                }
                                                        }
                                                }
                                        },
                                        "rejectedPlans" : [ ]
                                }
                        ]
                }
        }
}

When we query for all of the data, we can see that the query has to visit both shards to find everything. In general, if a query does not include the shard key, mongos has to send the query to every shard.

mongos> db.users.find({}).explain()
{
        "queryPlanner" : {
                "mongosPlannerVersion" : 1,
                "winningPlan" : {
                        "stage" : "SHARD_MERGE",
                        "shards" : [
                                {
                                        "shardName" : "one-min-shards-rs0",
                                        "connectionString" : "one-min-shards-rs0/2bffe09ec303:20000,2bffe09ec303:20001,2bffe09ec303:20002",
                                        "serverInfo" : {
                                                "host" : "2bffe09ec303",
                                                "port" : 20001,
                                                "version" : "4.2.6",
                                                "gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8"
                                        },
                                        "plannerVersion" : 1,
                                        "namespace" : "accounts.users",
                                        "indexFilterSet" : false,
                                        "parsedQuery" : {

                                        },
                                        "queryHash" : "8B3D4AB8",
                                        "planCacheKey" : "8B3D4AB8",
                                        "winningPlan" : {
                                                "stage" : "SHARDING_FILTER",
                                                "inputStage" : {
                                                        "stage" : "COLLSCAN",
                                                        "direction" : "forward"
                                                }
                                        },
                                        "rejectedPlans" : [ ]
                                },
                                {
                                        "shardName" : "one-min-shards-rs1",
                                        "connectionString" : "one-min-shards-rs1/2bffe09ec303:20003,2bffe09ec303:20004,2bffe09ec303:20005",
                                        "serverInfo" : {
                                                "host" : "2bffe09ec303",
                                                "port" : 20005,
                                                "version" : "4.2.6",
                                                "gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8"
                                        },
                                        "plannerVersion" : 1,
                                        "namespace" : "accounts.users",
                                        "indexFilterSet" : false,
                                        "parsedQuery" : {

                                        },
                                        "queryHash" : "8B3D4AB8",
                                        "planCacheKey" : "8B3D4AB8",
                                        "winningPlan" : {
                                                "stage" : "SHARDING_FILTER",
                                                "inputStage" : {
                                                        "stage" : "COLLSCAN",
                                                        "direction" : "forward"
                                                }
                                        },
                                        "rejectedPlans" : [ ]
                                }
                        ]
                }
        },
        "serverInfo" : {
                "host" : "2bffe09ec303",
                "port" : 20009,
                "version" : "4.2.6",
                "gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8"
        },
        "ok" : 1,
        "operationTime" : Timestamp(1658027749, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1658030075, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}

Targeted query: a query that includes the shard key and can be sent to a single shard or a subset of the shards.

Scatter-gather query: a query that must be sent to all shards.
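
As a quick illustration (the filter values are arbitrary), the first query below is targeted because it filters on the shard key, while the second is scatter-gather because it does not:

mongos> db.users.find({ username: "user12345" })               // targeted: uses the shard key
mongos> db.users.find({ created_at: { $lt: new Date() } })     // scatter-gather: no shard key, sent to every shard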

Now, back in the original shell, press Enter a few times to return to the command prompt and run st.stop() (a method on the ShardingTest object st created earlier) to cleanly shut down all of the servers.
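
For example, back in the first shell:

> st.stop()    // shuts down the mongos, both shard replica sets, and the config server replica set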


Original post: http://www.cnblogs.com/shenjian-online/p/16800491.html
