Here is my MongoDB sharded cluster configuration (each shard is a replica set):
- Shard replica set rs0: IP1, IP2, IP3 (port 27017)
- Shard replica set rs1: IP4, IP5, IP6 (port 27017)
- Config server replica set: IP7, IP8, IP9 (port 26017)
- mongos: IP7, IP8, IP9 (port 26000)
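For reference, this topology maps onto the connection strings the components store and exchange: shard entries in `config.shards` use the `<setName>/<host:port>,...` form, and the mongos `sharding.configDB` setting uses the same form for the config server replica set. The sketch below, in plain JavaScript runnable with Node.js, builds those strings from the list above. The `IPn` values are this post's placeholders, and the config server set name `csrs` is assumed from the mongod.log error quoted later; `replicaSetString` is a hypothetical helper, not a MongoDB API.

```javascript
// Hypothetical description of the cluster listed above (placeholder IPs).
const topology = {
  shards: {
    rs0: { hosts: ["IP1", "IP2", "IP3"], port: 27017 },
    rs1: { hosts: ["IP4", "IP5", "IP6"], port: 27017 },
  },
  // Set name "csrs" is an assumption taken from the mongod.log error message.
  configServers: { setName: "csrs", hosts: ["IP7", "IP8", "IP9"], port: 26017 },
};

// Build a replica set connection string of the form "set/host:port,host:port".
function replicaSetString(setName, hosts, port) {
  return setName + "/" + hosts.map((h) => h + ":" + port).join(",");
}

// Shard host strings as they would appear in the config.shards collection.
const shardHosts = Object.entries(topology.shards).map(([name, s]) =>
  replicaSetString(name, s.hosts, s.port)
);

// Value a mongos would use for sharding.configDB (--configdb).
const cfg = topology.configServers;
const configDB = replicaSetString(cfg.setName, cfg.hosts, cfg.port);

console.log(shardHosts.join("\n"));
console.log(configDB);
```

Every string above embeds the IP addresses directly, which is why all of them go stale together when the hosts come back with new addresses.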
This is a test setup, and it was configured using IP addresses (not hostnames). Unfortunately, all hosts went down during maintenance, and every host's IP address changed when we brought the nodes back up. As a result, the shard replica set members (mongod), the config servers (mongod), and mongos did not come up, because the IP addresses stored in their configuration were no longer reachable.
To bring the setup back up, I did the following:
- Updated the shard replica set member IP addresses following https://www.mongodb.com/docs/v4.2/tutorial/change-hostnames-in-a-replica-set/
- Updated the config server replica set member IPs following the same MongoDB document, then started the mongod services without sharding enabled.
- I didn't find any proper documentation on changing the IP address/hostname of the config servers and mongos. On the config server replica set, I updated the "shards" collection in the config database:
cfg1 = db.shards.findOne( { "_id": "rs0" } )
cfg1.host = "rs0/new_IP1:27017,new_IP2:27017,new_IP3:27017"
db.shards.update({ "_id" : "rs0" } , cfg1 )
cfg2 = db.shards.findOne( { "_id": "rs1" } )
cfg2.host = "rs1/new_IP4:27017,new_IP5:27017,new_IP6:27017"
db.shards.update({ "_id" : "rs1" } , cfg2 )
- Started the config servers and mongos; both came up successfully.
- Now I am restarting the shard replica set members with sharding enabled. However, the shard mongod processes fail to start because they still reference the old config server replica set IPs. I get the following errors in mongod.log:
2022-05-17T21:20:39.654+0530 W SHARDING [initandlisten] Error initializing sharding state, sleeping for 2 seconds and trying again :: caused by :: FailedToSatisfyReadPreference: Error loading clusterID :: caused by :: Could not find host matching read preference { mode: "nearest" } for set csrs
2022-05-17T21:20:40.154+0530 I ASIO [ReplicaSetMonitor-TaskExecutor] Connecting to x.x.x.x:26017
2022-05-17T21:20:41.655+0530 I ASIO [ReplicaSetMonitor-TaskExecutor] Connecting to y.y.y.y:26017
2022-05-17T21:20:42.660+0530 I ASIO [ReplicaSetMonitor-TaskExecutor] Failed to connect to z.z.z.z:26017 - HostUnreachable: Error connecting to 10.0.13.206:26017 :: caused by :: No route to host
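The host rewrites in the steps above (replica set member hosts per the linked tutorial, and the `host` field of each `config.shards` document) amount to applying one old-IP to new-IP mapping to stored host strings. Below is a minimal sketch in plain JavaScript, runnable with Node.js, assuming a hypothetical `ipMap` object built from this post's `IPn`/`new_IPn` placeholders. `remapHost`, `remapReplSetConfig`, and `remapShardHost` are illustrative helpers, not MongoDB APIs, and the replica set document is abbreviated. The sketch only computes the new documents; writing them back is done as in the original steps.

```javascript
// Hypothetical old-IP -> new-IP mapping (placeholders, not real addresses).
const ipMap = {
  IP1: "new_IP1", IP2: "new_IP2", IP3: "new_IP3",
  IP4: "new_IP4", IP5: "new_IP5", IP6: "new_IP6",
};

// Replace the address part of "address:port", keeping the port.
// Addresses that are not in the map are returned unchanged.
function remapHost(hostPort, map) {
  const idx = hostPort.lastIndexOf(":");
  const addr = idx === -1 ? hostPort : hostPort.slice(0, idx);
  const port = idx === -1 ? "" : hostPort.slice(idx);
  const known = Object.prototype.hasOwnProperty.call(map, addr);
  return (known ? map[addr] : addr) + port;
}

// Return a copy (the input is not mutated) of a replica set config document,
// the shape stored in local.system.replset, with every member host remapped.
function remapReplSetConfig(config, map) {
  return {
    ...config,
    members: config.members.map((m) => ({ ...m, host: remapHost(m.host, map) })),
  };
}

// Remap a config.shards host string of the form "set/host:port,host:port".
function remapShardHost(shardHost, map) {
  const slash = shardHost.indexOf("/");
  const remapList = (list) => list.split(",").map((h) => remapHost(h, map)).join(",");
  if (slash === -1) return remapList(shardHost); // no set name prefix
  return shardHost.slice(0, slash) + "/" + remapList(shardHost.slice(slash + 1));
}

// Example: an abbreviated rs1 config document and its config.shards host string.
const rs1Config = {
  _id: "rs1",
  version: 3,
  members: [
    { _id: 0, host: "IP4:27017" },
    { _id: 1, host: "IP5:27017" },
    { _id: 2, host: "IP6:27017" },
  ],
};
console.log(remapReplSetConfig(rs1Config, ipMap).members.map((m) => m.host));
console.log(remapShardHost("rs1/IP4:27017,IP5:27017,IP6:27017", ipMap));
```

Because the helpers copy rather than mutate, the old and new documents can be compared side by side before anything is written back, which seems prudent with this much data on the cluster.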
I couldn't find any help on the web for recovering from this scenario. I would appreciate assistance in recovering the setup without losing any data, as we have loaded TBs of data into this cluster.
