* Why does each node have a different view of a node that rejoins the network in a full-mesh RDMACM configuration?
@ 2024-03-28  2:47 jason
From: jason @ 2024-03-28  2:47 UTC
  To: linux-rdma

We have four nodes: A, B, C, and D. They use RDMACM in a full-mesh
configuration, which means each node is both a server and a client to
every other node. When the process on node C is stopped and restarted
after a few minutes, the other three nodes act as clients and initiate
active connections to node C. However, only node D connects
successfully; for nodes A and B, the connection fails on node C
because it receives the RDMA_CM_EVENT_REJECTED event. The status value
of the event is 10 (according to IBTA, this means a stale connection).
It seems that each node has a different view of the rejoining node C.
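
For reference, here is a minimal sketch of how the reject status can be
decoded when the event arrives. The helper names (rej_reason_str,
is_stale_conn_reject, format_reject) are hypothetical and are not part
of librdmacm. The only reason code it knows is 10, the value we
observed. In a real event loop the status would presumably come from
the "status" field of struct rdma_cm_event when the "event" field is
RDMA_CM_EVENT_REJECTED.

```c
#include <stddef.h>
#include <stdio.h>

/* Reject reason observed in our logs (IBTA: stale connection).
 * The name below is illustrative, not a librdmacm identifier. */
enum { REJ_STALE_CONN = 10 };

/* Map a reject status to a short description. Only the one code
 * seen in this report is decoded; everything else is left as-is. */
static const char *rej_reason_str(int status)
{
    switch (status) {
    case REJ_STALE_CONN:
        return "stale connection";
    default:
        return "other";
    }
}

/* True when the peer rejected us because it still holds an old
 * connection context for the same endpoint. */
static int is_stale_conn_reject(int status)
{
    return status == REJ_STALE_CONN;
}

/* Build a log line for a rejected connection attempt.
 * Returns the value of snprintf(). */
static int format_reject(char *buf, size_t len, const char *peer, int status)
{
    return snprintf(buf, len, "peer %s: REJECTED status=%d (%s)",
                    peer, status, rej_reason_str(status));
}
```

Logging the status this way for every peer should make it easier to
compare what A, B, and D each see while C is rejoining.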

Even more strangely, just after node D successfully connects to node
C, the connection between node A and node D (D as server) and the
connection between node B and node D (D as server too) are
disconnected almost simultaneously, because they receive the
RDMA_CM_EVENT_DISCONNECTED event from each other.

Could you please help me check what the problem is? Thank you!

-- 
B.R.,
Zhijiang
