[PATCH 0/6] iser-target: Fix active I/O shutdown related issues

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/6] iser-target: Fix active I/O shutdown related issues
@ 2014-03-04  0:00 Nicholas A. Bellinger
       [not found] ` <1393891265-22910-1-git-send-email-nab-PEzghdH756F8UrSeD/g0lQ@public.gmane.org>
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2014-03-04  0:00 UTC (permalink / raw
  To: target-devel
  Cc: linux-rdma, linux-scsi, Or Gerlitz, Sagi Grimberg,
	Nicholas Bellinger

From: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>

Hi Or & Sagi,

This series addresses a number of active I/O shutdown related issues
in iser-target code that have come up recently during stress testing.

Note there is still a seperate iser-target network portal shutdown
bug being tracked down, but this series addresses all existing issues
related to active I/O session shutdown.

The patch breakdown looks like:

Patch #1 fixes a long-standing bug where TPGs in shutdown incorrectly
could be referenced by new login attempts.

Patch #2 converts list_del -> list_del_init for iscsi_cmd->i_conn_node
so that list_empty works correctly.

Patch #3 addresses isert_conn->state related bugs resulting in hung
shutdown, and splits isert_free_conn() into seperate code that is
called earlier during shutdown to ensure that all outstanding I/O
has completed.

Patch #4 fixes incorrect accounting of ->post_send_buf_count during
active I/O shutdown with outstanding RDMA WRITE + RDMA READ work
requests.

Patch #5 addresses a bug related to active I/O shutdown with
outstanding FRMR work requests.  Note this patch is specific to
v3.12+ code.

Patch #6 addresses bugs related to active I/O shutdown with
outstanding completion interrupt coalescing batches. Note this patch
is specific to v3.13+ code.

Please review.

--nab

Nicholas Bellinger (6):
  iscsi-target: Fix iscsit_get_tpg_from_np tpg_state bug
  iscsi/iser-target: Use list_del_init for ->i_conn_node
  iscsi/iser-target: Fix isert_conn->state hung shutdown issues
  iser-target: Fix post_send_buf_count for RDMA READ/WRITE
  iser-target: Ignore completions for FRWRs in isert_cq_tx_work
  iser-target: Fix command leak for tx_desc->comp_llnode_batch

 drivers/infiniband/ulp/isert/ib_isert.c  |  180 ++++++++++++++++++------------
 drivers/infiniband/ulp/isert/ib_isert.h  |    7 +-
 drivers/target/iscsi/iscsi_target.c      |   10 +-
 drivers/target/iscsi/iscsi_target_erl2.c |   16 +--
 drivers/target/iscsi/iscsi_target_tpg.c  |    2 +-
 include/target/iscsi/iscsi_transport.h   |    1 +
 6 files changed, 129 insertions(+), 87 deletions(-)

-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] iscsi-target: Fix iscsit_get_tpg_from_np tpg_state bug
       [not found] ` <1393891265-22910-1-git-send-email-nab-PEzghdH756F8UrSeD/g0lQ@public.gmane.org>
@ 2014-03-04  0:01   ` Nicholas A. Bellinger
  0 siblings, 0 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2014-03-04  0:01 UTC (permalink / raw
  To: target-devel
  Cc: linux-rdma, linux-scsi, Or Gerlitz, Sagi Grimberg,
	Nicholas Bellinger

From: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>

This patch fixes a bug in iscsit_get_tpg_from_np() where the
tpg->tpg_state sanity check was looking for TPG_STATE_FREE,
instead of != TPG_STATE_ACTIVE.

The latter is expected during a normal TPG shutdown once the
tpg_state goes into TPG_STATE_INACTIVE in order to reject any
new incoming login attempts.

Signed-off-by: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
---
 drivers/target/iscsi/iscsi_target_tpg.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/iscsi/iscsi_target_tpg.c b/drivers/target/iscsi/iscsi_target_tpg.c
index 3976183..44a5471 100644
--- a/drivers/target/iscsi/iscsi_target_tpg.c
+++ b/drivers/target/iscsi/iscsi_target_tpg.c
@@ -137,7 +137,7 @@ struct iscsi_portal_group *iscsit_get_tpg_from_np(
 	list_for_each_entry(tpg, &tiqn->tiqn_tpg_list, tpg_list) {
 
 		spin_lock(&tpg->tpg_state_lock);
-		if (tpg->tpg_state == TPG_STATE_FREE) {
+		if (tpg->tpg_state != TPG_STATE_ACTIVE) {
 			spin_unlock(&tpg->tpg_state_lock);
 			continue;
 		}
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/6] iscsi/iser-target: Use list_del_init for ->i_conn_node
  2014-03-04  0:00 [PATCH 0/6] iser-target: Fix active I/O shutdown related issues Nicholas A. Bellinger
       [not found] ` <1393891265-22910-1-git-send-email-nab-PEzghdH756F8UrSeD/g0lQ@public.gmane.org>
@ 2014-03-04  0:01 ` Nicholas A. Bellinger
  2014-03-04  0:01 ` [PATCH 3/6] iscsi/iser-target: Fix isert_conn->state hung shutdown issues Nicholas A. Bellinger
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2014-03-04  0:01 UTC (permalink / raw
  To: target-devel
  Cc: linux-rdma, linux-scsi, Or Gerlitz, Sagi Grimberg,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

There are a handful of uses of list_empty() for cmd->i_conn_node
within iser-target code that expect to return false once a cmd
has been removed from the per connect list.

This patch changes all uses of list_del -> list_del_init in order
to ensure that list_empty() returns false as expected.

Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c  |    6 +++---
 drivers/target/iscsi/iscsi_target.c      |    6 +++---
 drivers/target/iscsi/iscsi_target_erl2.c |   16 ++++++++--------
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 63bcf69..05a575e 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1451,7 +1451,7 @@ isert_put_cmd(struct isert_cmd *isert_cmd)
 	case ISCSI_OP_SCSI_CMD:
 		spin_lock_bh(&conn->cmd_lock);
 		if (!list_empty(&cmd->i_conn_node))
-			list_del(&cmd->i_conn_node);
+			list_del_init(&cmd->i_conn_node);
 		spin_unlock_bh(&conn->cmd_lock);
 
 		if (cmd->data_direction == DMA_TO_DEVICE)
@@ -1463,7 +1463,7 @@ isert_put_cmd(struct isert_cmd *isert_cmd)
 	case ISCSI_OP_SCSI_TMFUNC:
 		spin_lock_bh(&conn->cmd_lock);
 		if (!list_empty(&cmd->i_conn_node))
-			list_del(&cmd->i_conn_node);
+			list_del_init(&cmd->i_conn_node);
 		spin_unlock_bh(&conn->cmd_lock);
 
 		transport_generic_free_cmd(&cmd->se_cmd, 0);
@@ -1473,7 +1473,7 @@ isert_put_cmd(struct isert_cmd *isert_cmd)
 	case ISCSI_OP_TEXT:
 		spin_lock_bh(&conn->cmd_lock);
 		if (!list_empty(&cmd->i_conn_node))
-			list_del(&cmd->i_conn_node);
+			list_del_init(&cmd->i_conn_node);
 		spin_unlock_bh(&conn->cmd_lock);
 
 		/*
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index a99637e..72f1576 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -784,7 +784,7 @@ static void iscsit_ack_from_expstatsn(struct iscsi_conn *conn, u32 exp_statsn)
 	spin_unlock_bh(&conn->cmd_lock);
 
 	list_for_each_entry_safe(cmd, cmd_p, &ack_list, i_conn_node) {
-		list_del(&cmd->i_conn_node);
+		list_del_init(&cmd->i_conn_node);
 		iscsit_free_cmd(cmd, false);
 	}
 }
@@ -3709,7 +3709,7 @@ iscsit_immediate_queue(struct iscsi_conn *conn, struct iscsi_cmd *cmd, int state
 		break;
 	case ISTATE_REMOVE:
 		spin_lock_bh(&conn->cmd_lock);
-		list_del(&cmd->i_conn_node);
+		list_del_init(&cmd->i_conn_node);
 		spin_unlock_bh(&conn->cmd_lock);
 
 		iscsit_free_cmd(cmd, false);
@@ -4152,7 +4152,7 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
 	spin_lock_bh(&conn->cmd_lock);
 	list_for_each_entry_safe(cmd, cmd_tmp, &conn->conn_cmd_list, i_conn_node) {
 
-		list_del(&cmd->i_conn_node);
+		list_del_init(&cmd->i_conn_node);
 		spin_unlock_bh(&conn->cmd_lock);
 
 		iscsit_increment_maxcmdsn(cmd, sess);
diff --git a/drivers/target/iscsi/iscsi_target_erl2.c b/drivers/target/iscsi/iscsi_target_erl2.c
index 33be1fb..4ca8fd2 100644
--- a/drivers/target/iscsi/iscsi_target_erl2.c
+++ b/drivers/target/iscsi/iscsi_target_erl2.c
@@ -138,7 +138,7 @@ void iscsit_free_connection_recovery_entires(struct iscsi_session *sess)
 		list_for_each_entry_safe(cmd, cmd_tmp,
 				&cr->conn_recovery_cmd_list, i_conn_node) {
 
-			list_del(&cmd->i_conn_node);
+			list_del_init(&cmd->i_conn_node);
 			cmd->conn = NULL;
 			spin_unlock(&cr->conn_recovery_cmd_lock);
 			iscsit_free_cmd(cmd, true);
@@ -160,7 +160,7 @@ void iscsit_free_connection_recovery_entires(struct iscsi_session *sess)
 		list_for_each_entry_safe(cmd, cmd_tmp,
 				&cr->conn_recovery_cmd_list, i_conn_node) {
 
-			list_del(&cmd->i_conn_node);
+			list_del_init(&cmd->i_conn_node);
 			cmd->conn = NULL;
 			spin_unlock(&cr->conn_recovery_cmd_lock);
 			iscsit_free_cmd(cmd, true);
@@ -216,7 +216,7 @@ int iscsit_remove_cmd_from_connection_recovery(
 	}
 	cr = cmd->cr;
 
-	list_del(&cmd->i_conn_node);
+	list_del_init(&cmd->i_conn_node);
 	return --cr->cmd_count;
 }
 
@@ -297,7 +297,7 @@ int iscsit_discard_unacknowledged_ooo_cmdsns_for_conn(struct iscsi_conn *conn)
 		if (!(cmd->cmd_flags & ICF_OOO_CMDSN))
 			continue;
 
-		list_del(&cmd->i_conn_node);
+		list_del_init(&cmd->i_conn_node);
 
 		spin_unlock_bh(&conn->cmd_lock);
 		iscsit_free_cmd(cmd, true);
@@ -335,7 +335,7 @@ int iscsit_prepare_cmds_for_realligance(struct iscsi_conn *conn)
 	/*
 	 * Only perform connection recovery on ISCSI_OP_SCSI_CMD or
 	 * ISCSI_OP_NOOP_OUT opcodes.  For all other opcodes call
-	 * list_del(&cmd->i_conn_node); to release the command to the
+	 * list_del_init(&cmd->i_conn_node); to release the command to the
 	 * session pool and remove it from the connection's list.
 	 *
 	 * Also stop the DataOUT timer, which will be restarted after
@@ -351,7 +351,7 @@ int iscsit_prepare_cmds_for_realligance(struct iscsi_conn *conn)
 				" CID: %hu\n", cmd->iscsi_opcode,
 				cmd->init_task_tag, cmd->cmd_sn, conn->cid);
 
-			list_del(&cmd->i_conn_node);
+			list_del_init(&cmd->i_conn_node);
 			spin_unlock_bh(&conn->cmd_lock);
 			iscsit_free_cmd(cmd, true);
 			spin_lock_bh(&conn->cmd_lock);
@@ -371,7 +371,7 @@ int iscsit_prepare_cmds_for_realligance(struct iscsi_conn *conn)
 		 */
 		if (!(cmd->cmd_flags & ICF_OOO_CMDSN) && !cmd->immediate_cmd &&
 		     iscsi_sna_gte(cmd->cmd_sn, conn->sess->exp_cmd_sn)) {
-			list_del(&cmd->i_conn_node);
+			list_del_init(&cmd->i_conn_node);
 			spin_unlock_bh(&conn->cmd_lock);
 			iscsit_free_cmd(cmd, true);
 			spin_lock_bh(&conn->cmd_lock);
@@ -393,7 +393,7 @@ int iscsit_prepare_cmds_for_realligance(struct iscsi_conn *conn)
 
 		cmd->sess = conn->sess;
 
-		list_del(&cmd->i_conn_node);
+		list_del_init(&cmd->i_conn_node);
 		spin_unlock_bh(&conn->cmd_lock);
 
 		iscsit_free_all_datain_reqs(cmd);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/6] iscsi/iser-target: Fix isert_conn->state hung shutdown issues
  2014-03-04  0:00 [PATCH 0/6] iser-target: Fix active I/O shutdown related issues Nicholas A. Bellinger
       [not found] ` <1393891265-22910-1-git-send-email-nab-PEzghdH756F8UrSeD/g0lQ@public.gmane.org>
  2014-03-04  0:01 ` [PATCH 2/6] iscsi/iser-target: Use list_del_init for ->i_conn_node Nicholas A. Bellinger
@ 2014-03-04  0:01 ` Nicholas A. Bellinger
  2014-03-04  0:01 ` [PATCH 4/6] iser-target: Fix post_send_buf_count for RDMA READ/WRITE Nicholas A. Bellinger
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2014-03-04  0:01 UTC (permalink / raw
  To: target-devel
  Cc: linux-rdma, linux-scsi, Or Gerlitz, Sagi Grimberg,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch addresses a couple of different hug shutdown issues
related to wait_event() + isert_conn->state.  First, it changes
isert_conn->conn_wait + isert_conn->conn_wait_comp_err from
waitqueues to completions, and sets ISER_CONN_TERMINATING from
within isert_disconnect_work().

Second, it splits isert_free_conn() into isert_wait_conn() that
is called earlier in iscsit_close_connection() to ensure that
all outstanding commands have completed before continuing.

Finally, it breaks isert_cq_comp_err() into seperate TX / RX
related code, and adds logic in isert_cq_rx_comp_err() to wait
for outstanding commands to complete before setting ISER_CONN_DOWN
and calling complete(&isert_conn->conn_wait_comp_err).

Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c |  106 ++++++++++++++-----------------
 drivers/infiniband/ulp/isert/ib_isert.h |    4 +-
 drivers/target/iscsi/iscsi_target.c     |    4 ++
 include/target/iscsi/iscsi_transport.h  |    1 +
 4 files changed, 55 insertions(+), 60 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 05a575e..8283f5a 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -479,8 +479,8 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
 	isert_conn->state = ISER_CONN_INIT;
 	INIT_LIST_HEAD(&isert_conn->conn_accept_node);
 	init_completion(&isert_conn->conn_login_comp);
-	init_waitqueue_head(&isert_conn->conn_wait);
-	init_waitqueue_head(&isert_conn->conn_wait_comp_err);
+	init_completion(&isert_conn->conn_wait);
+	init_completion(&isert_conn->conn_wait_comp_err);
 	kref_init(&isert_conn->conn_kref);
 	kref_get(&isert_conn->conn_kref);
 	mutex_init(&isert_conn->conn_mutex);
@@ -675,11 +675,11 @@ isert_disconnect_work(struct work_struct *work)
 
 	pr_debug("isert_disconnect_work(): >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n");
 	mutex_lock(&isert_conn->conn_mutex);
-	isert_conn->state = ISER_CONN_DOWN;
+	if (isert_conn->state == ISER_CONN_UP)
+		isert_conn->state = ISER_CONN_TERMINATING;
 
 	if (isert_conn->post_recv_buf_count == 0 &&
 	    atomic_read(&isert_conn->post_send_buf_count) == 0) {
-		pr_debug("Calling wake_up(&isert_conn->conn_wait);\n");
 		mutex_unlock(&isert_conn->conn_mutex);
 		goto wake_up;
 	}
@@ -699,7 +699,7 @@ isert_disconnect_work(struct work_struct *work)
 	mutex_unlock(&isert_conn->conn_mutex);
 
 wake_up:
-	wake_up(&isert_conn->conn_wait);
+	complete(&isert_conn->conn_wait);
 	isert_put_conn(isert_conn);
 }
 
@@ -1576,7 +1576,7 @@ isert_do_control_comp(struct work_struct *work)
 		pr_debug("Calling iscsit_logout_post_handler >>>>>>>>>>>>>>\n");
 		/*
 		 * Call atomic_dec(&isert_conn->post_send_buf_count)
-		 * from isert_free_conn()
+		 * from isert_wait_conn()
 		 */
 		isert_conn->logout_posted = true;
 		iscsit_logout_post_handler(cmd, cmd->conn);
@@ -1678,31 +1678,39 @@ isert_send_completion(struct iser_tx_desc *tx_desc,
 }
 
 static void
-isert_cq_comp_err(struct iser_tx_desc *tx_desc, struct isert_conn *isert_conn)
+isert_cq_tx_comp_err(struct iser_tx_desc *tx_desc, struct isert_conn *isert_conn)
 {
 	struct ib_device *ib_dev = isert_conn->conn_cm_id->device;
+	struct isert_cmd *isert_cmd = tx_desc->isert_cmd;
+
+	if (!isert_cmd)
+		isert_unmap_tx_desc(tx_desc, ib_dev);
+	else
+		isert_completion_put(tx_desc, isert_cmd, ib_dev);
+}
+
+static void
+isert_cq_rx_comp_err(struct isert_conn *isert_conn)
+{
+	struct ib_device *ib_dev = isert_conn->conn_cm_id->device;
+	struct iscsi_conn *conn = isert_conn->conn;
 
-	if (tx_desc) {
-		struct isert_cmd *isert_cmd = tx_desc->isert_cmd;
+	if (isert_conn->post_recv_buf_count)
+		return;
 
-		if (!isert_cmd)
-			isert_unmap_tx_desc(tx_desc, ib_dev);
-		else
-			isert_completion_put(tx_desc, isert_cmd, ib_dev);
+	if (conn->sess) {
+		target_sess_cmd_list_set_waiting(conn->sess->se_sess);
+		target_wait_for_sess_cmds(conn->sess->se_sess);
 	}
 
-	if (isert_conn->post_recv_buf_count == 0 &&
-	    atomic_read(&isert_conn->post_send_buf_count) == 0) {
-		pr_debug("isert_cq_comp_err >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n");
-		pr_debug("Calling wake_up from isert_cq_comp_err\n");
+	while (atomic_read(&isert_conn->post_send_buf_count))
+		msleep(3000);
 
-		mutex_lock(&isert_conn->conn_mutex);
-		if (isert_conn->state != ISER_CONN_DOWN)
-			isert_conn->state = ISER_CONN_TERMINATING;
-		mutex_unlock(&isert_conn->conn_mutex);
+	mutex_lock(&isert_conn->conn_mutex);
+	isert_conn->state = ISER_CONN_DOWN;
+	mutex_unlock(&isert_conn->conn_mutex);
 
-		wake_up(&isert_conn->conn_wait_comp_err);
-	}
+	complete(&isert_conn->conn_wait_comp_err);
 }
 
 static void
@@ -1727,8 +1735,9 @@ isert_cq_tx_work(struct work_struct *work)
 			pr_debug("TX wc.status != IB_WC_SUCCESS >>>>>>>>>>>>>>\n");
 			pr_debug("TX wc.status: 0x%08x\n", wc.status);
 			pr_debug("TX wc.vendor_err: 0x%08x\n", wc.vendor_err);
+
 			atomic_dec(&isert_conn->post_send_buf_count);
-			isert_cq_comp_err(tx_desc, isert_conn);
+			isert_cq_tx_comp_err(tx_desc, isert_conn);
 		}
 	}
 
@@ -1772,7 +1781,7 @@ isert_cq_rx_work(struct work_struct *work)
 					 wc.vendor_err);
 			}
 			isert_conn->post_recv_buf_count--;
-			isert_cq_comp_err(NULL, isert_conn);
+			isert_cq_rx_comp_err(isert_conn);
 		}
 	}
 
@@ -2691,22 +2700,11 @@ isert_free_np(struct iscsi_np *np)
 	kfree(isert_np);
 }
 
-static int isert_check_state(struct isert_conn *isert_conn, int state)
-{
-	int ret;
-
-	mutex_lock(&isert_conn->conn_mutex);
-	ret = (isert_conn->state == state);
-	mutex_unlock(&isert_conn->conn_mutex);
-
-	return ret;
-}
-
-static void isert_free_conn(struct iscsi_conn *conn)
+static void isert_wait_conn(struct iscsi_conn *conn)
 {
 	struct isert_conn *isert_conn = conn->context;
 
-	pr_debug("isert_free_conn: Starting \n");
+	pr_debug("isert_wait_conn: Starting \n");
 	/*
 	 * Decrement post_send_buf_count for special case when called
 	 * from isert_do_control_comp() -> iscsit_logout_post_handler()
@@ -2716,38 +2714,29 @@ static void isert_free_conn(struct iscsi_conn *conn)
 		atomic_dec(&isert_conn->post_send_buf_count);
 
 	if (isert_conn->conn_cm_id && isert_conn->state != ISER_CONN_DOWN) {
-		pr_debug("Calling rdma_disconnect from isert_free_conn\n");
+		pr_debug("Calling rdma_disconnect from isert_wait_conn\n");
 		rdma_disconnect(isert_conn->conn_cm_id);
 	}
 	/*
 	 * Only wait for conn_wait_comp_err if the isert_conn made it
 	 * into full feature phase..
 	 */
-	if (isert_conn->state == ISER_CONN_UP) {
-		pr_debug("isert_free_conn: Before wait_event comp_err %d\n",
-			 isert_conn->state);
-		mutex_unlock(&isert_conn->conn_mutex);
-
-		wait_event(isert_conn->conn_wait_comp_err,
-			  (isert_check_state(isert_conn, ISER_CONN_TERMINATING)));
-
-		wait_event(isert_conn->conn_wait,
-			  (isert_check_state(isert_conn, ISER_CONN_DOWN)));
-
-		isert_put_conn(isert_conn);
-		return;
-	}
 	if (isert_conn->state == ISER_CONN_INIT) {
 		mutex_unlock(&isert_conn->conn_mutex);
-		isert_put_conn(isert_conn);
 		return;
 	}
-	pr_debug("isert_free_conn: wait_event conn_wait %d\n",
-		 isert_conn->state);
+	if (isert_conn->state == ISER_CONN_UP)
+		isert_conn->state = ISER_CONN_TERMINATING;
 	mutex_unlock(&isert_conn->conn_mutex);
 
-	wait_event(isert_conn->conn_wait,
-		  (isert_check_state(isert_conn, ISER_CONN_DOWN)));
+	wait_for_completion(&isert_conn->conn_wait_comp_err);
+
+	wait_for_completion(&isert_conn->conn_wait);
+}
+
+static void isert_free_conn(struct iscsi_conn *conn)
+{
+	struct isert_conn *isert_conn = conn->context;
 
 	isert_put_conn(isert_conn);
 }
@@ -2760,6 +2749,7 @@ static struct iscsit_transport iser_target_transport = {
 	.iscsit_setup_np	= isert_setup_np,
 	.iscsit_accept_np	= isert_accept_np,
 	.iscsit_free_np		= isert_free_np,
+	.iscsit_wait_conn	= isert_wait_conn,
 	.iscsit_free_conn	= isert_free_conn,
 	.iscsit_get_login_rx	= isert_get_login_rx,
 	.iscsit_put_login_tx	= isert_put_login_tx,
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 708a069..41e799f 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -116,8 +116,8 @@ struct isert_conn {
 	struct isert_device	*conn_device;
 	struct work_struct	conn_logout_work;
 	struct mutex		conn_mutex;
-	wait_queue_head_t	conn_wait;
-	wait_queue_head_t	conn_wait_comp_err;
+	struct completion	conn_wait;
+	struct completion	conn_wait_comp_err;
 	struct kref		conn_kref;
 	struct list_head	conn_fr_pool;
 	int			conn_fr_pool_size;
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 72f1576..e927863 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -4197,6 +4197,10 @@ int iscsit_close_connection(
 	iscsit_stop_timers_for_cmds(conn);
 	iscsit_stop_nopin_response_timer(conn);
 	iscsit_stop_nopin_timer(conn);
+
+	if (conn->conn_transport->iscsit_wait_conn)
+		conn->conn_transport->iscsit_wait_conn(conn);
+
 	iscsit_free_queue_reqs_for_conn(conn);
 
 	/*
diff --git a/include/target/iscsi/iscsi_transport.h b/include/target/iscsi/iscsi_transport.h
index ae5a171..4483fad 100644
--- a/include/target/iscsi/iscsi_transport.h
+++ b/include/target/iscsi/iscsi_transport.h
@@ -12,6 +12,7 @@ struct iscsit_transport {
 	int (*iscsit_setup_np)(struct iscsi_np *, struct __kernel_sockaddr_storage *);
 	int (*iscsit_accept_np)(struct iscsi_np *, struct iscsi_conn *);
 	void (*iscsit_free_np)(struct iscsi_np *);
+	void (*iscsit_wait_conn)(struct iscsi_conn *);
 	void (*iscsit_free_conn)(struct iscsi_conn *);
 	int (*iscsit_get_login_rx)(struct iscsi_conn *, struct iscsi_login *);
 	int (*iscsit_put_login_tx)(struct iscsi_conn *, struct iscsi_login *, u32);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] iser-target: Fix post_send_buf_count for RDMA READ/WRITE
  2014-03-04  0:00 [PATCH 0/6] iser-target: Fix active I/O shutdown related issues Nicholas A. Bellinger
                   ` (2 preceding siblings ...)
  2014-03-04  0:01 ` [PATCH 3/6] iscsi/iser-target: Fix isert_conn->state hung shutdown issues Nicholas A. Bellinger
@ 2014-03-04  0:01 ` Nicholas A. Bellinger
  2014-03-04  7:49   ` Or Gerlitz
  2014-03-04  0:01 ` [PATCH 5/6] iser-target: Ignore completions for FRWRs in isert_cq_tx_work Nicholas A. Bellinger
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Nicholas A. Bellinger @ 2014-03-04  0:01 UTC (permalink / raw
  To: target-devel
  Cc: linux-rdma, linux-scsi, Or Gerlitz, Sagi Grimberg,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch fixes the incorrect setting of ->post_send_buf_count
related to RDMA WRITEs + READs where isert_rdma_rw->send_wr_num
was not being taken into account.

This includes incrementing ->post_send_buf_count within
isert_put_datain() + isert_get_dataout(), decrementing within
__isert_send_completion() + isert_response_completion(), and
clearing wr->send_wr_num within isert_completion_rdma_read()

This is necessary because even though IB_SEND_SIGNALED is
not set for RDMA WRITEs + READs, during a QP failure event
the work requests will be returned with exception status
from the TX completion queue.

Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 8283f5a..c9d488f 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1536,6 +1536,7 @@ isert_completion_rdma_read(struct iser_tx_desc *tx_desc,
 	iscsit_stop_dataout_timer(cmd);
 	device->unreg_rdma_mem(isert_cmd, isert_conn);
 	cmd->write_data_done = wr->cur_rdma_length;
+	wr->send_wr_num = 0;
 
 	pr_debug("Cmd: %p RDMA_READ comp calling execute_cmd\n", isert_cmd);
 	spin_lock_bh(&cmd->istate_lock);
@@ -1600,6 +1601,7 @@ isert_response_completion(struct iser_tx_desc *tx_desc,
 			  struct ib_device *ib_dev)
 {
 	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
+	struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
 
 	if (cmd->i_state == ISTATE_SEND_TASKMGTRSP ||
 	    cmd->i_state == ISTATE_SEND_LOGOUTRSP ||
@@ -1611,7 +1613,7 @@ isert_response_completion(struct iser_tx_desc *tx_desc,
 		queue_work(isert_comp_wq, &isert_cmd->comp_work);
 		return;
 	}
-	atomic_dec(&isert_conn->post_send_buf_count);
+	atomic_sub(wr->send_wr_num + 1, &isert_conn->post_send_buf_count);
 
 	cmd->i_state = ISTATE_SENT_STATUS;
 	isert_completion_put(tx_desc, isert_cmd, ib_dev);
@@ -1649,7 +1651,7 @@ __isert_send_completion(struct iser_tx_desc *tx_desc,
 	case ISER_IB_RDMA_READ:
 		pr_debug("isert_send_completion: Got ISER_IB_RDMA_READ:\n");
 
-		atomic_dec(&isert_conn->post_send_buf_count);
+		atomic_sub(wr->send_wr_num, &isert_conn->post_send_buf_count);
 		isert_completion_rdma_read(tx_desc, isert_cmd);
 		break;
 	default:
@@ -2375,12 +2377,12 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 	isert_init_send_wr(isert_conn, isert_cmd,
 			   &isert_cmd->tx_desc.send_wr, true);
 
-	atomic_inc(&isert_conn->post_send_buf_count);
+	atomic_add(wr->send_wr_num + 1, &isert_conn->post_send_buf_count);
 
 	rc = ib_post_send(isert_conn->conn_qp, wr->send_wr, &wr_failed);
 	if (rc) {
 		pr_warn("ib_post_send() failed for IB_WR_RDMA_WRITE\n");
-		atomic_dec(&isert_conn->post_send_buf_count);
+		atomic_sub(wr->send_wr_num + 1, &isert_conn->post_send_buf_count);
 	}
 	pr_debug("Cmd: %p posted RDMA_WRITE + Response for iSER Data READ\n",
 		 isert_cmd);
@@ -2408,12 +2410,12 @@ isert_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, bool recovery)
 		return rc;
 	}
 
-	atomic_inc(&isert_conn->post_send_buf_count);
+	atomic_add(wr->send_wr_num, &isert_conn->post_send_buf_count);
 
 	rc = ib_post_send(isert_conn->conn_qp, wr->send_wr, &wr_failed);
 	if (rc) {
 		pr_warn("ib_post_send() failed for IB_WR_RDMA_READ\n");
-		atomic_dec(&isert_conn->post_send_buf_count);
+		atomic_sub(wr->send_wr_num, &isert_conn->post_send_buf_count);
 	}
 	pr_debug("Cmd: %p posted RDMA_READ memory for ISER Data WRITE\n",
 		 isert_cmd);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/6] iser-target: Ignore completions for FRWRs in isert_cq_tx_work
  2014-03-04  0:00 [PATCH 0/6] iser-target: Fix active I/O shutdown related issues Nicholas A. Bellinger
                   ` (3 preceding siblings ...)
  2014-03-04  0:01 ` [PATCH 4/6] iser-target: Fix post_send_buf_count for RDMA READ/WRITE Nicholas A. Bellinger
@ 2014-03-04  0:01 ` Nicholas A. Bellinger
  2014-03-04 14:51   ` Sagi Grimberg
  2014-03-04  0:01 ` [PATCH 6/6] iser-target: Fix command leak for tx_desc->comp_llnode_batch Nicholas A. Bellinger
  2014-03-04 15:17 ` [PATCH 0/6] iser-target: Fix active I/O shutdown related issues Sagi Grimberg
  6 siblings, 1 reply; 18+ messages in thread
From: Nicholas A. Bellinger @ 2014-03-04  0:01 UTC (permalink / raw
  To: target-devel
  Cc: linux-rdma, linux-scsi, Or Gerlitz, Sagi Grimberg,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch changes IB_WR_FAST_REG_MR + IB_WR_LOCAL_INV related
work requests to include a ISER_FRWR_LI_WRID value in order to
signal isert_cq_tx_work() that these requests should be ignored.

This is necessary because even though IB_SEND_SIGNALED is not
set for either work request, during a QP failure event the work
requests will be returned with exception status from the TX
completion queue.

Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c |    8 ++++++--
 drivers/infiniband/ulp/isert/ib_isert.h |    1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index c9d488f..003b5d0 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1738,8 +1738,10 @@ isert_cq_tx_work(struct work_struct *work)
 			pr_debug("TX wc.status: 0x%08x\n", wc.status);
 			pr_debug("TX wc.vendor_err: 0x%08x\n", wc.vendor_err);
 
-			atomic_dec(&isert_conn->post_send_buf_count);
-			isert_cq_tx_comp_err(tx_desc, isert_conn);
+			if (wc.wr_id != ISER_FRWR_LI_WRID) {
+				atomic_dec(&isert_conn->post_send_buf_count);
+				isert_cq_tx_comp_err(tx_desc, isert_conn);
+			}
 		}
 	}
 
@@ -2202,6 +2204,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
 
 	if (!fr_desc->valid) {
 		memset(&inv_wr, 0, sizeof(inv_wr));
+		inv_wr.wr_id = ISER_FRWR_LI_WRID;
 		inv_wr.opcode = IB_WR_LOCAL_INV;
 		inv_wr.ex.invalidate_rkey = fr_desc->data_mr->rkey;
 		wr = &inv_wr;
@@ -2212,6 +2215,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
 
 	/* Prepare FASTREG WR */
 	memset(&fr_wr, 0, sizeof(fr_wr));
+	fr_wr.wr_id = ISER_FRWR_LI_WRID;
 	fr_wr.opcode = IB_WR_FAST_REG_MR;
 	fr_wr.wr.fast_reg.iova_start =
 		fr_desc->data_frpl->page_list[0] + page_off;
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 41e799f..599b4e2 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -6,6 +6,7 @@
 
 #define ISERT_RDMA_LISTEN_BACKLOG	10
 #define ISCSI_ISER_SG_TABLESIZE		256
+#define ISER_FRWR_LI_WRID		0xffffffffffffffffULL
 
 enum isert_desc_type {
 	ISCSI_TX_CONTROL,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/6] iser-target: Fix command leak for tx_desc->comp_llnode_batch
  2014-03-04  0:00 [PATCH 0/6] iser-target: Fix active I/O shutdown related issues Nicholas A. Bellinger
                   ` (4 preceding siblings ...)
  2014-03-04  0:01 ` [PATCH 5/6] iser-target: Ignore completions for FRWRs in isert_cq_tx_work Nicholas A. Bellinger
@ 2014-03-04  0:01 ` Nicholas A. Bellinger
  2014-03-04 15:17 ` [PATCH 0/6] iser-target: Fix active I/O shutdown related issues Sagi Grimberg
  6 siblings, 0 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2014-03-04  0:01 UTC (permalink / raw
  To: target-devel
  Cc: linux-rdma, linux-scsi, Or Gerlitz, Sagi Grimberg,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch addresses a number of active I/O shutdown issues
related to isert_cmd descriptors being leaked that are part
of a completion interrupt coalescing batch.

This includes adding logic in isert_cq_tx_comp_err() to
drain any associated tx_desc->comp_llnode_batch, as well
as isert_cq_drain_comp_llist() to drain any associated
isert_conn->conn_comp_llist.

Also, set tx_desc->llnode_active in isert_init_send_wr()
in order to determine when work requests need to be skipped
in isert_cq_tx_work() exception path code.

Finally, update isert_init_send_wr() to only allow interrupt
coalescing when ISER_CONN_UP.

Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c |   50 +++++++++++++++++++++++++++----
 drivers/infiniband/ulp/isert/ib_isert.h |    2 +-
 2 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 003b5d0..fdd3042 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -484,7 +484,6 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
 	kref_init(&isert_conn->conn_kref);
 	kref_get(&isert_conn->conn_kref);
 	mutex_init(&isert_conn->conn_mutex);
-	mutex_init(&isert_conn->conn_comp_mutex);
 	spin_lock_init(&isert_conn->conn_lock);
 
 	cma_id->context = isert_conn;
@@ -875,16 +874,17 @@ isert_init_send_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
 	 * Coalesce send completion interrupts by only setting IB_SEND_SIGNALED
 	 * bit for every ISERT_COMP_BATCH_COUNT number of ib_post_send() calls.
 	 */
-	mutex_lock(&isert_conn->conn_comp_mutex);
-	if (coalesce &&
+	mutex_lock(&isert_conn->conn_mutex);
+	if (coalesce && isert_conn->state == ISER_CONN_UP &&
 	    ++isert_conn->conn_comp_batch < ISERT_COMP_BATCH_COUNT) {
+		tx_desc->llnode_active = true;
 		llist_add(&tx_desc->comp_llnode, &isert_conn->conn_comp_llist);
-		mutex_unlock(&isert_conn->conn_comp_mutex);
+		mutex_unlock(&isert_conn->conn_mutex);
 		return;
 	}
 	isert_conn->conn_comp_batch = 0;
 	tx_desc->comp_llnode_batch = llist_del_all(&isert_conn->conn_comp_llist);
-	mutex_unlock(&isert_conn->conn_comp_mutex);
+	mutex_unlock(&isert_conn->conn_mutex);
 
 	send_wr->send_flags = IB_SEND_SIGNALED;
 }
@@ -1680,10 +1680,45 @@ isert_send_completion(struct iser_tx_desc *tx_desc,
 }
 
 static void
+isert_cq_drain_comp_llist(struct isert_conn *isert_conn, struct ib_device *ib_dev)
+{
+	struct llist_node *llnode;
+	struct isert_rdma_wr *wr;
+	struct iser_tx_desc *t;
+
+	mutex_lock(&isert_conn->conn_mutex);
+	llnode = llist_del_all(&isert_conn->conn_comp_llist);
+	isert_conn->conn_comp_batch = 0;
+	mutex_unlock(&isert_conn->conn_mutex);
+
+	while (llnode) {
+		t = llist_entry(llnode, struct iser_tx_desc, comp_llnode);
+		llnode = llist_next(llnode);
+		wr = &t->isert_cmd->rdma_wr;
+
+		atomic_sub(wr->send_wr_num + 1, &isert_conn->post_send_buf_count);
+		isert_completion_put(t, t->isert_cmd, ib_dev);
+	}
+}
+
+static void
 isert_cq_tx_comp_err(struct iser_tx_desc *tx_desc, struct isert_conn *isert_conn)
 {
 	struct ib_device *ib_dev = isert_conn->conn_cm_id->device;
 	struct isert_cmd *isert_cmd = tx_desc->isert_cmd;
+	struct llist_node *llnode = tx_desc->comp_llnode_batch;
+	struct isert_rdma_wr *wr;
+	struct iser_tx_desc *t;
+
+	while (llnode) {
+		t = llist_entry(llnode, struct iser_tx_desc, comp_llnode);
+		llnode = llist_next(llnode);
+		wr = &t->isert_cmd->rdma_wr;
+
+		atomic_sub(wr->send_wr_num + 1, &isert_conn->post_send_buf_count);
+		isert_completion_put(t, t->isert_cmd, ib_dev);
+	}
+	tx_desc->comp_llnode_batch = NULL;
 
 	if (!isert_cmd)
 		isert_unmap_tx_desc(tx_desc, ib_dev);
@@ -1700,6 +1735,8 @@ isert_cq_rx_comp_err(struct isert_conn *isert_conn)
 	if (isert_conn->post_recv_buf_count)
 		return;
 
+	isert_cq_drain_comp_llist(isert_conn, ib_dev);
+
 	if (conn->sess) {
 		target_sess_cmd_list_set_waiting(conn->sess->se_sess);
 		target_wait_for_sess_cmds(conn->sess->se_sess);
@@ -1739,6 +1776,9 @@ isert_cq_tx_work(struct work_struct *work)
 			pr_debug("TX wc.vendor_err: 0x%08x\n", wc.vendor_err);
 
 			if (wc.wr_id != ISER_FRWR_LI_WRID) {
+				if (tx_desc->llnode_active)
+					continue;
+
 				atomic_dec(&isert_conn->post_send_buf_count);
 				isert_cq_tx_comp_err(tx_desc, isert_conn);
 			}
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 599b4e2..6059ae1 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -46,6 +46,7 @@ struct iser_tx_desc {
 	struct isert_cmd *isert_cmd;
 	struct llist_node *comp_llnode_batch;
 	struct llist_node comp_llnode;
+	bool		llnode_active;
 	struct ib_send_wr send_wr;
 } __packed;
 
@@ -127,7 +128,6 @@ struct isert_conn {
 #define ISERT_COMP_BATCH_COUNT	8
 	int			conn_comp_batch;
 	struct llist_head	conn_comp_llist;
-	struct mutex		conn_comp_mutex;
 };
 
 #define ISERT_MAX_CQ 64
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/6] iser-target: Fix post_send_buf_count for RDMA READ/WRITE
  2014-03-04  0:01 ` [PATCH 4/6] iser-target: Fix post_send_buf_count for RDMA READ/WRITE Nicholas A. Bellinger
@ 2014-03-04  7:49   ` Or Gerlitz
  2014-03-04  9:21     ` Sagi Grimberg
  0 siblings, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2014-03-04  7:49 UTC (permalink / raw
  To: Nicholas A. Bellinger, target-devel
  Cc: linux-rdma, linux-scsi, Sagi Grimberg, Nicholas Bellinger

On 04/03/2014 02:01, Nicholas A. Bellinger wrote:
> This is necessary because even though IB_SEND_SIGNALED is
> not set for RDMA WRITEs + READs, during a QP failure event
> the work requests will be returned with exception status
> from the TX completion queue.

Impossible... for rdma reads we must ask for completing, since we should 
write the data
for the back-end, I assume it's just wrong mentioning of rdma-read here, 
right?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/6] iser-target: Fix post_send_buf_count for RDMA READ/WRITE
  2014-03-04  7:49   ` Or Gerlitz
@ 2014-03-04  9:21     ` Sagi Grimberg
  0 siblings, 0 replies; 18+ messages in thread
From: Sagi Grimberg @ 2014-03-04  9:21 UTC (permalink / raw
  To: Or Gerlitz, Nicholas A. Bellinger, target-devel
  Cc: linux-rdma, linux-scsi, Sagi Grimberg, Nicholas Bellinger

On 3/4/2014 9:49 AM, Or Gerlitz wrote:
> On 04/03/2014 02:01, Nicholas A. Bellinger wrote:
>> This is necessary because even though IB_SEND_SIGNALED is
>> not set for RDMA WRITEs + READs, during a QP failure event
>> the work requests will be returned with exception status
>> from the TX completion queue.
>
> Impossible... for rdma reads we must ask for completing, since we 
> should write the data
> for the back-end, I assume it's just wrong mentioning of rdma-read 
> here, right?
>

No, In case of multiple RDMA READs (for non-fastreg case) isert asks for 
completion only on the last one.

This is fine.

Sagi.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] iser-target: Ignore completions for FRWRs in isert_cq_tx_work
  2014-03-04  0:01 ` [PATCH 5/6] iser-target: Ignore completions for FRWRs in isert_cq_tx_work Nicholas A. Bellinger
@ 2014-03-04 14:51   ` Sagi Grimberg
       [not found]     ` <5315E85F.2090904-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Sagi Grimberg @ 2014-03-04 14:51 UTC (permalink / raw
  To: Nicholas A. Bellinger, target-devel
  Cc: linux-rdma, linux-scsi, Or Gerlitz, Sagi Grimberg,
	Nicholas Bellinger

On 3/4/2014 2:01 AM, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
>
> This patch changes IB_WR_FAST_REG_MR + IB_WR_LOCAL_INV related
> work requests to include a ISER_FRWR_LI_WRID value in order to
> signal isert_cq_tx_work() that these requests should be ignored.
>
> This is necessary because even though IB_SEND_SIGNALED is not
> set for either work request, during a QP failure event the work
> requests will be returned with exception status from the TX
> completion queue.
>
> Cc: Sagi Grimberg <sagig@mellanox.com>
> Cc: Or Gerlitz <ogerlitz@mellanox.com>
> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> ---
>   drivers/infiniband/ulp/isert/ib_isert.c |    8 ++++++--
>   drivers/infiniband/ulp/isert/ib_isert.h |    1 +
>   2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> index c9d488f..003b5d0 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -1738,8 +1738,10 @@ isert_cq_tx_work(struct work_struct *work)
>   			pr_debug("TX wc.status: 0x%08x\n", wc.status);
>   			pr_debug("TX wc.vendor_err: 0x%08x\n", wc.vendor_err);
>   
> -			atomic_dec(&isert_conn->post_send_buf_count);
> -			isert_cq_tx_comp_err(tx_desc, isert_conn);
> +			if (wc.wr_id != ISER_FRWR_LI_WRID) {

Better to use ISER_FASTREG_LI_WRID - I changed it in the initiator.

> +				atomic_dec(&isert_conn->post_send_buf_count);
> +				isert_cq_tx_comp_err(tx_desc, isert_conn);
> +			}
>   		}
>   	}
>   
> @@ -2202,6 +2204,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
>   
>   	if (!fr_desc->valid) {
>   		memset(&inv_wr, 0, sizeof(inv_wr));
> +		inv_wr.wr_id = ISER_FRWR_LI_WRID;
>   		inv_wr.opcode = IB_WR_LOCAL_INV;
>   		inv_wr.ex.invalidate_rkey = fr_desc->data_mr->rkey;
>   		wr = &inv_wr;
> @@ -2212,6 +2215,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
>   
>   	/* Prepare FASTREG WR */
>   	memset(&fr_wr, 0, sizeof(fr_wr));
> +	fr_wr.wr_id = ISER_FRWR_LI_WRID;
>   	fr_wr.opcode = IB_WR_FAST_REG_MR;
>   	fr_wr.wr.fast_reg.iova_start =
>   		fr_desc->data_frpl->page_list[0] + page_off;
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
> index 41e799f..599b4e2 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.h
> +++ b/drivers/infiniband/ulp/isert/ib_isert.h
> @@ -6,6 +6,7 @@
>   
>   #define ISERT_RDMA_LISTEN_BACKLOG	10
>   #define ISCSI_ISER_SG_TABLESIZE		256
> +#define ISER_FRWR_LI_WRID		0xffffffffffffffffULL
>   
>   enum isert_desc_type {
>   	ISCSI_TX_CONTROL,


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] iser-target: Fix active I/O shutdown related issues
  2014-03-04  0:00 [PATCH 0/6] iser-target: Fix active I/O shutdown related issues Nicholas A. Bellinger
                   ` (5 preceding siblings ...)
  2014-03-04  0:01 ` [PATCH 6/6] iser-target: Fix command leak for tx_desc->comp_llnode_batch Nicholas A. Bellinger
@ 2014-03-04 15:17 ` Sagi Grimberg
       [not found]   ` <5315EE7C.3030806-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  6 siblings, 1 reply; 18+ messages in thread
From: Sagi Grimberg @ 2014-03-04 15:17 UTC (permalink / raw
  To: Nicholas A. Bellinger, target-devel
  Cc: linux-rdma, linux-scsi, Or Gerlitz, Sagi Grimberg,
	Nicholas Bellinger

On 3/4/2014 2:00 AM, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
>
> Hi Or & Sagi,
>
> This series addresses a number of active I/O shutdown related issues
> in iser-target code that have come up recently during stress testing.
>
> Note there is still a seperate iser-target network portal shutdown
> bug being tracked down, but this series addresses all existing issues
> related to active I/O session shutdown.
>
> The patch breakdown looks like:
>
> Patch #1 fixes a long-standing bug where TPGs in shutdown incorrectly
> could be referenced by new login attempts.
>
> Patch #2 converts list_del -> list_del_init for iscsi_cmd->i_conn_node
> so that list_empty works correctly.
>
> Patch #3 addresses isert_conn->state related bugs resulting in hung
> shutdown, and splits isert_free_conn() into seperate code that is
> called earlier during shutdown to ensure that all outstanding I/O
> has completed.
>
> Patch #4 fixes incorrect accounting of ->post_send_buf_count during
> active I/O shutdown with outstanding RDMA WRITE + RDMA READ work
> requests.
>
> Patch #5 addresses a bug related to active I/O shutdown with
> outstanding FRMR work requests.  Note this patch is specific to
> v3.12+ code.
>
> Patch #6 addresses bugs related to active I/O shutdown with
> outstanding completion interrupt coalescing batches. Note this patch
> is specific to v3.13+ code.
>
> Please review.

Hey Nic,

So besides a minor comment, you have my Ack on this set.

More on cleanup flow. isert_cma_handler does not handle 
RDMA_CM_EVENT_TIMEWAIT_EXIT.
To be more specific, according to IB spec, when initiating disconnect 
(rdma_disconnect/ib_send_cm_dreq),
one should not destroy a used qp until getting TIMEWAIT_EXIT CM event. 
We are working on this in iSER initiator.
It might lead to "stale connection" CM rejects on future connections 
(SRP also does not do that).

Sagi.

> --nab
>
> Nicholas Bellinger (6):
>    iscsi-target: Fix iscsit_get_tpg_from_np tpg_state bug
>    iscsi/iser-target: Use list_del_init for ->i_conn_node
>    iscsi/iser-target: Fix isert_conn->state hung shutdown issues
>    iser-target: Fix post_send_buf_count for RDMA READ/WRITE
>    iser-target: Ignore completions for FRWRs in isert_cq_tx_work
>    iser-target: Fix command leak for tx_desc->comp_llnode_batch
>
>   drivers/infiniband/ulp/isert/ib_isert.c  |  180 ++++++++++++++++++------------
>   drivers/infiniband/ulp/isert/ib_isert.h  |    7 +-
>   drivers/target/iscsi/iscsi_target.c      |   10 +-
>   drivers/target/iscsi/iscsi_target_erl2.c |   16 +--
>   drivers/target/iscsi/iscsi_target_tpg.c  |    2 +-
>   include/target/iscsi/iscsi_transport.h   |    1 +
>   6 files changed, 129 insertions(+), 87 deletions(-)
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] iser-target: Ignore completions for FRWRs in isert_cq_tx_work
       [not found]     ` <5315E85F.2090904-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2014-03-04 23:56       ` Nicholas A. Bellinger
  0 siblings, 0 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2014-03-04 23:56 UTC (permalink / raw
  To: Sagi Grimberg
  Cc: Nicholas A. Bellinger, target-devel, linux-rdma, linux-scsi,
	Or Gerlitz, Sagi Grimberg

On Tue, 2014-03-04 at 16:51 +0200, Sagi Grimberg wrote:
> On 3/4/2014 2:01 AM, Nicholas A. Bellinger wrote:
> > From: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
> >
> > This patch changes IB_WR_FAST_REG_MR + IB_WR_LOCAL_INV related
> > work requests to include a ISER_FRWR_LI_WRID value in order to
> > signal isert_cq_tx_work() that these requests should be ignored.
> >
> > This is necessary because even though IB_SEND_SIGNALED is not
> > set for either work request, during a QP failure event the work
> > requests will be returned with exception status from the TX
> > completion queue.
> >
> > Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Cc: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Signed-off-by: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
> > ---
> >   drivers/infiniband/ulp/isert/ib_isert.c |    8 ++++++--
> >   drivers/infiniband/ulp/isert/ib_isert.h |    1 +
> >   2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> > index c9d488f..003b5d0 100644
> > --- a/drivers/infiniband/ulp/isert/ib_isert.c
> > +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> > @@ -1738,8 +1738,10 @@ isert_cq_tx_work(struct work_struct *work)
> >   			pr_debug("TX wc.status: 0x%08x\n", wc.status);
> >   			pr_debug("TX wc.vendor_err: 0x%08x\n", wc.vendor_err);
> >   
> > -			atomic_dec(&isert_conn->post_send_buf_count);
> > -			isert_cq_tx_comp_err(tx_desc, isert_conn);
> > +			if (wc.wr_id != ISER_FRWR_LI_WRID) {
> 
> Better to use ISER_FASTREG_LI_WRID - I changed it in the initiator.

<nod>, updating that bit now..

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] iser-target: Fix active I/O shutdown related issues
       [not found]   ` <5315EE7C.3030806-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2014-03-05  0:06     ` Nicholas A. Bellinger
  2014-03-05 12:12       ` Sagi Grimberg
  0 siblings, 1 reply; 18+ messages in thread
From: Nicholas A. Bellinger @ 2014-03-05  0:06 UTC (permalink / raw
  To: Sagi Grimberg
  Cc: Nicholas A. Bellinger, target-devel, linux-rdma, linux-scsi,
	Or Gerlitz, Sagi Grimberg

On Tue, 2014-03-04 at 17:17 +0200, Sagi Grimberg wrote:
> On 3/4/2014 2:00 AM, Nicholas A. Bellinger wrote:
> > From: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
> >
> > Hi Or & Sagi,
> >
> > This series addresses a number of active I/O shutdown related issues
> > in iser-target code that have come up recently during stress testing.
> >
> > Note there is still a seperate iser-target network portal shutdown
> > bug being tracked down, but this series addresses all existing issues
> > related to active I/O session shutdown.
> >
> > The patch breakdown looks like:
> >
> > Patch #1 fixes a long-standing bug where TPGs in shutdown incorrectly
> > could be referenced by new login attempts.
> >
> > Patch #2 converts list_del -> list_del_init for iscsi_cmd->i_conn_node
> > so that list_empty works correctly.
> >
> > Patch #3 addresses isert_conn->state related bugs resulting in hung
> > shutdown, and splits isert_free_conn() into seperate code that is
> > called earlier during shutdown to ensure that all outstanding I/O
> > has completed.
> >
> > Patch #4 fixes incorrect accounting of ->post_send_buf_count during
> > active I/O shutdown with outstanding RDMA WRITE + RDMA READ work
> > requests.
> >
> > Patch #5 addresses a bug related to active I/O shutdown with
> > outstanding FRMR work requests.  Note this patch is specific to
> > v3.12+ code.
> >
> > Patch #6 addresses bugs related to active I/O shutdown with
> > outstanding completion interrupt coalescing batches. Note this patch
> > is specific to v3.13+ code.
> >
> > Please review.
> 
> Hey Nic,
> 
> So besides a minor comment, you have my Ack on this set.
> 

Thanks!

> More on cleanup flow. isert_cma_handler does not handle 
> RDMA_CM_EVENT_TIMEWAIT_EXIT.
> To be more specific, according to IB spec, when initiating disconnect 
> (rdma_disconnect/ib_send_cm_dreq),
> one should not destroy a used qp until getting TIMEWAIT_EXIT CM event. 
> We are working on this in iSER initiator.
> It might lead to "stale connection" CM rejects on future connections 
> (SRP also does not do that).
> 

<nod>, I noticed that as well during recent debugging.

However, AFAICT the RDMA_CM_EVENT_TIMEWAIT_EVENT doesn't (always) occur
on the target side after a RDMA_CM_EVENT_DISCONNECTED, and thus far I've
not been able to ascertain what's different about the shutdown sequence
that would make this happen, or not happen..

Any ideas..?

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] iser-target: Fix active I/O shutdown related issues
  2014-03-05  0:06     ` Nicholas A. Bellinger
@ 2014-03-05 12:12       ` Sagi Grimberg
  2014-03-05 22:04         ` Nicholas A. Bellinger
  0 siblings, 1 reply; 18+ messages in thread
From: Sagi Grimberg @ 2014-03-05 12:12 UTC (permalink / raw
  To: Nicholas A. Bellinger
  Cc: Nicholas A. Bellinger, target-devel, linux-rdma, linux-scsi,
	Or Gerlitz, Sagi Grimberg

On 3/5/2014 2:06 AM, Nicholas A. Bellinger wrote:
> On Tue, 2014-03-04 at 17:17 +0200, Sagi Grimberg wrote:
>> On 3/4/2014 2:00 AM, Nicholas A. Bellinger wrote:
>>> From: Nicholas Bellinger <nab@linux-iscsi.org>
>>>
>>> Hi Or & Sagi,
>>>
>>> This series addresses a number of active I/O shutdown related issues
>>> in iser-target code that have come up recently during stress testing.
>>>
>>> Note there is still a seperate iser-target network portal shutdown
>>> bug being tracked down, but this series addresses all existing issues
>>> related to active I/O session shutdown.
>>>
>>> The patch breakdown looks like:
>>>
>>> Patch #1 fixes a long-standing bug where TPGs in shutdown incorrectly
>>> could be referenced by new login attempts.
>>>
>>> Patch #2 converts list_del -> list_del_init for iscsi_cmd->i_conn_node
>>> so that list_empty works correctly.
>>>
>>> Patch #3 addresses isert_conn->state related bugs resulting in hung
>>> shutdown, and splits isert_free_conn() into seperate code that is
>>> called earlier during shutdown to ensure that all outstanding I/O
>>> has completed.
>>>
>>> Patch #4 fixes incorrect accounting of ->post_send_buf_count during
>>> active I/O shutdown with outstanding RDMA WRITE + RDMA READ work
>>> requests.
>>>
>>> Patch #5 addresses a bug related to active I/O shutdown with
>>> outstanding FRMR work requests.  Note this patch is specific to
>>> v3.12+ code.
>>>
>>> Patch #6 addresses bugs related to active I/O shutdown with
>>> outstanding completion interrupt coalescing batches. Note this patch
>>> is specific to v3.13+ code.
>>>
>>> Please review.
>> Hey Nic,
>>
>> So besides a minor comment, you have my Ack on this set.
>>
> Thanks!
>
>> More on cleanup flow. isert_cma_handler does not handle
>> RDMA_CM_EVENT_TIMEWAIT_EXIT.
>> To be more specific, according to IB spec, when initiating disconnect
>> (rdma_disconnect/ib_send_cm_dreq),
>> one should not destroy a used qp until getting TIMEWAIT_EXIT CM event.
>> We are working on this in iSER initiator.
>> It might lead to "stale connection" CM rejects on future connections
>> (SRP also does not do that).
>>
> <nod>, I noticed that as well during recent debugging.
>
> However, AFAICT the RDMA_CM_EVENT_TIMEWAIT_EVENT doesn't (always) occur
> on the target side after a RDMA_CM_EVENT_DISCONNECTED, and thus far I've
> not been able to ascertain what's different about the shutdown sequence
> that would make this happen, or not happen..
>
> Any ideas..?

That's probably because the cm_id is destroyed before you get the event. 
There is a specific
timout computation to get this event (see IB spec). If you will attempt 
to disconnect while
the link is down (initiator won't receive it and send you disconnect 
back), you should be able
to see this event. As I understand, in order to comply the spec, the QP 
(and the cm_id afterwards)
should be destroyed only when getting this event and not before.

Sagi.

> --nab
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] iser-target: Fix active I/O shutdown related issues
  2014-03-05 12:12       ` Sagi Grimberg
@ 2014-03-05 22:04         ` Nicholas A. Bellinger
       [not found]           ` <1394057083.20601.51.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Nicholas A. Bellinger @ 2014-03-05 22:04 UTC (permalink / raw
  To: Sagi Grimberg
  Cc: Nicholas A. Bellinger, target-devel, linux-rdma, linux-scsi,
	Or Gerlitz, Sagi Grimberg

On Wed, 2014-03-05 at 14:12 +0200, Sagi Grimberg wrote:
> On 3/5/2014 2:06 AM, Nicholas A. Bellinger wrote:
> > On Tue, 2014-03-04 at 17:17 +0200, Sagi Grimberg wrote:
> >> On 3/4/2014 2:00 AM, Nicholas A. Bellinger wrote:
> >>> From: Nicholas Bellinger <nab@linux-iscsi.org>
> >>>

<SNIP>

> >> More on cleanup flow. isert_cma_handler does not handle
> >> RDMA_CM_EVENT_TIMEWAIT_EXIT.
> >> To be more specific, according to IB spec, when initiating disconnect
> >> (rdma_disconnect/ib_send_cm_dreq),
> >> one should not destroy a used qp until getting TIMEWAIT_EXIT CM event.
> >> We are working on this in iSER initiator.
> >> It might lead to "stale connection" CM rejects on future connections
> >> (SRP also does not do that).
> >>
> > <nod>, I noticed that as well during recent debugging.
> >
> > However, AFAICT the RDMA_CM_EVENT_TIMEWAIT_EVENT doesn't (always) occur
> > on the target side after a RDMA_CM_EVENT_DISCONNECTED, and thus far I've
> > not been able to ascertain what's different about the shutdown sequence
> > that would make this happen, or not happen..
> >
> > Any ideas..?
> 
> That's probably because the cm_id is destroyed before you get the event. 
> There is a specific
> timout computation to get this event (see IB spec). If you will attempt 
> to disconnect while
> the link is down (initiator won't receive it and send you disconnect 
> back), you should be able
> to see this event. As I understand, in order to comply the spec, the QP 
> (and the cm_id afterwards)
> should be destroyed only when getting this event and not before.
> 

<nod>, thanks for the additional background.

So currently rdma_destroy_qp() + rdma_destroy_id() is being done via
isert_connect_release(), which occurs after the final isert_put_conn()
happens from either the RDMA_CM_EVENT_DISCONNECTED handler, or within
isert_free_conn() in one of the per connection kernel thread contexts
via iscsit_close_connection().

If I understand the above correctly, the isert_put_conn() should move
from the RDMA_CM_EVENT_DISCONNECTED handler into the TIMEWAIT_EVENT
handler, yes..?

And it's safe to assume that DISCONNECTED will always occur before
TIMEWAIT_EVENT, right..?

--nab

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] iser-target: Fix active I/O shutdown related issues
       [not found]           ` <1394057083.20601.51.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
@ 2014-03-06 14:05             ` sagi grimberg
  2014-03-10 22:00               ` Nicholas A. Bellinger
  0 siblings, 1 reply; 18+ messages in thread
From: sagi grimberg @ 2014-03-06 14:05 UTC (permalink / raw
  To: Nicholas A. Bellinger, Sagi Grimberg
  Cc: Nicholas A. Bellinger, target-devel, linux-rdma, linux-scsi,
	Or Gerlitz

On 3/6/2014 12:04 AM, Nicholas A. Bellinger wrote:
> On Wed, 2014-03-05 at 14:12 +0200, Sagi Grimberg wrote:
>> On 3/5/2014 2:06 AM, Nicholas A. Bellinger wrote:
>>> On Tue, 2014-03-04 at 17:17 +0200, Sagi Grimberg wrote:
>>>> On 3/4/2014 2:00 AM, Nicholas A. Bellinger wrote:
>>>>> From: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
>>>>>
> <SNIP>
>
>>>> More on cleanup flow. isert_cma_handler does not handle
>>>> RDMA_CM_EVENT_TIMEWAIT_EXIT.
>>>> To be more specific, according to IB spec, when initiating disconnect
>>>> (rdma_disconnect/ib_send_cm_dreq),
>>>> one should not destroy a used qp until getting TIMEWAIT_EXIT CM event.
>>>> We are working on this in iSER initiator.
>>>> It might lead to "stale connection" CM rejects on future connections
>>>> (SRP also does not do that).
>>>>
>>> <nod>, I noticed that as well during recent debugging.
>>>
>>> However, AFAICT the RDMA_CM_EVENT_TIMEWAIT_EVENT doesn't (always) occur
>>> on the target side after a RDMA_CM_EVENT_DISCONNECTED, and thus far I've
>>> not been able to ascertain what's different about the shutdown sequence
>>> that would make this happen, or not happen..
>>>
>>> Any ideas..?
>> That's probably because the cm_id is destroyed before you get the event.
>> There is a specific
>> timout computation to get this event (see IB spec). If you will attempt
>> to disconnect while
>> the link is down (initiator won't receive it and send you disconnect
>> back), you should be able
>> to see this event. As I understand, in order to comply the spec, the QP
>> (and the cm_id afterwards)
>> should be destroyed only when getting this event and not before.
>>
> <nod>, thanks for the additional background.
>
> So currently rdma_destroy_qp() + rdma_destroy_id() is being done via
> isert_connect_release(), which occurs after the final isert_put_conn()
> happens from either the RDMA_CM_EVENT_DISCONNECTED handler, or within
> isert_free_conn() in one of the per connection kernel thread contexts
> via iscsit_close_connection().
>
> If I understand the above correctly, the isert_put_conn() should move
> from the RDMA_CM_EVENT_DISCONNECTED handler into the TIMEWAIT_EVENT
> handler, yes..?

Yes.

> And it's safe to assume that DISCONNECTED will always occur before
> TIMEWAIT_EVENT, right..?

DISCONNECTED event may not even come at all (in case the initiator 
didn't call rdma_disconnect). no guarantees here..
But, if once we get the TIMEWAIT event, we destroy the qp and the 
*cm_id*, we won't get any CM events at all.
As I understand, we don't even need to explicitly destroy the cm_id, we 
can just return a non-zero return from cma_handler
for TIMEWAIT events which will cause rdma_cm to implicitly destroy the 
cm_id.

Hope this helps,

Sagi.

> --nab
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] iser-target: Fix active I/O shutdown related issues
  2014-03-06 14:05             ` sagi grimberg
@ 2014-03-10 22:00               ` Nicholas A. Bellinger
  2014-03-11 10:27                 ` Sagi Grimberg
  0 siblings, 1 reply; 18+ messages in thread
From: Nicholas A. Bellinger @ 2014-03-10 22:00 UTC (permalink / raw
  To: sagi grimberg
  Cc: Sagi Grimberg, Nicholas A. Bellinger, target-devel, linux-rdma,
	linux-scsi, Or Gerlitz

On Thu, 2014-03-06 at 16:05 +0200, sagi grimberg wrote:
> On 3/6/2014 12:04 AM, Nicholas A. Bellinger wrote:
> > On Wed, 2014-03-05 at 14:12 +0200, Sagi Grimberg wrote:
> >> On 3/5/2014 2:06 AM, Nicholas A. Bellinger wrote:
> >>> On Tue, 2014-03-04 at 17:17 +0200, Sagi Grimberg wrote:
> >>>> On 3/4/2014 2:00 AM, Nicholas A. Bellinger wrote:

<SNIP>

> >>> <nod>, I noticed that as well during recent debugging.
> >>>
> >>> However, AFAICT the RDMA_CM_EVENT_TIMEWAIT_EVENT doesn't (always) occur
> >>> on the target side after a RDMA_CM_EVENT_DISCONNECTED, and thus far I've
> >>> not been able to ascertain what's different about the shutdown sequence
> >>> that would make this happen, or not happen..
> >>>
> >>> Any ideas..?
> >> That's probably because the cm_id is destroyed before you get the event.
> >> There is a specific
> >> timout computation to get this event (see IB spec). If you will attempt
> >> to disconnect while
> >> the link is down (initiator won't receive it and send you disconnect
> >> back), you should be able
> >> to see this event. As I understand, in order to comply the spec, the QP
> >> (and the cm_id afterwards)
> >> should be destroyed only when getting this event and not before.
> >>
> > <nod>, thanks for the additional background.
> >
> > So currently rdma_destroy_qp() + rdma_destroy_id() is being done via
> > isert_connect_release(), which occurs after the final isert_put_conn()
> > happens from either the RDMA_CM_EVENT_DISCONNECTED handler, or within
> > isert_free_conn() in one of the per connection kernel thread contexts
> > via iscsit_close_connection().
> >
> > If I understand the above correctly, the isert_put_conn() should move
> > from the RDMA_CM_EVENT_DISCONNECTED handler into the TIMEWAIT_EVENT
> > handler, yes..?
> 
> Yes.
> 
> > And it's safe to assume that DISCONNECTED will always occur before
> > TIMEWAIT_EVENT, right..?
> 
> DISCONNECTED event may not even come at all (in case the initiator 
> didn't call rdma_disconnect). no guarantees here..
> But, if once we get the TIMEWAIT event, we destroy the qp and the 
> *cm_id*, we won't get any CM events at all.
> As I understand, we don't even need to explicitly destroy the cm_id, we 
> can just return a non-zero return from cma_handler
> for TIMEWAIT events which will cause rdma_cm to implicitly destroy the 
> cm_id.
> 

Mmmm, if that's the case then I'm more confused about how reference
counting for isert_conn should work wrt TIMEWAIT_EVENT than before..  ;)

As mentioned earlier, the first isert_put_conn() occurs from the per
connection process context after calling rdma_disconnect(), and the
second from the disconnected event handler.

Your comment above would seem to indicate that iser-target code should
be waiting to receive TIMEWAIT_EVENT, instead of pro-actively calling
rdma_disconnect() to trigger the disconnect.  Is that correct..?

--nab


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] iser-target: Fix active I/O shutdown related issues
  2014-03-10 22:00               ` Nicholas A. Bellinger
@ 2014-03-11 10:27                 ` Sagi Grimberg
  0 siblings, 0 replies; 18+ messages in thread
From: Sagi Grimberg @ 2014-03-11 10:27 UTC (permalink / raw
  To: Nicholas A. Bellinger, sagi grimberg
  Cc: Nicholas A. Bellinger, target-devel, linux-rdma, linux-scsi,
	Or Gerlitz

On 3/11/2014 12:00 AM, Nicholas A. Bellinger wrote:
> On Thu, 2014-03-06 at 16:05 +0200, sagi grimberg wrote:
>> On 3/6/2014 12:04 AM, Nicholas A. Bellinger wrote:
>>> On Wed, 2014-03-05 at 14:12 +0200, Sagi Grimberg wrote:
>>>> On 3/5/2014 2:06 AM, Nicholas A. Bellinger wrote:
>>>>> On Tue, 2014-03-04 at 17:17 +0200, Sagi Grimberg wrote:
>>>>>> On 3/4/2014 2:00 AM, Nicholas A. Bellinger wrote:
> <SNIP>
>
>>>>> <nod>, I noticed that as well during recent debugging.
>>>>>
>>>>> However, AFAICT the RDMA_CM_EVENT_TIMEWAIT_EVENT doesn't (always) occur
>>>>> on the target side after a RDMA_CM_EVENT_DISCONNECTED, and thus far I've
>>>>> not been able to ascertain what's different about the shutdown sequence
>>>>> that would make this happen, or not happen..
>>>>>
>>>>> Any ideas..?
>>>> That's probably because the cm_id is destroyed before you get the event.
>>>> There is a specific
>>>> timout computation to get this event (see IB spec). If you will attempt
>>>> to disconnect while
>>>> the link is down (initiator won't receive it and send you disconnect
>>>> back), you should be able
>>>> to see this event. As I understand, in order to comply the spec, the QP
>>>> (and the cm_id afterwards)
>>>> should be destroyed only when getting this event and not before.
>>>>
>>> <nod>, thanks for the additional background.
>>>
>>> So currently rdma_destroy_qp() + rdma_destroy_id() is being done via
>>> isert_connect_release(), which occurs after the final isert_put_conn()
>>> happens from either the RDMA_CM_EVENT_DISCONNECTED handler, or within
>>> isert_free_conn() in one of the per connection kernel thread contexts
>>> via iscsit_close_connection().
>>>
>>> If I understand the above correctly, the isert_put_conn() should move
>>> from the RDMA_CM_EVENT_DISCONNECTED handler into the TIMEWAIT_EVENT
>>> handler, yes..?
>> Yes.
>>
>>> And it's safe to assume that DISCONNECTED will always occur before
>>> TIMEWAIT_EVENT, right..?
>> DISCONNECTED event may not even come at all (in case the initiator
>> didn't call rdma_disconnect). no guarantees here..
>> But, if once we get the TIMEWAIT event, we destroy the qp and the
>> *cm_id*, we won't get any CM events at all.
>> As I understand, we don't even need to explicitly destroy the cm_id, we
>> can just return a non-zero return from cma_handler
>> for TIMEWAIT events which will cause rdma_cm to implicitly destroy the
>> cm_id.
>>
> Mmmm, if that's the case then I'm more confused about how reference
> counting for isert_conn should work wrt TIMEWAIT_EVENT than before..  ;)

Yes, it is indeed confusing.
The below might be even more confusing... I'll try to simplify.

> As mentioned earlier, the first isert_put_conn() occurs from the per
> connection process context after calling rdma_disconnect(), and the
> second from the disconnected event handler.
>
> Your comment above would seem to indicate that iser-target code should
> be waiting to receive TIMEWAIT_EVENT, instead of pro-actively calling
> rdma_disconnect() to trigger the disconnect.  Is that correct..?

Not instead. iser-target should call rdma_disconnect upon DISCONNECTED 
event but
destroy the qp and cm_id when getting TIMEWAIT.

There is a matter of sending disconnect request/response and a matter of 
when
to destroy the QP.

DISCONNECTED events are coming from 2 conditions:
static int cma_ib_handler()
{
         ...
         case IB_CM_DREQ_RECEIVED:    /* remote side sent a disconnect 
request */
         case IB_CM_DREP_RECEIVED:    /* remote side responded our 
disconnect request */
                 if (!cma_comp_exch(id_priv, RDMA_CM_CONNECT,
                                    RDMA_CM_DISCONNECT))
                         goto out;
                 event.event = RDMA_CM_EVENT_DISCONNECTED;
                 break;
         ...
}

So regardless if iser_target initiated rdma_disconnect - once it gets 
DISCONNECTED event
it should call it *again* to respond the initiator disconnect request 
(rdma_disconnect sends
CM DREQ and if fails calls CM_DREP - this is to cover that both sides 
will send DREQ and DREP).

The removal of the QP and cm_id should come once getting TIMEWAIT event.

Hope this wasn't even more confusing...

Sagi.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-03-11 10:27 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-04  0:00 [PATCH 0/6] iser-target: Fix active I/O shutdown related issues Nicholas A. Bellinger
     [not found] ` <1393891265-22910-1-git-send-email-nab-PEzghdH756F8UrSeD/g0lQ@public.gmane.org>
2014-03-04  0:01   ` [PATCH 1/6] iscsi-target: Fix iscsit_get_tpg_from_np tpg_state bug Nicholas A. Bellinger
2014-03-04  0:01 ` [PATCH 2/6] iscsi/iser-target: Use list_del_init for ->i_conn_node Nicholas A. Bellinger
2014-03-04  0:01 ` [PATCH 3/6] iscsi/iser-target: Fix isert_conn->state hung shutdown issues Nicholas A. Bellinger
2014-03-04  0:01 ` [PATCH 4/6] iser-target: Fix post_send_buf_count for RDMA READ/WRITE Nicholas A. Bellinger
2014-03-04  7:49   ` Or Gerlitz
2014-03-04  9:21     ` Sagi Grimberg
2014-03-04  0:01 ` [PATCH 5/6] iser-target: Ignore completions for FRWRs in isert_cq_tx_work Nicholas A. Bellinger
2014-03-04 14:51   ` Sagi Grimberg
     [not found]     ` <5315E85F.2090904-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-03-04 23:56       ` Nicholas A. Bellinger
2014-03-04  0:01 ` [PATCH 6/6] iser-target: Fix command leak for tx_desc->comp_llnode_batch Nicholas A. Bellinger
2014-03-04 15:17 ` [PATCH 0/6] iser-target: Fix active I/O shutdown related issues Sagi Grimberg
     [not found]   ` <5315EE7C.3030806-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-03-05  0:06     ` Nicholas A. Bellinger
2014-03-05 12:12       ` Sagi Grimberg
2014-03-05 22:04         ` Nicholas A. Bellinger
     [not found]           ` <1394057083.20601.51.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2014-03-06 14:05             ` sagi grimberg
2014-03-10 22:00               ` Nicholas A. Bellinger
2014-03-11 10:27                 ` Sagi Grimberg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.