All the mail mirrored from lore.kernel.org
* [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
From: Julien Grall @ 2015-09-07 15:33 UTC
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, david.vrabel, konrad.wilk, boris.ostrovsky,
	wei.liu2, roger.pau

Hi all,

ARM64 Linux supports both 4KB and 64KB page granularity. However, the Xen
hypercall interface and PV protocol are always based on 4KB page granularity.

Any attempt to boot a Linux guest with 64KB pages enabled will result in a
guest crash.

This series is a first attempt to allow such Linux kernels to run with the
current hypercall interface and PV protocol.

This approach was chosen because we want 64KB Linux guests to run on released
Xen ARM versions and/or on platforms using an old version of Linux for DOM0.

There is room for improvement, such as support for 64KB grants or modifying
the PV protocol to support different page sizes. These will be explored in a
separate patch series later.

TODO list:
    - Convert swiotlb to 64KB
    - Convert xenfb to 64KB
    - Support for multi-page rings
    - Support for 64KB in gntdev
    - Support for non-indirect grants with a 64KB frontend
    - It may be possible to move some common defines between
      netback/netfront and blkfront/blkback into a header

I've got most of the patches for the TODO items. I'm planning to send them as
a follow-up, as they are not a requirement for basic guests.

All patches have been build-tested for ARM32, ARM64 and x86, but I haven't
runtime-tested them on x86 as I don't have a box running Xen on x86. I would
be happy if someone could give them a try and look for possible x86
regressions.

I know that Konrad has a test suite for x86. Konrad, would it be possible to
give this series a run?

A branch based on the latest xentip/for-linus-4.3 can be found here:

git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v4

Comments and suggestions are welcome.

Sincerely yours,

Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: boris.ostrovsky@oracle.com
Cc: wei.liu2@citrix.com
Cc: roger.pau@citrix.com

Status of each patch:

A: Reviewed-by or Acked-by received
M: Patch modified in this series
m: Minor changes in this series (e.g. renaming due to previous patches, typos)
L: Missing Acked-by from a Linux maintainer (Boris, David, Konrad)
N: Missing Acked-by from a netback maintainer (Ian or Wei)

Julien Grall (20):
A       net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop
A       arm/xen: Drop pte_mfn and mfn_pte
A M L   xen: Add Xen specific page definition
A M     xen/grant: Introduce helpers to split a page into grant
A       xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one
A       block/xen-blkfront: Split blkif_queue_request in 2
A m     block/xen-blkfront: Store a page rather a pfn in the grant structure
A       block/xen-blkfront: split get_grant in 2
A m L   xen/biomerge: Don't allow biovec's to be merged when Linux is not
          using 4KB pages
A       xen/xenbus: Use Xen page definition
A m L   tty/hvc: xen: Use xen page definition
  M L   xen/balloon: Don't rely on the page granularity is the same for Xen
          and Linux
A       xen/events: fifo: Make it running on 64KB granularity
A       xen/grant-table: Make it running on 64KB granularity
A m     block/xen-blkfront: Make it running on 64KB page granularity
A       block/xen-blkback: Make it running on 64KB page granularity
A m     net/xen-netfront: Make it running on 64KB page granularity
  m  N  net/xen-netback: Make it running on 64KB page granularity
A m     xen/privcmd: Add support for Linux 64KB page granularity
A       arm/xen: Add support for 64KB page granularity

 arch/arm/include/asm/xen/page.h     |  18 +-
 arch/arm/xen/enlighten.c            |   6 +-
 arch/arm/xen/p2m.c                  |   6 +-
 arch/x86/include/asm/xen/page.h     |   2 +-
 drivers/block/xen-blkback/blkback.c |   5 +-
 drivers/block/xen-blkback/common.h  |  17 +-
 drivers/block/xen-blkback/xenbus.c  |   9 +-
 drivers/block/xen-blkfront.c        | 552 +++++++++++++++++++++++-------------
 drivers/net/xen-netback/common.h    |  18 +-
 drivers/net/xen-netback/netback.c   | 163 +++++++----
 drivers/net/xen-netfront.c          | 122 +++++---
 drivers/tty/hvc/hvc_xen.c           |   4 +-
 drivers/xen/balloon.c               |  59 +++-
 drivers/xen/biomerge.c              |   8 +
 drivers/xen/events/events_base.c    |   2 +-
 drivers/xen/events/events_fifo.c    |   2 +-
 drivers/xen/grant-table.c           |  32 ++-
 drivers/xen/privcmd.c               |   8 +-
 drivers/xen/xenbus/xenbus_client.c  |   6 +-
 drivers/xen/xenbus/xenbus_probe.c   |   3 +-
 drivers/xen/xlate_mmu.c             | 124 +++++---
 include/xen/grant_table.h           |  51 ++++
 include/xen/page.h                  |  27 +-
 23 files changed, 855 insertions(+), 389 deletions(-)

-- 
2.1.4


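For readers new to the problem, a minimal sketch of the granularity mismatch
this series addresses (illustration only, not part of the series;
pfn_to_first_xen_pfn is a made-up name, and the XEN_PAGE_* constants mirror
those introduced by patch 03):

#define XEN_PAGE_SHIFT		12	/* Xen frames are always 4KB */
#define XEN_PAGE_SIZE		(1UL << XEN_PAGE_SHIFT)
#define XEN_PFN_PER_PAGE	(PAGE_SIZE / XEN_PAGE_SIZE) /* 16 with 64KB pages */

/* The first 4KB Xen frame backing a given Linux pfn */
static inline unsigned long pfn_to_first_xen_pfn(unsigned long pfn)
{
	return pfn << (PAGE_SHIFT - XEN_PAGE_SHIFT);
}

Every hypercall and every grant still refers to 4KB frames, so a guest built
with CONFIG_ARM64_64K_PAGES has to map each struct page onto up to 16 Xen
frames; this is what the helpers introduced later in the series automate.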

* [PATCH v4 01/20] net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop
From: Julien Grall @ 2015-09-07 15:33 UTC
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Wei Liu, netdev

The skb doesn't change within the function. Therefore it's only
necessary to check if we need GSO once at the beginning.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

---
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org

    Changes in v4:
        - Add Wei's Acked-by

    Changes in v2:
        - Patch added
---
 drivers/net/xen-netback/netback.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 7c64c74..d4c1bc7 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -277,6 +277,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 	unsigned long bytes;
 	int gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
+	if (skb_is_gso(skb)) {
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+	}
+
 	/* Data must not cross a page boundary. */
 	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
 
@@ -336,13 +343,6 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 		}
 
 		/* Leave a gap for the GSO descriptor. */
-		if (skb_is_gso(skb)) {
-			if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
-				gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
-			else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
-				gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
-		}
-
 		if (*head && ((1 << gso_type) & queue->vif->gso_mask))
 			queue->rx.req_cons++;
 
-- 
2.1.4



* [PATCH v4 02/20] arm/xen: Drop pte_mfn and mfn_pte
From: Julien Grall @ 2015-09-07 15:33 UTC
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Russell King

They are not used in common code except in one place in balloon.c, which is
only compiled when Linux is using the PV MMU. That is not the case on ARM.

Rather than worrying about how to handle the 64KB case, drop them.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Russell King <linux@arm.linux.org.uk>

    Changes in v4:
        - Add Stefano's Reviewed-by

    Changes in v3:
        - Patch added
---
 arch/arm/include/asm/xen/page.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 1279563..98c9fc3 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -13,9 +13,6 @@
 
 #define phys_to_machine_mapping_valid(pfn) (1)
 
-#define pte_mfn	    pte_pfn
-#define mfn_pte	    pfn_pte
-
 /* Xen machine address */
 typedef struct xmaddr {
 	phys_addr_t maddr;
-- 
2.1.4



* [PATCH v4 03/20] xen: Add Xen specific page definition
From: Julien Grall @ 2015-09-07 15:33 UTC
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	David Vrabel

The Xen hypercall interface always uses 4KB page granularity on the ARM
and x86 architectures.

With the incoming support of 64KB page granularity for ARM64 guests, it
won't be possible to re-use the Linux page definitions in Xen drivers.

Introduce Xen page definition helpers based on the Linux page
definitions. They have exactly the same names, prefixed with XEN_/xen_.

Also modify xen_page_to_gfn to use the new Xen page definitions.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>

    Changes in v4:
        - Typos
        - Rename xen_page_to_pfn to page_to_xen_pfn

    Changes in v3:
        - Fix errors reported by checkpatch.pl
        - Rename pfn to xen_pfn in xen_pfn_to_page
        - Add a comment that we assume PAGE_SIZE to be a multiple of
        XEN_PAGE_SIZE
        - s/MFN/GFN/ according to new naming
        - Add Stefano's reviewed-by

    Changes in v2:
        - Add XEN_PFN_UP
        - Add a comment describing the behavior of page_to_pfn
---
 include/xen/page.h | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/xen/page.h b/include/xen/page.h
index 1daae48..96294ac 100644
--- a/include/xen/page.h
+++ b/include/xen/page.h
@@ -1,11 +1,36 @@
 #ifndef _XEN_PAGE_H
 #define _XEN_PAGE_H
 
+#include <asm/page.h>
+
+/* The hypercall interface supports only 4KB page */
+#define XEN_PAGE_SHIFT	12
+#define XEN_PAGE_SIZE	(_AC(1, UL) << XEN_PAGE_SHIFT)
+#define XEN_PAGE_MASK	(~(XEN_PAGE_SIZE-1))
+#define xen_offset_in_page(p)	((unsigned long)(p) & ~XEN_PAGE_MASK)
+
+/*
+ * We assume that PAGE_SIZE is a multiple of XEN_PAGE_SIZE
+ * XXX: Add a BUILD_BUG_ON?
+ */
+
+#define xen_pfn_to_page(xen_pfn)	\
+	((pfn_to_page(((unsigned long)(xen_pfn) << XEN_PAGE_SHIFT) >> PAGE_SHIFT)))
+#define page_to_xen_pfn(page)		\
+	(((page_to_pfn(page)) << PAGE_SHIFT) >> XEN_PAGE_SHIFT)
+
+#define XEN_PFN_PER_PAGE	(PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define XEN_PFN_DOWN(x)	((x) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_UP(x)	(((x) + XEN_PAGE_SIZE-1) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_PHYS(x)	((phys_addr_t)(x) << XEN_PAGE_SHIFT)
+
 #include <asm/xen/page.h>
 
+/* Return the GFN associated to the first 4KB of the page */
 static inline unsigned long xen_page_to_gfn(struct page *page)
 {
-	return pfn_to_gfn(page_to_pfn(page));
+	return pfn_to_gfn(page_to_xen_pfn(page));
 }
 
 struct xen_memory_region {
-- 
2.1.4


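To make the relationship concrete, here is a hypothetical helper (not from
the patch; walk_xen_frames is an illustrative name) that walks every 4KB Xen
frame backing one Linux page using the definitions added above:

#include <linux/mm.h>
#include <linux/printk.h>
#include <xen/page.h>

static void walk_xen_frames(struct page *page)
{
	unsigned long first_xen_pfn = page_to_xen_pfn(page);
	unsigned int i;

	/* One iteration with 4KB Linux pages, sixteen with 64KB pages */
	for (i = 0; i < XEN_PFN_PER_PAGE; i++)
		pr_info("xen_pfn %lu backs page offset 0x%x\n",
			first_xen_pfn + i, i << XEN_PAGE_SHIFT);
}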

* [PATCH v4 04/20] xen/grant: Introduce helpers to split a page into grant
From: Julien Grall @ 2015-09-07 15:33 UTC
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, David Vrabel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86

Currently, a grant is always based on the Xen page granularity (i.e.
4KB). When Linux is using a different page granularity, a single page
will be split across multiple grants.

The new helpers will be in charge of splitting the Linux page into grants
and calling a function provided by the caller on each grant.

Also provide a helper to count the number of grants within a given
contiguous region.

Note that arch/x86/include/asm/xen/page.h now includes
xen/interface/grant_table.h rather than xen/grant_table.h. This is
necessary because xen/grant_table.h depends on asm/xen/page.h and would
break the compilation. Furthermore, only the definitions in
interface/grant_table.h are required.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org

    Changes in v4:
        - Typos
        - Rename gnttab_one_grant into gnttab_for_one_grant
        - Add Stefano and David's reviewed-by
        - s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming

    Changes in v3:
        - Fix error reported by checkpatch.pl
        - Typos
        - s/pfn/xen_pfn/ in gnttab_foreach_grant
        - Drop the possibility to use less data. The complexity is moved
        in netback which is the only user
        - Rename gnttab_foreach_grant into gnttab_foreach_grant_in_range
        - s/offset/start/ in gnttab_count_grant and update the
        description of the parameter
        - s/mfn/gfn base on the new terminologies
        - Add EXPORT_SYMBOL_GPL for gnttab_foreach_grant_in_range
        - Use xen_offset_in_page and XEN_PFN_DOWN whenever it's possible
        - Fix compilation on x86.

    Changes in v2:
        - Patch added
---
 arch/x86/include/asm/xen/page.h |  2 +-
 drivers/xen/grant-table.c       | 26 +++++++++++++++++++++++++
 include/xen/grant_table.h       | 42 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 0b762f6..501479e 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -12,7 +12,7 @@
 #include <asm/pgtable.h>
 
 #include <xen/interface/xen.h>
-#include <xen/grant_table.h>
+#include <xen/interface/grant_table.h>
 #include <xen/features.h>
 
 /* Xen machine address */
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 62f591f..7b4e1cf 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -776,6 +776,32 @@ void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count)
 }
 EXPORT_SYMBOL_GPL(gnttab_batch_copy);
 
+void gnttab_foreach_grant_in_range(struct page *page,
+				   unsigned int offset,
+				   unsigned int len,
+				   xen_grant_fn_t fn,
+				   void *data)
+{
+	unsigned int goffset;
+	unsigned int glen;
+	unsigned long xen_pfn;
+
+	len = min_t(unsigned int, PAGE_SIZE - offset, len);
+	goffset = xen_offset_in_page(offset);
+
+	xen_pfn = page_to_xen_pfn(page) + XEN_PFN_DOWN(offset);
+
+	while (len) {
+		glen = min_t(unsigned int, XEN_PAGE_SIZE - goffset, len);
+		fn(pfn_to_gfn(xen_pfn), goffset, glen, data);
+
+		goffset = 0;
+		xen_pfn++;
+		len -= glen;
+	}
+}
+EXPORT_SYMBOL_GPL(gnttab_foreach_grant_in_range);
+
 int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		    struct gnttab_map_grant_ref *kmap_ops,
 		    struct page **pages, unsigned int count)
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 4478f4b..05b5b08 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -45,8 +45,10 @@
 #include <asm/xen/hypervisor.h>
 
 #include <xen/features.h>
+#include <xen/page.h>
 #include <linux/mm_types.h>
 #include <linux/page-flags.h>
+#include <linux/kernel.h>
 
 #define GNTTAB_RESERVED_XENSTORE 1
 
@@ -224,4 +226,44 @@ static inline struct xen_page_foreign *xen_page_foreign(struct page *page)
 #endif
 }
 
+/* Split Linux page in chunk of the size of the grant and call fn
+ *
+ * Parameters of fn:
+ *	gfn: guest frame number
+ *	offset: offset in the grant
+ *	len: length of the data in the grant.
+ *	data: internal information
+ */
+typedef void (*xen_grant_fn_t)(unsigned long gfn, unsigned int offset,
+			       unsigned int len, void *data);
+
+void gnttab_foreach_grant_in_range(struct page *page,
+				   unsigned int offset,
+				   unsigned int len,
+				   xen_grant_fn_t fn,
+				   void *data);
+
+/* Helper to get to call fn only on the first "grant chunk" */
+static inline void gnttab_for_one_grant(struct page *page, unsigned int offset,
+					unsigned len, xen_grant_fn_t fn,
+					void *data)
+{
+	/* The first request is limited to the size of one grant */
+	len = min_t(unsigned int, XEN_PAGE_SIZE - (offset & ~XEN_PAGE_MASK),
+		    len);
+
+	gnttab_foreach_grant_in_range(page, offset, len, fn, data);
+}
+
+/* Get the number of grant in a specified region
+ *
+ * start: Offset from the beginning of the first page
+ * len: total length of data (can cross multiple page)
+ */
+static inline unsigned int gnttab_count_grant(unsigned int start,
+					      unsigned int len)
+{
+	return XEN_PFN_UP(xen_offset_in_page(start) + len);
+}
+
 #endif /* __ASM_GNTTAB_H__ */
-- 
2.1.4


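As a usage illustration, here is how a hypothetical frontend (setup_ctx,
setup_one_grant and grant_buffer are illustrative names, not from the series)
could grant a backend access to a buffer spanning several 4KB frames inside
one Linux page:

#include <xen/grant_table.h>

struct setup_ctx {
	grant_ref_t *refs;	/* pre-allocated by the caller */
	domid_t otherend;	/* backend domain id */
	unsigned int used;	/* grant references consumed so far */
};

static void setup_one_grant(unsigned long gfn, unsigned int offset,
			    unsigned int len, void *data)
{
	struct setup_ctx *ctx = data;

	/* One grant reference per 4KB chunk reported by the iterator */
	gnttab_grant_foreign_access_ref(ctx->refs[ctx->used++],
					ctx->otherend, gfn, 0 /* read-write */);
}

static void grant_buffer(struct page *page, unsigned int offset,
			 unsigned int len, struct setup_ctx *ctx)
{
	/* gnttab_count_grant(offset, len) says how many refs are needed */
	gnttab_foreach_grant_in_range(page, offset, len, setup_one_grant, ctx);
}

With 4KB Linux pages the callback runs once per page as before; with 64KB
pages it runs once per 4KB chunk, which is what the blkfront/netback patches
later in the series rely on.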

* [PATCH v4 05/20] xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one
From: Julien Grall @ 2015-09-07 15:33 UTC
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Julien Grall, David Vrabel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky

Many PV drivers contain the idiom:

pfn = page_to_gfn(...) /* Or similar */
gnttab_grant_foreign_access_ref

Replace it with a new helper. Note that when Linux is using a different
page granularity than Xen, the helper only gives access to the first 4KB
grant.

This is useful where drivers are allocating a full Linux page for each
grant.

Also include xen/interface/grant_table.h rather than xen/grant_table.h in
asm/page.h for x86 to fix a compilation issue [1]. Only the former is
needed to get the structure definitions.

[1] There is an interdependency between asm/page.h and xen/grant_table.h
which results in page_mfn not being defined when necessary.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>

    Changes in v3:
        - Rename gnttab_page_grant_foreign_access_ref into
        gnttab_page_grant_foreign_access_ref_one
        - Fix typo in the commit message
        - s/mfn/gfn based on the new naming
        - Add David and Stefano's reviewed-by

    Changes in v2:
        - Patch added
---
 include/xen/grant_table.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 05b5b08..e17a4b3 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -131,6 +131,15 @@ void gnttab_cancel_free_callback(struct gnttab_free_callback *callback);
 void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
 				     unsigned long frame, int readonly);
 
+/* Give access to the first 4K of the page */
+static inline void gnttab_page_grant_foreign_access_ref_one(
+	grant_ref_t ref, domid_t domid,
+	struct page *page, int readonly)
+{
+	gnttab_grant_foreign_access_ref(ref, domid, xen_page_to_gfn(page),
+					readonly);
+}
+
 void gnttab_grant_foreign_transfer_ref(grant_ref_t, domid_t domid,
 				       unsigned long pfn);
 
-- 
2.1.4


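A short sketch of the replacement (hypothetical frontend snippet; share_page
is an illustrative name) for a driver that shares one full Linux page:

#include <xen/grant_table.h>

static void share_page(grant_ref_t ref, domid_t backend_id,
		       struct page *page)
{
	/*
	 * Replaces the open-coded idiom
	 *   gnttab_grant_foreign_access_ref(ref, backend_id,
	 *                                   xen_page_to_gfn(page), 0);
	 * With 64KB Linux pages only the first 4KB is granted.
	 */
	gnttab_page_grant_foreign_access_ref_one(ref, backend_id, page,
						 0 /* read-write */);
}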

* [PATCH v4 05/20] xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one
@ 2015-09-07 15:33   ` Julien Grall
  0 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: linux-arm-kernel

Many PV drivers contain the idiom:

pfn = page_to_gfn(...) /* Or similar */
gnttab_grant_foreign_access_ref

Replace it by a new helper. Note that when Linux is using a different
page granularity than Xen, the helper only gives access to the first 4KB
grant.

This is useful where drivers are allocating a full Linux page for each
grant.

Also include xen/interface/grant_table.h rather than xen/grant_table.h in
asm/page.h for x86 to fix a compilation issue [1]. Only the former is
useful in order to get the structure definition.

[1] Interdependency between asm/page.h and xen/grant_table.h which result
to page_mfn not being defined when necessary.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>

    Changes in v3:
        - Rename gnttab_page_grant_foreign_access_ref into
        gnttab_page_grant_foreign_access_ref_one
        - Fix typo in the commit message
        - s/mfn/gfn based on the new naming
        - Add David and Stefano's reviewed-by

    Changes in v2:
        - Patch added
---
 include/xen/grant_table.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 05b5b08..e17a4b3 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -131,6 +131,15 @@ void gnttab_cancel_free_callback(struct gnttab_free_callback *callback);
 void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
 				     unsigned long frame, int readonly);
 
+/* Give access to the first 4K of the page */
+static inline void gnttab_page_grant_foreign_access_ref_one(
+	grant_ref_t ref, domid_t domid,
+	struct page *page, int readonly)
+{
+	gnttab_grant_foreign_access_ref(ref, domid, xen_page_to_gfn(page),
+					readonly);
+}
+
 void gnttab_grant_foreign_transfer_ref(grant_ref_t, domid_t domid,
 				       unsigned long pfn);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 144+ messages in thread

* [PATCH v4 06/20] block/xen-blkfront: Split blkif_queue_request in 2
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  0 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Roger Pau Monné, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, David Vrabel

Currently, blkif_queue_request has 2 distinct execution paths:
    - Send a discard request
    - Send a read/write request

The function also allocates grants to use for generating the request,
although these are only used for read/write requests.

Rather than having one function with 2 distinct execution paths, split
it in 2. This also removes one level of indentation.
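
In outline, the split leaves blkif_queue_request as a thin dispatcher
(condensed from the diff below):

	static int blkif_queue_request(struct request *req)
	{
		struct blkfront_info *info = req->rq_disk->private_data;

		if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
			return 1;

		if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE)))
			return blkif_queue_discard_req(req);
		else
			return blkif_queue_rw_req(req);
	}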

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>

    Roger, if you really want I can drop the else clause in
    blkif_queue_request; IMHO it's clearer here. I've kept your
    Reviewed-by regardless. Let me know if that's not fine.

    Changes in v3:
        - Fix errors reported by checkpatch.pl
        - Add Roger's Reviewed-by

    Changes in v2:
        - Patch added
---
 drivers/block/xen-blkfront.c | 277 ++++++++++++++++++++++++-------------------
 1 file changed, 153 insertions(+), 124 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 432e105..b11f084 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -395,13 +395,35 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 	return 0;
 }
 
-/*
- * Generate a Xen blkfront IO request from a blk layer request.  Reads
- * and writes are handled as expected.
- *
- * @req: a request struct
- */
-static int blkif_queue_request(struct request *req)
+static int blkif_queue_discard_req(struct request *req)
+{
+	struct blkfront_info *info = req->rq_disk->private_data;
+	struct blkif_request *ring_req;
+	unsigned long id;
+
+	/* Fill out a communications ring structure. */
+	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+	id = get_id_from_freelist(info);
+	info->shadow[id].request = req;
+
+	ring_req->operation = BLKIF_OP_DISCARD;
+	ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
+	ring_req->u.discard.id = id;
+	ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
+	if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
+		ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
+	else
+		ring_req->u.discard.flag = 0;
+
+	info->ring.req_prod_pvt++;
+
+	/* Keep a private copy so we can reissue requests when recovering. */
+	info->shadow[id].req = *ring_req;
+
+	return 0;
+}
+
+static int blkif_queue_rw_req(struct request *req)
 {
 	struct blkfront_info *info = req->rq_disk->private_data;
 	struct blkif_request *ring_req;
@@ -421,9 +443,6 @@ static int blkif_queue_request(struct request *req)
 	struct scatterlist *sg;
 	int nseg, max_grefs;
 
-	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
-		return 1;
-
 	max_grefs = req->nr_phys_segments;
 	if (max_grefs > BLKIF_MAX_SEGMENTS_PER_REQUEST)
 		/*
@@ -453,139 +472,131 @@ static int blkif_queue_request(struct request *req)
 	id = get_id_from_freelist(info);
 	info->shadow[id].request = req;
 
-	if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE))) {
-		ring_req->operation = BLKIF_OP_DISCARD;
-		ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
-		ring_req->u.discard.id = id;
-		ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
-		if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
-			ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
-		else
-			ring_req->u.discard.flag = 0;
+	BUG_ON(info->max_indirect_segments == 0 &&
+	       req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+	BUG_ON(info->max_indirect_segments &&
+	       req->nr_phys_segments > info->max_indirect_segments);
+	nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+	ring_req->u.rw.id = id;
+	if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+		/*
+		 * The indirect operation can only be a BLKIF_OP_READ or
+		 * BLKIF_OP_WRITE
+		 */
+		BUG_ON(req->cmd_flags & (REQ_FLUSH | REQ_FUA));
+		ring_req->operation = BLKIF_OP_INDIRECT;
+		ring_req->u.indirect.indirect_op = rq_data_dir(req) ?
+			BLKIF_OP_WRITE : BLKIF_OP_READ;
+		ring_req->u.indirect.sector_number = (blkif_sector_t)blk_rq_pos(req);
+		ring_req->u.indirect.handle = info->handle;
+		ring_req->u.indirect.nr_segments = nseg;
 	} else {
-		BUG_ON(info->max_indirect_segments == 0 &&
-		       req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
-		BUG_ON(info->max_indirect_segments &&
-		       req->nr_phys_segments > info->max_indirect_segments);
-		nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
-		ring_req->u.rw.id = id;
-		if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+		ring_req->u.rw.sector_number = (blkif_sector_t)blk_rq_pos(req);
+		ring_req->u.rw.handle = info->handle;
+		ring_req->operation = rq_data_dir(req) ?
+			BLKIF_OP_WRITE : BLKIF_OP_READ;
+		if (req->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
 			/*
-			 * The indirect operation can only be a BLKIF_OP_READ or
-			 * BLKIF_OP_WRITE
+			 * Ideally we can do an unordered flush-to-disk.
+			 * In case the backend onlysupports barriers, use that.
+			 * A barrier request a superset of FUA, so we can
+			 * implement it the same way.  (It's also a FLUSH+FUA,
+			 * since it is guaranteed ordered WRT previous writes.)
 			 */
-			BUG_ON(req->cmd_flags & (REQ_FLUSH | REQ_FUA));
-			ring_req->operation = BLKIF_OP_INDIRECT;
-			ring_req->u.indirect.indirect_op = rq_data_dir(req) ?
-				BLKIF_OP_WRITE : BLKIF_OP_READ;
-			ring_req->u.indirect.sector_number = (blkif_sector_t)blk_rq_pos(req);
-			ring_req->u.indirect.handle = info->handle;
-			ring_req->u.indirect.nr_segments = nseg;
-		} else {
-			ring_req->u.rw.sector_number = (blkif_sector_t)blk_rq_pos(req);
-			ring_req->u.rw.handle = info->handle;
-			ring_req->operation = rq_data_dir(req) ?
-				BLKIF_OP_WRITE : BLKIF_OP_READ;
-			if (req->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
-				/*
-				 * Ideally we can do an unordered flush-to-disk. In case the
-				 * backend onlysupports barriers, use that. A barrier request
-				 * a superset of FUA, so we can implement it the same
-				 * way.  (It's also a FLUSH+FUA, since it is
-				 * guaranteed ordered WRT previous writes.)
-				 */
-				switch (info->feature_flush &
-					((REQ_FLUSH|REQ_FUA))) {
-				case REQ_FLUSH|REQ_FUA:
-					ring_req->operation =
-						BLKIF_OP_WRITE_BARRIER;
-					break;
-				case REQ_FLUSH:
-					ring_req->operation =
-						BLKIF_OP_FLUSH_DISKCACHE;
-					break;
-				default:
-					ring_req->operation = 0;
-				}
+			switch (info->feature_flush &
+				((REQ_FLUSH|REQ_FUA))) {
+			case REQ_FLUSH|REQ_FUA:
+				ring_req->operation =
+					BLKIF_OP_WRITE_BARRIER;
+				break;
+			case REQ_FLUSH:
+				ring_req->operation =
+					BLKIF_OP_FLUSH_DISKCACHE;
+				break;
+			default:
+				ring_req->operation = 0;
 			}
-			ring_req->u.rw.nr_segments = nseg;
 		}
-		for_each_sg(info->shadow[id].sg, sg, nseg, i) {
-			fsect = sg->offset >> 9;
-			lsect = fsect + (sg->length >> 9) - 1;
+		ring_req->u.rw.nr_segments = nseg;
+	}
+	for_each_sg(info->shadow[id].sg, sg, nseg, i) {
+		fsect = sg->offset >> 9;
+		lsect = fsect + (sg->length >> 9) - 1;
 
-			if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
-			    (i % SEGS_PER_INDIRECT_FRAME == 0)) {
-				unsigned long uninitialized_var(pfn);
+		if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
+		    (i % SEGS_PER_INDIRECT_FRAME == 0)) {
+			unsigned long uninitialized_var(pfn);
 
-				if (segments)
-					kunmap_atomic(segments);
+			if (segments)
+				kunmap_atomic(segments);
 
-				n = i / SEGS_PER_INDIRECT_FRAME;
-				if (!info->feature_persistent) {
-					struct page *indirect_page;
-
-					/* Fetch a pre-allocated page to use for indirect grefs */
-					BUG_ON(list_empty(&info->indirect_pages));
-					indirect_page = list_first_entry(&info->indirect_pages,
-					                                 struct page, lru);
-					list_del(&indirect_page->lru);
-					pfn = page_to_pfn(indirect_page);
-				}
-				gnt_list_entry = get_grant(&gref_head, pfn, info);
-				info->shadow[id].indirect_grants[n] = gnt_list_entry;
-				segments = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
-				ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
+			n = i / SEGS_PER_INDIRECT_FRAME;
+			if (!info->feature_persistent) {
+				struct page *indirect_page;
+
+				/*
+				 * Fetch a pre-allocated page to use for
+				 * indirect grefs
+				 */
+				BUG_ON(list_empty(&info->indirect_pages));
+				indirect_page = list_first_entry(&info->indirect_pages,
+								 struct page, lru);
+				list_del(&indirect_page->lru);
+				pfn = page_to_pfn(indirect_page);
 			}
+			gnt_list_entry = get_grant(&gref_head, pfn, info);
+			info->shadow[id].indirect_grants[n] = gnt_list_entry;
+			segments = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+			ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
+		}
 
-			gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), info);
-			ref = gnt_list_entry->gref;
+		gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), info);
+		ref = gnt_list_entry->gref;
 
-			info->shadow[id].grants_used[i] = gnt_list_entry;
+		info->shadow[id].grants_used[i] = gnt_list_entry;
 
-			if (rq_data_dir(req) && info->feature_persistent) {
-				char *bvec_data;
-				void *shared_data;
+		if (rq_data_dir(req) && info->feature_persistent) {
+			char *bvec_data;
+			void *shared_data;
 
-				BUG_ON(sg->offset + sg->length > PAGE_SIZE);
+			BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
-				shared_data = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
-				bvec_data = kmap_atomic(sg_page(sg));
+			shared_data = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+			bvec_data = kmap_atomic(sg_page(sg));
 
-				/*
-				 * this does not wipe data stored outside the
-				 * range sg->offset..sg->offset+sg->length.
-				 * Therefore, blkback *could* see data from
-				 * previous requests. This is OK as long as
-				 * persistent grants are shared with just one
-				 * domain. It may need refactoring if this
-				 * changes
-				 */
-				memcpy(shared_data + sg->offset,
-				       bvec_data   + sg->offset,
-				       sg->length);
+			/*
+			 * this does not wipe data stored outside the
+			 * range sg->offset..sg->offset+sg->length.
+			 * Therefore, blkback *could* see data from
+			 * previous requests. This is OK as long as
+			 * persistent grants are shared with just one
+			 * domain. It may need refactoring if this
+			 * changes
+			 */
+			memcpy(shared_data + sg->offset,
+			       bvec_data   + sg->offset,
+			       sg->length);
 
-				kunmap_atomic(bvec_data);
-				kunmap_atomic(shared_data);
-			}
-			if (ring_req->operation != BLKIF_OP_INDIRECT) {
-				ring_req->u.rw.seg[i] =
-						(struct blkif_request_segment) {
-							.gref       = ref,
-							.first_sect = fsect,
-							.last_sect  = lsect };
-			} else {
-				n = i % SEGS_PER_INDIRECT_FRAME;
-				segments[n] =
+			kunmap_atomic(bvec_data);
+			kunmap_atomic(shared_data);
+		}
+		if (ring_req->operation != BLKIF_OP_INDIRECT) {
+			ring_req->u.rw.seg[i] =
 					(struct blkif_request_segment) {
-							.gref       = ref,
-							.first_sect = fsect,
-							.last_sect  = lsect };
-			}
+						.gref       = ref,
+						.first_sect = fsect,
+						.last_sect  = lsect };
+		} else {
+			n = i % SEGS_PER_INDIRECT_FRAME;
+			segments[n] =
+				(struct blkif_request_segment) {
+						.gref       = ref,
+						.first_sect = fsect,
+						.last_sect  = lsect };
 		}
-		if (segments)
-			kunmap_atomic(segments);
 	}
+	if (segments)
+		kunmap_atomic(segments);
 
 	info->ring.req_prod_pvt++;
 
@@ -598,6 +609,24 @@ static int blkif_queue_request(struct request *req)
 	return 0;
 }
 
+/*
+ * Generate a Xen blkfront IO request from a blk layer request.  Reads
+ * and writes are handled as expected.
+ *
+ * @req: a request struct
+ */
+static int blkif_queue_request(struct request *req)
+{
+	struct blkfront_info *info = req->rq_disk->private_data;
+
+	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
+		return 1;
+
+	if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE)))
+		return blkif_queue_discard_req(req);
+	else
+		return blkif_queue_rw_req(req);
+}
 
 static inline void flush_requests(struct blkfront_info *info)
 {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread

* [PATCH v4 07/20] block/xen-blkfront: Store a page rather a pfn in the grant structure
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  0 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Roger Pau Monné, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, David Vrabel

All usages of the pfn field follow the same idiom:

pfn_to_page(grant->pfn)

This will always return the same page. Store the page directly in the
grant to clean up the code.
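
Concretely (condensed from the diff below), the grant structure and its
users switch from pfn-based to page-based:

	struct grant {
		grant_ref_t gref;
		struct page *page;	/* was: unsigned long pfn */
		struct list_head node;
	};

	/* Callers then map the page directly ... */
	shared_data = kmap_atomic(gnt_list_entry->page);
	/* ... instead of kmap_atomic(pfn_to_page(gnt_list_entry->pfn)). */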

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>

Roger, Stefano, I kept your Acked-by/Reviewed-by because the rebase was
minor. Let me know if you disagree.

    Changes in v4:
        - rebase after 7adf12b87f45a77d364464018fb8e9e1ac875152
        "xen-blkfront: don't add indirect pages to list when
        !feature_persistent"

    Changes in v3:
        - Use the correct indentation in get_grant. The current
        indentation (i.e. without this patch) was wrong because it used
        spaces rather than tabs.
        - Add Roger's acked and Stefano's reviewed
        - s/mfn/gfn based on the new naming

    Changes in v2:
        - Patch added
---
 drivers/block/xen-blkfront.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b11f084..556475d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -68,7 +68,7 @@ enum blkif_state {
 
 struct grant {
 	grant_ref_t gref;
-	unsigned long pfn;
+	struct page *page;
 	struct list_head node;
 };
 
@@ -222,7 +222,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 				kfree(gnt_list_entry);
 				goto out_of_memory;
 			}
-			gnt_list_entry->pfn = page_to_pfn(granted_page);
+			gnt_list_entry->page = granted_page;
 		}
 
 		gnt_list_entry->gref = GRANT_INVALID_REF;
@@ -237,7 +237,7 @@ out_of_memory:
 	                         &info->grants, node) {
 		list_del(&gnt_list_entry->node);
 		if (info->feature_persistent)
-			__free_page(pfn_to_page(gnt_list_entry->pfn));
+			__free_page(gnt_list_entry->page);
 		kfree(gnt_list_entry);
 		i--;
 	}
@@ -246,8 +246,8 @@ out_of_memory:
 }
 
 static struct grant *get_grant(grant_ref_t *gref_head,
-                               unsigned long pfn,
-                               struct blkfront_info *info)
+			       struct page *page,
+			       struct blkfront_info *info)
 {
 	struct grant *gnt_list_entry;
 	unsigned long buffer_gfn;
@@ -266,10 +266,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
 	gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
 	BUG_ON(gnt_list_entry->gref == -ENOSPC);
 	if (!info->feature_persistent) {
-		BUG_ON(!pfn);
-		gnt_list_entry->pfn = pfn;
+		BUG_ON(!page);
+		gnt_list_entry->page = page;
 	}
-	buffer_gfn = pfn_to_gfn(gnt_list_entry->pfn);
+	buffer_gfn = xen_page_to_gfn(gnt_list_entry->page);
 	gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
 	                                info->xbdev->otherend_id,
 	                                buffer_gfn, 0);
@@ -525,7 +525,7 @@ static int blkif_queue_rw_req(struct request *req)
 
 		if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
 		    (i % SEGS_PER_INDIRECT_FRAME == 0)) {
-			unsigned long uninitialized_var(pfn);
+			struct page *uninitialized_var(page);
 
 			if (segments)
 				kunmap_atomic(segments);
@@ -542,15 +542,15 @@ static int blkif_queue_rw_req(struct request *req)
 				indirect_page = list_first_entry(&info->indirect_pages,
 								 struct page, lru);
 				list_del(&indirect_page->lru);
-				pfn = page_to_pfn(indirect_page);
+				page = indirect_page;
 			}
-			gnt_list_entry = get_grant(&gref_head, pfn, info);
+			gnt_list_entry = get_grant(&gref_head, page, info);
 			info->shadow[id].indirect_grants[n] = gnt_list_entry;
-			segments = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+			segments = kmap_atomic(gnt_list_entry->page);
 			ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
 		}
 
-		gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), info);
+		gnt_list_entry = get_grant(&gref_head, sg_page(sg), info);
 		ref = gnt_list_entry->gref;
 
 		info->shadow[id].grants_used[i] = gnt_list_entry;
@@ -561,7 +561,7 @@ static int blkif_queue_rw_req(struct request *req)
 
 			BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
-			shared_data = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+			shared_data = kmap_atomic(gnt_list_entry->page);
 			bvec_data = kmap_atomic(sg_page(sg));
 
 			/*
@@ -1002,7 +1002,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 				info->persistent_gnts_c--;
 			}
 			if (info->feature_persistent)
-				__free_page(pfn_to_page(persistent_gnt->pfn));
+				__free_page(persistent_gnt->page);
 			kfree(persistent_gnt);
 		}
 	}
@@ -1037,7 +1037,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 			persistent_gnt = info->shadow[i].grants_used[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
 			if (info->feature_persistent)
-				__free_page(pfn_to_page(persistent_gnt->pfn));
+				__free_page(persistent_gnt->page);
 			kfree(persistent_gnt);
 		}
 
@@ -1051,7 +1051,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
 			persistent_gnt = info->shadow[i].indirect_grants[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
-			__free_page(pfn_to_page(persistent_gnt->pfn));
+			__free_page(persistent_gnt->page);
 			kfree(persistent_gnt);
 		}
 
@@ -1102,8 +1102,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
 		for_each_sg(s->sg, sg, nseg, i) {
 			BUG_ON(sg->offset + sg->length > PAGE_SIZE);
-			shared_data = kmap_atomic(
-				pfn_to_page(s->grants_used[i]->pfn));
+			shared_data = kmap_atomic(s->grants_used[i]->page);
 			bvec_data = kmap_atomic(sg_page(sg));
 			memcpy(bvec_data   + sg->offset,
 			       shared_data + sg->offset,
@@ -1155,7 +1154,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 				 * available pages for indirect grefs.
 				 */
 				if (!info->feature_persistent) {
-					indirect_page = pfn_to_page(s->indirect_grants[i]->pfn);
+					indirect_page = s->indirect_grants[i]->page;
 					list_add(&indirect_page->lru, &info->indirect_pages);
 				}
 				s->indirect_grants[i]->gref = GRANT_INVALID_REF;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread

* [PATCH v4 07/20] block/xen-blkfront: Store a page rather a pfn in the grant structure
@ 2015-09-07 15:33   ` Julien Grall
  0 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel, Julien Grall,
	David Vrabel, Boris Ostrovsky, linux-arm-kernel,
	Roger Pau Monné

All the usage of the field pfn are done using the same idiom:

pfn_to_page(grant->pfn)

This will  return always the same page. Store directly the page in the
grant to clean up the code.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>

Roger, Stefano, I kept your Acked-by/Reviewed-by because the rebase was
minor. Let me know if you disagree.

    Changes in v4:
        - rebase after 7adf12b87f45a77d364464018fb8e9e1ac875152
        "xen-blkfront: don't add indirect pages to list when
        !feature_persistent"

    Changes in v3:
        - Use the correct indentation in get_grant. The current
        indentation (i.e without this patch) was wrong because it was
        using space rather than tabulation.
        - Add Roger's acked and Stefano's reviewed
        - s/mfn/gfn based on the new naming

    Changes in v2:
        - Patch added
---
 drivers/block/xen-blkfront.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b11f084..556475d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -68,7 +68,7 @@ enum blkif_state {
 
 struct grant {
 	grant_ref_t gref;
-	unsigned long pfn;
+	struct page *page;
 	struct list_head node;
 };
 
@@ -222,7 +222,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 				kfree(gnt_list_entry);
 				goto out_of_memory;
 			}
-			gnt_list_entry->pfn = page_to_pfn(granted_page);
+			gnt_list_entry->page = granted_page;
 		}
 
 		gnt_list_entry->gref = GRANT_INVALID_REF;
@@ -237,7 +237,7 @@ out_of_memory:
 	                         &info->grants, node) {
 		list_del(&gnt_list_entry->node);
 		if (info->feature_persistent)
-			__free_page(pfn_to_page(gnt_list_entry->pfn));
+			__free_page(gnt_list_entry->page);
 		kfree(gnt_list_entry);
 		i--;
 	}
@@ -246,8 +246,8 @@ out_of_memory:
 }
 
 static struct grant *get_grant(grant_ref_t *gref_head,
-                               unsigned long pfn,
-                               struct blkfront_info *info)
+			       struct page *page,
+			       struct blkfront_info *info)
 {
 	struct grant *gnt_list_entry;
 	unsigned long buffer_gfn;
@@ -266,10 +266,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
 	gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
 	BUG_ON(gnt_list_entry->gref == -ENOSPC);
 	if (!info->feature_persistent) {
-		BUG_ON(!pfn);
-		gnt_list_entry->pfn = pfn;
+		BUG_ON(!page);
+		gnt_list_entry->page = page;
 	}
-	buffer_gfn = pfn_to_gfn(gnt_list_entry->pfn);
+	buffer_gfn = xen_page_to_gfn(gnt_list_entry->page);
 	gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
 	                                info->xbdev->otherend_id,
 	                                buffer_gfn, 0);
@@ -525,7 +525,7 @@ static int blkif_queue_rw_req(struct request *req)
 
 		if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
 		    (i % SEGS_PER_INDIRECT_FRAME == 0)) {
-			unsigned long uninitialized_var(pfn);
+			struct page *uninitialized_var(page);
 
 			if (segments)
 				kunmap_atomic(segments);
@@ -542,15 +542,15 @@ static int blkif_queue_rw_req(struct request *req)
 				indirect_page = list_first_entry(&info->indirect_pages,
 								 struct page, lru);
 				list_del(&indirect_page->lru);
-				pfn = page_to_pfn(indirect_page);
+				page = indirect_page;
 			}
-			gnt_list_entry = get_grant(&gref_head, pfn, info);
+			gnt_list_entry = get_grant(&gref_head, page, info);
 			info->shadow[id].indirect_grants[n] = gnt_list_entry;
-			segments = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+			segments = kmap_atomic(gnt_list_entry->page);
 			ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
 		}
 
-		gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), info);
+		gnt_list_entry = get_grant(&gref_head, sg_page(sg), info);
 		ref = gnt_list_entry->gref;
 
 		info->shadow[id].grants_used[i] = gnt_list_entry;
@@ -561,7 +561,7 @@ static int blkif_queue_rw_req(struct request *req)
 
 			BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
-			shared_data = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+			shared_data = kmap_atomic(gnt_list_entry->page);
 			bvec_data = kmap_atomic(sg_page(sg));
 
 			/*
@@ -1002,7 +1002,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 				info->persistent_gnts_c--;
 			}
 			if (info->feature_persistent)
-				__free_page(pfn_to_page(persistent_gnt->pfn));
+				__free_page(persistent_gnt->page);
 			kfree(persistent_gnt);
 		}
 	}
@@ -1037,7 +1037,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 			persistent_gnt = info->shadow[i].grants_used[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
 			if (info->feature_persistent)
-				__free_page(pfn_to_page(persistent_gnt->pfn));
+				__free_page(persistent_gnt->page);
 			kfree(persistent_gnt);
 		}
 
@@ -1051,7 +1051,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
 			persistent_gnt = info->shadow[i].indirect_grants[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
-			__free_page(pfn_to_page(persistent_gnt->pfn));
+			__free_page(persistent_gnt->page);
 			kfree(persistent_gnt);
 		}
 
@@ -1102,8 +1102,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
 		for_each_sg(s->sg, sg, nseg, i) {
 			BUG_ON(sg->offset + sg->length > PAGE_SIZE);
-			shared_data = kmap_atomic(
-				pfn_to_page(s->grants_used[i]->pfn));
+			shared_data = kmap_atomic(s->grants_used[i]->page);
 			bvec_data = kmap_atomic(sg_page(sg));
 			memcpy(bvec_data   + sg->offset,
 			       shared_data + sg->offset,
@@ -1155,7 +1154,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 				 * available pages for indirect grefs.
 				 */
 				if (!info->feature_persistent) {
-					indirect_page = pfn_to_page(s->indirect_grants[i]->pfn);
+					indirect_page = s->indirect_grants[i]->page;
 					list_add(&indirect_page->lru, &info->indirect_pages);
 				}
 				s->indirect_grants[i]->gref = GRANT_INVALID_REF;
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 144+ messages in thread


* [PATCH v4 08/20] block/xen-blkfront: split get_grant in 2
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Roger Pau Monné, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, David Vrabel

Prepare the code to support 64KB page granularity. The first
implementation will use a full Linux page per indirect and persistent
grant. When non-persistent grants are used, each page of a bio request
may be split into multiple grants.

Furthermore, the page field of the grant structure is only used to copy
data from persistent grants or indirect grants. Avoid setting it for
other use cases, as it will have no meaning given that the page will be
split into multiple grants.

Provide 2 functions: one to set up indirect grants, the other for bio
pages.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>

    Changes in v4:
        - Add Roger's acked-by

    Changes in v3:
        - Fix errors reported by checkpatch.pl
        - gnttab_page_grant_foreign_access_ref has been renamed to
        gnttab_page_grant_foreign_access_ref_one
        - Fix compilation by using get_indirect_grant rather than
        get_grant (the change was in a later patch...).
        - Make grant_foreign_access static inline
        - s/mfn/gfn/ based on the new naming

    Changes in v2:
        - Patch added
---
 drivers/block/xen-blkfront.c | 88 +++++++++++++++++++++++++++++---------------
 1 file changed, 59 insertions(+), 29 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 556475d..4232cbd 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -245,34 +245,77 @@ out_of_memory:
 	return -ENOMEM;
 }
 
-static struct grant *get_grant(grant_ref_t *gref_head,
-			       struct page *page,
-			       struct blkfront_info *info)
+static struct grant *get_free_grant(struct blkfront_info *info)
 {
 	struct grant *gnt_list_entry;
-	unsigned long buffer_gfn;
 
 	BUG_ON(list_empty(&info->grants));
 	gnt_list_entry = list_first_entry(&info->grants, struct grant,
-	                                  node);
+					  node);
 	list_del(&gnt_list_entry->node);
 
-	if (gnt_list_entry->gref != GRANT_INVALID_REF) {
+	if (gnt_list_entry->gref != GRANT_INVALID_REF)
 		info->persistent_gnts_c--;
+
+	return gnt_list_entry;
+}
+
+static inline void grant_foreign_access(const struct grant *gnt_list_entry,
+					const struct blkfront_info *info)
+{
+	gnttab_page_grant_foreign_access_ref_one(gnt_list_entry->gref,
+						 info->xbdev->otherend_id,
+						 gnt_list_entry->page,
+						 0);
+}
+
+static struct grant *get_grant(grant_ref_t *gref_head,
+			       unsigned long gfn,
+			       struct blkfront_info *info)
+{
+	struct grant *gnt_list_entry = get_free_grant(info);
+
+	if (gnt_list_entry->gref != GRANT_INVALID_REF)
 		return gnt_list_entry;
+
+	/* Assign a gref to this page */
+	gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
+	BUG_ON(gnt_list_entry->gref == -ENOSPC);
+	if (info->feature_persistent)
+		grant_foreign_access(gnt_list_entry, info);
+	else {
+		/* Grant access to the GFN passed by the caller */
+		gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
+						info->xbdev->otherend_id,
+						gfn, 0);
 	}
 
+	return gnt_list_entry;
+}
+
+static struct grant *get_indirect_grant(grant_ref_t *gref_head,
+					struct blkfront_info *info)
+{
+	struct grant *gnt_list_entry = get_free_grant(info);
+
+	if (gnt_list_entry->gref != GRANT_INVALID_REF)
+		return gnt_list_entry;
+
 	/* Assign a gref to this page */
 	gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
 	BUG_ON(gnt_list_entry->gref == -ENOSPC);
 	if (!info->feature_persistent) {
-		BUG_ON(!page);
-		gnt_list_entry->page = page;
+		struct page *indirect_page;
+
+		/* Fetch a pre-allocated page to use for indirect grefs */
+		BUG_ON(list_empty(&info->indirect_pages));
+		indirect_page = list_first_entry(&info->indirect_pages,
+						 struct page, lru);
+		list_del(&indirect_page->lru);
+		gnt_list_entry->page = indirect_page;
 	}
-	buffer_gfn = xen_page_to_gfn(gnt_list_entry->page);
-	gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
-	                                info->xbdev->otherend_id,
-	                                buffer_gfn, 0);
+	grant_foreign_access(gnt_list_entry, info);
+
 	return gnt_list_entry;
 }
 
@@ -525,32 +568,19 @@ static int blkif_queue_rw_req(struct request *req)
 
 		if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
 		    (i % SEGS_PER_INDIRECT_FRAME == 0)) {
-			struct page *uninitialized_var(page);
-
 			if (segments)
 				kunmap_atomic(segments);
 
 			n = i / SEGS_PER_INDIRECT_FRAME;
-			if (!info->feature_persistent) {
-				struct page *indirect_page;
-
-				/*
-				 * Fetch a pre-allocated page to use for
-				 * indirect grefs
-				 */
-				BUG_ON(list_empty(&info->indirect_pages));
-				indirect_page = list_first_entry(&info->indirect_pages,
-								 struct page, lru);
-				list_del(&indirect_page->lru);
-				page = indirect_page;
-			}
-			gnt_list_entry = get_grant(&gref_head, page, info);
+			gnt_list_entry = get_indirect_grant(&gref_head, info);
 			info->shadow[id].indirect_grants[n] = gnt_list_entry;
 			segments = kmap_atomic(gnt_list_entry->page);
 			ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
 		}
 
-		gnt_list_entry = get_grant(&gref_head, sg_page(sg), info);
+		gnt_list_entry = get_grant(&gref_head,
+					   xen_page_to_gfn(sg_page(sg)),
+					   info);
 		ref = gnt_list_entry->gref;
 
 		info->shadow[id].grants_used[i] = gnt_list_entry;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread
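
A note on the split above: the reason get_grant() now takes a GFN is that a
single bio page may need several grants. A standalone sketch of the
arithmetic, assuming 64KB kernel pages over 4KB Xen frames and consecutive
backing frames (as for an auto-translated guest); the GFN value is made up:

    #include <stdio.h>

    #define XEN_PAGE_SIZE 4096UL   /* Xen frame size, fixed by the hypercall ABI */
    #define PAGE_SIZE_64K 65536UL  /* kernel page size on a 64KB build */

    int main(void)
    {
        unsigned long first_gfn = 0x42000;  /* hypothetical bio page GFN */
        unsigned long nr_grants = PAGE_SIZE_64K / XEN_PAGE_SIZE;  /* 16 */
        unsigned long i;

        /* One grant per 4KB Xen frame backing the single 64KB bio page. */
        for (i = 0; i < nr_grants; i++)
            printf("grant %2lu -> gfn %#lx\n", i, first_gfn + i);
        return 0;
    }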


* [PATCH v4 09/20] xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB pages
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	David Vrabel

On ARM, not all DMA-capable devices on a platform are protected by an
IOMMU. DMA requests have to use the BFN (i.e. the MFN on ARM) in order
to drive the device correctly.

While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
mappings will break this contiguous mapping.

When Linux is using 64KB page granularity, a page may be split across
multiple non-contiguous MFNs (Xen is using 4KB page granularity).
Therefore a DMA request will likely fail.

Checking that a 64KB page is backed by contiguous MFNs is tedious. For
now, always say that biovecs are not mergeable.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>

    There are some ideas on checking whether two biovecs could be merged
    (see [1]), but it's not critical and can be considered a performance
    improvement.

    Changes in v4:
        - Fix typos in the subject
        - Add Stefano's reviewed-by

    Changes in v3:
        - Update commit message
        - s/mfn/bfn/ based on the new renaming
        - Update TODO

    Changes in v2:
        - Remove the workaround and check if the Linux page granularity
        is the same as Xen or not

    [1] https://lkml.org/lkml/2015/7/17/418
---
 drivers/xen/biomerge.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
index 8ae2fc90..4da69db 100644
--- a/drivers/xen/biomerge.c
+++ b/drivers/xen/biomerge.c
@@ -6,10 +6,18 @@
 bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
 			       const struct bio_vec *vec2)
 {
+#if XEN_PAGE_SIZE == PAGE_SIZE
 	unsigned long bfn1 = pfn_to_bfn(page_to_pfn(vec1->bv_page));
 	unsigned long bfn2 = pfn_to_bfn(page_to_pfn(vec2->bv_page));
 
 	return __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
 		((bfn1 == bfn2) || ((bfn1+1) == bfn2));
+#else
+	/*
+	 * XXX: Add support for merging bio_vec when using different page
+	 * size in Xen and Linux.
+	 */
+	return 0;
+#endif
 }
 EXPORT_SYMBOL(xen_biovec_phys_mergeable);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread
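
A standalone sketch of the merge rule that the patch above confines to the
XEN_PAGE_SIZE == PAGE_SIZE case; the constants and frame numbers are
illustrative assumptions:

    #include <stdbool.h>
    #include <stdio.h>

    /* 4KB-page rule from the patch: backend frames must be identical or
     * adjacent for two biovecs to merge. */
    static bool bfns_mergeable(unsigned long bfn1, unsigned long bfn2)
    {
        return bfn1 == bfn2 || bfn1 + 1 == bfn2;
    }

    int main(void)
    {
        /* With 64KB pages, one struct page spans 65536 / 4096 = 16 Xen
         * frames, and grant mappings may make them non-contiguous, so
         * this single-frame test is no longer meaningful. */
        printf("frames per 64KB page: %lu\n", 65536UL / 4096UL);
        printf("bfn 100/101 mergeable: %d\n", bfns_mergeable(100, 101));
        printf("bfn 100/200 mergeable: %d\n", bfns_mergeable(100, 200));
        return 0;
    }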


* [PATCH v4 10/20] xen/xenbus: Use Xen page definition
  2015-09-07 15:33 ` Julien Grall
  (?)
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, David Vrabel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky

All the rings (xenstore and the PV rings) are always based on the page
granularity of Xen.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>

    Changes in v3:
        - Fix errors reported by checkpatch.pl
        - s/MFN/GFN/ based on the new naming
        - Add David and Stefano's reviewed-by

    Changes in v2:
        - Also update the ring mapping function
---
 drivers/xen/xenbus/xenbus_client.c | 6 +++---
 drivers/xen/xenbus/xenbus_probe.c  | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 2ba09c1..359e654 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -388,7 +388,7 @@ int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
 		}
 		grefs[i] = err;
 
-		vaddr = vaddr + PAGE_SIZE;
+		vaddr = vaddr + XEN_PAGE_SIZE;
 	}
 
 	return 0;
@@ -555,7 +555,7 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
 	if (!node)
 		return -ENOMEM;
 
-	area = alloc_vm_area(PAGE_SIZE * nr_grefs, ptes);
+	area = alloc_vm_area(XEN_PAGE_SIZE * nr_grefs, ptes);
 	if (!area) {
 		kfree(node);
 		return -ENOMEM;
@@ -750,7 +750,7 @@ static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
 		unsigned long addr;
 
 		memset(&unmap[i], 0, sizeof(unmap[i]));
-		addr = (unsigned long)vaddr + (PAGE_SIZE * i);
+		addr = (unsigned long)vaddr + (XEN_PAGE_SIZE * i);
 		unmap[i].host_addr = arbitrary_virt_to_machine(
 			lookup_address(addr, &level)).maddr;
 		unmap[i].dev_bus_addr = 0;
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3cbe055..33a31cf 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -802,7 +802,8 @@ static int __init xenbus_init(void)
 			goto out_error;
 		xen_store_gfn = (unsigned long)v;
 		xen_store_interface =
-			xen_remap(xen_store_gfn << PAGE_SHIFT, PAGE_SIZE);
+			xen_remap(xen_store_gfn << XEN_PAGE_SHIFT,
+				  XEN_PAGE_SIZE);
 		break;
 	default:
 		pr_warn("Xenstore state unknown\n");
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread
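
The sizing arithmetic behind the xenbus change above, as a sketch: one grant
reference covers exactly one 4KB Xen frame, so a multi-gref ring area is
measured in XEN_PAGE_SIZE units. The constants are assumptions matching the
series:

    #include <stdio.h>

    #define XEN_PAGE_SIZE 4096UL   /* Xen frame size, fixed by the ABI */
    #define PAGE_SIZE_64K 65536UL  /* kernel page size on a 64KB build */

    int main(void)
    {
        unsigned long nr_grefs = 4;  /* hypothetical multi-page ring */

        /* Sizing with the kernel PAGE_SIZE would over-map by a factor
         * of 16 on a 64KB build. */
        printf("XEN_PAGE_SIZE-based area: %lu bytes\n",
               nr_grefs * XEN_PAGE_SIZE);
        printf("PAGE_SIZE-based area:     %lu bytes\n",
               nr_grefs * PAGE_SIZE_64K);
        return 0;
    }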


* [PATCH v4 11/20] tty/hvc: xen: Use xen page definition
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Greg Kroah-Hartman, Jiri Slaby, David Vrabel,
	Boris Ostrovsky, linuxppc-dev

The console ring is always based on the page granularity of Xen.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: linuxppc-dev@lists.ozlabs.org

    Changes in v4:
        - The ring is always 4K (i.e. XEN_PAGE_SIZE), so no need to
        map with PAGE_SIZE. This was correctly done in v2 but lost with
        the rebase to the "s/mfn/gfn/" series

    Changes in v3:
        - Some changes have been moved to the series "Use correctly the
        Xen memory terminologies in Linux".
        - Add Stefano's reviewed-by
---
 drivers/tty/hvc/hvc_xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 10beb15..fa816b7 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -230,7 +230,7 @@ static int xen_hvm_console_init(void)
 	if (r < 0 || v == 0)
 		goto err;
 	gfn = v;
-	info->intf = xen_remap(gfn << PAGE_SHIFT, PAGE_SIZE);
+	info->intf = xen_remap(gfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE);
 	if (info->intf == NULL)
 		goto err;
 	info->vtermno = HVC_COOKIE;
@@ -472,7 +472,7 @@ static int xencons_resume(struct xenbus_device *dev)
 	struct xencons_info *info = dev_get_drvdata(&dev->dev);
 
 	xencons_disconnect_backend(info);
-	memset(info->intf, 0, PAGE_SIZE);
+	memset(info->intf, 0, XEN_PAGE_SIZE);
 	return xencons_connect_backend(dev, info);
 }
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread
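
The same arithmetic applies to the console ring: the interface lives in a
single 4KB Xen frame, so its physical address is gfn << XEN_PAGE_SHIFT (12),
not gfn << PAGE_SHIFT (16 on a 64KB build). A sketch with a made-up frame
number:

    #include <stdio.h>

    #define XEN_PAGE_SHIFT 12  /* Xen frames are 4KB regardless of PAGE_SHIFT */
    #define PAGE_SHIFT_64K 16  /* kernel shift on a 64KB build */

    int main(void)
    {
        unsigned long gfn = 0x80123;  /* hypothetical console frame number */

        printf("correct address: %#lx\n", gfn << XEN_PAGE_SHIFT);
        printf("wrong (PAGE_SHIFT-based) address: %#lx\n",
               gfn << PAGE_SHIFT_64K);
        return 0;
    }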


* [PATCH v4 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	David Vrabel, Wei Liu

For ARM64 guests, Linux is able to support either 64K or 4K page
granularity. However, the hypercall interface is always based on 4K
page granularity.

With 64K page granularity, a single page will be spread over multiple
Xen frames.

To avoid splitting the page into 4K frames, take advantage of the
extent_order field to directly allocate/free chunks of the Linux page
size.

Note that PVMMU is only used for PV guests (which are x86) where the
page granularity is always 4KB. Some BUILD_BUG_ONs have been added to
ensure this, because the code has not been modified.

Signed-off-by: Julien Grall <julien.grall@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

Note that two BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE) checks are kept
in the code built for the PV MMU in order to have at least one even if
we ever decide to drop one of the code sections.

    Changes in v4:
        - s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming
        - Use the lru field in the page to get a list of pages when
        decreasing the memory reservation. This avoids using a static
        array to store the pages (see v3).
        - Update comment for EXTENT_ORDER.

    Changes in v3:
        - Fix errors reported by checkpatch.pl
        - s/mfn/gfn/ based on the new naming
        - Rather than splitting the page into 4KB chunks, use the
        extent_order field to directly allocate a Linux page size. This
        avoids lots of code for no benefit.

    Changes in v2:
        - Use xen_apply_to_page to split a page in 4K chunk
        - It's not necessary to have a smaller frame list. Re-use
        PAGE_SIZE
        - Convert reserve_additional_memory to use XEN_... macro
---
 drivers/xen/balloon.c | 59 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 44 insertions(+), 15 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index c79329f..3babf13 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -70,6 +70,11 @@
 #include <xen/features.h>
 #include <xen/page.h>
 
+/* Use one extent per PAGE_SIZE to avoid to break down the page into
+ * multiple frame.
+ */
+#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
+
 /*
  * balloon_process() state:
  *
@@ -230,6 +235,11 @@ static enum bp_state reserve_additional_memory(long credit)
 	nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
 
 #ifdef CONFIG_XEN_HAVE_PVMMU
+	/* We don't support PV MMU when Linux and Xen is using
+	 * different page granularity.
+	 */
+	BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
         /*
          * add_memory() will build page tables for the new memory so
          * the p2m must contain invalid entries so the correct
@@ -326,11 +336,11 @@ static enum bp_state reserve_additional_memory(long credit)
 static enum bp_state increase_reservation(unsigned long nr_pages)
 {
 	int rc;
-	unsigned long  pfn, i;
+	unsigned long i;
 	struct page   *page;
 	struct xen_memory_reservation reservation = {
 		.address_bits = 0,
-		.extent_order = 0,
+		.extent_order = EXTENT_ORDER,
 		.domid        = DOMID_SELF
 	};
 
@@ -352,7 +362,11 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
 			nr_pages = i;
 			break;
 		}
-		frame_list[i] = page_to_pfn(page);
+
+		/* XENMEM_populate_physmap requires a PFN based on Xen
+		 * granularity.
+		 */
+		frame_list[i] = page_to_xen_pfn(page);
 		page = balloon_next_page(page);
 	}
 
@@ -366,10 +380,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
 		page = balloon_retrieve(false);
 		BUG_ON(page == NULL);
 
-		pfn = page_to_pfn(page);
-
 #ifdef CONFIG_XEN_HAVE_PVMMU
+		/* We don't support PV MMU when Linux and Xen is using
+		 * different page granularity.
+		 */
+		BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
 		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+			unsigned long pfn = page_to_pfn(page);
+
 			set_phys_to_machine(pfn, frame_list[i]);
 
 			/* Link back into the page tables if not highmem. */
@@ -396,14 +415,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
 static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 {
 	enum bp_state state = BP_DONE;
-	unsigned long  pfn, i;
-	struct page   *page;
+	unsigned long i;
+	struct page *page, *tmp;
 	int ret;
 	struct xen_memory_reservation reservation = {
 		.address_bits = 0,
-		.extent_order = 0,
+		.extent_order = EXTENT_ORDER,
 		.domid        = DOMID_SELF
 	};
+	LIST_HEAD(pages);
 
 #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
 	if (balloon_stats.hotplug_pages) {
@@ -425,8 +445,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 			break;
 		}
 		scrub_page(page);
-
-		frame_list[i] = page_to_pfn(page);
+		list_add(&page->lru, &pages);
 	}
 
 	/*
@@ -438,14 +457,23 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 	 */
 	kmap_flush_unused();
 
-	/* Update direct mapping, invalidate P2M, and add to balloon. */
-	for (i = 0; i < nr_pages; i++) {
-		pfn = frame_list[i];
-		frame_list[i] = pfn_to_gfn(pfn);
-		page = pfn_to_page(pfn);
+	/*
+	 * Setup the frame, update direct mapping, invalidate P2M,
+	 * and add to balloon.
+	 */
+	list_for_each_entry_safe(page, tmp, &pages, lru) {
+		/* XENMEM_decrease_reservation requires a GFN */
+		frame_list[i] = xen_page_to_gfn(page);
 
 #ifdef CONFIG_XEN_HAVE_PVMMU
+		/* We don't support PV MMU when Linux and Xen is using
+		 * different page granularity.
+		 */
+		BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
 		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+			unsigned long pfn = page_to_pfn(page);
+
 			if (!PageHighMem(page)) {
 				ret = HYPERVISOR_update_va_mapping(
 						(unsigned long)__va(pfn << PAGE_SHIFT),
@@ -455,6 +483,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 			__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 		}
 #endif
+		list_del(&page->lru);
 
 		balloon_append(page);
 	}
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread
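
A worked example of the EXTENT_ORDER definition added above, under the
series' assumptions (64KB kernel pages, 4KB Xen frames, so XEN_PFN_PER_PAGE
is 16); fls_demo() stands in for the kernel's fls(), which returns the
1-based index of the last set bit:

    #include <stdio.h>

    /* Stand-in for the kernel's fls(): 1-based index of the last set bit. */
    static int fls_demo(unsigned int x)
    {
        int r = 0;

        while (x) {
            r++;
            x >>= 1;
        }
        return r;
    }

    int main(void)
    {
        unsigned int xen_pfn_per_page = 65536 / 4096;       /* 16 */
        int extent_order = fls_demo(xen_pfn_per_page) - 1;  /* fls(16) - 1 = 4 */

        /* One order-4 extent = 2^4 * 4KB = 64KB = exactly one kernel
         * page, so the balloon never has to split a page into frames. */
        printf("EXTENT_ORDER = %d, extent size = %u bytes\n",
               extent_order, (1u << extent_order) * 4096u);
        return 0;
    }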

* [PATCH v4 13/20] xen/events: fifo: Make it running on 64KB granularity
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, David Vrabel, Stefano Stabellini,
	Konrad Rzeszutek Wilk, Boris Ostrovsky

Only use the first 4KB of the page to store the event channel info. It
means that we will waste 60KB every time we allocate a page for:
     * control block: a page is allocated per CPU
     * event array: a page is allocated every time we need to expand it

I think we can reduce the memory waste for these two areas by:

    * control block: sharing it between multiple vCPUs. Although it will
    require some bookkeeping in order to not free the page when a CPU
    goes offline while the other CPUs sharing the page are still online

    * event array: always extending the event array by 64KB (i.e. 16 4KB
    chunks). That would require more care when we fail to expand the
    event array.
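
To make the waste concrete, a quick back-of-the-envelope check (my
arithmetic, not part of the patch), assuming a 32-bit event_word_t and
a 64KB kernel page:

    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t event_word_t;

    #define XEN_PAGE_SIZE 4096UL   /* fixed by the hypercall ABI */
    #define PAGE_SIZE     65536UL  /* 64KB Linux page (assumption) */

    int main(void)
    {
        /* Only the first Xen page of each allocation is shared. */
        printf("event words per Xen page: %lu\n",
               XEN_PAGE_SIZE / sizeof(event_word_t));   /* 1024 */
        printf("wasted per allocated page: %luKB\n",
               (PAGE_SIZE - XEN_PAGE_SIZE) / 1024);     /* 60 */
        return 0;
    }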

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>

    Note I haven't updated the suggestion to reduce the memory waste
    after David's email [1]. I can do it if necessary.

    Changes in v3:
        - Add David and Stefano's reviewed-by

    [1] http://lists.xen.org/archives/html/xen-devel/2015-07/msg04596.html
---
 drivers/xen/events/events_base.c | 2 +-
 drivers/xen/events/events_fifo.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c49bb7a..00dd923 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -40,11 +40,11 @@
 #include <asm/idle.h>
 #include <asm/io_apic.h>
 #include <asm/xen/pci.h>
-#include <xen/page.h>
 #endif
 #include <asm/sync_bitops.h>
 #include <asm/xen/hypercall.h>
 #include <asm/xen/hypervisor.h>
+#include <xen/page.h>
 
 #include <xen/xen.h>
 #include <xen/hvm.h>
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index 1d4baf5..e3e9e3d 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -54,7 +54,7 @@
 
 #include "events_internal.h"
 
-#define EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t))
+#define EVENT_WORDS_PER_PAGE (XEN_PAGE_SIZE / sizeof(event_word_t))
 #define MAX_EVENT_ARRAY_PAGES (EVTCHN_FIFO_NR_CHANNELS / EVENT_WORDS_PER_PAGE)
 
 struct evtchn_fifo_queue {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread

* [PATCH v4 14/20] xen/grant-table: Make it running on 64KB granularity
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, David Vrabel, Russell King, Konrad Rzeszutek Wilk,
	Boris Ostrovsky

The Xen interface uses 4KB page granularity. This means that each
grant is 4KB.

The current implementation allocates a Linux page per grant. On Linux
using 64KB page granularity, only the first 4KB of the page will be
used.

We could decrease the memory wasted by sharing the page between multiple
grants. It will require some care with the {Set,Clear}ForeignPage macros.

Note that no changes have been made in the x86 code because both Linux
and Xen will only use 4KB page granularity.
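
To illustrate why the shift matters (illustrative numbers, not from
the patch): with 64KB pages PAGE_SHIFT is 16 while XEN_PAGE_SHIFT stays
12, so converting host_addr with the Linux shift would lose the 4KB
sub-page index that Xen expects:

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SHIFT     16  /* 64KB Linux page (assumption) */
    #define XEN_PAGE_SHIFT 12  /* 4KB Xen frame, fixed by the ABI */

    int main(void)
    {
        uint64_t host_addr = 0x40005000ULL; /* hypothetical address */

        printf("linux pfn: 0x%llx\n",
               (unsigned long long)(host_addr >> PAGE_SHIFT));     /* 0x4000 */
        printf("xen pfn:   0x%llx\n",
               (unsigned long long)(host_addr >> XEN_PAGE_SHIFT)); /* 0x40005 */
        return 0;
    }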

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>

    Changes in v3:
        - Add Stefano's reviewed-by

    Changes in v2:
        - Add David's reviewed-by
---
 arch/arm/xen/p2m.c        | 6 +++---
 drivers/xen/grant-table.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm/xen/p2m.c b/arch/arm/xen/p2m.c
index 887596c..0ed01f2 100644
--- a/arch/arm/xen/p2m.c
+++ b/arch/arm/xen/p2m.c
@@ -93,8 +93,8 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
 	for (i = 0; i < count; i++) {
 		if (map_ops[i].status)
 			continue;
-		set_phys_to_machine(map_ops[i].host_addr >> PAGE_SHIFT,
-				    map_ops[i].dev_bus_addr >> PAGE_SHIFT);
+		set_phys_to_machine(map_ops[i].host_addr >> XEN_PAGE_SHIFT,
+				    map_ops[i].dev_bus_addr >> XEN_PAGE_SHIFT);
 	}
 
 	return 0;
@@ -108,7 +108,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
 	int i;
 
 	for (i = 0; i < count; i++) {
-		set_phys_to_machine(unmap_ops[i].host_addr >> PAGE_SHIFT,
+		set_phys_to_machine(unmap_ops[i].host_addr >> XEN_PAGE_SHIFT,
 				    INVALID_P2M_ENTRY);
 	}
 
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 7b4e1cf..99ed9c2 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -642,7 +642,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
 	if (xen_auto_xlat_grant_frames.count)
 		return -EINVAL;
 
-	vaddr = xen_remap(addr, PAGE_SIZE * max_nr_gframes);
+	vaddr = xen_remap(addr, XEN_PAGE_SIZE * max_nr_gframes);
 	if (vaddr == NULL) {
 		pr_warn("Failed to ioremap gnttab share frames (addr=%pa)!\n",
 			&addr);
@@ -654,7 +654,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
 		return -ENOMEM;
 	}
 	for (i = 0; i < max_nr_gframes; i++)
-		pfn[i] = PFN_DOWN(addr) + i;
+		pfn[i] = XEN_PFN_DOWN(addr) + i;
 
 	xen_auto_xlat_grant_frames.vaddr = vaddr;
 	xen_auto_xlat_grant_frames.pfn = pfn;
@@ -1004,7 +1004,7 @@ static void gnttab_request_version(void)
 {
 	/* Only version 1 is used, which will always be available. */
 	grant_table_version = 1;
-	grefs_per_grant_frame = PAGE_SIZE / sizeof(struct grant_entry_v1);
+	grefs_per_grant_frame = XEN_PAGE_SIZE / sizeof(struct grant_entry_v1);
 	gnttab_interface = &gnttab_v1_ops;
 
 	pr_info("Grant tables using version %d layout\n", grant_table_version);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread

* [PATCH v4 15/20] block/xen-blkfront: Make it running on 64KB page granularity
  2015-09-07 15:33 ` Julien Grall
  (?)
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Roger Pau Monné, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, David Vrabel

The PV block protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux guest using 64KB page granularity to use
block devices on an unmodified Xen.

The block API uses segments which are at least the size of a Linux
page. Therefore, the driver has to break each page into 4KB chunks
before giving the page to the backend.

When breaking a 64KB segment into 4KB chunks, it is possible that some
chunks are empty. As the PV protocol always requires data in each
chunk, we have to count the number of Xen pages which will actually be
in use and avoid sending empty chunks, as sketched below.
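
A minimal sketch of that counting (mirroring what gnttab_count_grant
does, but with a hypothetical name and userspace scaffolding):

    #include <stdio.h>

    #define XEN_PAGE_SHIFT 12
    #define XEN_PAGE_SIZE  (1UL << XEN_PAGE_SHIFT)

    /* Number of 4KB Xen pages touched by [offset, offset + len) */
    static unsigned long count_grants(unsigned long offset,
                                      unsigned long len)
    {
        if (!len)
            return 0;
        return ((offset + len - 1) >> XEN_PAGE_SHIFT)
               - (offset >> XEN_PAGE_SHIFT) + 1;
    }

    int main(void)
    {
        printf("%lu\n", count_grants(0, 65536));  /* full 64KB seg: 16 */
        printf("%lu\n", count_grants(1024, 512)); /* small request: 1 */
        return 0;
    }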

Note that a pre-defined number of grants is reserved before preparing
the request. This pre-defined number is based on the number and the
maximum size of the segments. If each segment contains a very small
amount of data, the driver may reserve too many grants (16 grants are
reserved per segment with 64KB page granularity); see the arithmetic
below.
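
For scale (my arithmetic, using the macros introduced in this patch and
assuming an 8-byte struct blkif_request_segment):

    #include <stdio.h>

    #define PAGE_SIZE                 65536UL /* 64KB kernel (assumption) */
    #define XEN_PAGE_SIZE             4096UL
    #define GRANTS_PER_PSEG           (PAGE_SIZE / XEN_PAGE_SIZE) /* 16 */
    #define GRANTS_PER_INDIRECT_FRAME (XEN_PAGE_SIZE / 8)         /* 512 */
    #define INDIRECT_GREFS(g) \
        (((g) + GRANTS_PER_INDIRECT_FRAME - 1) / GRANTS_PER_INDIRECT_FRAME)

    int main(void)
    {
        unsigned long nr_phys_segments = 32; /* hypothetical request */
        unsigned long max_grefs = nr_phys_segments * GRANTS_PER_PSEG;

        max_grefs += INDIRECT_GREFS(max_grefs);
        printf("grants reserved: %lu\n", max_grefs); /* 512 + 1 = 513 */
        return 0;
    }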

Furthermore, in the case of persistent grants we allocate one Linux page
per grant although only the first 4KB of the page will actually be in
use. This could be improved by sharing the page between multiple grants.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>

Improvements such as 64KB grant support are not taken into consideration
in this patch because we have the requirement to run Linux with 64KB
pages on an unmodified Xen.

    Changes in v4:
        - Rebase after d50babbe300eedf33ea5b00a12c5df3a05bd96c7 "
        xen-blkfront: introduce blkfront_gather_backend_features()"
        - Fix typos
        - Add Roger's acked-by

    Changes in v3:
        - Use DIV_ROUND_UP in INDIRECT_GREFS
        - Split lines over 80 characters whenever it's possible
        - s/mfn/gfn/ based on the new naming
        - The grant callback no longer allows changing the len
        (it wasn't used here).
        - gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
        - Use gnttab_count_grant to get the number of grants in a sg
        - Do some renaming to use the correct variable every time

    Changes in v2:
        - Use gnttab_foreach_grant to split a Linux page into grant
---
 drivers/block/xen-blkfront.c | 324 ++++++++++++++++++++++++++++---------------
 1 file changed, 213 insertions(+), 111 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 4232cbd..f2cdc73 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -78,6 +78,7 @@ struct blk_shadow {
 	struct grant **grants_used;
 	struct grant **indirect_grants;
 	struct scatterlist *sg;
+	unsigned int num_sg;
 };
 
 struct split_bio {
@@ -107,8 +108,12 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
-#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
-#define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
+#define BLK_RING_SIZE(info)	\
+	__CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * (info)->nr_ring_pages)
+
+#define BLK_MAX_RING_SIZE	\
+	__CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * XENBUS_MAX_RING_PAGES)
+
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
  * characters are enough. Define to 20 to keep consist with backend.
@@ -147,6 +152,7 @@ struct blkfront_info
 	unsigned int discard_granularity;
 	unsigned int discard_alignment;
 	unsigned int feature_persistent:1;
+	/* Number of 4KB segments handled */
 	unsigned int max_indirect_segments;
 	int is_ready;
 	struct blk_mq_tag_set tag_set;
@@ -175,10 +181,23 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define DEV_NAME	"xvd"	/* name in /dev */
 
-#define SEGS_PER_INDIRECT_FRAME \
-	(PAGE_SIZE/sizeof(struct blkif_request_segment))
-#define INDIRECT_GREFS(_segs) \
-	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+/*
+ * Grants are always the same size as a Xen page (i.e 4KB).
+ * A physical segment is always the same size as a Linux page.
+ * Number of grants per physical segment
+ */
+#define GRANTS_PER_PSEG	(PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define GRANTS_PER_INDIRECT_FRAME \
+	(XEN_PAGE_SIZE / sizeof(struct blkif_request_segment))
+
+#define PSEGS_PER_INDIRECT_FRAME	\
+	(GRANTS_PER_INDIRECT_FRAME / GRANTS_PER_PSEG)
+
+#define INDIRECT_GREFS(_grants)		\
+	DIV_ROUND_UP(_grants, GRANTS_PER_INDIRECT_FRAME)
+
+#define GREFS(_psegs)	((_psegs) * GRANTS_PER_PSEG)
 
 static int blkfront_setup_indirect(struct blkfront_info *info);
 static int blkfront_gather_backend_features(struct blkfront_info *info);
@@ -466,14 +485,100 @@ static int blkif_queue_discard_req(struct request *req)
 	return 0;
 }
 
+struct setup_rw_req {
+	unsigned int grant_idx;
+	struct blkif_request_segment *segments;
+	struct blkfront_info *info;
+	struct blkif_request *ring_req;
+	grant_ref_t gref_head;
+	unsigned int id;
+	/* Only used when persistent grant is used and it's a read request */
+	bool need_copy;
+	unsigned int bvec_off;
+	char *bvec_data;
+};
+
+static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
+				     unsigned int len, void *data)
+{
+	struct setup_rw_req *setup = data;
+	int n, ref;
+	struct grant *gnt_list_entry;
+	unsigned int fsect, lsect;
+	/* Convenient aliases */
+	unsigned int grant_idx = setup->grant_idx;
+	struct blkif_request *ring_req = setup->ring_req;
+	struct blkfront_info *info = setup->info;
+	struct blk_shadow *shadow = &info->shadow[setup->id];
+
+	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
+	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
+		if (setup->segments)
+			kunmap_atomic(setup->segments);
+
+		n = grant_idx / GRANTS_PER_INDIRECT_FRAME;
+		gnt_list_entry = get_indirect_grant(&setup->gref_head, info);
+		shadow->indirect_grants[n] = gnt_list_entry;
+		setup->segments = kmap_atomic(gnt_list_entry->page);
+		ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
+	}
+
+	gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
+	ref = gnt_list_entry->gref;
+	shadow->grants_used[grant_idx] = gnt_list_entry;
+
+	if (setup->need_copy) {
+		void *shared_data;
+
+		shared_data = kmap_atomic(gnt_list_entry->page);
+		/*
+		 * this does not wipe data stored outside the
+		 * range sg->offset..sg->offset+sg->length.
+		 * Therefore, blkback *could* see data from
+		 * previous requests. This is OK as long as
+		 * persistent grants are shared with just one
+		 * domain. It may need refactoring if this
+		 * changes
+		 */
+		memcpy(shared_data + offset,
+		       setup->bvec_data + setup->bvec_off,
+		       len);
+
+		kunmap_atomic(shared_data);
+		setup->bvec_off += len;
+	}
+
+	fsect = offset >> 9;
+	lsect = fsect + (len >> 9) - 1;
+	if (ring_req->operation != BLKIF_OP_INDIRECT) {
+		ring_req->u.rw.seg[grant_idx] =
+			(struct blkif_request_segment) {
+				.gref       = ref,
+				.first_sect = fsect,
+				.last_sect  = lsect };
+	} else {
+		setup->segments[grant_idx % GRANTS_PER_INDIRECT_FRAME] =
+			(struct blkif_request_segment) {
+				.gref       = ref,
+				.first_sect = fsect,
+				.last_sect  = lsect };
+	}
+
+	(setup->grant_idx)++;
+}
+
 static int blkif_queue_rw_req(struct request *req)
 {
 	struct blkfront_info *info = req->rq_disk->private_data;
 	struct blkif_request *ring_req;
 	unsigned long id;
-	unsigned int fsect, lsect;
-	int i, ref, n;
-	struct blkif_request_segment *segments = NULL;
+	int i;
+	struct setup_rw_req setup = {
+		.grant_idx = 0,
+		.segments = NULL,
+		.info = info,
+		.need_copy = rq_data_dir(req) && info->feature_persistent,
+	};
 
 	/*
 	 * Used to store if we are able to queue the request by just using
@@ -481,25 +586,23 @@ static int blkif_queue_rw_req(struct request *req)
 	 * as there are not sufficiently many free.
 	 */
 	bool new_persistent_gnts;
-	grant_ref_t gref_head;
-	struct grant *gnt_list_entry = NULL;
 	struct scatterlist *sg;
-	int nseg, max_grefs;
+	int num_sg, max_grefs, num_grant;
 
-	max_grefs = req->nr_phys_segments;
+	max_grefs = req->nr_phys_segments * GRANTS_PER_PSEG;
 	if (max_grefs > BLKIF_MAX_SEGMENTS_PER_REQUEST)
 		/*
 		 * If we are using indirect segments we need to account
 		 * for the indirect grefs used in the request.
 		 */
-		max_grefs += INDIRECT_GREFS(req->nr_phys_segments);
+		max_grefs += INDIRECT_GREFS(max_grefs);
 
 	/* Check if we have enough grants to allocate a requests */
 	if (info->persistent_gnts_c < max_grefs) {
 		new_persistent_gnts = 1;
 		if (gnttab_alloc_grant_references(
 		    max_grefs - info->persistent_gnts_c,
-		    &gref_head) < 0) {
+		    &setup.gref_head) < 0) {
 			gnttab_request_free_callback(
 				&info->callback,
 				blkif_restart_queue_callback,
@@ -516,12 +619,19 @@ static int blkif_queue_rw_req(struct request *req)
 	info->shadow[id].request = req;
 
 	BUG_ON(info->max_indirect_segments == 0 &&
-	       req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
 	BUG_ON(info->max_indirect_segments &&
-	       req->nr_phys_segments > info->max_indirect_segments);
-	nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
+
+	num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+	num_grant = 0;
+	/* Calculate the number of grants used */
+	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
+	       num_grant += gnttab_count_grant(sg->offset, sg->length);
+
 	ring_req->u.rw.id = id;
-	if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+	info->shadow[id].num_sg = num_sg;
+	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 		/*
 		 * The indirect operation can only be a BLKIF_OP_READ or
 		 * BLKIF_OP_WRITE
@@ -532,7 +642,7 @@ static int blkif_queue_rw_req(struct request *req)
 			BLKIF_OP_WRITE : BLKIF_OP_READ;
 		ring_req->u.indirect.sector_number = (blkif_sector_t)blk_rq_pos(req);
 		ring_req->u.indirect.handle = info->handle;
-		ring_req->u.indirect.nr_segments = nseg;
+		ring_req->u.indirect.nr_segments = num_grant;
 	} else {
 		ring_req->u.rw.sector_number = (blkif_sector_t)blk_rq_pos(req);
 		ring_req->u.rw.handle = info->handle;
@@ -560,73 +670,30 @@ static int blkif_queue_rw_req(struct request *req)
 				ring_req->operation = 0;
 			}
 		}
-		ring_req->u.rw.nr_segments = nseg;
-	}
-	for_each_sg(info->shadow[id].sg, sg, nseg, i) {
-		fsect = sg->offset >> 9;
-		lsect = fsect + (sg->length >> 9) - 1;
-
-		if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
-		    (i % SEGS_PER_INDIRECT_FRAME == 0)) {
-			if (segments)
-				kunmap_atomic(segments);
-
-			n = i / SEGS_PER_INDIRECT_FRAME;
-			gnt_list_entry = get_indirect_grant(&gref_head, info);
-			info->shadow[id].indirect_grants[n] = gnt_list_entry;
-			segments = kmap_atomic(gnt_list_entry->page);
-			ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
-		}
-
-		gnt_list_entry = get_grant(&gref_head,
-					   xen_page_to_gfn(sg_page(sg)),
-					   info);
-		ref = gnt_list_entry->gref;
-
-		info->shadow[id].grants_used[i] = gnt_list_entry;
-
-		if (rq_data_dir(req) && info->feature_persistent) {
-			char *bvec_data;
-			void *shared_data;
+		ring_req->u.rw.nr_segments = num_grant;
+	}
 
-			BUG_ON(sg->offset + sg->length > PAGE_SIZE);
+	setup.ring_req = ring_req;
+	setup.id = id;
+	for_each_sg(info->shadow[id].sg, sg, num_sg, i) {
+		BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
-			shared_data = kmap_atomic(gnt_list_entry->page);
-			bvec_data = kmap_atomic(sg_page(sg));
+		if (setup.need_copy) {
+			setup.bvec_off = sg->offset;
+			setup.bvec_data = kmap_atomic(sg_page(sg));
+		}
 
-			/*
-			 * this does not wipe data stored outside the
-			 * range sg->offset..sg->offset+sg->length.
-			 * Therefore, blkback *could* see data from
-			 * previous requests. This is OK as long as
-			 * persistent grants are shared with just one
-			 * domain. It may need refactoring if this
-			 * changes
-			 */
-			memcpy(shared_data + sg->offset,
-			       bvec_data   + sg->offset,
-			       sg->length);
+		gnttab_foreach_grant_in_range(sg_page(sg),
+					      sg->offset,
+					      sg->length,
+					      blkif_setup_rw_req_grant,
+					      &setup);
 
-			kunmap_atomic(bvec_data);
-			kunmap_atomic(shared_data);
-		}
-		if (ring_req->operation != BLKIF_OP_INDIRECT) {
-			ring_req->u.rw.seg[i] =
-					(struct blkif_request_segment) {
-						.gref       = ref,
-						.first_sect = fsect,
-						.last_sect  = lsect };
-		} else {
-			n = i % SEGS_PER_INDIRECT_FRAME;
-			segments[n] =
-				(struct blkif_request_segment) {
-						.gref       = ref,
-						.first_sect = fsect,
-						.last_sect  = lsect };
-		}
+		if (setup.need_copy)
+			kunmap_atomic(setup.bvec_data);
 	}
-	if (segments)
-		kunmap_atomic(segments);
+	if (setup.segments)
+		kunmap_atomic(setup.segments);
 
 	info->ring.req_prod_pvt++;
 
@@ -634,7 +701,7 @@ static int blkif_queue_rw_req(struct request *req)
 	info->shadow[id].req = *ring_req;
 
 	if (new_persistent_gnts)
-		gnttab_free_grant_references(gref_head);
+		gnttab_free_grant_references(setup.gref_head);
 
 	return 0;
 }
@@ -751,14 +818,14 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 	/* Hard sector size and max sectors impersonate the equiv. hardware. */
 	blk_queue_logical_block_size(rq, sector_size);
 	blk_queue_physical_block_size(rq, physical_sector_size);
-	blk_queue_max_hw_sectors(rq, (segments * PAGE_SIZE) / 512);
+	blk_queue_max_hw_sectors(rq, (segments * XEN_PAGE_SIZE) / 512);
 
 	/* Each segment in a request is up to an aligned page in size. */
 	blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
 	blk_queue_max_segment_size(rq, PAGE_SIZE);
 
 	/* Ensure a merged request will fit in a single I/O ring slot. */
-	blk_queue_max_segments(rq, segments);
+	blk_queue_max_segments(rq, segments / GRANTS_PER_PSEG);
 
 	/* Make sure buffer addresses are sector-aligned. */
 	blk_queue_dma_alignment(rq, 511);
@@ -1117,32 +1184,65 @@ free_shadow:
 
 }
 
+struct copy_from_grant {
+	const struct blk_shadow *s;
+	unsigned int grant_idx;
+	unsigned int bvec_offset;
+	char *bvec_data;
+};
+
+static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
+				  unsigned int len, void *data)
+{
+	struct copy_from_grant *info = data;
+	char *shared_data;
+	/* Convenient aliases */
+	const struct blk_shadow *s = info->s;
+
+	shared_data = kmap_atomic(s->grants_used[info->grant_idx]->page);
+
+	memcpy(info->bvec_data + info->bvec_offset,
+	       shared_data + offset, len);
+
+	info->bvec_offset += len;
+	info->grant_idx++;
+
+	kunmap_atomic(shared_data);
+}
+
 static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			     struct blkif_response *bret)
 {
 	int i = 0;
 	struct scatterlist *sg;
-	char *bvec_data;
-	void *shared_data;
-	int nseg;
+	int num_sg, num_grant;
+	struct copy_from_grant data = {
+		.s = s,
+		.grant_idx = 0,
+	};
 
-	nseg = s->req.operation == BLKIF_OP_INDIRECT ?
+	num_grant = s->req.operation == BLKIF_OP_INDIRECT ?
 		s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
+	num_sg = s->num_sg;
 
 	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
-		for_each_sg(s->sg, sg, nseg, i) {
+		for_each_sg(s->sg, sg, num_sg, i) {
 			BUG_ON(sg->offset + sg->length > PAGE_SIZE);
-			shared_data = kmap_atomic(s->grants_used[i]->page);
-			bvec_data = kmap_atomic(sg_page(sg));
-			memcpy(bvec_data   + sg->offset,
-			       shared_data + sg->offset,
-			       sg->length);
-			kunmap_atomic(bvec_data);
-			kunmap_atomic(shared_data);
+
+			data.bvec_offset = sg->offset;
+			data.bvec_data = kmap_atomic(sg_page(sg));
+
+			gnttab_foreach_grant_in_range(sg_page(sg),
+						      sg->offset,
+						      sg->length,
+						      blkif_copy_from_grant,
+						      &data);
+
+			kunmap_atomic(data.bvec_data);
 		}
 	}
 	/* Add the persistent grant into the list of free grants */
-	for (i = 0; i < nseg; i++) {
+	for (i = 0; i < num_grant; i++) {
 		if (gnttab_query_foreign_access(s->grants_used[i]->gref)) {
 			/*
 			 * If the grant is still mapped by the backend (the
@@ -1168,7 +1268,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 		}
 	}
 	if (s->req.operation == BLKIF_OP_INDIRECT) {
-		for (i = 0; i < INDIRECT_GREFS(nseg); i++) {
+		for (i = 0; i < INDIRECT_GREFS(num_grant); i++) {
 			if (gnttab_query_foreign_access(s->indirect_grants[i]->gref)) {
 				if (!info->feature_persistent)
 					pr_alert_ratelimited("backed has not unmapped grant: %u\n",
@@ -1312,7 +1412,7 @@ static int setup_blkring(struct xenbus_device *dev,
 {
 	struct blkif_sring *sring;
 	int err, i;
-	unsigned long ring_size = info->nr_ring_pages * PAGE_SIZE;
+	unsigned long ring_size = info->nr_ring_pages * XEN_PAGE_SIZE;
 	grant_ref_t gref[XENBUS_MAX_RING_PAGES];
 
 	for (i = 0; i < info->nr_ring_pages; i++)
@@ -1643,8 +1743,8 @@ static int blkif_recover(struct blkfront_info *info)
 			atomic_set(&split_bio->pending, pending);
 			split_bio->bio = bio;
 			for (i = 0; i < pending; i++) {
-				offset = (i * segs * PAGE_SIZE) >> 9;
-				size = min((unsigned int)(segs * PAGE_SIZE) >> 9,
+				offset = (i * segs * XEN_PAGE_SIZE) >> 9;
+				size = min((unsigned int)(segs * XEN_PAGE_SIZE) >> 9,
 					   (unsigned int)bio_sectors(bio) - offset);
 				cloned_bio = bio_clone(bio, GFP_NOIO);
 				BUG_ON(cloned_bio == NULL);
@@ -1755,15 +1855,17 @@ static void blkfront_setup_discard(struct blkfront_info *info)
 
 static int blkfront_setup_indirect(struct blkfront_info *info)
 {
-	unsigned int segs;
+	unsigned int psegs, grants;
 	int err, i;
 
 	if (info->max_indirect_segments == 0)
-		segs = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+		grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
 	else
-		segs = info->max_indirect_segments;
+		grants = info->max_indirect_segments;
+	psegs = grants / GRANTS_PER_PSEG;
 
-	err = fill_grant_buffer(info, (segs + INDIRECT_GREFS(segs)) * BLK_RING_SIZE(info));
+	err = fill_grant_buffer(info,
+				(grants + INDIRECT_GREFS(grants)) * BLK_RING_SIZE(info));
 	if (err)
 		goto out_of_memory;
 
@@ -1773,7 +1875,7 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 		 * grants, we need to allocate a set of pages that can be
 		 * used for mapping indirect grefs
 		 */
-		int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE(info);
+		int num = INDIRECT_GREFS(grants) * BLK_RING_SIZE(info);
 
 		BUG_ON(!list_empty(&info->indirect_pages));
 		for (i = 0; i < num; i++) {
@@ -1786,20 +1888,20 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 
 	for (i = 0; i < BLK_RING_SIZE(info); i++) {
 		info->shadow[i].grants_used = kzalloc(
-			sizeof(info->shadow[i].grants_used[0]) * segs,
+			sizeof(info->shadow[i].grants_used[0]) * grants,
 			GFP_NOIO);
-		info->shadow[i].sg = kzalloc(sizeof(info->shadow[i].sg[0]) * segs, GFP_NOIO);
+		info->shadow[i].sg = kzalloc(sizeof(info->shadow[i].sg[0]) * psegs, GFP_NOIO);
 		if (info->max_indirect_segments)
 			info->shadow[i].indirect_grants = kzalloc(
 				sizeof(info->shadow[i].indirect_grants[0]) *
-				INDIRECT_GREFS(segs),
+				INDIRECT_GREFS(grants),
 				GFP_NOIO);
 		if ((info->shadow[i].grants_used == NULL) ||
 			(info->shadow[i].sg == NULL) ||
 		     (info->max_indirect_segments &&
 		     (info->shadow[i].indirect_grants == NULL)))
 			goto out_of_memory;
-		sg_init_table(info->shadow[i].sg, segs);
+		sg_init_table(info->shadow[i].sg, psegs);
 	}
 
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread

* [PATCH v4 15/20] block/xen-blkfront: Make it running on 64KB page granularity
@ 2015-09-07 15:33   ` Julien Grall
  0 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel, Julien Grall,
	David Vrabel, Boris Ostrovsky, linux-arm-kernel,
	Roger Pau Monné

The PV block protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity using block
device on a non-modified Xen.

The block API is using segment which should at least be the size of a
Linux page. Therefore, the driver will have to break the page in chunk
of 4K before giving the page to the backend.

When breaking a 64KB segment in 4KB chunks, it is possible that some
chunks are empty. As the PV protocol always require to have data in the
chunk, we have to count the number of Xen page which will be in use and
avoid sending empty chunks.

Note that, a pre-defined number of grants are reserved before preparing
the request. This pre-defined number is based on the number and the
maximum size of the segments. If each segment contains a very small
amount of data, the driver may reserve too many grants (16 grants is
reserved per segment with 64KB page granularity).

Furthermore, in the case of persistent grants we allocate one Linux page
per grant although only the first 4KB of the page will be effectively
in use. This could be improved by sharing the page with multiple grants.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>

Improvement such as support 64KB grant is not taken into consideration in
this patch because we have the requirement to run a Linux using 64KB page
on a non-modified Xen.

    Changes in v4:
        - Rebase after d50babbe300eedf33ea5b00a12c5df3a05bd96c7 "
        xen-blkfront: introduce blkfront_gather_backend_features()"
        - Fix typoes
        - Add Roger's acked-by

    Changes in v3:
        - Use DIV_ROUND_UP in INDIRECT_GREFS
        - Split lines over 80 characters whenever it's possible
        - s/mfn/gfn/ based on the new naming
        - The grant callback doesn't allow anymore to change the len
        (wasn't used here).
        - gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
        - Use gnttab_count_grant to get the number of grants in a sg
        - Do some renaming to use the correct variable every time

    Changes in v2:
        - Use gnttab_foreach_grant to split a Linux page into grant
---
 drivers/block/xen-blkfront.c | 324 ++++++++++++++++++++++++++++---------------
 1 file changed, 213 insertions(+), 111 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 4232cbd..f2cdc73 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -78,6 +78,7 @@ struct blk_shadow {
 	struct grant **grants_used;
 	struct grant **indirect_grants;
 	struct scatterlist *sg;
+	unsigned int num_sg;
 };
 
 struct split_bio {
@@ -107,8 +108,12 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
-#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
-#define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
+#define BLK_RING_SIZE(info)	\
+	__CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * (info)->nr_ring_pages)
+
+#define BLK_MAX_RING_SIZE	\
+	__CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * XENBUS_MAX_RING_PAGES)
+
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
  * characters are enough. Define to 20 to keep consist with backend.
@@ -147,6 +152,7 @@ struct blkfront_info
 	unsigned int discard_granularity;
 	unsigned int discard_alignment;
 	unsigned int feature_persistent:1;
+	/* Number of 4KB segments handled */
 	unsigned int max_indirect_segments;
 	int is_ready;
 	struct blk_mq_tag_set tag_set;
@@ -175,10 +181,23 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define DEV_NAME	"xvd"	/* name in /dev */
 
-#define SEGS_PER_INDIRECT_FRAME \
-	(PAGE_SIZE/sizeof(struct blkif_request_segment))
-#define INDIRECT_GREFS(_segs) \
-	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+/*
+ * Grants are always the same size as a Xen page (i.e 4KB).
+ * A physical segment is always the same size as a Linux page.
+ * Number of grants per physical segment
+ */
+#define GRANTS_PER_PSEG	(PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define GRANTS_PER_INDIRECT_FRAME \
+	(XEN_PAGE_SIZE / sizeof(struct blkif_request_segment))
+
+#define PSEGS_PER_INDIRECT_FRAME	\
+	(GRANTS_INDIRECT_FRAME / GRANTS_PSEGS)
+
+#define INDIRECT_GREFS(_grants)		\
+	DIV_ROUND_UP(_grants, GRANTS_PER_INDIRECT_FRAME)
+
+#define GREFS(_psegs)	((_psegs) * GRANTS_PER_PSEG)
 
 static int blkfront_setup_indirect(struct blkfront_info *info);
 static int blkfront_gather_backend_features(struct blkfront_info *info);
@@ -466,14 +485,100 @@ static int blkif_queue_discard_req(struct request *req)
 	return 0;
 }
 
+struct setup_rw_req {
+	unsigned int grant_idx;
+	struct blkif_request_segment *segments;
+	struct blkfront_info *info;
+	struct blkif_request *ring_req;
+	grant_ref_t gref_head;
+	unsigned int id;
+	/* Only used when persistent grant is used and it's a read request */
+	bool need_copy;
+	unsigned int bvec_off;
+	char *bvec_data;
+};
+
+static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
+				     unsigned int len, void *data)
+{
+	struct setup_rw_req *setup = data;
+	int n, ref;
+	struct grant *gnt_list_entry;
+	unsigned int fsect, lsect;
+	/* Convenient aliases */
+	unsigned int grant_idx = setup->grant_idx;
+	struct blkif_request *ring_req = setup->ring_req;
+	struct blkfront_info *info = setup->info;
+	struct blk_shadow *shadow = &info->shadow[setup->id];
+
+	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
+	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
+		if (setup->segments)
+			kunmap_atomic(setup->segments);
+
+		n = grant_idx / GRANTS_PER_INDIRECT_FRAME;
+		gnt_list_entry = get_indirect_grant(&setup->gref_head, info);
+		shadow->indirect_grants[n] = gnt_list_entry;
+		setup->segments = kmap_atomic(gnt_list_entry->page);
+		ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
+	}
+
+	gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
+	ref = gnt_list_entry->gref;
+	shadow->grants_used[grant_idx] = gnt_list_entry;
+
+	if (setup->need_copy) {
+		void *shared_data;
+
+		shared_data = kmap_atomic(gnt_list_entry->page);
+		/*
+		 * this does not wipe data stored outside the
+		 * range sg->offset..sg->offset+sg->length.
+		 * Therefore, blkback *could* see data from
+		 * previous requests. This is OK as long as
+		 * persistent grants are shared with just one
+		 * domain. It may need refactoring if this
+		 * changes
+		 */
+		memcpy(shared_data + offset,
+		       setup->bvec_data + setup->bvec_off,
+		       len);
+
+		kunmap_atomic(shared_data);
+		setup->bvec_off += len;
+	}
+
+	fsect = offset >> 9;
+	lsect = fsect + (len >> 9) - 1;
+	if (ring_req->operation != BLKIF_OP_INDIRECT) {
+		ring_req->u.rw.seg[grant_idx] =
+			(struct blkif_request_segment) {
+				.gref       = ref,
+				.first_sect = fsect,
+				.last_sect  = lsect };
+	} else {
+		setup->segments[grant_idx % GRANTS_PER_INDIRECT_FRAME] =
+			(struct blkif_request_segment) {
+				.gref       = ref,
+				.first_sect = fsect,
+				.last_sect  = lsect };
+	}
+
+	(setup->grant_idx)++;
+}
+
 static int blkif_queue_rw_req(struct request *req)
 {
 	struct blkfront_info *info = req->rq_disk->private_data;
 	struct blkif_request *ring_req;
 	unsigned long id;
-	unsigned int fsect, lsect;
-	int i, ref, n;
-	struct blkif_request_segment *segments = NULL;
+	int i;
+	struct setup_rw_req setup = {
+		.grant_idx = 0,
+		.segments = NULL,
+		.info = info,
+		.need_copy = rq_data_dir(req) && info->feature_persistent,
+	};
 
 	/*
 	 * Used to store if we are able to queue the request by just using
@@ -481,25 +586,23 @@ static int blkif_queue_rw_req(struct request *req)
 	 * as there are not sufficiently many free.
 	 */
 	bool new_persistent_gnts;
-	grant_ref_t gref_head;
-	struct grant *gnt_list_entry = NULL;
 	struct scatterlist *sg;
-	int nseg, max_grefs;
+	int num_sg, max_grefs, num_grant;
 
-	max_grefs = req->nr_phys_segments;
+	max_grefs = req->nr_phys_segments * GRANTS_PER_PSEG;
 	if (max_grefs > BLKIF_MAX_SEGMENTS_PER_REQUEST)
 		/*
 		 * If we are using indirect segments we need to account
 		 * for the indirect grefs used in the request.
 		 */
-		max_grefs += INDIRECT_GREFS(req->nr_phys_segments);
+		max_grefs += INDIRECT_GREFS(max_grefs);
 
 	/* Check if we have enough grants to allocate a request */
 	if (info->persistent_gnts_c < max_grefs) {
 		new_persistent_gnts = 1;
 		if (gnttab_alloc_grant_references(
 		    max_grefs - info->persistent_gnts_c,
-		    &gref_head) < 0) {
+		    &setup.gref_head) < 0) {
 			gnttab_request_free_callback(
 				&info->callback,
 				blkif_restart_queue_callback,
@@ -516,12 +619,19 @@ static int blkif_queue_rw_req(struct request *req)
 	info->shadow[id].request = req;
 
 	BUG_ON(info->max_indirect_segments == 0 &&
-	       req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
 	BUG_ON(info->max_indirect_segments &&
-	       req->nr_phys_segments > info->max_indirect_segments);
-	nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
+
+	num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+	num_grant = 0;
+	/* Calculate the number of grants used */
+	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
+	       num_grant += gnttab_count_grant(sg->offset, sg->length);
+
 	ring_req->u.rw.id = id;
-	if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+	info->shadow[id].num_sg = num_sg;
+	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 		/*
 		 * The indirect operation can only be a BLKIF_OP_READ or
 		 * BLKIF_OP_WRITE
@@ -532,7 +642,7 @@ static int blkif_queue_rw_req(struct request *req)
 			BLKIF_OP_WRITE : BLKIF_OP_READ;
 		ring_req->u.indirect.sector_number = (blkif_sector_t)blk_rq_pos(req);
 		ring_req->u.indirect.handle = info->handle;
-		ring_req->u.indirect.nr_segments = nseg;
+		ring_req->u.indirect.nr_segments = num_grant;
 	} else {
 		ring_req->u.rw.sector_number = (blkif_sector_t)blk_rq_pos(req);
 		ring_req->u.rw.handle = info->handle;
@@ -560,73 +670,30 @@ static int blkif_queue_rw_req(struct request *req)
 				ring_req->operation = 0;
 			}
 		}
-		ring_req->u.rw.nr_segments = nseg;
-	}
-	for_each_sg(info->shadow[id].sg, sg, nseg, i) {
-		fsect = sg->offset >> 9;
-		lsect = fsect + (sg->length >> 9) - 1;
-
-		if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
-		    (i % SEGS_PER_INDIRECT_FRAME == 0)) {
-			if (segments)
-				kunmap_atomic(segments);
-
-			n = i / SEGS_PER_INDIRECT_FRAME;
-			gnt_list_entry = get_indirect_grant(&gref_head, info);
-			info->shadow[id].indirect_grants[n] = gnt_list_entry;
-			segments = kmap_atomic(gnt_list_entry->page);
-			ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
-		}
-
-		gnt_list_entry = get_grant(&gref_head,
-					   xen_page_to_gfn(sg_page(sg)),
-					   info);
-		ref = gnt_list_entry->gref;
-
-		info->shadow[id].grants_used[i] = gnt_list_entry;
-
-		if (rq_data_dir(req) && info->feature_persistent) {
-			char *bvec_data;
-			void *shared_data;
+		ring_req->u.rw.nr_segments = num_grant;
+	}
 
-			BUG_ON(sg->offset + sg->length > PAGE_SIZE);
+	setup.ring_req = ring_req;
+	setup.id = id;
+	for_each_sg(info->shadow[id].sg, sg, num_sg, i) {
+		BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
-			shared_data = kmap_atomic(gnt_list_entry->page);
-			bvec_data = kmap_atomic(sg_page(sg));
+		if (setup.need_copy) {
+			setup.bvec_off = sg->offset;
+			setup.bvec_data = kmap_atomic(sg_page(sg));
+		}
 
-			/*
-			 * this does not wipe data stored outside the
-			 * range sg->offset..sg->offset+sg->length.
-			 * Therefore, blkback *could* see data from
-			 * previous requests. This is OK as long as
-			 * persistent grants are shared with just one
-			 * domain. It may need refactoring if this
-			 * changes
-			 */
-			memcpy(shared_data + sg->offset,
-			       bvec_data   + sg->offset,
-			       sg->length);
+		gnttab_foreach_grant_in_range(sg_page(sg),
+					      sg->offset,
+					      sg->length,
+					      blkif_setup_rw_req_grant,
+					      &setup);
 
-			kunmap_atomic(bvec_data);
-			kunmap_atomic(shared_data);
-		}
-		if (ring_req->operation != BLKIF_OP_INDIRECT) {
-			ring_req->u.rw.seg[i] =
-					(struct blkif_request_segment) {
-						.gref       = ref,
-						.first_sect = fsect,
-						.last_sect  = lsect };
-		} else {
-			n = i % SEGS_PER_INDIRECT_FRAME;
-			segments[n] =
-				(struct blkif_request_segment) {
-						.gref       = ref,
-						.first_sect = fsect,
-						.last_sect  = lsect };
-		}
+		if (setup.need_copy)
+			kunmap_atomic(setup.bvec_data);
 	}
-	if (segments)
-		kunmap_atomic(segments);
+	if (setup.segments)
+		kunmap_atomic(setup.segments);
 
 	info->ring.req_prod_pvt++;
 
@@ -634,7 +701,7 @@ static int blkif_queue_rw_req(struct request *req)
 	info->shadow[id].req = *ring_req;
 
 	if (new_persistent_gnts)
-		gnttab_free_grant_references(gref_head);
+		gnttab_free_grant_references(setup.gref_head);
 
 	return 0;
 }
@@ -751,14 +818,14 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 	/* Hard sector size and max sectors impersonate the equiv. hardware. */
 	blk_queue_logical_block_size(rq, sector_size);
 	blk_queue_physical_block_size(rq, physical_sector_size);
-	blk_queue_max_hw_sectors(rq, (segments * PAGE_SIZE) / 512);
+	blk_queue_max_hw_sectors(rq, (segments * XEN_PAGE_SIZE) / 512);
 
 	/* Each segment in a request is up to an aligned page in size. */
 	blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
 	blk_queue_max_segment_size(rq, PAGE_SIZE);
 
 	/* Ensure a merged request will fit in a single I/O ring slot. */
-	blk_queue_max_segments(rq, segments);
+	blk_queue_max_segments(rq, segments / GRANTS_PER_PSEG);
 
 	/* Make sure buffer addresses are sector-aligned. */
 	blk_queue_dma_alignment(rq, 511);
@@ -1117,32 +1184,65 @@ free_shadow:
 
 }
 
+struct copy_from_grant {
+	const struct blk_shadow *s;
+	unsigned int grant_idx;
+	unsigned int bvec_offset;
+	char *bvec_data;
+};
+
+static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
+				  unsigned int len, void *data)
+{
+	struct copy_from_grant *info = data;
+	char *shared_data;
+	/* Convenient aliases */
+	const struct blk_shadow *s = info->s;
+
+	shared_data = kmap_atomic(s->grants_used[info->grant_idx]->page);
+
+	memcpy(info->bvec_data + info->bvec_offset,
+	       shared_data + offset, len);
+
+	info->bvec_offset += len;
+	info->grant_idx++;
+
+	kunmap_atomic(shared_data);
+}
+
 static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			     struct blkif_response *bret)
 {
 	int i = 0;
 	struct scatterlist *sg;
-	char *bvec_data;
-	void *shared_data;
-	int nseg;
+	int num_sg, num_grant;
+	struct copy_from_grant data = {
+		.s = s,
+		.grant_idx = 0,
+	};
 
-	nseg = s->req.operation == BLKIF_OP_INDIRECT ?
+	num_grant = s->req.operation == BLKIF_OP_INDIRECT ?
 		s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
+	num_sg = s->num_sg;
 
 	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
-		for_each_sg(s->sg, sg, nseg, i) {
+		for_each_sg(s->sg, sg, num_sg, i) {
 			BUG_ON(sg->offset + sg->length > PAGE_SIZE);
-			shared_data = kmap_atomic(s->grants_used[i]->page);
-			bvec_data = kmap_atomic(sg_page(sg));
-			memcpy(bvec_data   + sg->offset,
-			       shared_data + sg->offset,
-			       sg->length);
-			kunmap_atomic(bvec_data);
-			kunmap_atomic(shared_data);
+
+			data.bvec_offset = sg->offset;
+			data.bvec_data = kmap_atomic(sg_page(sg));
+
+			gnttab_foreach_grant_in_range(sg_page(sg),
+						      sg->offset,
+						      sg->length,
+						      blkif_copy_from_grant,
+						      &data);
+
+			kunmap_atomic(data.bvec_data);
 		}
 	}
 	/* Add the persistent grant into the list of free grants */
-	for (i = 0; i < nseg; i++) {
+	for (i = 0; i < num_grant; i++) {
 		if (gnttab_query_foreign_access(s->grants_used[i]->gref)) {
 			/*
 			 * If the grant is still mapped by the backend (the
@@ -1168,7 +1268,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 		}
 	}
 	if (s->req.operation == BLKIF_OP_INDIRECT) {
-		for (i = 0; i < INDIRECT_GREFS(nseg); i++) {
+		for (i = 0; i < INDIRECT_GREFS(num_grant); i++) {
 			if (gnttab_query_foreign_access(s->indirect_grants[i]->gref)) {
 				if (!info->feature_persistent)
 					pr_alert_ratelimited("backed has not unmapped grant: %u\n",
@@ -1312,7 +1412,7 @@ static int setup_blkring(struct xenbus_device *dev,
 {
 	struct blkif_sring *sring;
 	int err, i;
-	unsigned long ring_size = info->nr_ring_pages * PAGE_SIZE;
+	unsigned long ring_size = info->nr_ring_pages * XEN_PAGE_SIZE;
 	grant_ref_t gref[XENBUS_MAX_RING_PAGES];
 
 	for (i = 0; i < info->nr_ring_pages; i++)
@@ -1643,8 +1743,8 @@ static int blkif_recover(struct blkfront_info *info)
 			atomic_set(&split_bio->pending, pending);
 			split_bio->bio = bio;
 			for (i = 0; i < pending; i++) {
-				offset = (i * segs * PAGE_SIZE) >> 9;
-				size = min((unsigned int)(segs * PAGE_SIZE) >> 9,
+				offset = (i * segs * XEN_PAGE_SIZE) >> 9;
+				size = min((unsigned int)(segs * XEN_PAGE_SIZE) >> 9,
 					   (unsigned int)bio_sectors(bio) - offset);
 				cloned_bio = bio_clone(bio, GFP_NOIO);
 				BUG_ON(cloned_bio == NULL);
@@ -1755,15 +1855,17 @@ static void blkfront_setup_discard(struct blkfront_info *info)
 
 static int blkfront_setup_indirect(struct blkfront_info *info)
 {
-	unsigned int segs;
+	unsigned int psegs, grants;
 	int err, i;
 
 	if (info->max_indirect_segments == 0)
-		segs = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+		grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
 	else
-		segs = info->max_indirect_segments;
+		grants = info->max_indirect_segments;
+	psegs = grants / GRANTS_PER_PSEG;
 
-	err = fill_grant_buffer(info, (segs + INDIRECT_GREFS(segs)) * BLK_RING_SIZE(info));
+	err = fill_grant_buffer(info,
+				(grants + INDIRECT_GREFS(grants)) * BLK_RING_SIZE(info));
 	if (err)
 		goto out_of_memory;
 
@@ -1773,7 +1875,7 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 		 * grants, we need to allocate a set of pages that can be
 		 * used for mapping indirect grefs
 		 */
-		int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE(info);
+		int num = INDIRECT_GREFS(grants) * BLK_RING_SIZE(info);
 
 		BUG_ON(!list_empty(&info->indirect_pages));
 		for (i = 0; i < num; i++) {
@@ -1786,20 +1888,20 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 
 	for (i = 0; i < BLK_RING_SIZE(info); i++) {
 		info->shadow[i].grants_used = kzalloc(
-			sizeof(info->shadow[i].grants_used[0]) * segs,
+			sizeof(info->shadow[i].grants_used[0]) * grants,
 			GFP_NOIO);
-		info->shadow[i].sg = kzalloc(sizeof(info->shadow[i].sg[0]) * segs, GFP_NOIO);
+		info->shadow[i].sg = kzalloc(sizeof(info->shadow[i].sg[0]) * psegs, GFP_NOIO);
 		if (info->max_indirect_segments)
 			info->shadow[i].indirect_grants = kzalloc(
 				sizeof(info->shadow[i].indirect_grants[0]) *
-				INDIRECT_GREFS(segs),
+				INDIRECT_GREFS(grants),
 				GFP_NOIO);
 		if ((info->shadow[i].grants_used == NULL) ||
 			(info->shadow[i].sg == NULL) ||
 		     (info->max_indirect_segments &&
 		     (info->shadow[i].indirect_grants == NULL)))
 			goto out_of_memory;
-		sg_init_table(info->shadow[i].sg, segs);
+		sg_init_table(info->shadow[i].sg, psegs);
 	}
 
 
-- 
2.1.4



^ permalink raw reply related	[flat|nested] 144+ messages in thread

* [PATCH v4 16/20] block/xen-blkback: Make it running on 64KB page granularity
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Roger Pau Monné, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, David Vrabel

The PV block protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux kernel using 64KB page granularity to act
as a block backend for a non-modified Xen.

It's only necessary to adapt the ring size and the number of requests
per indirect frame. The rest of the code relies on the grant table
code.
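
To put numbers on it (a back-of-the-envelope sketch, assuming the
usual 8-byte blkif_request_segment): the rings are now initialised
with XEN_PAGE_SIZE * nr_grefs rather than PAGE_SIZE * nr_grefs, so a
64KB guest sees the same ring layout as a 4KB one, and one indirect
frame carries XEN_PAGE_SIZE / 8 = 512 Xen-page entries, i.e.
512 / 16 = 32 guest segments per frame when each 64KB segment spans
16 grants.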

Note that the grant table code allocates a Linux page per grant, which
wastes 60KB for every grant when Linux uses 64KB page granularity.
This could be improved by sharing the page between multiple grants.
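
(Concretely: a grant covers 4KB, so with 64KB pages 60KB of each
allocated page sits unused; packing 64KB / 4KB = 16 grants into one
Linux page would remove that overhead.)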

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: "Roger Pau Monné" <roger.pau@citrix.com>
---

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>

Improvements such as support of 64KB grants are not taken into
consideration in this patch because the requirement is to run a
Linux using 64KB pages on a non-modified Xen.

This has been tested only with a loop device. I plan to test passing a
hard drive partition, but I haven't converted the swiotlb code yet.

    Changes in v4:
        - Add Roger's acked-by

    Changes in v3:
        - Use DIV_ROUND_UP in INDIRECT_PAGES to avoid a line over 80
        characters
---
 drivers/block/xen-blkback/blkback.c |  5 +++--
 drivers/block/xen-blkback/common.h  | 17 +++++++++++++----
 drivers/block/xen-blkback/xenbus.c  |  9 ++++++---
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 954c002..802319a 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -961,7 +961,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 		seg[n].nsec = segments[i].last_sect -
 			segments[i].first_sect + 1;
 		seg[n].offset = (segments[i].first_sect << 9);
-		if ((segments[i].last_sect >= (PAGE_SIZE >> 9)) ||
+		if ((segments[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
 		    (segments[i].last_sect < segments[i].first_sect)) {
 			rc = -EINVAL;
 			goto unmap;
@@ -1210,6 +1210,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
 	req_operation = req->operation == BLKIF_OP_INDIRECT ?
 			req->u.indirect.indirect_op : req->operation;
+
 	if ((req->operation == BLKIF_OP_INDIRECT) &&
 	    (req_operation != BLKIF_OP_READ) &&
 	    (req_operation != BLKIF_OP_WRITE)) {
@@ -1268,7 +1269,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 			seg[i].nsec = req->u.rw.seg[i].last_sect -
 				req->u.rw.seg[i].first_sect + 1;
 			seg[i].offset = (req->u.rw.seg[i].first_sect << 9);
-			if ((req->u.rw.seg[i].last_sect >= (PAGE_SIZE >> 9)) ||
+			if ((req->u.rw.seg[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
 			    (req->u.rw.seg[i].last_sect <
 			     req->u.rw.seg[i].first_sect))
 				goto fail_response;
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 45a044a..68e87a0 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -39,6 +39,7 @@
 #include <asm/pgalloc.h>
 #include <asm/hypervisor.h>
 #include <xen/grant_table.h>
+#include <xen/page.h>
 #include <xen/xenbus.h>
 #include <xen/interface/io/ring.h>
 #include <xen/interface/io/blkif.h>
@@ -51,12 +52,20 @@ extern unsigned int xen_blkif_max_ring_order;
  */
 #define MAX_INDIRECT_SEGMENTS 256
 
-#define SEGS_PER_INDIRECT_FRAME \
-	(PAGE_SIZE/sizeof(struct blkif_request_segment))
+/*
+ * Xen uses 4KB pages. The guest may use a different page size (4KB
+ * or 64KB). Number of Xen pages per segment:
+ */
+#define XEN_PAGES_PER_SEGMENT   (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define XEN_PAGES_PER_INDIRECT_FRAME \
+	(XEN_PAGE_SIZE/sizeof(struct blkif_request_segment))
+#define SEGS_PER_INDIRECT_FRAME	\
+	(XEN_PAGES_PER_INDIRECT_FRAME / XEN_PAGES_PER_SEGMENT)
+
 #define MAX_INDIRECT_PAGES \
 	((MAX_INDIRECT_SEGMENTS + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
-#define INDIRECT_PAGES(_segs) \
-	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+#define INDIRECT_PAGES(_segs) DIV_ROUND_UP(_segs, XEN_PAGES_PER_INDIRECT_FRAME)
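+
+/*
+ * Illustrative numbers, assuming the usual 8-byte
+ * blkif_request_segment: XEN_PAGES_PER_INDIRECT_FRAME = 4096 / 8 = 512.
+ * With 4KB guest pages SEGS_PER_INDIRECT_FRAME is 512; with 64KB guest
+ * pages it is 512 / 16 = 32, so MAX_INDIRECT_PAGES = 256 / 32 = 8.
+ */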
 
 /* Not a real protocol.  Used to generate ring structs which contain
  * the elements common to all protocols only.  This way we get a
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index deb3f00..edd27e4 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -176,21 +176,24 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
 	{
 		struct blkif_sring *sring;
 		sring = (struct blkif_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.native, sring, PAGE_SIZE * nr_grefs);
+		BACK_RING_INIT(&blkif->blk_rings.native, sring,
+			       XEN_PAGE_SIZE * nr_grefs);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_32:
 	{
 		struct blkif_x86_32_sring *sring_x86_32;
 		sring_x86_32 = (struct blkif_x86_32_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32, PAGE_SIZE * nr_grefs);
+		BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32,
+			       XEN_PAGE_SIZE * nr_grefs);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_64:
 	{
 		struct blkif_x86_64_sring *sring_x86_64;
 		sring_x86_64 = (struct blkif_x86_64_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64, PAGE_SIZE * nr_grefs);
+		BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64,
+			       XEN_PAGE_SIZE * nr_grefs);
 		break;
 	}
 	default:
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread

* [PATCH v4 17/20] net/xen-netfront: Make it running on 64KB page granularity
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, David Vrabel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, netdev

The PV network protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux kernel using 64KB page granularity to use
network devices on a non-modified Xen.

It's only necessary to adapt the ring size and break skb data into
small chunks of 4KB. The rest of the code relies on the grant table
code.
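
As an illustration of the chunking arithmetic (a minimal sketch only;
skb_chunk_count() below is hypothetical, the driver itself relies on
gnttab_count_grant() and gnttab_foreach_grant_in_range() from the
grant table code):

	/* Number of 4KB grant chunks covering [offset, offset + len). */
	static unsigned int skb_chunk_count(unsigned int offset, unsigned int len)
	{
		if (!len)
			return 0;

		return (offset + len - 1) / XEN_PAGE_SIZE
			- offset / XEN_PAGE_SIZE + 1;
	}

With 64KB pages a single 64KB chunk of data can span up to 17 grants
when misaligned, which is why the patch bounds the slot count with
MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1) instead of
MAX_SKB_FRAGS.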

Note that we allocate a Linux page for each rx skb but only the first
4KB is used. We may improve the memory usage by extending the size of
the rx skb.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: netdev@vger.kernel.org

Improvements such as support of 64KB grants are not taken into
consideration in this patch because the requirement is to run a Linux
using 64KB pages on a non-modified Xen.

Tested with workloads such as ping, ssh, wget, git... I would be happy
if someone could give details on how to test all the paths.

    Changes in v4:
        - s/gnttab_one_grant/gnttab_for_one_grant/ based on the new naming
        - Add David's reviewed-by

    Changes in v3:
        - Fix errors reported by checkpatch.pl
        - s/mfn/gfn/ base on the new naming
        - xennet_tx_setup_grant was calling itself, resulting in a
        guest stall when using iperf.
        - The grant callback no longer allows changing the len
        (wasn't used here)
        - gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
        - gnttab_page_grant_foreign_ref has been renamed to
        gnttab_page_grant_foreign_access_ref_one

    Changes in v2:
        - Use gnttab_foreach_grant to split a Linux page in grant
        - Fix count slots
---
 drivers/net/xen-netfront.c | 122 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 86 insertions(+), 36 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 47f791e..17b1013 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -74,8 +74,8 @@ struct netfront_cb {
 
 #define GRANT_INVALID_REF	0
 
-#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 /* Minimum number of Rx slots (includes slot for GSO metadata). */
 #define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1)
@@ -291,7 +291,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 		struct sk_buff *skb;
 		unsigned short id;
 		grant_ref_t ref;
-		unsigned long gfn;
+		struct page *page;
 		struct xen_netif_rx_request *req;
 
 		skb = xennet_alloc_one_rx_buffer(queue);
@@ -307,14 +307,13 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 		BUG_ON((signed short)ref < 0);
 		queue->grant_rx_ref[id] = ref;
 
-		gfn = xen_page_to_gfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+		page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
 
 		req = RING_GET_REQUEST(&queue->rx, req_prod);
-		gnttab_grant_foreign_access_ref(ref,
-						queue->info->xbdev->otherend_id,
-						gfn,
-						0);
-
+		gnttab_page_grant_foreign_access_ref_one(ref,
+							 queue->info->xbdev->otherend_id,
+							 page,
+							 0);
 		req->id = id;
 		req->gref = ref;
 	}
@@ -415,25 +414,33 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
 	xennet_maybe_wake_tx(queue);
 }
 
-static struct xen_netif_tx_request *xennet_make_one_txreq(
-	struct netfront_queue *queue, struct sk_buff *skb,
-	struct page *page, unsigned int offset, unsigned int len)
+struct xennet_gnttab_make_txreq {
+	struct netfront_queue *queue;
+	struct sk_buff *skb;
+	struct page *page;
+	struct xen_netif_tx_request *tx; /* Last request */
+	unsigned int size;
+};
+
+static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
+				  unsigned int len, void *data)
 {
+	struct xennet_gnttab_make_txreq *info = data;
 	unsigned int id;
 	struct xen_netif_tx_request *tx;
 	grant_ref_t ref;
-
-	len = min_t(unsigned int, PAGE_SIZE - offset, len);
+	/* convenient aliases */
+	struct page *page = info->page;
+	struct netfront_queue *queue = info->queue;
+	struct sk_buff *skb = info->skb;
 
 	id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
 	tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
 	ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
 	BUG_ON((signed short)ref < 0);
 
-	gnttab_grant_foreign_access_ref(ref,
-					queue->info->xbdev->otherend_id,
-					xen_page_to_gfn(page),
-					GNTMAP_readonly);
+	gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
+					gfn, GNTMAP_readonly);
 
 	queue->tx_skbs[id].skb = skb;
 	queue->grant_tx_page[id] = page;
@@ -445,7 +452,34 @@ static struct xen_netif_tx_request *xennet_make_one_txreq(
 	tx->size = len;
 	tx->flags = 0;
 
-	return tx;
+	info->tx = tx;
+	info->size += tx->size;
+}
+
+static struct xen_netif_tx_request *xennet_make_first_txreq(
+	struct netfront_queue *queue, struct sk_buff *skb,
+	struct page *page, unsigned int offset, unsigned int len)
+{
+	struct xennet_gnttab_make_txreq info = {
+		.queue = queue,
+		.skb = skb,
+		.page = page,
+		.size = 0,
+	};
+
+	gnttab_for_one_grant(page, offset, len, xennet_tx_setup_grant, &info);
+
+	return info.tx;
+}
+
+static void xennet_make_one_txreq(unsigned long gfn, unsigned int offset,
+				  unsigned int len, void *data)
+{
+	struct xennet_gnttab_make_txreq *info = data;
+
+	info->tx->flags |= XEN_NETTXF_more_data;
+	skb_get(info->skb);
+	xennet_tx_setup_grant(gfn, offset, len, data);
 }
 
 static struct xen_netif_tx_request *xennet_make_txreqs(
@@ -453,20 +487,30 @@ static struct xen_netif_tx_request *xennet_make_txreqs(
 	struct sk_buff *skb, struct page *page,
 	unsigned int offset, unsigned int len)
 {
+	struct xennet_gnttab_make_txreq info = {
+		.queue = queue,
+		.skb = skb,
+		.tx = tx,
+	};
+
 	/* Skip unused frames from start of page */
 	page += offset >> PAGE_SHIFT;
 	offset &= ~PAGE_MASK;
 
 	while (len) {
-		tx->flags |= XEN_NETTXF_more_data;
-		tx = xennet_make_one_txreq(queue, skb_get(skb),
-					   page, offset, len);
+		info.page = page;
+		info.size = 0;
+
+		gnttab_foreach_grant_in_range(page, offset, len,
+					      xennet_make_one_txreq,
+					      &info);
+
 		page++;
 		offset = 0;
-		len -= tx->size;
+		len -= info.size;
 	}
 
-	return tx;
+	return info.tx;
 }
 
 /*
@@ -476,9 +520,10 @@ static struct xen_netif_tx_request *xennet_make_txreqs(
 static int xennet_count_skb_slots(struct sk_buff *skb)
 {
 	int i, frags = skb_shinfo(skb)->nr_frags;
-	int pages;
+	int slots;
 
-	pages = PFN_UP(offset_in_page(skb->data) + skb_headlen(skb));
+	slots = gnttab_count_grant(offset_in_page(skb->data),
+				   skb_headlen(skb));
 
 	for (i = 0; i < frags; i++) {
 		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
@@ -488,10 +533,10 @@ static int xennet_count_skb_slots(struct sk_buff *skb)
 		/* Skip unused frames from start of page */
 		offset &= ~PAGE_MASK;
 
-		pages += PFN_UP(offset + size);
+		slots += gnttab_count_grant(offset, size);
 	}
 
-	return pages;
+	return slots;
 }
 
 static u16 xennet_select_queue(struct net_device *dev, struct sk_buff *skb,
@@ -512,6 +557,8 @@ static u16 xennet_select_queue(struct net_device *dev, struct sk_buff *skb,
 	return queue_idx;
 }
 
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct netfront_info *np = netdev_priv(dev);
@@ -546,7 +593,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	slots = xennet_count_skb_slots(skb);
-	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+	if (unlikely(slots > MAX_XEN_SKB_FRAGS + 1)) {
 		net_dbg_ratelimited("xennet: skb rides the rocket: %d slots, %d bytes\n",
 				    slots, skb->len);
 		if (skb_linearize(skb))
@@ -567,10 +614,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	/* First request for the linear area. */
-	first_tx = tx = xennet_make_one_txreq(queue, skb,
-					      page, offset, len);
-	page++;
-	offset = 0;
+	first_tx = tx = xennet_make_first_txreq(queue, skb,
+						page, offset, len);
+	offset += tx->size;
+	if (offset == PAGE_SIZE) {
+		page++;
+		offset = 0;
+	}
 	len -= tx->size;
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL)
@@ -732,7 +782,7 @@ static int xennet_get_responses(struct netfront_queue *queue,
 
 	for (;;) {
 		if (unlikely(rx->status < 0 ||
-			     rx->offset + rx->status > PAGE_SIZE)) {
+			     rx->offset + rx->status > XEN_PAGE_SIZE)) {
 			if (net_ratelimit())
 				dev_warn(dev, "rx->offset: %u, size: %d\n",
 					 rx->offset, rx->status);
@@ -1495,7 +1545,7 @@ static int setup_netfront(struct xenbus_device *dev,
 		goto fail;
 	}
 	SHARED_RING_INIT(txs);
-	FRONT_RING_INIT(&queue->tx, txs, PAGE_SIZE);
+	FRONT_RING_INIT(&queue->tx, txs, XEN_PAGE_SIZE);
 
 	err = xenbus_grant_ring(dev, txs, 1, &gref);
 	if (err < 0)
@@ -1509,7 +1559,7 @@ static int setup_netfront(struct xenbus_device *dev,
 		goto alloc_rx_ring_fail;
 	}
 	SHARED_RING_INIT(rxs);
-	FRONT_RING_INIT(&queue->rx, rxs, PAGE_SIZE);
+	FRONT_RING_INIT(&queue->rx, rxs, XEN_PAGE_SIZE);
 
 	err = xenbus_grant_ring(dev, rxs, 1, &gref);
 	if (err < 0)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread
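
The tx conversion above funnels every 4KB chunk of a page through a
gfn/offset/len callback. A minimal sketch of that callback contract,
assuming only the grant-table helpers introduced earlier in this
series (gnttab_foreach_grant_in_range() with the callback signature
visible in the diff); count_one() and grants_for_range() are
illustrative names, and the result merely recomputes what
gnttab_count_grant() already returns:

	#include <linux/mm.h>
	#include <xen/grant_table.h>

	/* Invoked once per 4KB grant covered by [offset, offset + len). */
	static void count_one(unsigned long gfn, unsigned int offset,
			      unsigned int len, void *data)
	{
		(*(unsigned int *)data)++;
	}

	static unsigned int grants_for_range(struct page *page,
					     unsigned int offset,
					     unsigned int len)
	{
		unsigned int n = 0;

		gnttab_foreach_grant_in_range(page, offset, len,
					      count_one, &n);
		return n; /* 16 for a full 64KB page, 1 for a 4KB one */
	}

With 4KB pages the callback fires exactly once per page, so the common
case keeps its old behaviour and cost.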

* [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Wei Liu, netdev

The PV network protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux kernel using 64KB page granularity to work
as a network backend on a non-modified Xen.

It is only necessary to adapt the ring size and break skb data into
small chunks of 4KB. The rest of the code relies on the grant table
code.
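
At its core, the chunking is the following loop (a sketch under the
gfn/offset/len conventions used by this series, not code from the
patch; emit_chunk() is a hypothetical stand-in for whatever consumes a
chunk, e.g. a grant or a copy operation):

	#include <linux/kernel.h>	/* min_t */
	#include <xen/page.h>		/* XEN_PAGE_SIZE */

	/* Hypothetical consumer of one sub-page chunk. */
	static void emit_chunk(unsigned long gfn, unsigned int offset,
			       unsigned int len);

	/* Assumes offset < XEN_PAGE_SIZE on entry. */
	static void split_into_4k_chunks(unsigned long gfn,
					 unsigned int offset,
					 unsigned int len)
	{
		while (len) {
			/* Never cross a 4KB Xen page boundary. */
			unsigned int bytes = min_t(unsigned int, len,
						   XEN_PAGE_SIZE - offset);

			emit_chunk(gfn, offset, bytes);
			gfn++;		/* next 4KB frame */
			offset = 0;	/* only the first chunk may be unaligned */
			len -= bytes;
		}
	}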

Signed-off-by: Julien Grall <julien.grall@citrix.com>

---
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: netdev@vger.kernel.org

Improvements such as support for 64KB grants are not taken into
consideration in this patch because we have the requirement to run a
Linux kernel using 64KB pages on a non-modified Xen.

Note that I haven't added a comment explaining why the offset is 0
after the first iteration: in short, only the first chunk of a
fragment can start at a non-zero offset within its page, and every
subsequent page is consumed from its beginning. See [1] for more
details.

[1] https://lkml.org/lkml/2015/8/10/456

    Changes in v4:
        - Add a comment to explain how we compute MAX_XEN_SKB_FRAGS
        (a worked instance of the computation follows this list)

    Changes in v3:
        - Fix errors reported by checkpatch.pl
        - s/mfn/gfn/ based on the new naming
        - gnttab_foreach_grant has been renamed to
        gnttab_foreach_grant_in_range
        - The grant callback no longer allows using less data. A
        helper has been added in netback to handle this.

    Changes in v2:
        - Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX
        - Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce
        fragments into a new skb
        - Use gnttab_foreach_grant to split a Linux page into grants
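
    A worked instance of that computation (an illustration, not text
    from the original posting): a slot covers one grant, a grant spans
    one 4KB Xen page, and the largest (GSO) packet is 64KB, so

        MAX_XEN_SKB_FRAGS = 65536 / XEN_PAGE_SIZE + 1
                          = 65536 / 4096 + 1
                          = 17

    where the "+ 1" covers a payload that starts at a non-page-aligned
    offset and therefore straddles one extra 4KB boundary.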
---
 drivers/net/xen-netback/common.h  |  18 +++--
 drivers/net/xen-netback/netback.c | 153 ++++++++++++++++++++++++--------------
 2 files changed, 110 insertions(+), 61 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..24cb365 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -44,6 +44,7 @@
 #include <xen/interface/grant_table.h>
 #include <xen/grant_table.h>
 #include <xen/xenbus.h>
+#include <xen/page.h>
 #include <linux/debugfs.h>
 
 typedef unsigned int pending_ring_idx_t;
@@ -64,8 +65,8 @@ struct pending_tx_info {
 	struct ubuf_info callback_struct;
 };
 
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 struct xenvif_rx_meta {
 	int id;
@@ -80,16 +81,21 @@ struct xenvif_rx_meta {
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0xFFFF
 
-#define MAX_BUFFER_OFFSET PAGE_SIZE
+#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
 
 #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
 
+/* The maximum number of frags is derived from the size of a grant (same
+ * as a Xen page size for now).
+ */
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
 /* It's possible for an skb to have a maximal number of frags
  * but still be less than MAX_BUFFER_OFFSET in size. Thus the
- * worst-case number of copy operations is MAX_SKB_FRAGS per
+ * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
  * ring slot.
  */
-#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
+#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
 #define NETBACK_INVALID_HANDLE -1
 
@@ -203,7 +209,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 /* Maximum number of Rx slots a to-guest packet may use, including the
  * slot needed for GSO meta-data.
  */
-#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
+#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1))
 
 enum state_bit_shift {
 	/* This bit marks that the vif is connected */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index d4c1bc7..b1649aa 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
 	return meta;
 }
 
+struct gop_frag_copy {
+	struct xenvif_queue *queue;
+	struct netrx_pending_operations *npo;
+	struct xenvif_rx_meta *meta;
+	int head;
+	int gso_type;
+
+	struct page *page;
+};
+
+static void xenvif_setup_copy_gop(unsigned long gfn,
+				  unsigned int offset,
+				  unsigned int *len,
+				  struct gop_frag_copy *info)
+{
+	struct gnttab_copy *copy_gop;
+	struct xen_page_foreign *foreign;
+	/* Convenient aliases */
+	struct xenvif_queue *queue = info->queue;
+	struct netrx_pending_operations *npo = info->npo;
+	struct page *page = info->page;
+
+	BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
+
+	if (npo->copy_off == MAX_BUFFER_OFFSET)
+		info->meta = get_next_rx_buffer(queue, npo);
+
+	if (npo->copy_off + *len > MAX_BUFFER_OFFSET)
+		*len = MAX_BUFFER_OFFSET - npo->copy_off;
+
+	copy_gop = npo->copy + npo->copy_prod++;
+	copy_gop->flags = GNTCOPY_dest_gref;
+	copy_gop->len = *len;
+
+	foreign = xen_page_foreign(page);
+	if (foreign) {
+		copy_gop->source.domid = foreign->domid;
+		copy_gop->source.u.ref = foreign->gref;
+		copy_gop->flags |= GNTCOPY_source_gref;
+	} else {
+		copy_gop->source.domid = DOMID_SELF;
+		copy_gop->source.u.gmfn = gfn;
+	}
+	copy_gop->source.offset = offset;
+
+	copy_gop->dest.domid = queue->vif->domid;
+	copy_gop->dest.offset = npo->copy_off;
+	copy_gop->dest.u.ref = npo->copy_gref;
+
+	npo->copy_off += *len;
+	info->meta->size += *len;
+
+	/* Leave a gap for the GSO descriptor. */
+	if (info->head && ((1 << info->gso_type) & queue->vif->gso_mask))
+		queue->rx.req_cons++;
+
+	info->head = 0; /* There must be something in this buffer now */
+}
+
+static void xenvif_gop_frag_copy_grant(unsigned long gfn,
+				       unsigned offset,
+				       unsigned int len,
+				       void *data)
+{
+	unsigned int bytes;
+
+	while (len) {
+		bytes = len;
+		xenvif_setup_copy_gop(gfn, offset, &bytes, data);
+		offset += bytes;
+		len -= bytes;
+	}
+}
+
 /*
  * Set up the grant operations for this fragment. If it's a flipping
  * interface, we also set up the unmap request from here.
@@ -272,83 +346,52 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 				 struct page *page, unsigned long size,
 				 unsigned long offset, int *head)
 {
-	struct gnttab_copy *copy_gop;
-	struct xenvif_rx_meta *meta;
+	struct gop_frag_copy info = {
+		.queue = queue,
+		.npo = npo,
+		.head = *head,
+		.gso_type = XEN_NETIF_GSO_TYPE_NONE,
+	};
 	unsigned long bytes;
-	int gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
 	if (skb_is_gso(skb)) {
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
-			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+			info.gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
 		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
-			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+			info.gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
 	}
 
 	/* Data must not cross a page boundary. */
 	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
 
-	meta = npo->meta + npo->meta_prod - 1;
+	info.meta = npo->meta + npo->meta_prod - 1;
 
 	/* Skip unused frames from start of page */
 	page += offset >> PAGE_SHIFT;
 	offset &= ~PAGE_MASK;
 
 	while (size > 0) {
-		struct xen_page_foreign *foreign;
-
 		BUG_ON(offset >= PAGE_SIZE);
-		BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
-
-		if (npo->copy_off == MAX_BUFFER_OFFSET)
-			meta = get_next_rx_buffer(queue, npo);
 
 		bytes = PAGE_SIZE - offset;
 		if (bytes > size)
 			bytes = size;
 
-		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
-			bytes = MAX_BUFFER_OFFSET - npo->copy_off;
-
-		copy_gop = npo->copy + npo->copy_prod++;
-		copy_gop->flags = GNTCOPY_dest_gref;
-		copy_gop->len = bytes;
-
-		foreign = xen_page_foreign(page);
-		if (foreign) {
-			copy_gop->source.domid = foreign->domid;
-			copy_gop->source.u.ref = foreign->gref;
-			copy_gop->flags |= GNTCOPY_source_gref;
-		} else {
-			copy_gop->source.domid = DOMID_SELF;
-			copy_gop->source.u.gmfn =
-				virt_to_gfn(page_address(page));
-		}
-		copy_gop->source.offset = offset;
-
-		copy_gop->dest.domid = queue->vif->domid;
-		copy_gop->dest.offset = npo->copy_off;
-		copy_gop->dest.u.ref = npo->copy_gref;
-
-		npo->copy_off += bytes;
-		meta->size += bytes;
-
-		offset += bytes;
+		info.page = page;
+		gnttab_foreach_grant_in_range(page, offset, bytes,
+					      xenvif_gop_frag_copy_grant,
+					      &info);
 		size -= bytes;
+		offset = 0;
 
-		/* Next frame */
-		if (offset == PAGE_SIZE && size) {
+		/* Next page */
+		if (size) {
 			BUG_ON(!PageCompound(page));
 			page++;
-			offset = 0;
 		}
-
-		/* Leave a gap for the GSO descriptor. */
-		if (*head && ((1 << gso_type) & queue->vif->gso_mask))
-			queue->rx.req_cons++;
-
-		*head = 0; /* There must be something in this buffer now. */
-
 	}
+
+	*head = info.head;
 }
 
 /*
@@ -747,7 +790,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
 		first->size -= txp->size;
 		slots++;
 
-		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
+		if (unlikely((txp->offset + txp->size) > XEN_PAGE_SIZE)) {
 			netdev_err(queue->vif->dev, "Cross page boundary, txp->offset: %u, size: %u\n",
 				 txp->offset, txp->size);
 			xenvif_fatal_tx_err(queue->vif);
@@ -1229,11 +1272,11 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 		}
 
 		/* No crossing a page as the payload mustn't fragment. */
-		if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
+		if (unlikely((txreq.offset + txreq.size) > XEN_PAGE_SIZE)) {
 			netdev_err(queue->vif->dev,
 				   "txreq.offset: %u, size: %u, end: %lu\n",
 				   txreq.offset, txreq.size,
-				   (unsigned long)(txreq.offset&~PAGE_MASK) + txreq.size);
+				   (unsigned long)(txreq.offset&~XEN_PAGE_MASK) + txreq.size);
 			xenvif_fatal_tx_err(queue->vif);
 			break;
 		}
@@ -1299,7 +1342,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 			virt_to_gfn(skb->data);
 		queue->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
 		queue->tx_copy_ops[*copy_ops].dest.offset =
-			offset_in_page(skb->data);
+			offset_in_page(skb->data) & ~XEN_PAGE_MASK;
 
 		queue->tx_copy_ops[*copy_ops].len = data_len;
 		queue->tx_copy_ops[*copy_ops].flags = GNTCOPY_source_gref;
@@ -1784,7 +1827,7 @@ int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 		goto err;
 
 	txs = (struct xen_netif_tx_sring *)addr;
-	BACK_RING_INIT(&queue->tx, txs, PAGE_SIZE);
+	BACK_RING_INIT(&queue->tx, txs, XEN_PAGE_SIZE);
 
 	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(queue->vif),
 				     &rx_ring_ref, 1, &addr);
@@ -1792,7 +1835,7 @@ int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 		goto err;
 
 	rxs = (struct xen_netif_rx_sring *)addr;
-	BACK_RING_INIT(&queue->rx, rxs, PAGE_SIZE);
+	BACK_RING_INIT(&queue->rx, rxs, XEN_PAGE_SIZE);
 
 	return 0;
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread
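
Both the netfront and netback conversions above rest on the Xen page
definitions introduced earlier in this series, which pin Xen's
granularity at 4KB independently of the kernel's PAGE_SIZE. A
condensed sketch of those definitions (consult the page-definition
patch in this series for the authoritative version):

	#include <linux/const.h>

	/* Xen's granularity is fixed at 4KB... */
	#define XEN_PAGE_SHIFT	12
	#define XEN_PAGE_SIZE	(_AC(1, UL) << XEN_PAGE_SHIFT)
	#define XEN_PAGE_MASK	(~(XEN_PAGE_SIZE - 1))

	/* ...so one Linux page covers one or more 4KB Xen frames:
	 * 1 with 4KB pages, 16 with 64KB pages on arm64.
	 */
	#define XEN_PFN_PER_PAGE	(PAGE_SIZE / XEN_PAGE_SIZE)

This is why the ring setup in both drivers switches from PAGE_SIZE to
XEN_PAGE_SIZE: the shared ring is a single 4KB Xen page even when the
kernel's own pages are 64KB.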

* [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity
@ 2015-09-07 15:33   ` Julien Grall
  0 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, stefano.stabellini, netdev, linux-kernel,
	Julien Grall, linux-arm-kernel

The PV network protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux kernel using 64KB page granularity to work
as a network backend on a non-modified Xen.

It is only necessary to adapt the ring size and break skb data into
small chunks of 4KB. The rest of the code relies on the grant table
code.

Signed-off-by: Julien Grall <julien.grall@citrix.com>

---
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: netdev@vger.kernel.org

Improvements such as support for 64KB grants are not taken into
consideration in this patch because we have the requirement to run a
Linux kernel using 64KB pages on a non-modified Xen.

Note that I haven't added a comment explaining why the offset is 0
after the first iteration. See [1] for more details.

[1] https://lkml.org/lkml/2015/8/10/456

    Changes in v4:
        - Add a comment to explain how we compute MAX_XEN_SKB_FRAGS

    Changes in v3:
        - Fix errors reported by checkpatch.pl
        - s/mfn/gfn/ based on the new naming
        - gnttab_foreach_grant has been renamed to
        gnttab_foreach_grant_in_range
        - The grant callback no longer allows using less data. A
        helper has been added in netback to handle this.

    Changes in v2:
        - Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX
        - Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce
        fragments into a new skb
        - Use gnttab_foreach_grant to split a Linux page into grants
---
 drivers/net/xen-netback/common.h  |  18 +++--
 drivers/net/xen-netback/netback.c | 153 ++++++++++++++++++++++++--------------
 2 files changed, 110 insertions(+), 61 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..24cb365 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -44,6 +44,7 @@
 #include <xen/interface/grant_table.h>
 #include <xen/grant_table.h>
 #include <xen/xenbus.h>
+#include <xen/page.h>
 #include <linux/debugfs.h>
 
 typedef unsigned int pending_ring_idx_t;
@@ -64,8 +65,8 @@ struct pending_tx_info {
 	struct ubuf_info callback_struct;
 };
 
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 struct xenvif_rx_meta {
 	int id;
@@ -80,16 +81,21 @@ struct xenvif_rx_meta {
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0xFFFF
 
-#define MAX_BUFFER_OFFSET PAGE_SIZE
+#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
 
 #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
 
+/* The maximum number of frags is derived from the size of a grant (same
+ * as a Xen page size for now).
+ */
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
 /* It's possible for an skb to have a maximal number of frags
  * but still be less than MAX_BUFFER_OFFSET in size. Thus the
- * worst-case number of copy operations is MAX_SKB_FRAGS per
+ * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
  * ring slot.
  */
-#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
+#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
 #define NETBACK_INVALID_HANDLE -1
 
@@ -203,7 +209,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 /* Maximum number of Rx slots a to-guest packet may use, including the
  * slot needed for GSO meta-data.
  */
-#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
+#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1))
 
 enum state_bit_shift {
 	/* This bit marks that the vif is connected */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index d4c1bc7..b1649aa 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
 	return meta;
 }
 
+struct gop_frag_copy {
+	struct xenvif_queue *queue;
+	struct netrx_pending_operations *npo;
+	struct xenvif_rx_meta *meta;
+	int head;
+	int gso_type;
+
+	struct page *page;
+};
+
+static void xenvif_setup_copy_gop(unsigned long gfn,
+				  unsigned int offset,
+				  unsigned int *len,
+				  struct gop_frag_copy *info)
+{
+	struct gnttab_copy *copy_gop;
+	struct xen_page_foreign *foreign;
+	/* Convenient aliases */
+	struct xenvif_queue *queue = info->queue;
+	struct netrx_pending_operations *npo = info->npo;
+	struct page *page = info->page;
+
+	BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
+
+	if (npo->copy_off == MAX_BUFFER_OFFSET)
+		info->meta = get_next_rx_buffer(queue, npo);
+
+	if (npo->copy_off + *len > MAX_BUFFER_OFFSET)
+		*len = MAX_BUFFER_OFFSET - npo->copy_off;
+
+	copy_gop = npo->copy + npo->copy_prod++;
+	copy_gop->flags = GNTCOPY_dest_gref;
+	copy_gop->len = *len;
+
+	foreign = xen_page_foreign(page);
+	if (foreign) {
+		copy_gop->source.domid = foreign->domid;
+		copy_gop->source.u.ref = foreign->gref;
+		copy_gop->flags |= GNTCOPY_source_gref;
+	} else {
+		copy_gop->source.domid = DOMID_SELF;
+		copy_gop->source.u.gmfn = gfn;
+	}
+	copy_gop->source.offset = offset;
+
+	copy_gop->dest.domid = queue->vif->domid;
+	copy_gop->dest.offset = npo->copy_off;
+	copy_gop->dest.u.ref = npo->copy_gref;
+
+	npo->copy_off += *len;
+	info->meta->size += *len;
+
+	/* Leave a gap for the GSO descriptor. */
+	if (info->head && ((1 << info->gso_type) & queue->vif->gso_mask))
+		queue->rx.req_cons++;
+
+	info->head = 0; /* There must be something in this buffer now */
+}
+
+static void xenvif_gop_frag_copy_grant(unsigned long gfn,
+				       unsigned offset,
+				       unsigned int len,
+				       void *data)
+{
+	unsigned int bytes;
+
+	while (len) {
+		bytes = len;
+		xenvif_setup_copy_gop(gfn, offset, &bytes, data);
+		offset += bytes;
+		len -= bytes;
+	}
+}
+
 /*
  * Set up the grant operations for this fragment. If it's a flipping
  * interface, we also set up the unmap request from here.
@@ -272,83 +346,52 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 				 struct page *page, unsigned long size,
 				 unsigned long offset, int *head)
 {
-	struct gnttab_copy *copy_gop;
-	struct xenvif_rx_meta *meta;
+	struct gop_frag_copy info = {
+		.queue = queue,
+		.npo = npo,
+		.head = *head,
+		.gso_type = XEN_NETIF_GSO_TYPE_NONE,
+	};
 	unsigned long bytes;
-	int gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
 	if (skb_is_gso(skb)) {
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
-			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+			info.gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
 		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
-			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+			info.gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
 	}
 
 	/* Data must not cross a page boundary. */
 	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
 
-	meta = npo->meta + npo->meta_prod - 1;
+	info.meta = npo->meta + npo->meta_prod - 1;
 
 	/* Skip unused frames from start of page */
 	page += offset >> PAGE_SHIFT;
 	offset &= ~PAGE_MASK;
 
 	while (size > 0) {
-		struct xen_page_foreign *foreign;
-
 		BUG_ON(offset >= PAGE_SIZE);
-		BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
-
-		if (npo->copy_off == MAX_BUFFER_OFFSET)
-			meta = get_next_rx_buffer(queue, npo);
 
 		bytes = PAGE_SIZE - offset;
 		if (bytes > size)
 			bytes = size;
 
-		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
-			bytes = MAX_BUFFER_OFFSET - npo->copy_off;
-
-		copy_gop = npo->copy + npo->copy_prod++;
-		copy_gop->flags = GNTCOPY_dest_gref;
-		copy_gop->len = bytes;
-
-		foreign = xen_page_foreign(page);
-		if (foreign) {
-			copy_gop->source.domid = foreign->domid;
-			copy_gop->source.u.ref = foreign->gref;
-			copy_gop->flags |= GNTCOPY_source_gref;
-		} else {
-			copy_gop->source.domid = DOMID_SELF;
-			copy_gop->source.u.gmfn =
-				virt_to_gfn(page_address(page));
-		}
-		copy_gop->source.offset = offset;
-
-		copy_gop->dest.domid = queue->vif->domid;
-		copy_gop->dest.offset = npo->copy_off;
-		copy_gop->dest.u.ref = npo->copy_gref;
-
-		npo->copy_off += bytes;
-		meta->size += bytes;
-
-		offset += bytes;
+		info.page = page;
+		gnttab_foreach_grant_in_range(page, offset, bytes,
+					      xenvif_gop_frag_copy_grant,
+					      &info);
 		size -= bytes;
+		offset = 0;
 
-		/* Next frame */
-		if (offset == PAGE_SIZE && size) {
+		/* Next page */
+		if (size) {
 			BUG_ON(!PageCompound(page));
 			page++;
-			offset = 0;
 		}
-
-		/* Leave a gap for the GSO descriptor. */
-		if (*head && ((1 << gso_type) & queue->vif->gso_mask))
-			queue->rx.req_cons++;
-
-		*head = 0; /* There must be something in this buffer now. */
-
 	}
+
+	*head = info.head;
 }
 
 /*
@@ -747,7 +790,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
 		first->size -= txp->size;
 		slots++;
 
-		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
+		if (unlikely((txp->offset + txp->size) > XEN_PAGE_SIZE)) {
 			netdev_err(queue->vif->dev, "Cross page boundary, txp->offset: %u, size: %u\n",
 				 txp->offset, txp->size);
 			xenvif_fatal_tx_err(queue->vif);
@@ -1229,11 +1272,11 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 		}
 
 		/* No crossing a page as the payload mustn't fragment. */
-		if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
+		if (unlikely((txreq.offset + txreq.size) > XEN_PAGE_SIZE)) {
 			netdev_err(queue->vif->dev,
 				   "txreq.offset: %u, size: %u, end: %lu\n",
 				   txreq.offset, txreq.size,
-				   (unsigned long)(txreq.offset&~PAGE_MASK) + txreq.size);
+				   (unsigned long)(txreq.offset&~XEN_PAGE_MASK) + txreq.size);
 			xenvif_fatal_tx_err(queue->vif);
 			break;
 		}
@@ -1299,7 +1342,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 			virt_to_gfn(skb->data);
 		queue->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
 		queue->tx_copy_ops[*copy_ops].dest.offset =
-			offset_in_page(skb->data);
+			offset_in_page(skb->data) & ~XEN_PAGE_MASK;
 
 		queue->tx_copy_ops[*copy_ops].len = data_len;
 		queue->tx_copy_ops[*copy_ops].flags = GNTCOPY_source_gref;
@@ -1784,7 +1827,7 @@ int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 		goto err;
 
 	txs = (struct xen_netif_tx_sring *)addr;
-	BACK_RING_INIT(&queue->tx, txs, PAGE_SIZE);
+	BACK_RING_INIT(&queue->tx, txs, XEN_PAGE_SIZE);
 
 	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(queue->vif),
 				     &rx_ring_ref, 1, &addr);
@@ -1792,7 +1835,7 @@ int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 		goto err;
 
 	rxs = (struct xen_netif_rx_sring *)addr;
-	BACK_RING_INIT(&queue->rx, rxs, PAGE_SIZE);
+	BACK_RING_INIT(&queue->rx, rxs, XEN_PAGE_SIZE);
 
 	return 0;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 144+ messages in thread

* [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity
@ 2015-09-07 15:33   ` Julien Grall
  0 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: linux-arm-kernel

The PV network protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity working as a
network backend on a non-modified Xen.

It's only necessary to adapt the ring size and break skb data in small
chunk of 4KB. The rest of the code is relying on the grant table code.

Signed-off-by: Julien Grall <julien.grall@citrix.com>

---
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: netdev at vger.kernel.org

Improvement such as support of 64KB grant is not taken into
consideration in this patch because we have the requirement to run a
Linux using 64KB pages on a non-modified Xen.

Note that I haven't add a comment why the offset is 0 after the first
iteration. See [1] for more details.

[1] https://lkml.org/lkml/2015/8/10/456

    Changes in v4:
        - Add a comment to explain how we compute MAX_XEN_SKB_FRAGS

    Changes in v3:
        - Fix errors reported by checkpatch.pl
        - s/mfn/gfn/ based on the new naming
        - gnttab_foreach_grant has been renamed to gnttab_forach_grant_in_range
        - The grant callback doesn't allow anymore to use less data. An
        helpers has been added in netback to handle this.

    Changes in v2:
        - Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX
        - Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce
        fragment into a new skb
        - Use gnntab_foreach_grant to split a Linux page into grant
---
 drivers/net/xen-netback/common.h  |  18 +++--
 drivers/net/xen-netback/netback.c | 153 ++++++++++++++++++++++++--------------
 2 files changed, 110 insertions(+), 61 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..24cb365 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -44,6 +44,7 @@
 #include <xen/interface/grant_table.h>
 #include <xen/grant_table.h>
 #include <xen/xenbus.h>
+#include <xen/page.h>
 #include <linux/debugfs.h>
 
 typedef unsigned int pending_ring_idx_t;
@@ -64,8 +65,8 @@ struct pending_tx_info {
 	struct ubuf_info callback_struct;
 };
 
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 struct xenvif_rx_meta {
 	int id;
@@ -80,16 +81,21 @@ struct xenvif_rx_meta {
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0xFFFF
 
-#define MAX_BUFFER_OFFSET PAGE_SIZE
+#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
 
 #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
 
+/* The maximum number of frags is derived from the size of a grant (same
+ * as a Xen page size for now).
+ */
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
 /* It's possible for an skb to have a maximal number of frags
  * but still be less than MAX_BUFFER_OFFSET in size. Thus the
- * worst-case number of copy operations is MAX_SKB_FRAGS per
+ * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
  * ring slot.
  */
-#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
+#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
 #define NETBACK_INVALID_HANDLE -1
 
@@ -203,7 +209,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 /* Maximum number of Rx slots a to-guest packet may use, including the
  * slot needed for GSO meta-data.
  */
-#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
+#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1))
 
 enum state_bit_shift {
 	/* This bit marks that the vif is connected */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index d4c1bc7..b1649aa 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
 	return meta;
 }
 
+struct gop_frag_copy {
+	struct xenvif_queue *queue;
+	struct netrx_pending_operations *npo;
+	struct xenvif_rx_meta *meta;
+	int head;
+	int gso_type;
+
+	struct page *page;
+};
+
+static void xenvif_setup_copy_gop(unsigned long gfn,
+				  unsigned int offset,
+				  unsigned int *len,
+				  struct gop_frag_copy *info)
+{
+	struct gnttab_copy *copy_gop;
+	struct xen_page_foreign *foreign;
+	/* Convenient aliases */
+	struct xenvif_queue *queue = info->queue;
+	struct netrx_pending_operations *npo = info->npo;
+	struct page *page = info->page;
+
+	BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
+
+	if (npo->copy_off == MAX_BUFFER_OFFSET)
+		info->meta = get_next_rx_buffer(queue, npo);
+
+	if (npo->copy_off + *len > MAX_BUFFER_OFFSET)
+		*len = MAX_BUFFER_OFFSET - npo->copy_off;
+
+	copy_gop = npo->copy + npo->copy_prod++;
+	copy_gop->flags = GNTCOPY_dest_gref;
+	copy_gop->len = *len;
+
+	foreign = xen_page_foreign(page);
+	if (foreign) {
+		copy_gop->source.domid = foreign->domid;
+		copy_gop->source.u.ref = foreign->gref;
+		copy_gop->flags |= GNTCOPY_source_gref;
+	} else {
+		copy_gop->source.domid = DOMID_SELF;
+		copy_gop->source.u.gmfn = gfn;
+	}
+	copy_gop->source.offset = offset;
+
+	copy_gop->dest.domid = queue->vif->domid;
+	copy_gop->dest.offset = npo->copy_off;
+	copy_gop->dest.u.ref = npo->copy_gref;
+
+	npo->copy_off += *len;
+	info->meta->size += *len;
+
+	/* Leave a gap for the GSO descriptor. */
+	if (info->head && ((1 << info->gso_type) & queue->vif->gso_mask))
+		queue->rx.req_cons++;
+
+	info->head = 0; /* There must be something in this buffer now */
+}
+
+static void xenvif_gop_frag_copy_grant(unsigned long gfn,
+				       unsigned offset,
+				       unsigned int len,
+				       void *data)
+{
+	unsigned int bytes;
+
+	while (len) {
+		bytes = len;
+		xenvif_setup_copy_gop(gfn, offset, &bytes, data);
+		offset += bytes;
+		len -= bytes;
+	}
+}
+
 /*
  * Set up the grant operations for this fragment. If it's a flipping
  * interface, we also set up the unmap request from here.
@@ -272,83 +346,52 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 				 struct page *page, unsigned long size,
 				 unsigned long offset, int *head)
 {
-	struct gnttab_copy *copy_gop;
-	struct xenvif_rx_meta *meta;
+	struct gop_frag_copy info = {
+		.queue = queue,
+		.npo = npo,
+		.head = *head,
+		.gso_type = XEN_NETIF_GSO_TYPE_NONE,
+	};
 	unsigned long bytes;
-	int gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
 	if (skb_is_gso(skb)) {
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
-			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+			info.gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
 		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
-			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+			info.gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
 	}
 
 	/* Data must not cross a page boundary. */
 	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
 
-	meta = npo->meta + npo->meta_prod - 1;
+	info.meta = npo->meta + npo->meta_prod - 1;
 
 	/* Skip unused frames from start of page */
 	page += offset >> PAGE_SHIFT;
 	offset &= ~PAGE_MASK;
 
 	while (size > 0) {
-		struct xen_page_foreign *foreign;
-
 		BUG_ON(offset >= PAGE_SIZE);
-		BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
-
-		if (npo->copy_off == MAX_BUFFER_OFFSET)
-			meta = get_next_rx_buffer(queue, npo);
 
 		bytes = PAGE_SIZE - offset;
 		if (bytes > size)
 			bytes = size;
 
-		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
-			bytes = MAX_BUFFER_OFFSET - npo->copy_off;
-
-		copy_gop = npo->copy + npo->copy_prod++;
-		copy_gop->flags = GNTCOPY_dest_gref;
-		copy_gop->len = bytes;
-
-		foreign = xen_page_foreign(page);
-		if (foreign) {
-			copy_gop->source.domid = foreign->domid;
-			copy_gop->source.u.ref = foreign->gref;
-			copy_gop->flags |= GNTCOPY_source_gref;
-		} else {
-			copy_gop->source.domid = DOMID_SELF;
-			copy_gop->source.u.gmfn =
-				virt_to_gfn(page_address(page));
-		}
-		copy_gop->source.offset = offset;
-
-		copy_gop->dest.domid = queue->vif->domid;
-		copy_gop->dest.offset = npo->copy_off;
-		copy_gop->dest.u.ref = npo->copy_gref;
-
-		npo->copy_off += bytes;
-		meta->size += bytes;
-
-		offset += bytes;
+		info.page = page;
+		gnttab_foreach_grant_in_range(page, offset, bytes,
+					      xenvif_gop_frag_copy_grant,
+					      &info);
 		size -= bytes;
+		offset = 0;
 
-		/* Next frame */
-		if (offset == PAGE_SIZE && size) {
+		/* Next page */
+		if (size) {
 			BUG_ON(!PageCompound(page));
 			page++;
-			offset = 0;
 		}
-
-		/* Leave a gap for the GSO descriptor. */
-		if (*head && ((1 << gso_type) & queue->vif->gso_mask))
-			queue->rx.req_cons++;
-
-		*head = 0; /* There must be something in this buffer now. */
-
 	}
+
+	*head = info.head;
 }
 
 /*
@@ -747,7 +790,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
 		first->size -= txp->size;
 		slots++;
 
-		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
+		if (unlikely((txp->offset + txp->size) > XEN_PAGE_SIZE)) {
 			netdev_err(queue->vif->dev, "Cross page boundary, txp->offset: %u, size: %u\n",
 				 txp->offset, txp->size);
 			xenvif_fatal_tx_err(queue->vif);
@@ -1229,11 +1272,11 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 		}
 
 		/* No crossing a page as the payload mustn't fragment. */
-		if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
+		if (unlikely((txreq.offset + txreq.size) > XEN_PAGE_SIZE)) {
 			netdev_err(queue->vif->dev,
 				   "txreq.offset: %u, size: %u, end: %lu\n",
 				   txreq.offset, txreq.size,
-				   (unsigned long)(txreq.offset&~PAGE_MASK) + txreq.size);
+				   (unsigned long)(txreq.offset&~XEN_PAGE_MASK) + txreq.size);
 			xenvif_fatal_tx_err(queue->vif);
 			break;
 		}
@@ -1299,7 +1342,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 			virt_to_gfn(skb->data);
 		queue->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
 		queue->tx_copy_ops[*copy_ops].dest.offset =
-			offset_in_page(skb->data);
+			offset_in_page(skb->data) & ~XEN_PAGE_MASK;
 
 		queue->tx_copy_ops[*copy_ops].len = data_len;
 		queue->tx_copy_ops[*copy_ops].flags = GNTCOPY_source_gref;
@@ -1784,7 +1827,7 @@ int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 		goto err;
 
 	txs = (struct xen_netif_tx_sring *)addr;
-	BACK_RING_INIT(&queue->tx, txs, PAGE_SIZE);
+	BACK_RING_INIT(&queue->tx, txs, XEN_PAGE_SIZE);
 
 	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(queue->vif),
 				     &rx_ring_ref, 1, &addr);
@@ -1792,7 +1835,7 @@ int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 		goto err;
 
 	rxs = (struct xen_netif_rx_sring *)addr;
-	BACK_RING_INIT(&queue->rx, rxs, PAGE_SIZE);
+	BACK_RING_INIT(&queue->rx, rxs, XEN_PAGE_SIZE);
 
 	return 0;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 144+ messages in thread

* [PATCH v4 19/20] xen/privcmd: Add support for Linux 64KB page granularity
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, David Vrabel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky

The hypercall interface (as well as the toolstack) always uses 4KB
page granularity. When the toolstack asks to map a series of guest
PFNs in a batch, it expects the pages to be mapped contiguously in
its virtual memory.

When Linux is using 64KB page granularity, the privcmd driver will have
to map multiple Xen PFNs in a single Linux page.

Note that this solution works for any page granularity which is a
multiple of 4KB.
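
As a rough illustration (an editor's sketch under the definitions this
series introduces, not code from the patch), the driver relies on the
following arithmetic: XEN_PFN_PER_PAGE 4KB frames back one Linux page,
and page_to_xen_pfn() returns the first of them. The handle_gfn()
callback below is hypothetical.

    /* Sketch only: walk the 4KB Xen frames backing one Linux page.
     * With 64KB pages, XEN_PFN_PER_PAGE == 64KB / 4KB == 16.
     */
    unsigned long xen_pfn = page_to_xen_pfn(page);
    unsigned int i;

    for (i = 0; i < XEN_PFN_PER_PAGE; i++)
        handle_gfn(pfn_to_gfn(xen_pfn + i)); /* hypothetical per-frame hook */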

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>

    I kept the hypercall arguments in remap_data to avoid allocating them on
    the stack every time remap_pte_fn is called.
    I will keep it like that unless someone strongly disagrees.

    Changes in v4:
        - s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming
        - Add David's reviewed-by

    Changes in v3:
        - The function to split a Linux page into multiple Xen pages has
        been moved internally. It was the only user (not used anymore in
        the balloon) and it's not quite clear what the common interface
        should be. Defer the question until someone needs to use it.
        - s/nr_pfn/numgfns/ to make clear that we are dealing with GFNs
        - Use DIV_ROUND_UP rather than round_up and fix the usage in
        xen_xlate_unmap_gfn_range (see the sketch after this changelog)

    Changes in v2:
        - Use xen_apply_to_page
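
    As an aside on the DIV_ROUND_UP change mentioned in the v3 notes above
    (an editor's sketch, not part of the patch): round_up() rounds a frame
    count to the next multiple, while DIV_ROUND_UP() converts a count of
    4KB frames into the number of Linux pages needed, which is what the
    unmap path wants.

        /* With 64KB pages, XEN_PFN_PER_PAGE == 16. For nr == 5 frames: */
        round_up(5, XEN_PFN_PER_PAGE);     /* == 16 -- still a frame count */
        DIV_ROUND_UP(5, XEN_PFN_PER_PAGE); /* ==  1 -- a Linux page count  */
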
---
 drivers/xen/privcmd.c   |   8 ++--
 drivers/xen/xlate_mmu.c | 124 ++++++++++++++++++++++++++++++++----------------
 2 files changed, 89 insertions(+), 43 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index c6deb87..c8798ee 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -446,7 +446,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 		return -EINVAL;
 	}
 
-	nr_pages = m.num;
+	nr_pages = DIV_ROUND_UP(m.num, XEN_PFN_PER_PAGE);
 	if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
 		return -EINVAL;
 
@@ -494,7 +494,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 			goto out_unlock;
 		}
 		if (xen_feature(XENFEAT_auto_translated_physmap)) {
-			ret = alloc_empty_pages(vma, m.num);
+			ret = alloc_empty_pages(vma, nr_pages);
 			if (ret < 0)
 				goto out_unlock;
 		} else
@@ -518,6 +518,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 	state.global_error  = 0;
 	state.version       = version;
 
+	BUILD_BUG_ON(((PAGE_SIZE / sizeof(xen_pfn_t)) % XEN_PFN_PER_PAGE) != 0);
 	/* mmap_batch_fn guarantees ret == 0 */
 	BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
 				    &pagelist, mmap_batch_fn, &state));
@@ -582,12 +583,13 @@ static void privcmd_close(struct vm_area_struct *vma)
 {
 	struct page **pages = vma->vm_private_data;
 	int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+	int numgfns = (vma->vm_end - vma->vm_start) >> XEN_PAGE_SHIFT;
 	int rc;
 
 	if (!xen_feature(XENFEAT_auto_translated_physmap) || !numpgs || !pages)
 		return;
 
-	rc = xen_unmap_domain_gfn_range(vma, numpgs, pages);
+	rc = xen_unmap_domain_gfn_range(vma, numgfns, pages);
 	if (rc == 0)
 		free_xenballooned_pages(numpgs, pages);
 	else
diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
index cff2387..5063c5e 100644
--- a/drivers/xen/xlate_mmu.c
+++ b/drivers/xen/xlate_mmu.c
@@ -38,31 +38,28 @@
 #include <xen/interface/xen.h>
 #include <xen/interface/memory.h>
 
-/* map fgfn of domid to lpfn in the current domain */
-static int map_foreign_page(unsigned long lpfn, unsigned long fgfn,
-			    unsigned int domid)
-{
-	int rc;
-	struct xen_add_to_physmap_range xatp = {
-		.domid = DOMID_SELF,
-		.foreign_domid = domid,
-		.size = 1,
-		.space = XENMAPSPACE_gmfn_foreign,
-	};
-	xen_ulong_t idx = fgfn;
-	xen_pfn_t gpfn = lpfn;
-	int err = 0;
+typedef void (*xen_gfn_fn_t)(unsigned long gfn, void *data);
 
-	set_xen_guest_handle(xatp.idxs, &idx);
-	set_xen_guest_handle(xatp.gpfns, &gpfn);
-	set_xen_guest_handle(xatp.errs, &err);
+/* Break down the pages into 4KB chunks and call fn for each gfn */
+static void xen_for_each_gfn(struct page **pages, unsigned nr_gfn,
+			     xen_gfn_fn_t fn, void *data)
+{
+	unsigned long xen_pfn = 0;
+	struct page *page;
+	int i;
 
-	rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
-	return rc < 0 ? rc : err;
+	for (i = 0; i < nr_gfn; i++) {
+		if ((i % XEN_PFN_PER_PAGE) == 0) {
+			page = pages[i / XEN_PFN_PER_PAGE];
+			xen_pfn = page_to_xen_pfn(page);
+		}
+		fn(pfn_to_gfn(xen_pfn++), data);
+	}
 }
 
 struct remap_data {
 	xen_pfn_t *fgfn; /* foreign domain's gfn */
+	int nr_fgfn; /* Number of foreign gfn left to map */
 	pgprot_t prot;
 	domid_t  domid;
 	struct vm_area_struct *vma;
@@ -71,24 +68,71 @@ struct remap_data {
 	struct xen_remap_gfn_info *info;
 	int *err_ptr;
 	int mapped;
+
+	/* Hypercall parameters */
+	int h_errs[XEN_PFN_PER_PAGE];
+	xen_ulong_t h_idxs[XEN_PFN_PER_PAGE];
+	xen_pfn_t h_gpfns[XEN_PFN_PER_PAGE];
+
+	int h_iter;	/* Iterator */
 };
 
+static void setup_hparams(unsigned long gfn, void *data)
+{
+	struct remap_data *info = data;
+
+	info->h_idxs[info->h_iter] = *info->fgfn;
+	info->h_gpfns[info->h_iter] = gfn;
+	info->h_errs[info->h_iter] = 0;
+
+	info->h_iter++;
+	info->fgfn++;
+}
+
 static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
 			void *data)
 {
 	struct remap_data *info = data;
 	struct page *page = info->pages[info->index++];
-	unsigned long pfn = page_to_pfn(page);
-	pte_t pte = pte_mkspecial(pfn_pte(pfn, info->prot));
-	int rc;
+	pte_t pte = pte_mkspecial(pfn_pte(page_to_pfn(page), info->prot));
+	int rc, nr_gfn;
+	uint32_t i;
+	struct xen_add_to_physmap_range xatp = {
+		.domid = DOMID_SELF,
+		.foreign_domid = info->domid,
+		.space = XENMAPSPACE_gmfn_foreign,
+	};
 
-	rc = map_foreign_page(pfn, *info->fgfn, info->domid);
-	*info->err_ptr++ = rc;
-	if (!rc) {
-		set_pte_at(info->vma->vm_mm, addr, ptep, pte);
-		info->mapped++;
+	nr_gfn = min_t(typeof(info->nr_fgfn), XEN_PFN_PER_PAGE, info->nr_fgfn);
+	info->nr_fgfn -= nr_gfn;
+
+	info->h_iter = 0;
+	xen_for_each_gfn(&page, nr_gfn, setup_hparams, info);
+	BUG_ON(info->h_iter != nr_gfn);
+
+	set_xen_guest_handle(xatp.idxs, info->h_idxs);
+	set_xen_guest_handle(xatp.gpfns, info->h_gpfns);
+	set_xen_guest_handle(xatp.errs, info->h_errs);
+	xatp.size = nr_gfn;
+
+	rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
+
+	/* info->err_ptr is expected to hold one error status per Xen PFN */
+	for (i = 0; i < nr_gfn; i++) {
+		int err = (rc < 0) ? rc : info->h_errs[i];
+
+		*(info->err_ptr++) = err;
+		if (!err)
+			info->mapped++;
 	}
-	info->fgfn++;
+
+	/*
+	 * Note: The hypercall will return 0 in most cases, even if not
+	 * all the fgfn were mapped. We still have to update the pte
+	 * as the userspace may decide to continue.
+	 */
+	if (!rc)
+		set_pte_at(info->vma->vm_mm, addr, ptep, pte);
 
 	return 0;
 }
@@ -102,13 +146,14 @@ int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
 {
 	int err;
 	struct remap_data data;
-	unsigned long range = nr << PAGE_SHIFT;
+	unsigned long range = DIV_ROUND_UP(nr, XEN_PFN_PER_PAGE) << PAGE_SHIFT;
 
 	/* Kept here for the purpose of making sure code doesn't break
 	   x86 PVOPS */
 	BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
 	data.fgfn = gfn;
+	data.nr_fgfn = nr;
 	data.prot  = prot;
 	data.domid = domid;
 	data.vma   = vma;
@@ -123,21 +168,20 @@ int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL_GPL(xen_xlate_remap_gfn_array);
 
-int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
-			      int nr, struct page **pages)
+static void unmap_gfn(unsigned long gfn, void *data)
 {
-	int i;
+	struct xen_remove_from_physmap xrp;
 
-	for (i = 0; i < nr; i++) {
-		struct xen_remove_from_physmap xrp;
-		unsigned long pfn;
+	xrp.domid = DOMID_SELF;
+	xrp.gpfn = gfn;
+	(void)HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrp);
+}
 
-		pfn = page_to_pfn(pages[i]);
+int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
+			      int nr, struct page **pages)
+{
+	xen_for_each_gfn(pages, nr, unmap_gfn, NULL);
 
-		xrp.domid = DOMID_SELF;
-		xrp.gpfn = pfn;
-		(void)HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrp);
-	}
 	return 0;
 }
 EXPORT_SYMBOL_GPL(xen_xlate_unmap_gfn_range);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread

* [PATCH v4 20/20] arm/xen: Add support for 64KB page granularity
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-07 15:33   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-07 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, Russell King

The hypercall interface always uses 4KB page granularity. This requires
using the Xen page definition macros when we deal with hypercalls.

Note that pfn_to_gfn works with a Xen PFN (i.e. 4KB). We may want to
rename pfn_to_gfn to make this explicit.

We also allocate a 64KB page for the shared page even though only the
first 4KB is used. I don't think this is really important for now, as it
helps to have the pointer 4KB aligned (XENMEM_add_to_physmap takes a
Xen PFN).
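
For illustration only (an editor's sketch assuming the Xen page
definitions used by this series, not code from the patch), the
conversions applied below boil down to the following; vaddr is a
placeholder:

    /* Virtual address -> guest frame number at 4KB granularity,
     * independent of the Linux page size.
     */
    xen_pfn_t gfn = pfn_to_gfn(virt_to_phys(vaddr) >> XEN_PAGE_SHIFT);
    /* Offset within the 4KB Xen page, not the 64KB Linux page. */
    unsigned int off = xen_offset_in_page(vaddr);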

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
Cc: Russell King <linux@arm.linux.org.uk>

Stefano, I had dropped your reviewed-by given I updated the doc and made
changes to avoid usage of XEN_PAGE_SHIFT; it is re-added in v4.

    Changes in v4:
        - Add Stefano's Reviewed-by

    Changes in v3:
        - s/MFN/GFN/ based on the new naming
        - Use virt_to_gfn to avoid using XEN_PAGE_SHIFT
        - Drop Stefano's reviewed-by
        - Add some docs in arch/arm/asm/xen/page.h

    Changes in v2:
        - Add Stefano's reviewed-by
---
 arch/arm/include/asm/xen/page.h | 15 +++++++++++++--
 arch/arm/xen/enlighten.c        |  6 +++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 98c9fc3..e3d94cf 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -28,6 +28,17 @@ typedef struct xpaddr {
 
 #define INVALID_P2M_ENTRY      (~0UL)
 
+/*
+ * The pseudo-physical frame (pfn) used in all the helpers is always based
+ * on Xen page granularity (i.e 4KB).
+ *
+ * A Linux page may be split across multiple non-contiguous Xen pages, so we
+ * have to keep track of frames based on 4KB page granularity.
+ *
+ * PV drivers should never make a direct usage of those helpers (particularly
+ * pfn_to_gfn and gfn_to_pfn).
+ */
+
 unsigned long __pfn_to_mfn(unsigned long pfn);
 extern struct rb_root phys_to_mach;
 
@@ -64,8 +75,8 @@ static inline unsigned long bfn_to_pfn(unsigned long bfn)
 #define bfn_to_local_pfn(bfn)	bfn_to_pfn(bfn)
 
 /* VIRT <-> GUEST conversion */
-#define virt_to_gfn(v)		(pfn_to_gfn(virt_to_pfn(v)))
-#define gfn_to_virt(m)		(__va(gfn_to_pfn(m) << PAGE_SHIFT))
+#define virt_to_gfn(v)		(pfn_to_gfn(virt_to_phys(v) >> XEN_PAGE_SHIFT))
+#define gfn_to_virt(m)		(__va(gfn_to_pfn(m) << XEN_PAGE_SHIFT))
 
 /* Only used in PV code. But ARM guests are always HVM. */
 static inline xmaddr_t arbitrary_virt_to_machine(void *vaddr)
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index eeeab07..50b4769 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -89,8 +89,8 @@ static void xen_percpu_init(void)
 	pr_info("Xen: initializing cpu%d\n", cpu);
 	vcpup = per_cpu_ptr(xen_vcpu_info, cpu);
 
-	info.mfn = __pa(vcpup) >> PAGE_SHIFT;
-	info.offset = offset_in_page(vcpup);
+	info.mfn = virt_to_gfn(vcpup);
+	info.offset = xen_offset_in_page(vcpup);
 
 	err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
 	BUG_ON(err);
@@ -213,7 +213,7 @@ static int __init xen_guest_init(void)
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = virt_to_gfn(shared_info_page);
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux
  2015-09-07 15:33   ` Julien Grall
@ 2015-09-07 16:39     ` Stefano Stabellini
  -1 siblings, 0 replies; 144+ messages in thread
From: Stefano Stabellini @ 2015-09-07 16:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, linux-arm-kernel, ian.campbell, stefano.stabellini,
	linux-kernel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	David Vrabel, Wei Liu

On Mon, 7 Sep 2015, Julien Grall wrote:
> For ARM64 guests, Linux is able to support either 64K or 4K page
> granularity. However, the hypercall interface is always based on 4K
> page granularity.
> 
> With 64K page granularity, a single page will be spread over multiple
> Xen frames.
> 
> To avoid splitting the page into 4K frames, take advantage of the
> extent_order field to directly allocate/free chunks of the Linux page
> size.
> 
> Note that PVMMU is only used for PV guests (which are x86-only), where
> the page granularity is always 4KB. Some BUILD_BUG_ONs have been added
> to ensure this, because the code has not been modified.
> 
> Signed-off-by: Julien Grall <julien.grall@citrix.com>

Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


> ---
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> 
> Note that two BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE) are kept in the
> code built for the PV MMU, in order to have at least one remaining
> even if we ever decide to drop one of the code sections.
> 
>     Changes in v4:
>         - s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming
>         - Use the field lru in the page to get a list of pages when
>         decreasing the memory reservation. This avoids using a static
>         array to store the pages (see v3).
>         - Update comment for EXTENT_ORDER.
> 
>     Changes in v3:
>         - Fix errors reported by checkpatch.pl
>         - s/mfn/gfn/ based on the new naming
>         - Rather than splitting the page into 4KB chunks, use the
>         extent_order field to directly allocate a Linux page size. This
>         avoids lots of code for no benefit.
> 
>     Changes in v2:
>         - Use xen_apply_to_page to split a page into 4K chunks
>         - It's not necessary to have a smaller frame list. Re-use
>         PAGE_SIZE
>         - Convert reserve_additional_memory to use XEN_... macro
> ---
>  drivers/xen/balloon.c | 59 ++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 44 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index c79329f..3babf13 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -70,6 +70,11 @@
>  #include <xen/features.h>
>  #include <xen/page.h>
>  
> +/* Use one extent per PAGE_SIZE to avoid breaking down the page into
> + * multiple frames.
> + */
> +#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
> +
>  /*
>   * balloon_process() state:
>   *
> @@ -230,6 +235,11 @@ static enum bp_state reserve_additional_memory(long credit)
>  	nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
>  
>  #ifdef CONFIG_XEN_HAVE_PVMMU
> +	/* We don't support PV MMU when Linux and Xen are using
> +	 * different page granularities.
> +	 */
> +	BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
>          /*
>           * add_memory() will build page tables for the new memory so
>           * the p2m must contain invalid entries so the correct
> @@ -326,11 +336,11 @@ static enum bp_state reserve_additional_memory(long credit)
>  static enum bp_state increase_reservation(unsigned long nr_pages)
>  {
>  	int rc;
> -	unsigned long  pfn, i;
> +	unsigned long i;
>  	struct page   *page;
>  	struct xen_memory_reservation reservation = {
>  		.address_bits = 0,
> -		.extent_order = 0,
> +		.extent_order = EXTENT_ORDER,
>  		.domid        = DOMID_SELF
>  	};
>  
> @@ -352,7 +362,11 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>  			nr_pages = i;
>  			break;
>  		}
> -		frame_list[i] = page_to_pfn(page);
> +
> +		/* XENMEM_populate_physmap requires a PFN based on Xen
> +		 * granularity.
> +		 */
> +		frame_list[i] = page_to_xen_pfn(page);
>  		page = balloon_next_page(page);
>  	}
>  
> @@ -366,10 +380,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>  		page = balloon_retrieve(false);
>  		BUG_ON(page == NULL);
>  
> -		pfn = page_to_pfn(page);
> -
>  #ifdef CONFIG_XEN_HAVE_PVMMU
> +		/* We don't support PV MMU when Linux and Xen are using
> +		 * different page granularities.
> +		 */
> +		BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
>  		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> +			unsigned long pfn = page_to_pfn(page);
> +
>  			set_phys_to_machine(pfn, frame_list[i]);
>  
>  			/* Link back into the page tables if not highmem. */
> @@ -396,14 +415,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>  static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  {
>  	enum bp_state state = BP_DONE;
> -	unsigned long  pfn, i;
> -	struct page   *page;
> +	unsigned long i;
> +	struct page *page, *tmp;
>  	int ret;
>  	struct xen_memory_reservation reservation = {
>  		.address_bits = 0,
> -		.extent_order = 0,
> +		.extent_order = EXTENT_ORDER,
>  		.domid        = DOMID_SELF
>  	};
> +	LIST_HEAD(pages);
>  
>  #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
>  	if (balloon_stats.hotplug_pages) {
> @@ -425,8 +445,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  			break;
>  		}
>  		scrub_page(page);
> -
> -		frame_list[i] = page_to_pfn(page);
> +		list_add(&page->lru, &pages);
>  	}
>  
>  	/*
> @@ -438,14 +457,23 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  	 */
>  	kmap_flush_unused();
>  
> -	/* Update direct mapping, invalidate P2M, and add to balloon. */
> -	for (i = 0; i < nr_pages; i++) {
> -		pfn = frame_list[i];
> -		frame_list[i] = pfn_to_gfn(pfn);
> -		page = pfn_to_page(pfn);
> +	/*
> +	 * Set up the frame, update direct mapping, invalidate P2M,
> +	 * and add to balloon.
> +	 */
> +	list_for_each_entry_safe(page, tmp, &pages, lru) {
> +		/* XENMEM_decrease_reservation requires a GFN */
> +		frame_list[i] = xen_page_to_gfn(page);
>  
>  #ifdef CONFIG_XEN_HAVE_PVMMU
> +		/* We don't support PV MMU when Linux and Xen are using
> +		 * different page granularities.
> +		 */
> +		BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
>  		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> +			unsigned long pfn = page_to_pfn(page);
> +
>  			if (!PageHighMem(page)) {
>  				ret = HYPERVISOR_update_va_mapping(
>  						(unsigned long)__va(pfn << PAGE_SHIFT),
> @@ -455,6 +483,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  			__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>  		}
>  #endif
> +		list_del(&page->lru);
>  
>  		balloon_append(page);
>  	}
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 144+ messages in thread
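
For the extent arithmetic in the hunk above: with 64KB Linux pages
there are 16 Xen frames per page, so EXTENT_ORDER is fls(16) - 1 = 4
and each hypercall extent covers 2^4 * 4KB = 64KB, exactly one Linux
page; with 4KB pages it degenerates to order 0. A standalone sketch
(constants assumed; fls_ul stands in for the kernel's fls):

#include <stdio.h>

/* Assumed values for a 64KB-page kernel; with 4KB pages
 * XEN_PFN_PER_PAGE would be 1 and the order 0. */
#define XEN_PAGE_SHIFT   12
#define PAGE_SHIFT       16
#define XEN_PFN_PER_PAGE (1UL << (PAGE_SHIFT - XEN_PAGE_SHIFT))

/* Stand-in for the kernel's fls(): index of the highest set bit. */
static int fls_ul(unsigned long x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

int main(void)
{
	int extent_order = fls_ul(XEN_PFN_PER_PAGE) - 1;

	printf("XEN_PFN_PER_PAGE=%lu => EXTENT_ORDER=%d (%lu KB per extent)\n",
	       XEN_PFN_PER_PAGE, extent_order, (4UL << extent_order));
	return 0;
}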

* Re: [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity
  2015-09-07 15:33   ` Julien Grall
@ 2015-09-07 16:57     ` Wei Liu
  -1 siblings, 0 replies; 144+ messages in thread
From: Wei Liu @ 2015-09-07 16:57 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, linux-arm-kernel, ian.campbell, stefano.stabellini,
	linux-kernel, Wei Liu, netdev

You might need to rebase your patch. A patch to netback went in recently.

On Mon, Sep 07, 2015 at 04:33:56PM +0100, Julien Grall wrote:
> The PV network protocol uses 4KB page granularity. The goal of this
> patch is to allow a Linux using 64KB page granularity to work as a
> network backend on an unmodified Xen.
> 
> It's only necessary to adapt the ring size and break the skb data into
> small chunks of 4KB. The rest of the code relies on the grant table code.
> 
> Signed-off-by: Julien Grall <julien.grall@citrix.com>
> 

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 144+ messages in thread
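
The "break the skb data into small chunks of 4KB" part amounts to
walking a buffer in XEN_PAGE_SIZE steps, one grant/ring slot per step.
A standalone sketch of that loop shape (illustrative only, not
netback's actual slot-counting code):

#include <stdio.h>

#define XEN_PAGE_SIZE 4096u
#define XEN_PAGE_MASK (~(XEN_PAGE_SIZE - 1))

/* Count the 4KB chunks (i.e. ring slots/grants) needed to cover
 * 'len' bytes starting at 'offset' within a buffer. */
static unsigned int count_ring_slots(unsigned int offset, unsigned int len)
{
	unsigned int slots = 0;

	while (len) {
		/* Bytes left in the current 4KB frame. */
		unsigned int chunk = XEN_PAGE_SIZE - (offset & ~XEN_PAGE_MASK);

		if (chunk > len)
			chunk = len;

		offset += chunk;
		len -= chunk;
		slots++;
	}
	return slots;
}

int main(void)
{
	/* 9000 bytes starting 100 bytes into a frame => 3 slots. */
	printf("slots=%u\n", count_ring_slots(100, 9000));
	return 0;
}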

* Re: [Xen-devel] [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity
  2015-09-07 16:57     ` Wei Liu
@ 2015-09-08 11:07       ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-08 11:07 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, stefano.stabellini, netdev, linux-kernel, xen-devel,
	linux-arm-kernel

Hi Wei,

On 07/09/15 17:57, Wei Liu wrote:
> You might need to rebase your patch. A patch to netback went in recently.

Do you mean 210c34dcd8d912dcc740f1f17625a7293af5cb56 "xen-netback: add
support for multicast control"?

If so, I didn't see any specific issue while rebasing on the latest
Linus' master.

> On Mon, Sep 07, 2015 at 04:33:56PM +0100, Julien Grall wrote:
>> The PV network protocol uses 4KB page granularity. The goal of this
>> patch is to allow a Linux using 64KB page granularity to work as a
>> network backend on an unmodified Xen.
>>
>> It's only necessary to adapt the ring size and break the skb data into
>> small chunks of 4KB. The rest of the code relies on the grant table code.
>>
>> Signed-off-by: Julien Grall <julien.grall@citrix.com>
>>
> 
> Reviewed-by: Wei Liu <wei.liu2@citrix.com>

Thank you!

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [Xen-devel] [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity
  2015-09-08 11:07       ` Julien Grall
@ 2015-09-08 11:09         ` Wei Liu
  -1 siblings, 0 replies; 144+ messages in thread
From: Wei Liu @ 2015-09-08 11:09 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, ian.campbell, stefano.stabellini, netdev, linux-kernel,
	xen-devel, linux-arm-kernel

On Tue, Sep 08, 2015 at 12:07:31PM +0100, Julien Grall wrote:
> Hi Wei,
> 
> On 07/09/15 17:57, Wei Liu wrote:
> > You might need to rebase your patch. A patch to netback went in recently.
> 
> Do you mean 210c34dcd8d912dcc740f1f17625a7293af5cb56 "xen-netback: add
> support for multicast control"?
> 

Yes, that one.

> If so, I didn't see any specific issue while rebasing on the latest
> Linus' master.
> 

Good.

> > On Mon, Sep 07, 2015 at 04:33:56PM +0100, Julien Grall wrote:
> >> The PV network protocol uses 4KB page granularity. The goal of this
> >> patch is to allow a Linux using 64KB page granularity to work as a
> >> network backend on an unmodified Xen.
> >>
> >> It's only necessary to adapt the ring size and break the skb data into
> >> small chunks of 4KB. The rest of the code relies on the grant table code.
> >>
> >> Signed-off-by: Julien Grall <julien.grall@citrix.com>
> >>
> > 
> > Reviewed-by: Wei Liu <wei.liu2@citrix.com>
> 
> Thank you!
> 
> Regards,
> 
> -- 
> Julien Grall

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [Xen-devel] [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-11 19:39   ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-11 19:39 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, stefano.stabellini, linux-kernel,
	david.vrabel, boris.ostrovsky, linux-arm-kernel, roger.pau

Hi,

A quick update on the TODO.

On 07/09/15 16:33, Julien Grall wrote:
> ARM64 Linux is supporting both 4KB and 64KB page granularity. Although, Xen
> hypercall interface and PV protocol are always based on 4KB page granularity.
> 
> Any attempt to boot a Linux guest with 64KB pages enabled will result to a
> guest crash.
> 
> This series is a first attempt to allow those Linux running with the current
> hypercall interface and PV protocol.
> 
> This solution has been chosen because we want to run Linux 64KB in released
> Xen ARM version or/and platform using an old version of Linux DOM0.
> 
> There is room for improvement, such as support of 64KB grant, modification
> of PV protocol to support different page size... They will be explored in a
> separate patch series later.
> 
> TODO list:
>     - Convert swiotlb to 64KB

Sent http://lists.xen.org/archives/html/xen-devel/2015-09/msg01292.html

>     - Convert xenfb to 64KB
>     - Support for multiple page ring support
>     - Support for 64KB in gnttdev
>     - Support of non-indirect grant with 64KB frontend

Sent http://lists.xen.org/archives/html/xen-devel/2015-09/msg01577.html

>     - It may be possible to move some common define between
>     netback/netfront and blkfront/blkback in an header
> 
> I've got most of the patches for the TODO items. I'm planning to send them as
> a follow-up as it's not a requirement for a basic guests.
> 
> All patches has been built tested for ARM32, ARM64, x86. But I haven't tested
> to run it on x86 as I don't have a box with Xen x86 running. I would be
> happy if someone give a try and see possible regression for x86.
> 
> I know that Konrad as a test-suite for x86. Konrand, would it be possible to
> give a run to for this series?
> 
> A branch based on the latest xentip/for-linus-4.3 can be found here:
> 
> git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v4

I will resend a new version when 4.3-rc1 is out in order to fix any
possible conflict with linux/master. I already know that the netback
patch (#18) conflicts with 1d5d48523900a4b0f25d6b52f1a93c84bd671186
"xen-netback: require fewer guest Rx slots when not using GSO".

If the 2 series above are completely acked/reviewed, I will fold them
into this series.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-14  8:56   ` Roger Pau Monné
  -1 siblings, 0 replies; 144+ messages in thread
From: Roger Pau Monné @ 2015-09-14  8:56 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	david.vrabel, konrad.wilk, boris.ostrovsky, wei.liu2

On 07/09/15 at 17.33, Julien Grall wrote:
> Hi all,
> 
> ARM64 Linux is supporting both 4KB and 64KB page granularity. Although, Xen
> hypercall interface and PV protocol are always based on 4KB page granularity.
> 
> Any attempt to boot a Linux guest with 64KB pages enabled will result to a
> guest crash.
> 
> This series is a first attempt to allow those Linux running with the current
> hypercall interface and PV protocol.
> 
> This solution has been chosen because we want to run Linux 64KB in released
> Xen ARM version or/and platform using an old version of Linux DOM0.
> 
> There is room for improvement, such as support of 64KB grant, modification
> of PV protocol to support different page size... They will be explored in a
> separate patch series later.
> 
> TODO list:
>     - Convert swiotlb to 64KB
>     - Convert xenfb to 64KB
>     - Support for multiple page ring support
>     - Support for 64KB in gnttdev
>     - Support of non-indirect grant with 64KB frontend
>     - It may be possible to move some common define between
>     netback/netfront and blkfront/blkback in an header
> 
> I've got most of the patches for the TODO items. I'm planning to send them as
> a follow-up as it's not a requirement for a basic guests.
> 
> All patches has been built tested for ARM32, ARM64, x86. But I haven't tested
> to run it on x86 as I don't have a box with Xen x86 running. I would be
> happy if someone give a try and see possible regression for x86.

Do you have figures on how much of a performance penalty the
blkfront/blkback patches add to the traditional 4KB scenario (frontend
and backend running on guests using 4KB pages)?

Since there's been no design document about this and the TODO list
doesn't mention it, I would like to know what plans we have to fix this
properly.

Roger.


^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14  8:56   ` Roger Pau Monné
@ 2015-09-14 10:40     ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-14 10:40 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	david.vrabel, konrad.wilk, boris.ostrovsky, wei.liu2

Hi Roger,

On 14/09/15 09:56, Roger Pau Monné wrote:
> On 07/09/15 at 17.33, Julien Grall wrote:
>> Hi all,
>>
>> ARM64 Linux is supporting both 4KB and 64KB page granularity. Although, Xen
>> hypercall interface and PV protocol are always based on 4KB page granularity.
>>
>> Any attempt to boot a Linux guest with 64KB pages enabled will result to a
>> guest crash.
>>
>> This series is a first attempt to allow those Linux running with the current
>> hypercall interface and PV protocol.
>>
>> This solution has been chosen because we want to run Linux 64KB in released
>> Xen ARM version or/and platform using an old version of Linux DOM0.
>>
>> There is room for improvement, such as support of 64KB grant, modification
>> of PV protocol to support different page size... They will be explored in a
>> separate patch series later.
>>
>> TODO list:
>>     - Convert swiotlb to 64KB
>>     - Convert xenfb to 64KB
>>     - Support for multiple page ring support
>>     - Support for 64KB in gnttdev
>>     - Support of non-indirect grant with 64KB frontend
>>     - It may be possible to move some common define between
>>     netback/netfront and blkfront/blkback in an header
>>
>> I've got most of the patches for the TODO items. I'm planning to send them as
>> a follow-up as it's not a requirement for a basic guests.
>>
>> All patches has been built tested for ARM32, ARM64, x86. But I haven't tested
>> to run it on x86 as I don't have a box with Xen x86 running. I would be
>> happy if someone give a try and see possible regression for x86.
> 
> Do you have figures on how much of a performance penalty the
> blkfront/blkback patches add to the traditional 4KB scenario (frontend
> and backend running on guests using 4KB pages)?

Which benchmark do you advise? That said, I don't have an SSD on this
platform, so the bottleneck may well be the hard drive.

> 
> Since there's been no design document about this and the TODO list
> doesn't mention it, I would like to know what plans we have to fix this
> properly.

Can you explain what kind of design document you were expecting? The
support of 64KB page granularity is pretty straightforward and there are
not many ways to do it. We have to split the page into 4KB chunks to
feed the ring (a sketch of this follows below).

TBH, I'm expecting a small impact on performance. It would be hard to
get exactly the same performance as today if we keep the helpers that
spare the backend from dealing with the splitting and page granularity
itself.

That said, if the performance impact is not acceptable, it may be
possible to optimize gnttab_foreach_grant_in_range by making the
function inline. The current way of looping is the fastest I've found
(I wrote a small program to test different approaches), and we will
need it when different sizes are supported.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 144+ messages in thread
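
To make the splitting concrete: the gnttab_foreach_grant_in_range
helper this series introduces invokes a callback once per 4KB grant
covered by a byte range of a (possibly 64KB) Linux page. A hedged
sketch of using it (the helper and the xen_grant_fn_t callback type
come from this series; the counting callback and wrapper below are
made up for illustration):

#include <linux/mm.h>
#include <xen/grant_table.h>

/* Illustrative xen_grant_fn_t callback: just count how many 4KB
 * grants the range spans; a real backend would fill one grant copy
 * operation per call instead. */
static void count_grant(unsigned long gfn, unsigned int offset,
			unsigned int len, void *data)
{
	unsigned int *nr = data;

	(*nr)++;	/* each call covers at most XEN_PAGE_SIZE bytes */
}

/* How many grants are needed for 'len' bytes at 'offset' within
 * one Linux page (up to XEN_PFN_PER_PAGE with 64KB pages). */
static unsigned int grants_for_range(struct page *page,
				     unsigned int offset, unsigned int len)
{
	unsigned int nr = 0;

	gnttab_foreach_grant_in_range(page, offset, len, count_grant, &nr);
	return nr;
}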

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 10:40     ` Julien Grall
@ 2015-09-14 11:04       ` Roger Pau Monné
  -1 siblings, 0 replies; 144+ messages in thread
From: Roger Pau Monné @ 2015-09-14 11:04 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: linux-arm-kernel, ian.campbell, stefano.stabellini, linux-kernel,
	david.vrabel, konrad.wilk, boris.ostrovsky, wei.liu2

Hello,

El 14/09/15 a les 12.40, Julien Grall ha escrit:
> Hi Roger,
> 
> On 14/09/15 09:56, Roger Pau Monné wrote:
>> El 07/09/15 a les 17.33, Julien Grall ha escrit:
>>> Hi all,
>>>
>>> ARM64 Linux is supporting both 4KB and 64KB page granularity. Although, Xen
>>> hypercall interface and PV protocol are always based on 4KB page granularity.
>>>
>>> Any attempt to boot a Linux guest with 64KB pages enabled will result to a
>>> guest crash.
>>>
>>> This series is a first attempt to allow those Linux running with the current
>>> hypercall interface and PV protocol.
>>>
>>> This solution has been chosen because we want to run Linux 64KB in released
>>> Xen ARM version or/and platform using an old version of Linux DOM0.
>>>
>>> There is room for improvement, such as support of 64KB grant, modification
>>> of PV protocol to support different page size... They will be explored in a
>>> separate patch series later.
>>>
>>> TODO list:
>>>     - Convert swiotlb to 64KB
>>>     - Convert xenfb to 64KB
>>>     - Support for multiple page ring support
>>>     - Support for 64KB in gnttdev
>>>     - Support of non-indirect grant with 64KB frontend
>>>     - It may be possible to move some common define between
>>>     netback/netfront and blkfront/blkback in an header
>>>
>>> I've got most of the patches for the TODO items. I'm planning to send them as
>>> a follow-up as it's not a requirement for a basic guests.
>>>
>>> All patches has been built tested for ARM32, ARM64, x86. But I haven't tested
>>> to run it on x86 as I don't have a box with Xen x86 running. I would be
>>> happy if someone give a try and see possible regression for x86.
>>
>> Do you have figures regarding if/how much performance penalty the
>> blkfront/blkback patches add to the traditional 4KB scenario (frontend
>> and backend running on guests using 4KB pages)?
> 
> Which benchmark do you advise? Note that I don't have an SSD on this
> platform, so the bottleneck may well be the hard drive.

I've normally used a ramdisk in order to test performance, but it seems
null_blk would be a better option (it wasn't available before), and it
doesn't reduce the amount of RAM available to the system:

https://www.kernel.org/doc/Documentation/block/null_blk.txt
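
For reference, a possible setup along those lines (module parameters as
described in null_blk.txt; the sizes here are arbitrary):

modprobe null_blk gb=4 bs=4096 nr_devices=1
# /dev/nullb0 completes I/O without touching storage, so it consumes no
# RAM; export it to the guest and benchmark with e.g. fio or dd.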

>>
>> Since there's been no design document about this and the TODO list
>> doesn't contain it, I would like to know what plans we have in order to
>> fix this properly.
> 
> Can you explain what kind of design document you were expecting? The
> support of 64KB page granularity is pretty straightforward and there are
> not many ways to do it. We have to split each page into 4KB chunks to
> feed the ring.

I was thinking about adding support for 64KB grants with the aim of
eventually removing the splitting done in xen-blkfront and the
grant-table code in general.

Some modifications would be needed in xen-blkback in order to support
mapping 64KB grants, but this could also be used by x86 if we ever
enable 2MB grants and allow them to be used in the block PV protocol,
while the current code can only be used by ARM.

> TBH, I'm expecting a small performance impact. It would be hard to get
> exactly the same performance as today if we keep the helpers that spare
> the backend from dealing with the splitting and page granularity itself.
> 
> That said, if the performance impact is not acceptable, it may be
> possible to optimize gnttab_foreach_grant_in_range by making the
> function inline. The current way of looping is the fastest I've found
> (I wrote a small program to test different approaches), and we will need
> it when different grant sizes are supported.

I don't expect the performance to drop massively with these patches
applied, but it would be good to at least have an idea of the impact.

Roger.


^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 11:04       ` Roger Pau Monné
@ 2015-09-14 11:21         ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-14 11:21 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel
  Cc: wei.liu2, ian.campbell, konrad.wilk, stefano.stabellini,
	linux-kernel, david.vrabel, boris.ostrovsky, linux-arm-kernel

On 14/09/15 12:04, Roger Pau Monné wrote:
> Hello,
> 
> El 14/09/15 a les 12.40, Julien Grall ha escrit:
>> Hi Roger,
>>
>> On 14/09/15 09:56, Roger Pau Monné wrote:
>>> El 07/09/15 a les 17.33, Julien Grall ha escrit:
>>>> Hi all,
>>>>
>>>> ARM64 Linux is supporting both 4KB and 64KB page granularity. Although, Xen
>>>> hypercall interface and PV protocol are always based on 4KB page granularity.
>>>>
>>>> Any attempt to boot a Linux guest with 64KB pages enabled will result to a
>>>> guest crash.
>>>>
>>>> This series is a first attempt to allow those Linux running with the current
>>>> hypercall interface and PV protocol.
>>>>
>>>> This solution has been chosen because we want to run Linux 64KB in released
>>>> Xen ARM version or/and platform using an old version of Linux DOM0.
>>>>
>>>> There is room for improvement, such as support of 64KB grant, modification
>>>> of PV protocol to support different page size... They will be explored in a
>>>> separate patch series later.
>>>>
>>>> TODO list:
>>>>     - Convert swiotlb to 64KB
>>>>     - Convert xenfb to 64KB
>>>>     - Support for multiple page ring support
>>>>     - Support for 64KB in gnttdev
>>>>     - Support of non-indirect grant with 64KB frontend
>>>>     - It may be possible to move some common define between
>>>>     netback/netfront and blkfront/blkback in an header
>>>>
>>>> I've got most of the patches for the TODO items. I'm planning to send them as
>>>> a follow-up as it's not a requirement for a basic guests.
>>>>
>>>> All patches has been built tested for ARM32, ARM64, x86. But I haven't tested
>>>> to run it on x86 as I don't have a box with Xen x86 running. I would be
>>>> happy if someone give a try and see possible regression for x86.
>>>
>>> Do you have figures regarding if/how much performance penalty the
>>> blkfront/blkback patches add to the traditional 4KB scenario (frontend
>>> and backend running on guests using 4KB pages)?
>>
>> Which benchmark do you advise? Note that I don't have an SSD on this
>> platform, so the bottleneck may well be the hard drive.
> 
> I've normally used a ramdisk in order to test performance, but it seems
> null_blk would be a better option (it wasn't available before), and it
> doesn't reduce the amount of RAM available to the system:
> 
> https://www.kernel.org/doc/Documentation/block/null_blk.txt

I will take a look at this.

>>>
>>> Since there's been no design document about this and the TODO list
>>> doesn't contain it, I would like to know what plans we have in order to
>>> fix this properly.
>>
>> Can you explain what kind of design document you were expecting? The
>> support of 64KB page granularity is pretty straightforward and there are
>> not many ways to do it. We have to split each page into 4KB chunks to
>> feed the ring.
> 
> I was thinking about adding support for 64KB grants with the aim of
> eventually removing the splitting done in xen-blkfront and the
> grant-table code in general.

I'd like to see basic support for 64KB page granularity upstream before
starting to think about performance improvements. And there is still a
lot to do.

That said, having 64KB grants won't remove the splitting, as we would
still have to support frontends/backends which only handle 4KB grants.
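
Roughly, a frontend would have to negotiate the grant size and keep the
splitting as a fallback, something like (purely illustrative; the order
values and helper below are made up, not part of any real protocol):

#include <stdio.h>

/* Order 0 = 4KB grants (today's protocol), order 4 = 64KB grants. */
static unsigned int pick_grant_order(unsigned int backend_max_order)
{
    const unsigned int wanted = 4;  /* 64KB = 4KB << 4 */

    /* An old backend advertises nothing, i.e. order 0: keep splitting. */
    return backend_max_order >= wanted ? wanted : 0;
}

int main(void)
{
    printf("new backend: order %u\n", pick_grant_order(4)); /* 64KB grants */
    printf("old backend: order %u\n", pick_grant_order(0)); /* split to 4KB */
    return 0;
}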


>> TBH, I'm expecting a small performance impact. It would be hard to get
>> exactly the same performance as today if we keep the helpers that spare
>> the backend from dealing with the splitting and page granularity itself.
>>
>> That said, if the performance impact is not acceptable, it may be
>> possible to optimize gnttab_foreach_grant_in_range by making the
>> function inline. The current way of looping is the fastest I've found
>> (I wrote a small program to test different approaches), and we will need
>> it when different grant sizes are supported.
> 
> I don't expect the performance to drop massively with these patches
> applied, but it would be good to at least have an idea of the impact.

I will only be able to give percentages. I guess this would be
sufficient for you?

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 11:04       ` Roger Pau Monné
@ 2015-09-14 11:32         ` Arnd Bergmann
  -1 siblings, 0 replies; 144+ messages in thread
From: Arnd Bergmann @ 2015-09-14 11:32 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Roger Pau Monné, Julien Grall, xen-devel, wei.liu2,
	ian.campbell, konrad.wilk, stefano.stabellini, linux-kernel,
	david.vrabel, boris.ostrovsky

On Monday 14 September 2015 13:04:59 Roger Pau Monné wrote:
> > TBH, I'm expecting a small performance impact. It would be hard to get
> > exactly the same performance as today if we keep the helpers that spare
> > the backend from dealing with the splitting and page granularity itself.
> 
> > That said, if the performance impact is not acceptable, it may be
> > possible to optimize gnttab_foreach_grant_in_range by making the
> > function inline. The current way of looping is the fastest I've found
> > (I wrote a small program to test different approaches), and we will need
> > it when different grant sizes are supported.
> 
> I don't expect the performance to drop massively with these patches
> applied, but it would be good to at least have an idea of the impact.

Note that using 64KB pages in Linux tends to destroy performance in any
case, as the memory consumption for most workloads explodes. In a
virtualized environment you already tend to be memory constrained, so
any measurement should take that into account and put the extra overhead
into perspective against the massive overhead of running 64KB pages when
RAM is tight.

	Arnd

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 11:21         ` Julien Grall
@ 2015-09-14 12:08           ` Roger Pau Monné
  -1 siblings, 0 replies; 144+ messages in thread
From: Roger Pau Monné @ 2015-09-14 12:08 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: wei.liu2, ian.campbell, konrad.wilk, stefano.stabellini,
	linux-kernel, david.vrabel, boris.ostrovsky, linux-arm-kernel

El 14/09/15 a les 13.21, Julien Grall ha escrit:
> On 14/09/15 12:04, Roger Pau Monné wrote:
>> Hello,
>>
>> El 14/09/15 a les 12.40, Julien Grall ha escrit:
>>> Hi Roger,
>>>
>>> On 14/09/15 09:56, Roger Pau Monné wrote:
>>>> El 07/09/15 a les 17.33, Julien Grall ha escrit:
>>>>> Hi all,
>>>>>
>>>>> ARM64 Linux is supporting both 4KB and 64KB page granularity. Although, Xen
>>>>> hypercall interface and PV protocol are always based on 4KB page granularity.
>>>>>
>>>>> Any attempt to boot a Linux guest with 64KB pages enabled will result to a
>>>>> guest crash.
>>>>>
>>>>> This series is a first attempt to allow those Linux running with the current
>>>>> hypercall interface and PV protocol.
>>>>>
>>>>> This solution has been chosen because we want to run Linux 64KB in released
>>>>> Xen ARM version or/and platform using an old version of Linux DOM0.
>>>>>
>>>>> There is room for improvement, such as support of 64KB grant, modification
>>>>> of PV protocol to support different page size... They will be explored in a
>>>>> separate patch series later.
>>>>>
>>>>> TODO list:
>>>>>     - Convert swiotlb to 64KB
>>>>>     - Convert xenfb to 64KB
>>>>>     - Support for multiple page ring support
>>>>>     - Support for 64KB in gnttdev
>>>>>     - Support of non-indirect grant with 64KB frontend
>>>>>     - It may be possible to move some common define between
>>>>>     netback/netfront and blkfront/blkback in an header
>>>>>
>>>>> I've got most of the patches for the TODO items. I'm planning to send them as
>>>>> a follow-up as it's not a requirement for a basic guests.
>>>>>
>>>>> All patches has been built tested for ARM32, ARM64, x86. But I haven't tested
>>>>> to run it on x86 as I don't have a box with Xen x86 running. I would be
>>>>> happy if someone give a try and see possible regression for x86.
>>>>
>>>> Do you have figures regarding if/how much performance penalty the
>>>> blkfront/blkback patches add to the traditional 4KB scenario (frontend
>>>> and backend running on guests using 4KB pages)?
>>>
>>> Which benchmark do you advise? Note that I don't have an SSD on this
>>> platform, so the bottleneck may well be the hard drive.
>>
>> I've normally used a ramdisk in order to test performance, but it seems
>> null_blk would be a better option (it wasn't available before), and it
>> doesn't reduce the amount of RAM available to the system:
>>
>> https://www.kernel.org/doc/Documentation/block/null_blk.txt
> 
> I will take a look at this.
> 
>>>>
>>>> Since there's been no design document about this and the TODO list
>>>> doesn't contain it, I would like to know what plans we have in order to
>>>> fix this properly.
>>>
>>> Can you explain what kind of design document you were expecting? The
>>> support of 64KB page granularity is pretty straightforward and there are
>>> not many ways to do it. We have to split each page into 4KB chunks to
>>> feed the ring.
>>
>> I was thinking about adding support for 64KB grants with the aim of
>> eventually removing the splitting done in xen-blkfront and the
>> grant-table code in general.
> 
> I'd like to see basic support for 64KB page granularity upstream before
> starting to think about performance improvements. And there is still a
> lot to do.

I wasn't actually thinking of this as a performance improvement, but
rather as a way of fixing this properly and removing the complexity
added to xen-blkfront, thus providing a PV block protocol that natively
supports 64KB pages. This would match what hardware components already
do AFAIK, because I don't think there are other drivers in the Linux
tree that do this kind of splitting.

> That said, having 64KB grants won't remove the splitting, as we would
> still have to support frontends/backends which only handle 4KB grants.

IMHO, given enough time I would just drop those.

>>> TBH, I'm expecting a small performance impact. It would be hard to get
>>> exactly the same performance as today if we keep the helpers that spare
>>> the backend from dealing with the splitting and page granularity itself.
>>>
>>> That said, if the performance impact is not acceptable, it may be
>>> possible to optimize gnttab_foreach_grant_in_range by making the
>>> function inline. The current way of looping is the fastest I've found
>>> (I wrote a small program to test different approaches), and we will need
>>> it when different grant sizes are supported.
>>
>> I don't expect the performance to drop massively with these patches
>> applied, but it would be good to at least have an idea of the impact.
> 
> I will only be able to give percentages. I guess this would be
> sufficient for you?

Well, absolute numbers together with the standard deviation are IMHO the
best way to provide those figures (i.e. see ministat(1) output for an
example), but percentages should also be fine.

I'm just interested in knowing the performance difference between having
these patches applied or not when using 4KB pages on the frontend and
the backend.

Roger.

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 12:08           ` Roger Pau Monné
@ 2015-09-14 12:47             ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-14 12:47 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel
  Cc: wei.liu2, ian.campbell, konrad.wilk, stefano.stabellini,
	linux-kernel, david.vrabel, boris.ostrovsky, linux-arm-kernel

On 14/09/15 13:08, Roger Pau Monné wrote:
>> I'd like to see basic support for 64KB page granularity upstream before
>> starting to think about performance improvements. And there is still a
>> lot to do.
> 
> I wasn't actually thinking of this as a performance improvement, but
> rather as a way of fixing this properly and removing the complexity
> added to xen-blkfront, thus providing a PV block protocol that natively
> supports 64KB pages. This would match what hardware components already
> do AFAIK, because I don't think there are other drivers in the Linux
> tree that do this kind of splitting.
> 
>> That said, having 64KB grants won't remove the splitting, as we would
>> still have to support frontends/backends which only handle 4KB grants.
> 
> IMHO, given enough time I would just drop those.

The PV protocol has always been backward compatible.

Introducing support for 64KB grants will require some changes in Xen
which surely won't be backported to older versions. We still have to be
able to run new Linux guests on older Xen and vice versa.

To give you an example, CentOS 7, which will support Xen with only 64KB
page granularity, will be supported for years. Dropping the splitting in
the near future (3-5 years) would just prevent those guests from booting
on Xen.

AFAIK, we have never taken such a radical decision on Xen based on the
complexity of the code.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 144+ messages in thread

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 12:47             ` Julien Grall
@ 2015-09-14 14:29               ` Roger Pau Monné
  -1 siblings, 0 replies; 144+ messages in thread
From: Roger Pau Monné @ 2015-09-14 14:29 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: wei.liu2, ian.campbell, konrad.wilk, stefano.stabellini,
	linux-kernel, david.vrabel, boris.ostrovsky, linux-arm-kernel

Hello,

El 14/09/15 a les 14.47, Julien Grall ha escrit:
> On 14/09/15 13:08, Roger Pau Monné wrote:
>>> I'd like to see basic support for 64KB page granularity upstream before
>>> starting to think about performance improvements. And there is still a
>>> lot to do.
>>
>> I wasn't actually thinking of this as a performance improvement, but
>> rather as a way of fixing this properly and removing the complexity
>> added to xen-blkfront, thus providing a PV block protocol that natively
>> supports 64KB pages. This would match what hardware components already
>> do AFAIK, because I don't think there are other drivers in the Linux
>> tree that do this kind of splitting.
>>
>>> That said, having 64KB grants won't remove the splitting, as we would
>>> still have to support frontends/backends which only handle 4KB grants.
>>
>> IMHO, given enough time I would just drop those.
> 
> The PV protocol has always been backward compatible.
> 
> Introducing support for 64KB grants will require some changes in Xen
> which surely won't be backported to older versions. We still have to be
> able to run new Linux guests on older Xen and vice versa.

OK, so let's wait a couple of Xen and Linux releases before dropping
this from the Linux kernel once 64KB grant support is added. Guests
should still be able to boot; it's just the disk that won't be attached.

IMHO this splitting is just a workaround for the fact that we don't have
a 64KB PV block protocol, and this is the real problem that should be
solved.

In the long term this will put a burden on all blkfronts (if 64KB pages
are also used by other OSes), while introducing a 64KB PV block protocol
will make the blkfront implementation in all OSes very similar to what
we have now, without replicating the splitting code amongst all the
possible blkfront implementations.

Granted, some changes to blkback will be needed in order to support
mapping 64KB grants, but there are far fewer blkback implementations
out there than blkfronts.

> To give you an example, CentOS 7, which will support Xen with only 64KB
> page granularity, will be supported for years. Dropping the splitting in
> the near future (3-5 years) would simply prevent those guests from
> booting on Xen.

Can't the patches to support 64KB grants be backported to CentOS? In the
past you said that distros had no problem picking the needed Xen patches
for 64KB support; I don't see how this would be different.

Also, a blkfront with the splitting code should be able to work
perfectly fine once we have a 64KB PV block protocol; it will just be
treated as if it were using 4KB pages, that's all.

> AFAIK, we have never taken such a radical decision in Xen based on the
> complexity of the code.

We have never had to deal with a problem like this, I'm afraid (mixing
guests with different page sizes), so I don't think we can make
statements about that.

And it's not just about the complexity; as stated above, this IMHO is
just sweeping the real problem under the carpet instead of solving it.

Roger.


* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 14:29               ` Roger Pau Monné
@ 2015-09-14 14:46                 ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-14 14:46 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel
  Cc: wei.liu2, ian.campbell, konrad.wilk, stefano.stabellini,
	linux-kernel, david.vrabel, boris.ostrovsky, linux-arm-kernel

On 14/09/15 15:29, Roger Pau Monné wrote:
>> To give you an example, CentOS 7, which will support Xen with only 64KB
>> page granularity, will be supported for years. Dropping the splitting in
>> the near future (3-5 years) would simply prevent those guests from
>> booting on Xen.
> 
> Can't the patches to support 64KB grants be backported to CentOS? In the
> past you said that distros had no problem picking the needed Xen patches
> for 64KB support; I don't see how this would be different.

CentOS 7 is not yet released, so it's easier to get patches in. I don't
know what their position will be once the guest has shipped.

I bet they will still want to support 64KB on older Xen (i.e. where
64KB grants are not supported).

Anyway, I don't expect to see 64KB grant support before next year, as
my TODO list is already pretty full. So let's talk about it when I (or
someone else) send an RFC to extend the grant size.

Regards,

-- 
Julien Grall

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 14:29               ` Roger Pau Monné
@ 2015-09-14 14:54                 ` Stefano Stabellini
  -1 siblings, 0 replies; 144+ messages in thread
From: Stefano Stabellini @ 2015-09-14 14:54 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, xen-devel, wei.liu2, ian.campbell, konrad.wilk,
	stefano.stabellini, linux-kernel, david.vrabel, boris.ostrovsky,
	linux-arm-kernel

On Mon, 14 Sep 2015, Roger Pau Monné wrote:
> IMHO this splitting is just a workaround for the fact that we don't have
> a 64KB PV block protocol, and this is the real problem that should be
> solved.

64K is purely a per-guest kernel configuration option, not a
platform-wide option. The hypervisor interfaces are still the same, the
ABI is the same, all the other guests are still the same, and the Xen
binary is still the same.

A 64K block protocol could be a good performance improvement, but it
should not be required to run kernels which have different config
options.


> In the long term this will put a burden on all blkfronts (if 64KB pages
> are also used by other OSes), while introducing a 64KB PV block protocol
> will make the blkfront implementation in all OSes very similar to what
> we have now, without replicating the splitting code amongst all the
> possible blkfront implementations.
> 
> Granted that some changes to blkback will be needed in order to support
> mapping 64KB grants, but there are far fewer blkback implementations
> out there than blkfronts.

I don't think we can rely on blkback gaining new features in order to
run new guests; otherwise we break compatibility: new guests won't run
on old hypervisors.

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 14:54                 ` Stefano Stabellini
@ 2015-09-14 15:23                   ` Roger Pau Monné
  -1 siblings, 0 replies; 144+ messages in thread
From: Roger Pau Monné @ 2015-09-14 15:23 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, xen-devel, wei.liu2, ian.campbell, konrad.wilk,
	linux-kernel, david.vrabel, boris.ostrovsky, linux-arm-kernel

On 14/09/15 at 16:54, Stefano Stabellini wrote:
> On Mon, 14 Sep 2015, Roger Pau Monné wrote:
>> IMHO this splitting is just a workaround for the fact that we don't have
>> a 64KB PV block protocol, and this is the real problem that should be
>> solved.
> 
> 64K is purely a per-guest kernel configuration option, not a
> platform-wide option. The hypervisor interfaces are still the same, the
> ABI is the same, all the other guests are still the same, and the Xen
> binary is still the same.

Yes, I understand that, but the PV block protocol is missing 64KB page
support, and that's a fact that cannot be ignored. To give an example, is
there a hardware SATA controller on ARM that doesn't support 64KB pages
and needs a similar workaround?

> A 64K block protocol could be a good performance improvement, but it
> should not be required to run kernels which have different config
> options.
> 
>> In the long term this will put a burden on all blkfronts (if 64KB pages
>> are also used by other OSes), while introducing a 64KB PV block protocol
>> will make the blkfront implementation in all OSes very similar to what
>> we have now, without replicating the splitting code amongst all the
>> possible blkfront implementations.
>>
>> Granted that some changes to blkback will be needed in order to support
>> mapping 64KB grants, but there are far fewer blkback implementations
>> out there than blkfronts.
> 
> I don't think we can rely on blkback gaining new features in order to
> run new guests; otherwise we break compatibility: new guests won't run
> on old hypervisors.

I agree that this is far from ideal, but I don't think it's so
outrageous. For example, Linux pvops Dom0 kernels require at least Xen
4.0.1 in order to run, because previous versions lack the necessary
IOAPIC setup hypercall. Also, it wouldn't prevent guests from booting;
it would just prevent them from using blkfront, and you can still get a
root filesystem using iSCSI, NFS or other means.

I'm not saying that we shouldn't take those patches, I'm just saying
that IMHO this is a workaround, and I would like to see a plan and
somebody committed to have it fixed in a proper way, by introducing a
64KB PV block protocol.

Roger.


* Re: [Xen-devel] [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 11:32         ` Arnd Bergmann
@ 2015-09-15 13:14           ` David Vrabel
  -1 siblings, 0 replies; 144+ messages in thread
From: David Vrabel @ 2015-09-15 13:14 UTC (permalink / raw)
  To: Arnd Bergmann, linux-arm-kernel
  Cc: wei.liu2, ian.campbell, stefano.stabellini, linux-kernel,
	Julien Grall, david.vrabel, xen-devel, boris.ostrovsky,
	Roger Pau Monné

On 14/09/15 12:32, Arnd Bergmann wrote:
> On Monday 14 September 2015 13:04:59 Roger Pau Monné wrote:
>>> TBH, I'm expecting a small impact on performance. It would be hard
>>> to get exactly the same performance as today if we keep the helpers
>>> that avoid the backend having to deal with the splitting and page
>>> granularity itself.
>>>
>>> Although, if the performance impact is not acceptable, it may be
>>> possible to optimize gnttab_foreach_grant_in_range by moving the
>>> function inline. The current way of looping is the fastest I've found
>>> (I wrote a small program to test different ways) and we will need it
>>> when different sizes are supported.
>>
>> I don't expect the performance to drop massively with these patches
>> applied, but it would be good to at least have an idea of the impact.
> 
> Note that using 64KB pages in Linux tends to destroy performance
> in any case, as the memory consumption of most workloads explodes.
> In a virtualized environment you already tend to be memory
> constrained, so any measurement should take that into account and
> put the extra overhead into perspective against the massive overhead
> of running 64KB pages when RAM is tight.

If this is the case, why are some distros using 64 KiB pages then?

David

* Re: [Xen-devel] [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-15 13:14           ` David Vrabel
@ 2015-09-15 13:24             ` Arnd Bergmann
  -1 siblings, 0 replies; 144+ messages in thread
From: Arnd Bergmann @ 2015-09-15 13:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: David Vrabel, wei.liu2, ian.campbell, stefano.stabellini,
	linux-kernel, Julien Grall, xen-devel, boris.ostrovsky,
	Roger Pau Monné

On Tuesday 15 September 2015 14:14:09 David Vrabel wrote:
> On 14/09/15 12:32, Arnd Bergmann wrote:
> > On Monday 14 September 2015 13:04:59 Roger Pau Monné wrote:
> >>> TBH, I'm expecting a small impact on performance. It would be hard
> >>> to get exactly the same performance as today if we keep the helpers
> >>> that avoid the backend having to deal with the splitting and page
> >>> granularity itself.
> >>>
> >>> Although, if the performance impact is not acceptable, it may be
> >>> possible to optimize gnttab_foreach_grant_in_range by moving the
> >>> function inline. The current way of looping is the fastest I've found
> >>> (I wrote a small program to test different ways) and we will need it
> >>> when different sizes are supported.
> >>
> >> I don't expect the performance to drop massively with these patches
> >> applied, but it would be good to at least have an idea of the impact.
> > 
> > Note that using 64KB pages in Linux tends to destroy performance
> > in any case, as the memory consumption of most workloads explodes.
> > In a virtualized environment you already tend to be memory
> > constrained, so any measurement should take that into account and
> > put the extra overhead into perspective against the massive overhead
> > of running 64KB pages when RAM is tight.
> 
> If this is the case, why are some distros using 64 KiB pages then?

I believe that IBM started this on PowerPC64 when they found that it
performs better in microbenchmarks and for certain database workloads
when you do have enough memory and do not work with small files or
many processes.

I also believe that the AIX memory management code is designed differently
enough that it suffers less from large pages (or suffers more from
small pages), so the hardware designers pushed Linux to do the same as AIX.

Why you'd do it on ARM64 I have no idea at all, other than copying what
IBM did on PowerPC. If your TLB is too small, hardware designers may of
course try to push the OS towards using pages as large as possible so
that the low-level benchmarks suck less, but in general you'd be better
off with transparent hugepages and 4KB or 8KB pages for real workloads.
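
As a rough illustration of that memory blow-up, here is a small
self-contained sketch; the file sizes are made up purely for the
example, and every cached file occupies at least one granule:

#include <stdio.h>

/* Page cache bytes needed to hold a file at a given granule size. */
static unsigned long cache_bytes(unsigned long file_size,
                                 unsigned long granule)
{
        return (file_size + granule - 1) / granule * granule;
}

int main(void)
{
        /* Hypothetical small files, as in /etc or a source tree. */
        const unsigned long files[] = { 137, 2048, 5000, 12000, 40000 };
        unsigned long data = 0, at_4k = 0, at_64k = 0;
        unsigned int i;

        for (i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
                data   += files[i];
                at_4k  += cache_bytes(files[i], 4UL << 10);
                at_64k += cache_bytes(files[i], 64UL << 10);
        }
        /* ~58KB of data: 68KB cached at 4KB granule, 320KB at 64KB. */
        printf("data=%lu cached@4K=%lu cached@64K=%lu\n",
               data, at_4k, at_64k);
        return 0;
}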

	Arnd

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 12:08           ` Roger Pau Monné
@ 2015-09-18 14:10             ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-18 14:10 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel
  Cc: wei.liu2, ian.campbell, stefano.stabellini, konrad.wilk,
	linux-kernel, david.vrabel, boris.ostrovsky, linux-arm-kernel

Hi Roger,

On 14/09/15 13:08, Roger Pau Monné wrote:
> Well, absolute numbers together with the standard deviation are IMHO the
> best way to provide those figures (e.g. see ministat(1) output), but
> percentages should also be fine.
> 
> I'm just interested in knowing the performance difference between having
> these patches applied or not when using 4KB pages on both the frontend
> and the backend.

I did some benchmarking:
	DOM0: 1 VCPU, 4GB of RAM, based on 4.3-rc1 without this series
	GUEST: 4 VCPUs, 4GB of RAM, second disk associated with a null_blk device

I used fio with the following options:

42sh> fio --name=test --ioengine=libaio --rw=read --numjobs=8 \
--iodepth=32 --time_based=1 --runtime=30 --bs=4KB \
--filename=/dev/xvdb --direct=1 --group_reporting=1 --iodepth_batch=16

The guest is also based on 4.3-rc1 with and without the series.

The overhead with my series is about 0.56%.

Regards,

-- 
Julien Grall

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-14 15:23                   ` Roger Pau Monné
@ 2015-09-22 10:59                     ` Ian Campbell
  -1 siblings, 0 replies; 144+ messages in thread
From: Ian Campbell @ 2015-09-22 10:59 UTC (permalink / raw)
  To: Roger Pau Monné, Stefano Stabellini
  Cc: Julien Grall, xen-devel, wei.liu2, konrad.wilk, linux-kernel,
	david.vrabel, boris.ostrovsky, linux-arm-kernel

On Mon, 2015-09-14 at 17:23 +0200, Roger Pau Monné wrote:
> I'm not saying that we shouldn't take those patches, I'm just saying
> that IMHO this is a workaround, and I would like to see a plan and
> somebody committed to have it fixed in a proper way, by introducing a
> 64KB PV block protocol.

In my view the basic unit of operation for all Xen interfaces (on x86 and
arm at least[0]) is 4K. The peer at either end of a PV protocol should
therefore be entitled to assume that at a minimum the other end supports
operation in 4K mode (i.e. 4K is the baseline).

Operations in larger sizes (which would necessarily be multiples of 4K) are
then an extension which is subject to a negotiation between the two ends,
it doesn't really matter if those larger sizes arise because of the use of
superpages or because the guest is internally using some larger basic page
size (which ARM calls a "granule", and where 64K comes from here).

I think this line of reasoning applies just as strongly to the hypercall
ABI as well BTW, they all use 4K as their basic unit, but might support
extensions to operation on multiples of that (negotiated either via a
specific error return and fallback or via the use of XENFEAT).
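
As a toy model of that negotiation (the "order" encoding below is
purely hypothetical; a real protocol extension would define its own
xenstore keys and values):

#include <stdio.h>

/* Each end advertises the largest grant "order" it supports, i.e.
 * log2 of the grant size in 4K units; a missing key (modelled here
 * as a negative value) means the 4K baseline, order 0. */
static unsigned int negotiate_grant_order(int fe_order, int be_order)
{
        unsigned int fe = fe_order < 0 ? 0 : fe_order;
        unsigned int be = be_order < 0 ? 0 : be_order;

        return fe < be ? fe : be;   /* both ends must support it */
}

int main(void)
{
        /* 64K-capable frontend against a legacy 4K-only backend:
         * falls back to the 4K baseline. */
        printf("order %u\n", negotiate_grant_order(4, -1));   /* 0 */
        /* Both ends support 64K grants (order 4). */
        printf("order %u\n", negotiate_grant_order(4, 4));    /* 4 */
        return 0;
}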

Ian.

[0] It's not implausible that some other architecture may exist which
doesn't use 4K as the basic page size.

* Re: [Xen-devel] [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-07 15:33 ` Julien Grall
@ 2015-09-29 16:27   ` David Vrabel
  -1 siblings, 0 replies; 144+ messages in thread
From: David Vrabel @ 2015-09-29 16:27 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: wei.liu2, ian.campbell, stefano.stabellini, linux-kernel,
	david.vrabel, boris.ostrovsky, linux-arm-kernel, roger.pau

On 07/09/15 16:33, Julien Grall wrote:
> 
> A branch based on the latest xentip/for-linus-4.3 can be found here:
> 
> git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v4

This has too many conflicts with the
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git for-linus-4.4
branch.

Please rebase.

David

* Re: [Xen-devel] [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-29 16:27   ` David Vrabel
@ 2015-09-29 16:33     ` Julien Grall
  -1 siblings, 0 replies; 144+ messages in thread
From: Julien Grall @ 2015-09-29 16:33 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: wei.liu2, ian.campbell, stefano.stabellini, linux-kernel,
	boris.ostrovsky, linux-arm-kernel, roger.pau

Hi David,

On 29/09/15 17:27, David Vrabel wrote:
> On 07/09/15 16:33, Julien Grall wrote:
>>
>> A branch based on the latest xentip/for-linus-4.3 can be found here:
>>
>> git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v4
> 
> This has too many conflicts with the
> git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git for-linus-4.4
> branch.
> 
> Please rebase.

I was about to send a newly rebased version (no changes in it so far
except for rebasing purposes). I will do it on top of for-linus-4.4.
I will also include the swiotlb changes at the end, as they were
reviewed by Stefano.

BTW, some of the patches should, strictly speaking, go via the block
and network subsystems. However, they depend heavily on other patches
in this series. Would it be fine to send them all through xentip?

Regards,

-- 
Julien Grall

* Re: [Xen-devel] [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-29 16:33     ` Julien Grall
@ 2015-09-29 16:36       ` David Vrabel
  -1 siblings, 0 replies; 144+ messages in thread
From: David Vrabel @ 2015-09-29 16:36 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: wei.liu2, ian.campbell, stefano.stabellini, linux-kernel,
	boris.ostrovsky, linux-arm-kernel, roger.pau

On 29/09/15 17:33, Julien Grall wrote:
> Hi David,
> 
> On 29/09/15 17:27, David Vrabel wrote:
>> On 07/09/15 16:33, Julien Grall wrote:
>>>
>>> A branch based on the latest xentip/for-linus-4.3 can be found here:
>>>
>>> git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v4
>>
>> This has too many conflicts with the
>> git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git for-linus-4.4
>> branch.
>>
>> Please rebase.
> 
> I was about to send a newly rebased version (no changes in it so far
> except for rebasing purposes). I will do it on top of for-linus-4.4.
> I will also include the swiotlb changes at the end, as they were
> reviewed by Stefano.

Thanks.

> BTW, some of the patches should, strictly speaking, go via the block
> and network subsystems. However, they depend heavily on other patches
> in this series. Would it be fine to send them all through xentip?

Yes, if they've been acked by the appropriate maintainer.

David

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-09-22 10:59                     ` Ian Campbell
@ 2015-10-06  9:28                       ` Roger Pau Monné
  -1 siblings, 0 replies; 144+ messages in thread
From: Roger Pau Monné @ 2015-10-06  9:28 UTC (permalink / raw)
  To: Ian Campbell, Stefano Stabellini
  Cc: Julien Grall, xen-devel, wei.liu2, konrad.wilk, linux-kernel,
	david.vrabel, boris.ostrovsky, linux-arm-kernel

On 22/09/15 at 12:59, Ian Campbell wrote:
> On Mon, 2015-09-14 at 17:23 +0200, Roger Pau Monné wrote:
>> I'm not saying that we shouldn't take those patches, I'm just saying
>> that IMHO this is a workaround, and I would like to see a plan and
>> somebody committed to fixing it in a proper way, by introducing a
>> 64KB PV block protocol.
> 
> In my view the basic unit of operation for all Xen interfaces (on x86 and
> arm at least[0]) is 4K. The peer at either end of a PV protocol should
> therefore be entitled to assume that at a minimum the other end supports
> operation in 4K mode (i.e. 4K is the baseline).
> 
> Operations in larger sizes (which would necessarily be multiples of 4K) are
> then an extension which is subject to a negotiation between the two ends;
> it doesn't really matter if those larger sizes arise because of the use of
> superpages or because the guest is internally using some larger basic page
> size (which ARM calls a "granule", and where 64K comes from here).
> 
> I think this line of reasoning applies just as strongly to the hypercall
> ABI as well, BTW: they all use 4K as their basic unit, but might support
> extensions to operation on multiples of that (negotiated either via a
> specific error return and fallback or via the use of XENFEAT).
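
To make the negotiation described above concrete: a minimal sketch,
using the existing Linux xenbus_scanf() API, of a frontend probing for
an optional backend extension and falling back to the 4K baseline. The
"feature-grant-order" xenstore key is hypothetical, not a defined
protocol node:

#include <xen/xenbus.h>

static unsigned int negotiate_grant_order(struct xenbus_device *dev)
{
        unsigned int order = 0; /* baseline: 4KB (order 0) grants */

        /* Absence of the (hypothetical) key means plain 4KB operation. */
        if (xenbus_scanf(XBT_NIL, dev->otherend,
                         "feature-grant-order", "%u", &order) != 1)
                order = 0;

        /* Cap at what this frontend handles, e.g. 64KB = 4KB << 4. */
        if (order > 4)
                order = 4;

        return order;
}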

Yes, I completely agree that the current Xen interface is based on 4KB
chunks. What I'm trying to say is that the approach taken, which is
"let's not modify Xen" puts all the burden on the guest and it is going
to hurt us in the long run.

Are we expecting all guests that want to use 64KB pages to implement all
the modifications that are needed? IMHO, those modifications are very
far from trivial, and we are putting the bar for Xen guest support very
high, and that is wrong. We are already struggling to have a decent set
of PV drivers on guests other than Linux, and this is certainly not
making things easier.

Do KVM guests also need such an extensive set of modifications in order to
run using 64KB pages? I know it's not fair to compare KVM and Xen in
this regard, because the interfaces are very different, but that's what
developers are going to look at.

Roger.


* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-10-06  9:28                       ` Roger Pau Monné
@ 2015-10-06 10:17                         ` Ian Campbell
  -1 siblings, 0 replies; 144+ messages in thread
From: Ian Campbell @ 2015-10-06 10:17 UTC (permalink / raw)
  To: Roger Pau Monné, Stefano Stabellini
  Cc: Julien Grall, xen-devel, wei.liu2, konrad.wilk, linux-kernel,
	david.vrabel, boris.ostrovsky, linux-arm-kernel

On Tue, 2015-10-06 at 11:28 +0200, Roger Pau Monné wrote:
> On 22/09/15 at 12:59, Ian Campbell wrote:
> > On Mon, 2015-09-14 at 17:23 +0200, Roger Pau Monné wrote:
> > > I'm not saying that we shouldn't take those patches, I'm just saying
> > > that IMHO this is a workaround, and I would like to see a plan and
> > > somebody committed to fixing it in a proper way, by introducing a
> > > 64KB PV block protocol.
> > 
> > In my view the basic unit of operation for all Xen interfaces (on x86
> > and arm at least[0]) is 4K. The peer at either end of a PV protocol
> > should therefore be entitled to assume that at a minimum the other end
> > supports operation in 4K mode (i.e. 4K is the baseline).
> > 
> > Operations in larger sizes (which would necessarily be multiples of
> > 4K) are then an extension which is subject to a negotiation between
> > the two ends; it doesn't really matter if those larger sizes arise
> > because of the use of superpages or because the guest is internally
> > using some larger basic page size (which ARM calls a "granule", and
> > where 64K comes from here).
> > 
> > I think this line of reasoning applies just as strongly to the
> > hypercall ABI as well, BTW: they all use 4K as their basic unit, but
> > might support extensions to operation on multiples of that (negotiated
> > either via a specific error return and fallback or via the use of
> > XENFEAT).
> 
> Yes, I completely agree that the current Xen interface is based on 4KB
> chunks. What I'm trying to say is that the approach taken, which is
> "let's not modify Xen" puts all the burden on the guest and it is going
> to hurt us in the long run.
> 
> Are we expecting all guests that want to use 64KB pages to implement all
> the modifications that are needed? IMHO, those modifications are very
> far from trivial, and we are putting the bar for Xen guest support very
> high, and that is wrong. We are already struggling to have a decent set
> of PV drivers on guests other than Linux, and this is certainly not
> making things easier.

I think you are underestimating the impact/burden of not having a
lowest-common-denominator interface which all frontends and backends are
expected to provide in order to interact successfully with each other at
a basic level.

With your approach of requiring a 64KB guest to use native 64K
interfaces, you end up either with guests containing frontends and
backends which simply cannot communicate at all, because they have no
common denominator, or with a combinatorial explosion of guests
supporting different sets of mutually incompatible interfaces with no
common ground.

I think that fragmentation is a far worse danger than that of developers
being unwilling or unable to provide helpers which iterate over pages of
whatever size the guest happens to have (either a larger granule or
superpages), performing however many operations of whatever size has
been negotiated with the peer.
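
As a minimal sketch of such a helper (kernel-style C; the XEN_PAGE_*
constants follow the series' naming, but the helper name and callback
type here are illustrative, not the series' exact gnttab API):

#define XEN_PAGE_SHIFT  12
#define XEN_PAGE_SIZE   (1UL << XEN_PAGE_SHIFT) /* Xen ABI unit: 4KB */

typedef void (*xen_gfn_fn)(unsigned long gfn, void *data);

/*
 * Walk one guest page in 4KB (Xen-sized) steps, regardless of whether
 * the guest granule is 4KB or 64KB: a 64KB guest page simply covers
 * 16 consecutive 4KB Xen frames.
 */
static void foreach_xen_gfn_in_page(unsigned long first_gfn,
                                    unsigned long guest_page_size,
                                    xen_gfn_fn fn, void *data)
{
        unsigned long off;

        for (off = 0; off < guest_page_size; off += XEN_PAGE_SIZE)
                fn(first_gfn + (off >> XEN_PAGE_SHIFT), data);
}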

I'm also not convinced that Julien's patches are any less trivial than
the equivalent code to select the correct operations based on the
kernel's page size without breaking pages up; the developer still needs
to be aware of the 4K vs large distinction if they want to support
larger-than-4K pages.

Developers can still choose to support only 4K pages in their OS. If
they feel this scheme of supporting larger pages is too intrusive or
hard, I doubt they will want to implement a scheme based on an
alternative large-page protocol either.

> Do KVM guests also need such an extensive set of modifications in order to
> run using 64KB pages? I know it's not fair to compare KVM and Xen in
> this regard, because the interfaces are very different, but that's what
> developers are going to look at.

I'm not sure that virtio cares about pages at all, just addresses, which
have no particular alignment or granularity constraints, since it has
neither grant tables nor any requirement for the host to mediate access
to guest memory at any particular granularity.
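
For comparison, a virtio ring descriptor carries just an address and a
byte length, with no frame number and no page-size field. The struct
below is quoted from Linux's include/uapi/linux/virtio_ring.h; only the
comments are added here:

struct vring_desc {
        __virtio64 addr;        /* buffer address, no alignment constraint */
        __virtio32 len;         /* buffer length in bytes */
        __virtio16 flags;       /* VRING_DESC_F_* (next/write/indirect) */
        __virtio16 next;        /* index of the next descriptor in a chain */
};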

Ian.

* Re: [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux
  2015-10-06  9:28                       ` Roger Pau Monné
@ 2015-10-06 13:55                         ` Stefano Stabellini
  -1 siblings, 0 replies; 144+ messages in thread
From: Stefano Stabellini @ 2015-10-06 13:55 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ian Campbell, Stefano Stabellini, Julien Grall, xen-devel,
	wei.liu2, konrad.wilk, linux-kernel, david.vrabel,
	boris.ostrovsky, linux-arm-kernel

On Tue, 6 Oct 2015, Roger Pau Monné wrote:
> Do KVM guests also need such an extensive set of modifications in order to
> run using 64KB pages? I know it's not fair to compare KVM and Xen in
> this regard, because the interfaces are very different, but that's what
> developers are going to look at.

QEMU doesn't need extensive modifications to run 64K guests on a 4K
host, but on the other hand KVM needs its own set of non-trivial
modifications to do it.

end of thread, other threads:[~2015-10-06 13:58 UTC | newest]

Thread overview: 144+ messages
2015-09-07 15:33 [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux Julien Grall
2015-09-07 15:33 ` [PATCH v4 01/20] net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop Julien Grall
2015-09-07 15:33 ` [PATCH v4 02/20] arm/xen: Drop pte_mfn and mfn_pte Julien Grall
2015-09-07 15:33 ` [PATCH v4 03/20] xen: Add Xen specific page definition Julien Grall
2015-09-07 15:33 ` [PATCH v4 04/20] xen/grant: Introduce helpers to split a page into grant Julien Grall
2015-09-07 15:33 ` [PATCH v4 05/20] xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one Julien Grall
2015-09-07 15:33 ` [PATCH v4 06/20] block/xen-blkfront: Split blkif_queue_request in 2 Julien Grall
2015-09-07 15:33 ` [PATCH v4 07/20] block/xen-blkfront: Store a page rather a pfn in the grant structure Julien Grall
2015-09-07 15:33 ` [PATCH v4 08/20] block/xen-blkfront: split get_grant in 2 Julien Grall
2015-09-07 15:33 ` [PATCH v4 09/20] xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB pages Julien Grall
2015-09-07 15:33 ` [PATCH v4 10/20] xen/xenbus: Use Xen page definition Julien Grall
2015-09-07 15:33 ` [PATCH v4 11/20] tty/hvc: xen: Use xen " Julien Grall
2015-09-07 15:33 ` [PATCH v4 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux Julien Grall
2015-09-07 16:39   ` Stefano Stabellini
2015-09-07 15:33 ` [PATCH v4 13/20] xen/events: fifo: Make it running on 64KB granularity Julien Grall
2015-09-07 15:33 ` [PATCH v4 14/20] xen/grant-table: " Julien Grall
2015-09-07 15:33 ` [PATCH v4 15/20] block/xen-blkfront: Make it running on 64KB page granularity Julien Grall
2015-09-07 15:33 ` [PATCH v4 16/20] block/xen-blkback: " Julien Grall
2015-09-07 15:33 ` [PATCH v4 17/20] net/xen-netfront: " Julien Grall
2015-09-07 15:33 ` [PATCH v4 18/20] net/xen-netback: " Julien Grall
2015-09-07 16:57   ` Wei Liu
2015-09-08 11:07     ` Julien Grall
2015-09-08 11:09       ` Wei Liu
2015-09-07 15:33 ` [PATCH v4 19/20] xen/privcmd: Add support for Linux " Julien Grall
2015-09-07 15:33 ` [PATCH v4 20/20] arm/xen: Add support for " Julien Grall
2015-09-11 19:39 ` [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux Julien Grall
2015-09-14  8:56 ` Roger Pau Monné
2015-09-14 10:40   ` Julien Grall
2015-09-14 11:04     ` Roger Pau Monné
2015-09-14 11:21       ` Julien Grall
2015-09-14 12:08         ` Roger Pau Monné
2015-09-14 12:47           ` Julien Grall
2015-09-14 14:29             ` Roger Pau Monné
2015-09-14 14:46               ` Julien Grall
2015-09-14 14:54               ` Stefano Stabellini
2015-09-14 15:23                 ` Roger Pau Monné
2015-09-22 10:59                   ` Ian Campbell
2015-10-06  9:28                     ` Roger Pau Monné
2015-10-06 10:17                       ` Ian Campbell
2015-10-06 13:55                       ` Stefano Stabellini
2015-09-18 14:10           ` Julien Grall
2015-09-14 11:32       ` Arnd Bergmann
2015-09-15 13:14         ` [Xen-devel] " David Vrabel
2015-09-15 13:24           ` Arnd Bergmann
2015-09-29 16:27 ` David Vrabel
2015-09-29 16:33   ` Julien Grall
2015-09-29 16:36     ` David Vrabel
  -- strict thread matches above, loose matches on Subject: below --
2015-09-07 15:33 Julien Grall
