All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [U-Boot] [PATCH 0/6] usb:samsung: Exynos4 SoC USB code improvements
@ 2014-01-31 12:16 Lukasz Majewski
  2014-01-31 12:16 ` [U-Boot] [PATCH 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment Lukasz Majewski
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-01-31 12:16 UTC (permalink / raw
  To: u-boot

This patch series comprises several improvements for Exynos4 USB code.

The most notable is transmission speed improvement (measured on Trats):
From: 9.51 MiB/s up to 27 MiB/s

This is due to UDC driver optimizations.

Also a code cleanup for THOR gadget has been included.

Lukasz Majewski (6):
  usb:gadget:ums: Replace malloc calls with memalign to fix cache
    buffer alignment
  usb:udc:samsung: Remove redundant cache operation from Samsung UDC
    driver
  usb:udc:samsung: Allow burst transfers for non EP0 endpints
  usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC
    driver
  usb:gadget:f_thor: Allocate request up to THOR_PACKET_SIZE not
    ep->maxpacket
  usb:gadget:f_thor: cosmetic: Remove debug memset

 drivers/usb/gadget/f_mass_storage.c       |    4 +-
 drivers/usb/gadget/f_thor.c               |    4 +-
 drivers/usb/gadget/s3c_udc_otg.c          |   19 ++++----
 drivers/usb/gadget/s3c_udc_otg_xfer_dma.c |   74 ++++++++++-------------------
 include/usb/s3c_udc.h                     |    5 +-
 5 files changed, 38 insertions(+), 68 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment
  2014-01-31 12:16 [U-Boot] [PATCH 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
@ 2014-01-31 12:16 ` Lukasz Majewski
  2014-02-01  2:48   ` Marek Vasut
  2014-01-31 12:16 ` [U-Boot] [PATCH 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver Lukasz Majewski
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Lukasz Majewski @ 2014-01-31 12:16 UTC (permalink / raw
  To: u-boot

Calls to malloc() have been replaced by memalign. It provides proper
buffer alignment.

Change-Id: Iffcf42082a125f848124bc84d1a95353493798a4
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>
---
 drivers/usb/gadget/f_mass_storage.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/f_mass_storage.c b/drivers/usb/gadget/f_mass_storage.c
index b1fe8bd..f896169 100644
--- a/drivers/usb/gadget/f_mass_storage.c
+++ b/drivers/usb/gadget/f_mass_storage.c
@@ -2515,7 +2515,7 @@ static struct fsg_common *fsg_common_init(struct fsg_common *common,
 buffhds_first_it:
 		bh->inreq_busy = 0;
 		bh->outreq_busy = 0;
-		bh->buf = kmalloc(FSG_BUFLEN, GFP_KERNEL);
+		bh->buf = memalign(CONFIG_SYS_CACHELINE_SIZE, FSG_BUFLEN);
 		if (unlikely(!bh->buf)) {
 			rc = -ENOMEM;
 			goto error_release;
@@ -2622,7 +2622,7 @@ usb_copy_descriptors(struct usb_descriptor_header **src)
 		bytes += (*tmp)->bLength;
 	bytes += (n_desc + 1) * sizeof(*tmp);
 
-	mem = kmalloc(bytes, GFP_KERNEL);
+	mem = memalign(CONFIG_SYS_CACHELINE_SIZE, bytes);
 	if (!mem)
 		return NULL;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver
  2014-01-31 12:16 [U-Boot] [PATCH 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
  2014-01-31 12:16 ` [U-Boot] [PATCH 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment Lukasz Majewski
@ 2014-01-31 12:16 ` Lukasz Majewski
  2014-02-01  2:50   ` Marek Vasut
  2014-01-31 12:16 ` [U-Boot] [PATCH 3/6] usb:udc:samsung: Allow burst transfers for non EP0 endpints Lukasz Majewski
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Lukasz Majewski @ 2014-01-31 12:16 UTC (permalink / raw
  To: u-boot

A set of cache operations (both invalidation and flush) were redundant
in the S3C HS OTG Samsung driver.

Test condition
- test HW + measurement: Trats - Exynos4210 rev.1
- test HW Trats2 - Exynos4412 rev.1
400 MiB compressed rootfs image download with `thor 0 mmc 0`

Measurements:

Base values (without improvement):
Transmission speed: 9.51 MiB/s

After the change:
Transmission speed: 10.15 MiB/s

Change-Id: I0d55da4de724b14e988fefdb37a012562f7da8a2
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>
---
 drivers/usb/gadget/s3c_udc_otg_xfer_dma.c |   13 -------------
 1 file changed, 13 deletions(-)

diff --git a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
index 1cbf8f6..eaa3a20 100644
--- a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
+++ b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
@@ -29,10 +29,6 @@ static inline void s3c_udc_ep0_zlp(struct s3c_udc *dev)
 {
 	u32 ep_ctrl;
 
-	flush_dcache_range((unsigned long) usb_ctrl_dma_addr,
-			   (unsigned long) usb_ctrl_dma_addr
-			   + DMA_BUFFER_SIZE);
-
 	writel(usb_ctrl_dma_addr, &reg->in_endp[EP0_CON].diepdma);
 	writel(DIEPT_SIZ_PKT_CNT(1), &reg->in_endp[EP0_CON].dieptsiz);
 
@@ -52,10 +48,6 @@ void s3c_udc_pre_setup(void)
 	debug_cond(DEBUG_IN_EP,
 		   "%s : Prepare Setup packets.\n", __func__);
 
-	invalidate_dcache_range((unsigned long) usb_ctrl_dma_addr,
-				(unsigned long) usb_ctrl_dma_addr
-				+ DMA_BUFFER_SIZE);
-
 	writel(DOEPT_SIZ_PKT_CNT(1) | sizeof(struct usb_ctrlrequest),
 	       &reg->out_endp[EP0_CON].doeptsiz);
 	writel(usb_ctrl_dma_addr, &reg->out_endp[EP0_CON].doepdma);
@@ -115,11 +107,6 @@ static int setdma_rx(struct s3c_ep *ep, struct s3c_request *req)
 	ep->len = length;
 	ep->dma_buf = buf;
 
-	invalidate_dcache_range((unsigned long) ep->dev->dma_buf[ep_num],
-				(unsigned long) ep->dev->dma_buf[ep_num]
-				+ ROUND(ep->ep.maxpacket,
-					CONFIG_SYS_CACHELINE_SIZE));
-
 	if (length == 0)
 		pktcnt = 1;
 	else
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 3/6] usb:udc:samsung: Allow burst transfers for non EP0 endpints
  2014-01-31 12:16 [U-Boot] [PATCH 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
  2014-01-31 12:16 ` [U-Boot] [PATCH 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment Lukasz Majewski
  2014-01-31 12:16 ` [U-Boot] [PATCH 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver Lukasz Majewski
@ 2014-01-31 12:16 ` Lukasz Majewski
  2014-01-31 12:16 ` [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver Lukasz Majewski
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-01-31 12:16 UTC (permalink / raw
  To: u-boot

This patch removed obscure restriction on the HW setting of DMA transfers.
Before this change each transaction sent up to 512 bytes (with packet count
equal to 1) for non EP0 transfer.

Now it is possible to setup DMA transaction up to DMA_BUFFER_SIZE.

Test condition
- test HW + measurement: Trats - Exynos4210 rev.1
- test HW Trats2 - Exynos4412 rev.1
400 MiB compressed rootfs image download with `thor 0 mmc 0`

Measurement:
Transmission speed: 20.74 MiB/s

Change-Id: Id7f60b96cc5804038a2b4050fb6ddc96414cab5d
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>
---
 drivers/usb/gadget/s3c_udc_otg_xfer_dma.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
index eaa3a20..274b68c 100644
--- a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
+++ b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
@@ -101,18 +101,17 @@ static int setdma_rx(struct s3c_ep *ep, struct s3c_request *req)
 	u32 ep_num = ep_index(ep);
 
 	buf = req->req.buf + req->req.actual;
-
-	length = min(req->req.length - req->req.actual, (int)ep->ep.maxpacket);
+	length = min(req->req.length - req->req.actual,
+		     ep_num ? DMA_BUFFER_SIZE : ep->ep.maxpacket);
 
 	ep->len = length;
 	ep->dma_buf = buf;
 
-	if (length == 0)
+	if (ep_num == EP0_CON || length == 0)
 		pktcnt = 1;
 	else
 		pktcnt = (length - 1)/(ep->ep.maxpacket) + 1;
 
-	pktcnt = 1;
 	ctrl =  readl(&reg->out_endp[ep_num].doepctl);
 
 	writel(the_controller->dma_addr[ep_index(ep)+1],
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver
  2014-01-31 12:16 [U-Boot] [PATCH 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
                   ` (2 preceding siblings ...)
  2014-01-31 12:16 ` [U-Boot] [PATCH 3/6] usb:udc:samsung: Allow burst transfers for non EP0 endpints Lukasz Majewski
@ 2014-01-31 12:16 ` Lukasz Majewski
  2014-02-01  2:55   ` Marek Vasut
  2014-01-31 12:16 ` [U-Boot] [PATCH 5/6] usb:gadget:f_thor: Allocate request up to THOR_PACKET_SIZE not ep->maxpacket Lukasz Majewski
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Lukasz Majewski @ 2014-01-31 12:16 UTC (permalink / raw
  To: u-boot

The Samsung's UDC driver is not anymore copying data from USB requests to
data aligned internal buffers. Now it works directly in data allocated in
the upper layers like UMS, DFU, THOR.

This change is possible since those gadgets now take care to allocate
buffers aligned to cache line (CONFIG_SYS_CACHELINE_SIZE ).

Previously the UDC needed to copy this data to internal aligned buffer to
prevent from unaligned access exceptions.

Test condition
- test HW + measurement: Trats - Exynos4210 rev.1
- test HW Trats2 - Exynos4412 rev.1
400 MiB compressed rootfs image download with `thor 0 mmc 0`

Measurement:
Transmission speed: 27.04 MiB/s

Change-Id: I1df1fbafc72ec703f0367ddee3fedf3a3f5523ed
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>
---
 drivers/usb/gadget/s3c_udc_otg.c          |   19 +++++-----
 drivers/usb/gadget/s3c_udc_otg_xfer_dma.c |   54 ++++++++++++-----------------
 include/usb/s3c_udc.h                     |    5 +--
 3 files changed, 32 insertions(+), 46 deletions(-)

diff --git a/drivers/usb/gadget/s3c_udc_otg.c b/drivers/usb/gadget/s3c_udc_otg.c
index ba17a04..63d4487 100644
--- a/drivers/usb/gadget/s3c_udc_otg.c
+++ b/drivers/usb/gadget/s3c_udc_otg.c
@@ -843,7 +843,7 @@ static struct s3c_udc memory = {
 int s3c_udc_probe(struct s3c_plat_otg_data *pdata)
 {
 	struct s3c_udc *dev = &memory;
-	int retval = 0, i;
+	int retval = 0;
 
 	debug("%s: %p\n", __func__, pdata);
 
@@ -864,16 +864,15 @@ int s3c_udc_probe(struct s3c_plat_otg_data *pdata)
 
 	the_controller = dev;
 
-	for (i = 0; i < S3C_MAX_ENDPOINTS+1; i++) {
-		dev->dma_buf[i] = memalign(CONFIG_SYS_CACHELINE_SIZE,
-					   DMA_BUFFER_SIZE);
-		dev->dma_addr[i] = (dma_addr_t) dev->dma_buf[i];
-		invalidate_dcache_range((unsigned long) dev->dma_buf[i],
-					(unsigned long) (dev->dma_buf[i]
-							 + DMA_BUFFER_SIZE));
+	usb_ctrl = memalign(CONFIG_SYS_CACHELINE_SIZE,
+			    ROUND(sizeof(struct usb_ctrlrequest),
+				  CONFIG_SYS_CACHELINE_SIZE));
+	if (!usb_ctrl) {
+		error("No memory available for UDC!\n");
+		return -ENOMEM;
 	}
-	usb_ctrl = dev->dma_buf[0];
-	usb_ctrl_dma_addr = dev->dma_addr[0];
+
+	usb_ctrl_dma_addr = (dma_addr_t) usb_ctrl;
 
 	udc_reinit(dev);
 
diff --git a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
index 274b68c..3f9febc 100644
--- a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
+++ b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
@@ -75,8 +75,9 @@ static inline void s3c_ep0_complete_out(void)
 		"%s : Prepare Complete Out packet.\n", __func__);
 
 	invalidate_dcache_range((unsigned long) usb_ctrl_dma_addr,
-				(unsigned long) usb_ctrl_dma_addr
-				+ DMA_BUFFER_SIZE);
+				(unsigned long) usb_ctrl_dma_addr +
+				ROUND(sizeof(struct usb_ctrlrequest),
+				      CONFIG_SYS_CACHELINE_SIZE));
 
 	writel(DOEPT_SIZ_PKT_CNT(1) | sizeof(struct usb_ctrlrequest),
 	       &reg->out_endp[EP0_CON].doeptsiz);
@@ -114,8 +115,7 @@ static int setdma_rx(struct s3c_ep *ep, struct s3c_request *req)
 
 	ctrl =  readl(&reg->out_endp[ep_num].doepctl);
 
-	writel(the_controller->dma_addr[ep_index(ep)+1],
-	       &reg->out_endp[ep_num].doepdma);
+	writel((unsigned int) ep->dma_buf, &reg->out_endp[ep_num].doepdma);
 	writel(DOEPT_SIZ_PKT_CNT(pktcnt) | DOEPT_SIZ_XFER_SIZE(length),
 	       &reg->out_endp[ep_num].doeptsiz);
 	writel(DEPCTL_EPENA|DEPCTL_CNAK|ctrl, &reg->out_endp[ep_num].doepctl);
@@ -138,7 +138,6 @@ int setdma_tx(struct s3c_ep *ep, struct s3c_request *req)
 	u32 *buf, ctrl = 0;
 	u32 length, pktcnt;
 	u32 ep_num = ep_index(ep);
-	u32 *p = the_controller->dma_buf[ep_index(ep)+1];
 
 	buf = req->req.buf + req->req.actual;
 	length = req->req.length - req->req.actual;
@@ -148,10 +147,10 @@ int setdma_tx(struct s3c_ep *ep, struct s3c_request *req)
 
 	ep->len = length;
 	ep->dma_buf = buf;
-	memcpy(p, ep->dma_buf, length);
 
-	flush_dcache_range((unsigned long) p ,
-			   (unsigned long) p + DMA_BUFFER_SIZE);
+	flush_dcache_range((unsigned long) ep->dma_buf,
+			   (unsigned long) ep->dma_buf +
+			   ROUND(ep->len, CONFIG_SYS_CACHELINE_SIZE));
 
 	if (length == 0)
 		pktcnt = 1;
@@ -164,8 +163,7 @@ int setdma_tx(struct s3c_ep *ep, struct s3c_request *req)
 	while (readl(&reg->grstctl) & TX_FIFO_FLUSH)
 		;
 
-	writel(the_controller->dma_addr[ep_index(ep)+1],
-	       &reg->in_endp[ep_num].diepdma);
+	writel((unsigned long) ep->dma_buf, &reg->in_endp[ep_num].diepdma);
 	writel(DIEPT_SIZ_PKT_CNT(pktcnt) | DIEPT_SIZ_XFER_SIZE(length),
 	       &reg->in_endp[ep_num].dieptsiz);
 
@@ -198,7 +196,6 @@ static void complete_rx(struct s3c_udc *dev, u8 ep_num)
 	struct s3c_ep *ep = &dev->ep[ep_num];
 	struct s3c_request *req = NULL;
 	u32 ep_tsr = 0, xfer_size = 0, is_short = 0;
-	u32 *p = the_controller->dma_buf[ep_index(ep)+1];
 
 	if (list_empty(&ep->queue)) {
 		debug_cond(DEBUG_OUT_EP != 0,
@@ -218,10 +215,9 @@ static void complete_rx(struct s3c_udc *dev, u8 ep_num)
 
 	xfer_size = ep->len - xfer_size;
 
-	invalidate_dcache_range((unsigned long) p,
-				(unsigned long) p + DMA_BUFFER_SIZE);
-
-	memcpy(ep->dma_buf, p, ep->len);
+	invalidate_dcache_range((unsigned long) ep->dma_buf,
+				(unsigned long) ep->dma_buf +
+				ROUND(xfer_size, CONFIG_SYS_CACHELINE_SIZE));
 
 	req->req.actual += min(xfer_size, req->req.length - req->req.actual);
 	is_short = (xfer_size < ep->ep.maxpacket);
@@ -715,19 +711,14 @@ static int write_fifo_ep0(struct s3c_ep *ep, struct s3c_request *req)
 
 int s3c_fifo_read(struct s3c_ep *ep, u32 *cp, int max)
 {
-	u32 bytes;
-
-	bytes = sizeof(struct usb_ctrlrequest);
-
-	invalidate_dcache_range((unsigned long) ep->dev->dma_buf[ep_index(ep)],
-				(unsigned long) ep->dev->dma_buf[ep_index(ep)]
-				+ DMA_BUFFER_SIZE);
+	invalidate_dcache_range((unsigned long)cp, (unsigned long)cp +
+				ROUND(max, CONFIG_SYS_CACHELINE_SIZE));
 
 	debug_cond(DEBUG_EP0 != 0,
-		   "%s: bytes=%d, ep_index=%d %p\n", __func__,
-		   bytes, ep_index(ep), ep->dev->dma_buf[ep_index(ep)]);
+		   "%s: bytes=%d, ep_index=%d 0x%p\n", __func__,
+		   max, ep_index(ep), cp);
 
-	return bytes;
+	return max;
 }
 
 /**
@@ -859,14 +850,12 @@ static int s3c_ep0_write(struct s3c_udc *dev)
 	return 1;
 }
 
-u16	g_status;
-
 int s3c_udc_get_status(struct s3c_udc *dev,
 		struct usb_ctrlrequest *crq)
 {
 	u8 ep_num = crq->wIndex & 0x7F;
+	u16 g_status = 0;
 	u32 ep_ctrl;
-	u32 *p = the_controller->dma_buf[1];
 
 	debug_cond(DEBUG_SETUP != 0,
 		   "%s: *** USB_REQ_GET_STATUS\n", __func__);
@@ -904,12 +893,13 @@ int s3c_udc_get_status(struct s3c_udc *dev,
 		return 1;
 	}
 
-	memcpy(p, &g_status, sizeof(g_status));
+	memcpy(usb_ctrl, &g_status, sizeof(g_status));
 
-	flush_dcache_range((unsigned long) p,
-			   (unsigned long) p + DMA_BUFFER_SIZE);
+	flush_dcache_range((unsigned long) usb_ctrl,
+			   (unsigned long) usb_ctrl +
+			   ROUND(sizeof(g_status), CONFIG_SYS_CACHELINE_SIZE));
 
-	writel(the_controller->dma_addr[1], &reg->in_endp[EP0_CON].diepdma);
+	writel(usb_ctrl_dma_addr, &reg->in_endp[EP0_CON].diepdma);
 	writel(DIEPT_SIZ_PKT_CNT(1) | DIEPT_SIZ_XFER_SIZE(2),
 	       &reg->in_endp[EP0_CON].dieptsiz);
 
diff --git a/include/usb/s3c_udc.h b/include/usb/s3c_udc.h
index 734c6cd..6dead2f 100644
--- a/include/usb/s3c_udc.h
+++ b/include/usb/s3c_udc.h
@@ -19,7 +19,7 @@
 
 /*-------------------------------------------------------------------------*/
 /* DMA bounce buffer size, 16K is enough even for mass storage */
-#define DMA_BUFFER_SIZE	(4096*4)
+#define DMA_BUFFER_SIZE	(16*SZ_1K)
 
 #define EP0_FIFO_SIZE		64
 #define EP_FIFO_SIZE		512
@@ -81,9 +81,6 @@ struct s3c_udc {
 
 	struct s3c_plat_otg_data *pdata;
 
-	void *dma_buf[S3C_MAX_ENDPOINTS+1];
-	dma_addr_t dma_addr[S3C_MAX_ENDPOINTS+1];
-
 	int ep0state;
 	struct s3c_ep ep[S3C_MAX_ENDPOINTS];
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 5/6] usb:gadget:f_thor: Allocate request up to THOR_PACKET_SIZE not ep->maxpacket
  2014-01-31 12:16 [U-Boot] [PATCH 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
                   ` (3 preceding siblings ...)
  2014-01-31 12:16 ` [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver Lukasz Majewski
@ 2014-01-31 12:16 ` Lukasz Majewski
  2014-01-31 12:16 ` [U-Boot] [PATCH 6/6] usb:gadget:f_thor: cosmetic: Remove debug memset Lukasz Majewski
  2014-02-05  9:10 ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
  6 siblings, 0 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-01-31 12:16 UTC (permalink / raw
  To: u-boot

Now it is possible to allocate static request - which receives data from
the host (OUT transaction) to the size of THOR packet.

Change-Id: I7cb19a0f22d7c7fe8d7e62ce71f7d1bb4e95c5bb
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>
---
 drivers/usb/gadget/f_thor.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/f_thor.c b/drivers/usb/gadget/f_thor.c
index c4c9909..780729a 100644
--- a/drivers/usb/gadget/f_thor.c
+++ b/drivers/usb/gadget/f_thor.c
@@ -614,7 +614,7 @@ static struct usb_request *thor_start_ep(struct usb_ep *ep)
 {
 	struct usb_request *req;
 
-	req = alloc_ep_req(ep, ep->maxpacket);
+	req = alloc_ep_req(ep, THOR_PACKET_SIZE);
 	debug("%s: ep:%p req:%p\n", __func__, ep, req);
 
 	if (!req)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/6] usb:gadget:f_thor: cosmetic: Remove debug memset
  2014-01-31 12:16 [U-Boot] [PATCH 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
                   ` (4 preceding siblings ...)
  2014-01-31 12:16 ` [U-Boot] [PATCH 5/6] usb:gadget:f_thor: Allocate request up to THOR_PACKET_SIZE not ep->maxpacket Lukasz Majewski
@ 2014-01-31 12:16 ` Lukasz Majewski
  2014-02-05  9:10 ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
  6 siblings, 0 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-01-31 12:16 UTC (permalink / raw
  To: u-boot

Apparently debug memset (with a 0x55 value) has been overlooked in the
f_thor code.

Change-Id: I5ecfb1f86d7a0c1abf6738d0bc60908957554410
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>
---
 drivers/usb/gadget/f_thor.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/usb/gadget/f_thor.c b/drivers/usb/gadget/f_thor.c
index 780729a..f5c0224 100644
--- a/drivers/usb/gadget/f_thor.c
+++ b/drivers/usb/gadget/f_thor.c
@@ -623,8 +623,6 @@ static struct usb_request *thor_start_ep(struct usb_ep *ep)
 	memset(req->buf, 0, req->length);
 	req->complete = thor_rx_tx_complete;
 
-	memset(req->buf, 0x55, req->length);
-
 	return req;
 }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment
  2014-01-31 12:16 ` [U-Boot] [PATCH 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment Lukasz Majewski
@ 2014-02-01  2:48   ` Marek Vasut
  2014-02-01  9:10     ` Lukasz Majewski
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2014-02-01  2:48 UTC (permalink / raw
  To: u-boot

On Friday, January 31, 2014 at 01:16:24 PM, Lukasz Majewski wrote:
> Calls to malloc() have been replaced by memalign. It provides proper
> buffer alignment.
> 
> Change-Id: Iffcf42082a125f848124bc84d1a95353493798a4

^ Where does this Change-Id come from ?

> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> Cc: Marek Vasut <marex@denx.de>
> ---
>  drivers/usb/gadget/f_mass_storage.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/gadget/f_mass_storage.c
> b/drivers/usb/gadget/f_mass_storage.c index b1fe8bd..f896169 100644
> --- a/drivers/usb/gadget/f_mass_storage.c
> +++ b/drivers/usb/gadget/f_mass_storage.c
> @@ -2515,7 +2515,7 @@ static struct fsg_common *fsg_common_init(struct
> fsg_common *common, buffhds_first_it:
>  		bh->inreq_busy = 0;
>  		bh->outreq_busy = 0;
> -		bh->buf = kmalloc(FSG_BUFLEN, GFP_KERNEL);
> +		bh->buf = memalign(CONFIG_SYS_CACHELINE_SIZE, FSG_BUFLEN);
>  		if (unlikely(!bh->buf)) {
>  			rc = -ENOMEM;
>  			goto error_release;
> @@ -2622,7 +2622,7 @@ usb_copy_descriptors(struct usb_descriptor_header
> **src) bytes += (*tmp)->bLength;
>  	bytes += (n_desc + 1) * sizeof(*tmp);
> 
> -	mem = kmalloc(bytes, GFP_KERNEL);
> +	mem = memalign(CONFIG_SYS_CACHELINE_SIZE, bytes);
>  	if (!mem)
>  		return NULL;

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver
  2014-01-31 12:16 ` [U-Boot] [PATCH 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver Lukasz Majewski
@ 2014-02-01  2:50   ` Marek Vasut
  2014-02-01  9:56     ` Lukasz Majewski
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2014-02-01  2:50 UTC (permalink / raw
  To: u-boot

On Friday, January 31, 2014 at 01:16:25 PM, Lukasz Majewski wrote:
> A set of cache operations (both invalidation and flush) were redundant
> in the S3C HS OTG Samsung driver.
> 
> Test condition
> - test HW + measurement: Trats - Exynos4210 rev.1
> - test HW Trats2 - Exynos4412 rev.1
> 400 MiB compressed rootfs image download with `thor 0 mmc 0`

The commit message is missing a proper explanation _WHY_ were they redundant. I 
do not understand why they were redundant ... and no, the test you performed 
does not justify removal of cache management calls.

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver
  2014-01-31 12:16 ` [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver Lukasz Majewski
@ 2014-02-01  2:55   ` Marek Vasut
  2014-02-01 11:05     ` Lukasz Majewski
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2014-02-01  2:55 UTC (permalink / raw
  To: u-boot

On Friday, January 31, 2014 at 01:16:27 PM, Lukasz Majewski wrote:
> The Samsung's UDC driver is not anymore copying data from USB requests to
> data aligned internal buffers. Now it works directly in data allocated in
> the upper layers like UMS, DFU, THOR.
> 
> This change is possible since those gadgets now take care to allocate
> buffers aligned to cache line (CONFIG_SYS_CACHELINE_SIZE ).
> 
> Previously the UDC needed to copy this data to internal aligned buffer to
> prevent from unaligned access exceptions.
> 
> Test condition
> - test HW + measurement: Trats - Exynos4210 rev.1
> - test HW Trats2 - Exynos4412 rev.1
> 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> 
> Measurement:
> Transmission speed: 27.04 MiB/s
> 
> Change-Id: I1df1fbafc72ec703f0367ddee3fedf3a3f5523ed
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> Cc: Marek Vasut <marex@denx.de>

You should use ROUND_UP(), not ROUND() throughout the patch. Otherwise you might 
fail to flush/invalidate the last little bit of data in some cacheline.

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment
  2014-02-01  2:48   ` Marek Vasut
@ 2014-02-01  9:10     ` Lukasz Majewski
  0 siblings, 0 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-01  9:10 UTC (permalink / raw
  To: u-boot

On Sat, 1 Feb 2014 03:48:57 +0100
Marek Vasut <marex@denx.de> wrote:
Hi Marek,

> On Friday, January 31, 2014 at 01:16:24 PM, Lukasz Majewski wrote:
> > Calls to malloc() have been replaced by memalign. It provides proper
> > buffer alignment.
> > 
> > Change-Id: Iffcf42082a125f848124bc84d1a95353493798a4
> 
> ^ Where does this Change-Id come from ?

This comes from our internal Gerrit. By my mistake I've forgotten to
remove it from a commit.

I will remove it from the commit in the next patch version.

> 
> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > Cc: Marek Vasut <marex@denx.de>
> > ---
> >  drivers/usb/gadget/f_mass_storage.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/usb/gadget/f_mass_storage.c
> > b/drivers/usb/gadget/f_mass_storage.c index b1fe8bd..f896169 100644
> > --- a/drivers/usb/gadget/f_mass_storage.c
> > +++ b/drivers/usb/gadget/f_mass_storage.c
> > @@ -2515,7 +2515,7 @@ static struct fsg_common
> > *fsg_common_init(struct fsg_common *common, buffhds_first_it:
> >  		bh->inreq_busy = 0;
> >  		bh->outreq_busy = 0;
> > -		bh->buf = kmalloc(FSG_BUFLEN, GFP_KERNEL);
> > +		bh->buf = memalign(CONFIG_SYS_CACHELINE_SIZE,
> > FSG_BUFLEN); if (unlikely(!bh->buf)) {
> >  			rc = -ENOMEM;
> >  			goto error_release;
> > @@ -2622,7 +2622,7 @@ usb_copy_descriptors(struct
> > usb_descriptor_header **src) bytes += (*tmp)->bLength;
> >  	bytes += (n_desc + 1) * sizeof(*tmp);
> > 
> > -	mem = kmalloc(bytes, GFP_KERNEL);
> > +	mem = memalign(CONFIG_SYS_CACHELINE_SIZE, bytes);
> >  	if (!mem)
> >  		return NULL;
> 
> Best regards,
> Marek Vasut
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot

Best regards,
Lukasz Majewski
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140201/0106c645/attachment.pgp>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver
  2014-02-01  2:50   ` Marek Vasut
@ 2014-02-01  9:56     ` Lukasz Majewski
  2014-02-01 22:49       ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-01  9:56 UTC (permalink / raw
  To: u-boot

On Sat, 1 Feb 2014 03:50:26 +0100
Marek Vasut <marex@denx.de> wrote:

> On Friday, January 31, 2014 at 01:16:25 PM, Lukasz Majewski wrote:
> > A set of cache operations (both invalidation and flush) were
> > redundant in the S3C HS OTG Samsung driver.
> > 
> > Test condition
> > - test HW + measurement: Trats - Exynos4210 rev.1
> > - test HW Trats2 - Exynos4412 rev.1
> > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> 
> The commit message is missing a proper explanation _WHY_ were they
> redundant. I do not understand why they were redundant ... and no,
> the test you performed does not justify removal of cache management
> calls.
> 

The s3c UDC driver is in u-boot since 2011. It has been added when
at Samsung boards (s5p_goni) we didn't have cache enabled.

Then there was a transition, after which L1 was enabled. Since UDC is
co-working with gadgets on the beginning it was easier to perform the
cache management inside the UDC driver.

That is why we had to copy the buffers (since e.g. device descriptors
tends to be unaligned) - which also degraded performance.

Now, all gadget code seems to be memalign'ed and ready for direct buffer
passing (despite the two overlooked kmallocs in the mass storage
gadget - which I fix in this patch set).

To sum up:

1. s3c_udc_ep0_zlp - EP0 ZLP packets don't need to invalidate the cache
(since it is zero length transmission)

2. s3c_udc_pre_setup - cache invalidation is not needed when I setup
buffer for OUT EP0 transmission.

The above two invalidation calls had been added by me, and are mine
mistakes. Those don't contribute to transmission speed up (and shall be
regarded as a cosmetic changes)

3. setdma_rx - here I invalidate parts of the s3c UDC driver's internal
buffer. This call is not needed anymore since we reuse the buffers
passed from gadgets.

This is the key speed improvement here.

Best regards,
Lukasz Majewski

> Best regards,
> Marek Vasut
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140201/57f33374/attachment.pgp>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver
  2014-02-01  2:55   ` Marek Vasut
@ 2014-02-01 11:05     ` Lukasz Majewski
  2014-02-01 22:55       ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-01 11:05 UTC (permalink / raw
  To: u-boot

On Sat, 1 Feb 2014 03:55:20 +0100
Marek Vasut <marex@denx.de> wrote:

> On Friday, January 31, 2014 at 01:16:27 PM, Lukasz Majewski wrote:
> > The Samsung's UDC driver is not anymore copying data from USB
> > requests to data aligned internal buffers. Now it works directly in
> > data allocated in the upper layers like UMS, DFU, THOR.
> > 
> > This change is possible since those gadgets now take care to
> > allocate buffers aligned to cache line (CONFIG_SYS_CACHELINE_SIZE ).
> > 
> > Previously the UDC needed to copy this data to internal aligned
> > buffer to prevent from unaligned access exceptions.
> > 
> > Test condition
> > - test HW + measurement: Trats - Exynos4210 rev.1
> > - test HW Trats2 - Exynos4412 rev.1
> > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> > 
> > Measurement:
> > Transmission speed: 27.04 MiB/s
> > 
> > Change-Id: I1df1fbafc72ec703f0367ddee3fedf3a3f5523ed
> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > Cc: Marek Vasut <marex@denx.de>
> 
> You should use ROUND_UP(), not ROUND() throughout the patch.
> Otherwise you might fail to flush/invalidate the last little bit of
> data in some cacheline.

I might overlooked something, so please correct me if needed.

I allocate buffers in gadgets which are aligned to cache line with
starting address and its size is a multiplication of cache line size
(so I will not trash data allocated next to it when I invalidate cache).

In the code I'm using ROUND to invalidate/flush more data than needed
(ROUND(176, 32) = 192). I'm prepared for this since buffer in gadget is
properly allocated (with DEFINE_CACHE_ALIGN_BUFFER() which uses
roundup() internally).

Best regards,
Lukasz Majewski

> 
> Best regards,
> Marek Vasut
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140201/b6505cbd/attachment.pgp>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver
  2014-02-01  9:56     ` Lukasz Majewski
@ 2014-02-01 22:49       ` Marek Vasut
  2014-02-03  8:05         ` Lukasz Majewski
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2014-02-01 22:49 UTC (permalink / raw
  To: u-boot

On Saturday, February 01, 2014 at 10:56:27 AM, Lukasz Majewski wrote:
> On Sat, 1 Feb 2014 03:50:26 +0100
> 
> Marek Vasut <marex@denx.de> wrote:
> > On Friday, January 31, 2014 at 01:16:25 PM, Lukasz Majewski wrote:
> > > A set of cache operations (both invalidation and flush) were
> > > redundant in the S3C HS OTG Samsung driver.
> > > 
> > > Test condition
> > > - test HW + measurement: Trats - Exynos4210 rev.1
> > > - test HW Trats2 - Exynos4412 rev.1
> > > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> > 
> > The commit message is missing a proper explanation _WHY_ were they
> > redundant. I do not understand why they were redundant ... and no,
> > the test you performed does not justify removal of cache management
> > calls.
> 
> The s3c UDC driver is in u-boot since 2011. It has been added when
> at Samsung boards (s5p_goni) we didn't have cache enabled.
> 
> Then there was a transition, after which L1 was enabled. Since UDC is
> co-working with gadgets on the beginning it was easier to perform the
> cache management inside the UDC driver.
> 
> That is why we had to copy the buffers (since e.g. device descriptors
> tends to be unaligned) - which also degraded performance.
> 
> Now, all gadget code seems to be memalign'ed and ready for direct buffer
> passing (despite the two overlooked kmallocs in the mass storage
> gadget - which I fix in this patch set).
> 
> To sum up:
> 
> 1. s3c_udc_ep0_zlp - EP0 ZLP packets don't need to invalidate the cache
> (since it is zero length transmission)
> 
> 2. s3c_udc_pre_setup - cache invalidation is not needed when I setup
> buffer for OUT EP0 transmission.
> 
> The above two invalidation calls had been added by me, and are mine
> mistakes. Those don't contribute to transmission speed up (and shall be
> regarded as a cosmetic changes)
> 
> 3. setdma_rx - here I invalidate parts of the s3c UDC driver's internal
> buffer. This call is not needed anymore since we reuse the buffers
> passed from gadgets.

And you do correct cache management on those in the UDC driver or in the gadget 
driver ?
 
> This is the key speed improvement here.

This should be in the commit message really ;-)

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver
  2014-02-01 11:05     ` Lukasz Majewski
@ 2014-02-01 22:55       ` Marek Vasut
  2014-02-03 11:06         ` Lukasz Majewski
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2014-02-01 22:55 UTC (permalink / raw
  To: u-boot

On Saturday, February 01, 2014 at 12:05:29 PM, Lukasz Majewski wrote:
> On Sat, 1 Feb 2014 03:55:20 +0100
> 
> Marek Vasut <marex@denx.de> wrote:
> > On Friday, January 31, 2014 at 01:16:27 PM, Lukasz Majewski wrote:
> > > The Samsung's UDC driver is not anymore copying data from USB
> > > requests to data aligned internal buffers. Now it works directly in
> > > data allocated in the upper layers like UMS, DFU, THOR.
> > > 
> > > This change is possible since those gadgets now take care to
> > > allocate buffers aligned to cache line (CONFIG_SYS_CACHELINE_SIZE ).
> > > 
> > > Previously the UDC needed to copy this data to internal aligned
> > > buffer to prevent from unaligned access exceptions.
> > > 
> > > Test condition
> > > - test HW + measurement: Trats - Exynos4210 rev.1
> > > - test HW Trats2 - Exynos4412 rev.1
> > > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> > > 
> > > Measurement:
> > > Transmission speed: 27.04 MiB/s
> > > 
> > > Change-Id: I1df1fbafc72ec703f0367ddee3fedf3a3f5523ed
> > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > Cc: Marek Vasut <marex@denx.de>
> > 
> > You should use ROUND_UP(), not ROUND() throughout the patch.
> > Otherwise you might fail to flush/invalidate the last little bit of
> > data in some cacheline.
> 
> I might overlooked something, so please correct me if needed.
> 
> I allocate buffers in gadgets which are aligned to cache line with
> starting address and its size is a multiplication of cache line size
> (so I will not trash data allocated next to it when I invalidate cache).
> 
> In the code I'm using ROUND to invalidate/flush more data than needed
> (ROUND(176, 32) = 192). I'm prepared for this since buffer in gadget is
> properly allocated (with DEFINE_CACHE_ALIGN_BUFFER() which uses
> roundup() internally).

The problem is in case you receive buffer which is aligned to cacheline with 
it's start, but is [(k * cacheline_size) + (cacheline_size / 2) - 1] big. I 
think it's unlikely, but if this happens, you will get corruption, right ? You 
might actually want to check for this condition and throw a warning in such a 
case.

I understand your argument with trying to not trash data, but then you will get 
a corruption during transfer, right ?

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver
  2014-02-01 22:49       ` Marek Vasut
@ 2014-02-03  8:05         ` Lukasz Majewski
  2014-02-03 18:06           ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-03  8:05 UTC (permalink / raw
  To: u-boot

Hi Marek,

> On Saturday, February 01, 2014 at 10:56:27 AM, Lukasz Majewski wrote:
> > On Sat, 1 Feb 2014 03:50:26 +0100
> > 
> > Marek Vasut <marex@denx.de> wrote:
> > > On Friday, January 31, 2014 at 01:16:25 PM, Lukasz Majewski wrote:
> > > > A set of cache operations (both invalidation and flush) were
> > > > redundant in the S3C HS OTG Samsung driver.
> > > > 
> > > > Test condition
> > > > - test HW + measurement: Trats - Exynos4210 rev.1
> > > > - test HW Trats2 - Exynos4412 rev.1
> > > > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> > > 
> > > The commit message is missing a proper explanation _WHY_ were they
> > > redundant. I do not understand why they were redundant ... and no,
> > > the test you performed does not justify removal of cache
> > > management calls.
> > 
> > The s3c UDC driver is in u-boot since 2011. It has been added when
> > at Samsung boards (s5p_goni) we didn't have cache enabled.
> > 
> > Then there was a transition, after which L1 was enabled. Since UDC
> > is co-working with gadgets on the beginning it was easier to
> > perform the cache management inside the UDC driver.
> > 
> > That is why we had to copy the buffers (since e.g. device
> > descriptors tends to be unaligned) - which also degraded
> > performance.
> > 
> > Now, all gadget code seems to be memalign'ed and ready for direct
> > buffer passing (despite the two overlooked kmallocs in the mass
> > storage gadget - which I fix in this patch set).
> > 
> > To sum up:
> > 
> > 1. s3c_udc_ep0_zlp - EP0 ZLP packets don't need to invalidate the
> > cache (since it is zero length transmission)
> > 
> > 2. s3c_udc_pre_setup - cache invalidation is not needed when I setup
> > buffer for OUT EP0 transmission.
> > 
> > The above two invalidation calls had been added by me, and are mine
> > mistakes. Those don't contribute to transmission speed up (and
> > shall be regarded as a cosmetic changes)
> > 
> > 3. setdma_rx - here I invalidate parts of the s3c UDC driver's
> > internal buffer. This call is not needed anymore since we reuse the
> > buffers passed from gadgets.
> 
> And you do correct cache management on those in the UDC driver or in
> the gadget driver ?

For download, buffers are allocated in gadgets. Then buffer is passed
to the UDC driver in a USB request. 
After receiving data via USB the UDC driver takes care to invalidate
cache, hence the gadget can work on the data.

Cache management is performed in the UDC driver.

>  
> > This is the key speed improvement here.
> 
> This should be in the commit message really ;-)

I wrongly assumed, that code explains what was the rationale :-). I'm
going to prepare more verbose commit message for v2.

> 
> Best regards,
> Marek Vasut



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver
  2014-02-01 22:55       ` Marek Vasut
@ 2014-02-03 11:06         ` Lukasz Majewski
  2014-02-03 18:11           ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-03 11:06 UTC (permalink / raw
  To: u-boot

Hi Marek,

> On Saturday, February 01, 2014 at 12:05:29 PM, Lukasz Majewski wrote:
> > On Sat, 1 Feb 2014 03:55:20 +0100
> > 
> > Marek Vasut <marex@denx.de> wrote:
> > > On Friday, January 31, 2014 at 01:16:27 PM, Lukasz Majewski wrote:
> > > > The Samsung's UDC driver is not anymore copying data from USB
> > > > requests to data aligned internal buffers. Now it works
> > > > directly in data allocated in the upper layers like UMS, DFU,
> > > > THOR.
> > > > 
> > > > This change is possible since those gadgets now take care to
> > > > allocate buffers aligned to cache line
> > > > (CONFIG_SYS_CACHELINE_SIZE ).
> > > > 
> > > > Previously the UDC needed to copy this data to internal aligned
> > > > buffer to prevent from unaligned access exceptions.
> > > > 
> > > > Test condition
> > > > - test HW + measurement: Trats - Exynos4210 rev.1
> > > > - test HW Trats2 - Exynos4412 rev.1
> > > > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> > > > 
> > > > Measurement:
> > > > Transmission speed: 27.04 MiB/s
> > > > 
> > > > Change-Id: I1df1fbafc72ec703f0367ddee3fedf3a3f5523ed
> > > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > > Cc: Marek Vasut <marex@denx.de>
> > > 
> > > You should use ROUND_UP(), not ROUND() throughout the patch.
> > > Otherwise you might fail to flush/invalidate the last little bit
> > > of data in some cacheline.
> > 
> > I might overlooked something, so please correct me if needed.
> > 
> > I allocate buffers in gadgets which are aligned to cache line with
> > starting address and its size is a multiplication of cache line size
> > (so I will not trash data allocated next to it when I invalidate
> > cache).
> > 
> > In the code I'm using ROUND to invalidate/flush more data than
> > needed (ROUND(176, 32) = 192). I'm prepared for this since buffer
> > in gadget is properly allocated (with DEFINE_CACHE_ALIGN_BUFFER()
> > which uses roundup() internally).
> 
> The problem is in case you receive buffer which is aligned to
> cacheline with it's start, but is [(k * cacheline_size) +
> (cacheline_size / 2) - 1] big. I think it's unlikely, but if this
> happens, you will get corruption, right ? 

Let's suppose, that I will receive 2063B = [(64 * 32) + 16 -1] from UDC.
If the passed buffer was exactly 2063 B in size, then we would have
here a data corruption.

However this situation will not happen since the buffer at gadget is
allocated with DEFINE_CACHE_ALIGN_BUFFER() or is an aligned
multiplication of cache line size (like 1MiB).

I think, that it is the responsibility of gadget developer to allocate
buffers with proper alignment and size.

> You might actually want to
> check for this condition and throw a warning in such a case.

The check is already implemented at ./arch/arm/cpu/armv7/cache_v7.c.

It complains with "ERROR" message when start or end address is not
aligned (that is how I've discovered the unaligned buffers at UMS).

> 
> I understand your argument with trying to not trash data, but then
> you will get a corruption during transfer, right ?

After applying those patches, the cache management would be performed
when the USB request is completed (in the UDC).

The only requirement for UDC is the correctly allocated buffer at
gadget.

> 
> Best regards,
> Marek Vasut



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver
  2014-02-03  8:05         ` Lukasz Majewski
@ 2014-02-03 18:06           ` Marek Vasut
  2014-02-04  6:23             ` Lukasz Majewski
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2014-02-03 18:06 UTC (permalink / raw
  To: u-boot

On Monday, February 03, 2014 at 09:05:06 AM, Lukasz Majewski wrote:

[...]

> > > To sum up:
> > > 
> > > 1. s3c_udc_ep0_zlp - EP0 ZLP packets don't need to invalidate the
> > > cache (since it is zero length transmission)
> > > 
> > > 2. s3c_udc_pre_setup - cache invalidation is not needed when I setup
> > > buffer for OUT EP0 transmission.
> > > 
> > > The above two invalidation calls had been added by me, and are mine
> > > mistakes. Those don't contribute to transmission speed up (and
> > > shall be regarded as a cosmetic changes)
> > > 
> > > 3. setdma_rx - here I invalidate parts of the s3c UDC driver's
> > > internal buffer. This call is not needed anymore since we reuse the
> > > buffers passed from gadgets.
> > 
> > And you do correct cache management on those in the UDC driver or in
> > the gadget driver ?
> 
> For download, buffers are allocated in gadgets. Then buffer is passed
> to the UDC driver in a USB request.
> After receiving data via USB the UDC driver takes care to invalidate
> cache, hence the gadget can work on the data.
> 
> Cache management is performed in the UDC driver.

OK, this is the correct place. I just wanted to make sure about this. Thanks :)

> > > This is the key speed improvement here.
> > 
> > This should be in the commit message really ;-)
> 
> I wrongly assumed, that code explains what was the rationale :-). I'm
> going to prepare more verbose commit message for v2.

Please do, thanks!

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver
  2014-02-03 11:06         ` Lukasz Majewski
@ 2014-02-03 18:11           ` Marek Vasut
  2014-02-04  7:29             ` Lukasz Majewski
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2014-02-03 18:11 UTC (permalink / raw
  To: u-boot

On Monday, February 03, 2014 at 12:06:59 PM, Lukasz Majewski wrote:
> Hi Marek,
> 
> > On Saturday, February 01, 2014 at 12:05:29 PM, Lukasz Majewski wrote:
> > > On Sat, 1 Feb 2014 03:55:20 +0100
> > > 
> > > Marek Vasut <marex@denx.de> wrote:
> > > > On Friday, January 31, 2014 at 01:16:27 PM, Lukasz Majewski wrote:
> > > > > The Samsung's UDC driver is not anymore copying data from USB
> > > > > requests to data aligned internal buffers. Now it works
> > > > > directly in data allocated in the upper layers like UMS, DFU,
> > > > > THOR.
> > > > > 
> > > > > This change is possible since those gadgets now take care to
> > > > > allocate buffers aligned to cache line
> > > > > (CONFIG_SYS_CACHELINE_SIZE ).
> > > > > 
> > > > > Previously the UDC needed to copy this data to internal aligned
> > > > > buffer to prevent from unaligned access exceptions.
> > > > > 
> > > > > Test condition
> > > > > - test HW + measurement: Trats - Exynos4210 rev.1
> > > > > - test HW Trats2 - Exynos4412 rev.1
> > > > > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> > > > > 
> > > > > Measurement:
> > > > > Transmission speed: 27.04 MiB/s
> > > > > 
> > > > > Change-Id: I1df1fbafc72ec703f0367ddee3fedf3a3f5523ed
> > > > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > > > Cc: Marek Vasut <marex@denx.de>
> > > > 
> > > > You should use ROUND_UP(), not ROUND() throughout the patch.
> > > > Otherwise you might fail to flush/invalidate the last little bit
> > > > of data in some cacheline.
> > > 
> > > I might overlooked something, so please correct me if needed.
> > > 
> > > I allocate buffers in gadgets which are aligned to cache line with
> > > starting address and its size is a multiplication of cache line size
> > > (so I will not trash data allocated next to it when I invalidate
> > > cache).
> > > 
> > > In the code I'm using ROUND to invalidate/flush more data than
> > > needed (ROUND(176, 32) = 192). I'm prepared for this since buffer
> > > in gadget is properly allocated (with DEFINE_CACHE_ALIGN_BUFFER()
> > > which uses roundup() internally).
> > 
> > The problem is in case you receive buffer which is aligned to
> > cacheline with it's start, but is [(k * cacheline_size) +
> > (cacheline_size / 2) - 1] big. I think it's unlikely, but if this
> > happens, you will get corruption, right ?
> 
> Let's suppose, that I will receive 2063B = [(64 * 32) + 16 -1] from UDC.
> If the passed buffer was exactly 2063 B in size, then we would have
> here a data corruption.
> 
> However this situation will not happen

_Should_ not happen ... I am absolutelly positive someone will be bitten by such 
assumption. I think this assumption about buffer alignment should really be 
documented somewhere.

> since the buffer at gadget is
> allocated with DEFINE_CACHE_ALIGN_BUFFER() or is an aligned
> multiplication of cache line size (like 1MiB).
> 
> I think, that it is the responsibility of gadget developer to allocate
> buffers with proper alignment and size.

Document that please, I doubt this is documented anywhere, but it's clearly part 
of the API. Also, some checks might be put in place for the alignment , they 
might be in #ifdef DEBUG for all I care, but it would be nice to have such a 
check, since I'm worried someone will really be bitten.

> > You might actually want to
> > check for this condition and throw a warning in such a case.
> 
> The check is already implemented at ./arch/arm/cpu/armv7/cache_v7.c.

Yeah, for arm926ejs core as well. Maybe that check shall be shifted into the 
cache management routine prototypes somehow ... not all CPUs implement that 
check :-(

> It complains with "ERROR" message when start or end address is not
> aligned (that is how I've discovered the unaligned buffers at UMS).

Yes.

> > I understand your argument with trying to not trash data, but then
> > you will get a corruption during transfer, right ?
> 
> After applying those patches, the cache management would be performed
> when the USB request is completed (in the UDC).
> 
> The only requirement for UDC is the correctly allocated buffer at
> gadget.

Got it.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver
  2014-02-03 18:06           ` Marek Vasut
@ 2014-02-04  6:23             ` Lukasz Majewski
  0 siblings, 0 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-04  6:23 UTC (permalink / raw
  To: u-boot

Hi Marek,

> On Monday, February 03, 2014 at 09:05:06 AM, Lukasz Majewski wrote:
> 
> [...]
> 
> > > > To sum up:
> > > > 
> > > > 1. s3c_udc_ep0_zlp - EP0 ZLP packets don't need to invalidate
> > > > the cache (since it is zero length transmission)
> > > > 
> > > > 2. s3c_udc_pre_setup - cache invalidation is not needed when I
> > > > setup buffer for OUT EP0 transmission.
> > > > 
> > > > The above two invalidation calls had been added by me, and are
> > > > mine mistakes. Those don't contribute to transmission speed up
> > > > (and shall be regarded as a cosmetic changes)
> > > > 
> > > > 3. setdma_rx - here I invalidate parts of the s3c UDC driver's
> > > > internal buffer. This call is not needed anymore since we reuse
> > > > the buffers passed from gadgets.
> > > 
> > > And you do correct cache management on those in the UDC driver or
> > > in the gadget driver ?
> > 
> > For download, buffers are allocated in gadgets. Then buffer is
> > passed to the UDC driver in a USB request.
> > After receiving data via USB the UDC driver takes care to invalidate
> > cache, hence the gadget can work on the data.
> > 
> > Cache management is performed in the UDC driver.
> 
> OK, this is the correct place. I just wanted to make sure about this.
> Thanks :)

No problem :-)

> 
> > > > This is the key speed improvement here.
> > > 
> > > This should be in the commit message really ;-)
> > 
> > I wrongly assumed, that code explains what was the rationale :-).
> > I'm going to prepare more verbose commit message for v2.
> 
> Please do, thanks!
> 
> Best regards,
> Marek Vasut



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver
  2014-02-03 18:11           ` Marek Vasut
@ 2014-02-04  7:29             ` Lukasz Majewski
  2014-02-04 20:21               ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-04  7:29 UTC (permalink / raw
  To: u-boot

Hi Marek,

> On Monday, February 03, 2014 at 12:06:59 PM, Lukasz Majewski wrote:
> > Hi Marek,
> > 
> > > On Saturday, February 01, 2014 at 12:05:29 PM, Lukasz Majewski
> > > wrote:
> > > > On Sat, 1 Feb 2014 03:55:20 +0100
> > > > 
> > > > Marek Vasut <marex@denx.de> wrote:
> > > > > On Friday, January 31, 2014 at 01:16:27 PM, Lukasz Majewski
> > > > > wrote:
> > > > > > The Samsung's UDC driver is not anymore copying data from
> > > > > > USB requests to data aligned internal buffers. Now it works
> > > > > > directly in data allocated in the upper layers like UMS,
> > > > > > DFU, THOR.
> > > > > > 
> > > > > > This change is possible since those gadgets now take care to
> > > > > > allocate buffers aligned to cache line
> > > > > > (CONFIG_SYS_CACHELINE_SIZE ).
> > > > > > 
> > > > > > Previously the UDC needed to copy this data to internal
> > > > > > aligned buffer to prevent from unaligned access exceptions.
> > > > > > 
> > > > > > Test condition
> > > > > > - test HW + measurement: Trats - Exynos4210 rev.1
> > > > > > - test HW Trats2 - Exynos4412 rev.1
> > > > > > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> > > > > > 
> > > > > > Measurement:
> > > > > > Transmission speed: 27.04 MiB/s
> > > > > > 
> > > > > > Change-Id: I1df1fbafc72ec703f0367ddee3fedf3a3f5523ed
> > > > > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > > > > Cc: Marek Vasut <marex@denx.de>
> > > > > 
> > > > > You should use ROUND_UP(), not ROUND() throughout the patch.
> > > > > Otherwise you might fail to flush/invalidate the last little
> > > > > bit of data in some cacheline.
> > > > 
> > > > I might overlooked something, so please correct me if needed.
> > > > 
> > > > I allocate buffers in gadgets which are aligned to cache line
> > > > with starting address and its size is a multiplication of cache
> > > > line size (so I will not trash data allocated next to it when I
> > > > invalidate cache).
> > > > 
> > > > In the code I'm using ROUND to invalidate/flush more data than
> > > > needed (ROUND(176, 32) = 192). I'm prepared for this since
> > > > buffer in gadget is properly allocated (with
> > > > DEFINE_CACHE_ALIGN_BUFFER() which uses roundup() internally).
> > > 
> > > The problem is in case you receive buffer which is aligned to
> > > cacheline with it's start, but is [(k * cacheline_size) +
> > > (cacheline_size / 2) - 1] big. I think it's unlikely, but if this
> > > happens, you will get corruption, right ?
> > 
> > Let's suppose, that I will receive 2063B = [(64 * 32) + 16 -1] from
> > UDC. If the passed buffer was exactly 2063 B in size, then we would
> > have here a data corruption.
> > 
> > However this situation will not happen
> 
> _Should_ not happen ... I am absolutelly positive someone will be
> bitten by such assumption. I think this assumption about buffer
> alignment should really be documented somewhere.

I will add verbose commit message for this.

> 
> > since the buffer at gadget is
> > allocated with DEFINE_CACHE_ALIGN_BUFFER() or is an aligned
> > multiplication of cache line size (like 1MiB).
> > 
> > I think, that it is the responsibility of gadget developer to
> > allocate buffers with proper alignment and size.
> 
> Document that please, I doubt this is documented anywhere, but it's
> clearly part of the API. Also, some checks might be put in place for
> the alignment , they might be in #ifdef DEBUG for all I care, but it
> would be nice to have such a check, since I'm worried someone will
> really be bitten.

I rely on the code embedded at cache_v7.c. It works and saved me a lot
of trouble (since it always print "ERROR" when buffer alignment and
size is wrong). 

Also I think, that those checks shall be done at invalidate_* and
flush_* cache routines, not at USB driver.

> 
> > > You might actually want to
> > > check for this condition and throw a warning in such a case.
> > 
> > The check is already implemented at ./arch/arm/cpu/armv7/cache_v7.c.
> 
> Yeah, for arm926ejs core as well. Maybe that check shall be shifted
> into the cache management routine prototypes somehow ... not all CPUs
> implement that check :-(

Maybe other ARM architectures shall have the cache management code
updated to work in a similar way to cache_v7.c ?

> 
> > It complains with "ERROR" message when start or end address is not
> > aligned (that is how I've discovered the unaligned buffers at UMS).
> 
> Yes.
> 
> > > I understand your argument with trying to not trash data, but then
> > > you will get a corruption during transfer, right ?
> > 
> > After applying those patches, the cache management would be
> > performed when the USB request is completed (in the UDC).
> > 
> > The only requirement for UDC is the correctly allocated buffer at
> > gadget.
> 
> Got it.

I will emphasize the need of correct buffer allocation at commit
message and add some comment to UDC in the v2.


-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver
  2014-02-04  7:29             ` Lukasz Majewski
@ 2014-02-04 20:21               ` Marek Vasut
  2014-02-04 21:49                 ` Lukasz Majewski
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2014-02-04 20:21 UTC (permalink / raw
  To: u-boot

On Tuesday, February 04, 2014 at 08:29:49 AM, Lukasz Majewski wrote:
> Hi Marek,
> 
> > On Monday, February 03, 2014 at 12:06:59 PM, Lukasz Majewski wrote:
> > > Hi Marek,
> > > 
> > > > On Saturday, February 01, 2014 at 12:05:29 PM, Lukasz Majewski
> > > > 
> > > > wrote:
> > > > > On Sat, 1 Feb 2014 03:55:20 +0100
> > > > > 
> > > > > Marek Vasut <marex@denx.de> wrote:
> > > > > > On Friday, January 31, 2014 at 01:16:27 PM, Lukasz Majewski
> > > > > > 
> > > > > > wrote:
> > > > > > > The Samsung's UDC driver is not anymore copying data from
> > > > > > > USB requests to data aligned internal buffers. Now it works
> > > > > > > directly in data allocated in the upper layers like UMS,
> > > > > > > DFU, THOR.
> > > > > > > 
> > > > > > > This change is possible since those gadgets now take care to
> > > > > > > allocate buffers aligned to cache line
> > > > > > > (CONFIG_SYS_CACHELINE_SIZE ).
> > > > > > > 
> > > > > > > Previously the UDC needed to copy this data to internal
> > > > > > > aligned buffer to prevent from unaligned access exceptions.
> > > > > > > 
> > > > > > > Test condition
> > > > > > > - test HW + measurement: Trats - Exynos4210 rev.1
> > > > > > > - test HW Trats2 - Exynos4412 rev.1
> > > > > > > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> > > > > > > 
> > > > > > > Measurement:
> > > > > > > Transmission speed: 27.04 MiB/s
> > > > > > > 
> > > > > > > Change-Id: I1df1fbafc72ec703f0367ddee3fedf3a3f5523ed
> > > > > > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > > > > > Cc: Marek Vasut <marex@denx.de>
> > > > > > 
> > > > > > You should use ROUND_UP(), not ROUND() throughout the patch.
> > > > > > Otherwise you might fail to flush/invalidate the last little
> > > > > > bit of data in some cacheline.
> > > > > 
> > > > > I might overlooked something, so please correct me if needed.
> > > > > 
> > > > > I allocate buffers in gadgets which are aligned to cache line
> > > > > with starting address and its size is a multiplication of cache
> > > > > line size (so I will not trash data allocated next to it when I
> > > > > invalidate cache).
> > > > > 
> > > > > In the code I'm using ROUND to invalidate/flush more data than
> > > > > needed (ROUND(176, 32) = 192). I'm prepared for this since
> > > > > buffer in gadget is properly allocated (with
> > > > > DEFINE_CACHE_ALIGN_BUFFER() which uses roundup() internally).
> > > > 
> > > > The problem is in case you receive buffer which is aligned to
> > > > cacheline with it's start, but is [(k * cacheline_size) +
> > > > (cacheline_size / 2) - 1] big. I think it's unlikely, but if this
> > > > happens, you will get corruption, right ?
> > > 
> > > Let's suppose, that I will receive 2063B = [(64 * 32) + 16 -1] from
> > > UDC. If the passed buffer was exactly 2063 B in size, then we would
> > > have here a data corruption.
> > > 
> > > However this situation will not happen
> > 
> > _Should_ not happen ... I am absolutelly positive someone will be
> > bitten by such assumption. I think this assumption about buffer
> > alignment should really be documented somewhere.
> 
> I will add verbose commit message for this.

The commit message will get lost as time moves on, this should be clearly 
documented in some doc/ or code comment.

> > > since the buffer at gadget is
> > > allocated with DEFINE_CACHE_ALIGN_BUFFER() or is an aligned
> > > multiplication of cache line size (like 1MiB).
> > > 
> > > I think, that it is the responsibility of gadget developer to
> > > allocate buffers with proper alignment and size.
> > 
> > Document that please, I doubt this is documented anywhere, but it's
> > clearly part of the API. Also, some checks might be put in place for
> > the alignment , they might be in #ifdef DEBUG for all I care, but it
> > would be nice to have such a check, since I'm worried someone will
> > really be bitten.
> 
> I rely on the code embedded at cache_v7.c. It works and saved me a lot
> of trouble (since it always print "ERROR" when buffer alignment and
> size is wrong).
> 
> Also I think, that those checks shall be done at invalidate_* and
> flush_* cache routines, not at USB driver.

Not all CPUs are ARMv7 though.

> > > > You might actually want to
> > > > check for this condition and throw a warning in such a case.
> > > 
> > > The check is already implemented at ./arch/arm/cpu/armv7/cache_v7.c.
> > 
> > Yeah, for arm926ejs core as well. Maybe that check shall be shifted
> > into the cache management routine prototypes somehow ... not all CPUs
> > implement that check :-(
> 
> Maybe other ARM architectures shall have the cache management code
> updated to work in a similar way to cache_v7.c ?

Not all CPUs are ARM architecture ... there're others, you know ;-)

> > > It complains with "ERROR" message when start or end address is not
> > > aligned (that is how I've discovered the unaligned buffers at UMS).
> > 
> > Yes.
> > 
> > > > I understand your argument with trying to not trash data, but then
> > > > you will get a corruption during transfer, right ?
> > > 
> > > After applying those patches, the cache management would be
> > > performed when the USB request is completed (in the UDC).
> > > 
> > > The only requirement for UDC is the correctly allocated buffer at
> > > gadget.
> > 
> > Got it.
> 
> I will emphasize the need of correct buffer allocation at commit
> message and add some comment to UDC in the v2.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver
  2014-02-04 20:21               ` Marek Vasut
@ 2014-02-04 21:49                 ` Lukasz Majewski
  2014-02-05  2:35                   ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-04 21:49 UTC (permalink / raw
  To: u-boot

On Tue, 4 Feb 2014 21:21:16 +0100
Marek Vasut <marex@denx.de> wrote:

> On Tuesday, February 04, 2014 at 08:29:49 AM, Lukasz Majewski wrote:
> > Hi Marek,
> > 
> > > On Monday, February 03, 2014 at 12:06:59 PM, Lukasz Majewski
> > > wrote:
> > > > Hi Marek,
> > > > 
> > > > > On Saturday, February 01, 2014 at 12:05:29 PM, Lukasz Majewski
> > > > > 
> > > > > wrote:
> > > > > > On Sat, 1 Feb 2014 03:55:20 +0100
> > > > > > 
> > > > > > Marek Vasut <marex@denx.de> wrote:
> > > > > > > On Friday, January 31, 2014 at 01:16:27 PM, Lukasz
> > > > > > > Majewski
> > > > > > > 
> > > > > > > wrote:
> > > > > > > > The Samsung's UDC driver is not anymore copying data
> > > > > > > > from USB requests to data aligned internal buffers. Now
> > > > > > > > it works directly in data allocated in the upper layers
> > > > > > > > like UMS, DFU, THOR.
> > > > > > > > 
> > > > > > > > This change is possible since those gadgets now take
> > > > > > > > care to allocate buffers aligned to cache line
> > > > > > > > (CONFIG_SYS_CACHELINE_SIZE ).
> > > > > > > > 
> > > > > > > > Previously the UDC needed to copy this data to internal
> > > > > > > > aligned buffer to prevent from unaligned access
> > > > > > > > exceptions.
> > > > > > > > 
> > > > > > > > Test condition
> > > > > > > > - test HW + measurement: Trats - Exynos4210 rev.1
> > > > > > > > - test HW Trats2 - Exynos4412 rev.1
> > > > > > > > 400 MiB compressed rootfs image download with `thor 0
> > > > > > > > mmc 0`
> > > > > > > > 
> > > > > > > > Measurement:
> > > > > > > > Transmission speed: 27.04 MiB/s
> > > > > > > > 
> > > > > > > > Change-Id: I1df1fbafc72ec703f0367ddee3fedf3a3f5523ed
> > > > > > > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > > > > > > Cc: Marek Vasut <marex@denx.de>
> > > > > > > 
> > > > > > > You should use ROUND_UP(), not ROUND() throughout the
> > > > > > > patch. Otherwise you might fail to flush/invalidate the
> > > > > > > last little bit of data in some cacheline.
> > > > > > 
> > > > > > I might overlooked something, so please correct me if
> > > > > > needed.
> > > > > > 
> > > > > > I allocate buffers in gadgets which are aligned to cache
> > > > > > line with starting address and its size is a multiplication
> > > > > > of cache line size (so I will not trash data allocated next
> > > > > > to it when I invalidate cache).
> > > > > > 
> > > > > > In the code I'm using ROUND to invalidate/flush more data
> > > > > > than needed (ROUND(176, 32) = 192). I'm prepared for this
> > > > > > since buffer in gadget is properly allocated (with
> > > > > > DEFINE_CACHE_ALIGN_BUFFER() which uses roundup()
> > > > > > internally).
> > > > > 
> > > > > The problem is in case you receive buffer which is aligned to
> > > > > cacheline with it's start, but is [(k * cacheline_size) +
> > > > > (cacheline_size / 2) - 1] big. I think it's unlikely, but if
> > > > > this happens, you will get corruption, right ?
> > > > 
> > > > Let's suppose, that I will receive 2063B = [(64 * 32) + 16 -1]
> > > > from UDC. If the passed buffer was exactly 2063 B in size, then
> > > > we would have here a data corruption.
> > > > 
> > > > However this situation will not happen
> > > 
> > > _Should_ not happen ... I am absolutelly positive someone will be
> > > bitten by such assumption. I think this assumption about buffer
> > > alignment should really be documented somewhere.
> > 
> > I will add verbose commit message for this.
> 
> The commit message will get lost as time moves on, this should be
> clearly documented in some doc/ or code comment.

Ok. I will add code comment.

> 
> > > > since the buffer at gadget is
> > > > allocated with DEFINE_CACHE_ALIGN_BUFFER() or is an aligned
> > > > multiplication of cache line size (like 1MiB).
> > > > 
> > > > I think, that it is the responsibility of gadget developer to
> > > > allocate buffers with proper alignment and size.
> > > 
> > > Document that please, I doubt this is documented anywhere, but
> > > it's clearly part of the API. Also, some checks might be put in
> > > place for the alignment , they might be in #ifdef DEBUG for all I
> > > care, but it would be nice to have such a check, since I'm
> > > worried someone will really be bitten.
> > 
> > I rely on the code embedded at cache_v7.c. It works and saved me a
> > lot of trouble (since it always print "ERROR" when buffer alignment
> > and size is wrong).
> > 
> > Also I think, that those checks shall be done at invalidate_* and
> > flush_* cache routines, not at USB driver.
> 
> Not all CPUs are ARMv7 though.
> 
> > > > > You might actually want to
> > > > > check for this condition and throw a warning in such a case.
> > > > 
> > > > The check is already implemented
> > > > at ./arch/arm/cpu/armv7/cache_v7.c.
> > > 
> > > Yeah, for arm926ejs core as well. Maybe that check shall be
> > > shifted into the cache management routine prototypes somehow ...
> > > not all CPUs implement that check :-(
> > 
> > Maybe other ARM architectures shall have the cache management code
> > updated to work in a similar way to cache_v7.c ?
> 
> Not all CPUs are ARM architecture ... there're others, you know ;-)

:-) ... but who cares about the rest :-)

To be serious (quite), I do believe that checking if unaligned
cache flush/invalidation is performed, shall be handled at the code
which is responsible for cache management.

Conceptually, it shall not be done at UDC code.

> 
> > > > It complains with "ERROR" message when start or end address is
> > > > not aligned (that is how I've discovered the unaligned buffers
> > > > at UMS).
> > > 
> > > Yes.
> > > 
> > > > > I understand your argument with trying to not trash data, but
> > > > > then you will get a corruption during transfer, right ?
> > > > 
> > > > After applying those patches, the cache management would be
> > > > performed when the USB request is completed (in the UDC).
> > > > 
> > > > The only requirement for UDC is the correctly allocated buffer
> > > > at gadget.
> > > 
> > > Got it.
> > 
> > I will emphasize the need of correct buffer allocation at commit
> > message and add some comment to UDC in the v2.

Best regards,
Lukasz Majewski
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140204/2dcef69b/attachment.pgp>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver
  2014-02-04 21:49                 ` Lukasz Majewski
@ 2014-02-05  2:35                   ` Marek Vasut
  0 siblings, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2014-02-05  2:35 UTC (permalink / raw
  To: u-boot

On Tuesday, February 04, 2014 at 10:49:24 PM, Lukasz Majewski wrote:

[...]

> > > Maybe other ARM architectures shall have the cache management code
> > > updated to work in a similar way to cache_v7.c ?
> > 
> > Not all CPUs are ARM architecture ... there're others, you know ;-)
> :
> :-) ... but who cares about the rest :-)

Me, duh. I have mips, sparc and m68k devices here and I'm not afraid to use 
them!

> To be serious (quite), I do believe that checking if unaligned
> cache flush/invalidation is performed, shall be handled at the code
> which is responsible for cache management.
> 
> Conceptually, it shall not be done at UDC code.

You have a point, ACK. Even better, we should have a wrapper for that which will 
only then call the cache management routine.
[...]

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements
  2014-01-31 12:16 [U-Boot] [PATCH 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
                   ` (5 preceding siblings ...)
  2014-01-31 12:16 ` [U-Boot] [PATCH 6/6] usb:gadget:f_thor: cosmetic: Remove debug memset Lukasz Majewski
@ 2014-02-05  9:10 ` Lukasz Majewski
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment Lukasz Majewski
                     ` (6 more replies)
  6 siblings, 7 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-05  9:10 UTC (permalink / raw
  To: u-boot

This patch series comprises several improvements for Exynos4 USB code.

The most notable is transmission speed improvement (measured on Trats):

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment
  2014-02-05  9:10 ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
@ 2014-02-05  9:10   ` Lukasz Majewski
  2014-02-06  1:20     ` Marek Vasut
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver Lukasz Majewski
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-05  9:10 UTC (permalink / raw
  To: u-boot

Calls to malloc() have been replaced by memalign. It now provides proper
buffer alignment.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>

---
Changes for v2:
- Remove Change-Id.
---
 drivers/usb/gadget/f_mass_storage.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/f_mass_storage.c b/drivers/usb/gadget/f_mass_storage.c
index b1fe8bd..f896169 100644
--- a/drivers/usb/gadget/f_mass_storage.c
+++ b/drivers/usb/gadget/f_mass_storage.c
@@ -2515,7 +2515,7 @@ static struct fsg_common *fsg_common_init(struct fsg_common *common,
 buffhds_first_it:
 		bh->inreq_busy = 0;
 		bh->outreq_busy = 0;
-		bh->buf = kmalloc(FSG_BUFLEN, GFP_KERNEL);
+		bh->buf = memalign(CONFIG_SYS_CACHELINE_SIZE, FSG_BUFLEN);
 		if (unlikely(!bh->buf)) {
 			rc = -ENOMEM;
 			goto error_release;
@@ -2622,7 +2622,7 @@ usb_copy_descriptors(struct usb_descriptor_header **src)
 		bytes += (*tmp)->bLength;
 	bytes += (n_desc + 1) * sizeof(*tmp);
 
-	mem = kmalloc(bytes, GFP_KERNEL);
+	mem = memalign(CONFIG_SYS_CACHELINE_SIZE, bytes);
 	if (!mem)
 		return NULL;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver
  2014-02-05  9:10 ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment Lukasz Majewski
@ 2014-02-05  9:10   ` Lukasz Majewski
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 3/6] usb:udc:samsung: Allow burst transfers for non EP0 endpints Lukasz Majewski
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-05  9:10 UTC (permalink / raw
  To: u-boot

A set of cache operations (both invalidation and flush) were redundant
in the S3C HS OTG Samsung driver:

1. s3c_udc_ep0_zlp - to transmit EP0's ZLP packets one don't need to flush
the cache (since it is the zero length transmission)

2. s3c_udc_pre_setup and s3c_ep0_complete_out - cache invalidation is not
needed when the buffer for OUT EP0 transmission is setup, since no data
has yet arrived.

Cache cleanups presented above don't contribute much to transmission speed
up, hence shall be regarded as cosmetic changes.

3. setdma_rx - here the s3c UDC driver's internal buffers were invalidated.
This call is not needed anymore since we reuse the buffers passed from
gadgets. This is a key contribution to transmission speed improvement.


Test condition
- test HW + measurement: Trats - Exynos4210 rev.1
- test HW Trats2 - Exynos4412 rev.1
400 MiB compressed rootfs image download with `thor 0 mmc 0`

Measurements:

Base values (without improvement):
Transmission speed: 9.51 MiB/s

After the change:
Transmission speed: 10.15 MiB/s

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>

---
Changes for v2:
- Remove Change-Id:
- More verbose commit message
- Remove superficial cache invalidation at s3c_ep0_complete_out() function
---
 drivers/usb/gadget/s3c_udc_otg_xfer_dma.c |   17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
index 1cbf8f6..cbc8734 100644
--- a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
+++ b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
@@ -29,10 +29,6 @@ static inline void s3c_udc_ep0_zlp(struct s3c_udc *dev)
 {
 	u32 ep_ctrl;
 
-	flush_dcache_range((unsigned long) usb_ctrl_dma_addr,
-			   (unsigned long) usb_ctrl_dma_addr
-			   + DMA_BUFFER_SIZE);
-
 	writel(usb_ctrl_dma_addr, &reg->in_endp[EP0_CON].diepdma);
 	writel(DIEPT_SIZ_PKT_CNT(1), &reg->in_endp[EP0_CON].dieptsiz);
 
@@ -52,10 +48,6 @@ void s3c_udc_pre_setup(void)
 	debug_cond(DEBUG_IN_EP,
 		   "%s : Prepare Setup packets.\n", __func__);
 
-	invalidate_dcache_range((unsigned long) usb_ctrl_dma_addr,
-				(unsigned long) usb_ctrl_dma_addr
-				+ DMA_BUFFER_SIZE);
-
 	writel(DOEPT_SIZ_PKT_CNT(1) | sizeof(struct usb_ctrlrequest),
 	       &reg->out_endp[EP0_CON].doeptsiz);
 	writel(usb_ctrl_dma_addr, &reg->out_endp[EP0_CON].doepdma);
@@ -82,10 +74,6 @@ static inline void s3c_ep0_complete_out(void)
 	debug_cond(DEBUG_IN_EP,
 		"%s : Prepare Complete Out packet.\n", __func__);
 
-	invalidate_dcache_range((unsigned long) usb_ctrl_dma_addr,
-				(unsigned long) usb_ctrl_dma_addr
-				+ DMA_BUFFER_SIZE);
-
 	writel(DOEPT_SIZ_PKT_CNT(1) | sizeof(struct usb_ctrlrequest),
 	       &reg->out_endp[EP0_CON].doeptsiz);
 	writel(usb_ctrl_dma_addr, &reg->out_endp[EP0_CON].doepdma);
@@ -115,11 +103,6 @@ static int setdma_rx(struct s3c_ep *ep, struct s3c_request *req)
 	ep->len = length;
 	ep->dma_buf = buf;
 
-	invalidate_dcache_range((unsigned long) ep->dev->dma_buf[ep_num],
-				(unsigned long) ep->dev->dma_buf[ep_num]
-				+ ROUND(ep->ep.maxpacket,
-					CONFIG_SYS_CACHELINE_SIZE));
-
 	if (length == 0)
 		pktcnt = 1;
 	else
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 3/6] usb:udc:samsung: Allow burst transfers for non EP0 endpints
  2014-02-05  9:10 ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment Lukasz Majewski
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver Lukasz Majewski
@ 2014-02-05  9:10   ` Lukasz Majewski
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver Lukasz Majewski
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-05  9:10 UTC (permalink / raw
  To: u-boot

This patch removed obscure restriction on the HW setting of DMA transfers.
Before this change each transaction sent up to 512 bytes (with packet count
equal to 1) for non EP0 transfer.

Now it is possible to setup DMA transaction up to DMA_BUFFER_SIZE.

Test condition
- test HW + measurement: Trats - Exynos4210 rev.1
- test HW Trats2 - Exynos4412 rev.1
400 MiB compressed rootfs image download with `thor 0 mmc 0`

Measurement:
Transmission speed: 20.74 MiB/s

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>

---
Changes for v2:
- Remove Change-Id:
---
 drivers/usb/gadget/s3c_udc_otg_xfer_dma.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
index cbc8734..7c7176b 100644
--- a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
+++ b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
@@ -97,18 +97,17 @@ static int setdma_rx(struct s3c_ep *ep, struct s3c_request *req)
 	u32 ep_num = ep_index(ep);
 
 	buf = req->req.buf + req->req.actual;
-
-	length = min(req->req.length - req->req.actual, (int)ep->ep.maxpacket);
+	length = min(req->req.length - req->req.actual,
+		     ep_num ? DMA_BUFFER_SIZE : ep->ep.maxpacket);
 
 	ep->len = length;
 	ep->dma_buf = buf;
 
-	if (length == 0)
+	if (ep_num == EP0_CON || length == 0)
 		pktcnt = 1;
 	else
 		pktcnt = (length - 1)/(ep->ep.maxpacket) + 1;
 
-	pktcnt = 1;
 	ctrl =  readl(&reg->out_endp[ep_num].doepctl);
 
 	writel(the_controller->dma_addr[ep_index(ep)+1],
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver
  2014-02-05  9:10 ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
                     ` (2 preceding siblings ...)
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 3/6] usb:udc:samsung: Allow burst transfers for non EP0 endpints Lukasz Majewski
@ 2014-02-05  9:10   ` Lukasz Majewski
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 5/6] usb:gadget:f_thor: Allocate request up to THOR_PACKET_SIZE not ep->maxpacket Lukasz Majewski
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-05  9:10 UTC (permalink / raw
  To: u-boot

The Samsung's UDC driver is not anymore copying data from USB requests to
aligned internal buffers. Now it works directly in data allocated in the
upper layers like UMS, DFU, THOR.

This change is possible since those gadgets now must take care to allocate
buffers aligned to cache line (CONFIG_SYS_CACHELINE_SIZE).

This can be achieved by using DEFINE_CACHE_ALIGN_BUFFER() or
ALLOC_CACHE_ALIGN_BUFFER() macros. Those take care to allocate buffer
aligned to cache line in both starting address and its size.
Sometimes it is enough to just use memalign() with size being a
multiplication of cache line size.

Test condition
- test HW + measurement: Trats - Exynos4210 rev.1
- test HW Trats2 - Exynos4412 rev.1
400 MiB compressed rootfs image download with `thor 0 mmc 0`

Measurement:
Transmission speed: 27.04 MiB/s

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>

---
Changes for v2:
- Remove Change-Id:
- More verbose commit message
- Add comment explaining the need for proper buffer alignment
---
 drivers/usb/gadget/s3c_udc_otg.c          |   19 +++++----
 drivers/usb/gadget/s3c_udc_otg_xfer_dma.c |   63 +++++++++++++++--------------
 include/usb/s3c_udc.h                     |    5 +--
 3 files changed, 43 insertions(+), 44 deletions(-)

diff --git a/drivers/usb/gadget/s3c_udc_otg.c b/drivers/usb/gadget/s3c_udc_otg.c
index ba17a04..63d4487 100644
--- a/drivers/usb/gadget/s3c_udc_otg.c
+++ b/drivers/usb/gadget/s3c_udc_otg.c
@@ -843,7 +843,7 @@ static struct s3c_udc memory = {
 int s3c_udc_probe(struct s3c_plat_otg_data *pdata)
 {
 	struct s3c_udc *dev = &memory;
-	int retval = 0, i;
+	int retval = 0;
 
 	debug("%s: %p\n", __func__, pdata);
 
@@ -864,16 +864,15 @@ int s3c_udc_probe(struct s3c_plat_otg_data *pdata)
 
 	the_controller = dev;
 
-	for (i = 0; i < S3C_MAX_ENDPOINTS+1; i++) {
-		dev->dma_buf[i] = memalign(CONFIG_SYS_CACHELINE_SIZE,
-					   DMA_BUFFER_SIZE);
-		dev->dma_addr[i] = (dma_addr_t) dev->dma_buf[i];
-		invalidate_dcache_range((unsigned long) dev->dma_buf[i],
-					(unsigned long) (dev->dma_buf[i]
-							 + DMA_BUFFER_SIZE));
+	usb_ctrl = memalign(CONFIG_SYS_CACHELINE_SIZE,
+			    ROUND(sizeof(struct usb_ctrlrequest),
+				  CONFIG_SYS_CACHELINE_SIZE));
+	if (!usb_ctrl) {
+		error("No memory available for UDC!\n");
+		return -ENOMEM;
 	}
-	usb_ctrl = dev->dma_buf[0];
-	usb_ctrl_dma_addr = dev->dma_addr[0];
+
+	usb_ctrl_dma_addr = (dma_addr_t) usb_ctrl;
 
 	udc_reinit(dev);
 
diff --git a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
index 7c7176b..06dfeed 100644
--- a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
+++ b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
@@ -110,8 +110,7 @@ static int setdma_rx(struct s3c_ep *ep, struct s3c_request *req)
 
 	ctrl =  readl(&reg->out_endp[ep_num].doepctl);
 
-	writel(the_controller->dma_addr[ep_index(ep)+1],
-	       &reg->out_endp[ep_num].doepdma);
+	writel((unsigned int) ep->dma_buf, &reg->out_endp[ep_num].doepdma);
 	writel(DOEPT_SIZ_PKT_CNT(pktcnt) | DOEPT_SIZ_XFER_SIZE(length),
 	       &reg->out_endp[ep_num].doeptsiz);
 	writel(DEPCTL_EPENA|DEPCTL_CNAK|ctrl, &reg->out_endp[ep_num].doepctl);
@@ -134,7 +133,6 @@ int setdma_tx(struct s3c_ep *ep, struct s3c_request *req)
 	u32 *buf, ctrl = 0;
 	u32 length, pktcnt;
 	u32 ep_num = ep_index(ep);
-	u32 *p = the_controller->dma_buf[ep_index(ep)+1];
 
 	buf = req->req.buf + req->req.actual;
 	length = req->req.length - req->req.actual;
@@ -144,10 +142,10 @@ int setdma_tx(struct s3c_ep *ep, struct s3c_request *req)
 
 	ep->len = length;
 	ep->dma_buf = buf;
-	memcpy(p, ep->dma_buf, length);
 
-	flush_dcache_range((unsigned long) p ,
-			   (unsigned long) p + DMA_BUFFER_SIZE);
+	flush_dcache_range((unsigned long) ep->dma_buf,
+			   (unsigned long) ep->dma_buf +
+			   ROUND(ep->len, CONFIG_SYS_CACHELINE_SIZE));
 
 	if (length == 0)
 		pktcnt = 1;
@@ -160,8 +158,7 @@ int setdma_tx(struct s3c_ep *ep, struct s3c_request *req)
 	while (readl(&reg->grstctl) & TX_FIFO_FLUSH)
 		;
 
-	writel(the_controller->dma_addr[ep_index(ep)+1],
-	       &reg->in_endp[ep_num].diepdma);
+	writel((unsigned long) ep->dma_buf, &reg->in_endp[ep_num].diepdma);
 	writel(DIEPT_SIZ_PKT_CNT(pktcnt) | DIEPT_SIZ_XFER_SIZE(length),
 	       &reg->in_endp[ep_num].dieptsiz);
 
@@ -194,7 +191,6 @@ static void complete_rx(struct s3c_udc *dev, u8 ep_num)
 	struct s3c_ep *ep = &dev->ep[ep_num];
 	struct s3c_request *req = NULL;
 	u32 ep_tsr = 0, xfer_size = 0, is_short = 0;
-	u32 *p = the_controller->dma_buf[ep_index(ep)+1];
 
 	if (list_empty(&ep->queue)) {
 		debug_cond(DEBUG_OUT_EP != 0,
@@ -214,10 +210,23 @@ static void complete_rx(struct s3c_udc *dev, u8 ep_num)
 
 	xfer_size = ep->len - xfer_size;
 
-	invalidate_dcache_range((unsigned long) p,
-				(unsigned long) p + DMA_BUFFER_SIZE);
-
-	memcpy(ep->dma_buf, p, ep->len);
+	/*
+	 * NOTE:
+	 *
+	 * Please be careful with proper buffer allocation for USB request,
+	 * which needs to be aligned to CONFIG_SYS_CACHELINE_SIZE, not only
+	 * with starting address, but also its size shall be a cache line
+	 * multiplication.
+	 *
+	 * This will prevent from corruption of data allocated immediatelly
+	 * before or after the buffer.
+	 *
+	 * For armv7, the cache_v7.c provides proper code to emit "ERROR"
+	 * message to warn users.
+	 */
+	invalidate_dcache_range((unsigned long) ep->dma_buf,
+				(unsigned long) ep->dma_buf +
+				ROUND(xfer_size, CONFIG_SYS_CACHELINE_SIZE));
 
 	req->req.actual += min(xfer_size, req->req.length - req->req.actual);
 	is_short = (xfer_size < ep->ep.maxpacket);
@@ -711,19 +720,14 @@ static int write_fifo_ep0(struct s3c_ep *ep, struct s3c_request *req)
 
 int s3c_fifo_read(struct s3c_ep *ep, u32 *cp, int max)
 {
-	u32 bytes;
-
-	bytes = sizeof(struct usb_ctrlrequest);
-
-	invalidate_dcache_range((unsigned long) ep->dev->dma_buf[ep_index(ep)],
-				(unsigned long) ep->dev->dma_buf[ep_index(ep)]
-				+ DMA_BUFFER_SIZE);
+	invalidate_dcache_range((unsigned long)cp, (unsigned long)cp +
+				ROUND(max, CONFIG_SYS_CACHELINE_SIZE));
 
 	debug_cond(DEBUG_EP0 != 0,
-		   "%s: bytes=%d, ep_index=%d %p\n", __func__,
-		   bytes, ep_index(ep), ep->dev->dma_buf[ep_index(ep)]);
+		   "%s: bytes=%d, ep_index=%d 0x%p\n", __func__,
+		   max, ep_index(ep), cp);
 
-	return bytes;
+	return max;
 }
 
 /**
@@ -855,14 +859,12 @@ static int s3c_ep0_write(struct s3c_udc *dev)
 	return 1;
 }
 
-u16	g_status;
-
 int s3c_udc_get_status(struct s3c_udc *dev,
 		struct usb_ctrlrequest *crq)
 {
 	u8 ep_num = crq->wIndex & 0x7F;
+	u16 g_status = 0;
 	u32 ep_ctrl;
-	u32 *p = the_controller->dma_buf[1];
 
 	debug_cond(DEBUG_SETUP != 0,
 		   "%s: *** USB_REQ_GET_STATUS\n", __func__);
@@ -900,12 +902,13 @@ int s3c_udc_get_status(struct s3c_udc *dev,
 		return 1;
 	}
 
-	memcpy(p, &g_status, sizeof(g_status));
+	memcpy(usb_ctrl, &g_status, sizeof(g_status));
 
-	flush_dcache_range((unsigned long) p,
-			   (unsigned long) p + DMA_BUFFER_SIZE);
+	flush_dcache_range((unsigned long) usb_ctrl,
+			   (unsigned long) usb_ctrl +
+			   ROUND(sizeof(g_status), CONFIG_SYS_CACHELINE_SIZE));
 
-	writel(the_controller->dma_addr[1], &reg->in_endp[EP0_CON].diepdma);
+	writel(usb_ctrl_dma_addr, &reg->in_endp[EP0_CON].diepdma);
 	writel(DIEPT_SIZ_PKT_CNT(1) | DIEPT_SIZ_XFER_SIZE(2),
 	       &reg->in_endp[EP0_CON].dieptsiz);
 
diff --git a/include/usb/s3c_udc.h b/include/usb/s3c_udc.h
index 734c6cd..6dead2f 100644
--- a/include/usb/s3c_udc.h
+++ b/include/usb/s3c_udc.h
@@ -19,7 +19,7 @@
 
 /*-------------------------------------------------------------------------*/
 /* DMA bounce buffer size, 16K is enough even for mass storage */
-#define DMA_BUFFER_SIZE	(4096*4)
+#define DMA_BUFFER_SIZE	(16*SZ_1K)
 
 #define EP0_FIFO_SIZE		64
 #define EP_FIFO_SIZE		512
@@ -81,9 +81,6 @@ struct s3c_udc {
 
 	struct s3c_plat_otg_data *pdata;
 
-	void *dma_buf[S3C_MAX_ENDPOINTS+1];
-	dma_addr_t dma_addr[S3C_MAX_ENDPOINTS+1];
-
 	int ep0state;
 	struct s3c_ep ep[S3C_MAX_ENDPOINTS];
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 5/6] usb:gadget:f_thor: Allocate request up to THOR_PACKET_SIZE not ep->maxpacket
  2014-02-05  9:10 ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
                     ` (3 preceding siblings ...)
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver Lukasz Majewski
@ 2014-02-05  9:10   ` Lukasz Majewski
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 6/6] usb:gadget:f_thor: cosmetic: Remove debug memset Lukasz Majewski
  2014-02-06  1:24   ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Marek Vasut
  6 siblings, 0 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-05  9:10 UTC (permalink / raw
  To: u-boot

Now it is possible to allocate static request - which receives data from
the host (OUT transaction) to the size of THOR packet.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>

---
Changes for v2:
- Remove Change-Id:
---
 drivers/usb/gadget/f_thor.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/f_thor.c b/drivers/usb/gadget/f_thor.c
index c4c9909..780729a 100644
--- a/drivers/usb/gadget/f_thor.c
+++ b/drivers/usb/gadget/f_thor.c
@@ -614,7 +614,7 @@ static struct usb_request *thor_start_ep(struct usb_ep *ep)
 {
 	struct usb_request *req;
 
-	req = alloc_ep_req(ep, ep->maxpacket);
+	req = alloc_ep_req(ep, THOR_PACKET_SIZE);
 	debug("%s: ep:%p req:%p\n", __func__, ep, req);
 
 	if (!req)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 6/6] usb:gadget:f_thor: cosmetic: Remove debug memset
  2014-02-05  9:10 ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
                     ` (4 preceding siblings ...)
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 5/6] usb:gadget:f_thor: Allocate request up to THOR_PACKET_SIZE not ep->maxpacket Lukasz Majewski
@ 2014-02-05  9:10   ` Lukasz Majewski
  2014-02-06  1:24   ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Marek Vasut
  6 siblings, 0 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-05  9:10 UTC (permalink / raw
  To: u-boot

Apparently debug memset (with a 0x55 value) has been overlooked in the
f_thor code.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>

---
Changes for v2:
- Remove Change-id:
---
 drivers/usb/gadget/f_thor.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/usb/gadget/f_thor.c b/drivers/usb/gadget/f_thor.c
index 780729a..f5c0224 100644
--- a/drivers/usb/gadget/f_thor.c
+++ b/drivers/usb/gadget/f_thor.c
@@ -623,8 +623,6 @@ static struct usb_request *thor_start_ep(struct usb_ep *ep)
 	memset(req->buf, 0, req->length);
 	req->complete = thor_rx_tx_complete;
 
-	memset(req->buf, 0x55, req->length);
-
 	return req;
 }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment Lukasz Majewski
@ 2014-02-06  1:20     ` Marek Vasut
  2014-02-06  6:33       ` Lukasz Majewski
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2014-02-06  1:20 UTC (permalink / raw
  To: u-boot

On Wednesday, February 05, 2014 at 10:10:41 AM, Lukasz Majewski wrote:
> Calls to malloc() have been replaced by memalign. It now provides proper
> buffer alignment.
> 
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> Cc: Marek Vasut <marex@denx.de>
> 
> ---
> Changes for v2:
> - Remove Change-Id.
> ---
>  drivers/usb/gadget/f_mass_storage.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/gadget/f_mass_storage.c
> b/drivers/usb/gadget/f_mass_storage.c index b1fe8bd..f896169 100644
> --- a/drivers/usb/gadget/f_mass_storage.c
> +++ b/drivers/usb/gadget/f_mass_storage.c
> @@ -2515,7 +2515,7 @@ static struct fsg_common *fsg_common_init(struct
> fsg_common *common, buffhds_first_it:
>  		bh->inreq_busy = 0;
>  		bh->outreq_busy = 0;
> -		bh->buf = kmalloc(FSG_BUFLEN, GFP_KERNEL);
> +		bh->buf = memalign(CONFIG_SYS_CACHELINE_SIZE, FSG_BUFLEN);
>  		if (unlikely(!bh->buf)) {
>  			rc = -ENOMEM;
>  			goto error_release;
> @@ -2622,7 +2622,7 @@ usb_copy_descriptors(struct usb_descriptor_header
> **src) bytes += (*tmp)->bLength;
>  	bytes += (n_desc + 1) * sizeof(*tmp);
> 
> -	mem = kmalloc(bytes, GFP_KERNEL);
> +	mem = memalign(CONFIG_SYS_CACHELINE_SIZE, bytes);

I wonder, does this align the begining of the buffer as well or only the size? 
Can we rely on the fact the begining is also cacheline-aligned ?

>  	if (!mem)
>  		return NULL;

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements
  2014-02-05  9:10 ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
                     ` (5 preceding siblings ...)
  2014-02-05  9:10   ` [U-Boot] [PATCH v2 6/6] usb:gadget:f_thor: cosmetic: Remove debug memset Lukasz Majewski
@ 2014-02-06  1:24   ` Marek Vasut
  6 siblings, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2014-02-06  1:24 UTC (permalink / raw
  To: u-boot

On Wednesday, February 05, 2014 at 10:10:40 AM, Lukasz Majewski wrote:
> This patch series comprises several improvements for Exynos4 USB code.
> 
> The most notable is transmission speed improvement (measured on Trats):
> From 9.51 MiB/s up to 27 MiB/s
> 
> This is due to UDC driver optimizations.
> 
> Also a code cleanup for THOR gadget has been included.
> 
> Lukasz Majewski (6):
>   usb:gadget:ums: Replace malloc calls with memalign to fix cache
>     buffer alignment
>   usb:udc:samsung: Remove redundant cache operation from Samsung UDC
>     driver
>   usb:udc:samsung: Allow burst transfers for non EP0 endpints
>   usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC
>     driver
>   usb:gadget:f_thor: Allocate request up to THOR_PACKET_SIZE not
>     ep->maxpacket
>   usb:gadget:f_thor: cosmetic: Remove debug memset
> 
>  drivers/usb/gadget/f_mass_storage.c       |    4 +-
>  drivers/usb/gadget/f_thor.c               |    4 +-
>  drivers/usb/gadget/s3c_udc_otg.c          |   19 +++----
>  drivers/usb/gadget/s3c_udc_otg_xfer_dma.c |   87
> ++++++++++++----------------- include/usb/s3c_udc.h                     | 
>   5 +-
>  5 files changed, 49 insertions(+), 70 deletions(-)

Applied all, but I'd like to know about the memalign() calls' precise alignment 
behavior -- that is, if the begining if cacheline aligned or if it's only the 
size or what -- see my rant /wrt patch #1 .

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment
  2014-02-06  1:20     ` Marek Vasut
@ 2014-02-06  6:33       ` Lukasz Majewski
  2014-02-06  6:41         ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-06  6:33 UTC (permalink / raw
  To: u-boot

Hi Marek,

> On Wednesday, February 05, 2014 at 10:10:41 AM, Lukasz Majewski wrote:
> > Calls to malloc() have been replaced by memalign. It now provides
> > proper buffer alignment.
> > 
> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > Cc: Marek Vasut <marex@denx.de>
> > 
> > ---
> > Changes for v2:
> > - Remove Change-Id.
> > ---
> >  drivers/usb/gadget/f_mass_storage.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/usb/gadget/f_mass_storage.c
> > b/drivers/usb/gadget/f_mass_storage.c index b1fe8bd..f896169 100644
> > --- a/drivers/usb/gadget/f_mass_storage.c
> > +++ b/drivers/usb/gadget/f_mass_storage.c
> > @@ -2515,7 +2515,7 @@ static struct fsg_common
> > *fsg_common_init(struct fsg_common *common, buffhds_first_it:
> >  		bh->inreq_busy = 0;
> >  		bh->outreq_busy = 0;
> > -		bh->buf = kmalloc(FSG_BUFLEN, GFP_KERNEL);
> > +		bh->buf = memalign(CONFIG_SYS_CACHELINE_SIZE,
> > FSG_BUFLEN); if (unlikely(!bh->buf)) {
> >  			rc = -ENOMEM;
> >  			goto error_release;
> > @@ -2622,7 +2622,7 @@ usb_copy_descriptors(struct
> > usb_descriptor_header **src) bytes += (*tmp)->bLength;
> >  	bytes += (n_desc + 1) * sizeof(*tmp);
> > 
> > -	mem = kmalloc(bytes, GFP_KERNEL);
> > +	mem = memalign(CONFIG_SYS_CACHELINE_SIZE, bytes);
> 
> I wonder, does this align the begining of the buffer as well or only
> the size? Can we rely on the fact the begining is also
> cacheline-aligned ?

In this case, the memalign assures, that beginning of the buffer is
aligned to a cache line.

It is the programmer's responsibility to either have "bytes" aligned to
cache line size or use a wrapper like DEFINE_CACHE_ALIGN_BUFFER().

> 
> >  	if (!mem)
> >  		return NULL;
> 
> Best regards,
> Marek Vasut



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment
  2014-02-06  6:33       ` Lukasz Majewski
@ 2014-02-06  6:41         ` Marek Vasut
  2014-02-06  8:16           ` Lukasz Majewski
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2014-02-06  6:41 UTC (permalink / raw
  To: u-boot

On Thursday, February 06, 2014 at 07:33:07 AM, Lukasz Majewski wrote:
> Hi Marek,
> 
> > On Wednesday, February 05, 2014 at 10:10:41 AM, Lukasz Majewski wrote:
> > > Calls to malloc() have been replaced by memalign. It now provides
> > > proper buffer alignment.
> > > 
> > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > Cc: Marek Vasut <marex@denx.de>
> > > 
> > > ---
> > > Changes for v2:
> > > - Remove Change-Id.
> > > ---
> > > 
> > >  drivers/usb/gadget/f_mass_storage.c |    4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/usb/gadget/f_mass_storage.c
> > > b/drivers/usb/gadget/f_mass_storage.c index b1fe8bd..f896169 100644
> > > --- a/drivers/usb/gadget/f_mass_storage.c
> > > +++ b/drivers/usb/gadget/f_mass_storage.c
> > > @@ -2515,7 +2515,7 @@ static struct fsg_common
> > > 
> > > *fsg_common_init(struct fsg_common *common, buffhds_first_it:
> > >  		bh->inreq_busy = 0;
> > >  		bh->outreq_busy = 0;
> > > 
> > > -		bh->buf = kmalloc(FSG_BUFLEN, GFP_KERNEL);
> > > +		bh->buf = memalign(CONFIG_SYS_CACHELINE_SIZE,
> > > FSG_BUFLEN); if (unlikely(!bh->buf)) {
> > > 
> > >  			rc = -ENOMEM;
> > >  			goto error_release;
> > > 
> > > @@ -2622,7 +2622,7 @@ usb_copy_descriptors(struct
> > > usb_descriptor_header **src) bytes += (*tmp)->bLength;
> > > 
> > >  	bytes += (n_desc + 1) * sizeof(*tmp);
> > > 
> > > -	mem = kmalloc(bytes, GFP_KERNEL);
> > > +	mem = memalign(CONFIG_SYS_CACHELINE_SIZE, bytes);
> > 
> > I wonder, does this align the begining of the buffer as well or only
> > the size? Can we rely on the fact the begining is also
> > cacheline-aligned ?
> 
> In this case, the memalign assures, that beginning of the buffer is
> aligned to a cache line.

How so ? :)
[...]

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH v2 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment
  2014-02-06  6:41         ` Marek Vasut
@ 2014-02-06  8:16           ` Lukasz Majewski
  0 siblings, 0 replies; 36+ messages in thread
From: Lukasz Majewski @ 2014-02-06  8:16 UTC (permalink / raw
  To: u-boot

Hi Marek,

> On Thursday, February 06, 2014 at 07:33:07 AM, Lukasz Majewski wrote:
> > Hi Marek,
> > 
> > > On Wednesday, February 05, 2014 at 10:10:41 AM, Lukasz Majewski
> > > wrote:
> > > > Calls to malloc() have been replaced by memalign. It now
> > > > provides proper buffer alignment.
> > > > 
> > > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > > Cc: Marek Vasut <marex@denx.de>
> > > > 
> > > > ---
> > > > Changes for v2:
> > > > - Remove Change-Id.
> > > > ---
> > > > 
> > > >  drivers/usb/gadget/f_mass_storage.c |    4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/usb/gadget/f_mass_storage.c
> > > > b/drivers/usb/gadget/f_mass_storage.c index b1fe8bd..f896169
> > > > 100644 --- a/drivers/usb/gadget/f_mass_storage.c
> > > > +++ b/drivers/usb/gadget/f_mass_storage.c
> > > > @@ -2515,7 +2515,7 @@ static struct fsg_common
> > > > 
> > > > *fsg_common_init(struct fsg_common *common, buffhds_first_it:
> > > >  		bh->inreq_busy = 0;
> > > >  		bh->outreq_busy = 0;
> > > > 
> > > > -		bh->buf = kmalloc(FSG_BUFLEN, GFP_KERNEL);
> > > > +		bh->buf = memalign(CONFIG_SYS_CACHELINE_SIZE,
> > > > FSG_BUFLEN); if (unlikely(!bh->buf)) {
> > > > 
> > > >  			rc = -ENOMEM;
> > > >  			goto error_release;
> > > > 
> > > > @@ -2622,7 +2622,7 @@ usb_copy_descriptors(struct
> > > > usb_descriptor_header **src) bytes += (*tmp)->bLength;
> > > > 
> > > >  	bytes += (n_desc + 1) * sizeof(*tmp);
> > > > 
> > > > -	mem = kmalloc(bytes, GFP_KERNEL);
> > > > +	mem = memalign(CONFIG_SYS_CACHELINE_SIZE, bytes);
> > > 
> > > I wonder, does this align the begining of the buffer as well or
> > > only the size? Can we rely on the fact the begining is also
> > > cacheline-aligned ?
> > 
> > In this case, the memalign assures, that beginning of the buffer is
> > aligned to a cache line.
> 
> How so ? :)

I'm just following what is written at ./include/malloc.h  :-)

memalign(size_t alignment, size_t n);
     Return a pointer to a newly allocated chunk of n bytes, aligned
     in accord with the alignment argument, which must be a power of
     two.

Usage of memalign seemed to fix the wrong cache alignment buffer
allocation issue in a code which I know.

> [...]
> 
> Best regards,
> Marek Vasut



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2014-02-06  8:16 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-31 12:16 [U-Boot] [PATCH 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
2014-01-31 12:16 ` [U-Boot] [PATCH 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment Lukasz Majewski
2014-02-01  2:48   ` Marek Vasut
2014-02-01  9:10     ` Lukasz Majewski
2014-01-31 12:16 ` [U-Boot] [PATCH 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver Lukasz Majewski
2014-02-01  2:50   ` Marek Vasut
2014-02-01  9:56     ` Lukasz Majewski
2014-02-01 22:49       ` Marek Vasut
2014-02-03  8:05         ` Lukasz Majewski
2014-02-03 18:06           ` Marek Vasut
2014-02-04  6:23             ` Lukasz Majewski
2014-01-31 12:16 ` [U-Boot] [PATCH 3/6] usb:udc:samsung: Allow burst transfers for non EP0 endpints Lukasz Majewski
2014-01-31 12:16 ` [U-Boot] [PATCH 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver Lukasz Majewski
2014-02-01  2:55   ` Marek Vasut
2014-02-01 11:05     ` Lukasz Majewski
2014-02-01 22:55       ` Marek Vasut
2014-02-03 11:06         ` Lukasz Majewski
2014-02-03 18:11           ` Marek Vasut
2014-02-04  7:29             ` Lukasz Majewski
2014-02-04 20:21               ` Marek Vasut
2014-02-04 21:49                 ` Lukasz Majewski
2014-02-05  2:35                   ` Marek Vasut
2014-01-31 12:16 ` [U-Boot] [PATCH 5/6] usb:gadget:f_thor: Allocate request up to THOR_PACKET_SIZE not ep->maxpacket Lukasz Majewski
2014-01-31 12:16 ` [U-Boot] [PATCH 6/6] usb:gadget:f_thor: cosmetic: Remove debug memset Lukasz Majewski
2014-02-05  9:10 ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Lukasz Majewski
2014-02-05  9:10   ` [U-Boot] [PATCH v2 1/6] usb:gadget:ums: Replace malloc calls with memalign to fix cache buffer alignment Lukasz Majewski
2014-02-06  1:20     ` Marek Vasut
2014-02-06  6:33       ` Lukasz Majewski
2014-02-06  6:41         ` Marek Vasut
2014-02-06  8:16           ` Lukasz Majewski
2014-02-05  9:10   ` [U-Boot] [PATCH v2 2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver Lukasz Majewski
2014-02-05  9:10   ` [U-Boot] [PATCH v2 3/6] usb:udc:samsung: Allow burst transfers for non EP0 endpints Lukasz Majewski
2014-02-05  9:10   ` [U-Boot] [PATCH v2 4/6] usb:udc:samsung: Zero copy approach for data passed to Samsung's UDC driver Lukasz Majewski
2014-02-05  9:10   ` [U-Boot] [PATCH v2 5/6] usb:gadget:f_thor: Allocate request up to THOR_PACKET_SIZE not ep->maxpacket Lukasz Majewski
2014-02-05  9:10   ` [U-Boot] [PATCH v2 6/6] usb:gadget:f_thor: cosmetic: Remove debug memset Lukasz Majewski
2014-02-06  1:24   ` [U-Boot] [PATCH v2 0/6] usb:samsung: Exynos4 SoC USB code improvements Marek Vasut

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.