* Write combining support in the upstream kernel
@ 2013-09-02  7:15 Jack Morgenstein
From: Jack Morgenstein @ 2013-09-02  7:15 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, ogerlitz-VPRAkNaXOzVWk0Htik3J/w,
	eli-VPRAkNaXOzVWk0Htik3J/w

Hi Roland,

This is a re-posting (and rewording) of a question I sent you on July 6,
2009.

I've been looking at the write-combining support in the kernel,
and it looks good. The caller simply invokes pgprot_writecombine() and
if write combining is available, the region is mapped for it (if WC is
not available, the region is mapped as non-cached).

However, the API silently activates write combining without providing
any architecture-independent means of knowing whether write combining
is enabled or not. 

For example, on x86, pgprot_writecombine() is as follows:
 pgprot_t pgprot_writecombine(pgprot_t prot)
 {
         if (pat_enabled)
                 return __pgprot(pgprot_val(prot) | _PAGE_CACHE_WC);
         else
                 return pgprot_noncached(prot);
 }

Note that pat_enabled is an architecture-dependent variable!

Silent activation of WC is fine when write combining merely makes some
feature X work better and improves the driver's performance. The driver
simply calls pgprot_writecombine(): if WC is available it is activated
for the region; if it is not available, the region is mapped as
non-cached.

However, what about situations where we wish to enable feature X ONLY
if write combining is available? In this case the driver cannot simply
call pgprot_writecombine(), because it has no way of knowing whether
write combining is actually in use.

The required logic here is:
	if (write-combining is available)
		Activate feature X, and use pgprot_writecombine() for
		its regions;
	else
		Do NOT activate feature X.

In MLNX_OFED, to get around this problem, I introduced some
architecture-dependent wrapper functions. These functions simply
hard-code, per architecture, whether write combining is considered
enabled:

#include <linux/pci.h>
#include "wc.h"

#if defined(__i386__) || defined(__x86_64__)

pgprot_t pgprot_wc(pgprot_t _prot)
{
        return pgprot_writecombine(_prot);
}

int mlx4_wc_enabled(void)
{
        return 1;
}

#elif defined(CONFIG_PPC64)

pgprot_t pgprot_wc(pgprot_t _prot)
{
        return __pgprot((pgprot_val(_prot) | _PAGE_NO_CACHE) &
                                     ~(pgprot_t)_PAGE_GUARDED);
}

int mlx4_wc_enabled(void)
{
        return 1;
}

#else   /* neither x86 nor PPC64 */

pgprot_t pgprot_wc(pgprot_t _prot)
{
        return pgprot_noncached(_prot);
}

int mlx4_wc_enabled(void)
{
        return 0;
}

#endif

I then use mlx4_wc_enabled() to determine whether or not to use
blueflame (which is feature X in this case):

static struct ib_ucontext *mlx4_ib_alloc_ucontext(struct ib_device *ibdev,
						  struct ib_udata *udata)
{

....
===>	if (mlx4_wc_enabled()) {
		resp.bf_reg_size      = dev->dev->caps.bf_reg_size;
		resp.bf_regs_per_page = dev->dev->caps.bf_regs_per_page;
	} else {
		resp.bf_reg_size      = 0;
		resp.bf_regs_per_page = 0;
	}

I would like, though, to have the capability in the kernel API to
determine if write-combining is available on a given host.

I thought of possibly comparing the result returned by
pgprot_writecombine(prot) to that returned by pgprot_noncached(prot)
-- if they are identical, then assume that write-combining is not
supported. (pgprot_noncached() is the default definition of
pgprot_writecombine() if the architecture does not provide its own --
see include/asm-generic/pgtable.h).

This has a problem, however: I have no way of determining what value
of "prot" to use for the comparison. Some architectures may inspect
bits of the prot value to decide, call by call, whether or not to use
write combining (i.e., pgprot_writecombine(prot) could fall back to
pgprot_noncached(prot) if certain bits are set in prot, and return a
write-combining prot value if those bits are not set).

Using a zeroed-out pgprot value in the comparison, for example, may
not be appropriate: we might allow blueflame when it should not be
allowed, or prevent blueflame when it should be allowed.
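
To make the concern concrete, here is a minimal user-space sketch (not
kernel code) of the comparison heuristic and of how it could mislead.
Everything prefixed mock_/MOCK_ is hypothetical: pgprot_t is modelled as
a plain struct, the bit values are invented, and the "architecture"
that refuses WC when MOCK_PAGE_SPECIAL is set is imagined purely for
illustration.

```c
#include <stdbool.h>

/* User-space stand-in for the kernel's pgprot_t (hypothetical model). */
typedef struct { unsigned long pgprot; } pgprot_t;

#define mock_pgprot(x)     ((pgprot_t){ (x) })
#define mock_pgprot_val(p) ((p).pgprot)

/* Invented protection bits; real architectures use different values. */
#define MOCK_PAGE_UC      0x1UL  /* uncached */
#define MOCK_PAGE_WC      0x2UL  /* write-combining */
#define MOCK_PAGE_SPECIAL 0x4UL  /* imagined bit for which WC is refused */

static pgprot_t mock_pgprot_noncached(pgprot_t prot)
{
	return mock_pgprot(mock_pgprot_val(prot) | MOCK_PAGE_UC);
}

/* Imagined arch: WC is supported unless MOCK_PAGE_SPECIAL is set. */
static pgprot_t mock_pgprot_writecombine(pgprot_t prot)
{
	if (mock_pgprot_val(prot) & MOCK_PAGE_SPECIAL)
		return mock_pgprot_noncached(prot);
	return mock_pgprot(mock_pgprot_val(prot) | MOCK_PAGE_WC);
}

/* The heuristic: call WC "available" if the two helpers disagree. */
static bool wc_available_for(pgprot_t prot)
{
	return mock_pgprot_val(mock_pgprot_writecombine(prot)) !=
	       mock_pgprot_val(mock_pgprot_noncached(prot));
}
```

On this imagined architecture, wc_available_for(mock_pgprot(0)) is true
while wc_available_for(mock_pgprot(MOCK_PAGE_SPECIAL)) is false, so the
answer depends entirely on which prot value is probed.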

Do you have any ideas for how to determine if in fact write-combining
is available? How about introducing an external variable (say
extern int write_combining_active) which would be initialized by the
kernel (per architecture) to be 1 or 0? 
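
As a rough illustration of what I mean, here is a user-space sketch
(not kernel code) of such a flag and of a driver gating blueflame on
it. Only write_combining_active comes from the suggestion above; the
mock_ names, the init hook, and the response struct are hypothetical
stand-ins, and how each architecture would really set the flag is left
open.

```c
/* Proposed flag; stays 0 unless the architecture code sets it. */
int write_combining_active;

/* Stand-in for x86's pat_enabled (hypothetical mock, set by the caller). */
static int mock_pat_enabled;

/* What an x86-like arch init hook might do (hypothetical name). */
static void mock_arch_init_write_combining(void)
{
	write_combining_active = mock_pat_enabled ? 1 : 0;
}

/* Reduced stand-in for the ucontext response fields shown earlier. */
struct mock_bf_resp {
	unsigned int bf_reg_size;
	unsigned int bf_regs_per_page;
};

/* Driver side: advertise blueflame registers only when WC is active. */
static void mock_fill_bf_resp(struct mock_bf_resp *resp,
			      unsigned int reg_size,
			      unsigned int regs_per_page)
{
	if (write_combining_active) {
		resp->bf_reg_size      = reg_size;
		resp->bf_regs_per_page = regs_per_page;
	} else {
		resp->bf_reg_size      = 0;
		resp->bf_regs_per_page = 0;
	}
}
```

With this, the driver no longer needs its own per-architecture table;
it only reads the flag that the architecture code initialized.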

-Jack
