From mboxrd@z Thu Jan 1 00:00:00 1970
From: George Dunlap
Subject: Re: [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
Date: Wed, 15 Jul 2015 14:40:11 +0100
Message-ID:
References: <1436420047-25356-1-git-send-email-tiejun.chen@intel.com> <1436420047-25356-7-git-send-email-tiejun.chen@intel.com> <55A3D5600200007800090330@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <55A3D5600200007800090330@mail.emea.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper, Ian Jackson, "xen-devel@lists.xen.org", Tiejun Chen, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich wrote:
> Therefore I'll not make any further comments on the rest of the
> patch, but instead outline an allocation model that I think would
> fit our needs: Subject to the constraints mentioned above, set up
> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
> bits], i.e. reasonably small a memory block). Each bit represents a
> page usable for MMIO: First of all you remove the range from
> PCI_MEM_END upwards. Then remove all RDM pages. Now do a
> first pass over all devices, allocating (in the bitmap) space for only
> the 32-bit MMIO BARs, starting with the biggest one(s), by finding
> a best fit (i.e. preferably a range not usable by any bigger BAR)
> from top down. For example, if you have available
>
> [f0000000,f8000000)
> [f9000000,f9001000)
> [fa000000,fa003000)
> [fa010000,fa012000)
>
> and you're looking for a single page slot, you should end up
> picking fa002000.
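[For illustration only: a minimal C sketch of the bitmap model quoted above. All names are invented, not hvmloader's actual interface; the bitmap covers the top 2Gb (one bit per 4k page, 2^19 bits = 64k bytes), and the "best fit ... preferably a range not usable by any bigger BAR" rule is glossed to a plain smallest-run-first search, so for the quoted ranges a single-page request lands in the exact one-page hole at f9000000 rather than at fa002000.]

```c
#include <stdint.h>

/* One bit per 4k page in [2Gb, 4Gb); set = usable for MMIO. */
#define PAGE_SHIFT   12
#define BITMAP_BASE  0x80000000ULL
#define NR_PAGES     (1u << 19)
static uint8_t mmio_bitmap[NR_PAGES / 8];   /* 64k bytes */

static unsigned int page_of(uint64_t addr)
{
    return (addr - BITMAP_BASE) >> PAGE_SHIFT;
}

static int page_usable(unsigned int pg)
{
    return mmio_bitmap[pg / 8] & (1u << (pg % 8));
}

static void mark_pages(uint64_t start, uint64_t end, int usable)
{
    for ( unsigned int pg = page_of(start); pg < page_of(end); pg++ )
    {
        if ( usable )
            mmio_bitmap[pg / 8] |= 1u << (pg % 8);
        else
            mmio_bitmap[pg / 8] &= ~(1u << (pg % 8));
    }
}

/* Allocate nr contiguous pages from the smallest run of set bits that
 * fits, carving from the run's top end (the top-down preference).
 * Returns the allocated address, or 0 if nothing fits. */
static uint64_t bitmap_alloc(unsigned int nr)
{
    unsigned int best_start = 0, best_len = 0, pg = 0;

    while ( pg < NR_PAGES )
    {
        if ( !page_usable(pg) )
        {
            pg++;
            continue;
        }
        unsigned int start = pg, len = 0;       /* maximal free run */
        while ( pg < NR_PAGES && page_usable(pg) )
            pg++, len++;
        /* Smallest run that is still big enough; later run wins ties. */
        if ( len >= nr && (best_len == 0 || len <= best_len) )
            best_start = start, best_len = len;
    }
    if ( best_len == 0 )
        return 0;
    unsigned int alloc = best_start + best_len - nr;
    mark_pages(BITMAP_BASE + ((uint64_t)alloc << PAGE_SHIFT),
               BITMAP_BASE + ((uint64_t)(alloc + nr) << PAGE_SHIFT), 0);
    return BITMAP_BASE + ((uint64_t)alloc << PAGE_SHIFT);
}

/* Set up the four ranges from the example above. */
static void setup_example(void)
{
    mark_pages(0xf0000000, 0xf8000000, 1);
    mark_pages(0xf9000000, 0xf9001000, 1);
    mark_pages(0xfa000000, 0xfa003000, 1);
    mark_pages(0xfa010000, 0xfa012000, 1);
}
```

With this simplification, calling setup_example() and then bitmap_alloc(1) returns 0xf9000000: the one-page range is the smallest run that fits.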
>
> After this pass you should be able to do RAM relocation in a
> single attempt just like we do today (you may still grow the MMIO
> window if you know you need to and can fit some of the 64-bit
> BARs in there, subject to said constraints; this is in an attempt
> to help OSes not comfortable with 64-bit resources).
>
> In a 2nd pass you'd then assign 64-bit resources: If you can fit
> them below 4G (you still have the bitmap left of what you've got
> available), put them there. Allocation strategy could be the same
> as above (biggest first), perhaps allowing for some factoring out
> of logic, but here smallest first probably could work equally well.
> The main thought to decide between the two is whether it is
> better to fit as many (small) or as big (in total) as possible a set
> under 4G. I'd generally expect the former (as many as possible,
> leaving only a few huge ones to go above 4G) to be the better
> approach, but that's more a gut feeling than based on hard data.

I agree that it would be more sensible for hvmloader to make a "plan"
first, then do the memory relocation (if possible) in a single pass,
and finally go through and update the device BARs according to that
"plan".

However, I don't see how having a bitmap really helps in this case.  I
would think having a list of free ranges (perhaps aligned by powers of
two?), sorted small->large, makes the most sense.

So suppose we had the above example, but with the range
[fa000000,fa005000) instead, and we're looking for a 4-page region.
Then our "free list" initially would look like this:

 [f9000000,f9001000)
 [fa010000,fa012000)
 [fa000000,fa005000)
 [f0000000,f8000000)

After skipping the first two because they aren't big enough, we'd take
0x4000 from the third one, placing the BAR at [fa000000,fa004000), and
putting the remainder [fa004000,fa005000) back on the free list in
order, thus:

 [f9000000,f9001000)
 [fa004000,fa005000)
 [fa010000,fa012000)
 [f0000000,f8000000)

If we got to the end and hadn't found a region large enough, *and* we
could still expand the MMIO hole, we could lower pci_mem_start until it
could fit.

What do you think?

 -George
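[For illustration only: a minimal C sketch of the free-range-list walk described above, using an invented fixed-size array rather than anything from hvmloader. Requests are taken from the bottom of the smallest range that can hold them, and any remainder is reinserted in size order, reproducing the worked example exactly.]

```c
#include <assert.h>
#include <stdint.h>

struct range { uint64_t start, end; };      /* half-open: [start, end) */

#define MAX_RANGES 16
static struct range free_list[MAX_RANGES];  /* sorted smallest-first */
static unsigned int nr_ranges;

static uint64_t range_size(const struct range *r)
{
    return r->end - r->start;
}

/* Insert a range, keeping the list sorted small->large. */
static void insert_range(uint64_t start, uint64_t end)
{
    unsigned int i = nr_ranges++;

    assert(nr_ranges <= MAX_RANGES);
    while ( i > 0 && range_size(&free_list[i - 1]) > end - start )
    {
        free_list[i] = free_list[i - 1];    /* shift bigger entries up */
        i--;
    }
    free_list[i] = (struct range){ start, end };
}

/* Take `size` bytes from the first (i.e. smallest) range that can hold
 * it; the remainder, if any, goes back on the list in size order.
 * Returns the allocated start address, or 0 if nothing fits. */
static uint64_t alloc_range(uint64_t size)
{
    for ( unsigned int i = 0; i < nr_ranges; i++ )
    {
        struct range r = free_list[i];

        if ( range_size(&r) < size )
            continue;                       /* too small: skip it */
        for ( unsigned int j = i; j + 1 < nr_ranges; j++ )
            free_list[j] = free_list[j + 1];
        nr_ranges--;
        if ( range_size(&r) > size )
            insert_range(r.start + size, r.end);
        return r.start;
    }
    return 0;                               /* caller may grow the hole */
}

/* Set up the four ranges from the example above. */
static void setup_example(void)
{
    insert_range(0xf0000000, 0xf8000000);
    insert_range(0xf9000000, 0xf9001000);
    insert_range(0xfa000000, 0xfa005000);
    insert_range(0xfa010000, 0xfa012000);
}
```

Calling setup_example() and then alloc_range(0x4000) returns 0xfa000000, leaving the remainder [fa004000,fa005000) as the second-smallest entry, just as in the lists above; the "expand the MMIO hole" fallback would hang off the 0-return case.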