On 2020/7/17 下午10:18, Peter Xu wrote:
On Thu, Jul 16, 2020 at 10:54:31AM +0800, Jason Wang wrote:
> On 2020/7/16 上午9:00, Peter Xu wrote:
>> On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote:
>>> On 2020/7/10 下午9:30, Peter Xu wrote:
>>>> On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote:
>>>>> On 2020/7/9 下午10:10, Peter Xu wrote:
>>>>>> On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote:
>>>>>>>>> - If we care the performance, it's better to
implement the MAP event for
>>>>>>>>> vhost, otherwise it could be a lot of IOTLB miss
>>>>>>>> I feel like these are two things.
>>>>>>>>
>>>>>>>> So far what we are talking about is whether vt-d should
have knowledge about
>>>>>>>> what kind of events one iommu notifier is interested in.
I still think we
>>>>>>>> should keep this as answered in question 1.
>>>>>>>>
>>>>>>>> The other question is whether we want to switch vhost
from UNMAP to MAP/UNMAP
>>>>>>>> events even without vDMA, so that vhost can establish the
mapping even before
>>>>>>>> IO starts. IMHO it's doable, but only if the guest
runs DPDK workloads. When
>>>>>>>> the guest is using dynamic iommu page mappings, I feel
like that can be even
>>>>>>>> slower, because then the worst case is for each IO
we'll need to vmexit twice:
>>>>>>>>
>>>>>>>> - The first vmexit caused by an invalidation to
MAP the page tables, so vhost
>>>>>>>> will setup the page table before IO starts
>>>>>>>>
>>>>>>>> - IO/DMA triggers and completes
>>>>>>>>
>>>>>>>> - The second vmexit caused by another invalidation
to UNMAP the page tables
>>>>>>>>
>>>>>>>> So it seems to be worse than when vhost only uses UNMAP
like right now. At
>>>>>>>> least we only have one vmexit (when UNMAP). We'll
have a vhost translate()
>>>>>>>> request from kernel to userspace, but IMHO that's
cheaper than the vmexit.
>>>>>>> Right but then I would still prefer to have another
notifier.
>>>>>>>
>>>>>>> Since vtd_page_walk has nothing to do with device IOTLB.
IOMMU have a
>>>>>>> dedicated command for flushing device IOTLB. But the check
for
>>>>>>> vtd_as_has_map_notifier is used to skip the device which can
do demand
>>>>>>> paging via ATS or device specific way. If we have two
different notifiers,
>>>>>>> vhost will be on the device iotlb notifier so we don't
need it at all?
>>>>>> But we can still have iommu notifier that only registers to UNMAP
even after we
>>>>>> introduce dev-iotlb notifier? We don't want to do page walk
for them as well.
>>>>>> TCG should be the only one so far, but I don't know.. maybe
there can still be
>>>>>> new ones?
>>>>> I think you're right. But looking at the codes, it looks like the
check of
>>>>> vtd_as_has_map_notifier() was only used in:
>>>>>
>>>>> 1) vtd_iommu_replay()
>>>>> 2) vtd_iotlb_page_invalidate_notify() (PSI)
>>>>>
>>>>> For the replay, it's expensive anyhow. For PSI, I think it's
just about one
>>>>> or few mappings, not sure it will have obvious performance impact.
>>>>>
>>>>> And I had two questions:
>>>>>
>>>>> 1) The codes doesn't check map for DSI or GI, does this match
what spec
>>>>> said? (It looks to me the spec is unclear in this part)
>>>> Both DSI/GI should cover maps too? E.g. vtd_sync_shadow_page_table() in
>>>> vtd_iotlb_domain_invalidate().
>>> I meant the code doesn't check whether there's an MAP notifier :)
>> It's actually checked, because it loops over vtd_as_with_notifiers, and only
>> MAP notifiers register to that. :)
>
> I may miss something but I don't find the code to block UNMAP notifiers?
>
> vhost_iommu_region_add()
> memory_region_register_iommu_notifier()
> memory_region_update_iommu_notify_flags()
> imrc->notify_flag_changed()
> vtd_iommu_notify_flag_changed()
>
> ?
Yeah I think you're right. I might have confused with some previous
implementations. Maybe we should also do similar thing for DSI/GI just like
what we do in PSI.
Ok.
>>>>> 2) for the replay() I don't see other implementations (either
spapr or
>>>>> generic one) that did unmap (actually they skip unmap explicitly),
any
>>>>> reason for doing this in intel IOMMU?
>>>> I could be wrong, but I'd guess it's because vt-d implemented the
caching mode
>>>> by leveraging the same invalidation strucuture, so it's harder to
make all
>>>> things right (IOW, we can't clearly identify MAP with UNMAP when we
receive an
>>>> invalidation request, because MAP/UNMAP requests look the same).
>>>>
>>>> I didn't check others, but I believe spapr is doing it differently by
using
>>>> some hypercalls to deliver IOMMU map/unmap requests, which seems a bit
close to
>>>> what virtio-iommu is doing. Anyway, the point is if we have explicit
MAP/UNMAP
>>>> from the guest, logically the replay indeed does not need to do any
unmap
>>>> because we don't need to call replay() on an already existing device
but only
>>>> for e.g. hot plug.
>>> But this looks conflict with what memory_region_iommu_replay( ) did, for
>>> IOMMU that doesn't have a replay method, it skips UNMAP request:
>>>
>>> for (addr = 0; addr < memory_region_size(mr); addr += granularity)
{
>>> iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE,
n->iommu_idx);
>>> if (iotlb.perm != IOMMU_NONE) {
>>> n->notify(n, &iotlb);
>>> }
>>>
>>> I guess there's no knowledge of whether guest have an explicit MAP/UMAP
for
>>> this generic code. Or replay implies that guest doesn't have explicit
>>> MAP/UNMAP?
>> I think it matches exactly with a hot plug case? Note that when IOMMU_NONE
>> could also mean the translation does not exist. So it's actually trying to
map
>> everything that can be translated and then notify().
>
> Yes, so the question is what's the assumption before calling
> memory_region_iommu_replay(). If it assumes an empty mapping, there's
> probably no need for unamp.
The only caller of memory_region_iommu_replay() is vfio_listener_region_add(),
when there's a new vIOMMU memory region detected. So IIUC that guarantees the
previous state should be all empty.
Right, so there's no need to deal with unmap in vtd's replay
implementation (as what generic one did).
Thanks
Thanks,