From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1031648AbbEERyG (ORCPT );
	Tue, 5 May 2015 13:54:06 -0400
Received: from mail-wi0-f182.google.com ([209.85.212.182]:34288 "EHLO
	mail-wi0-f182.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1030821AbbEERyC (ORCPT );
	Tue, 5 May 2015 13:54:02 -0400
From: Ingo Molnar
To: linux-kernel@vger.kernel.org
Cc: Andy Lutomirski, Borislav Petkov, Dave Hansen, Fenghua Yu,
	"H. Peter Anvin", Linus Torvalds, Oleg Nesterov, Thomas Gleixner
Subject: [PATCH 148/208] x86/fpu: Optimize fpu_copy() some more on lazy switching systems
Date: Tue, 5 May 2015 19:50:40 +0200
Message-Id: <1430848300-27877-70-git-send-email-mingo@kernel.org>
X-Mailer: git-send-email 2.1.0
In-Reply-To: <1430848300-27877-1-git-send-email-mingo@kernel.org>
References: <1430848300-27877-1-git-send-email-mingo@kernel.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

The current fpu_copy() code on lazy switching CPUs always saves into
the current fpstate and then copies it over into the child context:

	preempt_disable();
	if (!copy_fpregs_to_fpstate(src_fpu))
		fpregs_deactivate(src_fpu);
	preempt_enable();
	memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);

That memcpy() can be avoided on all lazy switching setups except
really old FNSAVE-only systems: change fpu_copy() to directly save
into the child context, for both the lazy and the eager context
switching case.

Note that we still have to do a memcpy() back into the parent context
in the FNSAVE case, but this won't be executed on the majority of x86
systems that got built in the last 10 years or so.

Reviewed-by: Borislav Petkov
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Thomas Gleixner
Signed-off-by: Ingo Molnar
---
 arch/x86/kernel/fpu/core.c | 35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7fdeabc1f2af..8ae4c2450c2b 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -220,16 +220,35 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 {
 	WARN_ON(src_fpu != &current->thread.fpu);
 
-	if (use_eager_fpu()) {
+	/*
+	 * Don't let 'init optimized' areas of the XSAVE area
+	 * leak into the child task:
+	 */
+	if (use_eager_fpu())
 		memset(&dst_fpu->state.xsave, 0, xstate_size);
-		copy_fpregs_to_fpstate(dst_fpu);
-	} else {
-		preempt_disable();
-		if (!copy_fpregs_to_fpstate(src_fpu))
-			fpregs_deactivate(src_fpu);
-		preempt_enable();
-		memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);
+
+	/*
+	 * Save current FPU registers directly into the child
+	 * FPU context, without any memory-to-memory copying.
+	 *
+	 * If the FPU context got destroyed in the process (FNSAVE
+	 * done on old CPUs) then copy it back into the source
+	 * context and mark the current task for lazy restore.
+	 *
+	 * We have to do all this with preemption disabled,
+	 * mostly because of the FNSAVE case, because in that
+	 * case we must not allow preemption in the window
+	 * between the FNSAVE and us marking the context lazy.
+	 *
+	 * It shouldn't be an issue as even FNSAVE is plenty
+	 * fast in terms of critical section length.
+	 */
+	preempt_disable();
+	if (!copy_fpregs_to_fpstate(dst_fpu)) {
+		memcpy(&src_fpu->state, &dst_fpu->state, xstate_size);
+		fpregs_deactivate(src_fpu);
 	}
+	preempt_enable();
 }
 
 int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
-- 
2.1.0
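
For reference, this is roughly how fpu_copy() reads with the hunk above
applied, reconstructed from the diff alone; the condensed comment is
mine, and use_eager_fpu(), copy_fpregs_to_fpstate(), fpregs_deactivate()
and xstate_size are assumed to be defined elsewhere in the kernel's FPU
code, as in the original file:

	static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
	{
		WARN_ON(src_fpu != &current->thread.fpu);

		/*
		 * Don't let 'init optimized' areas of the XSAVE area
		 * leak into the child task:
		 */
		if (use_eager_fpu())
			memset(&dst_fpu->state.xsave, 0, xstate_size);

		/*
		 * Save the current registers directly into the child
		 * context. If FNSAVE destroyed them in the process,
		 * copy them back into the parent and mark it for lazy
		 * restore; preemption stays disabled across the
		 * FNSAVE-and-mark-lazy window.
		 */
		preempt_disable();
		if (!copy_fpregs_to_fpstate(dst_fpu)) {
			memcpy(&src_fpu->state, &dst_fpu->state, xstate_size);
			fpregs_deactivate(src_fpu);
		}
		preempt_enable();
	}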