[ 00/48] 2.6.32.66-longterm review

* [ 00/48] 2.6.32.66-longterm review
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 01/48] x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs Willy Tarreau
                   ` (46 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC
  To: linux-kernel, stable

This is the start of the longterm review cycle for the 2.6.32.66 release.
All patches will be posted as a response to this one. If anyone has any
issue with these being applied, please let me know. If anyone is a
maintainer of the proper subsystem, and wants to add a Signed-off-by: line
to the patch, please respond with it. If anyone thinks some important
patches are missing and should be added prior to the release, please
report them quickly with their respective mainline commit IDs.

Responses should be made by Thu May 21 10:05:29 CEST 2015.
Anything received after that time might be too late. If someone
wants a bit more time for a deeper review, please let me know.

NOTE: 2.6.32 is approaching end of support. There will probably be one
or maybe two other versions issued in the next 3-6 months, and that will
be all, at least for me. Adding to this the time it can take to validate
and deploy in some environments, it probably makes sense to start to
think about switching to another longterm branch. 3.2 and 3.4 are good
candidates for those seeking rock-solid versions. Longterm branches and
their projected EOLs are listed here:

     https://www.kernel.org/category/releases.html

The whole patch series can be found in one patch at:
     https://kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.66-rc1.gz

The shortlog and diffstat are appended below.

Thanks,
Willy

===============

Al Viro (1):
      rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()

Alexey Khoroshilov (1):
      sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND)

Alexey Kodanev (1):
      net: sysctl_net_core: check SNDBUF and RCVBUF for min length

Andy Lutomirski (10):
      x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
      x86/tls: Validate TLS entries to protect espfix
      x86, tls, ldt: Stop checking lm in LDT_empty
      x86, tls: Interpret an all-zero struct user_desc as "no segment"
      x86_64, switch_to(): Load TLS descriptors before switching DS and ES
      x86/tls: Disallow unusual TLS segments
      x86/tls: Don't validate lm in set_thread_area() after all
      x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
      x86_64, vdso: Fix the vdso address randomization algorithm
      x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization

Ani Sinha (1):
      net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland.

Arnd Bergmann (1):
      rds: avoid potential stack overflow

Ben Hutchings (1):
      splice: Apply generic position and size checks to each write

Benjamin Coddington (1):
      lockd: Try to reconnect if statd has moved

Borislav Petkov (1):
      x86, cpu, amd: Add workaround for family 16h, erratum 793

D.S. Ljungmark (1):
      ipv6: Don't reduce hop limit for an interface

Dan Carpenter (1):
      ipvs: uninitialized data with IP_VS_IPV6

Daniel Borkmann (2):
      net: sctp: fix memory leak in auth key management
      net: sctp: fix slab corruption from use after free on INIT collisions

Eli Cohen (1):
      IB/core: Avoid leakage from kernel to user space

Eric Dumazet (2):
      tcp: make connect() mem charging friendly
      tcp: avoid looping in tcp_send_fin()

Florian Westphal (2):
      netfilter: conntrack: disable generic tracking for known protocols
      ppp: deflate: never return len larger than output buffer

Hector Marco-Gisbert (1):
      ASLR: fix stack randomization on 64-bit systems

Ian Abbott (1):
      spi: spidev: fix possible arithmetic overflow for multi-transfer message

Ignacy Gawędzki (1):
      ematch: Fix auto-loading of ematch modules.

Jan Kara (3):
      isofs: Fix infinite looping over CE entries
      isofs: Fix unchecked printing of ER records
      scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND

Jann Horn (1):
      fs: take i_mutex during prepare_binprm for set[ug]id executables

Jiri Pirko (1):
      ipv4: fix nexthop attlen check in fib_nh_match

Kirill A. Shutemov (1):
      pagemap: do not leak physical addresses to non-privileged userspace

Mathias Krause (1):
      posix-timers: Fix stack info leak in timer_create()

Matthew Thode (1):
      net: reject creation of netdev names with colons

Michal Kubeček (1):
      udp: only allow UFO for packets from SOCK_DGRAM sockets

Robert Baldyga (1):
      serial: samsung: wait for transfer completion before clock disable

Sasha Levin (2):
      net: llc: use correct size for sysctl timeout entries
      net: rds: use correct size for max unacked packets and bytes

Sebastian Pöhn (1):
      ip_forward: Drop frames with attached skb->sk

Sergei Antonov (1):
      hfsplus: fix B-tree corruption after insertion at position 0

Shachar Raindel (1):
      IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic

Shai Fultheim (1):
      x86: Conditionally update time when ack-ing pending irqs

Steffen Klassert (1):
      ipv4: Don't use ufo handling on later transformed packets

bingtian.ly@taobao.com (1):
      net: avoid to hang up on sending due to sysctl configuration overflow.

 arch/x86/include/asm/desc.h                |  20 ++++--
 arch/x86/include/asm/ldt.h                 |   7 ++
 arch/x86/include/asm/msr-index.h           |   1 +
 arch/x86/kernel/apic/apic.c                |  12 ++--
 arch/x86/kernel/cpu/amd.c                  |  10 +++
 arch/x86/kernel/entry_64.S                 |  13 ++--
 arch/x86/kernel/kvm.c                      |   9 ++-
 arch/x86/kernel/kvmclock.c                 |   1 -
 arch/x86/kernel/process_64.c               | 101 +++++++++++++++++++++--------
 arch/x86/kernel/tls.c                      |  62 +++++++++++++++++-
 arch/x86/kernel/traps.c                    |   4 +-
 arch/x86/mm/mmap.c                         |   6 +-
 arch/x86/vdso/vma.c                        |  36 +++++++---
 block/scsi_ioctl.c                         |   3 +-
 drivers/infiniband/core/umem.c             |   8 +++
 drivers/infiniband/core/uverbs_main.c      |   1 +
 drivers/net/ppp_deflate.c                  |   2 +-
 drivers/serial/samsung.c                   |   4 ++
 drivers/spi/spidev.c                       |   5 +-
 fs/binfmt_elf.c                            |   5 +-
 fs/exec.c                                  |  65 ++++++++++++-------
 fs/hfsplus/brec.c                          |  20 +++---
 fs/isofs/rock.c                            |   9 +++
 fs/lockd/mon.c                             |   6 ++
 fs/ocfs2/file.c                            |   8 ++-
 fs/proc/task_mmu.c                         |  10 +++
 fs/splice.c                                |   8 ++-
 kernel/posix-timers.c                      |   1 +
 net/core/dev.c                             |   2 +-
 net/core/sysctl_net_core.c                 |  19 ++++--
 net/ipv4/fib_semantics.c                   |   2 +-
 net/ipv4/ip_forward.c                      |   3 +
 net/ipv4/ip_output.c                       |   3 +-
 net/ipv4/sysctl_net_ipv4.c                 |  13 ++--
 net/ipv4/tcp_output.c                      |  52 ++++++++-------
 net/ipv6/ip6_output.c                      |   3 +-
 net/ipv6/ndisc.c                           |   9 ++-
 net/llc/sysctl_net_llc.c                   |   8 +--
 net/netfilter/ipvs/ip_vs_ftp.c             |  10 +--
 net/netfilter/nf_conntrack_proto_generic.c |  26 +++++++-
 net/rds/iw_rdma.c                          |  40 +++++++-----
 net/rds/sysctl.c                           |   4 +-
 net/rxrpc/ar-recvmsg.c                     |   2 +-
 net/sched/ematch.c                         |   1 +
 net/sctp/associola.c                       |   1 -
 net/sctp/auth.c                            |   2 -
 net/socket.c                               |   3 +
 sound/oss/sequencer.c                      |  12 +---
 48 files changed, 465 insertions(+), 187 deletions(-)
--




* [ 01/48] x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
  2015-05-15  8:05 ` [ 00/48] 2.6.32.66-longterm review Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 02/48] x86/tls: Validate TLS entries to protect espfix Willy Tarreau
                   ` (45 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC
  To: linux-kernel, stable
  Cc: Andy Lutomirski, Linus Torvalds, Steven Rostedt, Ingo Molnar,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 7ddc6a2199f1da405a2fb68c40db8899b1a8cd87 upstream.

These functions can be executed on the int3 stack, so kprobes
are dangerous. Tracing is probably a bad idea, too.

Fixes: b645af2d5905 ("x86_64, traps: Rework bad_iret")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2:
 - Use __kprobes instead of NOKPROBE_SYMBOL()
 - Don't use __visible]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 8ea4c465ecb59846abed3d000d64b21b8e31aeb0)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/traps.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8a39a6c..b999043 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -493,7 +493,7 @@ dotraplinkage void __kprobes do_int3(struct pt_regs *regs, long error_code)
  * for scheduling or signal handling. The actual stack switch is done in
  * entry.S
  */
-asmlinkage __kprobes struct pt_regs *sync_regs(struct pt_regs *eregs)
+asmlinkage notrace __kprobes struct pt_regs *sync_regs(struct pt_regs *eregs)
 {
 	struct pt_regs *regs = eregs;
 	/* Did already sync */
@@ -518,7 +518,7 @@ struct bad_iret_stack {
 	struct pt_regs regs;
 };
 
-asmlinkage
+asmlinkage notrace __kprobes
 struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 {
 	/*
-- 
1.7.12.2.21.g234cd45.dirty





* [ 02/48] x86/tls: Validate TLS entries to protect espfix
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
  2015-05-15  8:05 ` [ 00/48] 2.6.32.66-longterm review Willy Tarreau
  2015-05-15  8:05 ` [ 01/48] x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 03/48] x86, tls, ldt: Stop checking lm in LDT_empty Willy Tarreau
                   ` (44 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC
  To: linux-kernel, stable
  Cc: Andy Lutomirski, H. Peter Anvin, Konrad Rzeszutek Wilk,
	Linus Torvalds, Willy Tarreau, Ingo Molnar, Ben Hutchings

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 41bdc78544b8a93a9c6814b8bbbfef966272abbe upstream

Installing a 16-bit RW data segment into the GDT defeats espfix.
AFAICT this will not affect glibc, Wine, or dosemu at all.
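
For readers who want to reason about this rule outside the kernel, the following is a minimal userspace sketch of the check described above. It is not the patch code: `struct demo_desc`, `demo_is_empty()` and `demo_tls_desc_okay()` are hypothetical names, and the struct only mirrors the fields of `struct user_desc` that matter here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant fields of struct user_desc. */
struct demo_desc {
	unsigned int base_addr;
	unsigned int limit;
	unsigned int seg_32bit:1;
	unsigned int contents:2;
	unsigned int read_exec_only:1;
	unsigned int limit_in_pages:1;
	unsigned int seg_not_present:1;
	unsigned int useable:1;
};

/* "Empty" request: read-only, not present, every other field zero. */
static bool demo_is_empty(const struct demo_desc *d)
{
	return d->base_addr == 0 && d->limit == 0 && d->contents == 0 &&
	       d->read_exec_only == 1 && d->seg_32bit == 0 &&
	       d->limit_in_pages == 0 && d->seg_not_present == 1 &&
	       d->useable == 0;
}

/* Model of the rule: empty requests pass, 16-bit segments are refused. */
static bool demo_tls_desc_okay(const struct demo_desc *d)
{
	assert(d != NULL);

	if (demo_is_empty(d))
		return true;

	return d->seg_32bit;
}
```

In this model a request to clear a slot is always accepted, while any real segment must have `seg_32bit` set, which mirrors the -EINVAL path added to do_set_thread_area() and regset_tls_set().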

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: security@kernel.org <security@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/tls.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
index bcfec2d..7af7338 100644
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -28,6 +28,21 @@ static int get_free_idx(void)
 	return -ESRCH;
 }
 
+static bool tls_desc_okay(const struct user_desc *info)
+{
+	if (LDT_empty(info))
+		return true;
+
+	/*
+	 * espfix is required for 16-bit data segments, but espfix
+	 * only works for LDT segments.
+	 */
+	if (!info->seg_32bit)
+		return false;
+
+	return true;
+}
+
 static void set_tls_desc(struct task_struct *p, int idx,
 			 const struct user_desc *info, int n)
 {
@@ -67,6 +82,9 @@ int do_set_thread_area(struct task_struct *p, int idx,
 	if (copy_from_user(&info, u_info, sizeof(info)))
 		return -EFAULT;
 
+	if (!tls_desc_okay(&info))
+		return -EINVAL;
+
 	if (idx == -1)
 		idx = info.entry_number;
 
@@ -197,6 +215,7 @@ int regset_tls_set(struct task_struct *target, const struct user_regset *regset,
 {
 	struct user_desc infobuf[GDT_ENTRY_TLS_ENTRIES];
 	const struct user_desc *info;
+	int i;
 
 	if (pos >= GDT_ENTRY_TLS_ENTRIES * sizeof(struct user_desc) ||
 	    (pos % sizeof(struct user_desc)) != 0 ||
@@ -210,6 +229,10 @@ int regset_tls_set(struct task_struct *target, const struct user_regset *regset,
 	else
 		info = infobuf;
 
+	for (i = 0; i < count / sizeof(struct user_desc); i++)
+		if (!tls_desc_okay(info + i))
+			return -EINVAL;
+
 	set_tls_desc(target,
 		     GDT_ENTRY_TLS_MIN + (pos / sizeof(struct user_desc)),
 		     info, count / sizeof(struct user_desc));
-- 
1.7.12.2.21.g234cd45.dirty





* [ 03/48] x86, tls, ldt: Stop checking lm in LDT_empty
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (2 preceding siblings ...)
  2015-05-15  8:05 ` [ 02/48] x86/tls: Validate TLS entries to protect espfix Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 04/48] x86, tls: Interpret an all-zero struct user_desc as "no segment" Willy Tarreau
                   ` (43 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC
  To: linux-kernel, stable
  Cc: Andy Lutomirski, torvalds, Thomas Gleixner, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit e30ab185c490e9a9381385529e0fd32f0a399495 upstream.

32-bit programs don't have an lm bit in their ABI, so they can't
reliably cause LDT_empty to return true without resorting to memset.
They shouldn't need to do this.

This should fix a longstanding, if minor, issue in all 64-bit kernels
as well as a potential regression in the TLS hardening code.
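
The difference between the old and new predicate can be illustrated with a small userspace sketch. This is only a model under stated assumptions: `struct demo_desc_lm` and both function names are hypothetical, and the `lm` member stands for the bit that a 32-bit caller never initializes because it is not part of its ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of struct user_desc as seen by a 64-bit kernel. */
struct demo_desc_lm {
	unsigned int base_addr;
	unsigned int limit;
	unsigned int seg_32bit:1;
	unsigned int contents:2;
	unsigned int read_exec_only:1;
	unsigned int limit_in_pages:1;
	unsigned int seg_not_present:1;
	unsigned int useable:1;
	unsigned int lm:1;	/* unknown to 32-bit callers */
};

/* New behaviour: the lm bit plays no part in the decision. */
static bool demo_empty_ignoring_lm(const struct demo_desc_lm *d)
{
	assert(d != NULL);

	return d->base_addr == 0 && d->limit == 0 && d->contents == 0 &&
	       d->read_exec_only == 1 && d->seg_32bit == 0 &&
	       d->limit_in_pages == 0 && d->seg_not_present == 1 &&
	       d->useable == 0;
}

/* Old 64-bit behaviour: stray garbage in lm made the request non-empty. */
static bool demo_empty_old_x86_64(const struct demo_desc_lm *d)
{
	return demo_empty_ignoring_lm(d) && d->lm == 0;
}
```

A 32-bit program that fills in only the fields it knows about may leave the `lm` position set by accident; under the old predicate such a request was not recognised as empty unless the whole structure had been cleared first.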

Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit f62570cbdcb6dd53d0e2361488f9ea2c4cf17ec9)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/desc.h | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 617bd56..66973a0 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -250,7 +250,8 @@ static inline void native_load_tls(struct thread_struct *t, unsigned int cpu)
 		gdt[GDT_ENTRY_TLS_MIN + i] = t->tls_array[i];
 }
 
-#define _LDT_empty(info)				\
+/* This intentionally ignores lm, since 32-bit apps don't have that field. */
+#define LDT_empty(info)					\
 	((info)->base_addr		== 0	&&	\
 	 (info)->limit			== 0	&&	\
 	 (info)->contents		== 0	&&	\
@@ -260,12 +261,6 @@ static inline void native_load_tls(struct thread_struct *t, unsigned int cpu)
 	 (info)->seg_not_present	== 1	&&	\
 	 (info)->useable		== 0)
 
-#ifdef CONFIG_X86_64
-#define LDT_empty(info) (_LDT_empty(info) && ((info)->lm == 0))
-#else
-#define LDT_empty(info) (_LDT_empty(info))
-#endif
-
 static inline void clear_LDT(void)
 {
 	set_ldt(NULL, 0);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 04/48] x86, tls: Interpret an all-zero struct user_desc as "no segment"
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (3 preceding siblings ...)
  2015-05-15  8:05 ` [ 03/48] x86, tls, ldt: Stop checking lm in LDT_empty Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES Willy Tarreau
                   ` (42 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC
  To: linux-kernel, stable
  Cc: Andy Lutomirski, torvalds, Thomas Gleixner, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 3669ef9fa7d35f573ec9c0e0341b29251c2734a7 upstream.

The Witcher 2 did something like this to allocate a TLS segment index:

        struct user_desc u_info;
        bzero(&u_info, sizeof(u_info));
        u_info.entry_number = (uint32_t)-1;

        syscall(SYS_set_thread_area, &u_info);

Strictly speaking, this code was never correct.  It should have set
read_exec_only and seg_not_present to 1 to indicate that it wanted
to find a free slot without putting anything there, or it should
have put something sensible in the TLS slot if it wanted to allocate
a TLS entry for real.  The actual effect of this code was to
allocate a bogus segment that could be used to exploit espfix.

The set_thread_area hardening patches changed the behavior, causing
set_thread_area to return -EINVAL and crashing the game.

This changes set_thread_area to interpret this as a request to find
a free slot and to leave it empty, which isn't *quite* what the game
expects but should be close enough to keep it working.  In
particular, using the code above to allocate two segments will
allocate the same segment both times.

According to FrostbittenKing on Github, this fixes The Witcher 2.

If this somehow still causes problems, we could instead allocate
a limit==0 32-bit data segment, but that seems rather ugly to me.
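
To make the two accepted forms of a "no segment" request concrete, here is a hedged userspace sketch. All names (`struct demo_req`, `demo_req_is_empty()`, `demo_req_is_zero()`, `demo_req_clears_slot()`, `demo_free_slot_request()`) are hypothetical and only model the behaviour described above; the real structure is `struct user_desc`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a set_thread_area() request. */
struct demo_req {
	unsigned int entry_number;
	unsigned int base_addr;
	unsigned int limit;
	unsigned int seg_32bit:1;
	unsigned int contents:2;
	unsigned int read_exec_only:1;
	unsigned int limit_in_pages:1;
	unsigned int seg_not_present:1;
	unsigned int useable:1;
};

/* The documented way to say "no segment": read-only and not present. */
static bool demo_req_is_empty(const struct demo_req *r)
{
	return r->base_addr == 0 && r->limit == 0 && r->contents == 0 &&
	       r->read_exec_only == 1 && r->seg_32bit == 0 &&
	       r->limit_in_pages == 0 && r->seg_not_present == 1 &&
	       r->useable == 0;
}

/* The form programs actually use: everything except entry_number is zero. */
static bool demo_req_is_zero(const struct demo_req *r)
{
	return r->base_addr == 0 && r->limit == 0 && r->contents == 0 &&
	       r->read_exec_only == 0 && r->seg_32bit == 0 &&
	       r->limit_in_pages == 0 && r->seg_not_present == 0 &&
	       r->useable == 0;
}

/* Either form leaves the chosen slot empty in this model. */
static bool demo_req_clears_slot(const struct demo_req *r)
{
	assert(r != NULL);
	return demo_req_is_empty(r) || demo_req_is_zero(r);
}

/* Build the strictly correct "find a free slot, put nothing there" request. */
static struct demo_req demo_free_slot_request(void)
{
	struct demo_req r;

	memset(&r, 0, sizeof(r));
	r.entry_number = (unsigned int)-1;
	r.read_exec_only = 1;
	r.seg_not_present = 1;
	return r;
}
```

Because neither form installs anything, the slot stays free, which is why repeating the zeroed request returns the same index each time.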

Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 3175b4cb1aa4b1430fada4679be4598f6eb8872b)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/desc.h | 13 +++++++++++++
 arch/x86/kernel/tls.c       | 25 +++++++++++++++++++++++--
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 66973a0..fe4652d 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -261,6 +261,19 @@ static inline void native_load_tls(struct thread_struct *t, unsigned int cpu)
 	 (info)->seg_not_present	== 1	&&	\
 	 (info)->useable		== 0)
 
+/* Lots of programs expect an all-zero user_desc to mean "no segment at all". */
+static inline bool LDT_zero(const struct user_desc *info)
+{
+	return (info->base_addr		== 0 &&
+		info->limit		== 0 &&
+		info->contents		== 0 &&
+		info->read_exec_only	== 0 &&
+		info->seg_32bit		== 0 &&
+		info->limit_in_pages	== 0 &&
+		info->seg_not_present	== 0 &&
+		info->useable		== 0);
+}
+
 static inline void clear_LDT(void)
 {
 	set_ldt(NULL, 0);
diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
index 7af7338..8dda590 100644
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -30,7 +30,28 @@ static int get_free_idx(void)
 
 static bool tls_desc_okay(const struct user_desc *info)
 {
-	if (LDT_empty(info))
+	/*
+	 * For historical reasons (i.e. no one ever documented how any
+	 * of the segmentation APIs work), user programs can and do
+	 * assume that a struct user_desc that's all zeros except for
+	 * entry_number means "no segment at all".  This never actually
+	 * worked.  In fact, up to Linux 3.19, a struct user_desc like
+	 * this would create a 16-bit read-write segment with base and
+	 * limit both equal to zero.
+	 *
+	 * That was close enough to "no segment at all" until we
+	 * hardened this function to disallow 16-bit TLS segments.  Fix
+	 * it up by interpreting these zeroed segments the way that they
+	 * were almost certainly intended to be interpreted.
+	 *
+	 * The correct way to ask for "no segment at all" is to specify
+	 * a user_desc that satisfies LDT_empty.  To keep everything
+	 * working, we accept both.
+	 *
+	 * Note that there's a similar kludge in modify_ldt -- look at
+	 * the distinction between modes 1 and 0x11.
+	 */
+	if (LDT_empty(info) || LDT_zero(info))
 		return true;
 
 	/*
@@ -56,7 +77,7 @@ static void set_tls_desc(struct task_struct *p, int idx,
 	cpu = get_cpu();
 
 	while (n-- > 0) {
-		if (LDT_empty(info))
+		if (LDT_empty(info) || LDT_zero(info))
 			desc->a = desc->b = 0;
 		else
 			fill_ldt(desc, info);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (4 preceding siblings ...)
  2015-05-15  8:05 ` [ 04/48] x86, tls: Interpret an all-zero struct user_desc as "no segment" Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15 12:32   ` Ben Hutchings
  2015-05-15  8:05 ` [ 06/48] x86/tls: Disallow unusual TLS segments Willy Tarreau
                   ` (41 subsequent siblings)
  47 siblings, 1 reply; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC
  To: linux-kernel, stable
  Cc: Andy Lutomirski, Andi Kleen, Linus Torvalds, Ingo Molnar,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit f647d7c155f069c1a068030255c300663516420e upstream.

Otherwise, if buggy user code points DS or ES into the TLS
array, they would be corrupted after a context switch.

This also significantly improves the comments and documents some
gotchas in the code.

Before this patch, both tests below failed.  With this patch,
the es test passes, although the gsbase test still fails.

 ----- begin es test -----

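/*
 * Copyright (c) 2014 Andy Lutomirski
 * GPL v2
 */

#define _GNU_SOURCE
#include <err.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/ldt.h>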

static unsigned short GDT3(int idx)
{
	return (idx << 3) | 3;
}

static int create_tls(int idx, unsigned int base)
{
	struct user_desc desc = {
		.entry_number    = idx,
		.base_addr       = base,
		.limit           = 0xfffff,
		.seg_32bit       = 1,
		.contents        = 0, /* Data, grow-up */
		.read_exec_only  = 0,
		.limit_in_pages  = 1,
		.seg_not_present = 0,
		.useable         = 0,
	};

	if (syscall(SYS_set_thread_area, &desc) != 0)
		err(1, "set_thread_area");

	return desc.entry_number;
}

int main()
{
	int idx = create_tls(-1, 0);
	printf("Allocated GDT index %d\n", idx);

	unsigned short orig_es;
	asm volatile ("mov %%es,%0" : "=rm" (orig_es));

	int errors = 0;
	int total = 1000;
	for (int i = 0; i < total; i++) {
		asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx)));
		usleep(100);

		unsigned short es;
		asm volatile ("mov %%es,%0" : "=rm" (es));
		asm volatile ("mov %0,%%es" : : "rm" (orig_es));
		if (es != GDT3(idx)) {
			if (errors == 0)
				printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n",
				       GDT3(idx), es);
			errors++;
		}
	}

	if (errors) {
		printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total);
		return 1;
	} else {
		printf("[OK]\tES was preserved\n");
		return 0;
	}
}

 ----- end es test -----
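
The GDT3() helper in the es test builds a segment selector from a GDT index. As a companion, the sketch below shows the selector layout it relies on: bits 0-1 hold the requested privilege level, bit 2 selects GDT (0) or LDT (1), and the remaining bits hold the table index. The names `struct selector_fields`, `make_gdt_selector()` and `decode_selector()` are illustrative only and do not come from the patch.

```c
#include <assert.h>

/* Decoded parts of an x86 segment selector. */
struct selector_fields {
	unsigned int index;	/* descriptor table index */
	unsigned int ti;	/* table indicator: 0 = GDT, 1 = LDT */
	unsigned int rpl;	/* requested privilege level, 0..3 */
};

/* Build a GDT selector; GDT3(idx) in the test is the rpl == 3 case. */
static unsigned short make_gdt_selector(unsigned int idx, unsigned int rpl)
{
	assert(rpl <= 3);
	return (unsigned short)((idx << 3) | rpl);
}

/* Split a selector back into its three fields. */
static struct selector_fields decode_selector(unsigned short sel)
{
	struct selector_fields f;

	f.index = sel >> 3;
	f.ti = (sel >> 2) & 1;
	f.rpl = sel & 3;
	return f;
}
```

Reading ES back only returns these selector bits, which is why the test compares the value it reads against GDT3(idx) rather than inspecting the descriptor itself.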

 ----- begin gsbase test -----

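/*
 * gsbase.c, a gsbase test
 * Copyright (c) 2014 Andy Lutomirski
 * GPL v2
 */

#define _GNU_SOURCE
#include <err.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <asm/prctl.h>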

static unsigned char *testptr, *testptr2;

static unsigned char read_gs_testvals(void)
{
	unsigned char ret;
	asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr));
	return ret;
}

int main()
{
	int errors = 0;

	testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
	if (testptr == MAP_FAILED)
		err(1, "mmap");

	testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
	if (testptr2 == MAP_FAILED)
		err(1, "mmap");

	*testptr = 0;
	*testptr2 = 1;

	if (syscall(SYS_arch_prctl, ARCH_SET_GS,
		    (unsigned long)testptr2 - (unsigned long)testptr) != 0)
		err(1, "ARCH_SET_GS");

	usleep(100);

	if (read_gs_testvals() == 1) {
		printf("[OK]\tARCH_SET_GS worked\n");
	} else {
		printf("[FAIL]\tARCH_SET_GS failed\n");
		errors++;
	}

	asm volatile ("mov %0,%%gs" : : "r" (0));

	if (read_gs_testvals() == 0) {
		printf("[OK]\tWriting 0 to gs worked\n");
	} else {
		printf("[FAIL]\tWriting 0 to gs failed\n");
		errors++;
	}

	usleep(100);

	if (read_gs_testvals() == 0) {
		printf("[OK]\tgsbase is still zero\n");
	} else {
		printf("[FAIL]\tgsbase was corrupted\n");
		errors++;
	}

	return errors == 0 ? 0 : 1;
}

 ----- end gsbase test -----

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit cca3e6170e186ad88c11ee91cfd37d400dcaa9b0)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/process_64.c | 101 +++++++++++++++++++++++++++++++------------
 1 file changed, 73 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 39493bc..0b3d98b 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -394,24 +394,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	if (preload_fpu)
 		prefetch(next->xstate);
 
-	/*
-	 * Reload esp0, LDT and the page table pointer:
-	 */
+	/* Reload esp0 and ss1. */
 	load_sp0(tss, next);
 
-	/*
-	 * Switch DS and ES.
-	 * This won't pick up thread selector changes, but I guess that is ok.
-	 */
-	savesegment(es, prev->es);
-	if (unlikely(next->es | prev->es))
-		loadsegment(es, next->es);
-
-	savesegment(ds, prev->ds);
-	if (unlikely(next->ds | prev->ds))
-		loadsegment(ds, next->ds);
-
-
 	/* We must save %fs and %gs before load_TLS() because
 	 * %fs and %gs may be cleared by load_TLS().
 	 *
@@ -420,6 +405,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	savesegment(fs, fsindex);
 	savesegment(gs, gsindex);
 
+	/*
+	 * Load TLS before restoring any segments so that segment loads
+	 * reference the correct GDT entries.
+	 */
 	load_TLS(next, cpu);
 
 	/* Must be after DS reload */
@@ -430,38 +419,94 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 		clts();
 
 	/*
-	 * Leave lazy mode, flushing any hypercalls made here.
-	 * This must be done before restoring TLS segments so
-	 * the GDT and LDT are properly updated, and must be
-	 * done before math_state_restore, so the TS bit is up
-	 * to date.
+	 * Leave lazy mode, flushing any hypercalls made here.  This
+	 * must be done after loading TLS entries in the GDT but before
+	 * loading segments that might reference them, and it must
+	 * be done before math_state_restore, so the TS bit is up to
+	 * date.
 	 */
 	arch_end_context_switch(next_p);
 
+	/* Switch DS and ES.
+	 *
+	 * Reading them only returns the selectors, but writing them (if
+	 * nonzero) loads the full descriptor from the GDT or LDT.  The
+	 * LDT for next is loaded in switch_mm, and the GDT is loaded
+	 * above.
+	 *
+	 * We therefore need to write new values to the segment
+	 * registers on every context switch unless both the new and old
+	 * values are zero.
+	 *
+	 * Note that we don't need to do anything for CS and SS, as
+	 * those are saved and restored as part of pt_regs.
+	 */
+	savesegment(es, prev->es);
+	if (unlikely(next->es | prev->es))
+		loadsegment(es, next->es);
+
+	savesegment(ds, prev->ds);
+	if (unlikely(next->ds | prev->ds))
+		loadsegment(ds, next->ds);
+
 	/*
 	 * Switch FS and GS.
 	 *
-	 * Segment register != 0 always requires a reload.  Also
-	 * reload when it has changed.  When prev process used 64bit
-	 * base always reload to avoid an information leak.
+	 * These are even more complicated than DS and ES: they have
+	 * 64-bit bases that are controlled by arch_prctl.  Those bases
+	 * only differ from the values in the GDT or LDT if the selector
+	 * is 0.
+	 *
+	 * Loading the segment register resets the hidden base part of
+	 * the register to 0 or the value from the GDT / LDT.  If the
+	 * next base address is zero, writing 0 to the segment register is
+	 * much faster than using wrmsr to explicitly zero the base.
+	 *
+	 * The thread_struct.fs and thread_struct.gs values are 0
+	 * if the fs and gs bases respectively are not overridden
+	 * from the values implied by fsindex and gsindex.  They
+	 * are nonzero, and store the nonzero base addresses, if
+	 * the bases are overridden.
+	 *
+	 * (fs != 0 && fsindex != 0) || (gs != 0 && gsindex != 0) should
+	 * be impossible.
+	 *
+	 * Therefore we need to reload the segment registers if either
+	 * the old or new selector is nonzero, and we need to override
+	 * the base address if next thread expects it to be overridden.
+	 *
+	 * This code is unnecessarily slow in the case where the old and
+	 * new indexes are zero and the new base is nonzero -- it will
+	 * unnecessarily write 0 to the selector before writing the new
+	 * base address.
+	 *
+	 * Note: This all depends on arch_prctl being the only way that
+	 * user code can override the segment base.  Once wrfsbase and
+	 * wrgsbase are enabled, most of this code will need to change.
 	 */
 	if (unlikely(fsindex | next->fsindex | prev->fs)) {
 		loadsegment(fs, next->fsindex);
+
 		/*
-		 * Check if the user used a selector != 0; if yes
-		 *  clear 64bit base, since overloaded base is always
-		 *  mapped to the Null selector
+		 * If user code wrote a nonzero value to FS, then it also
+		 * cleared the overridden base address.
+		 *
+		 * XXX: if user code wrote 0 to FS and cleared the base
+		 * address itself, we won't notice and we'll incorrectly
+		 * restore the prior base address next time we reschedule
+		 * the process.
 		 */
 		if (fsindex)
 			prev->fs = 0;
 	}
-	/* when next process has a 64bit base use it */
 	if (next->fs)
 		wrmsrl(MSR_FS_BASE, next->fs);
 	prev->fsindex = fsindex;
 
 	if (unlikely(gsindex | next->gsindex | prev->gs)) {
 		load_gs_index(next->gsindex);
+
+		/* This works (and fails) the same way as fsindex above. */
 		if (gsindex)
 			prev->gs = 0;
 	}
-- 
1.7.12.2.21.g234cd45.dirty





* [ 06/48] x86/tls: Disallow unusual TLS segments
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (5 preceding siblings ...)
  2015-05-15  8:05 ` [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 07/48] x86/tls: Dont validate lm in set_thread_area() after all Willy Tarreau
                   ` (40 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Andy Lutomirski, H. Peter Anvin, Konrad Rzeszutek Wilk,
	Linus Torvalds, Willy Tarreau, Ingo Molnar, Ben Hutchings

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 0e58af4e1d2166e9e33375a0f121e4867010d4f8 upstream.

Users have no business installing custom code segments into the
GDT, and segments that are not present but are otherwise valid
are a historical source of interesting attacks.

For completeness, block attempts to set the L bit.  (Prior to
this patch, the L bit would have been silently dropped.)

This is an ABI break.  I've checked glibc, musl, and Wine, and
none of them look like they'll have any trouble.

Note to stable maintainers: this is a hardening patch that fixes
no known bugs.  Given the possibility of ABI issues, this
probably shouldn't be backported quickly.
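
For illustration only, the checks described above can be exercised in
userspace with a minimal sketch.  The struct desc_sketch type and the
tls_desc_okay_sketch() name are hypothetical stand-ins for the kernel's
struct user_desc and tls_desc_okay(); the sketch covers only the tests
visible in this patch and ignores any other validation the real function
performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct user_desc (x86-64 layout). */
struct desc_sketch {
	unsigned int entry_number;
	unsigned int base_addr;
	unsigned int limit;
	unsigned int seg_32bit:1;
	unsigned int contents:2;
	unsigned int read_exec_only:1;
	unsigned int limit_in_pages:1;
	unsigned int seg_not_present:1;
	unsigned int useable:1;
	unsigned int lm:1;
};

/* Returns true if the descriptor would pass the checks shown in this patch. */
static bool tls_desc_okay_sketch(const struct desc_sketch *info)
{
	assert(info != NULL);

	/* Pre-existing check: only 32-bit segments are accepted. */
	if (!info->seg_32bit)
		return false;

	/* contents 0 and 1 are data segments; 2 and 3 are code segments. */
	if (info->contents > 1)
		return false;

	/* Non-present segments are rejected outright. */
	if (info->seg_not_present)
		return false;

	/* The L bit makes no sense for data. */
	if (info->lm)
		return false;

	return true;
}
```

With this sketch, an ordinary flat data segment (seg_32bit = 1,
contents = 0) is accepted, while a descriptor with contents = 2,
seg_not_present = 1 or lm = 1 is refused.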

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: security@kernel.org <security@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit fbc3c534ddffeebba6f943945ac71ec83cfa04b8)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/tls.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
index 8dda590..6146cc0 100644
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -61,6 +61,28 @@ static bool tls_desc_okay(const struct user_desc *info)
 	if (!info->seg_32bit)
 		return false;

+	/* Only allow data segments in the TLS array. */
+	if (info->contents > 1)
+		return false;
+
+	/*
+	 * Non-present segments with DPL 3 present an interesting attack
+	 * surface.  The kernel should handle such segments correctly,
+	 * but TLS is very difficult to protect in a sandbox, so prevent
+	 * such segments from being created.
+	 *
+	 * If userspace needs to remove a TLS entry, it can still delete
+	 * it outright.
+	 */
+	if (info->seg_not_present)
+		return false;
+
+#ifdef CONFIG_X86_64
+	/* The L bit makes no sense for data. */
+	if (info->lm)
+		return false;
+#endif
+
 	return true;
 }

-- 
1.7.12.2.21.g234cd45.dirty


* [ 07/48] x86/tls: Dont validate lm in set_thread_area() after all
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (6 preceding siblings ...)
  2015-05-15  8:05 ` [ 06/48] x86/tls: Disallow unusual TLS segments Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 08/48] x86, kvm: Clear paravirt_enabled on KVM guests for espfix32s benefit Willy Tarreau
                   ` (39 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Andy Lutomirski, Thomas Gleixner, Linus Torvalds, Ingo Molnar,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 3fb2f4237bb452eb4e98f6a5dbd5a445b4fed9d0 upstream.

It turns out that there's a lurking ABI issue.  GCC, when
compiling this in a 32-bit program:

struct user_desc desc = {
	.entry_number    = idx,
	.base_addr       = base,
	.limit           = 0xfffff,
	.seg_32bit       = 1,
	.contents        = 0, /* Data, grow-up */
	.read_exec_only  = 0,
	.limit_in_pages  = 1,
	.seg_not_present = 0,
	.useable         = 0,
};

will leave .lm uninitialized.  This means that anything in the
kernel that reads user_desc.lm for 32-bit tasks is unreliable.

Revert the .lm check in set_thread_area().  The value never did
anything in the first place.

Fixes: 0e58af4e1d21 ("x86/tls: Disallow unusual TLS segments")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit c759a579c902167d656ee303d518cb5eed2af278)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/ldt.h | 7 +++++++
 arch/x86/kernel/tls.c      | 6 ------
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/ldt.h b/arch/x86/include/asm/ldt.h
index 46727eb..6e1aaf7 100644
--- a/arch/x86/include/asm/ldt.h
+++ b/arch/x86/include/asm/ldt.h
@@ -28,6 +28,13 @@ struct user_desc {
 	unsigned int  seg_not_present:1;
 	unsigned int  useable:1;
 #ifdef __x86_64__
+	/*
+	 * Because this bit is not present in 32-bit user code, user
+	 * programs can pass uninitialized values here.  Therefore, in
+	 * any context in which a user_desc comes from a 32-bit program,
+	 * the kernel must act as though lm == 0, regardless of the
+	 * actual value.
+	 */
 	unsigned int  lm:1;
 #endif
 };
diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
index 6146cc0..0c38d06 100644
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -77,12 +77,6 @@ static bool tls_desc_okay(const struct user_desc *info)
 	if (info->seg_not_present)
 		return false;
 
-#ifdef CONFIG_X86_64
-	/* The L bit makes no sense for data. */
-	if (info->lm)
-		return false;
-#endif
-
 	return true;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 08/48] x86, kvm: Clear paravirt_enabled on KVM guests for espfix32s benefit
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (7 preceding siblings ...)
  2015-05-15  8:05 ` [ 07/48] x86/tls: Dont validate lm in set_thread_area() after all Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 09/48] x86_64, vdso: Fix the vdso address randomization algorithm Willy Tarreau
                   ` (38 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Konrad Rzeszutek Wilk, Andy Lutomirski, Paolo Bonzini,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 29fa6825463c97e5157284db80107d1bfac5d77b upstream

paravirt_enabled has the following effects:

 - Disables the F00F bug workaround warning.  There is no F00F bug
   workaround any more because Linux's standard IDT handling already
   works around the F00F bug, but the warning still exists.  This
   is only cosmetic, and, in any event, there is no such thing as
   KVM on a CPU with the F00F bug.

 - Disables 32-bit APM BIOS detection.  On a KVM paravirt system,
   there should be no APM BIOS anyway.

 - Disables tboot.  I think that the tboot code should check the
   CPUID hypervisor bit directly if it matters.

 - paravirt_enabled disables espfix32.  espfix32 should *not* be
   disabled under KVM paravirt.

The last point is the purpose of this patch.  It fixes a leak of the
high 16 bits of the kernel stack address on 32-bit KVM paravirt
guests.  Fixes CVE-2014-8134.

Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[bwh: Backported to 2.6.32: adjust indentation, context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/kvm.c      | 9 ++++++++-
 arch/x86/kernel/kvmclock.c | 1 -
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 63b0ec8..1ee78af 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -198,7 +198,14 @@ static void kvm_leave_lazy_mmu(void)
 static void __init paravirt_ops_setup(void)
 {
 	pv_info.name = "KVM";
-	pv_info.paravirt_enabled = 1;
+
+	/*
+	 * KVM isn't paravirt in the sense of paravirt_enabled.  A KVM
+	 * guest kernel works like a bare metal kernel with additional
+	 * features, and paravirt_enabled is about features that are
+	 * missing.
+	 */
+	pv_info.paravirt_enabled = 0;
 
 	if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
 		pv_cpu_ops.io_delay = kvm_io_delay;
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index feaeb0d..5deb619 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -201,7 +201,6 @@ void __init kvmclock_init(void)
 #endif
 		kvm_get_preset_lpj();
 		clocksource_register(&kvm_clock);
-		pv_info.paravirt_enabled = 1;
 		pv_info.name = "KVM";
 	}
 }
-- 
1.7.12.2.21.g234cd45.dirty





* [ 09/48] x86_64, vdso: Fix the vdso address randomization algorithm
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (8 preceding siblings ...)
  2015-05-15  8:05 ` [ 08/48] x86, kvm: Clear paravirt_enabled on KVM guests for espfix32s benefit Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15 21:02   ` Ben Hutchings
  2015-05-15  8:05 ` [ 10/48] ASLR: fix stack randomization on 64-bit systems Willy Tarreau
                   ` (37 subsequent siblings)
  47 siblings, 1 reply; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable; +Cc: Kees Cook, Andy Lutomirski, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 394f56fe480140877304d342dec46d50dc823d46 upstream

The theory behind vdso randomization is that it's mapped at a random
offset above the top of the stack.  To avoid wasting a page of
memory for an extra page table, the vdso isn't supposed to extend
past the lowest PMD into which it can fit.  Other than that, the
address should be a uniformly distributed address that meets all of
the alignment requirements.

The current algorithm is buggy: the vdso has about a 50% probability
of being at the very end of a PMD.  The current algorithm also has a
decent chance of failing outright due to incorrect handling of the
case where the top of the stack is near the top of its PMD.

This fixes the implementation.  The paxtest estimate of vdso
"randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
don't know what the paxtest code is actually calculating.)

It's worth noting that this algorithm is inherently biased: the vdso
is more likely to end up near the end of its PMD than near the
beginning.  Ideally we would either nix the PMD sharing requirement
or jointly randomize the vdso and the stack to reduce the bias.

In the meantime, this is a considerable improvement with basically
no risk of compatibility issues, since the allowed outputs of the
algorithm are unchanged.

As an easy test, doing this:

for i in `seq 10000`
  do grep -P vdso /proc/self/maps |cut -d- -f1
done |sort |uniq -d

used to produce lots of output (1445 lines on my most recent run).
A tiny subset looks like this:

7fffdfffe000
7fffe01fe000
7fffe05fe000
7fffe07fe000
7fffe09fe000
7fffe0bfe000
7fffe0dfe000

Note the suspicious fe000 endings.  With the fix, I get a much more
palatable 76 repeated addresses.
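
As a rough userspace illustration of the corrected placement logic, the
sketch below mirrors the new vdso_addr() from this patch.  The SK_*
constants are assumptions matching x86_64 (4 KiB pages, 2 MiB PMDs, a
47-bit user address space), the random value is passed in as a parameter
instead of coming from get_random_int(), and the page-multiple length
precondition is an assumption of the sketch, not of the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 values; the SK_ prefix marks them as sketch-local. */
#define SK_PAGE_SHIFT    12
#define SK_PAGE_SIZE     (UINT64_C(1) << SK_PAGE_SHIFT)
#define SK_PAGE_MASK     (~(SK_PAGE_SIZE - 1))
#define SK_PMD_SIZE      (UINT64_C(1) << 21)
#define SK_PMD_MASK      (~(SK_PMD_SIZE - 1))
#define SK_TASK_SIZE_MAX ((UINT64_C(1) << 47) - SK_PAGE_SIZE)

/*
 * Pick a page-aligned address in [start, end] where end is the highest
 * address at which a mapping of len bytes still fits below the next PMD
 * boundary.  rnd stands in for get_random_int().
 */
static uint64_t vdso_addr_sketch(uint64_t start, uint64_t len, uint32_t rnd)
{
	uint64_t addr, end;

	/* Sketch-only precondition: len is a non-zero multiple of a page. */
	assert(len > 0 && (len & (SK_PAGE_SIZE - 1)) == 0);

	/* Round up the start address; stack randomization can unalign it. */
	start = (start + SK_PAGE_SIZE - 1) & SK_PAGE_MASK;

	/* Round the lowest possible end address up to a PMD boundary. */
	end = (start + len + SK_PMD_SIZE - 1) & SK_PMD_MASK;
	if (end >= SK_TASK_SIZE_MAX)
		end = SK_TASK_SIZE_MAX;
	end -= len;

	if (end > start) {
		/* Number of page-aligned candidates, both ends included. */
		uint64_t slots = ((end - start) >> SK_PAGE_SHIFT) + 1;

		addr = start + ((rnd % slots) << SK_PAGE_SHIFT);
	} else {
		addr = start;
	}

	return addr;
}
```

Every page-aligned slot between start and end is selected by the same
number of residues of the modulo (up to rounding), which is what removes
the pile-up at the end of the PMD for a given stack top.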

Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[bwh: Backported to 2.6.32:
 - The whole file is only built for x86_64; adjust context and comment for this
 - We don't have align_vdso_addr()]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/vdso/vma.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 21e1aeb..3efc633 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -77,23 +77,39 @@ __initcall(init_vdso_vars);
 
 struct linux_binprm;
 
-/* Put the vdso above the (randomized) stack with another randomized offset.
-   This way there is no hole in the middle of address space.
-   To save memory make sure it is still in the same PTE as the stack top.
-   This doesn't give that many random bits */
+/*
+ * Put the vdso above the (randomized) stack with another randomized
+ * offset.  This way there is no hole in the middle of address space.
+ * To save memory make sure it is still in the same PTE as the stack
+ * top.  This doesn't give that many random bits.
+ *
+ * Note that this algorithm is imperfect: the distribution of the vdso
+ * start address within a PMD is biased toward the end.
+ */
 static unsigned long vdso_addr(unsigned long start, unsigned len)
 {
 	unsigned long addr, end;
 	unsigned offset;
-	end = (start + PMD_SIZE - 1) & PMD_MASK;
+
+	/*
+	 * Round up the start address.  It can start out unaligned as a result
+	 * of stack start randomization.
+	 */
+	start = PAGE_ALIGN(start);
+
+	/* Round the lowest possible end address up to a PMD boundary. */
+	end = (start + len + PMD_SIZE - 1) & PMD_MASK;
 	if (end >= TASK_SIZE_MAX)
 		end = TASK_SIZE_MAX;
 	end -= len;
-	/* This loses some more bits than a modulo, but is cheaper */
-	offset = get_random_int() & (PTRS_PER_PTE - 1);
-	addr = start + (offset << PAGE_SHIFT);
-	if (addr >= end)
-		addr = end;
+
+	if (end > start) {
+		offset = get_random_int() % (((end - start) >> PAGE_SHIFT) + 1);
+		addr = start + (offset << PAGE_SHIFT);
+	} else {
+		addr = start;
+	}
+
 	return addr;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 10/48] ASLR: fix stack randomization on 64-bit systems
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (9 preceding siblings ...)
  2015-05-15  8:05 ` [ 09/48] x86_64, vdso: Fix the vdso address randomization algorithm Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 11/48] x86, cpu, amd: Add workaround for family 16h, erratum 793 Willy Tarreau
                   ` (36 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Hector Marco-Gisbert, Ismael Ripoll, Kees Cook, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Hector Marco-Gisbert <hecmargi@upv.es>

commit 4e7c22d447bb6d7e37bfe39ff658486ae78e8d77 upstream

The issue is that the stack for processes is not properly randomized on 64 bit
architectures due to an integer overflow.

The affected function is randomize_stack_top() in file "fs/binfmt_elf.c":

static unsigned long randomize_stack_top(unsigned long stack_top)
{
         unsigned int random_variable = 0;

         if ((current->flags & PF_RANDOMIZE) &&
                 !(current->personality & ADDR_NO_RANDOMIZE)) {
                 random_variable = get_random_int() & STACK_RND_MASK;
                 random_variable <<= PAGE_SHIFT;
         }
#ifdef CONFIG_STACK_GROWSUP
         return PAGE_ALIGN(stack_top) + random_variable;
#else
         return PAGE_ALIGN(stack_top) - random_variable;
#endif
}

Note that it declares "random_variable" as "unsigned int".  STACK_RND_MASK
is 0x3fffff on x86_64 (22 bits) and PAGE_SHIFT is 12 on x86_64, so the
result of the shift operation:

random_variable <<= PAGE_SHIFT;

needs at least 34 bits (22+12), and the two leftmost bits are dropped when
it is stored in the 32-bit "random_variable".

These two dropped bits have an impact on the entropy of the process stack.
Concretely, the total stack entropy is reduced by a factor of four: from
2^30 to 2^28 (one fourth of the expected entropy).
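
The truncation can be reproduced in userspace with a minimal sketch.
The helper names and the SK_* constants are hypothetical; the constants
are assumed to match the x86_64 values quoted above, and the random
value is passed in rather than taken from get_random_int().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 values from the commit message. */
#define SK_STACK_RND_MASK UINT32_C(0x3fffff)
#define SK_PAGE_SHIFT     12

/* Old behaviour: the shifted value is kept in a 32-bit variable. */
static uint32_t stack_offset_narrow(uint32_t rnd)
{
	uint32_t v = rnd & SK_STACK_RND_MASK;

	v <<= SK_PAGE_SHIFT;	/* the top two of the 34 bits are lost here */
	return v;
}

/* Fixed behaviour: widen first, then mask and shift. */
static uint64_t stack_offset_wide(uint32_t rnd)
{
	uint64_t v = rnd;

	v &= SK_STACK_RND_MASK;
	v <<= SK_PAGE_SHIFT;

	/* Nothing was lost: shifting back recovers the masked value. */
	assert((v >> SK_PAGE_SHIFT) == (rnd & SK_STACK_RND_MASK));
	return v;
}
```

For a random value with all mask bits set, the narrow variant yields
0xfffff000 while the wide variant yields 0x3fffff000.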

This patch restores the entropy by correcting the types involved in the
operations in the functions randomize_stack_top() and stack_maxrandom_size().

The successful fix can be tested with:
$ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
...

Once corrected, the leading bytes should be between 7ffc and 7fff, rather
than always being 7fff.

CVE-2015-1593

Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Ismael Ripoll <iripoll@upv.es>
[kees: rebase, fix 80 char, clean up commit message, add test example, cve]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/mm/mmap.c | 6 +++---
 fs/binfmt_elf.c    | 5 +++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index c9e57af..5dd8e15 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,12 +31,12 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
-static unsigned int stack_maxrandom_size(void)
+static unsigned long stack_maxrandom_size(void)
 {
-	unsigned int max = 0;
+	unsigned long max = 0;
 	if ((current->flags & PF_RANDOMIZE) &&
 		!(current->personality & ADDR_NO_RANDOMIZE)) {
-		max = ((-1U) & STACK_RND_MASK) << PAGE_SHIFT;
+		max = ((-1UL) & STACK_RND_MASK) << PAGE_SHIFT;
 	}
 
 	return max;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index c564293..400786e 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -546,11 +546,12 @@ out:
 
 static unsigned long randomize_stack_top(unsigned long stack_top)
 {
-	unsigned int random_variable = 0;
+	unsigned long random_variable = 0;
 
 	if ((current->flags & PF_RANDOMIZE) &&
 		!(current->personality & ADDR_NO_RANDOMIZE)) {
-		random_variable = get_random_int() & STACK_RND_MASK;
+		random_variable = (unsigned long) get_random_int();
+		random_variable &= STACK_RND_MASK;
 		random_variable <<= PAGE_SHIFT;
 	}
 #ifdef CONFIG_STACK_GROWSUP
-- 
1.7.12.2.21.g234cd45.dirty





* [ 11/48] x86, cpu, amd: Add workaround for family 16h, erratum 793
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (10 preceding siblings ...)
  2015-05-15  8:05 ` [ 10/48] ASLR: fix stack randomization on 64-bit systems Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 12/48] x86/asm/entry/64: Remove a bogus ret_from_fork optimization Willy Tarreau
                   ` (35 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Borislav Petkov, Aravind Gopalakrishnan, H. Peter Anvin,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>

commit 3b56496865f9f7d9bcb2f93b44c63f274f08e3b6 upstream

This adds the workaround for erratum 793 as a precaution in case not
every BIOS implements it.  This addresses CVE-2013-6885.

Erratum text:

[Revision Guide for AMD Family 16h Models 00h-0Fh Processors,
document 51810 Rev. 3.04 November 2013]

793 Specific Combination of Writes to Write Combined Memory Types and
Locked Instructions May Cause Core Hang

Description

Under a highly specific and detailed set of internal timing
conditions, a locked instruction may trigger a timing sequence whereby
the write to a write combined memory type is not flushed, causing the
locked instruction to stall indefinitely.

Potential Effect on System

Processor core hang.

Suggested Workaround

BIOS should set MSR
C001_1020[15] = 1b.

Fix Planned

No fix planned
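
The read-modify-write that the workaround amounts to can be sketched as
pure functions on the register value.  This is an illustration only: the
helper names are hypothetical, and no MSR is actually read or written
(the kernel code in this patch uses rdmsrl_amd_safe() and
wrmsrl_amd_safe() for that).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 15 of MSR C001_1020, per the suggested workaround above. */
#define SK_LS_CFG_ERRATUM_793 (UINT64_C(1) << 15)

/* The erratum applies to family 16h, models 00h-0Fh. */
static bool cpu_has_erratum_793(unsigned int family, unsigned int model)
{
	return family == 0x16 && model <= 0xf;
}

/* True if the BIOS did not already set the bit, so a write is needed. */
static bool ls_cfg_needs_write(uint64_t val)
{
	return !(val & SK_LS_CFG_ERRATUM_793);
}

/* Value to write back: all other bits are preserved. */
static uint64_t ls_cfg_with_workaround(uint64_t val)
{
	uint64_t fixed = val | SK_LS_CFG_ERRATUM_793;

	assert((fixed & ~SK_LS_CFG_ERRATUM_793) ==
	       (val & ~SK_LS_CFG_ERRATUM_793));
	return fixed;
}
```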

[ hpa: updated description, fixed typo in MSR name ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic
Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: Backported to 3.2:
 - Adjust filename
 - Venkatesh Srinivas pointed out we should use {rd,wr}msrl_safe() to
   avoid crashing on KVM.  This was fixed upstream by commit 8f86a7373a1c
   ("x86, AMD: Convert to the new bit access MSR accessors") but that's too
   much trouble to backport.  Here we must use {rd,wr}msrl_amd_safe().]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/msr-index.h |  1 +
 arch/x86/kernel/cpu/amd.c        | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 883037b..6057b70 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -110,6 +110,7 @@
 #define MSR_AMD64_PATCH_LOADER		0xc0010020
 #define MSR_AMD64_OSVW_ID_LENGTH	0xc0010140
 #define MSR_AMD64_OSVW_STATUS		0xc0010141
+#define MSR_AMD64_LS_CFG		0xc0011020
 #define MSR_AMD64_DC_CFG		0xc0011022
 #define MSR_AMD64_IBSFETCHCTL		0xc0011030
 #define MSR_AMD64_IBSFETCHLINAD		0xc0011031
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 6e082dc..ae8b02c 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -413,6 +413,16 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 			set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
 	}
 #endif
+
+	/* F16h erratum 793, CVE-2013-6885 */
+	if (c->x86 == 0x16 && c->x86_model <= 0xf) {
+		u64 val;
+
+		if (!rdmsrl_amd_safe(MSR_AMD64_LS_CFG, &val) &&
+		    !(val & BIT(15)))
+			wrmsrl_amd_safe(MSR_AMD64_LS_CFG, val | BIT(15));
+	}
+
 }
 
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 12/48] x86/asm/entry/64: Remove a bogus ret_from_fork optimization
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (11 preceding siblings ...)
  2015-05-15  8:05 ` [ 11/48] x86, cpu, amd: Add workaround for family 16h, erratum 793 Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 13/48] x86: Conditionally update time when ack-ing pending irqs Willy Tarreau
                   ` (34 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Andy Lutomirski, Borislav Petkov, Denys Vlasenko, H. Peter Anvin,
	Linus Torvalds, Oleg Nesterov, Thomas Gleixner, Ingo Molnar,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 956421fbb74c3a6261903f3836c0740187cf038b upstream.

'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
the related state make sense for 'ret_from_sys_call'.  This is
entirely the wrong check.  TS_COMPAT would make a little more
sense, but there's really no point in keeping this optimization
at all.

This fixes a return to the wrong user CS if we came from int
0x80 in a 64-bit task.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
[ Backported from tip:x86/asm. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 159891c0953a89a28f793fc52373b031262c44d2)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/entry_64.S | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index d9bcee0..303eaeb8 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -413,11 +413,14 @@ ENTRY(ret_from_fork)
 	testl $3, CS-ARGOFFSET(%rsp)		# from kernel_thread?
 	je   int_ret_from_sys_call

-	testl $_TIF_IA32, TI_flags(%rcx)	# 32-bit compat task needs IRET
-	jnz  int_ret_from_sys_call
-
-	RESTORE_TOP_OF_STACK %rdi, -ARGOFFSET
-	jmp ret_from_sys_call			# go to the SYSRET fastpath
+	/*
+	 * By the time we get here, we have no idea whether our pt_regs,
+	 * ti flags, and ti status came from the 64-bit SYSCALL fast path,
+	 * the slow path, or one of the ia32entry paths.
+	 * Use int_ret_from_sys_call to return, since it can safely handle
+	 * all of the above.
+	 */
+	jmp  int_ret_from_sys_call

 	CFI_ENDPROC
 END(ret_from_fork)
-- 
1.7.12.2.21.g234cd45.dirty


* [ 13/48] x86: Conditionally update time when ack-ing pending irqs
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (12 preceding siblings ...)
  2015-05-15  8:05 ` [ 12/48] x86/asm/entry/64: Remove a bogus ret_from_fork optimization Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 14/48] serial: samsung: wait for transfer completion before clock disable Willy Tarreau
                   ` (33 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Shai Fultheim, Ido Yariv, Thomas Gleixner, Ingo Molnar,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Shai Fultheim <shai@scalemp.com>

commit 42fa4250436304d4650fa271f37671f6cee24e08 upstream.

On virtual environments, apic_read could take a long time. As a
result, under certain conditions the ack pending loop may exit
without any queued irqs left, but after more than one second. A
warning will be printed needlessly in this case.

If the loop is about to exit regardless of max_loops, don't
update it.
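
The effect can be modelled in userspace with a small simulation.  This
is a sketch under stated assumptions, not the kernel loop: struct
ack_step and ack_loop_warns() are hypothetical, each step records
whether IRQs were still queued after a pass and how many cycles had
elapsed by then, and only the TSC branch of the loop is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pass of the ack loop, as observed by the simulation. */
struct ack_step {
	int queued;		/* IRQs still queued after this pass? */
	int64_t elapsed;	/* cycles elapsed since the loop started */
};

/*
 * Returns nonzero if the loop would end with max_loops <= 0, i.e. if
 * the WARN_ON() would fire.  only_when_queued selects the patched
 * behaviour (update max_loops only while IRQs remain queued).
 */
static int ack_loop_warns(const struct ack_step *steps, size_t n,
			  int64_t budget, int only_when_queued)
{
	int64_t max_loops = budget;
	int queued;
	size_t i = 0;

	assert(steps != NULL && n > 0);

	do {
		queued = steps[i].queued;
		if (!only_when_queued || queued)
			max_loops = budget - steps[i].elapsed;
		i++;
	} while (queued && max_loops > 0 && i < n);

	return max_loops <= 0;
}
```

A single slow pass that leaves nothing queued triggers the warning in
the old variant and not in the patched one; a pass that exceeds the
budget with IRQs still queued warns in both.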

Signed-off-by: Shai Fultheim <shai@scalemp.com>
[ rebased and reworded the commit message]
Signed-off-by: Ido Yariv <ido@wizery.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-ido@wizery.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit c9f1417be9acae3a9867f8bdab2b7924d76cf6ac)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/apic/apic.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 1d2d670..be4bf4c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1250,11 +1250,13 @@ void __cpuinit setup_local_APIC(void)
 			       acked);
 			break;
 		}
-		if (cpu_has_tsc) {
-			rdtscll(ntsc);
-			max_loops = (cpu_khz << 10) - (ntsc - tsc);
-		} else
-			max_loops--;
+		if (queued) {
+			if (cpu_has_tsc) {
+				rdtscll(ntsc);
+				max_loops = (cpu_khz << 10) - (ntsc - tsc);
+			} else
+				max_loops--;
+		}
 	} while (queued && max_loops > 0);
 	WARN_ON(max_loops <= 0);
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 14/48] serial: samsung: wait for transfer completion before clock disable
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (13 preceding siblings ...)
  2015-05-15  8:05 ` [ 13/48] x86: Conditionally update time when ack-ing pending irqs Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 15/48] splice: Apply generic position and size checks to each write Willy Tarreau
                   ` (32 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable; +Cc: Robert Baldyga, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Robert Baldyga <r.baldyga@samsung.com>

This patch waits until the transmit buffer and shifter are empty before
disabling the clock.

Without this fix, the clock can be disabled while data has not yet been
transmitted, which leaves the TX line in an improper state and causes
problems in subsequent data transfers.
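
The bounded wait used here follows a common polling pattern, sketched
below in userspace.  poll_until_ready() and ready_after_n_polls() are
hypothetical names; the callback stands in for
s3c24xx_serial_txempty_nofifo(), and the udelay(100) between polls is
left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Poll ready(ctx) until it reports true or the budget runs out.
 * Returns the remaining budget; 0 means the wait gave up.  With a
 * budget of N, the condition is polled at most N - 1 times.
 */
static int poll_until_ready(int (*ready)(void *), void *ctx, int timeout)
{
	assert(ready != NULL && timeout > 0);

	while (--timeout && !ready(ctx))
		;	/* the driver delays with udelay(100) here */

	return timeout;
}

/* Test helper: reports ready on the call that brings *ctx down to 0. */
static int ready_after_n_polls(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining == 0;
}
```

Note that the patch itself disables the clock whether or not the wait
timed out; the timeout only bounds how long the disable can be delayed.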

Cc: stable@vger.kernel.org
Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1ff383a4c3eda8893ec61b02831826e1b1f46b41)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/serial/samsung.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/serial/samsung.c b/drivers/serial/samsung.c
index 1523e8d..05801c0 100644
--- a/drivers/serial/samsung.c
+++ b/drivers/serial/samsung.c
@@ -443,11 +443,15 @@ static void s3c24xx_serial_pm(struct uart_port *port, unsigned int level,
 			      unsigned int old)
 {
 	struct s3c24xx_uart_port *ourport = to_ourport(port);
+	int timeout = 10000;
 
 	ourport->pm_level = level;
 
 	switch (level) {
 	case 3:
+		while (--timeout && !s3c24xx_serial_txempty_nofifo(port))
+			udelay(100);
+
 		if (!IS_ERR(ourport->baudclk) && ourport->baudclk != NULL)
 			clk_disable(ourport->baudclk);
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 15/48] splice: Apply generic position and size checks to each write
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (14 preceding siblings ...)
  2015-05-15  8:05 ` [ 14/48] serial: samsung: wait for transfer completion before clock disable Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 16/48] netfilter: conntrack: disable generic tracking for known protocols Willy Tarreau
                   ` (31 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable; +Cc: Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <ben@decadent.org.uk>

We need to check the position and size of file writes against various
limits, using generic_write_checks().  This was not being done for
the splice write path.  It was fixed upstream by commit 8d0207652cbe
("->splice_write() via ->write_iter()") but we can't apply that.
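
To illustrate the kind of check that was missing, here is a loosely
modelled userspace sketch of a position and size check against a single
upper limit.  It is not the kernel's generic_write_checks(), which
applies several limits; write_checks_sketch(), checked_len() and the
max_bytes parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Refuse writes that start at or beyond max_bytes, and shorten writes
 * that would run past it.  Returns 0 on success (with *count possibly
 * reduced) or -EFBIG.
 */
static int write_checks_sketch(int64_t pos, size_t *count, int64_t max_bytes)
{
	assert(count != NULL && pos >= 0 && max_bytes >= 0);

	if (pos >= max_bytes) {
		/* A zero-length write exactly at the limit is allowed. */
		if (*count != 0 || pos > max_bytes)
			return -EFBIG;
		return 0;
	}

	/* Clamp the length so that pos + *count stays within the limit. */
	if ((uint64_t)*count > (uint64_t)(max_bytes - pos))
		*count = (size_t)(max_bytes - pos);

	return 0;
}

/* Convenience wrapper: length that may be written, or -1 on error. */
static int64_t checked_len(int64_t pos, size_t len, int64_t max_bytes)
{
	return write_checks_sketch(pos, &len, max_bytes) ? -1 : (int64_t)len;
}
```

The patch applies the same idea to the splice path: the checked length
and position, not the caller's raw values, are what end up in
sd.total_len and sd.pos.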

CVE-2014-7822

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ocfs2/file.c | 8 ++++++--
 fs/splice.c     | 8 ++++++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index de059f4..6aede32 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2081,9 +2081,7 @@ static ssize_t ocfs2_file_splice_write(struct pipe_inode_info *pipe,
 	struct address_space *mapping = out->f_mapping;
 	struct inode *inode = mapping->host;
 	struct splice_desc sd = {
-		.total_len = len,
 		.flags = flags,
-		.pos = *ppos,
 		.u.file = out,
 	};
 
@@ -2092,6 +2090,12 @@ static ssize_t ocfs2_file_splice_write(struct pipe_inode_info *pipe,
 		   out->f_path.dentry->d_name.len,
 		   out->f_path.dentry->d_name.name);
 
+	ret = generic_write_checks(out, ppos, &len, 0);
+	if (ret)
+		return ret;
+	sd.total_len = len;
+	sd.pos = *ppos;
+
 	if (pipe->inode)
 		mutex_lock_nested(&pipe->inode->i_mutex, I_MUTEX_PARENT);
 
diff --git a/fs/splice.c b/fs/splice.c
index cdad986..1ef1c00 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -945,13 +945,17 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	struct address_space *mapping = out->f_mapping;
 	struct inode *inode = mapping->host;
 	struct splice_desc sd = {
-		.total_len = len,
 		.flags = flags,
-		.pos = *ppos,
 		.u.file = out,
 	};
 	ssize_t ret;
 
+	ret = generic_write_checks(out, ppos, &len, S_ISBLK(inode->i_mode));
+	if (ret)
+		return ret;
+	sd.total_len = len;
+	sd.pos = *ppos;
+
 	pipe_lock(pipe);
 
 	splice_from_pipe_begin(&sd);
-- 
1.7.12.2.21.g234cd45.dirty
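
For illustration only, and not part of the patch: the pattern above
(validate position and length first, then fill the splice descriptor
from the possibly trimmed values) can be sketched in user space.  All
sketch_* names and the single max_bytes limit are assumptions; the real
generic_write_checks() consults several limits and is not reproduced
here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the descriptor the splice path fills in. */
struct sketch_desc {
	size_t total_len;
	int64_t pos;
};

/*
 * Simplified position/size check against one limit (an assumption).
 * May shrink *count so that the write ends at max_bytes.
 */
static int sketch_write_checks(int64_t pos, size_t *count, int64_t max_bytes)
{
	if (pos < 0)
		return -EINVAL;
	if (*count == 0)
		return 0;
	if (pos >= max_bytes)
		return -EFBIG;
	if ((uint64_t)*count > (uint64_t)(max_bytes - pos))
		*count = (size_t)(max_bytes - pos);
	return 0;
}

/* Check first, then copy the validated values into the descriptor. */
static int sketch_prepare(struct sketch_desc *sd, int64_t pos, size_t len,
			  int64_t max_bytes)
{
	int ret = sketch_write_checks(pos, &len, max_bytes);

	if (ret)
		return ret;
	sd->total_len = len;
	sd->pos = pos;
	return 0;
}
```

The descriptor is filled only after the check succeeds, which is why
the patch moves .total_len and .pos out of the initializer.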





* [ 16/48] netfilter: conntrack: disable generic tracking for known protocols
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (15 preceding siblings ...)
  2015-05-15  8:05 ` [ 15/48] splice: Apply generic position and size checks to each write Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15 21:05   ` Ben Hutchings
  2015-05-15  8:05 ` [ 17/48] isofs: Fix infinite looping over CE entries Willy Tarreau
                   ` (30 subsequent siblings)
  47 siblings, 1 reply; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Florian Westphal, Daniel Borkmann, Jozsef Kadlecsik,
	Pablo Neira Ayuso, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Florian Westphal <fw@strlen.de>

commit db29a9508a9246e77087c5531e45b2c88ec6988b upstream

Given the following iptables ruleset:

-P FORWARD DROP
-A FORWARD -m sctp --dport 9 -j ACCEPT
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT

One would assume that this allows SCTP on port 9 and TCP on port 80.
Unfortunately, if the SCTP conntrack module is not loaded, this allows
*all* SCTP communication to pass through, i.e. -p sctp -j ACCEPT,
which we think is a security issue.

This is because on the first SCTP packet on port 9, we create a dummy
"generic l4" conntrack entry without any port information (since
conntrack doesn't know how to extract this information).

All subsequent packets that are unknown will then be in established
state since they will fall back to proto_generic and will match the
'generic' entry.

Our originally proposed version [1] completely disabled generic protocol
tracking, but Jozsef suggests to not track protocols for which a more
suitable helper is available, hence we now mitigate the issue for in
tree known ct protocol helpers only, so that at least NAT and direction
information will still be preserved for others.

 [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html

Joint work with Daniel Borkmann.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 2.6.32: adjust context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/netfilter/nf_conntrack_proto_generic.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_proto_generic.c b/net/netfilter/nf_conntrack_proto_generic.c
index 829374f..b91074f 100644
--- a/net/netfilter/nf_conntrack_proto_generic.c
+++ b/net/netfilter/nf_conntrack_proto_generic.c
@@ -14,6 +14,30 @@
 
 static unsigned int nf_ct_generic_timeout __read_mostly = 600*HZ;
 
+static bool nf_generic_should_process(u8 proto)
+{
+	switch (proto) {
+#ifdef CONFIG_NF_CT_PROTO_SCTP_MODULE
+	case IPPROTO_SCTP:
+		return false;
+#endif
+#ifdef CONFIG_NF_CT_PROTO_DCCP_MODULE
+	case IPPROTO_DCCP:
+		return false;
+#endif
+#ifdef CONFIG_NF_CT_PROTO_GRE_MODULE
+	case IPPROTO_GRE:
+		return false;
+#endif
+#ifdef CONFIG_NF_CT_PROTO_UDPLITE_MODULE
+	case IPPROTO_UDPLITE:
+		return false;
+#endif
+	default:
+		return true;
+	}
+}
+
 static bool generic_pkt_to_tuple(const struct sk_buff *skb,
 				 unsigned int dataoff,
 				 struct nf_conntrack_tuple *tuple)
@@ -56,7 +80,7 @@ static int packet(struct nf_conn *ct,
 static bool new(struct nf_conn *ct, const struct sk_buff *skb,
 		unsigned int dataoff)
 {
-	return true;
+	return nf_generic_should_process(nf_ct_protonum(ct));
 }
 
 #ifdef CONFIG_SYSCTL
-- 
1.7.12.2.21.g234cd45.dirty
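
For illustration only, and not part of the patch: a user-space sketch
of the decision the generic tracker now makes.  The SK_PROTO_* names
are hypothetical; the values are the IANA protocol numbers.  Unlike the
patch, which excludes a protocol only when its tracker is configured as
a module, this sketch assumes all four dedicated trackers are available.

```c
#include <stdbool.h>
#include <stdint.h>

/* IANA protocol numbers; the SK_ names are local to this sketch. */
enum {
	SK_PROTO_DCCP    = 33,
	SK_PROTO_GRE     = 47,
	SK_PROTO_SCTP    = 132,
	SK_PROTO_UDPLITE = 136,
};

/*
 * Return false for protocols that have a dedicated tracker, so the
 * generic tracker never creates a port-less entry for them.
 */
static bool sketch_generic_should_process(uint8_t proto)
{
	switch (proto) {
	case SK_PROTO_DCCP:
	case SK_PROTO_GRE:
	case SK_PROTO_SCTP:
	case SK_PROTO_UDPLITE:
		return false;
	default:
		return true;
	}
}
```

With this in place, an SCTP packet seen while the SCTP tracker is not
loaded gets no generic entry, so later SCTP packets cannot match as
ESTABLISHED.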





* [ 17/48] isofs: Fix infinite looping over CE entries
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (16 preceding siblings ...)
  2015-05-15  8:05 ` [ 16/48] netfilter: conntrack: disable generic tracking for known protocols Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 18/48] isofs: Fix unchecked printing of ER records Willy Tarreau
                   ` (29 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable; +Cc: P J P, Jan Kara, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit f54e18f1b831c92f6512d2eedb224cd63d607d3d upstream

Rock Ridge extensions define so-called Continuation Entries (CE), which
point to further space holding Rock Ridge data. A corrupted isofs
image can contain an arbitrarily long chain of these, including one
that contains a loop, causing the kernel to end up in an infinite loop
when traversing these entries.

Limit the traversal to 32 entries which should be more than enough space
to store all the Rock Ridge data.

Reported-by: P J P <ppandit@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/isofs/rock.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/isofs/rock.c b/fs/isofs/rock.c
index 6fa4a86..69c737d 100644
--- a/fs/isofs/rock.c
+++ b/fs/isofs/rock.c
@@ -31,6 +31,7 @@ struct rock_state {
 	int cont_size;
 	int cont_extent;
 	int cont_offset;
+	int cont_loops;
 	struct inode *inode;
 };
 
@@ -74,6 +75,9 @@ static void init_rock_state(struct rock_state *rs, struct inode *inode)
 	rs->inode = inode;
 }
 
+/* Maximum number of Rock Ridge continuation entries */
+#define RR_MAX_CE_ENTRIES 32
+
 /*
  * Returns 0 if the caller should continue scanning, 1 if the scan must end
  * and -ve on error.
@@ -106,6 +110,8 @@ static int rock_continue(struct rock_state *rs)
 			goto out;
 		}
 		ret = -EIO;
+		if (++rs->cont_loops >= RR_MAX_CE_ENTRIES)
+			goto out;
 		bh = sb_bread(rs->inode->i_sb, rs->cont_extent);
 		if (bh) {
 			memcpy(rs->buffer, bh->b_data + rs->cont_offset,
-- 
1.7.12.2.21.g234cd45.dirty
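
For illustration only, and not part of the patch: the bounded traversal
can be sketched in user space as a walk over a chain of entries with a
hop counter.  The sketch_* names and the array-based chain are
assumptions; only the limit of 32 and the "++counter >= limit" test
mirror the patch.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_CE_ENTRIES 32

/* Hypothetical continuation entry: index of the next one, -1 to stop. */
struct sketch_ce {
	int next;
};

/*
 * Walk the chain starting at 'start'.  Returns the number of entries
 * visited, or -EIO if the chain leaves the table or is too long
 * (which also covers chains that loop back on themselves).
 */
static int sketch_walk_chain(const struct sketch_ce *entries, size_t n,
			     int start)
{
	int loops = 0;
	int visited = 0;
	int cur = start;

	while (cur >= 0) {
		if ((size_t)cur >= n)
			return -EIO;
		if (++loops >= SKETCH_MAX_CE_ENTRIES)
			return -EIO;
		visited++;
		cur = entries[cur].next;
	}
	return visited;
}
```

A counter is cheaper than remembering visited blocks, and it bounds
both looping chains and merely excessive ones.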





* [ 18/48] isofs: Fix unchecked printing of ER records
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (17 preceding siblings ...)
  2015-05-15  8:05 ` [ 17/48] isofs: Fix infinite looping over CE entries Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 19/48] net: sctp: fix memory leak in auth key management Willy Tarreau
                   ` (28 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable; +Cc: Jan Kara, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 4e2024624e678f0ebb916e6192bd23c1f9fdf696 upstream

We didn't check the length of Rock Ridge ER records before printing them.
Thus a corrupted isofs image can cause us to access and print memory
past the end of the buffer, with obvious consequences.

Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/isofs/rock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/isofs/rock.c b/fs/isofs/rock.c
index 69c737d..2ec72ae 100644
--- a/fs/isofs/rock.c
+++ b/fs/isofs/rock.c
@@ -363,6 +363,9 @@ repeat:
 			rs.cont_size = isonum_733(rr->u.CE.size);
 			break;
 		case SIG('E', 'R'):
+			/* Invalid length of ER tag id? */
+			if (rr->u.ER.len_id + offsetof(struct rock_ridge, u.ER.data) > rr->len)
+				goto out;
 			ISOFS_SB(inode->i_sb)->s_rock = 1;
 			printk(KERN_DEBUG "ISO 9660 Extensions: ");
 			{
-- 
1.7.12.2.21.g234cd45.dirty





* [ 19/48] net: sctp: fix memory leak in auth key management
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (18 preceding siblings ...)
  2015-05-15  8:05 ` [ 18/48] isofs: Fix unchecked printing of ER records Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 20/48] net: sctp: fix slab corruption from use after free on INIT collisions Willy Tarreau
                   ` (27 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Daniel Borkmann, Vlad Yasevich, Neil Horman, David S. Miller,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

commit 4184b2a79a7612a9272ce20d639934584a1f3786 upstream.

A very minimal and simple user space application allocating an SCTP
socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
the socket again will leak the memory containing the authentication
key from user space:

unreferenced object 0xffff8800837047c0 (size 16):
  comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
  hex dump (first 16 bytes):
    01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811c88d8>] __kmalloc+0xe8/0x270
    [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
    [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
    [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
    [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
    [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
    [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
    [<ffffffffffffffff>] 0xffffffffffffffff

This is bad for two reasons: we can bring down a machine from
user space when auth_enable=1, and we also leave security-sensitive
keying material in memory without clearing it after use. The issue is
that sctp_auth_create_key() already sets the refcount to 1, but after
allocation sctp_auth_set_key() takes an additional reference on it,
thus leaving it around when we free the socket.

Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 3af10169145c8eed7b3591c0644da4298405efbc)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/auth.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/sctp/auth.c b/net/sctp/auth.c
index 7363b9f..133ce49 100644
--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -865,8 +865,6 @@ int sctp_auth_set_key(struct sctp_endpoint *ep,
 		list_add(&cur_key->key_list, sh_keys);
 
 	cur_key->key = key;
-	sctp_auth_key_hold(key);
-
 	return 0;
 nomem:
 	if (!replace)
-- 
1.7.12.2.21.g234cd45.dirty
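
For illustration only, and not part of the patch: a user-space model of
the refcount imbalance described above.  The sketch_* names are
hypothetical, and "freed" is a flag rather than a real free, so the
sketch only shows the counting.

```c
#include <stdbool.h>

/* Hypothetical model of a refcounted key object. */
struct sketch_key {
	int refcnt;
	bool freed;
};

/* Creation already hands out one reference. */
static void sketch_key_init(struct sketch_key *k)
{
	k->refcnt = 1;
	k->freed = false;
}

static void sketch_key_hold(struct sketch_key *k)
{
	k->refcnt++;
}

/* Dropping the last reference releases the object. */
static void sketch_key_put(struct sketch_key *k)
{
	if (--k->refcnt == 0)
		k->freed = true;
}

/*
 * Model "set key, then tear down the socket".  The teardown path drops
 * exactly one reference; with the extra hold the key is never released.
 */
static bool sketch_released_on_teardown(bool extra_hold)
{
	struct sketch_key key;

	sketch_key_init(&key);
	if (extra_hold)
		sketch_key_hold(&key);
	sketch_key_put(&key);
	return key.freed;
}
```

Calling sketch_released_on_teardown(true) models the old code and
reports that the key survives; with false, as after the patch, the
single put releases it.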





* [ 20/48] net: sctp: fix slab corruption from use after free on INIT collisions
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (19 preceding siblings ...)
  2015-05-15  8:05 ` [ 19/48] net: sctp: fix memory leak in auth key management Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 21/48] IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic Willy Tarreau
                   ` (26 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Daniel Borkmann, Vlad Yasevich, Neil Horman, David S. Miller,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

commit 600ddd6825543962fb807884169e57b580dba208 upstream

When hitting an INIT collision case during the 4WHS with AUTH enabled, as
already described in detail in commit 1be9a950c646 ("net: sctp: inherit
auth_capable on INIT collisions"), it can happen that we occasionally
still remotely trigger the following panic on server side which seems to
have been uncovered after the fix from commit 1be9a950c646 ...

[  533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
[  533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
[  533.940559] PGD 5030f2067 PUD 0
[  533.957104] Oops: 0000 [#1] SMP
[  533.974283] Modules linked in: sctp mlx4_en [...]
[  534.939704] Call Trace:
[  534.951833]  [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
[  534.984213]  [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
[  535.015025]  [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
[  535.045661]  [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
[  535.074593]  [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
[  535.105239]  [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
[  535.138606]  [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
[  535.166848]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b

... or depending on the application, for example this one:

[ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
[ 1370.054568] PGD 633c94067 PUD 0
[ 1370.070446] Oops: 0000 [#1] SMP
[ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
[ 1370.963431] Call Trace:
[ 1370.974632]  [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
[ 1371.000863]  [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
[ 1371.027154]  [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
[ 1371.054679]  [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
[ 1371.080183]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b

With slab debugging enabled, we can see that the poison has been overwritten:

[  669.826368] BUG kmalloc-128 (Tainted: G        W     ): Poison overwritten
[  669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
[  669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
[  669.826424]  __slab_alloc+0x4bf/0x566
[  669.826433]  __kmalloc+0x280/0x310
[  669.826453]  sctp_auth_create_key+0x23/0x50 [sctp]
[  669.826471]  sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
[  669.826488]  sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
[  669.826505]  sctp_do_sm+0x29d/0x17c0 [sctp] [...]
[  669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
[  669.826635]  __slab_free+0x39/0x2a8
[  669.826643]  kfree+0x1d6/0x230
[  669.826650]  kzfree+0x31/0x40
[  669.826666]  sctp_auth_key_put+0x19/0x20 [sctp]
[  669.826681]  sctp_assoc_update+0x1ee/0x2d0 [sctp]
[  669.826695]  sctp_do_sm+0x674/0x17c0 [sctp]

Since this only triggers in some collision-cases with AUTH, the problem at
heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
when having refcnt 1, once directly in sctp_assoc_update() and yet again
from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
the already kzfree'd memory, which is also consistent with the observation
of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
at a later point in time when poison is checked on new allocation).

Reference counting of auth keys revisited:

Shared keys for AUTH chunks are being stored in endpoints and associations
in endpoint_shared_keys list. On endpoint creation, a null key is being
added; on association creation, all endpoint shared keys are being cached
and thus cloned over to the association. struct sctp_shared_key only holds
a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
keeps track of users internally through refcounting. Naturally, on assoc
or endpoint destruction, sctp_shared_key are being destroyed directly and
the reference on sctp_auth_bytes dropped.

User space can add keys to either list via setsockopt(2) through struct
sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
with refcount 1 and in case of replacement drops the reference on the old
sctp_auth_bytes. A key can be set active from user space through setsockopt()
on the id via sctp_auth_set_active_key(), which iterates through either
endpoint_shared_keys and in case of an assoc, invokes (one of various places)
sctp_auth_asoc_init_active_key().

sctp_auth_asoc_init_active_key() computes the actual secret from local's
and peer's random, hmac and shared key parameters and returns a new key
directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
the reference if there was a previous one. The secret, on which we
eventually double drop the ref, comes from sctp_auth_asoc_set_secret() with
an initial refcount of 1, which also stays unchanged eventually in
sctp_assoc_update(). This key is later being used for crypto layer to
set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().

To close the loop: asoc->asoc_shared_key is freshly allocated secret
material and independent of the sctp_shared_key management keeping track
of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a76
("net: sctp: fix memory leak in auth key management") is independent of
this bug here since it concerns a different layer (though same structures
being used eventually). asoc->asoc_shared_key is reference dropped correctly
on assoc destruction in sctp_association_free() and when active keys are
being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
to remove that sctp_auth_key_put() from there which fixes these panics.

Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/associola.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 8802516..bbf56a7 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1260,7 +1260,6 @@ void sctp_assoc_update(struct sctp_association *asoc,
 	asoc->peer.peer_hmacs = new->peer.peer_hmacs;
 	new->peer.peer_hmacs = NULL;

-	sctp_auth_key_put(asoc->asoc_shared_key);
 	sctp_auth_asoc_init_active_key(asoc, GFP_ATOMIC);
 }

-- 
1.7.12.2.21.g234cd45.dirty
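
For illustration only, and not part of the patch: a user-space model of
why a second put on an already released key turns the poison byte 0x6b
into 0x6a.  The sketch_* names and the struct layout are assumptions,
and "freeing" is modelled by poisoning the object in place, so no memory
is actually released.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_POISON 0x6b	/* slab free poison, as in the log above */

/* Hypothetical layout: refcount first, then key bytes (16 bytes total). */
struct sketch_auth_bytes {
	uint32_t refcnt;
	unsigned char data[12];
};

/* Drop one reference; on the last one, poison the object as "freed". */
static void sketch_put(struct sketch_auth_bytes *k)
{
	if (--k->refcnt == 0)
		memset(k, SKETCH_POISON, sizeof(*k));
}

/* Start at refcount 1, apply 'puts' drops, return the raw counter. */
static uint32_t sketch_refcnt_after_puts(int puts)
{
	struct sketch_auth_bytes key;

	memset(&key, 0, sizeof(key));
	key.refcnt = 1;
	while (puts-- > 0)
		sketch_put(&key);
	return key.refcnt;
}
```

One put leaves the counter field holding pure poison (0x6b6b6b6b).  A
second put decrements that value, so its low byte becomes 0x6a, which
matches the "First byte 0x6a instead of 0x6b" report.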


* [ 21/48] IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (20 preceding siblings ...)
  2015-05-15  8:05 ` [ 20/48] net: sctp: fix slab corruption from use after free on INIT collisions Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 22/48] net: llc: use correct size for sysctl timeout entries Willy Tarreau
                   ` (25 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Shachar Raindel, Jack Morgenstein, Or Gerlitz, Roland Dreier,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Shachar Raindel <raindel@mellanox.com>

commit 8494057ab5e40df590ef6ef7d66324d3ae33356b upstream.

Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.

Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.

This overflow can also cause miscalculation of the number of pages
mapped, and additional logic issues.

Addresses: CVE-2014-8159
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 485f16b743d98527620396639b73d7214006f3c7)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/infiniband/core/umem.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 6f7c096..2ecd8d6 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -92,6 +92,14 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	if (dmasync)
 		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
 
+	/*
+	 * If the combination of the addr and size requested for this memory
+	 * region causes an integer overflow, return error.
+	 */
+	if ((PAGE_ALIGN(addr + size) <= size) ||
+	    (PAGE_ALIGN(addr + size) <= addr))
+		return ERR_PTR(-EINVAL);
+
 	if (!can_do_mlock())
 		return ERR_PTR(-EPERM);
 
-- 
1.7.12.2.21.g234cd45.dirty
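
For illustration only, and not part of the patch: the same overflow
test in user space.  The SKETCH_* macros and the 4096-byte page size are
assumptions; unsigned long arithmetic wraps around, which is exactly
what the check detects.

```c
#include <errno.h>

#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/*
 * Reject (addr, size) pairs whose page-aligned end address wrapped
 * around: a valid end must be larger than both addr and size.
 */
static int sketch_check_range(unsigned long addr, unsigned long size)
{
	unsigned long end = SKETCH_PAGE_ALIGN(addr + size);

	if (end <= size || end <= addr)
		return -EINVAL;
	return 0;
}
```

The test catches a wrap in addr + size as well as a wrap introduced by
the rounding itself, and it also rejects a zero-length region at
address 0.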





* [ 22/48] net: llc: use correct size for sysctl timeout entries
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (21 preceding siblings ...)
  2015-05-15  8:05 ` [ 21/48] IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 23/48] net: rds: use correct size for max unacked packets and bytes Willy Tarreau
                   ` (24 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Sasha Levin, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sasha Levin <sasha.levin@oracle.com>

commit 6b8d9117ccb4f81b1244aafa7bc70ef8fa45fc49 upstream.

The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 88fe14be08a475ad0eea4ca7c51f32437baf41af)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/llc/sysctl_net_llc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/llc/sysctl_net_llc.c b/net/llc/sysctl_net_llc.c
index 57b9304..cd78b3a 100644
--- a/net/llc/sysctl_net_llc.c
+++ b/net/llc/sysctl_net_llc.c
@@ -18,7 +18,7 @@ static struct ctl_table llc2_timeout_table[] = {
 		.ctl_name	= NET_LLC2_ACK_TIMEOUT,
 		.procname	= "ack",
 		.data		= &sysctl_llc2_ack_timeout,
-		.maxlen		= sizeof(long),
+		.maxlen		= sizeof(sysctl_llc2_ack_timeout),
 		.mode		= 0644,
 		.proc_handler   = proc_dointvec_jiffies,
 		.strategy       = sysctl_jiffies,
@@ -27,7 +27,7 @@ static struct ctl_table llc2_timeout_table[] = {
 		.ctl_name	= NET_LLC2_BUSY_TIMEOUT,
 		.procname	= "busy",
 		.data		= &sysctl_llc2_busy_timeout,
-		.maxlen		= sizeof(long),
+		.maxlen		= sizeof(sysctl_llc2_busy_timeout),
 		.mode		= 0644,
 		.proc_handler   = proc_dointvec_jiffies,
 		.strategy       = sysctl_jiffies,
@@ -36,7 +36,7 @@ static struct ctl_table llc2_timeout_table[] = {
 		.ctl_name	= NET_LLC2_P_TIMEOUT,
 		.procname	= "p",
 		.data		= &sysctl_llc2_p_timeout,
-		.maxlen		= sizeof(long),
+		.maxlen		= sizeof(sysctl_llc2_p_timeout),
 		.mode		= 0644,
 		.proc_handler   = proc_dointvec_jiffies,
 		.strategy       = sysctl_jiffies,
@@ -45,7 +45,7 @@ static struct ctl_table llc2_timeout_table[] = {
 		.ctl_name	= NET_LLC2_REJ_TIMEOUT,
 		.procname	= "rej",
 		.data		= &sysctl_llc2_rej_timeout,
-		.maxlen		= sizeof(long),
+		.maxlen		= sizeof(sysctl_llc2_rej_timeout),
 		.mode		= 0644,
 		.proc_handler   = proc_dointvec_jiffies,
 		.strategy       = sysctl_jiffies,
-- 
1.7.12.2.21.g234cd45.dirty
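
For illustration only, and not part of the patch: a user-space sketch
of why taking sizeof() of the variable itself is the safer idiom.  The
sketch_* names and the reduced table entry are assumptions.

```c
#include <stddef.h>

/* Hypothetical, reduced table entry: where the data is and how long. */
struct sketch_ctl_entry {
	const char *procname;
	void *data;
	size_t maxlen;
};

static int sketch_ack_timeout = 1;

/* maxlen is derived from the object, so it cannot drift from its type. */
static const struct sketch_ctl_entry sketch_entry = {
	.procname = "ack",
	.data     = &sketch_ack_timeout,
	.maxlen   = sizeof(sketch_ack_timeout),
};

/* Bytes beyond the object that a handler trusting maxlen would touch. */
static size_t sketch_overread(size_t maxlen, size_t object_size)
{
	return maxlen > object_size ? maxlen - object_size : 0;
}
```

On a platform where long is wider than int, sketch_overread(sizeof(long),
sizeof(int)) is non-zero, which corresponds to the leak described in the
commit message; with the entry above it is always zero.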





* [ 23/48] net: rds: use correct size for max unacked packets and bytes
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (22 preceding siblings ...)
  2015-05-15  8:05 ` [ 22/48] net: llc: use correct size for sysctl timeout entries Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 24/48] ipv6: Dont reduce hop limit for an interface Willy Tarreau
                   ` (23 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Sasha Levin, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sasha Levin <sasha.levin@oracle.com>

commit db27ebb111e9f69efece08e4cb6a34ff980f8896 upstream.

Max unacked packets/bytes is an int while sizeof(long) was used in the
sysctl table.

This means that when they were getting read we'd also leak kernel memory
to userspace along with the configured values.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 3760b67b3e419b9ac42a45417491360a14a35357)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/rds/sysctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rds/sysctl.c b/net/rds/sysctl.c
index 307dc5c..870e808 100644
--- a/net/rds/sysctl.c
+++ b/net/rds/sysctl.c
@@ -74,7 +74,7 @@ static ctl_table rds_sysctl_rds_table[] = {
 		.ctl_name	= CTL_UNNUMBERED,
 		.procname	= "max_unacked_packets",
 		.data		= &rds_sysctl_max_unacked_packets,
-		.maxlen         = sizeof(unsigned long),
+		.maxlen         = sizeof(int),
 		.mode           = 0644,
 		.proc_handler   = &proc_dointvec,
 	},
@@ -82,7 +82,7 @@ static ctl_table rds_sysctl_rds_table[] = {
 		.ctl_name	= CTL_UNNUMBERED,
 		.procname	= "max_unacked_bytes",
 		.data		= &rds_sysctl_max_unacked_bytes,
-		.maxlen         = sizeof(unsigned long),
+		.maxlen         = sizeof(int),
 		.mode           = 0644,
 		.proc_handler   = &proc_dointvec,
 	},
-- 
1.7.12.2.21.g234cd45.dirty





* [ 24/48] ipv6: Dont reduce hop limit for an interface
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (23 preceding siblings ...)
  2015-05-15  8:05 ` [ 23/48] net: rds: use correct size for max unacked packets and bytes Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 25/48] fs: take i_mutex during prepare_binprm for set[ug]id executables Willy Tarreau
                   ` (22 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: D.S. Ljungmark, Hannes Frederic Sowa, David S. Miller,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "D.S. Ljungmark" <ljungmark@modio.se>

commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a upstream.

A local route may have a lower hop_limit set than global routes do.

RFC 3756, Section 4.2.7, "Parameter Spoofing"

>   1.  The attacker includes a Current Hop Limit of one or another small
>       number which the attacker knows will cause legitimate packets to
>       be dropped before they reach their destination.

>   As an example, one possible approach to mitigate this threat is to
>   ignore very small hop limits.  The nodes could implement a
>   configurable minimum hop limit, and ignore attempts to set it below
>   said limit.

Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust ND_PRINTK() usage]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit f10f7d2a8200fe33c5030c7e32df3a2b3561f3cd)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv6/ndisc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 752da21..3b77bed 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1244,7 +1244,14 @@ static void ndisc_router_discovery(struct sk_buff *skb)
 		rt->rt6i_expires = jiffies + (HZ * lifetime);
 
 	if (ra_msg->icmph.icmp6_hop_limit) {
-		in6_dev->cnf.hop_limit = ra_msg->icmph.icmp6_hop_limit;
+		/* Only set hop_limit on the interface if it is higher than
+		 * the current hop_limit.
+		 */
+		if (in6_dev->cnf.hop_limit < ra_msg->icmph.icmp6_hop_limit) {
+			in6_dev->cnf.hop_limit = ra_msg->icmph.icmp6_hop_limit;
+		} else {
+			ND_PRINTK2(KERN_WARNING "RA: Got route advertisement with lower hop_limit than current\n");
+		}
 		if (rt)
 			rt->u.dst.metrics[RTAX_HOPLIMIT-1] = ra_msg->icmph.icmp6_hop_limit;
 	}
-- 
1.7.12.2.21.g234cd45.dirty
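
For illustration only, and not part of the patch: the "only raise,
never lower" rule for the interface hop limit, sketched in user space.
The sketch_* names are hypothetical, and the per-route metric that the
patch still updates is left out.

```c
#include <stdint.h>

/* Hypothetical, reduced per-interface configuration. */
struct sketch_ifconf {
	int hop_limit;
};

/*
 * Apply the hop limit from a router advertisement.  A value of 0 means
 * "unspecified" and is ignored; lower or equal values are ignored too.
 * Returns 1 if the interface value was raised, 0 otherwise.
 */
static int sketch_apply_ra_hop_limit(struct sketch_ifconf *cnf,
				     uint8_t ra_hop_limit)
{
	if (ra_hop_limit == 0)
		return 0;
	if (cnf->hop_limit < ra_hop_limit) {
		cnf->hop_limit = ra_hop_limit;
		return 1;
	}
	return 0;
}
```

An advertisement carrying a hop limit of 1 therefore leaves an
interface configured with 64 untouched, which is the spoofing case
quoted from RFC 3756.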





* [ 25/48] fs: take i_mutex during prepare_binprm for set[ug]id executables
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (24 preceding siblings ...)
  2015-05-15  8:05 ` [ 24/48] ipv6: Dont reduce hop limit for an interface Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 26/48] net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland Willy Tarreau
                   ` (21 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Jann Horn, Linus Torvalds, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jann Horn <jann@thejh.net>

commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream.

This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
 - Drop the task_no_new_privs() and user namespace checks
 - Open-code file_inode()
 - s/READ_ONCE/ACCESS_ONCE/
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 470e517be17dd6ef8670bec7bd7831ea0d3ad8a6)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/exec.c | 65 +++++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 40 insertions(+), 25 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index c32ae34..8dc1270 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1181,6 +1181,45 @@ int check_unsafe_exec(struct linux_binprm *bprm)
 	return res;
 }
 
+static void bprm_fill_uid(struct linux_binprm *bprm)
+{
+	struct inode *inode;
+	unsigned int mode;
+	uid_t uid;
+	gid_t gid;
+
+	/* clear any previous set[ug]id data from a previous binary */
+	bprm->cred->euid = current_euid();
+	bprm->cred->egid = current_egid();
+
+	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+		return;
+
+	inode = bprm->file->f_path.dentry->d_inode;
+	mode = ACCESS_ONCE(inode->i_mode);
+	if (!(mode & (S_ISUID|S_ISGID)))
+		return;
+
+	/* Be careful if suid/sgid is set */
+	mutex_lock(&inode->i_mutex);
+
+	/* reload atomically mode/uid/gid now that lock held */
+	mode = inode->i_mode;
+	uid = inode->i_uid;
+	gid = inode->i_gid;
+	mutex_unlock(&inode->i_mutex);
+
+	if (mode & S_ISUID) {
+		bprm->per_clear |= PER_CLEAR_ON_SETID;
+		bprm->cred->euid = uid;
+	}
+
+	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
+		bprm->per_clear |= PER_CLEAR_ON_SETID;
+		bprm->cred->egid = gid;
+	}
+}
+
 /* 
  * Fill the binprm structure from the inode. 
  * Check permissions, then read the first 128 (BINPRM_BUF_SIZE) bytes
@@ -1189,36 +1228,12 @@ int check_unsafe_exec(struct linux_binprm *bprm)
  */
 int prepare_binprm(struct linux_binprm *bprm)
 {
-	umode_t mode;
-	struct inode * inode = bprm->file->f_path.dentry->d_inode;
 	int retval;
 
-	mode = inode->i_mode;
 	if (bprm->file->f_op == NULL)
 		return -EACCES;
 
-	/* clear any previous set[ug]id data from a previous binary */
-	bprm->cred->euid = current_euid();
-	bprm->cred->egid = current_egid();
-
-	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
-		/* Set-uid? */
-		if (mode & S_ISUID) {
-			bprm->per_clear |= PER_CLEAR_ON_SETID;
-			bprm->cred->euid = inode->i_uid;
-		}
-
-		/* Set-gid? */
-		/*
-		 * If setgid is set but no group execute bit then this
-		 * is a candidate for mandatory locking, not a setgid
-		 * executable.
-		 */
-		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
-			bprm->per_clear |= PER_CLEAR_ON_SETID;
-			bprm->cred->egid = inode->i_gid;
-		}
-	}
+	bprm_fill_uid(bprm);
 
 	/* fill in binprm security blob */
 	retval = security_bprm_set_creds(bprm);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 26/48] net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland.
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (25 preceding siblings ...)
  2015-05-15  8:05 ` [ 25/48] fs: take i_mutex during prepare_binprm for set[ug]id executables Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15 21:08   ` Ben Hutchings
  2015-05-15  8:05 ` [ 27/48] ppp: deflate: never return len larger than output buffer Willy Tarreau
                   ` (20 subsequent siblings)
  47 siblings, 1 reply; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC
  To: linux-kernel, stable
  Cc: Ani Sinha, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Ani Sinha <ani@arista.com>

commit 6a2a2b3ae0759843b22c929881cc184b00cc63ff upstream.

The Linux man page for the recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when
msg_name is passed as NULL. When developers don't set the msg_namelen member in msghdr, it might contain a garbage
value which will fail the validation check, and the sendmsg and recvmsg calls will return EINVAL. This will
break old binaries and any code for which there is no access to source code.
To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland.

Signed-off-by: Ani Sinha <ani@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit d29f1f53e5299e0bbb3e33ef8d35ed657fa633b6)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/socket.c | 3 +++
 1 file changed, 3 insertions(+)
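
As a rough userspace illustration of the check order described above, the
following sketch normalizes the length before validating it. It is only a
sketch: struct sketch_msghdr and sketch_check_msghdr() are hypothetical, and
the length is a plain int here so that a garbage negative value can be shown.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced copy of the two msghdr fields that matter here. */
struct sketch_msghdr {
	void *msg_name;
	int msg_namelen;
};

/*
 * Normalize first, validate second: a NULL name makes any length
 * meaningless, so it is forced to 0 before the sign check runs.
 */
int sketch_check_msghdr(struct sketch_msghdr *kmsg)
{
	if (kmsg->msg_name == NULL)
		kmsg->msg_namelen = 0;

	if (kmsg->msg_namelen < 0)
		return -EINVAL;

	return 0;
}
```

With this ordering an uninitialized msg_namelen no longer matters when no
address is supplied, while a negative length next to a real address is still
rejected.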

diff --git a/net/socket.c b/net/socket.c
index 19671d8..a838a67 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1872,6 +1872,9 @@ static int copy_msghdr_from_user(struct msghdr *kmsg,
 	if (copy_from_user(kmsg, umsg, sizeof(struct msghdr)))
 		return -EFAULT;
 
+	if (kmsg->msg_name == NULL)
+		kmsg->msg_namelen = 0;
+
 	if (kmsg->msg_namelen < 0)
 		return -EINVAL;
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 27/48] ppp: deflate: never return len larger than output buffer
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (26 preceding siblings ...)
  2015-05-15  8:05 ` [ 26/48] net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:05 ` [ 29/48] net: reject creation of netdev names with colons Willy Tarreau
                   ` (19 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC
  To: linux-kernel, stable
  Cc: Iain Douglas, Florian Westphal, David S. Miller, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Florian Westphal <fw@strlen.de>

[ Upstream commit e2a4800e75780ccf4e6c2487f82b688ba736eb18 ]

When we've run out of space in the output buffer to store more data, we
will call zlib_deflate with a NULL output buffer until we've consumed
remaining input.

When this happens, olen contains the size the output buffer would have
consumed iff we'd have had enough room.

This can later cause skb_over_panic when ppp_generic skb_put()s
the returned length.

Reported-by: Iain Douglas <centos@1n6.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 8bcd64423836bad3638684677f6d740bc7c9297f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/ppp_deflate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
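
The tightened condition can be shown in isolation with a small sketch. The
helper name sketch_accept_compressed() is hypothetical, and the sketch
assumes that a return value of 0 tells the caller to send the packet
uncompressed.

```c
/*
 * Return the compressed length only when compression actually helped
 * (olen < isize) AND the result fits in the output buffer
 * (olen <= osize). Otherwise return 0 so the caller never does an
 * skb_put() with a length larger than the buffer it allocated.
 */
int sketch_accept_compressed(int olen, int isize, int osize)
{
	if (olen < isize && olen <= osize)
		return olen;

	return 0;
}
```

For example, with isize = 200 and osize = 150, an olen of 180 is smaller
than the input but would still overrun the output buffer, so it is refused.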

diff --git a/drivers/net/ppp_deflate.c b/drivers/net/ppp_deflate.c
index 034c1c6..09a4382 100644
--- a/drivers/net/ppp_deflate.c
+++ b/drivers/net/ppp_deflate.c
@@ -269,7 +269,7 @@ static int z_compress(void *arg, unsigned char *rptr, unsigned char *obuf,
 	/*
 	 * See if we managed to reduce the size of the packet.
 	 */
-	if (olen < isize) {
+	if (olen < isize && olen <= osize) {
 		state->stats.comp_bytes += olen;
 		state->stats.comp_packets++;
 	} else {
-- 
1.7.12.2.21.g234cd45.dirty





* [ 29/48] net: reject creation of netdev names with colons
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (27 preceding siblings ...)
  2015-05-15  8:05 ` [ 27/48] ppp: deflate: never return len larger than output buffer Willy Tarreau
@ 2015-05-15  8:05 ` Willy Tarreau
  2015-05-15  8:06 ` [ 30/48] ipv4: Dont use ufo handling on later transformed packets Willy Tarreau
                   ` (18 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:05 UTC
  To: linux-kernel, stable
  Cc: Matthew Thode, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Matthew Thode <mthode@mthode.org>

[ Upstream commit a4176a9391868bfa87705bcd2e3b49e9b9dd2996 ]

Colons are used as a separator in netdev device lookup in dev_ioctl.c.

The specific ioctls affected are SIOCGIFTXQLEN, SIOCETHTOOL and SIOCSIFNAME.

Signed-off-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit d501ebeb7da7531e92e3c8d194730341c314ff2d)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
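
A userspace approximation of the resulting name check is sketched below.
The character loop follows the hunk in this patch; the leading checks
(empty name, length limit, "." and "..") and the constant SKETCH_IFNAMSIZ
are assumptions added to make the sketch complete, and
sketch_dev_valid_name() is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

#define SKETCH_IFNAMSIZ 16	/* assumed to match the kernel's IFNAMSIZ */

/* Return 1 if the name is acceptable for a network device, 0 otherwise. */
int sketch_dev_valid_name(const char *name)
{
	/* Assumed pre-checks, not shown in the hunk of this patch. */
	if (*name == '\0')
		return 0;
	if (strlen(name) >= SKETCH_IFNAMSIZ)
		return 0;
	if (!strcmp(name, ".") || !strcmp(name, ".."))
		return 0;

	/* Reject path separators, alias separators and whitespace. */
	while (*name) {
		if (*name == '/' || *name == ':' ||
		    isspace((unsigned char)*name))
			return 0;
		name++;
	}
	return 1;
}
```

With this check a name such as "eth0" is accepted, while "eth0:1" is
refused because the colon would be parsed as an alias separator on lookup.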

diff --git a/net/core/dev.c b/net/core/dev.c
index d250444..0767b17 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -779,7 +779,7 @@ int dev_valid_name(const char *name)
 		return 0;
 
 	while (*name) {
-		if (*name == '/' || isspace(*name))
+		if (*name == '/' || *name == ':' || isspace(*name))
 			return 0;
 		name++;
 	}
-- 
1.7.12.2.21.g234cd45.dirty





* [ 30/48] ipv4: Dont use ufo handling on later transformed packets
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (28 preceding siblings ...)
  2015-05-15  8:05 ` [ 29/48] net: reject creation of netdev names with colons Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 31/48] udp: only allow UFO for packets from SOCK_DGRAM sockets Willy Tarreau
                   ` (17 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC
  To: linux-kernel, stable; +Cc: Steffen Klassert, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Steffen Klassert <steffen.klassert@secunet.com>

We might call ip_ufo_append_data() for packets that will be IPsec
transformed later. This function should be used just for real
udp packets. So we check for rt->dst.header_len which is only
nonzero on IPsec handling and call ip_ufo_append_data() just
if rt->dst.header_len is zero.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c146066ab80267c3305de5dda6a4083f06df9265)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/ip_output.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index faa6623..bd5c4b3 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -878,6 +878,7 @@ int ip_append_data(struct sock *sk,
 	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
 	    (sk->sk_protocol == IPPROTO_UDP) &&
 	    (rt->u.dst.dev->features & NETIF_F_UFO)) {
+	    (rt->u.dst.dev->features & NETIF_F_UFO) && !rt->u.dst.header_len) {
 		err = ip_ufo_append_data(sk, getfrag, from, length, hh_len,
 					 fragheaderlen, transhdrlen, mtu,
 					 flags);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 31/48] udp: only allow UFO for packets from SOCK_DGRAM sockets
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (29 preceding siblings ...)
  2015-05-15  8:06 ` [ 30/48] ipv4: Dont use ufo handling on later transformed packets Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 32/48] net: avoid to hang up on sending due to sysctl configuration overflow Willy Tarreau
                   ` (16 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC
  To: linux-kernel, stable
  Cc: Michal Kubecek, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Michal Kubeček <mkubecek@suse.cz>

[ Upstream commit acf8dd0a9d0b9e4cdb597c2f74802f79c699e802 ]

If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in virtio_net driver).

Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be too
intrusive a change just to handle a corner case like this. Therefore
allowing UFO only for packets from SOCK_DGRAM sockets seems to be the
best option.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 332640b2821f75381b1049a904d93d4fb846334f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/ip_output.c  | 4 ++--
 net/ipv6/ip6_output.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)
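
The full condition that now guards the UFO path on the IPv4 side can be
written as a standalone predicate. This is only a sketch: sketch_use_ufo()
and its parameter names are hypothetical, with the kernel fields flattened
into plain arguments.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Return 1 when the UFO append path should be taken:
 *  - the data does not fit in one MTU, or the skb already has fragments,
 *  - the protocol is UDP and the device advertises UFO,
 *  - no later transformation adds headers (header_len is zero),
 *  - the socket is a real datagram socket, not a raw one.
 */
int sketch_use_ufo(size_t length, size_t mtu, int has_frags, int protocol,
		   int dev_has_ufo, int header_len, int sock_type)
{
	return (length > mtu || has_frags) &&
	       protocol == IPPROTO_UDP &&
	       dev_has_ufo &&
	       header_len == 0 &&
	       sock_type == SOCK_DGRAM;
}
```

A raw socket sending an over-MTU UDP datagram now fails the last test and
falls back to the regular fragmentation path. The IPv6 hunk in this patch
adds only the socket type test, so the header_len argument applies to the
IPv4 condition.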

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index bd5c4b3..00d4d00 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -877,8 +877,8 @@ int ip_append_data(struct sock *sk,
 	inet->cork.length += length;
 	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
 	    (sk->sk_protocol == IPPROTO_UDP) &&
-	    (rt->u.dst.dev->features & NETIF_F_UFO)) {
-	    (rt->u.dst.dev->features & NETIF_F_UFO) && !rt->u.dst.header_len) {
+	    (rt->u.dst.dev->features & NETIF_F_UFO) && !rt->u.dst.header_len &&
+	    (sk->sk_type == SOCK_DGRAM)) {
 		err = ip_ufo_append_data(sk, getfrag, from, length, hh_len,
 					 fragheaderlen, transhdrlen, mtu,
 					 flags);
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6dff3d7..1934328 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1259,7 +1259,8 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 	if (((length > mtu) ||
 	     (skb && skb_has_frags(skb))) &&
 	    (sk->sk_protocol == IPPROTO_UDP) &&
-	    (rt->u.dst.dev->features & NETIF_F_UFO)) {
+	    (rt->u.dst.dev->features & NETIF_F_UFO) &&
+	    (sk->sk_type == SOCK_DGRAM)) {
 		err = ip6_ufo_append_data(sk, getfrag, from, length,
 					  hh_len, fragheaderlen,
 					  transhdrlen, mtu, flags, rt);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 32/48] net: avoid to hang up on sending due to sysctl configuration overflow.
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (30 preceding siblings ...)
  2015-05-15  8:06 ` [ 31/48] udp: only allow UFO for packets from SOCK_DGRAM sockets Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 33/48] net: sysctl_net_core: check SNDBUF and RCVBUF for min length Willy Tarreau
                   ` (15 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC
  To: linux-kernel, stable
  Cc: Eric Dumazet, Li Yu, David S. Miller, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "bingtian.ly@taobao.com" <bingtian.ly@taobao.com>

commit cdda88912d62f9603d27433338a18be83ef23ac1 upstream.

    I found that if we write a value larger than 4GB to some sysctl
variables, the sending syscall will hang forever, because these
variables are 32 bits wide and such large values make them overflow to 0
or to a negative value.

    This patch tries to fix the overflow and to prevent a zero value from
being set for the sysctl variables below:

net.core.wmem_default
net.core.rmem_default

net.core.rmem_max
net.core.wmem_max

net.ipv4.udp_rmem_min
net.ipv4.udp_wmem_min

net.ipv4.tcp_wmem
net.ipv4.tcp_rmem

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Li Yu <raise.sail@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Adjust context
 - Delete now-unused 'zero' variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 98eee187cdee2807bd80e6c02180c5c2abae6453)
[wt: backported to 2.6.32: set strategy to sysctl_intvec where relevant]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/sysctl_net_core.c | 18 ++++++++++++++----
 net/ipv4/sysctl_net_ipv4.c | 13 +++++++++----
 2 files changed, 23 insertions(+), 8 deletions(-)
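
The overflow described above, and the effect of a lower bound, can be
illustrated with a small sketch. Both helpers are hypothetical and are not
the kernel's sysctl handlers; sketch_store_minmax() only mimics the idea of
refusing a value outside [min, max] instead of storing it.

```c
#include <errno.h>
#include <stdint.h>

/* What is left of a 64-bit value once it is squeezed into 32 bits. */
uint32_t sketch_low32(uint64_t value)
{
	return (uint32_t)value;
}

/*
 * Store the value only if it lies in [min, max]. The old content of
 * *dst is kept when the value is refused.
 */
int sketch_store_minmax(int *dst, long long value, int min, int max)
{
	if (value < min || value > max)
		return -EINVAL;

	*dst = (int)value;
	return 0;
}
```

Writing exactly 4GB (1 << 32) leaves 0 in the low 32 bits, which is the
value that makes senders wait forever; with a minimum of 1 the bounded
store refuses it and keeps the previous setting.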

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index e6bf72c..a600328 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -17,6 +17,8 @@
 static int zero = 0;
 static int ushort_max = 65535;
 
+static int one = 1;
+
 static struct ctl_table net_core_table[] = {
 #ifdef CONFIG_NET
 	{
@@ -25,7 +27,9 @@ static struct ctl_table net_core_table[] = {
 		.data		= &sysctl_wmem_max,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.strategy	= sysctl_intvec,
+		.extra1		= &one,
 	},
 	{
 		.ctl_name	= NET_CORE_RMEM_MAX,
@@ -33,7 +37,9 @@ static struct ctl_table net_core_table[] = {
 		.data		= &sysctl_rmem_max,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.strategy	= sysctl_intvec,
+		.extra1		= &one,
 	},
 	{
 		.ctl_name	= NET_CORE_WMEM_DEFAULT,
@@ -41,7 +47,9 @@ static struct ctl_table net_core_table[] = {
 		.data		= &sysctl_wmem_default,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.strategy	= sysctl_intvec,
+		.extra1		= &one,
 	},
 	{
 		.ctl_name	= NET_CORE_RMEM_DEFAULT,
@@ -49,7 +57,9 @@ static struct ctl_table net_core_table[] = {
 		.data		= &sysctl_rmem_default,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.strategy	= sysctl_intvec,
+		.extra1		= &one,
 	},
 	{
 		.ctl_name	= NET_CORE_DEV_WEIGHT,
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index d957371..d1a8883 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -22,6 +22,7 @@
 #include <net/inet_frag.h>
 
 static int zero;
+static int one = 1;
 static int tcp_retr1_max = 255;
 static int tcp_syn_retries_min = 1;
 static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
@@ -521,7 +522,9 @@ static struct ctl_table ipv4_table[] = {
 		.data		= &sysctl_tcp_wmem,
 		.maxlen		= sizeof(sysctl_tcp_wmem),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.strategy	= sysctl_intvec,
+		.extra1		= &one,
 	},
 	{
 		.ctl_name	= NET_TCP_RMEM,
@@ -529,7 +532,9 @@ static struct ctl_table ipv4_table[] = {
 		.data		= &sysctl_tcp_rmem,
 		.maxlen		= sizeof(sysctl_tcp_rmem),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.strategy	= sysctl_intvec,
+		.extra1		= &one,
 	},
 	{
 		.ctl_name	= NET_TCP_APP_WIN,
@@ -735,7 +740,7 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.strategy	= sysctl_intvec,
-		.extra1		= &zero
+		.extra1		= &one
 	},
 	{
 		.ctl_name	= CTL_UNNUMBERED,
@@ -745,7 +750,7 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.strategy	= sysctl_intvec,
-		.extra1		= &zero
+		.extra1		= &one
 	},
 	{ .ctl_name = 0 }
 };
-- 
1.7.12.2.21.g234cd45.dirty





* [ 33/48] net: sysctl_net_core: check SNDBUF and RCVBUF for min length
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (31 preceding siblings ...)
  2015-05-15  8:06 ` [ 32/48] net: avoid to hang up on sending due to sysctl configuration overflow Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 34/48] rds: avoid potential stack overflow Willy Tarreau
                   ` (14 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC
  To: linux-kernel, stable
  Cc: Alexey Kodanev, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Alexey Kodanev <alexey.kodanev@oracle.com>

[ Upstream commit b1cb59cf2efe7971d3d72a7b963d09a512d994c9 ]

sysctl has net.core.rmem_*/wmem_* parameters which can be
set to incorrect values. Given that 'struct sk_buff' allocates from
rcvbuf, an incorrectly set buffer length could result in memory
allocation failures. For example, set them as follows:

    # sysctl net.core.rmem_default=64
      net.core.rmem_default = 64
    # sysctl net.core.wmem_default=64
      net.core.wmem_default = 64
    # ping localhost -s 1024 -i 0 > /dev/null

This could result in the following failure:

skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32
head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL>
kernel BUG at net/core/skbuff.c:102!
invalid opcode: 0000 [#1] SMP
...
task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000
RIP: 0010:[<ffffffff8155fbd1>]  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
RSP: 0018:ffff88003ae8bc68  EFLAGS: 00010296
RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000
RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8
RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900
FS:  00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0
Stack:
 ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d
 ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550
 ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00
Call Trace:
 [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470
 [<ffffffff81556f56>] sock_write_iter+0x146/0x160
 [<ffffffff811d9612>] new_sync_write+0x92/0xd0
 [<ffffffff811d9cd6>] vfs_write+0xd6/0x180
 [<ffffffff811da499>] SyS_write+0x59/0xd0
 [<ffffffff81651532>] system_call_fastpath+0x12/0x17
Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00
      00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b
      eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
RIP  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
RSP <ffff88003ae8bc68>
Kernel panic - not syncing: Fatal exception

Moreover, the possible minimum is 1, so we can get another kernel panic:
...
BUG: unable to handle kernel paging request at ffff88013caee5c0
IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0
...

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: delete now-unused 'one' variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 2d6dfb109bfbf3abd5f762173b1d73fd321dbe37)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/sysctl_net_core.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index a600328..d0a07c2 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -17,7 +17,8 @@
 static int zero = 0;
 static int ushort_max = 65535;
 
-static int one = 1;
+static int min_sndbuf = SOCK_MIN_SNDBUF;
+static int min_rcvbuf = SOCK_MIN_RCVBUF;
 
 static struct ctl_table net_core_table[] = {
 #ifdef CONFIG_NET
@@ -29,7 +30,7 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.strategy	= sysctl_intvec,
-		.extra1		= &one,
+		.extra1		= &min_sndbuf,
 	},
 	{
 		.ctl_name	= NET_CORE_RMEM_MAX,
@@ -39,7 +40,7 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.strategy	= sysctl_intvec,
-		.extra1		= &one,
+		.extra1		= &min_rcvbuf,
 	},
 	{
 		.ctl_name	= NET_CORE_WMEM_DEFAULT,
@@ -49,7 +50,7 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.strategy	= sysctl_intvec,
-		.extra1		= &one,
+		.extra1		= &min_sndbuf,
 	},
 	{
 		.ctl_name	= NET_CORE_RMEM_DEFAULT,
@@ -59,7 +60,7 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.strategy	= sysctl_intvec,
-		.extra1		= &one,
+		.extra1		= &min_rcvbuf,
 	},
 	{
 		.ctl_name	= NET_CORE_DEV_WEIGHT,
-- 
1.7.12.2.21.g234cd45.dirty





* [ 34/48] rds: avoid potential stack overflow
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (32 preceding siblings ...)
  2015-05-15  8:06 ` [ 33/48] net: sysctl_net_core: check SNDBUF and RCVBUF for min length Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 35/48] rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg() Willy Tarreau
                   ` (13 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC
  To: linux-kernel, stable
  Cc: Arnd Bergmann, Sowmini Varadhan, David S. Miller, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Arnd Bergmann <arnd@arndb.de>

[ Upstream commit f862e07cf95d5b62a5fc5e981dd7d0dbaf33a501 ]

The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
on the stack in order to pass a pair of addresses. This happens to just
fit within the 1024 byte stack size warning limit on x86, but just
exceeds that limit on ARM, which gives us this warning:

net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]

As the use of this large variable is basically bogus, we can rearrange
the code to not do that. Instead of passing an rds socket into
rds_iw_get_device, we now just pass the two addresses that we have
available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
to create two address structures on the stack there.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 3fe2d645fe4ea7ff6cba9020685e46c1a1dff9c0)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/rds/iw_rdma.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)
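
The reshaped interface (two small address structures instead of a whole
socket object) can be sketched in userspace as follows. The function name
sketch_tuple_matches() is hypothetical; it mirrors the full four-field
comparison from the WORKING_TUPLE_DETECTION branch of the patch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Compare a candidate connection (cand_src/cand_dst) against the wanted
 * source and destination. Only addresses and ports are needed, so the
 * caller passes two small sockaddr_in structures rather than building a
 * large socket object on the stack just to carry four fields.
 */
int sketch_tuple_matches(const struct sockaddr_in *cand_src,
			 const struct sockaddr_in *cand_dst,
			 const struct sockaddr_in *src,
			 const struct sockaddr_in *dst)
{
	return cand_src->sin_addr.s_addr == src->sin_addr.s_addr &&
	       cand_src->sin_port == src->sin_port &&
	       cand_dst->sin_addr.s_addr == dst->sin_addr.s_addr &&
	       cand_dst->sin_port == dst->sin_port;
}
```

A caller that already holds socket state fills the two structures with
designated initializers, as rds_iw_get_mr() does in the patch, while a
caller that already has sockaddr_in pointers passes them through unchanged.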

diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index de4a1b1..6ed9cdd 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -86,7 +86,9 @@ static unsigned int rds_iw_unmap_fastreg_list(struct rds_iw_mr_pool *pool,
 			struct list_head *kill_list);
 static void rds_iw_destroy_fastreg(struct rds_iw_mr_pool *pool, struct rds_iw_mr *ibmr);
 
-static int rds_iw_get_device(struct rds_sock *rs, struct rds_iw_device **rds_iwdev, struct rdma_cm_id **cm_id)
+static int rds_iw_get_device(struct sockaddr_in *src, struct sockaddr_in *dst,
+			     struct rds_iw_device **rds_iwdev,
+			     struct rdma_cm_id **cm_id)
 {
 	struct rds_iw_device *iwdev;
 	struct rds_iw_cm_id *i_cm_id;
@@ -110,15 +112,15 @@ static int rds_iw_get_device(struct rds_sock *rs, struct rds_iw_device **rds_iwd
 				src_addr->sin_port,
 				dst_addr->sin_addr.s_addr,
 				dst_addr->sin_port,
-				rs->rs_bound_addr,
-				rs->rs_bound_port,
-				rs->rs_conn_addr,
-				rs->rs_conn_port);
+				src->sin_addr.s_addr,
+				src->sin_port,
+				dst->sin_addr.s_addr,
+				dst->sin_port);
 #ifdef WORKING_TUPLE_DETECTION
-			if (src_addr->sin_addr.s_addr == rs->rs_bound_addr &&
-			    src_addr->sin_port == rs->rs_bound_port &&
-			    dst_addr->sin_addr.s_addr == rs->rs_conn_addr &&
-			    dst_addr->sin_port == rs->rs_conn_port) {
+			if (src_addr->sin_addr.s_addr == src->sin_addr.s_addr &&
+			    src_addr->sin_port == src->sin_port &&
+			    dst_addr->sin_addr.s_addr == dst->sin_addr.s_addr &&
+			    dst_addr->sin_port == dst->sin_port) {
 #else
 			/* FIXME - needs to compare the local and remote
 			 * ipaddr/port tuple, but the ipaddr is the only
@@ -126,7 +128,7 @@ static int rds_iw_get_device(struct rds_sock *rs, struct rds_iw_device **rds_iwd
 			 * zero'ed.  It doesn't appear to be properly populated
 			 * during connection setup...
 			 */
-			if (src_addr->sin_addr.s_addr == rs->rs_bound_addr) {
+			if (src_addr->sin_addr.s_addr == src->sin_addr.s_addr) {
 #endif
 				spin_unlock_irq(&iwdev->spinlock);
 				*rds_iwdev = iwdev;
@@ -177,19 +179,13 @@ int rds_iw_update_cm_id(struct rds_iw_device *rds_iwdev, struct rdma_cm_id *cm_i
 {
 	struct sockaddr_in *src_addr, *dst_addr;
 	struct rds_iw_device *rds_iwdev_old;
-	struct rds_sock rs;
 	struct rdma_cm_id *pcm_id;
 	int rc;
 
 	src_addr = (struct sockaddr_in *)&cm_id->route.addr.src_addr;
 	dst_addr = (struct sockaddr_in *)&cm_id->route.addr.dst_addr;
 
-	rs.rs_bound_addr = src_addr->sin_addr.s_addr;
-	rs.rs_bound_port = src_addr->sin_port;
-	rs.rs_conn_addr = dst_addr->sin_addr.s_addr;
-	rs.rs_conn_port = dst_addr->sin_port;
-
-	rc = rds_iw_get_device(&rs, &rds_iwdev_old, &pcm_id);
+	rc = rds_iw_get_device(src_addr, dst_addr, &rds_iwdev_old, &pcm_id);
 	if (rc)
 		rds_iw_remove_cm_id(rds_iwdev, cm_id);
 
@@ -609,9 +605,17 @@ void *rds_iw_get_mr(struct scatterlist *sg, unsigned long nents,
 	struct rds_iw_device *rds_iwdev;
 	struct rds_iw_mr *ibmr = NULL;
 	struct rdma_cm_id *cm_id;
+	struct sockaddr_in src = {
+		.sin_addr.s_addr = rs->rs_bound_addr,
+		.sin_port = rs->rs_bound_port,
+	};
+	struct sockaddr_in dst = {
+		.sin_addr.s_addr = rs->rs_conn_addr,
+		.sin_port = rs->rs_conn_port,
+	};
 	int ret;
 
-	ret = rds_iw_get_device(rs, &rds_iwdev, &cm_id);
+	ret = rds_iw_get_device(&src, &dst, &rds_iwdev, &cm_id);
 	if (ret || !cm_id) {
 		ret = -ENODEV;
 		goto out;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 35/48] rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (33 preceding siblings ...)
  2015-05-15  8:06 ` [ 34/48] rds: avoid potential stack overflow Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 36/48] tcp: make connect() mem charging friendly Willy Tarreau
                   ` (12 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC
  To: linux-kernel, stable
  Cc: Al Viro, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@ZenIV.linux.org.uk>

[ Upstream commit 7d985ed1dca5c90535d67ce92ef6ca520302340a ]

[I would really like an ACK on that one from dhowells; it appears to be
quite straightforward, but...]

MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as a matter of
fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit
in there.  It gets passed via flags; in fact, another such check in the same
function is done correctly - as flags & MSG_PEEK.

It had been that way (effectively disabled) for 8 years, though, so the patch
needs beating up - that case had never been tested.  If it is correct, it's
-stable fodder.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 10c82cd7d46e4c525b046c399fcd285ce138198e)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/rxrpc/ar-recvmsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
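
The corrected test can be isolated in a small sketch. The helper
sketch_stop_waiting() is hypothetical; it takes the flags argument of the
receive call, which is where MSG_PEEK actually arrives.

```c
#include <stddef.h>
#include <sys/socket.h>

/*
 * Decide whether to return to the caller once the queue is empty:
 * only if something was already copied, and either the caller is
 * peeking or no time is left to wait for more data.
 */
int sketch_stop_waiting(size_t copied, int flags, long timeo)
{
	return copied && ((flags & MSG_PEEK) || timeo == 0);
}
```

Before the fix the equivalent test looked at msg->msg_flags, where the bit
is never set, so the MSG_PEEK half of the condition could not trigger.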

diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index d5630d9..b6076b2 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -86,7 +86,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 		if (!skb) {
 			/* nothing remains on the queue */
 			if (copied &&
-			    (msg->msg_flags & MSG_PEEK || timeo == 0))
+			    (flags & MSG_PEEK || timeo == 0))
 				goto out;
 
 			/* wait for a message to turn up */
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 36/48] tcp: make connect() mem charging friendly
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (34 preceding siblings ...)
  2015-05-15  8:06 ` [ 35/48] rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg() Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 37/48] ip_forward: Drop frames with attached skb->sk Willy Tarreau
                   ` (11 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Denys Fedoryshchenko, Eric Dumazet, Yuchung Cheng,
	David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 355a901e6cf1b2b763ec85caa2a9f04fbcc4ab4a ]

While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for the SYN packet (and/or the SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.

We can fix this by calling the regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data().

Then, tcp_send_syn_data() can avoid copying syn_data, as we can simply
manipulate syn_data->cb[] to remove the SYN flag (and increment seq).

Instead of open coding memcpy_fromiovecend(), simply use this helper.

This leaves clean fast clone skbs in the socket write queue.

This was tested against our fastopen packetdrill tests.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Drop the Fast Open changes
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 3e2eb8946907b2d53eb906e13e01d273c6534f5c)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_output.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0fc0a73..9e7fc38 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2378,13 +2378,10 @@ int tcp_connect(struct sock *sk)
 
 	tcp_connect_init(sk);
 
-	buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
-	if (unlikely(buff == NULL))
+	buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+	if (unlikely(!buff))
 		return -ENOBUFS;
 
-	/* Reserve space for headers. */
-	skb_reserve(buff, MAX_TCP_HEADER);
-
 	tp->snd_nxt = tp->write_seq;
 	tcp_init_nondata_skb(buff, tp->write_seq++, TCPCB_FLAG_SYN);
 	TCP_ECN_send_syn(sk, buff);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 37/48] ip_forward: Drop frames with attached skb->sk
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (35 preceding siblings ...)
  2015-05-15  8:06 ` [ 36/48] tcp: make connect() mem charging friendly Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 38/48] tcp: avoid looping in tcp_send_fin() Willy Tarreau
                   ` (10 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Sebastian Poehn, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sebastian Pöhn <sebastian.poehn@gmail.com>

[ Upstream commit 2ab957492d13bb819400ac29ae55911d50a82a13 ]

Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets

Forwarded frames should not have a socket attached. In particular,
tw sockets will lead to panics later on in the stack.

This was observed with TPROXY assigning a tw socket and broken
(misconfigured) policy routing. As a result, the frame enters the
forwarding path instead of the input path. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.

v2:
Remove useless comment

Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit fccb908d23fbae3e941e9294590dd94de6b1d822)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/ip_forward.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index a2991bc..6be434085 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -56,6 +56,9 @@ int ip_forward(struct sk_buff *skb)
 	struct rtable *rt;	/* Route we use */
 	struct ip_options * opt	= &(IPCB(skb)->opt);

+	if (unlikely(skb->sk))
+		goto drop;
+
 	if (skb_warn_if_lro(skb))
 		goto drop;

-- 
1.7.12.2.21.g234cd45.dirty

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 38/48] tcp: avoid looping in tcp_send_fin()
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (36 preceding siblings ...)
  2015-05-15  8:06 ` [ 37/48] ip_forward: Drop frames with attached skb->sk Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 39/48] spi: spidev: fix possible arithmetic overflow for multi-transfer message Willy Tarreau
                   ` (9 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Eric Dumazet, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 845704a535e9b3c76448f52af1b70e4422ea03fd ]

The presence of an unbounded loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.

Let's try a different strategy:

In case of memory pressure, try to add the FIN flag to the last packet
in the write queue, even if the packet was already sent. The TCP stack will
be able to deliver this FIN after a timeout event. Note that since this
FIN is delivered by a retransmit, it also carries a Push flag
given our current implementation.

By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.

If we could not allocate a packet, even with a __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows us
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.
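
The decision order described above can be summarised as a small userspace
model.  This is a sketch only, not kernel code: enum fin_action and
choose_fin_action() are hypothetical names, and the four booleans stand in
for "the write queue has a tail skb", "tcp_send_head() is non-NULL",
"tcp_memory_pressure is set" and "the skb allocation succeeded".

```c
#include <assert.h>
#include <stdbool.h>

enum fin_action {
	FIN_COALESCED_UNSENT,	/* FIN added to a queued, not yet sent tail */
	FIN_COALESCED_SENT,	/* FIN added to a sent tail, goes out on retransmit */
	FIN_NEW_PACKET,		/* a dedicated FIN packet is allocated and queued */
	FIN_GIVEN_UP		/* nothing to piggyback on and no memory */
};

static enum fin_action choose_fin_action(bool have_tail, bool have_unsent,
					 bool memory_pressure, bool alloc_ok)
{
	/* Prefer tacking the FIN onto the tail of the write queue. */
	if (have_tail && (have_unsent || memory_pressure))
		return have_unsent ? FIN_COALESCED_UNSENT : FIN_COALESCED_SENT;

	/* Otherwise try exactly one allocation, with no retry loop. */
	if (alloc_ok)
		return FIN_NEW_PACKET;

	/* Allocation failed: fall back to the tail if there is one. */
	if (have_tail)
		return FIN_COALESCED_SENT;

	return FIN_GIVEN_UP;
}
```

The point of the model is that every path terminates after at most one
allocation attempt, which is what removes the yield() loop.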

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Drop inapplicable change to sk_forced_wmem_schedule()
 - s/sk_under_memory_pressure(sk)/tcp_memory_pressure/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 82241580d7734af2207ad0bb1720904f569dac3a)
[wt: backported to 2.6.32: s/TCPHDR_FIN/TCPCB_FLAG_FIN/]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_output.c | 45 ++++++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9e7fc38..5339f06 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2121,33 +2121,40 @@ begin_fwd:
 	}
 }
 
-/* Send a fin.  The caller locks the socket for us.  This cannot be
- * allowed to fail queueing a FIN frame under any circumstances.
+/* Send a FIN. The caller locks the socket for us.
+ * We should try to send a FIN packet really hard, but eventually give up.
  */
 void tcp_send_fin(struct sock *sk)
 {
+	struct sk_buff *skb, *tskb = tcp_write_queue_tail(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct sk_buff *skb = tcp_write_queue_tail(sk);
-	int mss_now;
 
-	/* Optimization, tack on the FIN if we have a queue of
-	 * unsent frames.  But be careful about outgoing SACKS
-	 * and IP options.
+	/* Optimization, tack on the FIN if we have one skb in write queue and
+	 * this skb was not yet sent, or we are under memory pressure.
+	 * Note: in the latter case, FIN packet will be sent after a timeout,
+	 * as TCP stack thinks it has already been transmitted.
 	 */
-	mss_now = tcp_current_mss(sk);
-
-	if (tcp_send_head(sk) != NULL) {
+	if (tskb && (tcp_send_head(sk) || tcp_memory_pressure)) {
+coalesce:
-		TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_FIN;
-		TCP_SKB_CB(skb)->end_seq++;
+		TCP_SKB_CB(tskb)->flags |= TCPCB_FLAG_FIN;
+		TCP_SKB_CB(tskb)->end_seq++;
 		tp->write_seq++;
+		if (!tcp_send_head(sk)) {
+			/* This means tskb was already sent.
+			 * Pretend we included the FIN on previous transmit.
+			 * We need to set tp->snd_nxt to the value it would have
+			 * if FIN had been sent. This is because retransmit path
+			 * does not change tp->snd_nxt.
+			 */
+			tp->snd_nxt++;
+			return;
+		}
 	} else {
-		/* Socket is locked, keep trying until memory is available. */
-		for (;;) {
-			skb = alloc_skb_fclone(MAX_TCP_HEADER,
-					       sk->sk_allocation);
-			if (skb)
-				break;
-			yield();
+		skb = alloc_skb_fclone(MAX_TCP_HEADER, sk->sk_allocation);
+		if (unlikely(!skb)) {
+			if (tskb)
+				goto coalesce;
+			return;
 		}
 
 		/* Reserve space for headers and prepare control bits. */
@@ -2157,7 +2164,7 @@ void tcp_send_fin(struct sock *sk)
 				     TCPCB_FLAG_ACK | TCPCB_FLAG_FIN);
 		tcp_queue_skb(sk, skb);
 	}
-	__tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_OFF);
+	__tcp_push_pending_frames(sk, tcp_current_mss(sk), TCP_NAGLE_OFF);
 }
 
 /* We get here when a process closes a file descriptor (either due to
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 39/48] spi: spidev: fix possible arithmetic overflow for multi-transfer message
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (37 preceding siblings ...)
  2015-05-15  8:06 ` [ 38/48] tcp: avoid looping in tcp_send_fin() Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 40/48] IB/core: Avoid leakage from kernel to user space Willy Tarreau
                   ` (8 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable; +Cc: Ian Abbott, Mark Brown, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Ian Abbott <abbotti@mev.co.uk>

commit f20fbaad7620af2df36a1f9d1c9ecf48ead5b747 upstream.

`spidev_message()` sums the lengths of the individual SPI transfers to
determine the overall SPI message length.  It restricts the total
length, returning an error if too long, but it does not check for
arithmetic overflow.  For example, if the SPI message consisted of two
transfers and the first has a length of 10 and the second has a length
of (__u32)(-1), the total length would be seen as 9, even though the
second transfer is actually very long.  If the second transfer specifies
a null `rx_buf` and a non-null `tx_buf`, the `copy_from_user()` could
overrun the spidev's pre-allocated tx buffer before it reaches an
invalid user memory address.  Fix it by checking that neither the total
nor the individual transfer lengths exceed the maximum allowed value.
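
The wrap-around in the example can be reproduced in a few lines of userspace
C.  This is a sketch under stated assumptions, not the driver code:
total_only_ok(), total_and_each_ok() and example_lens are hypothetical names,
and a bufsiz of 4096 is assumed purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two transfer lengths from the description: 10 and (__u32)(-1). */
static const uint32_t example_lens[2] = { 10u, UINT32_MAX };

/* Pre-patch shape: only the running total is compared with bufsiz. */
static int total_only_ok(const uint32_t *len, size_t n, uint32_t bufsiz)
{
	uint32_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		total += len[i];	/* unsigned add, wraps modulo 2^32 */
		if (total > bufsiz)
			return 0;
	}
	return 1;
}

/* Patched shape: each individual length is bounded as well. */
static int total_and_each_ok(const uint32_t *len, size_t n, uint32_t bufsiz)
{
	uint32_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		total += len[i];
		if (total > bufsiz || len[i] > bufsiz)
			return 0;
	}
	return 1;
}
```

With the example lengths the total wraps to 9, so the first check accepts the
message while the second rejects it because of the oversized second transfer.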

Thanks to Dan Carpenter for reporting the potential integer overflow.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
[Ian Abbott: Note: original commit compares the lengths to INT_MAX
 instead of bufsiz due to changes in earlier commits.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 7499401e4a0b01ee43cff768de4ca630dcd0bc64)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/spi/spidev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index 5d23983..4dd8e2a 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -241,7 +241,10 @@ static int spidev_message(struct spidev_data *spidev,
 		k_tmp->len = u_tmp->len;

 		total += k_tmp->len;
-		if (total > bufsiz) {
+		/* Check total length of transfers.  Also check each
+		 * transfer length to avoid arithmetic overflow.
+		 */
+		if (total > bufsiz || k_tmp->len > bufsiz) {
 			status = -EMSGSIZE;
 			goto done;
 		}
-- 
1.7.12.2.21.g234cd45.dirty

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 40/48] IB/core: Avoid leakage from kernel to user space
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (38 preceding siblings ...)
  2015-05-15  8:06 ` [ 39/48] spi: spidev: fix possible arithmetic overflow for multi-transfer message Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 41/48] ipvs: uninitialized data with IP_VS_IPV6 Willy Tarreau
                   ` (7 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Eli Cohen, Yann Droneaud, Roland Dreier, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eli Cohen <eli@dev.mellanox.co.il>

commit 377b513485fd885dea1083a9a5430df65b35e048 upstream.

Clear the reserved field of struct ib_uverbs_async_event_desc which is
copied to user space.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Yann Droneaud <ydroneaud@opteya.com>
(cherry picked from commit 852acc0151014ba9731e2f5f2f3df3b6a8960d40)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/infiniband/core/uverbs_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index aec0fbd..8da0037 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -433,6 +433,7 @@ static void ib_uverbs_async_handler(struct ib_uverbs_file *file,
 
 	entry->desc.async.element    = element;
 	entry->desc.async.event_type = event;
+	entry->desc.async.reserved   = 0;
 	entry->counter               = counter;
 
 	list_add_tail(&entry->list, &file->async_file->event_list);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 41/48] ipvs: uninitialized data with IP_VS_IPV6
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (39 preceding siblings ...)
  2015-05-15  8:06 ` [ 40/48] IB/core: Avoid leakage from kernel to user space Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 42/48] ipv4: fix nexthop attlen check in fib_nh_match Willy Tarreau
                   ` (6 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Dan Carpenter, Julian Anastasov, Simon Horman, Ben Hutchings,
	Pablo Neira Ayuso, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

commit 3b05ac3824ed9648c0d9c02d51d9b54e4e7e874f upstream.

The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.

The same issue is there in app_tcp_pkt_in().  Thanks to Julian Anastasov
for noticing that.
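
The pattern behind the fix, setting an out-parameter before any early return,
can be shown with a small userspace model.  This is only a sketch:
helper_old(), helper_new(), diff_seen_by_caller() and the STALE marker are
hypothetical, and STALE merely stands in for whatever the caller's variable
happened to contain.

```c
#include <assert.h>
#include <stdbool.h>

#define STALE 0x5a5a	/* stands in for leftover contents of the caller's variable */

/* Pre-patch shape: the IPv6 early return skips the write to *diff. */
static int helper_old(bool is_ipv6, int *diff)
{
	if (is_ipv6)
		return 1;	/* *diff is never written on this path */
	*diff = 0;
	return 1;
}

/* Patched shape: *diff is set before any early return. */
static int helper_new(bool is_ipv6, int *diff)
{
	*diff = 0;
	if (is_ipv6)
		return 1;
	return 1;		/* the real helper would do its work here */
}

/* What a caller that trusts *diff ends up reading. */
static int diff_seen_by_caller(int (*helper)(bool, int *), bool is_ipv6)
{
	int diff = STALE;

	helper(is_ipv6, &diff);
	return diff;
}
```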

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 0ce625baeec39e813bef9b073f0214b513b2ef2d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/netfilter/ipvs/ip_vs_ftp.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index 33e2c79..0e399c0 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -150,6 +150,8 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
 	unsigned buf_len;
 	int ret;
 
+	*diff = 0;
+
 #ifdef CONFIG_IP_VS_IPV6
 	/* This application helper doesn't work with IPv6 yet,
 	 * so turn this into a no-op for IPv6 packets
@@ -158,8 +160,6 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
 		return 1;
 #endif
 
-	*diff = 0;
-
 	/* Only useful for established sessions */
 	if (cp->state != IP_VS_TCP_S_ESTABLISHED)
 		return 1;
@@ -257,6 +257,9 @@ static int ip_vs_ftp_in(struct ip_vs_app *app, struct ip_vs_conn *cp,
 	__be16 port;
 	struct ip_vs_conn *n_cp;
 
+	/* no diff required for incoming packets */
+	*diff = 0;
+
 #ifdef CONFIG_IP_VS_IPV6
 	/* This application helper doesn't work with IPv6 yet,
 	 * so turn this into a no-op for IPv6 packets
@@ -265,9 +268,6 @@ static int ip_vs_ftp_in(struct ip_vs_app *app, struct ip_vs_conn *cp,
 		return 1;
 #endif
 
-	/* no diff required for incoming packets */
-	*diff = 0;
-
 	/* Only useful for established sessions */
 	if (cp->state != IP_VS_TCP_S_ESTABLISHED)
 		return 1;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 42/48] ipv4: fix nexthop attlen check in fib_nh_match
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (40 preceding siblings ...)
  2015-05-15  8:06 ` [ 41/48] ipvs: uninitialized data with IP_VS_IPV6 Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 43/48] pagemap: do not leak physical addresses to non-privileged userspace Willy Tarreau
                   ` (5 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Jiri Pirko, Eric Dumazet, David S. Miller, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Pirko <jiri@resnulli.us>

commit f76936d07c4eeb36d8dbb64ebd30ab46ff85d9f7 upstream.

fib_nh_match does not match nexthops correctly. Example:

ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
                          nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
                          nexthop via 192.168.122.15 dev eth0

The del command succeeds and the route is removed. With this patch
applied, the route is correctly matched and the result is:
RTNETLINK answers: No such process
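
Why the inverted comparison lets any gateway match can be seen in a toy
model.  This is a sketch under stated assumptions, not the kernel function:
struct toy_nexthop, toy_nh_match() and wrong_gw are hypothetical, and the
return convention (0 for a match, 1 for a mismatch) is assumed to mirror
fib_nh_match().

```c
#include <assert.h>
#include <stdint.h>

struct toy_nexthop {
	int attrlen;		/* bytes of attributes after the nexthop header */
	uint32_t gateway;	/* gateway carried in those attributes */
};

/* Request naming 192.168.122.14, which is not the installed gateway. */
static const struct toy_nexthop wrong_gw = {
	.attrlen = 8,
	.gateway = 0xc0a87a0eu,
};

/* Returns 0 on match and 1 on mismatch. */
static int toy_nh_match(const struct toy_nexthop *req, uint32_t installed_gw,
			int fixed)
{
	/* A length is never negative, so the old test never looks at the
	 * attributes and the gateway comparison is skipped entirely.
	 */
	int have_attrs = fixed ? (req->attrlen > 0) : (req->attrlen < 0);

	if (have_attrs && req->gateway != installed_gw)
		return 1;
	return 0;
}
```

With the installed gateway 192.168.122.12 (0xc0a87a0c), the old test reports a
match for the wrong gateway and the fixed test reports a mismatch.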

Please consider this for stable trees as well.

Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 0aba46add2915b344580569e87d9c41274b9c475)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/fib_semantics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 9b096d6..8fc396a 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -453,7 +453,7 @@ int fib_nh_match(struct fib_config *cfg, struct fib_info *fi)
 			return 1;
 
 		attrlen = rtnh_attrlen(rtnh);
-		if (attrlen < 0) {
+		if (attrlen > 0) {
 			struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
 
 			nla = nla_find(attrs, attrlen, RTA_GATEWAY);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 43/48] pagemap: do not leak physical addresses to non-privileged userspace
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (41 preceding siblings ...)
  2015-05-15  8:06 ` [ 42/48] ipv4: fix nexthop attlen check in fib_nh_match Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 44/48] lockd: Try to reconnect if statd has moved Willy Tarreau
                   ` (4 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Kirill A. Shutemov, Konstantin Khlebnikov, Andy Lutomirski,
	Pavel Emelyanov, Andrew Morton, Mark Seaborn, Linus Torvalds,
	mancha security, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

commit ab676b7d6fbf4b294bf198fb27ade5b0e865c7ce upstream.

As pointed out by a recent post [1] on exploiting DRAM physical imperfections,
/proc/PID/pagemap exposes sensitive information which can be used to mount
attacks.

This disallows anybody without CAP_SYS_ADMIN from reading the pagemap.

[1] http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html

[ Eventually we might want to do something more fine-grained, but for now
  this is the simple model.   - Linus ]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Seaborn <mseaborn@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[mancha security: Backported to 3.10]
Signed-off-by: mancha security <mancha1@zoho.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 1ffc3cd9a36b504c20ce98fe5eeb5463f389e1ac)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/proc/task_mmu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3b7b82a..73db5a6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -773,9 +773,19 @@ out:
 	return ret;
 }

+static int pagemap_open(struct inode *inode, struct file *file)
+{
+	/* do not disclose physical addresses to unprivileged
+	   userspace (closes a rowhammer attack vector) */
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+	return 0;
+}
+
 const struct file_operations proc_pagemap_operations = {
 	.llseek		= mem_lseek, /* borrow this */
 	.read		= pagemap_read,
+	.open		= pagemap_open,
 };
 #endif /* CONFIG_PROC_PAGE_MONITOR */

-- 
1.7.12.2.21.g234cd45.dirty

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 44/48] lockd: Try to reconnect if statd has moved
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (42 preceding siblings ...)
  2015-05-15  8:06 ` [ 43/48] pagemap: do not leak physical addresses to non-privileged userspace Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 45/48] scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND Willy Tarreau
                   ` (3 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Benjamin Coddington, Trond Myklebust, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Benjamin Coddington <bcodding@redhat.com>

commit 173b3afceebe76fa2205b2c8808682d5b541fe3c upstream.

If rpc.statd is restarted, upcalls to monitor hosts can fail with
ECONNREFUSED.  In that case force a lookup of statd's new port and retry the
upcall.
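
The retry-once behaviour can be modelled outside the kernel as follows.  This
is only a sketch: struct toy_client, toy_call(), toy_force_rebind(),
toy_upcall() and attempts_needed() are hypothetical stand-ins for the RPC
client, rpc_call_sync() and rpc_force_rebind(), and the "stale port" flag is
an assumption made to simulate a restarted statd.

```c
#include <assert.h>
#include <errno.h>

struct toy_client {
	int port_stale;	/* nonzero until a rebind refreshes the cached port */
	int calls;	/* number of call attempts made so far */
};

/* Simulated synchronous call: refused while the cached port is stale. */
static int toy_call(struct toy_client *c)
{
	c->calls++;
	return c->port_stale ? -ECONNREFUSED : 0;
}

/* Simulated rebind: forget the cached port so the next call looks it up. */
static void toy_force_rebind(struct toy_client *c)
{
	c->port_stale = 0;
}

static int toy_upcall(struct toy_client *c)
{
	int status = toy_call(c);

	if (status == -ECONNREFUSED) {
		toy_force_rebind(c);
		status = toy_call(c);	/* retried exactly once */
	}
	return status;
}

/* Runs one upcall and reports how many attempts it took, or -1 on failure. */
static int attempts_needed(int port_stale)
{
	struct toy_client c = { .port_stale = port_stale, .calls = 0 };

	return toy_upcall(&c) == 0 ? c.calls : -1;
}
```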

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[bwh: Backported to 3.2: not using RPC_TASK_SOFTCONN]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 3aabe891f32c209a2be7cd5581d2634020e801c1)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/lockd/mon.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
index f956651..48de6a5 100644
--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -109,6 +109,12 @@ static int nsm_mon_unmon(struct nsm_handle *nsm, u32 proc, struct nsm_res *res)
 
 	msg.rpc_proc = &clnt->cl_procinfo[proc];
 	status = rpc_call_sync(clnt, &msg, 0);
+	if (status == -ECONNREFUSED) {
+		dprintk("lockd:	NSM upcall RPC failed, status=%d, forcing rebind\n",
+				status);
+		rpc_force_rebind(clnt);
+		status = rpc_call_sync(clnt, &msg, 0);
+	}
 	if (status < 0)
 		dprintk("lockd: NSM upcall RPC failed, status=%d\n",
 				status);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 45/48] scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (43 preceding siblings ...)
  2015-05-15  8:06 ` [ 44/48] lockd: Try to reconnect if statd has moved Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 46/48] posix-timers: Fix stack info leak in timer_create() Willy Tarreau
                   ` (2 subsequent siblings)
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Jens Axboe, linux-scsi, Jan Kara, Jens Axboe, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 84ce0f0e94ac97217398b3b69c21c7a62ebeed05 upstream.

When sg_scsi_ioctl() fails to prepare the request for submission in
blk_rq_map_kern(), we jump to a label where we just end up copying a
(luckily zeroed-out) kernel buffer to userspace instead of reporting an
error. Fix the problem by jumping to the right label.
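
The effect of jumping to the wrong label can be reproduced with a toy
function.  This is a sketch under stated assumptions, not the ioctl code:
toy_ioctl() and the two helpers are hypothetical, -1 stands in for the real
error value, and the `fixed` flag selects between the old and new jump target.

```c
#include <assert.h>
#include <string.h>

static int toy_ioctl(int map_fails, int fixed, char *out, size_t len)
{
	char kbuf[8] = { 0 };	/* zeroed "kernel" buffer */
	int status = 0;		/* status the "device" would report */
	int err;

	if (map_fails) {
		err = -1;
		if (fixed)
			goto error;	/* new target: report the failure */
		goto out;		/* old target: falls into the copy-out */
	}

	memset(kbuf, 'x', sizeof(kbuf));	/* pretend the command ran */
out:
	err = status;		/* overwrites any earlier error code */
	memcpy(out, kbuf, len < sizeof(kbuf) ? len : sizeof(kbuf));
error:
	return err;
}

/* Error code the caller sees when mapping fails. */
static int err_on_map_failure(int fixed)
{
	char user[8];

	memset(user, 'u', sizeof(user));
	return toy_ioctl(1, fixed, user, sizeof(user));
}

/* 1 if the caller's buffer was overwritten despite the mapping failure. */
static int buffer_clobbered_on_map_failure(int fixed)
{
	char user[8];

	memset(user, 'u', sizeof(user));
	toy_ioctl(1, fixed, user, sizeof(user));
	return user[0] != 'u';
}
```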

CC: Jens Axboe <axboe@kernel.dk>
CC: linux-scsi@vger.kernel.org
Coverity-id: 1226871
Signed-off-by: Jan Kara <jack@suse.cz>

Fixed up the, now unused, out label.

Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit d73b032b63e8967462e1cf5763858ed89e97880f)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 block/scsi_ioctl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 123eb17..f5df2a8 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -503,7 +503,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
 
 	if (bytes && blk_rq_map_kern(q, rq, buffer, bytes, __GFP_WAIT)) {
 		err = DRIVER_ERROR << 24;
-		goto out;
+		goto error;
 	}
 
 	memset(sense, 0, sizeof(sense));
@@ -513,7 +513,6 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
 
 	blk_execute_rq(q, disk, rq, 0);
 
-out:
 	err = rq->errors & 0xff;	/* only 8 bit SCSI status */
 	if (err) {
 		if (rq->sense_len && rq->sense) {
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 46/48] posix-timers: Fix stack info leak in timer_create()
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (44 preceding siblings ...)
  2015-05-15  8:06 ` [ 45/48] scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 47/48] hfsplus: fix B-tree corruption after insertion at position 0 Willy Tarreau
  2015-05-15  8:06 ` [ 48/48] sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND) Willy Tarreau
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Mathias Krause, Oleg Nesterov, Brad Spengler, PaX Team,
	Thomas Gleixner, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit 6891c4509c792209c44ced55a60f13954cb50ef4 upstream.

If userland creates a timer without specifying a sigevent info, we'll
create one ourselves, using a stack local variable. In particular, we
use the timer ID as sival_int. But as sigev_value is a union containing
a pointer and an int, that assignment will only partially initialize
sigev_value on systems where the size of a pointer is bigger than the
size of an int. On such systems we'll copy the uninitialized stack bytes
from the timer_create() call to userland when the timer actually fires
and we're going to deliver the signal.

Initialize sigev_value with 0 to plug the stack info leak.
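
The partial initialization is easy to observe in userspace.  This is a sketch
only: union toy_sigval and stale_bytes() are hypothetical, and the 0xAA fill
is an assumption used to simulate leftover stack contents.  On a platform
where pointers and ints have the same size there are no trailing bytes to
leak, so the count is zero either way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for sigval: an int and a pointer sharing the same storage. */
union toy_sigval {
	int sival_int;
	void *sival_ptr;
};

/* Counts the bytes past sival_int that still hold the 0xAA fill after
 * only sival_int has been assigned.
 */
static size_t stale_bytes(int zero_first)
{
	union toy_sigval v;
	const unsigned char *p = (const unsigned char *)&v;
	size_t i, stale = 0;

	memset(&v, 0xAA, sizeof(v));		/* simulate stack garbage */
	if (zero_first)
		memset(&v, 0, sizeof(v));	/* what the patch adds */
	v.sival_int = 42;

	for (i = sizeof(v.sival_int); i < sizeof(v); i++)
		stale += (p[i] == 0xAA);
	return stale;
}
```

Calling stale_bytes(0) on a typical 64-bit build is expected to report the
bytes between the end of the int and the end of the pointer, while
stale_bytes(1) reports none.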

Found in the PaX patch, written by the PaX Team.

Fixes: 5a9fa7307285 ("posix-timers: kill ->it_sigev_signo and...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Link: http://lkml.kernel.org/r/1412456799-32339-1-git-send-email-minipli@googlemail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 3cd3a349aa3519b88d29845c0bc36bcbae158e93)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/posix-timers.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 5e76d22..f2335e8 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -578,6 +578,7 @@ SYSCALL_DEFINE3(timer_create, const clockid_t, which_clock,
 			goto out;
 		}
 	} else {
+		memset(&event.sigev_value, 0, sizeof(event.sigev_value));
 		event.sigev_notify = SIGEV_SIGNAL;
 		event.sigev_signo = SIGALRM;
 		event.sigev_value.sival_int = new_timer->it_id;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 47/48] hfsplus: fix B-tree corruption after insertion at position 0
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (45 preceding siblings ...)
  2015-05-15  8:06 ` [ 46/48] posix-timers: Fix stack info leak in timer_create() Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  2015-05-15  8:06 ` [ 48/48] sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND) Willy Tarreau
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable
  Cc: Joe Perches, Andrew Morton, Vyacheslav Dubeyko, Hin-Tak Leung,
	Anton Altaparmakov, Al Viro, Christoph Hellwig, Sergei Antonov,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sergei Antonov <saproj@gmail.com>

commit 98cf21c61a7f5419d82f847c4d77bf6e96a76f5f upstream.

Fix B-tree corruption when a new record is inserted at position 0 in the node
in hfs_brec_insert(). In this case hfs_brec_update_parent() is called to
update the parent index node (if it exists), and it is passed a hfs_find_data with
a search_key containing the newly inserted key instead of the key to be updated.
This results in an inconsistent index node. The bug reproduces on my machine
after an extents overflow record for the catalog file (CNID=4) is inserted into
the extents overflow B-tree. Because of a low (reserved) value of CNID=4, it
has to become the first record in the first leaf node.
The resulting first leaf node is correct:
----------------------------------------------------
| key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... |
----------------------------------------------------
But the parent index key0 still contains the previous key CNID=123:
-----------------------
| key0.CNID=123 | ... |
-----------------------

A change in hfs_brec_insert() makes hfs_brec_update_parent() work correctly
by preventing it from getting an fd->record value of -1 from __hfs_brec_find().

Along the way, I removed duplicate code with unification of the if condition.
The resulting code is equivalent to the original code because node is never 0.

Also hfs_brec_update_parent() will now return an error after getting a negative
fd->record value. However, the return value of hfs_brec_update_parent() is not
checked anywhere in the file and I'm leaving it unchanged by this patch.
brec.c lacks error checking after some other calls too, but this issue is of
less importance than the one being fixed by this patch.

Cc: stable@vger.kernel.org
Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/hfsplus/brec.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/hfsplus/brec.c b/fs/hfsplus/brec.c
index c88e5d7..5bcf730 100644
--- a/fs/hfsplus/brec.c
+++ b/fs/hfsplus/brec.c
@@ -119,13 +119,16 @@ skip:
 	hfs_bnode_write(node, entry, data_off + key_len, entry_len);
 	hfs_bnode_dump(node);
 
-	if (new_node) {
-		/* update parent key if we inserted a key
-		 * at the start of the first node
-		 */
-		if (!rec && new_node != node)
-			hfs_brec_update_parent(fd);
+	/*
+	 * update parent key if we inserted a key
+	 * at the start of the node and it is not the new node
+	 */
+	if (!rec && new_node != node) {
+		hfs_bnode_read_key(node, fd->search_key, data_off + size);
+		hfs_brec_update_parent(fd);
+	}
 
+	if (new_node) {
 		hfs_bnode_put(fd->bnode);
 		if (!new_node->parent) {
 			hfs_btree_inc_height(tree);
@@ -154,9 +157,6 @@ skip:
 		goto again;
 	}
 
-	if (!rec)
-		hfs_brec_update_parent(fd);
-
 	return 0;
 }
 
@@ -341,6 +341,8 @@ again:
 	if (IS_ERR(parent))
 		return PTR_ERR(parent);
 	__hfs_brec_find(parent, fd);
+	if (fd->record < 0)
+		return -ENOENT;
 	hfs_bnode_dump(parent);
 	rec = fd->record;
 
-- 
1.7.12.2.21.g234cd45.dirty
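
To illustrate the failure mode described in the commit message above, here is
a minimal userspace sketch. It is not the hfsplus code: toy_index_find() and
toy_update_parent() are hypothetical stand-ins for __hfs_brec_find() and
hfs_brec_update_parent(), and the second parent key (789) is invented for the
example. It only shows why the parent entry must be looked up with the leaf's
old first key rather than with the newly inserted one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Toy index-node lookup (hypothetical, not the hfsplus code): return the
 * position of the last key <= search_key, or -1 if every key is larger.
 */
static int toy_index_find(const unsigned int *keys, size_t nkeys,
			  unsigned int search_key)
{
	int rec = -1;
	size_t i;

	for (i = 0; i < nkeys && keys[i] <= search_key; i++)
		rec = (int)i;
	return rec;
}

/*
 * Point the parent entry at the leaf's new first key.  The entry has to be
 * located with the key it currently holds (the leaf's old first key).
 */
static int toy_update_parent(unsigned int *keys, size_t nkeys,
			     unsigned int old_first, unsigned int new_first)
{
	int rec = toy_index_find(keys, nkeys, old_first);

	if (rec < 0)
		return -ENOENT;	/* mirrors the new fd->record < 0 check */
	keys[rec] = new_first;
	return 0;
}

/* Example from the commit message: CNID=4 lands in front of CNID=123. */
static void toy_demo(void)
{
	unsigned int parent[] = { 123, 789 };

	/* Searching with the newly inserted key finds no parent record. */
	assert(toy_index_find(parent, 2, 4) == -1);
	/* Searching with the old first key finds and updates record 0. */
	assert(toy_update_parent(parent, 2, 123, 4) == 0);
	assert(parent[0] == 4);
}
```

Calling toy_demo() runs the scenario. In the patch itself the equivalent step
is the hfs_bnode_read_key() call, which reloads fd->search_key with the key
that follows the inserted record before the parent is updated.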




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [ 48/48] sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND)
       [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
                   ` (46 preceding siblings ...)
  2015-05-15  8:06 ` [ 47/48] hfsplus: fix B-tree corruption after insertion at position 0 Willy Tarreau
@ 2015-05-15  8:06 ` Willy Tarreau
  47 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15  8:06 UTC (permalink / raw
  To: linux-kernel, stable; +Cc: Alexey Khoroshilov, Takashi Iwai, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Alexey Khoroshilov <khoroshilov@ispras.ru>

A deadlock can be initiated by userspace via ioctl(SNDCTL_SEQ_OUTOFBAND)
on /dev/sequencer with TMR_ECHO midi event.

In this case the control flow is:
sound_ioctl()
-> case SND_DEV_SEQ:
   case SND_DEV_SEQ2:
     sequencer_ioctl()
     -> case SNDCTL_SEQ_OUTOFBAND:
          spin_lock_irqsave(&lock,flags);
          play_event();
          -> case EV_TIMING:
               seq_timing_event()
               -> case TMR_ECHO:
                    seq_copy_to_input()
                    -> spin_lock_irqsave(&lock,flags);

It seems that spin_lock_irqsave() around play_event() is not necessary,
because the only other call location in seq_startplay() makes the call
without acquiring spinlock.

So, the patch just removes spinlocks around play_event().
By the way, it removes unreachable code in seq_timing_event(),
since (seq_mode == SEQ_2) case is handled in the beginning.

Compile tested only.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit bc26d4d06e337ade069f33d3f4377593b24e6e36)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/oss/sequencer.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/sound/oss/sequencer.c b/sound/oss/sequencer.c
index 5cb171d..7d32997 100644
--- a/sound/oss/sequencer.c
+++ b/sound/oss/sequencer.c
@@ -677,13 +677,8 @@ static int seq_timing_event(unsigned char *event_rec)
 			break;
 
 		case TMR_ECHO:
-			if (seq_mode == SEQ_2)
-				seq_copy_to_input(event_rec, 8);
-			else
-			{
-				parm = (parm << 8 | SEQ_ECHO);
-				seq_copy_to_input((unsigned char *) &parm, 4);
-			}
+			parm = (parm << 8 | SEQ_ECHO);
+			seq_copy_to_input((unsigned char *) &parm, 4);
 			break;
 
 		default:;
@@ -1326,7 +1321,6 @@ int sequencer_ioctl(int dev, struct file *file, unsigned int cmd, void __user *a
 	int mode = translate_mode(file);
 	struct synth_info inf;
 	struct seq_event_rec event_rec;
-	unsigned long flags;
 	int __user *p = arg;
 
 	orig_dev = dev = dev >> 4;
@@ -1481,9 +1475,7 @@ int sequencer_ioctl(int dev, struct file *file, unsigned int cmd, void __user *a
 		case SNDCTL_SEQ_OUTOFBAND:
 			if (copy_from_user(&event_rec, arg, sizeof(event_rec)))
 				return -EFAULT;
-			spin_lock_irqsave(&lock,flags);
 			play_event(event_rec.arr);
-			spin_unlock_irqrestore(&lock,flags);
 			return 0;
 
 		case SNDCTL_MIDI_INFO:
-- 
1.7.12.2.21.g234cd45.dirty
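
The control flow listed in the commit message above takes the same
non-recursive lock twice on one call path. The sketch below models that in
plain userspace C under loud assumptions: struct toy_lock and every toy_*
function are hypothetical, the model is single-threaded, and the lock reports
-EDEADLK where a real spinlock would simply spin forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive lock; all names here are hypothetical. */
struct toy_lock {
	bool held;
};

static int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -EDEADLK;	/* a real spinlock would spin forever */
	l->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* Stands in for seq_copy_to_input(), which takes the lock itself. */
static int toy_copy_to_input(struct toy_lock *l)
{
	int err = toy_lock_acquire(l);

	if (err)
		return err;
	toy_lock_release(l);
	return 0;
}

/* Old ioctl path: lock held around the call chain. */
static int toy_outofband_old(struct toy_lock *l)
{
	int err = toy_lock_acquire(l);

	if (err)
		return err;
	err = toy_copy_to_input(l);
	toy_lock_release(l);
	return err;
}

/* Patched ioctl path: no lock around the call chain. */
static int toy_outofband_new(struct toy_lock *l)
{
	return toy_copy_to_input(l);
}

static void toy_demo(void)
{
	struct toy_lock lock = { false };

	assert(toy_outofband_old(&lock) == -EDEADLK);
	assert(!lock.held);
	assert(toy_outofband_new(&lock) == 0);
	assert(!lock.held);
}
```

toy_demo() exercises both paths: the old one trips over its own lock, the
patched one does not.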




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  2015-05-15  8:05 ` [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES Willy Tarreau
@ 2015-05-15 12:32   ` Ben Hutchings
  2015-05-15 13:38     ` Willy Tarreau
  0 siblings, 1 reply; 61+ messages in thread
From: Ben Hutchings @ 2015-05-15 12:32 UTC (permalink / raw
  To: Willy Tarreau
  Cc: linux-kernel, stable, Andy Lutomirski, Andi Kleen, Linus Torvalds,
	Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 924 bytes --]

On Fri, 2015-05-15 at 10:05 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Andy Lutomirski <luto@amacapital.net>
> 
> commit f647d7c155f069c1a068030255c300663516420e upstream.
> 
> Otherwise, if buggy user code points DS or ES into the TLS
> array, they would be corrupted after a context switch.
> 
> This also significantly improves the comments and documents some
> gotchas in the code.
> 
> Before this patch, the both tests below failed.  With this
> patch, the es test passes, although the gsbase test still fails.
[...]

This depends on the changes to FPU/MMX/SSE state management that you
didn't apply to 2.6.32.  Note this comment:

	/* Must be after DS reload */
	unlazy_fpu(prev_p);

Ben.

-- 
Ben Hutchings
It is impossible to make anything foolproof because fools are so ingenious.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  2015-05-15 12:32   ` Ben Hutchings
@ 2015-05-15 13:38     ` Willy Tarreau
  2015-05-15 14:25       ` Ben Hutchings
  0 siblings, 1 reply; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15 13:38 UTC (permalink / raw
  To: Ben Hutchings
  Cc: linux-kernel, stable, Andy Lutomirski, Andi Kleen, Linus Torvalds,
	Ingo Molnar

Hi Ben,

On Fri, May 15, 2015 at 01:32:20PM +0100, Ben Hutchings wrote:
> On Fri, 2015-05-15 at 10:05 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Andy Lutomirski <luto@amacapital.net>
> > 
> > commit f647d7c155f069c1a068030255c300663516420e upstream.
> > 
> > Otherwise, if buggy user code points DS or ES into the TLS
> > array, they would be corrupted after a context switch.
> > 
> > This also significantly improves the comments and documents some
> > gotchas in the code.
> > 
> > Before this patch, the both tests below failed.  With this
> > patch, the es test passes, although the gsbase test still fails.
> [...]
> 
> This depends on the changes to FPU/MMX/SSE state management that you
> didn't apply to 2.6.32.  Note this comment:
> 
> 	/* Must be after DS reload */
> 	unlazy_fpu(prev_p);

Are you sure you're not confusing this with another one? When running
estest without this patch, I get "FAIL: ES corrupted 1000/1000 times"
while I get "OK: ES was preserved" once applied, so it does seem to
do what it's intended for.

Also I'm not seeing any reference to the comment above in the patch
nor around it, which leaves me confused :-/

Thanks,
Willy


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  2015-05-15 13:38     ` Willy Tarreau
@ 2015-05-15 14:25       ` Ben Hutchings
  2015-05-15 14:31         ` Ben Hutchings
                           ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Ben Hutchings @ 2015-05-15 14:25 UTC (permalink / raw
  To: Willy Tarreau
  Cc: linux-kernel, stable, Andy Lutomirski, Andi Kleen, Linus Torvalds,
	Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 1900 bytes --]

On Fri, 2015-05-15 at 15:38 +0200, Willy Tarreau wrote:
> Hi Ben,
> 
> On Fri, May 15, 2015 at 01:32:20PM +0100, Ben Hutchings wrote:
> > On Fri, 2015-05-15 at 10:05 +0200, Willy Tarreau wrote:
> > > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > > 
> > > ------------------
> > > 
> > > From: Andy Lutomirski <luto@amacapital.net>
> > > 
> > > commit f647d7c155f069c1a068030255c300663516420e upstream.
> > > 
> > > Otherwise, if buggy user code points DS or ES into the TLS
> > > array, they would be corrupted after a context switch.
> > > 
> > > This also significantly improves the comments and documents some
> > > gotchas in the code.
> > > 
> > > Before this patch, the both tests below failed.  With this
> > > patch, the es test passes, although the gsbase test still fails.
> > [...]
> > 
> > This depends on the changes to FPU/MMX/SSE state management that you
> > didn't apply to 2.6.32.  Note this comment:
> > 
> > 	/* Must be after DS reload */
> > 	unlazy_fpu(prev_p);
> 
> Are you sure you're not confusing with another one ? When running
> estest without this patch, I get "FAIL: ES corrupted 1000/1000 times"
> while I get "OK: ES was preserved" once applied, so it does seem to
> do what it's intended for.
>
> Also I'm not seeing any reference to the comment above in the patch
> nor around it, which leaves me confused :-/

v2.6.32.65:arch/x86/kernel/process_64.c:425:    /* Must be after DS reload */

If this comment is correct then the patch will cause a regression for
FPU state management.  The comment was introduced by:

commit 0a5ace2ab08d45cd78d7ef0067cdcd5c812ac54f
Author: Andi Kleen <ak@suse.de>
Date:   Thu Oct 5 18:47:22 2006 +0200

    [PATCH] x86-64: Fix FPU corruption

Ben.

-- 
Ben Hutchings
It is impossible to make anything foolproof because fools are so ingenious.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  2015-05-15 14:25       ` Ben Hutchings
@ 2015-05-15 14:31         ` Ben Hutchings
  2015-05-15 14:37         ` Willy Tarreau
  2015-05-15 15:53         ` Andi Kleen
  2 siblings, 0 replies; 61+ messages in thread
From: Ben Hutchings @ 2015-05-15 14:31 UTC (permalink / raw
  To: Willy Tarreau
  Cc: linux-kernel, stable, Andy Lutomirski, Andi Kleen, Linus Torvalds,
	Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 2574 bytes --]

On Fri, 2015-05-15 at 15:25 +0100, Ben Hutchings wrote:
> On Fri, 2015-05-15 at 15:38 +0200, Willy Tarreau wrote:
> > Hi Ben,
> > 
> > On Fri, May 15, 2015 at 01:32:20PM +0100, Ben Hutchings wrote:
> > > On Fri, 2015-05-15 at 10:05 +0200, Willy Tarreau wrote:
> > > > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > > > 
> > > > ------------------
> > > > 
> > > > From: Andy Lutomirski <luto@amacapital.net>
> > > > 
> > > > commit f647d7c155f069c1a068030255c300663516420e upstream.
> > > > 
> > > > Otherwise, if buggy user code points DS or ES into the TLS
> > > > array, they would be corrupted after a context switch.
> > > > 
> > > > This also significantly improves the comments and documents some
> > > > gotchas in the code.
> > > > 
> > > > Before this patch, the both tests below failed.  With this
> > > > patch, the es test passes, although the gsbase test still fails.
> > > [...]
> > > 
> > > This depends on the changes to FPU/MMX/SSE state management that you
> > > didn't apply to 2.6.32.  Note this comment:
> > > 
> > > 	/* Must be after DS reload */
> > > 	unlazy_fpu(prev_p);
> > 
> > Are you sure you're not confusing with another one ? When running
> > estest without this patch, I get "FAIL: ES corrupted 1000/1000 times"
> > while I get "OK: ES was preserved" once applied, so it does seem to
> > do what it's intended for.
> >
> > Also I'm not seeing any reference to the comment above in the patch
> > nor around it, which leaves me confused :-/
> 
> v2.6.32.65:arch/x86/kernel/process_64.c:425:    /* Must be after DS reload */
> 
> If this comment is correct then the patch will cause a regression for
> FPU state management.  The comment was introduced by:
> 
> commit 0a5ace2ab08d45cd78d7ef0067cdcd5c812ac54f
> Author: Andi Kleen <ak@suse.de>
> Date:   Thu Oct 5 18:47:22 2006 +0200
> 
>     [PATCH] x86-64: Fix FPU corruption

And that replaced a longer comment that said "the AMD workaround
requires it to be after DS reload".  The comment above clear_fpu_state()
says:

/* AMD CPUs don't save/restore FDP/FIP/FOP unless an exception
   is pending. Clear the x87 state here by setting it to fixed
   values. The kernel data segment can be sometimes 0 and sometimes
   new user value. Both should be ok.
   Use the PDA as safe address because it should be already in L1. */

Hopefully Andi can explain further if needed; I have no idea.

Ben.

-- 
Ben Hutchings
It is impossible to make anything foolproof because fools are so ingenious.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  2015-05-15 14:25       ` Ben Hutchings
  2015-05-15 14:31         ` Ben Hutchings
@ 2015-05-15 14:37         ` Willy Tarreau
  2015-05-15 15:53         ` Andi Kleen
  2 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15 14:37 UTC (permalink / raw
  To: Ben Hutchings
  Cc: linux-kernel, stable, Andy Lutomirski, Andi Kleen, Linus Torvalds,
	Ingo Molnar

On Fri, May 15, 2015 at 03:25:33PM +0100, Ben Hutchings wrote:
> On Fri, 2015-05-15 at 15:38 +0200, Willy Tarreau wrote:
> > Hi Ben,
> > 
> > On Fri, May 15, 2015 at 01:32:20PM +0100, Ben Hutchings wrote:
> > > On Fri, 2015-05-15 at 10:05 +0200, Willy Tarreau wrote:
> > > > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > > > 
> > > > ------------------
> > > > 
> > > > From: Andy Lutomirski <luto@amacapital.net>
> > > > 
> > > > commit f647d7c155f069c1a068030255c300663516420e upstream.
> > > > 
> > > > Otherwise, if buggy user code points DS or ES into the TLS
> > > > array, they would be corrupted after a context switch.
> > > > 
> > > > This also significantly improves the comments and documents some
> > > > gotchas in the code.
> > > > 
> > > > Before this patch, the both tests below failed.  With this
> > > > patch, the es test passes, although the gsbase test still fails.
> > > [...]
> > > 
> > > This depends on the changes to FPU/MMX/SSE state management that you
> > > didn't apply to 2.6.32.  Note this comment:
> > > 
> > > 	/* Must be after DS reload */
> > > 	unlazy_fpu(prev_p);
> > 
> > Are you sure you're not confusing with another one ? When running
> > estest without this patch, I get "FAIL: ES corrupted 1000/1000 times"
> > while I get "OK: ES was preserved" once applied, so it does seem to
> > do what it's intended for.
> >
> > Also I'm not seeing any reference to the comment above in the patch
> > nor around it, which leaves me confused :-/
> 
> v2.6.32.65:arch/x86/kernel/process_64.c:425:    /* Must be after DS reload */

Ah OK I missed it, thanks.

> If this comment is correct then the patch will cause a regression for
> FPU state management.  The comment was introduced by:
> 
> commit 0a5ace2ab08d45cd78d7ef0067cdcd5c812ac54f
> Author: Andi Kleen <ak@suse.de>
> Date:   Thu Oct 5 18:47:22 2006 +0200
> 
>     [PATCH] x86-64: Fix FPU corruption

Indeed! Andy, is there any practical case covered by your patch that
should motivate a safe way to backport it, or can we simply drop it
for 2.6.32? I don't intend to backport the FPU state management
series this late in the cycle just for this!

Thanks!
Willy


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  2015-05-15 14:25       ` Ben Hutchings
  2015-05-15 14:31         ` Ben Hutchings
  2015-05-15 14:37         ` Willy Tarreau
@ 2015-05-15 15:53         ` Andi Kleen
  2015-05-15 16:48           ` Willy Tarreau
  2015-05-15 20:53           ` Ben Hutchings
  2 siblings, 2 replies; 61+ messages in thread
From: Andi Kleen @ 2015-05-15 15:53 UTC (permalink / raw
  To: Ben Hutchings
  Cc: Willy Tarreau, linux-kernel, stable, Andy Lutomirski, Andi Kleen,
	Linus Torvalds, Ingo Molnar

> 
> v2.6.32.65:arch/x86/kernel/process_64.c:425:    /* Must be after DS reload */
> 
> If this comment is correct then the patch will cause a regression for
> FPU state management.  The comment was introduced by:

Yes I already stated before that these super-risky patches
are not stable material.

-Andi

> 
> commit 0a5ace2ab08d45cd78d7ef0067cdcd5c812ac54f
> Author: Andi Kleen <ak@suse.de>
> Date:   Thu Oct 5 18:47:22 2006 +0200
> 
>     [PATCH] x86-64: Fix FPU corruption
> 
> Ben.
> 
> -- 
> Ben Hutchings
> It is impossible to make anything foolproof because fools are so ingenious.



-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  2015-05-15 15:53         ` Andi Kleen
@ 2015-05-15 16:48           ` Willy Tarreau
  2015-05-15 20:53           ` Ben Hutchings
  1 sibling, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-15 16:48 UTC (permalink / raw
  To: Andi Kleen
  Cc: Ben Hutchings, linux-kernel, stable, Andy Lutomirski,
	Linus Torvalds, Ingo Molnar

On Fri, May 15, 2015 at 05:53:34PM +0200, Andi Kleen wrote:
> > 
> > v2.6.32.65:arch/x86/kernel/process_64.c:425:    /* Must be after DS reload */
> > 
> > If this comment is correct then the patch will cause a regression for
> > FPU state management.  The comment was introduced by:
> 
> Yes I already stated before that these super-risky patches
> are not stable material.

OK thank you Andi, I'm dropping it from -32 then.

Willy


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  2015-05-15 15:53         ` Andi Kleen
  2015-05-15 16:48           ` Willy Tarreau
@ 2015-05-15 20:53           ` Ben Hutchings
  2015-05-15 22:15             ` Andi Kleen
  1 sibling, 1 reply; 61+ messages in thread
From: Ben Hutchings @ 2015-05-15 20:53 UTC (permalink / raw
  To: Andi Kleen
  Cc: Willy Tarreau, linux-kernel, stable, Andy Lutomirski,
	Linus Torvalds, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 538 bytes --]

On Fri, 2015-05-15 at 17:53 +0200, Andi Kleen wrote:
> > 
> > v2.6.32.65:arch/x86/kernel/process_64.c:425:    /* Must be after DS reload */
> > 
> > If this comment is correct then the patch will cause a regression for
> > FPU state management.  The comment was introduced by:
> 
> Yes I already stated before that these super-risky patches
> are not stable material.
[...]

Which other patches do you include in that?

Ben.

-- 
Ben Hutchings
It is impossible to make anything foolproof because fools are so ingenious.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 09/48] x86_64, vdso: Fix the vdso address randomization algorithm
  2015-05-15  8:05 ` [ 09/48] x86_64, vdso: Fix the vdso address randomization algorithm Willy Tarreau
@ 2015-05-15 21:02   ` Ben Hutchings
  0 siblings, 0 replies; 61+ messages in thread
From: Ben Hutchings @ 2015-05-15 21:02 UTC (permalink / raw
  To: Willy Tarreau; +Cc: linux-kernel, stable, Kees Cook, Andy Lutomirski

[-- Attachment #1: Type: text/plain, Size: 2565 bytes --]

On Fri, 2015-05-15 at 10:05 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Andy Lutomirski <luto@amacapital.net>
> 
> commit 394f56fe480140877304d342dec46d50dc823d46 upstream
> 
> The theory behind vdso randomization is that it's mapped at a random
> offset above the top of the stack.  To avoid wasting a page of
> memory for an extra page table, the vdso isn't supposed to extend
> past the lowest PMD into which it can fit.  Other than that, the
> address should be a uniformly distributed address that meets all of
> the alignment requirements.
> 
> The current algorithm is buggy: the vdso has about a 50% probability
> of being at the very end of a PMD.  The current algorithm also has a
> decent chance of failing outright due to incorrect handling of the
> case where the top of the stack is near the top of its PMD.
> 
> This fixes the implementation.  The paxtest estimate of vdso
> "randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
> don't know what the paxtest code is actually calculating.)
> 
> It's worth noting that this algorithm is inherently biased: the vdso
> is more likely to end up near the end of its PMD than near the
> beginning.  Ideally we would either nix the PMD sharing requirement
> or jointly randomize the vdso and the stack to reduce the bias.
> 
> In the mean time, this is a considerable improvement with basically
> no risk of compatibility issues, since the allowed outputs of the
> algorithm are unchanged.
> 
> As an easy test, doing this:
> 
> for i in `seq 10000`
>   do grep -P vdso /proc/self/maps |cut -d- -f1
> done |sort |uniq -d
> 
> used to produce lots of output (1445 lines on my most recent run).
> A tiny subset looks like this:
> 
> 7fffdfffe000
> 7fffe01fe000
> 7fffe05fe000
> 7fffe07fe000
> 7fffe09fe000
> 7fffe0bfe000
> 7fffe0dfe000
> 
> Note the suspicious fe000 endings.  With the fix, I get a much more
> palatable 76 repeated addresses.
> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> [bwh: Backported to 2.6.32:
>  - The whole file is only built for x86_64; adjust context and comment for this
>  - We don't have align_vdso_addr()]

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

> Signed-off-by: Willy Tarreau <w@1wt.eu>
[...]

-- 
Ben Hutchings
It is impossible to make anything foolproof because fools are so ingenious.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
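
The "suspicious fe000 endings" in the quoted commit message can be checked
with a little arithmetic. The sketch below assumes 4 KiB pages and a 2 MiB
span per PMD entry, as on x86_64; the TOY_* macros and toy_* helpers are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 0x1000ULL		/* 4 KiB page (assumed) */
#define TOY_PMD_SIZE  0x200000ULL	/* 2 MiB per PMD entry (assumed) */

/* Offset of an address inside the PMD-sized region that contains it. */
static uint64_t toy_pmd_offset(uint64_t addr)
{
	return addr & (TOY_PMD_SIZE - 1);
}

/* Number of pages between an address and the end of its PMD region. */
static uint64_t toy_pages_to_pmd_end(uint64_t addr)
{
	return (TOY_PMD_SIZE - toy_pmd_offset(addr)) / TOY_PAGE_SIZE;
}

/* The repeated vdso addresses listed in the commit message. */
static void toy_demo(void)
{
	static const uint64_t addrs[] = {
		0x7fffdfffe000ULL, 0x7fffe01fe000ULL, 0x7fffe05fe000ULL,
		0x7fffe07fe000ULL, 0x7fffe09fe000ULL, 0x7fffe0bfe000ULL,
		0x7fffe0dfe000ULL,
	};
	size_t i;

	for (i = 0; i < sizeof(addrs) / sizeof(addrs[0]); i++) {
		assert(toy_pmd_offset(addrs[i]) == 0x1fe000ULL);
		assert(toy_pages_to_pmd_end(addrs[i]) == 2);
	}
}
```

Every listed address sits exactly two pages below a 2 MiB boundary. That is
consistent with a mapping that ends at the very end of its PMD if the vdso
spans two pages; the message does not state the vdso size, so that part is an
assumption.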

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 16/48] netfilter: conntrack: disable generic tracking for known protocols
  2015-05-15  8:05 ` [ 16/48] netfilter: conntrack: disable generic tracking for known protocols Willy Tarreau
@ 2015-05-15 21:05   ` Ben Hutchings
  0 siblings, 0 replies; 61+ messages in thread
From: Ben Hutchings @ 2015-05-15 21:05 UTC (permalink / raw
  To: Willy Tarreau
  Cc: linux-kernel, stable, Florian Westphal, Daniel Borkmann,
	Jozsef Kadlecsik, Pablo Neira Ayuso

[-- Attachment #1: Type: text/plain, Size: 2064 bytes --]

On Fri, 2015-05-15 at 10:05 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Florian Westphal <fw@strlen.de>
> 
> commit db29a9508a9246e77087c5531e45b2c88ec6988b upstream
> 
> Given following iptables ruleset:
> 
> -P FORWARD DROP
> -A FORWARD -m sctp --dport 9 -j ACCEPT
> -A FORWARD -p tcp --dport 80 -j ACCEPT
> -A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT
> 
> One would assume that this allows SCTP on port 9 and TCP on port 80.
> Unfortunately, if the SCTP conntrack module is not loaded, this allows
> *all* SCTP communication to pass through, i.e. -p sctp -j ACCEPT,
> which we think is a security issue.
> 
> This is because on the first SCTP packet on port 9, we create a dummy
> "generic l4" conntrack entry without any port information (since
> conntrack doesn't know how to extract this information).
> 
> All subsequent packets that are unknown will then be in established
> state since they will fallback to proto_generic and will match the
> 'generic' entry.
> 
> Our originally proposed version [1] completely disabled generic protocol
> tracking, but Jozsef suggests to not track protocols for which a more
> suitable helper is available, hence we now mitigate the issue for in
> tree known ct protocol helpers only, so that at least NAT and direction
> information will still be preserved for others.
> 
>  [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html
> 
> Joint work with Daniel Borkmann.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> [bwh: Backported to 2.6.32: adjust context]

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

> Signed-off-by: Willy Tarreau <w@1wt.eu>
[...]

-- 
Ben Hutchings
It is impossible to make anything foolproof because fools are so ingenious.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
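
The problem described in the quoted commit message comes down to which fields
a conntrack entry is matched on. Below is a rough userspace model, not
netfilter code: struct toy_flow and both match functions are hypothetical,
and addresses are plain integers. It shows how an entry created for SCTP port
9 matches a later packet to a different port once ports are left out of the
comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PROTO_SCTP 132	/* IP protocol number of SCTP */

/* Hypothetical flow description; not a kernel structure. */
struct toy_flow {
	uint32_t src;
	uint32_t dst;
	uint8_t proto;
	uint16_t sport;
	uint16_t dport;
};

/* Generic l4 tracking: no port information is available. */
static bool toy_generic_match(const struct toy_flow *entry,
			      const struct toy_flow *pkt)
{
	return entry->src == pkt->src && entry->dst == pkt->dst &&
	       entry->proto == pkt->proto;
}

/* Protocol-aware tracking: ports are part of the match. */
static bool toy_port_aware_match(const struct toy_flow *entry,
				 const struct toy_flow *pkt)
{
	return toy_generic_match(entry, pkt) &&
	       entry->sport == pkt->sport && entry->dport == pkt->dport;
}

static void toy_demo(void)
{
	/* Entry created by the first, permitted SCTP packet to port 9. */
	struct toy_flow entry = { 1, 2, TOY_PROTO_SCTP, 5000, 9 };
	/* A later SCTP packet between the same hosts, to port 22. */
	struct toy_flow other = { 1, 2, TOY_PROTO_SCTP, 5000, 22 };

	assert(toy_generic_match(&entry, &other));	/* seen as established */
	assert(!toy_port_aware_match(&entry, &other));	/* treated as new */
}
```

With the generic match, the port 22 packet would be classified as belonging
to the established flow and accepted by the ESTABLISHED,RELATED rule.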

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 26/48] net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland.
  2015-05-15  8:05 ` [ 26/48] net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland Willy Tarreau
@ 2015-05-15 21:08   ` Ben Hutchings
  2015-05-16  5:31     ` Willy Tarreau
  0 siblings, 1 reply; 61+ messages in thread
From: Ben Hutchings @ 2015-05-15 21:08 UTC (permalink / raw
  To: Willy Tarreau; +Cc: linux-kernel, stable, Ani Sinha, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 1171 bytes --]

On Fri, 2015-05-15 at 10:05 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Ani Sinha <ani@arista.com>
> 
> commit 6a2a2b3ae0759843b22c929881cc184b00cc63ff upstream.
> 
> The Linux manpage for the recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when
> msg_name is passed as NULL. When developers don't set the msg_namelen member in msghdr, it might contain a garbage
> value which will fail the validation check, and the sendmsg and recvmsg calls will return EINVAL from the kernel. This will
> break old binaries and any code for which there is no access to source code.
> To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland.
[...]

I think you'll also want this related fix:

commit 91edd096e224941131f896b86838b1e59553696a
Author: Catalin Marinas <catalin.marinas@arm.com>
Date:   Fri Mar 20 16:48:13 2015 +0000

    net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour

Ben.

-- 
Ben Hutchings
It is impossible to make anything foolproof because fools are so ingenious.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
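
For readers who want the rule from the quoted commit message in code form,
here is a small userspace sketch. toy_fixup_msg_namelen() is a hypothetical
helper that only mirrors the described behaviour (a NULL msg_name forces
msg_namelen to 0); it is not the kernel function touched by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: ignore whatever msg_namelen holds when no address
 * buffer was supplied, as the commit message describes.
 */
static void toy_fixup_msg_namelen(struct msghdr *msg)
{
	if (msg->msg_name == NULL)
		msg->msg_namelen = 0;
}

static void toy_demo(void)
{
	struct msghdr msg;
	char addr[16];

	memset(&msg, 0, sizeof(msg));
	msg.msg_namelen = 12345;	/* stands in for uninitialized garbage */
	toy_fixup_msg_namelen(&msg);
	assert(msg.msg_namelen == 0);

	/* A caller-supplied buffer keeps its length untouched. */
	msg.msg_name = addr;
	msg.msg_namelen = sizeof(addr);
	toy_fixup_msg_namelen(&msg);
	assert(msg.msg_namelen == sizeof(addr));
}
```

On the application side, zero-initializing the whole struct msghdr before
filling it in (as the memset() in toy_demo() does) avoids passing a garbage
length in the first place.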

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  2015-05-15 20:53           ` Ben Hutchings
@ 2015-05-15 22:15             ` Andi Kleen
  0 siblings, 0 replies; 61+ messages in thread
From: Andi Kleen @ 2015-05-15 22:15 UTC (permalink / raw
  To: Ben Hutchings
  Cc: Andi Kleen, Willy Tarreau, linux-kernel, stable, Andy Lutomirski,
	Linus Torvalds, Ingo Molnar

On Fri, May 15, 2015 at 09:53:43PM +0100, Ben Hutchings wrote:
> On Fri, 2015-05-15 at 17:53 +0200, Andi Kleen wrote:
> > > 
> > > v2.6.32.65:arch/x86/kernel/process_64.c:425:    /* Must be after DS reload */
> > > 
> > > If this comment is correct then the patch will cause a regression for
> > > FPU state management.  The comment was introduced by:
> > 
> > Yes I already stated before that these super-risky patches
> > are not stable material.
> [...]
> 
> Which other patches do you include in that?

Sorry I misspoke. Currently it's only that one.

-Andi

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [ 26/48] net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland.
  2015-05-15 21:08   ` Ben Hutchings
@ 2015-05-16  5:31     ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-05-16  5:31 UTC (permalink / raw
  To: Ben Hutchings; +Cc: linux-kernel, stable, Ani Sinha, David S. Miller

Hi Ben,

On Fri, May 15, 2015 at 10:08:22PM +0100, Ben Hutchings wrote:
> I think you'll also want this related fix:
> 
> commit 91edd096e224941131f896b86838b1e59553696a
> Author: Catalin Marinas <catalin.marinas@arm.com>
> Date:   Fri Mar 20 16:48:13 2015 +0000
> 
>     net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour

Ah good catch, I missed it. Now merged and tested.
BTW, I've added your s-o-b on the two other patches.

Thanks!
Willy


^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2015-05-16  5:31 UTC | newest]

Thread overview: 61+ messages
     [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
2015-05-15  8:05 ` [ 00/48] 2.6.32.66-longterm review Willy Tarreau
2015-05-15  8:05 ` [ 01/48] x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs Willy Tarreau
2015-05-15  8:05 ` [ 02/48] x86/tls: Validate TLS entries to protect espfix Willy Tarreau
2015-05-15  8:05 ` [ 03/48] x86, tls, ldt: Stop checking lm in LDT_empty Willy Tarreau
2015-05-15  8:05 ` [ 04/48] x86, tls: Interpret an all-zero struct user_desc as "no segment" Willy Tarreau
2015-05-15  8:05 ` [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES Willy Tarreau
2015-05-15 12:32   ` Ben Hutchings
2015-05-15 13:38     ` Willy Tarreau
2015-05-15 14:25       ` Ben Hutchings
2015-05-15 14:31         ` Ben Hutchings
2015-05-15 14:37         ` Willy Tarreau
2015-05-15 15:53         ` Andi Kleen
2015-05-15 16:48           ` Willy Tarreau
2015-05-15 20:53           ` Ben Hutchings
2015-05-15 22:15             ` Andi Kleen
2015-05-15  8:05 ` [ 06/48] x86/tls: Disallow unusual TLS segments Willy Tarreau
2015-05-15  8:05 ` [ 07/48] x86/tls: Dont validate lm in set_thread_area() after all Willy Tarreau
2015-05-15  8:05 ` [ 08/48] x86, kvm: Clear paravirt_enabled on KVM guests for espfix32s benefit Willy Tarreau
2015-05-15  8:05 ` [ 09/48] x86_64, vdso: Fix the vdso address randomization algorithm Willy Tarreau
2015-05-15 21:02   ` Ben Hutchings
2015-05-15  8:05 ` [ 10/48] ASLR: fix stack randomization on 64-bit systems Willy Tarreau
2015-05-15  8:05 ` [ 11/48] x86, cpu, amd: Add workaround for family 16h, erratum 793 Willy Tarreau
2015-05-15  8:05 ` [ 12/48] x86/asm/entry/64: Remove a bogus ret_from_fork optimization Willy Tarreau
2015-05-15  8:05 ` [ 13/48] x86: Conditionally update time when ack-ing pending irqs Willy Tarreau
2015-05-15  8:05 ` [ 14/48] serial: samsung: wait for transfer completion before clock disable Willy Tarreau
2015-05-15  8:05 ` [ 15/48] splice: Apply generic position and size checks to each write Willy Tarreau
2015-05-15  8:05 ` [ 16/48] netfilter: conntrack: disable generic tracking for known protocols Willy Tarreau
2015-05-15 21:05   ` Ben Hutchings
2015-05-15  8:05 ` [ 17/48] isofs: Fix infinite looping over CE entries Willy Tarreau
2015-05-15  8:05 ` [ 18/48] isofs: Fix unchecked printing of ER records Willy Tarreau
2015-05-15  8:05 ` [ 19/48] net: sctp: fix memory leak in auth key management Willy Tarreau
2015-05-15  8:05 ` [ 20/48] net: sctp: fix slab corruption from use after free on INIT collisions Willy Tarreau
2015-05-15  8:05 ` [ 21/48] IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic Willy Tarreau
2015-05-15  8:05 ` [ 22/48] net: llc: use correct size for sysctl timeout entries Willy Tarreau
2015-05-15  8:05 ` [ 23/48] net: rds: use correct size for max unacked packets and bytes Willy Tarreau
2015-05-15  8:05 ` [ 24/48] ipv6: Dont reduce hop limit for an interface Willy Tarreau
2015-05-15  8:05 ` [ 25/48] fs: take i_mutex during prepare_binprm for set[ug]id executables Willy Tarreau
2015-05-15  8:05 ` [ 26/48] net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland Willy Tarreau
2015-05-15 21:08   ` Ben Hutchings
2015-05-16  5:31     ` Willy Tarreau
2015-05-15  8:05 ` [ 27/48] ppp: deflate: never return len larger than output buffer Willy Tarreau
2015-05-15  8:05 ` [ 29/48] net: reject creation of netdev names with colons Willy Tarreau
2015-05-15  8:06 ` [ 30/48] ipv4: Dont use ufo handling on later transformed packets Willy Tarreau
2015-05-15  8:06 ` [ 31/48] udp: only allow UFO for packets from SOCK_DGRAM sockets Willy Tarreau
2015-05-15  8:06 ` [ 32/48] net: avoid to hang up on sending due to sysctl configuration overflow Willy Tarreau
2015-05-15  8:06 ` [ 33/48] net: sysctl_net_core: check SNDBUF and RCVBUF for min length Willy Tarreau
2015-05-15  8:06 ` [ 34/48] rds: avoid potential stack overflow Willy Tarreau
2015-05-15  8:06 ` [ 35/48] rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg() Willy Tarreau
2015-05-15  8:06 ` [ 36/48] tcp: make connect() mem charging friendly Willy Tarreau
2015-05-15  8:06 ` [ 37/48] ip_forward: Drop frames with attached skb->sk Willy Tarreau
2015-05-15  8:06 ` [ 38/48] tcp: avoid looping in tcp_send_fin() Willy Tarreau
2015-05-15  8:06 ` [ 39/48] spi: spidev: fix possible arithmetic overflow for multi-transfer message Willy Tarreau
2015-05-15  8:06 ` [ 40/48] IB/core: Avoid leakage from kernel to user space Willy Tarreau
2015-05-15  8:06 ` [ 41/48] ipvs: uninitialized data with IP_VS_IPV6 Willy Tarreau
2015-05-15  8:06 ` [ 42/48] ipv4: fix nexthop attlen check in fib_nh_match Willy Tarreau
2015-05-15  8:06 ` [ 43/48] pagemap: do not leak physical addresses to non-privileged userspace Willy Tarreau
2015-05-15  8:06 ` [ 44/48] lockd: Try to reconnect if statd has moved Willy Tarreau
2015-05-15  8:06 ` [ 45/48] scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND Willy Tarreau
2015-05-15  8:06 ` [ 46/48] posix-timers: Fix stack info leak in timer_create() Willy Tarreau
2015-05-15  8:06 ` [ 47/48] hfsplus: fix B-tree corruption after insertion at position 0 Willy Tarreau
2015-05-15  8:06 ` [ 48/48] sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND) Willy Tarreau
