* weird /proc/net/unix issue on CentOS 5.7 / 2.6.18-274.7.1.el5
@ 2012-06-21  0:44 Eric Wong
  2012-06-21  1:13 ` Eric Wong
From: Eric Wong @ 2012-06-21  0:44 UTC
  To: raindrops

Hey all, I encountered a strange bug on CentOS 5.7 servers running the
2.6.18-274.7.1.el5 kernel.  I'm not sure if/which newer versions fix
this and will report back if/when I find out.  I can't reproduce the
issue on a vanilla 3.4.2 Linux kernel nor on older CentOS 5.4 machines.

(Pointers to repositories appreciated; RH doesn't seem to make it
easy to find their kernel git repositories, if they're public at all.)

The regression is triggered by attempting to read unix listener stats.
Here's the relevant strace output:

 open("/proc/net/unix", O_RDONLY) = 9
 fstat(9, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
 fstat(9, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
 ioctl(9, SNDCTL_TMR_TIMEBASE or TCGETS, 0x40d2a6a0) = -1 ENOTTY (Inappropriate ioctl for device)
 fstat(9, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
 lseek(9, 0, SEEK_CUR)       = 0
 read(9, "Num       RefCount Protocol Flag"..., 8192) = 4023
 ppoll([{fd=9, events=POLLIN}], 1, NULL, NULL, 8
 /* hangs forever */

This is on Ruby 1.9.3-p194 on x86_64.
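
To check whether a given host is affected without hanging the calling
process, the read can be done in a background thread with a deadline.
This is only a minimal sketch: read_with_deadline is a hypothetical
helper (not part of raindrops or Ruby), and it assumes the blocked
ppoll() releases the interpreter lock so the main thread can time out.

```ruby
# Hypothetical helper: read a whole file in a background thread and
# return its contents, or nil if the read has not finished in time.
def read_with_deadline(path, seconds = 1.0)
  reader = Thread.new { File.read(path) }
  if reader.join(seconds)   # join returns nil when the limit expires
    reader.value
  else
    reader.kill             # give up on the stuck reader
    nil
  end
end
```

A nil result from read_with_deadline("/proc/net/unix") would suggest
the read is stuck in the same way as the strace above.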

I've also tried different versions of Ruby and forcing select()
(instead of ppoll()):
  ruby -e 'IO.select([File.open("/proc/net/unix")],nil,nil,0.1)'

Ruby returned nil after timing out with select, too.
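
The same check can be wrapped in a small script so it is easy to run
against other paths for comparison.  A sketch only: select_ready? is a
hypothetical name, and the 0.1 second timeout is just the value from the
one-liner above.

```ruby
# Hypothetical helper: true if select() reports the file readable
# before the timeout expires, false if select() times out.
def select_ready?(path, timeout = 0.1)
  File.open(path) do |f|
    # IO.select returns nil on timeout, or arrays of ready IO objects
    !IO.select([f], nil, nil, timeout).nil?
  end
end
```

On an unaffected system, select_ready?("/proc/net/unix") is expected to
be true; false would match the timeout described above.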

It's arguable Ruby is being dumb about calling ppoll() (or select()) on
a file in /proc/, especially since we haven't hit EAGAIN, but really,
select/ppoll/poll/pselect/epoll_wait should all return immediately on
"regular" files.


