Here are a few things to try. At best they will solve your issue; at worst they will help rule out what it isn't. In some cases you may want to combine steps (e.g. strace together with 'try with cleared environment').
Ulimit
Check whether your shell has unusually low limits on the number of allowed processes or on the pipe size with the following command: ulimit -a
If you can, append the output of that command to your question.
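If the full listing is noisy, the two limits mentioned above can be printed on their own. This is a minimal sketch that assumes bash, where -u reports the maximum number of user processes and -p the pipe size in 512-byte blocks; the variable names are just for illustration.

```shell
# Maximum number of processes available to the current user
max_procs=$(ulimit -u)

# Pipe size, reported by bash in 512-byte blocks
pipe_blocks=$(ulimit -p)

echo "max user processes: $max_procs"
echo "pipe size (512-byte blocks): $pipe_blocks"
```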
Logging
On older versions of bash (bash < 4.1), pipelines could break when logging functions were enabled.
type log
That should return something like 'log: not found'. If it returns a function definition instead, clear it with the command unset log.
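The check and the cleanup can be combined into one step. This is only a sketch: the first line defines a do-nothing stand-in called log so the example has something to find, and you would leave that line out on the affected machine.

```shell
# Hypothetical stand-in so the example has a function to detect;
# omit this line when checking a real session.
log() { :; }

# 'type' reports "... is a function" when log is a shell function
case "$(type log 2>&1)" in
    *function*)
        echo "log is a shell function; removing it from this shell"
        unset -f log
        ;;
    *)
        echo "no log function is defined in this shell"
        ;;
esac
```

Using unset -f makes it explicit that a function, not a variable, is being removed.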
Debug Trap
trap -p
Check whether any of the traps listed are tied to DEBUG or logging. If so, or if a log function is defined, you need to find out where they are defined and (at least temporarily) remove them.
They could be defined in .bashrc, .bash_profile or any other related initialization file. Since it appears to be impacting root as well, it is more likely to be found in a system-level file like /etc/bashrc or /etc/profile.
At the very least you can clear the trap and log function from your current environment and see if it resolves the issue.
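As a rough sketch of those steps in bash: show the DEBUG trap, search the initialization files named above for the line that installs it, then reset it for the current shell. The first trap line is a hypothetical stand-in for whatever your init files install, and the grep pattern is an assumption about how the trap is written.

```shell
# Hypothetical stand-in for a DEBUG trap installed by an init file;
# omit this line when checking a real session.
trap ': logging hook' DEBUG

# Show only the DEBUG trap, if one is set
trap -p DEBUG

# Look for the line that installs it; files that do not exist are skipped
grep -n 'trap.*DEBUG' /etc/profile /etc/bashrc ~/.bashrc ~/.bash_profile 2>/dev/null || true

# Reset the DEBUG trap for the current shell only
trap - DEBUG
```

This only affects the shell you run it in; the definition in the init file will come back on the next login until it is removed there.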
Try with cleared environment
Another method to check this is by running the piped commands using (fixed)
env -i ls | env -i grep a | env -i grep b | env -i grep c | env -i grep d
to clear the environment (for that command sequence). You may need to change your commands to include full paths. It would also be worthwhile to see whether the values from ulimit -a are different in this environment.
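One way to make that comparison, sketched under the assumption that bash is on your PATH: capture the limits in the current shell and in a bash started with an empty environment, then compare the two. The variable names are only illustrative.

```shell
# Resolve the full path first, since the cleared environment has no PATH
bash_path=$(command -v bash)

# Limits as seen by the current shell
current_limits=$(ulimit -a)

# Limits as seen by a bash started with an empty environment
clean_limits=$(env -i "$bash_path" -c 'ulimit -a')

if [ "$current_limits" = "$clean_limits" ]; then
    echo "limits are the same in the cleared environment"
else
    echo "limits differ; cleared environment reports:"
    echo "$clean_limits"
fi
```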
Bash debug output
Before running your piped command sequence, type set -x on the command line. This turns on bash debugging: all 'behind the scenes' commands will be printed to the screen. You may see something odd, such as a hook to another function being called (similar to the log issue discussed above).
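For reference, a small sketch of a traced run. It uses printf to supply fixed input in place of ls so the pipeline always has a matching line; the trace lines go to stderr, and set +x turns tracing off again.

```shell
# Start tracing: each command is echoed to stderr before it runs
set -x

# Same shape as the failing pipeline, but with fixed input instead of ls
result=$(printf 'abcd\nxyz\n' | grep a | grep b | grep c | grep d)

# Stop tracing
set +x

echo "$result"
```

Anything in the trace that you did not type yourself (a function call, a logger invocation) is a candidate for the hook you are looking for.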
Strace
Run the command with strace:
strace ls | grep a | grep b | grep c | grep d
and see exactly what is going on. If you want to post the results, you'd probably need to put them on pastebin or a similar site and post a link. This is the approach most likely to resolve the issue, but the output can be hard to decode.
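Note that the command above only attaches strace to ls, the first stage. A variant that traces every stage, sketched under the assumption that strace is installed and allowed to attach: run the whole pipeline inside bash -c, follow child processes with -f, and write the trace to a file with -o. The output path is a hypothetical choice.

```shell
# Hypothetical location for the trace output
trace_file=${TMPDIR:-/tmp}/pipeline.trace

if command -v strace >/dev/null 2>&1; then
    # -f follows forked children, so every stage of the pipe is traced
    # -o writes the trace to a file instead of mixing it into stderr
    strace -f -o "$trace_file" bash -c 'ls | grep a | grep b | grep c | grep d' \
        || echo "pipeline or strace exited non-zero; check $trace_file"
else
    echo "strace is not installed"
fi
```

The resulting file is also easier to upload to a paste site than terminal scrollback.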
Update
After reviewing your logs:
When using env -i, each stage of the pipe needs to use it, because each stage is effectively a separate shell instance. My mistake.
env -i ls | env -i grep a | env -i grep b | env -i grep c | env -i grep d
The logging function that is called between each call, combined with the DEBUG trap, is almost certainly the bug I was referring to. Unfortunately the bug is not available for viewing even with my RHEL subscription. It is https://bugzilla.redhat.com/show_bug.cgi?id=720464
This bug resulted in a race condition when logging occurred in conjunction with debug traps, which is exactly what you have going on - the set -x clearly shows the fairly extensive logging (to syslog) of every command that is issued.
Because a pipe creates subshells, you can't just clear it in the top-level shell and issue piped commands. The next piped stage will have it defined. Retesting with the change in item 1 above will show that it does work without these hooks.
The bug report indicates no backport of the fix. I've put some details from RHEL here: http://pastebin.com/dymenY7e
You need to clear the trap and/or remove the definition of the logging function history_to_syslog. If you have root access you can definitely remove this permanently. I gave some tips in my original answer on where to look.
You could try checking for an update to bash for CentOS 5, but the info I linked above stated that no backport to RHEL 5 was created, so it's unlikely one exists for CentOS 5.
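To see which side of the 4.1 line mentioned earlier your bash falls on, something like the following should do. It is a sketch that assumes it runs in bash itself (BASH_VERSINFO is a bash built-in array), and a distribution can patch an older version, so treat the result as a hint only.

```shell
# BASH_VERSINFO[0] is the major version, BASH_VERSINFO[1] the minor version
major=${BASH_VERSINFO[0]}
minor=${BASH_VERSINFO[1]}

if [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 1 ]; }; then
    older_than_4_1=yes
else
    older_than_4_1=no
fi

echo "bash $BASH_VERSION, older than 4.1: $older_than_4_1"
```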
Brief update:
To clarify the tie between the bug and the failure mode a bit: calls that interact with the process IDs associated with the logging function and DEBUG hook occur out of sequence (the race condition). As a result, calls such as getppid reference processes that have just exited, which produces the error that you see.
On a side note, that is an aggressive logging capability. The sysadmin clearly doesn't believe in the circle of trust.