Apache 2.2.21 / PHP 5.3.8: LIBC Panic continues, and MALLOC FAILED shown via stdout


Lewis G Rosenthal

Oct 7, 2011, 1:01:55 AM
to Apache2 Mailing List
Greetings...

After the upgrade, and some pain re-scripting my php.ini for 5.3 (my 5.2
php.ini just wouldn't get PHP to start), and after a couple days of
uptime, in the vio window, I'm seeing:

MALLOC FAILED

I can't find anything related to this in PHP's error log (mainly, a
bunch of warnings and deprecated statements, for one site in particular
which is indeed poorly scripted). Apache's error log shows a high number
of our old friend:

LIBC PANIC!!
fmutex deadlock: Owner died!
0x0026013c: Owner=0x17020008 Self=0x17020001 fs=0x3 flags=0x0
hev=0x00010012
Desc="LIBC Heap"
pid=0x1702 ppid=0x15c5 tid=0x0001 slot=0x012c pri=0x0200 mc=0x0000
J:\APPS\APACHE2\BIN\HTTPD.EXE
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.

But then, after the last one of these, I have the following:

[Thu Oct 06 09:33:08 2011] [error] (OS 343)OS/2 error 343: unable to
open work queue, exiting
[Thu Oct 06 12:10:06 2011] [error] caught exception
(XCPT_ACCESS_VIOLATION) in worker thread, initiating child shutdown
pid=6036
[Thu Oct 06 12:10:06 2011] [error] caught exception in worker
thread, initiating child shutdown pid=6036
[Thu Oct 06 13:09:42 2011] [error] (OS 343)OS/2 error 343: unable to
open work queue, exiting
[Thu Oct 06 13:51:37 2011] [error] (OS 343)OS/2 error 343: unable to
open work queue, exiting
[Thu Oct 06 14:48:32 2011] [error] (OS 343)OS/2 error 343: unable to
open work queue, exiting
[Thu Oct 06 21:54:12 2011] [error] (OS 343)OS/2 error 343: unable to
open work queue, exiting

This seems to point to mpmt_os2 (from
http://svn.apache.org/repos/asf/httpd/sandbox/replacelimit/server/mpm/mpmt_os2/mpmt_os2_child.c
):

static void worker_main(void *vpArg)
{
    long conn_id;
    conn_rec *current_conn;
    apr_pool_t *pconn;
    apr_allocator_t *allocator;
    apr_bucket_alloc_t *bucket_alloc;
    worker_args_t *worker_args;
    HQUEUE workq;
    PID owner;
    int rc;
    REQUESTDATA rd;
    ULONG len;
    BYTE priority;
    int thread_slot = (int)vpArg;
    EXCEPTIONREGISTRATIONRECORD reg_rec = { NULL, thread_exception_handler };
    ap_sb_handle_t *sbh;

    /* Trap exceptions in this thread so we don't take down the whole process */
    DosSetExceptionHandler(&reg_rec);

    rc = DosOpenQueue(&owner, &workq,
                      apr_psprintf(pchild, "/queues/httpd/work.%d", getpid()));

    if (rc) {
        ap_log_error(APLOG_MARK, APLOG_ERR, APR_FROM_OS_ERROR(rc), ap_server_conf,
                     "unable to open work queue, exiting");
        ap_scoreboard_image->servers[child_slot][thread_slot].tid = 0;
    }

    conn_id = ID_FROM_CHILD_THREAD(child_slot, thread_slot);
    ap_update_child_status_from_indexes(child_slot, thread_slot, SERVER_READY,
                                        NULL);

    apr_allocator_create(&allocator);
    apr_allocator_max_free_set(allocator, ap_max_mem_free);
    bucket_alloc = apr_bucket_alloc_create_ex(allocator);

    while (rc = DosReadQueue(workq, &rd, &len, (PPVOID)&worker_args, 0, DCWW_WAIT, &priority, NULLHANDLE),
           rc == 0 && rd.ulData != WORKTYPE_EXIT) {
        pconn = worker_args->pconn;
        ap_create_sb_handle(&sbh, pconn, child_slot, thread_slot);
        current_conn = ap_run_create_connection(pconn, ap_server_conf,
                                                worker_args->conn_sd, conn_id,
                                                sbh, bucket_alloc);

        if (current_conn) {
            ap_process_connection(current_conn, worker_args->conn_sd);
            ap_lingering_close(current_conn);
        }

        apr_pool_destroy(pconn);
        ap_update_child_status_from_indexes(child_slot, thread_slot,
                                            SERVER_READY, NULL);
    }

    ap_update_child_status_from_indexes(child_slot, thread_slot, SERVER_DEAD,
                                        NULL);

    apr_bucket_alloc_destroy(bucket_alloc);
    apr_allocator_destroy(allocator);
}

though I have no idea what the heck a 343 error is (surely not listed in
http://www.edm2.com/os2api/Define/DosErrorCodes.html ).

So, looking at conf\extra\httpd-mpm.conf, I have (as inherited from 2.2.15):

<IfModule mpm_mpmt_os2_module>
ThreadStackSize 2097152
StartServers 2
MinSpareThreads 5
MaxSpareThreads 10
MaxRequestsPerChild 0
</IfModule>

ThreadStackSize too big or too small, perhaps, in relation to the number
of spare threads? Or...not enough difference between min & max spare
threads? And where, exactly, is /queues/httpd/ supposed to reside?

Grepping through my site logs for a timestamp which coincides with the
above 343 errors, I can't find anything.

Thoughts?

--
Lewis
-------------------------------------------------------------
Lewis G Rosenthal, CNA, CLP, CLE, CWTS
Rosenthal & Rosenthal, LLC www.2rosenthals.com
Need a managed Wi-Fi hotspot? www.hautspot.com
visit my IT blog www.2rosenthals.net/wordpress
please do not add my address to any non-bcc mass mailings
-------------------------------------------------------------

Steven Levine

Oct 8, 2011, 4:05:32 PM
to apa...@googlegroups.com
In <4E8E87C3...@2rosenthals.com>, on 10/07/11
at 01:01 AM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi,

>uptime, in the vio window, I'm seeing:

>MALLOC FAILED

I can't say where this is coming from based on a text search of the
sources I have here.

Based on what you say below, you probably have a memory leak that is
exhausting the heap.

>which is indeed poorly scripted). Apache's error log shows a high number
>of our old friend:

> LIBC PANIC!!
> fmutex deadlock: Owner died!
> 0x0026013c: Owner=0x17020008 Self=0x17020001 fs=0x3 flags=0x0
> hev=0x00010012
> Desc="LIBC Heap"
> pid=0x1702 ppid=0x15c5 tid=0x0001 slot=0x012c pri=0x0200 mc=0x0000
> J:\APPS\APACHE2\BIN\HTTPD.EXE
> Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.

You should be seeing httpd restarts after this. It's a side effect of
some thread exiting unexpectedly.
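
One way to read the panic record: Self=0x17020001 lines up with pid=0x1702 and tid=0x0001 on the last line of the dump, which suggests that these fields pack the PID into the high 16 bits and the TID into the low 16 bits. That layout is an inference from this one dump, not something confirmed in the thread. Under that assumption, the dead owner would be thread 8 of the same process. A minimal Python sketch:

```python
def split_owner(value):
    """Split a 32-bit value into (pid, tid).

    ASSUMPTION: PID in the high 16 bits, TID in the low 16 bits. This is
    inferred from the panic dump above, not taken from LIBC documentation.
    """
    return value >> 16, value & 0xFFFF


# Values copied from the panic record quoted above.
for label, raw in (("Owner", 0x17020008), ("Self", 0x17020001)):
    pid, tid = split_owner(raw)
    print(f"{label}: pid=0x{pid:04x} ({pid} decimal) tid={tid}")
```

If the assumption holds, the thread that held the "LIBC Heap" mutex (tid 8) went away while thread 1 was waiting on it, which is consistent with a thread exiting unexpectedly.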

>But then, after the last one of these, I have the following:

> [Thu Oct 06 09:33:08 2011] [error] (OS 343)OS/2 error 343: unable to
> open work queue, exiting

This means the thread that created the worker thread that is reporting
this failure has died and the queue that it created no longer exists.

See server\mpm\mpmt_os2\mpmt_os2_child.c:161

> [Thu Oct 06 12:10:06 2011] [error] caught exception
> (XCPT_ACCESS_VIOLATION) in worker thread, initiating child shutdown
> pid=6036

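This says that the recovery logic is not quite right.

>This seems to point to mpmt_os2 (from
>http://svn.apache.org/repos/asf/httpd/sandbox/replacelimit/server/mpm/mpmt_os2/mpmt_os2_child.c

Yes, it does, but you are just seeing side effects of a prior failure.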

>though I have no idea what the heck a 343 error is (surely not listed in
>http://www.edm2.com/os2api/Define/DosErrorCodes.html ).

You should use something less antique. I recommend H\BSEERR.H from your
OS/2 toolkit installation.

#define ERROR_QUE_NAME_NOT_EXIST 343 /* MSG%none */

There's also the Control Program Guide and Reference (cpref.inf).
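
For anyone reading the logs later, a small lookup can attach the symbolic name to each "(OS nnn)" entry. This is only a sketch in Python: the table holds the single code quoted above, and the helper name `annotate` is made up for illustration. Extend the table from your own copy of BSEERR.H.

```python
import re

# Only 343 is quoted in this thread; add further entries from BSEERR.H.
OS2_ERRORS = {
    343: "ERROR_QUE_NAME_NOT_EXIST",
}

# Matches the "(OS 343)" prefix Apache puts in front of the message text.
OS_CODE = re.compile(r"\(OS (\d+)\)")


def annotate(line):
    """Append the symbolic error name to a log line, if the code is known."""
    match = OS_CODE.search(line)
    if match is None:
        return line
    name = OS2_ERRORS.get(int(match.group(1)), "unknown, check BSEERR.H")
    return f"{line} [{name}]"


sample = ("[Thu Oct 06 09:33:08 2011] [error] (OS 343)OS/2 error 343: "
          "unable to open work queue, exiting")
print(annotate(sample))
```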

>Thoughts?

I would recommend locating where the MALLOC FAILED message is coming
from. This might give a better idea of where to start looking.

You might actually have two separate failures, but it is too soon to say.

Steven

--
----------------------------------------------------------------------
"Steven Levine" <ste...@earthlink.net> eCS/Warp/DIY etc.
www.scoug.com www.ecomstation.com
----------------------------------------------------------------------

Lewis G Rosenthal

Oct 9, 2011, 1:55:43 AM
to apa...@googlegroups.com
Hey, there, buddy...

On 10/08/11 04:05 pm, Steven Levine thus wrote :


> In<4E8E87C3...@2rosenthals.com>, on 10/07/11
> at 01:01 AM, Lewis G Rosenthal<lgros...@2rosenthals.com> said:
>
> Hi,
>
>
>> uptime, in the vio window, I'm seeing:
>>
>
>> MALLOC FAILED
>>
> I can't say where this is coming from based on a text search of the
> sources I have here.
>
> Based on what you say below, you probably have a memory leak that is
> exhausting the heap.
>
>

Sounds reasonable.


>> which is indeed poorly scripted). Apache's error log shows a high number
>> of our old friend:
>>
>
>> LIBC PANIC!!
>> fmutex deadlock: Owner died!
>> 0x0026013c: Owner=0x17020008 Self=0x17020001 fs=0x3 flags=0x0
>> hev=0x00010012
>> Desc="LIBC Heap"
>> pid=0x1702 ppid=0x15c5 tid=0x0001 slot=0x012c pri=0x0200 mc=0x0000
>> J:\APPS\APACHE2\BIN\HTTPD.EXE
>> Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.
>>
> You should be seeing httpd restarts after this. It's a side effect of
> some thread exiting unexpectedly.
>
>

Not in the vio, however, POPUPLOG.OS2 says:

10-09-2011 01:37:48 SYS3175 PID 1c5e TID 0006 Slot 0124


J:\APPS\APACHE2\BIN\HTTPD.EXE

c0000005
1ffb1b65
P1=00000001 P2=00000b14 P3=XXXXXXXX P4=XXXXXXXX
EAX=00000b14 EBX=20035d00 ECX=0235ffd4 EDX=0235ffd4
ESI=0235ffd4 EDI=00000006
DS=0053 DSACC=f0f3 DSLIM=ffffffff
ES=0053 ESACC=f0f3 ESLIM=ffffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0000 GSACC=**** GSLIM=********
CS:EIP=005b:1ffb1b65 CSACC=f0df CSLIM=ffffffff
SS:ESP=0053:0235ff94 SSACC=f0f3 SSLIM=ffffffff
EBP=0235ff94 FLG=00010213

DOSCALL1.DLL 0002:00001b65

(I see several of these from this evening, and I've been working pretty
heavily in Wordpress - PHP - the past couple of hours.)
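
To see how often these occur, the POPUPLOG.OS2 entry headers can be tallied per day. A rough Python sketch, assuming every entry starts with a header line laid out like the one above; the second sample line is made up for illustration, and `traps_per_day` is a hypothetical helper name:

```python
import re
from collections import Counter

# Header layout taken from the single entry quoted above:
#   MM-DD-YYYY HH:MM:SS SYSnnnn PID xxxx TID xxxx Slot xxxx
HEADER = re.compile(
    r"^(\d{2}-\d{2}-\d{4})\s+(\d{2}:\d{2}:\d{2})\s+(SYS\d+)\s+"
    r"PID\s+([0-9a-fA-F]+)\s+TID\s+([0-9a-fA-F]+)\s+Slot\s+([0-9a-fA-F]+)"
)


def traps_per_day(lines, code="SYS3175"):
    """Count entries with the given SYS code, keyed by date string."""
    counts = Counter()
    for line in lines:
        match = HEADER.match(line.strip())
        if match and match.group(3) == code:
            counts[match.group(1)] += 1
    return counts


sample = [
    "10-09-2011 01:37:48 SYS3175 PID 1c5e TID 0006 Slot 0124",
    "J:\\APPS\\APACHE2\\BIN\\HTTPD.EXE",
    "10-09-2011 02:10:05 SYS3175 PID 1c9a TID 0004 Slot 0130",  # invented entry
]
print(traps_per_day(sample))
```

In practice the lines would come from the log itself, e.g. `open("POPUPLOG.OS2", errors="replace")`, with the path adjusted to wherever the file lives on the boot volume.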

>> But then, after the last one of these, I have the following:
>>
>
>> [Thu Oct 06 09:33:08 2011] [error] (OS 343)OS/2 error 343: unable to
>> open work queue, exiting
>>
> This means the thread that created the worker thread that is reporting
> this failure has died and the queue that it created no longer exists.
>
> See server\mpm\mpmt_os2\mpmt_os2_child.c:161
>
>

Okay. Thanks for the pointer.


>> [Thu Oct 06 12:10:06 2011] [error] caught exception
>> (XCPT_ACCESS_VIOLATION) in worker thread, initiating child shutdown
>> pid=6036
>>
> This says that the recovery logic is not quite right.
>
>

Okay.


>> This seems to point to mpmt_os2 (from
>> http://svn.apache.org/repos/asf/httpd/sandbox/replacelimit/server/mpm/mpmt_os2/mpmt_os2_child.c
>>
> Yes, it does, but you are just seeing side effects of a prior failure.
>
>

Right. I see that (now), based on your previous comments.


>> though I have no idea what the heck a 343 error is (surely not listed in
>> http://www.edm2.com/os2api/Define/DosErrorCodes.html ).
>>
> You should use something less antique. I recommend H\BSEERR.H from your
> OS/2 toolkit installation.
>
> #define ERROR_QUE_NAME_NOT_EXIST 343 /* MSG%none */
>
> There's also the Control Program Guide and Reference (cpref.inf).
>
>

Thanks for the tip. Sometimes, lots of bandwidth can be one's own worst
enemy, as it is easy to fall prey to the "just Google it" phenomenon... ;-)


>> Thoughts?
>>
> I would recommend locating where the MALLOC FAILED message is coming
> from. This might give a better idea of where to start looking.
>
> You might actually have two separate failures, but it is too soon to say.
>

Hmmm...

Of course, the obvious (to me) suspect is PHP. The error log is pretty
fat as it is, though I may need to crank up the verbosity a bit just to
get this (there is that one really noisy site - I even fixed a couple of
their coding errors as I just couldn't stand the cruft in the log anymore!).

A possible side effect (or cause - hard to tell) is the oft-reported
"DB function failed with error number 2006 - MySQL server has gone away"
message which I was not seeing under PHP 5.2.11, but which I have seen
several times since upgrading. If PHP is restarting while in the middle
of a DB transaction, I might expect to see that message. I have both
mysql & mysqli enabled, and have now (yesterday) disabled persistent
connections for both. I haven't seen the error tonight, but as per the
above snippet from POPUPLOG, I am still seeing the HTTPD crashes. I
might roll back to 5.2.11, just to help narrow this down (read:
hopefully eliminate Apache 2.2.21 as the source).

Thanks (as always), Steve. I'll dig a little deeper and will follow up,
here.

Lewis G Rosenthal

Oct 9, 2011, 2:34:46 AM
to apa...@googlegroups.com
Quick follow-up:

On 10/09/11 01:55 am, Lewis G Rosenthal thus wrote :


> Hey, there, buddy...
>
> On 10/08/11 04:05 pm, Steven Levine thus wrote :
>> In<4E8E87C3...@2rosenthals.com>, on 10/07/11
>> at 01:01 AM, Lewis G Rosenthal<lgros...@2rosenthals.com> said:

<snip>

This is nothing new. While I wasn't seeing anything in the vio under
Apache 2.2.15 / PHP 5.2.11, I have scads of these in POPUPLOG going back
for years. I didn't compare registers, but it's been falling over for
some time (and recovering nicely), and now that I think of it, after
months of uptime, I have seen rather high PIDs for HTTPD in TOP.

<snip>


>> I would recommend locating where the MALLOC FAILED message is coming
>> from. This might give a better idea of where to start looking.
>>
>> You might actually have two separate failures, but it is too soon to
>> say.
> Hmmm...
>
> Of course, the obvious (to me) suspect is PHP. The error log is pretty
> fat as it is, though I may need to crank up the verbosity a bit just
> to get this (there is that one really noisy site - I even fixed a
> couple of their coding errors as I just couldn't stand the cruft in
> the log anymore!).
>
> A possible side effect (or cause - hard to tell) is the oft-reported
> "DB function failed with error number 2006 - MySQL server has gone
> away" message which I was not seeing under PHP 5.2.11, but which I
> have seen several times since upgrading. If PHP is restarting while in
> the middle of a DB transaction, I might expect to see that message. I
> have both mysql & mysqli enabled, and have now (yesterday) disabled
> persistent connections for both. I haven't seen the error tonight, but
> as per the above snippet from POPUPLOG, I am still seeing the HTTPD
> crashes. I might roll back to 5.2.11, just to help narrow this down
> (read: hopefully eliminate Apache 2.2.21 as the source).
>
> Thanks (as always), Steve. I'll dig a little deeper and will follow
> up, here.
>

I rolled back to 5.2.11 for a little while. I'll watch logs and keep an
eye out for the MALLOC FAILED message in the vio.

Cheers

Paul Smedley

Oct 9, 2011, 9:22:40 AM
to apa...@googlegroups.com
Please keep the list updated with what you find.

I'm reading with interest but am currently lying in a hotel room in Bali with no access to eCS, just an Android tab, and an opponent and a mscbook air :)
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.



Lewis G Rosenthal

Oct 9, 2011, 9:53:35 AM
to apa...@googlegroups.com
On 10/09/11 09:22 am, Paul Smedley thus wrote :

> Please keep the list updated with what you find.
>
> I'm reading with interest but am currently lying in a hotel room in
> Bali with no access to eCS, just an Android tab, and an opponent and a
> mscbook air :)
>
My friend, if *I* were with my wife in a hotel room in Bali, you can bet
*I* would *not* be following this thread so closely. You need to enjoy
your time away, and shift your focus for a bit (or I will have to touch
base with the misses and get her on the case). :-)

Now, there will be plenty more coming down the pike soon enough.
Meanwhile, we don't want to hear from you for the next few days. Really.
Put that blasted machine *down*!!

;-)

<snip>

--
Lewis

Ian Manners

Oct 9, 2011, 10:22:03 AM
to apa...@googlegroups.com
Hmmm,

>My friend, if *I* were with my wife in a hotel room in Bali, you can bet
>*I* would *not* be following this thread so closely. You need to enjoy
>your time away, and shift your focus for a bit (or I will have to touch
>base with the misses and get her on the case). :-)

The way this reads, which one's the missus?
Android, opponent, or mscbook?

>Now, there will be plenty more coming down the pike soon enough.
>Meanwhile, we don't want to hear from you for the next few days. Really.
>Put that blasted machine *down*!!

He's Australian, got to do something while you're on the can :o)

Cheers
Ian Manners
http://www.os2site.com/

Steven Levine

Oct 9, 2011, 11:55:12 AM
to apa...@googlegroups.com
In <4E91375...@2rosenthals.com>, on 10/09/11

at 01:55 AM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi,

>Not in the vio, however, POPUPLOG.OS2 says:

>10-09-2011 01:37:48 SYS3175 PID 1c5e TID 0006 Slot 0124
>J:\APPS\APACHE2\BIN\HTTPD.EXE
>c0000005
>1ffb1b65
>P1=00000001 P2=00000b14 P3=XXXXXXXX P4=XXXXXXXX
>EAX=00000b14 EBX=20035d00 ECX=0235ffd4 EDX=0235ffd4
>ESI=0235ffd4 EDI=00000006
>DS=0053 DSACC=f0f3 DSLIM=ffffffff
>ES=0053 ESACC=f0f3 ESLIM=ffffffff
>FS=150b FSACC=00f3 FSLIM=00000030
>GS=0000 GSACC=**** GSLIM=********
>CS:EIP=005b:1ffb1b65 CSACC=f0df CSLIM=ffffffff
>SS:ESP=0053:0235ff94 SSACC=f0f3 SSLIM=ffffffff
>EBP=0235ff94 FLG=00010213

>DOSCALL1.DLL 0002:00001b65

This looks like something we've worked on in the past. What is the md5
sum of your doscall1.dll and which os2kernl are you running?

This is most likely the DosUnsetExceptionHandler trap. You might recall
that we got these as the result of something stomping on the exception
handler chain which lives in the stack.

Regarding the "MALLOC FAILED" message, you want to try using fm/2's seek
and scan on the volume containing your server binaries. If the message
text is not compressed, you should be able to find the executable that
contains the message.
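
For anyone without fm/2 to hand, the same kind of scan can be approximated in a few lines of Python. This is a sketch, not a replacement for seek and scan: it does an exact, case-sensitive byte match, reads each file whole (fine for DLL-sized files), and will not see text inside compressed executables. `find_string` and the demo file names are invented for illustration.

```python
import os
import tempfile


def find_string(root, needle=b"MALLOC FAILED"):
    """Yield paths under root whose raw bytes contain needle (case-sensitive)."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if needle in handle.read():
                        yield path
            except OSError:
                continue  # unreadable or locked file: skip it


# Self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "upper.dll"), "wb") as handle:
        handle.write(b"\x00MALLOC FAILED\x00")
    with open(os.path.join(root, "lower.dll"), "wb") as handle:
        handle.write(b"\x00malloc failed\x00")
    hits = [os.path.basename(p) for p in find_string(root)]

print(hits)  # ['upper.dll']
```

On a real system the call would be something like `find_string("J:\\APPS")`, pointed at whichever volume or directory holds the server binaries.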

Lewis G Rosenthal

Oct 9, 2011, 12:50:54 PM
to apa...@googlegroups.com
On 10/09/11 10:22 am, Ian Manners thus wrote :

> Hmmm,
>
>
>> My friend, if *I* were with my wife in a hotel room in Bali, you can bet
>> *I* would *not* be following this thread so closely. You need to enjoy
>> your time away, and shift your focus for a bit (or I will have to touch
>> base with the misses and get her on the case). :-)
>>
> The way this reads, which one's the missus?
> Android, opponent, or mscbook?
>
>
:-)

>> Now, there will be plenty more coming down the pike soon enough.
>> Meanwhile, we don't want to hear from you for the next few days. Really.
>> Put that blasted machine *down*!!
>>
> He's Australian, got to do something while you're on the can :o)
>
>
ROFL!!

Lewis G Rosenthal

Oct 9, 2011, 1:19:56 PM
to apa...@googlegroups.com
On 10/09/11 11:55 am, Steven Levine thus wrote :

> In<4E91375...@2rosenthals.com>, on 10/09/11
> at 01:55 AM, Lewis G Rosenthal<lgros...@2rosenthals.com> said:
>
> Hi,
>
>
>> Not in the vio, however, POPUPLOG.OS2 says:
>>
>
>> 10-09-2011 01:37:48 SYS3175 PID 1c5e TID 0006 Slot 0124
>> J:\APPS\APACHE2\BIN\HTTPD.EXE
>> c0000005
>> 1ffb1b65
>> P1=00000001 P2=00000b14 P3=XXXXXXXX P4=XXXXXXXX
>> EAX=00000b14 EBX=20035d00 ECX=0235ffd4 EDX=0235ffd4
>> ESI=0235ffd4 EDI=00000006
>> DS=0053 DSACC=f0f3 DSLIM=ffffffff
>> ES=0053 ESACC=f0f3 ESLIM=ffffffff
>> FS=150b FSACC=00f3 FSLIM=00000030
>> GS=0000 GSACC=**** GSLIM=********
>> CS:EIP=005b:1ffb1b65 CSACC=f0df CSLIM=ffffffff
>> SS:ESP=0053:0235ff94 SSACC=f0f3 SSLIM=ffffffff
>> EBP=0235ff94 FLG=00010213
>>
>
>> DOSCALL1.DLL 0002:00001b65
>>
> This looks like something we've worked on in the past. What is the md5
> sum of your doscall1.dll and which os2kernl are you running?
>
>
Indeed.

12-29-04 11:25 144,631 0 DOSCALL1.DLL

MD5 checksum: 07A4-0FFE-C2A1-8991-72FB-C23F-3CF7-1319

Kernel: 14.104 SMP

> This is most likely the DosUnsetExceptionHandler trap. You might recall
> that we got these as the result of something stomping on the exception
> handler chain which lives in the stack.
>
>

Right you are. Good memory, and thanks for jogging mine.


> Regarding the "MALLOC FAILED" message, you want to try using fm/2's seek
> and scan on the volume containing your server binaries. If the message
> text is not compressed, you should be able to find the executable that
> contains the message.
>
>

I grep'd (APACHE2\BIN, APACHE2\MODULES, PHP5 -r), but came up empty (so
far). I'll have a more thorough look tonight, when I get back up to NY.

I suppose it's possible that this is the same thing we had before, but
we just weren't getting the verbosity in the reporting. Certainly, the
*outside* appearance is the same stability (except for the MySQL
timeout, which hasn't appeared again, as yet). Interesting to note since
rolling back to PHP 5.2.11, the following Apache log entries are sequential:

[Sun Oct 09 01:17:01 2011] [notice] Apache/2.2.21 (OS/2) PHP/5.2.11
configured -- resuming normal operations
[Sun Oct 09 08:52:23 2011] [error] (OS 343)OS/2 error 343: unable to
open work queue, exiting

Subsequent to the 08:52:23 error 343, we have the "owner died" stuff a
number of times, but what strikes me is that it took over 7 hours to get
to that point (earlier starts resulted in the reporting much sooner). It
could be the mix of PHP modules I'm running now and the difference in
memory footprint of 5.3 vs 5.2, but it's interesting nonetheless.

I'll keep searching for that MALLOC FAILED message & will follow up.

Thanks!

Paul Smedley

Oct 9, 2011, 6:31:57 PM
to apa...@googlegroups.com
Haha, thanks for the laugh. I have, trust me; reading a few personal emails isn't going to stress me out too much.

My eCS ports ARE my stress relief from my day job :)
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.


Lewis G Rosenthal

Oct 19, 2011, 9:53:02 PM
to apa...@googlegroups.com
Hi, guys...

On 10/09/11 11:55 am, Steven Levine thus wrote :

> In<4E91375...@2rosenthals.com>, on 10/09/11
> at 01:55 AM, Lewis G Rosenthal<lgros...@2rosenthals.com> said:
>
>

<snip>


> Regarding the "MALLOC FAILED" message, you want to try using fm/2's seek
> and scan on the volume containing your server binaries. If the message
> text is not compressed, you should be able to find the executable that
> contains the message.
>
>

I ran lxlite * /X /R against a copy of the PHP 5.3.8 files. You were
right, Steve:

[j:\apps\php5.3.8]\bin\grep -iR "MALLOC FAILED" *
Binary file modules/openssl.dll matches
Binary file modules/pgsql.dll matches
Binary file modules/xsl.dll matches
Binary file modules/yaz.dll matches

Now, of the above, I had the following loaded in php.ini:

openssl.dll
pgsql.dll
xsl.dll

As I'm not running:

* *any* SSL sites on the server;
* *any* Postgres databases on any server; and
* not sure what might be using XSL

I'm wondering what would trigger a MALLOC FAILED message from any of those.

Under apache2\bin, ab.exe is the only binary containing that error string.

Current Apache stats, after rolling back to PHP 5.2.11:

Current Time: Wednesday, 19-Oct-2011 20:42:03
Restart Time: Thursday, 13-Oct-2011 12:58:18
Parent Server Generation: 0
Server uptime: 6 days 7 hours 43 minutes 45 seconds
Total accesses: 874 - Total Traffic: 11.3 MB
CPU Usage: u4035.65 s0 cu0 cs0 - .739% CPU load
.0016 requests/sec - 21 B/second - 13.2 kB/request
2 requests currently being processed, 18 idle workers
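
As a sanity check, the rates on the status page follow directly from the uptime and totals. The short Python calculation below reproduces them, assuming "MB" means 2**20 bytes and that the B/second figure is truncated rather than rounded (both are assumptions, chosen because they match the numbers shown):

```python
# Figures copied from the status output above.
uptime_s = 6 * 86400 + 7 * 3600 + 43 * 60 + 45   # 6 days 7:43:45
accesses = 874
traffic_bytes = 11.3 * 1024 * 1024               # "11.3 MB", assuming 1 MB = 2**20 bytes

requests_per_sec = round(accesses / uptime_s, 4)
bytes_per_sec = int(traffic_bytes / uptime_s)     # truncated, not rounded
kb_per_request = round(traffic_bytes / accesses / 1024, 1)

print(uptime_s)          # 546225
print(requests_per_sec)  # 0.0016
print(bytes_per_sec)     # 21
print(kb_per_request)    # 13.2
```

That works out to roughly 140 requests per day, so this is a lightly loaded server.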

No MALLOC FAILED messages listed in the vio. The usual noise is present
in error_log and POPUPLOG.OS2 (60-70 SYS3175's from HTTPD.EXE per day),
though Apache seems to recover on its own well enough, with no impact on
end users.

And Steve, thanks so much for your participation at Warpstock this year!

Cheers/2

--
Lewis

Steven Levine

Oct 20, 2011, 12:32:38 AM
to apa...@googlegroups.com
In <4E9F7EFE...@2rosenthals.com>, on 10/19/11
at 09:53 PM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi,

>I ran lxlite * /X /R against a copy of the PHP 5.3.8 files. You were
>right, Steve:

>[j:\apps\php5.3.8]\bin\grep -iR "MALLOC FAILED" *
>Binary file modules/openssl.dll matches
>Binary file modules/pgsql.dll matches
>Binary file modules/xsl.dll matches
>Binary file modules/yaz.dll matches

I think most if not all of these are false positives. Since the error
message is in uppercase, the -i switch is not appropriate.
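
To illustrate the point about -i: a binary that only contains a lowercase variant of the text will match a case-insensitive search but not an exact one. A small Python sketch, where the byte string is a made-up stand-in for a DLL's string table and `contains` is a hypothetical helper:

```python
def contains(data, needle, ignore_case=False):
    """Return True if needle occurs in data, optionally ignoring ASCII case."""
    if ignore_case:
        return needle.lower() in data.lower()
    return needle in data


# Invented sample: a lowercase message that is NOT the one seen in the vio.
blob = b"\x00error: malloc failed in %s\x00"

print(contains(blob, b"MALLOC FAILED", ignore_case=True))  # True  (like grep -i)
print(contains(blob, b"MALLOC FAILED"))                    # False (exact match)
```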

>I'm wondering what would trigger a MALLOC FAILED message from any of
>those.

Heap overflow or heap corruption. :-)

>And Steve, thanks so much for your participation at Warpstock this year!

It's always fun.

Lewis G Rosenthal

Oct 20, 2011, 6:29:51 AM
to apa...@googlegroups.com
Good morning...

On 10/20/11 12:32 am, Steven Levine thus wrote :


> In<4E9F7EFE...@2rosenthals.com>, on 10/19/11
> at 09:53 PM, Lewis G Rosenthal<lgros...@2rosenthals.com> said:
>
> Hi,
>
>
>> I ran lxlite * /X /R against a copy of the PHP 5.3.8 files. You were
>> right, Steve:
>>
>
>> [j:\apps\php5.3.8]\bin\grep -iR "MALLOC FAILED" *
>> Binary file modules/openssl.dll matches
>> Binary file modules/pgsql.dll matches
>> Binary file modules/xsl.dll matches
>> Binary file modules/yaz.dll matches
>>
> I think most if not all of these are false positives. Since the error
> message is in uppercase, the -i switch is not appropriate.
>
>

Indeed. Silly oversight on my part. I've been working on another project
which has had me doing some involved grep'ing...that was "a memory
artifact." ;-)

I'll re-run, without ignoring case, though I have a hunch I'll come up
with the same results.

>> I'm wondering what would trigger a MALLOC FAILED message from any of
>> those.
>>
> Heap overflow or heap corruption. :-)
>
>

Even if the modules are loaded but unused? I would have thought not.

FWIW, I'm loading those same modules under 5.2.11 (though I should
re-think that, under more careful examination).

<snip>

Steven Levine

Oct 20, 2011, 12:46:33 PM
to apa...@googlegroups.com
In <4E9FF81F...@2rosenthals.com>, on 10/20/11
at 06:29 AM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi there,

>I'll re-run, without ignoring case, though I have a hunch I'll come up
>with the same results.

You won't, unless it's yaz.dll which I don't have here. :-)

>>> I'm wondering what would trigger a MALLOC FAILED message from any of
>>> those.
>>>
>> Heap overflow or heap corruption. :-)
>>
>>
>Even if the modules are loaded but unused? I would have thought not.

There is no such thing as loaded but unused. IAC, I am talking in general
terms since we have not yet identified the source of the message.

>FWIW, I'm loading those same modules under 5.2.11 (though I should
>re-think that, under more careful examination).

Until proven otherwise, I claim the error message is coming from
elsewhere.

Massimo

Dec 26, 2011, 10:40:28 AM
to apa...@googlegroups.com
Hi Lewis,

I've run into that MALLOC FAILED error too over the last 2 days,
but I've found it's due to a *very bad* DDoS attack.

Some idiot was sending hundreds/thousands of requests to one virtual host.

See my post "apache DDOS" of 26 December 2011.

I guess for this "problem" we need to know how to "tune" Apache so that
it refuses requests if there are "too many at the same time from the same IP".

I guess the real problem with Apache is always that damned "apr_socket_accept" OS 105
"the previous ownership of this semaphore has ended" error,

e.g. it makes Apache refuse to start if I don't unplug the LAN patch cables (! :D) from the
switch during the Apache restart :(

thanks
bye

massimo s.

Lewis G Rosenthal

Dec 26, 2011, 2:26:41 PM
to apa...@googlegroups.com
Hi, Max...

On 12/26/11 10:40 am, Massimo thus wrote :


> Hi Lewis,
>
> I've run into that MALLOC FAILED error too over the last 2 days,
> but I've found it's due to a *very bad* DDoS attack.
>

I have some interesting news...


> Some idiot was sending hundreds/thousands of requests to one virtual host.
>
> See my post "apache DDOS" of 26 December 2011.
>

I noticed. See my reply in that thread. Ian pretty much hit on it,
though: this is an issue for the firewall to handle, and not the web server.


> I guess for this "problem" we need to know how to "tune" Apache so that
> it refuses requests if there are "too many at the same time from the same IP".
>
> I guess the real problem with Apache is always that damned "apr_socket_accept" OS 105
> "the previous ownership of this semaphore has ended" error,
>
> e.g. it makes Apache refuse to start if I don't unplug the LAN patch cables (! :D) from the
> switch during the Apache restart :(
>

That truly is a weird one, I grant you.

<snip>


> Il 20/10/2011 3.53, Lewis G Rosenthal ha scritto:
>> Hi, guys...
>>
>> On 10/09/11 11:55 am, Steven Levine thus wrote :
>>> In<4E91375...@2rosenthals.com>, on 10/09/11
>>> at 01:55 AM, Lewis G Rosenthal<lgros...@2rosenthals.com> said:
>>>
>>>
>> <snip>
>>> Regarding the "MALLOC FAILED" message, you want to try using fm/2's seek
>>> and scan on the volume containing your server binaries. If the message
>>> text is not compressed, you should be able to find the executable that
>>> contains the message.

Okay, well, after chasing my tail with this one for some time, searching
J:\APPS\* for "MALLOC FAILED", I finally came up with several hits:

C:\mptn\BIN\arp.exe
C:\mptn\BIN\dhcpcd.exe
C:\mptn\BIN\mib_2.exe
C:\mptn\BIN\netstat.exe
C:\mptn\BIN\nfsd.exe
C:\mptn\BIN\pgwsplit.exe
C:\mptn\DLL\tcpip32.dll
C:\mptn\DLL\tcpip32.old
C:\mptn\DLL\tcpmri.dll

If I had to guess, I'd say we're getting the report from tcpip32.dll.
Steve, the dll reported is your patched one, I believe:

8-16-11 6:01 87,504 0 tcpip32.dll

Build Level Display Facility Version 6.12.675 Sep 25 2001
(C) Copyright IBM Corporation 1993-2001
Signature: @#IBM:6.01a#@ patched 2011/08/16 06:01:01 on TCPBLDSRVR
- MPTS
6.01-TCP/IP 4.3 for OS/2 - 32 BIT SOCKETS
Vendor: IBM
Revision: 6.01
File Version: 6.1
Description: patched 2011/08/16 06:01:01 on TCPBLDSRVR - MPTS
6.01-TCP/IP 4.
3 for OS/2 - 32

vs the "old" file, which is:

5-21-04 11:15 98,611 0 tcpip32.old

Now, the above list came from my ThinkPad. Looking back at the server, I
see that I have not yet updated the tcpip32.dll, and am, in fact, running
the 5-21-04 build. Considering that your patch(es) was/were specifically
designed to work around SMP issues (i.e., make tcpip32.dll multi-thread
and SMP safe), and I am running an SMP machine, I should probably switch
to that one, unless you have a better idea.

Max, your server is SMP also, isn't it?

Cheers/2, and Happy Holidays...

Massimo

Dec 26, 2011, 3:07:43 PM
to apa...@googlegroups.com
The web server is a ProLiant ML110 G2, so P4 3 GHz, no SMP.


massimo s.
