Opened 12 years ago

Closed 12 years ago

#63 closed defect (worksforme)

Error while allocating

Reported by: www.google.com/accounts/o8/id?id=AItOawnbO_Bp0gq-BdFo1JSwu_TT7SOWGIFncw4 Owned by: somebody
Priority: major Milestone:
Component: nginx-core Version: 1.0.x
Keywords: Cc: rafit.izhak.ratzin@…
uname -a:
nginx -V: 1.0.10

Description

We are running nginx 1.0.10.

The traffic consists of large POST requests (bodies of more than 4K, up to 16K; hundreds to thousands of requests per second).

The workers kept crashing within seconds, during a free:
#4 0x000000000041ac41 in ngx_destroy_pool (pool=0x1f5e3f0) at core/ngx_palloc.c:64

These are the steps we took to track down the crash:

  1. We commented out our own code and responded with a predefined dummy message instead.

After this step I noticed that nginx was crashing and generating cores easily (e.g. started at Nov 29 08:06:08; 1st core created 2011/11/29 08:06:30, 2nd 2011/11/29 08:06:33, 3rd 08:07:16, 4th 2011/11/29 08:07:22).
I also noticed that it always crashes in the same function and at the same line, ngx_destroy_pool, while it frees a large pool (the same large pool in all the cores from the same run).

#4 0x000000000041ac41 in ngx_destroy_pool (pool=0x1f5e3f0) at core/ngx_palloc.c:64

  2. After running strlen on the allocated buffer I noticed that the allocated size is the same as the length returned by strlen.

This means the allocation is sized for content_ln and not content_ln+1 (+1 for the end-of-string byte); the allocation does not include the null character.
ngx_alloc 0x125e8a0 size 4376

(gdb) p strlen((char *)(l->alloc))
$1 = 4376
(gdb) p l->alloc
$2 = (void *) 0x125e8a0
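
A note on reading this gdb output: nginx generally passes data around as a pointer plus an explicit length (for example ngx_str_t with its len and data fields), not as NUL-terminated C strings, so a pool allocation is not expected to reserve a terminator byte. strlen is also unbounded, so a result equal to the allocation size only says that the first zero byte was found just past the buffer. A bounded check is a safer way to ask the same question. The sketch below is only an illustration; has_terminator is a hypothetical helper and not part of nginx.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an nginx function): returns 1 if the first
 * `size` bytes of `buf` contain a NUL byte, 0 otherwise.
 * Unlike strlen, memchr never reads past `size` bytes.
 */
static int
has_terminator(const void *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

If this returns 0 for a buffer, any code that treats the buffer as a C string, or that writes a terminator at offset size, touches memory outside the allocation.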

  3. I added +1 to the allocation in ngx_palloc_large and reran:

static void *
ngx_palloc_large(ngx_pool_t *pool, size_t size)
{
    void              *p;
    ngx_uint_t         n;
    ngx_pool_large_t  *large;

    p = ngx_alloc(size + 1, pool->log);    /* THE CHANGE IS HERE */

    ....
}

It ran for a while and no cores were generated, whereas before I could generate cores easily within seconds.

  4. I restored our original code and reran.

So far no cores have been created, and it has been running for almost 12 hours.

There is probably a better/cleaner fix for this.
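
One possible cleaner approach, assuming the module code really does need a NUL-terminated string (the ticket does not confirm this), is to request the extra byte at the call site instead of patching ngx_palloc_large. The sketch below uses plain malloc so that it stands alone; body_to_cstring is a hypothetical name, and inside an nginx module the equivalent would be asking the pool for len + 1 bytes, for instance with ngx_pnalloc(pool, len + 1).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (assumption, not taken from the ticket): copies a
 * length-delimited body into a freshly allocated, NUL-terminated string.
 * The caller owns the result and must free() it.
 */
static char *
body_to_cstring(const unsigned char *data, size_t len)
{
    char *s;

    s = malloc(len + 1);        /* +1 reserves room for the terminator */
    if (s == NULL) {
        return NULL;
    }

    memcpy(s, data, len);
    s[len] = '\0';              /* stays inside the allocation */

    return s;
}
```

This keeps the allocator unchanged and makes the terminator byte the responsibility of the code that needs it.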

I set the priority to major because this crash can lead to a deadlock: all the workers end up waiting for a malloc lock that was never released by a thread killed after a false "double-free" diagnostic (in reality, memory corruption).

Please let me know if you need any core files.

Thanks,
Rafit

Change History (2)

comment:1 by is, 12 years ago

Status: new → accepted

Please show full output of "nginx -V"

comment:2 by Maxim Dounin, 12 years ago

Resolution: worksforme
Status: accepted → closed

Feedback timeout. I suspect the problem is in the submitter's code anyway.
