Opened 13 years ago
Closed 13 years ago
#63 closed defect (worksforme)
Error while allocating
Reported by: | www.google.com/accounts/o8/id?id=AItOawnbO_Bp0gq-BdFo1JSwu_TT7SOWGIFncw4 | Owned by: | somebody |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | nginx-core | Version: | 1.0.x |
Keywords: | | Cc: | rafit.izhak.ratzin@… |
uname -a: | | | |
nginx -V: | 1.0.10 | | |
Description
We are running nginx 1.0.10.
The traffic consists of large POST requests (bodies of roughly 4K to 16K, hundreds to thousands per second).
Workers kept crashing within seconds, inside free():
#4 0x000000000041ac41 in ngx_destroy_pool (pool=0x1f5e3f0) at core/ngx_palloc.c:64
Below are the steps we took to track down this crash:
- We commented out our code and instead responded with a dummy, predefined message.
After this step I noticed that nginx was crashing easily and generating cores quickly (e.g. started at Nov 29 08:06:08; 1st core created 2011/11/29 08:06:30, 2nd 2011/11/29 08:06:33, 3rd 08:07:16, 4th 2011/11/29 08:07:22).
I also noticed that it always crashes in the same function and line, ngx_destroy_pool, while freeing a large allocation (the same large pool for all the cores within a run):
#4 0x000000000041ac41 in ngx_destroy_pool (pool=0x1f5e3f0) at core/ngx_palloc.c:64
- Running strlen() on the allocated buffer showed that the allocated size is the same as the length returned by strlen(). Since strlen() stops at the first null byte, this means the terminator it found lies one byte past the end of the block.
In other words, the allocation is content_ln bytes rather than content_ln+1 (+1 for the terminating null byte); it doesn't include room for the null character.
ngx_alloc 0x125e8a0 size 4376
(gdb) p strlen((char *)(l->alloc))
$1 = 4376
(gdb) p l->alloc
$2 = (void *) 0x125e8a0
- I added +1 to the allocation size in ngx_palloc_large and reran:
static void *
ngx_palloc_large(ngx_pool_t *pool, size_t size)
{
    void              *p;
    ngx_uint_t         n;
    ngx_pool_large_t  *large;

    p = ngx_alloc(size + 1, pool->log);   /* THE CHANGE IS HERE: was ngx_alloc(size, pool->log) */
    /* .... rest of the function unchanged */
}
It ran for a while and no cores were generated, whereas before I could produce cores within seconds.
- I restored our original code and reran.
So far no cores have been created, and it has been running for almost 12 hours.
There is probably a better/cleaner fix for this.
I marked this as major because the crash can lead to a deadlock: all the workers end up waiting for a malloc lock that was never released by a thread killed on a false "double free" diagnostic (caused by memory corruption).
Please let me know if you need any core files.
Thanks,
Rafit
Change History (2)
comment:1 by , 13 years ago
Status: | new → accepted |
---|---|
comment:2 by , 13 years ago
Resolution: | → worksforme |
---|---|
Status: | accepted → closed |
Feedback timeout. I suspect the problem is in the submitter's code anyway.
Please show the full output of "nginx -V".