#457 closed defect (fixed)

Win32: ngx_utf8_to_utf16 doesn't allow file names outside U+FFFF

Reported by:	Kroward 1	Owned by:
Priority:	minor	Milestone:
Component:	nginx-core	Version:
Keywords:	win32	Cc:
uname -a:	Microsoft Windows XP [Version 5.1.2600]
nginx -V:	nginx/Windows-1.5.7 (prebuilt binary)

Description

The function ngx_utf8_to_utf16() in 'win32\ngx_files.c' returns an error for UTF-8 characters with code points above U+FFFF.

Such characters are perfectly valid and should be represented as surrogate pairs on Windows.

The following code fixes the issue (the same conversion appears twice in the source):

        if (n > 0x10ffff) {
            ngx_set_errno(NGX_EILSEQ);
            return NULL;
        }

        if (n > 0xffff) {
            // LE order
            *u++ = (u_short) (0xd800 | ((((n >> 16) & 0x1f) - 1) << 6) | ((n & 0xffff) >> 10)); // high surrogate
            // ??? check buffer length here ???
            *u++ = (u_short) (0xdc00 | (n & 0x3ff)); // low surrogate
        } else {
            *u++ = (u_short) n;
        }

You can verify this by extracting the example CJK-named file from the attached archive.

(The autoindex module does not work with Unicode file names, so the URL has to be entered manually, e.g. .../%f0%a9%ba%8a.txt.)

Attachments (1)

U29E8A.7z (126 bytes) - added by Kroward 1 12 years ago: Example of a file with a problematic Unicode name (should be extracted from the archive)

Change History (4)

by Kroward 1, 12 years ago

Attachment:	U29E8A.7z added

Example of a file with a problematic Unicode name (should be extracted from the archive)

comment:1 by Maxim Dounin, 12 years ago

Keywords:	win32 added
Status:	new → accepted

Correct: Windows 2000 and newer support UTF-16, not just UCS-2. This needs to be addressed. The suggested code obviously needs to be changed to properly check whether there is space left in the buffer.

Something like this seems to be a proper fix:

diff -r 692afcea9d0d -r 06b47c205b0c src/os/win32/ngx_files.c
--- a/src/os/win32/ngx_files.c	Tue Dec 03 22:07:03 2013 +0400
+++ b/src/os/win32/ngx_files.c	Fri Dec 06 23:31:38 2013 +0400
@@ -799,13 +799,25 @@ ngx_utf8_to_utf16(u_short *utf16, u_char
             continue;
         }
 
+        if (u + 1 == last) {
+            *len = u - utf16;
+            break;
+        }
+
         n = ngx_utf8_decode(&p, 4);
 
-        if (n > 0xffff) {
+        if (n > 0x10ffff) {
             ngx_set_errno(NGX_EILSEQ);
             return NULL;
         }
 
+        if (n > 0xffff) {
+            n -= 0x10000;
+            *u++ = (u_short) (0xd800 + (n >> 10));
+            *u++ = (u_short) (0xdc00 + (n & 0x03ff));
+            continue;
+        }
+
         *u++ = (u_short) n;
     }
 
@@ -838,12 +850,19 @@ ngx_utf8_to_utf16(u_short *utf16, u_char
 
         n = ngx_utf8_decode(&p, 4);
 
-        if (n > 0xffff) {
+        if (n > 0x10ffff) {
             free(utf16);
             ngx_set_errno(NGX_EILSEQ);
             return NULL;
         }
 
+        if (n > 0xffff) {
+            n -= 0x10000;
+            *u++ = (u_short) (0xd800 + (n >> 10));
+            *u++ = (u_short) (0xdc00 + (n & 0x03ff));
+            continue;
+        }   
+
         *u++ = (u_short) n;
     }

Just in case, example characters (e.g., MUSICAL SYMBOL G CLEF, U+1D11E, 𝄞) as well as various details can be found at http://en.wikipedia.org/wiki/UTF-16.
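Worked through for the example character U+1D11E, the arithmetic in the patch (n -= 0x10000, then 0xd800 + (n >> 10) and 0xdc00 + (n & 0x03ff)) gives the surrogate pair D834 DD1E:

```latex
\begin{aligned}
n' &= \mathtt{0x1D11E} - \mathtt{0x10000} = \mathtt{0xD11E} \\
\text{high} &= \mathtt{0xD800} + \lfloor n' / 2^{10} \rfloor = \mathtt{0xD800} + \mathtt{0x34} = \mathtt{0xD834} \\
\text{low} &= \mathtt{0xDC00} + (n' \bmod 2^{10}) = \mathtt{0xDC00} + \mathtt{0x11E} = \mathtt{0xDD1E}
\end{aligned}
```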

comment:2 by Maxim Dounin <mdounin@…>, 12 years ago

In 1cd23ca84a9bbaa965160dba5ba62bda3e8a9e32/nginx:

Win32: support for UTF-16 surrogate pairs (ticket #457).

comment:3 by Maxim Dounin, 12 years ago

Resolution:	→ fixed
Status:	accepted → closed

Fix committed, thanks.
