Opened 11 years ago

Closed 11 years ago

#457 closed defect (fixed)

Win32: ngx_utf8_to_utf16 doesn't allow file names outside U+FFFF

Reported by: Kroward 1
Owned by:
Priority: minor
Milestone:
Component: nginx-core
Version:
Keywords: win32
Cc:
uname -a: Microsoft Windows XP [Version 5.1.2600]
nginx -V: nginx/Windows-1.5.7 (prebuild binary)

Description

The function ngx_utf8_to_utf16() in 'win32\ngx_files.c' returns an error for UTF-8 characters with code points above U+FFFF.

Such characters are perfectly valid and should be represented as a surrogate pair on Windows.

The following code fixes the situation (the range check appears twice in the source):

        if (n > 0x10ffff) {
            ngx_set_errno(NGX_EILSEQ);
            return NULL;
        }

        if (n > 0xffff) {
            /* high surrogate comes first: (plane - 1) in bits 6..9, top 6 bits of the low 16 in bits 0..5 */
            *u++ = (u_short) (0xd800 | ((((n >> 16) & 0x1f) - 1) << 6)
                              | ((n & 0xffff) >> 10));
            /* ??? the buffer length still has to be checked here ??? */
            *u++ = (u_short) (0xdc00 | (n & 0x3ff));  /* low surrogate */
        } else {
            *u++ = (u_short) n;
        }
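
As a sanity check on the bit manipulation in the snippet above, the following sketch compares it with the subtract-and-shift form from the UTF-16 definition over all supplementary code points. The helper names (hi_bitwise, hi_arith, forms_agree) are made up for this illustration and are not nginx functions; uint16_t stands in for u_short.

```c
#include <assert.h>
#include <stdint.h>

/* High surrogate in the bit-field form used in the snippet above. */
static uint16_t
hi_bitwise(uint32_t n)
{
    assert(n > 0xffff && n <= 0x10ffff);

    return (uint16_t) (0xd800 | ((((n >> 16) & 0x1f) - 1) << 6)
                       | ((n & 0xffff) >> 10));
}

/* High surrogate in the subtract-and-shift form of the UTF-16 definition. */
static uint16_t
hi_arith(uint32_t n)
{
    assert(n > 0xffff && n <= 0x10ffff);

    return (uint16_t) (0xd800 + ((n - 0x10000) >> 10));
}

/* Returns 1 if both forms agree for every code point in U+10000..U+10FFFF. */
static int
forms_agree(void)
{
    uint32_t  n;

    for (n = 0x10000; n <= 0x10ffff; n++) {
        if (hi_bitwise(n) != hi_arith(n)) {
            return 0;
        }

        /* low surrogate: OR form versus subtract-and-add form */
        if ((uint16_t) (0xdc00 | (n & 0x3ff))
            != (uint16_t) (0xdc00 + ((n - 0x10000) & 0x3ff)))
        {
            return 0;
        }
    }

    return 1;
}
```

Both forms should produce the same pair, so the choice between them is a matter of readability; neither one checks for room in the output buffer.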

You can verify this by extracting the example CJK-named file from the attached archive.

(The autoindex module does not work with Unicode, so the URL has to be entered manually, e.g. .../%f0%a9%ba%8a.txt)
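
To see why this file name triggers the error, the percent-encoded bytes f0 a9 ba 8a can be decoded by hand. The sketch below is a minimal illustration, assuming a well-formed 4-byte sequence; utf8_decode4 is a hypothetical helper written for this example and is not nginx's ngx_utf8_decode().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one 4-byte UTF-8 sequence.
 * Only the lead and continuation bit patterns are asserted;
 * overlong forms and out-of-range values are not rejected.
 */
static uint32_t
utf8_decode4(const unsigned char *p)
{
    assert((p[0] & 0xf8) == 0xf0);          /* 11110xxx */
    assert((p[1] & 0xc0) == 0x80);          /* 10xxxxxx */
    assert((p[2] & 0xc0) == 0x80);
    assert((p[3] & 0xc0) == 0x80);

    return ((uint32_t) (p[0] & 0x07) << 18)
           | ((uint32_t) (p[1] & 0x3f) << 12)
           | ((uint32_t) (p[2] & 0x3f) << 6)
           | (uint32_t) (p[3] & 0x3f);
}
```

For the bytes f0 a9 ba 8a this gives 0x29E8A, which matches the attachment name U29E8A and lies above 0xffff, so the existing range check rejects it.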

Attachments (1)

U29E8A.7z (126 bytes ) - added by Kroward 1 11 years ago.
Example of a file with problematic unicode name (should be extracted from archive)


Change History (4)

by Kroward 1, 11 years ago

Attachment: U29E8A.7z added

Example of a file with problematic unicode name (should be extracted from archive)

comment:1 by Maxim Dounin, 11 years ago

Keywords: win32 added
Status: new → accepted

Correct, Windows 2000 and newer support UTF-16, not just UCS-2. This needs to be addressed. The suggested code clearly needs to be changed to properly check whether there is enough space left in the output buffer.
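
The space concern can be illustrated in isolation: a code point above U+FFFF needs two 16-bit units, so the writer has to know how much room is left before storing either of them. The following is a minimal sketch under that assumption; utf16_put and its signature are hypothetical and not part of nginx, and uint16_t stands in for u_short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store code point n into the UTF-16 buffer [u, last).
 * Returns the number of 16-bit units written (1 or 2), or 0 if n is above
 * U+10FFFF or the buffer does not have enough room. Nothing is written
 * in the failure case.
 */
static size_t
utf16_put(uint16_t *u, const uint16_t *last, uint32_t n)
{
    size_t  need;

    assert(u != NULL && last != NULL && u <= last);

    if (n > 0x10ffff) {
        return 0;
    }

    need = (n > 0xffff) ? 2 : 1;

    if ((size_t) (last - u) < need) {
        return 0;
    }

    if (need == 2) {
        n -= 0x10000;
        u[0] = (uint16_t) (0xd800 + (n >> 10));      /* high surrogate */
        u[1] = (uint16_t) (0xdc00 + (n & 0x03ff));   /* low surrogate */

    } else {
        u[0] = (uint16_t) n;
    }

    return need;
}
```

Checking the required size before writing either unit avoids leaving a lone high surrogate at the end of a full buffer.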

Something like this seems to be a proper fix:

diff -r 692afcea9d0d -r 06b47c205b0c src/os/win32/ngx_files.c
--- a/src/os/win32/ngx_files.c	Tue Dec 03 22:07:03 2013 +0400
+++ b/src/os/win32/ngx_files.c	Fri Dec 06 23:31:38 2013 +0400
@@ -799,13 +799,25 @@ ngx_utf8_to_utf16(u_short *utf16, u_char
             continue;
         }
 
+        if (u + 1 == last) {
+            *len = u - utf16;
+            break;
+        }
+
         n = ngx_utf8_decode(&p, 4);
 
-        if (n > 0xffff) {
+        if (n > 0x10ffff) {
             ngx_set_errno(NGX_EILSEQ);
             return NULL;
         }
 
+        if (n > 0xffff) {
+            n -= 0x10000;
+            *u++ = (u_short) (0xd800 + (n >> 10));
+            *u++ = (u_short) (0xdc00 + (n & 0x03ff));
+            continue;
+        }
+
         *u++ = (u_short) n;
     }
 
@@ -838,12 +850,19 @@ ngx_utf8_to_utf16(u_short *utf16, u_char
 
         n = ngx_utf8_decode(&p, 4);
 
-        if (n > 0xffff) {
+        if (n > 0x10ffff) {
             free(utf16);
             ngx_set_errno(NGX_EILSEQ);
             return NULL;
         }
 
+        if (n > 0xffff) {
+            n -= 0x10000;
+            *u++ = (u_short) (0xd800 + (n >> 10));
+            *u++ = (u_short) (0xdc00 + (n & 0x03ff));
+            continue;
+        }   
+
         *u++ = (u_short) n;
     }
 

Just in case, example characters (e.g., MUSICAL SYMBOL G CLEF, U+1D11E, 𝄞) as well as various details can be found at http://en.wikipedia.org/wiki/UTF-16.
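
For reference, the mapping can also be run in reverse to check a surrogate pair by hand. The sketch below is illustrative only; utf16_pair_to_cp is a hypothetical name, not an nginx function. For the G clef, the pair is 0xD834 0xDD1E: 0x10000 + (0x34 << 10) + 0x11E = 0x1D11E.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: combine a high and a low surrogate into the
 * code point they encode. Both inputs must be in their surrogate ranges.
 */
static uint32_t
utf16_pair_to_cp(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xd800 && hi <= 0xdbff);
    assert(lo >= 0xdc00 && lo <= 0xdfff);

    return 0x10000 + ((uint32_t) (hi - 0xd800) << 10)
           + (uint32_t) (lo - 0xdc00);
}
```

The same calculation applied to the file name from this ticket, 0xD867 0xDE8A, gives U+29E8A.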

comment:2 by Maxim Dounin <mdounin@…>, 11 years ago

In 1cd23ca84a9bbaa965160dba5ba62bda3e8a9e32/nginx:

Win32: support for UTF-16 surrogate pairs (ticket #457).

comment:3 by Maxim Dounin, 11 years ago

Resolution: fixed
Status: accepted → closed

Fix committed, thanks.
