• Hin-Tak Leung's avatar
    hfsplus: fix worst-case unicode to char conversion of file names and attributes · 017f8da4
    Hin-Tak Leung authored
    This is a series of 3 patches which corrects issues in HFS+ concerning
    the use of non-english file names and attributes.  Names and attributes
    are stored internally as UTF-16 units up to a fixed maximum size, and
    convert to and from user-representation by NLS.  The code incorrectly
    assume that NLS string lengths are equal to unicode lengths, which is
    only true for English ascii usage.
    
    This patch (of 3):
    
    The HFS Plus Volume Format specification (TN1150) states that file names
    are stored internally as a maximum of 255 unicode characters, as defined
    by The Unicode Standard, Version 2.0 [Unicode, Inc.  ISBN
    0-201-48345-9].  File names are converted by the NLS system on Linux
    before presented to the user.
    
    255 CJK characters converts to UTF-8 with 1 unicode character to up to 3
    bytes, and to GB18030 with 1 unicode character to up to 4 bytes.  Thus,
    trying in a UTF-8 locale to list files with names of more than 85 CJK
    characters results in:
    
        $ ls /mnt
        ls: reading directory /mnt: File name too long
    
    The receiving buffer to hfsplus_uni2asc() needs to be 255 x
    NLS_MAX_CHARSET_SIZE bytes, not 255 bytes as the code has always been.
    
    Similar consideration applies to attributes, which are stored internally
    as a maximum of 127 UTF-16BE units.  See XNU source for an up-to-date
    reference on attributes.
    
    Strictly speaking, the maximum value of NLS_MAX_CHARSET_SIZE = 6 is not
    attainable in the case of conversion to UTF-8, as going beyond 3 bytes
    requires the use of surrogate pairs, i.e.  consuming two input units.
    
    Thanks Anton Altaparmakov for reviewing an earlier version of this
    change.
    
    This patch fixes all callers of hfsplus_uni2asc(), and also enables the
    use of long non-English file names in HFS+.  The getting and setting,
    and general usage of long non-English attributes requires further
    forthcoming work, in the following patches of this series.
    
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: default avatarHin-Tak Leung <htl10@users.sourceforge.net>
    Reviewed-by: default avatarAnton Altaparmakov <anton@tuxera.com>
    Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Sougata Santra <sougata@tuxera.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    017f8da4
xattr.c 22.7 KB