From ff0056d45eefbbf2650dcba80ab9be3fa3b0cdaf Mon Sep 17 00:00:00 2001 From: Corinna Vinschen Date: Fri, 22 Jan 2010 22:32:42 +0000 Subject: [PATCH] * new-features.sgml (ov-new1.7.2): Add chapter for news in 1.7.2. * setup2.sgml (setup-locale-ov): Describe how valid locales are determined by Windows locale support. Change description for modifiers in locale environment variables. (setup-locale-how): Describe new charset behaviour. Mention new getlocale tool to fetch valid locale information from Windows. (setup-locale-missing): Drop now implemented LC_foo options. Explain missing LC_MESSAGES in more detail. --- winsup/doc/ChangeLog | 11 ++++ winsup/doc/new-features.sgml | 38 ++++++++++++ winsup/doc/setup2.sgml | 112 +++++++++++++++++++++-------------- 3 files changed, 115 insertions(+), 46 deletions(-) diff --git a/winsup/doc/ChangeLog b/winsup/doc/ChangeLog index 7e831e68c..d2fa2c177 100644 --- a/winsup/doc/ChangeLog +++ b/winsup/doc/ChangeLog @@ -1,3 +1,14 @@ +2010-01-22 Corinna Vinschen + + * new-features.sgml (ov-new1.7.2): Add chapter for news in 1.7.2. + * setup2.sgml (setup-locale-ov): Describe how valid locales are + determined by Windows locale support. Change description for modifiers + in locale environment variables. + (setup-locale-how): Describe new charset behaviour. Mention new + getlocale tool to fetch valid locale information from Windows. + (setup-locale-missing): Drop now implemented LC_foo options. + Explain missing LC_MESSAGES in more detail. + 2010-01-17 Corinna Vinschen * setup2.sgml (setup-locale): Mention three character codes per diff --git a/winsup/doc/new-features.sgml b/winsup/doc/new-features.sgml index 7a6780a0e..872a90d8f 100644 --- a/winsup/doc/new-features.sgml +++ b/winsup/doc/new-features.sgml @@ -1,5 +1,43 @@ What's new and what changed in Cygwin 1.7 +What's new and what changed from 1.7.1 to 1.7.2 + + +- Localization support has been much improved. + + - Cygwin now handles locales using the underlying Windows locale support. 
+ The locale must exist in Windows to be recognized. + + - New tool "getlocale" to fetch valid locale values from Windows. + + - Default charset for locales without an explicit charset is now chosen + from a list of Linux-compatible charsets. For instance, en_US -> ISO-8859-1, + ja_JP -> EUC-JP. + + - Support for the @euro locale modifier to switch to the ISO-8859-15 + charset. + + - Default charset in the "C" or "POSIX" locale has been changed back from + UTF-8 to ASCII, to circumvent problems with applications expecting a + single-byte charset in the "C"/"POSIX" locale. Cygwin still uses UTF-8 internally + for filename conversion in this case. + + - LC_COLLATE, LC_MONETARY, LC_NUMERIC, and LC_TIME localization is enabled + via Windows locale support. + + - New strfmon(3) call. + +- Support open(2) flags O_CLOEXEC and O_TTY_INIT. Support + fcntl flag F_DUPFD_CLOEXEC. Support socket flags SOCK_CLOEXEC and + SOCK_NONBLOCK. + +- Add new Linux-compatible API calls accept4(2), dup3(2), and pipe2(2). + +- fnmatch(3) call is now multibyte-aware. + + + + OS related changes diff --git a/winsup/doc/setup2.sgml b/winsup/doc/setup2.sgml index 8c03babae..f317ed25c 100644 --- a/winsup/doc/setup2.sgml +++ b/winsup/doc/setup2.sgml @@ -255,35 +255,41 @@ charset. The Cygwin DLL itself, however, will nevertheless use the locale set in the environment (or the "C.UTF-8" default locale) for converting filenames etc. -When the locale set in the environment specifies an ASCII charset, +When the locale in the environment specifies an ASCII charset, for example "C" or "en_US.ASCII", Cygwin will still use UTF-8 under the hood to translate filenames. This allows for easier interoperability with applications running in the default "C.UTF-8" locale. -Right now the language and territory, as well as the modifier, are not -important to Cygwin, except to fix a single problem. There's a class of -characters in the Unicode character set, called the "CJK Ambiguous Width -Character set".
For these characters the width returned by the -wcwidth/wcswidth function is usually 1. This is often a problem in -East-Asian languages, which historically use character sets in which -these characters have a width of 2. Kind of explains why they are -called "ambiguous"... +Starting with Cygwin 1.7.2, the language and territory are used to +fetch locale-dependent information from Windows. If the language and +territory are not known to Windows, the setlocale +function fails. - -The problem has been fixed like this. wcwidth/wcswidth usually -return 1 as the width of these characters. However, if the language is -specifed as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese), wcwidth -returns 2 for these characters. Unfortunately this isn't correct in -all circumstances, so the user can specify the modifier "@cjknarrow", -which modifies the behaviour of wcwidth/wcswidth to return 1 for the -ambiguous width characters to return 1 even in those languages. +The modifier is used in two cases. - -Other than that, the only important part so far is the character set. + -How does that work? +For languages which default to one of the ISO-8859 character +sets, the modifier "@euro" can be added to enforce use of the ISO-8859-15 +character set, which includes a character for the "Euro" currency sign. + +There's a class of characters in the Unicode character set, +called the "CJK Ambiguous Width Character set". For these characters the width +returned by the wcwidth/wcswidth function is usually 1. This is often a +problem in East-Asian languages, which historically use character sets in +which these characters have a width of 2. By default, the wcwidth/wcswidth +functions return 1 as the width of these characters, except if the language is +specified as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese). In these +languages wcwidth and wcswidth return 2 for these characters.
This is not +correct in all circumstances, so the user of one of these languages can specify +the modifier "@cjknarrow", which modifies the behaviour of wcwidth/wcswidth to +return 1 for the ambiguous width characters. + + + @@ -296,32 +302,47 @@ Assume that you've set one of the aforementioned environment variables to some valid POSIX locale value, other than "C" and "POSIX". Assume further that you're living in Japan. You might want to use the language code "ja" and the territory "JP", thus setting, say, LANG to "ja_JP". You didn't -set a character set, so what will Cygwin use now? Easy! It will use the -default Windows ANSI codepage of your system, if it's supported by Cygwin. -Hopefully Cygwin supports all relevant default ANSI codepages... +set a character set, so what will Cygwin use now? Starting with Cygwin 1.7.2, +the default character set is determined by the default Windows ANSI codepage +for this language and territory. Cygwin uses a character set which is the +typical Unix equivalent of the Windows ANSI codepage. For instance: -For a list of supported character sets, see - - + + "en_US" ISO-8859-1 + "el_GR" ISO-8859-7 + "pl_PL" ISO-8859-2 + "pl_PL@euro" ISO-8859-15 + "ja_JP" EUCJP + "ko_KR" EUCKR + "te_IN" UTF-8 + -You don't want to use the default Windows codepage as character set? -In that case you have to specify the charset explicitly. For instance, -assume you're from Italy and don't want to use the Italian default Windows -ANSI codepage 1252, but the more portable ISO-8859-15 character set. -What you can do, for instance, is to set the LANG variable -in the C:\cygwin\Cygwin.bat file which is the batch file -to start a Cygwin session from the "Cygwin" desktop shortcut. +You don't want to use the default character set? In that case you have to +specify the charset explicitly. For instance, assume you're from Japan and +don't want to use the Japanese default charset EUC-JP, but the Windows +default charset SJIS.
What you can do, for instance, is to set the +LANG variable in the C:\cygwin\Cygwin.bat +file, which is the batch file that starts a Cygwin session from the "Cygwin" +desktop shortcut. @echo off C: chdir C:\cygwin\bin - set LANG=it_IT.ISO-8859-15 + set LANG=ja_JP.SJIS bash --login -i + +For a list of locales supported by your Windows machine, use the new +getlocale -a command, which is part of the Cygwin package. +For a description see + +For a list of supported character sets, see + + @@ -435,19 +456,18 @@ entries are useful to cygwin: 932/SJIS, 936/GBK, 949/EUC-KR, 950/Big5, What does not work? -Except for LC_ALL, LC_CTYPE, -and LANG, all other LC_xxx environment variables, -LC_COLLATE, LC_MESSAGES, -LC_MONETARY, LC_NUMERIC, -and LC_TIME, are ignored right now. This means, while Cygwin -supports different character sets, it does not support -real localization so far. There's no support for locale-specific monetary -symbols, for a decimalpoint other than '.', no support for native time -formats, and no support for native language sorting orders. - +The environment variable and locale setting LC_MESSAGES +is ignored right now. There's no known Windows function to fetch the +regular expressions that recognize user input meaning "yes" +or "no". Therefore, +nl_langinfo(YESEXPR) and +nl_langinfo(NOEXPR) always return a string +suitable only for the English language. -Cygwin's internationalization support is work in progress and we would -be glad for coding help in this area. +If somebody knows a simple solution to this problem, feel free +to notify us on the +Cygwin mailing list. +
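The setlocale behaviour described in this patch can be checked from a C program. The following is a minimal sketch and not part of the Cygwin sources. It relies only on the standard setlocale(3) and nl_langinfo(3) interfaces, where a failed setlocale returns NULL and leaves the previous locale in effect. The helper names try_locale and charset_is_utf8 are hypothetical, and "xx_XX.NOSUCHCHARSET" is a deliberately invalid locale name. On Cygwin 1.7.2 the rejection would come from Windows not knowing the language and territory. On other systems it comes from the locale not being installed.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: try to make `name` the locale for all categories.
   Returns 1 on success, 0 if setlocale() rejects the name.  On failure,
   POSIX requires the previous locale to stay in effect. */
int try_locale(const char *name)
{
    /* A NULL name would only query the current locale, not set it. */
    assert(name != NULL);
    return setlocale(LC_ALL, name) != NULL;
}

/* Hypothetical helper: return 1 if the charset of the current LC_CTYPE
   locale, as reported by nl_langinfo(CODESET), is exactly "UTF-8".
   Per this patch, the "C" and "POSIX" locales use ASCII rather than
   UTF-8 as of Cygwin 1.7.2, so this returns 0 there. */
int charset_is_utf8(void)
{
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}
```

A program would typically call try_locale("") first to pick up LANG or LC_* from the environment, and fall back to try_locale("C") if that fails. Per the "What does not work?" section, nl_langinfo(YESEXPR) and nl_langinfo(NOEXPR) on Cygwin 1.7.2 still return English-only patterns whatever locale is set, because LC_MESSAGES is ignored.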
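The close-on-exec additions listed in the 1.7.2 news (O_CLOEXEC, F_DUPFD_CLOEXEC, pipe2 and dup3) let a program create a descriptor with FD_CLOEXEC already set. Without them, the flag has to be set with a separate fcntl call, and a concurrent fork/exec in another thread can race with that call. The sketch below checks the flag with fcntl(F_GETFD). It assumes a Linux-compatible environment where defining _GNU_SOURCE exposes pipe2() and dup3(), and all helper names are hypothetical.

```c
#define _GNU_SOURCE /* exposes pipe2() and dup3() in glibc-style headers */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: 1 if FD_CLOEXEC is set on fd, 0 if clear, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/* Create a pipe with both ends marked close-on-exec in a single call. */
int make_cloexec_pipe(int fds[2])
{
    return pipe2(fds, O_CLOEXEC);
}

/* Duplicate fd onto the lowest free descriptor with FD_CLOEXEC set.
   Plain dup() always clears the flag on the copy. */
int dup_cloexec(int fd)
{
    return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}

/* Like dup2(), but sets FD_CLOEXEC on newfd atomically.  Unlike dup2(),
   Linux's dup3() fails with EINVAL when oldfd == newfd. */
int dup_onto_cloexec(int oldfd, int newfd)
{
    assert(oldfd >= 0 && newfd >= 0);
    return dup3(oldfd, newfd, O_CLOEXEC);
}
```

Sockets get the same treatment through the SOCK_CLOEXEC and SOCK_NONBLOCK type flags, for example socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0). accept4(2) accepts the same two flags for the connected socket it returns.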