* new-features.sgml (ov-new1.7.2): Add chapter for news in 1.7.2.

* setup2.sgml (setup-locale-ov): Describe how valid locales are determined by Windows locale support. Change description for modifiers in locale environment variables. (setup-locale-how): Describe new charset behaviour. Mention new getlocale tool to fetch valid locale information from Windows. (setup-locale-missing): Drop now implemented LC_foo options. Explain missing LC_MESSAGES in more detail.
2010-01-22 22:32:42 +00:00 · 2010-01-22 22:32:42 +00:00 · ff0056d45e
parent be822de2a1
commit ff0056d45e
3 changed files with 115 additions and 46 deletions
--- a/winsup/doc/ChangeLog
+++ b/winsup/doc/ChangeLog
@@ -1,3 +1,14 @@
+2010-01-22  Corinna Vinschen  <corinna@vinschen.de>
+
+	* new-features.sgml (ov-new1.7.2): Add chapter for news in 1.7.2.
+	* setup2.sgml (setup-locale-ov): Describe how valid locales are
+	determined by Windows locale support.  Change description for modifiers
+	in locale environment variables.
+	(setup-locale-how): Describe new charset behaviour.  Mention new
+	getlocale tool to fetch valid locale information from Windows.
+	(setup-locale-missing): Drop now implemented LC_foo options.
+	Explain missing LC_MESSAGES in more detail.
+
 2010-01-17  Corinna Vinschen  <corinna@vinschen.de>

 	* setup2.sgml (setup-locale): Mention three character codes per
--- a/winsup/doc/new-features.sgml
+++ b/winsup/doc/new-features.sgml
@@ -1,5 +1,43 @@
 <sect1 id="ov-new1.7"><title>What's new and what changed in Cygwin 1.7</title>

+<sect2 id="ov-new1.7.2"><title>What's new and what changed from 1.7.1 to 1.7.2</title>
+
+<screen>
+- Localization support has been much improved.
+
+  - Cygwin now handles locales using the underlying Windows locale support.
+    The locale must exist in Windows to be recognized.
+
+  - New tool "getlocale" to fetch valid locale values from Windows.
+
+  - Default charset for locales without explicit charset is now chosen
+    from a list of Linux-compatible charsets.  For instance en_US -> ISO-8859-1,
+    ja_JP -> EUC-JP.
+
+  - Support for the @euro locale modifier to switch to the ISO-8859-15
+    charset.
+
+  - Default charset in the "C" or "POSIX" locale has been changed back from
+    UTF-8 to ASCII, to circumvent problems with applications expecting a
+    single-byte charset in the "C"/"POSIX" locale.  UTF-8 is still used
+    internally for filename conversion in this case.
+
+  - LC_COLLATE, LC_MONETARY, LC_NUMERIC, and LC_TIME localization is enabled
+    via Windows locale support.
+
+  - New strfmon(3) call.
+
+- Support open(2) flags O_CLOEXEC and O_TTY_INIT.  Support
+  fcntl flag F_DUPFD_CLOEXEC.  Support socket flags SOCK_CLOEXEC and
+  SOCK_NONBLOCK.
+
+- Add new Linux-compatible API calls accept4(2), dup3(2), and pipe2(2).
+
+- fnmatch(3) call is now multibyte-aware.
+</screen>
+
+</sect2>
+
 <sect2 id="ov-new1.7-os"><title>OS related changes</title>

 <screen>
--- a/winsup/doc/setup2.sgml
+++ b/winsup/doc/setup2.sgml
@@ -255,35 +255,41 @@ charset.  The Cygwin DLL itself, however, will nevertheless use the locale
 set in the environment (or the "C.UTF-8" default locale) for converting
 filenames etc.</para>

-<para>When the locale set in the environment specifies an ASCII charset,
+<para>When the locale in the environment specifies an ASCII charset,
 for example "C" or "en_US.ASCII", Cygwin will still use UTF-8
 under the hood to translate filenames.  This allows for easier
 interoperability with applications running in the default "C.UTF-8" locale.
 </para>

 <para>
-Right now the language and territory, as well as the modifier, are not
-important to Cygwin, except to fix a single problem.  There's a class of
-characters in the Unicode character set, called the "CJK Ambiguous Width
-Character set".  For these characters the width returned by the
-wcwidth/wcswidth function is usually 1.  This is often a problem in
-East-Asian languages, which historically use character sets in which
-these characters have a width of 2.  Kind of explains why they are
-called "ambiguous"...</para>
+Starting with Cygwin 1.7.2, the language and territory are used to
+fetch locale-dependent information from Windows.  If the language and
+territory are not known to Windows, the <function>setlocale</function>
+function fails.</para>

-<para>
-The problem has been fixed like this.  wcwidth/wcswidth usually
-return 1 as the width of these characters.  However, if the language is
-specifed as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese), wcwidth
-returns 2 for these characters.  Unfortunately this isn't correct in
-all circumstances, so the user can specify the modifier "@cjknarrow",
-which modifies the behaviour of wcwidth/wcswidth to return 1 for the
-ambiguous width characters to return 1 even in those languages.</para>
+<para>The modifier is used for two cases.</para>

-<para>
-Other than that, the only important part so far is the character set.
+<itemizedlist mark="bullet">

-How does that work?</para>
+<listitem><para>For languages which default to one of the ISO-8859 character
+sets, the modifier "@euro" can be added to enforce usage of the ISO-8859-15
+character set, which includes a character for the "Euro" currency sign.</para>
+</listitem>
+
+<listitem><para>There's a class of characters in the Unicode character set,
+called the "CJK Ambiguous Width Character set".  For these characters the width
+returned by the wcwidth/wcswidth function is usually 1.  This is often a
+problem in East-Asian languages, which historically use character sets in
+which these characters have a width of 2.  By default, the wcwidth/wcswidth
+functions return 1 as the width of these characters, except if the language is
+specified as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese).  In these
+languages wcwidth and wcswidth return 2 for these characters.  This is not
+correct in all circumstances, so the user of one of these languages can specify
+the modifier "@cjknarrow", which modifies the behaviour of wcwidth/wcswidth to
+return 1 for the ambiguous width characters.</para>
+</listitem>
+
+</itemizedlist>

 </sect2>

@@ -296,32 +302,47 @@ Assume that you've set one of the aforementioned environment variables to some
 valid POSIX locale value, other than "C" and "POSIX".  Assume further that
 you're living in Japan.  You might want to use the language code "ja" and the
 territory "JP", thus setting, say, <envar>LANG</envar> to "ja_JP".  You didn't
-set a character set, so what will Cygwin use now?  Easy!  It will use the
-default Windows ANSI codepage of your system, if it's supported by Cygwin.
-Hopefully Cygwin supports all relevant default ANSI codepages...</para>
+set a character set, so what will Cygwin use now?  Starting with Cygwin 1.7.2,
+the default character set is determined by the default Windows ANSI codepage
+for this language and territory.  Cygwin uses a character set which is the
+typical Unix-equivalent to the Windows ANSI codepage.  For instance:</para>

-<note><para>For a list of supported character sets, see
-<xref linkend="setup-locale-charsetlist"></xref>
-</para></note>
+<screen>
+  "en_US"		ISO-8859-1
+  "el_GR"		ISO-8859-7
+  "pl_PL"		ISO-8859-2
+  "pl_PL@euro"		ISO-8859-15
+  "ja_JP"		EUCJP
+  "ko_KR"		EUCKR
+  "te_IN"		UTF-8
+</screen>
 </listitem>

 <listitem><para>
-You don't want to use the default Windows codepage as character set?
-In that case you have to specify the charset explicitly.  For instance,
-assume you're from Italy and don't want to use the Italian default Windows
-ANSI codepage 1252, but the more portable ISO-8859-15 character set.
-What you can do, for instance, is to set the <envar>LANG</envar> variable
-in the <filename>C:\cygwin\Cygwin.bat</filename> file which is the batch file
-to start a Cygwin session from the "Cygwin" desktop shortcut.</para>
+You don't want to use the default character set?  In that case you have to
+specify the charset explicitly.  For instance, assume you're from Japan and
+don't want to use the Japanese default charset EUC-JP, but the Windows
+default charset SJIS.  What you can do, for instance, is to set the
+<envar>LANG</envar> variable in the <filename>C:\cygwin\Cygwin.bat</filename>
+file which is the batch file to start a Cygwin session from the "Cygwin"
+desktop shortcut.</para>

 <screen>
  @echo off

  C:
  chdir C:\cygwin\bin
-  set LANG=it_IT.ISO-8859-15
+  set LANG=ja_JP.SJIS
  bash --login -i
 </screen>
+
+<note><para>For a list of locales supported by your Windows machine, use the new
+<command>getlocale -a</command> command, which is part of the Cygwin package.
+For a description see <xref linkend="getlocale"></xref>.</para></note>
+
+<note><para>For a list of supported character sets, see
+<xref linkend="setup-locale-charsetlist"></xref>
+</para></note>
 </listitem>

 <listitem><para>
@@ -435,19 +456,18 @@ entries are useful to cygwin: 932/SJIS, 936/GBK, 949/EUC-KR, 950/Big5,
 <sect2 id="setup-locale-missing"><title>What does not work?</title>

 <para>
-Except for <envar>LC_ALL</envar>, <envar>LC_CTYPE</envar>,
-and <envar>LANG</envar>, all other LC_xxx environment variables,
-<envar>LC_COLLATE</envar>, <envar>LC_MESSAGES</envar>,
-<envar>LC_MONETARY</envar>, <envar>LC_NUMERIC</envar>,
-and <envar>LC_TIME</envar>, are ignored right now.  This means, while Cygwin
-supports different character sets, it does <emphasis>not</emphasis> support
-real localization so far.  There's no support for locale-specific monetary
-symbols, for a decimalpoint other than '.', no support for native time
-formats, and no support for native language sorting orders.
-</para>
+The environment variable and locale setting <envar>LC_MESSAGES</envar>
+is ignored right now.  There's no known Windows function to fetch the
+regular expressions that recognize user input with the meaning of "yes"
+or "no".  Therefore,
+<function>nl_langinfo(YESEXPR)</function> and
+<function>nl_langinfo(NOEXPR)</function> always return a string
+suitable only for the English language.</para>

-<para>Cygwin's internationalization support is work in progress and we would
-be glad for coding help in this area.</para>
+<para>If somebody knows a simple solution to this problem, feel free
+to notify us on the
+<ulink url="mailto:cygwin@cygwin.com">Cygwin mailing list</ulink>.
+</para>

 </sect2>