Comments (19)
This seems to work:
RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8
The starting state for /etc/locale.gen has en_US.UTF-8 commented out, along with all the other entries. Running dpkg-reconfigure locales interactively and selecting en_US.UTF-8 has the same effect as this set of commands, I think.
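A side note on the locale.gen step: appending with echo adds another copy of the line every time it runs. Here is a minimal sketch of an idempotent alternative, assuming the stock file lists the entry as "# en_US.UTF-8 UTF-8". The helper name enable_locale is made up for illustration and is not part of any Debian tool.

```shell
#!/bin/sh
# enable_locale FILE ENTRY
# Hypothetical helper: make sure ENTRY (e.g. "en_US.UTF-8 UTF-8") is an
# active line in a locale.gen-style FILE, without adding duplicates.
enable_locale() {
    file=$1
    entry=$2

    # Case 1: the entry is already active, nothing to do.
    if grep -qxF -- "$entry" "$file"; then
        return 0
    fi

    # Case 2: the entry is present but commented out as "# ENTRY".
    # awk compares whole lines as plain strings, so the dots in the
    # locale name need no regex escaping.
    if grep -qxF -- "# $entry" "$file"; then
        awk -v e="$entry" '$0 == "# " e { print e; next } { print }' \
            "$file" > "$file.tmp" \
            && cat "$file.tmp" > "$file" \
            && rm -f "$file.tmp"
        return
    fi

    # Case 3: the entry is missing entirely, so append it.
    printf '%s\n' "$entry" >> "$file"
}

# Intended use inside a Dockerfile RUN step (not executed here):
#   enable_locale /etc/locale.gen "en_US.UTF-8 UTF-8" && locale-gen
```

Running it twice against the same file leaves a single active entry, which is the main difference from the echo approach.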
Edit: FWIW, I found another Dockerfile that uses a similar strategy: https://registry.hub.docker.com/u/etna/drone-debian/dockerfile/
from rocker.
Another thing to add to r-base so that it bubbles up.
[ That said, I am a 7-bit snob now and rarely ever set these... But we probably should. ]
from rocker.
Note, I set locale to C.UTF-8 as in the second example, rather than en_US.UTF-8 as in the first example; and just set the Debian base. (A summary of C.UTF-8 vs en_US.UTF-8 here, but happy for input on which locale @wch had in mind).
from rocker.
I'm not an expert in this stuff, but I think that en_US.UTF-8 would be better, since it defines proper sorting for non-ASCII characters, while C.UTF-8 does not -- it probably just uses the Unicode value for sorting.
For example, in en_US.UTF-8, all the a's with accents come before b:
> sort(c('A', 'a', 'Ä', 'ä', 'À', 'à', 'b'))
[1] "a" "A" "à" "À" "ä" "Ä" "b"
But it's not true in C.UTF-8:
> sort(c('A', 'a', 'Ä', 'ä', 'À', 'à', 'b'))
[1] "A" "a" "b" "À" "Ä" "à" "ä"
So I think that, despite the provincial-sounding label, en_US actually supports non-English languages better than C.
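The same comparison can be tried outside R. This is only a rough sketch with sort(1); the function names are made up for illustration, and the en_US.UTF-8 branch runs only if that locale has been generated on the system. The ordering it prints there depends on the installed locale data, so no output is claimed for it.

```shell
#!/bin/sh
# Print the sample letters from the R example, one per line.
sample_letters() {
    printf '%s\n' A a Ä ä À à b
}

# sort_in_locale LOCALE: sort stdin under LOCALE and join the result
# onto a single space-separated line.
sort_in_locale() {
    LC_ALL=$1 sort | paste -s -d ' ' -
}

# The C locale compares raw bytes, so every accented letter (multi-byte
# in UTF-8) lands after the plain ASCII letters.
printf 'C:           %s\n' "$(sample_letters | sort_in_locale C)"

# Only try en_US.UTF-8 if it is actually available on this system.
if locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$'; then
    printf 'en_US.UTF-8: %s\n' "$(sample_letters | sort_in_locale en_US.UTF-8)"
fi
```

Under the C locale this gives A a b À Ä à ä, the same order as the C.UTF-8 result in R above.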
from rocker.
@wch sounds reasonable to me.
For reasons that are not obvious to me, just switching C.UTF-8 to en_US.UTF-8 in this Dockerfile results in an error:
*** update-locale: Error: invalid locale settings: LANG=en_US.UTF-8
2014/10/07 20:12:37 The command [/bin/sh -c dpkg-reconfigure locales && locale-gen en_US.UTF-8 && /usr/sbin/update-locale LANG=en_US.UTF-8] returned a non-zero code: 255
No idea why, en_US.UTF-8 is on the list of locales returned by the command...
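One thing that can be confusing when checking such a list: on glibc systems `locale -a` typically prints the name as en_US.utf8 rather than en_US.UTF-8. Below is a small sketch that compares names while ignoring that spelling difference. The helper names are hypothetical, and this does not explain the update-locale error itself.

```shell
#!/bin/sh
# normalize_locale NAME: lower-case NAME and drop hyphens, so that
# "en_US.UTF-8" and "en_US.utf8" compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-'
}

# locale_in_list NAME: read locale names on stdin (one per line) and
# succeed if any of them matches NAME after normalization.
locale_in_list() {
    wanted=$(normalize_locale "$1")
    while IFS= read -r candidate; do
        if [ "$(normalize_locale "$candidate")" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Intended use (not executed here):
#   locale -a | locale_in_list en_US.UTF-8 && echo "en_US.UTF-8 is available"
```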
from rocker.
+1 -- I don't think I have ever seen C.UTF-8 in the wild anywhere. Not that I pay much attention though...
from rocker.
Blech:
root@e5b38b5f638c:/# du -csh /usr/share/locale/
87M /usr/share/locale/
87M total
root@e5b38b5f638c:/#
from rocker.
Doesn't seem so bad when I do it:
$ docker run --rm -ti eddelbuettel/debian-r-base /bin/bash
root@1dbe56be3aa1:/# du -csh /usr/share/locale
43M /usr/share/locale
43M total
root@1dbe56be3aa1:/# apt-get install -qq -y locales
root@1dbe56be3aa1:/# du -csh /usr/share/locale
47M /usr/share/locale
47M total
root@1dbe56be3aa1:/# echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
> && locale-gen en_US.utf8 \
> && /usr/sbin/update-locale LANG=en_US.UTF-8
Generating locales (this might take a while)...
en_US.UTF-8... done
Generation complete.
root@1dbe56be3aa1:/# du -csh /usr/share/locale
47M /usr/share/locale
47M total
from rocker.
I was using the 'drd' (i.e. daily r-devel) image, which has more packages and hence more po files. Anyway, on my home system it is 177 MB so ... that's just a cost of doing business.
I learned something new which may help shrink the image some more.
from rocker.
Testing: docker run -it rocker/r-base R
> Sys.getlocale(category = "LC_ALL")
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
@wch Look good?
from rocker.
For some reason, the rstudio image (and thus hadleyverse) objects to the locale settings.
The container throws a warning on startup:
$ docker run --rm -it rocker/rstudio R
/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
And likewise R complains as well:
R version 3.1.1 (2014-07-10) -- "Sock it to Me"
...
Type 'q()' to quit R.
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
5: Setting LC_MONETARY failed, using "C"
6: Setting LC_PAPER failed, using "C"
7: Setting LC_MEASUREMENT failed, using "C"
and then defaults to the "C" locale:
> Sys.getlocale(category = "LC_ALL")
[1] "C"
from rocker.
That rings a bell but I don't quite recall what to do. Should be a generic issue for Debian-based VMs etc. though. Maybe as simple as setting it in /etc/bash.bashrc, or profile or ...
from rocker.
I get the C locale on a recent r-base image.
docker run -it r-base
Sys.getlocale(category = "LC_ALL")
# [1] "C"
According to the discussion here, I should get en_US.UTF-8, so it looks like this issue needs to be reopened.
from rocker.
I used this SO answer to solve that issue on Ubuntu 14.04.
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
Sys.getlocale(category = "LC_ALL")
# [1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
I tried the same on official debian's r-base, but it throws a lot of locale warnings during the build and in the R console after running. So it cannot be applied directly to Debian either.
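For anyone wondering how those variables interact: as far as I understand the usual POSIX precedence, LC_ALL overrides everything, then the specific LC_* variable applies, then LANG. Here is a rough sketch of that lookup. The function name effective_locale is made up, and the final fallback to "C" is an assumption about typical glibc behaviour, since POSIX leaves the default implementation-defined.

```shell
#!/bin/sh
# effective_locale CATEGORY
# Print the locale name that applies to CATEGORY (e.g. LC_COLLATE),
# following the precedence LC_ALL, then CATEGORY, then LANG.
effective_locale() {
    category=$1

    # Accept only known category names, which also keeps the eval safe.
    case $category in
        LC_CTYPE | LC_NUMERIC | LC_TIME | LC_COLLATE | LC_MONETARY | \
        LC_MESSAGES | LC_PAPER | LC_NAME | LC_ADDRESS | LC_TELEPHONE | \
        LC_MEASUREMENT | LC_IDENTIFICATION) ;;
        *)
            echo "effective_locale: unknown category: $category" >&2
            return 2
            ;;
    esac

    # Look up the value of the variable named by $category.
    eval "value=\${$category:-}"

    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$value" ]; then
        printf '%s\n' "$value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumed default when nothing is set.
        printf 'C\n'
    fi
}

# Show which locale name would apply to collation in this environment.
effective_locale LC_COLLATE
```

With the ENV lines above, LC_ALL would win for every category. LANGUAGE is a separate GNU gettext setting that, as far as I know, only affects which message translations are picked.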
from rocker.
Really? On r-base I see:
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
> Sys.getenv("LC_ALL")
[1] "en_US.UTF-8"
> Sys.getlocale(category="LC_ALL")
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
>
Are you sure you have the latest r-base image? (Not sure what you mean by 'official debian's r-base' or what warnings you're seeing either)
Yes, Ubuntu and Debian set locales differently; both are described in the link above. (And of course the Debian way is also illustrated at the top of the r-base Dockerfile.)
Does anyone else still see the C locale in r-base?
from rocker.
I get the same as Carl:
$ docker run --rm -ti r-base R -e 'sessionInfo()'
R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
>
>
$
from rocker.
@cboettig, I get the same result as you.
from rocker.
Heh, I cannot reproduce it anymore... so it was likely some issue on my side, maybe an overlapping name with an image I built a while ago.
from rocker.
So I think that, despite the provincial-sounding label, en_US actually supports non-English languages better than C.
Not sure how you jump to that conclusion.
What about languages where, for instance, "ä" or "å" are supposed to sort after "z"?
from rocker.