What is the relationship between locale, font, and encoding?

506
prepangolin

I think these terms are commonly used in connection with IMEs and the like.

  • locale
  • font
  • encoding

What exactly does each of these terms refer to?

Practical knowledge about how they relate to each other would also be welcome.

1

1 answer

1
akira

locale:

A 'locale' holds information about the conventions that people in a certain 'area' (local to each other) share: which decimal mark and digit grouping to use in big numbers, what proper date formatting looks like, where punctuation goes, etc. For example:

  • 1234567,89 SI style (French version), Albania, Belgium, Bosnia, Brazil, Bulgaria, Czech Republic, Denmark, Estonia, Finland, France, French Canada, Germany, Greece, Hungary, Italy, Latin Europe, Netherlands (non-currency numbers, see below), Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden
  • 1234567.89 SI style (English version), Australia, English Canada, China
  • 1,234,567·89 Ireland, Japan, Korea, Malaysia, New Zealand, Philippines, Singapore, Taiwan, Thailand, United Kingdom, United States (older, typically handwritten)
  • 1'234'567.89 Switzerland (printed, computing, currency, international requisite, everyday use)

(taken from http://en.wikipedia.org/wiki/Decimal_mark)
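To make the idea concrete, here is a minimal Python sketch of how the same number could be written under different conventions. The `CONVENTIONS` table and the `format_number` helper are hypothetical and exist only for illustration; a real system would take these settings from its locale database rather than from a hard-coded dictionary.

```python
# Hypothetical mini "locale database": (grouping separator, decimal mark).
# Real locale data is far richer (dates, currency, sorting rules, ...).
CONVENTIONS = {
    "en_US": (",", "."),
    "de_DE": (".", ","),
    "de_CH": ("'", "."),
}


def format_number(value: float, locale_name: str) -> str:
    """Format value with two decimals using the chosen convention."""
    grouping, decimal = CONVENTIONS[locale_name]
    text = f"{value:,.2f}"  # always "1,234,567.89" style at this point
    # Swap the separators via a placeholder so they do not clobber each other.
    return (
        text.replace(",", "\0")
        .replace(".", decimal)
        .replace("\0", grouping)
    )


for name in CONVENTIONS:
    print(name, format_number(1234567.89, name))
```

The number itself never changes; only its written form depends on the convention that was picked.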

encoding:

An 'encoding' is a convention for representing 'things' of one system in the units of a second system. For example: you have 10 eggs in your hand. You cannot put these physical objects into a computer; you have to 'encode' them as something the computer understands. One possible encoding would be the text "10 eggs". You now have an encoded version of the 10 eggs in your hand.

The 'unit' of a computer usually means 'bytes'. Each byte can (usually) hold a number from 0 to 255. If you want to represent bigger numbers, you have to agree with other folks on a scheme to represent (store, retrieve) them. One possible way:

  • to store numbers up to 65535 we use 2 bytes
  • the formula to retrieve the number is (byte1 * 256) + byte2

Voilà, an 'encoding': a convention for representing things of one system (natural numbers) in the units of a different system (the bytes of a computer).
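The two-byte scheme from the list can be sketched in Python as follows. The function names `encode_u16` and `decode_u16` are made up for this illustration; the comparison with `int.to_bytes` only shows that this convention is the one usually called "big-endian".

```python
def encode_u16(number: int) -> bytes:
    """Store a number from 0 to 65535 in exactly two bytes."""
    if not 0 <= number <= 65535:
        raise ValueError("number does not fit into 2 bytes")
    return bytes([number // 256, number % 256])


def decode_u16(data: bytes) -> int:
    """Retrieve the number with the formula (byte1 * 256) + byte2."""
    byte1, byte2 = data
    return byte1 * 256 + byte2


encoded = encode_u16(1000)
print(list(encoded))        # the two byte values
print(decode_u16(encoded))  # back to the original number

# The same convention is built into Python as "big-endian" byte order.
assert encoded == (1000).to_bytes(2, "big")
```

Both sides must agree on the convention: if the reader computed (byte2 * 256) + byte1 instead, the same two bytes would decode to a different number.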

Another common topic is 'how to store text'. People around the world use a lot of different 'drawings' to express their thoughts (they encode their thoughts into words, sentences, longer texts, etc.). Most of these 'drawings' are collected in something called the Unicode table. Each such 'drawing' is called a 'glyph' here (strictly speaking, Unicode assigns positions to abstract characters, and a glyph is the drawn shape of a character). You will find glyphs such as 'A', 'Ä', 'Ʌ', 'Ά', 'Ӑ', 'ڣ', '㈱', '⛽', '✪', '⬛', etc. (If you see a '?' sign or an empty block somewhere, that glyph is not part of your 'font', so your computer does not know how to draw it on the screen; more on that in the next part.) Each 'glyph' has a position in that table, so one way of representing text in a computer is to list the positions in that table:

104, 101, 108, 108, 111 -> 'h', 'e', 'l', 'l', 'o' 

Voilà, 'encoded text'. Sometimes the position in that table does not fit into a single byte; then you have to 'encode' the position of the glyph with multiple bytes, as outlined above.
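As a rough illustration, Python's built-in `ord` and `chr` convert between a character and its position in the Unicode table. The choice of UTF-16 (big-endian) and UTF-8 below is an assumption made for this sketch; they are two common real-world conventions for the multi-byte case, not the only ones.

```python
text = "hello"

# Each character's position in the Unicode table.
positions = [ord(ch) for ch in text]
print(positions)

# Going back from positions to text.
restored = "".join(chr(p) for p in positions)

# The fuel pump sign has a position that does not fit into one byte.
fuel_pump = "\u26fd"
position = ord(fuel_pump)

# UTF-16 (big-endian) stores this position in two bytes,
# using the (byte1 * 256) + byte2 formula.
two_bytes = fuel_pump.encode("utf-16-be")
byte1, byte2 = two_bytes
assert byte1 * 256 + byte2 == position

# UTF-8 is a different convention and needs three bytes for the same sign.
utf8_bytes = fuel_pump.encode("utf-8")
print(len(two_bytes), len(utf8_bytes))
```

The position in the table is the same in both cases; only the byte-level convention differs, which is why the reader of a file has to know which encoding the writer used.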

font:

A 'font' is usually a container file (similar to a .zip) that holds the graphic representation of all the glyphs the font author wants to include. The computer can then look up a glyph in that font and use the per-glyph instructions to represent / render / draw it on the screen. There are multiple ways of doing this:

  • you could define that each glyph is 10x10 pixels and then you fill in the pixels for each glyph (pixel fonts)
  • you could store a recipe for each glyph of how to draw it on every canvas size possible ("start in the top left corner, draw a line to bottom-center, draw a line from bottom-center to top right" -> 'V') (vector fonts)
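The pixel-font variant can be sketched in a few lines of Python. Everything here is hypothetical: the `PIXEL_FONT` table, its 5x5 glyph size, and the `render` helper are invented for illustration and do not correspond to any real font format. The fallback to the '?' glyph mirrors what happens when a font lacks a character.

```python
# Hypothetical 5x5 pixel font: '#' is a filled pixel, '.' an empty one.
PIXEL_FONT = {
    "V": ["#...#", "#...#", ".#.#.", ".#.#.", "..#.."],
    "?": [".###.", "#...#", "..##.", ".....", "..#.."],
}
GLYPH_HEIGHT = 5


def render(text: str) -> str:
    """Draw text row by row; unknown characters fall back to '?'."""
    glyphs = [PIXEL_FONT.get(ch, PIXEL_FONT["?"]) for ch in text]
    rows = [
        " ".join(glyph[row] for glyph in glyphs)
        for row in range(GLYPH_HEIGHT)
    ]
    return "\n".join(rows)


# 'X' is not in this font, so it is drawn with the fallback glyph.
print(render("VX"))
```

A vector font would instead store drawing instructions per glyph (lines and curves between coordinates), which can be scaled to any canvas size before being turned into pixels.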