
KOI8-R (RFC 1489) is an 8-bit character encoding, derived from the KOI-8 encoding by the programmer Andrei Chernov in 1993 and designed to cover Russian, which uses a Cyrillic alphabet. KOI8-R was based on Russian Morse code, which was created from a phonetic version of Latin Morse code. As a result, Russian Cyrillic letters are in pseudo-Roman order rather than the normal Cyrillic alphabetical order. Although this may seem unnatural, it means that if the 8th (highest) bit of each byte is stripped, the text remains partially readable in ASCII, with the letter case inverted, and may convert to syntactically correct KOI-7. For example, "Код Обмена Информацией" (the Russian expansion of the "KOI" acronym) in KOI8-R becomes "kOD oBMENA iNFORMACIEJ".
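The bit-stripping behaviour can be reproduced with a short Python sketch. It assumes the `koi8_r` codec that ships with Python's standard library; the helper name `strip_high_bit` is made up for this illustration.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R, clear bit 7 of every byte, and read the result as ASCII."""
    koi8_bytes = text.encode("koi8_r")
    # Masking with 0x7F drops the 8th bit, mapping each Cyrillic letter
    # onto the Latin letter that sits 0x80 below it in the table.
    stripped = bytes(b & 0x7F for b in koi8_bytes)
    return stripped.decode("ascii")


result = strip_high_bit("Код Обмена Информацией")
print(result)  # kOD oBMENA iNFORMACIEJ
```

The case comes out inverted because lowercase Cyrillic letters occupy rows Cx and Dx, which fall onto the uppercase ASCII rows 4x and 5x once the high bit is cleared.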
Alias(es) | cp878 (code page 878) |
---|---|
Language(s) | Russian, Bulgarian |
Classification | 8-bit KOI, extended ASCII |
Extends | KOI8-B |
Based on | KOI-8 |
Other related encoding(s) | KOI8-U, KOI8-RU |
KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит), which means "Code for Information Exchange, 8 bit". Microsoft Windows assigns KOI8-R the code page number 20866, and IBM assigns it code page 878. KOI8-R also happens to cover Bulgarian.
KOI8-R lacks the proper quotation marks for these languages: both the guillemets «...» and the Bulgarian „...“. Windows-1251 supports these, as well as more letters, and has thus become more popular. KOI8-R is used by less than 0.004% of websites, mostly Russian and Bulgarian.[citation needed] Unicode and UTF-8 are preferred to single-byte Cyrillic encodings in modern applications; Unicode contains 436 Cyrillic letters, including those for Old Cyrillic.
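The gap in quotation-mark coverage can be checked with a minimal Python sketch. It assumes the standard-library codecs `koi8_r` and `cp1251` (Python's name for Windows-1251); the helper `can_encode` is a hypothetical name used only here.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text has a byte in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


quotes = "«»„“"  # Russian guillemets and Bulgarian low/high quotes
for codec in ("koi8_r", "cp1251", "utf-8"):
    print(codec, can_encode(quotes, codec))
```

With these codecs, the loop reports that only KOI8-R fails to encode the quotation marks, while ordinary Russian text encodes in all three.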
Character set
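The following table shows the KOI8-R encoding. Rows give the high hexadecimal digit of each byte and columns give the low digit. The lower half (0x–7x) matches ASCII, with the control-character rows 0x and 1x left blank. Each character in the upper half (8x–Fx) is shown with its equivalent Unicode code point.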
Hex | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
0x | ||||||||||||||||
1x | ||||||||||||||||
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
8x | ─ 2500 | │ 2502 | ┌ 250C | ┐ 2510 | └ 2514 | ┘ 2518 | ├ 251C | ┤ 2524 | ┬ 252C | ┴ 2534 | ┼ 253C | ▀ 2580 | ▄ 2584 | █ 2588 | ▌ 258C | ▐ 2590 |
9x | ░ 2591 | ▒ 2592 | ▓ 2593 | ⌠ 2320 | ■ 25A0 | ∙ 2219 | √ 221A | ≈ 2248 | ≤ 2264 | ≥ 2265 | NBSP | ⌡ 2321 | ° 00B0 | ² 00B2 | · 00B7 | ÷ 00F7 |
Ax | ═ 2550 | ║ 2551 | ╒ 2552 | ё 0451 | ╓ 2553 | ╔ 2554 | ╕ 2555 | ╖ 2556 | ╗ 2557 | ╘ 2558 | ╙ 2559 | ╚ 255A | ╛ 255B | ╜ 255C | ╝ 255D | ╞ 255E |
Bx | ╟ 255F | ╠ 2560 | ╡ 2561 | Ё 0401 | ╢ 2562 | ╣ 2563 | ╤ 2564 | ╥ 2565 | ╦ 2566 | ╧ 2567 | ╨ 2568 | ╩ 2569 | ╪ 256A | ╫ 256B | ╬ 256C | © 00A9 |
Cx | ю 044E | а 0430 | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х 0445 | и 0438 | й 0439 | к 043A | л 043B | м 043C | н 043D | о 043E |
Dx | п 043F | я 044F | р 0440 | с 0441 | т 0442 | у 0443 | ж 0436 | в 0432 | ь 044C | ы 044B | з 0437 | ш 0448 | э 044D | щ 0449 | ч 0447 | ъ 044A |
Ex | Ю 042E | А 0410 | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х 0425 | И 0418 | Й 0419 | К 041A | Л 041B | М 041C | Н 041D | О 041E |
Fx | П 041F | Я 042F | Р 0420 | С 0421 | Т 0422 | У 0423 | Ж 0416 | В 0412 | Ь 042C | Ы 042B | З 0417 | Ш 0428 | Э 042D | Щ 0429 | Ч 0427 | Ъ 042A |
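Individual rows of the table can be regenerated programmatically. The following is a minimal sketch, assuming Python's built-in `koi8_r` codec agrees with the table above; `koi8r_row` is a hypothetical helper name.

```python
def koi8r_row(high_nibble: int) -> list[str]:
    """Decode the 16 bytes whose high hex digit is high_nibble as KOI8-R."""
    return [bytes([(high_nibble << 4) | low]).decode("koi8_r") for low in range(16)]


row_c = koi8r_row(0xC)  # lowercase letters, first half
row_e = koi8r_row(0xE)  # uppercase letters, first half

print(" ".join(row_c))
print(" ".join(row_e))

# Code point of byte 0xA3, the lowercase letter "ё"
print(f"{ord(bytes([0xA3]).decode('koi8_r')):04X}")  # 0451
```

Each letter in row Ex is the uppercase form of the letter 0x20 below it in row Cx, the reverse of ASCII, where adding 0x20 turns an uppercase letter into lowercase.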
See also
- KOI8-B, a derivation of KOI8-R with only the letter subset implemented
- KOI8-U, another derivative encoding which adds Ukrainian characters
- KOI character encodings
- RELCOM
- Windows-1251, another common Cyrillic character encoding
Further reading
- Flohr, Guido; Kiss, Gabor; Chernov, Andrey A. (2016) [2006]. "Locale::RecodeData::KOI8_R - Conversion routines for KOI8-R". CPAN libintl-perl. 1.0. Archived from the original on 2017-01-15. Retrieved 2017-01-15.
- Kostis, Kosta. "koi8-r (Russian U*IX encoding, also used by RELCOM)". 1.20. Archived from the original on 2017-01-16. Retrieved 2017-01-16.
- RFC 1489
- "KOI8-R (RFC 1489)". Kermit. Columbia University. Retrieved 2020-06-24.
- Kornai, Andras; Birnbaum, David J.; da Cruz, Frank; Davis, Bur; Fowler, George; Paine, Richard B.; Paperno, Slava; Simonsen, Keld J.; Thobe, Glenn E.; Vulis, Dimitri; van Wingen, Johan W. (1993-03-13). "CYRILLIC ENCODING FAQ Version 1.3". 1.3. Retrieved 2020-06-24.
External links
- Universal Cyrillic decoder, an online program that may help recovering Cyrillic texts with broken KOI8-R or other character encodings.
- "The Home of the KOI8-R since 1995". 1995. Retrieved 2016-12-05.
- Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03.
- Hohlov, Yu. E. "Cyrillic Information Representation in Electronic Form - Character Set (Code Page) Tables". Archived from the original on 2016-12-05. Retrieved 2016-12-05.
- Nechayev, Valentin (2013) [2001]. "Review of 8-bit Cyrillic encodings universe". Archived from the original on 2016-12-05. Retrieved 2016-12-05.