![Alphabet (computer science)](https://www.english.nina.az/image-resize/1600/900/web/wikipedia.jpg)
In formal language theory, an alphabet, sometimes called a vocabulary, is a non-empty set of indivisible symbols/characters/glyphs, typically thought of as representing letters, characters, digits, phonemes, or even words. Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any cardinality ("size") and, depending on its purpose, may be finite (e.g., the alphabet of letters "a" through "z"), countable (e.g., ), or even uncountable (e.g., ).
Strings, also known as "words" or "sentences", over an alphabet are defined as a sequence of the symbols from the alphabet set. For example, the alphabet of lowercase letters "a" through "z" can be used to form English words like "iceberg" while the alphabet of both upper and lower case letters can also be used to form proper names like "Wikipedia". A common alphabet is {0,1}, the binary alphabet, and a "00101111" is an example of a binary string. Infinite sequences of symbols may be considered as well (see Omega language).
It is often necessary for practical purposes to restrict the symbols in an alphabet so that they are unambiguous when interpreted. For instance, if the two-member alphabet is {00,0}, a string written on paper as "000" is ambiguous because it is unclear if it is a sequence of three "0" symbols, a "00" followed by a "0", or a "0" followed by a "00".
Notation
If L is a formal language, i.e. a (possibly infinite) set of finite-length strings, the alphabet of L is the set of all symbols that may occur in any string in L. For example, if L is the set of all variable identifiers in the programming language C, L's alphabet is the set { a, b, c, ..., x, y, z, A, B, C, ..., X, Y, Z, 0, 1, 2, ..., 7, 8, 9, _ }.
Given an alphabet , the set of all strings of length
over the alphabet
is indicated by
. The set
of all finite strings (regardless of their length) is indicated by the Kleene star operator as
, and is also called the Kleene closure of
. The notation
indicates the set of all infinite sequences over the alphabet
, and
indicates the set
of all finite or infinite sequences.
For example, using the binary alphabet {0,1}, the strings ε, 0, 1, 00, 01, 10, 11, 000, etc. are all in the Kleene closure of the alphabet (where ε represents the empty string).
Applications
Alphabets are important in the use of formal languages, automata and semiautomata. In most cases, for defining instances of automata, such as deterministic finite automata (DFAs), it is required to specify an alphabet from which the input strings for the automaton are built. In these applications, an alphabet is usually required to be a finite set, but is not otherwise restricted.
When using automata, regular expressions, or formal grammars as part of string-processing algorithms, the alphabet may be assumed to be the character set of the text to be processed by these algorithms, or a subset of allowable characters from the character set.
See also
- Combinatorics on words
- Terminal and nonterminal symbols
References
- Fletcher, Peter; Hoyle, Hughes; Patty, C. Wayne (1991). Foundations of Discrete Mathematics. PWS-Kent. p. 114. ISBN 0-53492-373-9.
An alphabet is a nonempty finite set the members of which are called symbols or characters.
- Ebbinghaus, H.-D.; Flum, J.; Thomas, W. (1994). Mathematical Logic (2nd ed.). New York: Springer. p. 11. ISBN 0-387-94258-0.
By an alphabet
we mean a nonempty set of symbols.
- Rosen, Kenneth H. (2012). Discrete Mathematics and Its Applications (PDF) (7th ed.). New York: McGraw Hill. pp. 847–851. ISBN 978-0-07-338309-5.
A vocabulary (or alphabet) V is a finite, nonempty set of elements called symbols. A word (or sentence) over V is a string of finite length of elements of V.
- Rautenberg, Wolfgang (2010). A Concise Introduction to Mathematical Logic (PDF) (Third ed.). Springer. p. xx. ISBN 978-1-4419-1220-6.
If 𝗔 is an alphabet, i.e., if the elements 𝐬 ∈ 𝗔 are symbols or at least named symbols, then the sequence (𝐬1,...,𝐬n)∈𝗔n is written as 𝐬1···𝐬n and called a string or a word over 𝗔.
Literature
- John E. Hopcroft and Jeffrey D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley Publishing, Reading Massachusetts, 1979. ISBN 0-201-02988-X.
In formal language theory an alphabet sometimes called a vocabulary is a non empty set of indivisible symbols characters glyphs typically thought of as representing letters characters digits phonemes or even words Alphabets in this technical sense of a set are used in a diverse range of fields including logic mathematics computer science and linguistics An alphabet may have any cardinality size and depending on its purpose may be finite e g the alphabet of letters a through z countable e g v1 v2 displaystyle v 1 v 2 ldots or even uncountable e g vx x R displaystyle v x x in mathbb R Strings also known as words or sentences over an alphabet are defined as a sequence of the symbols from the alphabet set For example the alphabet of lowercase letters a through z can be used to form English words like iceberg while the alphabet of both upper and lower case letters can also be used to form proper names like Wikipedia A common alphabet is 0 1 the binary alphabet and a 00101111 is an example of a binary string Infinite sequences of symbols may be considered as well see Omega language It is often necessary for practical purposes to restrict the symbols in an alphabet so that they are unambiguous when interpreted For instance if the two member alphabet is 00 0 a string written on paper as 000 is ambiguous because it is unclear if it is a sequence of three 0 symbols a 00 followed by a 0 or a 0 followed by a 00 NotationIf L is a formal language i e a possibly infinite set of finite length strings the alphabet of L is the set of all symbols that may occur in any string in L For example if L is the set of all variable identifiers in the programming language C L s alphabet is the set a b c x y z A B C X Y Z 0 1 2 7 8 9 Given an alphabet S displaystyle Sigma the set of all strings of length n displaystyle n over the alphabet S displaystyle Sigma is indicated by Sn displaystyle Sigma n The set i NSi textstyle bigcup i in mathbb N Sigma i of all finite strings regardless of their length is indicated by the Kleene star operator as S displaystyle Sigma and is also called the Kleene closure of S displaystyle Sigma The notation Sw displaystyle Sigma omega indicates the set of all infinite sequences over the alphabet S displaystyle Sigma and S displaystyle Sigma infty indicates the set S Sw displaystyle Sigma ast cup Sigma omega of all finite or infinite sequences For example using the binary alphabet 0 1 the strings e 0 1 00 01 10 11 000 etc are all in the Kleene closure of the alphabet where e represents the empty string ApplicationsAlphabets are important in the use of formal languages automata and semiautomata In most cases for defining instances of automata such as deterministic finite automata DFAs it is required to specify an alphabet from which the input strings for the automaton are built In these applications an alphabet is usually required to be a finite set but is not otherwise restricted When using automata regular expressions or formal grammars as part of string processing algorithms the alphabet may be assumed to be the character set of the text to be processed by these algorithms or a subset of allowable characters from the character set See alsoCombinatorics on words Terminal and nonterminal symbolsReferencesFletcher Peter Hoyle Hughes Patty C Wayne 1991 Foundations of Discrete Mathematics PWS Kent p 114 ISBN 0 53492 373 9 An alphabet is a nonempty finite set the members of which are called symbols or characters Ebbinghaus H D Flum J Thomas W 1994 Mathematical Logic 2nd ed New York Springer p 11 ISBN 0 387 94258 0 By an alphabet A displaystyle mathcal A we mean a nonempty set of symbols Rosen Kenneth H 2012 Discrete Mathematics and Its Applications PDF 7th ed New York McGraw Hill pp 847 851 ISBN 978 0 07 338309 5 A vocabulary or alphabet V is a finite nonempty set of elements called symbols A word or sentence over V is a string of finite length of elements of V Rautenberg Wolfgang 2010 A Concise Introduction to Mathematical Logic PDF Third ed Springer p xx ISBN 978 1 4419 1220 6 If 𝗔 is an alphabet i e if the elements 𝐬 𝗔 are symbols or at least named symbols then the sequence 𝐬1 𝐬n 𝗔n is written as 𝐬1 𝐬n and called a string or a word over 𝗔 LiteratureJohn E Hopcroft and Jeffrey D Ullman Introduction to Automata Theory Languages and Computation Addison Wesley Publishing Reading Massachusetts 1979 ISBN 0 201 02988 X