Toke
`Toke' is a program for the tokenization, morphological and typographical
analysis, and manipulation of natural-language text. It has been developed
as a joint venture between SULTRY and the Microsoft Research Institute.
Toke includes the following features:
- Tokenization of text into tokens of various types, preserving sub-token
typographical information
- User-configurable parsing and lexical scanning rules for that tokenization
- (Optional) Inflectional analysis and generation of English text
- Application of user-specified `change rules' to all or part of the text.
The change rules are written in a flexible and powerful language and can refer to
literal, typographical, and morphological properties of the text
- A graphical user interface
- Cross-platform compatibility: Toke is written in a mixture of C and
Tcl/Tk
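
To make the first feature concrete, the following is a minimal sketch of
tokenizing text into typed tokens while keeping typographical (here, case)
information alongside a normalised form. It is written in Python for brevity
and is not Toke's implementation: the token classes, the `Token' fields, and
the function names are all assumptions made for illustration, and Toke's own
rules are user-configurable.

```python
import re
from dataclasses import dataclass

# Hypothetical token classes, for illustration only; these are not
# Toke's actual scanning rules.
TOKEN_RE = re.compile(r"""
    (?P<word>[A-Za-z]+(?:'[A-Za-z]+)*)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<space>\s+)
  | (?P<punct>.)
""", re.VERBOSE)


@dataclass
class Token:
    kind: str   # "word", "number" or "punct"
    text: str   # normalised (lower-cased) form
    case: str   # typographical information kept separately from the text


def case_pattern(raw):
    """Classify the capitalisation of a word token."""
    if raw.islower():
        return "lower"
    if raw.isupper():
        return "upper"
    if raw[0].isupper() and raw[1:].islower():
        return "capitalised"
    return "mixed"


def tokenize(text):
    """Yield typed tokens, skipping whitespace."""
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "space":
            continue
        raw = m.group()
        case = case_pattern(raw) if kind == "word" else "n/a"
        yield Token(kind, raw.lower(), case)


tokens = list(tokenize("Toke reads 3 NASA files."))
for t in tokens:
    print(t)
```

Keeping the case pattern in its own field means that later processing can
match on the normalised text without discarding how the token was originally
typeset. A "mixed" pattern would need the original string stored as well if
the exact surface form had to be regenerated.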
See the documentation for more detailed information.
Toke is distributed in source form, and also (for the Macintosh) as a
self-contained binary.
© University of Sydney and Microsoft Research Institute 1997
23 April 1997