TeXHyphenator-J
The package provides a simple Java API to hyphenate a word using TeX hyphenation tables.

The distributions are available for download at http://ftp.davidashen.net/TeXHyphenator-J. There is also a SourceForge project, texhyphj.

To build, edit Makefile as appropriate (paths to the Java binaries and compilation options can be changed) and execute make.

net/davidashen/text/Hyphenator.java contains

    public static void main(String[] args)

which both serves as a programming example and provides a simple command-line interface for testing hyphenation patterns and code lists. Try the following command:

    java net.davidashen.text.Hyphenator hyphenation etc/hyphen/hyphen.tex

To use the hyphenator from your own code:

    net.davidashen.text.Hyphenator h = new net.davidashen.text.Hyphenator();
    // MyErrorHandler is an error handler supplied by the application
    h.setErrorHandler(new MyErrorHandler());
    // load a TeX hyphenation table
    h.loadTable(new java.io.BufferedInputStream(new java.io.FileInputStream("hyphen.tex")));
    String hyphenated_word = h.hyphenate(word);

See the auto-generated API documentation for interface details.

The module accepts most TeX hyphenation tables with little or no modification. It handles the sections 'patterns' (hyphenation patterns) and 'hyphenation' (exceptions) and ignores everything else. TeX macro definitions are not supported and are not intended to be, since the purpose is to make concise and clear code available, not to re-implement the TeX parser in Java.

Hexadecimal characters (e.g. ^^ae) and control characters (^^A) are supported; the former are usually used for non-ASCII European characters. Additionally, \rm macros for accented characters are translated into UCS codes; that is, \^a is 0xe2, \l is 0x142, etc. See net.davidashen.text.Hyphenator.Scanner.acctab for the full list.

Certain hyphenation tables use an encoding other than ISO-8859-1. To facilitate translation from such an encoding to UCS, a list of codes and their Unicode values can be passed to the hyphenator. See ruhyphal.tex and koicodes.txt for an example of a KOI8-R-encoded hyphenation table and its list of codes.
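To illustrate what the 'patterns' section of a hyphenation table encodes, here is a minimal, self-contained sketch of Liang's pattern-matching scheme, the algorithm TeX applies to such patterns. It is not the code of TeXHyphenator-J: the class name LiangSketch, its methods, the use of '-' as the output separator, and the handling of input (lowercase words only) are assumptions made for this example. The minimum fragment lengths 2 and 3 mirror TeX's default \lefthyphenmin and \righthyphenmin, and the short pattern list is the commonly cited subset of US English patterns that match the word "hyphenation".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not part of TeXHyphenator-J.
class LiangSketch {
    // Assumed minimum number of letters before and after a hyphen (TeX defaults).
    private static final int LEFT_MIN = 2;
    private static final int RIGHT_MIN = 3;

    // Maps the letters of a pattern to the digits found between them.
    private final Map<String, int[]> patterns = new HashMap<>();

    // Parses a pattern such as "hy3ph" into letters "hyph" and values [0,0,3,0,0].
    void addPattern(String pattern) {
        StringBuilder letters = new StringBuilder();
        int[] values = new int[pattern.length() + 1];
        for (char c : pattern.toCharArray()) {
            if (c >= '0' && c <= '9') {
                values[letters.length()] = c - '0';
            } else {
                letters.append(c);
            }
        }
        patterns.put(letters.toString(), Arrays.copyOf(values, letters.length() + 1));
    }

    // Inserts '-' at every position whose highest matching pattern value is odd.
    // The word is expected to be lowercase already.
    String hyphenate(String word) {
        String w = "." + word + ".";              // dots mark the word boundaries
        int[] points = new int[w.length() + 1];   // points[i] sits before w.charAt(i)
        for (int start = 0; start < w.length(); start++) {
            for (int end = start + 1; end <= w.length(); end++) {
                int[] values = patterns.get(w.substring(start, end));
                if (values == null) {
                    continue;
                }
                for (int k = 0; k < values.length; k++) {
                    points[start + k] = Math.max(points[start + k], values[k]);
                }
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            boolean allowed = i >= LEFT_MIN && word.length() - i >= RIGHT_MIN;
            if (allowed && points[i + 1] % 2 == 1) {
                out.append('-');
            }
            out.append(word.charAt(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        LiangSketch h = new LiangSketch();
        String[] sample = {"hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"};
        for (String p : sample) {
            h.addPattern(p);
        }
        System.out.println(h.hyphenate("hyphenation")); // prints "hy-phen-ation"
    }
}
```

Odd values permit a break and even values forbid one; where several patterns cover the same position, the highest value wins. A real table contains thousands of patterns plus an exception list ('hyphenation' section), which this sketch leaves out.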
TeXHyphenator-J, a hyphenation library in Java, is developed by David Tolpin.

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.