Appendix D. Resources

Web search engines are your friends. Type lucene in your favorite web search engine and you’ll find many interesting Lucene-related projects. Other good places to look are SourceForge, Google Code, and GitHub; a search for lucene on any of those sites displays a number of open source projects written on top of Lucene.

D.1. Lucene knowledge bases

Search Lucene: http://search-lucene.com/

D.2. Internationalization

Unicode page in Wikipedia: http://en.wikipedia.org/wiki/Unicode

The Unicode Consortium: http://unicode.org

Bray, Tim, “Characters vs. Bytes”: http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF

Green, Dale, “Trail: Internationalization”: http://java.sun.com/docs/books/tutorial/i18n/index.html

Lindenberg, Norbert, and Masayoshi Okutsu, “Supplementary Characters in the Java Platform”: http://java.sun.com/developer/technicalArticles/Intl/Supplementary/

Peterson, Erik, “Chinese Character Dictionary—Unicode Version”: http://www.mandarin-tools.com/chardict_u8.html

Spolsky, Joel, “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)”: http://www.joelonsoftware.com/articles/Unicode.html

Davis, Mark, “Globalization Gotchas”: http://macchiato.com/slides/GlobalizationGotchas.ppt

D.3. Language detection

Rosette Language Identifier: http://basistech.com/language-identification

D.4. Term vectors

D.5. Lucene ports

D.6. Case studies

D.7. Miscellaneous

D.8. IR software

D.9. Doug Cutting’s publications