The Isobel Project
The problem:
Professionals and researchers working in knowledge management and content analysis usually face the same problems. To experiment with new content analysis algorithms, whether semantic or statistical, they have to reimplement an infrastructure that gathers content, converts it from common data formats, and stores it in an appropriate way. They also commonly need to analyse graphs, index words, and use word dictionaries or semantic networks. Moreover, since crawling the web is an error-prone process, they need an efficient logging infrastructure that can also notify the user of problems.
Isobel, an open source framework
Isobel works out of the box, but it's meant mainly as a framework to
build complex information retrieval and analysis systems.
Isobel can be functionally divided into two subsystems, Isobel Gatherer (the crawling and filtering subsystem) and Isobel Analyzer (the analysis subsystem). The two subsystems can also be used separately, to allow for a more flexible solution.
Both subsystems share a common infrastructure for configuration, I/O, logging, etc.
Isobel Gatherer offers ready-to-use services such as multi-protocol content
fetching with fine-grained scheduling, format conversion (PDF, DOC,
etc.), hyperlink graph storage and analysis, content storage and
indexing, ontology services, XML configuration, logging and statistics.
On top of these services, a pipeline plug-in model allows the
programmer to easily extend the system with specific advanced
functions.
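To give a rough idea of what a pipeline plug-in model can look like, here is a minimal Java sketch. It is not Isobel's actual API: the names FetchedDocument, PipelineStage and Pipeline are hypothetical, and the sketch simply assumes that each plug-in receives a fetched document and either passes it on (possibly modified) or drops it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical record passed along the pipeline (not an Isobel class). */
final class FetchedDocument {
    final String url;
    String text;

    FetchedDocument(String url, String text) {
        this.url = url;
        this.text = text;
    }
}

/** Hypothetical plug-in contract: return the document to keep it, or null to drop it. */
interface PipelineStage {
    FetchedDocument process(FetchedDocument doc);
}

/** Runs the registered stages in the order they were added. */
final class Pipeline {
    private final List<PipelineStage> stages = new ArrayList<>();

    Pipeline add(PipelineStage stage) {
        stages.add(stage);
        return this;
    }

    FetchedDocument run(FetchedDocument doc) {
        FetchedDocument current = doc;
        for (PipelineStage stage : stages) {
            if (current == null) {
                return null; // an earlier stage filtered the document out
            }
            current = stage.process(current);
        }
        return current;
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline()
                // filtering stage: drop documents with no visible text
                .add(doc -> doc.text.trim().isEmpty() ? null : doc)
                // transformation stage: normalise the text
                .add(doc -> {
                    doc.text = doc.text.trim().toLowerCase(Locale.ROOT);
                    return doc;
                });

        FetchedDocument result =
                pipeline.run(new FetchedDocument("http://example.org/a", "  Hello World "));
        System.out.println(result == null ? "dropped" : result.text);
    }
}
```

In a design like this, a new function is added by writing one more stage and registering it, without touching the fetching or storage code.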
Isobel Analyzer builds on the IBM UIMA architecture, which allows analysis components developed for that architecture to be reused.
The same configuration, logging, event notification and I/O infrastructure is shared between the two subsystems, which keeps the complexity of the entire system low and eases the approach to the UIMA architecture.
Isobel is mainly developed in Java and supports GNU/Linux and Windows.
It is based on several well-known open source projects, mainly Lucene, IBM UIMA, Hibernate, Log4J, JUNG and Xerces.
Features List
- Multi-protocol fetching (HTTP, FTP, file system, database)
- Highly configurable scheduling
- Document format conversion (PDF, DOC, RTF)
- Graph Storage
- Graph structure analysis
- XML Content storage
- XML Configuration
- Python scripting
- Ontology service interface (word synonyms, hyponyms, hypernyms)
- Pipeline plug-in model for extensions
- Reuse of UIMA analyzers
- Log4J logging
- Indexing with Lucene
- Language guessing with NGramJ
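As an illustration of the n-gram idea behind the language guessing feature, the following Java sketch compares character trigram counts of a text against small per-language profiles. It does not use or reproduce the NGramJ API: the class TrigramGuesser, its methods and the tiny training samples are assumptions made only for this example, and a real profile would be built from far more text.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Toy character-trigram language guesser (hypothetical, not the NGramJ API). */
final class TrigramGuesser {
    // language name -> trigram counts learned from a sample text
    private final Map<String, Map<String, Integer>> profiles = new LinkedHashMap<>();

    /** Counts character trigrams of the lower-cased text, padded with one space on each side. */
    static Map<String, Integer> trigrams(String text) {
        String normalised = text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
        String padded = " " + normalised + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            counts.merge(padded.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    void train(String language, String sample) {
        profiles.put(language, trigrams(sample));
    }

    /**
     * Returns the language whose profile has the highest cosine similarity
     * with the text, or null if no profile was trained. Ties go to the
     * language that was trained first.
     */
    String guess(String text) {
        Map<String, Integer> query = trigrams(text);
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Integer>> entry : profiles.entrySet()) {
            double score = cosine(query, entry.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    private static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            int count = entry.getValue();
            normA += (double) count * count;
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) count * other;
            }
        }
        for (int count : b.values()) {
            normB += (double) count * count;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }
}

class TrigramGuesserDemo {
    public static void main(String[] args) {
        TrigramGuesser guesser = new TrigramGuesser();
        guesser.train("en", "the quick brown fox jumps over the lazy dog and the cat");
        guesser.train("it", "la volpe veloce salta sopra il cane pigro e il gatto");
        System.out.println(guesser.guess("the dog and the fox"));
        System.out.println(guesser.guess("il gatto e il cane"));
    }
}
```

Cosine similarity is only one possible scoring choice here; rank-based distances between n-gram profiles are another common option.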
For more information please contact:




