EBISearch is a text search engine providing access to biological data
resources hosted at EMBL-EBI. We will cover the history of the
engine, the infrastructure it runs on, some statistics and future plans.
The data are held in Lucene indexes organized in a hierarchy, and we
offer easy 'inter-index' (or, in Lucene terms, 'inter-domain')
navigation via a network of cross-references.
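As a rough illustration of the idea, the sketch below models a hierarchy of
domains whose leaves carry cross-references to documents in other domains.
It is a toy model under stated assumptions, not the EBISearch implementation:
the Domain class, the follow_xrefs helper, the domain names and the
identifiers (G1, P1, P2) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical node in a domain hierarchy; leaf nodes hold the indexes."""
    name: str
    children: list["Domain"] = field(default_factory=list)
    # document id -> {target domain name: [target document ids]}
    xrefs: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def leaves(self):
        """Yield every leaf domain below (or equal to) this node."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


def follow_xrefs(root, source_domain, doc_id, target_domain):
    """Return the ids in target_domain cross-referenced from one document."""
    for leaf in root.leaves():
        if leaf.name == source_domain:
            return leaf.xrefs.get(doc_id, {}).get(target_domain, [])
    return []


# Illustrative data only: one gene entry pointing at two protein entries.
root = Domain("all", [
    Domain("genes", xrefs={"G1": {"proteins": ["P1", "P2"]}}),
    Domain("proteins"),
])
linked = follow_xrefs(root, "genes", "G1", "proteins")
```

Here navigating from the hypothetical gene entry G1 to the proteins domain
yields the two linked protein identifiers.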
The data resources represented in the EBI Search engine include
biological sequences, chemicals and macromolecular structures,
biomedical literature abstracts and meta-information related to
biological entities (e.g. genes, transcripts and proteins).
The evolution and development of EBISearch are influenced by the EMBL-EBI
IT infrastructure, which is designed to cope with large amounts of data and
relies on technical choices about data storage on network/distributed
filesystems and heterogeneous types of hosts.
The EBISearch engine provides search across ~1.1bn documents, kept up
to date with the latest biological data available. We will provide some
statistics about the amount of data we index, the parallelisation of
indexing and the lifecycle of these indexes.
At search time the engine organizes the indexes in a hierarchy and searches
are executed across most domains. This allows us to benefit from homogeneous
scores across the indexes. We rely heavily on the use of facets
(Lucene taxonomies) for filtering results.
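The following sketch illustrates why comparable scores matter: when every
domain is scored with the same function, hits from different indexes can be
merged into a single ranked list and then narrowed with a facet. It is a toy
example and makes several assumptions: the term-count score stands in for
Lucene's real scoring, and the search function, the "organism" facet field
and the sample documents are hypothetical.

```python
from collections import Counter


def score(text, terms):
    """Toy relevance score: occurrences of the query terms in the text."""
    words = text.lower().split()
    return sum(words.count(term.lower()) for term in terms)


def search(domains, terms, facet=None, top_n=10):
    """Search every domain, merge hits by score and count facet values.

    domains: {domain name: [{"id": ..., "text": ..., "organism": ...}, ...]}
    facet:   optional "organism" value used to filter the merged hits
    """
    hits = []
    for name, docs in domains.items():
        for doc in docs:
            s = score(doc["text"], terms)
            if s > 0:
                hits.append((s, name, doc))
    # Facet counts are computed over all hits, before any filter is applied.
    facet_counts = Counter(doc["organism"] for _, _, doc in hits)
    if facet is not None:
        hits = [h for h in hits if h[2]["organism"] == facet]
    # Scores share one scale, so a single sort ranks hits from all domains.
    hits.sort(key=lambda h: (-h[0], h[1], h[2]["id"]))
    return [(name, doc["id"], s) for s, name, doc in hits[:top_n]], facet_counts


# Illustrative data only.
domains = {
    "genes": [
        {"id": "G1", "text": "kinase gene kinase", "organism": "human"},
        {"id": "G2", "text": "transport gene", "organism": "mouse"},
    ],
    "proteins": [
        {"id": "P1", "text": "kinase protein", "organism": "mouse"},
    ],
}
results, counts = search(domains, ["kinase"])
mouse_only, _ = search(domains, ["kinase"], facet="mouse")
```

The facet counts are what a result page would show next to each filter
value; selecting a value reruns the same query restricted to that value.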
EBISearch usage is monitored and analyzed through our application logs
as well as web logs. We will present some usage statistics and discuss
the themes we are looking into; the focus is on understanding usage
patterns to drive our next developments.
In the future, we need to explore how to cope with the increasing
volume of data being generated in the biomedical fields and how to
handle requests for new functionality. For these reasons we will
investigate the use of other technologies and existing
search engines based on Lucene.
We are also interested in different types of data visualization and we
will briefly present what we have done in this area.