Edward Zimmermann

👥 1 conference
🎤 1 talk
📅 Years active: 2022 to 2022

Biography

A transdisciplinary technologist with a broad education and diverse interests, Ed has worn many hats throughout his career, including those of national economist, market/social researcher, cybernetician, computer scientist and entrepreneur. Often these hats were interchangeable (and overlaid). In computer networking his experience reaches back to the early days of the ARPAnet, when he found himself as a teen with a shared office in UCLA's Boelter Hall. Fast forward to the early 1990s: he founded one of the first ISPs in Germany and was one of the earliest contributors to the WWW.

A dominant focus of Ed's R&D over the past 20+ years has been text retrieval, metadata, data mining, knowledge discovery, pattern recognition, NLP and machine learning. He has been part of many publicly funded projects, worked with German, EU and UN organizations, and collaborated with a number of research institutes and national scientific agencies. His name is also associated with a large number of open source software packages.

— biography from FOSDEM 2022
https://archive.fosdem.org/2022/schedule/speaker/edward_zimmermann/

Conferences

1 known conference

👥 FOSDEM 2022 📅 05 Feb 2022

🎤 A lightning intro to re-Isearch
06 Feb 2022

Project re-Isearch is a novel multimodal search and retrieval engine using mathematical models and algorithms different from the all-too-common inverted index. The design allows it to have, in practice, effectively no limits on the frequency of words, term length, number of fields or complexity of structured data. It even supports overlap, where fields or structures cross each other's boundaries (common examples are quotes, lines/sentences, biblical verses and annotations). Its model enables a completely flexible unit of retrieval and flexible modes of search. Developed in a highly portable C++ subset to be RAM efficient, the engine also provides bindings to a number of other languages such as Python, Tcl and Java.

“Re-isearch” is a project following in the spirit of the original isearch developed back in the 1990s. It was reborn in 2020, in the middle of the global Covid-19 pandemic, as Project re-Isearch.

Like the original, it is not just about textual words but pushes the envelope. re-Isearch is multi-object and multi-modal, with an unharnessed unit of retrieval.

Mainstream search engines are about finding any information: "a list of all documents containing a specific word or phrase". So search engines paradoxically return both too much information (i.e. long lists of links) and too little information (i.e. links to content, not the content itself). The re-Isearch engine is, by contrast, about exploiting document structure, both implicit (XML and other markup) and explicit (visual groupings such as paragraph), to zero in on relevant sections of documents, not just links to documents. This concept of search granularity is a radical departure from other designs. With typical text indexers one has the concept of a document or record, and that is both the unit of index and the unit of retrieval. Instead, re-Isearch can have a dynamic, search-time unit of retrieval: user-specified or heuristically determined. The structure of documents can be exploited to identify which document elements (such as the appropriate chapter or page) to retrieve. Retrieval granularity may be at the level of sub-structures of a given document or page, such as a line or paragraph, but may also be part of a larger collection.
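The idea of a search-time unit of retrieval can be illustrated with a minimal Python sketch. It is not the re-Isearch API or its indexing algorithm: the names `Span`, `build_spans`, `find_hits` and `retrieve` are hypothetical, and the naive substring scan merely stands in for a real index. Structures are recorded as character ranges, and the caller decides at query time which kind of structure to get back. A hand-added `quote` span that crosses a line boundary shows that overlapping structures pose no problem for this representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    kind: str   # structure type, e.g. "paragraph", "line", "quote"
    start: int  # offset of the first character
    end: int    # offset one past the last character


def build_spans(text):
    """Record paragraph and line boundaries as character ranges."""
    spans = []
    offset = 0
    for para in text.split("\n\n"):
        spans.append(Span("paragraph", offset, offset + len(para)))
        line_offset = offset
        for line in para.split("\n"):
            spans.append(Span("line", line_offset, line_offset + len(line)))
            line_offset += len(line) + 1  # skip the newline
        offset += len(para) + 2  # skip the blank-line separator
    return spans


def find_hits(text, term):
    """Return the start offset of every case-insensitive occurrence of term."""
    hits, lowered, needle = [], text.lower(), term.lower()
    pos = lowered.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = lowered.find(needle, pos + 1)
    return hits


def retrieve(text, spans, term, unit):
    """Return the text of each span of kind `unit` that contains a hit."""
    hits = find_hits(text, term)
    return [
        text[s.start:s.end]
        for s in spans
        if s.kind == unit and any(s.start <= h < s.end for h in hits)
    ]


text = "Alpha beta.\nGamma search delta.\n\nEpsilon zeta.\nEta theta."
spans = build_spans(text)
# A quotation that starts on the first line and ends on the second,
# i.e. a structure overlapping the line boundaries (offsets chosen by hand).
spans.append(Span("quote", 6, 24))

for unit in ("line", "paragraph", "quote"):
    print(unit, retrieve(text, spans, "search", unit))
```

The same query returns a single line, the whole enclosing paragraph, or the overlapping quotation, depending only on the `unit` argument supplied at search time.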

Like the original, it is not just about textual words: the design contains a large number of object types, such as numerical, range and geospatial. It is unique among full-text systems in that it also provides numerous object types with their own methods of search and allows these to be viewed in parallel as text. A date field, for instance, can be searched as a date but also as text, by searching for the words in the field. (It will be one of the first engines to support some key parts of the new ISO-8601:2019 standard date semantics.)
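A small Python sketch of a field that is searchable both as a typed object and as text follows. It is purely illustrative: the `DateField` class, the `search_by_date` and `search_by_text` helpers and the sample records are assumptions made up for this example, and nothing here reflects the engine's actual bindings or the ISO 8601:2019 semantics it supports.

```python
from datetime import date, datetime


class DateField:
    """A field kept both as its raw text and as a parsed date value."""

    def __init__(self, raw):
        self.raw = raw
        # Assumes the "DD Month YYYY" form with English month names
        # (the default C locale); a real engine would accept far more.
        self.value = datetime.strptime(raw, "%d %B %Y").date()

    def in_range(self, start, end):
        """Typed search: is the date within [start, end]?"""
        return start <= self.value <= end

    def contains_word(self, word):
        """Text search: does the raw field contain the word?"""
        return word.lower() in self.raw.lower().split()


def search_by_date(records, start, end):
    return [r["id"] for r in records if r["date"].in_range(start, end)]


def search_by_text(records, word):
    return [r["id"] for r in records if r["date"].contains_word(word)]


records = [
    {"id": 1, "date": DateField("06 February 2022")},
    {"id": 2, "date": DateField("15 March 2021")},
]

print(search_by_date(records, date(2022, 1, 1), date(2022, 12, 31)))
print(search_by_text(records, "march"))
```

The range query compares parsed dates, while the word query looks only at the literal text of the field, so the same stored value answers both kinds of search.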