NLMaps Web is a web interface for querying OSM with natural language questions such as “Show me where I can find drinking water within 500m of the Louvre in Paris”. Each question is first parsed into a custom query language; the parsed query is then used to retrieve the answer through queries to Nominatim and Overpass.
Nominatim and Overpass are powerful ways of querying OSM, but the Overpass Query
Language is impractical for unfamiliar users who want to run quick queries. In
order to query OSM using natural language (NL) queries such as “Show me where I
can find drinking water within 500m of the Louvre in Paris”, Lawrence and
Riezler [1] created the first NLMaps dataset mapping NL queries to a custom
machine-readable language (MRL), which can then be used to retrieve the answer
from OSM via a combination of queries to Nominatim and Overpass. They extended
their dataset in a subsequent work by auto-generating synthetic queries from a
table mapping NL terms to OSM tags – calling the combined dataset NLMaps v2. [2]
The proposed purpose of these datasets is training a parser that can parse NL
queries into their MRL representation, as done in [2-5].
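
To make the Overpass step concrete, here is a minimal sketch of how the example question about drinking water near the Louvre might translate into an Overpass QL query. The function name `build_overpass_query` and the coordinates are assumptions for illustration only; in a real pipeline the coordinates would come from a Nominatim lookup, and the actual NLMaps backend may build its queries differently.

```python
def build_overpass_query(lat: float, lon: float, radius_m: int,
                         key: str, value: str) -> str:
    """Build an Overpass QL query for all nodes, ways and relations
    tagged key=value within radius_m metres of the point (lat, lon)."""
    return (
        "[out:json];"
        f'nwr["{key}"="{value}"](around:{radius_m},{lat},{lon});'
        "out center;"
    )


# Hypothetical coordinates close to the Louvre (assumption, not from the source).
query = build_overpass_query(48.8606, 2.3376, 500, "amenity", "drinking_water")
print(query)
```

The resulting string can be sent to any Overpass API endpoint; writing it by hand is exactly the step that a natural language interface is meant to spare the user.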
The main aim of my Master’s thesis was building a web-based NLMaps interface
that can be used to issue queries and to view the result. In addition, the web
interface should enable the user to give feedback on the returned result, either by
simply marking the parser-produced MRL query as correct or incorrect, or by
explicitly correcting it with the help of a web form. This feedback should be
directly used to improve the parser by training it in an asynchronous online
learning procedure.
Parsers trained on NLMaps v2 were observed to perform poorly on new queries. An
investigation into the causes revealed several shortcomings in
NLMaps v2, mainly: (1) The train and test splits are extremely similar, which limits the
informativeness of evaluating on the test split. (2) Various inconsistencies
exist in the mapping from NL terms to OSM tags (e.g. “forest” sometimes mapping to
natural=wood, sometimes to landuse=forest). (3) The NL queries’ linguistic
diversity is limited since most of them were generated with a very simple
templating procedure, so parsers trained on the data are not very
robust to new wordings of a query. (4) In a similar vein, there is only a small
number of different area names in NLMaps v2, with the names “Paris”, “Heidelberg”
and “Edinburgh” being so dominant that parsers are biased towards producing
them. (5) Some generated NL queries are worded very unnaturally, making them
counter-productive learning examples. (6) Usage of OSM tags is sometimes
incorrect, which affects the usefulness of produced parses.
The detailed analysis is used to eliminate some of the shortcomings – such as
incorrect tag usage – from NLMaps v2. Additionally, a new approach of
auto-generating NL-MRL pairs with probabilistic templates is used to create a
dataset of synthetic queries that features a significantly higher linguistic
diversity and a large set of different area names. The combination of the
improved NLMaps v2 and the new synthetic queries is called NLMaps v3.
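
The idea of probabilistic templates can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the templates, their weights, the term-to-tag table, the area list and the function `generate_pair` are hypothetical, and the MRL string is only loosely modeled on the NLMaps style rather than copied from the thesis.

```python
import random

# Hypothetical templates with sampling weights (not taken from the thesis).
TEMPLATES = [
    ("Where can I find {thing} in {area}?", 0.5),
    ("Show me {thing} in {area}", 0.3),
    ("Are there any {thing} in {area}?", 0.2),
]
# Hypothetical table mapping NL terms to OSM tags.
THINGS = {
    "drinking water": ("amenity", "drinking_water"),
    "bakeries": ("shop", "bakery"),
}
# A larger and more varied area list reduces the bias towards a few names.
AREAS = ["Paris", "Heidelberg", "Lisbon", "Nairobi"]


def generate_pair(rng: random.Random) -> tuple[str, str]:
    """Sample one synthetic (NL query, MRL query) pair."""
    texts = [text for text, _ in TEMPLATES]
    weights = [weight for _, weight in TEMPLATES]
    template = rng.choices(texts, weights=weights, k=1)[0]
    thing = rng.choice(sorted(THINGS))
    area = rng.choice(AREAS)
    key, value = THINGS[thing]
    nl = template.format(thing=thing, area=area)
    # Assumed MRL layout, for illustration only.
    mrl = (
        f"query(area(keyval('name','{area}')),"
        f"nwr(keyval('{key}','{value}')),qtype(latlong))"
    )
    return nl, mrl


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(generate_pair(rng))
```

Because wording and area name are sampled independently, the same tag combination appears with many surface forms, which is the property the new synthetic data is meant to have.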
A character-based GRU encoder-decoder model with attention [6] is used for
parsing NL queries into MRL queries using the configuration that performed best
in previous work [5]. This model is trained on NLMaps v3 and used as the parser
in the newly developed web interface. Mainly through advertising on the OSM talk
list and the OSM subreddit, 12 annotators are hired from all over the world to
use the web interface to issue new NL queries and to correct the parser-produced
MRL query if it is incorrect. They complete a tutorial before
the annotation job and are supported by help compiled from taginfo [7], TagFinder [8] and
custom suggestions for difficult tag combinations. The collected dataset
contains 3773 NL-MRL pairs and is called NLMaps v4.
With the help of NLMaps v4, an informative evaluation can be performed. It reveals
that a parser trained on NLMaps v2 achieves an exact match accuracy of
5.2 % on the MRL queries of the test split of NLMaps v4, while a parser trained
on NLMaps v3 performs significantly better with 28.9 %. Pre-training on
NLMaps v3 and fine-tuning on NLMaps v4 achieves an accuracy of 58.8 %.
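
Exact match accuracy counts a predicted MRL query as correct only if it is identical to the reference. A minimal sketch follows; the function name `exact_match_accuracy` is hypothetical, and stripping surrounding whitespace before comparison is an assumption, since the thesis may normalize queries differently.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predicted MRL queries identical to their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Assumption: only leading/trailing whitespace is ignored.
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)
```

The metric is strict: a parse that differs from the reference in a single tag value counts as entirely wrong, which partly explains why the reported numbers are low in absolute terms.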
Since the thesis’s goal is an online learning system, i.e. a system that
updates the parser directly after receiving feedback in the form of an NL-MRL
pair, various online learning simulations are conducted in order to find the
best setup. In all cases, the parser is pre-trained on NLMaps v3 and then
receives the NL-MRL pairs in NLMaps v4 one by one, updating the model after each
step. The simplest variant of the experiment uses only the one NL-MRL pair
for the update; another variant adds NL-MRL pairs from NLMaps v3 to the
minibatch; and a third variant additionally adds further “memorized” NL-MRL pairs
from previously given feedback to the minibatch. The main findings of the
simulation are that all variants improve performance on NLMaps v4 with respect
to the pre-trained parser, but with some of them the performance on NLMaps v3
degrades. The simple variant that updates only on the one NL-MRL pair is
particularly unstable, while adding NLMaps v3 instances stabilizes the
performance on NLMaps v3 and improves the performance on NLMaps v4. Adding the
instances from memorized feedback further improves the performance to an
accuracy of 53.0 %, which is still lower than the offline batch learning
fine-tuning mentioned in the previous paragraph.
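
The three simulation variants differ only in how the minibatch for each update is assembled, which the following sketch tries to capture. The names `build_minibatch`, `simulate` and `update_fn`, the uniform random sampling and the batch sizes are all assumptions for illustration; the thesis may select and weight instances differently.

```python
import random


def build_minibatch(feedback_pair, pretrain_pairs, memory,
                    n_pretrain, n_memory, rng):
    """Assemble one update batch: the new feedback pair, plus optional
    samples from the pre-training data and from memorized feedback."""
    batch = [feedback_pair]
    batch += rng.sample(pretrain_pairs, min(n_pretrain, len(pretrain_pairs)))
    batch += rng.sample(memory, min(n_memory, len(memory)))
    return batch


def simulate(feedback_stream, pretrain_pairs, update_fn,
             n_pretrain=0, n_memory=0, seed=0):
    """Feed feedback pairs one by one, updating the model after each.

    n_pretrain=0, n_memory=0  -> simple variant (feedback pair only)
    n_pretrain>0, n_memory=0  -> adds pre-training (NLMaps v3) pairs
    n_pretrain>0, n_memory>0  -> additionally replays memorized feedback
    """
    rng = random.Random(seed)
    memory = []
    for pair in feedback_stream:
        batch = build_minibatch(pair, pretrain_pairs, memory,
                                n_pretrain, n_memory, rng)
        update_fn(batch)      # hypothetical: one gradient step on the batch
        memory.append(pair)   # remember the feedback for later replay
```

Mixing in pre-training instances is a common way to counter forgetting during online updates, which is consistent with the stabilizing effect on NLMaps v3 performance reported above.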
In conclusion, the thesis improves the existing NLMaps dataset and contributes
two new datasets – one of which is especially valuable since it consists of real
user queries – laying the groundwork necessary for further enhancing NLMaps
parsers. The current parser – achieving an accuracy of 58.8 % – can be used by
OSM users via the new web interface currently available at
https://nlmaps.gorgor.de/ for issuing queries and also for correcting incorrect
ones. Future work will concentrate on improving the web interface’s UX and
enhancing the parser’s performance in terms of speed and accuracy.