Multi-language Data Wrangling and Acquisition Conversational Agents
Anton Antonov
FOSDEM 2022
Abstract
In this presentation we discuss Conversational Agent (CA) designs for two closely related problem areas: data wrangling and data acquisition.
The CA perspective is taken mostly for exposition and didactic purposes. Nevertheless, we emphasise the practical applicability of the underlying designs and implementations.
Although data acquisition is operationally a prerequisite for data wrangling, we discuss data wrangling first: the corresponding DTW designs and implementations are more mature, and the related materials are more universal, being applicable to multiple programming languages.
Outline
Data Wrangling
In the first part of the presentation we show and compare data wrangling examples in different programming languages using different packages.
Here is a list of the programming languages and packages we consider:
Julia-DataFrames
Python-pandas
R
R-tidyverse
WL (Wolfram Language)
We look into common data wrangling workflows and into how to design a conversational agent that translates natural language commands into data wrangling code for Julia, Python, R, SQL, and WL.
WL's external evaluator features are heavily utilized.
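To make the idea of translating natural language commands into data wrangling code more concrete, here is a minimal sketch in Python. It is not the agent presented in the talk: the rule table RULES, the helper split_columns, the function translate, the variable name obj in the generated code, and the three supported command forms are all hypothetical. A real translator would likely rely on proper grammars rather than a handful of regular expressions, and only two of the target languages are covered here.

```python
import re

# Hypothetical rule table (an assumption for illustration): each rule pairs a
# command pattern with one code template per target language.
RULES = [
    (re.compile(r"^select (?:the )?columns? (?P<cols>.+)$", re.IGNORECASE),
     {"Python-pandas": "obj = obj[[{quoted}]]",
      "R-tidyverse": "obj <- dplyr::select(obj, {bare})"}),
    (re.compile(r"^sort by (?P<cols>.+)$", re.IGNORECASE),
     {"Python-pandas": "obj = obj.sort_values([{quoted}])",
      "R-tidyverse": "obj <- dplyr::arrange(obj, {bare})"}),
    (re.compile(r"^drop missing values$", re.IGNORECASE),
     {"Python-pandas": "obj = obj.dropna()",
      "R-tidyverse": "obj <- tidyr::drop_na(obj)"}),
]


def split_columns(text):
    """Split 'a, b and c' into ['a', 'b', 'c']."""
    return [c.strip() for c in re.split(r",|\band\b", text) if c.strip()]


def translate(commands, target):
    """Translate ';'-separated commands into code lines for the target language."""
    lines = []
    for command in filter(None, (c.strip() for c in commands.split(";"))):
        for pattern, templates in RULES:
            match = pattern.match(command)
            if match:
                cols = split_columns(match.groupdict().get("cols") or "")
                lines.append(templates[target].format(
                    quoted=", ".join(f'"{c}"' for c in cols),
                    bare=", ".join(cols)))
                break
        else:
            # No rule matched this command.
            raise ValueError(f"Cannot interpret command: {command!r}")
    return "\n".join(lines)


if __name__ == "__main__":
    spec = "select columns name, mass; sort by mass; drop missing values"
    for language in ("Python-pandas", "R-tidyverse"):
        print(f"# {language}")
        print(translate(spec, language))
```

The same command sequence yields one pipeline per target language, which is the property that makes side-by-side comparison of the wrangling packages possible.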
Data Acquisition Workflows
In the second part of the presentation we discuss the following facets of a data acquisition system:
Conversational Agent based on a Finite State Machine
Gathering and utilizing metadata taxonomies
Making dataset recommender systems and search engines
Making (ingredient) variable queries
Introspection queries
Random data generation specifications
Data obfuscation specifications
Extensions to ML model acquisition workflows
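As an illustration of the first two facets above, a conversational agent driven by a Finite State Machine can be sketched in a few lines of Python. This is a toy under stated assumptions, not the system discussed in the talk: the state names (WaitForRequest, ListOfItems, Exit), the class DataAcquisitionAgent, and the catalogue of dataset names with metadata tags are all hypothetical.

```python
# Hypothetical toy catalogue (an assumption): dataset name -> metadata tags.
CATALOGUE = {
    "titanic": {"passengers", "survival", "categorical"},
    "iris": {"flowers", "numerical"},
    "mtcars": {"cars", "numerical"},
}


class DataAcquisitionAgent:
    """Toy FSM with three states: WaitForRequest, ListOfItems, Exit."""

    def __init__(self, catalogue):
        self.catalogue = catalogue
        self.state = "WaitForRequest"
        self.candidates = []
        self.acquired = None

    def step(self, utterance):
        """Consume one user utterance, update the state, return a reply."""
        text = utterance.strip().lower()
        if self.state == "Exit":
            return "The session has ended."
        if text == "quit":
            self.state = "Exit"
            return "Goodbye."
        if self.state == "WaitForRequest":
            # Match the words of the request against the metadata tags.
            wanted = set(text.split())
            self.candidates = sorted(
                name for name, tags in self.catalogue.items() if tags & wanted)
            if not self.candidates:
                return "No datasets match; please rephrase."
            self.state = "ListOfItems"
            listing = ", ".join(
                f"{i}. {name}" for i, name in enumerate(self.candidates, 1))
            return f"Found: {listing}. Which number?"
        # Remaining state: ListOfItems, waiting for the user to pick an item.
        if text.isdecimal() and 1 <= int(text) <= len(self.candidates):
            self.acquired = self.candidates[int(text) - 1]
            self.state = "WaitForRequest"
            return f"Acquired '{self.acquired}'."
        return "Please answer with one of the listed numbers, or 'quit'."


if __name__ == "__main__":
    agent = DataAcquisitionAgent(CATALOGUE)
    for utterance in ("numerical data", "2", "quit"):
        print(f"> {utterance}")
        print(agent.step(utterance))
```

Keeping the dialogue logic in explicit states makes each transition easy to inspect and test; further states, for example for help or for introspection queries, could be added in the same manner.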