In this presentation we discuss Conversational Agent (CA) designs for two closely related problem areas:
Data Acquisition Workflows (DAWs)
Data Transformation Workflows (DTWs)
The CA perspective is taken mostly for exposition and didactic purposes. Nevertheless, we emphasise the practical applicability of the underlying designs and implementations.
Although data acquisition is operationally a prerequisite for data wrangling, we discuss data wrangling first: the corresponding DTW designs and implementations are more mature, and the related materials are more universal, being applicable to multiple programming languages.
Anton Antonov
FOSDEM 2022
In the first part of the presentation we show and compare data wrangling examples in different programming languages using different packages.
Here is a list of the programming languages and packages we consider:
Julia-DataFrames
Python-pandas
R
R-tidyverse
WL (Wolfram Language)
We look into common data wrangling workflows and into how to design a conversational agent that translates natural language commands into data wrangling code for Julia, Python, R, SQL, and WL.
WL's external evaluator features are used heavily.
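To make the translation idea concrete, here is a minimal sketch in Python of turning a sequence of natural language commands into data wrangling code for two targets (Python-pandas and R-tidyverse). It is not the implementation discussed in the presentation: the command phrasings, the regular expressions, the intermediate step names (`load`, `select`, `count_by`), and the dataset name `dfStarwars` are all assumptions chosen for illustration. The sketch only produces code strings; it does not run them or check that the steps form a valid workflow.

```python
import re

# Hypothetical mini-grammar: each regular expression maps one natural
# language command to an intermediate operation name.
PATTERNS = [
    (re.compile(r"^use (?:the )?(?:data ?set |data frame )?(\w+)$", re.I), "load"),
    (re.compile(r"^(?:select|keep) (?:the )?columns? (.+)$", re.I), "select"),
    (re.compile(r"^count (?:the )?rows (?:per|by|for each) (.+)$", re.I), "count_by"),
]


def split_names(text):
    """Split 'a, b and c' into ['a', 'b', 'c']."""
    return [n for n in re.split(r"\s*(?:,|\band\b)\s*", text.strip()) if n]


def parse(commands):
    """Parse ';'-separated commands into (operation, arguments) steps."""
    steps = []
    for raw in commands.split(";"):
        text = raw.strip()
        if not text:
            continue
        for pattern, op in PATTERNS:
            match = pattern.match(text)
            if match:
                steps.append((op, split_names(match.group(1))))
                break
        else:
            raise ValueError(f"Cannot interpret command: {text!r}")
    return steps


def to_pandas(steps):
    """Emit Python-pandas code (as a string) for the parsed steps."""
    lines = []
    for op, args in steps:
        if op == "load":
            lines.append(f"obj = {args[0]}.copy()")
        elif op == "select":
            lines.append(f"obj = obj[{args!r}]")
        elif op == "count_by":
            lines.append(f"obj = obj.groupby({args!r}).size()")
    return "\n".join(lines)


def to_tidyverse(steps):
    """Emit R-tidyverse code (as a string) for the same parsed steps."""
    parts = []
    for op, args in steps:
        if op == "load":
            parts.append(args[0])
        elif op == "select":
            parts.append(f"dplyr::select({', '.join(args)})")
        elif op == "count_by":
            parts.append(f"dplyr::count({', '.join(args)})")
    return " %>%\n".join(parts)


if __name__ == "__main__":
    workflow = "use dfStarwars; keep the columns species and mass; count rows per species"
    parsed = parse(workflow)
    print(to_pandas(parsed))
    print(to_tidyverse(parsed))
```

The design choice worth noting is the intermediate list of steps: the natural language commands are parsed once, and each target language gets its own small code generator, so adding another target does not require touching the parser.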
In the second part of the presentation we discuss the following facets of a data acquisition system:
Conversational Agent based on a Finite State Machine
Gathering and utilizing metadata taxonomies
Building dataset recommender systems and search engines
Making (ingredient) variable queries
Introspection queries
Random data generation specifications
Data obfuscation specifications
Extensions to ML model acquisition workflows
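As an illustration of the first two facets, a conversational agent based on a Finite State Machine that consults dataset metadata can be sketched in Python as follows. This is a toy under stated assumptions, not the system described in the presentation: the state names (`WaitForRequest`, `ChooseItem`, `Exit`), the catalog entries and their tags, and the reply texts are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical catalog: dataset name -> set of metadata tags.
CATALOG = {
    "titanic": {"survival", "passengers", "classification"},
    "iris": {"flowers", "classification"},
    "airline": {"time series", "passengers"},
}


@dataclass
class DataAcquisitionAgent:
    """Toy FSM with the states WaitForRequest, ChooseItem, and Exit."""

    state: str = "WaitForRequest"
    candidates: list = field(default_factory=list)
    acquired: list = field(default_factory=list)

    def step(self, utterance):
        """Consume one user utterance, update the state, and return a reply."""
        text = utterance.strip().lower()
        if self.state == "Exit":
            return "The session is closed."
        if text in ("quit", "exit"):
            self.state = "Exit"
            return "Bye."
        if self.state == "WaitForRequest":
            return self._on_request(text)
        return self._on_choice(text)

    def _on_request(self, text):
        # Match the utterance against dataset names and metadata tags.
        self.candidates = sorted(
            name for name, tags in CATALOG.items() if text == name or text in tags
        )
        if not self.candidates:
            return f"No datasets match {text!r}."
        if len(self.candidates) == 1:
            return self._acquire(self.candidates[0])
        # Several matches: move to the state that asks the user to pick one.
        self.state = "ChooseItem"
        listing = ", ".join(f"{i}) {n}" for i, n in enumerate(self.candidates, 1))
        return f"Several datasets match: {listing}. Which number?"

    def _on_choice(self, text):
        if text.isdecimal() and 1 <= int(text) <= len(self.candidates):
            return self._acquire(self.candidates[int(text) - 1])
        # Invalid choice: stay in ChooseItem and ask again.
        return f"Please give a number between 1 and {len(self.candidates)}."

    def _acquire(self, name):
        self.acquired.append(name)
        self.state = "WaitForRequest"
        return f"Acquired {name}."


if __name__ == "__main__":
    agent = DataAcquisitionAgent()
    for utterance in ["classification", "2", "flowers", "quit"]:
        print(f"user:  {utterance}")
        print(f"agent: {agent.step(utterance)}")
```

Keeping the dialogue state explicit makes each transition easy to inspect, which is one reason an FSM is a natural fit for this kind of guided acquisition dialogue.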
Speakers: Anton Antonov