A utility program that fetches and preprocesses learning data from supported learning tools. Educators and researchers have important use cases for accessing the raw data that is generated while learners use digital learning tools and environments. These stakeholders may aim, for example, to analyse and improve teaching materials, methods, and activities.
The aim of Llama CLI is to support and ease the steps of
- connecting to the supported learning data sources
- excluding persons and unwanted data tables or columns
- fetching partial and complete data sets
- anonymizing data before research activities
- standardizing/transforming/sharing data
- sampling and selecting data for analysis/ML
Currently supported data sources are
- A-plus https://apluslms.github.io/
- JSON log files from https://github.com/acos-server/acos-server
- Database export from https://docs.mongodb.com/database-tools/mongodump/
Transforming program submissions and events to ProgSnap 2 is supported via llama shell.
The name for the project comes from ~ la lumière à Montagne analytique. Pardon my French for ~ light on the mountain of analytics. Also, LA is an acronym that the package author may have used in his thesis more than a decent number of times, and it stands for Learning Analytics, which is a research field in education technologies. Llamas are also known from a controversial programming exercise for computer science majors at Aalto University.
Llama CLI is available at PyPI. It has a number of automatically installed dependencies, most notably pandas, numpy, scipy, and requests.
% python3 -m pip install llama-cli
% llama
OR contained in a virtual environment (directory)
% python3 -m venv .venv && .venv/bin/pip install llama-cli
% .venv/bin/llama
Llama CLI operates on the current working directory. The configurations and data will be stored in that directory, a little bit like when working with git repositories. One work directory can connect with multiple data sources, and one should select the sources that the current research or analysis project requires.
% llama
Llama CLI fetches and preprocesses learning data
usage: llama <cmd> [<args>]
   status      Show the working tree status
   source      Manage sources for learning data
   list        List available data tables and columns
   privacy     Configure privacy (default: pseudoanonymous)
   exclude     Exclude selected tables, columns, or persons at fetch
   fetch       Fetch data from sources
   anonymize   Export anonymized data
   shell       Open python REPL with 'llama' instance for exported data
- Use llama source add to interactively connect with data sources. The required addresses and keys will be prompted for when needed.
- Use llama list to fetch the available data tables.
- Consider excluding uninteresting data or persons who have not consented to the research at hand. See llama exclude for examples.
- Use llama fetch rows to download data tables. Depending on the project, it may also be necessary to run llama fetch files and/or llama fetch meta. This step has a delay between internet requests and may take a long time to complete. The rows can be fetched again to append new data if supported by the data source.
- The data in the fetched directory is pseudoanonymized by default. The pseudo identifiers are required to complete fetching of dependent data. With access to the source database, the pseudo identifiers can be traced to persons. Use llama anonymize to produce an export directory that can, for example, be stored in a research repository when the security measures and research consent allow it.
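The steps above can also be scripted. The following is a minimal sketch, not part of Llama CLI itself: the helper names workflow_commands and run_workflow are hypothetical, and the sketch assumes that the llama executable is on the PATH and that running llama exclude without arguments only prints its examples. By default it performs a dry run that prints the commands without executing them.

```python
import shutil
import subprocess


def workflow_commands(fetch_files=False, fetch_meta=False):
    """Return the llama command lines for the steps above, in order."""
    commands = [
        ["llama", "source", "add"],  # interactive: prompts for addresses and keys
        ["llama", "list"],           # fetches the available data tables
        ["llama", "exclude"],        # assumed to show exclusion examples
        ["llama", "fetch", "rows"],  # downloads the data tables
    ]
    if fetch_files:
        commands.append(["llama", "fetch", "files"])
    if fetch_meta:
        commands.append(["llama", "fetch", "meta"])
    commands.append(["llama", "anonymize"])  # produces the export directory
    return commands


def run_workflow(commands, dry_run=True):
    """Print each command and execute it only when dry_run is False."""
    if not dry_run and shutil.which("llama") is None:
        raise RuntimeError("llama executable not found on PATH")
    for argv in commands:
        print("%", " ".join(argv))
        if not dry_run:
            # Standard input is inherited, so interactive prompts still work.
            subprocess.run(argv, check=True)


# Dry run: only prints the planned commands.
run_workflow(workflow_commands(fetch_files=True))
```

Passing dry_run=False executes the commands in the current working directory, which is where Llama CLI keeps its configuration and data.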
The raw CSV and other files are available in the export directory. However, the package also offers a Python interface with programmatic accessors and samplers for the exported data. Exports can be opened either interactively via llama shell or with the following constructor in a program or, for example, a Jupyter notebook.
from llama import LlamaApi, LlamaStats
llama = LlamaApi('export')
API documentation:
This README documents the LlamaApi, which, in addition to selecting data, offers quick output from the statistical methods in LlamaStats. When the return values are needed for further processing, LlamaStats must be used directly.
Constructs a standard interface to work with one or multiple Llama export directories.
If no directory parameters are given, the constructor looks for the ./export directory.
Calculated distributions are cached in memory for multiple queries.
- *directories: str (optional 0-N parameters) Llama export directory paths
- Returns an instance of LlamaApi
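As an illustration of the 0-N directory parameters, the sketch below collects the export directories that actually exist and passes them to the constructor. The helpers existing_export_dirs and open_exports and the example paths are hypothetical; only the LlamaApi import and the constructor call come from this README. The import is deferred so that the directory check can be used without the package installed.

```python
from pathlib import Path


def existing_export_dirs(*candidates):
    """Return the candidate paths that are existing directories, in order."""
    return [str(Path(c)) for c in candidates if Path(c).is_dir()]


def open_exports(*candidates):
    """Open all existing candidate export directories with one LlamaApi."""
    dirs = existing_export_dirs(*candidates)
    if not dirs:
        raise FileNotFoundError("no export directory found among the candidates")
    # Imported here so the check above works without llama-cli installed.
    from llama import LlamaApi
    return LlamaApi(*dirs)


# Hypothetical layout with two projects that each ran 'llama anonymize':
# llama = open_exports("course_a/export", "course_b/export")
```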
Lists sources and tables from the data. A subset of the data can be selected with the optional select dictionary.
- select: dict OR dict[] (optional) comprised of the following keys
  - source: int (optional) index of a learning data source
  - table: str (optional) text to match table name (or id)
  - table_by_id: bool (optional) True to match table with table id
  - persons: str[] (optional) list of person identifiers
  - reverse: bool (optional) True to exclude above matches and include rest
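A select dictionary can be written by hand, but a small builder keeps the keys consistent. The sketch below is an assumption-laden illustration: make_select is a hypothetical helper, the table text and person identifiers are made up, and the source index is assumed to start at 0. Only the key names and types come from the list above.

```python
def make_select(source=None, table=None, table_by_id=False,
                persons=None, reverse=False):
    """Build a select dictionary that contains only the keys that were given."""
    select = {}
    if source is not None:
        select["source"] = int(source)
    if table is not None:
        select["table"] = str(table)
        if table_by_id:
            select["table_by_id"] = True
    if persons:
        select["persons"] = [str(p) for p in persons]
    if reverse:
        select["reverse"] = True
    return select


# Tables of the first source (assuming 0-based indices) matching a made-up name.
first_source_exercises = make_select(source=0, table="exercise")

# Everything except two made-up person identifiers.
without_two_persons = make_select(persons=["p1", "p2"], reverse=True)

# The dict[] form: a list of select dictionaries.
combined = [first_source_exercises, without_two_persons]
```

Any of these values can then be passed as the select argument, for example to llama.list.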
Reads and iterates over data from tables. This method can be combined with many methods from LlamaStats.
- select: dict (optional) see llama.list
- Returns iterator over tuples of (source: dict, table: dict, rows: pandas.DataFrame)
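The returned tuples can be consumed with ordinary unpacking. The following sketch shows one way to do so: row_counts is a hypothetical helper that accepts any iterable of (source, table, rows) tuples and relies only on len(rows), which for a pandas.DataFrame is the number of rows. The demo data uses plain lists as stand-ins for DataFrames so that the sketch runs without an export.

```python
def row_counts(tables):
    """Return per-table row counts and their total for (source, table, rows) tuples."""
    counts = []
    total = 0
    for source, table, rows in tables:
        n = len(rows)  # number of rows when rows is a pandas.DataFrame
        counts.append(n)
        total += n
    return counts, total


# Stand-in data: empty dicts for source and table, lists instead of DataFrames.
demo = [({}, {}, [1, 2, 3]), ({}, {}, [4])]
counts, total = row_counts(demo)
print(counts, total)
```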
Creates a ProgSnap 2 compatible export that merges the selected tables into one main event table.
- select: dict see llama.list
- export_dir: str a directory where the new export is created
Calculates statistical grade and attempt distributions, as well as weekly and daily patterns.
- select: dict (optional) see llama.list
Renders a page about overall statistics.
- select: dict (optional) see llama.list
- pdf_name: str (optional) a file name for pdf output, else try to plot to window
Calculates statistical grade and attempt distributions, as well as weekly and daily patterns for the learners.
- select: dict (optional) see llama.list
Renders a statistic page for each learner.
- select: dict (optional) see llama.list
- pdf_name: str (optional) a file name for pdf output, else try to plot to window
Compresses learner distributions into 21 variables per learner.
- select: dict (optional) see llama.list
- csv_name: str (optional) a file name for csv output, else print
Calculates statistical distributions for each selected exercise table.
- select: dict (optional) see llama.list
Renders a statistic page for each exercise.
- select: dict (optional) see llama.list
- pdf_name: str (optional) a file name for pdf output, else try to plot to window
Compresses exercise distributions into 23 variables per exercise.
- select: dict (optional) see llama.list
- csv_name: str (optional) a file name for csv output, else print