Data Stack


1. Presentation

A sample data stack running on Docker, containing the following components:

  • Airflow

  • Metabase

  • MariaDB, with phpMyAdmin

  • Postgres, with phpPgAdmin

  • Doccano data labelling interface

  • Nginx as reverse proxy

  • Sphinx auto-generated documentation

  • A template Python module, usable in Airflow DAGs

  • A template machine learning package, using PyTorch

  • An ml_helper package, providing functions to store machine learning model results and parameters in a database

  • A utils package with utility functions

  • Unit testing with the pytest library
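The services above are wired together with Docker Compose; the excerpt below is a minimal sketch of what such a file looks like (service names, images, and ports are illustrative assumptions, not the actual docker-compose.yml):

```yaml
# docker-compose.yml -- illustrative excerpt only
# (service names, images, and ports are assumptions)
version: "3.8"

services:
  nginx:                      # reverse proxy in front of the web UIs
    image: nginx:stable
    ports:
      - "80:80"
  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: change-me
  metabase:
    image: metabase/metabase
```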

2. Installation

You will need to have the following software installed: Docker, Docker Compose, and Python (with virtualenv and pip).

Once you’re good, create a virtual environment and install the prerequisite Python libraries:

virtualenv venv;
source venv/bin/activate;
pip install -r requirements.txt;

3. Usage

3.1 Launch the Docker stack

Run it with:

docker-compose up -d

Then visit the services in your browser (the exposed ports are defined in docker-compose.yml).

Add your Airflow DAGs in the dags folder.

3.2 Unit testing

Run the unit tests with:

pytest tests
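Tests in the tests folder follow the usual pytest conventions; the self-contained sketch below shows the shape (the slugify helper here is hypothetical, standing in for a function from the utils package):

```python
# tests/test_utils.py -- illustrative pytest module
import pytest


def slugify(text: str) -> str:
    """Hypothetical utils-style helper: lowercase and hyphenate."""
    return "-".join(text.lower().split())


def test_slugify_basic():
    assert slugify("Data Stack") == "data-stack"


@pytest.mark.parametrize("raw,expected", [
    ("One", "one"),
    ("A  B", "a-b"),
])
def test_slugify_parametrized(raw, expected):
    assert slugify(raw) == expected
```

pytest collects any `test_*` function in files matching `test_*.py` under the given path.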

3.3 Generating the Sphinx docs

Generate the Sphinx documentation with:

sphinx-apidoc ./src -o docs/source -M;
cd docs && make html && open build/html/index.html;
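sphinx-apidoc only renders what the code documents, so modules under src should carry docstrings; a sketch using reStructuredText field lists, which autodoc renders as parameter tables (the function is illustrative, not from the actual codebase):

```python
# src/example.py -- docstring style that sphinx-apidoc/autodoc renders
def normalize(values):
    """Scale a list of numbers into the [0, 1] range.

    :param values: non-empty list with at least two distinct values
    :type values: list[float]
    :returns: values rescaled so the min maps to 0.0 and the max to 1.0
    :rtype: list[float]
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```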
