Synthetic Data Generation

This is the Sarus synthetic data generation package. It provides two different models to generate synthetic data:

  • a marginals-based model, where each column is sampled from its marginal distribution
  • a correlation-based model, where a deep learning model is trained to generate each table independently

In both cases, links between tables are added independently.
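The marginals-based mode and the independent linking step can be illustrated with a minimal, dependency-free Python sketch. This is not the package's API: the functions fit_marginals, sample_marginals and add_links, and the toy tables, are hypothetical. Each column is drawn from its empirical value frequencies, and each child row is then attached to a synthetic parent chosen uniformly at random. The uniform choice of parent is an assumption, since the README does not say how links are drawn.

```python
import random
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def fit_marginals(rows: List[Row]) -> Dict[str, Counter]:
    """Count the empirical frequency of each value, column by column (hypothetical helper)."""
    marginals: Dict[str, Counter] = {}
    for row in rows:
        for column, value in row.items():
            marginals.setdefault(column, Counter())[value] += 1
    return marginals


def sample_marginals(marginals: Dict[str, Counter], n_rows: int, rng: random.Random) -> List[Row]:
    """Draw every column from its own marginal distribution.

    Columns are sampled separately, so correlations between columns are not preserved.
    """
    synthetic: List[Row] = [{} for _ in range(n_rows)]
    for column, counts in marginals.items():
        values = list(counts)
        weights = [counts[v] for v in values]
        draws = rng.choices(values, weights=weights, k=n_rows)
        for row, value in zip(synthetic, draws):
            row[column] = value
    return synthetic


def add_links(child_rows: List[Row], parent_rows: List[Row], parent_key: str,
              foreign_key: str, rng: random.Random) -> List[Row]:
    """Assumption: link each child row to a parent picked uniformly at random,
    independently of the contents of either table."""
    parent_ids = [parent[parent_key] for parent in parent_rows]
    for row in child_rows:
        row[foreign_key] = rng.choice(parent_ids)
    return child_rows


# Toy source tables (hypothetical data for illustration only).
users = [
    {"country": "FR", "plan": "free"},
    {"country": "FR", "plan": "pro"},
    {"country": "US", "plan": "free"},
]
orders = [{"amount": 10}, {"amount": 25}, {"amount": 10}, {"amount": 40}]

rng = random.Random(0)  # fixed seed for reproducible output
synthetic_users = sample_marginals(fit_marginals(users), n_rows=5, rng=rng)
for i, user in enumerate(synthetic_users):
    user["user_id"] = i  # fresh primary keys for the synthetic parent table

synthetic_orders = sample_marginals(fit_marginals(orders), n_rows=8, rng=rng)
add_links(synthetic_orders, synthetic_users, parent_key="user_id", foreign_key="user_id", rng=rng)
```

Because each column is drawn on its own, value combinations such as country and plan appear roughly with the product of their marginal frequencies rather than their joint frequency in the source data. That loss of cross-column structure is the trade-off separating this mode from the correlation-based model.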

Code organization

The code is organized in three directories:

  • column_generator: contains code needed only for the marginals-based model, which samples column by column
  • correlations_generator: contains code needed only for the correlation-based model, which samples table by table
  • shared: contains utilities used by both generators

Dependencies and locking

Dependencies of the correlations generator are listed as optional dependencies in a dedicated section. We use pipenv to lock the minimal dependencies and the correlations generator dependencies separately. To make locking reproducible, the Makefile runs it inside a Docker container: run make lock from a bash shell. This generates two requirements files, one for the minimal dependencies and one for the larger set needed by the correlations generator.

Testing

Testing is done with tox. Tests are split into column generator tests and correlations generator tests, and each group runs with its corresponding dependencies installed.