Metadata-Version: 2.1
Name: sarus_synthetic_data
Version: 4.0.7
Summary: package to train synthetic data generators
Author-email: Sarus <contact@sarus.tech>
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas~=1.4.0
Requires-Dist: scipy>=1.5.0
Requires-Dist: sarus-statistics<5.0.0,>=4.0.0
Requires-Dist: pyarrow~=15.0
Requires-Dist: jax==0.4.25
Requires-Dist: jaxlib==0.4.25
Requires-Dist: psutil
Requires-Dist: tensorflow~=2.15.0
Requires-Dist: flax~=0.8.1
Requires-Dist: optax~=0.1.4
Requires-Dist: orbax-checkpoint==0.4.3
Requires-Dist: transformers~=4.43.0
Requires-Dist: gcsfs
Requires-Dist: pydantic~=2.7.0
Provides-Extra: tests
Requires-Dist: pytest>=6.2; extra == "tests"
Requires-Dist: pytest-mock>=3.6; extra == "tests"
Requires-Dist: types-psutil; extra == "tests"

# Synthetic Data Generation

The Sarus synthetic data generation package. It provides two models to generate synthetic data:
- a marginals-based model, where each column is sampled from its marginals
- a correlation-based model, where a deep learning model is trained to generate each table independently

In both cases, links between tables are added independently.
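
To illustrate the marginals-based idea, here is a minimal sketch that samples each column independently from its empirical marginal. It does not use this package's API: the function name `sample_from_marginals`, its parameters, and the toy rows are hypothetical, and the actual generator may model marginals differently.

```python
import random


def sample_from_marginals(rows, n_samples, seed=0):
    """Sample every column independently from its empirical marginal.

    Hypothetical helper for illustration only, not part of the package.
    `rows` is a non-empty list of dicts sharing the same keys.
    """
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    # Drawing uniformly (with replacement) from the observed values of a
    # column reproduces that column's empirical marginal distribution.
    observed = {col: [row[col] for row in rows] for col in columns}
    sampled = {col: rng.choices(observed[col], k=n_samples) for col in columns}
    # Reassemble rows; columns were drawn separately, so any correlation
    # between columns in the original data is not preserved.
    return [{col: sampled[col][i] for col in columns} for i in range(n_samples)]


# Toy input data (assumed for the example).
rows = [
    {"age": 31, "city": "Paris"},
    {"age": 45, "city": "Lyon"},
    {"age": 27, "city": "Paris"},
]
synthetic = sample_from_marginals(rows, n_samples=5)
```

Because columns are drawn separately, a synthetic row can combine an age and a city that never appeared together in the input, which is the trade-off the correlation-based model is meant to address.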

## Code organization
The code is organized in three directories:
- column_generator: contains code only needed to sample column by column
- correlations_generator: contains code only needed for the model that samples table by table
- shared: contains utilities needed for both generators

## Dependencies and locking

Dependencies of the correlations generator are listed as optional in a dedicated section.
We use pipenv to lock the minimal dependencies and the correlations generator dependencies separately.
To make locking reproducible, you can run it in a Docker container through the makefile with
`make lock`. This generates two requirements files, one with the minimal dependencies and one with the larger set.

## Testing

Testing is done via tox. Tests are split into column and correlation tests, and dependencies are installed accordingly.