Repository URL to install this package:
|
Version:
1.1.0 ▾
|
| bin |
| lib |
| Gemfile |
| README.md |
| uoy-faculty-datasync.gemspec |
This gem provides data synchronization facilities for databases.
Add these lines to your application's Gemfile:
source 'https://gem.fury.io/universityofyork/' do gem 'uoy-faculty-datasync', '~> 0.1', require: 'datasync' end
And then execute:
$ bundle
Or install it yourself as:
$ gem install uoy-faculty-datasync
The gem fundamentally is about synchronizing data between databases. There are four types of mapper - implicit, source table, target table, and link table.
There are fundamentally four mapper types:
ImplicitMapper uses the ID column in both records to imply links between the
objects. For example, usernames in the source and target system.TargetLinkMapper uses a column in the target table to refer to the source
object it's linked to. The column must be able to store the source record ID
in its native form, and must be nullable. If a default value is set it should
be NULL.SourceLinkMapper is similar to TargetLinkMapper but stores the target
record ID in the source table. The same constraints exist for the link column.LinkTableMapper uses a separate table to store the links independently of
the source and target records. It must contain three columns, none of which
are nullable:
source_id - the source ID in its native formattarget_id - the target ID in its native formatmapping - the mapping name as textTo define a mapping, use the DSL:
require 'datasync' my_mapping = Datasync.create_mapper(Datasync::ImplicitMapper) do title 'Implicit mapping' source(:username) { DB[:source].where(active: true) } target(:username) { DB[:target] } match_using :eppn match_when :unlinked_source, :unlinked_target limit unlinked_target: 100, total: 10000 parser(:forename, &:capitalize) parser(:surname, &:capitalize) mapping do map username: username map eppn: [username, 'york.ac.uk'].join('@') map display_name: [forename, surname].join(' ') end formatter(:username, &:downcase) on(:unlinked_source) { insert } on(:linked) { update } on(:matched) do link update end on(:unlinked_target) { delete } end
Most of the options are mandatory:
title specifies a human-readable titlesource and target set the source and target unique ID column, and a block
which returns the source or target data as a Sequel::Datasetmatch_using specifies the mapped object fields to use for matchingmatch_when specifies which records can be eligible for matchinglimit allows you to limit the total number of links permitted in each state,
as well as the overall total or number of errors.parser(:field) { |value| ... } specifies a block used to parse a source fieldmapping specifies the field mapping in terms of parsed source object fields,
which are in scope. Additionally, @source and @target are in scope for the
input and output, respectively.formatter(:field) { |value| ... } specifies a block used to format a target
fieldon(:situation) { ... } specifies how to handle a particular situation. The
helpers insert and update use the in-scope object and take hash arguments to
append metadata to the target object if you need to. link and unlink affect
linking where it is used, and delete is available to remove the target. You
can also use raw_source, source and object to get the raw source object,
parsed source object, and mapped object, and target_row as a Sequel::Dataset.
Both @source_id and target_id are also in scope.target_key_from :object specifies that the transformed object contains the
target id column alreadytarget_key_from :database specifies that the target id column is produced by
the database on insert (typically an autoincrementing primary key). This is
the default.The following situations can occur in all mappers:
The following situations can only occur in certain mappers:
LinkTableMapper, SourceLinkMapper)LinkTableMapper, TargetLinkMapper)LinkTableMapper only)Once you have a defined mapping, you can call my_mapping.sync! on it.
This synchronizes the data according to the specified rules, and returns the
SyncRunner object, which you can interrogate for information:
#runtime#start_time, #end_time#totals#link_states#ok?#errors#aborted?#terminating_exception#reportThere are some options which can be defined in the create_mapper block for some of the types:
# Options for TargetLinkMapper option link_column: :source_id # Options for SourceLinkMapper option link_column: :target_id # Options for LinkTableMapper option(:link_table) { DB[:link] } option mapping_name: 'test-mapping'
Link column specifies the column which will be managed by Datasync to store the references to the other table.
Link table is used to specify a table (Sequel::Dataset) that Datasync can use to store the links between source and target records itself. It also contains a column so that several mappings can be stored in one table, where the IDs are of the same type.
You can run the spec standalone against an in-memory SQLite3 database. To do
this, run bundle install and then bundle exec rake spec.
To run against a postgres image (as in CI), first install the bundle, using
docker compose run --rm app bundle install, and then run the spec with
docker compose run --rm app. Between runs, run docker-compose down to
remove the database so you have a fresh empty copy for every run.
The repository is at https://github.com/university-of-york/faculty-dev-datasync-gem