A Guide to Schema Migration in DefraDB
Overview
An application’s requirements can change at any time, and the database schema has to change with them. Schema migrations make this possible, and Lens is the migration engine that carries them out.
This guide will provide an understanding of schema migrations, focusing on the Lens migration engine. Let’s dive in!
Lens is a pipeline for user-defined transformations. It enables users to write their transformations in any programming language and run them through the Lens pipeline, which transforms the cached representation of the data.
Goals of the Lens Migration System
Here are some of the goals of the Lens schema migration system:
Presenting a consistent view of data across nodes: The Lens schema migration system can present data across nodes consistently, regardless of the schema version being used.
Verifiability of data: In the Lens migration system, schema migrations are represented as data. This preserves user-defined mutations without corrupting system-defined mutations, and allows migrating from one schema version to another.
A language-agnostic way of writing schema migrations: Schema migrations can be written in any programming language and executed properly as Lens is language-agnostic.
Safe usage of migrations by others through a sandbox: Migrations written for Lens run in a sandbox, which lets you use migrations written by others without the risk of remote code execution (RCE).
Peer-to-peer sync of schema migrations: Lens allows peers to write their migrations in different application versions and sync without worrying about the versions other peers are using.
Local autonomy of schema migrations: Lens enables local autonomy in writing schema migrations by giving users control of the schema version they choose to use. The users can stay in a particular schema version and still communicate with peers on different versions, as Lens is not restricted to a particular schema version.
Reproducibility and deterministic nature of executing migrations: When using the Lens migration system, changes to schemas can be written, tagged and shared with other peers regardless of their infrastructure and requirements for deployments.
Mechanism
In this section, we’ll look at the mechanism behind the Lens migration system and explain how it works.
The Lens migration system functions as a bi-directional transformation engine, enabling the migration of documents in both forward and reverse directions. It transforms documents from schema X to Y in the forward direction and from Y to X in the reverse direction.
These transformations run on a verifiable system powered by WebAssembly (Wasm). Wasm also provides the sandboxing and the language-agnostic nature of Lens.
Internally, schema migrations are evaluated lazily. This avoids the upfront cost of doing a massive migration at once.
Lazy evaluation is a technique in programming where an expression is only evaluated when its value is needed.
Adopting lazy evaluation in the migration system also allows rapid toggling between schema versions and representations.
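The idea of lazy, bi-directional migration can be sketched in a few lines of plain Rust. This is an illustration only and not how Lens is implemented: Lens runs Wasm modules, whereas here the `Stored` type, the `read_as` function and the hard-coded version numbers 1 and 2 are hypothetical names chosen for the sketch. Documents stay in storage exactly as they were committed, and a transformation runs only when a document is read under a different schema version.

```rust
use std::collections::HashMap;

/// A document, simplified to a flat map of field names to string values.
type Doc = HashMap<String, String>;

/// Hypothetical stored record: the document plus the schema version it was committed under.
struct Stored {
    version: u32,
    doc: Doc,
}

/// Forward migration (version 1 -> 2): copy `emailAddress` into `email`.
fn forward(mut doc: Doc) -> Doc {
    if let Some(v) = doc.get("emailAddress").cloned() {
        doc.insert("email".to_string(), v);
    }
    doc
}

/// Inverse migration (version 2 -> 1): copy `email` back into `emailAddress`.
fn inverse(mut doc: Doc) -> Doc {
    if let Some(v) = doc.get("email").cloned() {
        doc.insert("emailAddress".to_string(), v);
    }
    doc
}

/// Present a stored document as the `target` schema version.
/// Storage is never rewritten; the transformation runs only at read time.
fn read_as(stored: &Stored, target: u32) -> Doc {
    let doc = stored.doc.clone();
    match (stored.version, target) {
        (1, 2) => forward(doc),
        (2, 1) => inverse(doc),
        _ => doc, // same version (or an unknown pair): return the document unchanged
    }
}
```

Because `read_as` takes the target version as an argument, switching between schema versions is only a matter of passing a different value, which is why lazy evaluation makes toggling between versions cheap.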
Usage
The Lens migration system addresses critical use cases related to schema migrations in peer-to-peer, eventually consistent databases. These use cases include:
Safe Schema Progression: Ensuring the seamless progression of database schemas is vital for accommodating changing application requirements. Lens facilitates the modification, upgrade, or reversion of schemas while upholding data integrity.
Handling Peer-to-Peer Complexity: In environments where different clients operate on varying application and database versions, Lens offers a solution to address the complexity of schema migrations. It ensures coherence and effectiveness across different networks.
Language-Agnostic Flexibility: Functions in Lens are designed to be language-agnostic, offering the versatility to define schema changes in the preferred programming language. This adaptability makes Lens suitable for diverse development environments and preferences.
Lazy Evaluation: Lens employs a lazy evaluation mechanism, initiating migrations without immediate execution. Schema changes are applied only when documents are read, queried, or updated. This approach reduces the upfront cost of extensive schema migrations while maintaining data consistency.
On-Demand Schema Selection: Lens supports on-demand schema selection during data queries. Users can specify the schema version they wish to work with, facilitating A/B testing and the seamless transition between different schema versions.
These use cases highlight how Lens empowers users to manage schema migrations effectively, ensuring data consistency and adaptability in evolving database systems.
Example
In this example we will define a collection using a schema with an emailAddress field. We will then patch the schema to add a new field, email, then define a bi-directional Lens to migrate data to and from the new field.
Step One, define the Users collection/schema:
defradb client schema add '
  type Users {
    emailAddress: String
  }
'
Step Two, patch the Users schema to add the new field. Here we pass in --set-active=true to automatically apply the schema change to the Users collection:
defradb client schema patch '
  [
    { "op": "add", "path": "/Users/Fields/-", "value": {"Name": "email", "Kind": "String"} }
  ]
' --set-active=true
Step Three, fetch the schema IDs so that we can later tell Defra which schema versions we wish to migrate to and from:
defradb client schema describe --name="Users"
Step Four, to define our Lens module we need to work with five functions: next, a host function that is imported into the module, and up to four functions that the module itself defines:
- next() unsignedInteger8: this is a host function imported to the module. Calling it returns a pointer to a byte array that contains either an error, an EndOfStream identifier (indicating that there are no more source values), or the JSON byte array of the Defra document to migrate. It is typically called from within the transform and inverse functions, and can be called multiple times within them if desired.
- alloc(size: unsignedInteger64) unsignedInteger8: this is required by all Lens modules regardless of language or content. This function should allocate a block of memory of the given size; the Lens engine uses it to pass data in to the Wasm instance. The memory needs to remain reserved until the next Wasm call, e.g. until transform or set_param has been called. Its implementation will differ depending on which language you are working with, but it should not need to differ between modules of the same language. The Rust SDK contains an alloc function that you can call.
- set_param(ptr: unsignedInteger8) unsignedInteger8: this function is only required by modules that accept a set of parameters. As an input parameter it receives a single pointer to the start of a JSON byte array containing the parameters defined in the configuration file. It returns a pointer to either nil or an error message. It is called once, when the migration is defined in Defra (and on restart of the database). How it is implemented is up to you.
- transform() unsignedInteger8: this function is required by all Lens modules. It is the migration, and within this function you should define what the migration should do; in this example it will copy the data from the emailAddress field into the email field. Lens modules can call the next function zero to many times to draw documents from the Defra datastore, however modules used in schema migrations should currently limit this to a single call per transform call (Lens-based views may call it more or less frequently in order to filter or create documents).
- inverse() unsignedInteger8: this function is optional; you only need to define it if you wish to define the inverse migration. It follows the same pattern as the transform function, only you should implement it to do the reverse. In this example we want it to copy the value from the email field into the emailAddress field.
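To make the alloc requirement more concrete, here is a minimal sketch of how such a function could be written in Rust without the SDK. It is an assumption for illustration, not the SDK's code: the names `alloc_buffer` and `free_buffer` are hypothetical, and the alloc function in the Rust SDK may be implemented differently. The point it shows is that the allocation is deliberately leaked, so the memory stays reserved after the function returns and the caller can write into it.

```rust
/// Allocate `size` bytes and hand ownership to the caller as a raw pointer.
/// The memory stays reserved until `free_buffer` is called for it.
/// In a real Lens module this function would be exported under the name `alloc`.
pub fn alloc_buffer(size: usize) -> *mut u8 {
    // A boxed slice holds exactly `size` bytes, so it can be rebuilt later
    // from just the pointer and the size.
    let block: Box<[u8]> = vec![0u8; size].into_boxed_slice();
    // `into_raw` leaks the allocation instead of freeing it at the end of scope.
    Box::into_raw(block) as *mut u8
}

/// Release a block previously returned by `alloc_buffer`.
///
/// # Safety
/// `ptr` must come from `alloc_buffer(size)` with the same `size`,
/// and must not be used after this call.
pub unsafe fn free_buffer(ptr: *mut u8, size: usize) {
    // Rebuild the boxed slice so that dropping it returns the memory.
    unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, size))) };
}
```

In practice, calling the alloc function provided by the Rust SDK avoids having to maintain this kind of code yourself.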
Here is what our migration would look like if we were to write it in Rust:
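use std::collections::HashMap;
use std::error::Error;
use std::sync::RwLock;
use serde::Deserialize;
// `StreamOption` and its variants (`Some`, `None`, `EndOfStream`) come from the Lens Rust SDK,
// and `ModuleError` is an error type used by this module. Their imports and definitions are
// not shown in this snippet.

// `next` is the host function imported into the module.
#[link(wasm_import_module = "lens")]
extern "C" {
    fn next() -> *mut u8;
}

// The parameters passed to the module through the migration's configuration.
#[derive(Deserialize, Clone)]
pub struct Parameters {
    pub src: String,
    pub dst: String,
}

static PARAMETERS: RwLock<Option<Parameters>> = RwLock::new(None);

#[no_mangle]
pub extern fn alloc(size: usize) -> *mut u8 {
    lens_sdk::alloc(size)
}

#[no_mangle]
pub extern fn set_param(ptr: *mut u8) -> *mut u8 {
    match try_set_param(ptr) {
        Ok(_) => lens_sdk::nil_ptr(),
        Err(e) => lens_sdk::to_mem(lens_sdk::ERROR_TYPE_ID, &e.to_string().as_bytes())
    }
}

fn try_set_param(ptr: *mut u8) -> Result<(), Box<dyn Error>> {
    let parameter = lens_sdk::try_from_mem::<Parameters>(ptr)?;
    let mut dst = PARAMETERS.write()?;
    *dst = Some(parameter);
    Ok(())
}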
#[no_mangle]
pub extern fn transform() -> *mut u8 {
    match try_transform() {
        Ok(o) => match o {
            Some(result_json) => lens_sdk::to_mem(lens_sdk::JSON_TYPE_ID, &result_json),
            None => lens_sdk::nil_ptr(),
            EndOfStream => lens_sdk::to_mem(lens_sdk::EOS_TYPE_ID, &[]),
        },
        Err(e) => lens_sdk::to_mem(lens_sdk::ERROR_TYPE_ID, &e.to_string().as_bytes())
    }
}

fn try_transform() -> Result<StreamOption<Vec<u8>>, Box<dyn Error>> {
    let ptr = unsafe { next() };
    let mut input = match lens_sdk::try_from_mem::<HashMap<String, serde_json::Value>>(ptr)? {
        Some(v) => v,
        // Implementations of `transform` are free to handle nil however they like. In this
        // implementation we chose to return nil given a nil input.
        None => return Ok(None),
        EndOfStream => return Ok(EndOfStream)
    };

    // Clone the parameters out of the lock; fail if `set_param` has not been called yet.
    let params = PARAMETERS.read()?
        .clone()
        .ok_or("parameters have not been set")?;

    // Copy the value of the source field into the destination field.
    let value = input.get_mut(&params.src)
        .ok_or(ModuleError::PropertyNotFoundError{requested: params.src.clone()})?
        .clone();
    let mut result = input.clone();
    result.insert(params.dst, value);

    let result_json = serde_json::to_vec(&result)?;
    lens_sdk::free_transport_buffer(ptr)?;
    Ok(Some(result_json))
}
#[no_mangle]
pub extern fn inverse() -> *mut u8 {
    match try_inverse() {
        Ok(o) => match o {
            Some(result_json) => lens_sdk::to_mem(lens_sdk::JSON_TYPE_ID, &result_json),
            None => lens_sdk::nil_ptr(),
            EndOfStream => lens_sdk::to_mem(lens_sdk::EOS_TYPE_ID, &[]),
        },
        Err(e) => lens_sdk::to_mem(lens_sdk::ERROR_TYPE_ID, &e.to_string().as_bytes())
    }
}

fn try_inverse() -> Result<StreamOption<Vec<u8>>, Box<dyn Error>> {
    let ptr = unsafe { next() };
    let mut input = match lens_sdk::try_from_mem::<HashMap<String, serde_json::Value>>(ptr)? {
        Some(v) => v,
        // Implementations of `inverse` are free to handle nil however they like. In this
        // implementation we chose to return nil given a nil input.
        None => return Ok(None),
        EndOfStream => return Ok(EndOfStream)
    };

    // Clone the parameters out of the lock; fail if `set_param` has not been called yet.
    let params = PARAMETERS.read()?
        .clone()
        .ok_or("parameters have not been set")?;

    // Note: In this example `inverse` is exactly the same as `transform`, only the usage
    // of `params.dst` and `params.src` is reversed.
    let value = input.get_mut(&params.dst)
        .ok_or(ModuleError::PropertyNotFoundError{requested: params.dst.clone()})?
        .clone();
    let mut result = input.clone();
    result.insert(params.src, value);

    let result_json = serde_json::to_vec(&result)?;
    lens_sdk::free_transport_buffer(ptr)?;
    Ok(Some(result_json))
}
More fully coded example modules, including an AssemblyScript example can be found in our integration tests here: https://github.com/sourcenetwork/defradb/tree/develop/tests/lenses
and here: https://github.com/lens-vm/lens/tree/main/tests/modules
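Before compiling to Wasm, it can help to check the copy logic on its own, on the host, without the Lens engine. The sketch below is a simplified stand-in for the module above and makes several assumptions: documents are reduced to a map of strings rather than JSON values, and `Parameters`, `PropertyNotFound` and `copy_field` are hypothetical names local to this sketch rather than part of the Lens SDK.

```rust
use std::collections::HashMap;
use std::fmt;

/// A document, simplified to a flat map of field names to string values.
type Doc = HashMap<String, String>;

/// Hypothetical stand-in for the module's `src`/`dst` parameters.
struct Parameters {
    src: String,
    dst: String,
}

/// Hypothetical error for a field that is missing from the input document.
#[derive(Debug, PartialEq)]
struct PropertyNotFound {
    requested: String,
}

impl fmt::Display for PropertyNotFound {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "property not found: {}", self.requested)
    }
}

impl std::error::Error for PropertyNotFound {}

/// Copy the value of `from` into `to`, leaving the original field in place.
fn copy_field(input: &Doc, from: &str, to: &str) -> Result<Doc, PropertyNotFound> {
    let value = input
        .get(from)
        .ok_or_else(|| PropertyNotFound { requested: from.to_string() })?
        .clone();
    let mut result = input.clone();
    result.insert(to.to_string(), value);
    Ok(result)
}

/// Forward direction: copy `src` into `dst`.
fn transform(input: &Doc, p: &Parameters) -> Result<Doc, PropertyNotFound> {
    copy_field(input, &p.src, &p.dst)
}

/// Reverse direction: copy `dst` back into `src`.
fn inverse(input: &Doc, p: &Parameters) -> Result<Doc, PropertyNotFound> {
    copy_field(input, &p.dst, &p.src)
}
```

Keeping the field-copy logic in a small function like `copy_field` is one way to unit test both directions of a migration with ordinary Rust tests, leaving only the memory handling to be exercised through Wasm.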
We should then compile it to Wasm, and copy the resultant .wasm file to a location that the Defra node has access to. Make sure that the file is safe there: at the moment Defra will not copy it, and will refer back to that location on database restart.
Step Five, now that we have updated the collection and defined our migration, we need to tell Defra to use it. We do this by providing the source and destination schema IDs from our earlier defradb client schema describe call, and a configuration file defining the parameters we wish to pass to the migration:
defradb client schema migration set <The source schema ID> <The destination schema ID> '
  {
    "lenses": [
      {
        "path": <The path to your compiled `.wasm` binary from step four>,
        "arguments": {
          "src": "emailAddress",
          "dst": "email"
        }
      }
    ]
  }
'
Now the migration has been configured! Any documents committed under the original schema version will now be returned as if they were committed using the newer schema version.
As we have defined an inverse migration, we can give this migration to other nodes in our peer network that are still on the original schema version, and they will be able to query our documents committed using the new schema version by applying the inverse.
We can also change our active schema version on this node back to the original to see the inverse in action:
defradb client schema set-active <Original schema ID>
Now when we query Defra, any documents committed after the schema update will be rendered as if they were committed on the original schema version, with email field values being copied to the emailAddress field at query time.
Advantages
Here are some advantages of Lens as a schema migration system:
- Lens is not bound to a particular deployment, programming language, or interaction method. It can be used globally and is accessible to clients regardless of their location or infrastructure.
- Users can query on-demand even with different schema versions.
- Migration between different schemas is a seamless process.
Disadvantages
The Lens migration system also has some downsides, which include:
- Because migrations are evaluated lazily, errors may only surface later, when documents are queried through the migration.
- The Lens migration system is still a work in progress and is being developed under time constraints.
- The performance of the system is secondary, with more focus on overall functionality.
Future Outlook
The core problem with the Lens schema migration system today is performance when migrating schemas. For future versions, the following are being considered:
- Increasing the performance of the migration system.
- Making migrations easier to write.
- Expansion of the schema update system to include the removal of fields, not just adding fields.
- Enabling users to query the schema version of their choice on-demand.
- Support for eager evaluation.
- Implementing dry run testing for development and branching scenarios, and handling divergent schemas.