The Traditional Problem

In traditional data management, schema changes force immediate data migration. This creates friction and risk, and it often prevents organizations from evolving their data structures.

Traditional: Big Bang Migration

Schema v1 → Migration Required → Schema v2
  • Must touch every row immediately
  • Downtime or complex migration procedures
  • Risk of data loss or corruption
  • Blocks schema evolution - changes become expensive
  • All-or-nothing proposition

FERIN: Lazy Migration

Schema v1 (Data @ v1) → Optional → Schema v2 (Data @ v2)
  • Data remains valid at its current schema version
  • Migrate only when needed
  • Multiple schema versions coexist
  • Consumers choose when to upgrade
  • Schema evolution becomes routine, not exceptional
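
The lazy approach can be sketched in JavaScript as follows. This is a minimal illustration under stated assumptions, not FERIN's API: the `records` store, the `migrations` table, and the `readAt` function are hypothetical names, and the v1-to-v2 step (adding a `category` of "default") is an assumed example.

```javascript
// Hypothetical in-memory store: records at different schema versions coexist.
const records = new Map([
  ['a', { schemaVersion: 1, content: { name: 'Widget' } }],
  ['b', { schemaVersion: 2, content: { name: 'Gadget', category: 'tools' } }],
]);

// Assumed upgrade steps, keyed by the version they upgrade FROM.
const migrations = {
  1: (content) => ({ ...content, category: 'default' }),
};

// Return a record at the version the consumer asks for, upgrading a copy
// on the fly. The stored record is left untouched.
function readAt(id, wantedVersion) {
  const record = records.get(id);
  if (!record) return null;
  let { schemaVersion, content } = record;
  while (schemaVersion < wantedVersion) {
    const step = migrations[schemaVersion];
    if (!step) throw new Error(`no migration from v${schemaVersion}`);
    content = step(content);
    schemaVersion += 1;
  }
  // Downgrades are not supported in this sketch.
  if (schemaVersion !== wantedVersion) return null;
  return { schemaVersion, content };
}
```

A consumer that still expects version 1 calls `readAt('a', 1)` and sees the record unchanged, while a consumer that has upgraded calls `readAt('a', 2)` and receives the migrated shape. Neither call forces a rewrite of stored data.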

Concept vs. Concept Version

Understanding the distinction between concepts and concept versions is fundamental to schema evolution in FERIN. This two-level structure enables meaning to evolve while maintaining referential integrity.

Concept Plane (Meaning)

Concept: "Meter"

An abstract idea - the SI unit of length. The concept represents "what we mean" independent of how we define it at any point in time.

Version 1 (1889): Length of the prototype bar
Version 2 (1960): Wavelength of krypton-86
Version 3 (1983): Distance light travels in 1/299,792,458 second

Content Plane (Data)

urn:si:units:meter   m   valid
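
One way to picture the two planes is a single stable identifier that carries a list of dated versions. The sketch below is illustrative only: the object layout and the `definitionIn` helper are assumptions, and it reads the content-plane record above as an identifier, a symbol, and a status.

```javascript
// Hypothetical model: one stable concept identifier, many dated versions.
const meter = {
  id: 'urn:si:units:meter',
  symbol: 'm',
  status: 'valid',
  // Versions are assumed to be stored in ascending order of year.
  versions: [
    { version: 1, year: 1889, definition: 'Length of the prototype bar' },
    { version: 2, year: 1960, definition: 'Wavelength of krypton-86' },
    { version: 3, year: 1983, definition: 'Distance light travels in 1/299,792,458 second' },
  ],
};

// Find the version in force in a given year. References by id never change,
// only the version they resolve to.
function definitionIn(concept, year) {
  const inForce = concept.versions.filter((v) => v.year <= year);
  return inForce.length ? inForce[inForce.length - 1] : null;
}
```

Data that refers to `urn:si:units:meter` keeps a valid reference across all three definitions, which is the referential integrity the two-level structure is meant to preserve.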

When to Create a New Concept Version

Create a new version of an existing concept when:

  • The definition is clarified but the meaning remains fundamentally the same
  • Precision or detail is added without changing what the concept represents
  • Editorial improvements are made to the definition text
  • Additional context or examples are provided

When to Create a New Concept

Create a new concept (with a new identifier) when:

  • The fundamental meaning has changed
  • The scope or domain of application has shifted
  • The same term now refers to something different
  • You need to track both old and new meanings simultaneously

The Test

Would existing references to this concept become confusing or misleading after this change?

  • If no: Create a new version of the existing concept
  • If yes: Create a new concept with a supersession relation
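
The test above can be expressed as a small branching function. This is a sketch under assumptions: the `applyChange` helper, the `misleadsExistingReferences` flag, and the `supersedes` field are hypothetical names chosen for illustration, and the judgment behind the flag still has to be made by a person.

```javascript
// Hypothetical helper encoding "The Test": the caller answers the question
// and passes the answer in as change.misleadsExistingReferences.
function applyChange(concept, change) {
  if (!change.misleadsExistingReferences) {
    // Same meaning: append a version under the same identifier.
    const next = concept.versions.length + 1;
    return {
      ...concept,
      versions: [...concept.versions, { version: next, definition: change.definition }],
    };
  }
  // Changed meaning: create a new concept and record the supersession relation.
  return {
    id: change.newId,
    versions: [{ version: 1, definition: change.definition }],
    supersedes: concept.id,
  };
}
```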

The Provider-Consumer Contract Model

FERIN implements a contract model between data providers and consumers, not a strict enforcement model. This is fundamental to enabling lazy migration.

Register (Provider)

Provides

  • Schemas - Contracts that define data structure
  • Schema-ed data - Instances conforming to schemas
  • Version negotiation - Multiple schema versions available
  • Sunset notices - Timeline for deprecation

Contract

Consumer

Receives

  • Stability - Data remains queryable at a known schema version
  • Predictability - Know when versions will sunset
  • Control - Choose when to upgrade
  • Compatibility - Multiple versions coexist

Analogy: API Versioning for Data

This is similar to REST API versioning, but applied to data schemas:

API Versioning                         FERIN Schema Versioning
/api/v1/items                          /items?schemaVersion=1
/api/v2/items                          /items?schemaVersion=2
Old versions deprecated on timeline    Old schemas deprecated on timeline
Consumers choose upgrade timing        Consumers choose migration timing
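
Version negotiation with sunset notices might look like the following. This is a hedged sketch, not a FERIN interface: the `offered` table, the `negotiate` function, and the sunset date are all invented for illustration.

```javascript
// Hypothetical register state: the schema versions it serves and when each sunsets.
const offered = {
  1: { sunset: '2026-01-01' }, // assumed date, for illustration only
  2: { sunset: null },         // no deprecation planned
};

// Decide whether a request for a given schema version can be served.
// Dates are ISO 8601 strings (YYYY-MM-DD), which compare correctly as text.
function negotiate(requestedVersion, today) {
  const entry = offered[requestedVersion];
  if (!entry) return { ok: false, reason: 'unknown schema version' };
  if (entry.sunset && today >= entry.sunset) {
    return { ok: false, reason: 'schema version has been sunset' };
  }
  // Echo the sunset date so the consumer can plan its own upgrade.
  return { ok: true, schemaVersion: requestedVersion, sunset: entry.sunset };
}
```

Returning the sunset date with every successful response gives consumers the predictability the contract promises without forcing them to upgrade early.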

Implementation Strategies

Version Column Pattern

Store schema version as a field on each record.

-- Single table with version tracking
CREATE TABLE items (
  id UUID PRIMARY KEY,
  schema_version TEXT NOT NULL,
  content JSONB NOT NULL,
  status TEXT NOT NULL,
  valid_from TIMESTAMP,
  valid_until TIMESTAMP
);

-- Query specific schema version
SELECT * FROM items
WHERE schema_version = '1.0'
  AND status = 'valid';

-- Lazy migration: update one item at a time
UPDATE items SET
  schema_version = '2.0',
  content = jsonb_set(content, '{category}', '"default"')
WHERE id = ? AND schema_version = '1.0';
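
-- The version guard in the WHERE clause makes the update safe to repeat.
-- The JavaScript sketch below mirrors it with a hypothetical in-memory
-- stand-in for the items table; migrateItem is an illustrative name.

```javascript
// In-memory stand-in for the items table (hypothetical data).
const items = new Map([
  ['123', { schemaVersion: '1.0', content: { name: 'Widget' }, status: 'valid' }],
]);

// Mirrors: UPDATE ... WHERE id = ? AND schema_version = '1.0'
// Returns the number of rows changed, as a SQL driver typically reports.
function migrateItem(id) {
  const row = items.get(id);
  // Guard: skip rows that are missing or already migrated.
  if (!row || row.schemaVersion !== '1.0') return 0;
  row.content = { ...row.content, category: 'default' };
  row.schemaVersion = '2.0';
  return 1;
}
```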

Schema-per-Table Pattern

Maintain separate tables for each schema version.

-- Version 1 schema
CREATE TABLE items_v1 (
  id UUID PRIMARY KEY,
  name TEXT NOT NULL,
  description TEXT
);

-- Version 2 schema (adds category)
CREATE TABLE items_v2 (
  id UUID PRIMARY KEY,
  name TEXT NOT NULL,
  description TEXT,
  category TEXT NOT NULL  -- New required field
);

-- Query at version 1
SELECT * FROM items_v1 WHERE id = ?;

-- Migrate on-demand
INSERT INTO items_v2 (id, name, description, category)
SELECT id, name, description, 'default'
FROM items_v1 WHERE id = ?;
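
-- An application-side view of the on-demand copy, sketched in JavaScript.
-- The two Maps are hypothetical stand-ins for items_v1 and items_v2, and
-- getItemV2 is an illustrative name, not part of any FERIN interface.

```javascript
// Hypothetical stand-ins for the items_v1 and items_v2 tables.
const itemsV1 = new Map([
  ['123', { id: '123', name: 'Widget', description: null }],
]);
const itemsV2 = new Map();

// Serve a v2 row, copying it from v1 the first time it is requested.
function getItemV2(id) {
  if (itemsV2.has(id)) return itemsV2.get(id);
  const old = itemsV1.get(id);
  if (!old) return null;
  const migrated = { ...old, category: 'default' };
  itemsV2.set(id, migrated); // the v1 row stays in place for v1 consumers
  return migrated;
}
```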

Event Sourcing Pattern

Store schema changes as events, reconstruct state on demand.

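// Schema change events (item data is elided here)
const events = [
  { type: 'schema_defined', version: '1.0',
    fields: ['name', 'description'] },
  { type: 'schema_defined', version: '2.0',
    fields: ['name', 'description', 'category'] },
  { type: 'item_created', id: '123',
    schemaVersion: '1.0', data: { /* ... */ } },
  { type: 'item_migrated', id: '123',
    fromVersion: '1.0', toVersion: '2.0' }
];

// Reconstruct item at any schema version
function getItemAtVersion(id, targetVersion) {
  let item = null;
  for (const event of events) {
    if (event.id !== id) continue; // skip schema events and other items
    if (event.type === 'item_created') {
      item = { id, schemaVersion: event.schemaVersion, data: { ...event.data } };
    } else if (event.type === 'item_migrated' && item &&
               item.schemaVersion !== targetVersion &&
               item.schemaVersion === event.fromVersion) {
      // Apply migrations only until the target version is reached.
      // These events carry no payload; a real system would also transform data.
      item.schemaVersion = event.toVersion;
    }
  }
  // Return the state only if a path to the target version exists
  return item && item.schemaVersion === targetVersion ? item : null;
}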

Migration Decision Framework

Not all data needs immediate migration. Consider these factors:

Factor                 Migrate Soon                  Can Wait
Data usage             Frequently accessed           Rarely queried
Consumer requirements  Consumer needs new fields     Consumer still uses old schema
Validation             Old schema has issues         Old schema still valid
Sunset timeline        Old schema deprecating soon   No deprecation planned
Volume                 Small dataset                 Large dataset, batch migration
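
The factors above can be combined into a simple advisory function. This is only a sketch: the flag names and the `migrationAdvice` function are hypothetical, and the threshold (an approaching sunset, or any two other signals) is an assumed rule of thumb rather than something FERIN prescribes.

```javascript
// Each flag corresponds to one row of the table above; true means "Migrate Soon".
function migrationAdvice(factors) {
  const reasons = Object.entries(factors)
    .filter(([, migrateSoon]) => migrateSoon)
    .map(([name]) => name);
  // Assumed rule of thumb: deprecation pressure alone is enough,
  // otherwise require at least two signals.
  const soon = Boolean(factors.sunsetApproaching) || reasons.length >= 2;
  return { recommendation: soon ? 'migrate soon' : 'can wait', reasons };
}

// Example input: a rarely queried, large dataset on a schema that is still valid.
const advice = migrationAdvice({
  frequentlyAccessed: false,
  consumerNeedsNewFields: false,
  oldSchemaHasIssues: false,
  sunsetApproaching: false,
  smallDataset: false,
});
```

Returning the list of reasons alongside the recommendation keeps the decision explainable, which matters when different consumers weigh the factors differently.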
