Migrating NorthWind Database to a MultiValue Database - SQL Data-Types and MultiValue
The battle between traditional databases (RDBMS) and MultiValue databases have raged for years, and will likely continue well into the future. Because NoSQL-style databases are becoming more accepted by CEO and CIO for enterprise level datastores, the MultiValue model is being looked at very closely again.
In Part I of this series, I talked about the basics of translating the SQL CREATE TABLE command into MultiValue Files. In Part II, I talked about the flexibility of the MultiValue datastore, and when and where SQL subtables should and shouldn't be used. In this article, I'll talk about some of the other things to consider with the SQL CREATE TABLE.
Building the Database in MultiValue - Data-Type and Size
One of the main features of every MultiValue database implementation is that data-typing and size constraints are not required. While this allows any type and size of data to be saved in each field/column of a table without producing an error, there are times when it is important to consider the SQL data-types and how to translate them into MultValue DICT items.
Let's look at the different data-types found in [Figure 1]. You'll see several different data-types being used: INT, DATETIME, VARCHAR, DECIMAL, DOUBLE, and TINYINT. There are many other data-types available as well, but let's consider the basic ones.
/* Table: orders */ CREATE TABLE orders ( id INT NOT NULL, employee_id INT , customer_id INT , order_date DATETIME , shipped_date DATETIME , shipper_id INT , ship_name VARCHAR(50) , ship_address VARCHAR(250) , ship_city VARCHAR(50) , ship_state_province VARCHAR(50) , ship_zip_postal_code VARCHAR(50) , ship_country_region VARCHAR(50) , shipping_fee DECIMAL(19,4) NULL DEFAULT '0.0000', taxes DECIMAL(19,4) NULL DEFAULT '0.0000', payment_type VARCHAR(50) , paid_date DATETIME , notes VARCHAR(250) , tax_rate DOUBLE NULL DEFAULT '0', tax_status_id TINYINT , status_id TINYINT NULL DEFAULT '0', PRIMARY KEY (id) );
Figure 1
Some consider the SQL column data-type a function of data validation, but it does much more than that. In many SQL database, the data-type has to do with size of storage in addition to data validation. Both of these constraints are of limited value.
If you are unfamiliar with the SQL model, then imagine that each SQL record is a fixed length byte array. That means that each column consumes a specific number of bytes, even when empty. This is why it is so important that you setup an SQL data source correctly the first time and try not alter it at a later date because it requires a rebuild of all the data stored. Fixed width is messy that way.
In MultiValue, there is no issue with the size of the data. To get the advantage of data-types there are conversion codes. This is a more flexible alternative.
Since the traditional way to store data in MultiValue is in internal formats, the conversion codes allow the data to switch between internal format and external/output format without directly altering the data [Figure 2].
Data-type |
Conversion Code |
Notes |
VARCHAR(nn) |
No Conversion code on strings needed |
|
CHARACTER(nn) |
No Conversion code on strings needed |
|
INT |
MR0 |
|
TINYINT |
MR0 |
|
DECIMAL(19,4) |
MR4 |
|
DECIMAL(19,2) |
MR2 |
|
DOUBLE |
MR64 |
Double is supposed to have a decimal precision of 64. MultiValue systems will store it that data, but the database stored procedures can only handle math precision of 9. |
REAL |
MR64 |
See note on DOUBLE |
FLOAT(4) |
MR4 |
|
BOOLEAN |
Recommend data stored at '1' and '0' |
|
DATE |
D4- |
Dates should always be stored as MultiValue Internal date |
TIME |
MTS |
Times should always be stored as MultiValue Internal Time |
DATETIME, DATESTAMP |
This requires special handling. |
|
BLOB, RAW |
Blobs require special handling. |
Figure 2
Building the Database in MultiValue - DATETIME, DATESTAMP
The DATESTAMP data-type is often used to store a date and time in the same field. This can be done in a MultiValue database, but if you want to keep the data stored in an internal formation, then it will require a program, an A-Correlative, or F-Correlative.
Some Options:
- Store it in external format: YYYY-MM-DD HH:MM:SS
If you do it this way, then the data will be there, but if you want to internal MultiValue date range selection against it, you will have to create a virtual DICT (Correlative or I-Type) to convert the external data into internal data that supports ranges:
As an A-Correlative Virtual DICT item, you can isolate the Date from the DateStamp: A;((1(G0 1))(G1-1) : (1(G0 1))(G2-1) : (1(G0 1))(G0-1))(DI)
As you can see it is a little complex, but will work and once you have the virtual DICT created you won't have to do it again. - Store in Internal format using a Stored Procedure
If you choose to keep the datastamp in internal format, the common way to store is: {internaldate}*{internaltime} example: 12585*1144
The issue here is that to create this data, you will have to use a MultiValue Stored Procedure, but it allows you to sort and select date and time ranges very easily. You would still need to have a virtual DICT item for Date and Time, but that is much easier than in option 1:
Datestamp.Date = A;1(G0*1)
Datestamp.Time = A;1(G1*1)
Building the Database in MultiValue - BLOB
The BLOB data-type is really an afterthought data-type for modern SQL database. No database is really designed to handle BLOBs of data. It is very inefficient both for storing and retrieving.
Standard OS file systems are better places to store BLOB data. Although there are times when you want to store BLOB on a central server and link them to your data.
There are a few options:
- Store them as HEX or Base64 Strings and not worry about the efficiency.
- Store them on the database server's file system and create a file path link to the original blob that the application can retrieve later. (recommended)
- Create a blob chunked file to keep the BLOB data separate from the actual data, but still on the database.
Building the Database in MultiValue - DECIMAL/NUMBERS
MultiValue Databases will store the data in any format you want to do, but depending on what you want to do with those number within the database, you will run into limitations.
MultiValue databases are designed to store and process strings and integers very efficiently, but when it comes to decimal and floating point math, it chokes a little, comparatively.
The Store Procedure language of MultiValue database is very powerful, and can do things unheard of on other database platforms, but usually has a limitation when adding, subtracting, multiplying, and dividing numbers that have more than 4 decimals.
If you keep the number in internal format, which is usually looks like a very large integer, you will not have any problems. Although if you try to multiply "1 * 0.1234567" you will find your end product rounded to a precision of 4: ".1234"
There is an option to force the precision up to 9, but that is max precision the database can do internally. (Please note: Many MultiValue databases have ways around this limit to one degree or another, but they are functions of that specific flavor of MultiValue database)
As you can see, most data-types will translate just fine into a MultiValue Database. Other than some special handling when it comes to unique data-types, storing the different types of data from Northwinds database isn't a problem.