Finding MV outside of MV
In the wide, wonderful, and sometimes wacky world of MultiValue (MV), we tend to see things as either MV or not MV. However, our perception does not dictate reality. The wheels of database technology continue to turn and things change in unexpected ways.
As a result, I propose a question that some might view as ridiculous at best, heresy at worst. Ready for the question? Buckle up — here goes: Might your next MultiValue database be … SQL?
Before you answer that — or send a hitman — let's establish some context.
SQL is Flat, MV has three dimensions
Since being named the winner of the Zumasys idea contest, I've come under a bit of fire for proposing that MV needs to be modified to extend beyond three dimensions. I had suggested that there could be significant benefits to embracing encodings like JSON which supports multidimensional structures to a practically unlimited depth. In addition to the dimensional support, JSON supports internationalization using UTF-8 out of the box and can encode the full Unicode character set if needed, it's easy to work with, lightweight to transport, and well supported throughout the entire world of technology. And hey, it already can support MultiValue-type data.
Unrelated to the merits or demerits of that idea, this became a flashpoint for others to share their ideas with me. What followed was a series of online and offline discussions about the potential benefits of adding more schema support to MV, collection nesting, storage and retrieval optimization, and a whole host of other great and crazy ideas. The idea of what MultiValue could be was evolving beyond adding dimensions.
This then started me thinking… is there anything in the wild now that has the benefits of MultiValue plus some of these other things, like strong schema support or nested collections? There's several NoSQL options out there besides us now. Common sense would dictate looking that way first.
Certainly, the SQLs of the world would not do the kinds of things that we're doing for, because hey, they're not MultiValue… right? Considering the various products that have built a MultiValue system running on SQL databases, maybe there is an SQL option that fits some or all of our wishlist.
Stuck in the Codd Model or Low ACID
When you compare most SQL implementations to MV, the differences are stark. Most SQL solutions, for all their billions of dollars of development, still insist that data is remarkably flat . Dr. Codd still rules the roost with these platforms as they remain steadfastly committed to first — and second and third — normal form table structures.
On the other end of the spectrum, many NoSQL products also vary significantly from MultiValue. For example, MongoDB and CassandraDB are both great for scalability, redundancy, and distributed data, but they veer farther away from the ACID compliance, structure and query-ability that both SQL and MultiValue provide.
Having said that, both SQL and NoSQL options have MultiValue beat hands-down when it comes to out-of-the-box connectivity. There are drivers available to connect just about everything to just about any database that isn't MultiValue.
In my opinion, that right there is a huge problem for MV. SQL databases also have very complex and tunable query planners to allow database designers the flexibility to optimize retrieval strategies. While we have options in MV for improving queries, sometimes a solution calls for more than just another index. There are plenty of ways to deal with this, but they involve rolling up our sleeves more often than not.
Post Fix?
If only there existed a database that allowed both the flexibility of NoSQL with the structure and index support of SQL, could house complex multidimensional and unstructured data, plus a healthy dose of open connectivity. Both structured and unstructured, living in perfect harmony. Now THAT would be a trick.
Yet, that is exactly what PostgreSQL intends to do. PostgreSQL is a SQL database with some MV-like leanings. It's not exactly SQL and it's not exactly MultiValue, but it's an interesting hybrid that does a lot of things we admire in both camps, and most of it is done very well. Now, there are readers that will wince just a little bit every time they read the word SQL, so for our purposes today, let's just call it Postgres.
Old and Reliable
One of the strong arguments for MultiValue is that it has been around the block and proven itself over and over again. Postgres isn't a newcomer either.
Postgres is an +30-year veteran of open source development that can be installed easily using the package manager of your favorite Linux distribution. It can be downloaded from postgresql.org and installed on Windows desktops and servers as well. It has a command line, like most SQL and MV databases, and in the spirit of SQL, all updates and queries are done using SQL data manipulation language (DML).
There are two intriguing features of Postgres that may be of interest to the MV world. First up is the ability to create compound data types using existing data types. This allows you to create new data types to represent a row that might be stored in MV as a multi-valued or sub-valued field. However, instead of being separated over multiple attributes, all the data for that row is kept together according to the data type as you've defined it.
Say, for instance, that you want to store the two lines of an address, plus the city, state, and zip together. This could be done by creating a new data type for the address. Let's call it "t_address":
CREATE TYPE t_address AS (
addr1 text,
addr2 text,
city text,
state text,
zip text
);
Once this has been created, it can then be used when creating a table. By adding one column of this type to the table, you're really adding 5 different subfields into that one column. You can then address each of the subfields directly when inserting or retrieving data from the record.
The second, and (I think) more interesting feature of Postgres as it pertains to MV developers is the array data modifier . In short, any field anywhere can be stored as an array — i.e. multiple values — all in the same field. This is where Postgres thumbs its nose at Dr. Codd's rules of normal form, embracing the wide world of MultiValue.
Using our t_address example earlier, one could create a table with a simple address like this:
CREATE TABLE customer (
name text,
address t_address
);
To convert this into a multi-valued list of addresses, one simply needs to add "[]" to the end of the t_address data type, like this:
CREATE TABLE customer (
name text,
address t_address[]
);
Independently, these two features are quite powerful. Used together, however, they vault Postgres into the realm of being a database for everybody — whether flat or dimensional.
All this is fine for data that require structure, but what if you need some additional thing that is more free-form? In that case, Postgres has two data types that allow JSON storage, right in the row as a first-class column. The "json" data type stores raw JSON that must be parsed to be used, whereas the "jsonb" data type stores a binary, pre-parsed representation of the structure that can not only be queried like regular fields, values deep inside of the JSON can be indexed and leveraged in query planning and execution.
There is much more to Postgres than just this, but I find these to be particularly interesting features to those of us with long careers in any flavor or flavors of MultiValue.
I am by no means a Postgres cheerleader. I want you to see the good and the bad. With that in mind, let's take a peek at the dark underbelly of Postgres, at least in terms our MultiValue perspective.
First, because Postgres is fundamentally SQL-based, it uses data manipulation language constructs like INSERT and UPDATE to do updates. This may be great for flat tables but it is really pretty awful for doing multi-valued nested inserts and updates from the command line.
Removing a specific value from a multi-valued array is not a standard feature of the DML, but it can be accomplished with a PLSQL procedure. Interestingly, you can rewrite an entire "attribute" with nested structures intact, not unlike what we might do with a WRITEV on a MV record. However, updating specific data elements in nested structures requires some pretty tricky SQL chicanery.
Speaking of PLSQL, Postgres has its own procedural language. Or to be more exact, there are four supported procedural languages (including Python) plus an entire framework for adding in additional languages for writing stored procedures. Of course, none of these are mvBASIC, but it would be an interesting study to see if the procedure language framework could be used to bridge that gap. Another time, perhaps?
Spectrum has already published code [ http://www.intl-spectrum.com/article/r992/ ] to bring parts of the OCONV to python. That would be one way to bridge the gap. JBASE and MVON# cross-compile mvBASIC to other languages. Someone could make a cross-compiler to convert mvBASIC to python or another supported language, allowing us to port mvBASIC programs with little or no changes.
To be clear, SQL procedural languages aren't like mvBASIC in that you don't write the entire application in SQL languages. The business logic typically exists in some separate repository, whether that be written in C#, PHP, Go, Dart, JavaScript, or any number of other languages in the wild. But in recent years, more work is being done outside of the MV environment as more of the front end moves to the web and apps. And since python is on both lists, procedural and application...
The important takeaway from all this is simply to acknowledge that MultiValue isn't as far outside of the mainstream as we might otherwise think. Both NoSQL and PostgreSQL support features that are important to us, while not sacrificing the fundamentals that have made them successful. When it comes to multi-valued data, the mainstream has now come to us.
More importantly, the more MultiValue people that get involved with products like this, the more likely it is that we can take the awesomeness that we know as MV and use it to influence the world. <insert_evil_laugh_here/>.