
Business Tech: AI and DSS: Part II



Dystopia is the term used for any number of bleak, failed societies. In science fiction, computers, robots, and evil algorithms - AI (Artificial Intelligence) and DSS (Decision Support Systems) being two examples - are often to blame. The emphasis is generally on dehumanizing us by reducing our society to the data which describes it. Personally, I blame SQL.

To be serious, SQL is designed around the idea that optimizing data means optimizing the digital use of data. It is not designed for you, my organic friend.

What if we take a more human approach to data? Well, if you want data that has a human touch, we have plenty of options. NVP (Name-Value Pairs) offer a readable - human readable - label with each jot of data. Formalize that a bit and you are in the realm of XML and JSON. These three are certainly not machine optimized. They are people-centric, focusing on clarity to the reader over mathematical minimalism. If the AI uprising is your fear, your best defense is to skew the rules toward… well… us.
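To make the comparison concrete, here is a minimal Python sketch that renders one small record as NVP, JSON, and XML. The record, its field names, and the helper functions are all invented for illustration; only the standard library is assumed.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical contact record, invented for this example.
record = {"name": "Pat Example", "city": "Denver", "phone": "555-0100"}

def to_nvp(rec):
    """Render the record as one name=value pair per line."""
    return "\n".join(f"{key}={value}" for key, value in rec.items())

def to_json(rec):
    """Render the record as a JSON object."""
    return json.dumps(rec)

def to_xml(rec, tag="contact"):
    """Render the record as XML, one child element per field."""
    root = ET.Element(tag)
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

print(to_nvp(record))
print(to_json(record))
print(to_xml(record))
```

In all three outputs every value travels with its label, which is exactly the overhead a machine-optimized format would strip out.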

What About MultiValue?

MultiValue sits in the middle. I once heard, and often quote, Mike Ruane as saying that MultiValue is compressed XML. We use positions instead of labels, but we bring a structure that is more eyeball friendly than SQL. Think of it this way: I have to transform SQL data to share it. It has to become tab-separated, or XML, or some other decidedly non-SQL thing before it can move. Generally, this isn't just swapping columns for commas. SQL data is spread out and has to be unified and essentially re-architected before it can be transportable.
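As a rough illustration of that re-architecting step, the Python sketch below uses an in-memory SQLite database. The table names, columns, and sample rows are assumptions made up for the example. It shows that a single invoice has to be gathered from two tables and reassembled before it can be shipped as one thing.

```python
import sqlite3

def build_invoice(conn, invoice_id):
    """Gather one invoice from its header and line tables into a single document."""
    header = conn.execute(
        "SELECT id, customer FROM invoice WHERE id = ?", (invoice_id,)
    ).fetchone()
    lines = conn.execute(
        "SELECT sku, qty FROM invoice_line WHERE invoice_id = ? ORDER BY line_no",
        (invoice_id,),
    ).fetchall()
    return {
        "id": header[0],
        "customer": header[1],
        "lines": [{"sku": sku, "qty": qty} for sku, qty in lines],
    }

# Hypothetical schema and data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE invoice_line (invoice_id INTEGER, line_no INTEGER, sku TEXT, qty INTEGER);
INSERT INTO invoice VALUES (1001, 'Acme');
INSERT INTO invoice_line VALUES (1001, 1, 'WIDGET', 2);
INSERT INTO invoice_line VALUES (1001, 2, 'GADGET', 5);
""")

invoice = build_invoice(conn, 1001)
print(invoice)
```

Two queries and a reshaping step for a two-table invoice; a real schema with addresses, taxes, and payments would need more of each.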

Given that moving data, dissecting data, and assembling data are a big part of what we do, having a database that can't do any of that easily is an odd choice. Unless, of course, you are in the thrall of the metal ones. MultiValue pays attention to speed, but it also has its bags packed at all times. Any modern developer who can tease data out of a comma-separated file can handle a string with @AM delimiters. Tell them the @VMs are embedded sub-strings and they'll probably be just fine with those delimiters as well. If we must dress it up for travel, subbing @AM to comma and @VM to pipe is often enough. When it comes to speed, the less we handle the data, the faster we can ship it.
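For a developer coming from outside MultiValue, the idea can be sketched in a few lines of Python. This assumes the conventional character values for the marks (@AM as character 254, @VM as character 253); the sample record and function names are invented for illustration.

```python
AM = chr(254)  # attribute mark, assumed to be character 254
VM = chr(253)  # value mark, assumed to be character 253

def parse_record(raw):
    """Split a delimited record into attributes, each a list of values."""
    return [attr.split(VM) for attr in raw.split(AM)]

def dress_for_travel(raw, am_sub=",", vm_sub="|"):
    """Swap the marks for printable delimiters (naive: assumes the data
    itself contains no commas or pipes)."""
    return raw.replace(AM, am_sub).replace(VM, vm_sub)

# Hypothetical record: invoice id, customer, three contact names.
raw = AM.join(["1001", "Acme", VM.join(["Pat", "Lee", "Sam"])])

print(parse_record(raw))      # [['1001'], ['Acme'], ['Pat', 'Lee', 'Sam']]
print(dress_for_travel(raw))  # 1001,Acme,Pat|Lee|Sam
```

The swap is only "often enough": if a value can contain a comma or a pipe, the travel format needs quoting or different substitutes.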

Thingedness

XML, JSON, and MultiValue also have another critical edge when it comes to readability: Thingedness. This is the term I coined to describe the ideal relationship between data and the user of the data. Here's where columnar databases and SQL databases fail the thingedness test: Can you point to a single record and associate it with a common, real-world thing? My XML, JSON, NVP, or MultiValue INVOICE file can have the entirety of an invoice in each record. One read equals one invoice. That's something a non-database person can grasp: one hundred invoices equals one hundred records.

While there are reasons to not do this — many excellent reasons — the closer your data gets to this model, the easier it is for the programmer, the user, and the architect to keep the entire data model in their head. As you approach thingedness, you approach clarity of concept. The data world has more in common with the human one.

With XML, JSON, and MultiValue, thingedness is achievable. The big difference between the three is that the first two have to be transformed to be used. MultiValue can choose to unpack its bags, but a MultiValue string is always ready to work.

Some of the Excellent Reasons

SQL is the extreme counter-argument to thingedness. It is based on the premise that the more you break something down, the better you can control it and account for it. There is merit to this approach if you are concerned with scaling up the size of your data. However, the more the complexity of your data scales up, the worse this idea becomes. There is a reason Google uses NoSQL to manage search. There is a reason that Facebook uses NoSQL.

Still, SQL's popularity isn't random. For some jobs, the rules of SQL are the most rational ones. A good example is tool building. It is easier to generalize a tool, for reporting or analytics, when all data has a rigid uniformity of storage. The less creative the structures are, the easier it is to make new tools.

Moreover, forcing the table designer to specify field types and lengths helps keep the design focused on the use and intent of the data. Free-form data can often result in sloppy design. Working in SQL makes me a better NoSQL architect.

So, please don't damn the methodology out of hand. It has its place. Not every place, but I wouldn't want a pure thingedness database, either.

Where the Pendulum Stops

What we are looking for here is an acceptable level of atomicity. Simply put, we want to break things down just enough.

The middle is where the winners want to be. Reasonable control, but not the OCD of SQL. Reasonable thingedness, but not a rigid mandate to mirror the structures of the world. SQL doesn't do middle. Columnar doesn't do middle. XML and JSON can do middle, but they can't be operated upon directly for complex tasks.

MultiValue can do middle. We can create an invoice header record, with unified data, and split the details, each to their own record. We can keep multiple values in the header efficiently: Three contact names? No problem. Only one on the next one? No wasted space.
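Here is one way that middle ground might look, sketched in Python with dictionaries standing in for files. The file names, record layouts, and key scheme (invoice id, an asterisk, then a line number) are assumptions invented for the example, as are the character values used for the marks.

```python
AM, VM = chr(254), chr(253)  # attribute and value marks (assumed values)

# Hypothetical header file: customer, contact names, detail record ids.
INVOICE = {
    "1001": AM.join(["Acme", VM.join(["Pat", "Lee", "Sam"]), VM.join(["1001*1", "1001*2"])]),
    "1002": AM.join(["Bolt Co", "Kim", "1002*1"]),
}

# Hypothetical detail file: one record per invoice line (sku, quantity).
INVOICE_DETAIL = {
    "1001*1": AM.join(["WIDGET", "2"]),
    "1001*2": AM.join(["GADGET", "5"]),
    "1002*1": AM.join(["WIDGET", "1"]),
}

def read_invoice(invoice_id):
    """One header read, then one keyed read per detail line."""
    customer, contacts, detail_ids = INVOICE[invoice_id].split(AM)
    details = [INVOICE_DETAIL[key].split(AM) for key in detail_ids.split(VM)]
    return {
        "customer": customer,
        "contacts": contacts.split(VM),
        "details": details,
    }

print(read_invoice("1001")["contacts"])  # ['Pat', 'Lee', 'Sam']
print(read_invoice("1002")["contacts"])  # ['Kim']
```

The header with three contacts and the header with one use the same attribute; the shorter list simply takes less room, with no empty columns left behind.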

This is the balance between the AI/DSS view of data and the human view. We can scale in complexity because we can make decisions in our architecture and applications to treat elements of our data in sane ways.

How Does This Relate to AI and DSS?

As you saw last issue in the Animals program, we needed to construct the growing AI data in a way that favors decision trees. The less efficiently we implement, the slower our program will get as it matures. The infant version, the one with just a few starter animals, will always be faster than the adult, with its extensive zoo. Here, though, we aren't worried about relative speed; we are worried about being fast enough to keep the user feeding the program. The game Animals doesn't grow if no one plays.
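This is not the Animals program from last issue, only a minimal Python sketch of the general idea, with a node layout, ids, and function names invented for illustration. Each node is stored under its own key, so answering a question costs one keyed read no matter how large the zoo becomes.

```python
# Hypothetical tree: a node is either a question with yes/no branches
# or a leaf that names an animal.
tree = {
    "1": {"question": "Does it live in water?", "yes": "2", "no": "3"},
    "2": {"animal": "fish"},
    "3": {"animal": "dog"},
}

def guess(tree, answers, node_id="1"):
    """Follow True/False answers from the root; return the id of the leaf reached."""
    answers = iter(answers)
    while "question" in tree[node_id]:
        node_id = tree[node_id]["yes" if next(answers) else "no"]
    return node_id

def learn(tree, leaf_id, new_animal, question, new_is_yes):
    """Turn a wrong leaf into a question that separates the old and new animals."""
    old_animal = tree[leaf_id]["animal"]
    yes_id, no_id = leaf_id + "y", leaf_id + "n"
    tree[yes_id] = {"animal": new_animal if new_is_yes else old_animal}
    tree[no_id] = {"animal": old_animal if new_is_yes else new_animal}
    tree[leaf_id] = {"question": question, "yes": yes_id, "no": no_id}

leaf = guess(tree, [False])                      # reaches the "dog" leaf
learn(tree, leaf, "bird", "Does it fly?", True)  # the player was thinking of a bird
print(tree[guess(tree, [False, True])]["animal"])
```

The number of reads per game grows with the depth of the tree rather than the number of animals, which is what keeps the adult version fast enough to stay playable.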

As Nathan discusses in another article in this issue, the closer you put the data interaction to the data, the better your speed. Additionally, as my dad would point out, the more parts, the more that can break. Keeping the programming close to the data requires fewer transformations, less network bandwidth, and fewer steps. That makes it faster and less fragile.

When we implement DSS or AI, we are talking about extensive data. If we are being really smart about it, we are also expecting that data to keep growing. Real AI and DSS should eventually perform successfully outside of the original parameters. If you planned everything it does, it is more of a performing bear than a critical thinker.

Your choice of data storage matters. Your choice of programming language matters. With enough time, trouble, effort, and money, you might be able to make a pig sing, but starting with a singer is probably a wiser move. Understanding the underlying effects of your choices raises you above decisions like, "Well, it was the only language I knew, so I wrote everything in Whitespace." Picking tools responsibly? That's real intelligence.

 
