JSON via REST API (aka a RESTful API) is becoming a common offering from cloud providers for their clients to access data. The JSON format has the advantage of being self-describing: arrays, sets (objects), and attributes carry their own structure. However, it also has the disadvantage of being extremely verbose, with every field in every record
carrying its own definition. It's not uncommon for each HTTP call to take 100-200ms, which means you will max out at roughly 5-10 sequential calls per second. This is problematic for systems (retail, for example) where there are hundreds or thousands of transactions per second. Note that it's not unusual to receive multiple records per REST API call, but the payload is still very verbose. Compare this to a compressed file, or, even faster, a direct database connection, where data is kept in native formats such as binary and decimal. A direct database connection carries very little metadata, which keeps the data compact and means there is little transformation required on import.
Unpacking the JSON data is relatively easy with SQL Server's JSON functions, JSON_VALUE, JSON_QUERY, and OPENJSON, combined with CROSS APPLY. Essentially, every array (defined by square brackets) will become an SQL table, so each array within an array becomes a dependent (child) table. Each JSON attribute will become an SQL column, and each JSON set (defined by curly brackets) will flatten into columns whose names inherit the name of the set. A good REST API cloud provider will
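As a hedged sketch of that mapping (the document shape, attribute names, and column names below are invented for illustration, not from any particular provider), an order document with a nested line-item array might be shredded like this in T-SQL:

```sql
-- Illustrative only: @json and all names are assumptions.
DECLARE @json nvarchar(max) = N'{
  "orders": [
    { "orderId": 1,
      "customer": { "id": 42, "name": "Acme" },
      "lines": [
        { "sku": "A1", "qty": 3 },
        { "sku": "B2", "qty": 1 }
      ]
    }
  ]
}';

-- The outer "orders" array becomes the parent row set; the nested
-- "lines" array becomes child rows, one per line item, repeated
-- against their parent order via CROSS APPLY. The "customer" set
-- flattens into columns that inherit the set's name.
SELECT
    JSON_VALUE(o.value, '$.orderId')       AS orderId,
    JSON_VALUE(o.value, '$.customer.id')   AS customer_id,
    JSON_VALUE(o.value, '$.customer.name') AS customer_name,
    JSON_VALUE(l.value, '$.sku')           AS sku,
    JSON_VALUE(l.value, '$.qty')           AS qty
FROM OPENJSON(@json, '$.orders') AS o
CROSS APPLY OPENJSON(l.value, '$.lines') AS l;
```

Note that JSON_VALUE returns nvarchar by default, so you would normally CAST each column to the intended type, which is exactly why knowing the provider's data types (next point) matters.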
- Make it clear what the primary keys are for all arrays (since they become tables).
- Make it clear what the data types are for all attributes. JSON itself does not declare them, and if you guess wrong, it will bite you when an attribute arrives that is out of bounds.
- Give you the option of filtering records based on last updated/inserted datetimes (or high-water marks of some other kind). Otherwise, you will have to download and unpack all records to find out what has changed or been added.
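The high-water-mark idea above can be sketched on the consuming side like this (a hedged sketch: the staging table, column names, and the provider's `updated_since` parameter are all assumptions, since filtering support varies by API):

```sql
-- Track the highest last_updated value already loaded, so the next
-- REST call can request only new or changed records instead of
-- downloading and unpacking everything. Names are illustrative.
DECLARE @hwm datetime2 =
    (SELECT MAX(last_updated) FROM dbo.staging_orders);

-- The caller would then pass @hwm on the next request, e.g. as a
-- hypothetical query-string parameter:
--   GET /orders?updated_since=2024-01-01T00:00:00
-- and append only the returned rows to dbo.staging_orders.
SELECT @hwm AS next_updated_since;
```

If the provider offers no such filter, the fallback is to download everything and diff against what you already hold, which is exactly the cost the list item above warns about.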