A detailed description of the CEJSON JSON Parser.
A JSON object string is bracketed by curly braces and contains a list of name-value pairs; it can contain just one pair. A name-value pair is a string (the name) and a value separated by a colon. List elements are comma separated. A value is a literal (string, integer, float, Boolean or null) or a “sub”-object.
Example of a JSON Object string:
{"id":39,"sensor":"Temperature","value":78}
This would be record #39; the sensor is Temperature and its value is 78.
A JSON array string is bracketed by square brackets. It is an array of values separated by commas. A value is a literal (string, integer, float, Boolean or null) or an object, where an object is itself a set of name-value pairs, each consisting of a name (string) and a value (here, a literal).
Example of a JSON Array string:
This would be an array of three sensor records.
An HTTP POST or PATCH (update) to an Azure Mobile Service (AzMS) table returns a single JSON object string as the response if successful. For a POST this is the record for the new object added to the table; for a PATCH it is a record of only the updated fields. A GET returns an array of records, each a set of name-value pairs, depending upon the filter used. The filter can “select” which records are returned as well as which fields. An HTTP DELETE does not return a JSON string, only an indication of success or failure.
The string representing the array of records returned from an AzMS GET query will therefore be bracketed by square brackets and contain a comma-separated list of records. Each record is bracketed by curly braces and contains comma-separated name-value pairs, the name and value of each pair being separated by a colon. The name is the first item of each pair and is a string, whereas the value is a literal. No nested objects are returned as values (this is permissible in JSON but not used by AzMS).
Strings are enclosed in double quotes (a double quote inside a string is written with the C-style escape \") and may contain other escape characters. The characters are actually Unicode characters. For simplicity, nested objects and arrays are not implemented in CEJSON; that is, they are assumed not to be returned as values from an AzMS table query when the JSON string is being parsed.
Note that there is no DateTime JSON data type. AzMS does generate datetimes as strings, so it would be useful for CEJSON to recognise these string values and parse them into a DateTime struct (ToDo). Also, version 2 AzMS tables use a GUID string as the primary key. This could be parsed into an array of 16 bytes (a GUID is 128 bits, written as 32 hexadecimal digits). CEJSON, though, is focused on version 1 AzMS tables, which use an auto-incremented integer as the key field, so processing of GUIDs is not required.
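As an illustration of the GUID point above, here is a minimal sketch of turning a version 2 key string into bytes. A GUID is 128 bits, i.e. 16 bytes, usually written as 32 hexadecimal digits separated by hyphens. ParseGuidString() is a hypothetical helper, not part of CEJSON; it assumes the hyphenated form without braces and keeps the bytes in the order the hex digits appear, which is not the mixed-endian layout of the Windows GUID struct.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int HexDigit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: parse e.g. "6F9619FF-8B86-D011-B42D-00C04FC964FF"
   into 16 bytes in textual order. Hyphens are skipped wherever they appear.
   Returns false unless the string holds exactly 32 hex digits. */
bool ParseGuidString(const char *s, unsigned char out[16])
{
    size_t nibbles = 0;
    for (; *s != '\0'; s++) {
        if (*s == '-') continue;
        int v = HexDigit(*s);
        if (v < 0 || nibbles >= 32) return false;
        if (nibbles % 2 == 0) out[nibbles / 2] = (unsigned char)(v << 4);  /* high nibble */
        else out[nibbles / 2] |= (unsigned char)v;                        /* low nibble  */
        nibbles++;
    }
    return nibbles == 32;
}
```

Because only 16 bytes are stored instead of a 36-character string, this is the kind of saving the text has in mind, should version 2 tables ever need supporting.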
Numbers can be integer or float. When parsing, the characters of a value are first concatenated into a string. When the string is terminated by a comma or right brace, it is parsed to determine its literal type. If it is true or false its data type is Boolean, whereas if it is null then that is its data type. The CEJSON parser is case insensitive for these literals. Number values are assumed to be integer until a decimal point is encountered in the string, at which point the value is parsed as a float. Exponential notation is not implemented in CEJSON although it is valid JSON syntax. Hexadecimal and octal notations are not defined for JSON.
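The end-of-value step just described can be sketched as follows, assuming the unquoted value text has already been collected into a NUL-terminated buffer. Literal, LiteralType and FinalizeValue() are hypothetical names rather than CEJSON's, and the standard strtol()/strtof() functions stand in for whatever conversion CEJSON actually uses.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type tags for this sketch; CEJSON's own DataType enum is not shown. */
typedef enum { LIT_INVALID, LIT_INTEGER, LIT_FLOAT, LIT_BOOLEAN, LIT_NULL } LiteralType;

typedef struct {
    LiteralType type;
    union { bool b; long i; float f; } v;
} Literal;

/* Case-insensitive comparison, since true/false/null are matched in any case. */
static bool EqualsNoCase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return false;
        a++;
        b++;
    }
    return *a == *b;
}

/* Convert the text of an unquoted value, collected up to its terminating
   ',' or '}', into a typed literal. Integer unless a '.' is present;
   anything else (exponents, hex, stray characters) gives LIT_INVALID. */
Literal FinalizeValue(const char *text)
{
    Literal lit = { LIT_INVALID };
    char *end;

    if (EqualsNoCase(text, "true"))  { lit.type = LIT_BOOLEAN; lit.v.b = true;  return lit; }
    if (EqualsNoCase(text, "false")) { lit.type = LIT_BOOLEAN; lit.v.b = false; return lit; }
    if (EqualsNoCase(text, "null"))  { lit.type = LIT_NULL; return lit; }

    /* Only a sign, digits and a decimal point are accepted for numbers. */
    if (text[0] == '\0' || text[strspn(text, "-0123456789.")] != '\0') return lit;

    if (strchr(text, '.') != NULL) {
        lit.v.f = strtof(text, &end);
        if (end != text && *end == '\0') lit.type = LIT_FLOAT;
    } else {
        lit.v.i = strtol(text, &end, 10);
        if (end != text && *end == '\0') lit.type = LIT_INTEGER;
    }
    return lit;
}
```

For the example record, FinalizeValue("78") gives an integer 78, while "21.5" would give a float and "NULL" a null; "1e5" is rejected because exponents are not supported.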
Actually, the first character of a value string determines its data type except for differentiating between integers and floats:
First character     Datatype
" (double quote)    String
N or n              Null
T or t              Boolean
F or f              Boolean
Any digit           Integer (could be float)
The parser then continues with the rest of the value string, completing the determination of the final value and validating it.
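The first-character table amounts to a single switch. A minimal sketch follows; FirstCharType and TypeFromFirstChar() are hypothetical names, and the integer-or-float decision is deliberately left until the end of the value, as described above.

```c
#include <ctype.h>

/* Hypothetical provisional types decided from the first character only. */
typedef enum { FIRST_STRING, FIRST_NULL, FIRST_BOOLEAN, FIRST_NUMBER, FIRST_UNKNOWN } FirstCharType;

/* Map the first character of a value to its provisional data type.
   FIRST_NUMBER starts as an integer and may later turn out to be a float. */
FirstCharType TypeFromFirstChar(char c)
{
    switch (c) {
    case '"':           return FIRST_STRING;
    case 'N': case 'n': return FIRST_NULL;
    case 'T': case 't':
    case 'F': case 'f': return FIRST_BOOLEAN;
    default:
        return isdigit((unsigned char)c) ? FIRST_NUMBER : FIRST_UNKNOWN;
    }
}
```

The remaining characters then only need validating against the chosen type (for example "rue" after a 't'), which is what the following sentence in the text refers to.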
The parsed name-value pairs are stored in a one-dimensional dynamic array of structs on the heap. The array size is increased as more records are received, so the number of records that a GET can return is not fixed, which is useful because the table size can grow and shrink over time. The one-dimensional array is then viewed as an array of records, each containing the same number of name-value pairs. This number is not hard-coded but is determined during parsing from the number of name-value pairs in each record (i.e. the number between each matching pair of curly braces). The name-value struct used for all entities, whether the record id, its sensor or its sensor value, is:
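```c
#include <windows.h>   /* BOOL, MAX_PATH */

/* DataType is the enum of basic data types described below (declared elsewhere). */
struct NameValue
{
    char  Name[MAX_PATH];         /* the name, e.g. "sensor"              */
    char  StringValue[MAX_PATH];  /* raw text of the value while parsing  */
    union                         /* typed value once parsing is complete */
    {
        BOOL  BooleanValue;
        int   IntegerValue;
        float FloatValue;
    };
    DataType DType;               /* which union member is valid          */
};
```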
DataType is simply an enum of the basic data types (string, integer, float, Boolean and null).
All values are accumulated in StringValue until complete and are then converted into their respective value type if non-integer.
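The growing one-dimensional array and its record view can be sketched as below. This is not CEJSON's code: PairList, PairListAppend(), PairListAt() and PairListRecordCount() are hypothetical names, the struct is trimmed to the two string fields, a 64-byte buffer stands in for MAX_PATH, and doubling the capacity is an assumed growth policy (the text only says the array is enlarged as records arrive).

```c
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 64   /* stands in for MAX_PATH in this sketch */

typedef struct {
    char Name[NAME_LEN];
    char StringValue[NAME_LEN];
} Pair;               /* trimmed-down stand-in for struct NameValue */

typedef struct {
    Pair  *items;           /* one-dimensional array on the heap      */
    size_t count;           /* pairs stored so far                    */
    size_t capacity;        /* pairs allocated                        */
    size_t pairsPerRecord;  /* set when the first record's '}' is seen */
} PairList;

/* Append one pair, doubling the heap allocation when full.
   Returns 0 on success, -1 if realloc fails (the old array stays valid). */
int PairListAppend(PairList *list, const char *name, const char *value)
{
    if (list->count == list->capacity) {
        size_t newCap = list->capacity ? list->capacity * 2 : 8;
        Pair *grown = realloc(list->items, newCap * sizeof *grown);
        if (grown == NULL) return -1;
        list->items = grown;
        list->capacity = newCap;
    }
    Pair *p = &list->items[list->count++];
    strncpy(p->Name, name, NAME_LEN - 1);
    p->Name[NAME_LEN - 1] = '\0';
    strncpy(p->StringValue, value, NAME_LEN - 1);
    p->StringValue[NAME_LEN - 1] = '\0';
    return 0;
}

/* View the flat array as records: field f of record r, or NULL if out of range. */
const Pair *PairListAt(const PairList *list, size_t r, size_t f)
{
    if (list->pairsPerRecord == 0 || f >= list->pairsPerRecord) return NULL;
    size_t i = r * list->pairsPerRecord + f;
    return i < list->count ? &list->items[i] : NULL;
}

/* Number of complete records currently held. */
size_t PairListRecordCount(const PairList *list)
{
    return list->pairsPerRecord ? list->count / list->pairsPerRecord : 0;
}
```

A caller would start from a zeroed PairList, append each completed pair, and set pairsPerRecord to count when the first record closes; record r, field f is then simply element r × pairsPerRecord + f of the flat array.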
The parser operates as a state machine: it consumes one character per function call and maintains the current state between calls. Each character is either added to the current entity’s string (the name or value string) or, if it is one of the pivotal characters below, is the expected character that changes the state:
Character   Effect
[           Start array
]           End array
{           Start record
}           End record
:           End name, start value
,           New name-value pair: start name
"           Start or end name; start or end of a string value
The state is maintained in a global variable of enum type Expecting. The name reflects the fact that each state indicates what the parser is expecting next.
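A cut-down version of such a one-character-per-call state machine is sketched below for the flat records that AzMS returns. The state names, the globals, ParseChar() and ResetParser() are hypothetical (CEJSON’s own Expecting enumerators are not reproduced here). Names are skipped rather than stored, string escapes are not handled, and only the current value’s text is kept, so the sketch shows the transitions rather than the storage.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state names; CEJSON's actual Expecting enumerators are not shown. */
typedef enum {
    EXPECT_ARRAY_OR_RECORD, /* '[' (GET) or '{' (POST/PATCH response) */
    EXPECT_RECORD,          /* '{' inside the array, or ']'            */
    EXPECT_NAME,            /* '"' opening a name                      */
    IN_NAME,                /* name characters up to the closing '"'   */
    EXPECT_COLON,           /* ':' between name and value              */
    EXPECT_VALUE,           /* '"' or the first character of a literal */
    IN_STRING_VALUE,        /* string characters up to the closing '"' */
    IN_LITERAL_VALUE,       /* literal characters up to ',' or '}'     */
    EXPECT_COMMA_OR_BRACE,  /* ',' or '}' after a string value         */
    EXPECT_NEXT_RECORD,     /* ',' or ']' after a record               */
    PARSE_DONE,
    PARSE_FAILED
} Expecting;

/* Global state kept between calls. */
static Expecting state = EXPECT_ARRAY_OR_RECORD;
static bool inArray;
static size_t pairCount, recordCount;
static char value[64];          /* text of the current value */
static size_t valueLen;

static void AppendValue(char c)
{
    if (valueLen + 1 < sizeof value) { value[valueLen++] = c; value[valueLen] = '\0'; }
}

static void EndRecord(void)
{
    recordCount++;
    state = inArray ? EXPECT_NEXT_RECORD : PARSE_DONE;
}

void ResetParser(void)
{
    state = EXPECT_ARRAY_OR_RECORD;
    inArray = false;
    pairCount = recordCount = valueLen = 0;
    value[0] = '\0';
}

/* Consume exactly one character; returns false once the input is invalid. */
bool ParseChar(char c)
{
    bool ws = (c == ' ' || c == '\t' || c == '\r' || c == '\n');
    if (ws && state != IN_NAME && state != IN_STRING_VALUE)
        return state != PARSE_FAILED;        /* whitespace between tokens is skipped */

    switch (state) {
    case EXPECT_ARRAY_OR_RECORD:
        if (c == '[') { inArray = true; state = EXPECT_RECORD; }
        else state = (c == '{') ? EXPECT_NAME : PARSE_FAILED;
        break;
    case EXPECT_RECORD:
        state = (c == '{') ? EXPECT_NAME : (c == ']') ? PARSE_DONE : PARSE_FAILED;
        break;
    case EXPECT_NAME:
        state = (c == '"') ? IN_NAME : PARSE_FAILED;
        break;
    case IN_NAME:                            /* a full parser would store c in the name */
        if (c == '"') state = EXPECT_COLON;
        break;
    case EXPECT_COLON:
        state = (c == ':') ? EXPECT_VALUE : PARSE_FAILED;
        break;
    case EXPECT_VALUE:
        valueLen = 0; value[0] = '\0';
        if (c == '"') state = IN_STRING_VALUE;
        else if (c == ',' || c == '}') state = PARSE_FAILED;
        else { AppendValue(c); state = IN_LITERAL_VALUE; }
        break;
    case IN_STRING_VALUE:
        if (c == '"') state = EXPECT_COMMA_OR_BRACE;
        else AppendValue(c);
        break;
    case IN_LITERAL_VALUE:
    case EXPECT_COMMA_OR_BRACE:
        if (c == ',' || c == '}') {          /* the name-value pair is complete */
            pairCount++;
            if (c == ',') state = EXPECT_NAME;
            else EndRecord();
        } else if (state == IN_LITERAL_VALUE) AppendValue(c);
        else state = PARSE_FAILED;
        break;
    case EXPECT_NEXT_RECORD:
        state = (c == ',') ? EXPECT_RECORD : (c == ']') ? PARSE_DONE : PARSE_FAILED;
        break;
    default:                                 /* PARSE_DONE or PARSE_FAILED */
        break;
    }
    return state != PARSE_FAILED;
}
```

Feeding [{"id":39,"sensor":"Temperature","value":78}] one character at a time ends in PARSE_DONE with three pairs in one record; a single-object POST response such as {"id":40} also ends in PARSE_DONE, because inArray is never set.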
The advantage of the state machine is that it can work with a stream such as an HTTP response, which it can process on the fly rather than having to wait for the whole response. That can mean faster processing but, more importantly, allows better management of memory: the required storage is not fixed but is expanded (on the heap) as more name-value records are detected. The state machine also allows the function to be called with a fixed number of characters each time the next block of them is received, rather than character by character. This latter mechanism is used by CEJSON’s parser.
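Why block-wise feeding works can be shown with a deliberately reduced sketch: because all parser state survives between calls, it does not matter where a received block happens to end. StreamState, StepChar() and ParseChunk() are hypothetical names; unlike CEJSON, the sketch keeps its state in a struct rather than a global, and it only counts completed records by tracking curly braces outside strings.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser's persistent state: just enough to
   recognise complete records across block boundaries. */
typedef struct {
    bool   inString;   /* inside a quoted name or value      */
    bool   escaped;    /* previous character was a backslash */
    int    depth;      /* current curly-brace nesting        */
    size_t records;    /* records completed so far           */
} StreamState;

/* Process a single character, exactly as a per-character call would. */
static void StepChar(StreamState *s, char c)
{
    if (s->inString) {
        if (s->escaped) s->escaped = false;
        else if (c == '\\') s->escaped = true;
        else if (c == '"') s->inString = false;
        return;
    }
    if (c == '"') s->inString = true;
    else if (c == '{') s->depth++;
    else if (c == '}' && --s->depth == 0) s->records++;
}

/* Feed a block of n characters as soon as it has been received. All state
   lives in *s, so a name, value or record split across blocks is handled. */
void ParseChunk(StreamState *s, const char *block, size_t n)
{
    for (size_t i = 0; i < n; i++)
        StepChar(s, block[i]);
}
```

Feeding a response in arbitrary three-character blocks gives the same record count as feeding it whole, since nothing in the state depends on block boundaries.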
The main functions of the parser are:
Function            Description
ParseJsonString()   The parser: essentially one big switch-case statement on the current state.
Expect()            If in a state of “readiness”, checks whether the current character is the expected one; if so, increments or changes the state.
IncrementState()    Increments the state (the enum value).
The ParseJsonString() function is shown in detail at the following link.
The CEJSON JSON parser was ported from the Ardjson version. That version was developed as a state machine to run in a resource-limited environment, so certain efficiencies carry over. Both parsers are focused on the JSON response strings from Azure Mobile Services, so some JSON features not used by AzMS have been left out. The CEJSON parser generates a dynamic array of name-value pairs which can be viewed as an array of records, one record for each sensor value, each consisting of a fixed number of name-value pairs.