This blog presents a more complete JSon parser in the Arduino context that extracts the data entities from each record in the JSon string. It is implemented as a Stream Parser driven by a State Machine.

Ardjson: https://ardjson.codeplex.com

Previous: http://embedded101.com/Blogs/David-Jones/entryid/567/CEJson-Part-6-Arduino-Telemetry-Sensor-Apps


 


 

This version is implemented in 1.3c1 JsonParserToGetTelemetrySensorValues.zip on the Downloads tab at https://ardjson.codeplex.com


Stream Processing

The previous version of the JSon parser had two main limitations:

  1. All of the response stream, that is the complete JSon string, needed to be stored before any parsing was done. This is OK on a desktop, where storage is generally a non-issue, but on a resource-starved device it can be a problem. The previous version of the sketch to get records would crash (the sketch would restart) if more than a few records were downloaded.
  2. The parsing assumed that the JSon string was correct, with no defined recovery if the string was incorrect in any way.

 

The memory issue can be addressed by processing the string as a stream; that is, by interpreting the characters as they arrive. Previously the parsing function was called with the complete string, after the whole HTTP Response had been received, and then looped through that string character by character. With stream processing, each received character is instead sent for interpretation as it is received. This does require two simultaneous processing activities: one to receive the HTTP Response and one to process it.

 

Multithreading could be used to implement this scenario with the original parser function. The parser would wait for a signal from the response thread that a new character was available, process as far as it could with that character, then yield until the next signal from the response thread. To avoid the storage problem, the response thread would need to receive characters only when the parse thread signalled that it had yielded. Multithreading is, though, overkill for this activity.

 

Some programming languages and/or operating system threading APIs allow a single thread to be split into multiple execution contexts. Microsoft Windows Embedded CE/Compact supports Fibers, which do that. Programmatically they are very similar to threads, but only one of the fibers in a thread runs at a time. They can yield, and each maintains its own context:

A Windows CE/Compact fiber is a unit of execution that must be manually scheduled by the application. Fibers run in the context of the threads that schedule them. Each thread can schedule multiple fibers. A fiber does not have all the same state information associated with it as that associated with a thread. The only state information maintained for a fiber is its stack, a subset of its registers, and the fiber data provided during fiber creation.

 

An alternative is to process the stream on-the-fly. A Stream Parser requires a function that maintains state: a State Machine. Rather than iterating through a stored array of characters (a string), the function takes one item from the stream as a parameter on each call. A global state variable maintains the current state of the machine. The function is in effect one big switch-case statement on the state variable, with a separate case for each state.

Analysis then requires:

  • An enumeration of all possible states
  • Processing required for each state.
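The two items above can be sketched as an enumeration plus a switch-case function. This is a minimal illustration only: the state names, parserState, resetParser() and parseChar() are hypothetical and are not taken from the Ardjson source, and only the first two "expect one specific character" states are filled in.

```cpp
// Hypothetical state names for illustration; not taken from the Ardjson source.
enum ParserState {
  EXPECT_ARRAY_START,   // waiting for '['
  EXPECT_OBJECT_START,  // waiting for '{'
  IN_OBJECT,            // inside a record (name and value states omitted here)
  PARSE_ERROR           // an invalid or out-of-context character was seen
};

// Global state variable: it persists between calls to parseChar().
ParserState parserState = EXPECT_ARRAY_START;

// Puts the machine back into its initial state.
void resetParser() { parserState = EXPECT_ARRAY_START; }

// Called once per received character. Returns true if the character was
// valid for the current state, false otherwise.
bool parseChar(char c) {
  switch (parserState) {
    case EXPECT_ARRAY_START:
      if (c == '[') { parserState = EXPECT_OBJECT_START; return true; }
      break;
    case EXPECT_OBJECT_START:
      if (c == '{') { parserState = IN_OBJECT; return true; }
      break;
    case IN_OBJECT:
      // Placeholder: a fuller parser would dispatch to name/value states here.
      return true;
    case PARSE_ERROR:
      break;
  }
  // Any character that was not accepted above puts the machine in error.
  parserState = PARSE_ERROR;
  return false;
}
```

Because the state lives outside the function, nothing but the current character and the state variable needs to be held between calls.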

 

The processing for some states is simply the expectation that a specific character is received. For example, an open brace is expected when a new JSon record is due to start in the stream.

 

State processing may also involve forking to one of a number of states, depending upon which character is received. For example, after a JSon record has been received, a comma means that another record is expected, whereas a closing square bracket means that the array of records, and hence the parsing, is done.
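As a rough sketch of such a fork, the function below picks the next state from the character seen after a record has ended. The enumeration and the function name nextAfterRecord() are assumptions made for this example, and whitespace between tokens is not handled.

```cpp
// Hypothetical states reachable once a record's closing brace has been seen.
enum AfterRecordState {
  EXPECT_RECORD_START,  // a comma was seen: another record should follow
  ARRAY_DONE,           // a closing square bracket was seen: parsing is done
  FAILED                // anything else is out of context
};

// Chooses the next state from the character received after a record.
AfterRecordState nextAfterRecord(char c) {
  switch (c) {
    case ',': return EXPECT_RECORD_START;
    case ']': return ARRAY_DONE;
    default:  return FAILED;
  }
}
```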

 

Other processing involves continuing in the current state, just concatenating the received characters. There needs to be a termination character for these states. For example, when getting the name for a name-value pair, the state machine appends characters to the name until a double quote is received. The received characters also need to be validated. For example, in the name state the characters should be alphabetic, when receiving a number they should be digits, and so on.

 

The parse function is called once for each character, which it then processes selectively based upon the current state. When the state processing is done for that character, it returns a result indicating success or failure of the processing. If the parsing function detects any invalid or out-of-context character, the return result is set to false; otherwise the return result is set to true. The HTTP Response receiving function only continues to call the parser function if the previous result is true.
This mechanism addresses, in a simple way, the error issue identified above: any error identified in the parser halts further calls to it by setting the return value to false.
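The calling side of this contract can be sketched as a loop that stops at the first false result. Here the incoming characters are simulated with a C string; on the Arduino they would come from the network connection instead. The names feedParser() and digitsOnly() are hypothetical, and digitsOnly() is only a toy stand-in for the real parse function.

```cpp
#include <stddef.h>

// Feeds characters to the parser one at a time and stops at the first
// failure. Returns the number of characters the parser accepted.
size_t feedParser(const char* incoming, bool (*parseChar)(char)) {
  size_t accepted = 0;
  while (incoming[accepted] != '\0') {
    if (!parseChar(incoming[accepted])) break;  // error: make no further calls
    accepted++;
  }
  return accepted;
}

// Toy stand-in for the real parse function: accepts only digits.
bool digitsOnly(char c) { return c >= '0' && c <= '9'; }
```

With this toy parser, feeding "12x4" stops at the 'x', so the '4' is never passed to the parser.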

 

The Ardjson Stream Parser

Parsers are based upon what are called Railroad Diagrams. The JSon diagrams, from http://JSON.org, are not too complex. This parser implements most of this syntax, with some simplifications.


An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).

An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma).

A value can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested.

A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string.

A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used.

Whitespace can be inserted between any pair of tokens. Excepting a few encoding details, that completely describes the language.

 


This stream parser starts by expecting an array of objects. It therefore expects an opening square bracket, then an opening brace. It then expects a name-value pair. The name state expects the name as a string, so it is delimited by double quotes; its contents can only be alphabetic characters. Once the second double quote is received, it expects a colon.

It then moves on to expect a value. If the first character of the value is a double quote, it goes into the getting a String state. Alternatively, if that first character is T, t, F or f, it goes into the getting a Boolean state. Otherwise it goes into the getting an Integer state. Whilst the Integer state expects only digits, if it gets a period it switches to the getting a Float state.

The getting a String state terminates upon the second double quote, whereas the other states terminate upon reception of a comma (get another name-value pair) or a closing brace (end of object/record). The non-string values, which are stored as strings up to this point, are then parsed into their data types. If the end of an object is received, the parser then expects either a comma (get another object) or a closing square bracket (end of the parsing).
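The choice of value state and the later conversion of the stored text can be sketched as below. All of the names are hypothetical, and the use of atol() and atof() for the conversions is an assumption for this example; the actual Ardjson code may convert the values differently.

```cpp
#include <stdlib.h>

// Hypothetical names for the four "getting a value" states.
enum ValueState { GET_STRING, GET_BOOLEAN, GET_INTEGER, GET_FLOAT };

// Chooses the value state from the first character of the value.
ValueState stateForFirstChar(char c) {
  if (c == '"') return GET_STRING;
  if (c == 'T' || c == 't' || c == 'F' || c == 'f') return GET_BOOLEAN;
  return GET_INTEGER;  // everything else is treated as the start of a number
}

// While getting an integer, a period switches the machine to the float state.
ValueState afterNumberChar(ValueState current, char c) {
  if (current == GET_INTEGER && c == '.') return GET_FLOAT;
  return current;
}

// Conversions applied to the accumulated text once the value has terminated.
bool toBoolean(const char* text) { return text[0] == 'T' || text[0] == 't'; }
long toInteger(const char* text) { return atol(text); }
double toFloat(const char* text) { return atof(text); }
```

Note that this sketch does not validate the remaining characters of a value; in the parser described above that check belongs to the processing for each state.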

 

Some aspects have not been implemented, although the parser does handle JSon strings returned from a Microsoft Azure Mobile Service Table. The following are not implemented, but most will be implemented in subsequent versions of the parser:

  • Nested objects (records within records)
  • A leading negative sign on integers and floats (or a positive sign, for that matter)
  • Exponential notation for floats
  • Hexadecimal digits
  • The null value

 

[Image: Stream Parser Output]

 

 


Next: Ardjson Part 7b: Programming details of the JSon Stream Parser