
JSON parsing in Apex, a developer’s nightmare :( Winter’12 release, I badly need you!

Today I stumbled upon a requirement to parse JSON in Apex. I was happy, because I knew there is a cool open source library called JSONObject that does all the heavy lifting of JSON parsing, and an even cooler native JSON parser is lined up for the coming Winter’12 release.


I was able to quickly set up the web service fixture using all the goodness of dependency injection for callouts in Apex, as discussed in this post.

“JSON response vs JSONObject Parser” the bloody war !

The real pain started when I began parsing the JSON response. I met a series of awesome exceptions and errors, leaving me confused about who was wrong: the JSON or the parser 😕
First I got this exception about “Missing Value”



After some googling, I found this post, where the code snippet indicated that newlines might be an issue for the parser. So I removed both “\r” and “\n” chars from the JSON response; why would a machine need pretty-printed JSON anyway 🙂 Then life moved on for a while, until I got this exception

“FATAL_ERROR|superpat.JSONObject.JSONException: Expected a , or }”

Again I was thinking: now who’s wrong here, the JSON or the parser 😕 After some more fighting with the code, I ended up searching Google for the error and luckily found that metadady had already fixed this one. Many thanks to metadady for fixing this issue and submitting a patch. I don’t know why this patch has been sitting unapplied to JSONObject since Jan 24, 2011.


Again, life moved ahead a bit, until JSONObject ran into Unicode escapes like “\u0026” in the response. Again metadady was nice enough to mention this issue with his patch, but it seems there is no clean way to handle Unicode escapes in Apex. So I decided to get rid of them; I really can live without these special creatures in my Strings 🙁
So, I kicked all the Unicode awesomeness out of the JSON response with something like

“jsonstring.replaceAll('\\\\u.{4}', '')”
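For what it's worth, here is a minimal sketch of that clean-up step, written in Java rather than Apex so it can be run anywhere (Apex's regex support is modeled on Java's, so the pattern itself should carry over, but I have not run this exact code on the platform). The class name JsonEscapes and the methods strip and decode are made up for illustration. It tightens “.{4}” to four hex digits, so that four arbitrary characters don't get swallowed, and also shows the alternative of decoding an escape instead of throwing it away.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JSONObject or any Salesforce library.
final class JsonEscapes {
    // A backslash, the letter u, then exactly four hex digits (captured).
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Drops every escape sequence, which is what the one-liner above does. */
    static String strip(String json) {
        return UNICODE_ESCAPE.matcher(json).replaceAll("");
    }

    /** Replaces each escape sequence with the character it encodes. */
    static String decode(String json) {
        Matcher m = UNICODE_ESCAPE.matcher(json);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and backslash from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // In Java source the doubled backslash yields one literal backslash.
        String raw = "{\"name\":\"R\\u0026D\"}";
        System.out.println(strip(raw));  // {"name":"RD"}
        System.out.println(decode(raw)); // {"name":"R&D"}
    }
}
```

Decoding before parsing is only safe for characters that mean nothing to JSON itself; an escaped double quote or backslash decoded this way would break the document, so those are better left to the parser.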

After that, life was really good and my JSON was parsed successfully. I had tears of joy in my eyes 🙂 But those tears of joy turned into tears of sorrow when I saw the debug logs:

  • it took 10+ seconds and

  • “154574” script statements to parse a moderately sized JSON string.

Number of script statements: 154574 out of 200000 ******* CLOSE TO LIMIT

It was the strange feeling of losing everything right after you won; words can’t express it.

I still had some hope that I could get around it. But but but, it was all lost when I read this post from the legendary Jeff Douglas. It was an unlucky Googling day for me; if Jeff’s post had shown up earlier, I could have been a happier guy by EOD.

Is the game over?

Anyway, the game is not over yet. I am waiting for the Winter’12 release and its native JSON parser, i.e. System.JSON, to arrive. Thanks to the Salesforce team for finally adding a native JSON parser to the Apex library.

Comments (7)

  • Anonymous says:

    September 29, 2011 at 2:50 pm

    That whole MAX number of script executions (and the MAX in batch jobs) is why we steered clear of using Force.com for our fairly computationally intensive product. We still have other products with Force.com, but not this one.

  • Anonymous says:

    September 29, 2011 at 5:04 pm

    Agreed @kaleb, before starting with force.com it's good to match your business and computation requirements to what the platform allows. Force.com is doing a great job and I see potential for many good apps here, but yes, one can't do everything on it.

  • Anonymous says:

    September 30, 2011 at 9:31 pm

    Hi Abhinav,

    Came across your site here via Google searches for various Apex/Vforce issues – it's a great resource to read your findings (and those of the few other bloggers), and a comfort to see others dealing with similar challenges on this platform.

    I took a look at the JSONObject.cls, and unfortunately this is an example of when a direct port (to a language like Apex) does not work well. While it may be efficient to scan/parse in a per-character manner in C (or even Java), it's not practical in Apex. On script statements alone, you'll be consuming way too many statements per character of the input stream. We'd have to do some profiling to see where the bulk of the statements are getting consumed here, but 2 easy-to-solve issues (besides the per-char scanner) pop out:

    1) The Value class is trying to model a “variant” type. That's fine, but each instance construction will eat 8 statements! 1 for the statement that invokes the new operator, 1 for the chosen constructor's assignment statement, and 6 for its member-field declarations (1 each). We can reduce this to 1 or 2 statements by combining the Prototype design pattern with Apex's built-in clone method (which bypasses the declaration statements), plus a little bit of assignment/conditional-expression hacking – this logic can be wrapped in static constructor methods to keep the main code body reading clean. I have example code demonstrating the technique, if you're interested.

    2) The usage of StringBuffer. Unlike in Java, the Apex StringBuffer (i.e. a custom class) does not improve concatenation performance (Java's StringBuffer & StringBuilder improve memory-allocation efficiency from O(N^2) to O(N)). Worse yet, it burns through additional script statements from the custom append method. Remove all usage of StringBuffer; use the += op on a String variable instead (name it 'buffer' if you like).

    Back to scanning – implementing a per-char scanner is not the way to go for lexical analysis in Apex; too many script statements per character. What I would do:

    a) Build static lookahead-char(s)-to-Scanner maps based on the required syntax (a map for unambiguous 1-char lookahead tokens, a map for unambiguous 2-char tokens, a map or maps to resolve ambiguities, etc…)

    b) Implement a Scanner for each token class (e.g. Word, Number, String-Constant, Operator-Literal, etc)

    c) Each Scanner impl should use a Regexp with the Matcher.region and Matcher.lookingAt methods (excellent way to offload the heavy lifting of a tokenizer to native code) to efficiently pull out the proper token lexeme in 1 statement – now you're burning statements on a per-token basis rather than per-char!

    d) The nextToken() method would grab lookahead chars in 1 statement, then use the maps to get the appropriate scanner and thus the token.

    Enhance with a peekToken() method that caches peeks for use by subsequent peekToken() or nextToken() calls. Further enhance with a mechanism to save/restore lexer state (for when you need to try to parse a higher-than-token-level entity before making a decision, backtracking if necessary).

    I wish I had time to try my hand at optimizing JSONObject.cls (or even a from-scratch rewrite), but with Apex-native JSON serialize/deserialize on the horizon, it's probably not worthwhile. Still, it is never game-over 🙂 Sorry for the over-long comment; parsing is a side-topic of interest to me, and I enjoy hand-writing them when the need arises.
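    Steps a) to d) from the comment above can be sketched roughly like this. It is written in Java, not Apex, purely so it can be compiled and run anywhere; it is not Mike's code, and the class name RegexLexer, the pattern names, and the simplified JSON token grammar are all assumptions made for illustration. The point is the shape of the loop: one lookahead character picks a pattern from a static map, and one region plus lookingAt call pulls out the whole token.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a regex-per-token lexer for JSON-like input.
final class RegexLexer {
    private static final Pattern WS = Pattern.compile("\\s*");
    private static final Pattern STRING = Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");
    private static final Pattern NUMBER =
            Pattern.compile("-?\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?");
    private static final Pattern WORD = Pattern.compile("true|false|null");
    private static final Pattern PUNCT = Pattern.compile("[{}\\[\\]:,]");

    // Step a): static map from one lookahead character to a token scanner.
    private static final Map<Character, Pattern> SCANNERS = new HashMap<>();
    static {
        SCANNERS.put('"', STRING);
        for (char c : "-0123456789".toCharArray()) SCANNERS.put(c, NUMBER);
        for (char c : "tfn".toCharArray()) SCANNERS.put(c, WORD);
        for (char c : "{}[]:,".toCharArray()) SCANNERS.put(c, PUNCT);
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WS.matcher(input);
        int pos = 0;
        while (true) {
            // Skip whitespace; "\\s*" always matches, possibly with length 0.
            m.usePattern(WS).region(pos, input.length());
            m.lookingAt();
            pos = m.end();
            if (pos >= input.length()) break;

            // Step d): one lookahead char selects the scanner.
            Pattern scanner = SCANNERS.get(input.charAt(pos));
            // Step c): region + lookingAt extracts the whole lexeme at once.
            if (scanner == null
                    || !m.usePattern(scanner).region(pos, input.length()).lookingAt()) {
                throw new IllegalArgumentException("Unexpected input at offset " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{\"a\": [1, -2.5e3, true, null]}"));
    }
}
```

    The work per token is a handful of statements no matter how long the token is, which is the property that matters under a script-statement limit. The peekToken() cache and the save/restore of lexer state that the comment mentions are left out of this sketch.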

  • Anonymous says:

    October 1, 2011 at 12:21 am

    Hey Mike,

    Awesome thoughts man! I liked all your suggestions and in-depth insights. It is very true that you can't port just anything to Apex directly, especially this char-by-char scanner. On the Value class I agree, this design isn't adding any value. It can be kept simple as a string, with conversion to more specialized types like numbers and dates left to the client. After all, JSON is never something strongly typed; it's a plain text string only.

    You're absolutely correct on StringBuffer. Java did a great job with StringBuffer and then optimized it further with StringBuilder, which drops the synchronization when you don't need thread safety. In Apex, where script lines are so crucial, it's hard to work out a good and efficient StringBuffer impl unless the Apex language itself provides one natively.

    Yes, char-by-char scanning is not the approach for Apex, and using regex could be a great idea. Those native Apex system classes would do the heavy lifting of string parsing. But I really feel locked in with Apex's regex classes too; they are based on Java's Pattern and Matcher, but they hide many cool flags and settings the Java Pattern class offers.

    I also thought of rewriting a simple JSON parser in Apex, but with the Winter'12 native JSON parser at the gates I dropped the idea. But yes, it's very strange that nobody has tried optimizing this JSONObject class in so long; I'm sure many other developers must have struggled with it in the same way.

  • Anonymous says:

    October 1, 2011 at 12:24 am

    @Mike, I forgot to thank you for your awesome comment; I really liked your deep thoughts and views. If a native JSON parser were not coming in the Winter'12 release, I would have proposed pairing with you to write a better open source JSON parser in Apex 🙂

  • Anonymous says:

    October 3, 2011 at 2:35 am

    @Abhinav

    Many thanks for your feedback – and thanks again for your blog here! Indeed, a clean-sheet JSON parser in Apex would have been a really fun side project. Can't complain though, whenever Salesforce comes through for us (as looks to be the case here soon, with their upcoming native JSON utils).

    Man, in general I wish I had some spare time for various side/pet projects. I'm once again getting the itch for fun side coding projects, which I hadn't had in years 🙁

  • Anonymous says:

    October 3, 2011 at 3:27 am

    Yeah @Mike, side projects are really fun, because you can pick your fav. technical area/languages/platform and develop something on top of it. That is really hard to do in a typical consulting project. I try to cool down that itch by doing a little bit of moonlighting and some time on weekends, but can't spare enough 🙁
