The Incredible C++ XML Parser and JSON Parser
Small, simple, cross-platform, scalable and fast C++ XML Parser

In 2003, I started working on XML technology and produced my first XMLParser library. This old library is now used in thousands of applications all around the world (and also in space! 😲). The main objective of the old XMLParser library was to allow me to easily manipulate input/output configuration files and XML data files. The old library was limited to relatively small data files (typically smaller than 10MB) because it is a pure DOM-style parser that loads the whole document into memory 😒.

During the next 10 years, I received many emails from coders using the old XMLParser library to parse larger and larger files (some individuals used it to parse 300MB XML files!). Although the old library managed to parse these larger files, it consumed a very large amount of RAM (sometimes up to 10GB) and a lot of CPU resources. Furthermore, I am now manipulating (inside Anatella) terabyte-size XML files. In May 2013, I decided that it was time for an "upgrade"! 😉 ...and the Incredible XML Parser was born! 😊

The Incredible XML Parser is composed of only 2 files: a .cpp file and a .h file.
The total size is 220 KB.

The Incredible XML Parser library includes three parsers:

  1. An ultra fast XML Pull Parser (named "IXMLPullParser") that requires very little memory to run. The Pull Parser is ultra fast, but it does not offer the flexibility and the user-friendliness of a full-fledged DOM parser.
  2. A very fast XML DOM parser (named "IXMLDomParser"), built "on top" of the Pull Parser, that provides more comfort when manipulating XML elements. It works by using recursion to build a node tree that breaks down the elements of an XML document.
  3. An ultra fast JSON Pull Parser (named "IJSONPullParser") that requires very little memory to run. The JSON Pull Parser is compatible with the Incredible XML DOM Parser, so you can use the DOM Parser to build an in-memory node tree that lets you explore your JSON file easily and quickly (for example, using advanced XPATH queries: see example 12!).
The Incredible XML DOM Parser, the Incredible XML Pull Parser and the Incredible JSON Pull Parser can all process terabyte-size XML/JSON files in a few hours on commodity hardware with very low memory consumption (i.e. less than a few megabytes).
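To make the "pull" model concrete, here is a minimal sketch. It is NOT the IXMLPullParser API: the class TinyPullParser, the Event enum and the helper countElements are hypothetical names invented for this illustration. The caller repeatedly asks for the next event, and the parser reads its input stream one character at a time, so its memory use depends on the longest tag or text run rather than on the size of the file. It deliberately ignores attributes, self-closing tags, comments, CDATA and entities.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>   // EOF
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the sketch
#include <string>

// Hypothetical event kinds for this sketch; NOT the IXMLPullParser API.
enum class Event { StartTag, EndTag, Text, Done };

// A deliberately tiny pull parser: the caller "pulls" one event at a time.
// Only the current token is kept in memory, never the whole document.
// Simplifications: no attributes, self-closing tags, comments, CDATA,
// processing instructions, entities or error reporting.
class TinyPullParser {
public:
    explicit TinyPullParser(std::istream& in) : in_(in) {
        assert(in_.good() && "input stream must be readable");
    }

    Event next() {
        value_.clear();
        int c;
        // Skip whitespace between tokens.
        while ((c = in_.peek()) != EOF && std::isspace(c)) in_.get();
        if (c == EOF) return Event::Done;
        if (c == '<') {
            in_.get();                               // consume '<'
            const bool closing = (in_.peek() == '/');
            if (closing) in_.get();                  // consume '/'
            while ((c = in_.get()) != EOF && c != '>') value_.push_back(static_cast<char>(c));
            return closing ? Event::EndTag : Event::StartTag;
        }
        // Text content: everything up to the next '<'.
        while ((c = in_.peek()) != EOF && c != '<') value_.push_back(static_cast<char>(in_.get()));
        return Event::Text;
    }

    // Tag name (StartTag/EndTag) or character data (Text) of the last event.
    const std::string& value() const { return value_; }

private:
    std::istream& in_;
    std::string value_;
};

// Example consumer: counts start tags called `name` without building a tree.
int countElements(std::istream& in, const std::string& name) {
    TinyPullParser parser(in);
    int n = 0;
    for (Event e = parser.next(); e != Event::Done; e = parser.next())
        if (e == Event::StartTag && parser.value() == name) ++n;
    return n;
}
```

Replacing the std::istringstream with a std::ifstream would let countElements scan a file of arbitrary size while holding only one token in memory at a time.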

The objectives of the Incredible XML/JSON Parser are the same as the old XMLParser library:
  1. User-friendliness (i.e. it should be easy to use).
  2. Small footprint & no dependencies (i.e. this must remain a small library, easy to include & compile everywhere, on any platform).
And, in addition, it provides even more speed & scalability.

For the Incredible XML Parser, I kept all the nice functionalities of the old XML Parser that made it so popular, and I added the following:
  1. The Incredible XML Pull Parser has one of the lowest memory footprints among all XML Pull parsers.
  2. The Incredible JSON Pull Parser has one of the lowest memory footprints among all JSON Pull parsers.
  3. The Incredible XML DOM Parser has the lowest memory footprint among all XML DOM parsers.
  4. The Incredible XML Pull Parser is one of the fastest XML Pull parsers.
  5. The Incredible JSON Pull Parser is one of the fastest JSON Pull parsers.
  6. The Incredible XML DOM Parser is the fastest XML DOM parser.
  7. The Incredible XML DOM Parser is the only DOM parser able to work on files of UNLIMITED size.
  8. The 2 Incredible XML Parsers are able to handle nearly any character encoding.
  9. The 3 Incredible Parsers fully support "char*" mode and "wchar_t*" mode.
  10. The 3 Incredible Parsers are able to handle streamed data. This has several advantages:
    1. You are no longer limited by your RAM size.
    2. Memory consumption is very low and (more or less) constant.
    3. You can very easily process streamed data (such as data coming from an HTTP connection or from the decompression of a ZIP file).
  11. The 3 Incredible Parsers are 100% thread-safe (more precisely: they are reentrant).
  12. The Incredible XML & JSON Pull Parsers are "in-place" parsers (they do not internally copy any strings, so they are as fast as possible).
  13. The Incredible XML & JSON Pull Parsers are among the easiest-to-use Pull parsers (because, unlike other "in-place" parsers, they always return zero-terminated char* or wchar_t* strings).
  14. The Incredible XML DOM Parser supports "hot starts" and is able to parse a sub-section of the original XML file without doing any memory allocation at all. The "hot start" functionality is unique and very important because it allows us to use a very flexible DOM-style parser on XML & JSON files of UNLIMITED size (see example 7 inside the documentation) using very little RAM.
  15. The Incredible XML DOM Parser provides ultra fast XPATH support. With XPATH, you can very easily find the information you need inside any XML or JSON file.
  16. The Incredible XML Parser has extensive (Doxygen) documentation.
  17. The Incredible XML DOM Parser is a good replacement for the old XMLParser library (the IXMLNode class from the Incredible XML DOM Parser is a direct replacement for the XMLNode class from the old XMLParser library).
  18. The Incredible XML Parser is easy to customize: the code is concise, commented and written in a plain and simple way. Thus, if you really need to change something (though I doubt you will), it's easy.
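Items 12 and 13 above describe the "in-place" technique. The sketch below illustrates the general idea, assuming a mutable, zero-terminated input buffer; findAttributeInPlace is a hypothetical helper, not part of the library. It locates an attribute value and overwrites the closing quote with '\0', so the caller receives an ordinary C string that points into the original buffer: nothing is copied and nothing is allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper illustrating "in-place" parsing; NOT the library's code.
// Finds attribute `attrName` inside a mutable, zero-terminated tag such as
//   <Key name="urn">
// and returns a pointer to its value. Instead of copying the value, it
// overwrites the closing quote with '\0', so the result is an ordinary
// zero-terminated C string that lives inside the caller's buffer.
// Simplifications: double-quoted values only, a single space before the
// name, no spaces around '='. Returns nullptr if the attribute is absent.
char* findAttributeInPlace(char* tag, const char* attrName) {
    assert(tag != nullptr && attrName != nullptr);
    const std::size_t len = std::strlen(attrName);
    if (len == 0) return nullptr;
    for (char* p = std::strstr(tag, attrName); p != nullptr;
         p = std::strstr(p + 1, attrName)) {
        // Accept only a whole attribute name: preceded by a space, followed by ="
        if (p != tag && p[-1] == ' ' && p[len] == '=' && p[len + 1] == '"') {
            char* value = p + len + 2;            // first character of the value
            char* end = std::strchr(value, '"');  // closing quote
            if (end == nullptr) return nullptr;   // malformed input
            *end = '\0';                          // terminate in place: no copy
            return value;                         // points into `tag`
        }
    }
    return nullptr;
}
```

The price of this approach is that the input buffer is modified, and the returned pointers stay valid only as long as that buffer exists.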
To the best of my knowledge, there exists no other "non-validating C++ XML parser" that is as simple and as powerful. 😄 This is especially true if you need to parse large XML documents: in such a case, no other parser comes even close to the Incredible XML Parser presented here.

I originally selected the name "Ultimate" for the XML Parser because I cannot see how it would be possible to improve on the XML Parser Library presented here 😜. Of course, you can always add features such as "XML Validation", etc., but that will only produce a slower, more "bloated" library. It's really the "Incredible XML parser" 😏, and if you are a professional developer serious about your work, you should use the "Incredible XML parser" and no other parser 🙏.

License

The Incredible XML Parser is distributed under the Aladdin Free Public License (AFPL).

The old XMLParser library is completely free and will remain free forever.
The Incredible XML parser is also completely free in these situations:
  1. You only need the Aladdin Free Public License (AFPL).
  2. You need another license (e.g. a BSD license or an MIT license) but you'll use the Incredible XML Parser inside:
If you are not in the situations described above, you can still buy a BSD license (or MIT license) to use the XML Parser inside all your projects: Simply to request your license.

Download

If you like this library, you can create a link to this page from your website (use this URL: http://www.applied-mathematics.net/tools/IXMLParser.html). Such links increase the visibility of this library (and its Google ranking!), which helps other people produce better software using XML technology.

If you like this library, please add a message in the guestbook!

To obtain the library, simply , and I will send you the Incredible XML Parser directly, the same day. You will receive a zip file by e-mail. Inside the zip file, you will find 5 examples:

Log

Version changes:

A small tutorial

Let's assume that you want to parse the XML file "PMMLModel.xml" that contains:

<?xml version="1.0" encoding="ISO-8859-1"?>
<PMML version="3.0"
  xmlns="http://www.dmg.org/PMML-3-0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema_instance" >
  <Header copyright="Frank Vanden Berghen">
    Hello World!
    <Application name="&lt;Condor>" version="1.99beta" />
  </Header>
  <Extension name="keys"> <Key name="urn"> </Key> </Extension>
  <DataDictionary>
    <DataField name="persfam" optype="continuous" dataType="double">
      <Value value="9.900000e+001" property="missing" />
    </DataField>
    <DataField name="prov" optype="continuous" dataType="double" />
    <DataField name="urb" optype="continuous" dataType="double" />
    <DataField name="ses" optype="continuous" dataType="double" />
  </DataDictionary>
  <RegressionModel functionName="regression" modelType="linearRegression">
    <RegressionTable intercept="0.00796037">
      <NumericPredictor name="persfam" coefficient="-0.00275951" />
      <NumericPredictor name="prov" coefficient="0.000319433" />
      <NumericPredictor name="ses" coefficient="-0.000454307" />
      <NONNumericPredictor name="testXmlExample" />
    </RegressionTable>
  </RegressionModel>
</PMML>

Let's analyse the following small example program line by line:

#include <stdio.h>    // to get the "printf" function
#include <stdlib.h>   // to get the "atof" function
#include "xmlParser.h"

int main(int argc, char **argv)
{
  // This creates a new Incredible XML DOM parser:
  IXMLDomParser iDom;

  // This opens and parses the XML file:
  ITCXMLNode xMainNode=iDom.openFileHelper("PMMLModel.xml","PMML");

  // This prints "<Condor>":
  ITCXMLNode xNode=xMainNode.getChildNode("Header");
  printf("Application Name is: '%s'\n", xNode.getChildNode("Application").getAttribute("name"));

  // This prints "Hello World!":
  printf("Text inside Header tag is :'%s'\n", xNode.getText());

  // This gets the number of "NumericPredictor" tags:
  xNode=xMainNode.getChildNode("RegressionModel").getChildNode("RegressionTable");
  int n=xNode.nChildNode("NumericPredictor");

  // This prints the "coefficient" value for all the "NumericPredictor" tags:
  for (int i=0; i<n; i++)
    printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",i).getAttribute("coefficient")));

  // This creates an IXMLRenderer object and uses it to print a formatted XML string based on
  // the content of the first "Extension" tag of the XML file (more details below):
  printf("%s\n",IXMLRenderer().getString(xMainNode.getChildNode("Extension")));

  return 0;
}

To easily manipulate the data contained inside the XML file, the first step is to create an IXMLDomParser object (named "iDom" in the above example) and use it to get an instance of the class ITCXMLNode that represents the XML file in memory. You can use:

ITCXMLNode xMainNode=iDom.openFileHelper("PMMLModel.xml","PMML");
or, if you use the UNICODE (wchar_t*) Windows version of the library:
ITCXMLNode xMainNode=iDom.openFileHelper(L"PMMLModel.xml",L"PMML");
or, if the XML document is already in a memory buffer pointed to by the variable "char *xmlDoc":
ITCXMLNode xMainNode=iDom.parseString(xmlDoc,"PMML");
This will create an object called xMainNode that represents the first tag named PMML found inside the XML document. This object is the top of the tree structure representing the XML file in memory. The following command creates a new object called xNode that represents the "Header" tag inside the "PMML" tag.

ITCXMLNode xNode=xMainNode.getChildNode("Header");
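The tree that xMainNode and xNode navigate has to be built from the parser's events. The sketch below shows one way to do this with recursion, in the spirit of the "DOM built on top of a Pull Parser" design described at the top of this page; the Event, Node, buildNode and child names are hypothetical and do not reflect the real IXMLDomParser internals (child() merely plays the role of getChildNode() here). Each call consumes one start event, the text and child elements that follow, and the matching end event.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical event kinds, as a pull parser might report them.
enum class Ev { Start, End, Text };
struct Event {
    Ev kind;
    std::string value;  // tag name for Start/End, character data for Text
};

// Hypothetical DOM node: a name, its text, and its child elements.
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;

    // First child called `n`, or nullptr (a simplified getChildNode()).
    const Node* child(const std::string& n) const {
        for (const Node& c : children)
            if (c.name == n) return &c;
        return nullptr;
    }
};

// Recursively builds the subtree that starts at the Start event ev[pos].
// On return, `pos` points just past the matching End event.
Node buildNode(const std::vector<Event>& ev, std::size_t& pos) {
    assert(pos < ev.size() && ev[pos].kind == Ev::Start);
    Node node{ev[pos].value, "", {}};
    ++pos;  // consume the Start event
    while (pos < ev.size() && ev[pos].kind != Ev::End) {
        if (ev[pos].kind == Ev::Text)
            node.text += ev[pos++].value;                 // character data
        else
            node.children.push_back(buildNode(ev, pos));  // nested element: recurse
    }
    assert(pos < ev.size() && ev[pos].value == node.name);  // matching end tag
    ++pos;  // consume the End event
    return node;
}
```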
The following command prints "<Condor>" on the screen (note that the "&lt;" character entity has been replaced by "<"):
printf("Application Name is: '%s'\n", xNode.getChildNode("Application").getAttribute("name"));
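The replacement of "&lt;" by "<" is standard XML entity decoding. Here is a minimal sketch, assuming only the five entities predefined by XML 1.0 need handling (numeric references such as &#60; are left untouched); decodeEntitiesInPlace is a hypothetical helper, not the library's decoder. Because a decoded string is never longer than the encoded one, the decoding can overwrite the input buffer in place, which fits the "in-place" style of parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper; NOT the Incredible XML Parser's own decoder.
// Replaces the five entities predefined by XML 1.0 with the characters they
// stand for, working in place: every entity is longer than its replacement,
// so the write position never overtakes the read position.
// Numeric references (&#60; &#x3C;) and unknown entities are copied unchanged.
char* decodeEntitiesInPlace(char* s) {
    assert(s != nullptr);
    struct Entity { const char* text; char ch; };
    static const Entity kEntities[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'}, {"&apos;", '\''}, {"&quot;", '"'}};

    char* out = s;                            // write position
    for (const char* in = s; *in != '\0';) {  // read position
        bool matched = false;
        if (*in == '&') {
            for (const Entity& e : kEntities) {
                const std::size_t n = std::strlen(e.text);
                if (std::strncmp(in, e.text, n) == 0) {
                    *out++ = e.ch;  // one character replaces the whole entity
                    in += n;
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) *out++ = *in++;  // ordinary character: copy as-is
    }
    *out = '\0';
    return s;
}
```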
The following command prints on the screen "Hello World!":
printf("Text inside Header tag is :'%s'\n", xNode.getText());
Let's assume you want to "go to" the tag named "RegressionTable":

xNode=xMainNode.getChildNode("RegressionModel").getChildNode("RegressionTable");

Note that the previous value of the object named xNode has been "garbage collected" so that no memory leak occurs. If you want to know how many tags named "NumericPredictor" are contained inside the tag named "RegressionTable":

int n=xNode.nChildNode("NumericPredictor");

The variable n now contains the value 3. If you want to print the value of the coefficient attribute for all the NumericPredictor tags:

for (int i=0; i<n; i++)
  printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",i).getAttribute("coefficient")));
Or equivalently, but faster at runtime:
int iterator=0;
for (int i=0; i<n; i++)
  printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",&iterator).getAttribute("coefficient")));
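Why would the iterator version be faster? A plausible explanation, sketched below with a hypothetical Node structure (not the library's internal representation): an index-based lookup has to rescan the children from the beginning on every call, so looping over n matching children costs about n²/2 name comparisons, whereas a lookup that remembers where the previous search stopped needs only about n comparisons in total. The names childByIndex, childFrom and g_comparisons are invented for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tree node; NOT the library's internal data structure.
struct Node {
    std::string name;
    std::vector<Node> children;
};

long g_comparisons = 0;  // counts name comparisons, for illustration only

// Index-based lookup: rescans from the first child on every call, so the
// loop i = 0..n-1 costs O(n^2) comparisons in total.
const Node* childByIndex(const Node& parent, const std::string& name, int i) {
    for (const Node& c : parent.children) {
        ++g_comparisons;
        if (c.name == name && i-- == 0) return &c;
    }
    return nullptr;
}

// Iterator-based lookup: *pos remembers where the previous search stopped,
// so a full loop over all matches costs O(n) comparisons in total.
const Node* childFrom(const Node& parent, const std::string& name, int* pos) {
    assert(pos != nullptr);
    const int count = static_cast<int>(parent.children.size());
    for (int k = *pos; k < count; ++k) {
        ++g_comparisons;
        if (parent.children[k].name == name) {
            *pos = k + 1;  // resume after this child next time
            return &parent.children[k];
        }
    }
    *pos = count;
    return nullptr;
}
```

With 100 "NumericPredictor" children, the index-based loop performs 5050 comparisons and the iterator-based loop 100.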

If you want to generate and print on the screen the following XML formatted text:

<Extension name="keys">
  <Key name="urn" />
</Extension>

You can use:

IXMLRenderer iRenderer;
char *t=iRenderer.getString(xMainNode.getChildNode("Extension"),true);
printf("%s\n",t);

Note that you must NOT free the memory buffer containing the returned XML string yourself (you must NOT write "free(t);"): this buffer is owned by the iRenderer object and will be freed when the iRenderer object is destroyed (i.e. when it goes out of scope). The parameter true passed to getString() means that we want formatted output.
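This ownership rule follows a common C++ pattern: the object keeps the rendered text in a member buffer and only lends out a pointer to it. The sketch below uses a hypothetical TinyRenderer class (not IXMLRenderer) that renders a single empty element and ignores formatting. It shows why calling free() on the returned pointer would be wrong: the memory belongs to a std::string member that the destructor releases automatically. In this sketch the pointer is also invalidated by the next getString() call; whether IXMLRenderer behaves the same way is an assumption not covered here.

```cpp
#include <cassert>
#include <string>

// Hypothetical class illustrating buffer ownership; NOT the IXMLRenderer code.
class TinyRenderer {
public:
    // Renders an empty element <tag attr="value" /> into the member buffer
    // and returns a pointer into it. Do NOT free() the result: it is owned by
    // *this and stays valid until the next getString() call or until the
    // renderer is destroyed.
    const char* getString(const std::string& tag, const std::string& attr,
                          const std::string& value) {
        assert(!tag.empty());
        buffer_ = "<" + tag + " " + attr + "=\"" + value + "\" />";
        return buffer_.c_str();
    }

private:
    std::string buffer_;  // released automatically when the renderer goes out of scope
};
```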

The Incredible XML Parser library contains many other small useful methods that are not described here (the zip file contains additional examples explaining other functionalities, and complete Doxygen documentation of the IXMLParser). These methods allow you to:

That's all folks! With this basic knowledge, you should be able to easily retrieve any data from any XML file!