Small, simple, cross-platform, free and fast C++ XML Parser

This project started from my frustration that I could not find any simple, portable XML Parser to use inside all my projects (for example, inside the award-winning TIMi software suite commercialized by the Business-Insight company). Let's look at the well-known Xerces C++ library: the complete Xerces project is 53 MB (11 MB compressed in a zipfile)! In 2003, I was developing many small tools. I was using XML as the standard for all my input/output configuration and data files. The source code of my small tools was usually around 600 KB. In these conditions, don't you think that 53 MB just to be able to read an XML file is a little bit "too much"? So I created my own XML parser. My XML parser "library" is composed of only 2 files: a .cpp file and a .h file. The total size is 149 KB.

Here is how it works: the XML parser loads a full XML file in memory, parses it and generates a tree structure representing the XML file. Of course, you can also parse XML data that you have already stored yourself in a memory buffer. Thereafter, you can easily "explore" the tree to get your data. You can also modify the tree using "add" and "delete" functions and regenerate a formatted XML string from a subtree. Memory management of the tree is totally transparent through the use of smart pointers: you never have to call new, delete, malloc or free on the XMLNode objects yourself. ("Smart pointers" are a primitive version of the garbage collector in Java.)
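The reference-counting idea behind this kind of transparent memory management can be illustrated with a small sketch. It does not show the library's internals: std::shared_ptr, the Node struct, the buildTree function and the liveNodes counter are all assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node, NOT the library's XMLNode class.
static int liveNodes = 0;  // number of Node objects currently alive

struct Node {
    std::string name;
    std::vector<std::shared_ptr<Node>> children;
    explicit Node(std::string n) : name(std::move(n)) { ++liveNodes; }
    ~Node() { --liveNodes; }
};

// Builds a small tree: one root with two children.
std::shared_ptr<Node> buildTree() {
    auto root = std::make_shared<Node>("PMML");
    root->children.push_back(std::make_shared<Node>("Header"));
    root->children.push_back(std::make_shared<Node>("DataDictionary"));
    return root;
}
```

In this sketch, a node is destroyed as soon as the last handle pointing to it disappears, so the caller never writes delete. Keeping a copy of one child handle keeps only that child alive after the root handle is released.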

UPDATE: Based on the expertise gained during the development of this XML Parsing library, I have created a new, improved XML Parser: the Incredible XML Parser. The Incredible XML Parser has all the nice features of the library described on this page AND it's even faster, more scalable, less memory-hungry and easier to use. To the best of my knowledge, the Incredible XML Parser is the best "non-validating C++ XML parser" currently available 😄 (and by a large margin!). You should definitely check it out!

Here are the characteristics of the (old) XMLparser library:

Download

If you like this library, you can link to this page from your website (use this URL: http://www.applied-mathematics.net/tools/xmlParser.html). If you want to help other people produce better software using XML technology, you can increase the visibility of this library by adding a link to this page (so that its Google ranking increases).

If you like this library, please add a message in the guestbook!

Download here: small, simple, multi-platform XMLParser library with examples (zipfile).
Inside the zip file, you will find 5 examples:

If you have a Kindle, you might also be interested in KKCM: the Kranf Kindle Collection Manager.

Log

Version changes:

A small tutorial

Let's assume that you want to parse the XML file "PMMLModel.xml" that contains:

<?xml version="1.0" encoding="ISO-8859-1"?>
<PMML version="3.0"
xmlns="http://www.dmg.org/PMML-3-0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema_instance" >
<Header copyright="Frank Vanden Berghen"> Hello World!
<Application name="&lt;Condor>" version="1.99beta" />
</Header> <Extension name="keys"> <Key name="urn"> </Key> </Extension>
<DataDictionary>
<DataField name="persfam" optype="continuous" dataType="double">
<Value value="9.900000e+001" property="missing" />
</DataField>
<DataField name="prov" optype="continuous" dataType="double" />
<DataField name="urb" optype="continuous" dataType="double" />
<DataField name="ses" optype="continuous" dataType="double" />
</DataDictionary>
<RegressionModel functionName="regression" modelType="linearRegression">
<RegressionTable intercept="0.00796037">
<NumericPredictor name="persfam" coefficient="-0.00275951" />
<NumericPredictor name="prov" coefficient="0.000319433" />
<NumericPredictor name="ses" coefficient="-0.000454307" /> <NONNumericPredictor name="testXmlExample" />
</RegressionTable>
</RegressionModel>
</PMML>

Let's analyse line by line the following small example program:

#include <stdio.h>    // to get "printf" function
#include <stdlib.h>   // to get "free" function
#include "xmlParser.h"

int main(int argc, char **argv)
{
  // this opens and parses the XML file:
  XMLNode xMainNode=XMLNode::openFileHelper("PMMLModel.xml","PMML");
  // this prints "<Condor>":
  XMLNode xNode=xMainNode.getChildNode("Header");
  printf("Application Name is: '%s'\n", xNode.getChildNode("Application").getAttribute("name"));
  // this prints "Hello World!":
  printf("Text inside Header tag is :'%s'\n", xNode.getText());
  // this gets the number of "NumericPredictor" tags:
  xNode=xMainNode.getChildNode("RegressionModel").getChildNode("RegressionTable");
  int n=xNode.nChildNode("NumericPredictor");
  // this prints the "coefficient" value for all the "NumericPredictor" tags:
  for (int i=0; i<n; i++)
    printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",i).getAttribute("coefficient")));
  // this prints a formatted output based on the content of the first "Extension" tag of the XML file:
  char *t=xMainNode.getChildNode("Extension").createXMLString(true);
  printf("%s\n",t);
  free(t);
  return 0;
}

To manipulate the data contained inside the XML file, the first operation is to get an instance of the class XMLNode that is representing the XML file in memory. You can use:

XMLNode xMainNode=XMLNode::openFileHelper("PMMLModel.xml","PMML");
or, if you use the UNICODE windows version of the library:
XMLNode xMainNode=XMLNode::openFileHelper("PMMLModel.xml",_T("PMML"));
or, if the XML document is already in a memory buffer pointed to by the variable "char *xmlDoc":
XMLNode xMainNode=XMLNode::parseString(xmlDoc,"PMML");
This will create an object called xMainNode that represents the first tag named PMML found inside the XML document. This object is the top of the tree structure representing the XML file in memory. The following command creates a new object called xNode that represents the "Header" tag inside the "PMML" tag.

XMLNode xNode=xMainNode.getChildNode("Header");
The following command prints on the screen "<Condor>" (note that the "&lt;" character entity has been replaced by "<"):
printf("Application Name is: '%s'\n", xNode.getChildNode("Application").getAttribute("name"));
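To make the entity replacement concrete, here is a minimal sketch of how the five predefined XML character entities could be decoded in a single pass. The function name decodeEntities is hypothetical and the code is not taken from the library; it only illustrates the transformation that turns "&lt;Condor>" into "<Condor>".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces the five predefined XML entities with the characters they stand for.
// Unknown or unterminated entities are copied through unchanged.
std::string decodeEntities(const std::string& in) {
    static const struct { const char* entity; char ch; } table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&quot;", '"'}, {"&apos;", '\''}
    };
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        bool replaced = false;
        if (in[i] == '&') {
            for (const auto& e : table) {
                const std::string ent(e.entity);
                if (in.compare(i, ent.size(), ent) == 0) {
                    out += e.ch;       // emit the decoded character
                    i += ent.size();   // skip the whole entity
                    replaced = true;
                    break;
                }
            }
        }
        if (!replaced) out += in[i++];  // ordinary character: copy as-is
    }
    return out;
}
```

Because the input is scanned only once, already-decoded output is never decoded a second time: "&amp;lt;" becomes the literal text "&lt;", not "<".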
The following command prints on the screen "Hello World!":
printf("Text inside Header tag is :'%s'\n", xNode.getText());
Let's assume you want to "go to" the tag named "RegressionTable":

xNode=xMainNode.getChildNode("RegressionModel").getChildNode("RegressionTable");

Note that the previous value of the object named xNode has been "garbage collected" so that no memory leak occurs. If you want to know how many tags named "NumericPredictor" are contained inside the tag named "RegressionTable":

int n=xNode.nChildNode("NumericPredictor");

The variable n now contains the value 3. If you want to print the value of the coefficient attribute for all the NumericPredictor tags:

for (int i=0; i<n; i++)
  printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",i).getAttribute("coefficient")));
Or equivalently, but faster at runtime:
int iterator=0;
for (int i=0; i<n; i++)
  printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",&iterator).getAttribute("coefficient")));
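Why can the iterator version be faster? The following sketch shows one plausible explanation, assuming the children are stored in a flat list that is searched linearly. It is not the library's code: the Children type and the functions findNth and findNext are hypothetical stand-ins. An index-based lookup has to rescan the list from the first child on every call, while an iterator-based lookup resumes where the previous call stopped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a node's child list: just the tag names, in order.
using Children = std::vector<std::string>;

// Index-style lookup: returns the position of the i-th child called `name`,
// or -1 if there is none. Every call rescans from the first child.
int findNth(const Children& c, const std::string& name, int i) {
    for (std::size_t pos = 0; pos < c.size(); ++pos)
        if (c[pos] == name && i-- == 0) return static_cast<int>(pos);
    return -1;
}

// Iterator-style lookup: starts at *iterator, returns the position of the next
// child called `name` (or -1), and advances *iterator past the match.
int findNext(const Children& c, const std::string& name, int* iterator) {
    for (std::size_t pos = static_cast<std::size_t>(*iterator); pos < c.size(); ++pos) {
        if (c[pos] == name) {
            *iterator = static_cast<int>(pos) + 1;
            return static_cast<int>(pos);
        }
    }
    *iterator = static_cast<int>(c.size());  // nothing left to scan
    return -1;
}
```

Under that assumption, visiting all n matching children costs on the order of n*n comparisons with findNth but only a single pass over the list with findNext.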

If you want to generate and print on the screen the following XML formatted text:

<Extension name="keys">
  <Key name="urn" />
</Extension>

You can use:

char *t=xMainNode.getChildNode("Extension").createXMLString(true);
printf("%s\n",t);
free(t);

Note that you must release this memory yourself (by calling "free(t);"): only the XMLNode objects and their contents are "garbage collected". The parameter true passed to the function createXMLString means that we want formatted output.
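If you would rather not call free by hand, one option is to wrap the returned buffer in a std::unique_ptr with a deleter that calls free. This is only a sketch and not part of the library: FreeDeleter, CStringPtr, adopt and makeMallocedString are hypothetical names, and makeMallocedString merely stands in for any function that returns a malloc-allocated C string, so that the example compiles without xmlParser.h.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter that releases a buffer with free(), matching malloc-style allocation.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};
using CStringPtr = std::unique_ptr<char, FreeDeleter>;

// Hypothetical stand-in for a function returning a malloc'ed C string.
char* makeMallocedString(const char* text) {
    char* buf = static_cast<char*>(std::malloc(std::strlen(text) + 1));
    if (buf) std::strcpy(buf, text);
    return buf;
}

// Takes ownership of a malloc'ed buffer; free() runs automatically when the
// returned owner goes out of scope.
CStringPtr adopt(char* raw) { return CStringPtr(raw); }
```

With the non-UNICODE build of the library, and assuming createXMLString returns a char* meant to be released with free as described above, the same wrapper could be used as: CStringPtr t = adopt(xMainNode.getChildNode("Extension").createXMLString(true)); followed by printf("%s\n", t.get());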


The XML Parser library contains many other small useful methods that are not described here (the zip file contains some additional examples that explain other functionalities, and complete Doxygen documentation for the XMLParser). These methods allow you to:

That's all folks! With this basic knowledge, you should be able to easily retrieve any data from any XML file!