The Ultimate C++ XML Parser
Small, simple, cross-platform, scalable and fast C++ XML Parser

In 2003, I started working on XML technology and produced my first XMLParser library. This old library is now used in thousands of applications all around the world (and also in space! 😲). The main objective of the old XMLParser library was to let me easily manipulate input/output configuration files and some small XML data files. The old library was limited to relatively small data files (typically smaller than 10MB) because it's a pure DOM-style parser 😒.

During the next 10 years, I received many emails from coders using the old XMLParser library to parse larger and larger XML files (some individuals used it to parse 300MB XML files!). Although the old library managed to parse these larger files, it consumed a very large amount of RAM (sometimes up to 10GB) and of CPU resources. Furthermore, I am now manipulating (inside Anatella) terabyte-size XML files. In May 2013, I decided that it was time for an "upgrade"! 😉 ...and the Ultimate XML Parser was born! 😊

The Ultimate XML Parser is composed of only 2 files: a .cpp file and a .h file.
The total size is 220 KB.

The Ultimate XML Parser library includes two parsers:

An ultra fast XML Pull Parser (named "UXMLPullParser") that requires very little memory to run. The Pull Parser is ultra fast but it does not offer the flexibility and the user-friendliness of a full-fledged DOM parser.
A very fast XML DOM parser (named "UXMLDomParser"), built on top of the Pull Parser, that provides more comfort when manipulating XML elements. It works recursively, building a node tree that mirrors the nesting of the elements in the XML document.
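
To make the difference between the two styles concrete, here is a minimal sketch. It is NOT the code of the library: ToyPullParser, ToyNode, countStartTags and parseToyDom are hypothetical names invented for this illustration, and the toy only understands opening tags, closing tags and text (no attributes, comments or entities). It shows a pull parser that hands out one event at a time, and a DOM builder that sits on top of it and recurses once per nested element.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy subset only: <tag>, </tag> and text. Hypothetical names, not the library API.
enum class ToyEvent { StartTag, EndTag, Text, EndOfInput };

// Pull style: the caller asks for the next event whenever it is ready for it.
class ToyPullParser {
public:
    explicit ToyPullParser(std::string xml) : xml_(std::move(xml)) {}

    ToyEvent next() {
        if (pos_ >= xml_.size()) return ToyEvent::EndOfInput;
        if (xml_[pos_] == '<') {
            const bool closing = pos_ + 1 < xml_.size() && xml_[pos_ + 1] == '/';
            const std::size_t begin = pos_ + (closing ? 2 : 1);
            std::size_t end = xml_.find('>', begin);
            if (end == std::string::npos) end = xml_.size();  // tolerate truncated input
            value_ = xml_.substr(begin, end - begin);
            pos_ = (end < xml_.size()) ? end + 1 : end;
            return closing ? ToyEvent::EndTag : ToyEvent::StartTag;
        }
        std::size_t end = xml_.find('<', pos_);
        if (end == std::string::npos) end = xml_.size();
        value_ = xml_.substr(pos_, end - pos_);
        pos_ = end;
        return ToyEvent::Text;
    }

    // Tag name or text of the event returned by the last call to next().
    const std::string& value() const { return value_; }

private:
    std::string xml_;
    std::size_t pos_ = 0;
    std::string value_;
};

// Pull-only usage: nothing is stored except a counter.
int countStartTags(const std::string& xml) {
    ToyPullParser p(xml);
    int n = 0;
    for (ToyEvent e = p.next(); e != ToyEvent::EndOfInput; e = p.next())
        if (e == ToyEvent::StartTag) ++n;
    return n;
}

// DOM style: the whole tree is kept in memory.
struct ToyNode {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<ToyNode>> children;
};

// Called just after the start tag of `node` was read; recurses for each
// nested start tag and returns at the matching end tag (or end of input).
void buildChildren(ToyPullParser& p, ToyNode& node) {
    for (;;) {
        switch (p.next()) {
        case ToyEvent::StartTag: {
            std::unique_ptr<ToyNode> child(new ToyNode);
            child->name = p.value();
            buildChildren(p, *child);
            node.children.push_back(std::move(child));
            break;
        }
        case ToyEvent::Text:
            node.text += p.value();
            break;
        case ToyEvent::EndTag:
        case ToyEvent::EndOfInput:
            return;
        }
    }
}

std::unique_ptr<ToyNode> parseToyDom(const std::string& xml) {
    ToyPullParser p(xml);
    std::unique_ptr<ToyNode> root(new ToyNode);
    if (p.next() == ToyEvent::StartTag) {
        root->name = p.value();
        buildChildren(p, *root);
    }
    return root;
}

// Small self-check running both styles on the same input.
void toyDemo() {
    const std::string xml = "<a><b>hi</b><c></c></a>";
    assert(countStartTags(xml) == 3);
    const std::unique_ptr<ToyNode> root = parseToyDom(xml);
    assert(root->name == "a" && root->children.size() == 2);
}
```

The pull loop never stores more than the current event, while the DOM builder keeps every node: that is the trade-off between speed and memory on one side and comfort on the other.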

The Ultimate XML DOM Parser and the Ultimate XML Pull Parser can both process terabyte-size XML files in a few hours on commodity hardware with very low memory consumption (i.e. less than a few megabytes).
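
How can memory stay flat while the file grows? The following sketch shows the general idea only, under stated assumptions: countOpeningTags and countOpeningTagsInString are hypothetical helpers written for this page, not functions of the library, and the toy ignores comments, CDATA sections and quoted attribute values. The input is read through a fixed-size buffer, and the only thing carried from one chunk to the next is a one-bit state.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Counts opening tags ("<x", but not "</x", "<?x" or "<!x") while holding at
// most `chunkSize` bytes of the input in memory. Hypothetical helper.
std::size_t countOpeningTags(std::istream& in, std::size_t chunkSize = 4096) {
    if (chunkSize == 0) chunkSize = 1;     // a zero-size buffer would never advance
    std::string chunk(chunkSize, '\0');
    bool afterLessThan = false;            // state carried across chunk boundaries
    std::size_t count = 0;
    while (in) {
        in.read(&chunk[0], static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();   // may be short on the last chunk
        assert(got >= 0 && static_cast<std::size_t>(got) <= chunk.size());
        for (std::streamsize i = 0; i < got; ++i) {
            const char c = chunk[static_cast<std::size_t>(i)];
            if (afterLessThan && c != '/' && c != '?' && c != '!') ++count;
            afterLessThan = (c == '<');
        }
    }
    return count;
}

// Convenience wrapper for in-memory tests.
std::size_t countOpeningTagsInString(const std::string& xml, std::size_t chunkSize) {
    std::istringstream in(xml);
    return countOpeningTags(in, chunkSize);
}
```

The result does not depend on the chunk size, even when a "<" falls on the last byte of a chunk, because the state variable survives the boundary. Any std::istream works as the source (a file, a socket wrapper, a decompression stream).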

The objectives of the Ultimate XML Parser are the same as the old XMLParser library:

User-friendliness (i.e. it should be easy to use).
Small footprint & no dependencies (i.e. this must remain a small library, easy to include & compile everywhere, on any platform).

And, in addition, it provides even more speed & scalability.

For the Ultimate XML Parser, I kept all the nice functionalities from the old XML Parser that made it so popular and I added the following:

The Ultimate XML Pull Parser has one of the lowest memory consumptions among all XML Pull parsers.
The Ultimate XML DOM Parser has the lowest memory consumption among all XML DOM parsers.
The Ultimate XML Pull Parser is one of the fastest XML Pull parsers.
The Ultimate XML DOM Parser is the fastest XML DOM parser.
The Ultimate XML DOM Parser is the only DOM parser able to work on UNLIMITED file size.
The 2 Ultimate XML Parsers are able to handle nearly any character encoding.
The 2 Ultimate XML Parsers fully support "char*" mode and "wchar_t*" mode.
The 2 Ultimate XML Parsers are able to handle streamed data. This has several advantages:
1. You are no longer limited by the size of your RAM.
2. Memory consumption is very low and (more or less) constant.
3. You can very easily process streamed data (such as data coming from an HTTP connection or from the decompression of a ZIP file).
The 2 Ultimate XML Parsers are 100% thread-safe (more precisely: they are reentrant).
The Ultimate XML Pull Parser is an "in-place" parser (it does not copy internally any strings, so that it's as fast as possible).
The Ultimate XML Pull Parser is one of the easiest-to-use XML Pull parsers (because it always returns zero-terminated char* or wchar_t*, unlike other "in-place" parsers).
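
What does "in-place" with zero-terminated strings look like in practice? The sketch below illustrates the general technique only; splitAttributesInPlace is a hypothetical function written for this page and says nothing about how the library is implemented internally. Instead of copying each name and value into a new string, the parser overwrites the delimiter that follows it with a '\0' and returns pointers into the caller's own buffer.

```cpp
#include <cassert>
#include <cstring>
#include <utility>
#include <vector>

// Splits an attribute list such as   name="urn" id="7"   in place.
// A '\0' is written over each '=' and each closing quote, so every returned
// pointer is an ordinary zero-terminated C string inside the caller's buffer.
// No string is copied. Hypothetical helper; stops at the first malformed pair.
std::vector<std::pair<const char*, const char*>> splitAttributesInPlace(char* attrs) {
    assert(attrs != nullptr);
    std::vector<std::pair<const char*, const char*>> out;
    char* p = attrs;
    for (;;) {
        while (*p == ' ') ++p;                  // skip separators
        if (*p == '\0') break;
        char* name = p;
        char* eq = std::strchr(p, '=');
        if (!eq || eq[1] != '"') break;         // malformed: stop
        char* value = eq + 2;
        char* close = std::strchr(value, '"');
        if (!close) break;                      // unterminated value: stop
        *eq = '\0';                             // terminates the name
        *close = '\0';                          // terminates the value
        out.emplace_back(name, value);
        p = close + 1;
    }
    return out;
}
```

The price of this approach is that the input buffer is modified and must outlive the returned pointers. The benefit is that the results can be passed directly to printf, strcmp or atof, with no length bookkeeping.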

The Ultimate XML Dom Parser supports "hot starts" and is able to parse a sub-section of the original XML file without doing any memory allocation at all. The "hot start" functionality is unique and is very important because it allows us to use a very flexible DOM-style Parser on UNLIMITED XML file size (see example 7 inside the documentation) using very little RAM.
The Ultimate XML Parser has an extensive (doxygen) documentation.
The Ultimate XML Dom Parser is a good replacement for the old XMLParser library (The UXMLNode class from the Ultimate XML Dom Parser is a direct replacement for the XMLNode class from the old XMLParser library).
The Ultimate XML Parser is easy to customize: The code is concise, commented and written in a plain and simple way. Thus, if you really need to change something (but I doubt it), it's easy.

To the best of my knowledge, there exists no other "non-validating C++ XML parser" that is as simple and as powerful. 😄 This is especially true if you need to parse large XML documents: In such a case, there is no parser that comes even close to the Ultimate XML Parser presented here.

I selected the name "Ultimate" for the XML Parser because I cannot see how it would be possible to improve on the XML Parser Library presented here 😜. Of course, you can always add features such as "XML Validation", etc. but it will only produce a slower, more "bloated" library. It's really the "Ultimate XML parser" 😏 and if you are a professional developer serious about your work, you should use the "Ultimate XML parser" and no other parser 🙏.

License

The Ultimate XML Parser is distributed under the Aladdin Free Public License (AFPL).

The old XMLParser library is completely free and will remain free forever.
The Ultimate XML parser is also completely free in these situations:

You only need the Aladdin Free Public License (AFPL).
You need another license (e.g. a BSD license or a MIT license) but you'll use the Ultimate XML Parser inside:
- a computer video game (or anything related to video games).
- software for a charity organization.

If you are not in one of the situations described above, you can still buy, for a small fee, a BSD license (or MIT license) to use the XML Parser inside all your projects: simply contact me to request your license.

Download

If you like this library, you can create a link to this page from your website (use this URL: http://www.applied-mathematics.net/tools/UXMLParser.html). If you want to help other people produce better software using XML technology, you can increase the visibility of this library by adding a link to this page (so that its Google ranking increases!).

If you like this library, please add a message in the guestbook!

To obtain the library, simply contact me, and I will send you the Ultimate XML Parser directly, the same day. You will receive a zip file by e-mail. Inside the zip file, you will find 5 examples:

ansi (char*) unix/solaris project example (makefile based)
ansi (char*) windows project example (for Visual Studio .NET)
ansi (char*) windows .dll project with a small test project to check the generated .dll
wide char (wchar_t*) unix/solaris project example (makefile based)
wide char (wchar_t*) windows project example (for Visual Studio .NET)

Log

Version changes:

V3.01: May 19, 2013: initial version.
V3.02: May 24, 2013: Various bug fixes & improvements.
V3.03: May 24, 2013: Performed extensive testing on large documents and fixed some remaining small bugs.

A small tutorial

Let's assume that you want to parse the XML file "PMMLModel.xml" that contains:

<?xml version="1.0" encoding="ISO-8859-1"?>
<PMML version="3.0"
  xmlns="http://www.dmg.org/PMML-3-0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema_instance" >
  <Header copyright="Frank Vanden Berghen">
     Hello World!
     <Application name="&lt;Condor>" version="1.99beta" />
  </Header>
  <Extension name="keys"> <Key name="urn"> </Key> </Extension>
  <DataDictionary>
    <DataField name="persfam" optype="continuous" dataType="double">
       <Value value="9.900000e+001" property="missing" />
    </DataField>
    <DataField name="prov" optype="continuous" dataType="double" />
    <DataField name="urb" optype="continuous" dataType="double" />
    <DataField name="ses" optype="continuous" dataType="double" />
  </DataDictionary>
  <RegressionModel functionName="regression" modelType="linearRegression">
    <RegressionTable intercept="0.00796037">
      <NumericPredictor name="persfam" coefficient="-0.00275951" />
      <NumericPredictor name="prov" coefficient="0.000319433" />
      <NumericPredictor name="ses" coefficient="-0.000454307" />
      <NONNumericPredictor name="testXmlExample" />
    </RegressionTable>
  </RegressionModel>
</PMML>

Let's analyse line by line the following small example program:

#include <stdio.h>    // to get the "printf" function
#include <stdlib.h>   // to get the "atof" function
#include "xmlParser.h"

int main(int argc, char **argv)
{
  // This creates a new Ultimate XML DOM parser:
  UXMLDomParser uDom;

  // This opens and parses the XML file:
  UTCXMLNode xMainNode=uDom.openFileHelper("PMMLModel.xml","PMML");

  // This prints "<Condor>":
  UTCXMLNode xNode=xMainNode.getChildNode("Header");
  printf("Application Name is: '%s'\n", xNode.getChildNode("Application").getAttribute("name"));  

  // This prints "Hello World!":
  printf("Text inside Header tag is :'%s'\n", xNode.getText());

  // This gets the number of "NumericPredictor" tags:
  xNode=xMainNode.getChildNode("RegressionModel").getChildNode("RegressionTable");
  int n=xNode.nChildNode("NumericPredictor");

  // This prints the "coefficient" value for all the "NumericPredictor" tags:
  for (int i=0; i<n; i++)
    printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",i).getAttribute("coefficient")));

  // This creates a UXMLRenderer object and uses it to print a formatted XML string based on
  // the content of the first "Extension" tag of the XML file (more details below):
  printf("%s\n",UXMLRenderer().getString(xMainNode.getChildNode("Extension")));
  return 0;
}

To easily manipulate the data contained inside the XML file, the first operation is to create a UXMLDomParser object (in the above example, it's named "uDom") and use it to get an instance of the class UTCXMLNode that represents the XML file in memory. You can use:

UTCXMLNode xMainNode=uDom.openFileHelper("PMMLModel.xml","PMML");

or, if you use the Unicode Windows version of the library:

UTCXMLNode xMainNode=uDom.openFileHelper(L"PMMLModel.xml",L"PMML");

or, if the XML document is already in a memory buffer pointed to by the variable "char *xmlDoc":

UTCXMLNode xMainNode=uDom.parseString(xmlDoc,"PMML");

This will create an object called xMainNode that represents the first tag named PMML found inside the XML document. This object is the top of the tree structure representing the XML file in memory. The following command creates a new object called xNode that represents the "Header" tag inside the "PMML" tag.

UTCXMLNode xNode=xMainNode.getChildNode("Header");

The following command prints on the screen "<Condor>" (note that the "&lt;" character entity has been replaced by "<"):

printf("Application Name is: '%s'\n", xNode.getChildNode("Application").getAttribute("name"));
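
For readers wondering what this entity replacement involves, here is a minimal sketch. It is an illustration written for this page, not the routine used inside the library: decodePredefinedEntities is a hypothetical name, and the toy handles only the five entities predefined by XML (no numeric character references such as &#60;).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Replaces the five predefined XML entities by the character they stand for.
// Anything else, including an unknown entity or a lone '&', is copied unchanged.
std::string decodePredefinedEntities(const std::string& in) {
    static const struct { const char* entity; char ch; } table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'}, {"&quot;", '"'}, {"&apos;", '\''}};
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size();) {
        bool matched = false;
        if (in[i] == '&') {
            for (const auto& e : table) {
                const std::size_t len = std::strlen(e.entity);
                if (in.compare(i, len, e.entity) == 0) {
                    out += e.ch;     // emit the decoded character
                    i += len;        // skip the whole entity
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out += in[i++];
    }
    assert(out.size() <= in.size());   // decoding never makes the text longer
    return out;
}
```

Applied to the attribute value stored in the sample file, decodePredefinedEntities("&lt;Condor>") yields "<Condor>". Because the decoded text is never longer than the encoded one, this kind of replacement can also be done inside the original buffer.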

The following command prints on the screen "Hello World!":

printf("Text inside Header tag is :'%s'\n", xNode.getText());

Let's assume you want to "go to" the tag named "RegressionTable":

xNode=xMainNode.getChildNode("RegressionModel").getChildNode("RegressionTable");

Note that the previous value of the object named xNode has been "garbage collected" so that no memory leak occurs. If you want to know how many tags named "NumericPredictor" are contained inside the tag named "RegressionTable":

int n=xNode.nChildNode("NumericPredictor");

The variable n now contains the value 3. If you want to print the value of the coefficient attribute for all the NumericPredictor tags:

for (int i=0; i<n; i++)
  printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",i).getAttribute("coefficient")));

Or equivalently, but faster at runtime:

int iterator=0;
for (int i=0; i<n; i++)
  printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",&iterator).getAttribute("coefficient")));
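
Why would the iterator form be faster? This page does not describe the internals, so the following is only a plausible explanation, sketched under that assumption. ToyChildren, findByIndex and findNext are hypothetical names, not library functions. An index-based lookup has to rescan the siblings from the first one on every call, while an iterator-based lookup resumes where the previous call stopped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of the children of one node (assumed layout, for illustration only).
struct ToyChildren {
    std::vector<std::string> names;        // sibling tag names, in document order
    mutable std::size_t comparisons = 0;   // instrumentation for the demonstration

    // Position of the i-th child called `name`, or -1.
    // Always rescans from the first sibling.
    int findByIndex(const std::string& name, int i) const {
        for (std::size_t pos = 0; pos < names.size(); ++pos) {
            ++comparisons;
            if (names[pos] == name && i-- == 0) return static_cast<int>(pos);
        }
        return -1;
    }

    // Position of the next child called `name` at or after *iterator, or -1.
    // Advances *iterator so that the following call resumes after the match.
    int findNext(const std::string& name, int* iterator) const {
        assert(iterator != nullptr && *iterator >= 0);
        for (std::size_t pos = static_cast<std::size_t>(*iterator); pos < names.size(); ++pos) {
            ++comparisons;
            if (names[pos] == name) {
                *iterator = static_cast<int>(pos) + 1;
                return static_cast<int>(pos);
            }
        }
        *iterator = static_cast<int>(names.size());
        return -1;
    }
};
```

With the four children of the "RegressionTable" tag, fetching the three "NumericPredictor" entries costs 1+2+3 = 6 name comparisons by index but only 3 with the iterator. The gap grows quadratically with the number of siblings, which matters on large documents.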

If you want to generate and print on the screen the following XML formatted text:

<Extension name="keys">
  <Key name="urn" />
</Extension>

You can use:

UXMLRenderer uRenderer;
char *t=uRenderer.getString(xMainNode.getChildNode("Extension"),true);
printf("%s\n",t);

Note that you must NOT free the memory buffer containing the returned XML string yourself (You must NOT write any "free(t);"): The memory buffer containing the XML string is owned by the uRenderer object and it will be freed when the uRenderer object is destroyed (i.e. when it falls "out-of-scope"). The parameter true to the function getString() means that we want formatted output.
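
The ownership rule above is a common C++ pattern, and the sketch below shows it in isolation. ToyRenderer and renderCopy are hypothetical stand-ins written for this page; they do not reproduce the UXMLRenderer class, and the "valid until the next call" rule stated in the comments applies to the toy only. The practical consequence is the same, though: if you need the string after the renderer is gone, copy it first.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an object that owns the buffer it hands out.
class ToyRenderer {
public:
    // In this toy, the returned pointer stays valid until the next call to
    // getString() or until the object is destroyed. The caller never frees it.
    const char* getString(const std::string& tag, const std::string& text) {
        buffer_ = "<" + tag + ">" + text + "</" + tag + ">";
        return buffer_.c_str();
    }

private:
    std::string buffer_;   // released automatically by the destructor
};

// Safe way to keep the text longer than the renderer: copy it before the
// renderer goes out of scope.
std::string renderCopy(const std::string& tag, const std::string& text) {
    ToyRenderer r;
    const char* t = r.getString(tag, text);
    assert(t != nullptr);
    return std::string(t);   // the copy outlives `r`
}
```

Returning the raw pointer from renderCopy instead of a copy would leave the caller with a dangling pointer, because the buffer disappears together with the local renderer.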

The Ultimate XML Parser library contains many other small useful methods that are not described here (The zip file contains some additional examples to explain other functionalities and a complete Doxygen documentation about the UXMLParser). These methods allow you to:

navigate easily inside the structure of the XML document.
create, update & save your own XML structure of UTCXMLNode's.

That's all folks! With this basic knowledge, you should be able to easily retrieve any data from any XML file!