Christoph Hartmann on April 1st, 2008

The Groovy documentation gives the first hints about how to use the Groovy XML parser. This post describes how to read an XML file into Groovy class instances. Advanced XML programs rely on an XSD as a formal description so that they can check whether an input document is valid. XML processing becomes more difficult once XML namespaces are involved, but the following example shows how easy it still is with Groovy.

This demo illustrates how to read an XML file into Groovy objects. As the example domain I use a student lecture management system. It stores students, their lecture assignments with grades, and details about the lectures themselves. The students are stored within a studenten node, which contains one student node per student. A student has a name, a birthday, a matriculation number and a list of assignments.

The second major node is the lehrveranstaltungen node, which contains the details about the lectures. The student assignments and the lectures are linked via an identifier: the lehrveranstaltung attribute of a belegung refers to the id attribute of a lehrveranstaltung.

The following XML file is used as input:

<?xml version="1.0" encoding="UTF-8"?>
<studentenverwaltung
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="studentenverwaltung.xsd"
xmlns="http://www.hpi.uni-potsdam.de/studentenverwaltung">
<studenten>
<student geburtstag="19821221" matrikelnummer="123456">
<vorname>John</vorname>
<nachname>Doe</nachname>
<belegungen>
<belegung lehrveranstaltung="_12" note="1.0"/>
</belegungen>
</student>
<student geburtstag="19821221" matrikelnummer="654321">
<vorname>Alice</vorname>
<nachname>Doe</nachname>
<belegungen>
<belegung lehrveranstaltung="_12" note="1.0"/>
</belegungen>
</student>
</studenten>
<lehrveranstaltungen>
<lehrveranstaltung id="_12" belegungspunkte="6">
<titel>Datenorientiertes XML</titel>
<dozent>Martin von Löwis</dozent>
<beschreibung>
Das ist eine ganz nützliche Veranstaltung.
</beschreibung>
</lehrveranstaltung>
</lehrveranstaltungen>
</studentenverwaltung>

This whole document with two students will be read into 3 classes:

  • Student,
  • Belegung and
  • Lehrveranstaltung

These classes are quite small compared to Java classes:

class Student{
      String geburtstag
      String matrikelnummer
      String vorname
      String nachname
      def belegungen = []
 
      String toString() {
          "Student: $vorname $nachname \nMatrikelnummer: $matrikelnummer \nBelegungen: $belegungen"
      }
}
 
class Belegung {
      Lehrveranstaltung lehrveranstaltung
      String note
 
      String toString() {
          "\nLehrveranstaltung: $lehrveranstaltung  Note: $note"
      }
}
 
class Lehrveranstaltung {
      String  id
      int     belegungspunkte
      String  titel
      String  dozent
      String  beschreibung
 
      String toString(){
    	  "$titel ($dozent)"
      }
}

Now the XML processing starts. First we have to create the namespace object and parse the XML file into a variable.

// create the namespace that is used to build qualified element names
def ns = new groovy.xml.Namespace("http://www.hpi.uni-potsdam.de/studentenverwaltung", 'ns')
// parse the XML file into a tree of nodes
def root = new XmlParser().parse(new File('studies.xml'))
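
To see what the namespace object does, here is a minimal sketch. It rests on a few assumptions: it parses a shortened inline document with parseText instead of the studies.xml file, and it imports groovy.xml.XmlParser, which is where the class lives in Groovy 3 and newer (on older versions the import can be dropped because XmlParser is found in groovy.util).

```groovy
import groovy.xml.Namespace
import groovy.xml.XmlParser

def uri = 'http://www.hpi.uni-potsdam.de/studentenverwaltung'
def ns = new Namespace(uri, 'ns')

// Shortened stand-in for studies.xml (assumption for this sketch)
def xml = """<studentenverwaltung xmlns="$uri">
  <studenten>
    <student matrikelnummer="123456"><vorname>John</vorname></student>
  </studenten>
</studentenverwaltung>"""

def root = new XmlParser().parseText(xml)

// Property access on the namespace yields a qualified name (QName)
def qname = ns.studenten
println qname.namespaceURI
println qname.localPart

// The parser is namespace aware, so element names are QNames too
println root.name() == ns.studentenverwaltung
println root[ns.studenten].size()
```

The prefix 'ns' passed to the Namespace constructor does not have to match the prefix used in the document, because the lookup compares the namespace URI and the local name.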

After parsing the XML document it is easy to iterate over the nodes with Groovy. A syntax like

root[ns.studenten][ns.student].each { xmlstudent ->
...
}

reads all student nodes within the studenten node. The each method iterates over the result list and provides the xmlstudent variable, which gives access to the current student inside the code block. Within this block it is easy to read the student values into a new object.

// read core data
def student = new Student(
    vorname:        xmlstudent[ns.vorname].text(),
    nachname:       xmlstudent[ns.nachname].text(),
    matrikelnummer: xmlstudent.'@matrikelnummer',
    geburtstag:     xmlstudent.'@geburtstag'
)

The constructor call differs from Java. Groovy lets you initialize properties by passing named arguments to the default constructor. In addition, the expression xmlstudent.'@matrikelnummer' reads the XML attribute matrikelnummer from the student node.
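
The iteration, the named-argument constructor and the attribute access can be tried together in one small script. This is only a sketch under assumptions: the Student class is cut down to two properties, the XML is a shortened inline copy of the input file, and the groovy.xml.XmlParser import assumes Groovy 3 or newer.

```groovy
import groovy.xml.Namespace
import groovy.xml.XmlParser

// Trimmed-down stand-in for the Student class of this post
class Student {
    String vorname
    String matrikelnummer
}

def uri = 'http://www.hpi.uni-potsdam.de/studentenverwaltung'
def ns = new Namespace(uri, 'ns')

def root = new XmlParser().parseText("""<studentenverwaltung xmlns="$uri">
  <studenten>
    <student matrikelnummer="123456"><vorname>John</vorname></student>
    <student matrikelnummer="654321"><vorname>Alice</vorname></student>
  </studenten>
</studentenverwaltung>""")

def students = []
root[ns.studenten][ns.student].each { xmlstudent ->
    // Named arguments are mapped onto the properties of the new object
    students << new Student(
        vorname:        xmlstudent[ns.vorname].text(),
        matrikelnummer: xmlstudent.'@matrikelnummer'
    )
}

students.each { println "${it.vorname}: ${it.matrikelnummer}" }
```

Child elements need the qualified name from ns because they live in the document's default namespace. The attributes carry no namespace, so the plain '@name' form is enough.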

The second task is to read in all lecture assignments of the student. This task is more complex because the lectures are stored in their own node. For that reason we have to iterate over all assignments stored in the belegung nodes and use the lecture link of each assignment to find the right lecture.

// read belegungen
xmlstudent[ns.belegungen][ns.belegung].each { belegung ->
...
}

To include the lecture details we now have to look up the matching lecture node. The link is taken from the lehrveranstaltung attribute of the current assignment. The lookup is written like this:

// read lecture
def xmlLehr = root[ns.lehrveranstaltungen][ns.lehrveranstaltung].find {
    it.'@id' == belegung.'@lehrveranstaltung'
}
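
The lookup can be checked in isolation with the following sketch. The second lecture with the id _13 and its title are made up for illustration so that find actually has to choose, and the XML is again a shortened inline copy parsed with parseText (groovy.xml.XmlParser import assumes Groovy 3 or newer).

```groovy
import groovy.xml.Namespace
import groovy.xml.XmlParser

def uri = 'http://www.hpi.uni-potsdam.de/studentenverwaltung'
def ns = new Namespace(uri, 'ns')

// The lecture _13 is a hypothetical entry added for this sketch
def root = new XmlParser().parseText("""<studentenverwaltung xmlns="$uri">
  <studenten>
    <student matrikelnummer="123456">
      <belegungen><belegung lehrveranstaltung="_12" note="1.0"/></belegungen>
    </student>
  </studenten>
  <lehrveranstaltungen>
    <lehrveranstaltung id="_13"><titel>Another lecture</titel></lehrveranstaltung>
    <lehrveranstaltung id="_12"><titel>Datenorientiertes XML</titel></lehrveranstaltung>
  </lehrveranstaltungen>
</studentenverwaltung>""")

// Take the first assignment of the first student
def belegung = root[ns.studenten][ns.student][ns.belegungen][ns.belegung][0]

// find returns the first node whose id matches the link, or null
def xmlLehr = root[ns.lehrveranstaltungen][ns.lehrveranstaltung].find {
    it.'@id' == belegung.'@lehrveranstaltung'
}
println xmlLehr[ns.titel].text()

// A link without a matching lecture yields null
def missing = root[ns.lehrveranstaltungen][ns.lehrveranstaltung].find {
    it.'@id' == '_99'
}
println missing
```

Because find returns null for a dangling link, a real program would probably check the result before reading the lecture values.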

With this lecture node it is possible to fill the student object with the student's assignments.

// read assignment
student.belegungen += new Belegung(
    note: belegung.'@note',
    lehrveranstaltung: new Lehrveranstaltung(
        id:              xmlLehr.'@id',
        // attribute values are strings, so convert explicitly for the int property
        belegungspunkte: xmlLehr.'@belegungspunkte'.toInteger(),
        titel:           xmlLehr[ns.titel].text(),
        dozent:          xmlLehr[ns.dozent].text(),
        beschreibung:    xmlLehr[ns.beschreibung].text()
    )
)

These few lines of code are enough to read a namespaced XML document into Groovy objects. The iteration over all nodes and the simple attribute access keep the amount of code low. If you would like to try out the whole example, you are welcome to download the attached zip file, which contains the Eclipse project with all required files.
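
For readers who only want to see the steps working together, the following condensed sketch runs as a single script. It is not the code from the zip file: the classes are reduced to a few properties, the XML is a shortened inline copy of the input, parseText replaces the file access, and the groovy.xml.XmlParser import assumes Groovy 3 or newer.

```groovy
import groovy.xml.Namespace
import groovy.xml.XmlParser

// Reduced versions of the three classes of this post
class Lehrveranstaltung {
    String id
    int belegungspunkte
    String titel
    String toString() { "$titel ($belegungspunkte)" }
}

class Belegung {
    Lehrveranstaltung lehrveranstaltung
    String note
    String toString() { "$lehrveranstaltung, Note: $note" }
}

class Student {
    String vorname
    String matrikelnummer
    def belegungen = []
    String toString() { "$vorname ($matrikelnummer): $belegungen" }
}

def uri = 'http://www.hpi.uni-potsdam.de/studentenverwaltung'
def ns = new Namespace(uri, 'ns')

def root = new XmlParser().parseText("""<studentenverwaltung xmlns="$uri">
  <studenten>
    <student matrikelnummer="123456">
      <vorname>John</vorname>
      <belegungen><belegung lehrveranstaltung="_12" note="1.0"/></belegungen>
    </student>
    <student matrikelnummer="654321">
      <vorname>Alice</vorname>
      <belegungen><belegung lehrveranstaltung="_12" note="1.0"/></belegungen>
    </student>
  </studenten>
  <lehrveranstaltungen>
    <lehrveranstaltung id="_12" belegungspunkte="6">
      <titel>Datenorientiertes XML</titel>
    </lehrveranstaltung>
  </lehrveranstaltungen>
</studentenverwaltung>""")

def students = root[ns.studenten][ns.student].collect { xmlstudent ->
    // read core data
    def student = new Student(
        vorname:        xmlstudent[ns.vorname].text(),
        matrikelnummer: xmlstudent.'@matrikelnummer'
    )
    // read assignments and resolve the linked lecture for each of them
    xmlstudent[ns.belegungen][ns.belegung].each { belegung ->
        def xmlLehr = root[ns.lehrveranstaltungen][ns.lehrveranstaltung].find {
            it.'@id' == belegung.'@lehrveranstaltung'
        }
        student.belegungen << new Belegung(
            note: belegung.'@note',
            lehrveranstaltung: new Lehrveranstaltung(
                id:              xmlLehr.'@id',
                belegungspunkte: xmlLehr.'@belegungspunkte'.toInteger(),
                titel:           xmlLehr[ns.titel].text()
            )
        )
    }
    student
}

students.each { println it }
```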

Download: Code Sample
