The following example reads a file with MARC records and writes MARCXML records in UTF-8 encoding to the console:
    InputStream input = new FileInputStream("input.mrc");
    MarcReader reader = new MarcStreamReader(input);
    // The second argument enables indentation of the XML output.
    MarcWriter writer = new MarcXmlWriter(System.out, true);
    while (reader.hasNext()) {
        Record record = reader.next();
        writer.write(record);
    }
    writer.close();
To perform a character conversion such as MARC-8 to UCS/Unicode, register a CharConverter:

    writer.setConverter(new AnselToUnicode());
In addition, you can perform Unicode normalization, which the MARC-8 to UCS/Unicode converter does not do by itself. With Unicode normalization, text is transformed into the canonical composed form. For example, "a" followed by a combining acute accent and then "bc" is normalized to "ábc", where "á" is a single precomposed character. To perform normalization, set Unicode normalization to true:

    writer.setUnicodeNormalization(true);
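The effect of canonical composition can be illustrated with the JDK's own java.text.Normalizer. This is only a sketch of the concept: the class name NfcExample and the helper compose are hypothetical, and nothing here is meant to describe how this writer performs normalization internally.

```java
import java.text.Normalizer;

public class NfcExample {

    /** Returns the canonical composed form (NFC) of the given text. */
    public static String compose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "a" followed by U+0301 (combining acute accent), then "bc".
        String decomposed = "a\u0301bc";
        String composed = compose(decomposed);

        // The base letter and the combining mark collapse into U+00E1.
        System.out.println(decomposed.length());          // 4
        System.out.println(composed.length());            // 3
        System.out.println(composed.equals("\u00E1bc"));  // true
    }
}
```

Text that is already in composed form passes through unchanged, so applying the normalization twice gives the same result as applying it once.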
Please note that converting normalized Unicode back to MARC-8 encoding using {@link org.marc4j.converter.impl.UnicodeToAnsel} is not guaranteed to work.
This class provides only very basic formatting options. For more advanced options, create an instance of this class with a {@link javax.xml.transform.sax.SAXResult} containing a {@link org.xml.sax.ContentHandler} derived from a dedicated XML serializer.
The following example uses org.apache.xml.serialize.XMLSerializer to write MARC records to XML using MARC-8 to UCS/Unicode conversion and Unicode normalization:

    InputStream input = new FileInputStream("input.mrc");
    MarcReader reader = new MarcStreamReader(input);

    // Configure the serializer: XML method, UTF-8 encoding, indentation on.
    OutputFormat format = new OutputFormat("xml", "UTF-8", true);
    OutputStream out = new FileOutputStream("output.xml");
    XMLSerializer serializer = new XMLSerializer(out, format);
    Result result = new SAXResult(serializer.asContentHandler());

    MarcXmlWriter writer = new MarcXmlWriter(result);
    writer.setConverter(new AnselToUnicode());
    writer.setUnicodeNormalization(true);
    while (reader.hasNext()) {
        Record record = reader.next();
        writer.write(record);
    }
    writer.close();
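Where the Xerces serializer is not available, a ContentHandler for the SAXResult can also be obtained from the JDK's SAXTransformerFactory. The sketch below assumes only the JDK. The class name JdkSerializerSketch, the helpers newSerializer and demo, and the example element are hypothetical, and the SAX events are fired by hand to stand in for what a SAXResult consumer would send.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class JdkSerializerSketch {

    /** Creates an identity handler that serializes incoming SAX events to the writer. */
    public static TransformerHandler newSerializer(Writer out) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
            handler.setResult(new StreamResult(out));
            return handler;
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sends a few hand-written SAX events through the serializer and returns the XML. */
    public static String demo() {
        StringWriter out = new StringWriter();
        // A TransformerHandler is a ContentHandler, so it can back a SAXResult.
        SAXResult result = new SAXResult(newSerializer(out));
        ContentHandler target = result.getHandler();
        try {
            // Hypothetical events standing in for the output of a SAX producer.
            target.startDocument();
            target.startElement("", "example", "example", new AttributesImpl());
            char[] text = "hello".toCharArray();
            target.characters(text, 0, text.length);
            target.endElement("", "example", "example");
            target.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With marc4j on the classpath, the SAXResult built in demo could be handed to the MarcXmlWriter constructor in place of the Xerces-based one; further output properties can be set on the handler's Transformer before writing starts.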
You can post-process the result using a Source object pointing to a stylesheet resource and a Result object to hold the transformation result tree. The example below converts MARC to MARCXML and transforms the result tree to MODS using the stylesheet provided by the Library of Congress:

    String stylesheetUrl = "http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3.xsl";
    Source stylesheet = new StreamSource(stylesheetUrl);
    Result result = new StreamResult(System.out);

    InputStream input = new FileInputStream("input.mrc");
    MarcReader reader = new MarcStreamReader(input);
    // The MARCXML output is passed through the stylesheet before reaching the result.
    MarcXmlWriter writer = new MarcXmlWriter(result, stylesheet);
    writer.setConverter(new AnselToUnicode());
    while (reader.hasNext()) {
        Record record = reader.next();
        writer.write(record);
    }
    writer.close();
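The Source and Result roles in the example above follow the standard javax.xml.transform pattern, which can be tried in isolation with only the JDK. In this sketch the class name StylesheetSketch, the helper transform, and the tiny inline stylesheet are hypothetical stand-ins for the MODS stylesheet, and the code does not claim to mirror how this class applies a stylesheet internally.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StylesheetSketch {

    // Hypothetical stand-in stylesheet: rewrites <in> as <out>, keeping its text.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/in'>"
        + "<out><xsl:value-of select='.'/></out>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the inline stylesheet to the given XML text and returns the result. */
    public static String transform(String xml) {
        try {
            Source stylesheet = new StreamSource(new StringReader(XSLT));
            Source input = new StreamSource(new StringReader(xml));
            StringWriter buffer = new StringWriter();
            Result result = new StreamResult(buffer);

            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer(stylesheet);
            transformer.transform(input, result);
            return buffer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<in>hello</in>"));
    }
}
```

Swapping the StreamResult for another Result implementation changes where the transformed tree ends up without touching the stylesheet handling.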
It is also possible to write the result into a DOM Node:
    InputStream input = new FileInputStream("input.mrc");
    MarcReader reader = new MarcStreamReader(input);

    DOMResult result = new DOMResult();
    MarcXmlWriter writer = new MarcXmlWriter(result);
    writer.setConverter(new AnselToUnicode());
    while (reader.hasNext()) {
        Record record = reader.next();
        writer.write(record);
    }
    writer.close();

    // The DOM tree is available once the writer has been closed.
    Document doc = (Document) result.getNode();

@author Bas Peters
@version $Revision: 1.9 $