Search Engine Studio FAQ

How can I specify an exact list of URLs/documents to be indexed? / What's the format of the indexer's input XML file?
You should use the XML mode of the indexer if you want to specify a precise list of URLs/documents to be indexed.
The format of the XML file is described below. Please note that this is the same format that's used to import structures into Xtreeme SiteXpert. The difference is that, unlike in SiteXpert, no hierarchy needs to be created: the URLs are simply listed one after another in a flat list.
The XML document should conform to the following DTD (document type definition):
<!DOCTYPE sitemap [
  <!ELEMENT sitemap (node+)>
  <!ELEMENT node (node*|text)*>
  <!ATTLIST node
    href CDATA #IMPLIED>
  <!ELEMENT text (#PCDATA)>
]>
Here's a sample XML file that can be used:
<sitemap>
  <node href=""/>
  <node href=""/>
  <node href=""/>
  <node href=""/>
  <node href=""/>
  <node href=""/>
</sitemap>
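If the list of URLs already exists somewhere else (a text file, a database query and so on), the XML file can be generated rather than typed by hand. Below is a minimal Python sketch that writes a flat list of node elements under a sitemap root, as the DTD above requires. The function names, the example.com URLs and the output file name urls.xml are placeholders chosen for illustration; they are not part of Search Engine Studio.

```python
import xml.etree.ElementTree as ET


def build_sitemap(urls):
    """Return a <sitemap> element holding one flat <node href="..."/> per URL."""
    root = ET.Element("sitemap")
    for url in urls:
        # No nesting and no <text> children: the indexer only needs the list.
        ET.SubElement(root, "node", {"href": url})
    return root


def write_sitemap(urls, path):
    """Write the URL list to `path` as UTF-8 XML with an XML declaration."""
    tree = ET.ElementTree(build_sitemap(urls))
    tree.write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical URLs; replace with the documents you want indexed.
    example_urls = [
        "http://www.example.com/index.html",
        "http://www.example.com/products.html",
        "http://www.example.com/contact.html",
    ]
    write_sitemap(example_urls, "urls.xml")
```

ElementTree escapes characters such as & in the href values, so URLs with query strings remain well-formed XML.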
If your XML data is in a different format, you can convert it to the format above by specifying an XSLT transformation file.