Friday, June 28, 2013

Solr Cell: captureAttr parameter and metadata confusion in book

I had to untangle the description and example for the captureAttr parameter of Solr Cell in the book. The cleaned-up description will be in EAR #2. It turns out that this parameter controls what happens to the "meta metadata" for the document. With captureAttr set to true (as it is in the standard Solr example), the meta metadata is captured in the "meta" field, which is typically mapped to "attr_meta" using the "uprefix" parameter. With captureAttr set to false, the meta metadata is instead sent to the "content" field as literal text, with each attribute name and value separated by white space. See the updated examples!
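To make the two settings easier to compare, here is a minimal sketch (not the example from the book) that only builds the Solr Cell request URLs for each case. The base URL, the document id, and the helper name build_extract_url are assumptions made up for illustration; the "/update/extract" handler path and the "attr_" prefix follow the usual Solr example setup and may differ in your configuration.

```python
from urllib.parse import urlencode

# Hypothetical base URL for a local Solr instance; adjust for your setup.
SOLR_BASE = "http://localhost:8983/solr"


def build_extract_url(doc_id, capture_attr, base_url=SOLR_BASE):
    """Build a Solr Cell (/update/extract) URL with captureAttr on or off.

    build_extract_url is an illustrative helper, not part of Solr.
    """
    params = {
        "literal.id": doc_id,                               # unique key for the document
        "captureAttr": "true" if capture_attr else "false",
        "uprefix": "attr_",                                 # prefix for fields not in the schema
        "commit": "true",
    }
    return base_url + "/update/extract?" + urlencode(params)


# captureAttr=true: meta metadata is expected in "meta" (attr_meta after uprefix).
url_captured = build_extract_url("doc-1", True)

# captureAttr=false: meta metadata is expected as literal text in "content".
url_inline = build_extract_url("doc-1", False)

print(url_captured)
print(url_inline)
```

The document itself would still have to be POSTed to each URL (for example as a file upload) before the resulting fields can be compared; the sketch stops at the URL so that it runs without a Solr server.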
 
I also added a caveat that the <div> HTML tag cannot be captured, due to a limitation in Tika. That means its "id" attribute cannot be captured either.

-- Jack Krupansky
