Sunday, June 30, 2013

Solr book updates for function queries and update processors and Hot Spots page

I just finished another wave of formatting and indexing for the Function Queries chapter. This includes an alphabetical table of the functions.
 
The Update Processors chapter now has a clean, clickable table of update processors, with shortened name and short summary.
 
I also added a new Preface page, called "Solr Hot Spots", that has direct links to particular sections of the book that warrant highlighting, such as the list of functions, the list of update processors, and the grammar specifications for the main query parsers (Solr/Lucene, dismax, and edismax). The Update XML and JSON chapters are also "hotlinked" on that preface page. The general idea is that the reader can access a lot of interesting content even before they dive deep into the Table of Contents.
 
Now I'm down to sorting through all of my notes to decide what else to include in EAR #2 at the end of the week.

-- Jack Krupansky

Saturday, June 29, 2013

Faceting formatting and indexing improved for book

The bulk of the formatting for the Faceting chapter of the book for EAR#2 is now complete. This includes all of the section headers and indexing of terms, as well as a clean table for the faceting parameters. That means that the bulk of the work that I intended to complete for EAR#2 is now finished. There are still a lot of smaller issues that I wanted to get to, and will, but overall EAR#2 is now close at hand and on schedule for release by the end of the coming week.

-- Jack Krupansky

Friday, June 28, 2013

Solr Cell: captureAttr parameter and metadata confusion in book

I had to untangle the description and example for the captureAttr parameter of Solr Cell in the book. The cleaned-up description will be in EAR #2. It turns out that this parameter covers the "meta metadata" for the document. With captureAttr set to true (as it is in the standard Solr example), the meta metadata will be captured by the "meta" field, which is typically mapped to "attr_meta" using the "uprefix" parameter. But with captureAttr set to false, the meta metadata will be sent as literal text (attribute name and value with white space between them all) to the "content" field. See the updated examples!
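
To make the parameter combination concrete, here is a minimal sketch (not one of the book's examples) that builds an extract request URL with captureAttr and uprefix. The function name, the base URL, and the "doc1" id are assumptions for illustration only; literal.id, uprefix, and captureAttr are the Solr Cell request parameters.

```javascript
// Hypothetical helper, not taken from the book: builds a Solr Cell
// extract request URL. The base URL and document id are illustrative.
function buildExtractUrl(baseUrl, captureAttr) {
  var params = new URLSearchParams({
    "literal.id": "doc1",              // assumed document id
    "uprefix": "attr_",                // unknown fields get this prefix, e.g. "meta" -> "attr_meta"
    "captureAttr": String(captureAttr) // "true" or "false"
  });
  return baseUrl + "?" + params.toString();
}

var url = buildExtractUrl("http://localhost:8983/solr/update/extract", true);
console.log(url);
```

Switching the second argument to false is the only change needed to compare where the meta metadata ends up.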
 
I also added a caveat for the fact that the <div> HTML tag cannot be captured – due to a limitation in Tika. This would include its "id" attribute as well.

-- Jack Krupansky

SolrCell vs. Solr Cell in book

Oops... I was always sure that Solr Cell was written as SolrCell, without a space, but... I was wrong. After researching a little more carefully, I concluded that the proper name for the Solr Content Extraction Library is Solr Cell, as two separate words.
 
EAR #2 for the book will be updated accordingly. Still on schedule for the end of next week.

-- Jack Krupansky

Thursday, June 27, 2013

Added some scripting update processor examples to EAR#2 of the book

Over the past two days I have added some examples of scripted update processors to EAR#2 of the book, due out at the end of next week. They use the Solr StatelessScriptUpdateProcessorFactory class to run JavaScript scripts when documents are being indexed.
 
One set of examples uses the "split-string" script to split a single input string for a field into a list of strings for a multivalued field, based either on a character delimiter, such as comma, slash, or newline, or on a regular expression pattern.
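
The core of that kind of splitting can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not the book's "split-string" script: the function name is made up, and trimming white space and dropping empty pieces are my assumptions about reasonable behavior.

```javascript
// Hypothetical helper, not the book's "split-string" script.
// Splits one input string into a list of values for a multivalued field.
// `delimiter` may be a literal string (",", "/", "\n") or a RegExp.
function splitString(value, delimiter) {
  return String(value)
    .split(delimiter)
    .map(function (part) { return part.trim(); })        // assumption: trim each piece
    .filter(function (part) { return part.length > 0; }); // assumption: drop empty pieces
}

console.log(splitString("red, green,blue", ","));
console.log(splitString("a; b|c", /[;|]/));
```

In a script run by StatelessScriptUpdateProcessorFactory, a helper like this would typically be called from the script's processAdd(cmd) function, with the resulting list written back to the document's field.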
 
The other set of examples uses the "normalize-date" script to do one of two things: 1) expand abbreviated dates into full ISO dates, such as "2012" (year only), "2012-04-15" (date only, no time), and "2012-04-15T01:02:03" (missing trailing "Z"), and 2) truncate date values based on some unit, such as day (ignore the time), year (ignore the month, day, and time), etc. The latter is great for field facets to limit the number of unique values, such as for pivot facets.
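
Both operations can be sketched with plain string handling. Again, this is an illustration rather than the book's "normalize-date" script: the function names are made up, and filling missing parts with the earliest value (January, the 1st, midnight) is my assumption.

```javascript
// Hypothetical helpers, not the book's "normalize-date" script.

// Expand an abbreviated date into a full ISO date with trailing "Z".
// Assumption: missing parts default to the earliest value.
function expandDate(s) {
  if (/^\d{4}$/.test(s)) return s + "-01-01T00:00:00Z";            // year only
  if (/^\d{4}-\d{2}$/.test(s)) return s + "-01T00:00:00Z";         // year and month
  if (/^\d{4}-\d{2}-\d{2}$/.test(s)) return s + "T00:00:00Z";      // date, no time
  if (/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$/.test(s)) return s + "Z"; // missing "Z"
  return s; // anything else is passed through unchanged
}

// Truncate a full ISO date ("YYYY-MM-DDThh:mm:ssZ") to the given unit.
function truncateDate(iso, unit) {
  switch (unit) {
    case "year":  return iso.substring(0, 4) + "-01-01T00:00:00Z";
    case "month": return iso.substring(0, 7) + "-01T00:00:00Z";
    case "day":   return iso.substring(0, 10) + "T00:00:00Z";
    case "hour":  return iso.substring(0, 13) + ":00:00Z";
    default: throw new Error("Unsupported unit: " + unit);
  }
}

console.log(expandDate("2012"));
console.log(truncateDate("2012-04-15T01:02:03Z", "day"));
```

Truncating to day or year before indexing means many documents share the same field value, which is what keeps the number of facet values manageable.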

-- Jack Krupansky

Tuesday, June 25, 2013

Oops - forgot Grouping in the book

Oops... I'm not sure how I missed it, but I neglected to put "Grouping" in the list of unfinished coverage for the book. Adding it to the list now.

-- Jack Krupansky

Monday, June 24, 2013

Progress update on Solr 4.x Deep Dive EAR #2 - 6/24/2013

After releasing the book, Solr 4.x Deep Dive - Early Access Release #1, on Friday, I immediately started work on EAR #2, with an expected release in about two weeks.
 
My main focus for this second release will be cleanup, formatting, and expanding the index. I suspect that these tasks will keep me busy enough that I may not get to any new material.
 
I did add a "Summary Table of Contents" section in the Preface for easier browsing of the book contents since the main TOC is so voluminous. The summary has two parts, the list of parts of the book and the list of chapters by part. The list entries are hyperlinked to the relevant part or chapter.
 
I've already formatted the section headers for all of the "Indexing Data" chapters. I added a fair number of indexing entries for that material as well. I still need to do the same for the Faceting chapter, which was the last and newest chapter I wrote before releasing EAR #1.
 
I found and fixed a few technical errors in the text as well – please comment on any errors you may encounter.
 
I also added some items to the "TO DO" list – including real-time "/get", which somehow slipped between the cracks.

-- Jack Krupansky

Draft of my new book released on Lulu.com: Solr 4.x Deep Dive - Early Access Release #1

Last Friday I released a preliminary draft of my new book, Solr 4.x Deep Dive - Early Access Release #1, as a PDF e-book on Lulu.com.
 
I expect to be updating it every month, if not every other week, both with improvements to the existing material and the addition of new material.
 
See:
 
Expect frequent updates on my progress with the book on this blog

-- Jack Krupansky