Saturday, August 31, 2013

What's next for Solr 4.x Deep Dive EAR#7?

I haven't made any final decision on what to focus on for Early Access Release #7 of my Solr 4.x Deep Dive e-book, but here's the menu I expect to choose from:
  • More of the admin API handlers
  • Query elevation component
  • A start on SolrCloud – at least a robust glossary
  • Traditional distributed Solr (replication and shards)
  • A start on Data Import Handler – maybe a focus on non-database file import, and a robust glossary
My default will be to continue with the rest of the admin API to get that fundamental area out of the way.
 
I really do want to tackle SolrCloud ASAP, but it's a lot of material that requires a lot of thought.
 
-- Jack Krupansky

Friday, August 30, 2013

Early Access Release #6 for Solr 4.x Deep Dive is now available for download on Lulu.com

Okay, it's hot off the e-presses: Solr 4.x Deep Dive, Early Access Release #6 is now available for purchase and download as an e-book for $9.99 on Lulu.com at:
 
 
(That link says "1", but it apparently correctly redirects to EAR #6.)
 
Summary of changes:
  • Coverage of Core Admin API
A total of 56 pages of additional content, including two new appendices (solr.xml format, new and legacy).
 
Please feel free to email or comment on this blog for any questions or issues related to the book.
 
Thanks!

-- Jack Krupansky

Friday, August 23, 2013

Working on Core Admin API for Early Access Release #6 of Solr 4.x Deep Dive

I've been plodding away on the Core Admin API chapter for Early Access Release #6 of my Solr 4.x Deep Dive e-book, to be published on Lulu.com next Friday, August 30, 2013. There is certainly plenty of undocumented nuance to be covered in a deep dive. So far, 90% of my book time has been spent reading code and working on examples and only 10% on actual writing for the book. After another day or so, hopefully that will change to at least 50% writing.

-- Jack Krupansky

Friday, August 16, 2013

Early Access Release #5 for Solr 4.x Deep Dive is now available for download on Lulu.com

Okay, it's hot off the e-presses: Solr 4.x Deep Dive, Early Access Release #5 is now available for purchase and download as an e-book for $9.99 on Lulu.com at:
 
 
(That link says "1", but it apparently correctly redirects to EAR #5.)
 
Summary of changes:
  • Coverage of Real-time Get component
  • Coverage of Terms Component
  • Coverage of Term Vector Component
  • Coverage of Highlighting Component
  • Round a decimal number. I added a JavaScript script for the StatelessScriptUpdate processor which takes an input field, a number of decimal digits (default is zero), an output field (defaults to replacing the input field), and an optional flag for whether the rounded decimal number should have its type changed to integer (default is to stay as a float decimal.) Handles multivalued fields.
  • Append a field onto another field. This is just a use of the Clone and Concat update processors, using various delimiters. Also an example that uses the Ignore Field update processor to remove the source field after it has been appended.
  • Map country code to continent code. This JavaScript script for the StatelessScriptUpdate processor can do the mapping in-place or output to another field. Option for case of output string (default is lower case). Handles multivalued fields. Unmappable input values are preserved as-is.
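As a rough illustration of the country-to-continent mapping described in the list above (this is not the script from the book), the core logic might look something like the sketch below. The lookup table is a tiny sample, the function and parameter names are hypothetical, and a plain JavaScript object stands in for the Solr document so the logic can be run outside Solr.

```javascript
// Sketch only: a plain object stands in for the Solr document. Inside an
// update processor script the document would come from cmd.solrDoc instead.

// Tiny illustrative subset of a country-to-continent table (assumed values).
var CONTINENT_BY_COUNTRY = {
  us: "na", ca: "na", fr: "eu", de: "eu",
  jp: "as", br: "sa", au: "oc", eg: "af"
};

// Map one country code. Output is lower case unless upperCase is true.
// Codes that are not in the table are returned unchanged.
function mapCountry(code, upperCase) {
  var key = String(code).toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(CONTINENT_BY_COUNTRY, key)) {
    return code; // unmappable input is preserved as-is
  }
  var continent = CONTINENT_BY_COUNTRY[key];
  return upperCase ? continent.toUpperCase() : continent;
}

// Map every value of inputField, in place or into outputField when given.
// Handles multivalued (array) fields.
function mapCountryField(doc, inputField, outputField, upperCase) {
  var values = doc[inputField];
  if (values === undefined) {
    return doc; // field not present, nothing to do
  }
  var multi = Array.isArray(values);
  var mapped = (multi ? values : [values]).map(function (v) {
    return mapCountry(v, upperCase);
  });
  doc[outputField || inputField] = multi ? mapped : mapped[0];
  return doc;
}

var sample = mapCountryField({ id: "doc-1", country: ["US", "fr", "zz"] },
                             "country", "continent", false);
console.log(sample.continent); // [ 'na', 'eu', 'zz' ]
```

Keeping the per-value mapping in its own function makes the case option and the "preserve unmappable values" rule easy to check on their own.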
Total of 281 pages of additional content.
 
Please feel free to email or comment on this blog for any questions or issues related to the book.
 
Thanks!

-- Jack Krupansky

Wednesday, August 14, 2013

Progressing on Highlighting for EAR#5 of Solr 4.x Deep Dive

For the past week I have been slogging away on Highlighting. A hefty chunk of that time has been research, reading the source code, and coming up with examples and testing them. Oh, and then some actual writing. I still have a fair amount to do, but so far I have about 60 pages. Highlighting has a lot of parameters and sub-components – 46 main parameters and then a lot of them have field-specific variants.
 
I will also be including coverage of the new postings highlighter.
 
As with the spell checker, there will probably still be a number of nooks and crannies and odd parameter combinations that I don't quite manage to cover as fully as I would like. I already have the lion's share of the highlighting coverage that I intended in place right now. I'll probably put in another day or so and go ahead and publish what I have. It is already way, way beyond what is currently available on the wiki and in Javadoc.
 
I still intend to publish Early Access Release #5 for Solr 4.x Deep Dive on Friday, August 16, 2013.
 
After highlighting, I may move on to grouping, and maybe take another stab at SolrCloud.
 
-- Jack Krupansky

Wednesday, August 07, 2013

Added some update processor scripts to EAR#5 for Solr 4.x Deep Dive

I was going to take a day off to get some perspective on the next phase of the book, but instead I saw several questions on the Solr email list that begged for a solution using update processors. So, I added them as examples:
  1. Round a decimal number. I added a JavaScript script for the StatelessScriptUpdate processor which takes an input field, a number of decimal digits (default is zero), an output field (defaults to replacing the input field), and an optional flag for whether the rounded decimal number should have its type changed to integer (default is to stay as a float decimal.) Handles multivalued fields.
  2. Append a field onto another field. This is just a use of the Clone and Concat update processors, using various delimiters. Also an example that uses the Ignore Field update processor to remove the source field after it has been appended.
  3. Map country code to continent code. This JavaScript script for the StatelessScriptUpdate processor can do the mapping in-place or output to another field. Option for case of output string (default is lower case). Handles multivalued fields. Unmappable input values are preserved as-is.
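To give a feel for the first item, here is a minimal sketch of the rounding logic (not the script from the book). The option names are hypothetical, a plain JavaScript object stands in for the Solr document so the code can run outside Solr, and the optional change of type to integer is left out because plain JavaScript has only one number type.

```javascript
// Sketch only: a plain object stands in for the Solr document. Inside an
// update processor script the document would come from cmd.solrDoc instead.

// Round a number to the given count of decimal digits (default is zero).
function roundTo(value, digits) {
  var factor = Math.pow(10, digits || 0);
  return Math.round(Number(value) * factor) / factor;
}

// Round every value of options.inputField, writing to options.outputField
// when given, otherwise replacing the input field.
// Handles multivalued (array) fields. Option names are hypothetical.
function roundField(doc, options) {
  var values = doc[options.inputField];
  if (values === undefined) {
    return doc; // field not present, nothing to do
  }
  var multi = Array.isArray(values);
  var rounded = (multi ? values : [values]).map(function (v) {
    return roundTo(v, options.digits);
  });
  var target = options.outputField || options.inputField;
  doc[target] = multi ? rounded : rounded[0];
  return doc;
}

var example = roundField({ id: "doc-1", price: 3.14159 },
                         { inputField: "price", digits: 2 });
console.log(example.price); // 3.14
```

Note that Math.round works on binary floating-point values, so a few inputs that look like exact halves in decimal (1.005 at two digits, for instance) may round down rather than up.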
Okay, now I can get back to deciding whether to tackle highlighting next. I think that at a minimum I will take a stab at it for a couple of days, see how far I can get before getting too bogged down in the more confusing aspects, and then reconsider what to focus on next for EAR#5, which is still scheduled for publication on Friday, August 16, 2013.
 
-- Jack Krupansky

Sunday, August 04, 2013

Finished adding term vector component coverage to EAR#5 for Solr 4.x Deep Dive

I finally finished adding coverage for the Solr term vector component to Early Access Release #5 of my Solr 4.x Deep Dive book. I thought I would have finished it on Thursday, but it took a lot longer than expected as I dove deeper and deeper. It comprises a 78-page chapter, plus another four pages or so in the introductory tutorial.
 
Now, on to... hmmm... not sure. Maybe highlighting, but I think I'll take a day off to contemplate what makes the most sense to tackle next. Highlighting would make sense since term vectors are used in highlighting and are fresh in my mind.
 
Expected publication date for EAR#5 is Friday, August 16, 2013.

-- Jack Krupansky

Thursday, August 01, 2013

Added a script that skips documents based on a pattern match in a field

I just added another example script for the StatelessScriptUpdate processor that skips the indexing of documents that match a specified pattern in a specified field, with a case sensitivity option as well. The specific example checks the type of a file name, but the script can work for any field and any pattern.
 
I also cleaned up the language for what happens when a script returns a value of "false" – it aborts the current update command, and does not execute any remaining update processors.
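For a sense of how such a skip script hangs together, here is a small sketch (not the script from the book). It assumes the processAdd(cmd) entry point and cmd.solrDoc that Solr script update processors use, but reads field values from a plain JavaScript object so it can run outside Solr. The field name, the pattern, and the helper names are all hypothetical.

```javascript
// Sketch only: a plain object stands in for the Solr document. In Solr the
// values would be read from the real document object instead.
function getValues(doc, fieldName) {
  var values = doc[fieldName];
  if (values === undefined) {
    return [];
  }
  return Array.isArray(values) ? values : [values];
}

// Return false when any value of the field matches the pattern, meaning the
// document should be skipped. Matching ignores case unless caseSensitive.
function shouldIndex(doc, fieldName, pattern, caseSensitive) {
  var regex = new RegExp(pattern, caseSensitive ? "" : "i");
  var values = getValues(doc, fieldName);
  for (var i = 0; i < values.length; i++) {
    if (regex.test(String(values[i]))) {
      return false;
    }
  }
  return true;
}

// Entry point shape: returning false aborts the current update command, so
// no remaining update processors run and the document is not indexed.
function processAdd(cmd) {
  // Hypothetical settings: skip temporary and backup files by extension.
  return shouldIndex(cmd.solrDoc, "filename", "\\.(tmp|bak)$", false);
}

console.log(processAdd({ solrDoc: { id: "1", filename: "notes.TMP" } }));  // false
console.log(processAdd({ solrDoc: { id: "2", filename: "report.pdf" } })); // true
```

Because the field name and pattern are passed in rather than hard-coded in shouldIndex, the same function can check any field against any pattern, not just file name extensions.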
 
Now back to coverage of the Solr term vector component. Oh, that's another mistake I had made – that's "vector" singular, not "vectors" plural. Minor, but I seek to be accurate.
 
I toyed with the idea of publishing Early Access Release #5 of Solr 4.x Deep Dive tomorrow since I have a bunch of new material, but my preference is to gather even more material and publish two weeks from now.

-- Jack Krupansky