Books, tutorials, and talks on Mahout
Independent Developer of Advanced Software Technology, specializing in Software Agent Technology
The Watchmaker Framework is an extensible, high-performance, object-oriented framework for implementing platform-independent evolutionary/genetic algorithms in Java. The framework provides type-safe evolution for arbitrary types via a non-invasive API. The Watchmaker Framework is Open Source software, free to download and use subject to the terms of the Apache Software Licence, Version 2.0.
Evolutionary algorithms (EAs) are inspired by the biological model of evolution and natural selection first proposed by Charles Darwin in 1859. In the natural world, evolution helps species adapt to their environments. Environmental factors that influence the survival prospects of an organism include climate, availability of food and the dangers of predators.Species change over the course of many generations. Mutations occur randomly. Some mutations will be advantageous, but many will be useless or detrimental. Progress comes from the feedback provided by non-random natural selection....Evolutionary algorithms are based on a simplified model of this biological evolution. To solve a particular problem we create an environment in which potential solutions can evolve. The environment is shaped by the parameters of the problem and encourages the evolution of good solutions.The field of Evolutionary Computation encompasses several types of evolutionary algorithm. These include Genetic Algorithms (GAs), Evolution Strategies, Genetic Programming (GP), Evolutionary Programming and Learning Classifier Systems.The most common type of evolutionary algorithm is the generational genetic algorithm. We'll cover other EA variants in later chapters but, for now, all of the evolutionary algorithms that we meet will be some kind of generational GA.The basic outline of a generational GA is as follows (most other EA variants are broadly similar). A population of candidate solutions is iteratively evolved over many generations. Mimicking the concept of natural selection in biology, the survival of candidates (or their offspring) from generation to generation in an EA is governed by a fitness function that evaluates each candidate according to how close it is to the desired outcome, and a selection strategy that favours the better solutions. Over time, the quality of the solutions in the population should improve. If the program is successful, we can terminate the evolution once it has found a solution that is good enough.
Taste is a flexible, fast collaborative filtering engine for Java. The engine takes users' preferences for items ("tastes") and returns estimated preferences for other items. For example, a site that sells books or CDs could easily use Taste to figure out, from past purchase data, which CDs a customer might be interested in listening to.Taste provides a rich set of components from which you can construct a customized recommender system from a selection of algorithms. Taste is designed to be enterprise-ready; it's designed for performance, scalability and flexibility. Taste is not just for Java; it can be run as an external server which exposes recommendation logic to your application via web services and HTTP.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-avaiability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-availabile service on top of a cluster of computers, each of which may be prone to failures.
The Apache Mahout machine learning library's goal is to build scalable machine learning libraries.Currently Mahout supports mainly four use cases: Recommendation mining takes users' behavior and from that tries to find items users might like. Clustering takes e.g. text documents and groups them into groups of topically related documents. Classification learns from exisiting categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart content) and identifies, which individual items usually appear together.