Data Mining: Practical Machine Learning Tools and Techniques
As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work.
The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and now has nearly double the references. The highlights for the new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; plus much more.
- Authors, Ian Witten and Eibe Frank, recipients of the 2005 ACM SIGKDD Service Award.
- Algorithmic methods at the heart of successful data mining, including tried-and-true techniques as well as leading-edge methods.
- Performance improvement techniques that work by transforming the input or output.
- Downloadable Weka, a collection of machine learning algorithms for data mining tasks, including tools for data pre-processing, classification, regression, clustering, association rules, and visualization in a new, interactive interface.
Review By: Dr. Tilmann Bruckhaus
03/24/2006
The authors of "Data Mining--Practical Machine Learning Tools and Techniques" cover a wide variety of machine learning techniques and provide clear, comprehensive information on each algorithm. Readers need no more than a basic understanding of statistics to grasp the information presented, which makes this book easily accessible to software engineers and testers.
The book begins by defining data mining and reviewing examples of practical situations in which data mining is typically applied: decisions involving judgment, screening, prediction and forecasting, diagnosis, marketing and sales, performance analysis, and classification. Several important sections follow the introduction: inputs used for data mining, useful data mining algorithms, and the outputs they generate. The authors then cover another key aspect of data mining--assessing the credibility of the obtained results and evaluating the outcomes. Additional chapters with practical and technical details round out the first part of the book. These chapters cover implementation details of machine learning algorithms, transformations that can add value to inputs and outputs, extensions, and applications.
The second part of the book covers the Weka machine learning workbench, including the main functional components of Weka, how Weka functionality can be embedded into applications, and finally how Weka can be enhanced with new data mining capabilities.
The book includes helpful charts, mathematical notation, pseudo code, diagrams, and tables in key places. Here and there, a paragraph or a single page delves deep into statistical subject matter--passages clearly marked with gray sidebars so they can easily be skipped without loss of continuity. The presentation of the material is geared toward the technical non-expert. The book introduces concepts in general and easily understood terms, and gradually adds technical details. The clear organization of the book lets the hurried reader quickly find specific themes and topics.
If you have that nagging feeling that the problem you are working on cannot be solved with traditional programming techniques or mainstream statistics, then you will want to read this book. It covers cutting-edge data mining technology that forward-looking organizations use to successfully tackle problems that are complex, high-dimensional, chaotic, non-stationary (changing over time), or plagued by [...]
The writing style is well-rounded and engaging without subjectivity, hyperbole, or ambiguity. I consider this book a classic already!