Machine Learning


Courses

Stanford course CS229

Caltech course Learning From Data

A real Caltech course, not a watered-down version: http://work.caltech.edu/telecourse.html

  • Notes

Competitions

Books

16 Free eBooks On Machine Learning

  1. The LION Way: Machine Learning plus Intelligent Optimization http://www.e-booksdirectory.com/details.php?ebook=9575
  2. A Course in Machine Learning http://www.e-booksdirectory.com/details.php?ebook=9395
  3. A First Encounter with Machine Learning http://www.e-booksdirectory.com/details.php?ebook=8818
  4. Bayesian Reasoning and Machine Learning http://www.e-booksdirectory.com/details.php?ebook=5283
  5. Introduction to Machine Learning http://www.e-booksdirectory.com/details.php?ebook=4493
  6. The Elements of Statistical Learning: Data Mining, Inference, and Prediction http://www.e-booksdirectory.com/details.php?ebook=3267
  7. Reinforcement Learning by C. Weber, M. Elshaw, N. M. Mayer http://www.e-booksdirectory.com/details.php?ebook=3227
  8. Machine Learning by Abdelhamid Mellouk, Abdennacer Chebira http://www.e-booksdirectory.com/details.php?ebook=2852
  9. How Are We To Know? by Nils J. Nilsson http://www.e-booksdirectory.com/details.php?ebook=2710
  10. Reinforcement Learning: An Introduction http://www.e-booksdirectory.com/details.php?ebook=1825
  11. Gaussian Processes for Machine Learning http://www.e-booksdirectory.com/details.php?ebook=1774
  12. Machine Learning, Neural and Statistical Classification http://www.e-booksdirectory.com/details.php?ebook=1118
  13. Introduction To Machine Learning http://www.e-booksdirectory.com/details.php?ebook=1117
  14. Inductive Logic Programming: Techniques and Applications http://www.e-booksdirectory.com/details.php?ebook=1105
  15. Practical Artificial Intelligence Programming in Java http://www.e-booksdirectory.com/details.php?ebook=32
  16. Information Theory, Inference, and Learning Algorithms http://www.e-booksdirectory.com/details.php?ebook=21

"Machine Learning: An Algorithmic Perspective" by Stephen Marsland

Libraries

Deep Learning

  • Theano is the most mature of the deep learning libraries. It provides data structures (tensors) for representing the layers of a neural network, and these are efficient for linear algebra, much like NumPy arrays. Many libraries build on top of Theano and exploit its data structures.
  • Pylearn2 is another library built on top of Theano.
  • Caffe is a deep learning framework developed with cleanliness, readability, and speed in mind.
  • nolearn contains a number of wrappers around existing neural network libraries, along with a few machine learning utility modules. Most functionality is written to be compatible with the excellent scikit-learn library.
  • OverFeat is a convolutional-network-based image classifier and feature extractor. It is written in C++ but also comes with a Python wrapper (along with Matlab and Lua wrappers). It uses the GPU through the Torch library, so it is quite fast.
  • Hebel is another neural network library that comes with GPU support out of the box. You can specify the properties of your neural networks in YAML files (similar to Pylearn2), which provides a nice way to separate the network definition from the code and to run models quickly.
  • NeuroLab is another neural network library with a nice API (similar to Matlab’s, if you are familiar with it). Unlike other libraries, it implements several variants of recurrent neural networks (RNNs).

Python

  • SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering. In particular,
  • scipy-cluster: an extension to SciPy for generating, visualizing, and analyzing hierarchical clusters.
  • Orange: open-source data visualization and analysis for novices and experts. Data mining through visual programming or Python scripting.
  • Scikit-learn: comprehensive and easy to use.
  • SciKit-Learn Laboratory provides command-line utilities that make it easier to run machine learning experiments with scikit-learn. One of the project's primary goals is to let you run scikit-learn experiments without writing any code beyond what you used to generate or extract the features.
  • PyBrain: neural networks are one thing missing from scikit-learn, and this module makes up for it.
  • nltk: really useful if you’re doing anything NLP or text mining related.
  • Theano: efficient computation of mathematical expressions using GPU. Excellent for deep learning.
  • Pylearn2: machine learning toolbox built on top of Theano - in very early stages of development.
  • MDP (Modular toolkit for Data Processing): a framework that is useful when setting up workflows.
  • JSAT (“Java Statistical Analysis Tool”) was created by Edward Raff and was born out of his frustration with Weka.
  • Elefant: a toolkit that includes kernel methods, optimization strategies, and belief propagation. Elefant is developed by the ADA (Automated Data Analysis) group at NICTA, Australia.
  • Milk: a toolkit for Python that includes SVMs, decision trees, kNN, PCA, k-means, NMF, and feature selection.
  • Peach is a pure-Python module, based on SciPy and NumPy, that implements algorithms for computational intelligence and machine learning. Methods implemented include, but are not limited to, artificial neural networks, fuzzy logic, genetic algorithms, and swarm intelligence.
  • Pebl is a Python library and command-line application for learning the structure of a Bayesian network given prior knowledge and observations.
  • PyMVPA: a Python module including more classifiers, regression, and feature selection methods than can be listed here.
  • Monte (python) is a Python framework for building gradient based learning machines, like neural networks, conditional random fields, logistic regression, etc. Monte contains modules (that hold parameters, a cost-function and a gradient-function) and trainers (that can adapt a module's parameters by minimizing its cost-function on training data).
  • mlpy: a Python module that includes wavelet transforms, kernel methods, FDA, PDA, LASSO, LARS, feature selection and ranking, and data management. Very clean interface.
  • Modular toolkit for Data Processing: a Python toolkit for data processing. In my opinion the API takes a little getting used to. Includes PCA, k-means, RBMs, FastICA, Neural Gas, SVMs, perceptrons, and many more.
  • PyML is an interactive, object-oriented framework for machine learning written in Python. PyML focuses on SVMs and other kernel methods.
  • PyMC is a Python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo.
  • Ramp - Rapid Machine Learning Prototyping
  • PythonForArtificialIntelligence attempts to collect information and links pertaining to the practice of AI and Machine Learning in python.
  • Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.
  • Gensim describes itself as “topic modeling for humans”. As its homepage explains, its main focus is Latent Dirichlet Allocation (LDA) and its variants.
  • ffnet is a fast and easy-to-use feed-forward neural network training solution for Python.
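The Monte entry above describes a common design for gradient-based learners: a module holds parameters and exposes a cost function and a gradient function, and a trainer adapts the module's parameters by minimizing that cost on training data. Below is a minimal pure-Python sketch of that split, using logistic regression trained by batch gradient descent as the example model. The class and method names (`LogisticModule`, `cost`, `gradient`, `GradientDescentTrainer`, `train`) are hypothetical illustrations of the idea, not Monte's actual API.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticModule:
    """Hypothetical module: holds parameters, a cost function and a gradient function."""

    def __init__(self, n_features: int) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0

    def predict_proba(self, x: Sequence[float]) -> float:
        z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return _sigmoid(z)

    def cost(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
        # Mean cross-entropy (negative log-likelihood); eps guards against log(0).
        eps = 1e-12
        total = 0.0
        for x, t in zip(X, y):
            p = self.predict_proba(x)
            total -= t * math.log(p + eps) + (1 - t) * math.log(1.0 - p + eps)
        return total / len(X)

    def gradient(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> Tuple[List[float], float]:
        # d(cost)/d(w_j) = mean((p - t) * x_j) and d(cost)/d(b) = mean(p - t).
        grad_w = [0.0] * len(self.weights)
        grad_b = 0.0
        for x, t in zip(X, y):
            err = self.predict_proba(x) - t
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        n = len(X)
        return [g / n for g in grad_w], grad_b / n


class GradientDescentTrainer:
    """Hypothetical trainer: adapts a module's parameters by minimizing its cost."""

    def __init__(self, learning_rate: float = 0.5, epochs: int = 200) -> None:
        self.learning_rate = learning_rate
        self.epochs = epochs

    def train(self, module: LogisticModule, X, y) -> float:
        for _ in range(self.epochs):
            grad_w, grad_b = module.gradient(X, y)
            # Step each parameter against its gradient.
            module.weights = [w - self.learning_rate * g
                              for w, g in zip(module.weights, grad_w)]
            module.bias -= self.learning_rate * grad_b
        # Return the final training cost so callers can check progress.
        return module.cost(X, y)


if __name__ == "__main__":
    X = [[-2.0], [-1.0], [1.0], [2.0]]
    y = [0, 0, 1, 1]
    module = LogisticModule(n_features=1)
    print("initial cost:", module.cost(X, y))
    print("final cost:", GradientDescentTrainer().train(module, X, y))
```

Keeping the cost and gradient on the module lets the same trainer drive any model that exposes those two functions, which is the point of the module/trainer separation described above.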

Java

  • Spark is a fast and general engine for large-scale data processing. It includes MLlib, which contains a good selection of machine learning algorithms, including classification, clustering, and recommendation generation. It is currently undergoing rapid development, and development can be done in Python as well as JVM languages.
  • Weka is a collection of machine learning algorithms for data mining tasks.
  • Mahout: Apache’s machine learning framework, built on top of Hadoop. It looks promising but comes with all the baggage and overhead of Hadoop.
  • MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

C/C++

  • Vowpal Wabbit: fast online learning code.
  • MultiBoost is a multi-class / multi-label / multi-task classification boosting software implemented in C++.
  • SHARK is a fast, modular, feature-rich open-source C++ machine learning library.
  • Waffles: A machine learning toolkit.
  • Dlib is a general purpose cross-platform C++ library designed using contract programming and modern C++ techniques.
  • PLearn is a C++ library aimed at research and development in the field of statistical machine learning algorithms.
  • MLC++ is a library of C++ classes for supervised machine learning.
  • mlpack: a scalable C++ machine learning library.
  • LIBSVM: A Library for Support Vector Machines. Both C++ and Java sources are available.
  • LibLinear: A Library for Large Linear Classification.
  • Cluster implements the most commonly used clustering methods for gene expression data analysis.
  • SHOGUN is designed for unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, or explorative data analysis.
  • VFML (Very Fast Machine Learning) toolkit for mining high-speed data streams and very large data sets. VFML is written in standard C (and a bit of Python).
  • Stochastic Gradient Descent: a library for training SVMs with stochastic gradient descent (C++).
  • Maximum Entropy Modeling Toolkit for Python and C++
  • dbacl: a digramic Bayesian classifier; a collection of command-line tools for Bayesian classification, particularly for spam filtering.
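Several entries above (Vowpal Wabbit's online learning, the Stochastic Gradient Descent library for SVMs) rest on the same idea: update a linear model one example at a time. The sketch below trains a linear SVM by stochastic gradient descent on the L2-regularized hinge loss, using the Pegasos step size 1/(λt). It is written in Python for consistency with the rest of these notes and is not the C++ library's code; the function names `train_linear_svm_sgd` and `predict` and the default hyperparameters are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence


def train_linear_svm_sgd(X: Sequence[Sequence[float]], y: Sequence[int],
                         lam: float = 0.01, epochs: int = 20,
                         seed: int = 0) -> List[float]:
    """Pegasos-style SGD on the L2-regularized hinge loss (hypothetical helper).

    Labels must be +1 or -1. A constant 1.0 feature is appended so the last
    weight acts as a bias; that bias is regularized too (a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in a fresh random order each epoch
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # step size 1 / (lambda * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step for the regularizer: shrink w every iteration.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move w toward label * x.
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    # Append the same constant feature used during training.
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    X = [[2.0, 2.0], [3.0, 1.0], [1.0, 3.0],
         [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm_sgd(X, y)
    print("weights:", w)
    print("predictions:", [predict(w, x) for x in X])
```

The original Pegasos algorithm also has an optional step that projects w onto a ball of radius 1/√λ; it is omitted here to keep the sketch short.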

.NET

  • Accord.NET: this seems to be pretty comprehensive, and comes recommended by primaryobjects on Reddit.
  • Alternatively, use one of the Java libraries compiled to .NET with IKVM.

Projects/Software

mloss

mloss: machine learning open source software.

GNU/Linux AI & Alife HOWTO

Articles

related things

Blog


Author: Shi Shougang

Created: 2015-03-05 Thu 23:20

Emacs 24.3.1 (Org mode 8.2.10)
