Open-Source Machine Learning Libraries for C++

from http://blog.sina.com.cn/s/blog_569d6df801014x4x.html

Part 1. Open-source C++ machine learning libraries


1) mlpack is a C++ machine learning library.

2) PLearn is a C++ library aimed at research and development in the field of statistical machine learning algorithms. Its originality is that it lets you express complex non-linear functions to be optimized directly in C++, in a straightforward manner.


3) Waffles: C++ Machine Learning.
4) Torch7 provides a Matlab-like environment for state-of-the-art machine learning algorithms. It is easy to use and provides a very efficient implementation.

5) SHARK is a modular C++ library for the design and optimization of adaptive systems. It provides methods for linear and nonlinear optimization, in particular evolutionary and gradient-based algorithms, kernel-based learning algorithms and neural networks, and various other machine learning techniques. SHARK serves as a toolbox to support real-world applications as well as research in different domains of computational intelligence and machine learning. The sources are compatible with the following platforms: Windows, Solaris, MacOS X, and Linux.

6) Dlib-ml is an open source library, targeted at both engineers and research scientists, which aims to provide a similarly rich environment for developing machine learning software in the C++ language.

7) Eblearn is an object-oriented C++ library that implements various machine learning models, including energy-based learning and gradient-based learning for machines composed of multiple heterogeneous modules. In particular, the library provides a complete set of tools for building, training, and running convolutional networks.

8) Machine Learning Open Source Software, Journal of Machine Learning Research: http://jmlr.csail.mit.edu/mloss/.

9) Search Google for: c++ site:jmlr.csail.mit.edu filetype:pdf, Machine Learning Toolkit

10) SIGMA: Large-Scale and Parallel Machine-Learning ToolKit

11) http://sourceforge.net/directory/science-engineering/ai/machinelearning/os:windows/freshness:recently-updated/


-------------  2012.9.12   ---------
12) ELF: ensemble learning framework. Features: C++, supervised learning, uses Intel's IPP and MKL; training speed and accuracy are the main goals. http://elf-project.sourceforge.net/
------------- 2012.11.03  ---------
13) http://mloss.org/software/ Machine learning open source software. Essentially an index site.
14) http://drwn.anu.edu.au/index.html
Source: http://blog.csdn.net/genliu777/article/details/7396760
 
 
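Part 2. Open-source tools for machine learning
Most of the tools below are open source, released under licenses such as the GPL or Apache. Read each tool's license statement carefully before use.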

I. Information Retrieval
1. Lemur/Indri
The Lemur Toolkit for Language Modeling and Information Retrieval

2. Lucene/Nutch
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
Lucene is a top-level Apache open-source project under the Apache 2.0 license, written entirely in Java, with ports to Perl, C/C++, .NET, and other platforms.
http://lucene.apache.org/
http://www.nutch.org/

3. WGet
GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive command-line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
http://www.gnu.org/software/wget/wget.html

II. Natural Language Processing
1. EGYPT: A Statistical Machine Translation Toolkit
http://www.clsp.jhu.edu/ws99/projects/mt/
Includes four tools, GIZA among them.

2. GIZA++ (Statistical Machine Translation)
http://www.fjoch.com/GIZA++.html
GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT), which was developed by the Statistical Machine Translation team during the summer workshop in 1999 at the Center for Language and Speech Processing at Johns Hopkins University (CLSP/JHU). GIZA++ includes a lot of additional features. The extensions of GIZA++ were designed and written by Franz Josef Och.
Franz Josef Och has worked at Aachen University in Germany, at ISI (the Information Sciences Institute of the University of Southern California), and at Google. A Windows port of GIZA++ now exists, and it supports IBM Models 1-5 well.

3. PHARAOH (Statistical Machine Translation)
http://www.isi.edu/licensed-sw/pharaoh/
A beam search decoder for phrase-based statistical machine translation models.

4. OpenNLP:
http://opennlp.sourceforge.net/
Includes more than 20 tools, such as Maxent.

btw: these SMT tools all seem to like Egypt-themed names, such as GIZA, PHARAOH, and Cairo. Och developed GIZA++ while at ISI, and PHARAOH was developed by Philipp Koehn, also from ISI, so the connections really are tangled.

5. MINIPAR by Dekang Lin (Univ. of Alberta, Canada)
MINIPAR is a broad-coverage parser for the English language. An evaluation with the SUSANNE corpus shows that MINIPAR achieves about 88% precision and 80% recall with respect to dependency relationships. MINIPAR is very efficient: on a Pentium II 300 with 128MB memory, it parses about 300 words per second.
The binary can be downloaded for free after filling in a form.
http://www.cs.ualberta.ca/~lindek/minipar.htm

6. WordNet
http://wordnet.princeton.edu/
WordNet is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
WordNet was developed by the Cognitive Science Laboratory at Princeton University under the direction of Professor George A. Miller (Principal Investigator).
The latest WordNet version is 2.1 (for Windows & Unix-like OS), available as binaries, source, and documentation.
The online version of WordNet is at http://wordnet.princeton.edu/perl/webwn

7. HowNet
http://www.keenage.com/
HowNet is an on-line common-sense knowledge base unveiling inter-conceptual relations and inter-attribute relations of concepts as connoting in lexicons of the Chinese and their English equivalents.
Developed by Zhendong Dong & Qiang Dong of CAS; it is something similar to WordNet.

8. Statistical Language Modeling Toolkit
http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html
The CMU-Cambridge Statistical Language Modeling toolkit is a suite of UNIX software tools to facilitate the construction and testing of statistical language models.

9. SRI Language Modeling Toolkit
www.speech.sri.com/projects/srilm/
SRILM is a toolkit for building and applying statistical language models (LMs), primarily for use in speech recognition, statistical tagging and segmentation. It has been under development in the SRI Speech Technology and Research Laboratory since 1995.

10. ReWrite Decoder
http://www.isi.edu/licensed-sw/rewrite-decoder/
The ISI ReWrite Decoder Release 1.0.0a by Daniel Marcu and Ulrich Germann. It is a program that translates from one natural language into another using statistical machine translation.

11. GATE (General Architecture for Text Engineering)
http://gate.ac.uk/
A Java Library for Text Engineering

III. Machine Learning
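1. YASMET: Yet Another Small MaxEnt Toolkit (Statistical Machine Learning)
http://www.fjoch.com/YASMET.html
Written by Franz Josef Och. In addition, the OpenNLP project has a Java MaxEnt tool that estimates parameters with GIS (Generalized Iterative Scaling); Zhang Le of Northeastern University (currently studying in the UK) ported it to C++.

2. LibSVM
Developed by Chih-Jen Lin of National Taiwan University (NTU); available in C++, Java, Perl, C#, and other languages.
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
LIBSVM is an integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM). It supports multi-class classification.

3. SVM Light
Developed by Thorsten Joachims of Cornell while he was at the University of Dortmund; it is the best-known SVM package after LibSVM. Open source, written in C, and also usable for ranking problems.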
http://svmlight.joachims.org/

4. CLUTO
http://www-users.cs.umn.edu/~karypis/cluto/
A software package for clustering low- and high-dimensional datasets.
This package is distributed only as an executable or a library; the source code is not available for download.

5. CRF++
http://chasen.org/~taku/software/CRF++/
Yet Another CRF toolkit for segmenting/labelling sequential data.
CRFs (Conditional Random Fields) grew out of HMMs/MEMMs and are widely used in IE, IR, and NLP.

6. SVM Struct
http://www.cs.cornell.edu/People/tj/svm_light/svm_struct.html
Like SVM Light, developed by Thorsten Joachims at Cornell.
SVMstruct is a Support Vector Machine (SVM) algorithm for predicting multivariate outputs. It performs supervised learning by approximating a mapping
h: X --> Y
using labeled training examples (x1,y1), ..., (xn,yn).
Unlike regular SVMs, however, which consider only univariate predictions like in classification and regression, SVMstruct can predict complex objects y like trees, sequences, or sets. Examples of problems with complex outputs are natural language parsing, sequence alignment in protein homology detection, and Markov models for part-of-speech tagging.
SVMstruct can be thought of as an API for implementing different kinds of complex prediction algorithms. Currently, we have implemented the following learning tasks:
SVMmulticlass: Multi-class classification. Learns to predict one of k mutually exclusive classes. This is probably the simplest possible instance of SVMstruct and serves as a tutorial example of how to use the programming interface.
SVMcfg: Learns a weighted context free grammar from examples. Training examples (e.g. for natural language parsing) specify the sentence along with the correct parse tree. The goal is to predict the parse tree of new sentences.
SVMalign: Learning to align sequences. Given examples of how sequence pairs align, the goal is to learn the substitution matrix as well as the insertion and deletion costs of operations so that one can predict alignments of new sequences.
SVMhmm: Learns a Markov model from examples. Training examples (e.g. for part-of-speech tagging) specify the sequence of words along with the correct assignment of tags (i.e. states). The goal is to predict the tag sequences for new sentences.

 

IV. Misc:
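1. Notepad++: an open-source editor with keyword support for dozens of languages such as C#, Perl, and CSS; its features rival those of recent versions of UltraEdit and Visual Studio .NET.
http://notepad-plus.sourceforge.net

2. WinMerge: compares text content and finds the differences between two versions of a program.
winmerge.sourceforge.net/

3. OpenPerlIDE: an open-source Perl editor with built-in compiling and line-by-line debugging.
open-perl-ide.sourceforge.net/
ps: as for editors, the best one I have seen is still VS.NET. It has +/- signs in front of each function for expand/collapse, supports block copy/cut/paste, lets ctrl+c/ctrl+x/ctrl+v act on a whole line at once, and uses ctrl+k+c/ctrl+k+u to comment/uncomment multiple lines, and there is more... Visual Studio .NET is really kool :D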

4. Berkeley DB
http://www.sleepycat.com/
Berkeley DB is not a relational database. It is described as an embedded database: in client/server terms, its client and server share one address space. Because the database originally grew out of the file system, it is more like a dictionary-style database of key-value pairs. The database files are persisted to disk, so it is not limited by memory size. BDB has a variant, Berkeley DB XML, which is an XML database that stores data as XML documents. BDB has been embedded in products by companies including Microsoft, Google, HP, Ford, and Motorola.
Berkeley DB (libdb) is a programmatic toolkit that provides embedded database support for both traditional and client/server applications. It includes b+tree, queue, extended linear hashing, fixed, and variable-length record access methods, transactions, locking, logging, shared memory caching, database recovery, and replication for highly available systems. DB supports C, C++, Java, PHP, and Perl APIs.
It turns out that at a basic level Berkeley DB is just a very high performance, reliable way of persisting dictionary-style data structures: anything where a piece of data can be stored and looked up using a unique key. The key and the value can each be up to 4 gigabytes in length and can consist of anything that can be crammed into a string of bytes, so what you do with it is completely up to you. The only operations available are "store this value under this key", "check if this key exists" and "retrieve the value for this key", so conceptually it's pretty simple. The complicated stuff all happens under the hood.
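The three operations named above (store, check, retrieve) can be illustrated with a minimal in-memory sketch. This is not the Berkeley DB API: the class `ToyKeyValueStore` and its method names are hypothetical, and a `std::map` stands in for the on-disk access methods.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for a dictionary-style store.
// Keys and values are arbitrary byte strings, as in the description above.
class ToyKeyValueStore {
public:
    // "Store this value under this key" (overwrites any previous value).
    void store(const std::string& key, const std::string& value) {
        data_[key] = value;
    }

    // "Check if this key exists."
    bool exists(const std::string& key) const {
        return data_.find(key) != data_.end();
    }

    // "Retrieve the value for this key." Returns false if the key is absent.
    bool retrieve(const std::string& key, std::string* value_out) const {
        assert(value_out != nullptr);
        std::map<std::string, std::string>::const_iterator it = data_.find(key);
        if (it == data_.end()) return false;
        *value_out = it->second;
        return true;
    }

private:
    std::map<std::string, std::string> data_;
};
```

A real embedded database adds persistence, transactions, and locking behind this kind of interface; the sketch only shows the shape of the dictionary-style contract.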
case study:
Ask Jeeves uses Berkeley DB to provide an easy-to-use tool for searching the Internet.
Microsoft uses Berkeley DB for the Groove collaboration software.
AOL uses Berkeley DB for search tool meta-data and other services.
Hitachi uses Berkeley DB in its directory services server product.
Ford uses Berkeley DB to authenticate partners who access Ford's Web applications.
Hewlett Packard uses Berkeley DB in several products, including storage, security and wireless software.
Google uses Berkeley DB High Availability for Google Accounts.
Motorola uses Berkeley DB to track mobile units in its wireless radio network products.

11. R
http://www.r-project.org/
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.
R, like MATLAB, is software used for scientific computing.

Reposted from: http://kapoc.blogdriver.com/kapoc/1268927.html
 
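Part 3. OpenCV machine learning functions (partial)
The Machine Learning Library (MLL) is a set of classes and functions for statistical classification, regression, and clustering of data.

Most of the classification and regression algorithms are implemented as C++ classes. Because the algorithms have different feature sets (such as the ability to handle missing data or categorical input variables), the classes have little in common. That common ground is defined by the class CvStatModel, from which all the other ML classes are derived.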

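CvStatModel
Base class for statistical models in ML
class CvStatModel
{
public:
    /* CvStatModel(); */
    /* CvStatModel( const CvMat* train_data ... ); */

    virtual ~CvStatModel();
    virtual void clear()=0;

    /* virtual bool train( const CvMat* train_data, [int tflag,] ..., const CvMat* responses, ... )=0; */
    /* virtual float predict( const CvMat* sample ... ) const=0; */

    virtual void save( const char* filename, const char* name=0 )=0;
    virtual void load( const char* filename, const char* name=0 )=0;
    virtual void write( CvFileStorage* storage, const char* name )=0;
    virtual void read( CvFileStorage* storage, CvFileNode* node )=0;
};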

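In this declaration some methods are commented out. These are methods that have no unified API (apart from the default constructor) but share many syntactic and semantic similarities, so they are treated as if they were part of the base class. They are described below.

CvStatModel::CvStatModel
Default constructor
CvStatModel::CvStatModel();

Every statistical model class in ML has a default constructor that takes no parameters. This constructor is useful for two-stage model construction, where the default constructor is followed by train() or load().

CvStatModel::CvStatModel(...)
Training constructor
CvStatModel::CvStatModel( const CvMat* train_data ... );

Most ML classes provide a constructor that builds and trains the model in a single step. This constructor is equivalent to the default constructor followed by a call to train() with the parameters that were passed in.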
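The equivalence between the training constructor and "default constructor, then train()" can be sketched with a toy class. `ToyMeanModel` is a hypothetical illustration, not an OpenCV class: it simply learns the mean of the responses and uses `std::vector<float>` instead of `CvMat`.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical toy model (not part of OpenCV): it "learns" the mean of the
// responses and predicts that mean for every sample.
class ToyMeanModel {
public:
    // Default constructor: creates an empty, untrained model.
    ToyMeanModel() : mean_(0.0f), trained_(false) {}

    // Training constructor: default-construct, then train in one step.
    explicit ToyMeanModel(const std::vector<float>& responses) : ToyMeanModel() {
        train(responses);
    }

    // Resets previous state, then fits the model. Returns false on empty input.
    bool train(const std::vector<float>& responses) {
        clear();
        if (responses.empty()) return false;
        const float sum = std::accumulate(responses.begin(), responses.end(), 0.0f);
        mean_ = sum / static_cast<float>(responses.size());
        trained_ = true;
        return true;
    }

    // Releases learned state; the object stays usable.
    void clear() { mean_ = 0.0f; trained_ = false; }

    // const: prediction does not modify the model.
    float predict() const { assert(trained_); return mean_; }

    bool trained() const { return trained_; }

private:
    float mean_;
    bool trained_;
};
```

Constructing `ToyMeanModel a(responses)` and writing `ToyMeanModel b; b.train(responses);` leave both objects in the same state, which is the relationship the text describes.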

CvStatModel::~CvStatModel
Virtual destructor
CvStatModel::~CvStatModel();

The destructor is declared virtual, so it is safe to write code like this:
CvStatModel* model;
if( use_svm )
    model = new CvSVM(... );
else
    model = new CvDTree(... );
...
delete model;

Normally the destructor of each derived class does nothing itself; it calls the overridden method clear(), which releases all the memory.
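A self-contained sketch of the same pattern, using hypothetical stand-in classes (`ToyModelBase`, `ToySvm`, `ToyTree`; none of them are OpenCV types). Each derived destructor calls its own clear() and records its name in a log so the destruction path is observable.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class mirroring the role of CvStatModel.
struct ToyModelBase {
    virtual ~ToyModelBase() {}   // virtual: deleting via a base pointer is safe
    virtual void clear() = 0;
};

class ToySvm : public ToyModelBase {
public:
    explicit ToySvm(std::vector<std::string>* log) : log_(log), weights_(3, 1.0f) {
        assert(log_ != nullptr);
    }
    // The destructor only delegates to clear(), then records that it ran.
    ~ToySvm() override { ToySvm::clear(); log_->push_back("~ToySvm"); }
    void clear() override { weights_.clear(); }
private:
    std::vector<std::string>* log_;
    std::vector<float> weights_;
};

class ToyTree : public ToyModelBase {
public:
    explicit ToyTree(std::vector<std::string>* log) : log_(log), nodes_(7, 0) {
        assert(log_ != nullptr);
    }
    ~ToyTree() override { ToyTree::clear(); log_->push_back("~ToyTree"); }
    void clear() override { nodes_.clear(); }
private:
    std::vector<std::string>* log_;
    std::vector<int> nodes_;
};

// Picks the concrete model at run time; the caller only sees the base type.
std::unique_ptr<ToyModelBase> make_toy_model(bool use_svm, std::vector<std::string>* log) {
    if (use_svm) return std::unique_ptr<ToyModelBase>(new ToySvm(log));
    return std::unique_ptr<ToyModelBase>(new ToyTree(log));
}
```

The sketch uses `std::unique_ptr` rather than a raw `delete`; the point is the same, since destroying the object through the base type still runs the derived destructor.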

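CvStatModel::clear
Releases memory and resets the model state
void CvStatModel::clear();

This method does the same job as the destructor: it releases all the memory held by the class members. The object itself is not destroyed, however, and can be reused. The method is called by the destructors of derived classes, by the training methods, by load() and read(), or explicitly by the user.

CvStatModel::save
Saves the model to a file
void CvStatModel::save( const char* filename, const char* name=0 );

The method save stores the complete model state to the specified XML or YAML file, under the specified name or a default name (which depends on the particular class). It uses the data persistence functionality of cxcore.

CvStatModel::load
Loads the model from a file
void CvStatModel::load( const char* filename, const char* name=0 );

The method load reads the complete model state from the specified XML or YAML file. The previous model state is cleared by clear() in the process. Note that the method is virtual, so any model can be loaded through it. However, unlike the C types of OpenCV, which can be loaded with cvLoad(), here the model type must be known in advance, because an instance of the appropriate class has to be constructed beforehand. This limitation will be removed in later versions of ML.

CvStatModel::write
Writes the model to file storage
void CvStatModel::write( CvFileStorage* storage, const char* name );

The method write stores the complete model state to the file storage under the specified name. It is called by save().

CvStatModel::read
Reads the model from file storage
void CvStatModel::read( CvFileStorage* storage, CvFileNode* node );

The method read restores the complete model state from the specified node of the file storage. The node must be located by the user, for example with cvGetFileNodeByName(). The method is called by load(). The previous model state is cleared by clear().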

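CvStatModel::train
Trains the model
bool CvStatModel::train( const CvMat* train_data, [int tflag,] ..., const CvMat* responses, ...,
    [const CvMat* var_idx,] ..., [const CvMat* sample_idx,] ...
    [const CvMat* var_type,] ..., [const CvMat* missing_mask,] ... );

The method train trains the statistical model using a set of input feature vectors and the corresponding output values (responses). Both input and output vectors or values are passed as matrices. By default the input feature vectors are stored as the rows of train_data, that is, all the components (features) of one training vector are stored contiguously. Some algorithms can also handle the transposed representation, in which all the values of a particular feature over the whole input set are stored contiguously. If both layouts are supported, the method takes a tflag parameter that specifies the orientation:
tflag=CV_ROW_SAMPLE means that the feature vectors are stored as rows, one sample per row;
tflag=CV_COL_SAMPLE means that the feature vectors are stored as columns, one sample per column.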
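The difference between the two layouts comes down to how a (sample, feature) pair maps to an offset in the matrix buffer. The sketch below assumes a dense row-major float buffer; the `Layout` enum and `feature_at` function are hypothetical names that only mirror CV_ROW_SAMPLE and CV_COL_SAMPLE, they are not OpenCV identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical enum mirroring CV_ROW_SAMPLE / CV_COL_SAMPLE.
enum class Layout { RowSample, ColSample };

// Returns feature `feature` of sample `sample` from a dense row-major buffer.
// RowSample: the matrix is n_samples x n_features (one sample per row).
// ColSample: the matrix is n_features x n_samples (one sample per column).
float feature_at(const std::vector<float>& data,
                 std::size_t n_samples, std::size_t n_features,
                 Layout layout, std::size_t sample, std::size_t feature) {
    assert(data.size() == n_samples * n_features);
    assert(sample < n_samples && feature < n_features);
    if (layout == Layout::RowSample) {
        // All features of one sample are contiguous.
        return data[sample * n_features + feature];
    }
    // All values of one feature across samples are contiguous.
    return data[feature * n_samples + sample];
}
```

For two samples with three features each, the row layout stores `{1,2,3, 4,5,6}` while the column layout stores `{1,4, 2,5, 3,6}`; `feature_at` returns the same value for the same (sample, feature) pair in both cases.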

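train_data must be in 32fC1 (32-bit floating-point, single-channel) format. Responses are usually stored in a 1-D row or column vector in 32sC1 format (classification problems only) or 32fC1 format, one value per input vector (although some algorithms, such as the various kinds of neural networks, take a vector response for each input).

For classification problems the responses are discrete class labels; for regression problems the responses are values of the function to be approximated. Some algorithms can handle only one kind of problem, while others can handle both. In the latter case, the form of the output is set through var_type:
CV_VAR_CATEGORICAL means that the output values are discrete class labels;
CV_VAR_ORDERED (=CV_VAR_NUMERICAL) means that the output values are ordered, that is, two different values can be compared as numbers, and this is a regression problem.
The types of the input variables can also be set with var_type. Most algorithms, however, can handle only continuous input variables.

Many models in ML can be trained on a selected subset of features and/or a selected subset of training samples. To make this easier for the user, train takes a var_idx parameter that identifies the features of interest and a sample_idx parameter that identifies the samples of interest. Each of these vectors is either an integer (32sC1) vector, that is, a list of 0-based indices, or an 8-bit (8uC1) mask marking the active variables or samples. Passing a NULL pointer for either parameter means that all variables or all samples are used for training.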
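The two accepted forms of var_idx and sample_idx (an index list or an 8-bit mask) carry the same information. A minimal sketch of the conversion, using plain `std::vector` instead of `CvMat`; the function names are hypothetical and the nullable pointer only imitates the "NULL means use everything" convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts an 8-bit mask (non-zero = active) into a list of 0-based indices.
std::vector<int> mask_to_indices(const std::vector<std::uint8_t>& mask) {
    std::vector<int> indices;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i] != 0) indices.push_back(static_cast<int>(i));
    }
    return indices;
}

// Resolves an optional mask over `count` items to the active index list.
// A null mask selects every item, mirroring the NULL convention in the text.
std::vector<int> active_indices(const std::vector<std::uint8_t>* mask, std::size_t count) {
    if (mask == nullptr) {
        std::vector<int> all(count);
        for (std::size_t i = 0; i < count; ++i) all[i] = static_cast<int>(i);
        return all;
    }
    assert(mask->size() == count);
    return mask_to_indices(*mask);
}
```

For example, the mask `{1, 0, 1, 1}` selects indices 0, 2, and 3, and a null mask over three items selects 0, 1, and 2.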

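In addition, some algorithms can handle missing data, that is, cases where a particular feature of a particular training sample has no value (for example, they forgot to measure the temperature of patient A on Monday). The parameter missing_mask, an 8-bit matrix the same size as train_data, is used to mark the missing values (non-zero elements of the mask mean the value is missing).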
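As an illustration of how such a mask might be consumed, the sketch below computes the mean of one feature while skipping entries flagged as missing. It assumes the row-sample layout and plain `std::vector` buffers; `ColumnMean` and `column_mean_ignoring_missing` are hypothetical names, and this is not how any particular OpenCV algorithm treats missing values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of averaging one feature: the mean and how many values were present.
struct ColumnMean {
    float mean;        // 0 when no value was present
    std::size_t used;  // number of non-missing values that were averaged
};

// Averages feature `feature` over all samples, skipping entries whose mask
// element is non-zero. Both buffers are n_samples x n_features, row-major.
ColumnMean column_mean_ignoring_missing(const std::vector<float>& train_data,
                                        const std::vector<std::uint8_t>& missing_mask,
                                        std::size_t n_samples, std::size_t n_features,
                                        std::size_t feature) {
    assert(train_data.size() == n_samples * n_features);
    assert(missing_mask.size() == train_data.size());
    assert(feature < n_features);
    double sum = 0.0;
    std::size_t used = 0;
    for (std::size_t s = 0; s < n_samples; ++s) {
        const std::size_t idx = s * n_features + feature;
        if (missing_mask[idx] != 0) continue;  // value was never measured
        sum += train_data[idx];
        ++used;
    }
    ColumnMean result;
    result.used = used;
    result.mean = used > 0 ? static_cast<float>(sum / static_cast<double>(used)) : 0.0f;
    return result;
}
```

With three patients whose temperatures are 36.5, missing, and 37.5, the function averages only the two measured values.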

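Usually the previous model state is cleared by clear() before the training procedure runs. Some algorithms, however, can optionally update the model with the new data instead of resetting it.

CvStatModel::predict
Predicts the response for a sample
float CvStatModel::predict( const CvMat* sample[, ] ) const;

The method predicts the response for a new sample. For classification it returns the class label; for regression it returns the value of the output function. The input sample must have as many components (dimensions) as the train_data passed to train. If the var_idx parameter was passed to train, it is remembered, and predict uses it to pick out exactly the components it needs.

The const qualifier means that prediction does not affect the internal state of the model, so the method can safely be called from different threads.
 
Source: http://www.aiseminar.cn/bbs/forum.php?mod=viewthread&tid=798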
   
