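The SDK described in this document applies only to version 2.0 of the speech service, released on July 5. If you activated the service before that date, you must activate the 2.0 service to use this SDK. Otherwise, keep using the old SDK and integrate according to the 1.0 documentation.
C++ SDK 2.0 real-time speech recognition offers an asynchronous mode and a synchronous mode. In asynchronous mode, recognition results are delivered through callback functions, and sending data and receiving results run in different threads. In synchronous mode, recognition results are fetched through a get interface, and sending data and fetching results can run in the same thread.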
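The C++ SDK can be downloaded from CppSdk2.0. The archive contains the following parts:
File name | Description |
---|---|
sdkDemo.cpp | Windows only. Defaults to the short-sentence recognition demo; replace it with another feature if needed (encoding: UTF-8 with signature) |
speechRecognizerDemo.cpp | Short-sentence asynchronous recognition demo |
speechRecognizerSyncDemo.cpp | Short-sentence synchronous recognition demo |
speechSynthesizerDemo.cpp | Speech synthesis demo |
speechTranscriberDemo.cpp | Real-time speech asynchronous recognition demo |
speechTranscriberSyncDemo.cpp | Real-time speech synchronous recognition demo |
testX.wav | Test audio |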
File name | Description |
---|---|
openssl | openssl |
pthread | pthread threading library (used on Windows) |
uuid | uuid (used on Linux) |
opus | opus |
jsoncpp | jsoncpp |
nlsClient.h | SDK instance |
nlsEvent.h | Event descriptions |
speechRecognizerRequest.h | Short-sentence asynchronous recognition interface |
speechRecognizerSyncRequest.h | Short-sentence synchronous recognition interface |
speechSynthesizerRequest.h | Speech synthesis interface |
speechTranscriberRequest.h | Real-time speech asynchronous recognition interface |
speechTranscriberSyncRequest.h | Real-time speech synchronous recognition interface |
iNlsRequest.h | Request base |
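The SDK depends on openssl (1.0.2j), opus (1.2.1), jsoncpp (0.10.6), uuid (1.0.3), and pthread (2.9.1). The dependency libraries are placed under path/to/sdk/lib.
Note: path/to/sdk/lib/linux/uuid is used only on Linux. path/to/sdk/lib/windows/1x.0/pthread is used only on Windows. The dependency libraries shipped in the SDK archive are 64-bit; 32-bit builds are not provided. On 32-bit systems, you need to compile them yourself.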
Building with cmake on Linux:
1: Make sure CMake is installed on the local system; the minimum version is 2.6
2: cd path/to/sdk/lib
3: tar -zxvpf linux.tar.gz
4: cd path/to/sdk
5: Run [./build.sh] to build the demo
6: When the build finishes, cd path/to/sdk/demo and run [./stDemo your-token your-appkey]
If cmake is not available, you can try building manually:
1: cd path/to/sdk/lib
2: tar -zxvpf linux.tar.gz
3: cd path/to/sdk/demo
4: g++ -o stDemo speechTranscriberDemo.cpp -I./path/to/sdk/include -L./path/to/sdk/lib/linux -lnlsCppSdk -lssl -lcrypto -lopus -luuid -lpthread -ljsoncpp -D_GLIBCXX_USE_CXX11_ABI=0
5: export LD_LIBRARY_PATH=path/to/sdk/lib/linux/
6: ./stDemo your-token your-appkey
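On Windows, you need to set up the project yourself.
For how to obtain a token, see: Obtain an access token
Note: For more details, see the API documentation: C++ API reference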
Error code | Error description | Solution |
---|---|---|
10000001 | SSL: couldn’t create a ……! | Retry |
10000002 | Official OpenSSL error description | Handle the problem as the description indicates, then retry |
10000003 | System error description | Handle the problem as the system error description indicates |
10000004 | URL: The url is empty. | Check whether the cloud URL is set |
10000005 | URL: Could not parse WebSocket url | Check whether the cloud URL is set correctly |
10000006 | MODE: unsupport mode. | Check whether the speech feature mode is set correctly |
10000007 | JSON: Json parse failed. | The server sent an invalid response. Provide the task_id and report it to Alibaba Cloud |
10000008 | WEBSOCKET: unkown head type. | The server sent an invalid WebSocket type. Provide the task_id and report it to Alibaba Cloud |
10000009 | HTTP: connect failed. | The connection to the cloud failed. Check the network, then retry |
Official HTTP status code | HTTP: Got bad status. | Handle the problem as the official HTTP description indicates |
System error code | IP: ip address is not valid. | Handle the problem as the system error description indicates |
System error code | ENCODE: convert to utf8 error. | Handle the problem as the system error description indicates |
10000010 | please check if the memory is enough | Out of memory. Check the memory of the local machine |
10000011 | Please check the order of execution | Wrong call order. When a Failed or complete event is received, the SDK closes the connection internally; calling send after that reports this error |
10000012 | StartCommand/StopCommand Send failed | Parameter error. Check whether the parameters are set correctly |
10000013 | The sent data is null or dataSize <= 0. | Send error. Check whether the send parameters are correct |
10000014 | Start invoke failed. | start timed out. Call stop, release the resources, and restart the recognition flow |
10000015 | connect failed, etc. | connect failed. Release the resources and restart the recognition flow |
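The table above can be folded into a small dispatch helper so that calling code reacts uniformly to client-side errors. The following is a minimal sketch only: `RecoveryAction` and `suggestRecovery` are hypothetical names that are not part of the SDK, and the grouping of codes into actions is one possible reading of the "Solution" column.

```cpp
// Hypothetical helper, NOT part of the SDK: groups the client-side error
// codes from the table into coarse recovery actions.
enum class RecoveryAction {
    Retry,             // transient failure, try again
    FixConfiguration,  // a URL, mode, or parameter is wrong
    ReportToVendor,    // malformed server response, report with task_id
    RestartSession,    // release resources and start a new recognition flow
    CheckSystem        // follow the system or HTTP error description
};

RecoveryAction suggestRecovery(int errorCode) {
    switch (errorCode) {
        case 10000001:  // SSL object could not be created
        case 10000002:  // OpenSSL reported an error
        case 10000009:  // HTTP connect failed
            return RecoveryAction::Retry;
        case 10000004:  // URL is empty
        case 10000005:  // URL could not be parsed
        case 10000006:  // unsupported mode
        case 10000012:  // start/stop command could not be sent
        case 10000013:  // data is null or dataSize <= 0
            return RecoveryAction::FixConfiguration;
        case 10000007:  // JSON parse failed
        case 10000008:  // unknown WebSocket head type
            return RecoveryAction::ReportToVendor;
        case 10000011:  // send called after the connection was closed
        case 10000014:  // start timed out
        case 10000015:  // connect failed
            return RecoveryAction::RestartSession;
        default:        // 10000003, 10000010, HTTP status and system error codes
            return RecoveryAction::CheckSystem;
    }
}
```

A caller could, for example, log the error and retry only when the result is `RecoveryAction::Retry`, and tear the request down for `RecoveryAction::RestartSession`.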
For the complete example, see speechTranscriberDemo.cpp in the demo directory of the SDK archive.
// Worker thread
void* pthreadFunc(void* arg) {
    int sleepMs = 0;
    ParamCallBack cbParam;
    SpeechTranscriberCallback* callback = NULL;
    // 0: Read the token, configuration file, and other parameters from the custom thread argument.
    ParamStruct* tst = (ParamStruct*)arg;
    if (tst == NULL) {
        cout << "arg is not valid." << endl;
        return NULL;
    }
    // Initialize the custom callback parameter. It only illustrates parameter passing and has no effect in the demo.
    cbParam.iExg = 1;
    cbParam.sExg = "exg.";
    /* Open the audio file to read its data */
    ifstream fs;
    fs.open(tst->fileName.c_str(), ios::binary | ios::in);
    if (!fs) {
        cout << tst->fileName << " does not exist." << endl;
        return NULL;
    }
    /*
     * 1: Create the callback object and register the callback functions
     */
    callback = new SpeechTranscriberCallback();
    callback->setOnTranscriptionStarted(onTranscriptionStarted, &cbParam);             // Called when recognition starts
    callback->setOnTranscriptionResultChanged(onTranscriptionResultChanged, &cbParam); // Called when the intermediate result changes
    callback->setOnTranscriptionCompleted(onTranscriptionCompleted, &cbParam);         // Called when transcription completes
    callback->setOnSentenceBegin(onSentenceBegin, &cbParam);                           // Called when a sentence begins
    callback->setOnSentenceEnd(onSentenceEnd, &cbParam);                               // Called when a sentence ends
    callback->setOnTaskFailed(onTaskFailed, &cbParam);                                 // Called when recognition fails
    callback->setOnChannelClosed(onChannelClosed, &cbParam);                           // Called when the recognition channel closes
    /*
     * Create the real-time audio stream SpeechTranscriberRequest object; the argument is the callback object.
     * The request object can be reused within one session.
     * A session is a logical concept. In this demo, it is the time taken to read and send the whole audio file.
     * Once all audio data has been sent, the object can be released with releaseTranscriberRequest().
     * Call createTranscriberRequest(), start(), sendAudio(), stop(), and releaseTranscriberRequest()
     * from the same thread; using them across threads may cause unexpected errors.
     */
    /*
     * 2: Create the real-time audio stream SpeechTranscriberRequest object
     */
    SpeechTranscriberRequest* request = NlsClient::getInstance()->createTranscriberRequest(callback);
    if (request == NULL) {
        cout << "createTranscriberRequest failed." << endl;
        delete callback;
        callback = NULL;
        return NULL;
    }
    request->setAppKey(tst->appkey.c_str());        // AppKey. Required; apply for one as described on the official site
    request->setFormat("pcm");                      // Audio encoding format. Optional; pcm and opu are currently supported. Default: pcm
    request->setSampleRate(16000);                  // Audio sample rate. Optional; 16000 and 8000 are currently supported. Default: 16000
    request->setIntermediateResult(false);          // Whether to return intermediate results. Optional. Default: false
    request->setPunctuationPrediction(false);       // Whether to add punctuation in post-processing. Optional. Default: false
    request->setInverseTextNormalization(false);    // Whether to convert numbers in post-processing. Optional. Default: false
    request->setSemanticSentenceDetection(false);   // Whether to use semantic sentence detection. Optional. Default: false
    request->setMaxSentenceSilence(500);            // VAD threshold. Optional; valid range 200~2000 (ms). Default: 800 ms
    request->setToken(tst->token.c_str());          // Account verification token. Required
    /*
     * 3: start() blocks. After sending the start command, it returns only when the server responds or the call times out.
     */
    if (request->start() < 0) {
        cout << "start() failed." << endl;
        NlsClient::getInstance()->releaseTranscriberRequest(request); // start() failed, release the request object
        delete callback;
        callback = NULL;
        return NULL;
    }
    // Stop sending when the file has been read completely, or when a TaskFailed, closed, or completed callback is received
    while (!fs.eof()) {
        char data[FRAME_SIZE] = {0};
        fs.read(data, sizeof(char) * FRAME_SIZE);
        int nlen = fs.gcount();
        if (nlen <= 0) {
            // Nothing was read (end of file or read error); do not send an empty frame
            break;
        }
        /*
         * 4: Send the audio data. sendAudio returns -1 when sending fails, in which case sending must stop.
         * Third argument: use false when the request format is pcm; set it to true when the format is opu and the data is compressed.
         */
        nlen = request->sendAudio(data, nlen, false);
        if (nlen < 0) {
            // Sending failed, leave the send loop
            cout << "send data fail." << endl;
            break;
        } else {
            cout << "send len:" << nlen << " ." << endl;
        }
        /*
         * Controlling the send rate:
         * If the audio is captured in real time, no sleep is needed; send it directly.
         * If the audio comes from a file, throttle sending so that the amount of data sent per unit of time
         * is close to the amount of raw audio recorded in that unit of time.
         */
        sleepMs = getSendAudioSleepTime(6400, 16000, 1); // Compute the sleep time from the data size, sample rate, and compression ratio
        /*
         * 5: Delay between sends
         */
#ifdef _WIN32
        Sleep(sleepMs);
#else
        usleep(sleepMs * 1000);
#endif
    }
    // Close the audio file
    fs.close();
    /*
     * 6: All data has been sent; close the recognition channel.
     * stop() blocks and returns only when the server responds or the call times out.
     */
    request->stop();
    /*
     * 7: Recognition finished, release the request object
     */
    NlsClient::getInstance()->releaseTranscriberRequest(request);
    /*
     * 8: Release the callback object
     */
    delete callback;
    callback = NULL;
    return NULL;
}
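The excerpt above calls getSendAudioSleepTime() without showing its body. As an illustration of the rate control described in the comments, the sketch below computes how many milliseconds of audio a chunk represents. It is a sketch under stated assumptions, not the demo's actual implementation: the name `estimateSendSleepMs` is hypothetical, and 16-bit mono PCM is assumed.

```cpp
// Hypothetical helper, NOT the SDK's getSendAudioSleepTime().
// Assumes 16-bit (2 bytes per sample), single-channel PCM.
// dataSize:     number of bytes about to be sent
// sampleRate:   samples per second, e.g. 16000 or 8000
// compressRate: ratio of raw size to sent size (1 for uncompressed pcm)
int estimateSendSleepMs(int dataSize, int sampleRate, int compressRate) {
    const int bytesPerSample = 2;  // assumption: 16-bit samples
    const int channels = 1;        // assumption: mono
    // Bytes of raw audio produced per millisecond of speech
    const int bytesPerMs = sampleRate * bytesPerSample * channels / 1000;
    if (bytesPerMs <= 0) {
        return 0;  // invalid sample rate, do not sleep
    }
    // Raw bytes represented by this chunk, divided by raw bytes per millisecond
    return dataSize * compressRate / bytesPerMs;
}
```

With these assumptions, a 6400-byte pcm chunk at 16000 Hz corresponds to 200 ms of audio, so sleeping that long after each send keeps a file-based sender close to real-time speed.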
For the complete example, see speechTranscriberSyncDemo.cpp in the demo directory of the SDK archive.
// Worker thread
void* pthreadFunc(void* arg) {
    int sleepMs = 0;
    // 0: Read the token, configuration file, and other parameters from the custom thread argument.
    ParamStruct* tst = (ParamStruct*)arg;
    if (tst == NULL) {
        cout << "arg is not valid." << endl;
        return NULL;
    }
    // Open the audio file to read its data
    FILE* file = fopen(tst->fileName.c_str(), "rb");
    if (NULL == file) {
        cout << tst->fileName << " does not exist." << endl;
        return NULL;
    }
    fseek(file, 0, SEEK_END);
    int fileSize = ftell(file); // Length of the audio file
    fseek(file, 0, SEEK_SET);
    /*
     * Create the real-time audio stream synchronous SpeechTranscriberSyncRequest object.
     * The request object can be reused within one session.
     * A session is a logical concept. In this demo, it is the time taken to read and send the whole audio file.
     * Once all audio data has been sent, the object can be released with releaseTranscriberSyncRequest().
     * Call createTranscriberSyncRequest(), sendSyncAudio(), getTranscriberResult(), and releaseTranscriberSyncRequest()
     * from the same thread; using them across threads may cause unexpected errors.
     */
    /*
     * 1: Create the real-time audio stream SpeechTranscriberSyncRequest object
     */
    SpeechTranscriberSyncRequest* request = NlsClient::getInstance()->createTranscriberSyncRequest();
    if (request == NULL) {
        cout << "createTranscriberSyncRequest failed." << endl;
        fclose(file); // Close the audio file before leaving
        return NULL;
    }
    request->setAppKey(tst->appkey.c_str());        // AppKey. Required; apply for one as described on the official site
    request->setFormat("pcm");                      // Audio encoding format. Optional; pcm and opu are currently supported. Default: pcm
    request->setSampleRate(16000);                  // Audio sample rate. Optional; 16000 and 8000 are currently supported. Default: 16000
    request->setIntermediateResult(true);           // Whether to return intermediate results. Optional. Default: false
    request->setPunctuationPrediction(true);        // Whether to add punctuation in post-processing. Optional. Default: false
    request->setInverseTextNormalization(true);     // Whether to convert numbers in post-processing. Optional. Default: false
    request->setSemanticSentenceDetection(false);   // Whether to use semantic sentence detection. Optional. Default: false
    request->setMaxSentenceSilence(500);            // VAD threshold. Optional; valid range 200~2000 (ms). Default: 800 ms
    request->setToken(tst->token.c_str());          // Account verification token. Required
    int sentSize = 0; // Number of bytes of the file already sent
    while (sentSize < fileSize) {
        char data[FRAME_SIZE] = {0};
        int size = fread(data, sizeof(char), sizeof(char) * FRAME_SIZE, file);
        if (size <= 0) {
            // Read error or unexpected end of file; stop instead of looping forever
            break;
        }
        AudioDataStatus status;
        if (sentSize == 0) {
            status = AUDIO_FIRST;  // Sending the first block of audio data
        } else if (sentSize + size < fileSize) {
            status = AUDIO_MIDDLE; // Sending a middle block of audio data
        } else {
            status = AUDIO_LAST;   // Sending the last block of audio data
        }
        sentSize += size;
        /*
         * 2: Send the audio data. sendSyncAudio returns -1 when sending fails; the details of the failure are available from getTranscriberResult.
         * Fourth argument: use false when the request format is pcm; set it to true when the format is opu and the data is compressed.
         */
        request->sendSyncAudio(data, size, status);
        /*
         * Controlling the send rate:
         * If the audio is captured in real time, no sleep is needed; send it directly.
         * If the audio comes from a file, throttle sending so that the amount of data sent per unit of time
         * is close to the amount of raw audio recorded in that unit of time.
         */
        sleepMs = getSendAudioSleepTime(6400, 16000, 1); // Compute the sleep time from the data size, sample rate, and compression ratio
        /*
         * 3: Delay between sends
         */
#ifdef _WIN32
        Sleep(sleepMs);
#else
        usleep(sleepMs * 1000);
#endif
        /*
         * 4: Fetch the recognition results
         * Stop sending data when an event of type TaskFailed, closed, or completed is received.
         * Some errors raise TaskFailed more than once; stop sending data as soon as any TaskFailed event occurs.
         */
        bool isFinished = false;
        std::queue<NlsEvent> eventQueue;
        request->getTranscriberResult(&eventQueue);
        while (!eventQueue.empty()) {
            NlsEvent _event = eventQueue.front();
            eventQueue.pop();
            NlsEvent::EventType type = _event.getMsgType();
            switch (type) {
                case NlsEvent::TranscriptionStarted:
                    cout << "************* Transcriber started *************" << endl;
                    break;
                case NlsEvent::SentenceBegin:
                    cout << "************* Detected sentence begin *************" << endl;
                    cout << "sentence index: " << _event.getSentenceIndex() << endl;
                    cout << "sentence time: " << _event.getSentenceTime() << endl;
                    break;
                case NlsEvent::TranscriptionResultChanged:
                    cout << "************* Transcriber has sentence middle result *************" << endl;
                    cout << "sentence index: " << _event.getSentenceIndex() << endl;
                    cout << "sentence time: " << _event.getSentenceTime() << endl;
                    cout << "result: " << _event.getResult() << endl;
                    break;
                case NlsEvent::SentenceEnd:
                    cout << "************* Detected sentence end *************" << endl;
                    cout << "sentence index: " << _event.getSentenceIndex() << endl;
                    cout << "sentence time: " << _event.getSentenceTime() << endl;
                    cout << "result: " << _event.getResult() << endl;
                    break;
                case NlsEvent::TranscriptionCompleted:
                    cout << "************* Transcriber completed *************" << endl;
                    isFinished = true;
                    break;
                case NlsEvent::TaskFailed:
                    cout << "************* TaskFailed *************" << endl;
                    isFinished = true;
                    break;
                case NlsEvent::Close:
                    cout << "************* Closed *************" << endl;
                    isFinished = true;
                    break;
                default:
                    break;
            }
            cout << "allMessage: " << _event.getAllResponse() << endl;
        }
        if (isFinished) {
            break;
        }
    }
    // Close the audio file
    fclose(file);
    /*
     * 5: Recognition finished, release the request object
     */
    NlsClient::getInstance()->releaseTranscriberSyncRequest(request);
    return NULL;
}
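The first/middle/last tagging in the send loop above can be isolated into a small pure function, which makes the boundary cases easier to check. This is a minimal sketch: `ChunkStatus` and `classifyChunk` are hypothetical stand-ins for illustration and are not the SDK's `AudioDataStatus` type.

```cpp
// Hypothetical stand-in for the SDK's AudioDataStatus values.
enum ChunkStatus { CHUNK_FIRST, CHUNK_MIDDLE, CHUNK_LAST };

// sentSize:  bytes already sent before this chunk
// chunkSize: bytes in the chunk about to be sent
// totalSize: total bytes in the audio file
ChunkStatus classifyChunk(long sentSize, long chunkSize, long totalSize) {
    if (sentSize == 0) {
        return CHUNK_FIRST;   // nothing sent yet, so this is the first chunk
    }
    if (sentSize + chunkSize < totalSize) {
        return CHUNK_MIDDLE;  // more data remains after this chunk
    }
    return CHUNK_LAST;        // this chunk reaches the end of the file
}
```

Note that, as in the demo loop, a file that fits in a single chunk is tagged as first rather than last, so callers handling very short audio may want to treat that case separately.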