Yildirim Kocdag, 29 Nov 2013
This article helps you to understand how you can write your own Siri application.
Download SiriUI.zip - 1.1 MB
Introduction
This article explains how you can write your own Siri-like mobile assistant application. Last year I was responsible for developing an Android assistant application of this kind. It is complete and is now available in Google Play. Here I will try to share what I learned while building it.
What is a Mobile Assistant Application?
A mobile assistant application should meet the following criteria:
It should be a mobile application (Android, iOS, Windows Phone, etc.),
You can ask it written or spoken questions,
You can get the response as text, speech, graphics or an action (activity),
It should use the device's capabilities, such as the microphone, screen, GPS, internet connection and speaker, as well as the information stored on the device.
What a Mobile Assistant Application can Do
A mobile assistant application can offer many features. The first version of the mobile assistant that I developed could understand and respond to only 15 commands. Now it can understand and respond to more than 50 commands. The basic command types are news, weather, setting an alarm and calling a contact. When I surveyed the mobile assistants in the app markets, I found that these commands are common to them. Beyond that, you can add the commands below to your mobile assistant as an advanced command set.
Set alarm,
Get info about news, weather, match scores, wiki info,
Run an application,
Open a media File (Video, Music),
Share something on Facebook or Twitter, etc.,
Read/Write SMS or Email,
Read some shared feeds on Social Media,
Find the nearest Market, Pharmacy, Hospital, Restaurant, etc.,
Call someone,
Do basic Mathematical problems,
Check your bank balance,
Make a Money Transfer to someone,
Check latest currency or stock exchange,
Read/Set Calendar,
Buy a concert or travel ticket,
Etc.
Some of these command types can be implemented only through integrations with third-party companies. For instance, you could integrate with Amazon or Best Buy to order an item through your mobile assistant.
Mobile Assistants in the Market
There are more than 60 known mobile assistants in the app markets. The most popular ones are Siri and Google Voice Search.
Here is a list of mobile assistants in the markets and of mobile assistant development environments:
Siri,
Google Voice Search,
Nuance Nina,
Dragon Mobile Assistant,
Angel Lexee,
AIVC,
Iris,
Skyvi,
EverFriends,
EasyLuncher,
Speaktoit,
Evi,
Turkcell Mobil Asistan(Turkish).
Siri and Google Voice Search are the popular ones, so I will instead share some information and video links about Nina, Lexee, Dragon Mobile Assistant and Turkcell Mobil Asistan.
Nuance Nina: Nuance offers large enterprise organizations an SDK for developing their own mobile assistant, which can be used as a customer service application. The SDK can be integrated into iOS and Android applications. You can get more information on their website, Meet Nina. I like the video that introduces Nuance Nina on YouTube.
Lexee: Lexee is the mobile assistant from Angel Labs. Lexee also offers a web environment for creating your own mobile assistant: you can add, update and delete your scenarios through this web interface without coding. Another strength of Lexee is its analysis tools, an area where Angel Labs is good. The Lexee environment offers professionals a variety of reports and data about usage. You can get more information and watch the video via this link.
Dragon Mobile Assistant: Dragon Mobile Assistant is also a Nuance product. It lets users speak naturally to access a wide range of content and do everyday tasks on their phone easily. You can get more information via this link. You can download the application and watch my favorite mobile assistant video by clicking here.
Turkcell Mobil Asistan: Turkcell Mobil Asistan is the only Turkish mobile assistant in Google Play. Turkcell is one of the biggest GSM companies in Europe. With this application you can get customer care services such as your phone bill details and tariff info. In addition, you can ask for information about news, weather, currency and traffic in Istanbul. To get more information and download Turkcell Mobil Asistan, click here.
I hope the information above helps you understand the basic concepts of mobile assistants. Let's look at some technical points about these applications. A mobile assistant application should have the technologies below:
Speech to Text (STT) Engine,
Text to Speech (TTS) Engine,
Tagging (Intelligence),
Noise Reduction Engine,
Voice Biometrics,
Speech Compression Engine,
UI for Call Outs.
STT: The speech-to-text engine takes the user's voice and converts it to text. The voice can arrive as an audio file or as a stream.
TTS: The text-to-speech engine converts text to voice. This matters when the user needs to listen to the response, for example while driving.
Tagging: The text created by STT is not always simple. The tagging technology should label the text with what the user wants from that utterance. For example, if the user asks "What should I wear tomorrow?", the tagging engine can tag the request with a weather or calendar info tag.
Noise Reduction Engine: User speech is not always clean; there may be noise around (for example, air-conditioner noise). The noise reduction engine should filter this background noise out of the voice.
Voice Biometrics: Mobile assistants can give account-based information such as a monthly credit card report. Authentication is therefore important, and voice biometrics is one of the authentication methods. With voice biometrics technology, the mobile assistant can authenticate you to the system.
Speech Compression Engine: If your assistant is slow, users quickly give up on the application and choose to search the web by typing instead. Internet communication is critical here, and so is the packet size of each transaction: small packets transfer faster, so the result arrives faster. That is why a good mobile assistant application should have a speech compression engine, so that the client can send the compressed voice to the server quickly. Speech compression differs from general-purpose compression because voice data contains little repeating data. G.711 can be chosen as the compression algorithm; although it is technically a lossy companding codec, one reason for this choice is that it keeps the speech quality that recognition needs.
UI for Call Outs: After the server sends the result, you should play the audio and also show some information on the device screen inside call outs. My advice: native components can limit your application, so a web-based UI embedded in the native application can be more convenient for call outs.
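To make the tagging step described above more concrete, here is a minimal keyword-matching sketch in Java. It is not how a production tagging engine works (those typically rely on trained language models); the class name KeywordTagger, the tag names and the trigger words are all assumptions made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class KeywordTagger {
    // Hypothetical tags and trigger words, chosen only for illustration.
    private static final Map<String, String[]> RULES = new LinkedHashMap<>();
    static {
        RULES.put("weather_info", new String[] {"weather", "wear", "rain", "umbrella"});
        RULES.put("set_alarm", new String[] {"alarm", "wake"});
        RULES.put("call_contact", new String[] {"call", "dial"});
        RULES.put("news_info", new String[] {"news", "headlines"});
    }

    /** Returns the tag whose trigger words match the most words, or "unknown". */
    static String tag(String text) {
        // Lower-case the STT output and split it on anything that is not a letter.
        String[] words = text.toLowerCase(Locale.ROOT).split("[^a-z]+");
        String bestTag = "unknown";
        int bestScore = 0;
        for (Map.Entry<String, String[]> rule : RULES.entrySet()) {
            int score = 0;
            for (String word : words) {
                for (String trigger : rule.getValue()) {
                    if (word.equals(trigger)) {
                        score++;
                    }
                }
            }
            // Strictly greater: on a tie the earlier rule wins.
            if (score > bestScore) {
                bestScore = score;
                bestTag = rule.getKey();
            }
        }
        return bestTag;
    }
}
```

With these assumed rules, KeywordTagger.tag("What should I wear tomorrow?") returns "weather_info" because of the word "wear". A real engine would also need to handle synonyms, word forms and ambiguous requests.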
Architecture of Mobile Assistants
The mobile device and the main server should communicate by streaming, because users do not like waiting for voice data to transfer or for slow communication. Speed is really important for this application: when it is fast, the interaction feels more natural, and users can feel that they are speaking with a real agent or assistant.
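As a rough illustration of streaming rather than uploading a finished recording, the sketch below forwards audio to the server in small chunks as soon as they are read. The class name ChunkedAudioSender and the chunk size are assumptions; on a device, the InputStream would wrap the microphone recording and the OutputStream would be the socket to the main server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ChunkedAudioSender {
    // Assumption: 20 ms of 8 kHz, 16-bit mono audio = 160 samples * 2 bytes.
    static final int CHUNK_BYTES = 320;

    /** Forwards audio chunk by chunk and returns the number of bytes sent. */
    static long stream(InputStream audio, OutputStream server) throws IOException {
        byte[] chunk = new byte[CHUNK_BYTES];
        long total = 0;
        int read;
        while ((read = audio.read(chunk)) != -1) {
            server.write(chunk, 0, read);
            // Flush each chunk so the server can start recognizing
            // before the user has finished speaking.
            server.flush();
            total += read;
        }
        return total;
    }
}
```

Smaller chunks lower the latency but add per-packet overhead, so the chunk size is a trade-off to tune for your network.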
When the user asks a question from the client by clicking a button, the client starts streaming the question byte by byte to the main server. The main server sends the data to the STT server, which finds the text of the speech and returns it to the main server. The main server then sends the text to the tagging server to find out what the user wants. The tagging server creates a tag for the request, such as “weather_info”, and sends it back to the main server, which forwards the tag to the information server. If the tag requires authentication before the request reaches the information server, the security server checks the authentication. Finally, the response comes back to the main server, which creates the response text, response graphic and response speech (by communicating with the TTS server) and sends the response object to the mobile device.

The information server can communicate with third-party servers for information that it does not store itself. The security server can combine more than one authentication technology, such as voice biometrics, IMSI-IP RADIUS lookup, account-password authentication, etc.
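The request flow described above can be sketched as a small orchestration class. This is only an outline under stated assumptions: the interface names, the Response class and the set of protected tags are hypothetical, each server is reduced to a single method, and the response graphic is left out for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MainServerSketch {
    // One hypothetical interface per back-end server in the architecture.
    interface SttServer { String toText(byte[] speech); }
    interface TaggingServer { String tagFor(String text); }
    interface SecurityServer { boolean isAuthenticated(String userId); }
    interface InformationServer { String answerFor(String tag, String userId); }
    interface TtsServer { byte[] toSpeech(String text); }

    /** What the mobile device receives back. */
    static final class Response {
        final String text;
        final byte[] speech;
        Response(String text, byte[] speech) {
            this.text = text;
            this.speech = speech;
        }
    }

    // Assumed tags that expose account data and therefore need authentication.
    private static final Set<String> PROTECTED_TAGS =
            new HashSet<>(Arrays.asList("bank_balance", "money_transfer"));

    private final SttServer stt;
    private final TaggingServer tagging;
    private final SecurityServer security;
    private final InformationServer information;
    private final TtsServer tts;

    MainServerSketch(SttServer stt, TaggingServer tagging, SecurityServer security,
                     InformationServer information, TtsServer tts) {
        this.stt = stt;
        this.tagging = tagging;
        this.security = security;
        this.information = information;
        this.tts = tts;
    }

    Response handle(byte[] speech, String userId) {
        String text = stt.toText(speech);      // speech -> text
        String tag = tagging.tagFor(text);     // text -> intent tag
        // Check authentication before the request reaches the information server.
        if (PROTECTED_TAGS.contains(tag) && !security.isAuthenticated(userId)) {
            return respond("Please authenticate first.");
        }
        return respond(information.answerFor(tag, userId));
    }

    private Response respond(String text) {
        return new Response(text, tts.toSpeech(text)); // text -> speech
    }
}
```

Keeping every server behind an interface also makes it easy to swap an STT or TTS vendor, or to test the flow with stub implementations.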
Callout UI
If you try to develop your own native components for call outs, it is difficult to handle all the formats on the client, scroll all the items, and so on. My advice is to create a custom web view, where you can add formatted call outs easily.
The picture on the left shows how your SiriWebView appears on screen. The user can scroll the web view, and when a new callout arrives the web view scrolls to it automatically.
In this section I will briefly describe how to write your own SiriWebView. The article also includes a sample project for the web view. Apologies to users of other platforms: all my examples are for Android.
First of all, create a new class and name it SiriWebView. It should extend the standard Android WebView. The class should contain a constructor and an overridden onDraw function. In addition, we add two new functions to this class: one to initialize it and one to add a new callout. The code snippet below shows how the add-new-callout function works.
    public void AddNewCallOut(String message, Boolean ismsgResponse) {
        // Each callout fills the next <div> (div1, div2, ...) of the loaded page.
        elementId = elementId + 1;
        StringBuilder messageBuilder = new StringBuilder();
        if (!message.contentEquals("")) {
            if (!ismsgResponse) {
                // Not a response: gray bubble.
                messageBuilder.append("<table class='bubble-gray' cellspacing='0' cellpadding='0'><tr><td class='head'></td></tr>");
                messageBuilder.append("<tr><td class='mid'><div class='txt shadow'>" + message + "</div></td></tr>");
                messageBuilder.append("<tr><td class='foot'></td></tr></table>");
            } else {
                // Response: blue bubble.
                messageBuilder.append("<table class='bubble-blue' cellspacing='0' cellpadding='0'><tr><td class='bhead'></td></tr>");
                messageBuilder.append("<tr><td class='bmid'><div class='txt shadow'>" + message + "</div></td></tr>");
                messageBuilder.append("<tr><td class='bfoot'></td></tr></table>");
            }
            // Write the bubble markup into the div through a javascript: URL.
            loadUrl("javascript:document.getElementById(\"div" + elementId
                    + "\").innerHTML=\"" + messageBuilder.toString() + "\";");
        }
        // For a non-response message (except the very first callout),
        // scroll the page down to the previous callout.
        if (!ismsgResponse && elementId != 1) {
            StringBuilder jvscr = new StringBuilder();
            // Sum the offsets of the previous div and its parents to get its absolute position.
            jvscr.append("var elem = document.getElementById('div" + (elementId - 1)
                    + "'); var x = 0; var y = 0; while (elem != null) { x += elem.offsetLeft; y += elem.offsetTop; elem = elem.offsetParent; } ");
            // Scroll one pixel at a time; the empty inner loop only slows the scroll down.
            jvscr.append("var endj=500; var i=window.scrollY; for(i=window.scrollY;i<y;i++){ var j=0; var a=0; for(j=0;j<endj;j++) {a=a+1; } window.scrollTo(x, i); } ");
            loadUrl("javascript:" + jvscr.toString());
        }
    }
The function takes two parameters, message and ismsgResponse. To add a new callout, call the function with your message as a string and set ismsgResponse accordingly. The ismsgResponse parameter indicates whether the message is a response from the assistant or not; it determines the color of the callout and whether the view scrolls. In the first line of the function you can see the elementId field. elementId is important for scrolling to the right object.
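One fragile point in this approach is that the message is pasted into a JavaScript string literal, so a double quote, backslash or line break inside the message would break the generated script. The following plain-Java helper is a possible way to guard against that. It is a sketch, not part of the sample project: the class name CalloutScript and both method names are assumptions, and it escapes only for the JavaScript string, not for HTML.

```java
final class CalloutScript {
    /** Escapes text so it can sit inside a double-quoted JavaScript string literal. */
    static String escapeForJsString(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds the javascript: URL that writes bubbleHtml into the div with the given number. */
    static String buildScript(int elementId, String bubbleHtml) {
        return "javascript:document.getElementById(\"div" + elementId
                + "\").innerHTML=\"" + escapeForJsString(bubbleHtml) + "\";";
    }
}
```

Inside AddNewCallOut, the first loadUrl call could then become loadUrl(CalloutScript.buildScript(elementId, messageBuilder.toString())). Because the helper has no Android dependencies, it can be unit tested on a desktop JVM.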
After you create your own component, you can add it to your main_activity.xml as shown below.
    <com.example.siriui.SiriWebView
        android:id="@+id/webview"
        android:layout_width="fill_parent"
        android:layout_height="fill_parent"
        android:keepScreenOn="true"
        android:layout_marginTop="0dp"
        android:layout_gravity="fill"
        android:layout_marginBottom="0dp"
        android:layout_marginLeft="0dp"
        android:layout_marginRight="0dp"
        android:scrollbars="horizontal" />
You can find a working example of this component in the sample project attached to this article.
Audio Compression
Audio compression reduces the size of audio data, so the compressed audio can be transferred more quickly over the GSM network. Compression can be lossy or lossless.
Lossy: This method reduces the amount of data during the coding process by discarding part of it. However, the retained data is still acceptable for recognition. The advantage of the lossy method is that the data can be made smaller.
Lossless: With this method, the audio is compressed without losing its original quality. This is important if the recognition or recording tools do not have any noise reduction process.
Some data reduction does not directly affect the quality of speech data. Put simply, if the recorded audio will be used for speech recognition, the data that is not useful for speech recognition can be removed. Human hearing is sensitive to the audible frequency range of roughly 20 Hz to 20 kHz; anything outside this range can be removed.
G.711: You can use the G.711 standard for audio compression. Strictly speaking it is a lossy companding method (μ-law or A-law) rather than a lossless one, but the loss is small for speech. It stores each 16-bit linear PCM sample in 8 bits, so it can compress your data by 50 percent. You can download the Java source code of G711.java via this link ( https://code.google.com/p/sipdroid/source/browse/trunk/src/org/sipdroid/media/G711.java?r=386 ).
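To show roughly what G.711 does, here is a self-contained sketch of the μ-law variant written for this article's discussion. It is not the sipdroid G711.java linked above, and the class name MuLawSketch is an assumption. Each 16-bit sample becomes one byte holding a sign bit, a 3-bit exponent and a 4-bit mantissa, so decoding gives an approximation of the input rather than a bit-exact copy.

```java
final class MuLawSketch {
    private static final int BIAS = 0x84;   // 132, added before the segment search
    private static final int CLIP = 32635;  // largest magnitude that can be encoded

    /** Encodes one 16-bit linear PCM sample as an 8-bit mu-law byte. */
    static byte encodeSample(short pcm) {
        int sample = pcm;
        int sign = (sample >> 8) & 0x80;    // keep the sign in bit 7
        if (sign != 0) {
            sample = -sample;
        }
        if (sample > CLIP) {
            sample = CLIP;
        }
        sample += BIAS;
        // Find the segment: the position of the highest set bit, from bit 14 down to bit 7.
        int exponent = 7;
        for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1) {
            exponent--;
        }
        int mantissa = (sample >> (exponent + 3)) & 0x0F;
        // mu-law bytes are transmitted with all bits inverted.
        return (byte) ~(sign | (exponent << 4) | mantissa);
    }

    /** Decodes one mu-law byte back to an approximate 16-bit linear sample. */
    static short decodeSample(byte muLaw) {
        int u = ~muLaw & 0xFF;
        int sign = u & 0x80;
        int exponent = (u >> 4) & 0x07;
        int mantissa = u & 0x0F;
        int sample = (((mantissa << 3) + BIAS) << exponent) - BIAS;
        return (short) (sign != 0 ? -sample : sample);
    }

    /** Encodes a whole buffer: the output has half as many bytes as 16-bit PCM. */
    static byte[] encode(short[] pcm) {
        byte[] out = new byte[pcm.length];
        for (int i = 0; i < pcm.length; i++) {
            out[i] = encodeSample(pcm[i]);
        }
        return out;
    }
}
```

For telephone-quality audio (8 kHz, 16-bit mono) this halves the stream from 16,000 to 8,000 bytes per second. Quiet samples are stored more precisely than loud ones, which suits speech.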
Other methods that can be used are MPEG-1 Layer III (MP3), MPEG-1 Layer II Multichannel, MPEG-1 Layer I, AAC, HE-AAC, MPEG Surround, MPEG-4 ALS, MPEG-4 SLS, MPEG-4 DST, MPEG-4 HVXC, MPEG-4 CELP, USAC, G.718, G.719, G.722, G.722.1, G.722.2, G.723, G.723.1, G.726, G.728, G.729, G.729.1, Speex, Vorbis, WMA and Codec2.
Revision History
I will add example code snippets about compression, streaming, buffer playback, Call Out UI, tagging, TTS and STT to help programmers handle some of the difficult points.
18/04/13: Callout UI has been added to the article.
30/11/13: Audio Compression has been added to the article.
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)