Voice assistants emerged thanks to advances in a whole constellation of other technologies. One such assistant is Marusia, created relatively recently at Mail.ru Group. It is based on the company's latest work in speech recognition, machine learning, and fast processing of large data sets. Marusia is a dialog platform: it can interact with a person not only by voice but also through images or text. In many situations it responds not with speech but with an image, text, or a link.
Marusia debuted last year, and the first device with the assistant appeared in April 2020. The company keeps developing it, making it smarter and expanding its library of skills. In particular, the assistant now understands users better, and its text-to-speech (TTS) has improved. Marusia has also learned to control a smart home. In June 2020, Mail.ru Group opened up the ability to create your own skills. Since then, the protocol has improved considerably, the assistant's capabilities have grown, and a debugger has appeared in which developers can test the skills they create.
Contents:
- How Marusia works.
- How to make a skill useful.
- How to create a skill for Marusia.
- How to transfer a skill from other assistants.
- How the skill debugging environment works.
- How to add a skill via VKontakte.
- How skills are moderated.
- How to add images to Marusia.
- How to add sounds to Marusia.
1. How Marusia works
Marusia is built around a system of “skills”. These are small dialog applications that define how the voice assistant responds and acts in reply to particular voice requests; think of them as microservices, if you are familiar with that architecture. Users interact with Marusia's skills, which can be built into a variety of devices and standalone programs. These devices may have touch screens or voice-only interfaces.
To date, Marusia is supported by:
- the Marusia mobile app for iOS;
- the Marusia mobile app for Android;
- the “Capsule” smart speaker;
- the Prestigio Smartvoice smart speaker;
- the Mail.ru Mail app for iOS;
- the Mail.ru Mail app for Android.
The platform does not limit you to the set of skills that its developers have built in: you can create your own.
2. How to make a skill useful
Before you start creating a skill, try to set aside all the tools and processes. Imagine that the user is not talking to a device but asking a specialist to perform a task. Ask yourself: “If, instead of my skill, there were a person who performs this task perfectly, how would they communicate with the customer?” Describe step by step how this dialog would go and what information the customer and the specialist would exchange. Keep in mind that the dialog and usage scenario may differ between Marusia on a smart speaker and Marusia in an app. In the app, you can move some of the information into the visual interface, while on the smart speaker only voice control is available.
When talking to Marusia, the user can say anything, so you need to provide for edge cases to keep the user from reaching a dead end. To do this, check your skill against the TRINDI checklist (in Russian).
Once the script is polished, think through the interface. To check its usability, we recommend a second important checklist: the Nielsen heuristics (in Russian). Use it to check the key scenario and all subtasks and branches. The checklist will show whether you give the user all the necessary signals that the skill has worked and how it worked. It will also help you handle errors correctly: not just by showing a notification, but by explaining what to do next.
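As a rough illustration of the advice about dead ends and error handling, the sketch below maps a user command to a reply and falls back to a message that tells the user what to say next. The function name `reply_for`, the commands, and the reply texts are hypothetical and are not part of the Marusia protocol.

```python
def reply_for(command: str) -> dict:
    """Return a reply for a user command without leaving the user at a dead end."""
    normalized = command.strip().lower()

    # Explicit request for help: describe what the skill can do.
    if normalized in ("help", "what can you do"):
        return {
            "text": "I can tell you how long the queue in the dining room is. Say: queue.",
            "end_session": False,
        }

    # The key scenario of this hypothetical skill.
    if "queue" in normalized:
        return {
            "text": "There are five people waiting in the dining room.",
            "end_session": True,
        }

    # Fallback: do not just report an error, explain what to do next.
    return {
        "text": "I did not catch that. Say \"queue\" to hear the queue length, or \"help\".",
        "end_session": False,
    }
```

The fallback keeps the session open, so the user can try again instead of having to restart the skill.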
We also recommend using the long dash (em dash) in dialogs with the user, and only Russian typographic “herringbone” quotation marks (« ») as quotes. And don't forget that Marusia always addresses the user with the formal “you”, regardless of the user's age.
3. How to create a skill for Marusia
There are three ways to create a skill:
- Use the Aimylogic voice app builder. It lets you create any skill and get a webhook for it. All developers who create skills for Marusia with Aimylogic can host them for free under a special plan called Skillmaster.
- Develop it yourself. To do this, you will need to study the detailed documentation on creating skills for Marusia.
- Order skill development from third-party developers.
4. How to transfer a skill from other assistants
If you already have a skill built on the Aimylogic platform, connecting it to Marusia is simple: select the Marusia channel in the settings and follow the instructions. More detailed information can be found on the platform page.
What you should pay attention to when transferring a skill:
- Check the skill for mentions of brands and companies.
- Remove third-party monetization, links to other directories, and other specific settings.
- Check the voice synthesis markup. Detailed instructions are available at the link.
- Check the image hosting. In a ported skill, images may be loaded from cloud storage. To work with images in Marusia, you need to upload them through the VKontakte interface.
- If desired, you can replace audio with sounds from the Marusia sound library.
5. How the skill debugging environment works
It is important to us that developers can quickly try out their ideas and easily debug and test skills on our platform. To make this possible, we created an environment for testing skills.
It lets you test a skill on any Marusia client without publishing it. To get started, simply give the environment the URL of your skill's webhook. The skill does not need to be deployed publicly: the developer can connect a skill running on their own computer to the test environment by specifying a local address, for example:
http://localhost:3000/webhook
A skill connected to the environment can be tested in any Marusia client: on the “Capsule” speaker, in the mobile app, in VKontakte, or in the emulator built into the environment. The developer can override some of the client parameters passed to the skill: time zone, geolocation, interface language, and whether a display is present. This is convenient for testing the skill's scenarios.
The test environment reproduces Marusia's production environment, so even at early stages of implementation the developer can make sure that the skill is properly integrated with the platform. If the skill returns an error, takes too long to respond, or sends a response that does not comply with the protocol, the environment displays a corresponding message. The environment also shows a log of the JSON messages exchanged between the skill and the platform, which helps in finding errors in the skill's business logic.
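To make the local setup more concrete, here is a minimal sketch of a webhook that could listen on http://localhost:3000/webhook, written with the Python standard library only. It is an illustration under assumptions, not the platform's reference implementation: the shape of the incoming request (a `request.command` field alongside `session` and `version`) is assumed, and the names `build_response`, `WebhookHandler`, and `serve` are hypothetical. The response fields follow the JSON sample shown later in this article.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SESSION_KEYS = ("session_id", "message_id", "user_id")


def build_response(request_body: dict) -> dict:
    """Build a reply that echoes the user's command back."""
    # ASSUMPTION: the platform sends the recognized phrase in request.command.
    command = request_body.get("request", {}).get("command", "")
    text = f"You said: {command}" if command else "Hello! Say something."
    session = request_body.get("session", {})
    return {
        "response": {"text": text, "tts": text, "end_session": False},
        # Return only the session fields that appear in the article's sample.
        "session": {key: session.get(key) for key in SESSION_KEYS},
        "version": request_body.get("version", "1.0"),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/webhook":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "Invalid JSON")
            return
        if not isinstance(body, dict):
            body = {}
        payload = json.dumps(build_response(body), ensure_ascii=False).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host: str = "localhost", port: int = 3000) -> None:
    """Start the webhook and block until the process is interrupted."""
    HTTPServer((host, port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the server; the address http://localhost:3000/webhook can then be entered in the test environment. Keeping `build_response` separate from the HTTP handler makes the dialog logic easy to test without a running server.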
6. How to add a skill via VKontakte
The skill itself can be created in the section for developers on VKontakte. To add a ready-made skill:
- In the application types, select “Marusia Skill”.
- Add a name that matches the command to activate the skill.
- Enter the URL of the server where the skill will be placed in the Webhook field, for example https://example.com/test-webhook.
- Confirm the action.
You will be taken to the administration interface.
Please note: the name is the first trigger phrase for calling the skill. Phrases must be specific and unique so that we can use them for external skills. For example, the phrase “Tell a joke” cannot be added, because it is already used by Marusia's internal skills, but “Let's make a code review” is still available. The activation phrase cannot be longer than 64 characters.
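The two rules above can be pre-checked before you submit a name. The sketch below is only an illustration: the 64-character limit comes from this article, while the `RESERVED_PHRASES` set is a hypothetical sample (the real list of phrases used by internal skills is not published here), and the function name `check_activation_phrase` is made up.

```python
MAX_PHRASE_LENGTH = 64  # limit stated in this article

# ASSUMPTION: a tiny sample of phrases taken by internal skills.
RESERVED_PHRASES = {"tell a joke"}


def check_activation_phrase(phrase: str) -> list[str]:
    """Return a list of problems found in an activation phrase (empty if none)."""
    problems = []
    cleaned = " ".join(phrase.split())  # collapse repeated whitespace
    if not cleaned:
        problems.append("phrase is empty")
    if len(cleaned) > MAX_PHRASE_LENGTH:
        problems.append(f"phrase is longer than {MAX_PHRASE_LENGTH} characters")
    if cleaned.lower() in RESERVED_PHRASES:
        problems.append("phrase is already used by an internal skill")
    return problems
```

A check like this cannot guarantee that a phrase will be accepted; final uniqueness is decided by the platform.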
7. How skills are moderated
All skills created for Marusia by third-party developers must pass moderation. Moderation is quick and takes just one working day.
To make sure your Marusia skill is approved for use, follow these rules:
- images must match the theme of the skill, comply with the law, and not violate copyright;
- the description of the skill should be simple and concise;
- the category must correspond to the subject of the skill;
- the skill should be useful and not be of an advertising nature;
- the name of the skill and the phrase that activates it must not contain well-known trademarks (exception: if the skill is represented by the owner of this brand);
- the skill must not give access to copyrighted content if the rights to the content do not belong to the developer of the skill.
If your skill meets these conditions, it will be moderated and published in just one day.
8. How to add images to Marusia
Marusia lets you include images in the response from an external skill. To use your own image, upload it with the image upload form on the skill information editing page of the VKontakte skill platform. Once the image has been uploaded, it appears next to the upload form together with its ID. Specify this image ID in the image_id field of the external skill's response. Such a response looks like this:
{
  "response": {
    "text": "Now there is a queue of 5 people in the dining room.",
    "tts": "There are now five people waiting in the dining room.",
    "card": {
      "type": "bigImage",
      "image_id": 239017,
      "title": "Title for the image",
      "description": "a description of the image"
    },
    "buttons": [
      {
        "title": "Label on the button",
        "payload": {},
        "url": "https://example.com/"
      }
    ],
    "end_session": true
  },
  "session": {
    "session_id": "574d41e0-a41e-4028-a73a-6f5b5",
    "message_id": 0,
    "user_id": "3eae3c2f69b9f04e8cb15e157c4a9e05"
  },
  "version": "1.0"
}
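A skill would normally assemble such a response in code rather than by hand. The helper below is a small sketch that builds the same structure from its parts; the function name `image_response` and its parameters are hypothetical, and only the fields that appear in the sample above are used (buttons are omitted for brevity).

```python
import json


def image_response(text: str, tts: str, image_id: int, title: str,
                   description: str, session: dict,
                   end_session: bool = True) -> dict:
    """Build a skill response that shows an uploaded image as a card."""
    return {
        "response": {
            "text": text,
            "tts": tts,
            "card": {
                "type": "bigImage",    # card type as written in the sample above
                "image_id": image_id,  # ID shown next to the upload form
                "title": title,
                "description": description,
            },
            "end_session": end_session,
        },
        "session": session,
        "version": "1.0",
    }


reply = image_response(
    text="Now there is a queue of 5 people in the dining room.",
    tts="There are now five people waiting in the dining room.",
    image_id=239017,
    title="Title for the image",
    description="a description of the image",
    session={"session_id": "574d41e0-a41e-4028-a73a-6f5b5",
             "message_id": 0,
             "user_id": "3eae3c2f69b9f04e8cb15e157c4a9e05"},
)
body = json.dumps(reply, ensure_ascii=False)  # string to send back to the platform
```

Serializing with `ensure_ascii=False` keeps Russian text readable in logs; this is a convenience choice, not a platform requirement.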
9. How to add sounds to Marusia
The text spoken by Marusia can be livened up with sound effects from the Marusia sound library. To do this, insert a self-closing speaker tag with an audio attribute into the tts field (the text to be converted to speech) of the external skill's response. It looks like this:
tts = "Congratulations! <speaker audio=\"marusia-sounds/game-win-1\" /> You answered all my questions correctly!"
You can also insert your own sounds into the spoken speech. To do this, you need to create a skill on the VKontakte platform, and then upload your audio files on the skill editing page. They will only be available for use in your external skill. Once the sounds are available, you can insert them into speech using the speaker tag with the audio_vk_id attribute. The attribute value will be the ID of the audio you uploaded. It looks like this:
tts = "Guess whose voice it is? <speaker audio_vk_id=\"-2000000002_123456789\" />"
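Writing these tags by hand inside a quoted string is error-prone because of the nested quotation marks. The sketch below shows one way to generate them; the helper names `library_sound` and `uploaded_sound` are hypothetical, and the tag syntax is taken from the two examples above.

```python
import json


def library_sound(name: str) -> str:
    """Tag for a sound from the Marusia sound library, e.g. name='game-win-1'."""
    return f'<speaker audio="marusia-sounds/{name}" />'


def uploaded_sound(audio_vk_id: str) -> str:
    """Tag for a sound uploaded on the skill editing page."""
    return f'<speaker audio_vk_id="{audio_vk_id}" />'


win_tts = f"Congratulations! {library_sound('game-win-1')} You answered all my questions correctly!"
voice_tts = f"Guess whose voice it is? {uploaded_sound('-2000000002_123456789')}"

# json.dumps escapes the inner quotation marks when the response is serialized.
serialized = json.dumps({"tts": win_tts}, ensure_ascii=False)
```

Letting the JSON serializer handle the escaping avoids malformed tts strings when the tag attributes contain quotation marks.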
The Marusia skill platform will continue to evolve.