- Speech recognition: This is much easier today, now that Apple, Google, and some others have opened their SDKs. Many languages are supported, but English works best, of course.
- Natural language understanding (NLU): Most of the big tech companies are working on promising systems, such as LUIS.ai (Microsoft) and Watson (IBM). I use Wit.ai simply because it is easy to use. These systems provide the base model; our task is just to add more data to train the machine on a specific topic.
- Natural language processing (NLP): This is actually a lower level than NLU, but it can help. Google is among the best here with its cloud-based service, https://cloud.google.com/natural-language/, which can analyze syntax and sentiment. Combining NLU and NLP is really helpful: when we know the intent of a sentence as well as its sentiment, we have a guide for reacting in a more human way (with emotion).
- Text to speech: This is an easy job; there are a lot of services that help our Jarvis speak naturally, like Amazon Echo.
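
To make the idea of combining intent and sentiment a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `fake_nlu` and `fake_sentiment` are hypothetical keyword-based stand-ins for an NLU service (such as Wit.ai) and a sentiment service (such as the Google Cloud Natural Language API), and no real API is called. It is not the code from the demo, only a picture of how the two results could shape the reply before it goes to text to speech.

```python
from dataclasses import dataclass


@dataclass
class Understanding:
    intent: str       # what the user wants, as an NLU service might report it
    sentiment: float  # -1.0 (negative) to 1.0 (positive), as an NLP service might report it


def fake_nlu(text: str) -> str:
    """Hypothetical stand-in for an NLU call: plain keyword matching."""
    lowered = text.lower()
    if "light" in lowered:
        return "turn_on_light"
    if "message" in lowered:
        return "send_message"
    return "unknown"


def fake_sentiment(text: str) -> float:
    """Hypothetical stand-in for a sentiment call: counts a few marker words."""
    negative = {"hate", "annoying", "terrible"}
    positive = {"please", "thanks", "great"}
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    score = len(words & positive) - len(words & negative)
    return max(-1.0, min(1.0, float(score)))


def understand(text: str) -> Understanding:
    """Combine both analyses of the recognized speech."""
    return Understanding(intent=fake_nlu(text), sentiment=fake_sentiment(text))


def reply(result: Understanding) -> str:
    """Pick the text to speak: the intent decides what, the sentiment decides how."""
    actions = {
        "turn_on_light": "Turning on the light.",
        "send_message": "Sending your message.",
    }
    base = actions.get(result.intent, "Sorry, I did not understand that.")
    if result.sentiment < 0:
        # A negative mood gets a softer, more human opening.
        return "Sorry about the trouble. " + base
    return base
```

In a real assistant, the string returned by `reply` would be handed to the text-to-speech service, and the two stand-in functions would be replaced by calls to the actual services.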
We can build more on top of this code. In terms of features, we can connect it with other things, such as a laptop or an IoT server, to do more jobs like sending a message or turning on the light. Technically, we can improve the design so that it stays open to new commands / intents, and think about running the jobs serverless.
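
One possible way to keep the design open to new commands is a small handler registry, sketched below in Python. This is only an assumption about how it could look, not the design used in the repository: the `intent` decorator, the `dispatch` function, and the entity keys `contact` and `room` are all hypothetical names, and the handlers just return text instead of talking to a real laptop or IoT server.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[Dict[str, str]], str]

# Maps an intent name (as returned by the NLU step) to the function that handles it.
_HANDLERS: Dict[str, Handler] = {}


def intent(name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one intent name."""
    def register(func: Handler) -> Handler:
        _HANDLERS[name] = func
        return func
    return register


@intent("send_message")
def send_message(entities: Dict[str, str]) -> str:
    # Hypothetical: a real handler would call a messaging service here.
    return f"Message sent to {entities.get('contact', 'unknown contact')}."


@intent("turn_on_light")
def turn_on_light(entities: Dict[str, str]) -> str:
    # Hypothetical: a real handler would call the IoT server here.
    return f"Light on in the {entities.get('room', 'living room')}."


def dispatch(intent_name: str, entities: Optional[Dict[str, str]] = None) -> str:
    """Route a recognized intent to its handler, with a fallback for unknown ones."""
    handler = _HANDLERS.get(intent_name)
    if handler is None:
        return "Sorry, I cannot do that yet."
    return handler(entities or {})
```

Adding a command then means adding one decorated function, without touching `dispatch`. Because each handler is self-contained, it could also be moved into its own serverless function later, with `dispatch` calling it over the network instead of locally.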
I hope I can go further in the next demo.
The presentation and source code are here: https://github.com/hiennvn/simple-jarvis