Document ID: CGR-VAB18 | Last Updated: July 31, 2018
Report Overview
Abstract:
Voice assistants are voice-based conversational interfaces paired with intelligent cloud-based back-ends. Examples of voice assistants include Amazon Alexa, Apple Siri, Google Home, and Microsoft Cortana. Increasingly, vendors are positioning their devices as intelligent assistants capable of performing real-world tasks, not just playing music or reporting the weather. Performing such tasks requires at least a minimum level of intelligence to avoid frustrating the user. In this report, Cognilytica evaluates the intelligence and knowledge graph capabilities of four voice assistants: Amazon Alexa, Google Assistant (Home), Apple Siri, and Microsoft Cortana. We want to know: just how intelligent is the AI back-end?
Key Findings:
- The Voice Assistant Benchmark measures the underlying intelligence of voice assistant platforms, identifying categories of conversations and interactions that reveal intelligence capabilities.
- As a whole, voice assistants have a long way to go before even half of the responses are acceptable.
- The Voice Assistant Benchmark ranks responses on a scale of Category 0 to Category 3. Category 0 responses indicate that the voice assistant was unable to respond to the request; Category 1 responses are improper or incorrect; Category 2 responses require the human listener to do the work of assessing the correct answer; and Category 3 responses are correct and understandable to humans. (A minimal tallying sketch follows this list.)
- For the current benchmark, Alexa provided the highest number of Category 3 responses, with 25 out of 100 questions answered correctly.
- Google follows closely behind with 19 Category 3 responses.
- Siri and Cortana trail behind with only 13 and 10 Category 3 responses, respectively.
- Alexa shines with questions like “How old would George Washington be if he was alive today?” and “Where is the nearest bus stop?”, but cannot answer questions like “How long should you cook a 14 pound turkey?” or “What types of ticks can carry Lyme disease?”, both of which Google answered with ease, earning Category 3 responses.
- Siri and Cortana defaulted mostly to search-based responses (making the human do all the work), but both devices did respond to “How old is George Washington?”, and Cortana also answered, or attempted to answer, several other questions.
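The category counts above reduce to a simple per-assistant tally over graded responses. The following is a minimal sketch, in Python, of how such a tally could be computed; the `graded` records and the `tally` helper are hypothetical illustrations of the rubric described above, not the report's actual scoring code.

```python
from collections import Counter

# Hypothetical grading records: one (assistant, category) pair per
# benchmark question, using the report's scale:
#   0 = no response, 1 = improper/incorrect response,
#   2 = search-style response (human must assess the answer),
#   3 = correct, understandable response.
graded = [
    ("Alexa", 3), ("Alexa", 2), ("Google", 3),
    ("Siri", 0), ("Cortana", 1),
    # ... one entry per assistant per benchmark question
]

def tally(responses):
    """Count how many responses each assistant earned in each category."""
    counts = {}
    for assistant, category in responses:
        counts.setdefault(assistant, Counter())[category] += 1
    return counts

for assistant, counts in tally(graded).items():
    total = sum(counts.values())
    print(f"{assistant}: {counts[3]}/{total} Category 3 responses")
```

Recording one (assistant, category) pair per question keeps per-category breakdowns and Category 3 rates straightforward to derive.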
Key Vendors Included in this Report:
- Amazon Alexa
- Google Assistant (Google Home)
- Apple Siri
- Microsoft Cortana
Report Details:
- 43 Pages
- 11 Charts
- 11 Tables
Table of Contents
- Executive Summary 5
- Benchmark Details 6
- About the Voice Assistant Benchmark 6
- Testing Cloud-based Conversational Intelligence Capabilities of Edge Voice Assistants 6
- What this Benchmark Aims to Test: 7
- What this Benchmark Does NOT Test: 7
- Yes, We Know Voice Assistants Aren’t Smart… But the Bar is Moving. 7
- Purpose of Benchmark: Measure the Current State of Intelligence in Voice Assistants 8
- If you’re building Voice-based Skills or Capabilities on Voice Assistant Platforms, you NEED to Pay Attention 8
- This is Not a Ranking! 8
- Benchmark Methodology 8
- Open, Verifiable, Transparent. Your Input Needed. 10
- Benchmark Configuration 10
- Voice Assistants Tested: 10
- Computer Generated Voice(s) Used: 10
- Voice Assistant Benchmark 1.0 Questions 11
- Benchmark Calibration Questions (CQ) 11
- Overview: 11
- Current Benchmark Questions: 11
- Concept Understanding (CU) Benchmark Questions 12
- Overview: 12
- Current Benchmark Questions: 12
- Understanding Comparisons (UC) Benchmark Questions 12
- Overview: 12
- Current Benchmark Questions: 13
- Understanding Cause & Effect (CE) Benchmark Questions 13
- Overview: 13
- Current Benchmark Questions: 14
- Reasoning & Logic (RE) Benchmark Questions 14
- Overview: 14
- Current Benchmark Questions: 14
- Helpfulness Benchmark (HP) Questions 15
- Overview: 15
- Current Benchmark Questions: 15
- Emotional IQ (EI) Benchmark Questions 16
- Overview: 16
- Current Benchmark Questions: 16
- Intuition and Common Sense (IN) Benchmark Questions 17
- Overview: 17
- Current Benchmark Questions: 17
- Winograd Schema Inspired (WS) Benchmark Questions 17
- Overview: 17
- Current Benchmark Questions: 18
- Slang / Colloquialisms / Expressions (SE) Benchmark Questions 19
- Overview: 19
- Current Benchmark Questions: 19
- Miscellaneous Questions 19
- Overview: 19
- Current Benchmark Questions: 20
- Benchmark Results: Calibration Questions 21
- Overview: 21
- Complete Results 21
- Analysis of Results 22
- Benchmark Results: Understanding Concepts Questions 23
- Overview: 23
- Complete Results 23
- Analysis of Results 24
- Benchmark Results: Understanding Comparisons 25
- Overview: 25
- Complete Results 25
- Analysis of Results 26
- Benchmark Results: Understanding Cause & Effect 27
- Overview 27
- Complete Results 27
- Analysis of Results 28
- Benchmark Results: Reasoning & Logic 29
- Overview: 29
- Complete Results 29
- Analysis of Results 30
- Benchmark Results: Helpfulness Questions 31
- Overview 31
- Complete Results 31
- Analysis of Results 32
- Benchmark Results: Emotional IQ Questions 33
- Overview 33
- Complete Results 33
- Analysis of Results 34
- Benchmark Results: Intuition and Common Sense 35
- Overview 35
- Complete Results 35
- Analysis of Results 36
- Benchmark Results: Winograd Schema Inspired 37
- Overview 37
- Complete Results 37
- Analysis of Results 38
- Benchmark Results: Slang / Colloquialisms / Expressions 39
- Overview 39
- Complete Results 39
- Analysis of Results 40
- Benchmark Results: Miscellaneous Questions 41
- Overview 41
- Complete Results 41
- Analysis of Results 42
- Total Results & Overall Analysis 43
- Related Research 43