REPORT: Voice Assistant Benchmark 1.0 (2018)

$995.00

Document ID: CGR-VAB18 | Last Updated: July 31, 2018

Report Overview
Abstract:

Voice assistants are voice-based conversational interfaces paired with intelligent cloud-based back-ends. Examples include Amazon Alexa, Apple Siri, Google Assistant (on Google Home devices), and Microsoft Cortana. Increasingly, vendors position these devices as intelligent assistants that perform real-world tasks, not just play music or report the weather. Performing such tasks without frustrating the user requires at least a minimum level of intelligence. In this report, Cognilytica evaluates the intelligence and knowledge graph capabilities of four voice assistants: Amazon Alexa, Google Assistant (Google Home), Apple Siri, and Microsoft Cortana. We want to know: just how intelligent is the AI back-end?

Key Findings:

  • The Voice Assistant Benchmark measures the underlying intelligence of voice assistant platforms and identifies categories of conversations and interactions used to assess their intelligence capabilities.
  • As a whole, voice assistants have a long way to go before even half of the responses are acceptable.
  • The Voice Assistant Benchmark rates responses on a scale from Category 0 to Category 3. Category 0 responses indicate that the voice assistant was unable to respond to the request; Category 1 responses are improper or incorrect; Category 2 responses require the human listener to do the work of determining the correct answer; and Category 3 responses are correct and understandable to humans.
  • For the current benchmark, Alexa provided the most Category 3 responses, answering 25 of 100 questions correctly.
  • Google follows close behind with 19 Category 3 responses.
  • Siri and Cortana trail behind with only 13 and 10 Category 3 responses, respectively.
  • Alexa shines with questions like “How old would George Washington be if he was alive today?” and “Where is the nearest bus stop?”, but can’t answer questions like “How long should you cook a 14 pound turkey?” or “What types of ticks can carry Lyme disease?”, both of which Google answered without any difficulty with Category 3 responses.
  • Siri and Cortana mostly defaulted to search-based responses (making the human do all the work), but both devices did respond to “How old is George Washington?”, and Cortana also answered or attempted to answer other questions.
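
As an illustration of the arithmetic behind these findings, the sketch below encodes the Category 0 to Category 3 scale and computes each assistant's share of Category 3 responses. It is a minimal sketch, not Cognilytica's scoring tooling: the enum member names, the `category_3_share` helper, and the assumption that all four assistants were asked the same 100 questions are ours; only the four counts come from the findings above.

```python
from enum import IntEnum


class Category(IntEnum):
    """Response categories as described in the Key Findings (names are illustrative)."""
    NO_RESPONSE = 0       # assistant could not respond to the request
    INCORRECT = 1         # improper or incorrect response
    NEEDS_HUMAN_WORK = 2  # listener must work out the correct answer
    CORRECT = 3           # correct response a human can understand


# Assumption: every assistant was asked the same 100 questions.
TOTAL_QUESTIONS = 100

# Category 3 counts quoted in the Key Findings.
CATEGORY_3_COUNTS = {
    "Amazon Alexa": 25,
    "Google Assistant": 19,
    "Apple Siri": 13,
    "Microsoft Cortana": 10,
}


def category_3_share(count: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return the fraction of questions that received a Category 3 response."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


shares = {name: category_3_share(n) for name, n in CATEGORY_3_COUNTS.items()}

for name, share in shares.items():
    print(f"{name}: {share:.0%} of responses were Category {int(Category.CORRECT)}")
```

Under these assumptions, every share falls below one half, which is consistent with the finding that voice assistants have a long way to go before even half of their responses are acceptable.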

Key Vendors Included in this Report:

  • Amazon Alexa
  • Google Assistant (Google Home)
  • Apple Siri
  • Microsoft Cortana

Report Details:

  • 43 Pages
  • 11 Charts
  • 11 Tables

Table of Contents
  • Executive Summary    5
    • Key Findings    5
  • Benchmark Details    6
  • About the Voice Assistant Benchmark    6
    • Testing Cloud-based Conversational Intelligence Capabilities of Edge Voice Assistants    6
    • What this Benchmark Aims to Test:    7
    • What this Benchmark Does NOT Test:    7
    • Yes, We Know Voice Assistants Aren’t Smart… But the Bar is Moving.    7
    • Purpose of Benchmark: Measure the Current State of Intelligence in Voice Assistants    8
    • If you’re building Voice-based Skills or Capabilities on Voice Assistant Platforms, you NEED to Pay Attention    8
    • This is Not a Ranking!    8
  • Benchmark Methodology    8
    • Open, Verifiable, Transparent. Your Input Needed.    10
    • Benchmark Configuration    10
    • Voice Assistants Tested:    10
    • Computer Generated Voice(s) Used:    10
  • Voice Assistant Benchmark 1.0 Questions    11
    • Benchmark Calibration Questions (CQ)    11
      • Overview:    11
      • Current Benchmark Questions:    11
    • Concept Understanding (CU) Benchmark Questions    12
      • Overview:    12
      • Current Benchmark Questions:    12
    • Understanding Comparisons (UC) Benchmark Questions    12
      • Overview:    12
      • Current Benchmark Questions:    13
    • Understanding Cause & Effect (CE) Benchmark Questions    13
      • Overview:    13
      • Current Benchmark Questions:    14
    • Reasoning & Logic (RE) Benchmark Questions    14
      • Overview:    14
      • Current Benchmark Questions:    14
    • Helpfulness Benchmark (HP) Questions    15
      • Overview:    15
      • Current Benchmark Questions:    15
    • Emotional IQ (EI) Benchmark Questions    16
      • Overview:    16
      • Current Benchmark Questions:    16
    • Intuition and Common Sense (IN) Benchmark Questions    17
      • Overview:    17
      • Current Benchmark Questions:    17
    • Winograd Schema Inspired (WS) Benchmark Questions    17
      • Overview:    17
      • Current Benchmark Questions:    18
    • Slang / Colloquialisms / Expressions (SE) Benchmark Questions    19
      • Overview:    19
      • Current Benchmark Questions:    19
    • Miscellaneous Questions    19
      • Overview:    19
      • Current Benchmark Questions:    20
  • Benchmark Results: Calibration Questions    21
    • Overview:    21
    • Complete Results    21
    • Analysis of Results    22
  • Benchmark Results: Understanding Concepts Questions    23
    • Overview:    23
    • Complete Results    23
    • Analysis of Results    24
  • Benchmark Results: Understanding Comparisons    25
    • Overview:    25
    • Complete Results    25
    • Analysis of Results    26
  • Benchmark Results: Understanding Cause & Effect    27
    • Overview    27
    • Complete Results    27
    • Analysis of Results    28
  • Benchmark Results: Reasoning & Logic    29
    • Overview:    29
    • Complete Results    29
    • Analysis of Results    30
  • Benchmark Results: Helpfulness Questions    31
    • Overview    31
    • Complete Results    31
    • Analysis of Results    32
  • Benchmark Results: Emotional IQ Questions    33
    • Overview    33
    • Complete Results    33
    • Analysis of Results    34
  • Benchmark Results: Intuition and Common Sense    35
    • Overview    35
    • Complete Results    35
    • Analysis of Results    36
  • Benchmark Results: Winograd Schema Inspired    37
    • Overview    37
    • Complete Results    37
    • Analysis of Results    38
  • Benchmark Results: Slang / Colloquialisms / Expressions    39
    • Overview    39
    • Complete Results    39
    • Analysis of Results    40
  • Benchmark Results: Miscellaneous Questions    41
    • Overview    41
    • Complete Results    41
    • Analysis of Results    42
    • Total Results & Overall Analysis    43
  • Related Research    43

 
