|
-
|
Bohus, D., Andrist, S., Bao, Y., Horvitz, E., Paradiso, A., (2024) -
"Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration, to appear in Proceedings of International Conference on Multimodal Interaction (ICMI Companion '24), November 4--8, 2024, San Jose, Costa Rica. [abs]
|
|
|
We report initial work towards constructing ecologically valid
benchmarks to assess the capabilities of large multimodal models
for engaging in situated collaboration. In contrast to existing
benchmarks, in which question-answer pairs are generated post
hoc over preexisting or synthetic datasets via templates, human
annotators, or large language models (LLMs), we propose and investigate
an interactive system-driven approach, where the questions
are generated by users in context, during their interactions with
an end-to-end situated AI system. We illustrate how the questions
that arise are different in form and content from questions typically
found in existing embodied question answering (EQA) benchmarks
and discuss new real-world challenge problems brought to the fore.
|
|
|
|
-
|
Stiber, M., Bohus, D., Andrist, S., (2024) -
"Uh, This One?": Leveraging Behavioral Signals for Detecting Confusion During Physical Tasks, to appear in Proceedings of International Conference on Multimodal Interaction, 2024, San Jose, Costa Rica. [abs]
|
|
|
A longstanding goal in the AI and HCI research communities is
building intelligent assistants to help people with physical tasks. To
be effective in this, AI assistants must be aware of not only the physical
environment, but also the human user and their cognitive states.
In this paper, we specifically consider the detection of confusion,
which we operationalize as the moments when a user is “stuck” and
needs assistance. We explore how behavioral features such as gaze,
head pose, and hand movements differ between periods of confusion
vs. non-confusion. We present various modeling approaches
for detecting confusion that combine behavioral features, length of
time, instructional text embeddings, and egocentric video. Although
deep networks (e.g., V-JEPA) trained on full video streams perform
well in distinguishing confusion from non-confusion, simpler models
leveraging lighter-weight behavioral features exhibit similarly
high performance, even when generalizing to unseen tasks.
|
|
|
|
-
|
Bohus, D., Andrist, S., Saw, N., Paradiso, A., Chakraborty, I., Rad, M., (2024) -
SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research -- Extended Abstract, in Proceedings of 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), Orlando, FL. [abs]
[github]
[blog post]
|
|
|
We introduce an open-source system called Sigma (short for “Situated Interactive Guidance, Monitoring, and Assistance”) as a
platform for conducting research on task-assistive agents in mixed-reality scenarios. The system leverages the sensing and
rendering affordances of a head-mounted mixed reality device in conjunction with large language and vision models to guide
users step by step through procedural tasks. By open-sourcing the system, we aim to lower the barrier to entry, accelerate
research in this space, and chart a path towards community-driven end-to-end evaluation of large language, vision, and
multimodal models in the context of real-world interactive applications.
|
|
|
|
-
|
Bohus, D., Andrist, S., Saw, N., Paradiso, A., Chakraborty, I., Rad, M., (2024) -
SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research, Technical Report [abs]
[github]
[blog post]
|
|
|
We introduce an open-source system called SIGMA (short for "Situated Interactive
Guidance, Monitoring, and Assistance") as a platform for conducting research on task-assistive
agents in mixed-reality scenarios. The system leverages the sensing and rendering affordances of
a head-mounted mixed-reality device in conjunction with large language and vision models to guide
users step by step through procedural tasks. We present the system's core capabilities, discuss
its overall design and implementation, and outline directions for future research enabled by the
system. SIGMA is easily extensible and provides a useful basis for future research at the intersection
of mixed reality and AI. By open-sourcing an end-to-end implementation, we aim to lower the barrier to
entry, accelerate research in this space, and chart a path towards community-driven end-to-end evaluation
of large language, vision, and multimodal models in the context of real-world interactive applications.
|
|
|
|
|
-
|
Wang, X., Kwon, T., Rad, M., Pan, B., Chakraborty, I., Andrist, S., Bohus, D., Feniello, A., Tekin, B., Frujeri, F.V., Joshi, N., Pollefeys, M., (2023) -
HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World, in Proceedings of ICCV'2023, Paris, France. [abs]
[dataset website]
[blog post]
|
|
|
Building an interactive AI assistant that can perceive, reason,
and collaborate with humans in the real world has been a long-standing pursuit in the AI community.
This work is part of a broader research effort to develop intelligent agents that can interactively guide humans through
performing tasks in the physical world. As a first step in this direction, we introduce HoloAssist, a large-scale egocentric
human interaction dataset, where two people collaboratively complete physical manipulation tasks. The task performer
executes the task while wearing a mixed-reality headset that captures seven synchronized data streams. The task
instructor watches the performer’s egocentric video in real time and guides them verbally. By augmenting the data with action
and conversational annotations and observing the rich behaviors of various participants, we present key insights into
how human assistants correct mistakes, intervene in the task completion procedure, and ground their instructions to the
environment. HoloAssist spans 166 hours of data captured by 350 unique instructor-performer pairs. Furthermore, we
construct and present benchmarks on mistake detection, intervention type prediction, and hand forecasting, along with
detailed analysis. We expect HoloAssist will provide an important resource for building AI assistants that can fluidly
collaborate with humans in the real world. Data can be downloaded at https://holoassist.github.io/.
|
|
|
|
|
-
|
Bohus, D., Andrist, S., Feniello, A., Saw, N., Horvitz, E., (2022) -
Continual Learning about Objects in the Wild: An Interactive Approach, in Proceedings of ICMI'2022, Bengaluru (Bangalore), India. [abs]
|
|
|
We introduce a mixed-reality, interactive approach for continually learning to recognize an open-ended set of objects in a user’s surrounding environment. The proposed approach leverages the multimodal sensing, interaction, and rendering affordances of a mixed-reality headset, and enables users to label nearby objects via speech, gaze, and gestures. Image views of each labeled object are automatically captured from varying viewpoints over time, as the user goes about their everyday tasks. The labels provided by the user can be propagated forward and backwards in time and paired with the collected views to update an object recognition model, in order to continually adapt it to the user’s specific objects and environment. We review key challenges for the proposed interactive continual learning approach, present details of an end-to-end system implementation, and report on results and lessons learned from an initial, exploratory case study using the system.
|
|
|
|
-
|
Andrist, S., Bohus, D., Feniello, A., Saw, N., (2022) -
Developing Mixed Reality Applications with Platform for Situated Intelligence, 2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pp. 48-50, doi: 10.1109/VRW55335.2022.00018.
|
|
|
|
-
|
Bohus, D., Andrist, S., Feniello, A., Saw, N., Jalobeanu, M., Sweeney, P., Thompson, A.L., and Horvitz, E., (2021) -
Platform for Situated Intelligence, Microsoft Research Technical Report MSR-TR-2021-2, March, 2021.
|
|
|
|
|
|
-
|
Zhi Tan, X., Andrist, S., Bohus, D., and Horvitz, E., (2020) -
Now, Over Here: Leveraging Extended Attentional Capabilities in Human-Robot Interaction, late breaking report, in Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, Cambridge, UK.
|
|
|
|
|
|
|
|
|
-
|
Andrist, S., Bohus, D., Kamar, E., and Horvitz, E., (2017) -
What Went Wrong and Why? Diagnosing Situated Interaction Failures in the Wild
, in Proceedings of ICSR'2017, Tsukuba, Japan.
[abs]
|
|
|
Effective situated interaction hinges on the well-coordinated operation of a set of competencies,
including computer vision, speech recognition, and natural language, as well as higher-level inferences
about turn taking and engagement. Systems often rely on a set of hand-coded and machine-learned components
organized into several sensing and decision-making pipelines. Given their complexity and inter-dependencies, developing and debugging
such systems can be challenging. "In-the-wild" deployments outside of controlled lab conditions bring further
challenges due to unanticipated phenomena, including unexpected interactions such as playful engagements. We present
a methodology for assessing performance, identifying problems, and diagnosing the root causes and influences of
different types of failures on the overall performance of a situated interaction system functioning in the wild.
We apply the methodology to a dataset of interactions collected with a robot deployed in a public space inside an
office building. The analyses identify and characterize multiple types of failures, their causes, and their relationship
to overall performance. We employ models that predict overall interaction quality from various combinations of failures.
Finally, we discuss lessons learned with such a diagnostic methodology for improving situated systems deployed in the wild.
|
|
|
|
-
|
Bohus, D., Andrist, S., and Jalobeanu, M., (2017) -
Rapid Development of Multimodal Interactive Systems: A Demonstration of Platform for Situated Intelligence
, in Proceedings of ICMI'2017, Glasgow, Scotland.
[abs] | ICMI'17 best demonstration award
|
|
|
We demonstrate an open, extensible platform for developing and studying multimodal, integrative-AI systems. The
platform provides a time-aware, stream-based programming model for parallel coordinated computation, a set of
tools for data visualization, processing, and learning, and an ecosystem of pluggable AI components. The
demonstration will showcase three applications built on this platform and highlight how the platform can
significantly accelerate development and research in multimodal interactive systems.
|
|
|
|
-
|
Bohus, D., Andrist, S., and Horvitz, E., (2017) -
A Study in Scene Shaping: Adjusting F-formations in the Wild
, in AAAI Fall Symposium 2017, Arlington, VA
[abs]
|
|
|
We study the automated shaping of F-formations in the proximity of a stationary robot that has been deployed
to provide directions within a building. We introduce the notion of active scene shaping where suboptimal spatial
configurations are detected, and desired shifts in the locations of participants and bystanders are communicated
with natural language and gestures. We conduct an initial in-the-wild study with the proposed methods, and we
report results, lessons learned, and future directions of research.
|
|
|
|
|
-
|
Andrist, S., Bohus, D., Mutlu, B., and Schlangen, D., (2016) -
Turn-Taking and Coordination in Human-Machine Interaction
, in AI Magazine, Winter Issue, vol. 37, no. 4
[abs]
|
|
|
This issue of AI Magazine brings together a collection of articles on challenges, mechanisms, and research progress in turn-taking and coordination between humans and machines. The contributing authors work in interrelated fields of spoken dialog systems, intelligent virtual agents, human-computer interaction, human-robot interaction, and semiautonomous collaborative systems and explore core concepts in coordinating speech and actions with virtual agents, robots, and other autonomous systems. Several of the contributors participated in the AAAI Spring Symposium on Turn-Taking and Coordination in Human-Machine Interaction, held in March 2015, and several articles in this issue are extensions of work presented at that symposium. The articles in the collection address key modeling, methodological, and computational challenges in achieving effective coordination with machines, propose solutions that overcome these challenges under sensory, cognitive, and resource restrictions, and illustrate how such solutions can facilitate coordination across diverse and challenging domains. The contributions highlight turn-taking and coordination in human-machine interaction as an emerging and evolving research area with important implications for future applications of AI.
|
|
|
|
|
-
|
Yu, Z., Bohus, D., and Horvitz, E., (2015) -
Incremental Coordination: Attention-Centric Speech Production in a Physically Situated Conversational Agent
, in SigDIAL'2015,
Prague, Czech Republic [abs]
|
|
|
Inspired by studies of human-human conversations, we present methods for incrementally coordinating speech production with listeners' visual foci of attention. We introduce a model that considers the demands and availability of listeners' attention at the onset and throughout the production of system utterances, and that incrementally coordinates speech synthesis with the listener's gaze. We present an implementation and deployment of the model in a physically situated dialog system and discuss lessons learned.
|
|
|
|
-
|
Andrist, S., Bohus, D., Yu, Z., and Horvitz, E., (2015) -
Are You Messing with Me? Querying about the Sincerity of Interactions in the Open World
, late breaking report, in HRI'2015,
Christchurch, New Zealand [abs]
|
|
|
When interacting with robots deployed in the open world, people may often attempt to engage with them in a playful manner or test their competencies. Such engagements are often associated with language and behaviors that fall outside of designed task capabilities and can lead to interaction failures. Detecting when users are driven by play and curiosity can help a robot to understand why some interactions are breaking down, respond more appropriately by conveying its capabilities to its users, and enhance perceptions of its situational awareness and social intelligence. We have been studying the intentions of everyday users in their engagement with a long-lived robot system that provides directions within an office building. We report on a pilot field-study exploring the use of direct queries to elicit the sincerity of user requests, in terms of their actual need for directions. We discuss early results from this initial study and frame research directions and design implications for robots deployed in the wild.
|
|
|
|
|
-
|
Bohus, D., Horvitz, E., (2014) - Managing Human-Robot Engagement with Forecasts and ... um ... Hesitations, in Proceedings of ICMI'2014,
Istanbul, Turkey [abs]
|
|
|
We explore methods for managing conversational engagement in
open-world, physically situated dialog systems. We investigate a
self-supervised methodology for constructing forecasting models
that aim to anticipate when participants are about to terminate their
interactions with a situated system. We study how these models can
be leveraged to guide a disengagement policy that uses linguistic
hesitation actions, such as filled and non-filled pauses, when
uncertainty about the continuation of engagement arises. The
hesitations allow for additional time for sensing and inference, and
convey the system’s uncertainty. We report results from a study of
the proposed approach.
|
|
|
|
-
|
Pejsa, T., Bohus, D., Cohen, M., Saw, C.W., Mahoney, J., Horvitz, E. (2014) -
Natural Communication about Uncertainties in Situated Interaction, in ICMI'2014,
Istanbul, Turkey [abs] [supplemental video]
|
|
|
Physically situated, multimodal interactive systems must often grapple with uncertainties about properties of the world, people, and their intentions and actions. We present methods for estimating and communicating about different uncertainties in situated interaction, leveraging the affordances of an embodied conversational agent. The approach harnesses a representation that captures both the magnitude and the sources of uncertainty, and a set of policies that select and coordinate the production of nonverbal and verbal behaviors to communicate the system’s uncertainties to conversational participants. The methods are designed to enlist participants’ help in a natural manner to resolve uncertainties arising during interactions. We report on a preliminary implementation of the proposed methods in a deployed system and illustrate the functionality with a trace from a sample interaction.
|
|
|
|
-
|
Mitchell, M., Bohus, D., Kamar, E., (2014) -
Crowdsourcing Language Generation Templates for Dialogue Systems
, in INLG'2014,
Philadelphia, PA, USA [abs]
|
|
|
We explore the use of crowdsourcing to
generate natural language in spoken dialogue
systems. We introduce a methodology
to elicit novel templates from the
crowd based on a dialogue seed corpus,
and investigate the effect that the amount
of surrounding dialogue context has on the
generation task. Evaluation is performed
both with a crowd and with a system developer
to assess the naturalness and suitability
of the elicited phrases. Results indicate
that the crowd is able to provide reasonable
and diverse templates within this
methodology. More work is necessary before
elicited templates can be automatically
plugged into the system.
|
|
|
|
-
|
Bohus, D., Saw, C.W., Horvitz, E., (2014) -
Directions Robot: In-the-Wild Experiences and Lessons Learned
, in AAMAS'2014,
Paris, France [abs]
|
|
|
We introduce Directions Robot, a system we have fielded for
studying open-world human-robot interaction. The system brings
together models for situated spoken language interaction with
directions-generation and a gesturing humanoid robot. We describe
the perceptual, interaction, and output generation competencies of
this system. We then discuss experiences and lessons drawn from
data collected in an initial in-the-wild deployment, and highlight
several challenges with managing engagement, providing
directions, and handling out-of-domain queries that arise in open-world,
multiparty settings.
|
|
|
|
|
-
|
Rosenthal, S., Bohus, D., Kamar, E., Horvitz, E., (2013) -
Look versus Leap: Computing Value of Information with High-Dimensional Streaming Evidence
, in IJCAI'2013,
Beijing, China [abs]
|
|
|
A key decision facing autonomous systems with access
to streams of sensory data is whether to act
based on current evidence or to wait for additional
information that might enhance the utility of taking
an action. Computing the value of information
is particularly difficult with streaming high-dimensional
sensory evidence. We describe a belief
projection approach to reasoning about information
value in these settings, using models for inferring
future beliefs over states given streaming evidence.
These belief projection models can be learned from
data or constructed via direct assessment of parameters
and they fit naturally in modular, hierarchical
state inference architectures. We describe principles
of using belief projection and present results
drawn from an implementation of the methodology
within a conversational system.
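
As a rough, hypothetical illustration of the look-versus-leap trade-off described above (not code from the paper), the sketch below compares the expected utility of acting on the current belief against the projected utility of acting after waiting for more evidence; the utility matrix, the projection model project_belief, and the waiting cost are all assumed placeholders.

```python
import numpy as np

def expected_utility(belief, utilities):
    # utilities[action, state]: utility of taking `action` when the true state is `state`;
    # belief: probability vector over states; act greedily under the current belief
    return float(np.max(utilities @ belief))

def value_of_waiting(belief, utilities, project_belief, wait_cost):
    """Positive when the projected gain from additional evidence outweighs the cost of waiting."""
    act_now = expected_utility(belief, utilities)
    # project_belief(belief) -> [(probability, projected_future_belief), ...]
    # stands in for a learned belief projection model over streaming evidence
    act_later = sum(p * expected_utility(b, utilities) for p, b in project_belief(belief))
    return act_later - wait_cost - act_now
```

Under this toy formulation, a system would wait ("look") when the returned value is positive and act ("leap") otherwise.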
|
|
|
|
-
|
Rosenthal, S., Skaff, S., Veloso, M., Bohus, D., Horvitz, E., (2013) -
Execution Memory for Grounding and Coordination
, in HRI'2013,
Tokyo, Japan [abs]
|
|
|
As robots are introduced into human environments
for long periods of time, human owners and collaborators will
expect them to remember shared events that occur during execution.
Beyond naturalness of having memories about recent and
longer-term engagements with people, such execution memories
can be important in tasks that persist over time by allowing
robots to ground their dialog and to refer efficiently to previous
events. In this work, we define execution memory as the capability
of saving interaction event information and recalling it for later
use. We divide the problem into four parts: salience filtering of
sensor evidence and saving to short term memory, archiving from
short to long term memory and caching from long to short term
memory, and recalling memories for use in state inference and
policy execution. We then provide examples of how execution
memory can be used to enhance user experience with robots.
|
|
|
|
-
|
Metallinou, A., Bohus, D., Williams, J.D., (2013) -
Discriminative state tracking for spoken dialog systems
, in ACL'2013,
Sofia, Bulgaria [abs]
|
|
|
In spoken dialog systems, statistical state
tracking aims to improve robustness to speech
recognition errors by tracking a posterior distribution
over hidden dialog states. Current
approaches based on generative or discriminative
models have different but important shortcomings
that limit their accuracy. In this paper
we discuss these limitations and introduce
a new approach for discriminative state tracking
that overcomes them by leveraging the
problem structure. An offline evaluation with
dialog data collected from real users shows
improvements in both state tracking accuracy
and the quality of the posterior probabilities.
Features that encode speech recognition error
patterns are particularly helpful, and training
requires relatively few dialogs.
|
|
|
|
-
|
Lasecki, W.S., Kamar, E., Bohus, D. (2013) -
Conversations in the Crowd: Collecting Data for Task-Oriented Dialog Learning
, in HCOMP'2013,
Palm Springs, CA, USA [abs]
|
|
|
A major challenge in developing dialog systems
is obtaining realistic data to train the systems
for specific domains. We study the opportunity
for using crowdsourcing methods to collect dialog
datasets. Specifically, we introduce ChatCollect, a
system that allows researchers to collect conversations focused around definable tasks from pairs of
workers in the crowd. We demonstrate that varied and in-depth dialogs can be collected using
this system, then discuss ongoing work on creating a crowd-powered system for parsing semantic
frames. We then discuss research opportunities in
using this approach to train and improve automated dialog systems in the future.
|
|
|
|
-
|
Loomis-Thompson, A., Bohus, D. (2013) -
A Framework for Multimodal Data Collection, Visualization, Annotation and Learning
, in ICMI'2013,
Sydney, Australia [abs]
|
|
|
The development and iterative refinement of inference models for multimodal systems can be challenging and time intensive. We present a framework for multimodal data collection, visualization, annotation, and learning that enables system developers to build models using various machine learning techniques, and quickly iterate through cycles of development, deployment and refinement.
|
|
|
|
|
-
|
Wang, W.Y., Bohus, D., Kamar, E., Horvitz, E. (2012) -
Crowdsourcing the Acquisition
of Natural Language Corpora: Methods and Observations
, in SLT'2012,
Miami, USA [abs]
|
|
|
We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and biases of the collected language. We highlight research challenges and directions in applying these methods to acquire corpora for natural language processing applications.
|
|
|
|
-
|
Vinyals, O., Bohus, D., Caruana, R. (2012) -
Learning Speaker, Addressee and
Overlap Detection Models from Multimodal Streams
, in ICMI'2012,
Santa Monica, USA [abs]
|
|
|
A key challenge in developing conversational systems is fusing streams of information
provided by different sensors to make inferences about the behaviors and goals of
people. Such systems can leverage visual and audio information collected through
cameras and microphone arrays, including the location of various people, their focus
of attention, body pose, the sound source direction, prosody, and speech recognition
results. In this paper, we explore discriminative learning techniques for making
accurate inferences on the problems of speaker, addressee and overlap detection
in multiparty human-computer dialog. The focus is on finding ways to leverage within-
and across-signal temporal patterns and to construct representations from the raw
streams in an automated manner that are informative for the inference problem. We
present a novel extension to traditional decision trees which allows them to incorporate
and model temporal signals. We contrast these methods with more traditional approaches
where a human expert manually engineers relevant temporal features. The proposed
approach performs well even with relatively small amounts of training data, which
is of practical importance as designing features that are task dependent is time
consuming and not always possible.
|
|
|
|
-
|
Bohus, D., Kamar, E., Horvitz, E. (2012) -
Towards Situated Collaboration
, in NAACL Workshop on Future Directions
and Challenges in Spoken Dialog Systems: Tools and Data [abs]
|
|
|
We outline a set of key challenges for dialog management in physically situated
interactive systems, and propose a core shift in perspective that places spoken
dialog in the context of the larger collaborative challenge of managing parallel,
coordinated actions in the open world.
|
|
|
|
|
|
- |
Bohus, D., Horvitz, E. (2011) - Decisions about Turns in Multiparty Conversation: From Perception to Action, in ICMI-2011, Alicante, Spain [abs]
|
|
|
We present a decision-theoretic approach for guiding turn taking in a spoken dialog system operating in multiparty settings. The proposed methodology couples inferences about multiparty conversational dynamics with assessed costs of different outcomes, to guide turn-taking decisions. Beyond considering uncertainties about outcomes arising from evidential reasoning about the state of a conversation, we endow the system with awareness and methods for handling uncertainties stemming from computational delays in its own perception and production. We illustrate via sample cases how the proposed approach makes decisions, and we investigate the behaviors of the proposed methods via a retrospective analysis on logs collected in a multiparty interaction study.
|
|
|
|
- |
Bohus, D., Horvitz, E. (2011) - Multiparty Turn Taking in Situated Dialog: Study, Lessons, and Directions, in SIGdial-2011, Portland, OR [abs] [Supplemental materials and videos]
|
|
|
We report on an empirical study of a multiparty turn taking model for physically situated spoken dialog systems. We discuss subjective and objective performance measures that show how the model, supported with a basic set of sensory competencies and turn-taking policies, can enable interactions with multiple participants in a collaborative task setting. The analysis we conduct brings to the fore several phenomena and frames challenges for managing multiparty turn taking in physically situated interaction.
|
|
|
|
|
|
|
- |
Bohus, D., Horvitz, E. (2010) - On the Challenges and Opportunities of Physically Situated Dialog, in AAAI Fall Symposium on Dialog with Robots, Arlington, VA [abs]
|
|
|
We outline several challenges and opportunities for building physically situated systems that can interact in open, dynamic, and relatively unconstrained environments. We review a platform and recent progress on developing computational methods for situated, multiparty, open-world dialog, and highlight the value of representations of the physical surroundings and of harnessing the broader situational context when managing communicative processes such as engagement, turn-taking, language understanding, and dialog management. Finally, we outline an open-world learning challenge that spans these different levels.
|
|
|
|
- |
Bohus, D., Horvitz, E. (2010) - Facilitating Multiparty Dialog with Gaze, Gesture and Speech, in ICMI'10, Beijing, China [abs] [Supplemental materials and videos]
|
|
|
We study how synchronized gaze, gesture and speech rendered by an embodied conversational agent can influence the flow of conversations in multiparty settings. We review a computational framework for turn taking that provides the foundation for tracking and communicating intentions to hold, release, or take control of the conversational floor. We then present details of the implementation of the approach in an embodied conversational agent and describe experiments with the system in a shared task setting. Finally, we discuss results showing how the verbal and non-verbal cues used by the avatar can shape the dynamics of multiparty conversation.
|
|
|
|
|
|
- |
Bohus, D., Horvitz, E., (2010) - Computational Models for Multiparty Turn-Taking, Microsoft Technical Report MSR-TR-2010-115 [abs] [Supplemental materials and videos]
|
|
|
We describe a computational framework for modeling and managing turn-taking in open-world spoken dialog systems. We present a representation and methodology for tracking the conversational dynamics in multiparty interactions, making floor control decisions, and rendering these decisions into appropriate behaviors. We show how the approach enables an embodied conversational agent to participate in multiparty interactions, and to handle a diversity of natural turn-taking phenomena, including multiparty floor management, barge-ins, restarts, and continuations. Finally, we discuss results and lessons learned from experiments.
|
|
|
|
|
|
|
- |
Bohus, D., Horvitz, E. (2009) - Dialog in the Open World: Platform and Applications, in Proceedings of ICMI'09, Boston, MA [abs] | ICMI'09 outstanding paper award | ICMI'19 Ten-Year Technical Impact Award Runner-up
|
|
|
We review key challenges of developing spoken dialog systems that can engage in interaction with one or multiple participants in open, relatively unconstrained environments. We outline a set of core competencies for open-world dialog, and we describe three prototype systems in this space. The systems harness a common underlying conversational framework which integrates an array of predictive models and component technologies, including speech recognition, head and pose tracking, probabilistic models for scene analysis, multiparty engagement and turn taking, and inferences about user long-term goals and activities. We discuss the current models and showcase their function by means of a sample recorded interaction, and we review results from an observational study of open-world, multiparty dialog in the wild.
|
|
|
|
- |
Bohus, D., Horvitz, E. (2009) - Learning to Predict Engagement with a Spoken Dialog System in Open-World Settings, in Proceedings of SIGdial'09, London, UK [abs] [note]
|
|
|
We describe a machine learning approach that allows an open-world spoken dialog system to learn to predict engagement intentions in situ, from interaction. The proposed approach does not require any developer supervision, and leverages spatiotemporal and attentional features automatically extracted from a visual analysis of people coming into the proximity of the system to produce models that are attuned to the characteristics of the environment the system is placed in. Experimental results indicate that a system using the proposed approach can learn to recognize engagement intentions at low false positive rates (e.g. 2-4%) up to 3-4 seconds prior to the actual moment of engagement.
|
|
|
|
Subsequent experiments with the machine learning infrastructure used in this work have revealed a small defect in the model construction and evaluation. The maximum entropy model was trained in a stepwise fashion, where at each step the next best feature was added to the model; stopping was based on a BIC criterion. During this stepwise model building process, the scoring of features was done by assessing performance on the entire dataset (including train + development folds), instead of exclusively on the train folds. Nevertheless, once a feature to be added to a model was selected, the model was trained exclusively on the training folds, i.e. the corresponding feature weight in the max-ent model was determined based only on the training data, and the evaluation was done on the held-out development fold. Subsequent experiments with a correct setup (where the feature scoring is done only by looking at the training folds) on several problems show that this bug does not significantly affect results. While with a correct setup the numbers reported might differ by small amounts, we believe the general results we have reported in this paper stand.
|
|
|
|
- |
Bohus, D., Horvitz, E. (2009) - Models for Multiparty Engagement in Open-World Dialog, in Proceedings of SIGdial'09, London, UK [abs] | SIGdial'09 best paper award
|
|
|
We present computational models that allow spoken dialog systems to handle multi-participant engagement in open, dynamic environments, where multiple people may enter and leave conversations, and interact with the system and with others in a natural manner. The models for managing the engagement process include components for (1) sensing the engagement state, actions and intentions of multiple agents in the scene, (2) making engagement decisions (i.e. whom to engage with, and when) and (3) rendering these decisions in a set of coordinated low-level behaviors in an embodied conversational agent. We review results from a study of interactions "in the wild" with a system that implements such a model.
|
|
|
|
- |
Bohus, D., Horvitz, E. (2009) - Open-World Dialog: Challenges, Directions, and Prototype, in Proceedings of IJCAI'2009 Workshop on Knowledge and Reasoning in Practical Dialogue Systems, Pasadena, CA [abs] [video]
|
|
|
We present an investigation of open-world dialog, centering on systems that can perform conversational dialog in an open-world context, where multiple people with different needs, goals, and long-term plans may enter, interact, and leave an environment. We outline and discuss a set of challenges and core competencies required for supporting the kind of fluid multiparty interaction that people expect when conversing and collaborating with other people. Then, we focus as a concrete example on the challenges faced by receptionists who field requests at the entries to corporate buildings. We review the subtleties and difficulties of creating an automated receptionist that can work with people on solving their needs with the ease and etiquette expected from a human receptionist. Finally, we review details of the construction and operation of a working prototype.
|
|
|
|
- |
Li, X., Nguyen, P., Zweig, G., Bohus, D. (2009) - Leveraging Multiple Query Logs to Improve Language Models for Spoken Query Recognition, in Proceedings of ICASSP'09, Taipei, Taiwan [abs]
|
|
|
A voice search system requires a speech interface that can correctly
recognize spoken queries uttered by users. The recognition performance
strongly relies on a robust language model. In this work, we
present the use of multiple data sources, with the focus on query logs,
in improving ASR language models for a voice search application.
Our contributions are threefold: (1) the use of text queries from
web search and mobile search in language modeling; (2) the use of
web click data to predict query forms from business listing forms;
and (3) the use of voice query logs in creating a positive feedback
loop. Experiments show that by leveraging these resources, we can
achieve recognition performance comparable to, or even better than,
that of a previously deployed system where a large amount of spoken
query transcripts are used in language modeling.
|
|
|
|
|
- |
Bohus, D., Zweig, G., Nguyen, P., Li, X. (2008) - Joint N-Best Rescoring for Repeated Utterances in Spoken Dialog Systems, in Proceedings of SLT'08, Goa, India [abs]
|
|
|
Due to speech recognition errors, repetitions are a frequent phenomenon in spoken dialog systems. In previous work we have proposed a joint decoding model that can leverage structural relationships between repeated utterances for improving recognition performance. In this paper we extend this work in two directions. First, we propose a direct, classification-based model for the same task. The new model can leverage features that were fundamentally hard to capture in the previous framework (e.g. spellings, false-starts, etc.) and leads to an additional performance improvement. Second, we show how both models can be used to perform a combined rescoring of two n-best lists that are part of a repetition pair.
|
|
|
|
- |
Zweig, G., Bohus, D., Li, X., Nguyen, P. (2008) - Structured Models for Joint Decoding of Repeated Utterances, in Proceedings of InterSpeech'08, Brisbane, Australia [abs]
|
|
|
Due to speech recognition errors, repetition can be a frequent occurrence in voice-search applications. While a proper treatment of this phenomenon requires the joint modeling of two or more utterances simultaneously, currently deployed systems typically treat the utterances independently. In this paper, we analyze the structure of repetitions and find that in at least one commercial directory assistance application, repetitions follow simple structural transformations more than 70% of the time. We present preliminary results that suggest that significant gains are possible by explicitly modeling this structure in a joint decoding process.
|
|
|
|
- |
Bohus, D., Li, X., Nguyen, P., and Zweig, G. (2008) - Learning N-Best Correction Models from Implicit User Feedback in a Multi-modal Local Search Application, in Proceedings of SIGdial'08, Columbus, OH [abs]
|
|
|
We describe a novel n-best correction model that can leverage implicit user feedback (in the form of clicks) to improve performance in a multi-modal speech-search application. The proposed model works in two stages. First, the n-best list generated by the speech recognizer is expanded with additional candidates, based on confusability information captured via user click statistics. In the second stage, this expanded list is rescored and pruned to produce a more accurate and compact n-best list. Results indicate that the proposed n-best correction model leads to significant improvements over the existing baseline, as well as other traditional n-best rescoring approaches.
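
As a loose, hypothetical sketch of the two-stage correction described above (the click-derived confusability table and the rescoring function are placeholders, not the deployed model's components), the snippet expands an n-best list with click-confusable alternatives and then rescores and prunes it:

```python
def correct_nbest(nbest, click_confusions, rescore, k=5):
    """Expand, rescore, and prune an n-best list (illustrative only).

    nbest: list of (hypothesis, recognizer_score) pairs
    click_confusions: dict mapping a hypothesis to [(alternative, confusion_weight), ...],
        hypothetically estimated from user click statistics
    rescore: callable (hypothesis, score) -> new_score, a placeholder second-stage model
    """
    # Stage 1: add click-confusable alternatives to the candidate pool
    expanded = dict(nbest)
    for hyp, score in nbest:
        for alt, weight in click_confusions.get(hyp, []):
            expanded[alt] = max(expanded.get(alt, 0.0), score * weight)
    # Stage 2: rescore the expanded pool and keep the top k candidates
    rescored = [(hyp, rescore(hyp, score)) for hyp, score in expanded.items()]
    return sorted(rescored, key=lambda item: item[1], reverse=True)[:k]
```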
|
|
|
|
- |
Bohus, D., Rudnicky, A. (2008) - The RavenClaw dialog management framework: architecture and systems, in Computer Speech and Language, DOI:10.1016/j.csl.2008.10.001 [abs]
|
|
|
In this paper, we describe RavenClaw, a plan-based, task-independent dialog management framework. RavenClaw isolates the domain-specific aspects of the dialog control logic from domain-independent conversational skills, and in the process facilitates rapid development of mixed-initiative systems operating in complex, task-oriented domains. System developers can focus exclusively on describing the dialog task control logic, while a large number of domain-independent conversational skills such as error handling, timing and turn-taking are transparently supported and enforced by the RavenClaw dialog engine. To date, RavenClaw has been used to construct and deploy a large number of systems, spanning different domains and interaction styles, such as information access, guidance through procedures, command-and-control, medical diagnosis, etc. The framework has easily adapted to all of these domains, indicating a high degree of versatility and scalability.
|
|
|
|
|
- |
Bohus, D. (2007) - Error Awareness and Recovery in Conversational Spoken Language Interfaces, Ph.D. Dissertation, CS-07-124, Carnegie Mellon University, Pittsburgh, PA [abs] [note]
|
|
|
One of the most important and persistent problems in the development of conversational spoken language interfaces is their lack of robustness when confronted with understanding-errors. Most of these errors stem from limitations in current speech recognition technology, and, as a result, appear across all domains and interaction types. There are two approaches towards increased robustness: prevent the errors from happening, or recover from them through conversation, by interacting with the users.
In this dissertation we have engaged in a research program centered on the second approach. We argue that three capabilities are needed in order to seamlessly and efficiently recover from errors: (1) systems must be able to detect the errors, preferably as soon as they happen, (2) systems must be equipped with a rich repertoire of error recovery strategies that can be used to set the conversation back on track, and (3) systems must know how to choose optimally between different recovery strategies at run-time, i.e. they must have good error recovery policies. This work makes a number of contributions in each of these areas.
|
|
|
|
Subsequent experiments with the machine learning infrastructure that has been used in parts of this work have revealed a small defect in the procedure for logistic regression model construction and evaluation. In places where such models were constructed in a stepwise model building process (chapters 6 and 7), the scoring of features was done by assessing performance on the entire dataset (including train + development folds), instead of exclusively on the train folds. Nevertheless, once a feature to be added to a model was selected, the model was trained exclusively on the training folds, i.e. the corresponding feature weight in the max-ent model was determined based only on the training data, and the evaluation was done on the held-out development fold. Subsequent experiments with a correct setup (where the feature scoring is done only by looking at the training folds) on several problems show that this bug does not significantly affect results. While with a correct setup the numbers reported in cross-validation might differ by small amounts, we believe the results reported in this work stand.
|
|
|
|
- |
Bohus, D., and Rudnicky, A. (2007) - Implicitly-supervised learning in spoken language interfaces: an application to the confidence annotation problem, in Proceedings of SIGdial 2007, Antwerp, Belgium [abs] [note]
|
|
|
In this paper we propose the use of a novel learning paradigm in spoken language interfaces – implicitly-supervised learning. The central idea is to extract a supervision signal online, directly from the user, from certain patterns that occur naturally in the conversation. The approach eliminates the need for developer supervision and facilitates online learning and adaptation. As a first step towards better understanding its properties, advantages and limitations, we have applied the proposed approach to the problem of confidence annotation. Experimental results indicate that we can attain performance similar to that of a fully supervised model, without any manual labeling. In effect, the system learns from its own experiences with the users.
|
|
|
|
Subsequent experiments with the machine learning infrastructure used in this work have revealed a small defect in the model construction and evaluation. During the stepwise model building process, the scoring of features was done by assessing performance on the entire dataset (including train + development folds), instead of exclusively on the train folds. Nevertheless, once a feature to be added to a model was selected, the model was trained exclusively on the training folds, i.e. the corresponding feature weight in the max-ent model was determined based only on the training data, and the evaluation was done on the held-out development fold. Subsequent experiments with a correct setup (where the feature scoring is done only by looking at the training folds) on several problems show that this bug does not significantly affect results. While with a correct setup the numbers reported might differ by small amounts, we believe the general results we have reported in this paper stand.
|
|
|
|
- |
Ai, H., Raux, A., Bohus, D., Eskenazi, M., and Litman, D. (2007) - Comparing Spoken Dialog Corpora Collected with Recruited Subjects versus Real Users, in Proceedings of SIGdial 2007, Antwerp, Belgium [abs]
|
|
|
Empirical spoken dialog research often involves
the collection and analysis of a dialog
corpus. However, it is not well understood
whether and how a corpus of dialogs collected
using recruited subjects differs from
a corpus of dialogs obtained from real users.
In this paper we use Let’s Go Lab, a platform
for experimenting with a deployed spoken
dialog bus information system, to address
this question. Our first corpus is collected
by recruiting subjects to call Let’s Go
in a standard laboratory setting, while our
second corpus consists of calls from real
users calling Let’s Go during its operating
hours. We quantitatively characterize the
two collected corpora using previously proposed
measures from the spoken dialog literature,
then discuss the statistically significant
similarities and differences between the
two corpora with respect to these measures.
For example, we find that recruited subjects
talk more and speak faster, while real users
ask for more help and more frequently interrupt
the system. In contrast, we find no
difference with respect to dialog structure.
|
|
|
|
- |
Bohus, D., Raux, A., Harris, T., Eskenazi, M., and Rudnicky, A. (2007) - Olympus: an open-source framework for conversational spoken language interface research, in HLT-NAACL 2007 workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technology, Rochester, NY [abs]
|
|
|
We introduce Olympus, a freely available framework for research in conversational interfaces. Olympus’ open, transparent, flexible, modular and scalable nature facilitates the development of large-scale, real-world systems, and enables research leading to technological and scientific advances in conversational spoken language interfaces. In this paper, we describe the overall architecture, several systems spanning different domains, and a number of current research efforts supported by Olympus.
|
|
|
|
- |
Bohus, D., Grau, S., Huggins-Daines, D., Keri, V., Krishna, G., Kumar, R., Raux, A., and Tomko, S. (2007) - Conquest - an Open-Source Dialog System for Conferences, in Proceedings of HLT-NAACL 2007, Rochester, NY [abs]
|
|
|
We describe ConQuest, an open-source, reusable spoken dialog system that provides technical program information during conferences. The system uses a transparent, modular and open infrastructure, and aims to enable applied research in spoken language interfaces. The conference domain is a good platform for applied research since it permits periodical redeployments and evaluations with a real user-base. In this paper, we describe the system’s functionality and overall architecture, and discuss two initial deployments.
|
|
|
|
- |
Tetreault, J., and Bohus, D., (2007) - Estimating the Reliability of MDP Policies: a Confidence Interval Approach, in HLT-NAACL 2007, Rochester, NY [abs]
|
|
|
Data sparsity is one of the major issues that NLP researchers always wrestle with. That is, does one have enough data to make reliable conclusions in an experiment? Using Reinforcement Learning to improve a spoken dialogue system is
no exception. Past approaches in this area have simply assumed that there was enough collected data to derive reliable dialog control policies or used thousands of user simulations to overcome the sparsity issue. In this paper we present a methodology for numerically constructing confidence bounds on the expected reward for a constructed policy, and use these bounds to better estimate the reliability of that policy. We apply this methodology to a prior
experiment of using MDPs to predict the best features to include in a model of the dialogue state. Our results show that policies developed in the prior work were not as reliable as previously determined, but the overall ranking of features remains the same.
|
|
|
|
|
- |
Bohus, D., Langner, B., Raux, A., Black, A., Eskenazi, M., and Rudnicky, A. (2006) - Online Supervised Learning of Non-understanding Recovery Policies, in SLT-2006, Palm Beach, Aruba [abs]
|
|
|
Spoken dialog systems typically use a limited number of non-understanding
recovery strategies and simple heuristic policies to
engage them (e.g. first ask user to repeat, then give help, then
transfer to an operator). We propose a supervised, online method
for learning a non-understanding recovery policy over a large set
of recovery strategies. The approach consists of two steps: first, we
construct runtime estimates for the likelihood of success of each
recovery strategy, and then we use these estimates to construct a
policy. An experiment with a publicly available spoken dialog
system shows that the learned policy produced a 12.5% relative
improvement in the non-understanding recovery rate.
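
As a simplified, hypothetical sketch of the two-step approach summarized above (the paper learns per-situation success estimates from features; here simple smoothed counts stand in for them), the snippet below maintains running success estimates for each recovery strategy and greedily engages the most promising one, with occasional exploration so that the estimates keep improving online:

```python
import random
from collections import defaultdict

class RecoveryPolicy:
    """Online policy over non-understanding recovery strategies (illustrative only)."""

    def __init__(self, strategies):
        self.strategies = list(strategies)
        self.attempts = defaultdict(int)
        self.successes = defaultdict(int)

    def estimate(self, strategy):
        # Smoothed runtime estimate of the strategy's likelihood of success
        return (self.successes[strategy] + 1) / (self.attempts[strategy] + 2)

    def choose(self, explore=0.1):
        # Mostly engage the strategy with the highest estimated success likelihood,
        # but explore occasionally so all strategies keep receiving data
        if random.random() < explore:
            return random.choice(self.strategies)
        return max(self.strategies, key=self.estimate)

    def update(self, strategy, recovered):
        self.attempts[strategy] += 1
        self.successes[strategy] += int(recovered)
```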
|
|
|
|
- |
Bohus, D., and Rudnicky, A. (2006) - A K Hypotheses + Other Belief Updating Model, in AAAI Workshop on Statistical and Empirical Approaches to Spoken Dialogue Systems, 2006, Boston, MA [abs] [note]
|
|
|
Spoken dialog systems typically rely on recognition confidence
scores to guard against potential misunderstandings.
While confidence scores can provide an initial assessment
for the reliability of the information obtained from the user,
ideally systems should leverage information that is available
in subsequent user responses to update and improve the accuracy
of their beliefs. We present a machine-learning
based solution for this problem. We use a compressed representation
of beliefs that tracks up to k hypotheses for each
concept at any given time. We train a generalized linear
model to perform the updates. Experimental results show
that the proposed approach significantly outperforms heuristic
rules used for this task in current systems. Furthermore, a
user study with a mixed-initiative spoken dialog system
shows that the approach leads to significant gains in task
success and in the efficiency of the interaction, across a
wide range of recognition error-rates.
|
|
|
|
Subsequent experiments with the machine learning infrastructure used in this work have revealed a small defect in the model construction and evaluation. During the stepwise model building process, the scoring of features was done by assessing performance on the entire dataset (including train + development folds), instead of exclusively on the train folds. Nevertheless, once a feature to be added to a model was selected, the model was trained exclusively on the training folds, i.e. the corresponding feature weight in the max-ent model was determined based only on the training data, and the evaluation was done on the held-out development fold. Subsequent experiments with a correct setup (where the feature scoring is done only by looking at the training folds) on several problems show that this bug does not significantly affect results. While with a correct setup the numbers reported in cross-validation might differ by small amounts, we believe the general results we have reported in this paper stand.
|
|
|
|
- |
Raux, A., Bohus, D., Langner, B., Black, A., and Eskenazi, M. (2006) - Doing Research in a Deployed Spoken Dialog System: One Year of Let's Go! Public Experience, in Interspeech-2006, Pittsburgh, PA [abs]
|
|
|
This paper describes our work with Let’s Go, a telephone-based
bus schedule information system that has been in use by
the Pittsburgh population since March 2005. Results from
several studies show that while task success correlates
strongly with speech recognition accuracy, other aspects of
dialogue such as turn-taking, the set of error recovery strategies,
and the initiative style also significantly impact system
performance and user behavior.
|
|
|
|
|
- |
Bohus, D., and Rudnicky, A. (2005) -
Constructing Accurate Beliefs in Spoken Dialog Systems
, in ASRU-2005, San Juan, Puerto Rico [abs] [note]
|
|
|
We propose a novel approach for constructing more accurate
beliefs over concept values in spoken dialog systems by
integrating information across multiple turns in the conversation.
In particular, we focus our attention on updating the confidence
score of the top hypothesis for a concept, in light of subsequent
user responses to system confirmation actions. Our data-driven
approach bridges previous work in confidence annotation and
correction detection, providing a unified framework for belief
updating. The approach significantly outperforms heuristic rules
currently used in most spoken dialog systems.
|
|
|
|
Subsequent experiments with the machine learning infrastructure used in this work have revealed a small defect in the model construction and evaluation. During the stepwise model building process, the scoring of features was done by assessing performance on the entire dataset (including train + development folds), instead of exclusively on the train folds. Nevertheless, once a feature to be added to a model was selected, the model was trained exclusively on the training folds, i.e. the corresponding feature weight in the max-ent model was determined based only on the training data, and the evaluation was done on the held-out development fold. Subsequent experiments with a correct setup (where the feature scoring is done only by looking at the training folds) on several problems show that this bug does not significantly affect results. While with a correct setup the numbers reported in cross-validation might differ by small amounts, we believe the general results we have reported in this paper stand.
|
|
|
|
- |
Bohus, D., and Rudnicky, A. (2005) - Error Handling in the RavenClaw dialog management architecture, in HLT-EMNLP-2005, Vancouver, CA [abs]
|
|
|
We describe the error handling architecture
underlying the RavenClaw dialog
management framework. The architecture
provides a robust basis for current and future
research in error detection and recovery.
Several objectives were pursued in its
development: task-independence, ease-of-use,
adaptability and scalability. We describe
the key aspects of architectural design
which confer these properties, and
discuss the deployment of this architecture
in a number of spoken dialog systems
spanning several domains and interaction
types. Finally, we outline current research
projects supported by this architecture.
|
|
|
|
- |
Bohus, D., and Rudnicky, A. (2005) - Sorry, I Didn't Catch That! - An Investigation of Non-understanding Errors and Recovery Strategies, in SIGdial-2005, Lisbon, Portugal [abs] [sigdial book chapter]
|
|
|
We present results from an extensive empirical analysis of non-understanding
errors and ten non-understanding recovery strategies, based on a corpus of
dialogs collected with a spoken dialog system that handles conference room
reservations. More specifically, the issues we investigate are: what are the
main sources of non-understanding errors? What is the impact of these errors on
global performance? How do various strategies for recovery from non-
understandings compare to each other? What are the relationships between these
strategies and subsequent user response types, and which response types are more
likely to lead to successful recovery? Can dialog performance be improved by
using a smarter policy for engaging the non-understanding recovery strategies?
If so, can we learn such a policy from data? Whenever available, we compare and
contrast our results with other studies in the literature. Finally, we summarize
the lessons learned and present our plans for future work inspired by this
analysis.
|
|
|
|
- |
Bohus, D., and Rudnicky, A. (2005) - A Principled Approach for Rejection Threshold Optimization in Spoken Dialog Systems, in Interspeech-2005, Lisbon, Portugal [abs]
|
|
|
A common design pattern in spoken dialog systems is to reject
an input when the recognition confidence score falls below a
preset rejection threshold. However, this introduces a
potentially non-optimal tradeoff between various types of
errors such as misunderstandings and false rejections. In this
paper, we propose a data-driven method for determining the
relative costs of these errors, and then use these costs to
optimize state-specific rejection thresholds. We illustrate the
use of this approach with data from a spoken dialog system
that handles conference room reservations. The results
obtained confirm our intuitions about the costs of the errors,
and are consistent with anecdotal evidence gathered throughout
the use of the system.
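
As a simple, hypothetical illustration of the cost-based optimization described above (the relative costs and the labeled confidence scores are placeholders, not values from the paper), the sketch below picks the rejection threshold that minimizes the combined cost of false rejections and misunderstandings for a given dialog state:

```python
import numpy as np

def best_rejection_threshold(confidences, is_correct,
                             cost_false_reject=1.0, cost_misunderstanding=3.0):
    """Pick the threshold minimizing total cost on labeled data (illustrative only).

    confidences: recognition confidence scores observed in one dialog state
    is_correct: whether the corresponding recognition result was actually correct
    """
    confidences = np.asarray(confidences, dtype=float)
    is_correct = np.asarray(is_correct, dtype=bool)
    candidates = np.unique(np.concatenate(([0.0, 1.0], confidences)))
    costs = []
    for threshold in candidates:
        rejected = confidences < threshold
        false_rejects = np.sum(rejected & is_correct)          # correct inputs rejected
        misunderstandings = np.sum(~rejected & ~is_correct)    # incorrect inputs accepted
        costs.append(cost_false_reject * false_rejects
                     + cost_misunderstanding * misunderstandings)
    return float(candidates[int(np.argmin(costs))])
```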
|
|
|
|
- |
Raux, A., Langner, B., Bohus, D., Black, A., and Eskenazi, M. (2005) - Let's Go Public! Taking a Spoken Dialog System to the Real World, in Interspeech-2005, Lisbon, Portugal [abs]
|
|
|
In this paper, we describe how a research spoken dialog system
was made available to the general public. The Let’s Go Public
spoken dialog system provides bus schedule information to the
Pittsburgh population during off-peak times. This paper describes
the changes necessary to make the system usable for the general
public and presents analysis of the calls and strategies we have
used to ensure high performance.
|
|
|
|
|
- |
Bohus, D. (2004) - Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems, Ph.D Thesis Proposal, Carnegie Mellon University, Pittsburgh, PA [abs]
|
|
|
A persistent and important problem in spoken language interfaces is their lack of robustness when faced with understanding errors. The problem is present across all domains and interaction types, and stems primarily from the unreliability of the speech recognition process. I propose to alleviate this problem by (1) endowing spoken dialogue systems with better error awareness, (2) constructing a richer repertoire of error recovery strategies, and (3) developing a practical data-driven approach for making error handling decisions. The proposed work will address questions and make contributions in each of these three areas. For the first part, I propose to develop a belief updating mechanism that integrates confidence annotation and correction detection into a unified framework, and allows spoken dialogue systems to continuously track the reliability of the information they use. For the second part, I propose to implement and investigate an extended set of error recovery strategies addressing common problems in human-computer dialogue. Finally, I plan to bring these two capabilities together in a scalable reinforcement-learning based approach for making error handling decisions in task-oriented spoken dialogue systems.
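A toy illustration of the belief updating idea (not the data-driven mechanism proposed in the thesis): the initial confidence score is treated as a prior probability that a concept value is correct and updated with the likelihood of the observed user response; the likelihood values below are made up for illustration.

def update_belief(prior_correct, p_response_given_correct, p_response_given_incorrect):
    # Bayesian update of the system's belief that a concept value is correct,
    # given how likely the observed user response is under each hypothesis.
    numerator = prior_correct * p_response_given_correct
    denominator = numerator + (1.0 - prior_correct) * p_response_given_incorrect
    return numerator / denominator

# An initial confidence of 0.6, followed by a response that looks like a
# confirmation (more likely if the value is correct), rises to ~0.86.
print(update_belief(0.6, p_response_given_correct=0.8, p_response_given_incorrect=0.2))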
|
|
|
|
- |
Bohus, D., and Rudnicky, A. (2004) - Task-Independent Conversational Strategies in the RavenClaw Dialogue Management Framework, unpublished manuscript [abs]
|
|
|
We present the implementation of task-independent conversational strategies in the RavenClaw dialogue management framework. The proposed approach decouples the implementation and the control of these strategies from the actual system task, and brings forth several advantages: it increases the consistency in the interaction style, while at the same time it lessens the development and testing efforts by allowing for the easy reuse of these strategies across different systems. We plan to illustrate the repertoire of task-independent conversational strategies in the RavenClaw dialogue management framework by giving a live demonstration of RoomLine, a spoken dialogue system for conference room reservation and scheduling.
|
|
|
|
- |
Aist, G., Bohus, D., Boven, B., Campana, E., Early, S., Phan, S. (2004) - Initial Development of a Voice-Activated Astronaut Assistant for Procedural Tasks: From Need to Concept to Prototype, in Journal of Interactive Instruction Development, Volume 16, Nr. 3, Winter 2004, pp 32-36
|
|
|
|
- |
Bohus, D., and Rudnicky, A. (2003) - RavenClaw: Dialog Management Using Hierarchical Task Decomposition and an Expectation Agenda, in Eurospeech-2003, Geneva, Switzerland [abs]
|
|
|
We describe RavenClaw, a new dialog management framework developed as a successor to the Agenda architecture used in the CMU Communicator. RavenClaw introduces a clear separation between task and discourse behavior specification, and allows rapid development of dialog management components for spoken dialog systems operating in complex, goal-oriented domains. The system development effort is focused entirely on the specification of the dialog task, while a rich set of domain-independent conversational behaviors are transparently generated by the dialog engine. To date, RavenClaw has been applied to five different domains allowing us to draw some preliminary conclusions as to the generality of the approach. We briefly describe our experience in developing these systems.
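A toy illustration, in Python rather than the actual C++ framework, of the two ideas named in the title: a hierarchical task tree of dialog agents, and an expectation agenda assembled from the agent currently in focus up to the root, so that user input is matched against the most specific expectations first. The agent and concept names are invented for the example.

from dataclasses import dataclass, field

@dataclass
class DialogAgent:
    name: str
    expects: list = field(default_factory=list)     # concept names this agent can bind
    children: list = field(default_factory=list)
    parent: "DialogAgent" = None

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

def expectation_agenda(focused_agent):
    # Collect expectations level by level, from the focused agent up to the root.
    agenda, node = [], focused_agent
    while node is not None:
        agenda.append((node.name, node.expects))
        node = node.parent
    return agenda

# Illustrative task tree for a room reservation task.
root = DialogAgent("RoomLine")
get_query = root.add(DialogAgent("GetQuery"))
get_date = get_query.add(DialogAgent("GetDate", expects=["date"]))
get_time = get_query.add(DialogAgent("GetTime", expects=["start_time", "end_time"]))

print(expectation_agenda(get_date))
# [('GetDate', ['date']), ('GetQuery', []), ('RoomLine', [])]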
|
|
|
|
- |
Aist, G., Dowding, J., Hockey, B.A., Rayner, M., Hieronymus, J., Bohus, D., Boven, B., Blaylock, N., Campana, E., Early, S., Gorrell, G., and Phan, S. (2003) - Talking through procedures: An intelligent Space Station procedure assistant, in Demo Session at EACL-2003, Budapest, Hungary [abs]
|
|
|
We present a prototype system aimed at
providing spoken dialogue support for
complex procedures aboard the International
Space Station. The system allows
navigation one line at a time or in larger
steps. Other user functions include issuing
spoken corrections, requesting images
and diagrams, recording voice notes and
spoken alarms, and controlling audio volume.
|
|
|
|
|
- |
Bohus, D., and Rudnicky, A. (2002) - LARRI: A Language-Based Maintenance and Repair Assistant, in IDS-2002, Kloster Irsee, Germany [abs]
|
|
|
LARRI (Language-based Agent for Retrieval of Repair Information) is a dialog-based system for support of maintenance and repair domains, characterized by large amounts of documentation and by procedural information. LARRI is based on an architecture developed by Carnegie Mellon University for the DARPA Communicator program and is integrated with a wearable computer system developed by the Wearable Computing group at Carnegie Mellon University.
LARRI adapts a dialog-management architecture developed and optimized for a telephone-based problem solving task (travel planning), and applies it to a very different domain -- aircraft maintenance. The system was taken on a field trial on two occasions where it was used by professional aircraft mechanics. We found that our architecture, AGENDA, extended readily to a multi-modal and multi-media framework. At the same time we found that assumptions that were reasonable in a services domain turn out to be inappropriate for a maintenance domain. Apart from the need to manage integration between input modes and output modalities, we found that the system needed to support multiple categories of tasks and that a different balance between user and system goals was required. A significant problem in the maintenance domain is the need to assimilate and make available for language processing appropriate domain information.
|
|
|
|
- |
Bohus, D., and Rudnicky, A. (2002) - Integrating Multiple Knowledge Sources for Utterance-Level Confidence Annotation in the CMU Communicator Spoken Dialog System, Technical Report CS-190, Carnegie Mellon University, Pittsburgh, PA [abs]
|
|
|
In recent years, automated speech recognition has been the main driver behind
the advent of spoken language interfaces, but at the same time a severe limiting
factor in the development of these systems. We believe that increased robustness
in the face of recognition errors can be achieved by making the systems aware of
their own misunderstandings, and employing appropriate recovery techniques when
breakdowns in interaction occur. In this paper we address the first problem: the
development of an utterance-level confidence annotator for a spoken dialog
system. After a brief introduction to the CMU Communicator spoken dialog system
(which provided the target platform for the developed annotator), we cast the
confidence annotation problem as a machine learning classification task, and
focus on selecting relevant features and on empirically identifying the best
classification techniques for this task. The results indicate that significant
reductions in classification error rate can be obtained using several different
classifiers. Furthermore, we propose a data driven approach to assessing the
impact of the errors committed by the confidence annotator on dialog
performance, with a view to optimally fine-tuning the annotator. Several models
were constructed, and the resulting error costs were in accordance with our
intuition. We found, surprisingly, that, at least for a mixed-initiative spoken
dialog system such as the CMU Communicator, these errors trade off equally over a
wide operating characteristic range.
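A minimal sketch of the classification framing, assuming one row of decoder/parser/dialog-level features per utterance and a binary label indicating whether the utterance was misunderstood; the feature names are illustrative, not the feature set from the report.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

FEATURE_NAMES = ["acoustic_score", "lm_score", "parse_coverage",
                 "uncovered_words", "turn_number"]   # illustrative only

def train_confidence_annotator(X, y):
    # X: utterance features (one row per utterance); y: 1 if misunderstood, else 0.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    classification_error = 1.0 - model.score(X_test, y_test)
    return model, classification_error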
|
|
|
|
|
- |
Bohus, D., and Rudnicky, A. (2001) - Modeling the Cost of Misunderstandings in the CMU Communicator Dialog System, in ASRU-2001, Madonna di Campiglio, Italy [abs]
|
|
|
We describe a data-driven approach that allows us to quantify the costs of various types of errors made by the utterance-level confidence annotator in the Carnegie Mellon Communicator system. Knowing these costs, we can determine the optimal tradeoff point between these errors, and tune the confidence annotator accordingly. We describe several models, based on concept transmission efficiency. The models fit our data quite well and the relative costs of errors are in accordance with our intuition. We also find, surprisingly, that for a mixed-initiative system such as the CMU Communicator, false positive and false negative errors trade off equally over a wide operating range.
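A hedged sketch of the data-driven idea, assuming per-session counts of false-positive and false-negative confidence errors and a per-session efficiency measure; the regression coefficients then act as relative error costs. The exact efficiency measure and variable names are illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression

def estimate_error_costs(fp_counts, fn_counts, efficiency):
    # Regress session efficiency (e.g., concepts correctly transmitted per turn)
    # on the two error counts; a more negative coefficient means a costlier error.
    X = np.column_stack([fp_counts, fn_counts])
    regression = LinearRegression().fit(X, efficiency)
    cost_false_positive, cost_false_negative = -regression.coef_
    return cost_false_positive, cost_false_negative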
|
|
|
|
- |
Carpenter, P., Jin, C., Wilson, D., Zhang, R., Bohus, D., and Rudnicky, A. (2001) - Is This Conversation on Track?, in Eurospeech-2001, Aalborg, Denmark [abs]
|
|
|
Confidence annotation allows a spoken dialog system to accurately assess the likelihood of misunderstanding at the utterance level and to avoid breakdowns in interaction. We describe experiments that assess the utility of features from the decoder, parser and dialog levels of processing. We also investigate the effectiveness of various classifiers, including Bayesian Networks, Neural Networks, SVMs, Decision Trees, AdaBoost and Naive Bayes, to combine this information into an utterance-level confidence metric. We found that a combination of a subset of the features considered produced promising results with several of the classification algorithms considered, e.g., our Bayesian Network classifier produced a 45.7% relative reduction in confidence assessment error and a 29.6% reduction relative to a handcrafted rule.
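A minimal sketch of the classifier comparison, assuming a numpy feature matrix with one row per utterance and a binary misunderstanding label; the classifier settings are scikit-learn defaults rather than the configurations used in the paper.

from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

def compare_classifiers(X, y, n_folds=10):
    # Cross-validate several of the classifier families named above on the
    # same utterance-level features and report their error rates.
    candidates = {
        "naive_bayes": GaussianNB(),
        "decision_tree": DecisionTreeClassifier(),
        "svm": SVC(),
        "adaboost": AdaBoostClassifier(),
    }
    return {name: 1.0 - cross_val_score(clf, X, y, cv=n_folds).mean()
            for name, clf in candidates.items()}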
|
|
|
|
|
|
|
|
|
- |
Bohus, D., and Boldea, M. (2000) - A Web-based Text Corpora Development System, in LREC-2000, Athens, Greece [abs]
|
|
|
One of the most important starting points for any NLP endeavor is the construction of text corpora of appropriate size and quality. This paper presents a web-based text corpora development system that focuses both on the size and the quality of these corpora. The quantitative problem is solved by using the Internet as a practically limitless resource of texts. To ensure a certain quality, we enrich the text with relevant information to make it fit for further use, by resolving in an integrated manner the problems of diacritic character restoration, lexical ambiguity resolution, and morphosyntactic annotation. Although at this moment it is targeted at texts in Romanian, a number of mechanisms have been provided that allow it to be easily adapted to other languages.
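A toy sketch of one of the subproblems mentioned above, diacritic restoration: each de-accented word is mapped back to its most frequent diacritic form in a lexicon built from trusted text. The tiny lexicon and example phrase are illustrative; the actual system integrates this step with lexical disambiguation and morphosyntactic annotation.

import unicodedata
from collections import Counter, defaultdict

def strip_diacritics(word):
    # Remove combining marks so that, e.g., "română" maps to "romana".
    return "".join(c for c in unicodedata.normalize("NFD", word)
                   if unicodedata.category(c) != "Mn")

def build_lexicon(trusted_text):
    # Count the diacritic forms observed for each de-accented word and keep
    # the most frequent one as the restoration target.
    counts = defaultdict(Counter)
    for word in trusted_text.lower().split():
        counts[strip_diacritics(word)][word] += 1
    return {bare: forms.most_common(1)[0][0] for bare, forms in counts.items()}

def restore_diacritics(text, lexicon):
    return " ".join(lexicon.get(word, word) for word in text.lower().split())

lexicon = build_lexicon("națiunea română își păstrează limba")   # illustrative corpus
print(restore_diacritics("natiunea romana isi pastreaza limba", lexicon))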
|
|
|
|
|
|