Engineering VP Josh Clemm on how we use knowledge graphs, MCP, and DSPy in Dash
Engineering VP Josh Clemm deep-dives into how we think about knowledge graphs, indexes, MCP, and prompt optimization using tools like DSPy.

Engineering VP Josh Clemm recently spoke about the technologies behind Dropbox Dash and how they help users find and manage their work content. In a talk for Jason Liu's online course on Retrieval-Augmented Generation (RAG), offered on the education platform Maven, Clemm covered the company's use of knowledge graphs, indexes, the Model Context Protocol (MCP), and prompt optimization tools like DSPy.
Clemm began by highlighting the challenges faced by individuals and businesses in managing the vast amount of content scattered across multiple applications and platforms. With numerous tabs and accounts open, finding specific information becomes a daunting task. While large language models (LLMs) are advancing rapidly, they often lack access to proprietary, walled-garden content, limiting their utility in assisting users with work-related queries.
Dropbox built Dash to address this gap. By connecting to a wide range of third-party apps and consolidating their content into a single platform, Dash lets users search, retrieve, and run agentic queries over that content. The technology stack behind Dash includes custom crawlers, content understanding and enrichment, and advanced retrieval techniques.
The foundation of Dash's context engine lies in its connectors, which are custom crawlers designed to extract data from various third-party applications. Building these crawlers presents challenges due to differing rate limits, API quirks, and permission systems. However, overcoming these obstacles is crucial for aggregating content from diverse sources into a unified platform.
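The rate-limit handling described above can be illustrated with a minimal sketch. It is not Dash's connector code: `RateLimitError`, `crawl`, and the `fetch_page(cursor)` callable are hypothetical names, and the sketch assumes a cursor-paginated API that signals throttling with an optional retry delay.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple


class RateLimitError(Exception):
    """Hypothetical error a connector client raises when the upstream API throttles it."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


# Assumed shape of a page fetcher: takes a cursor, returns (items, next_cursor).
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def crawl(
    fetch_page: FetchPage,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every item from a paginated source, backing off when throttled."""
    cursor: Optional[str] = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                items, cursor = fetch_page(cursor)
                break
            except RateLimitError as err:
                if attempt == max_retries:
                    raise  # give up after the final retry
                # Prefer the server's hint; otherwise back off exponentially.
                if err.retry_after is not None:
                    delay = err.retry_after
                else:
                    delay = base_delay * 2 ** attempt
                sleep(delay)
        yield from items
        if cursor is None:  # no further pages
            return
```

Passing `sleep` as a parameter keeps the backoff logic testable without real waiting. A real connector would also need per-application authentication and permission handling, which this sketch leaves out.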
Once the content is gathered, Dropbox focuses on understanding and enriching it. This involves normalizing data from different files and formats, ensuring consistency and accuracy. The goal is to provide a coherent and searchable index of the content, enabling users to find and utilize information more effectively.
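As a rough illustration of normalizing content into one searchable index, the sketch below maps two made-up source payloads onto a shared record and builds a toy keyword index. The `Document` schema, the `docs_app` and `chat_app` source names, and their field names are all assumptions for the example, not Dash's actual data model.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Document:
    """Hypothetical unified record that every source is normalized into."""
    doc_id: str
    source: str
    title: str
    body: str


def normalize(source: str, raw: dict) -> Document:
    """Map a source-specific payload (assumed shapes) onto the shared schema."""
    if source == "docs_app":
        return Document(f"docs_app:{raw['id']}", source,
                        raw["name"].strip(), raw["content"].strip())
    if source == "chat_app":
        return Document(f"chat_app:{raw['ts']}", source,
                        f"Message in #{raw['channel']}", raw["text"].strip())
    raise ValueError(f"unknown source: {source}")


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Iterable[Document]) -> Dict[str, Set[str]]:
    """Build a toy inverted index: token -> ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for token in tokenize(f"{doc.title} {doc.body}"):
            index[token].add(doc.doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Once every source is reduced to the same record type, a single query can span applications. A production index would add ranking, permissions, and semantic retrieval on top of this kind of keyword lookup.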
In addition to content aggregation and understanding, Dash works with the Model Context Protocol (MCP), an open standard that lets LLM-based applications reach external tools and data sources through a common interface. A shared protocol means an assistant can query connected content without a bespoke integration for every application.
To further improve the system, Dropbox uses prompt optimization tools like DSPy. These tools refine the prompts used in retrieval-augmented generation by evaluating them against examples, so that responses are more likely to be relevant and accurate. Better prompts, in turn, help Dash assist users in accessing and managing their work content across multiple applications.
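The general idea behind prompt optimization can be shown with a toy sketch: try several candidate instructions, score each against a small labeled set with a metric, and keep the best one. This is deliberately not DSPy's API. `optimize_prompt`, `exact_match`, and the `model(instruction, question)` callable are hypothetical names, and the model in the usage example is a stub rather than a real LLM.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (question, expected answer)
Model = Callable[[str, str], str]  # (instruction, question) -> prediction


def exact_match(prediction: str, expected: str) -> float:
    """Score 1.0 when the prediction equals the expected answer, ignoring case and padding."""
    return 1.0 if prediction.strip().lower() == expected.strip().lower() else 0.0


def optimize_prompt(
    candidates: Sequence[str],
    model: Model,
    trainset: Sequence[Example],
    metric: Callable[[str, str], float] = exact_match,
) -> Tuple[str, float]:
    """Return the candidate instruction with the highest mean metric score."""
    if not candidates or not trainset:
        raise ValueError("need at least one candidate and one example")
    best_prompt, best_score = candidates[0], -1.0
    for instruction in candidates:
        scores = [metric(model(instruction, q), a) for q, a in trainset]
        score = sum(scores) / len(scores)
        if score > best_score:
            best_prompt, best_score = instruction, score
    return best_prompt, best_score


def stub_model(instruction: str, question: str) -> str:
    """Stand-in for an LLM call: terse only when the instruction asks for it."""
    answer = {"capital of France?": "Paris", "2 + 2?": "4"}[question]
    return answer if "one word" in instruction else f"The answer is {answer}."


best, score = optimize_prompt(
    ["Answer the question.", "Answer in one word."],
    stub_model,
    [("capital of France?", "Paris"), ("2 + 2?", "4")],
)
```

Tools like DSPy automate this kind of search with far richer strategies, such as proposing new instructions and selecting few-shot demonstrations, but the loop of candidate, metric, and labeled examples is the common core.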
In conclusion, Dropbox Dash combines several technologies to give users one place to navigate and use their work-related content. By connecting disparate applications, understanding and enriching their content, and pairing retrieval with MCP and prompt optimization, Dash aims to change how users access and manage information in their professional lives.










