In Conversation: Sapien CTO Kelly Ryan Explores Decentralized Labeling, Gamified Tools, and the Future of AI

We recently had the opportunity to sit down with our CTO, Kelly Ryan, to explore the innovative features of the Sapien platform. From its decentralized network of skilled labelers to its gamified tools and specialized modules, our platform is a new and highly effective solution for businesses scaling their AI model training. In this conversation, we'll take a look at the unique aspects of Sapien's approach, the technical challenges we've overcome, and the future of data labeling.

1. Can you give us an overview of the platform we’re building? What sets it apart from other data labeling solutions on the market?

The Sapien platform is a two-sided marketplace connecting data labelers to AI projects that need structured data. What sets it apart is our advanced tooling and the extremely diverse, skilled community of labelers we have available on demand. This allows us to start projects quickly and produce new ground truth data with very high accuracy.

2. What role does our decentralized network of data labelers play in improving scalability and efficiency? How are we managing the technical challenges that come with building this global infrastructure?

Our decentralized network of data labelers is essential for scalability because it provides the diverse range of skills and backgrounds modern data tasks demand. These range from simple qualifications, like being a dog owner, to specialized expertise, like that of radiologists. Different types of data require workers with unique perspectives and skills, and our network is designed to meet that need.

3. How do our data labeling modules integrate with AI model development? What advantages do they provide in terms of flexibility and customization for different industries?

Sapien has a broad range of highly specialized labeling modules, and it grows by the week. At a high level, instead of presenting the labeler with an overwhelming tool, we present the smallest, easiest-to-complete section of a task. This makes the work easier and more accurate for the labeler, which translates to lower costs on the demand side of the marketplace. We're not afraid to spin up whole new batches of labeling tools for a new customer or task that comes in; in our eyes, that is the minimum required to make sure all the Sapiens are working at their best.
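The decomposition idea above can be illustrated with a minimal sketch. This is not Sapien's actual implementation; the `Subtask` structure, the `decompose` function, and the attribute names are hypothetical, chosen only to show how one large labeling item can be split into the smallest independently answerable questions:

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    """One minimal, single-question unit of labeling work (illustrative only)."""
    item_id: str
    question: str

def decompose(item_id: str, attributes: list[str]) -> list[Subtask]:
    """Split one labeling item into one small question per attribute,
    so each worker sees only the smallest decidable unit rather than
    a single overwhelming annotation form."""
    return [Subtask(item_id, f"Label the '{attr}' of this item")
            for attr in attributes]

# Example: one image becomes three tiny, independent questions
tasks = decompose("img_001", ["breed", "age", "pose"])
```

Each subtask can then be routed to a different worker, which is also what makes the per-step skill matching described below possible.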

4. Can you explain how we’ve built the infrastructure to support the large-scale management of data labelers across different regions? How does this benefit both our clients and labelers?

From first principles, we've built Sapien's platform and infrastructure to work in a fully distributed environment. Workers at different steps of a given labeling workflow don't even need to speak the same language. This allows us to provide an efficient marketplace where workers from all types of backgrounds can find meaningful work, and where customers don't need to overpay for the non-specialized parts of labor.

5. What are some of the key technical challenges you’ve encountered while developing the gamified platform for data collection? How does this feature contribute to improving data quality?

The key challenges are always twofold. First, we must provide tools that let people work efficiently and accurately; gamification helps a lot here by breaking up the monotony of labeling tasks. Second, we must build out the internal infrastructure that ensures our final data outputs are always high quality, because that is what our customers care about at the end of the day. This means everything from identifying and relying on our very best labelers to creating sampling systems that guarantee high-accuracy outputs.
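One common pattern for sampling-based quality assurance, and not necessarily how Sapien's own system works, is the gold-standard spot check: a random sample of a labeler's submissions is compared against items with known trusted answers. The function name, parameters, and threshold below are all illustrative assumptions:

```python
import random

def passes_quality_gate(labels, gold, sample_size=5, threshold=0.9, seed=0):
    """Spot-check submitted labels against known-answer ("gold") items.

    labels: dict mapping item_id -> submitted label
    gold:   dict mapping item_id -> trusted label (a subset of items)
    Returns True if accuracy on a random gold sample meets the threshold.
    (Illustrative sketch; real systems would also track per-labeler history.)
    """
    rng = random.Random(seed)  # fixed seed here only for reproducibility
    gold_ids = [i for i in gold if i in labels]
    sample = rng.sample(gold_ids, min(sample_size, len(gold_ids)))
    if not sample:
        return False  # nothing to check against, so don't pass by default
    correct = sum(labels[i] == gold[i] for i in sample)
    return correct / len(sample) >= threshold
```

In practice a gate like this would feed back into routing, so that work flows preferentially to labelers with strong spot-check records.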

6. How do you see the future of data labeling evolving with AI? What part does our platform play in shaping that future, particularly in balancing automation and human input?

If anything, the future of data labeling places an even greater emphasis on human-derived labels as automated tools become more sophisticated, with AI and other automated tooling augmenting humans to multiply their output. New models need new ground truth to be effective, and humans working within a framework of sophisticated tools is the best way to get that data.

7. How does our focus on building a robust, decentralized platform help clients who need large volumes of labeled data quickly? What are some of the performance and scalability advantages of this approach?

Having a large community with workers of many varied skills and backgrounds allows new projects to easily test at a small scale, and in turn ramp up to huge volumes of data, all at the customer’s convenience. It also makes it easy to change the backgrounds or skills of the labeling workforce as needed for different projects or even different needs within the same project.

See How our Data Labeling Works

Schedule a consult with our team to learn how Sapien’s data labeling and data collection services can advance your speech-to-text AI models.