Extensive Multilingual Speech Dataset

Multilingual Speech Dataset

Introduction

Discover what you can do with this dataset:

Use Cases

Multilingual voice assistants

Speech-to-text applications

Real-time translation tools

Language-learning AI

Why Choose Sapien's Dataset

Broad language coverage

Diversity of accents and dialects

Expert-curated audio samples

Customizable and scalable

Privacy and compliance

Ready to train your AI for a multilingual world?

Let's talk