分布不足 (OOD) 检测是指识别不在用于构建机器学习模型的训练数据分布范围内的数据点的过程。这些 OOD 数据点不符合模型学到的模式,因此被视为异常或意外。分布偏差检测的含义对于确保机器学习系统的可靠性和安全性尤其重要,因为它有助于防止模型在面对不熟悉的数据时做出不可靠的预测。
在机器学习中,模型通常在特定的数据集上进行训练,该数据集代表部署期间预计会遇到的数据分布。但是,在现实场景中,模型经常会遇到与训练数据显著不同的数据,即所谓的分布不足 (OOD) 数据。如果模型没有能力处理这些数据,则可能导致错误的预测。
分布外检测涉及使模型能够识别何时出现 OOD 数据的技术。目标是模型通过拒绝预测或发出警报来标记这些实例,而不是尝试做出可能不正确且可能有害的预测。
有几种方法可以检测 OOD:
置信度阈值:许多机器学习模型会输出置信度分数及其预测。通过设置阈值,该模型可以将置信度低的预测标记为潜在的 OOD 数据。
基于距离的方法:这些方法涉及测量新数据点与训练数据分布的距离。如果距离超过某个阈值,则将数据点视为分布不足。
生成模型:生成模型,例如变分自动编码器 (VAE) 或生成对抗网络 (GAN),可用于对训练数据的分布进行建模。如果在此模型下出现新数据点的可能性较低,则将其标记为 OOD。
集成方法:使用模型集合可以帮助检测 OOD。如果集合中不同模型的预测存在显著差异,则输入数据可能分布不均。
输入预处理:可以应用输入预处理等技术在OOD数据输入模型之前对其进行识别,从而降低预测不可靠的可能性。
OOD 检测在各种应用中至关重要。例如,在自动驾驶中,系统可能会遇到训练数据中不存在的物体或情况。能够检测并妥善处理这些 OOD 案例对于安全至关重要。在医疗保健领域,OOD 检测可以阻止诊断系统对不属于其训练数据的罕见或新疾病做出预测,从而促使诊断系统转诊给人类专家。
分布外检测对企业很重要,因为它可以增强在现实环境中部署的机器学习模型的可靠性、安全性和稳健性。通过有效识别和管理 OOD 数据,企业可以防止模型在面对陌生情况时做出不可靠或潜在危险的决策。
在汽车行业,特别是在自动驾驶汽车中,OOD 检测对安全至关重要。自主系统必须识别何时遇到超出训练范围的物体、场景或环境,从而使车辆能够采取适当的行动,例如减速或提醒人类驾驶员。
在金融领域,OOD 检测可以帮助防止交易模型根据训练数据中未考虑的异常或异常市场条件做出决策,从而降低重大财务损失的风险。
在网络安全领域,OOD 检测可用于识别不同于已知模式的新型威胁或攻击,使企业能够主动应对新出现的安全风险。
此外,在任何由人工智能驱动的决策过程中,OOD 检测通过确保预测或行动仅在模型的专业领域内做出来帮助维持对系统的信任。这对于建立和保持用户对 AI 应用程序的信心至关重要。
简而言之,分布外检测的含义是指识别不在机器学习模型所用训练数据分布范围内的数据。对于企业而言,OOD 检测对于增强人工智能系统的可靠性和安全性至关重要,可确保它们能够有效管理各个行业的陌生或异常情况。
About cookies on this site
Sapien uses cookies to personalise your experience, understand how you interact with our website, and show you ads about our products and services. The cookie declaration provides detailed information on the cookies we use and allows you to adjust your preferences.
About cookies on this site
Cookies used on the site are categorized and below you can read about each category and allow or deny some or all of them. When categories than have been previously allowed are disabled, all cookies assigned to that category will be removed from your browser. Additionally you can see a list of cookies assigned to each category and detailed information in the cookie declaration.
Necessary cookies
Some cookies are required to provide core functionality. The website won't function properly without these cookies and they are enabled by default and cannot be disabled.
CookieHub is a Consent Management Platform (CMP) which allows users to control storage and processing of personal information.
Cloudflare is a global network designed to make everything you connect to the Internet secure, private, fast, and reliable.
Google reCaptcha enables web hosts to distinguish between human and automated access to websites.
Preferences
Preference cookies enables the web site to remember information to customize how the web site looks or behaves for each user. This may include storing selected currency, region, language or color theme.
Analytical cookies
Analytical cookies help us improve our website by collecting and reporting information on its usage.
Google Analytics is a web analytics service offered by Google that tracks and reports website traffic.
HubSpot is a CRM platform that provides tools for marketing, sales, and customer service.
Clarity is a user behavior analytics tool that helps you understand how users interact with your website.
Marketing cookies
Marketing cookies are used to track visitors across websites to allow publishers to display relevant and engaging advertisements. By enabling marketing cookies, you grant permission for personalized advertising across various platforms.
Google Ads is an advertising service by Google for businesses that want to display ads on Google search results and its advertising network.
The LinkedIn Insight tag powers conversion tracking, website audiences, and website demographics within the LinkedIn system.
Microsoft Advertising (formerly Bing Ads) is a service that provides pay per click advertising on the Bing, Yahoo!, and DuckDuckGo search engines.
Cookies used on the site are categorized and below you can read about each category and allow or deny some or all of them. When categories than have been previously allowed are disabled, all cookies assigned to that category will be removed from your browser. Additionally you can see a list of cookies assigned to each category and detailed information in the cookie declaration.
Necessary cookies
Some cookies are required to provide core functionality. The website won't function properly without these cookies and they are enabled by default and cannot be disabled.
Name | Hostname | Vendor | Expiry |
---|---|---|---|
__cf_bm | .hubspot.com | Cloudflare, Inc. | 1 hour |
The __cf_bm cookie supports Cloudflare Bot Management by managing incoming traffic that matches criteria associated with bots. The cookie does not collect any personal data, and any information collected is subject to one-way encryption. | |||
_cfuvid | .hubspot.com | Session | |
Used by Cloudflare WAF to distinguish individual users who share the same IP address and apply rate limits | |||
__cf_bm | .hsforms.net | Cloudflare, Inc. | 1 hour |
The __cf_bm cookie supports Cloudflare Bot Management by managing incoming traffic that matches criteria associated with bots. The cookie does not collect any personal data, and any information collected is subject to one-way encryption. | |||
__cf_bm | .hsforms.com | Cloudflare, Inc. | 1 hour |
The __cf_bm cookie supports Cloudflare Bot Management by managing incoming traffic that matches criteria associated with bots. The cookie does not collect any personal data, and any information collected is subject to one-way encryption. | |||
_cfuvid | .hsforms.com | Session | |
Used by Cloudflare WAF to distinguish individual users who share the same IP address and apply rate limits | |||
cookiehub | .sapien.io | CookieHub | 365 days |
Used by CookieHub to store information about whether visitors have given or declined the use of cookie categories used on the site. | |||
_GRECAPTCHA | www.google.com | 180 days | |
Used by Google reCaptcha for risk analysis | |||
__cf_bm | .hs-scripts.com | Cloudflare, Inc. | 1 hour |
The __cf_bm cookie supports Cloudflare Bot Management by managing incoming traffic that matches criteria associated with bots. The cookie does not collect any personal data, and any information collected is subject to one-way encryption. | |||
__cf_bm | .hsadspixel.net | Cloudflare, Inc. | 1 hour |
The __cf_bm cookie supports Cloudflare Bot Management by managing incoming traffic that matches criteria associated with bots. The cookie does not collect any personal data, and any information collected is subject to one-way encryption. | |||
__cf_bm | .hs-analytics.net | Cloudflare, Inc. | 1 hour |
The __cf_bm cookie supports Cloudflare Bot Management by managing incoming traffic that matches criteria associated with bots. The cookie does not collect any personal data, and any information collected is subject to one-way encryption. | |||
__cf_bm | .hs-banner.com | Cloudflare, Inc. | 1 hour |
The __cf_bm cookie supports Cloudflare Bot Management by managing incoming traffic that matches criteria associated with bots. The cookie does not collect any personal data, and any information collected is subject to one-way encryption. | |||
__cf_bm | .usemessages.com | Cloudflare, Inc. | 1 hour |
The __cf_bm cookie supports Cloudflare Bot Management by managing incoming traffic that matches criteria associated with bots. The cookie does not collect any personal data, and any information collected is subject to one-way encryption. | |||
__cf_bm | .hsappstatic.net | Cloudflare, Inc. | 1 hour |
The __cf_bm cookie supports Cloudflare Bot Management by managing incoming traffic that matches criteria associated with bots. The cookie does not collect any personal data, and any information collected is subject to one-way encryption. | |||
__cf_bm | .hubspotusercontent-na1.net | Cloudflare, Inc. | 1 hour |
The __cf_bm cookie supports Cloudflare Bot Management by managing incoming traffic that matches criteria associated with bots. The cookie does not collect any personal data, and any information collected is subject to one-way encryption. |
Preferences
Preference cookies enables the web site to remember information to customize how the web site looks or behaves for each user. This may include storing selected currency, region, language or color theme.
Name | Hostname | Vendor | Expiry |
---|---|---|---|
lidc | .linkedin.com | LinkedIn Ireland Unlimited Company | 1 day |
Used by LinkedIn for routing. | |||
li_gc | .linkedin.com | LinkedIn Ireland Unlimited Company | 180 days |
Used by LinkedIn to store consent of guests regarding the use of cookies for non-essential purposes |
Analytical cookies
Analytical cookies help us improve our website by collecting and reporting information on its usage.
Name | Hostname | Vendor | Expiry |
---|---|---|---|
_ga | .sapien.io | 400 days | |
Contains a unique identifier used by Google Analytics to determine that two distinct hits belong to the same user across browsing sessions. | |||
_ga_ | .sapien.io | 400 days | |
Contains a unique identifier used by Google Analytics 4 to determine that two distinct hits belong to the same user across browsing sessions. | |||
__hstc | .sapien.io | HubSpot | 180 days |
This cookie name is associated with websites built on the HubSpot platform. This is the main cookie for tracking visitors. It contains the domain, utk, initial timestamp (first visit), last timestamp (last visit), current timestamp (this visit), and session number (increments for each subsequent session). | |||
hubspotutk | .sapien.io | HubSpot | 180 days |
This cookie name is associated with websites built on the HubSpot platform. This cookie is used to keep track of a visitor's identity. This cookie is passed to HubSpot on form submission and used when deduplicating contacts. | |||
__hssrc | .sapien.io | HubSpot | Session |
This cookie name is associated with websites built on the HubSpot platform. Whenever HubSpot changes the session cookie, this cookie is also set to determine if the visitor has restarted their browser. If this cookie does not exist when HubSpot manages cookies, it is considered a new session. | |||
__hssc | .sapien.io | HubSpot | 1 hour |
This cookie name is associated with websites built on the HubSpot platform. This cookie keeps track of sessions. This is used to determine if HubSpot should increment the session number and timestamps in the __hstc cookie. It contains the domain, viewCount (increments each pageView in a session), and session start timestamp. | |||
CLID | www.clarity.ms | Microsoft | 365 days |
Identifies the first-time Clarity saw this user on any site using Clarity. | |||
_clck | .sapien.io | Microsoft | 365 days |
Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID. | |||
_clsk | .sapien.io | Microsoft | 1 day |
Connects multiple page views by a user into a single Clarity session recording. | |||
MUID | .bing.com | Microsoft | 390 days |
Microsoft User Identifier tracking cookie used by Bing Ads. It can be set by embedded microsoft scripts. Widely believed to sync across many different Microsoft domains, allowing user tracking. | |||
MR | .c.bing.com | Microsoft | 7 days |
Used by Microsoft Clarity to indicate whether to refresh MUID. | |||
SM | .c.clarity.ms | Microsoft | Session |
This cookie is installed by Clarity. The cookie is used to store non-personally identifiable information. The cookie is used in synchronizing the MUID (Microsoft unique user ID) across Microsoft domains. | |||
MUID | .clarity.ms | Microsoft | 390 days |
Microsoft User Identifier tracking cookie used by Bing Ads. It can be set by embedded microsoft scripts. Widely believed to sync across many different Microsoft domains, allowing user tracking. | |||
MR | .c.clarity.ms | Microsoft | 7 days |
Used by Microsoft Clarity to indicate whether to refresh MUID. | |||
_cltk | Microsoft | Session | |
This cookie is installed by Microsoft Clarity tool and stores information about how visitors use the website |
Marketing cookies
Marketing cookies are used to track visitors across websites to allow publishers to display relevant and engaging advertisements. By enabling marketing cookies, you grant permission for personalized advertising across various platforms.
Name | Hostname | Vendor | Expiry |
---|---|---|---|
_gcl_au | .sapien.io | Google Advertising Products | 90 days |
Used by Google AdSense to understand user interaction with the website by generating analytical data. | |||
bcookie | .linkedin.com | LinkedIn Ireland Unlimited Company | 365 days |
This is a Microsoft MSN 1st party cookie for sharing the content of the website via social media. | |||
UserMatchHistory | .linkedin.com | LinkedIn Ireland Unlimited Company | 30 days |
Contains a unique identifier used by LinkedIn to determine that two distinct hits belong to the same user across browsing sessions. | |||
AnalyticsSyncHistory | .linkedin.com | LinkedIn Ireland Unlimited Company | 30 days |
Used by LinkedIn to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries | |||
bscookie | .www.linkedin.com | LinkedIn Ireland Unlimited Company | 365 days |
Used by the social networking service, LinkedIn, for tracking the use of embedded services. | |||
IDE | .doubleclick.net | Google Advertising Products | 390 days |
Used by Google's DoubleClick to serve targeted advertisements that are relevant to users across the web. Targeted advertisements may be displayed to users based on previous visits to a website. These cookies measure the conversion rate of ads presented to the user. | |||
SRM_B | .c.bing.com | Microsoft | 390 days |
This cookie is installed by Microsoft Bing. Identifies unique web browsers visiting Microsoft sites. | |||
ANONCHK | .c.clarity.ms | Microsoft | 1 hour |
Used to store session ID for a users session to ensure that clicks from adverts on the Bing search engine are verified for reporting purposes and for personalisation |