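In machine learning and data science, validation is the process of evaluating a model's performance on a separate dataset that was not used during training. It helps confirm that the model generalizes well to new, unseen data rather than simply memorizing the training data, a problem known as overfitting. Validation is a critical step in the model development lifecycle because it indicates how the model is likely to perform in real-world applications.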
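Validation acts as a checkpoint in the machine learning workflow. Its main goal is to estimate the model's performance on unseen data, and that estimate in turn guides model selection and hyperparameter tuning.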
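A common approach is to split the available data into two separate sets: a training set and a validation set. The training set is used to fit the model, and the validation set is used to evaluate it. Performance on the validation set measures how well the model generalizes to new data. If the model performs well on the training data but poorly on the validation data, it is probably overfitting.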
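The split described above can be sketched in a few lines of standard-library Python. The function names `train_validation_split` and `generalization_gap`, the 80/20 ratio, and the fixed seed are illustrative assumptions, not something the article prescribes.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, validation) lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def generalization_gap(train_score, validation_score):
    """Difference between training and validation scores (higher score = better)."""
    return train_score - validation_score


data = list(range(100))  # stand-in for real examples
train, validation = train_validation_split(data, validation_fraction=0.2, seed=42)
print(len(train), len(validation))
```

A large positive gap between the training score and the validation score is the usual warning sign of overfitting. How large is too large depends on the problem.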
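Cross-validation is a widely used technique that makes validation more robust. In k-fold cross-validation, the data is divided into k folds of roughly equal size. The model is trained on k-1 folds and validated on the remaining fold. This process is repeated k times, with each fold serving as the validation set exactly once, and the results are averaged to give a more reliable estimate of model performance. Because every observation is used for validation once, the estimate depends less on how any single split happens to fall. This reduces the variability that comes from relying on a single validation set and gives a fuller picture of how the model will perform on new data.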
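As a minimal sketch of the k-fold procedure, the code below builds the folds by index and averages a per-fold error. The helpers `k_fold_indices` and `cross_validate`, the toy mean-predictor model, and the mean-absolute-error metric are all assumptions chosen to keep the example self-contained.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


def cross_validate(xs, ys, fit, score, k=5):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores), scores


def fit_mean(xs, ys):
    """Toy model: always predict the mean of the training targets."""
    return sum(ys) / len(ys)


def mean_absolute_error(model, xs, ys):
    return sum(abs(model - y) for y in ys) / len(ys)


xs = list(range(10))
ys = [2.0 * x for x in xs]
mean_error, fold_errors = cross_validate(xs, ys, fit_mean, mean_absolute_error, k=5)
print(mean_error, fold_errors)
```

The folds here are contiguous and unshuffled. For data stored in a meaningful order, shuffling the indices before forming the folds is usually the safer choice.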
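Another key concept is the three-way split, in which the dataset is divided into a training set, a validation set, and a test set. The model is trained on the training set, validated on the validation set (which is used to tune hyperparameters), and finally evaluated on the test set to obtain an unbiased assessment of its performance. The test set is used only once, after all tuning is complete, to produce the final estimate of the model's expected performance in production.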
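A three-way split can be sketched the same way. The function name `train_validation_test_split` and the 60/20/20 proportions are assumptions for illustration. Suitable ratios depend on how much data is available.

```python
import random


def train_validation_test_split(rows, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, validation, test) lists."""
    if validation_fraction <= 0 or test_fraction <= 0:
        raise ValueError("fractions must be positive")
    if validation_fraction + test_fraction >= 1:
        raise ValueError("fractions must leave room for a training set")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_validation = round(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]                           # held back until tuning is done
    validation = shuffled[n_test:n_test + n_validation]  # used for tuning decisions
    train = shuffled[n_test + n_validation:]           # used to fit the model
    return train, validation, test


rows = list(range(100))  # stand-in for real examples
train_rows, validation_rows, test_rows = train_validation_test_split(rows, seed=7)
print(len(train_rows), len(validation_rows), len(test_rows))
```

Splitting once, up front, makes it harder to accidentally let test examples influence tuning decisions.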
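Hyperparameter tuning, which adjusts a model's hyperparameters to optimize performance, relies heavily on validation. Hyperparameters are settings that control the behavior of a machine learning algorithm but are not learned from the data. By measuring performance on the validation set, different hyperparameter combinations can be tested and the best-performing configuration selected.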
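To make the tuning loop concrete, here is a small sketch that selects the number of neighbors `k` for a one-dimensional k-nearest-neighbors regressor by comparing validation error. The model, the synthetic data, the candidate values, and the helper names (`knn_predict`, `validation_mse`, `select_k`) are all assumptions for illustration. The article does not tie tuning to any particular algorithm.

```python
import random


def knn_predict(train_x, train_y, query, k):
    """Predict by averaging the targets of the k training points nearest to query."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


def validation_mse(train_x, train_y, val_x, val_y, k):
    """Mean squared error on the validation set for a given k."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(val_x, val_y)]
    return sum(errors) / len(errors)


def select_k(train_x, train_y, val_x, val_y, candidates):
    """Return the candidate with the lowest validation error, plus all errors."""
    val_errors = {k: validation_mse(train_x, train_y, val_x, val_y, k) for k in candidates}
    best_k = min(val_errors, key=val_errors.get)
    return best_k, val_errors


# Synthetic data: a smooth curve plus Gaussian noise (illustrative only).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(120)]
ys = [x * x / 10 + rng.gauss(0, 1) for x in xs]
train_x, train_y = xs[:90], ys[:90]
val_x, val_y = xs[90:], ys[90:]

best_k, val_errors = select_k(train_x, train_y, val_x, val_y, candidates=[1, 3, 5, 15, 45])
print(best_k, val_errors)
```

Only the training and validation sets appear here. A separate test set would still be needed for the final performance estimate, because the chosen `k` has been fitted to the validation data.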
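Validation is also important for ensuring that a model neither overfits nor underfits. Overfitting occurs when a model is too complex and captures noise in the training data, which leads to poor performance on new data. Underfitting occurs when a model is too simple to capture the underlying patterns in the data. By favoring models that perform well on both the training set and the validation set, validation helps strike a balance between the two.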
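One rough way to turn this comparison into a check is shown below. The function `diagnose_fit` and both thresholds (`acceptable_error` and `max_gap`) are hypothetical. Sensible values depend entirely on the task and the metric, so this is a heuristic sketch rather than a standard rule.

```python
def diagnose_fit(train_error, validation_error, acceptable_error, max_gap):
    """Classify a model from its training and validation errors (lower = better).

    acceptable_error and max_gap are task-specific thresholds chosen by the user.
    """
    if train_error > acceptable_error:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if validation_error - train_error > max_gap:
        # Good on training data, noticeably worse on unseen data.
        return "overfitting"
    return "balanced"


print(diagnose_fit(0.30, 0.32, acceptable_error=0.10, max_gap=0.05))
print(diagnose_fit(0.01, 0.25, acceptable_error=0.10, max_gap=0.05))
print(diagnose_fit(0.05, 0.07, acceptable_error=0.10, max_gap=0.05))
```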
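Validation matters to businesses because it helps ensure that machine learning models are reliable and accurate, and that they make meaningful predictions once deployed in real-world scenarios. Without proper validation, a business risks deploying a model that performs well on historical data but fails to generalize to new data, which leads to inaccurate predictions and poor decisions.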
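In financial services, for example, a credit risk model must be thoroughly validated to confirm that it assesses the risk of new applicants accurately. A poorly validated model can lead to wrong credit decisions, and in turn to financial losses or missed opportunities. Similarly, in healthcare, a machine learning model used to diagnose disease must be validated to confirm that it performs well across different patient populations and avoids errors that could harm patients.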
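Validation also plays a central role in model selection and optimization. Using techniques such as cross-validation, a business can choose the best model from a set of candidates and fine-tune it for the best achievable performance. By ensuring that the deployed model is the one best suited to the problem at hand, this process helps businesses maximize the return on their investment in AI and machine learning.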
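Validation also helps build trust in machine learning models among stakeholders. When a model has been validated and shown to perform well on unseen data, decision-makers can place more confidence in its predictions. This is especially important in heavily regulated industries such as finance, healthcare, and insurance, where the consequences of model errors can be severe.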
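In essence, validation is the process of evaluating a machine learning model's performance on a separate dataset to confirm that it generalizes well to new data. For businesses, it is essential to ensuring that models are reliable, accurate, and ready for deployment in real-world applications. By validating models effectively, businesses can improve decision-making, reduce risk, and maximize the value of their machine learning investments.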