Save money on your Sentinel ingestion costs with Data Collection Rules

October 15, 2024

This article was co-authored by Brian Delaney, Andrea Fisher, and Jon Shectman.

As digital environments continue to expand, security operations teams are asked to optimize costs even as the amount of data they need to collect and store grows exponentially. Teams often feel forced to choose between not collecting certain data sets or log sources and balancing a limited security budget. In this article we'll outline strategies you can use to reduce your data volume while still collecting and retaining the information that really matters.

We'll show you how to use Data Collection Rules (DCRs) to remove less valuable information from your logs. Specifically, we'll first discuss the thought process that goes into determining which parts of your logs are important to your organization. We'll then use two log source examples to demonstrate how to use a DCR to "project away" unwanted or unnecessary information. This process reduces both ingestion and long-term retention costs, and it also reduces analyst fatigue.

A word of caution: only you can determine what is important to your organization in a particular log or table. If you choose to drop data (to "project it away"), that data is simply never collected, and the action cannot be undone after the fact. That's why we're spending time discussing the thought process that determines what's really important.

A word about DCRs (or, what is a DCR and why should you care?)

There isn't space in this blog entry to cover DCRs in detail, as they can quickly become complex. For an in-depth reference, see Custom data collection and transformation in Microsoft Sentinel | Microsoft Learn. Two things are worth discussing here.

First, what exactly is a DCR and why should you care? DCRs are one way that Sentinel and Log Analytics give you fine-grained control over the specific data that is actually collected into your workspace. Think of a DCR as a way to manipulate your collection pipeline. For our purposes, a DCR can be considered a set of basic KQL queries applied to incoming logs, allowing operations on that data such as filtering out irrelevant data, enriching existing data, masking sensitive attributes, or performing Advanced Security Information Model (ASIM) normalization. As you may have guessed by now, we are interested in the first capability: filtering out irrelevant data.

Second, for our purposes there are two kinds of DCRs: standard DCRs and workspace transformation DCRs. Standard DCRs are currently supported for AMA-based connectors and workflows that use the new Logs Ingestion API. An example of a standard DCR is the one used for Windows security events collected through the AMA. Workspace transformation DCRs support workflows that standard DCRs do not cover. A Sentinel workspace can have only one workspace transformation DCR, but that DCR contains a separate transformation for each input stream. An example of a workspace transformation is the one applied to AADNonInteractiveUserSignInLogs collected through diagnostic settings. If data is collected using a standard DCR, the workspace transformation DCR does not apply to it.

Find bulky sources

To optimize costs, it's important to understand where all your data volume comes from before making difficult decisions about which logs to drop and which to keep. We recommend focusing on high-volume sources, which will give you the greatest return on your efforts.
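Before drilling into individual tables, it can help to see how overall billable volume is trending. Here is a minimal sketch using the Usage table discussed below (Quantity is reported in MB, hence the division by 1,000) that charts daily billable ingestion; the 30-day window is just an example:

```kusto
// Daily billable ingestion over the last 30 days
Usage
| where TimeGenerated > ago(30d)
| where IsBillable
| summarize SizeInGB = sum(Quantity) / 1000 by bin(TimeGenerated, 1d)
| render timechart
```

A sudden step up in this chart is often the first clue that a new connector or an unusually noisy source is inflating your bill.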
Determine your large tables

First, if you haven't already, you should determine which of your billable tables are the largest (not all tables are billable) to see where you can have the greatest impact when optimizing costs. You can do this with a simple KQL query:

```kusto
Usage
| where TimeGenerated > ago(30d)
| where IsBillable
| summarize SizeInGB = sum(Quantity) / 1000 by DataType
| sort by SizeInGB desc
```

Record-level analysis

Once you've identified your large billable tables, it's a good idea to look at volume by record type. You may need to experiment with different combinations to find bulky patterns that carry little security value. For the SecurityEvent table, for example, you can evaluate which event IDs contribute the most volume relative to their security value. The number of events does not correlate directly with cost, because some events are much larger than others. To account for this we use the _BilledSize column, which contains the billed size (in bytes) of each record:

```kusto
SecurityEvent
| summarize SizeInMB = sum(_BilledSize) / 1000 / 1000 by EventID
| sort by SizeInMB desc
```

Column-level analysis

In some cases you may not be able to discard an entire record, but you may still have the opportunity to discard a column or part of a column. As you explore a data source, you will notice that some columns contain a significant amount of data. In AADNonInteractiveUserSignInLogs, for example, the ConditionalAccessPolicies column is a large array holding the status of every conditional access policy and whether that policy was applied to background token activity. To measure a column's share of the table, we use the estimate_data_size() function:

```kusto
AADNonInteractiveUserSignInLogs
| extend ColumnSize = estimate_data_size(ConditionalAccessPolicies)
| summarize RecordSizeInMB = sum(_BilledSize) / 1000 / 1000,
            ColumnSizeInMB = sum(ColumnSize) / 1000 / 1000
| extend PercentOfTotal = ColumnSizeInMB / RecordSizeInMB
```

Process review

Let's walk through the process of reducing collection with DCRs in two examples: one using a workspace transformation DCR and one using a standard DCR.

AADNonInteractiveUserSignInLogs

SOC engineers and administrators often worry about the cost of ingesting additional logs such as AADNonInteractiveUserSignInLogs. A non-interactive user sign-in is a sign-in performed by a client app or an OS component on behalf of a user. Unlike interactive sign-ins, these do not require the user to supply an authentication factor; a token or code authenticates on the user's behalf instead. There is good reason to collect these logs, because malicious actors can abuse this type of authentication.

The AADNonInteractiveUserSignInLogs table offers a potentially significant optimization opportunity. One of its fields records conditional access policy evaluation, and that field typically accounts for 50-80% of the log's data volume. In most cases, non-interactive sign-ins will have the same conditional access results as the corresponding interactive sign-ins, yet non-interactive volume is much higher. If the results do differ, is it important to know which specific conditional access policy allowed or blocked the session? Does knowing this add value to an investigation?
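Before deciding, you can sanity-check the claim above against your own tenant. The following is a minimal sketch, assuming you also collect the interactive SigninLogs table and relying on the ConditionalAccessStatus column that both sign-in tables share; it compares conditional access outcomes across the two log types:

```kusto
// Compare conditional access outcomes for interactive vs.
// non-interactive sign-ins over the last 7 days
union withsource=SourceTable SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(7d)
| summarize Count = count() by SourceTable, ConditionalAccessStatus
| order by SourceTable asc, Count desc
```

If the distributions look similar, the detailed per-policy array in the non-interactive table is likely adding cost without adding much investigative value.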
This example uses a workspace transformation DCR because no standard DCR is available for this data type (it is collected through a diagnostic settings data flow). If you already have a workspace transformation DCR, edit it; if you don't, you'll need to create one. Once you have it, click Next, then click Transformation editor at the top and use the following query to remove the entire ConditionalAccessPolicies column from this table:

```kusto
source
| project-away ConditionalAccessPolicies
```

Alternatively, because the array is sorted so that the applied policies (success/failure) appear at the top, you can keep just the leading entries. The following transformation retains the first four:

```kusto
source
| extend CAP = todynamic(ConditionalAccessPolicies)
| extend CAPLen = array_length(CAP)
| extend ConditionalAccessPolicies = tostring(iff(CAPLen > 0,
    pack_array(CAP[0], CAP[1], CAP[2], CAP[3]),
    todynamic('[]')))
| project-away CAPLen, CAP
```

SecurityEvent

Windows security event logs are another source with room for optimization. The easiest way to collect them is to use the standard "Minimal", "Common", or "All" event sets, but are those the right choices for you? Some notoriously noisy event IDs have questionable security value, so it's a good idea to take a closer look at what you currently collect in this table and evaluate whether the loudest events actually earn their keep.

For example, you probably want event IDs like 4624 ("An account was successfully logged on") and 4688 ("A new process has been created"). But should you keep 4634 ("An account was logged off") and 4647 ("User initiated logoff")? These may be useful for auditing, but they add little when detecting breaches. You could drop them by setting the collection tier to "Minimal", but then you might miss other valuable event IDs.

If you use the "All" collection tier, the XPath query does not list events by number. To remove events, change the XPath in the DCR to select all events except the specific ones:

```
Security!*[System[(EventID != 4634) and (EventID != 4647)]]
```

If you use the "Common" or "Minimal" tier, the event IDs are already listed explicitly in the DCR's XPath query, so you can simply delete the corresponding "or" clauses from a query like this:

```
Security!*[System[(EventID=1102) or (EventID=1107) or (EventID=1108) or (EventID=4608) or (EventID=4610) or (EventID=4611) or (EventID=4614) or (EventID=4622) or (EventID=4624) or (EventID=4625) or (EventID=4634) or (EventID=4647) or (EventID=4648) or (EventID=4649) or (EventID=4657)]]
```

Alternatively, you can add a transformKql statement to the DCR to discard these events, though this is less efficient than XPath filtering because the events are dropped after collection rather than at the source:

```kusto
source
| where EventID !in (toint(4634), toint(4647))
```
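Before committing either change, it's worth quantifying the expected savings. Here is a minimal sketch using the _BilledSize column introduced earlier; 4634 and 4647 are just the candidate IDs from the example above:

```kusto
// Share of SecurityEvent volume attributable to the candidate event IDs
let total = toscalar(
    SecurityEvent
    | where TimeGenerated > ago(7d)
    | summarize sum(_BilledSize));
SecurityEvent
| where TimeGenerated > ago(7d)
| where EventID in (4634, 4647)
| summarize SizeInMB = sum(_BilledSize) / 1000 / 1000,
            PercentOfTable = round(100.0 * sum(_BilledSize) / total, 1)
```

If the percentage is small, the change may not be worth the operational risk; if it's large, you have a strong case for the transformation.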
For more information about updating standard DCRs, review the create, edit, and monitor Data Collection Rules blog post: https://techcommunity.microsoft.com/t5/microsoft-sentinel-blog/create-edit-and-monitor-data-collecti…

In summary

As digital footprints grow exponentially, it becomes increasingly important for security teams to be deliberate about the data they collect and store. By carefully selecting your data sources and using DCRs to trim your data sets, you can ensure that your security budget is spent in the most efficient and effective way.