MGDC for SharePoint FAQ: How do I join File Actions with Files? by info.odysseyx@gmail.com October 1, 2024 written by info.odysseyx@gmail.com October 1, 2024 0 comment 1 views 1 1. File operations If you use the SharePoint File Operations dataset in Microsoft Graph Data Connect, you can combine it with the SharePoint File dataset to get detailed information about each file. This post explains how to do this, including the most common issues. 2. What are the file operations? The SharePoint file activity dataset contains details about every time a file is accessed, deleted, downloaded, modified, moved, renamed, or uploaded. Information about each operation includes the type of operation, when the operation occurred, the file on which the operation was performed, the application used, and the user (actor) who performed the operation. For more information about the dataset, see the full schema documentation at: dataconnection-dataset-sharepointfileactions.md. 3. Combine with other datasets A common use for this dataset is to combine it with SharePoint sites or SharePoint files to understand the amount of activity going on for a particular site or file. You can use it to see which sites or files are most accessed on a particular day. By accumulating activity over a long period of time, you can find sites that have had no activity in the last 90 days. To find details about sites related to a specific task, group your file tasks by site ID and then join the site. Below is an example SQL to get a list of the most popular sites for the time period covered by the file operation. WITH ActionsBySite AS ( SELECT SiteId, MIN(ActionDate) AS EarliestAction, MAX(ActionDate) AS LatestAction, COUNT(*) AS ActionCount FROM FileActions GROUP BY SiteId ) SELECT S.Id AS SiteId, S.[RootWeb.Title] AS SiteName, S.CreatedTime AS SiteCreated, S.[RootWeb.LastItemModifiedDate] AS SiteLastModified, S.[RootWeb.TemplateId] AS SiteTemplateId, A.EarliestAction, A.LatestAction, A.ActionCount FROM ActionsBySite AS A JOIN Sites S ON A.SiteId = S.Id ORDER BY ActionCount DESC LIMIT 10; To find details about files related to a specific task, group file tasks by file and then combine them with the file. Here’s a similar example that gets a list of the most popular files for the time period to which a file operation applies: WITH ActionsByFile AS ( SELECT SiteId, WebId, ListId, ListItemId, MIN(ActionDate) AS EarliestAction, MAX(ActionDate) AS LatestAction, COUNT(*) AS ActionCount FROM FileActions GROUP BY SiteId, WebId, ListId, ListItemId ) SELECT F.DirName, F.FileName, F.AuthorEmail, F.TimeCreated AS FileCreated, F.TimeLastModified AS FileLastModified, A.EarliestAction, A.LatestAction, A.ActionCount FROM ActionsByFile AS A JOIN Files F ON A.SiteId = F.SiteId AND A.WebId = F.WebId AND A.ListId = F.ListId AND A.ListItemId = F.ItemId ORDER BY ActionCount DESC LIMIT 10; Note: If you authorize long-term retention of file activity data, please contact our Compliance team. There may be limits to how long we must keep your personal information. 4. Potential problems For large data sets, such as files and file operations, there are scenarios where JOIN does not match perfectly. Here are some issues you may encounter when combining file operations with sites or files: 4a. other regions Important: This section only applies to tenants using the Microsoft 365 Multi-Geo feature, as described next. Microsoft 365 multi-region and Multi-region functionality in OneDrive and SharePoint. A JOIN operation excludes certain operations if the file operations are in a different area than the file. If your tenant has multiple regions Actions are recorded in the actor’s domain.. Actors in that region can access files in other regions that are not in that environment. To overcome this, you need to run a collection of SharePoint files from all regions and combine (unify) them into a single data set. Note: These datasets are segmented by region for compliance purposes, so your compliance team should run this scenario before combining data from multiple regions. 4b. matching date There are certain situations where files are missing when attempting to join because the file operation snapshot is newer than the file snapshot date. For example, if a file activity snapshot from July 15th is combined with a file snapshot from July 1st, by default the JOIN excludes the file activity because the file data has not yet been created. Additionally, it may take approximately a week for the SharePoint file dataset to be fully updated. You can exclude operations on files that were recently created but not yet included in the file dataset. So even if both the file activity snapshot and the file snapshot are from July 1, it is possible that some of the more recent files may not have been captured yet. There is also a chance that some file operations will arrive late (typically less than 0.01%). For example, some of your July 1st tasks may appear in the July 2nd snapshot. Therefore, it is generally recommended to wait until July 8 to process and report July 1 work to obtain the most complete data set. This ensures that you use the most up-to-date file data sets and tolerates late-arriving file operation data. 4c. Not a document library Another reason for not finding a match between your file operations dataset and your file operations data set is when the file operation is for a file that is not in the document library. The SharePoint file dataset includes only files in document libraries and ignores other types of lists. This is to keep the file data set at a more reasonable size. A common scenario here involves working with SharePoint pages (files with the “aspx” extension), usually located on a communication site. Although you won’t see these “aspx” pages in the file dataset, you can still count the number of jobs per site. To bridge this gap, you need to provide additional detail by adding specific datasets for your SharePoint pages and/or SharePoint lists. This is something the team is considering in the future, but there is no expected date for this. 4d. List item ID is missing. The file activity dataset shows telemetry reported by your application. For some of these applications, the ListItemId may not be reported or may be reported as “null”. In this case, the join excludes that operation. You can summarize these operations at the site level even if the ListItemId is missing. You can also view a summary of user agents to identify specific applications that are not providing data. 5. Conclusion We hope you enjoyed learning more about SharePoint file operations datasets in Microsoft Graph Data Connect. More information about Microsoft Graph Data Connect for SharePoint can be found here: https://aka.ms/SharePointData. You can find a lot of details here, including a list of available datasets and frequently asked questions. Source link Share 0 FacebookTwitterPinterestEmail info.odysseyx@gmail.com previous post VoiceRAG: An App Pattern for RAG + Voice Using Azure AI Search and the GPT-4o Realtime API for Audio next post ExpressRoute Metro is now generally available!! You may also like Get to know Microsoft 365 Copilot in Microsoft OneDrive October 4, 2024 Connecting to Azure Cache for Redis with Entra ID in Azure Government October 4, 2024 Modern Charts in Microsoft Access is GA! October 4, 2024 Cowrie honeypot and its Integration with Microsoft Sentinel. October 4, 2024 Improved Accessibility ribbon in PowerPoint for Windows and Mac October 4, 2024 Introducing the Use Cases Mapper workbook October 4, 2024 Leave a Comment Cancel Reply Save my name, email, and website in this browser for the next time I comment.