Restoring Soft-Deleted Blobs with Multithreading in Azure Storage Using C#

by info.odysseyx@gmail.com, September 3, 2024

Soft delete for blobs is an essential safeguard against accidental deletion or overwriting. By retaining deleted data for a specified period of time, it keeps data intact and available even after human error. However, restoring data from the soft-deleted state can be labor-intensive, because the Undelete Blob API must be called for each deleted blob individually. There is currently no option to undelete all blobs in bulk.

This post provides sample C# code that helps you restore soft-deleted data efficiently. It is especially effective when you have a large number of blobs to restore, because it runs many restore operations concurrently to speed up the process. The program can also be configured to undelete blobs within a specific container or directory instead of scanning the entire storage account.

To run this program, follow these steps:

Install the .NET SDK: Make sure you have the .NET SDK installed on your computer.

Connect to your Azure account:

    Connect-AzAccount

Register the nuget.org package source:

    dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org

Create a new console application:

    dotnet new console --force

Add the following code to Program.cs:

    using Azure.Core;
    using Azure.Identity;
    using Azure.Storage.Files.DataLake;
    using Azure.Storage.Files.DataLake.Models;

    // Storage account and container to scan. An empty ContainerName scans every container.
    var StorageAccountName = "xxxx";
    var ContainerName = "xxxx";
    // Optional directory prefix. An empty value scans the whole container.
    var DirectoryPath = "";
    // Maximum number of undelete requests in flight, and how many are awaited per batch.
    var Concurrency = 500;
    var BatchSize = 500;

    static DataLakeServiceClient GetDatalakeClient(string accountName)
    {
        DataLakeClientOptions clientOptions = new DataLakeClientOptions()
        {
            Retry =
            {
                Delay = TimeSpan.FromMilliseconds(500),
                MaxRetries = 5,
                Mode = RetryMode.Fixed,
                MaxDelay = TimeSpan.FromSeconds(5),
                NetworkTimeout = TimeSpan.FromSeconds(30)
            },
        };

        // only works for prod.
        DataLakeServiceClient client = new(
            new Uri($"https://{accountName}.blob.core.windows.net"),
            new DefaultAzureCredential(),
            clientOptions);
        return client;
    }

    Console.WriteLine("Starting the program");
    var client = GetDatalakeClient(StorageAccountName);

    // Limits how many undelete requests are in flight at any one time.
    var throttler = new SemaphoreSlim(initialCount: Concurrency);
    List<Task> tasks = new List<Task>();
    List<string> containerNames = new List<string>();

    // With no container specified, process every container in the account.
    if (string.IsNullOrEmpty(ContainerName))
    {
        var containers = client.GetFileSystems();
        foreach (var container in containers)
        {
            containerNames.Add(container.Name);
        }
    }
    else
    {
        containerNames.Add(ContainerName);
    }

    var totalSuccessCount = 0;
    var totalFailedCount = 0;

    foreach (var container in containerNames)
    {
        Console.WriteLine($"Recovering container {container}");
        var fileSystem = client.GetFileSystemClient(container);
        var deletedItems = fileSystem.GetDeletedPaths(pathPrefix: DirectoryPath);
        var count = 0;
        var totalSuccessCountForContainer = 0;
        var totalFailedCountForContainer = 0;

        foreach (PathDeletedItem item in deletedItems)
        {
            // Wait for a free slot before starting the next undelete request.
            await throttler.WaitAsync();
            count++;
            try
            {
                var task = fileSystem.UndeletePathAsync(item.Path, item.DeletionId);
                var continuedTask = task.ContinueWith(t =>
                {
                    // Free the slot as soon as the request completes, then record the outcome.
                    throttler.Release();
                    if (t.IsFaulted)
                    {
                        Interlocked.Increment(ref totalFailedCount);
                        Interlocked.Increment(ref totalFailedCountForContainer);
                        Console.WriteLine($"Failed count for container {totalFailedCountForContainer}, total failed count {totalFailedCount}, path {item.Path} due to {t.Exception?.Message}");
                    }
                    else
                    {
                        Interlocked.Increment(ref totalSuccessCount);
                        Interlocked.Increment(ref totalSuccessCountForContainer);
                        Console.WriteLine($"Success count for container {totalSuccessCountForContainer}, total success count {totalSuccessCount}");
                    }
                });
                tasks.Add(continuedTask);
            }
            catch (Exception ex)
            {
                // No continuation was attached, so release the slot here.
                throttler.Release();
                Console.WriteLine("Failed to create task: " + ex.ToString());
            }
            finally
            {
                // Drain the current batch before queuing more work.
                if (count == Math.Max(Concurrency, BatchSize))
                {
                    count = 0;
                    await Task.WhenAll(tasks);
                    tasks.Clear();
                }
            }
        }

        // Wait for any requests left over from the final partial batch.
        await Task.WhenAll(tasks);
        Console.WriteLine($"Recovery finished for container {container}");
    }

Replace xxxx with your storage account name and container name. To restore a specific directory, set DirectoryPath to the directory name; otherwise, leave it blank to scan the entire container. By default the code allows up to 500 undelete requests to run concurrently (the Concurrency value), and you can adjust this number as needed.

Add the required packages. Program.cs imports the Azure.Storage.Files.DataLake namespace, so that package is needed as well:

    dotnet add package Azure.Identity
    dotnet add package Azure.Storage.Blobs
    dotnet add package Azure.Storage.Files.DataLake

Build and run the application:

    dotnet build --configuration Release
    dotnet

As your application runs, you can monitor the console window to track its progress and identify potential problems or errors.
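The throttling pattern in Program.cs can be tried in isolation, without a storage account. The following is a minimal sketch, not the author's code: RunThrottledAsync is a hypothetical helper introduced here for illustration, and Task.Delay stands in for the undelete request. It uses the same kind of SemaphoreSlim gate and records the peak number of simultaneous operations, which should never exceed the configured limit.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the original program): runs itemCount
// simulated operations with at most `concurrency` of them in flight.
static async Task<(int Completed, int Peak)> RunThrottledAsync(int itemCount, int concurrency)
{
    using var throttler = new SemaphoreSlim(concurrency);
    var tasks = new List<Task>();
    int inFlight = 0, peak = 0, completed = 0;

    for (int i = 0; i < itemCount; i++)
    {
        // The loop pauses here once `concurrency` operations are running.
        await throttler.WaitAsync();

        tasks.Add(Task.Run(async () =>
        {
            try
            {
                int now = Interlocked.Increment(ref inFlight);

                // Raise the recorded peak if this is the highest count seen so far.
                int seen;
                while (now > (seen = Volatile.Read(ref peak)) &&
                       Interlocked.CompareExchange(ref peak, now, seen) != seen)
                {
                }

                // Assumption: a short delay stands in for the real network call.
                await Task.Delay(10);
                Interlocked.Increment(ref completed);
            }
            finally
            {
                // Leave the in-flight count before freeing the slot for the next item.
                Interlocked.Decrement(ref inFlight);
                throttler.Release();
            }
        }));
    }

    await Task.WhenAll(tasks);
    return (completed, peak);
}

var result = await RunThrottledAsync(itemCount: 50, concurrency: 5);
Console.WriteLine($"Completed {result.Completed} operations, peak in flight: {result.Peak}");
```

In the real program the same gate is sized by the Concurrency value, so lowering that value is one way to reduce the number of simultaneous requests sent to the storage account.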