Restoring Soft-Deleted Blobs with Multithreading in Azure Storage Using C#

by info.odysseyx@gmail.com

Soft delete for blobs protects against accidental deletion or overwriting. Deleted data is retained for a configured retention period, so it stays recoverable even after human error. Restoring soft-deleted data can be labor-intensive, however, because the undelete API must be called for each deleted blob individually. There is currently no option to undelete all blobs in bulk.

This post provides sample C# code for restoring soft-deleted data efficiently. It is most useful when a large number of blobs must be restored, because it issues many undelete requests concurrently to speed up the process. The program can also be limited to a specific container or directory instead of scanning the entire storage account.

To run this program, follow these steps:

  • Install the .NET SDK: make sure the .NET SDK is installed on your computer.
  • Connect to your Azure account (Azure PowerShell):
Connect-AzAccount

  • Add the NuGet package source, if it is not already configured:
dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org

  • Create a new console application:
dotnet new console --force

  • Add the following code to Program.cs:

using Azure.Core;
using Azure.Identity;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;

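// Storage account and container to scan. Leave ContainerName empty to scan every container.
var StorageAccountName = "xxxx";
var ContainerName = "xxxx";
// Optional directory prefix; leave empty to scan the whole container.
var DirectoryPath = "";
// Maximum number of undelete requests in flight at once.
var Concurrency = 500;
// Outstanding tasks are awaited every Math.Max(Concurrency, BatchSize) items.
var BatchSize = 500;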

static DataLakeServiceClient GetDatalakeClient(string accountName)
{
    DataLakeClientOptions clientOptions = new DataLakeClientOptions()
    {
        Retry = {
                    Delay = TimeSpan.FromMilliseconds(500),
                    MaxRetries = 5,
                    Mode = RetryMode.Fixed,
                    MaxDelay = TimeSpan.FromSeconds(5),
                    NetworkTimeout = TimeSpan.FromSeconds(30)
                },
    };

    // only works for prod.
    DataLakeServiceClient client = new(
        new Uri($"https://{accountName}.blob.core.windows.net"),
        new DefaultAzureCredential(),
        clientOptions);

    return client;
}

Console.WriteLine("Starting the program");

var client = GetDatalakeClient(StorageAccountName);
var throttler = new SemaphoreSlim(initialCount: Concurrency);

List<Task> tasks = new List<Task>();
List<string> containerNames = new List<string>();

if (string.IsNullOrEmpty(ContainerName))
{
    var containers = client.GetFileSystems();
    foreach (var container in containers)
    {
        containerNames.Add(container.Name);
    }
}
else
{
    containerNames.Add(ContainerName);
}

var totalSuccessCount = 0;
var totalFailedCount = 0;

foreach (var container in containerNames)
{
    Console.WriteLine($"Recovering container {container}");
    var fileSystem = client.GetFileSystemClient(container);

    var deletedItems = fileSystem.GetDeletedPaths(pathPrefix: DirectoryPath);
    var count = 0;
    var totalSuccessCountForContainer = 0;
    var totalFailedCountForContainer = 0;
    foreach (PathDeletedItem item in deletedItems)
    {
        await throttler.WaitAsync();
        count++;
        try
        {
            var task = (fileSystem.UndeletePathAsync(item.Path, item.DeletionId));
            var continuedTask = task.ContinueWith(t =>
            {
                throttler.Release();
                if (t.IsFaulted)
                {
                    Interlocked.Increment(ref totalFailedCount);
                    Interlocked.Increment(ref totalFailedCountForContainer);
                    Console.WriteLine($"Failed count for container {totalFailedCountForContainer}, total failed count {totalFailedCount}, path {DirectoryPath + item.Path} due to {t.Exception?.GetBaseException().Message}");
                }
                else
                {
                    Interlocked.Increment(ref totalSuccessCount);
                    Interlocked.Increment(ref totalSuccessCountForContainer);
                    Console.WriteLine($"Success count for container {totalSuccessCountForContainer}, total success count {totalSuccessCount}");
                }
            });
            tasks.Add(continuedTask);
        }
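        catch (Exception ex)
        {
            // The undelete call failed before a task was created, so the
            // continuation will never run; release the slot here instead.
            throttler.Release();
            Console.WriteLine("Failed to create task: " + ex.ToString());
        }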
        finally
        {
            if (count == Math.Max(Concurrency, BatchSize))
            {
                count = 0;
                await Task.WhenAll(tasks);
                tasks.Clear();
            }
        }
    }

    await Task.WhenAll(tasks);
    Console.WriteLine($"Recovery finished for container {container}");
}

Replace xxxx with your storage account name and container name. To restore a specific directory, set DirectoryPath to its path; otherwise leave it empty to scan the entire container. If ContainerName is left empty, the program scans every container in the account. By default the code allows up to 500 undelete operations in flight at once (the Concurrency setting); adjust this number as needed.
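
To see how the concurrency limit behaves without touching a storage account, the same SemaphoreSlim throttling pattern can be tried in isolation. The following is only a minimal sketch: FakeUndeleteAsync and RunThrottledAsync are hypothetical names that do not appear in the program above, and the simulated failure on every tenth item stands in for a real undelete error.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an undelete call: waits briefly and
// fails for every 10th item so that both outcomes are exercised.
static async Task FakeUndeleteAsync(int id)
{
    await Task.Delay(10);
    if (id % 10 == 0)
    {
        throw new InvalidOperationException($"simulated failure for item {id}");
    }
}

// Runs one operation per item with at most maxConcurrency operations in flight.
// Returns the success count, the failure count and the highest number of
// operations that were observed running at the same time.
static async Task<(int Succeeded, int Failed, int Peak)> RunThrottledAsync(
    IEnumerable<int> items, int maxConcurrency, Func<int, Task> operation)
{
    using var throttler = new SemaphoreSlim(maxConcurrency);
    var gate = new object();
    var tasks = new List<Task>();
    int succeeded = 0, failed = 0, inFlight = 0, peak = 0;

    async Task RunOneAsync(int item)
    {
        lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
        try
        {
            await operation(item);
            lock (gate) { succeeded++; }
        }
        catch (Exception)
        {
            lock (gate) { failed++; }
        }
        finally
        {
            lock (gate) { inFlight--; }
            throttler.Release(); // free the slot for the next item
        }
    }

    foreach (var item in items)
    {
        // Waits here once maxConcurrency operations are already running.
        await throttler.WaitAsync();
        tasks.Add(RunOneAsync(item));
    }

    await Task.WhenAll(tasks);
    return (succeeded, failed, peak);
}

var result = await RunThrottledAsync(Enumerable.Range(1, 50), 5, FakeUndeleteAsync);
Console.WriteLine($"Succeeded: {result.Succeeded}, failed: {result.Failed}, peak in flight: {result.Peak}");
```

Releasing the semaphore in a finally block means a failed operation still frees its slot, so the loop cannot stall after errors. A lower limit than 500 may be a sensible starting point if the storage account begins to throttle requests.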

  • Add the required NuGet packages:
dotnet add package Azure.Identity
dotnet add package Azure.Storage.Files.DataLake

  • Build the application:
dotnet build --configuration Release

  • Run the application:
dotnet 

As your application runs, you can monitor the console window to track its progress and identify potential problems or errors.
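
When hundreds of operations complete per second, one console line per blob can be hard to follow. One option, shown here only as a sketch, is to print a summary line at a fixed interval. The Record function and the ReportEvery interval are hypothetical and not part of the program above, and the results are simulated rather than taken from real undelete calls.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

int processed = 0, failed = 0;
const int ReportEvery = 100; // hypothetical reporting interval
var lines = new ConcurrentQueue<string>(); // keeps the printed lines for inspection

// Records one completed operation and prints a summary every ReportEvery completions.
void Record(bool ok)
{
    if (!ok)
    {
        Interlocked.Increment(ref failed);
    }

    var done = Interlocked.Increment(ref processed);
    if (done % ReportEvery == 0)
    {
        var line = $"Processed {done}, failed so far {Volatile.Read(ref failed)}";
        lines.Enqueue(line);
        Console.WriteLine(line);
    }
}

// Simulate 1,000 completions arriving from parallel workers; every 50th one fails.
Parallel.For(1, 1001, i => Record(i % 50 != 0));

Console.WriteLine($"Finished: {processed} processed, {failed} failed");
```

In the restore program, a call like Record(!t.IsFaulted) inside the continuation could replace the per-item Console.WriteLine calls, under the assumption that periodic totals are enough for monitoring.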




