chorbs February 2016

How to transfer a file to Azure blob storage in chunks, without writing to a local file, using Python

I need to transfer files from google cloud storage to azure blob storage.

Google gives a code snippet to download files to a bytes variable like so:

# Get Payload Data
req = client.objects().get_media(
        bucket=bucket_name,
        object=object_name,
        generation=generation)    # optional
# The BytesIO object may be replaced with any io.Base instance.
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, req, chunksize=1024*1024)
done = False
while not done:
    status, done = downloader.next_chunk()
    if status:
        print 'Download %d%%.' % int(status.progress() * 100)
print 'Download Complete!'
print fh.getvalue()

I was able to modify this to store to file by changing the fh object type like so:

fh = open(object_name, 'wb')

Then I can upload to azure blob storage using blob_service.put_block_blob_from_path.

I want to avoid writing to a local file on the machine doing the transfer.

I gather Google's snippet loads the data into the io.BytesIO() object a chunk at a time. I reckon I should probably use this to write to blob storage a chunk at a time.
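The chunk-at-a-time idea can be sketched independently of either SDK. Here `read_chunk` and `write_chunk` are hypothetical callables (not part of either SDK) that you would back with the GCS downloader and Azure's block upload respectively:

```python
def transfer_in_chunks(read_chunk, write_chunk, chunk_size=1024 * 1024):
    """Pump bytes from a reader to a writer one chunk at a time, so at
    most chunk_size bytes are ever held in memory.

    read_chunk(offset, size) -> bytes (empty when the source is exhausted)
    write_chunk(offset, data) -> None
    """
    offset = 0
    while True:
        data = read_chunk(offset, chunk_size)
        if not data:
            break
        write_chunk(offset, data)
        offset += len(data)
    return offset  # total bytes transferred


# Quick check with an in-memory source and sink standing in for GCS/Azure:
src = b"x" * (3 * 1024 * 1024 + 5)
dst = bytearray(len(src))
transfer_in_chunks(lambda off, size: src[off:off + size],
                   lambda off, data: dst.__setitem__(slice(off, off + len(data)), data))
```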

I experimented with reading the whole thing into memory and then uploading with put_block_blob_from_bytes, but I got a memory error (the file is probably too big, ~600 MB).

Any suggestions?

Answers


minghan February 2016

After looking through the SDK source code, something like this could work:

from azure.storage.blob import _chunking
from azure.storage.blob import BlobService

# See _BlobChunkUploader
class PartialChunkUploader(_chunking._BlockBlobChunkUploader):
    def __init__(self, blob_service, container_name, blob_name, progress_callback = None):
        super(PartialChunkUploader, self).__init__(blob_service, container_name, blob_name, -1, -1, None, False, 5, 1.0, progress_callback, None)

    def process_chunk(self, chunk_offset, chunk_data):
        '''chunk_offset is the integer offset. chunk_data is an array of bytes.'''
        return self._upload_chunk_with_retries(chunk_offset, chunk_data)

blob_service = BlobService(account_name='myaccount', account_key='mykey')

uploader = PartialChunkUploader(blob_service, "container", "foo")
# while (...):
#     uploader.process_chunk(...)
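To wire this to the Google downloader from the question without buffering the whole file, one option (a sketch, untested against either SDK) is to drain and reset the BytesIO buffer after each downloaded chunk before handing it to the uploader:

```python
import io

def drain(fh):
    """Return the bytes currently buffered in fh and reset it, so each
    downloaded chunk can be uploaded and then discarded from memory."""
    data = fh.getvalue()
    fh.seek(0)
    fh.truncate()
    return data

# Hypothetical wiring, using the downloader from the question above:
# fh = io.BytesIO()
# downloader = MediaIoBaseDownload(fh, req, chunksize=1024 * 1024)
# offset, done = 0, False
# while not done:
#     status, done = downloader.next_chunk()
#     chunk = drain(fh)
#     uploader.process_chunk(offset, chunk)
#     offset += len(chunk)
```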


Peter Pan - MSFT February 2016

According to the source code of blobservice.py in the Azure Storage SDK and of BlobReader in Google Cloud Storage, you can try the Azure function blob_service.put_block_blob_from_file to write the stream directly, because the GCS class BlobReader exposes a file-like read method; please see below.

So, referring to the code from https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_BlobReader, you can try to do this as below.

from google.appengine.ext import blobstore
from azure.storage.blob import BlobService

blob_key = ...
blob_reader = blobstore.BlobReader(blob_key)

blob_service = BlobService(account_name, account_key)
container_name = ...
blob_name = ...
blob_service.put_block_blob_from_file(container_name, blob_name, blob_reader)
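Note that put_block_blob_from_file only needs a file-like object with a read(size) method, so any chunked source can be adapted the same way. Here is a hypothetical wrapper (ChunkStream is my own name, not part of either SDK) that presents a sequence of byte chunks as such an object:

```python
import io

class ChunkStream(io.RawIOBase):
    """File-like wrapper over an iterable of byte chunks, exposing the
    read() interface that stream-upload functions typically expect."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._pending = b""

    def readable(self):
        return True

    def read(self, size=-1):
        if size is None or size < 0:
            # Drain everything that is left.
            data = self._pending + b"".join(self._chunks)
            self._pending = b""
            return data
        # Accumulate until we can satisfy the request or the source ends.
        while len(self._pending) < size:
            try:
                self._pending += next(self._chunks)
            except StopIteration:
                break
        data, self._pending = self._pending[:size], self._pending[size:]
        return data
```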

Post Status

Asked in February 2016
Viewed 1,637 times
Voted 6
Answered 2 times
