bdownload package

bdownload.download module

class bdownload.download.BDownloader(max_workers=None, min_split_size=1048576, chunk_size=102400, proxy=None, cookies=None, user_agent=None, logger=None, progress='mill', num_pools=20, pool_maxsize=20, request_timeout=None, request_retries=None, status_forcelist=None, resumption_retries=None, continuation=True, referrer=None, check_certificate=True, ca_certificate=None, certificate=None)[source]

Bases: object

The class for executing and managing download jobs.

The context of the current downloading job is structured as:

ctx = {
    "total_size": 2000,  # total size of all the to-be-downloaded files, may be inaccurate due to chunked transfer encoding
    "accurate": True,  # Is `total_size` accurate?
    "orig_path_urls": [('file1', 'url1  url2    url3'), ('file2', 'url4 url5    url6')],  # originally added downloads,
        # which don't necessarily correspond to `files` e.g. due to duplicate or interruption
    "file_cnt": 2,  # number of current downloading files
    "alt_files": [("full_path_to_file1", `ctx_file1_obj`), ("full_path_to_file2", `ctx_file2_obj`)],  # flattened `files`
    "files":{
        "full_path_to_file1":{
            "length": 2000,  # 0 means 'unknown', i.e. file size can't be pre-determined through any one of provided URLs
            "progress": 0,  # `SUCCEEDED` downloaded bytes: initialized to 0, set to the last progress when
                            # resuming and updated on completion (SUCCEEDED only!) of every task (`Future`)
            "last_progress": 0,  # CONSTANT: the loaded progress of last run upon resuming from interruption
            "downloaded": 0, # downloaded bytes: initialized to 0, set to the last progress when resuming
                             # and updated on completion (SUCCEEDED, FAILED, CANCELLED) of every task (`Future`)
            "resumable": True,
            "resuming_from_intr": False,  # Are we resuming from keyboard interruption?
            "download_state": "inprocess",
            "cancelled_on_exception": False,
            "futures": [future1, future2],
            "tsk_num": 2,  # number of the `ranges` and `futures`
            "orig_path_url": ('file1', 'url1    url2    url3'),  # (path, url) as a subparameter passed to :meth:`downloads`
            "path_url": ('full_path_to_file1', 'url1    url2    url3'),  # (full_pathname, active_URLs)
            "urls":{"url1":{"accept_ranges": "bytes", "refcnt": 1, "interrupted": 2, "succeeded": -5},
                    "url2":{"accept_ranges": "none", "refcnt": 0, "interrupted": 0, "succeeded": 0},
                    "url3":{"accept_ranges": "bytes", "refcnt": 1, "interrupted": 0, "succeeded": -2}},
            "ranges":{
                "bytes=0-999": {
                    "start": 0,  # start byte position
                    "end": 999,  # end byte position, None for 'unknown', see above
                    "offset": 0,  # current pointer position relative to 'start'(i.e. 0)
                    "start_time": 0,
                    "rt_dl_speed": 0,  # x seconds interval
                    "download_state": "inprocess",
                    "future": future1,
                    "url": [url1],
                    "alt_urls": {}
                },
                "bytes=1000-1999": {
                    "start":1000,
                    "end":1999,
                    "offset": 0,  # current pointer position relative to 'start'(i.e. 1000)
                    "start_time": 0,
                    "rt_dl_speed": 0,  # x seconds interval
                    "download_state": "inprocess",
                    "future": future2,
                    "url": [url3],
                    "alt_urls": {}
                }
            }
        },
        "full_path_to_file2":{
        }
    },
    "futures": {
        future1: {"file": "full_path_to_file1", "range": "bytes=0-999"},
        future2: {"file": "full_path_to_file1", "range": "bytes=1000-1999"}
    }
}
CANCELLED = 'cancelled'
FAILED = 'failed'
INPROCESS = 'inprocess'
INPROCESS_EXT = '.bdl'
PENDING = 'pending'
PROGRESS_BS_BAR = 'bar'
PROGRESS_BS_MILL = 'mill'
PROGRESS_BS_NONE = 'none'
RESUM_PARTS_EXT = '.bdl.par'
SUCCEEDED = 'succeeded'
__init__(max_workers=None, min_split_size=1048576, chunk_size=102400, proxy=None, cookies=None, user_agent=None, logger=None, progress='mill', num_pools=20, pool_maxsize=20, request_timeout=None, request_retries=None, status_forcelist=None, resumption_retries=None, continuation=True, referrer=None, check_certificate=True, ca_certificate=None, certificate=None)[source]

Create and initialize a BDownloader object.

Parameters
  • max_workers (int) – The max_workers parameter specifies the number of parallel downloading threads. If set to None, it defaults to the number of processors multiplied by 5.

  • min_split_size (int) – min_split_size denotes the size in bytes of file pieces split to be downloaded in parallel, which defaults to 1024*1024 bytes (i.e. 1MB).

  • chunk_size (int) – The chunk_size parameter specifies the chunk size in bytes of every http range request, which will take a default value of 1024*100 (i.e. 100KB) if not provided.

  • proxy (str) – The proxy supports both HTTP and SOCKS proxies in the form of 'http://[user:pass@]host:port' and 'socks5://[user:pass@]host:port', respectively.

  • cookies (str, dict or CookieJar) – If set, cookies must be one of the following: a string of 'cookie_key=cookie_value' pairs, with multiple pairs separated by whitespace and/or semicolons, e.g. 'key1=val1 key2=val2;key3=val3'; a dict; or an instance of CookieJar, i.e. cookielib.CookieJar on Python 2.7, http.cookiejar.CookieJar on Python 3.x, or RequestsCookieJar from requests.

  • user_agent (str) – When user_agent is not given, it will default to 'bdownload/VERSION', with VERSION being replaced by the package’s version number.

  • logger (logging.Logger) – The logger parameter specifies an event logger. If logger is not None, it must be an object of class logging.Logger or of its customized subclass. Otherwise, it will use a default module-level logger returned by logging.getLogger(__name__).

  • progress (str) – progress determines the style of the progress bar displayed while downloading files. Possible values are 'mill', 'bar' and 'none'. 'mill' is the default. To disable this feature, e.g. while scripting or multi-instanced, set it to 'none'.

  • num_pools (int) – The num_pools parameter has the same meaning as num_pools in urllib3.PoolManager and will eventually be passed to it. Specifically, num_pools specifies the number of connection pools to cache.

  • pool_maxsize (int) – pool_maxsize will be passed to the underlying requests.adapters.HTTPAdapter. It specifies the maximum number of connections to save that can be reused in the urllib3 connection pool.

  • request_timeout (float or 2-tuple of float) – The request_timeout parameter specifies the timeouts for the internal requests session. A single float applies to both the connect and the read timeouts, while a (connect, read) tuple sets them individually. If set to None, it will take a default value of RequestsSessionWrapper.TIMEOUT.

  • request_retries (int) –

    request_retries specifies the maximum number of retry attempts allowed on exceptions and on the status codes of interest (i.e. status_forcelist) for the builtin Retry logic of urllib3. It will default to URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION if not given.

    Notes

    There are two retry mechanisms that jointly determine the total retries of a request. One is the above-mentioned Retry logic that is built into urllib3, and the other is the extended high-level retry factor that is meant to complement the builtin retry mechanism. The total number of retries is bounded by the following formula:

    request_retries * (_requests_extended_retries_factor + 1)

    See retry_requests(), RequestsSessionWrapper and requests_retry_session() for more details on the retry mechanisms.

  • status_forcelist (set of int) – status_forcelist specifies a set of HTTP status codes that a retry should be enforced on. The default set of status codes shall be URLLIB3_RETRY_STATUS_CODES if not given.

  • resumption_retries (int) – The resumption_retries parameter specifies the maximum allowable number of retries on error at resuming the interrupted download while streaming the request content. The default value of it is REQUESTS_RETRIES_ON_STREAM_EXCEPTION when not provided.

  • continuation (bool) – The continuation parameter specifies whether to resume, if possible, files that were only partially downloaded by a previous run, e.g. when the downloads were terminated by the user pressing Ctrl-C. When not present, it will default to True.

  • referrer (str) – referrer specifies an HTTP request header Referer that applies to all downloads. If set to '*', the request URL shall be used as the referrer per download.

  • check_certificate (bool) – The check_certificate parameter specifies whether to verify the server’s TLS certificate or not. It defaults to True.

  • ca_certificate (str) – The ca_certificate parameter specifies a path to the preferred CA bundle file (.pem) or directory with certificates in PEM format of trusted CAs. If set to a path to a directory, the directory must have been processed using the c_rehash utility supplied with OpenSSL, according to requests. NB the cert files in the directory each only contain one CA certificate.

  • certificate (str or tuple) – certificate specifies a client certificate. It has the same meaning as that of cert in requests.request().

Raises

ValueError – Raised when cookies is a str that is not in a valid format.

_backup_resumption_ctx(the_file, ctx_file)[source]

Back up the necessary context of the unsuccessful download for resuming later.

Parameters
  • the_file (str) – The full path name of the file being downloaded.

  • ctx_file (dict) – The download context of the file the_file.

Returns

The resumption context for the file the_file.

Return type

dict

_build_ctx(path_urls)[source]

Build the context for downloading the file(s).

Parameters

path_urls (list of tuple) – Paths and URLs for the file(s) to download, see downloads() for details.

Returns

A 6-tuple of lists '(active, active_orig, failed, failed_orig, existing, existing_orig)', where the lists active and active_orig contain the active (path, url) pairs, converted and original respectively; failed and failed_orig contain the pairs that are not downloadable; existing and existing_orig contain the downloads whose target files already exist.

Raises

BDownloaderException – Raised when the termination or cancellation flag has been set.

_build_ctx_internal(path_name, url)[source]

The helper method that actually does the build of the downloading context of the file.

Parameters
  • path_name (str) – The full path name of the file to download.

  • url (str) – The URL referencing the target file.

Returns

A 3-tuple '(downloadable, (path, url), (orig_path, orig_url))', where the downloadable indicates whether the desired file is downloadable, unavailable or existing by True, False or None respectively, (path, url) denotes the converted full pathname and the URL that consists only of active URLs, and (orig_path, orig_url) denotes the originally input pathname and URL.

Return type

tuple

Raises

BDownloaderException – Raised when the termination or cancellation flag has been set.

_calc_completed()[source]

Calculate the already downloaded bytes of the files.

Returns

The size in bytes of the downloaded pieces.

Return type

int

_cancel_all_on_interrupted()[source]

Cancel all the downloading tasks when receiving the SIGINT signal or the QUIT command.

_finalize_on_interrupted_py2()[source]

When interrupted under Python2.x, perform state transitions manually and act accordingly.

_get_alt_urls(path_name)[source]

Get alternative URLs from the multiple sources of the file to resume downloading from.

Parameters

path_name (str) – The full path name of the file to be downloaded.

Returns

The alternative source URLs sorted by descending succeeded downloads, then by ascending interrupted and references.

Return type

list
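The ordering described above can be illustrated with a small sketch over a urls mapping shaped like the one in the ctx example. It assumes, which the documentation does not confirm, that succeeded is stored as a negated count (as the sample values -5 and -2 suggest), so that a plain ascending sort puts the most successful URL first. The name rank_alt_urls is hypothetical.

```python
def rank_alt_urls(urls):
    """Order URLs by most successes, then fewest interruptions, then fewest references.

    Assumption: "succeeded" holds a negated count, so ascending order is
    equivalent to "descending succeeded downloads".
    """
    return sorted(
        urls,
        key=lambda u: (urls[u]["succeeded"], urls[u]["interrupted"], urls[u]["refcnt"]),
    )


# Sample data copied from the ctx structure documented for BDownloader.
urls = {
    "url1": {"accept_ranges": "bytes", "refcnt": 1, "interrupted": 2, "succeeded": -5},
    "url2": {"accept_ranges": "none", "refcnt": 0, "interrupted": 0, "succeeded": 0},
    "url3": {"accept_ranges": "bytes", "refcnt": 1, "interrupted": 0, "succeeded": -2},
}
ranked = rank_alt_urls(urls)
```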

static _get_fname_from_hdr(content_disposition)[source]

Get the file name from the HTTP response header.

Parameters

content_disposition (str) – Content of the Content-Disposition field of the response header.

Returns

The extracted file name.

Return type

str
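One possible way to pull a file name out of a Content-Disposition value is sketched below using only the standard library. This is not the package's implementation; filename_from_content_disposition is a hypothetical name, and stripping any directory part with os.path.basename is an added precaution rather than documented behavior.

```python
import os
from email.message import Message


def filename_from_content_disposition(content_disposition):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    name = msg.get_filename()  # None when no filename parameter is present
    # Keep only the last path component so a header cannot inject directories.
    return os.path.basename(name) if name else None
```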

static _get_fname_from_url(url)[source]

Generate a file name from the download URL.

Parameters

url (str) – A URL referencing the intended file.

Returns

The automatically generated file name.

Return type

str

_get_remote_file_multipart(path_name, req_range)[source]

The worker thread body for downloading an assigned piece of a file.

Parameters
  • path_name (str) – The full path name of the file to be downloaded.

  • req_range (str) – A chunk of the file path_name as a range request of the form 'bytes=start-end'.

Returns

None.

Raises
  • BDownloaderException – Raised when connect timeouts, read timeouts, failed connections or bad status codes occurred and the retries are exhausted.

  • EnvironmentError – Raised when file operations failed.

_get_remote_file_singlewhole(path_name, req_range)[source]

The worker thread body for downloading the whole of a file, as opposed to _get_remote_file_multipart().

Parameters
  • path_name (str) – The full path name of the file to be downloaded.

  • req_range (str) – The whole chunk of the file path_name as a mock range request of the form 'bytes=0-None'.

Returns

None.

Raises
  • BDownloaderException – Raised when connect timeouts, read timeouts, failed connections or bad status codes occurred and the retries are exhausted.

  • EnvironmentError – Raised when file operations failed.

_is_all_done()[source]

Check if all the tasks have completed.

Returns

True if all the Futures have been done, meaning that all the files have finished downloading, whether successfully or not; False otherwise.

Return type

bool

_is_download_resumable(path_name)[source]

Check if the current download of the file can be resumed from the point of last interruption through retrying.

Parameters

path_name (str) – The full path name of the file being downloaded.

Returns

True if the server accepts range requests for the file, otherwise False.

Return type

bool

_is_parallel_downloadable(path_name)[source]

Check if the file can be downloaded in parallel, i.e. using multi-threads to download the file pieces simultaneously.

Parameters

path_name (str) – The full path name of the file to be downloaded.

Returns

True if the file length is known and the server accepts its range requests, otherwise False.

Return type

bool

_load_resumption_ctx(the_file, ctx_file)[source]

Load from the resumption parts file to restore the download context.

Parameters
  • the_file (str) – The full path name of the file to download.

  • ctx_file (dict) – The download context of the file the_file.

Returns

A 2-tuple (is_resuming, resumption_ctx), where is_resuming indicates whether the download is resuming from last interruption, and if this is the case (True), resumption_ctx holds the successfully loaded resumption context.

Return type

(bool, dict)

_mgmnt_task()[source]

The management thread body.

This thread manages the downloading process of the whole job queue, currently including state management only. When all the tasks have been done, it signals the waiting thread and exits immediately.

Returns

None.

_on_cancelled(the_file, ctx_file)[source]

When transitioning to the CANCELLED state, remove the empty, obsolete files.

_on_failed(the_file, ctx_file)[source]

When transitioning to the FAILED state, save the resumption ctx or remove the intermediate files.

_on_succeeded(the_file, ctx_file)[source]

When transitioning to the SUCCEEDED state, convert from in-process to finished file and do the cleanup.

_pick_file_url(path_name)[source]

Select one URL from the multiple sources of the file to download from.

Parameters

path_name (str) – The full path name of the file to be downloaded.

Yields

list – A list of URL(s) to download the file from using a strategy of Round Robin.
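A bare round-robin generator, in the spirit of the strategy named above, might look like the following. It is only a sketch: the real method presumably also takes the per-URL state in the download context into account, and round_robin_urls is a hypothetical name.

```python
import itertools


def round_robin_urls(urls):
    """Yield one-element lists of URLs, cycling through the sources in order."""
    for url in itertools.cycle(urls):
        yield [url]


picker = round_robin_urls(["url1", "url3"])
first_four = [next(picker) for _ in range(4)]
```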

_progress_task()[source]

The thread body for showing the progress of the downloading tasks.

Returns

None.

_rename_existing_file(full_pathname)[source]

Rename the file or directory with the given pathname if present.

Parameters

full_pathname (str) – The full path name of the file to check for duplicate.

_result()[source]

Return both the succeeded and failed downloads when all done or interrupted by user.

Returns

Same as that returned by wait_for_all().

Return type

tuple of list

_state_mgmnt()[source]

Perform the state-related operations of file downloading.

This method updates the download status of the files and their related chunks when the associated worker threads complete, whether they finished without error, raised an exception or were cancelled intentionally.

Returns

None.

_submit_dl_tasks(path_urls)[source]

Submit the download tasks of the files to the thread pool.

Parameters

path_urls (list of tuple) – The meaning and format of the path_urls is similar to the parameter for downloads().

Returns

None.

static _topmost_missing_dir(path)[source]

Find the topmost non-existent directory for a given path.

Parameters

path (str) – A path to the directory to save the downloaded file in.

Returns

The uppermost directory that is missing from the path.

Return type

str
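The idea of locating the topmost non-existent directory can be sketched by walking up the path until an existing ancestor is found. This is an illustration under assumptions, not the package's code: topmost_missing_dir is a hypothetical stand-in, and returning None when the path already exists is a choice made for this sketch.

```python
import os


def topmost_missing_dir(path):
    """Return the highest ancestor of path (or path itself) that does not exist.

    Returns None if path already exists (an assumption of this sketch).
    """
    path = os.path.abspath(path)
    missing = None
    while not os.path.exists(path):
        missing = path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return missing
```

Knowing this directory is useful for cleanup: if a download fails, removing the topmost directory created for it also removes everything created beneath it.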

_wait_py2()[source]

Wait for all the jobs to finish on Python 2.x.

_wait_py3()[source]

Wait for all the jobs to finish on Python 3.x and newer.

static calc_req_ranges(req_len, split_size, req_start=0)[source]

Split the request req_len into chunks of the size split_size starting from the point req_start.

Parameters
  • req_len (int) – The length of the request to split.

  • split_size (int) – The size of each split chunk.

  • req_start (int) – The start position to split from.

Returns

The list of ranges in the form of 2-tuple '(start, end)'.

Return type

list of tuple
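As a rough sketch of this kind of splitting, the function below produces inclusive (start, end) byte ranges, matching the 'bytes=0-999' and 'bytes=1000-1999' keys in the ctx example. How the real method treats a short final chunk is not documented here; this sketch simply lets the last range be shorter. The name split_ranges is hypothetical.

```python
def split_ranges(req_len, split_size, req_start=0):
    """Split req_len bytes starting at req_start into inclusive (start, end) ranges."""
    ranges = []
    end_of_request = req_start + req_len  # exclusive upper bound
    for start in range(req_start, end_of_request, split_size):
        # The end position is inclusive, as in an HTTP Range header.
        end = min(start + split_size, end_of_request) - 1
        ranges.append((start, end))
    return ranges
```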

cancel(keyboard_interrupt=True)[source]

Cancel all the download jobs.

Parameters

keyboard_interrupt (bool) – Specifies whether or not the user hit the interrupt key (e.g. Ctrl-C).

Returns

None.

close()[source]

Shut down and perform the cleanup.

Returns

None.

download(path_name, url)[source]

Submit a single downloading job to the downloading queue.

This method is simply a wrapper of the method downloads().

Parameters
  • path_name (str) – The full path name of the file to be downloaded.

  • url (str) – The URL referencing the target file.

Returns

None.

Raises

Same as in downloads().

Notes

The limitation on the method and the path_name parameter herein is the same as in downloads().

downloads(path_urls)[source]

Submit multiple downloading jobs at a time to the downloading queue.

Parameters

path_urls (list of tuples) – path_urls accepts a list of tuples of the form (path, url), where path should be a pathname, optionally prefixed with absolute or relative paths, and url should be a URL string, which may consist of multiple TAB-separated URLs pointing to the same file. A valid path_urls, for example, could be [('/opt/files/bar.tar.bz2', 'https://foo.cc/bar.tar.bz2'), ('./sanguoshuowen.pdf', 'https://bar.cc/sanguoshuowen.pdf\thttps://foo.cc/sanguoshuowen.pdf'), ('/to/be/created/', 'https://flash.jiefang.rmy/lc-cl/gaozhuang/chelsia/rockspeaker.tar.gz'), ('/path/to/existing-dir', 'https://ghosthat.bar/foo/puretonecone81.xz\thttps://tpot.horn/foo/pure tonecone81.xz\thttps://hawkhill.bar/foo/puretonecone81.xz')].

Returns

None.

Raises

BDownloaderException – Raised when the downloads were interrupted, e.g. by calling cancel() in a SIGINT signal handler, in the process of submitting the download requests.

Notes

The method is not thread-safe, which means it should not be called at the same time in multiple threads with one instance.

When multi-instanced (e.g. one instance per thread), the file paths specified in one instance should not overlap those in another to avoid potential race conditions. File loss may occur, for example, if a failed download task in one instance tries to delete a directory that is being accessed by some download tasks in other instances. However, this limitation doesn't apply to the file paths specified within the same instance.
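A typical call sequence, pieced together from the method signatures documented in this module, might look like the sketch below. The helper names make_path_url and run_downloads are hypothetical, the constructor arguments are arbitrary choices for illustration, and the order of calls (downloads, wait_for_all, close) is an assumption about intended usage.

```python
def make_path_url(path, *urls):
    """Pair a path with one or more mirror URLs joined by TAB characters."""
    return (path, "\t".join(urls))


def run_downloads(path_urls):
    """Submit the jobs, wait for them, and always release the downloader."""
    # Imported lazily so make_path_url works even without bdownload installed.
    from bdownload.download import BDownloader

    downloader = BDownloader(max_workers=4, progress="none")
    try:
        downloader.downloads(path_urls)
        succeeded, failed = downloader.wait_for_all()
    finally:
        downloader.close()
    return succeeded, failed


path_urls = [
    make_path_url("/opt/files/bar.tar.bz2", "https://foo.cc/bar.tar.bz2"),
    make_path_url(
        "./sanguoshuowen.pdf",
        "https://bar.cc/sanguoshuowen.pdf",
        "https://foo.cc/sanguoshuowen.pdf",
    ),
]
```

Calling run_downloads(path_urls) from a single thread respects the thread-safety note above.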

static list_split(li, chunk_size=5)[source]

Break a list into chunks.

Parameters
  • li (list) – The list to split.

  • chunk_size (int) – The size of the resultant chunk list.

Yields

list – The next chunk of the split list li.
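A generator with this behavior can be written in a few lines. The following is a sketch of the general technique rather than the package's own code, and split_list is a hypothetical name.

```python
def split_list(li, chunk_size=5):
    """Yield successive slices of li, each at most chunk_size items long."""
    for i in range(0, len(li), chunk_size):
        yield li[i:i + chunk_size]


chunks = list(split_list(list(range(7)), chunk_size=3))
```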

raise_on_interrupted()[source]

Raise a customized exception signaling that the downloads have been terminated by the user.

Raises

BDownloaderException – Raised when the termination or cancellation flag has been set.

result()[source]

Return the final download status.

Returns

0 for success, and -1 for failure.

Return type

int

results()[source]

Get both the succeeded and failed downloads when all done or interrupted by user.

Returns

Same as that returned by wait_for_all().

Return type

tuple of list

wait_for_all()[source]

Wait for all the downloading jobs to complete.

Returns

A 2-tuple of lists '(succeeded, failed)'. The first list succeeded contains the originally passed (path, url)s that finished successfully, while the second list failed contains the raised and cancelled ones.

Return type

tuple of list

exception bdownload.download.BDownloaderException[source]

Bases: Exception

The exception indicating that an error occurred while executing the download tasks.

bdownload.download.COOKIE_STR_REGEX = re.compile('\\s*(?:[^,; =]+=[^,; ]+\\s*(?:$|\\s+|;\\s*))+\\s*')

A compiled regular expression object used to match the cookie string in the form of key/value pairs.

See also BDownloader.__init__() for more details about cookies.
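To show what the pattern accepts, the sketch below re-declares the expression as documented above and uses it to validate a cookie string before splitting it into a dict. Whether the package applies the pattern with a full match, and how it builds the cookies afterwards, is not stated here; parse_cookie_string is a hypothetical helper.

```python
import re

# Same pattern as documented for bdownload.download.COOKIE_STR_REGEX.
COOKIE_STR_REGEX = re.compile(r'\s*(?:[^,; =]+=[^,; ]+\s*(?:$|\s+|;\s*))+\s*')


def parse_cookie_string(cookies):
    """Validate a 'key=value' cookie string and return its pairs as a dict."""
    # Assumption: the whole string must match the pattern.
    if COOKIE_STR_REGEX.fullmatch(cookies) is None:
        raise ValueError("invalid cookie string: %r" % cookies)
    # Pairs are separated by whitespace and/or semicolons.
    pairs = [p for p in re.split(r'[;\s]+', cookies) if p]
    return dict(p.split('=', 1) for p in pairs)
```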

class bdownload.download.MillProgress(label='', hide=None, expected_size=None, every=1, eta_tag='eta:', elapsed_tag='elapsed:')[source]

Bases: object

Print a mill while progressing.

This class is adapted from clint.textui.progress, with added support for unknown expected_size.

ETA_INTERVAL = 1
ETA_SMA_WINDOW = 9
MILL_CHARS = ['|', '/', '-', '\\']
MILL_TEMPLATE = '{}  {}  {:,d}/{:<}  {}  {} {}\r'
NULL_EXPECTED_DISP = '--'
NULL_EXPECTED_WIDTH = 2
STREAM = <clint.packages.colorama.ansitowin32.StreamWrapper object>
done()[source]
static format_time(seconds)[source]
mill_char(progress)[source]
show(progress, count=None)[source]
bdownload.download.PICKLE_PROTOCOL_NUMBER = 2

The highest pickle protocol number valid for both Python 2.x and Python 3.x.

Type

int

bdownload.download.REQUESTS_EXTENDED_RETRIES_FACTOR = 3

Default number of retries factor for _requests_extended_retries_factor.

Type

int

bdownload.download.REQUESTS_RETRIES_ON_STREAM_EXCEPTION = 10

Default number of retries on exceptions raised while streaming the request content.

Type

int

bdownload.download.RETRY_BACKOFF_FACTOR = 0.1

Default retry backoff factor.

Type

float

class bdownload.download.RequestsSessionWrapper(timeout=None, proxy=None, cookies=None, user_agent=None, referrer=None, verify=True, cert=None, requester_cb=None)[source]

Bases: requests.sessions.Session

Subclass of the requests.Session class with extended retry-on-exception behavior for the get method.

Note

The retry mechanism here is independent of that built into urllib3 (see _requests_extended_retries_factor and retry_requests()). That is, the decorated retry attempts are triggered whenever the get method raises a requests.RequestException or returns a bad status code, regardless of whether the builtin Retry of urllib3 is enabled. Nevertheless, the two mechanisms together determine the total number of retries. See requests_retry_session() for more information about how they cooperate.

TIMEOUT = (3.05, 6)

Default timeouts: the connect timeout value defaults to 3.05 seconds, and the read timeout to 6 seconds.

Type

tuple

__init__(timeout=None, proxy=None, cookies=None, user_agent=None, referrer=None, verify=True, cert=None, requester_cb=None)[source]

Initialize the Session instance.

The HTTP header User-Agent of the session is set to a default value of bdownload/VERSION, if not provided, with VERSION being replaced by the package’s version number.

Parameters
  • timeout (float or 2-tuple of float) – Timeout value(s) as a float or (connect, read) tuple for both the connect and the read timeouts, respectively. If set to None, 0 or (), whether the whole or any item thereof, it will take a default value from TIMEOUT, accordingly.

  • proxy (str) – Same as for BDownloader.__init__().

  • cookies (str, dict or CookieJar) – Same as for BDownloader.__init__().

  • user_agent (str) – Same as for BDownloader.__init__().

  • referrer (str) – Same as for BDownloader.__init__().

  • verify (bool or str) – Same as for requests.request().

  • cert (str or tuple) – Same as for requests.request().

  • requester_cb (func) – The callback function provided by the downloader that uses the instantiated session object as the HTTP(S) requester. It will get called when making an HTTP GET request.
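The fallback rule described for the timeout parameter can be sketched as follows. This is an interpretation of the wording above, not the package's code: normalize_timeout and DEFAULT_TIMEOUT are hypothetical names, and applying a single number to both the connect and read timeouts is an assumption.

```python
DEFAULT_TIMEOUT = (3.05, 6)  # mirrors RequestsSessionWrapper.TIMEOUT


def normalize_timeout(timeout):
    """Return a (connect, read) tuple, filling empty parts from the defaults."""
    if not timeout:  # None, 0 or ()
        return DEFAULT_TIMEOUT
    if isinstance(timeout, (int, float)):
        # Assumption: a single number applies to both timeouts.
        return (timeout, timeout)
    connect, read = timeout
    return (connect or DEFAULT_TIMEOUT[0], read or DEFAULT_TIMEOUT[1])
```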

static _build_cookiejar_from_kvp(key_values)[source]

Build a CookieJar from cookies in the form of key/value pairs.

Parameters

key_values (str) – The cookies must take the form of 'cookie_key=cookie_value', with multiple pairs separated by whitespace and/or semicolon if applicable, e.g. 'key1=val1 key2=val2; key3=val3'.

Returns

The built CookieJar for requests sessions.

Return type

requests.cookies.RequestsCookieJar

Raises

ValueError – Raised when the cookies string key_values is not in valid format.

get(url, **kwargs)[source]

Wrapper around requests.Session’s get method decorated with the retry_requests() decorator.

Parameters
  • url – URL for the file to download from.

  • **kwargs – Same arguments as that requests.Session.get takes.

Returns

The response to the HTTP GET request.

Return type

requests.Response

Raises
  • BDownloaderException – Raised when the termination or cancellation flag has been set, for example, if RequestsSessionWrapper.requester_cb is initialized to BDownloader.raise_on_interrupted().

  • ExceptionByRequesterCB – Same exception(s) as that raised by RequestsSessionWrapper.requester_cb, if any.

bdownload.download.URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION = 3

Default number of retries on exception set through urllib3’s Retry mechanism.

Type

int

bdownload.download.URLLIB3_RETRY_STATUS_CODES = frozenset({413, 429, 500, 502, 503, 504})

Default status codes to retry on intended for the underlying urllib3.

Type

set

bdownload.download._cpu_count()[source]

A simple wrapper around the cpu_count() for escaping the NotImplementedError.

Returns

The number of CPUs in the system. Return None if not obtained.

bdownload.download._requests_extended_retries_factor = 3

Number of retries that complements and extends the builtin Retry mechanism of urllib3.

This global variable is meant for the decorator retry_requests(), and its value can be modified through the module level function set_requests_retries_factor(). It is initialized to REQUESTS_EXTENDED_RETRIES_FACTOR by default, and usually you don’t want to change it.

Together with urllib3’s builtin retry logic, they determine the total number of the retries on exceptions and bad status codes at requests for downloading. For more details on the retry mechanisms, see requests_retry_session().

Notes

Don’t mix these two retry mechanisms up with the retries at failed connections while streaming the request content.

Type

int

bdownload.download.requests_retry_session(builtin_retries=None, backoff_factor=0.1, status_forcelist=None, session=None, num_pools=20, pool_maxsize=20, **kwargs)[source]

Create a session object of the class RequestsSessionWrapper by default.

Aside from the retry mechanism implemented by the wrapper decorator, the created session also leverages the built-in retries bound to urllib3. When both of them are enabled, they cooperate to determine the total retry attempts. The worst-case retries is determined using the following formula:

builtin_retries * (_requests_extended_retries_factor + 1)

which applies to all the exceptions and those status codes that fall into the status_forcelist. For other status codes, the maximum retries shall be _requests_extended_retries_factor.
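For a concrete sense of the bound, the arithmetic can be worked through with the module's documented defaults (both 3). The function worst_case_retries is a hypothetical helper that only evaluates the formula above.

```python
# Defaults as documented for this module.
URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION = 3
REQUESTS_EXTENDED_RETRIES_FACTOR = 3


def worst_case_retries(builtin_retries=URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION,
                       extended_factor=REQUESTS_EXTENDED_RETRIES_FACTOR):
    """Evaluate builtin_retries * (_requests_extended_retries_factor + 1)."""
    return builtin_retries * (extended_factor + 1)


default_bound = worst_case_retries()  # 3 * (3 + 1)
```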

Parameters
  • builtin_retries (int) – Maximum number of retry attempts allowed on errors and on the status codes of interest, which will apply to the retry logic of the underlying urllib3. If set to None or 0, it will default to URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION.

  • backoff_factor (float) – The backoff factor to apply between retries.

  • status_forcelist (set of int) – A set of HTTP status codes that a retry should be enforced on. The default status forcelist shall be URLLIB3_RETRY_STATUS_CODES if not given.

  • session (requests.Session) – An instance of the class requests.Session or its customized subclass. When not provided, it will use RequestsSessionWrapper to create by default.

  • num_pools (int) – The number of connection pools to cache, which has the same meaning as num_pools in urllib3.PoolManager and will eventually be passed to it.

  • pool_maxsize (int) – The maximum number of connections to save that can be reused in the urllib3 connection pool, which will be passed to the underlying requests.adapters.HTTPAdapter.

  • **kwargs – Same arguments as that RequestsSessionWrapper.__init__() takes.

Returns

The session instance with retry capability.

Return type

requests.Session

bdownload.download.retry_requests(exceptions, backoff_factor=0.1, logger=None)[source]

A decorator that retries calling the wrapped requests’ function using an exponential backoff on exception.

The retry attempt will be activated in the event of exceptions being caught and for all the bad status codes (i.e. codes ranging from 400 to 600).

Parameters
Returns

The wrapper function.

Raises

exceptions – Re-raise the last caught exception when the retries are exhausted.

Notes

This function has an external dependency on the global variable _requests_extended_retries_factor, whose value can be changed through the function set_requests_retries_factor(). Also, it should be greater than 0, thus allowing the decorated method to retry at least once to cover the edge cases of exceptions and bad status codes.
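The general shape of a retry decorator with exponential backoff is sketched below. It is deliberately simplified and is not the package's implementation: it retries on exceptions only (the real decorator also reacts to bad status codes), it takes the retry count as an explicit retries argument instead of reading the global factor, and the name retry_on_exception is hypothetical.

```python
import functools
import time


def retry_on_exception(exceptions, retries=3, backoff_factor=0.1):
    """Retry the wrapped callable on the given exceptions, backing off exponentially."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == retries:
                        raise  # retries exhausted: re-raise the last exception
                    time.sleep(backoff_factor * (2 ** attempt))
        return wrapper
    return decorator


attempts = []


@retry_on_exception(ValueError, retries=3, backoff_factor=0)
def flaky():
    """Fail twice, then succeed, to exercise the retry path."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError("transient failure")
    return "ok"


result = flaky()
```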

bdownload.download.set_requests_retries_factor(retries)[source]

Set the retries factor for the decorator retry_requests().

Parameters

retries (int) – Number of retries when a decorated method of requests raised an exception or returned any bad status code. It should take a value of at least 1, or else nothing changes.

Returns

None.

bdownload.download.unquote_unicode(string)[source]

Unquote a percent-encoded string.

Parameters

string (str) – A %xx- and %uxxxx- encoded string.

Returns

The unquoted unicode string.

Return type

str
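Decoding a string that mixes standard %xx escapes with the non-standard %uxxxx form can be approached in two passes, as in the sketch below. This illustrates the idea only and may differ from the package's function in edge cases; unquote_percent_u is a hypothetical name and the code targets Python 3.

```python
import re
from urllib.parse import unquote


def unquote_percent_u(string):
    """Decode %uXXXX escapes first, then ordinary %XX escapes (as UTF-8)."""
    expanded = re.sub(
        r'%u([0-9a-fA-F]{4})',
        lambda m: chr(int(m.group(1), 16)),
        string,
    )
    return unquote(expanded)
```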

bdownload.cli module

This module provides the entry point main for the command line utility bdownload.

bdownload.cli._cmd_quit_handler(bdownloader, signum, frame)[source]

The handler for the signals SIGTERM, SIGABRT, SIGHUP and SIGBREAK.

Parameters
  • bdownloader (BDownloader) – The BDownloader instance acting as the file downloader.

  • signum – The signal number being one of the possible values as signal.SIGTERM, signal.SIGABRT, signal.SIGHUP, or signal.SIGBREAK.

  • frame – The current stack frame when the signal is received.

bdownload.cli._dec_raw_tab_separated_urls(url)[source]

Decode a raw URL string that may consist of multiple escaped TAB-separated URLs.

Parameters

url (str) – URL for the file to be downloaded, which might be TAB-separated composite URL pointing to the same file.

Returns

Decoded URL.

Return type

str

Raises

ArgumentTypeError – Raised when url contains URL(s) that don’t conform to the format “http[s]://[user:pass@]foo.bar[*]”.

Examples

Examples of the parameter url include:
  • 'https://fakewebsite-01.com/downloads/soulbody4ct.pdf\thttps://fakewebsite-02.com/archives/soulbody4ct.pdf'

  • 'https://fakewebsite-01.com/downloads/ipcress.docx      https://fakewebsite-02.com/archives/ipcress.docx'

  • 'https://tianchengren:öp€nsasimi@i.louder.ss\thttps://fangxun.xiaoqing.sunmoon.xue'

bdownload.cli._interrupt_handler(bdownloader, signum, frame)[source]

The handler for the signals SIGINT and SIGQUIT.

Parameters
  • bdownloader (BDownloader) – The BDownloader instance acting as the file downloader.

  • signum – The signal number being either signal.SIGINT or signal.SIGQUIT.

  • frame – The current stack frame when the signal is received.

bdownload.cli._load_cookies(cookies)[source]

Load cookie(s) either from a Netscape cookie file or a string.

Parameters

cookies (str) –

Cookies either in the form of a string (maybe whitespace- and/or semicolon- separated) like “cookie_key=cookie_value cookie_key2=cookie_value2; cookie_key3=cookie_value3”, or a file, e.g. named “cookies.txt”, in the Netscape cookie file format.

Note

The option -D DIR does not apply to the cookie file.

Returns

A CookieJar or a validated cookies string.

Return type

cookielib.MozillaCookieJar or str

Raises

ArgumentTypeError – Raised when an exception occurs while loading the cookies file or when the cookies string is not in a valid format.

bdownload.cli._normalize_bytes_num(bytes_num)[source]

Normalize and convert the integer number string expressed in the unit Byte.

Parameters

bytes_num (str) – The integer number string that may be suffixed with a quantity of ‘K’ or ‘M’, where ‘K’ indicates multiples of 1024 and ‘M’ means multiples of 1024*1024.

Returns

Normalized integer number.

Return type

int

Raises

ArgumentTypeError – Raised when passed bytes_num is neither a normal integer decimal number string nor a suffixed one.
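The conversion described above can be sketched as an argparse type function. This is an illustration, not the CLI's code: normalize_bytes_num is a hypothetical stand-in, and accepting only the upper-case suffixes 'K' and 'M' is an assumption based on the wording above.

```python
import argparse
import re


def normalize_bytes_num(bytes_num):
    """Convert strings such as '512', '100K' or '1M' to a number of bytes."""
    match = re.fullmatch(r'(\d+)([KM]?)', bytes_num.strip())
    if match is None:
        raise argparse.ArgumentTypeError("invalid byte count: %r" % bytes_num)
    number, suffix = match.groups()
    multiplier = {'': 1, 'K': 1024, 'M': 1024 * 1024}[suffix]
    return int(number) * multiplier
```

With this convention, '100K' and '1M' correspond to the documented defaults of chunk_size and min_split_size respectively.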

bdownload.cli._win32_utf8_argv()[source]

Use kernel32.GetCommandLineW and shell32.CommandLineToArgvW to get sys.argv as a list of UTF-8 strings.

Versions 2.5 and older of Python don’t support Unicode (“mon€y röcks” for example) in sys.argv on Windows, with the underlying Windows API instead replacing multi-byte characters with ‘?’.

Returns

Command-line arguments. A list of utf-8 strings for success, None on failure.

Return type

list of str

bdownload.cli.ignore_termination_signals()[source]

Cause the process not to respond to termination signals.

bdownload.cli.install_signal_handlers(bdownloader)[source]

Install handlers for termination signals.

Parameters

bdownloader (BDownloader) – The BDownloader instance acting as the file downloader.
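Because the handlers in this module take the downloader as their first argument, they need to be bound to an instance before being registered with the signal module. One way to do this, sketched under the assumption that the handler simply calls cancel(), uses functools.partial. The names interrupt_handler and install_handlers are hypothetical and cover SIGINT only.

```python
import functools
import signal


def interrupt_handler(downloader, signum, frame):
    """Cancel all download jobs when the interrupt signal arrives."""
    downloader.cancel(keyboard_interrupt=True)


def install_handlers(downloader):
    """Bind the downloader to the handler and register it for SIGINT."""
    handler = functools.partial(interrupt_handler, downloader)
    signal.signal(signal.SIGINT, handler)  # must be called from the main thread
    return handler
```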

bdownload.cli.main()[source]

Collect the command-line arguments from sys.argv, parse and do the downloading as specified.