Module: collector.py
- Purpose:
This module provides file collection functionality to the project.
Specifically, this module is called by badsnakes.badsnakes.BadSnakes.main to populate the ‘files list’, which holds all files to be analysed.
The CLI argument PATH is passed into this module, which then traverses the list of files or the directory, or extracts the wheel, to determine which files should be analysed. These files are passed back to the caller via the files property.
- Platform:
Linux/Windows | Python 3.10+
- Developer:
J Berendt
- Email:
- Comments:
n/a
- Examples:
Collect plain-text files from a given directory:
>>> from badsnakes.libs.collector import Collector
>>> c = Collector(paths=['/path/to/files'])
>>> c.collect()
>>> c.files
[['/path/to/files/project.py', '/path/to/files/script.sh']]
Collect plain-text files from a Python wheel:
>>> from badsnakes.libs.collector import Collector
>>> c = Collector(paths=['/path/to/project-0.7.3-py3-none-any.whl'])
>>> c.collect()
>>> c.files
[['/tmp/tmpqnm6yka2/project/module00.py',
  '/tmp/tmpqnm6yka2/project/module01.py',
  '/tmp/tmpqnm6yka2/project/module02.py',
  ...,
  '/tmp/tmpqnm6yka2/project/script.sh',
  '/tmp/tmpqnm6yka2/project/file.txt',
  ...,
  '/tmp/tmpqnm6yka2/project/module08.py',
  '/tmp/tmpqnm6yka2/project/module09.py',
  '/tmp/tmpqnm6yka2/project/module10.py']]
- exception badsnakes.libs.collector.MixedTypesError[source]
Bases: Exception
Custom error class raised for mixed PATH type errors.
- add_note()
Exception.add_note(note) – add a note to the exception
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class badsnakes.libs.collector._CollectorBase(path: str)[source]
Bases: object
Private base class providing file collection functionality.
- Parameters:
path (str) – Full path to the module, directory or wheel for collection.
- property files: list
Accessor to the list of collected files.
- class badsnakes.libs.collector._CollectorDirectory(path: str)[source]
Bases: _CollectorBase
Collect all files for analysis from the given directory.
This private class is not part of the public interface. Please call the Collector class instead.
- collect(path: str = None)[source]
Collect all files for this class-type.
- Parameters:
path (str, optional) – Directory path. This argument was originally implemented for use by _CollectorWheel to enable directory traversal using existing logic. Defaults to None.
- Logic:
Using glob.glob recursively, all files (including hidden files) are collected.
Next, using filter, remove any files which match the exclusion pattern and are not plain-text. See Tip below.
Map os.path.realpath to all files to expand the filepaths.
Tip
The excluded directories are maintained by the list in config.toml under the system.exclude_dirs key.
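The three steps described under Logic can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `EXCLUDE_DIRS`, `is_plain_text` and `collect_directory` are hypothetical names, the NUL-byte check is an assumed plain-text heuristic, and the filter is read as dropping anything that is excluded or not plain-text.

```python
import glob
import os
import sys

# Hypothetical stand-in for the system.exclude_dirs list in config.toml.
EXCLUDE_DIRS = ['.git', '__pycache__']


def is_plain_text(path: str, blocksize: int = 1024) -> bool:
    """Crude heuristic (an assumption): text files contain no NUL bytes."""
    with open(path, 'rb') as f:
        return b'\x00' not in f.read(blocksize)


def collect_directory(path: str, exclude_dirs: list = EXCLUDE_DIRS) -> list:
    """Return the real paths of all non-excluded, plain-text files."""
    # include_hidden was added to glob.glob in Python 3.11.
    kwargs = {'include_hidden': True} if sys.version_info >= (3, 11) else {}
    # Step 1: recursive glob over the whole tree.
    found = glob.glob(os.path.join(path, '**', '*'), recursive=True, **kwargs)

    def keep(p: str) -> bool:
        # Compare directory names relative to the search root only.
        parts = os.path.relpath(p, path).split(os.sep)
        return (os.path.isfile(p)
                and not any(d in parts for d in exclude_dirs)
                and is_plain_text(p))

    # Steps 2 and 3: filter, then expand each path with os.path.realpath.
    return list(map(os.path.realpath, filter(keep, found)))
```

On Python 3.10 the sketch falls back to a glob without `include_hidden`, so hidden files are only picked up on 3.11 and later.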
- property files: list
Accessor to the list of collected files.
- class badsnakes.libs.collector._CollectorWheel(path: str)[source]
Bases: _CollectorBase
Collect all files for analysis from a Python wheel.
This private class is not part of the public interface. Please call the Collector class instead.
- Parameters:
path (str) – Full path to the wheel file.
- property tmpdir: TemporaryDirectory
Accessor to the temporary directory object.
- collect()[source]
Unzip a wheel file and collect files.
- Logic:
Create a temporary directory object (using tempfile).
Using zipfile, unzip the wheel into the temporary directory.
Create an instance of the _CollectorDirectory class and pass the path to the temp directory into the class for file collection.
Store the list of collected files into the _files attribute.
- Temp Directory:
The tempfile.TemporaryDirectory object created by this method is not explicitly closed, as the directory must exist for analysing the files. Therefore, the temp directory is removed when the tmpdir object has been destroyed, generally on program completion.
For this reason, the object must be kept ‘alive’ in the class instance, and therefore cannot be a local variable. To keep the object alive, the class’ instance of the temp directory object is appended to a list in the parent class.
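The lifetime concern described above can be illustrated with a small sketch. It assumes a hypothetical `WheelSketch` class with a class-level `_tmpdirs` list; the real `_CollectorWheel` may store the reference differently, and the `os.walk` loop merely stands in for the hand-off to `_CollectorDirectory`.

```python
import os
import tempfile
import zipfile


class WheelSketch:
    """Hypothetical illustration; not the project's _CollectorWheel."""

    # Class-level list holding references so the directories stay alive.
    _tmpdirs: list = []

    def __init__(self, path: str):
        self._path = path
        self._tmpdir = None
        self._files = []

    @property
    def files(self) -> list:
        return self._files

    @property
    def tmpdir(self):
        return self._tmpdir

    def collect(self) -> None:
        # Not used as a context manager: the directory must outlive this call.
        self._tmpdir = tempfile.TemporaryDirectory()
        WheelSketch._tmpdirs.append(self._tmpdir)
        with zipfile.ZipFile(self._path) as z:
            z.extractall(self._tmpdir.name)
        # Stand-in for handing the directory to _CollectorDirectory.
        for root, _, names in os.walk(self._tmpdir.name):
            self._files.extend(os.path.join(root, n) for n in names)
```

Had the `TemporaryDirectory` been a local variable inside `collect()`, it could be finalised (and the extracted files deleted) as soon as the method returned, before any analysis took place.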
- property files: list
Accessor to the list of collected files.
- class badsnakes.libs.collector.Collector(paths: list)[source]
Bases: object
Primary file collection interface class.
- Parameters:
paths (list) – A list of file paths or directories from the argument parser.
Note
On instantiation, all elements in the paths list argument are expanded to their realpath and tested to ensure they exist.
- property files: list
Accessor to the list of Python files to be analysed.
Note
This property is a list of lists.
Each outer list represents a wheel or a directory, with each inner list representing the files contained therein.
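Because the property is nested, a caller wanting a single flat sequence has to flatten it. A minimal sketch, using a hypothetical `flatten` helper and sample paths in the style of the examples at the top of this page:

```python
from itertools import chain


def flatten(groups: list) -> list:
    """Flatten a list of lists of file paths into one list (hypothetical helper)."""
    return list(chain.from_iterable(groups))


# Sample data shaped like the files property: one inner list per path.
files = [['/path/to/files/project.py', '/path/to/files/script.sh'],
         ['/path/to/other/module.py']]

for path in flatten(files):
    print(path)
```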
- collect()[source]
Collect files for analysis from the provided paths.
- Criteria:
Using the private _identify() method, the file collection is routed to the appropriate file collector based on the type of path provided to the paths argument on instantiation.
Directory: All paths in the _paths attribute must be directories.
Module: All paths in the _paths attribute must be plain-text files.
Wheel: All paths in the _paths attribute must be Python wheels, or zip files.
Only files of the same type (directory, module or wheel) can be collected at the same time, otherwise a MixedTypesError is raised.
- Raises:
MixedTypesError – Raised if the _paths attribute contains a mix of the types listed above.
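The routing rule can be sketched as below. This is an assumption-laden illustration: `identify` and `route` are hypothetical free functions, the local `MixedTypesError` is a stand-in for the class documented on this page, the order of the tests is a guess, and `os.path.isfile` substitutes for the project's plain-text test.

```python
import os
import zipfile


class MixedTypesError(Exception):
    """Local stand-in for badsnakes.libs.collector.MixedTypesError."""


def identify(paths: list) -> str:
    """Classify the paths as 'dir', 'wheel', 'modules' or 'invalid'."""
    if all(map(os.path.isdir, paths)):
        return 'dir'
    if all(map(zipfile.is_zipfile, paths)):
        return 'wheel'
    if all(map(os.path.isfile, paths)):
        return 'modules'
    return 'invalid'


def route(paths: list) -> str:
    """Return the collection type, or raise if the path types are mixed."""
    kind = identify(paths)
    if kind == 'invalid':
        raise MixedTypesError('All paths must be of the same type.')
    return kind
```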
- _checks()[source]
Perform pre-collection checks.
- Checks:
All files exist.
- Raises:
FileNotFoundError – Raised if any file in paths does not exist.
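A pre-collection check of this kind might look like the following sketch. `check_paths` is a hypothetical name, and whether the project reports one missing path or all of them is not stated here.

```python
import os


def check_paths(paths: list) -> list:
    """Expand each path to its realpath and verify that it exists."""
    resolved = [os.path.realpath(p) for p in paths]
    missing = [p for p in resolved if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError(f'Path(s) not found: {missing}')
    return resolved
```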
- _collect_from_directory()[source]
Collect all plain-text files from a directory.
Before this method is called, all paths are tested to ensure they are directories.
- _collect_from_files()[source]
Collect all plain-text files.
As the realpath conversion and file exists check have already been performed, this method can simply append the _paths argument to _files, for the caller’s use.
- _collect_from_wheel()[source]
Collect all plain-text files from wheels.
Before this method is called, all paths are tested to ensure they are wheels (or .zip files).
- _identify() → str[source]
Identify the type of collection to take place.
- Returns:
One of the following strings is returned, based on the content of the paths argument:
Directory: ‘dir’
Python modules: ‘modules’
Wheel: ‘wheel’
Anything else: ‘invalid’
- Return type:
str
- _isdir() → bool[source]
Test if all elements of paths are directories.
- Returns:
True if all paths are directories, otherwise False.
- Return type:
bool
- _istext() → bool[source]
Test if all elements of paths are plain-text files.
- Returns:
True if all elements of paths are plain-text files, otherwise False.
- Return type:
bool
- _iswheel() → bool[source]
Test if all elements of paths are Python wheels.
Note
A file is tested as a wheel by checking the first four bytes of the file itself, not using the file extension. As such a .zip file will pass this test as well.
- Returns:
True if all elements of paths are Python wheels (or ZIP archives), otherwise False.
- Return type:
bool
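A signature-based test like the one described in the Note can be sketched as follows. The four-byte value used is the standard ZIP local file header signature; that the project compares against exactly this value is an assumption, and `is_zip_like` and `all_wheels` are hypothetical names.

```python
# Standard ZIP local file header signature (wheels are ZIP archives).
ZIP_MAGIC = b'PK\x03\x04'


def is_zip_like(path: str) -> bool:
    """Return True if the file starts with the ZIP signature."""
    with open(path, 'rb') as f:
        return f.read(4) == ZIP_MAGIC


def all_wheels(paths: list) -> bool:
    """Return True if every path passes the signature test."""
    return all(map(is_zip_like, paths))
```

Because only the leading bytes are inspected, the file extension plays no part: a `.zip` archive passes, while a text file renamed to `.whl` does not.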