Metadata-Version: 2.1
Name: MachineLearningComparisonPipeline
Version: 0.2.4
Summary: Pipeline for analysis of the machine learning applications in Sci-Kit Learn
Author-email: "Dr. Frank Mobley" <frank.mobley.1@afrl.af.mil>, Gregory Bowers <gregory.bowers.ctr@us.af.mil>
License: MIT
Project-URL: Homepage, https://gitlab.com/python-audio-feature-extraction/machine-learning-pipeline
Keywords: machine learning,feature extraction,audio
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.21.5
Requires-Dist: pandas>=1.4.3
Requires-Dist: scipy>=1.9.1
Requires-Dist: statsmodels>=0.13.2
Requires-Dist: scikit-learn>=1.1.0
Requires-Dist: salib>=1.1.1
Requires-Dist: tqdm>=4.50.0
Requires-Dist: json5>=0.9.0
Requires-Dist: bottleneck>=1.3.6

![Pipeline Logo](designer.png " ") 
# machine-learning-comparison-pipeline

Researchers who classify data with machine learning often default to the learner they are most comfortable with. That
does not mean the best learner was selected for the research question. Feature selection is often performed as well,
but with only minimal processing and little variation in the selection process.

During the analysis of a series of acoustic measurements from candidate propellers designed by the United States Air
Force Academy, the 711th Human Performance Wing set out to avoid these limitations. The wing developed an importance
getter function that uses sensitivity analysis to determine feature importance. This method was applied to random
decision forests, support vector machines, neural networks, logistic regressions, and nearest neighbor machine
learners.
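As a rough illustration of the idea, the sketch below scores feature importance for a learner that exposes neither `coef_` nor `feature_importances_`. It perturbs one feature at a time and measures how far the predicted class probabilities move. This is not the wing's implementation: the function name `oat_sensitivity_importance`, the one-standard-deviation step, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def oat_sensitivity_importance(estimator, X, scale=1.0):
    """One-at-a-time sensitivity: mean absolute change in predicted
    probability when a single feature is shifted by `scale` standard deviations.
    (Hypothetical helper, not part of this package.)"""
    X = np.asarray(X, dtype=float)
    baseline = estimator.predict_proba(X)
    step = scale * X.std(axis=0)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        shifted = X.copy()
        shifted[:, j] += step[j]  # perturb only feature j
        importance[j] = np.abs(estimator.predict_proba(shifted) - baseline).mean()
    return importance


# Synthetic data (assumption): only column 0 carries class information.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 4))
X[:, 0] += 4.0 * y

clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
importance = oat_sensitivity_importance(clf, X)
ranking = np.argsort(importance)[::-1]  # most sensitive feature first
print(ranking, importance)
```

Because the score relies only on `predict_proba`, the same function can be applied unchanged to any of the learner families listed above that provide that method.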

This package was developed from that research in an effort to canonize the process for future work.

# Usage
## Define the inputs to the class: the feature DataFrame, the targets Series, the learners, and the cross-validation splitter

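    import pathlib
    import pandas as pd
    from sklearn import model_selection as ms
    from sklearn import neighbors as nn
    # `pipeline` is assumed to be this package's module that defines ProcessingPipeline

    # Two nearest-neighbor learners: uniform and distance-weighted voting
    clf1 = nn.KNeighborsClassifier(n_neighbors=5)
    clf2 = nn.KNeighborsClassifier(n_neighbors=5, weights='distance')
    learners = [clf1, clf2]
    cv = ms.KFold(n_splits=10)  # 10-fold cross-validation

    # Read features.csv from the data folder one level above this file's directory
    dataset = pd.read_csv(pathlib.Path(__file__).parents[1] / 'data' / 'features.csv')
    features = dataset.iloc[:, 1:74]  # columns 1 through 73
    targets = dataset['PROPELLER']
    pipe = pipeline.ProcessingPipeline(learners, cv, features, targets)

    pipe.process(72, verbose=True)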
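The internals of `ProcessingPipeline` are not shown here. As a minimal sketch of the kind of comparison those inputs set up, the example below scores the same two learners on identical folds using plain scikit-learn. The synthetic features, the shuffled `KFold`, and the use of `cross_val_score` are assumptions for illustration and do not describe what `process` does.

```python
import numpy as np
from sklearn import model_selection as ms
from sklearn import neighbors as nn

# Synthetic stand-in for the feature table: 200 samples, 4 features, 2 classes.
rng = np.random.default_rng(0)
targets = np.repeat([0, 1], 100)
features = rng.normal(size=(200, 4))
features[:, 0] += 4.0 * targets  # only column 0 separates the classes

learners = {
    "uniform": nn.KNeighborsClassifier(n_neighbors=5),
    "distance": nn.KNeighborsClassifier(n_neighbors=5, weights="distance"),
}
# Shuffle because the synthetic rows are sorted by class.
cv = ms.KFold(n_splits=10, shuffle=True, random_state=0)

# Score every learner on the same folds so the accuracies are comparable.
scores = {
    name: ms.cross_val_score(clf, features, targets, cv=cv)
    for name, clf in learners.items()
}
for name, fold_scores in scores.items():
    print(f"{name}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")
```

Passing one fixed splitter to every learner keeps the fold assignments identical, so differences in the scores reflect the learners and not the data partition.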
	
Cleared for public release on 14 November 2024 with case number AFRL-2024-6348.
