Metadata-Version: 2.0
Name: alpha-factory
Version: 0.3.3
Summary: generate alpha factors
Home-page: UNKNOWN
Author: Yili Peng
Author-email: yili_peng@outlook.com
License: UNKNOWN
Platform: UNKNOWN
Requires-Dist: RNWS (>=0.2.1)
Requires-Dist: alphalens
Requires-Dist: data-box
Requires-Dist: empyrical
Requires-Dist: single-factor-model (>=0.3.2)

This package automatically generates alpha factors and filters out the
relatively good ones with back-testing methods. Time-consuming parts
are optimized with ``numba``.
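
As an illustration of the kind of speed-up ``numba`` offers, the sketch
below compiles a simple rolling mean over a price array. It is not taken
from this package: the function name ``rolling_mean`` is hypothetical,
and the fallback decorator is only there so the snippet still runs when
``numba`` is not installed.

.. code:: python

   import numpy as np

   try:
       from numba import njit
   except ImportError:
       # Fallback (assumption for this sketch): run as plain Python
       def njit(func):
           return func

   @njit
   def rolling_mean(values, window):
       # Running-sum rolling mean; the first window-1 entries stay NaN
       n = values.shape[0]
       out = np.full(n, np.nan)
       running = 0.0
       for i in range(n):
           running += values[i]
           if i >= window:
               running -= values[i - window]
           if i >= window - 1:
               out[i] = running / window
       return out

   prices = np.array([1.0, 2.0, 3.0, 4.0])
   smoothed = rolling_mean(prices, 2)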

Dependencies
------------

-  python >= 3.5
-  pandas >= 0.22.0
-  numpy >= 1.14.0
-  RNWS >= 0.2.1
-  numba >= 0.38.0
-  single_factor_model >= 0.3.0
-  IPython 5.1.0
-  empyrical
-  alphalens

Note: It is best to use the latest version of ``llvmlite`` so that
``numba`` works properly. Otherwise the kernel may die.

Example
-------

load packages and read in data
------------------------------

.. code:: python

   from alpha_factory import generator_class,get_memory_use_pct,clean
   from RNWS import read
   import numpy as np
   import pandas as pd
   start=20180101
   end=20180331
   factor_path='.'
   frame_path='.'

   df=pd.read_csv(frame_path+'/frames.csv')

   ## read in data

   re=read.read_df('./re',file_pattern='re',start=start,end=end)
   cap=read.read_df('./cap',file_pattern='cap',header=0,dat_col='cap',start=start,end=end)
   open_price,close,vwap,adj,high,low,volume,sus=read.read_df('./mkt_data',file_pattern='mkt',start=start,end=end,header=0,dat_col=['open','close','vwap','adjfactor','high','low','volume','sus'])
   ind1,ind2,ind3=read.read_df('./ind',file_pattern='ind',start=start,end=end,header=0,dat_col=['level1','level2','level3'])
   inx_weight=read.read_df('./ZZ800_weight','Stk_ZZ800',start=start,end=end,header=None,inx_col=1,dat_col=3)

Note: ``frames.csv`` contains the columns
``df_name,equation,dependency,type``, where ``type`` is one of
``df,cap,group``. In this case ``frames.csv`` has the ``df_name`` values
``re,cap,open_price,close,vwap,high,low,volume,ind1,ind2,ind3``.
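
A minimal sketch of building such an initial table with ``pandas`` is
shown below. The column names and ``df_name`` values come from the note
above. The ``type`` assigned to each row and the empty ``equation`` and
``dependency`` entries are assumptions made for illustration, so check
them against a ``frames.csv`` that is known to work.

.. code:: python

   import pandas as pd

   base_names = ['re', 'cap', 'open_price', 'close', 'vwap', 'high',
                 'low', 'volume', 'ind1', 'ind2', 'ind3']

   def guess_type(name):
       # Assumed mapping: market cap -> 'cap', industry labels -> 'group',
       # everything else -> 'df'
       if name == 'cap':
           return 'cap'
       if name.startswith('ind'):
           return 'group'
       return 'df'

   frames = pd.DataFrame({
       'df_name': base_names,
       'equation': [''] * len(base_names),    # assumed empty for raw inputs
       'dependency': [''] * len(base_names),  # assumed empty for raw inputs
       'type': [guess_type(n) for n in base_names],
   })

   # Render as CSV text; pass a file path instead to write frames.csv
   csv_text = frames.to_csv(index=False)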

You can also read data by using ``pd.read_csv`` directly depending on
how you store your data.
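
For instance, if the data is kept in a single long-format CSV, it can be
pivoted into one wide frame per field. This sketch assumes the generator
expects frames indexed by date with one column per stock code. The
column names ``date``, ``code`` and ``close`` and the sample values are
hypothetical.

.. code:: python

   import io
   import pandas as pd

   # Stand-in for a file on disk; replace with a real path
   raw = io.StringIO(
       "date,code,close\n"
       "20180102,000001,10.0\n"
       "20180102,000002,20.0\n"
       "20180103,000001,10.5\n"
       "20180103,000002,20.5\n"
   )

   # Keep stock codes as strings so leading zeros survive
   long_df = pd.read_csv(raw, dtype={'code': str})

   # One row per date, one column per stock
   close = long_df.pivot(index='date', columns='code', values='close')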

start to generate
-----------------

.. code:: python

   parms={'re':close.mul(adj).pct_change()
          ,'cap':cap
          ,'open_price':open_price
          ,'close':close
          ,'vwap':vwap
          ,'high':high
          ,'low':low
          ,'volume':volume
          ,'ind1':ind1
          ,'ind2':ind2
          ,'ind3':ind3}

   with generator_class(df,factor_path,**parms) as gen:
       gen.generator(batch_size=3,name_start='a')
       gen.generator(batch_size=3,name_start='a')
       gen.output_df(path=frame_path+'/frames_new.csv')

continue to generate with existing frames and factors
-----------------------------------------------------

.. code:: python

   with generator_class(df,factor_path,**parms) as gen:
       gen.reload_df(path=frame_path+'/frames_new.csv')
       gen.reload_factors(align=True)
       clean()
       for i in range(5):
           gen.generator(batch_size=2,name_start='a')
           print('step %d memory usage:\t %.1f%% \n'%(i,get_memory_use_pct()))
           if get_memory_use_pct()>80:
               break
       gen.output_df(path=frame_path+'/frames_new2.csv')

Note: It is very important to ``align`` all factors and initial
dataframes before generating, so that they share the same index and
columns.
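
To show what alignment means for two frames, the sketch below uses
``pandas.DataFrame.align`` to restrict both to their shared dates and
stocks. Whether ``reload_factors(align=True)`` keeps the intersection or
the union is not stated here, so the ``join='inner'`` choice is an
assumption, and the sample data is made up.

.. code:: python

   import numpy as np
   import pandas as pd

   close = pd.DataFrame(np.arange(9.0).reshape(3, 3),
                        index=[20180102, 20180103, 20180104],
                        columns=['A', 'B', 'C'])
   factor = pd.DataFrame(np.arange(9.0).reshape(3, 3),
                         index=[20180103, 20180104, 20180105],
                         columns=['B', 'C', 'D'])

   # Keep only the dates and stocks present in both frames
   close_al, factor_al = close.align(factor, join='inner')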

You can also choose how factors are stored by setting
``store_method``.

backtesting with stratified sampling approach and ic-ir measure after generation
--------------------------------------------------------------------------------

.. code:: python

   data_box_param={'ind':ind1
               ,'price':vwap*adj
               ,'sus':sus
               ,'ind_weight':inx_weight
               ,'path':'./databox'
               }

   back_test_param={'sharpe_ratio_thresh':3
                    ,'n':5
                    ,'out_path':'.'
                    ,'back_end':'loky'
                    ,'n_jobs':6
                    ,'detail_root_path':None
                    ,'double_side_cost':0.003
                    ,'rf':0.03
                    }

   icir_param={'ir_thresh':0.4
               ,'out_path':'.'
               ,'back_end':'loky'
               ,'n_jobs':6
               }

   with generator_class(df,factor_path,**parms) as gen: 
       for i in range(5):
           gen.generator(batch_size=2,name_start='a')
           gen.output_df(path=frame_path+'/frames_new.csv')
           gen.getOrCreate_databox(**data_box_param)
           gen.back_test(**back_test_param)
           gen.icir(**icir_param)
           clean()
           if get_memory_use_pct()>90:
               print('Memory exceeded')
               break
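
For readers unfamiliar with the IC-IR measure, here is a generic sketch
of the usual definition: the information coefficient (IC) is the
cross-sectional rank correlation between a factor and the next-period
return on each date, and the information ratio (IR) is the mean IC
divided by its standard deviation. This is not the package's own
implementation, which may differ in detail. The helper names ``rank_ic``
and ``ic_ir`` and the sample data are hypothetical.

.. code:: python

   import numpy as np
   import pandas as pd

   def rank_ic(factor, fwd_ret):
       # Per-date rank correlation between factor and forward return
       f, r = factor.align(fwd_ret, join='inner')
       valid = f.notna() & r.notna()
       f_rank = f.where(valid).rank(axis=1)
       r_rank = r.where(valid).rank(axis=1)
       return f_rank.corrwith(r_rank, axis=1)

   def ic_ir(ic):
       # Mean IC scaled by its sample standard deviation
       return ic.mean() / ic.std()

   dates = [20180102, 20180103, 20180104]
   stocks = ['A', 'B', 'C', 'D']
   factor = pd.DataFrame([[1.0, 2.0, 3.0, 4.0],
                          [4.0, 3.0, 2.0, 1.0],
                          [2.0, 1.0, 4.0, 3.0]],
                         index=dates, columns=stocks)
   fwd_ret = pd.DataFrame([[0.01, 0.02, 0.03, 0.04],
                           [0.01, 0.02, 0.03, 0.04],
                           [0.02, 0.01, 0.04, 0.03]],
                          index=dates, columns=stocks)

   ic = rank_ic(factor, fwd_ret)
   ir = ic_ir(ic)

A factor would then be kept when its IR clears a chosen threshold, such
as the ``ir_thresh`` value passed above.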

generate script of factors
--------------------------

.. code:: python

   from alpha_factory import write_file
   import pandas as pd
   df2=pd.read_csv(frame_path+'/frames_new.csv')
   write_file(df2,'script.py')

locate a factor
---------------

.. code:: python

   from alpha_factory.utilise import get_factor_path
   factor_name='a0'
   path=get_factor_path(factor_path,factor_name)

This works only when ``storage_method='byTime'``.

use your own functions
----------------------

To use your own functions, append your code to the class ``functions``
in ``basic_functions.py`` in the package source, and append the
corresponding names to ``functions.csv`` under ``data`` in the package
source.

After that, you can set ``debug=True`` in the ``generator`` function to
check whether any of those functions contains a bug. If one does, an
embedded IPython session is started to help you find out what is going
on in the loop.


