Starting scenario 4, validation against site 3
2022-03-01 13:17:48.740969: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /nopt/slurm/current/lib:/nopt/slurm/current/lib:
2022-03-01 13:17:48.741000: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Cross validation site: 3
Training sites: [0, 1, 2, 4, 5, 6, 7, 8]
Number of files: 8
Number of east files: 4
Number of west files: 4
Source files: ['/projects/pxs/mlclouds/training_data/2016_east_v322/mlclouds_surfrad_east_2016.h5', '/projects/pxs/mlclouds/training_data/2016_west_v322/mlclouds_surfrad_west_2016.h5', '/projects/pxs/mlclouds/training_data/2017_east_v322/mlclouds_surfrad_east_2017.h5', '/projects/pxs/mlclouds/training_data/2017_west_v322/mlclouds_surfrad_west_2017.h5', '/projects/pxs/mlclouds/training_data/2018_east_v322/mlclouds_surfrad_east_2018.h5', '/projects/pxs/mlclouds/training_data/2018_west_v322/mlclouds_surfrad_west_2018.h5', '/projects/pxs/mlclouds/training_data/2019_east_v322/mlclouds_surfrad_east_2019.h5', '/projects/pxs/mlclouds/training_data/2019_west_v322/mlclouds_surfrad_west_2019.h5']
Full config: {'clean_training_data_kwargs': {'filter_clear': False, 'nan_option': 'interp'}, 'epochs_a': 100, 'epochs_b': 100, 'features': ['solar_zenith_angle', 'cloud_type', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo'], 'hidden_layers': [{'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}], 'learning_rate': 0.0005, 'loss_weights_a': [1, 0], 'loss_weights_b': [0.5, 0.5], 'metric': 'relative_mae', 'n_batch': 32, 'one_hot_categories': {'flag': ['clear', 'ice_cloud', 'water_cloud', 'bad_cloud']}, 'p_fun': 'p_fun_all_sky', 'p_kwargs': {'loss_terms': ['mae_ghi']}, 'phygnn_seed': 0, 'surfrad_window_minutes': 15, 'y_labels': ['cld_opd_dcomp', 'cld_reff_dcomp']}
INFO - 2022-03-01 13:17:57,444 [trainer.py:40] : Trainer: Training on sites [0, 1, 2, 4, 5, 6, 7, 8] from files ['/projects/pxs/mlclouds/training_data/2016_east_v322/mlclouds_surfrad_east_2016.h5', '/projects/pxs/mlclouds/training_data/2016_west_v322/mlclouds_surfrad_west_2016.h5', '/projects/pxs/mlclouds/training_data/2017_east_v322/mlclouds_surfrad_east_2017.h5', '/projects/pxs/mlclouds/training_data/2017_west_v322/mlclouds_surfrad_west_2017.h5', '/projects/pxs/mlclouds/training_data/2018_east_v322/mlclouds_surfrad_east_2018.h5', '/projects/pxs/mlclouds/training_data/2018_west_v322/mlclouds_surfrad_west_2018.h5', '/projects/pxs/mlclouds/training_data/2019_east_v322/mlclouds_surfrad_east_2019.h5', '/projects/pxs/mlclouds/training_data/2019_west_v322/mlclouds_surfrad_west_2019.h5']
INFO - 2022-03-01 13:17:57,444 [trainer.py:49] : Trainer: Training on sites [0, 1, 2, 4, 5, 6, 7, 8] from files ['/projects/pxs/mlclouds/training_data/2016_east_v322/mlclouds_surfrad_east_2016.h5', '/projects/pxs/mlclouds/training_data/2016_west_v322/mlclouds_surfrad_west_2016.h5', '/projects/pxs/mlclouds/training_data/2017_east_v322/mlclouds_surfrad_east_2017.h5', '/projects/pxs/mlclouds/training_data/2017_west_v322/mlclouds_surfrad_west_2017.h5', '/projects/pxs/mlclouds/training_data/2018_east_v322/mlclouds_surfrad_east_2018.h5', '/projects/pxs/mlclouds/training_data/2018_west_v322/mlclouds_surfrad_west_2018.h5', '/projects/pxs/mlclouds/training_data/2019_east_v322/mlclouds_surfrad_east_2019.h5', '/projects/pxs/mlclouds/training_data/2019_west_v322/mlclouds_surfrad_west_2019.h5']
INFO - 2022-03-01 13:17:57,444 [data_handlers.py:60] : Loading training data
DEBUG - 2022-03-01 13:17:57,444 [data_handlers.py:78] : Loading vars ['solar_zenith_angle', 'cloud_type', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo', 'cld_opd_dcomp', 'cld_reff_dcomp']
DEBUG - 2022-03-01 13:17:57,444 [data_handlers.py:85] : Loading data for site(s) [0, 1, 2, 4, 5, 6, 7, 8], from /projects/pxs/mlclouds/training_data/2016_east_v322/mlclouds_surfrad_east_2016.h5
DEBUG - 2022-03-01 13:17:58,659 [data_handlers.py:103] : 	Shape temp_raw=(140544, 19), temp_all_sky=(140544, 14)
DEBUG - 2022-03-01 13:17:58,663 [data_handlers.py:106] : 	Time step is 30 minutes
DEBUG - 2022-03-01 13:17:58,663 [data_handlers.py:110] : 	Grabbing surface data for 2016 and [0, 1, 2, 4, 5, 6, 7, 8]
DEBUG - 2022-03-01 13:17:58,669 [data_handlers.py:117] : 		Grabbing surface data for bon from /projects/pxs/surfrad/h5/bon_2016.h5
DEBUG - 2022-03-01 13:17:59,371 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:17:59,374 [data_handlers.py:117] : 		Grabbing surface data for tbl from /projects/pxs/surfrad/h5/tbl_2016.h5
DEBUG - 2022-03-01 13:18:00,051 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:00,054 [data_handlers.py:117] : 		Grabbing surface data for dra from /projects/pxs/surfrad/h5/dra_2016.h5
DEBUG - 2022-03-01 13:18:00,739 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:00,743 [data_handlers.py:117] : 		Grabbing surface data for gwn from /projects/pxs/surfrad/h5/gwn_2016.h5
DEBUG - 2022-03-01 13:18:01,428 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:01,431 [data_handlers.py:117] : 		Grabbing surface data for psu from /projects/pxs/surfrad/h5/psu_2016.h5
DEBUG - 2022-03-01 13:18:02,114 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:02,118 [data_handlers.py:117] : 		Grabbing surface data for sxf from /projects/pxs/surfrad/h5/sxf_2016.h5
DEBUG - 2022-03-01 13:18:02,801 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:02,805 [data_handlers.py:117] : 		Grabbing surface data for sgp from /projects/pxs/surfrad/h5/sgp_2016.h5
DEBUG - 2022-03-01 13:18:03,490 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:03,493 [data_handlers.py:117] : 		Grabbing surface data for srrl from /projects/pxs/surfrad/h5/srrl_2016.h5
DEBUG - 2022-03-01 13:18:04,200 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:04,200 [data_handlers.py:85] : Loading data for site(s) [0, 1, 2, 4, 5, 6, 7, 8], from /projects/pxs/mlclouds/training_data/2016_west_v322/mlclouds_surfrad_west_2016.h5
DEBUG - 2022-03-01 13:18:05,224 [data_handlers.py:103] : 	Shape temp_raw=(140544, 19), temp_all_sky=(140544, 14)
DEBUG - 2022-03-01 13:18:05,228 [data_handlers.py:106] : 	Time step is 30 minutes
DEBUG - 2022-03-01 13:18:05,228 [data_handlers.py:110] : 	Grabbing surface data for 2016 and [0, 1, 2, 4, 5, 6, 7, 8]
DEBUG - 2022-03-01 13:18:05,232 [data_handlers.py:117] : 		Grabbing surface data for bon from /projects/pxs/surfrad/h5/bon_2016.h5
DEBUG - 2022-03-01 13:18:05,897 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:05,901 [data_handlers.py:117] : 		Grabbing surface data for tbl from /projects/pxs/surfrad/h5/tbl_2016.h5
DEBUG - 2022-03-01 13:18:06,565 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:06,569 [data_handlers.py:117] : 		Grabbing surface data for dra from /projects/pxs/surfrad/h5/dra_2016.h5
DEBUG - 2022-03-01 13:18:07,237 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:07,240 [data_handlers.py:117] : 		Grabbing surface data for gwn from /projects/pxs/surfrad/h5/gwn_2016.h5
DEBUG - 2022-03-01 13:18:07,900 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:07,903 [data_handlers.py:117] : 		Grabbing surface data for psu from /projects/pxs/surfrad/h5/psu_2016.h5
DEBUG - 2022-03-01 13:18:08,568 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:08,571 [data_handlers.py:117] : 		Grabbing surface data for sxf from /projects/pxs/surfrad/h5/sxf_2016.h5
DEBUG - 2022-03-01 13:18:09,235 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:09,239 [data_handlers.py:117] : 		Grabbing surface data for sgp from /projects/pxs/surfrad/h5/sgp_2016.h5
DEBUG - 2022-03-01 13:18:09,905 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:09,908 [data_handlers.py:117] : 		Grabbing surface data for srrl from /projects/pxs/surfrad/h5/srrl_2016.h5
DEBUG - 2022-03-01 13:18:10,576 [data_handlers.py:134] : 	Shape: temp_surf=(17568, 5)
DEBUG - 2022-03-01 13:18:10,576 [data_handlers.py:85] : Loading data for site(s) [0, 1, 2, 4, 5, 6, 7, 8], from /projects/pxs/mlclouds/training_data/2017_east_v322/mlclouds_surfrad_east_2017.h5
DEBUG - 2022-03-01 13:18:11,603 [data_handlers.py:103] : 	Shape temp_raw=(140160, 19), temp_all_sky=(140160, 14)
DEBUG - 2022-03-01 13:18:11,607 [data_handlers.py:106] : 	Time step is 30 minutes
DEBUG - 2022-03-01 13:18:11,607 [data_handlers.py:110] : 	Grabbing surface data for 2017 and [0, 1, 2, 4, 5, 6, 7, 8]
DEBUG - 2022-03-01 13:18:11,611 [data_handlers.py:117] : 		Grabbing surface data for bon from /projects/pxs/surfrad/h5/bon_2017.h5
DEBUG - 2022-03-01 13:18:12,288 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:12,291 [data_handlers.py:117] : 		Grabbing surface data for tbl from /projects/pxs/surfrad/h5/tbl_2017.h5
DEBUG - 2022-03-01 13:18:12,969 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:12,972 [data_handlers.py:117] : 		Grabbing surface data for dra from /projects/pxs/surfrad/h5/dra_2017.h5
DEBUG - 2022-03-01 13:18:13,649 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:13,652 [data_handlers.py:117] : 		Grabbing surface data for gwn from /projects/pxs/surfrad/h5/gwn_2017.h5
DEBUG - 2022-03-01 13:18:14,328 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:14,331 [data_handlers.py:117] : 		Grabbing surface data for psu from /projects/pxs/surfrad/h5/psu_2017.h5
DEBUG - 2022-03-01 13:18:15,015 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:15,018 [data_handlers.py:117] : 		Grabbing surface data for sxf from /projects/pxs/surfrad/h5/sxf_2017.h5
DEBUG - 2022-03-01 13:18:15,695 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:15,699 [data_handlers.py:117] : 		Grabbing surface data for sgp from /projects/pxs/surfrad/h5/sgp_2017.h5
DEBUG - 2022-03-01 13:18:16,382 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:16,385 [data_handlers.py:117] : 		Grabbing surface data for srrl from /projects/pxs/surfrad/h5/srrl_2017.h5
DEBUG - 2022-03-01 13:18:17,070 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:17,070 [data_handlers.py:85] : Loading data for site(s) [0, 1, 2, 4, 5, 6, 7, 8], from /projects/pxs/mlclouds/training_data/2017_west_v322/mlclouds_surfrad_west_2017.h5
DEBUG - 2022-03-01 13:18:18,199 [data_handlers.py:103] : 	Shape temp_raw=(140160, 19), temp_all_sky=(140160, 14)
DEBUG - 2022-03-01 13:18:18,203 [data_handlers.py:106] : 	Time step is 30 minutes
DEBUG - 2022-03-01 13:18:18,204 [data_handlers.py:110] : 	Grabbing surface data for 2017 and [0, 1, 2, 4, 5, 6, 7, 8]
DEBUG - 2022-03-01 13:18:18,207 [data_handlers.py:117] : 		Grabbing surface data for bon from /projects/pxs/surfrad/h5/bon_2017.h5
DEBUG - 2022-03-01 13:18:18,880 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:18,883 [data_handlers.py:117] : 		Grabbing surface data for tbl from /projects/pxs/surfrad/h5/tbl_2017.h5
DEBUG - 2022-03-01 13:18:19,537 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:19,540 [data_handlers.py:117] : 		Grabbing surface data for dra from /projects/pxs/surfrad/h5/dra_2017.h5
DEBUG - 2022-03-01 13:18:20,190 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:20,193 [data_handlers.py:117] : 		Grabbing surface data for gwn from /projects/pxs/surfrad/h5/gwn_2017.h5
DEBUG - 2022-03-01 13:18:20,845 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:20,848 [data_handlers.py:117] : 		Grabbing surface data for psu from /projects/pxs/surfrad/h5/psu_2017.h5
DEBUG - 2022-03-01 13:18:21,512 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:21,515 [data_handlers.py:117] : 		Grabbing surface data for sxf from /projects/pxs/surfrad/h5/sxf_2017.h5
DEBUG - 2022-03-01 13:18:22,167 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:22,170 [data_handlers.py:117] : 		Grabbing surface data for sgp from /projects/pxs/surfrad/h5/sgp_2017.h5
DEBUG - 2022-03-01 13:18:22,829 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:22,832 [data_handlers.py:117] : 		Grabbing surface data for srrl from /projects/pxs/surfrad/h5/srrl_2017.h5
DEBUG - 2022-03-01 13:18:23,485 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:23,485 [data_handlers.py:85] : Loading data for site(s) [0, 1, 2, 4, 5, 6, 7, 8], from /projects/pxs/mlclouds/training_data/2018_east_v322/mlclouds_surfrad_east_2018.h5
DEBUG - 2022-03-01 13:18:30,053 [data_handlers.py:103] : 	Shape temp_raw=(840960, 19), temp_all_sky=(840960, 14)
DEBUG - 2022-03-01 13:18:30,073 [data_handlers.py:106] : 	Time step is 5 minutes
DEBUG - 2022-03-01 13:18:30,073 [data_handlers.py:110] : 	Grabbing surface data for 2018 and [0, 1, 2, 4, 5, 6, 7, 8]
DEBUG - 2022-03-01 13:18:30,077 [data_handlers.py:117] : 		Grabbing surface data for bon from /projects/pxs/surfrad/h5/bon_2018.h5
DEBUG - 2022-03-01 13:18:30,745 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:30,749 [data_handlers.py:117] : 		Grabbing surface data for tbl from /projects/pxs/surfrad/h5/tbl_2018.h5
DEBUG - 2022-03-01 13:18:31,427 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:31,431 [data_handlers.py:117] : 		Grabbing surface data for dra from /projects/pxs/surfrad/h5/dra_2018.h5
DEBUG - 2022-03-01 13:18:32,109 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:32,113 [data_handlers.py:117] : 		Grabbing surface data for gwn from /projects/pxs/surfrad/h5/gwn_2018.h5
DEBUG - 2022-03-01 13:18:32,788 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:32,792 [data_handlers.py:117] : 		Grabbing surface data for psu from /projects/pxs/surfrad/h5/psu_2018.h5
DEBUG - 2022-03-01 13:18:33,472 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:33,476 [data_handlers.py:117] : 		Grabbing surface data for sxf from /projects/pxs/surfrad/h5/sxf_2018.h5
DEBUG - 2022-03-01 13:18:34,220 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:34,223 [data_handlers.py:117] : 		Grabbing surface data for sgp from /projects/pxs/surfrad/h5/sgp_2018.h5
DEBUG - 2022-03-01 13:18:34,896 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:34,900 [data_handlers.py:117] : 		Grabbing surface data for srrl from /projects/pxs/surfrad/h5/srrl_2018.h5
DEBUG - 2022-03-01 13:18:35,594 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:35,594 [data_handlers.py:85] : Loading data for site(s) [0, 1, 2, 4, 5, 6, 7, 8], from /projects/pxs/mlclouds/training_data/2018_west_v322/mlclouds_surfrad_west_2018.h5
DEBUG - 2022-03-01 13:18:36,816 [data_handlers.py:103] : 	Shape temp_raw=(140160, 19), temp_all_sky=(140160, 14)
DEBUG - 2022-03-01 13:18:36,820 [data_handlers.py:106] : 	Time step is 30 minutes
DEBUG - 2022-03-01 13:18:36,820 [data_handlers.py:110] : 	Grabbing surface data for 2018 and [0, 1, 2, 4, 5, 6, 7, 8]
DEBUG - 2022-03-01 13:18:36,823 [data_handlers.py:117] : 		Grabbing surface data for bon from /projects/pxs/surfrad/h5/bon_2018.h5
DEBUG - 2022-03-01 13:18:37,486 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:37,490 [data_handlers.py:117] : 		Grabbing surface data for tbl from /projects/pxs/surfrad/h5/tbl_2018.h5
DEBUG - 2022-03-01 13:18:38,140 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:38,144 [data_handlers.py:117] : 		Grabbing surface data for dra from /projects/pxs/surfrad/h5/dra_2018.h5
DEBUG - 2022-03-01 13:18:38,807 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:38,811 [data_handlers.py:117] : 		Grabbing surface data for gwn from /projects/pxs/surfrad/h5/gwn_2018.h5
DEBUG - 2022-03-01 13:18:39,464 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:39,467 [data_handlers.py:117] : 		Grabbing surface data for psu from /projects/pxs/surfrad/h5/psu_2018.h5
DEBUG - 2022-03-01 13:18:40,133 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:40,136 [data_handlers.py:117] : 		Grabbing surface data for sxf from /projects/pxs/surfrad/h5/sxf_2018.h5
DEBUG - 2022-03-01 13:18:40,793 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:40,796 [data_handlers.py:117] : 		Grabbing surface data for sgp from /projects/pxs/surfrad/h5/sgp_2018.h5
DEBUG - 2022-03-01 13:18:41,468 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:41,471 [data_handlers.py:117] : 		Grabbing surface data for srrl from /projects/pxs/surfrad/h5/srrl_2018.h5
DEBUG - 2022-03-01 13:18:42,127 [data_handlers.py:134] : 	Shape: temp_surf=(17520, 5)
DEBUG - 2022-03-01 13:18:42,127 [data_handlers.py:85] : Loading data for site(s) [0, 1, 2, 4, 5, 6, 7, 8], from /projects/pxs/mlclouds/training_data/2019_east_v322/mlclouds_surfrad_east_2019.h5
DEBUG - 2022-03-01 13:18:48,748 [data_handlers.py:103] : 	Shape temp_raw=(840960, 19), temp_all_sky=(840960, 14)
DEBUG - 2022-03-01 13:18:48,769 [data_handlers.py:106] : 	Time step is 5 minutes
DEBUG - 2022-03-01 13:18:48,769 [data_handlers.py:110] : 	Grabbing surface data for 2019 and [0, 1, 2, 4, 5, 6, 7, 8]
DEBUG - 2022-03-01 13:18:48,773 [data_handlers.py:117] : 		Grabbing surface data for bon from /projects/pxs/surfrad/h5/bon_2019.h5
DEBUG - 2022-03-01 13:18:49,456 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:49,459 [data_handlers.py:117] : 		Grabbing surface data for tbl from /projects/pxs/surfrad/h5/tbl_2019.h5
DEBUG - 2022-03-01 13:18:50,255 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:50,259 [data_handlers.py:117] : 		Grabbing surface data for dra from /projects/pxs/surfrad/h5/dra_2019.h5
DEBUG - 2022-03-01 13:18:50,949 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:50,953 [data_handlers.py:117] : 		Grabbing surface data for gwn from /projects/pxs/surfrad/h5/gwn_2019.h5
DEBUG - 2022-03-01 13:18:51,639 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:51,642 [data_handlers.py:117] : 		Grabbing surface data for psu from /projects/pxs/surfrad/h5/psu_2019.h5
DEBUG - 2022-03-01 13:18:52,337 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:52,340 [data_handlers.py:117] : 		Grabbing surface data for sxf from /projects/pxs/surfrad/h5/sxf_2019.h5
DEBUG - 2022-03-01 13:18:53,022 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:53,026 [data_handlers.py:117] : 		Grabbing surface data for sgp from /projects/pxs/surfrad/h5/sgp_2019.h5
DEBUG - 2022-03-01 13:18:53,725 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:53,729 [data_handlers.py:117] : 		Grabbing surface data for srrl from /projects/pxs/surfrad/h5/srrl_2019.h5
DEBUG - 2022-03-01 13:18:54,435 [data_handlers.py:134] : 	Shape: temp_surf=(105120, 5)
DEBUG - 2022-03-01 13:18:54,435 [data_handlers.py:85] : Loading data for site(s) [0, 1, 2, 4, 5, 6, 7, 8], from /projects/pxs/mlclouds/training_data/2019_west_v322/mlclouds_surfrad_west_2019.h5
DEBUG - 2022-03-01 13:18:57,843 [data_handlers.py:103] : 	Shape temp_raw=(420480, 19), temp_all_sky=(420480, 14)
DEBUG - 2022-03-01 13:18:57,853 [data_handlers.py:106] : 	Time step is 10 minutes
DEBUG - 2022-03-01 13:18:57,853 [data_handlers.py:110] : 	Grabbing surface data for 2019 and [0, 1, 2, 4, 5, 6, 7, 8]
DEBUG - 2022-03-01 13:18:57,856 [data_handlers.py:117] : 		Grabbing surface data for bon from /projects/pxs/surfrad/h5/bon_2019.h5
DEBUG - 2022-03-01 13:18:58,536 [data_handlers.py:134] : 	Shape: temp_surf=(52560, 5)
DEBUG - 2022-03-01 13:18:58,539 [data_handlers.py:117] : 		Grabbing surface data for tbl from /projects/pxs/surfrad/h5/tbl_2019.h5
DEBUG - 2022-03-01 13:18:59,210 [data_handlers.py:134] : 	Shape: temp_surf=(52560, 5)
DEBUG - 2022-03-01 13:18:59,214 [data_handlers.py:117] : 		Grabbing surface data for dra from /projects/pxs/surfrad/h5/dra_2019.h5
DEBUG - 2022-03-01 13:18:59,901 [data_handlers.py:134] : 	Shape: temp_surf=(52560, 5)
DEBUG - 2022-03-01 13:18:59,904 [data_handlers.py:117] : 		Grabbing surface data for gwn from /projects/pxs/surfrad/h5/gwn_2019.h5
DEBUG - 2022-03-01 13:19:00,567 [data_handlers.py:134] : 	Shape: temp_surf=(52560, 5)
DEBUG - 2022-03-01 13:19:00,571 [data_handlers.py:117] : 		Grabbing surface data for psu from /projects/pxs/surfrad/h5/psu_2019.h5
DEBUG - 2022-03-01 13:19:01,250 [data_handlers.py:134] : 	Shape: temp_surf=(52560, 5)
DEBUG - 2022-03-01 13:19:01,254 [data_handlers.py:117] : 		Grabbing surface data for sxf from /projects/pxs/surfrad/h5/sxf_2019.h5
DEBUG - 2022-03-01 13:19:01,918 [data_handlers.py:134] : 	Shape: temp_surf=(52560, 5)
DEBUG - 2022-03-01 13:19:01,921 [data_handlers.py:117] : 		Grabbing surface data for sgp from /projects/pxs/surfrad/h5/sgp_2019.h5
DEBUG - 2022-03-01 13:19:02,616 [data_handlers.py:134] : 	Shape: temp_surf=(52560, 5)
DEBUG - 2022-03-01 13:19:02,619 [data_handlers.py:117] : 		Grabbing surface data for srrl from /projects/pxs/surfrad/h5/srrl_2019.h5
DEBUG - 2022-03-01 13:19:03,292 [data_handlers.py:134] : 	Shape: temp_surf=(52560, 5)
DEBUG - 2022-03-01 13:19:03,292 [data_handlers.py:136] : Data load complete. Shape df_raw=(2803968, 19), df_all_sky=(2803968, 14), df_surf=(2803968, 5)
DEBUG - 2022-03-01 13:19:04,167 [data_handlers.py:159] : Extracting 2D arrays to run rest2 for clearsky PhyGNN inputs.
DEBUG - 2022-03-01 13:19:16,185 [data_handlers.py:176] : Running rest2 for clearsky PhyGNN inputs.
DEBUG - 2022-03-01 13:21:10,521 [data_handlers.py:194] : Completed rest2 run for clearsky PhyGNN inputs.
INFO - 2022-03-01 13:21:12,195 [data_handlers.py:62] : Prepping training data
DEBUG - 2022-03-01 13:21:12,196 [data_handlers.py:214] : Training data clean kwargs: {'filter_daylight': True, 'filter_clear': False, 'add_cloud_flag': True, 'sza_lim': 89, 'nan_option': 'interp'}
DEBUG - 2022-03-01 13:21:12,196 [data_handlers.py:215] : Shape before cleaning: df_raw=(2803968, 19)
INFO - 2022-03-01 13:21:12,500 [data_cleaners.py:36] : 49.69% of timesteps are daylight
INFO - 2022-03-01 13:21:12,505 [data_cleaners.py:38] : 51.78% of daylight timesteps are cloudy
INFO - 2022-03-01 13:21:12,509 [data_cleaners.py:40] : 3.60% of daylight timesteps are missing cloud type
INFO - 2022-03-01 13:21:12,514 [data_cleaners.py:42] : 33.45% of cloudy daylight timesteps are missing cloud opd
INFO - 2022-03-01 13:21:12,518 [data_cleaners.py:44] : 33.67% of cloudy daylight timesteps are missing cloud reff
DEBUG - 2022-03-01 13:21:12,519 [data_cleaners.py:47] : Column NaN values:
DEBUG - 2022-03-01 13:21:12,521 [data_cleaners.py:50] : 	"gid" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:12,527 [data_cleaners.py:50] : 	"time_index" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:12,531 [data_cleaners.py:50] : 	"solar_zenith_angle" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:12,536 [data_cleaners.py:50] : 	"cloud_type" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:12,540 [data_cleaners.py:50] : 	"refl_0_65um_nom" has 51.63% NaN values
DEBUG - 2022-03-01 13:21:12,544 [data_cleaners.py:50] : 	"refl_0_65um_nom_stddev_3x3" has 51.63% NaN values
DEBUG - 2022-03-01 13:21:12,547 [data_cleaners.py:50] : 	"refl_3_75um_nom" has 3.62% NaN values
DEBUG - 2022-03-01 13:21:12,551 [data_cleaners.py:50] : 	"temp_3_75um_nom" has 3.53% NaN values
DEBUG - 2022-03-01 13:21:12,555 [data_cleaners.py:50] : 	"temp_11_0um_nom" has 3.53% NaN values
DEBUG - 2022-03-01 13:21:12,559 [data_cleaners.py:50] : 	"temp_11_0um_nom_stddev_3x3" has 3.61% NaN values
DEBUG - 2022-03-01 13:21:12,563 [data_cleaners.py:50] : 	"cloud_probability" has 3.61% NaN values
DEBUG - 2022-03-01 13:21:12,567 [data_cleaners.py:50] : 	"cloud_fraction" has 3.61% NaN values
DEBUG - 2022-03-01 13:21:12,571 [data_cleaners.py:50] : 	"air_temperature" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:12,574 [data_cleaners.py:50] : 	"dew_point" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:12,578 [data_cleaners.py:50] : 	"relative_humidity" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:12,582 [data_cleaners.py:50] : 	"total_precipitable_water" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:12,586 [data_cleaners.py:50] : 	"surface_albedo" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:12,590 [data_cleaners.py:50] : 	"cld_opd_dcomp" has 82.88% NaN values
DEBUG - 2022-03-01 13:21:12,594 [data_cleaners.py:50] : 	"cld_reff_dcomp" has 82.94% NaN values
DEBUG - 2022-03-01 13:21:12,594 [data_cleaners.py:53] : Interpolating opd and reff
DEBUG - 2022-03-01 13:21:15,787 [data_cleaners.py:79] : Adding cloud type flag (e.g. flag=[night, clear, ice_cloud, water_cloud, bad_cloud])
INFO - 2022-03-01 13:21:16,080 [data_cleaners.py:99] : Data reduced from 2803968 rows to 1393294 after filters (49.69% of original)
DEBUG - 2022-03-01 13:21:16,205 [data_cleaners.py:105] : Feature flag column has these values: ['clear' 'bad_cloud' 'water_cloud' 'ice_cloud']
INFO - 2022-03-01 13:21:16,205 [data_cleaners.py:107] : Cleaning took 4.0 seconds
DEBUG - 2022-03-01 13:21:16,205 [data_handlers.py:218] : Shape after cleaning: df_train=(1393294, 20)
DEBUG - 2022-03-01 13:21:16,205 [data_handlers.py:221] : Cleaning df_all_sky training data (for pfun).
DEBUG - 2022-03-01 13:21:16,205 [data_handlers.py:222] : Shape before cleaning: df_all_sky=(2803968, 25)
INFO - 2022-03-01 13:21:16,559 [data_cleaners.py:36] : 49.69% of timesteps are daylight
INFO - 2022-03-01 13:21:16,564 [data_cleaners.py:38] : 51.78% of daylight timesteps are cloudy
INFO - 2022-03-01 13:21:16,569 [data_cleaners.py:40] : 3.60% of daylight timesteps are missing cloud type
INFO - 2022-03-01 13:21:16,573 [data_cleaners.py:42] : 33.45% of cloudy daylight timesteps are missing cloud opd
INFO - 2022-03-01 13:21:16,578 [data_cleaners.py:44] : 33.67% of cloudy daylight timesteps are missing cloud reff
DEBUG - 2022-03-01 13:21:16,578 [data_cleaners.py:47] : Column NaN values:
DEBUG - 2022-03-01 13:21:16,581 [data_cleaners.py:50] : 	"gid" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,585 [data_cleaners.py:50] : 	"alpha" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,589 [data_cleaners.py:50] : 	"aod" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,592 [data_cleaners.py:50] : 	"asymmetry" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,598 [data_cleaners.py:50] : 	"cloud_type" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,601 [data_cleaners.py:50] : 	"cld_opd_dcomp" has 82.88% NaN values
DEBUG - 2022-03-01 13:21:16,605 [data_cleaners.py:50] : 	"cld_reff_dcomp" has 82.94% NaN values
DEBUG - 2022-03-01 13:21:16,609 [data_cleaners.py:50] : 	"ozone" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,613 [data_cleaners.py:50] : 	"solar_zenith_angle" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,617 [data_cleaners.py:50] : 	"ssa" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,621 [data_cleaners.py:50] : 	"surface_albedo" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,623 [data_cleaners.py:50] : 	"surface_pressure" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,627 [data_cleaners.py:50] : 	"total_precipitable_water" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,632 [data_cleaners.py:50] : 	"surfrad_dhi" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,637 [data_cleaners.py:50] : 	"surfrad_dni" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,642 [data_cleaners.py:50] : 	"surfrad_ghi" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,645 [data_cleaners.py:50] : 	"doy" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,650 [data_cleaners.py:50] : 	"radius" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,655 [data_cleaners.py:50] : 	"Tuuclr" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,660 [data_cleaners.py:50] : 	"clearsky_ghi" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,666 [data_cleaners.py:50] : 	"clearsky_dni" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,671 [data_cleaners.py:50] : 	"Ruuclr" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,676 [data_cleaners.py:50] : 	"Tddclr" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,681 [data_cleaners.py:50] : 	"Tduclr" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,686 [data_cleaners.py:50] : 	"time_index" has 0.00% NaN values
DEBUG - 2022-03-01 13:21:16,686 [data_cleaners.py:53] : Interpolating opd and reff
DEBUG - 2022-03-01 13:21:19,282 [data_cleaners.py:79] : Adding cloud type flag (e.g. flag=[night, clear, ice_cloud, water_cloud, bad_cloud])
INFO - 2022-03-01 13:21:19,573 [data_cleaners.py:99] : Data reduced from 2803968 rows to 1393294 after filters (49.69% of original)
DEBUG - 2022-03-01 13:21:19,755 [data_cleaners.py:105] : Feature flag column has these values: ['clear' 'bad_cloud' 'water_cloud' 'ice_cloud']
INFO - 2022-03-01 13:21:19,755 [data_cleaners.py:107] : Cleaning took 3.6 seconds
DEBUG - 2022-03-01 13:21:19,756 [data_handlers.py:226] : Shape after cleaning: df_all_sky=(1393294, 26)
DEBUG - 2022-03-01 13:21:19,850 [data_handlers.py:240] : **Shape: df_train=(1393294, 17)
DEBUG - 2022-03-01 13:21:19,878 [data_handlers.py:250] : Shapes: x=(1393294, 15), y=(1393294, 2), p=(1393294, 26)
DEBUG - 2022-03-01 13:21:19,879 [data_handlers.py:253] : Training features: ['solar_zenith_angle', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo', 'flag']
DEBUG - 2022-03-01 13:21:19,879 [trainer.py:67] : Building PHYGNN model
INFO - 2022-03-01 13:21:19,879 [trainer.py:70] : Using p_fun: <function p_fun_all_sky at 0x2b0a9a97a8b0>
INFO - 2022-03-01 13:21:19,879 [base.py:152] : Active python environment versions: 
{   'numpy': '1.22.2',
    'pandas': '1.2.4',
    'phygnn': '0.0.14',
    'python': '3.8.8 (default, Feb 24 2021, 21:46:12) \n[GCC 7.3.0]',
    'sklearn': '0.24.1',
    'tensorflow': '2.8.0'}
INFO - 2022-03-01 13:21:19,895 [base.py:111] : Successfully initialized model with 17 layers
INFO - 2022-03-01 13:21:19,895 [trainer.py:84] : Training part A - pure data. Loss is [1, 0]
2022-03-01 13:21:29.197976: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /nopt/slurm/current/lib:/nopt/slurm/current/lib:
2022-03-01 13:21:29.198911: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /nopt/slurm/current/lib:/nopt/slurm/current/lib:
2022-03-01 13:21:29.199571: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /nopt/slurm/current/lib:/nopt/slurm/current/lib:
2022-03-01 13:21:29.200237: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /nopt/slurm/current/lib:/nopt/slurm/current/lib:
2022-03-01 13:21:29.200951: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /nopt/slurm/current/lib:/nopt/slurm/current/lib:
2022-03-01 13:21:29.201600: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /nopt/slurm/current/lib:/nopt/slurm/current/lib:
2022-03-01 13:21:29.202236: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /nopt/slurm/current/lib:/nopt/slurm/current/lib:
2022-03-01 13:21:29.203166: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /nopt/slurm/current/lib:/nopt/slurm/current/lib:
2022-03-01 13:21:29.203185: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-03-01 13:21:29.203640: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO - 2022-03-01 13:21:37,342 [phygnn.py:576] : Epoch 0 train loss: 7.01e-01 val loss: 6.88e-01 for "phygnn"
INFO - 2022-03-01 13:21:45,541 [phygnn.py:576] : Epoch 1 train loss: 6.28e-01 val loss: 6.19e-01 for "phygnn"
INFO - 2022-03-01 13:21:54,023 [phygnn.py:576] : Epoch 2 train loss: 5.60e-01 val loss: 5.47e-01 for "phygnn"
INFO - 2022-03-01 13:22:02,429 [phygnn.py:576] : Epoch 3 train loss: 5.26e-01 val loss: 5.11e-01 for "phygnn"
INFO - 2022-03-01 13:22:10,850 [phygnn.py:576] : Epoch 4 train loss: 5.08e-01 val loss: 4.90e-01 for "phygnn"
INFO - 2022-03-01 13:22:19,282 [phygnn.py:576] : Epoch 5 train loss: 4.98e-01 val loss: 4.81e-01 for "phygnn"
INFO - 2022-03-01 13:22:27,673 [phygnn.py:576] : Epoch 6 train loss: 4.92e-01 val loss: 4.75e-01 for "phygnn"
INFO - 2022-03-01 13:22:35,946 [phygnn.py:576] : Epoch 7 train loss: 4.83e-01 val loss: 4.68e-01 for "phygnn"
INFO - 2022-03-01 13:22:44,353 [phygnn.py:576] : Epoch 8 train loss: 4.78e-01 val loss: 4.65e-01 for "phygnn"
INFO - 2022-03-01 13:22:52,833 [phygnn.py:576] : Epoch 9 train loss: 4.77e-01 val loss: 4.63e-01 for "phygnn"
INFO - 2022-03-01 13:23:01,356 [phygnn.py:576] : Epoch 10 train loss: 4.72e-01 val loss: 4.59e-01 for "phygnn"
INFO - 2022-03-01 13:23:09,853 [phygnn.py:576] : Epoch 11 train loss: 4.66e-01 val loss: 4.57e-01 for "phygnn"
INFO - 2022-03-01 13:23:18,467 [phygnn.py:576] : Epoch 12 train loss: 4.68e-01 val loss: 4.57e-01 for "phygnn"
INFO - 2022-03-01 13:23:26,856 [phygnn.py:576] : Epoch 13 train loss: 4.65e-01 val loss: 4.52e-01 for "phygnn"
INFO - 2022-03-01 13:23:35,235 [phygnn.py:576] : Epoch 14 train loss: 4.65e-01 val loss: 4.53e-01 for "phygnn"
INFO - 2022-03-01 13:23:43,583 [phygnn.py:576] : Epoch 15 train loss: 4.60e-01 val loss: 4.50e-01 for "phygnn"
INFO - 2022-03-01 13:23:52,265 [phygnn.py:576] : Epoch 16 train loss: 4.57e-01 val loss: 4.48e-01 for "phygnn"
INFO - 2022-03-01 13:24:00,711 [phygnn.py:576] : Epoch 17 train loss: 4.60e-01 val loss: 4.47e-01 for "phygnn"
INFO - 2022-03-01 13:24:09,299 [phygnn.py:576] : Epoch 18 train loss: 4.56e-01 val loss: 4.46e-01 for "phygnn"
INFO - 2022-03-01 13:24:17,699 [phygnn.py:576] : Epoch 19 train loss: 4.49e-01 val loss: 4.45e-01 for "phygnn"
INFO - 2022-03-01 13:24:25,950 [phygnn.py:576] : Epoch 20 train loss: 4.58e-01 val loss: 4.44e-01 for "phygnn"
INFO - 2022-03-01 13:24:34,282 [phygnn.py:576] : Epoch 21 train loss: 4.50e-01 val loss: 4.41e-01 for "phygnn"
INFO - 2022-03-01 13:24:42,745 [phygnn.py:576] : Epoch 22 train loss: 4.48e-01 val loss: 4.40e-01 for "phygnn"
INFO - 2022-03-01 13:24:51,145 [phygnn.py:576] : Epoch 23 train loss: 4.55e-01 val loss: 4.39e-01 for "phygnn"
INFO - 2022-03-01 13:24:59,799 [phygnn.py:576] : Epoch 24 train loss: 4.50e-01 val loss: 4.36e-01 for "phygnn"
INFO - 2022-03-01 13:25:08,425 [phygnn.py:576] : Epoch 25 train loss: 4.44e-01 val loss: 4.35e-01 for "phygnn"
INFO - 2022-03-01 13:25:16,687 [phygnn.py:576] : Epoch 26 train loss: 4.44e-01 val loss: 4.33e-01 for "phygnn"
INFO - 2022-03-01 13:25:25,144 [phygnn.py:576] : Epoch 27 train loss: 4.39e-01 val loss: 4.32e-01 for "phygnn"
INFO - 2022-03-01 13:25:33,798 [phygnn.py:576] : Epoch 28 train loss: 4.40e-01 val loss: 4.30e-01 for "phygnn"
INFO - 2022-03-01 13:25:42,371 [phygnn.py:576] : Epoch 29 train loss: 4.43e-01 val loss: 4.29e-01 for "phygnn"
INFO - 2022-03-01 13:25:51,047 [phygnn.py:576] : Epoch 30 train loss: 4.41e-01 val loss: 4.26e-01 for "phygnn"
INFO - 2022-03-01 13:25:59,596 [phygnn.py:576] : Epoch 31 train loss: 4.42e-01 val loss: 4.26e-01 for "phygnn"
INFO - 2022-03-01 13:26:08,116 [phygnn.py:576] : Epoch 32 train loss: 4.39e-01 val loss: 4.27e-01 for "phygnn"
INFO - 2022-03-01 13:26:16,712 [phygnn.py:576] : Epoch 33 train loss: 4.40e-01 val loss: 4.25e-01 for "phygnn"
INFO - 2022-03-01 13:26:25,152 [phygnn.py:576] : Epoch 34 train loss: 4.41e-01 val loss: 4.24e-01 for "phygnn"
INFO - 2022-03-01 13:26:33,722 [phygnn.py:576] : Epoch 35 train loss: 4.31e-01 val loss: 4.23e-01 for "phygnn"
INFO - 2022-03-01 13:26:42,283 [phygnn.py:576] : Epoch 36 train loss: 4.31e-01 val loss: 4.22e-01 for "phygnn"
INFO - 2022-03-01 13:26:50,655 [phygnn.py:576] : Epoch 37 train loss: 4.27e-01 val loss: 4.21e-01 for "phygnn"
INFO - 2022-03-01 13:26:59,028 [phygnn.py:576] : Epoch 38 train loss: 4.29e-01 val loss: 4.22e-01 for "phygnn"
INFO - 2022-03-01 13:27:07,497 [phygnn.py:576] : Epoch 39 train loss: 4.29e-01 val loss: 4.22e-01 for "phygnn"
INFO - 2022-03-01 13:27:16,137 [phygnn.py:576] : Epoch 40 train loss: 4.35e-01 val loss: 4.20e-01 for "phygnn"
INFO - 2022-03-01 13:27:24,476 [phygnn.py:576] : Epoch 41 train loss: 4.36e-01 val loss: 4.17e-01 for "phygnn"
INFO - 2022-03-01 13:27:32,962 [phygnn.py:576] : Epoch 42 train loss: 4.29e-01 val loss: 4.16e-01 for "phygnn"
INFO - 2022-03-01 13:27:41,398 [phygnn.py:576] : Epoch 43 train loss: 4.28e-01 val loss: 4.17e-01 for "phygnn"
INFO - 2022-03-01 13:27:49,764 [phygnn.py:576] : Epoch 44 train loss: 4.30e-01 val loss: 4.19e-01 for "phygnn"
INFO - 2022-03-01 13:27:57,905 [phygnn.py:576] : Epoch 45 train loss: 4.24e-01 val loss: 4.15e-01 for "phygnn"
INFO - 2022-03-01 13:28:06,251 [phygnn.py:576] : Epoch 46 train loss: 4.23e-01 val loss: 4.15e-01 for "phygnn"
INFO - 2022-03-01 13:28:14,784 [phygnn.py:576] : Epoch 47 train loss: 4.27e-01 val loss: 4.14e-01 for "phygnn"
INFO - 2022-03-01 13:28:23,414 [phygnn.py:576] : Epoch 48 train loss: 4.22e-01 val loss: 4.13e-01 for "phygnn"
INFO - 2022-03-01 13:28:31,691 [phygnn.py:576] : Epoch 49 train loss: 4.23e-01 val loss: 4.12e-01 for "phygnn"
INFO - 2022-03-01 13:28:39,954 [phygnn.py:576] : Epoch 50 train loss: 4.31e-01 val loss: 4.13e-01 for "phygnn"
INFO - 2022-03-01 13:28:48,299 [phygnn.py:576] : Epoch 51 train loss: 4.31e-01 val loss: 4.11e-01 for "phygnn"
INFO - 2022-03-01 13:28:56,810 [phygnn.py:576] : Epoch 52 train loss: 4.26e-01 val loss: 4.13e-01 for "phygnn"
INFO - 2022-03-01 13:29:05,345 [phygnn.py:576] : Epoch 53 train loss: 4.19e-01 val loss: 4.11e-01 for "phygnn"
INFO - 2022-03-01 13:29:13,847 [phygnn.py:576] : Epoch 54 train loss: 4.17e-01 val loss: 4.10e-01 for "phygnn"
INFO - 2022-03-01 13:29:22,333 [phygnn.py:576] : Epoch 55 train loss: 4.14e-01 val loss: 4.09e-01 for "phygnn"
INFO - 2022-03-01 13:29:30,723 [phygnn.py:576] : Epoch 56 train loss: 4.18e-01 val loss: 4.09e-01 for "phygnn"
INFO - 2022-03-01 13:29:39,130 [phygnn.py:576] : Epoch 57 train loss: 4.17e-01 val loss: 4.08e-01 for "phygnn"
INFO - 2022-03-01 13:29:47,714 [phygnn.py:576] : Epoch 58 train loss: 4.19e-01 val loss: 4.09e-01 for "phygnn"
INFO - 2022-03-01 13:29:56,213 [phygnn.py:576] : Epoch 59 train loss: 4.20e-01 val loss: 4.07e-01 for "phygnn"
INFO - 2022-03-01 13:30:04,638 [phygnn.py:576] : Epoch 60 train loss: 4.24e-01 val loss: 4.07e-01 for "phygnn"
INFO - 2022-03-01 13:30:13,080 [phygnn.py:576] : Epoch 61 train loss: 4.26e-01 val loss: 4.06e-01 for "phygnn"
INFO - 2022-03-01 13:30:21,699 [phygnn.py:576] : Epoch 62 train loss: 4.20e-01 val loss: 4.06e-01 for "phygnn"
INFO - 2022-03-01 13:30:30,189 [phygnn.py:576] : Epoch 63 train loss: 4.11e-01 val loss: 4.04e-01 for "phygnn"
INFO - 2022-03-01 13:30:38,613 [phygnn.py:576] : Epoch 64 train loss: 4.17e-01 val loss: 4.05e-01 for "phygnn"
INFO - 2022-03-01 13:30:46,897 [phygnn.py:576] : Epoch 65 train loss: 4.14e-01 val loss: 4.04e-01 for "phygnn"
INFO - 2022-03-01 13:30:55,132 [phygnn.py:576] : Epoch 66 train loss: 4.10e-01 val loss: 4.04e-01 for "phygnn"
INFO - 2022-03-01 13:31:03,468 [phygnn.py:576] : Epoch 67 train loss: 4.20e-01 val loss: 4.02e-01 for "phygnn"
INFO - 2022-03-01 13:31:11,777 [phygnn.py:576] : Epoch 68 train loss: 4.14e-01 val loss: 4.03e-01 for "phygnn"
INFO - 2022-03-01 13:31:20,321 [phygnn.py:576] : Epoch 69 train loss: 4.12e-01 val loss: 4.03e-01 for "phygnn"
INFO - 2022-03-01 13:31:28,921 [phygnn.py:576] : Epoch 70 train loss: 4.20e-01 val loss: 4.02e-01 for "phygnn"
INFO - 2022-03-01 13:31:37,406 [phygnn.py:576] : Epoch 71 train loss: 4.12e-01 val loss: 4.02e-01 for "phygnn"
INFO - 2022-03-01 13:31:45,930 [phygnn.py:576] : Epoch 72 train loss: 4.18e-01 val loss: 4.02e-01 for "phygnn"
INFO - 2022-03-01 13:31:54,353 [phygnn.py:576] : Epoch 73 train loss: 4.10e-01 val loss: 4.00e-01 for "phygnn"
INFO - 2022-03-01 13:32:02,835 [phygnn.py:576] : Epoch 74 train loss: 4.15e-01 val loss: 4.00e-01 for "phygnn"
INFO - 2022-03-01 13:32:11,281 [phygnn.py:576] : Epoch 75 train loss: 4.15e-01 val loss: 4.01e-01 for "phygnn"
INFO - 2022-03-01 13:32:19,513 [phygnn.py:576] : Epoch 76 train loss: 4.12e-01 val loss: 4.00e-01 for "phygnn"
INFO - 2022-03-01 13:32:28,023 [phygnn.py:576] : Epoch 77 train loss: 4.13e-01 val loss: 4.00e-01 for "phygnn"
INFO - 2022-03-01 13:32:36,475 [phygnn.py:576] : Epoch 78 train loss: 4.10e-01 val loss: 4.02e-01 for "phygnn"
INFO - 2022-03-01 13:32:44,799 [phygnn.py:576] : Epoch 79 train loss: 4.13e-01 val loss: 3.98e-01 for "phygnn"
INFO - 2022-03-01 13:32:53,214 [phygnn.py:576] : Epoch 80 train loss: 4.06e-01 val loss: 3.97e-01 for "phygnn"
INFO - 2022-03-01 13:33:01,732 [phygnn.py:576] : Epoch 81 train loss: 4.07e-01 val loss: 3.98e-01 for "phygnn"
INFO - 2022-03-01 13:33:10,139 [phygnn.py:576] : Epoch 82 train loss: 4.08e-01 val loss: 3.98e-01 for "phygnn"
INFO - 2022-03-01 13:33:18,494 [phygnn.py:576] : Epoch 83 train loss: 4.13e-01 val loss: 3.98e-01 for "phygnn"
INFO - 2022-03-01 13:33:26,891 [phygnn.py:576] : Epoch 84 train loss: 4.10e-01 val loss: 3.97e-01 for "phygnn"
INFO - 2022-03-01 13:33:35,285 [phygnn.py:576] : Epoch 85 train loss: 4.05e-01 val loss: 3.97e-01 for "phygnn"
INFO - 2022-03-01 13:33:43,468 [phygnn.py:576] : Epoch 86 train loss: 4.12e-01 val loss: 3.97e-01 for "phygnn"
INFO - 2022-03-01 13:33:51,899 [phygnn.py:576] : Epoch 87 train loss: 4.08e-01 val loss: 3.96e-01 for "phygnn"
INFO - 2022-03-01 13:33:59,946 [phygnn.py:576] : Epoch 88 train loss: 4.13e-01 val loss: 3.96e-01 for "phygnn"
INFO - 2022-03-01 13:34:08,291 [phygnn.py:576] : Epoch 89 train loss: 4.06e-01 val loss: 3.96e-01 for "phygnn"
INFO - 2022-03-01 13:34:16,723 [phygnn.py:576] : Epoch 90 train loss: 4.07e-01 val loss: 3.96e-01 for "phygnn"
INFO - 2022-03-01 13:34:25,223 [phygnn.py:576] : Epoch 91 train loss: 4.09e-01 val loss: 3.95e-01 for "phygnn"
INFO - 2022-03-01 13:34:33,618 [phygnn.py:576] : Epoch 92 train loss: 4.09e-01 val loss: 3.96e-01 for "phygnn"
INFO - 2022-03-01 13:34:41,957 [phygnn.py:576] : Epoch 93 train loss: 4.01e-01 val loss: 3.93e-01 for "phygnn"
INFO - 2022-03-01 13:34:50,091 [phygnn.py:576] : Epoch 94 train loss: 4.01e-01 val loss: 3.93e-01 for "phygnn"
INFO - 2022-03-01 13:34:58,225 [phygnn.py:576] : Epoch 95 train loss: 4.11e-01 val loss: 3.94e-01 for "phygnn"
INFO - 2022-03-01 13:35:06,538 [phygnn.py:576] : Epoch 96 train loss: 4.03e-01 val loss: 3.93e-01 for "phygnn"
INFO - 2022-03-01 13:35:14,724 [phygnn.py:576] : Epoch 97 train loss: 4.06e-01 val loss: 3.92e-01 for "phygnn"
INFO - 2022-03-01 13:35:22,779 [phygnn.py:576] : Epoch 98 train loss: 4.11e-01 val loss: 3.93e-01 for "phygnn"
INFO - 2022-03-01 13:35:31,201 [phygnn.py:576] : Epoch 99 train loss: 4.07e-01 val loss: 3.93e-01 for "phygnn"
INFO - 2022-03-01 13:35:32,090 [trainer.py:92] : Training part B - data and phygnn. Loss is [0.5, 0.5]
INFO - 2022-03-01 13:35:56,322 [phygnn.py:576] : Epoch 100 train loss: 2.81e-01 val loss: 2.72e-01 for "phygnn"
INFO - 2022-03-01 13:36:10,736 [phygnn.py:576] : Epoch 101 train loss: 2.80e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:36:24,293 [phygnn.py:576] : Epoch 102 train loss: 2.76e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:36:38,507 [phygnn.py:576] : Epoch 103 train loss: 2.79e-01 val loss: 2.73e-01 for "phygnn"
INFO - 2022-03-01 13:36:52,741 [phygnn.py:576] : Epoch 104 train loss: 2.76e-01 val loss: 2.72e-01 for "phygnn"
INFO - 2022-03-01 13:37:06,451 [phygnn.py:576] : Epoch 105 train loss: 2.76e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:37:20,942 [phygnn.py:576] : Epoch 106 train loss: 2.79e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:37:35,238 [phygnn.py:576] : Epoch 107 train loss: 2.79e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:37:49,012 [phygnn.py:576] : Epoch 108 train loss: 2.79e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:38:02,798 [phygnn.py:576] : Epoch 109 train loss: 2.76e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:38:16,511 [phygnn.py:576] : Epoch 110 train loss: 2.76e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:38:30,950 [phygnn.py:576] : Epoch 111 train loss: 2.78e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:38:44,740 [phygnn.py:576] : Epoch 112 train loss: 2.82e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:38:58,894 [phygnn.py:576] : Epoch 113 train loss: 2.78e-01 val loss: 2.70e-01 for "phygnn"
INFO - 2022-03-01 13:39:12,417 [phygnn.py:576] : Epoch 114 train loss: 2.76e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:39:26,405 [phygnn.py:576] : Epoch 115 train loss: 2.75e-01 val loss: 2.70e-01 for "phygnn"
INFO - 2022-03-01 13:39:40,344 [phygnn.py:576] : Epoch 116 train loss: 2.82e-01 val loss: 2.70e-01 for "phygnn"
INFO - 2022-03-01 13:39:54,248 [phygnn.py:576] : Epoch 117 train loss: 2.76e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:40:08,588 [phygnn.py:576] : Epoch 118 train loss: 2.79e-01 val loss: 2.70e-01 for "phygnn"
INFO - 2022-03-01 13:40:22,728 [phygnn.py:576] : Epoch 119 train loss: 2.78e-01 val loss: 2.70e-01 for "phygnn"
INFO - 2022-03-01 13:40:36,724 [phygnn.py:576] : Epoch 120 train loss: 2.79e-01 val loss: 2.71e-01 for "phygnn"
INFO - 2022-03-01 13:40:50,235 [phygnn.py:576] : Epoch 121 train loss: 2.78e-01 val loss: 2.70e-01 for "phygnn"
INFO - 2022-03-01 13:41:03,981 [phygnn.py:576] : Epoch 122 train loss: 2.77e-01 val loss: 2.70e-01 for "phygnn"
INFO - 2022-03-01 13:41:17,999 [phygnn.py:576] : Epoch 123 train loss: 2.79e-01 val loss: 2.70e-01 for "phygnn"
INFO - 2022-03-01 13:41:32,098 [phygnn.py:576] : Epoch 124 train loss: 2.77e-01 val loss: 2.70e-01 for "phygnn"
INFO - 2022-03-01 13:41:46,088 [phygnn.py:576] : Epoch 125 train loss: 2.77e-01 val loss: 2.70e-01 for "phygnn"
ERROR - 2022-03-01 13:41:57,797 [phygnn.py:461] : phygnn calculated a NaN loss value!
Traceback (most recent call last):
  File "k_fold.py", line 50, in <module>
    t = Trainer(train_sites=train_sites, train_files=files, config=config)
  File "/home/gbuster/code/mlclouds/mlclouds/trainer.py", line 95, in __init__
    out = model.train_model(self.x, self.y, self.p,
  File "/home/gbuster/miniconda3/envs/nsrdb/lib/python3.8/site-packages/phygnn/model_interfaces/phygnn_model.py", line 187, in train_model
    diagnostics = self.model.fit(x, y, p,
  File "/home/gbuster/miniconda3/envs/nsrdb/lib/python3.8/site-packages/phygnn/phygnn.py", line 570, in fit
    tr_loss, tr_nn_loss, tr_p_loss = self.run_gradient_descent(
  File "/home/gbuster/miniconda3/envs/nsrdb/lib/python3.8/site-packages/phygnn/phygnn.py", line 482, in run_gradient_descent
    grad, loss, nn_loss, p_loss = self._get_grad(x, y_true, p, p_kwargs)
  File "/home/gbuster/miniconda3/envs/nsrdb/lib/python3.8/site-packages/phygnn/phygnn.py", line 473, in _get_grad
    loss, nn_loss, p_loss = self.calc_loss(y_true, y_predicted,
  File "/home/gbuster/miniconda3/envs/nsrdb/lib/python3.8/site-packages/phygnn/phygnn.py", line 462, in calc_loss
    raise ArithmeticError(msg)
ArithmeticError: phygnn calculated a NaN loss value!
