Examples
========

Load the PMF
------------

Given an output folder `/home/myname/Documents/PMF/GRE-cb/MobilAir_woOrga` that looks like:

```
MobilAir_woOrga
├── GRE-cb_BaseErrorEstimationSummary.xlsx
├── GRE-cb_base.xlsx
├── GRE-cb_boot.xlsx
├── GRE-cb_ConstrainedDISPest.dat
├── GRE-cb_ConstrainedDISPres1.txt
├── GRE-cb_ConstrainedDISPres2.txt
├── GRE-cb_ConstrainedDISPres3.txt
├── GRE-cb_ConstrainedDISPres4.txt
├── GRE-cb_ConstrainedErrorEstimationSummary.xlsx
├── GRE-cb_Constrained.xlsx
├── GRE-cb_diagnostics.xlsx
├── GRE-cb_DISPest.dat
├── GRE-cb_DISPres1.txt
├── GRE-cb_DISPres2.txt
├── GRE-cb_DISPres3.txt
├── GRE-cb_DISPres4.txt
├── GRE-cb_Gcon_profile_boot.xlsx
├── GRE-cb_rotational_comments.txt
└── GRE-cb_sourcecontributions.xls

```

To load these files into a PMF object, run the following command:

```python
from py4pm.pmfutilities import PMF

grecb = PMF(site="GRE-cb", BDIR="/home/myname/Documents/PMF/GRE-cb/MobilAir_woOrga")

```

Now, `grecb` is an instance of a PMF object, and provides many readers and plotters.

Read the data
-------------

### Organization

The `read` accessor of the PMF object gives access to the different readers that retrieve data
from the xlsx files output by the EPA PMF5 software.

Reader names start with `read_base*` or `read_constrained*`, for the base and constrained
runs, respectively.

The special method `read_metadata` retrieves the factor names and species names
from the `_base.xlsx` file, and uses them everywhere else. It also tries to set the name of
the total variable, if any (one of PM10, PM2.5, PMrecons, PM10rec, PM10recons; otherwise it
tries to guess). The total variable is used to convert units and is the default variable to plot.
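
The lookup of the total variable described above could be sketched as follows. This is only an illustration, not the package's actual implementation: the helper name `guess_total_variable` is hypothetical, and the fallback "guess" step is left out because its logic is not described here.

```python
# Hypothetical helper (not part of py4pm): pick the total variable from a species list.
KNOWN_TOTAL_VARIABLES = ["PM10", "PM2.5", "PMrecons", "PM10rec", "PM10recons"]


def guess_total_variable(species):
    """Return the first known total-variable name found in `species`, else None."""
    for name in KNOWN_TOTAL_VARIABLES:
        if name in species:
            return name
    # The real fallback ("try to guess") is not documented here, so we give up.
    return None


species = ["PMrecons", "OC*", "EC", "Cl-", "NO3-", "SO42-"]
total = guess_total_variable(species)
print(total)
```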

For now, the following readers are implemented:

 - [read_metadata](api_py4pm.html#py4pm.pmfutilities.ReaderAccessor.read_metadata)
 - [read_base_contributions](api_py4pm.html#py4pm.pmfutilities.ReaderAccessor.read_base_contributions)
 - [read_base_profiles](api_py4pm.html#py4pm.pmfutilities.ReaderAccessor.read_base_profiles)
 - [read_base_bootstrap](api_py4pm.html#py4pm.pmfutilities.ReaderAccessor.read_base_bootstrap)
 - [read_base_uncertainties_summary](api_py4pm.html#py4pm.pmfutilities.ReaderAccessor.read_base_uncertainties_summary)
 - [read_constrained_contributions](api_py4pm.html#py4pm.pmfutilities.ReaderAccessor.read_constrained_contributions)
 - [read_constrained_profiles](api_py4pm.html#py4pm.pmfutilities.ReaderAccessor.read_constrained_profiles)
 - [read_constrained_bootstrap](api_py4pm.html#py4pm.pmfutilities.ReaderAccessor.read_constrained_bootstrap)
 - [read_constrained_uncertainties_summary](api_py4pm.html#py4pm.pmfutilities.ReaderAccessor.read_constrained_uncertainties_summary)

### Contribution

The contributions of the factors (`G` matrix) are read from the `_base.xlsx` and
`_Constrained.xlsx` files, sheet `contributions`.
You can read them using the readers `read_base_contributions` and
`read_constrained_contributions`:

```python
grecb.read.read_base_contributions()
grecb.read.read_constrained_contributions()

```

The `grecb` object now has `dfcontrib_b` and `dfcontrib_c` attributes (`_b` for the
base run, `_c` for the constrained run):

```python
>>> grecb.dfcontrib_c

            Sulfate-rich  Nitrate-rich  ...  Biomass burning  Sea/road salt  Mineral dust
Date                                    ...
2017-02-28      0.321580     -0.105980  ...          0.19419       0.606290      0.182880
2017-03-03      0.429480     -0.038802  ...          0.61595       0.050129      0.382890
2017-03-06     -0.098123     -0.151530  ...          0.53346       4.636400      0.272410
2017-03-09      0.643500     -0.002527  ...          1.09060       0.153200      1.083600
2017-03-12      0.664090      0.308390  ...          1.70740      -0.200000      0.846930

```

which is the `G` matrix, in normalized units.

### Chemical profiles

The chemical profiles (or simply profiles) are the `F` matrix of the PMF (in `µg/m³`) and
are read from the `_base.xlsx` and `_Constrained.xlsx` files, sheet `Profiles`.
You can read them using the readers `read_base_profiles` and `read_constrained_profiles`:

```python
grecb.read.read_base_profiles()
grecb.read.read_constrained_profiles()

```

and `grecb` now has non-empty `dfprofiles_b` and `dfprofiles_c` dataframes:

```python
>>> grecb.dfprofiles_c
              Sulfate-rich  Nitrate-rich  ...  Biomass burning  Sea/road salt  Mineral dust
specie                                    ... 
PMrecons          4.402500      2.421300  ...         3.027900       0.364280      2.009600
OC*               1.225300      0.000000  ...         1.308900       0.041038      0.428110
EC                0.162970      0.000000  ...         0.347050       0.019199      0.030703
Cl-               0.000000      0.002425  ...         0.026819       0.109070      0.000000
NO3-              0.300660      1.702200  ...         0.093396       0.000000      0.000000
SO42-             0.977680      0.010441  ...         0.092800       0.032969      0.189890
...                    ...           ...  ...             ...            ...           ...

```

The values are in `µg/m³`.

### Uncertainties

#### Summary

You can also read the bootstrap and DISP results from the
`_BaseErrorEstimationSummary.xlsx` and `_ConstrainedErrorEstimationSummary.xlsx` files.

```python
grecb.read.read_base_uncertainties_summary()
grecb.read.read_constrained_uncertainties_summary()

```

You now have access to `df_uncertainties_summary_b` and `df_uncertainties_summary_c`:
the summaries of the BS, DISP and BS-DISP uncertainties for each profile and species.

```python
>>> grecb.df_uncertainties_summary_c
                       Constrained base run    BS 5th  BS median   BS 95th  BS-DISP 5th  BS-DISP average  BS-DISP 95th  DISP Min  DISP average  DISP Max
profile      specie
Sulfate-rich PMrecons              4.402500  4.261867   4.511374  4.709612          NaN              NaN           NaN  3.788500      4.337850  4.887200
             OC*                   1.225300  0.822712   1.161025  1.702325          NaN              NaN           NaN  0.988480      1.211690  1.434900
             EC                    0.162970  0.051262   0.211147  0.436615          NaN              NaN           NaN  0.121070      0.213030  0.304990
             Cl-                   0.000000  0.000000   0.000000  0.000000          NaN              NaN           NaN  0.000000      0.006156  0.012311
             NO3-                  0.300660  0.000000   0.346984  0.563892          NaN              NaN           NaN  0.068862      0.260436  0.452010
...                                     ...       ...        ...       ...          ...              ...           ...       ...           ...       ...
Mineral dust Se                    0.000008  0.000000   0.000012  0.000029          NaN              NaN           NaN  0.000000      0.000023  0.000046
             Sn                    0.000000  0.000000   0.000032  0.000154          NaN              NaN           NaN  0.000000      0.000069  0.000139
             Ti                    0.002545  0.001121   0.001750  0.002546          NaN              NaN           NaN  0.002881      0.003464  0.004047
             V                     0.000265  0.000063   0.000145  0.000249          NaN              NaN           NaN  0.000265      0.000278  0.000290
             Zn                    0.000218  0.000000   0.000030  0.001286          NaN              NaN           NaN  0.000000      0.000177  0.000354

```

#### All bootstrap profiles

If you want to retrieve the individual bootstrap results, read them from
`_boot.xlsx` and `_Gcon_profile_boot.xlsx`:

```python
grecb.read.read_base_bootstrap()
grecb.read.read_constrained_bootstrap()

```

and now you have access to `dfBS_profile_b` and `dfBS_profile_c`, which are all the
bootstrap chemical profiles for the base and constrained run, respectively.

```python
>>> grecb.dfBS_profile_c
                              Boot0     Boot1     Boot2  ...    Boot97    Boot98   Boot100
specie   profile                                         ... 
PMrecons Sulfate-rich      4.412330  2.259480  4.330630  ...  3.191810  4.041220  3.109190
         Nitrate-rich      2.462740  2.254470  2.609910  ...  2.068200  2.349640  2.404520
         Industrial        0.259120  0.289952  0.474484  ...  0.214298  0.206250  0.875102
         Primary biogenic  0.579702  1.437820  0.633064  ...  1.290640  0.358833  0.296207
         Primary traffic   1.862990  1.178150  1.711440  ...  1.171830  1.974060  1.678320
...                             ...       ...       ...  ...       ...       ...       ...
Zn       Marine SOA        0.000826  0.002239  0.001256  ...  0.000265  0.000389  0.000436
         Aged seasalt      0.000000  0.000000  0.000000  ...  0.000814  0.000304  0.001018
         Biomass burning   0.002404  0.001699  0.002053  ...  0.002012  0.002270  0.001188
         Sea/road salt     0.000625  0.000457  0.000848  ...  0.000234  0.000596  0.000187
         Mineral dust      0.000000  0.000000  0.000000  ...  0.001355  0.000000  0.000000

```

as well as `dfbootstrap_mapping_b` and `dfbootstrap_mapping_c`, which are the
tables of the mapping between reference and BS factors:

```python
>>> grecb.dfbootstrap_mapping_c
                    Sulfate-rich Nitrate-rich Industrial Primary biogenic Primary traffic Marine SOA Aged seasalt Biomass burning Sea/road salt Mineral dust unmapped
BF-Sulfate-rich               94            0          2                0               3          0            0               0             0            0        0
BF-Nitrate-rich                0           99          0                0               0          0            0               0             0            0        0
BF-Industrial                  0            0         99                0               0          0            0               0             0            0        0
BF-Primary biogenic            0            0          0               99               0          0            0               0             0            0        0
BF-Primary traffic             0            0          0                0              99          0            0               0             0            0        0
BF-Marine SOA                  0            0          0                0               1         98            0               0             0            0        0
BF-Aged seasalt                0            0          0                0               0          0           99               0             0            0        0
BF-Biomass burning             0            0          0                0               0          0            0              99             0            0        0
BF-Sea/road salt               0            0          0                0               0          0            0               0            99            0        0
BF-Mineral dust                0            0          0                0               0          0            0               0             0           99        0

```
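
As an illustration of how the individual bootstrap profiles relate to the summary table, the sketch below computes the 5th, 50th and 95th percentiles across bootstrap runs with plain pandas. The small dataframe is made-up data that only mimics the layout of `dfBS_profile_c`; with a real PMF object you would start from `grecb.dfBS_profile_c` instead. Whether this reproduces the EPA PMF5 summary values exactly is an assumption that is not checked here.

```python
import pandas as pd

# Made-up values mimicking the layout of dfBS_profile_c:
# one row per (specie, profile), one column per bootstrap run.
index = pd.MultiIndex.from_tuples(
    [("PMrecons", "Sulfate-rich"), ("PMrecons", "Nitrate-rich")],
    names=["specie", "profile"],
)
df_bs = pd.DataFrame(
    [[4.0, 2.0, 3.0, 5.0, 1.0], [2.5, 2.0, 3.0, 2.2, 2.8]],
    index=index,
    columns=[f"Boot{i}" for i in range(5)],
)

# Percentiles across the bootstrap runs (columns), one row per (specie, profile).
bs_summary = df_bs.quantile([0.05, 0.5, 0.95], axis=1).T
bs_summary.columns = ["BS 5th", "BS median", "BS 95th"]
print(bs_summary)
```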


Plot utilities
--------------

### Chemical profile (per microgram of total variable)

### Chemical profile (in percentage of the sum of each species)

### Contribution time series and uncertainties

```python
grecb.plot.plot_contrib(profiles=["Primary biogenic"])

```

will produce the following graph:

```eval_rst
.. figure:: images/timeseries_POA.png
   :scale: 50 %
   :alt: Time series of POA
   :align: center

   Primary biogenic factor contribution to the total variable.

```

Since the EPA PMF5 does not output the chemical profile (F) matrix of the bootstrap, the uncertainties are estimated by computing the species concentration from the F matrix of the reference run and the G matrix of the bootstrap run. As a result, the output is "hacky", since in the bootstrap method both the F and G matrices change. If you want to remove the uncertainties from the plot, just pass `BS=False` to the method.


Utilities
---------

### Convert to cubic meter

In order to have the contributions in `µg/m³`, which is given by `G⋅F`, we need to know
both the chemical profile `F` and the contribution `G`.
We can then easily reconstruct the time series in `µg/m³` of each species for every profile
by multiplying the contribution time series by the concentration of that species in the
chemical profile.
Since this is a very common computation, the method `to_cubic_meter` does just that:

```python
>>> grecb.to_cubic_meter()
            Sulfate-rich  Nitrate-rich  ... Biomass burning  Sea/road salt  Mineral dust
Date                                    ...
2017-02-28      1.415756     -0.256609  ...        0.587988       0.220859      0.367516
2017-03-03      1.890786     -0.093951  ...        1.865035       0.018261      0.769456
2017-03-06     -0.431987     -0.366900  ...        1.615264       1.688948      0.547435
2017-03-09      2.833009     -0.006120  ...        3.302228       0.055808      2.177603
2017-03-12      2.923656      0.746705  ...        5.169836      -0.072856      1.701991
...                  ...           ...  ...             ...            ...           ...

```

Note that `to_cubic_meter` uses by default the constrained run, all the profiles
and the total variable, but you can specify other conditions (see [the doc of
this method](api_py4pm.html#py4pm.pmfutilities.PMF.to_cubic_meter)).
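
To make the conversion concrete, here is a minimal pandas sketch of the same multiplication, done by hand on a few values copied from the constrained-run tables shown earlier (two factors, two dates). The variable names `G`, `F` and `contrib_ugm3` are chosen for this illustration only; this is not the package's implementation.

```python
import pandas as pd

# Normalized contributions (G), copied from the dfcontrib_c example above.
G = pd.DataFrame(
    {"Sulfate-rich": [0.321580, 0.429480], "Mineral dust": [0.182880, 0.382890]},
    index=pd.to_datetime(["2017-02-28", "2017-03-03"]),
)
# Chemical profiles (F) in µg/m³, copied from the dfprofiles_c example above.
F = pd.DataFrame(
    {"Sulfate-rich": [4.402500, 1.225300], "Mineral dust": [2.009600, 0.428110]},
    index=["PMrecons", "OC*"],
)

# Scale every column of G by the concentration of the chosen species
# in the matching profile (pandas aligns the Series on the column names).
specie = "PMrecons"
contrib_ugm3 = G * F.loc[specie]
print(contrib_ugm3.round(6))
```

The first row gives about 1.415756 for *Sulfate-rich* and 0.367516 for *Mineral dust*, matching the table above.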

### Relative contributions of species to the total mass

By default, the profile matrix `F` is in µg/m³. But it is often convenient to know the
relative contribution of each species to the "total variable" mass (for instance, the share
of each species in the $PM_{10}$).
This is the ratio of each species in a profile to the total variable in that profile.

The method `to_relative_mass` handles this and returns a new dataframe:

```python
>>> grecb.to_relative_mass()
              Sulfate-rich  Nitrate-rich  ... Biomass burning  Sea/road salt  Mineral dust
specie                                    ...
PMrecons          1.000000      1.000000  ...        1.000000       1.000000      1.000000
OC*               0.278319      0.000000  ...        0.432280       0.112655      0.213032
EC                0.037018      0.000000  ...        0.114617       0.052704      0.015278
Cl-               0.000000      0.001002  ...        0.008857       0.299413      0.000000
NO3-              0.068293      0.703011  ...        0.030845       0.000000      0.000000
SO42-             0.222074      0.004312  ...        0.030648       0.090505      0.094491
...                    ...           ...  ...             ...            ...           ...

```

The values are now fractions of the PMrecons mass (multiply by 100 to get percentages).
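
The ratio itself is a one-line pandas operation. The sketch below redoes it by hand on a few values copied from the constrained profiles shown earlier; the variable names are illustrative only and this is not the package's implementation.

```python
import pandas as pd

# Chemical profiles (F) in µg/m³, copied from the dfprofiles_c example above.
F = pd.DataFrame(
    {
        "Sulfate-rich": [4.402500, 1.225300, 0.162970],
        "Mineral dust": [2.009600, 0.428110, 0.030703],
    },
    index=["PMrecons", "OC*", "EC"],
)

# Divide every species by the total variable of the same profile
# (pandas aligns the PMrecons row on the column names).
relative_mass = F / F.loc["PMrecons"]
print(relative_mass.round(6))
```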

### Relative contribution of the factor for each species

Another useful piece of information is how a given species is apportioned among all
factors, shown as the *total specie sum* graph in the EPA PMF5 software. It is
the amount of a given species in a factor divided by the sum of this species in
all factors.

The method `get_total_specie_sum` returns this value, in percent, for every species in all profiles:

```python
>>> grecb.get_total_specie_sum()
              Sulfate-rich  Nitrate-rich  ...  Biomass burning  Sea/road salt  Mineral dust
specie                                    ...
PMrecons         27.520080     15.135575  ...        18.927439       2.277119     12.562033
OC*              30.440474      0.000000  ...        32.517372       1.019519     10.635658
EC               14.525003      0.000000  ...        30.931475       1.711146      2.736462
Cl-               0.000000      1.506558  ...        16.659544      67.752580      0.000000
NO3-             11.676593     66.107550  ...         3.627177       0.000000      0.000000
SO42-            66.571611      0.710942  ...         6.318883       2.244906     12.929878
...                    ...           ...  ...             ...            ...           ...

```

In this example, the *Biomass burning* factor accounts for about 19% of the total
PMrecons, 33% of the OC*, 31% of the EC, etc. We also see that *NO3-* is
mainly apportioned to the *Nitrate-rich* factor (66%).
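
The normalization behind these numbers can be sketched with pandas as follows. The two-factor profile matrix is made-up data (factor names `Factor A` and `Factor B` are hypothetical), because the tables above only show part of the factors and the row sums need all of them; this is an illustration, not the package's implementation.

```python
import pandas as pd

# Made-up profile matrix (µg/m³) with only two hypothetical factors.
F = pd.DataFrame(
    {"Factor A": [3.0, 0.2, 0.0], "Factor B": [1.0, 0.6, 0.5]},
    index=["PMrecons", "OC*", "Cl-"],
)

# For each species (row), divide by the sum over all factors and express it in percent.
total_specie_sum = F.div(F.sum(axis=1), axis=0) * 100
print(total_specie_sum)
```

Each row of the result sums to 100, so a value reads directly as the percentage of that species apportioned to a factor.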