# Baggianalysis

Baggianalysis is a library aimed at simplifying the analysis of particle-based simulations. It makes it easy to parse, convert and analyse trajectories generated by simulation codes in a format-agnostic way. It is written in C++ and provides Python bindings. It is modular and can be extended from both C++ and Python.

```eval_rst
.. toctree::
   :maxdepth: 2
   
   install.md
```

## A simple example

The following code imports the `baggianalysis` module, parses a single LAMMPS data file, initialises its topology and performs some computations:

```python
import baggianalysis as ba
import numpy as np
import sys

if len(sys.argv) < 2:
    print("Usage is %s data_file" % sys.argv[0])
    sys.exit(1)

# initialise a parser for LAMMPS data files with atom_style "bond"
parser = ba.LAMMPSDataFileParser("bond")
# parse the system
system = parser.make_system(sys.argv[1])
# initialise the topology from the same file
topology = ba.Topology.make_topology_from_file(sys.argv[1], ba.parse_LAMMPS_topology)
# apply the topology to the system
topology.apply(system)

print("Number of molecules: %d" % len(system.molecules()))
# compute the centres of mass of all the molecules and store them in a list
coms = [mol.com() for mol in system.molecules()]
# print the centres of mass to the "coms.dat" file
np.savetxt("coms.dat", coms)
```
	
The library makes it straightforward to work with whole trajectories. Here is an example where we compute the centre of mass of the first molecule of the system averaged over a whole trajectory:

```python
import baggianalysis as ba
import numpy as np
import sys

if len(sys.argv) < 3:
    print("Usage is %s topology_file dir" % sys.argv[0])
    sys.exit(1)
	
parser = ba.LAMMPSDataFileParser("bond")
trajectory = ba.FullTrajectory(parser)
# the first parameter is the directory where the trajectory is stored
# the second parameter is the pattern that will be used to match the filenames
# the third parameter is True if we want to sort the files, False otherwise 
trajectory.initialise_from_folder(sys.argv[2], "no_*", True)
topology = ba.Topology.make_topology_from_file(sys.argv[1], ba.parse_LAMMPS_topology)

com = np.array([0., 0., 0.])
for system in trajectory.frames:
    topology.apply(system)
    com += system.molecules()[0].com()
    
print("The average COM is: %f %f %f" % tuple(com / len(trajectory.frames)))
```
	
Note that baggianalysis provides a {class}`~baggianalysis.core.LazyTrajectory` class that parses files one by one to avoid taking up too much memory. This can be useful when dealing with very large trajectories.
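
The difference between loading everything at once and loading frame by frame can be illustrated with a minimal, library-independent sketch. It does not use the baggianalysis API: `parse_frame`, `full_frames` and `lazy_frames` are hypothetical names introduced only for this illustration, and a "frame" is assumed to be a plain list of particle positions.

```python
from typing import Iterable, Iterator, List

# Hypothetical stand-in for a parsed frame: a list of 3D positions.
Frame = List[List[float]]


def parse_frame(text: str) -> Frame:
    """Toy parser: one particle per line, three whitespace-separated coordinates."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def full_frames(sources: Iterable[str]) -> List[Frame]:
    """Eager loading: every frame is parsed and kept in memory at the same time."""
    return [parse_frame(source) for source in sources]


def lazy_frames(sources: Iterable[str]) -> Iterator[Frame]:
    """Lazy loading: each frame is parsed only when the consumer asks for it."""
    for source in sources:
        yield parse_frame(source)


def average_first_position(frames: Iterable[Frame]) -> List[float]:
    """Average the position of the first particle over all frames, in a single pass."""
    total = [0.0, 0.0, 0.0]
    count = 0
    for frame in frames:
        for k in range(3):
            total[k] += frame[0][k]
        count += 1
    return [t / count for t in total]


# Three tiny in-memory "files", each holding two particles.
raw = ["0 0 0\n1 1 1\n", "2 2 2\n3 3 3\n", "4 4 4\n5 5 5\n"]

avg_full = average_first_position(full_frames(raw))
avg_lazy = average_first_position(lazy_frames(raw))
```

Both calls give the same average, but the lazy version only ever holds one parsed frame at a time, which is why single-pass computations such as a running average suit lazy loading well.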

## Features

* Supports parsing of Gromacs, LAMMPS and oxDNA configurations and trajectories out of the box. See [here](extending/parser.md) for instructions about how to write custom parsers.
* Configurations can be pre-filtered (by excluding some particles, or modifying others). See [here](core/filters.md) for a list of filters.
* Makes available some common (and less common) observables (mean-squared displacement, radial distribution function, bond-order parameters, *etc.*). See [here](core/observables.md) for the complete list of observables.

## Main classes

* Each particle is an instance of the {class}`~baggianalysis.core.Particle` class.
* Simulation snapshots are stored in {class}`~baggianalysis.core.System` objects, which have several attributes that let you retrieve the properties of the particles they contain.
* Multiple systems (also called *frames* in this context) can be stored in a [trajectory object](core/trajectories.md).
* The library provides a set of built-in [observables](core/observables.md) that can be used to analyse both single systems and whole trajectories.
* The {class}`~baggianalysis.core.Topology` class can be used to manage the topology of a configuration. Topologies can be initialised in two ways:
  * by hand, using the {meth}`~baggianalysis.core.Topology.make_empty_topology` static method to create a new topology and then adding bonds one after the other with the {meth}`~baggianalysis.core.Topology.add_bond` method
  * by using a helper function to parse the topology out of a file through the {meth}`~baggianalysis.core.Topology.make_topology_from_file` static method. Baggianalysis comes with some [ready-made](core/topology.md) functions that can be used to parse topologies.

## Logging

Several library methods and functions output logging information, which by default is printed to standard error. This behaviour can be altered by using the {meth}`~baggianalysis.core.set_logging_mode` function. Here are a few examples:

```python
import baggianalysis as ba

ba.set_logging_mode(ba.STDERR) # this is the default

ba.set_logging_mode(ba.SILENT) # switch off logging

ba.set_logging_mode(ba.FILE)   # redirect logging to "ba_log.txt"

ba.set_logging_mode(ba.FILE, "my_log.txt")   # redirect logging to "my_log.txt"
```

## Library API

```eval_rst
.. toctree::
   :maxdepth: 2
   
   core/core.md
   converters.md
   traj.md
   utils.md
```

## Extending baggianalysis

```eval_rst
.. toctree::
   :maxdepth: 2
   
   extending/parser.md
   extending/topology.md
```

## Notes

* By default, the core library is built as a dynamic (shared) library. However, if Python bindings are enabled, the core library is built as a static library.
* The timestep associated with a configuration **must** be an integer. If your preferred format stores it as a floating-point number, your parser will have to convert it to an integer. This is *by design*: the time of a configuration is used as a key in several maps throughout the code, and floating-point numbers make unreliable keys. Moreover, integers are stored exactly, whereas floats can lose precision.
* Normal trajectories need not load all the frames at once. Trajectories that do are called "full trajectories". In general, many observables do not require access to all frames at once, which means that frames can be parsed (and hence loaded) one by one when needed (lazy loading). This makes it possible to work on large trajectories without using too much memory.
* Lists of 3D vectors are copied when accessed from the Python side. This means that their C++ counterparts (which are `std::vector`s) are not modified when `append` or similar Python methods are used.
* Simple Python parsers can only be used to parse single `System`s or to initialise trajectories from file lists and folders. To do so, parsers should inherit from `BaseParser` and override the `parse_file` method, which takes a string as its only argument.
* Molecules built by the `Topology` class are named `mol_XXX`, where `XXX` is an index that runs from zero to the number of molecules minus one.
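
Regarding the note on integer timesteps, a custom parser for a format that stores a floating-point time could convert it along the following lines. This is only a sketch: `time_to_step` is a hypothetical helper that is not part of baggianalysis, and it assumes that the integration step `dt` of the simulation is known to the parser.

```python
def time_to_step(time: float, dt: float, tol: float = 1e-6) -> int:
    """Convert a floating-point simulation time to an integer number of steps.

    Hypothetical helper, not part of baggianalysis. `dt` is assumed to be the
    integration step of the simulation, and `tol` is a relative tolerance.
    """
    steps = time / dt
    # Round to the nearest integer rather than truncating: the division is
    # subject to rounding error, so plain int() truncation can be off by one.
    nearest = round(steps)
    # Refuse times that are not (close to) an integer multiple of dt.
    if abs(steps - nearest) > tol * max(1.0, abs(steps)):
        raise ValueError("time %r is not a multiple of dt %r" % (time, dt))
    return int(nearest)


# Integer steps are exact and hashable, so they are safe to use as map keys.
frames_by_step = {time_to_step(t, 0.001): "frame at t = %g" % t for t in (0.1, 0.2, 0.3)}
```

Raising an error for times that do not match a whole number of steps is a design choice of this sketch; a parser could equally decide to round silently or to rescale the time by a fixed factor.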