This document describes a MongoDB database for vaspy calculations. Each calculation is saved as a document in the database with its atomic geometry, calculation parameters, and calculation results. Some data is stored redundantly to make queries easier.
Here is a typical stored document.
from vasp.mongo import MongoDatabase
db = MongoDatabase()
import pprint
pprint.pprint(next(db.find({'calculator.class': 'Vasp'}, limit=1)))
It is easy to write atoms with arbitrary key-value pairs. However, the database does not check for duplicates, so you have to decide yourself whether a write would add a duplicate entry.
Here is an example of adding an entry to the database.
from ase import Atoms
from ase.calculators.emt import EMT
h2 = Atoms('H2', [(0, 0, 0), (0, 0, 0.7)])
h2.calc = EMT()
print(h2.get_forces())
from vasp.mongo import MongoDatabase, mongo_doc
db = MongoDatabase()
doc = mongo_doc(h2)
print(db.write(doc, relaxed=False))
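One possible way to decide whether a write would add a duplicate is to store a content fingerprint with each document and query for it before writing. The sketch below uses only the standard library; the helper name `doc_fingerprint` and the idea of a `fingerprint` key are assumptions for illustration, not part of vasp.mongo.

```python
import hashlib
import json


def doc_fingerprint(doc, ignore=("_id",)):
    """Return a stable SHA-1 hex digest of a JSON-serializable document."""
    # Drop top-level keys that differ between otherwise identical entries.
    trimmed = {k: v for k, v in doc.items() if k not in ignore}
    # sort_keys makes the serialization independent of key order.
    payload = json.dumps(trimmed, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Two hypothetical documents with the same content in a different key order.
doc_a = {"atoms": {"natoms": 2, "symbol_counts": {"H": 2}}, "relaxed": False}
doc_b = {"relaxed": False, "atoms": {"symbol_counts": {"H": 2}, "natoms": 2}}
print(doc_fingerprint(doc_a) == doc_fingerprint(doc_b))
```

If the fingerprint were written as an extra key-value pair, a find on that key before each write would reveal an existing copy. Note that tiny floating-point differences in positions or energies change the digest, so this only catches exact duplicates.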
You can query the database by an id:
from vasp.mongo import MongoDatabase
from bson import ObjectId
db = MongoDatabase()
c = db.get_atoms({'_id': ObjectId('58c0b7d2340e3b29a6c6d4d2')})
a = next(c)
print(a)
print(a.get_potential_energy())
print(a.get_forces())
Searching by id is only practical if you already know the id. Here we query a different way, using some parameters and the calculator type.
from vasp.mongo import MongoDatabase
db = MongoDatabase()
hits = db.find({'calculator.class': 'EMT',
                'atoms.symbol_counts.H': 2,
                'atoms.natoms': 2,
                'relaxed': False})
for hit in hits:
    print(hit)
We can add any kind of calculator.
from ase.calculators.singlepoint import SinglePointCalculator
from ase import Atoms
h2 = Atoms('H2', [(0, 0, 0), (0, 0, 0.7)])
calc = SinglePointCalculator(energy=0.0, atoms=h2)
h2.calc = calc
from vasp.mongo import MongoDatabase, mongo_doc
db = MongoDatabase()
print(db.write(mongo_doc(h2)))
from vasp.mongo import MongoDatabase
from bson import ObjectId
db = MongoDatabase()
c = db.find({'_id': ObjectId('58c0b835340e3b2a2afb4999')})
print(next(c))
We assume a MongoDB server is running on localhost at port 27017, with an “ase” database and “atoms” collection by default. You can set all of these with arguments to MongoDatabase(). The server does not currently start automatically, so it has to be started after each reboot. There is currently no security on the database.
Once connected, you create atoms and write them to the database. You can write arbitrary key-value pairs to the database, as long as the values can be serialized to JSON.
Start here: https://docs.mongodb.com/manual/
Query intro: https://docs.mongodb.com/manual/crud/#read-operations
Find by pathtags, which are just the directories that make up the calculation path. The order is not important.
The MongoDatabase initializer returns the database object. There is a db.collection attribute that is the actual collection you want to work on. There are a few thin wrappers for functions like find and count.
The find function returns a pymongo cursor, which is an iterator that yields documents. The documents are basically Python dictionaries.
import pprint
from vasp.mongo import MongoDatabase
db = MongoDatabase()
c = db.find({'calculator.pathtags': {'$all': ['O2-sp-triplet', 'molecules']}})
print(c.count())
pprint.pprint(next(c))
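To make the pathtags idea concrete, here is a minimal sketch of how a path could be split into tags and how an order-independent `$all` match behaves. The helper names and the example path are assumptions for illustration; the tags actually stored by vasp.mongo may be derived differently.

```python
def path_to_tags(path):
    """Split a path into its non-empty directory components."""
    return [part for part in path.split("/") if part]


def matches_all(tags, wanted):
    """Mimic $all: every wanted value must be present, in any order."""
    return set(wanted).issubset(tags)


# Hypothetical calculation path, used only to illustrate the splitting.
tags = path_to_tags("/home-research/jkitchin/dft-book/molecules/O2-sp-triplet")
print(tags)
print(matches_all(tags, ["O2-sp-triplet", "molecules"]))
print(matches_all(tags, ["O2-sp-triplet", "surfaces"]))
```

Because the match is a subset test, listing the wanted tags in a different order gives the same result.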
To search by formula, say NH3, we query by element type and count. We also specify natoms to avoid matching slabs with adsorbates of this composition.
from pprint import pprint
from vasp.mongo import MongoDatabase
db = MongoDatabase()
c = db.find({'atoms.symbol_counts.N': 1,
             'atoms.symbol_counts.H': 3,
             'atoms.natoms': 4})
print(c.count())
pprint(next(c))
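A composition query like this can be generated from a formula string. The following is a rough sketch, assuming simple formulas without parentheses; `formula_query` is a hypothetical helper, and only the field names come from the queries in this document.

```python
import re


def formula_query(formula):
    """Build a composition query from a simple formula such as 'NH3'."""
    counts = {}
    # Each match is an element symbol followed by an optional count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    query = {"atoms.symbol_counts." + s: n for s, n in counts.items()}
    # Fixing natoms excludes larger systems that merely contain this composition.
    query["atoms.natoms"] = sum(counts.values())
    return query


print(formula_query("NH3"))
```

The resulting dictionary could then be passed to db.find in place of the hand-written query.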
Here we find calculations containing N and H.
from pprint import pprint
from vasp.mongo import MongoDatabase
db = MongoDatabase()
c = db.find({'atoms.chemical_symbols': {'$all': ['N', 'H']}})
print(c.count())
You can use dot notation to search for fields in subdocuments.
import numpy as np
from vasp.mongo import MongoDatabase
db = MongoDatabase()
c = db.find({'calculator.parameters.hfscreen': 0.2})
print(c.count())
# find special setups
c = db.find({'calculator.parameters.setups': {'$exists': True}})
print(c.count())
for doc in c: print(doc['calculator']['parameters']['setups'])
# find NEB calculations
c = db.find({'calculator.parameters.images': {'$exists': True}})
print(c.count())
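As a rough mental model, a dotted key walks down through nested dictionaries one level per component. The sketch below mimics that lookup on a plain Python dictionary; `get_by_dots` and the sample document are illustrative assumptions, and real MongoDB dot notation also descends into arrays, which this does not.

```python
def get_by_dots(doc, dotted_key, default=None):
    """Follow a dotted key through nested dictionaries."""
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# Hypothetical document fragment with the same nesting as the queries above.
doc = {"calculator": {"class": "Vasp",
                      "parameters": {"encut": 350, "hfscreen": 0.2}}}
print(get_by_dots(doc, "calculator.parameters.hfscreen"))
print(get_by_dots(doc, "calculator.parameters.setups"))
```

A missing key returns the default here, which corresponds loosely to a document that an `$exists: True` query would skip.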
Here we filter by spacegroup to get a set of calculations we could use for an equation of state of fcc Cu. We match on a regular expression on the spacegroup since it is stored as a string with the number in parentheses.
import numpy as np
from vasp.mongo import MongoDatabase
db = MongoDatabase()
eos = db.find({'atoms.symbol_counts.Cu': 1, 'atoms.natoms': 1,
               # escape the parentheses so they match literally
               'atoms.spacegroup': {'$regex': r'\(225\)'},
               'calculator.parameters.kpts': [8, 8, 8],
               'calculator.parameters.encut': 350},
              projection={'_id': 0,  # do not show id
                          'calculator.pathtags': 1,
                          'calculator.energy': 1,
                          'atoms.volume': 1})
print(eos.count())
for c in eos:
    print(c)
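When matching a number in parentheses, it helps to remember that unescaped parentheses form a group in a regular expression rather than matching literally. MongoDB uses Perl-compatible regular expressions, so Python's `re` module is a reasonable stand-in for a quick check. The spacegroup strings below are hypothetical examples of the stored format.

```python
import re

# Hypothetical stored values; the exact string format is an assumption.
spacegroups = ["Fm-3m (225)", "Pm-3m (221)", "P4_2/mnm (136)"]

loose = "(22)"        # a group containing "22": matches any string with "22"
strict = r"\(225\)"   # literal parentheses around 225

print([s for s in spacegroups if re.search(loose, s)])
print([s for s in spacegroups if re.search(strict, s)])
# re.escape builds the escaped pattern for you.
print(re.escape("(225)"))
```

The loose pattern also picks up spacegroup 221 here, while the escaped pattern matches only the string containing "(225)".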
This shows we can rebuild a calculator from the database.
from vasp.mongo import MongoDatabase
from vasp import Vasp
db = MongoDatabase()
c = next(db.find({'atoms.symbol_counts.O': 1}))
calc = Vasp(c['calculator']['path'], c['calculator']['parameters'])
print(calc)
Vasp calculation directory:
  /home-research/jkitchin/dft-book/blog/source/org/molecules/co-1.05
Unit cell:
         x       y       z      |v|
  v0   6.000   0.000   0.000   6.000 Ang
  v1   0.000   6.000   0.000   6.000 Ang
  v2   0.000   0.000   6.000   6.000 Ang
  alpha, beta, gamma (deg): 90.0 90.0 90.0
  Total volume: 216.000 Ang^3
  Stress:     xx      yy      zz      yz      xz      xy
          -0.060   0.011   0.011  -0.000  -0.000  -0.000 GPa
  ID  tag  sym     x       y       z     rmsF (eV/A)
   0    0    C   0.000   0.000   0.000   14.93
   1    0    O   1.050   0.000   0.000   14.93
  Potential energy: -14.2158 eV
INPUT Parameters:
  lcharg : False
  pp     : PBE
  nbands : 6
  xc     : pbe
  ismear : 1
  lwave  : False
  sigma  : 0.01
  kpts   : [1, 1, 1]
  encut  : 350
Pseudopotentials used:
  C: potpaw_PBE/C/POTCAR (git-hash: ee4d8576584f8e9f32e90853a0cbf9d4a9297330)
  O: potpaw_PBE/O/POTCAR (git-hash: 592f34096943a6f30db8749d13efca516d75ec55)
from vasp.mongo import MongoDatabase
from vasp import Vasp
db = MongoDatabase()
atoms = next(db.get_atoms({'calculator.path': '/home-research/jkitchin/dft-book/molecules/O_s'}))
calc = atoms.get_calculator()
print(calc)
Vasp calculation directory:
  /home-research/jkitchin/dft-book/molecules/O_s
Unit cell:
         x       y       z      |v|
  v0   6.000   0.000   0.000   6.000 Ang
  v1   0.000   6.000   0.000   6.000 Ang
  v2   0.000   0.000   6.000   6.000 Ang
  alpha, beta, gamma (deg): 90.0 90.0 90.0
  Total volume: 216.000 Ang^3
  Stress:     xx      yy      zz      yz      xz      xy
           0.001   0.001   0.001  -0.000  -0.000  -0.000 GPa
  ID  tag  sym     x       y       z     rmsF (eV/A)
   0    0    O   5.000   5.000   5.000   0.00
  Potential energy: -1.5056 eV
INPUT Parameters:
  magmom : [1.0]
  pp     : PBE
  setups : 'O', '_s'
  kpts   : [1, 1, 1]
  encut  : 300
  lcharg : False
  xc     : pbe
  ispin  : 2
  ismear : 0
  lwave  : False
  sigma  : 0.001
  lorbit : 11
Pseudopotentials used:
  O: potpaw_PBE/O_s/POTCAR (git-hash: b4bfc67547c457885a1cc949eeda825354a6520a)
from vasp.mongo import MongoDatabase
from vasp import Vasp
db = MongoDatabase()
atoms = next(db.get_atoms({'calculator.path': '/home-research/jkitchin/dft-book/molecules/co-ados'}))
calc = atoms.get_calculator()
print(calc)
Vasp calculation directory:
  /home-research/jkitchin/dft-book/molecules/co-ados
Unit cell:
         x       y       z      |v|
  v0   6.000   0.000   0.000   6.000 Ang
  v1   0.000   6.000   0.000   6.000 Ang
  v2   0.000   0.000   6.000   6.000 Ang
  alpha, beta, gamma (deg): 90.0 90.0 90.0
  Total volume: 216.000 Ang^3
  Stress:     xx      yy      zz      yz      xz      xy
           0.060   0.027   0.027  -0.000  -0.000  -0.000 GPa
  ID  tag  sym     x       y       z     rmsF (eV/A)
   0    0    C   0.000   0.000   0.000   5.14
   1    0    O   1.200   0.000   0.000   5.14
  Potential energy: -14.7178 eV
INPUT Parameters:
  lcharg : False
  pp     : PBE
  kpts   : [1, 1, 1]
  xc     : pbe
  ismear : 1
  lwave  : False
  sigma  : 0.1
  rwigs  : {'C': 1.0, 'O': 1.0}
  encut  : 300
Pseudopotentials used:
  C: potpaw_PBE/C/POTCAR (git-hash: ee4d8576584f8e9f32e90853a0cbf9d4a9297330)
  O: potpaw_PBE/O/POTCAR (git-hash: 592f34096943a6f30db8749d13efca516d75ec55)
To search by C-O bond length, say we want C-O bond lengths less than 1.2 angstroms. This would not be an easy query to do in the database. Instead we get all documents that contain at least one C and one O, and use Python outside the database to filter the matches.
import numpy as np
from vasp.mongo import MongoDatabase
db = MongoDatabase()
all_atoms = db.get_atoms({'atoms.symbol_counts.C': {'$gte': 1},
                          'atoms.symbol_counts.O': {'$gte': 1}})
def bond_length_filter(atoms, bond_length=1.2):
    "Return True if there is a C-O bond shorter than bond_length in atoms."
    C = [atom for atom in atoms if atom.symbol == 'C']
    O = [atom for atom in atoms if atom.symbol == 'O']
    for catom in C:
        for oatom in O:
            # Euclidean distance: square each component, then sum.
            d = np.sqrt(np.sum((catom.position - oatom.position)**2))
            if d < bond_length:
                return True
    return False
A = [atoms for atoms in all_atoms if bond_length_filter(atoms)]
print(len(A))
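A filter of this kind depends on the Euclidean distance, where each component of the difference is squared before the sum is taken. The small standard-library check below illustrates why the order matters; the positions are made-up values chosen so that the components cancel.

```python
import math


def distance(p, q):
    """Euclidean distance: square each component difference, then sum."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Made-up positions in angstroms, for illustration only.
c_pos = (0.0, 0.0, 0.0)
o_pos = (1.0, -1.0, 0.0)

# Summing first and squaring afterwards lets components cancel.
wrong = math.sqrt(sum(a - b for a, b in zip(c_pos, o_pos)) ** 2)
print(distance(c_pos, o_pos))
print(wrong)
```

With these positions the correct distance is the square root of 2, while the sum-then-square version collapses to zero and would wrongly pass any bond-length cutoff.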
Here we have to use db.collection to access the distinct command. You can always work with db.collection directly; it is just a little longer.
import numpy as np
from vasp.mongo import MongoDatabase
db = MongoDatabase()
c = db.collection.distinct('calculator.pathtags', {})
print(c)
Mongo provides update and findAndModify functions. Here is an example with update. Note that it is possible to update many documents at a time; here we query by one specific calculation path to avoid that.
from vasp.mongo import MongoDatabase
from bson.objectid import ObjectId
db = MongoDatabase()
db.collection.update({'calculator.path': '/home-research/jkitchin/dft-book/molecules/nh3-initial'},
                     {'$set': {'special_tags': ['initial-state']}})
# this is how to add a tag to the tags array
db.collection.update({'calculator.path': '/home-research/jkitchin/dft-book/molecules/nh3-initial'},
                     {'$addToSet': {'special_tags': {'$each': ['neb', 'initial-state']}}})
c = db.find({'calculator.path': '/home-research/jkitchin/dft-book/molecules/nh3-initial'},
            projection={'special_tags': 1})
import pprint
pprint.pprint(next(c))
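The effect of `$addToSet` with `$each` can be pictured with a plain Python list: each value is appended only if it is not already present. This is a simplified stand-in written for illustration, not code from pymongo or vasp.mongo.

```python
def add_to_set(existing, each):
    """Mimic $addToSet with $each: append only values not already present."""
    result = list(existing)
    for value in each:
        if value not in result:
            result.append(value)
    return result


tags = ["initial-state"]  # state of special_tags after the $set update
tags = add_to_set(tags, ["neb", "initial-state"])
print(tags)
```

Running the same update a second time leaves the list unchanged, which is why `$addToSet` is a convenient way to add tags without creating repeats.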
from vasp.mongo import MongoDatabase
from bson.objectid import ObjectId
db = MongoDatabase()
c = db.find({'atoms.constraints.name': 'FixAtoms'})
print(c.count())
This defines a function that usually recognizes a finished Vasp directory (it fails on NEB directories). We then walk a directory tree and add each such directory to the database if it is not already there.
import os
from vasp import *
from vasp.vasprc import VASPRC
VASPRC['mode'] = None
def vasp_p(directory):
    'Return True if a finished OUTCAR file exists in directory, else False.'
    outcar = os.path.join(directory, 'OUTCAR')
    incar = os.path.join(directory, 'INCAR')
    if os.path.exists(outcar) and os.path.exists(incar):
        with open(outcar, 'r') as f:
            contents = f.read()
        # VASP writes this line at the end of a completed run
        if 'General timing and accounting informations for this job:' in contents:
            return True
    return False
from vasp.mongo import MongoDatabase, mongo_doc
db = MongoDatabase()
for root, dirs, files in os.walk('/home-research/jkitchin/dft-book'):
    for d in dirs:
        # compute absolute path to each directory in the current root
        absd = os.path.join(root, d)
        if (vasp_p(absd)
            # the test dir had some problems.
            and 'test' not in absd
            # Don't add things already in
            and db.find({"calculator.path": absd}).count() == 0):
            # we found a vasp directory, so we can do something in it.
            # here we add it to the ase mongodb
            calc = Vasp(absd)
            atoms = calc.get_atoms()
            db.write(mongo_doc(atoms), source="dft-book")
            print('added {}'.format(absd))
One idea is to store a derived quantity such as an adsorption energy, with links to the documents it was computed from. Here is an example of computing an adsorption energy.
import numpy as np
from vasp.mongo import MongoDatabase
db = MongoDatabase()
clean = db.collection.find_one({'calculator.pathtags': {'$all': ['surfaces', 'Pt-slab']}})
oslab = db.collection.find_one({'calculator.pathtags': {'$all': ['surfaces', 'Pt-slab-O-fcc']}})
o2 = db.collection.find_one({'calculator.pathtags': {'$all': ['molecules', 'O2-sp-triplet-350']}})
print(clean['_id'])
print(oslab['calculator']['energy'] - clean['calculator']['energy'] - 0.5 * o2['calculator']['energy'])
As a document, you could store something like this. This is a loose thought; the pseudo-example below should also include the _id for each calculation so you know where it came from. Maybe there is some JSON-like way of storing variables. Alternatively, you could store a Python script that does the calculation, along with its result.
{"-": ["o_slab_energy", "clean_slab_energy", {"*": [0.5, "o2_energy"]}]}
You can build up the document any way you want and store it.
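One way such a stored expression could be made evaluable is a small recursive interpreter over nested `{operator: [operands]}` dictionaries. This is only a sketch under assumptions: the operator set, the variable names, and the placeholder energies are all invented for illustration, and the expression mirrors the adsorption energy computed earlier (O-covered slab minus clean slab minus half of O2).

```python
from functools import reduce
import operator

# Supported operators; an assumption made for this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, values):
    """Evaluate a nested {operator: [operands]} expression."""
    if isinstance(expr, dict):
        # Each dictionary holds exactly one operator and its operand list.
        (op, operands), = expr.items()
        numbers = [evaluate(o, values) for o in operands]
        # Apply the operator left to right across all operands.
        return reduce(OPS[op], numbers)
    if isinstance(expr, str):
        return values[expr]  # a named energy
    return expr              # a plain number


# Placeholder energies in eV, chosen only for illustration.
values = {"o_slab_energy": -10.0, "clean_slab_energy": -6.0, "o2_energy": -4.0}
expr = {"-": ["o_slab_energy", "clean_slab_energy", {"*": [0.5, "o2_energy"]}]}
print(evaluate(expr, values))
```

In a stored document, the names could be replaced or accompanied by the _id of each source calculation, so the energies are looked up from the database rather than from a hand-written dictionary.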