[ase-users] Database issues

Jens Jørgen Mortensen jjmo at dtu.dk
Tue Nov 27 12:27:39 CET 2018


On 11/23/18 2:47 PM, David Kleiven via ase-users wrote:
>
> Dear ASE users,
>
>
> we are working on a project that involves many DFT calculations and we 
> use the database functionality to keep track of everything. Our 
> calculations have a lot of external data for machine learning 
> purposes. Currently, the only way we could find to "attach" such info 
> to each structure is via key_value_pairs. To our knowledge, the 
> key_value_pairs are duplicated in the database: 1) they are stored as a 
> serialized JSON string on each row and 2) they are distributed across 
> number_key_values and text_key_values. Hence, when you have for 
> instance 5000 key_value_pairs it becomes cumbersome to maintain this 
> duplication. Moreover, we have experienced cases where appending more 
> key_value_pairs leads to errors and a corrupted database. We tested 
> various solutions, and one that works is to allow user-defined tables 
> (i.e. users can create tables with the same schema as 
> number_key_values). Big chunks of static data can then be placed in 
> those tables. This avoids the duplication, and appending 
> dynamic key_value_pairs is no longer a problem. When you read back 
> data from the database, all data from these external tables are 
> automatically added to the AtomsRow object as if they were regular 
> key_value_pairs.
>
>
> The syntax for storing a separate table would be
>
> db.write(atoms, tables={"some_table_name": dict_with_data}, ... 
> regular key value pairs...)
>
>
> and the same syntax for read. All external tables would be added to 
> AtomsRow as if they were key_value_pairs.
>
>
> Is this solution (separating out big chunks of data into separate 
> tables) interesting to include in ASE via a class that inherits from 
> SQLite3Database and provides this extra functionality in addition to 
> everything that is supported by SQLite3Database? (Note that in our 
> case using the data field is not a good option, as it is essentially 
> a big binary chunk and it is therefore not easy to 1) look at the 
> data manually via external tools or 2) update it.)
>

That's an interesting idea. If I understand correctly, you would like to 
be able to do key=value where value can be (almost) anything and not 
just float or str as it must be now.  Is that correct?  Maybe you can 
explain a bit more what your use case is and how using this new feature 
would look?
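
To make sure we are talking about the same thing, here is a rough sketch of how I read the proposal, written with plain sqlite3 rather than ASE. Everything in it is an assumption for illustration: the simplified two-column systems table, the function names init, write_row and read_row, and the tables argument are all hypothetical and are not existing ASE API. The only part taken from your mail is that an external table has the same (key, value, id) layout as number_key_values and that its contents are merged into the row on read.

```python
import sqlite3

# Assumed layout shared by number_key_values and every user-defined table.
KV_SCHEMA = ("(key TEXT, value REAL, id INTEGER, "
             "FOREIGN KEY (id) REFERENCES systems(id))")


def init(con):
    """Create a stripped-down stand-in for the ASE tables (hypothetical)."""
    con.execute("CREATE TABLE IF NOT EXISTS systems "
                "(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
    con.execute(f"CREATE TABLE IF NOT EXISTS number_key_values {KV_SCHEMA}")


def write_row(con, name, key_value_pairs, tables=None):
    """Store one row; `tables` maps a table name to a {key: number} dict."""
    row_id = con.execute("INSERT INTO systems (name) VALUES (?)",
                         (name,)).lastrowid
    con.executemany("INSERT INTO number_key_values VALUES (?, ?, ?)",
                    [(k, v, row_id) for k, v in key_value_pairs.items()])
    for table, values in (tables or {}).items():
        # Table names cannot be bound as parameters, so validate them.
        if not table.isidentifier():
            raise ValueError(f"invalid table name: {table!r}")
        con.execute(f"CREATE TABLE IF NOT EXISTS {table} {KV_SCHEMA}")
        con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)",
                        [(k, v, row_id) for k, v in values.items()])
    con.commit()
    return row_id


def read_row(con, row_id):
    """Return the regular pairs merged with those from all external tables."""
    merged = dict(con.execute(
        "SELECT key, value FROM number_key_values WHERE id = ?", (row_id,)))
    external = [name for (name,) in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT IN ('systems', 'number_key_values', 'sqlite_sequence')")]
    for table in external:
        merged.update(con.execute(
            f"SELECT key, value FROM {table} WHERE id = ?", (row_id,)))
    return merged


con = sqlite3.connect(":memory:")
init(con)
rid = write_row(con, "bulk-Al", {"energy": -3.5},
                tables={"ml_features": {"f0": 0.25, "f1": 1.5}})
print(read_row(con, rid))
```

In this sketch each value lives in exactly one table, so there is no JSON copy to keep in sync, but it only handles numbers; whether text or array values should be allowed is part of what I am asking.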


Jens Jørgen


PS: Can you create a simple example script that demonstrates the error 
you mentioned?


>
> Cheers,
>
> David Kleiven
>
>
>
> _______________________________________________
> ase-users mailing list
> ase-users at listserv.fysik.dtu.dk
> https://listserv.fysik.dtu.dk/mailman/listinfo/ase-users