[gpaw-users] GPAW Help

Tomlinson, Warren (CDR) wwtomlin at nps.edu
Fri Mar 4 00:52:30 CET 2016


Ask-
	Completely unrelated question.  I had a Pd atom adsorb on an unexpected spot on my MOF (UiO-67).  I decided to double-check the results with some different basis sets and functionals.  I tried B3LYP, but got:
	NotImplementedError: LCAO mode does not support orbital-dependent XC functionals

	Is B3LYP not an option when doing LCAO?
Thanks
Warren

> On Mar 2, 2016, at 11:20 AM, Tomlinson, Warren (CDR) <wwtomlin at nps.edu> wrote:
> 
> All-
> 	Thanks for the help.  I tried setting the environment variable MKL_CBWR as suggested, but still got the error (after 3 SCF cycles).  I also made a number-by-number comparison of all positions on all CPUs to the reference CPU (rank 0), e.g.:
> 	For all the position vectors in reference and other CPUs:
> 		for i in range(3):
> 			if ref[i] != allpos[i]:
> 				same = False
> 
> 	The same variable was never set to False.  I also compared chemical symbols, PBCs, Cells and compound symbol.  All were identical among all CPUs.  Is there perhaps something else I should check? 
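The element-by-element loop above can be written as a small self-contained sketch (a rough illustration only, assuming the positions arrive as plain (N, 3) numpy arrays gathered from each rank; all names are illustrative):

```python
import numpy as np

def compare_positions(ref, other, tol=1e-8):
    """Return the indices of atoms whose positions differ from the
    reference rank's copy by more than tol (1e-8 is the tolerance
    mentioned earlier in the thread), plus the largest deviation."""
    ref = np.asarray(ref)
    other = np.asarray(other)
    # Maximum per-atom deviation over the three Cartesian components
    err = np.abs(ref - other).max(axis=1)
    return np.nonzero(err > tol)[0], err.max()

# Example: one atom nudged just above the tolerance
ref = np.zeros((4, 3))
other = ref.copy()
other[2, 0] = 5e-8
bad, maxerr = compare_positions(ref, other)
```

Printing `repr(maxerr)` rather than the array itself shows the full precision of the deviation, as Ask suggests below.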
> Thanks for all the help.  I realize there might not be an answer to this right now, and that’s OK.  I’m moving forward with LBFGS and so far it’s working just fine.
> Thanks
> Warren
> 
>> On Mar 1, 2016, at 12:33 PM, Ask Hjorth Larsen <asklarsen at gmail.com> wrote:
>> 
>> As far as I can see, it should be printing 1.2345e-7 or something like
>> that if the positions were in fact different.  The tolerance is 1e-8.
>> Which other property could possibly be wrong?  Is the cell off by
>> 1e-37 in one of them?  That would be enough to cause that error,
>> because those are compared as a == b.  But then why would it happen at
>> iteration three?  I find it highly unlikely that, say, the number of
>> atoms suddenly differs by something :).
>> 
>> Anyway, Warren, if you could similarly compare the other properties of
>> each dumped atoms object, then it would be very useful.  Also, print
>> repr(errs) might be more appropriate since you know that you get all
>> precision (but it should still have shown an actual error if there were
>> one).
>> 
>> Another thing we can do is to make a wrapper for ASE optimizers which
>> broadcasts the work of rank 0 to all ranks.  That should completely
>> prevent such an error from arising from within ASE (at least for the
>> positions), but it should be used with caution I guess.
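The wrapper Ask describes could be sketched roughly as follows (all names are hypothetical; `world` is assumed to follow GPAW's communicator style with a rank attribute and an in-place array broadcast, and dummy stand-ins replace real MPI for illustration):

```python
import numpy as np

def sync_positions(atoms, world):
    """Broadcast rank 0's atomic positions to all ranks, so that every
    rank performs the next optimizer step on identical coordinates."""
    pos = np.ascontiguousarray(atoms.get_positions())
    world.broadcast(pos, 0)   # no-op on rank 0; overwrites elsewhere
    atoms.set_positions(pos)

# --- minimal stand-ins so the sketch runs without MPI ---
class DummyWorld:
    rank = 1
    def __init__(self, master_pos):
        self.master_pos = master_pos
    def broadcast(self, array, root):
        array[:] = self.master_pos  # pretend rank 0 sent its copy

class DummyAtoms:
    def __init__(self, pos):
        self.pos = np.array(pos, float)
    def get_positions(self):
        return self.pos.copy()
    def set_positions(self, pos):
        self.pos = np.array(pos, float)

master = np.zeros((2, 3))
atoms = DummyAtoms(master + 1e-9)        # tiny per-rank drift
sync_positions(atoms, DummyWorld(master))  # drift is now gone
```

Calling `sync_positions` after each optimizer step would hide (not fix) any per-rank drift, which is why it should be used with caution.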
>> 
>> Best regards
>> Ask
>> 
>> 2016-03-01 9:57 GMT+01:00 Jussi Enkovaara <jussi.enkovaara at csc.fi>:
>>> Hi all,
>>> the problem is most likely related to the fact (as Ask already mentioned)
>>> that modern optimized libraries (Intel MKL being a prime example) do not
>>> necessarily produce the same output even with bit-identical input; the
>>> result may depend, for example, on how memory allocations are aligned, see e.g.
>>> https://software.intel.com/en-us/articles/getting-reproducible-results-with-intel-mkl/
>>> 
>>> Symmetry is not the key issue: in systems with symmetry there can be
>>> degenerate eigenvectors, and tiny numerical differences can produce
>>> completely different linear combinations of eigenvectors, which can amplify
>>> the problem; but problems can arise even without symmetry, and therefore
>>> rattle does not necessarily solve anything.
>>> 
>>> There has been some effort to solve the problems due to numerical
>>> reproducibility in GPAW (e.g. atomic positions returned from ASE are not
>>> required to be bitwise identical), but apparently some bugs are still
>>> remaining.
>>> 
>>> For MKL, one could try to enforce numerical reproducibility by setting the
>>> environment variable MKL_CBWR, suitable values might depend on MKL version,
>>> but one could try to start with
>>> 
>>> export MKL_CBWR=AVX
>>> 
>>> This can lead to some performance degradation.
>>> 
>>> Best regards,
>>> Jussi
>>> 
>>> 
>>> 
>>> On 2016-03-01 09:57, Torsten Hahn wrote:
>>>> 
>>>> Hey all,
>>>> 
>>>> 
>>>> Sometimes I experience similar errors. We once thought we had tracked it
>>>> down to an erroneous MPI implementation (Intel MPI). However, some people in my
>>>> group still see the same error with OpenMPI, and to be honest we have no
>>>> idea where it comes from. It looks like on some CPUs there is sometimes a
>>>> very small numerical error in the atomic positions. This error never
>>>> happens in non-MPI calculations.
>>>> 
>>>> Would be really nice to track that down.
>>>> 
>>>> Best,
>>>> Torsten.
>>>> 
>>>>> Am 29.02.2016 um 18:42 schrieb Tomlinson, Warren (CDR)
>>>>> <wwtomlin at nps.edu>:
>>>>> 
>>>>> Ask-
>>>>>   Thanks for the help.  I tried running with atoms.rattle as well
>>>>> as the hack you sent me.  The exact same problem persists.  Three SCF
>>>>> cycles are completed and then the error pops up.  I have had success with
>>>>> LBFGS.  There’s no reason why I shouldn’t be OK using that optimizer,
>>>>> correct?  It is odd, though, that BFGS can’t make it past three steps.
>>>>> Thanks
>>>>> Warren
>>>>> 
>>>>>> On Feb 26, 2016, at 11:47 AM, Ask Hjorth Larsen <asklarsen at gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>> I realize that more symmetry breaking might be necessary depending on
>>>>>> how some things are implemented.  You can try with this slightly
>>>>>> symmetry-breaking hack:
>>>>>> 
>>>>>> http://dcwww.camd.dtu.dk/~askhl/files/bfgshack.py
>>>>>> 
>>>>>> If push comes to shove and we cannot guess what the problem is, try
>>>>>> reducing it in size as much as possible.  As few cores as possible,
>>>>>> and as rough parameters as possible.
>>>>>> 
>>>>>> Best regards
>>>>>> Ask
>>>>>> 
>>>>>> 2016-02-26 20:40 GMT+01:00 Ask Hjorth Larsen <asklarsen at gmail.com>:
>>>>>>> 
>>>>>>> Hi Warren
>>>>>>> 
>>>>>>> 2016-02-26 19:40 GMT+01:00 Tomlinson, Warren (CDR) <wwtomlin at nps.edu>:
>>>>>>>> 
>>>>>>>> Ask-
>>>>>>>>      Thank you for your help.  I reran with the --debug option and
>>>>>>>> also ran with 36 cores.  Both still failed with the same synchronization
>>>>>>>> problem.  I have all 144 synchronize_atoms_r##.pckl files, but I’m not sure
>>>>>>>> exactly what to do with them.
>>>>>>>> 
>>>>>>>>      On a related note, I ran the 680 atom structure with
>>>>>>>> QuasiNewton instead of BFGS and it worked.  So I’m guessing that’s a big
>>>>>>>> clue.
>>>>>>> 
>>>>>>> 
>>>>>>> That's interesting.  BFGS calculates eigenvectors.  Sometimes in
>>>>>>> exactly symmetric systems, different cores can get different results
>>>>>>> even though they perform the same mathematical operation, typically
>>>>>>> due to aggressive BLAS stuff.  They will differ very little, but they
>>>>>>> can order eigenvalues/vectors differently and maybe end up doing
>>>>>>> different things.
>>>>>>> 
>>>>>>> Try doing atoms.rattle(stdev=1e-12) and see if it runs.  Of course,
>>>>>>> the optimization should be robust against that sort of problem, so we
>>>>>>> would have to look into it even if it runs.
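What `atoms.rattle(stdev=1e-12)` does can be illustrated with a few lines of numpy (a conceptual sketch of the symmetry-breaking idea, not ASE's exact implementation): each coordinate gets Gaussian noise far below chemical accuracy but large enough to lift the exact degeneracies that can confuse per-rank eigensolvers.

```python
import numpy as np

def rattle(positions, stdev=1e-12, seed=42):
    """Add small Gaussian noise to every Cartesian coordinate,
    breaking exact symmetry without meaningfully moving the atoms."""
    rng = np.random.RandomState(seed)
    return positions + rng.normal(scale=stdev, size=positions.shape)

pos = np.zeros((3, 3))       # three perfectly symmetric atoms
new = rattle(pos)            # displacements on the order of 1e-12
```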
>>>>>>> 
>>>>>>>> 
>>>>>>>>      On an unrelated note, I’m afraid I have very little experience
>>>>>>>> doing this kind of thing and so I’m not surprised that I have not correctly
>>>>>>>> set the scalapack parameters.  I simply set the default based on what I
>>>>>>>> found on the GPAW “Parallel runs” page at the bottom:
>>>>>>>>      mb = 64
>>>>>>>>      m = floor(sqrt(bands/mb))
>>>>>>>>      n = m
>>>>>>>>      There are 2360 bands in the calculations, so that’s where I
>>>>>>>> came up with ‘sl_default’:(6,6,64).  I would appreciate any insight you can
>>>>>>>> give me on how to get the scalapack options set correctly.
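The rule of thumb quoted above can be checked with a couple of lines (the helper name is illustrative; note Ask's reply that in LCAO mode the grid should be sized from the number of atomic orbitals, not bands):

```python
from math import floor, sqrt

def sl_grid(n, mb=64):
    """Square ScaLAPACK CPU grid of side floor(sqrt(n / mb)) with
    block size mb, following the rule of thumb from the thread."""
    m = int(floor(sqrt(n / mb)))
    return (m, m, mb)

# 2360 / 64 = 36.875, sqrt ~ 6.07, floor -> 6
sl_grid(2360)  # (6, 6, 64)
```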
>>>>>>> 
>>>>>>> 
>>>>>>> It is the number of atomic orbitals, and not bands, which is relevant
>>>>>>> for how big the scalapack problem is in LCAO mode.  (6, 6, 64) might
>>>>>>> be a good scalapack setting on 36 cores because the number of cores
>>>>>>> multiplies to 36 (with one k-point/spin), but if you use more cores,
>>>>>>> you should increase the CPU grid to the maximum available.  You can
>>>>>>> also use sl_auto=True to choose something which is probably
>>>>>>> non-horrible.  For this size of system, there is no point in doing an
>>>>>>> LCAO calculation and not using the maximal possible number of cores
>>>>>>> for scalapack, because the Scalapack operations are the most expensive
>>>>>>> by far.
>>>>>>> 
>>>>>>> I will have a look at the documentation and maybe update it.
>>>>>>> 
>>>>>>> Best regards
>>>>>>> Ask
>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Warren
>>>>>>>> 
>>>>>>>>> On Feb 26, 2016, at 8:35 AM, Ask Hjorth Larsen <asklarsen at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hello
>>>>>>>>> 
>>>>>>>>> This sounds strange.  Can you re-run it with --debug, please (e.g.,
>>>>>>>>> gpaw-python script.py --debug)?  Then we can see in which way they
>>>>>>>>> differ, whether it's due to slight numerical imprecision or position
>>>>>>>>> values that are complete garbage.
>>>>>>>>> 
>>>>>>>>> On an unrelated note, does it not run with just 36 cores for the 680
>>>>>>>>> atoms?  Also, the scalapack parameters should probably be set to use
>>>>>>>>> all available cores.
>>>>>>>>> 
>>>>>>>>> Best regards
>>>>>>>>> Ask
>>>>>>>>> 
>>>>>>>>> 2016-02-25 8:21 GMT+01:00 Tomlinson, Warren (CDR) <wwtomlin at nps.edu>:
>>>>>>>>>> 
>>>>>>>>>> Hello-
>>>>>>>>>>     I have been using GPAW for a couple of months now and have run
>>>>>>>>>> into a persistent problem that I can not figure out.  I’m using a cluster
>>>>>>>>>> with 3,456 nodes each with 36 cores (Intel Xeon E5-2699v3 Haswell).  I
>>>>>>>>>> installed using MKL 11.2 and have python 2.7.10 and numpy 1.9.2.  I used the
>>>>>>>>>> following setting to relax a periodic cell containing 114 atoms
>>>>>>>>>> (successfully, with 72 cores):
>>>>>>>>>> 
>>>>>>>>>> cell = read('small.pdb')
>>>>>>>>>> cell.set_pbc(1)
>>>>>>>>>> cell.set_cell([[18.752, 0., 0.], [9.376, 16.239708, 0.], [9.376,
>>>>>>>>>> 5.413236, 15.310944]])
>>>>>>>>>> calc = GPAW(mode='lcao',
>>>>>>>>>>         gpts=(80,80,80),
>>>>>>>>>>         xc='PBE',
>>>>>>>>>>         poissonsolver=PoissonSolver(relax='GS', eps=1e-7),
>>>>>>>>>>         parallel={'band':2,'sl_default':(3,3,64)},
>>>>>>>>>>         basis='dzp',
>>>>>>>>>>         mixer=Mixer(0.1, 5, weight=100.0),
>>>>>>>>>>         occupations=FermiDirac(width=0.1),
>>>>>>>>>>         maxiter=1000,
>>>>>>>>>>         txt='67_sml_N_LCAO.out'
>>>>>>>>>>         )
>>>>>>>>>> cell.set_calculator(calc)
>>>>>>>>>> opt = BFGS(cell)
>>>>>>>>>> opt.run()
>>>>>>>>>> 
>>>>>>>>>> When I try virtually the exact same options on a larger (cubic)
>>>>>>>>>> cell:
>>>>>>>>>> 
>>>>>>>>>> cell = read('big.pdb')
>>>>>>>>>> cell.set_cell([26.52,26.52,26.52])
>>>>>>>>>> cell.set_pbc(1)
>>>>>>>>>> calc_LCAO = GPAW(mode='lcao',
>>>>>>>>>>         gpts=(144,144,144),
>>>>>>>>>>         xc='PBE',
>>>>>>>>>>         poissonsolver=PoissonSolver(relax='GS', eps=1e-7),
>>>>>>>>>>         parallel={'band':2,'sl_default':(6,6,64)},
>>>>>>>>>>         basis = 'dzp',
>>>>>>>>>>         mixer=Mixer(0.1, 5, weight=100.0),
>>>>>>>>>>         occupations=FermiDirac(width=0.1),
>>>>>>>>>>         txt='67_Full.out',
>>>>>>>>>>         maxiter=1000
>>>>>>>>>>         )
>>>>>>>>>> etc…
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I get an error after three SCF steps.  The larger cell has 680 atoms
>>>>>>>>>> and I used 144 cores.  The error I get is below:
>>>>>>>>>> 
>>>>>>>>>> rank=036 L00: Traceback (most recent call last):
>>>>>>>>>> rank=036 L01:   File
>>>>>>>>>> "/p/home/wwtomlin/PhaseII/Proj1/INITIAL/67_Full_C.py", line 44, in <module>
>>>>>>>>>> rank=036 L02:     opt.run()
>>>>>>>>>> rank=036 L03:   File
>>>>>>>>>> "/p/home/wwtomlin/ase/ase/optimize/optimize.py", line 148, in run
>>>>>>>>>> rank=036 L04:     f = self.atoms.get_forces()
>>>>>>>>>> rank=036 L05:   File "/p/home/wwtomlin/ase/ase/atoms.py", line 688,
>>>>>>>>>> in get_forces
>>>>>>>>>> rank=036 L06:     forces = self._calc.get_forces(self)
>>>>>>>>>> rank=036 L07:   File "/p/home/wwtomlin/gpaw/gpaw/aseinterface.py",
>>>>>>>>>> line 78, in get_forces
>>>>>>>>>> rank=036 L08:
>>>>>>>>>> force_call_to_set_positions=force_call_to_set_positions)
>>>>>>>>>> rank=036 L09:   File "/p/home/wwtomlin/gpaw/gpaw/paw.py", line 272,
>>>>>>>>>> in calculate
>>>>>>>>>> rank=036 L10:     self.set_positions(atoms)
>>>>>>>>>> rank=036 L11:   File "/p/home/wwtomlin/gpaw/gpaw/paw.py", line 328,
>>>>>>>>>> in set_positions
>>>>>>>>>> rank=036 L12:     spos_ac = self.initialize_positions(atoms)
>>>>>>>>>> rank=036 L13:   File "/p/home/wwtomlin/gpaw/gpaw/paw.py", line 314,
>>>>>>>>>> in initialize_positions
>>>>>>>>>> rank=036 L14:     self.synchronize_atoms()
>>>>>>>>>> rank=036 L15:   File "/p/home/wwtomlin/gpaw/gpaw/paw.py", line 1034,
>>>>>>>>>> in synchronize_atoms
>>>>>>>>>> rank=036 L16:     mpi.synchronize_atoms(self.atoms, self.wfs.world)
>>>>>>>>>> rank=036 L17:   File "/p/home/wwtomlin/gpaw/gpaw/mpi/__init__.py",
>>>>>>>>>> line 714, in synchronize_atoms
>>>>>>>>>> rank=036 L18:     err_ranks)
>>>>>>>>>> rank=036 L19: ValueError: ('Mismatch of Atoms objects.  In debug
>>>>>>>>>> mode, atoms will be dumped to files.', array([  5,   9,  13,  17,  18,  19,
>>>>>>>>>> 20,  21,  22,  23,  25,  27,  28,
>>>>>>>>>> rank=036 L20:         32,  33,  35,  37,  40,  41,  43,  44,  46,
>>>>>>>>>> 50,  51,  52,  53,
>>>>>>>>>> rank=036 L21:         54,  60,  62,  63,  64,  65,  68,  71,  72,
>>>>>>>>>> 74,  80,  82,  85,
>>>>>>>>>> rank=036 L22:         87,  90,  91,  94,  97,  98,  99, 100, 101,
>>>>>>>>>> 104, 106, 107, 110,
>>>>>>>>>> rank=036 L23:        111, 115, 116, 118, 123, 125, 129, 130, 137,
>>>>>>>>>> 138, 139, 142]))
>>>>>>>>>> GPAW CLEANUP (node 36): <type 'exceptions.ValueError'> occurred.
>>>>>>>>>> Calling MPI_Abort!
>>>>>>>>>> 
>>>>>>>>>> ————
>>>>>>>>>> 
>>>>>>>>>> I think this means my atoms are not in the same positions across
>>>>>>>>>> different cores, but I can’t figure out how this happened.  Do you have any
>>>>>>>>>> suggestions?
>>>>>>>>>> Thank you
>>>>>>>>>> Warren
>>>>>>>>>> PhD Student
>>>>>>>>>> Naval Postgraduate School
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> gpaw-users mailing list
>>>>>>>>>> gpaw-users at listserv.fysik.dtu.dk
>>>>>>>>>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
> 
> 
