[gpaw-users] GPAW Help
Tomlinson, Warren (CDR)
wwtomlin at nps.edu
Fri Mar 4 00:52:30 CET 2016
Ask-
Completely unrelated question. I had a Pd atom adsorb at an unexpected spot on my MOF (UiO-67). I decided to double-check the results with some different basis sets and functionals. I tried B3LYP, but got:
NotImplementedError: LCAO mode does not support orbital-dependent XC functionals
Is B3LYP not an option when doing LCAO?
Thanks
Warren
> On Mar 2, 2016, at 11:20 AM, Tomlinson, Warren (CDR) <wwtomlin at nps.edu> wrote:
>
> All-
> Thanks for the help. I tried setting the environment variable MKL_CBWR as suggested, but still got the error (after 3 SCF cycles). I also made a number-by-number comparison of all positions on all CPUs against the reference CPU (rank 0), e.g.:
> # for each position vector in the reference and the other CPUs:
> for i in range(3):
>     if ref[i] != allpos[i]:
>         same = False
>
> The same variable was never set to False. I also compared chemical symbols, PBCs, cells and the compound symbol. All were identical across all CPUs. Is there perhaps something else I should check?
> Thanks for all the help. I realize there might not be an answer to this right now, and that’s OK. I’m moving forward with LBFGS and so far it’s working just fine.
> Thanks
> Warren
>
>> On Mar 1, 2016, at 12:33 PM, Ask Hjorth Larsen <asklarsen at gmail.com> wrote:
>>
>> As far as I can see, it should be printing 1.2345e-7 or something like
>> that if the positions were in fact different. The tolerance is 1e-8.
>> Which other property could possibly be wrong? Is the cell off by
>> 1e-37 in one of them? That would be enough to cause that error,
>> because those are compared as a == b. But then why would it happen at
>> iteration three? I find it highly unlikely that, say, the number of
>> atoms suddenly differs by something :).
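The comparison rules Ask describes could be sketched roughly like this (hypothetical helper names; GPAW's actual `synchronize_atoms` differs in detail): positions are compared within a tolerance, while everything else is compared exactly, so even a 1e-37 discrepancy in a cell entry counts as a mismatch.

```python
import numpy as np

def atoms_match(ref, other, tol=1e-8):
    """Hypothetical sketch of the comparison described above:
    positions must agree within tol, while cell, pbc and atomic
    numbers are compared exactly (a == b), so e.g. a cell entry
    differing by 1e-37 already counts as a mismatch."""
    if not np.allclose(ref['positions'], other['positions'],
                       rtol=0.0, atol=tol):
        return False
    for key in ('cell', 'pbc', 'numbers'):
        if not np.array_equal(ref[key], other[key]):
            return False
    return True
```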
>>
>> Anyway, Warren, if you could similarly compare the other properties of
>> each dumped atoms object, then it would be very useful. Also, printing
>> repr(errs) might be more appropriate, since then you know you get full
>> precision (but it should still have shown an actual error if there were
>> one).
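The precision point can be seen with plain NumPy printing (illustrative only; the dumped `errs` arrays would behave the same way):

```python
import numpy as np

# Two 'positions' that differ by 1e-12: the default print settings
# (precision=8) render them identically, hiding the mismatch.
a = np.array([1.0, 1.0 + 1e-12])
default_str = np.array2string(a)

# Raising the print precision (or repr on the raw floats) makes the
# difference visible.
full_str = np.array2string(a, precision=17)
print(default_str)
print(full_str)
```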
>>
>> Another thing we can do is to make a wrapper for ASE optimizers which
>> broadcasts the work of rank 0 to all ranks. That should completely
>> prevent such an error from arising from within ASE (at least for the
>> positions), but it should be used with caution I guess.
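The wrapper idea could look roughly like this (a sketch under assumptions: `comm` is a GPAW `mpi.world`-style communicator with an in-place `broadcast(array, root)`; the class name is hypothetical):

```python
import numpy as np

class BroadcastingOptimizer:
    """Hypothetical sketch of the idea above: after every optimizer
    step, overwrite all ranks' positions with rank 0's, so the ranks
    cannot drift apart inside ASE."""

    def __init__(self, optimizer, atoms, comm):
        self.optimizer = optimizer  # any ASE-style optimizer
        self.atoms = atoms
        self.comm = comm            # needs broadcast(array, root)

    def step(self, forces=None):
        self.optimizer.step(forces)
        pos = np.ascontiguousarray(self.atoms.get_positions())
        self.comm.broadcast(pos, 0)   # rank 0's positions win everywhere
        self.atoms.set_positions(pos)
```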
>>
>> Best regards
>> Ask
>>
>> 2016-03-01 9:57 GMT+01:00 Jussi Enkovaara <jussi.enkovaara at csc.fi>:
>>> Hi all,
>>> the problem is most likely related to the fact (as Ask already mentioned)
>>> that modern optimized libraries (Intel MKL being the prime example) do not
>>> necessarily provide the same output even for bit-identical input; the result
>>> may depend, for example, on how memory allocations are aligned, see e.g.
>>> https://software.intel.com/en-us/articles/getting-reproducible-results-with-intel-mkl/
>>>
>>> Symmetry is not the key issue. In systems with symmetry there can be
>>> degenerate eigenvectors, and tiny numerical differences can produce
>>> completely different linear combinations of eigenvectors, which can amplify
>>> the problem. But problems can arise even without symmetry, and therefore
>>> rattle does not necessarily solve anything.
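The degeneracy point can be reproduced in a few lines of NumPy: a symmetric perturbation of order 1e-12 barely moves the eigenvalues, but completely changes which eigenvectors the solver returns.

```python
import numpy as np

# A doubly degenerate matrix: every orthonormal basis is a valid
# eigenbasis, so the solver's choice is essentially arbitrary.
A = np.eye(2)
wA, vA = np.linalg.eigh(A)

# A tiny symmetric perturbation selects one particular combination.
eps = 1e-12
B = A + eps * np.array([[0.0, 1.0], [1.0, 0.0]])
wB, vB = np.linalg.eigh(B)

print(np.max(np.abs(wA - wB)))  # eigenvalues: shifted only by ~eps
print(vA)                       # eigenvectors of A: the identity basis
print(vB)                       # eigenvectors of B: ~(1, ±1)/sqrt(2)
```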
>>>
>>> There has been some effort to solve problems due to numerical
>>> reproducibility in GPAW (e.g. atomic positions returned from ASE are not
>>> required to be bitwise identical), but apparently some bugs still
>>> remain.
>>>
>>> For MKL, one could try to enforce numerical reproducibility by setting the
>>> environment variable MKL_CBWR; suitable values may depend on the MKL version,
>>> but one could start with
>>>
>>> export MKL_CBWR=AVX
>>>
>>> This can lead to some performance degradation.
>>>
>>> Best regards,
>>> Jussi
>>>
>>>
>>>
>>> On 2016-03-01 09:57, Torsten Hahn wrote:
>>>>
>>>> Hey all,
>>>>
>>>>
>>>> Sometimes I experience similar errors. We once thought we had tracked it
>>>> down to an erroneous MPI implementation (Intel MPI). However, some people in my
>>>> group still see the same error with OpenMPI, and to be honest we have no
>>>> idea where it comes from. It looks like on some CPUs there is sometimes a
>>>> very small numerical error in the atomic positions. This error never
>>>> happens in non-MPI calculations.
>>>>
>>>> Would be really nice to track that down.
>>>>
>>>> Best,
>>>> Torsten.
>>>>
>>>>> Am 29.02.2016 um 18:42 schrieb Tomlinson, Warren (CDR)
>>>>> <wwtomlin at nps.edu>:
>>>>>
>>>>> Ask-
>>>>> Thanks for the help. I tried running with atoms.rattle as well
>>>>> as the hack you sent me. The exact problem still persists: three SCF
>>>>> cycles complete and then the error pops up. I have had success with
>>>>> LBFGS. There's no reason why I shouldn't be OK using that optimizer,
>>>>> correct? It is odd, though, that BFGS can't make it past three steps.
>>>>> Thanks
>>>>> Warren
>>>>>
>>>>>> On Feb 26, 2016, at 11:47 AM, Ask Hjorth Larsen <asklarsen at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> I realize that more symmetry breaking might be necessary depending on
>>>>>> how some things are implemented. You can try with this slightly
>>>>>> symmetry-breaking hack:
>>>>>>
>>>>>> http://dcwww.camd.dtu.dk/~askhl/files/bfgshack.py
>>>>>>
>>>>>> If push comes to shove and we cannot guess what the problem is, try
>>>>>> reducing it in size as much as possible. As few cores as possible,
>>>>>> and as rough parameters as possible.
>>>>>>
>>>>>> Best regards
>>>>>> Ask
>>>>>>
>>>>>> 2016-02-26 20:40 GMT+01:00 Ask Hjorth Larsen <asklarsen at gmail.com>:
>>>>>>>
>>>>>>> Hi Warren
>>>>>>>
>>>>>>> 2016-02-26 19:40 GMT+01:00 Tomlinson, Warren (CDR) <wwtomlin at nps.edu>:
>>>>>>>>
>>>>>>>> Ask-
>>>>>>>> Thank you for your help. I reran with the --debug option and
>>>>>>>> also ran with 36 cores. Both still failed with the same synchronization
>>>>>>>> problem. I have all 144 synchronize_atoms_r##.pckl files, but I'm not sure
>>>>>>>> exactly what to do with them.
>>>>>>>>
>>>>>>>> On a related note, I ran the 680 atom structure with
>>>>>>>> QuasiNewton instead of BFGS and it worked. So I’m guessing that’s a big
>>>>>>>> clue.
>>>>>>>
>>>>>>>
>>>>>>> That's interesting. BFGS calculates eigenvectors. Sometimes in
>>>>>>> exactly symmetric systems, different cores can get different results
>>>>>>> even though they perform the same mathematical operation, typically
>>>>>>> due to aggressive BLAS stuff. They will differ very little, but they
>>>>>>> can order eigenvalues/vectors differently and maybe end up doing
>>>>>>> different things.
>>>>>>>
>>>>>>> Try doing atoms.rattle(stdev=1e-12) and see if it runs. Of course,
>>>>>>> the optimization should be robust against that sort of problem, so we
>>>>>>> would have to look into it even if it runs.
>>>>>>>
>>>>>>>>
>>>>>>>> On an unrelated note, I’m afraid I have very little experience
>>>>>>>> doing this kind of thing and so I’m not surprised that I have not correctly
>>>>>>>> set the ScaLAPACK parameters. I simply set the defaults based on what I
>>>>>>>> found at the bottom of the GPAW 'Parallel runs' page:
>>>>>>>>     mb = 64
>>>>>>>>     m = floor(sqrt(bands / mb))
>>>>>>>>     n = m
>>>>>>>> There are 2360 bands in the calculation, so that's where I
>>>>>>>> came up with 'sl_default': (6, 6, 64). I would appreciate any insight you can
>>>>>>>> give me on how to get the ScaLAPACK options set correctly.
>>>>>>>
>>>>>>>
>>>>>>> It is the number of atomic orbitals, not the number of bands, that
>>>>>>> determines how big the ScaLAPACK problem is in LCAO mode. (6, 6, 64) may
>>>>>>> be a good ScaLAPACK setting on 36 cores because the 6x6 CPU grid
>>>>>>> multiplies out to 36 (with one k-point/spin), but if you use more cores,
>>>>>>> you should increase the CPU grid to the maximum available. You can
>>>>>>> also use sl_auto=True to choose something that is probably
>>>>>>> non-horrible. For a system of this size, there is no point in doing an
>>>>>>> LCAO calculation without using the maximum possible number of cores
>>>>>>> for ScaLAPACK, because the ScaLAPACK operations are by far the most
>>>>>>> expensive.
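As a rough illustration of "use the maximum available cores", one could pick the largest near-square CPU grid that fits the core count (a hypothetical helper; GPAW's sl_auto=True does this kind of choice for you):

```python
import math

def sl_guess(ncores, mb=64):
    """Hypothetical helper: the largest near-square (m, n) CPU grid
    with m * n <= ncores, plus the block size mb, in the format of
    GPAW's parallel={'sl_default': (m, n, mb)} keyword."""
    m = math.isqrt(ncores)   # largest m with m*m <= ncores
    n = ncores // m          # stretch the grid toward ncores
    return (m, n, mb)

print(sl_guess(36))    # (6, 6, 64)
print(sl_guess(144))   # (12, 12, 64)
```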
>>>>>>>
>>>>>>> I will have a look at the documentation and maybe update it.
>>>>>>>
>>>>>>> Best regards
>>>>>>> Ask
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Warren
>>>>>>>>
>>>>>>>>> On Feb 26, 2016, at 8:35 AM, Ask Hjorth Larsen <asklarsen at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hello
>>>>>>>>>
>>>>>>>>> This sounds strange. Can you re-run it with --debug, please (e.g.,
>>>>>>>>> gpaw-python script.py --debug)? Then we can see in which way they
>>>>>>>>> differ, whether it's due to slight numerical imprecision or position
>>>>>>>>> values that are complete garbage.
>>>>>>>>>
>>>>>>>>> On an unrelated note, does it not run with just 36 cores for the 680
>>>>>>>>> atoms? Also, the scalapack parameters should probably be set to use
>>>>>>>>> all available cores.
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>> Ask
>>>>>>>>>
>>>>>>>>> 2016-02-25 8:21 GMT+01:00 Tomlinson, Warren (CDR) <wwtomlin at nps.edu>:
>>>>>>>>>>
>>>>>>>>>> Hello-
>>>>>>>>>> I have been using GPAW for a couple of months now and have run
>>>>>>>>>> into a persistent problem that I cannot figure out. I'm using a cluster
>>>>>>>>>> with 3,456 nodes each with 36 cores (Intel Xeon E5-2699v3 Haswell). I
>>>>>>>>>> installed using MKL 11.2 and have python 2.7.10 and numpy 1.9.2. I used the
>>>>>>>>>> following setting to relax a periodic cell containing 114 atoms
>>>>>>>>>> (successfully, with 72 cores):
>>>>>>>>>>
>>>>>>>>>> cell = read('small.pdb')
>>>>>>>>>> cell.set_pbc(1)
>>>>>>>>>> cell.set_cell([[18.752, 0., 0.], [9.376, 16.239708, 0.], [9.376,
>>>>>>>>>> 5.413236, 15.310944]])
>>>>>>>>>> calc = GPAW(mode='lcao',
>>>>>>>>>> gpts=(80,80,80),
>>>>>>>>>> xc='PBE',
>>>>>>>>>> poissonsolver=PoissonSolver(relax='GS', eps=1e-7),
>>>>>>>>>> parallel={'band':2,'sl_default':(3,3,64)},
>>>>>>>>>> basis='dzp',
>>>>>>>>>> mixer=Mixer(0.1, 5, weight=100.0),
>>>>>>>>>> occupations=FermiDirac(width=0.1),
>>>>>>>>>> maxiter=1000,
>>>>>>>>>> txt='67_sml_N_LCAO.out'
>>>>>>>>>> )
>>>>>>>>>> cell.set_calculator(calc)
>>>>>>>>>> opt = BFGS(cell)
>>>>>>>>>> opt.run()
>>>>>>>>>>
>>>>>>>>>> When I try virtually the exact same options on a larger (cubic)
>>>>>>>>>> cell:
>>>>>>>>>>
>>>>>>>>>> cell = read('big.pdb')
>>>>>>>>>> cell.set_cell([26.52,26.52,26.52])
>>>>>>>>>> cell.set_pbc(1)
>>>>>>>>>> calc_LCAO = GPAW(mode='lcao',
>>>>>>>>>> gpts=(144,144,144),
>>>>>>>>>> xc='PBE',
>>>>>>>>>> poissonsolver=PoissonSolver(relax='GS', eps=1e-7),
>>>>>>>>>> parallel={'band':2,'sl_default':(6,6,64)},
>>>>>>>>>> basis = 'dzp',
>>>>>>>>>> mixer=Mixer(0.1, 5, weight=100.0),
>>>>>>>>>> occupations=FermiDirac(width=0.1),
>>>>>>>>>> txt='67_Full.out',
>>>>>>>>>> maxiter=1000
>>>>>>>>>> )
>>>>>>>>>> etc…
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I get an error after three SCF steps. The larger cell has 680 atoms
>>>>>>>>>> and I used 144 cores. The error I get is below:
>>>>>>>>>>
>>>>>>>>>> rank=036 L00: Traceback (most recent call last):
>>>>>>>>>> rank=036 L01: File
>>>>>>>>>> "/p/home/wwtomlin/PhaseII/Proj1/INITIAL/67_Full_C.py", line 44, in <module>
>>>>>>>>>> rank=036 L02: opt.run()
>>>>>>>>>> rank=036 L03: File
>>>>>>>>>> "/p/home/wwtomlin/ase/ase/optimize/optimize.py", line 148, in run
>>>>>>>>>> rank=036 L04: f = self.atoms.get_forces()
>>>>>>>>>> rank=036 L05: File "/p/home/wwtomlin/ase/ase/atoms.py", line 688,
>>>>>>>>>> in get_forces
>>>>>>>>>> rank=036 L06: forces = self._calc.get_forces(self)
>>>>>>>>>> rank=036 L07: File "/p/home/wwtomlin/gpaw/gpaw/aseinterface.py",
>>>>>>>>>> line 78, in get_forces
>>>>>>>>>> rank=036 L08:
>>>>>>>>>> force_call_to_set_positions=force_call_to_set_positions)
>>>>>>>>>> rank=036 L09: File "/p/home/wwtomlin/gpaw/gpaw/paw.py", line 272,
>>>>>>>>>> in calculate
>>>>>>>>>> rank=036 L10: self.set_positions(atoms)
>>>>>>>>>> rank=036 L11: File "/p/home/wwtomlin/gpaw/gpaw/paw.py", line 328,
>>>>>>>>>> in set_positions
>>>>>>>>>> rank=036 L12: spos_ac = self.initialize_positions(atoms)
>>>>>>>>>> rank=036 L13: File "/p/home/wwtomlin/gpaw/gpaw/paw.py", line 314,
>>>>>>>>>> in initialize_positions
>>>>>>>>>> rank=036 L14: self.synchronize_atoms()
>>>>>>>>>> rank=036 L15: File "/p/home/wwtomlin/gpaw/gpaw/paw.py", line 1034,
>>>>>>>>>> in synchronize_atoms
>>>>>>>>>> rank=036 L16: mpi.synchronize_atoms(self.atoms, self.wfs.world)
>>>>>>>>>> rank=036 L17: File "/p/home/wwtomlin/gpaw/gpaw/mpi/__init__.py",
>>>>>>>>>> line 714, in synchronize_atoms
>>>>>>>>>> rank=036 L18: err_ranks)
>>>>>>>>>> rank=036 L19: ValueError: ('Mismatch of Atoms objects. In debug
>>>>>>>>>> mode, atoms will be dumped to files.', array([ 5, 9, 13, 17, 18, 19,
>>>>>>>>>> 20, 21, 22, 23, 25, 27, 28,
>>>>>>>>>> rank=036 L20: 32, 33, 35, 37, 40, 41, 43, 44, 46,
>>>>>>>>>> 50, 51, 52, 53,
>>>>>>>>>> rank=036 L21: 54, 60, 62, 63, 64, 65, 68, 71, 72,
>>>>>>>>>> 74, 80, 82, 85,
>>>>>>>>>> rank=036 L22: 87, 90, 91, 94, 97, 98, 99, 100, 101,
>>>>>>>>>> 104, 106, 107, 110,
>>>>>>>>>> rank=036 L23: 111, 115, 116, 118, 123, 125, 129, 130, 137,
>>>>>>>>>> 138, 139, 142]))
>>>>>>>>>> GPAW CLEANUP (node 36): <type 'exceptions.ValueError'> occurred.
>>>>>>>>>> Calling MPI_Abort!
>>>>>>>>>>
>>>>>>>>>> ————
>>>>>>>>>>
>>>>>>>>>> I think this means my atoms are not in the same positions across
>>>>>>>>>> different cores, but I can’t figure out how this happened. Do you have any
>>>>>>>>>> suggestions?
>>>>>>>>>> Thank you
>>>>>>>>>> Warren
>>>>>>>>>> PhD Student
>>>>>>>>>> Naval Postgraduate School
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> gpaw-users mailing list
>>>>>>>>>> gpaw-users at listserv.fysik.dtu.dk
>>>>>>>>>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>
>