Some issues on the size of the dataset #135

Wangyt54549 · 2021-09-13T08:32:08Z

Hi everyone,
I'm trying to reproduce the simulation of n-C12 pyrolysis (https://dx.doi.org/10.1021/acs.energyfuels.0c03211).

I generated a dump file from a 1 ps MD run using the parameters from the original paper, then ran MDDatasetBuilder with the command "datasetbuilder -d c12.dump -c 3.5 -a C H -n c12". This produced a dataset of 20,493 structures (.xyz/.gjf), far more than the initial dataset of 590 clusters reported in the paper.

Is an additional step needed? Is the k-means clustering algorithm in sklearn not running correctly? Or is this result expected?

njzjz (Maintainer) · 2021-09-13T20:24:16Z

You can use the --size (or -s) option to specify the size of each dataset generated by MDDatasetBuilder. In the paper, we used -s 100 instead of the default value of 10000.
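
For reference, here is a sketch of the command from the question with that option added. The file name c12.dump and the name c12 are taken from the question; everything except -s 100 is unchanged, and the resulting structure count may still differ from the paper if the trajectory is not identical.

```shell
# Same invocation as in the question, with the per-dataset size lowered
# from the default (10000) to the value used in the paper (100).
# Assumes c12.dump is in the current directory, as in the question.
datasetbuilder -d c12.dump -c 3.5 -a C H -n c12 -s 100
```

According to the reply above, --size is the long form of -s, so `--size 100` should be equivalent.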
