Curated Dataset of Association Constants Between a Cyclodextrin and a Guest for Machine Learning: Raw Data and Generation Script
Determining the association constant between a cyclodextrin and a guest molecule is important for many applications in industrial and academic fields. However, measuring it is time-consuming and tedious, and it requires samples of both molecules. A significant number of association constants and related data are available in the literature, which makes it possible to use machine learning techniques to predict association constants. However, these data are mostly published as tables in articles or their appendices, so they must be converted into a computer-friendly format and curated. Furthermore, the raw data need to be enriched with physicochemical information about each molecule, and when this information is not sufficient to discriminate between molecules, additional data are needed. We present a dataset built from data gathered from the literature. The dataset contains both the original raw data from the articles and the enriched data. We also provide the scripts used to curate and enrich the raw data.