In the past,
we have received a number of queries about the status of the PDBbind core set.
We have also noticed some confusion in the literature regarding the
naming convention of the CASF benchmark developed by our group. Here, we would
like to make a formal statement about the PDBbind core set and the CASF
benchmark in the hope of answering those queries and clarifying the confusion.
Our group has
a long-standing interest in scoring function development. The PDBbind
database is a notable outcome along the path (Liu et al., Acc. Chem. Res. 2017, 50, 302-309). The PDBbind database is now
updated on an annual basis, and each release of PDBbind is named after the
release year, such as PDBbind v.2016, PDBbind v.2017, and so on. The
PDBbind database collects experimentally measured binding affinity data for
four types of molecular complexes, i.e. protein-ligand complexes, nucleic
acid-ligand complexes, protein-protein complexes, and protein-nucleic acid
complexes. Among them, we call the collection of protein-ligand
complexes the "general set". We focus on this data set because
it is most relevant to drug design and discovery studies. Obviously, not every
entry in the general set is suitable for calibrating or validating
docking/scoring methods, because of various problems with the 3D structure, the
binding data, or other aspects. Therefore, we have selected the relatively
"healthy" entries from the general set to compile the so-called
"refined set". The refined set serves as a generally acceptable
data set for docking/scoring studies. Other researchers may apply the refined
set directly to their studies, or use the refined set as the starting point to
compile data sets with their own focus. Both the general set and the refined
set are updated with the PDBbind database on an annual basis. They should be
correctly cited as, for example, "the PDBbind general set v.2016",
"the PDBbind refined set v.2017", and so on.
As another part of our efforts, we have
established the CASF benchmark (Comparative Assessment of Scoring Functions),
which aims at providing an objective platform for assessing scoring functions. The
first published work was CASF-2007 (Cheng et al., J. Chem. Inf. Model. 2009, 49, 1079-1093). Another major update,
i.e. CASF-2013, was published a few years later (Li et al., J. Chem. Inf. Model. 2014, 54, 1700-1716; J. Chem. Inf. Model. 2014, 54, 1717-1736).
The CASF benchmark employs a high-quality set of protein-ligand complexes as the
primary test set. This data set, which we call the PDBbind "core set", is
selected from the PDBbind refined set through a systematic, non-redundant
sampling procedure. Accordingly, each public release of the CASF
benchmark is named after the version of the PDBbind database from which the
test set is selected. For example, the test set in CASF-2007 was compiled based
on PDBbind v.2007, the test set in CASF-2013 was compiled based on PDBbind
v.2013, and so on. Naming each CASF benchmark after its publication year would
not be a good idea, because we cannot predict when a paper will be published
while we are still preparing the manuscript.
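The naming rules in this statement can be captured in a few lines of Python. This is only an illustrative sketch: the function names `pdbbind_set_name` and `casf_source_release` are hypothetical helpers written for this note and are not part of any PDBbind or CASF software.

```python
import re


def pdbbind_set_name(subset: str, year: int) -> str:
    """Format a citation such as 'the PDBbind refined set v.2017'.

    Only the two annually updated sets are accepted here, because the
    core set is released with the CASF benchmark rather than with PDBbind.
    """
    if subset not in ("general", "refined"):
        raise ValueError("subset must be 'general' or 'refined'")
    return f"the PDBbind {subset} set v.{year}"


def casf_source_release(casf_name: str) -> str:
    """Map a CASF release name to the PDBbind version it is named after.

    For example, 'CASF-2013' corresponds to 'PDBbind v.2013'.
    """
    match = re.fullmatch(r"CASF-(\d{4})", casf_name)
    if match is None:
        raise ValueError(f"not a CASF release name: {casf_name!r}")
    return f"PDBbind v.{match.group(1)}"


print(pdbbind_set_name("refined", 2017))
print(casf_source_release("CASF-2013"))
```

The year in a CASF name therefore always identifies the PDBbind release behind the test set, regardless of when the accompanying paper appears.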
It is
important to point out that unlike the PDBbind database, the PDBbind core
set is not updated on an annual basis. As implied above, the PDBbind core
set is a component of the CASF benchmark rather than the PDBbind database. The
CASF benchmark is not updated on an annual basis due to the following reasons:
• A huge amount of effort is
needed to finish each CASF update. The CASF benchmark is more than a simple
data set. Instead, it consists of a whole set of evaluation methods, the
test set, and a large panel of standard scoring functions tested
as a demonstration. A lot of material needs to be prepared, and a lot of
computation needs to be conducted for each CASF update.
• Even if it were doable, in our opinion, there is no need to update
CASF so frequently. Our current plan is to update the CASF benchmark every
three years. In fact, we have already finished CASF-2016 and are preparing
a manuscript about it. We hope that this paper can be published in
2018.
As mentioned above, the last published version of the PDBbind core set is v.2013. This data set was not updated with PDBbind v.2014 or v.2015, so there is no PDBbind core set v.2014 or v.2015. For historical reasons, the PDBbind core set used to be included in the downloadable data package of some previous PDBbind releases. To avoid further confusion, we have removed the core set from the data packages of recent releases of PDBbind (e.g. PDBbind v.2014, v.2015, v.2016, and v.2017). If needed, users can obtain the PDBbind core set from the data package of the corresponding CASF benchmark (e.g. CASF-2007 and CASF-2013), which is also downloadable from the PDBbind-CN web site.
In conclusion,
the take-home message is:
• The CASF benchmark should not
be referred to as the "PDBbind benchmark". This incorrect name
appears in the literature, and now you know what the correct one is.
• The data package of the CASF
benchmark can be downloaded from the PDBbind-CN web site under the
"CASF" tab (http://www.pdbbind-cn.org/casf.php). At this point, we do
not think it is necessary to set up two separate web sites to host PDBbind and
CASF.
• Currently, the latest public
release of the CASF benchmark is CASF-2013. There will be CASF-2016 soon.
We have received a good number of queries regarding the next release of PDBbind. The PDBbind database has had a long-standing tradition of regular annual updates since its inception. However, it is already 2023 and the latest available release is still version 2020. We understand your concern.
In fact, our team has been working diligently on PDBbind version 2021 for the past three years. It is important to note that version 2021 is not a regular update but the most significant update in the history of PDBbind, encompassing more binding data (increased by ~20%), a new workflow for processing structures, new on-line functions, and a new cloud-based server. Achieving all these objectives has required much more effort than we had anticipated. After version 2021, we will be able to return to the tradition of annual updates in the near future.
Our current plan is to release PDBbind version 2021 officially before the new year of 2024. We would like to express our gratitude for your continued support of PDBbind. Please keep an eye on new announcements posted on this website.
Best wishes,
Prof. Renxiao Wang, on behalf of the PDBbind team
Department of Medicinal Chemistry, School of Pharmacy, Fudan University
Shanghai, P. R. China
E-mail: wangrx@fudan.edu.cn
Dear All,
We are excited to announce that the beta version of our new PDBbind+ web site is now ready for testing. Starting from version 2021, all new versions of the PDBbind database will be released solely on PDBbind+. We cordially invite you to experience the upgraded features of the PDBbind+ web site.
Currently registered PDBbind users will soon receive an e-mail through which they can activate their PDBbind+ account directly, after their user profile on PDBbind-CN has been transferred to the new web site. Others are encouraged to visit PDBbind+ at www.pdbbind-plus.org.cn. Registration on the PDBbind+ web site as a demo user is FREE. Demo users may access the contents of the PDBbind database up to version 2020 on the new web site.
We plan to release version 2021, as well as additional functional modules, on PDBbind+ once the beta test is completed. The official release of version 2021 is anticipated this month, so please stay tuned. For the sake of current PDBbind users, the PDBbind-CN web site will remain up and running as is, but no future update of PDBbind-CN is planned.
If you need any assistance or have any questions regarding PDBbind+, please feel free to reach us at support@pdbbind-plus.org.cn. Thank you for your continued support of the PDBbind database!
Best regards,
The PDBbind Team
School of Pharmacy, Fudan University
Dear valued PDBbind users,
We are pleased to announce the official release of PDBbind version 2024 on the PDBbind+ platform (https://www.pdbbind-plus.org.cn/). Note that the previous release was version 2021. This means we have chosen to provide version 2024 directly, skipping versions 2022 and 2023. From now on, the PDBbind database will return to regular annual updates.
The key highlights of PDBbind version 2024 include:
(1) Expanded collection of binding data: The new release encompasses experimental binding affinity data for 33,660 biomolecular complexes sourced from the Protein Data Bank, marking a 23% growth from the previous release (version 2021). Compared to the last freely accessible release (version 2020), the growth reaches 43%. PDBbind version 2024 provides binding data for >27300 protein-ligand complexes, >200 nucleic acid-ligand complexes, >4500 protein-protein complexes, and >1400 protein-nucleic acid complexes. This expansion enables a broader and deeper exploration of molecular interactions, for example by training deep-learning models.
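As a quick sanity check on these figures, the short Python sketch below recomputes the growth relative to version 2020 and confirms that the quoted per-category lower bounds are consistent with the reported total. The version 2020 total of 23,496 is the one stated in the version 2020 release notes on this site; the variable and function names are illustrative only.

```python
# Totals quoted in the PDBbind announcements.
TOTAL_V2024 = 33_660  # complexes with binding data in version 2024
TOTAL_V2020 = 23_496  # total stated for version 2020

# Lower bounds quoted per category for version 2024 (">27300", ">200", ...).
LOWER_BOUNDS_V2024 = {
    "protein-ligand": 27_300,
    "nucleic acid-ligand": 200,
    "protein-protein": 4_500,
    "protein-nucleic acid": 1_400,
}


def percent_growth(new: int, old: int) -> float:
    """Relative growth from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old


growth_vs_2020 = percent_growth(TOTAL_V2024, TOTAL_V2020)
lower_bound_sum = sum(LOWER_BOUNDS_V2024.values())

print(f"Growth vs. version 2020: {growth_vs_2020:.1f}%")
print(f"Sum of category lower bounds: {lower_bound_sum} (reported total: {TOTAL_V2024})")
```

The computed growth rounds to the 43% quoted above, and the lower bounds sum to 33,400, which is below the reported total of 33,660 as expected.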
(2) Carefully processed complex structural files: Since version 2021, we have used a new workflow to ensure that the structural files of protein-ligand complexes are processed properly and are compatible with other popular software (such as RDKit). For version 2024, we have further refined this workflow to achieve even higher accuracy and reliability in data interpretation. This workflow has been applied to process nucleic acid-ligand complexes as well.
(3) Attention to biomacromolecular complexes: For the first time in the history of PDBbind, version 2024 provides processed structural files for the protein-protein complexes and protein-nucleic acid complexes in PDBbind. For this purpose, the necessary annotations have been added to the binding data, so that one can identify the interacting chains in those complexes. A new workflow has been established to process the original PDB structural files and fix certain defects in them. This new feature is expected to facilitate computational research focusing on such molecular systems.
(4) New functions implemented on the PDBbind+ platform: Our web platform offers useful features for structural visualization and data analysis. Additionally, computational tools developed by our team, such as COMET (target-fishing for bioactive molecules) and PLANET (ultra-fast structure-based virtual screening), have been integrated to enrich the user experience, while cloud resources facilitate efficient on-line computation. Even demo users may use these computing services on PDBbind+.
As part of our commitment to fostering collaboration and knowledge exchange, registration on PDBbind+ as a demo user remains FREE. Demo users have access to a range of data and computing services free of charge. For those seeking access to the latest data collection and the complete set of functions, we offer the option to purchase the PDBbind dataset (version 2024 for now) for a modest licensing fee. Upon becoming a paid user, you will unlock full access to all available data and computing services on PDBbind+.
We must emphasize that the PDBbind+ platform is the only legal source from which one can obtain the PDBbind dataset since version 2021. Public re-distribution of the PDBbind dataset, or of a derivative dataset, is prohibited by the user license agreement.
We express our heartfelt gratitude to you for your unwavering support, which serves as the cornerstone of our endeavors. Your feedback and engagement continue to inspire us as we strive to evolve the PDBbind database into a more valuable community resource.
With best regards,
The PDBbind+ Team
School of Pharmacy, Fudan University & TopScience Ltd., Shanghai
Dear PDBbind users,
Normally, a new update of PDBbind is released in the fourth quarter of each year. Unfortunately, this year the project has also been affected by the COVID-19 pandemic. In addition, our team, as well as the PDBbind-CN server, is in the process of relocation, and thus a lot of extra work needs to be done. However, we will certainly keep the wheel rolling. We expect to release PDBbind v.2020 in the first quarter of 2021.
We wish you a happy and productive new year of 2021!
The PDBbind team
School of Pharmacy, Fudan University
Welcome to the PDBbind-CN Database!
Introduction. The aim of the PDBbind database is to provide a comprehensive collection of experimentally measured binding affinity data for all biomolecular complexes deposited in the Protein Data Bank (PDB). It provides an essential linkage between the energetic and structural information of those complexes, which is helpful for various computational and statistical studies on molecular recognition, drug discovery, and many more (see the list of published applications of PDBbind).
The PDBbind database was originally developed by Prof. Shaomeng Wang's group at the University of Michigan in the USA and was first released to the public in May 2004. This database is now maintained and further developed by Prof. Renxiao Wang's group at College of Pharmacy, Fudan University in China. The PDBbind database is updated on an annual basis to keep up with the growth of the Protein Data Bank.
PDBbind version 2024 is now released on PDBbind+ (01/24/2025)
Invitation to the new PDBbind+ web site (02/03/2024)
Current release.
The current release, i.e. version 2020, is based on the contents of PDB officially released in the first week of 2020. This release provides binding affinity data for a total of 23,496 biomolecular complexes in PDB, including protein-ligand (19,443), protein-protein (2,852), protein-nucleic acid (1,052), and nucleic acid-ligand complexes (149). Compared to the last release (v.2019), the binding data included in this release have increased by ~10%. All binding data were curated by our group from ~40,500 original references. Click here for a brief introduction to the PDBbind database (PDF).
A special remark on the PDBbind core set.
The PDBbind core set is compiled to provide a relatively small set of high-quality protein-ligand complexes for validating docking/scoring methods. The data set is selected based on the contents of PDBbind. In particular, this data set has served as the primary test set in the popular Comparative Assessment of Scoring Functions (CASF) benchmark developed by our group. The PDBbind core set is not included in the PDBbind data package because, unlike PDBbind itself, it is not updated annually. Users can obtain the PDBbind core set by downloading the CASF data package at http://www.pdbbind.org.cn/casf.php. The latest available version of the PDBbind core set is included in CASF-2016 and consists of 285 protein-ligand complexes.
Accessibility.
The basic information of each complex in PDBbind is completely open for access (see the [BROWSE] page). Users are required to register under a license agreement in order to utilize the searching functions provided on this web site or to download PDBbind data sets in bulk. Registration is currently free of charge to all academic and industrial users. Please go to the [REGISTER] page and follow the instructions to complete registration.
Acknowledgments.
This project is financially supported by the Ministry of Science and Technology of China (National Key Research Program, Grant No. 2016YFA0502302) and the National Natural Science Foundation of China (Grant Nos. 81725022, 81430083, 21661162003, 21673276, 21472227, 21472226). We are very grateful to Prof. Zenghui (John) Zhang's group at the East China Normal University for their help with versions 2015, 2016, and 2017.