Every Voice Counts: Digital Chemistry and Chemical Education

A comment on chemical education and digital chemistry, part of a collective Voices article published later this year.

comment
Author

Magdalena Lederbauer

Published

May 25, 2024

A flood of data

Each year, over 10’000 papers are published in the field of machine learning in chemistry, which equates to over 27 papers per day (1). As of May 2024, GitHub hosts nearly 12’000 repositories featuring the term “chemistry”. New chemicals are discovered at a rapid pace, and automation is pushing the boundaries of traditional laboratory practice. As the field evolves, chemists need to handle large amounts of heterogeneous data.

Augmenting Chemistry with Digital Tools

Digital Chemistry is a young field that combines Computational Chemistry, Cheminformatics, Software Engineering and AI to accelerate the discovery of materials, apply machine learning methods to chemical problems and build actionable tools for researchers. While traditional chemistry emphasizes important hands-on synthetic and analytical methods, the recent digital shift additionally requires strong skills in programming, navigating coding environments, data analysis, and developing reproducible machine learning pipelines. Critically assessing published work for reproducibility is just as essential, given the broader trend of increasing retractions in scientific publishing (2).¹

The Practical Implications

As in wet-lab chemistry, expertise in programming grows with experience. Effective learning in this domain comes from hands-on projects, ranging from academic coursework through research to personal mini-projects. Building a machine learning pipeline, for example, mirrors the process of planning and conducting a chemical synthesis.
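To make the synthesis analogy concrete, here is a minimal sketch of such a pipeline using only the Python standard library. Everything in it is an assumption made for illustration: the data are synthetic (a made-up descriptor and a noisy linear "property"), the model is a one-variable least-squares fit, and the function names are hypothetical. The stages loosely correspond to preparing starting materials (data), running the reaction (fitting), and characterizing the product (evaluation on held-out data), with a fixed random seed playing the role of a written protocol.

```python
import random
import statistics


def make_toy_data(n=40, seed=0):
    """Generate hypothetical data: a made-up descriptor x and a noisy linear property y."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def train_test_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle indices with a fixed seed so the split is reproducible."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    test = ([xs[i] for i in test_idx], [ys[i] for i in test_idx])
    return train, test


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, xs, ys):
    """Average absolute deviation between predictions and held-out values."""
    slope, intercept = model
    errors = [abs(slope * x + intercept - y) for x, y in zip(xs, ys)]
    return statistics.fmean(errors)


def run_pipeline(seed=0):
    """Plan (data), conduct (fit), characterize (evaluate)."""
    xs, ys = make_toy_data(seed=seed)
    (train_x, train_y), (test_x, test_y) = train_test_split(xs, ys, seed=seed)
    model = fit_line(train_x, train_y)
    return model, mean_absolute_error(model, test_x, test_y)


if __name__ == "__main__":
    (slope, intercept), mae = run_pipeline(seed=0)
    print(f"slope={slope:.3f} intercept={intercept:.3f} test MAE={mae:.3f}")
```

Calling run_pipeline twice with the same seed returns identical results, which is the property a reader would need in order to reproduce a reported number. A real project would replace the toy data and the line fit with actual chemical data and an appropriate model.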

The community is encouraged to embrace mentorship and foster an environment of support and openness. As chemists increasingly transition to computer science, they bring a dual perspective that enriches our approach to both fields. Our collective goal is to maximize the utility of data and algorithms, integrating them seamlessly with established chemical knowledge.


In response to these needs, we have curated a list of digital chemistry learning resources, available on GitHub at mlederbauer/awesome-learning-digital-chemistry. The list is open to contributions and accessible to anyone interested.

1. Blaiszik, B. blaiszik/ml_publication_charts: AI/ML Publication Statistics for 2022, 2023. doi:10.5281/zenodo.7713954.
2. Van Noorden, R. More than 10,000 research papers were retracted in 2023 - a new record. Nature 2023, 624(7992), 479–481.

Footnotes

  1. For comparison with the more than 10’000 papers published each year on machine learning in chemistry, more than 10’000 research papers were retracted in 2023 alone (2).