A Flood of Data
Each year, over 10’000 papers are published on machine learning in chemistry, which equates to over 27 papers per day (1). As of May 2024, GitHub hosts nearly 12’000 repositories featuring the term “chemistry”. New chemicals are discovered at a rapid pace, and automation is pushing the boundaries of traditional laboratory practice. As the field evolves, chemists need to handle large amounts of heterogeneous data.
Augmenting Chemistry with Digital Tools
Digital Chemistry is a young field that combines Computational Chemistry, Cheminformatics, Software Engineering and AI to accelerate the discovery of materials, apply machine learning methods to chemical problems and build actionable tools for researchers. While traditional chemistry emphasizes hands-on synthetic and analytical methods, the recent digital shift also requires strong skills in programming, navigating coding environments, data analysis, and developing reproducible machine learning pipelines. Critically assessing published work for reproducibility is essential as well, given the broader trend of increasing retractions in scientific publishing (2).¹
The Practical Implications
As in wet-lab chemistry, expertise in programming grows with experience. Effective learning in this domain comes from hands-on projects, ranging from academic coursework and research to personal mini-projects. Building a machine learning pipeline, for example, mirrors the process of planning and conducting a chemical synthesis.
The community is encouraged to embrace mentorship and foster an environment of support and openness. As chemists increasingly transition to computer science, they bring a dual perspective that enriches our approach to both fields. Our collective goal is to maximize the utility of data and algorithms, integrating them seamlessly with established chemical knowledge.
In response to these needs, we have curated a comprehensive list of digital chemistry learning resources available on GitHub at mlederbauer/awesome-learning-digital-chemistry, promoting open contribution and accessibility to all interested parties.
Footnotes
¹ For comparison with the more than 10’000 papers published on machine learning in chemistry each year: more than 10’000 papers were retracted in 2023 alone.