
Instruct-ERIC Events

Machine Learning for Complex Datasets with ChatGPT

Training
Registration Date: 01-Apr-2025 to 30-Apr-2025
Date: 30-Apr-2025

This one-day workshop, led by Nikolay Oskolkov from Lund University, introduces machine learning techniques for data analysis. It combines theoretical knowledge with practical coding in R and Python, using ChatGPT as a research assistant for writing code and interpreting results. Participants will learn to implement algorithms from scratch and optimize them, including neural networks, random forests, k-means clustering, Gaussian Mixture Models (GMM), and Markov Chain Monte Carlo (MCMC), making the workshop a useful resource for research in statistics and data science.

Machine learning has become an indispensable tool in computational data analysis, offering powerful techniques for analyzing and interpreting complex data. As the volume of data continues to grow exponentially, the ability to apply machine learning algorithms effectively is crucial for advancing research across the biological, environmental, health, and social sciences, as well as statistics and engineering. Nikolay Oskolkov from Molecular Biosciences, Lund University, brings his extensive expertise to this one-day workshop, showing how to use AI tools like ChatGPT alongside R and Python for advanced data analysis and equipping participants with both theoretical knowledge and practical skills.
This workshop is particularly valuable for PhD students, academics, and professional researchers who want to strengthen their analytical capabilities using machine learning. By focusing on practical applications in R and Python, with coding and interpretation assistance from ChatGPT, participants will learn the theoretical underpinnings of various machine learning algorithms and gain hands-on experience coding these algorithms from scratch in an intuitive way. This dual approach means that attendees can immediately apply what they learn to their own research projects, making the workshop a worthwhile investment for anyone involved in computational data analysis.

Workshop Topics and Learning Objectives:

  • Introduction to machine learning in data science and computational biology

  • Limitations of traditional statistics and the need for a machine learning approach

  • Understanding principles of neural networks and their applications

  • Coding gradient descent and a neural network from scratch in R and Python, and optimizing them with ChatGPT

  • Choice of machine learning algorithm for tabular, image, text and time series data

  • Implementing the random forest algorithm from scratch in R and Python, and improving it with ChatGPT

  • K-means clustering and Gaussian Mixture Model (GMM) in R and Python

  • Markov Chain Monte Carlo (MCMC) methods in R and Python for bioinformatics and genomics

  • Applications of autoencoder neural networks for integrating heterogeneous data

  • Case studies and real-world applications with live coding and interpretation with ChatGPT

  • Troubleshooting and optimizing machine learning models, including hyperparameter tuning

  • Ethical considerations and best practices in machine learning research
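
To give a flavour of what "coding from scratch" in the topics above can look like, here is a minimal sketch of gradient descent in plain Python, fitting a straight line by minimizing mean squared error. It is not taken from the workshop materials: the function name `gradient_descent`, the toy data, the learning rate, and the number of steps are all assumptions chosen for illustration.

```python
def gradient_descent(xs, ys, lr=0.05, n_steps=2000):
    """Fit y = w * x + b by minimizing mean squared error (illustrative sketch)."""
    w, b = 0.0, 0.0  # start from an arbitrary initial guess
    n = len(xs)
    for _ in range(n_steps):
        # Residuals of the current fit: prediction minus observed value
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = (2.0 / n) * sum(r * x for r, x in zip(residuals, xs))
        grad_b = (2.0 / n) * sum(residuals)
        # Step against the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1 (no noise)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(f"slope = {w:.3f}, intercept = {b:.3f}")
```

On this noise-free data the fitted slope and intercept approach 2 and 1. A learning rate that is too large makes the updates diverge, which is one simple instance of the hyperparameter tuning mentioned in the topic list.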