Training objectives
Nextbit conducts instructor-led training programs for client companies. This extensive training includes more than ten distinct modules covering a spectrum of topics: Data Governance, Data Science, Data Visualization, Open Data, Innovation, and Emerging Technologies. The modules are part lecture-based and part hands-on introduction to data analysis and machine learning. The objective is to give participants a sound overview of the terminology, methodologies, and possible applications of these emerging technologies and innovative approaches, and of how to generate visual experiences that enable users to interact with machines and data.
A sample training course
Overview of Data Governance
- What is Data Governance?
- Why is it critical in business?
- Introducing the concept of entities, attributes, relationships
- Creating & capturing the necessary metadata
Data Quality
- Implementation of algorithms for data certification processes
- Setting the baseline and KPIs for data quality improvement
Innovative Data Architectures
- Open Source vs “traditional” software solutions
- Hadoop Architecture overview
Overview of Data Sources
- Open data sources
- Introduction to Python for connecting to open data to enrich internal datasets
Data Science
- Introduction to machine learning using Python, R and Scala
- Supervised and unsupervised learning (including K-means, random forest, ensemble models)
Data Visualization & User Interface
- The importance of Data Visualization and its effect on interpretation
- Matching the user needs and objectives with an appropriate data experience
Agile Methodology
- Agile project management and the typical data analysis project lifecycle
Learning Objectives
At the end of this course, participants will be able to:
- Understand the concepts of data science and the different types of machine learning algorithms
- Explore, analyse and transform data using Python and R
- Gain an understanding of and experience implementing common machine-learning techniques
- Gain familiarity with the most commonly used machine learning libraries in Python, R and Spark
- Generate data visualisations to communicate findings in the data