A Data Science Case Study

Are you ready to take your Data Science skills to the next level? Have you heard enough about machine learning, and are you ready to put it to good use? Witty Tech Webinars has all you need to get you to the finish line.

This week we covered Part 2 of the fundamentals of Data Science in the Witty Tech Webinars series, introducing the most popular data science libraries through a hands-on case study of the Iris Dataset. Using Pandas, Matplotlib, and Scikit-learn, we predicted the Iris plant species with ~98% accuracy! Just imagine applying these techniques to humanity’s most pressing problems.

Here’s the TL;DR on what we covered

Together we journeyed through the machine learning lifecycle: from data processing to model training and finally interpreting our results. In the data processing phase, we used Pandas to read in the Iris Dataset as a DataFrame, a tabular data structure that can represent database tables, CSV files, or matrices. With the data in that form, visualizing and analyzing it is a piece of cake! Boxplots and histograms drawn with Matplotlib showed us the distribution of each feature and gave us insight into which machine learning methods would best fit our data.
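To make the data processing phase concrete, here is a minimal sketch rather than the webinar's exact notebook. It assumes the copy of the Iris Dataset that ships with Scikit-learn instead of a CSV file, and the `species` column and plot titles are our own illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Assumption: we use Scikit-learn's bundled Iris data rather than reading a CSV.
iris = load_iris(as_frame=True)
df = iris.frame  # Pandas DataFrame: four measurement columns plus an integer "target"

# Add a readable species label next to the numeric target (illustrative column name).
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

print(df.shape)
print(df[iris.feature_names].describe())  # summary statistics per feature

# Boxplots and histograms of the four features, side by side.
fig, axes = plt.subplots(1, 2, figsize=(11, 4))
df[iris.feature_names].plot(kind="box", ax=axes[0], rot=15, title="Feature boxplots")
df[iris.feature_names].plot(kind="hist", bins=20, alpha=0.5, ax=axes[1],
                            title="Feature histograms")
fig.tight_layout()
```

In a notebook, remove the `matplotlib.use("Agg")` line and the figure is displayed inline; in a script, `fig.savefig(...)` would write it to disk.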

During the model training and result analysis phases, we used cross-validation to repeatedly split our data into training and validation folds, which gives a more reliable estimate of each model's accuracy than a single split. With Scikit-learn, we compared the predictive abilities of 6 machine learning algorithms, reaching accuracy rates of ~97%! Check out our case study on GitHub or rewatch the webinar recording for a guided review.
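The comparison step can be sketched roughly as follows. This recap does not list the six algorithms, so the six classifiers below are an assumption on our part (a common line-up for Iris), as are the 10 folds and the random seed. Your scores may differ slightly from the webinar's.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Assumption: six commonly used classifiers, not necessarily the webinar's six.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LinearDiscriminant": LinearDiscriminantAnalysis(),
    "KNeighbors": KNeighborsClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=1),
    "GaussianNB": GaussianNB(),
    "SVC": SVC(gamma="auto"),
}

# Stratified folds keep the three species balanced in every split.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

results = {}
for name, model in models.items():
    # Each model is trained on 9 folds and scored on the held-out fold, 10 times.
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name:>20}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Reporting the mean and spread across folds, rather than one train/test score, is what makes the comparison between models fair.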

Taking your skills to the next level

  • Feature Engineering - the process of creating and selecting features from the data that are useful for machine learning algorithms.

  • Decision Trees - models that ask a series of questions about the features to predict what the outcome should be, with the added advantage that they can be used for both regression and classification.

  • Unsupervised Learning - a set of techniques designed to explore data and find "hidden structure" rather than predict outcomes, typically applied to data without labeled outcomes (which is true of most data).

  • Deep Learning - a machine learning technique that teaches a computer to filter inputs (observations in the form of images, text, or sound) through layers in order to learn how to predict and classify information.
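If you want a quick taste of the first three topics, here is a small sketch on the same Iris data. Everything in it is an illustrative assumption rather than material from the webinar: the "petal area" feature is a made-up example of feature engineering, the tree depth and train/test split are arbitrary, and k-means stands in for unsupervised learning in general.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris(as_frame=True)
X = iris.data.copy()
y = iris.target

# Feature engineering: derive a new (hypothetical) feature from two existing ones.
X["petal area (cm^2)"] = X["petal length (cm)"] * X["petal width (cm)"]

# Decision tree: a shallow tree keeps the learned "questions" easy to read.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
tree_accuracy = tree.score(X_test, y_test)
print(export_text(tree, feature_names=list(X.columns)))
print(f"Decision tree test accuracy: {tree_accuracy:.3f}")

# Unsupervised learning: cluster the raw measurements without using the labels,
# then compare the discovered clusters to the true species afterwards.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X[iris.feature_names])
ari = adjusted_rand_score(y, clusters)
print(f"Agreement between clusters and species (adjusted Rand index): {ari:.3f}")
```

The printed tree shows the questions it asks in order, and the adjusted Rand index gives a rough sense of how much "hidden structure" k-means recovers without ever seeing the species labels.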

Now what?

You’re finally ready to apply your newfound skills to answer those questions keeping you up at night. Looking for more practice? Kaggle hosts data science competitions for data enthusiasts of any level! And if you can’t get enough of us, join us on 5/20 for Part 2 of our Introduction to React!

See you there!

The WIT Project