This is work I did during my time at the Purdue Data Mine. The Data Mine is a learning community where students of any major can learn data science concepts and apply them to their projects. It was extremely beneficial for me because it was the first time I could put the Python libraries I had learned to good use and get practice with them. It offered real-life coding experience that isn't available in class. The focus wasn't on learning concepts, but on applying what you already know and what you learn along the way.
- Representing, extracting, manipulating, interpreting, transforming, and visualizing large data sets, then exploring, analyzing, and communicating insights about them
- Working with TensorFlow, PySpark (along with Spark SQL and Apache Spark), MLlib, GraphX, and Extensible Markup Language (XML)
- Scraping and parsing websites using Selenium and BeautifulSoup
- Gathering data using different element-location methods (XPath, By.CSS_SELECTOR, etc.)
- Using pandas in a real-world setting: reading CSV files, exploring them to understand what they contain, and working with them
- Using scikit-learn and TensorFlow to build, compile, and train a model
- Cleaning DataFrames using pandas by removing rows, columns, and specific entries, then plotting with Plotly to visualize and better understand trends in the data
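
As a rough illustration of the CSV reading and DataFrame cleaning steps listed above, here is a minimal sketch. It is not code from the Data Mine projects: the CSV contents, the column names (`city`, `year`, `sales`, `notes`), and the `load_and_clean` helper are all made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV contents; a real project would read a file from disk.
RAW_CSV = """city,year,sales,notes
Lafayette,2021,100,ok
Lafayette,2022,,missing
Chicago,2021,250,ok
Chicago,2022,300,ok
Chicago,2022,300,ok
"""


def load_and_clean(csv_text: str) -> pd.DataFrame:
    """Read CSV text, then drop an unused column, duplicates, and incomplete rows."""
    df = pd.read_csv(io.StringIO(csv_text))
    df = df.drop(columns=["notes"])    # remove a column not needed for analysis
    df = df.drop_duplicates()          # remove rows that are exact repeats
    df = df.dropna(subset=["sales"])   # remove rows with no sales value
    return df.reset_index(drop=True)


cleaned = load_and_clean(RAW_CSV)

# Aggregate per city; a result like this could then be passed to a plotting library.
totals = cleaned.groupby("city")["sales"].sum()
```

A summary such as `totals` is the kind of table that could then be handed to Plotly to look for trends, though the plotting step is left out here to keep the sketch small.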