CCR_data_x_f20

Link to Google Drive: https://drive.google.com/drive/folders/1tuFWDCLsI0qT6JOR9Xgl9srEzAx_EocH?usp=sharing

Summary

UC Berkeley offers over 150 undergraduate majors and minors, giving students the opportunity to explore many fields of study. With so many options, however, choosing can be overwhelming: UC Berkeley offered almost 7,000 classes in the fall 2020 semester alone. It is impossible to know every class offered, and difficult to decide among them. Class selection is especially challenging for new majors such as data science, which do not yet have a set path. Students often spend a great deal of time switching between sites such as the Berkeley class catalog, berkeleytime.com for grade distributions, and ratemyprofessor.com for professor ratings.

A class recommendation system such as Cal Class Recommender, or CCR for short, can make the class selection process much easier. CCR provides data science-related class recommendations based on classes that other data science students enjoyed, while taking the natural order of classes into account. With CCR, class selection is quicker and easier, which is especially important when students must navigate on their own during virtual learning.

CCR combines content-based and collaborative filtering. The recommendation engine takes a class that a student is currently taking as its input and analyzes its contents (such as the title of the class). It then identifies other users who have taken similar classes, ranks those users by similarity score, and recommends the most relevant classes to the student. For example, if the system finds that user A is most similar to user B, then any class that user A has taken and user B has not is recommended to user B, and vice versa.
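The user A / user B example can be sketched as a set difference. This is only an illustration of the idea, not the project's code: the function name `recommend_from_neighbor` and the sample class lists are hypothetical.

```python
def recommend_from_neighbor(target_classes: set, neighbor_classes: set) -> list:
    """Return classes the most similar user has taken that the target has not.

    Hypothetical helper for illustration; sorted only to make the output stable.
    """
    return sorted(neighbor_classes - target_classes)


# Made-up coursework for two users assumed to be each other's nearest neighbor.
user_a = {"data8", "indeng135", "data100"}
user_b = {"data8", "indeng135", "econ140"}

recs_for_b = recommend_from_neighbor(user_b, user_a)  # classes A took, B did not
recs_for_a = recommend_from_neighbor(user_a, user_b)  # and vice versa
print(recs_for_b, recs_for_a)
```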

Project Components

1. clean

We gathered our data through surveys sent to students in IND ENG 135, Econ 140, and IND ENG 95. Using Python's pandas library, we loaded the .csv files into a dataframe. We then lowercased all letters, removed spaces, and converted class names so that they are consistent. For example, some students entered "Data-X" while others entered "INDENG 135", so we converted both to "indeng135". We compiled the information into a dataframe containing:

  • Major
  • Last Semester Classes
  • This Semester Classes
  • Recommendation Classes
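A minimal sketch of the name normalization described above, under stated assumptions: the alias table `ALIASES` and the function `normalize_class` are hypothetical, and stripping punctuation (not just spaces) is an assumption made so that "Data-X" can be matched. The same function could be applied to a pandas column with `Series.map`.

```python
import re

# Hypothetical alias table: informal names mapped to a canonical course code.
ALIASES = {"datax": "indeng135"}


def normalize_class(name: str) -> str:
    """Lowercase, drop spaces and punctuation, then resolve known aliases."""
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    return ALIASES.get(key, key)


for raw in ["Data-X", "INDENG 135", "IND ENG 135", "Econ 140"]:
    print(raw, "->", normalize_class(raw))
```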

2. combine_features

We then created a list of features composed of all the classes taken last semester, taken this semester, and recommended. We combined these features into a single string so that they could be fed into our recommendation engine.
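One way the combining step might look is sketched below. The column names (`last_semester`, `this_semester`, `recommended`) and the dictionary row format are assumptions for illustration; the project's actual dataframe columns may differ.

```python
# Hypothetical column names standing in for the three class columns.
FEATURE_COLUMNS = ("last_semester", "this_semester", "recommended")


def combine_features(row: dict) -> str:
    """Join every class listed in the feature columns into one space-separated string."""
    classes = []
    for column in FEATURE_COLUMNS:
        classes.extend(row.get(column, []))
    return " ".join(classes)


# Made-up survey response with already-normalized class names.
row = {
    "major": "data science",
    "last_semester": ["data8", "math54"],
    "this_semester": ["indeng135"],
    "recommended": ["data100"],
}
combined = combine_features(row)
print(combined)
```

With a pandas dataframe, a function like this could be applied row by row using `DataFrame.apply(..., axis=1)`.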

3. ccr_recommendation

Next, we measured the similarity between two users as the cosine similarity of their coursework. To do this, we represented the class titles as count vectors using CountVectorizer from scikit-learn's feature_extraction.text module: we fed the combined strings into CountVectorizer and called cosine_similarity on the resulting matrix to get the similarities among users. Our next step was to get the title of the class the user recommends. With the title, we found the index of the class and accessed the corresponding row of the similarity matrix, which holds the similarity scores between the current class and all the others. We then enumerated the scores to build a list of (class index, similarity score) tuples, sorted the list by similarity score in descending order, and output the entry at the top of the list.
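The ranking steps above can be sketched as follows. To keep the example runnable without extra packages, it uses `collections.Counter` and a hand-written cosine function in place of scikit-learn; with scikit-learn, the same scores would come from `CountVectorizer().fit_transform(combined)` followed by `cosine_similarity`. The function names and sample strings are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(combined: List[str], index: int) -> List[Tuple[int, float]]:
    """Score every other row against row `index` and sort by score, descending."""
    vectors = [Counter(text.split()) for text in combined]
    scores = [
        (i, cosine(vectors[index], vector))
        for i, vector in enumerate(vectors)
        if i != index  # skip the row itself, which always scores 1.0
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up combined feature strings, one per survey row.
combined = [
    "data8 math54 indeng135",
    "data8 math54 data100",
    "econ140 stat134",
]
ranked = rank_similar(combined, 0)
best_index, best_score = ranked[0]
print(best_index, round(best_score, 3))
```

Excluding the query row before sorting is a design choice of this sketch; otherwise the top entry would always be the row compared with itself.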

Contributors

khuynh2021, chayabakshi
