scottonestak Goto Github PK

followers: 0 following: 0 repos: 19 gists: 0

Name: Scott Onestak

Type: User

Bio: Trying to make sense of the world with extremely messy data.

Twitter: ScottOnestak

Location: Pittsburgh, PA

Scott Onestak's Projects

2018housepredict

This is a 2018 House of Representatives forecasting model.

allegheny_tax_map

Scrape data to visualize Allegheny County local and property tax by municipality

fp-growth

This is an implementation of the FP-Growth Algorithm.

TEST CLASS (THE MEAT): I began by reading in the data line by line and storing each record as either a vertex (an intersection) or an edge (a road). Each vertex is created and then stored in the vertexMap and countVertex hashmaps. Edges are created and stored in the countEdgeMap. In addition, I create an Adjacent for each edge. Each vertex keeps a list of these, recording the vertex it connects to and the weight from itself to that vertex. As I go through the vertices, I also keep track of the largest and smallest longitudes and latitudes for drawing later. Believe it or not, this is the operation that takes the longest according to my timing. It has an O(n) runtime.

Below the file reading, the program calls the correct operations based on the number of command line arguments and the commands passed in. Since nothing besides if/else if statements and assignments happens here, this seemingly intricate dispatch runs in constant time. The operations it calls, however, do not.

Below that, outside the main method, are my getPath, dijkstra, and findSmallestVertex methods, all part of finding the shortest path between two points. I'm not exactly sure of the runtime, but I can explain why I think it is O(n log n) in the worst case and almost always faster in practice. The getPath method (the wrapper) begins by calling dijkstra. I used the pseudocode provided in lab 20 to implement Dijkstra's algorithm. However, I pass in the destination vertex so that the method stops as soon as that vertex is known, instead of going on to find every single node. So, if two vertices are side by side, the method can finish in constant time. The only way this part of the method gets to O(n) is if the two vertices passed in are the farthest vertices from each other in the entire graph. For finding the smallest vertex, I keep track of only the vertices on the edge of the explored part of the graph.
When I was keeping track of all the vertices, Monroe County took 3 minutes to find the shortest path. By keeping an arraylist of only the reached vertices, I was able to drastically reduce the runtime, to the point where the algorithm runs and displays in under 8 seconds for New York State. This works by tracking which vertices are known and removing them from the arraylist once they are. The arraylist therefore holds only the current boundary of the branching graph. Instead of potentially millions of elements, the arraylist in the example provided in the output (which goes essentially across New York State) only reaches a size of about 130 by the end of the algorithm. Once the destination vertex is known, dijkstra stops and control returns to getPath, which unwraps the path. This is done by starting at the destination vertex and working back up the tree through the parents. It's not an AVL tree, so although I expected this unwrapping to behave like log n, as tree walks often do, strictly it is proportional to the length of the path, which can be O(n) in the worst case.

Below that are my getMeridianPath and prim methods. Prim runs essentially the same as Dijkstra's, except that it must reach every node, so the main loop runs O(n) times. Each pass also scans the frontier arraylist for the smallest vertex, which is O(n) in the worst case, so the implementation as a whole is O(n^2) in the worst case. Unwrapping the result also touches every node, which adds another O(n).

EDGE CLASS: The edge class is fairly simple, as almost all of the classes from here on are. It is a storage container for edges. It takes in two vertices, and from them calculates the weight of that road using the haversine formula. A citation is provided in the code.

VERTEX CLASS: The vertex class is another storage container. It stores a vertex by taking in a number, name, latitude, and longitude. I initialize its known flag to false and its path to null. Additionally, I create an arrayList of Adjacents for each vertex.

ADJACENTS CLASS: Adjacents is a simple class. It takes in a vertex and the weight of the edge.
It is then stored in the appropriate vertex's arrayList.

DRAWMAP CLASS: DrawMap has three different initializing methods depending on what parameters are passed in: one to just show the graph, another for the shortest path, and another for the minimum weight spanning tree. Which one is called determines the runtime. Just showing the graph should take O(n). However, the number of lines to be drawn grows with the size of the other array passed in, so the other two may be something like O(n+m), which is still linear but takes longer because there are more elements. For the 2D drawing, I cited the Stack Overflow page where I found the information. Additionally, I cited the site that helped me with color combinations. The math for drawing the lines probably looks insane. However, I sat down and worked it out, and it does draw correctly. I don't think there is enough time or space to explain the math here, but I have it. From there, I loop through the list and draw the appropriate lines depending on which arraylists are filled. If I am drawing the Dijkstra path, I also put endpoints on the lines.
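The shortest-path steps described above (haversine edge weights, stopping Dijkstra's algorithm once the destination is known, then walking the parent pointers back to the start) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the names ShortestPathSketch, newGraph, addRoad, shortestPath, and haversineKm are hypothetical, vertices are plain integer indices, distances are assumed to be in kilometres, and a java.util.PriorityQueue stands in for the arraylist frontier that the write-up scans with findSmallestVertex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: all names here are illustrative, not the repository's.
class ShortestPathSketch {

    /** One entry in a vertex's adjacency list: the neighbour and the edge weight. */
    static final class Adjacent {
        final int to;
        final double weight;
        Adjacent(int to, double weight) { this.to = to; this.weight = weight; }
    }

    /** Great-circle distance via the haversine formula (assumed unit: km). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        final double earthRadiusKm = 6371.0; // mean Earth radius
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * earthRadiusKm * Math.asin(Math.sqrt(a));
    }

    /** Creates an empty adjacency list for n vertices. */
    static List<List<Adjacent>> newGraph(int n) {
        List<List<Adjacent>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        return adj;
    }

    /** Adds an undirected road between u and v. */
    static void addRoad(List<List<Adjacent>> adj, int u, int v, double weight) {
        adj.get(u).add(new Adjacent(v, weight));
        adj.get(v).add(new Adjacent(u, weight));
    }

    /** Returns the vertices on a shortest path, or an empty list if target is unreachable. */
    static List<Integer> shortestPath(List<List<Adjacent>> adj, int source, int target) {
        int n = adj.size();
        double[] dist = new double[n];
        int[] parent = new int[n];
        boolean[] known = new boolean[n];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        Arrays.fill(parent, -1);
        dist[source] = 0.0;

        // Frontier entries are {distance, vertex}; only reached vertices are ever queued.
        PriorityQueue<double[]> frontier =
                new PriorityQueue<double[]>((x, y) -> Double.compare(x[0], y[0]));
        frontier.add(new double[] {0.0, source});

        while (!frontier.isEmpty()) {
            int u = (int) frontier.poll()[1];
            if (known[u]) continue;   // stale entry for an already finished vertex
            known[u] = true;
            if (u == target) break;   // early exit: the destination is now known
            for (Adjacent edge : adj.get(u)) {
                double candidate = dist[u] + edge.weight;
                if (!known[edge.to] && candidate < dist[edge.to]) {
                    dist[edge.to] = candidate;
                    parent[edge.to] = u;
                    frontier.add(new double[] {candidate, edge.to});
                }
            }
        }

        // Unwrap: walk parent pointers from the destination back to the source.
        List<Integer> path = new ArrayList<>();
        if (!known[target]) return path;
        for (int v = target; v != -1; v = parent[v]) path.add(v);
        Collections.reverse(path);
        return path;
    }

    public static void main(String[] args) {
        List<List<Adjacent>> g = newGraph(4);
        addRoad(g, 0, 1, 1.0);
        addRoad(g, 1, 3, 1.0);
        addRoad(g, 0, 2, 5.0);
        addRoad(g, 2, 3, 1.0);
        System.out.println(shortestPath(g, 0, 3)); // prints [0, 1, 3]
    }
}
```

With a binary-heap frontier, each extraction costs O(log n) rather than a scan of the whole frontier list, which is one way to get closer to the O(n log n) behaviour the write-up guesses at. In a real map, the weight passed to addRoad would come from haversineKm applied to the two endpoints' coordinates.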

outlierdetection

Detecting outliers in Yahoo's webscope anomaly detection data sets using different outlier detection algorithms.

pima-knn-naivebayes

pointlocation

I built a binary tree from line segments in order to determine whether each test pair of points lies in the same region or is separated by a line and, if so, to return that line.

TEST.JAVA: I began by creating a buffered reader that reads the input line by line. The first line gives the number of line segments; the lines after those segments are points. I used the value from the first line to set the bound of the for loop that creates the line segments and inserts them into the tree. From there, I read in the points until the end of the file. I also created the JFrame and set up the GUI in this class.

POINT.JAVA: Exactly what it sounds like. Point takes in two values (x and y) and creates a point with them.

LINESEGMENT.JAVA: Takes in four values (x1, y1, x2, and y2) and makes them into a line segment. It also contains the method that checks whether two lines intersect (the citation for the code that helped is in the code). Additionally, it has a boolean method that returns true or false depending on whether they intersect, and a toString method.

TREENODE.JAVA: TreeNode creates the tree node. It also contains many important methods I call, such as the printInOrder method that prints the tree out in order, which I use in the main method. It also contains a method to print out a node, which is a lineSegment. I also count the external nodes, the total path length to the external nodes, and the average path length of the tree. The average path length should grow at roughly log n. This is because each level can hold twice as many nodes as the level above it, so the capacity of the tree doubles with every level. However, since this is not an AVL tree and does not rebalance, it will be less efficient, and the path length could grow at rate n if all the nodes were inserted to the left or right of each other, making a linked list.

BINARYTREE.JAVA: This is the meat of the program.
To insert, I have a public and a private insert. The public one only inserts directly when the root is null. Otherwise, the insert works through all the cases, using the ccw test given to us, to place each line segment in the proper node. I won't go through all the cases, because that would make a long-winded explanation even longer. After that, I have my search method, which takes in the root of the tree and the two test points. We call ccw on the test points and send them down the tree based on which side of the current node's line they fall on (clockwise, counter-clockwise, or collinear, which they shouldn't be). If the two points return different values for the same node, then that node's line is the one that splits them, so I return that line and print it out. Otherwise, we check whether the next node is null. If it is, then we have reached the bottom of the tree and the points are on the same side of the bottom node, meaning they are in the same region, so I return null. If the next node is not null, we call search again with the next node as the root.

MYMOUSELISTENER.JAVA: Very simple. The first click adds the first point, and the second click adds the second point and checks the pair. This is the same as the binary tree search, just with points determined by the mouse clicks.

UNITSQUARE.JAVA: In this class, I draw the line segments first. It is important to note that the project drawings are wrong in this case, because the origin for the GUI is not the bottom left-hand corner; it is the upper left-hand corner. Therefore, I had to flip my x-values so the lines would match up correctly with the regions. After drawing the lines and displaying the message to select the points, when the person clicks the first time I display the coordinates of that point.
Then, when they click the second point, I display its coordinates along with the same message the user receives from the regular method, which says whether the points are in the same region or, if they are in different regions, which line splits them. This can be repeated as long as the user likes.
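The search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the repository's code: the names PointLocationSketch, Node, ccw, and search are hypothetical, each node's segment is treated as the full line through its two endpoints, the left child is assumed to hold the counter-clockwise side, and the insertion cases are omitted (the tree in main is built by hand).

```java
// Hypothetical sketch: names and the left/right convention are assumptions.
class PointLocationSketch {

    /** A tree node holding one line segment from (x1, y1) to (x2, y2). */
    static final class Node {
        final double x1, y1, x2, y2;
        Node left;   // assumed: region on the counter-clockwise side of the segment
        Node right;  // assumed: region on the clockwise side of the segment
        Node(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
    }

    /** Returns +1 if a -> b -> c turns counter-clockwise, -1 if clockwise, 0 if collinear. */
    static int ccw(double ax, double ay, double bx, double by, double cx, double cy) {
        double cross = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
        if (cross > 0) return 1;
        if (cross < 0) return -1;
        return 0;
    }

    /** Returns the node whose line separates p and q, or null if they share a region. */
    static Node search(Node node, double px, double py, double qx, double qy) {
        while (node != null) {
            int sideP = ccw(node.x1, node.y1, node.x2, node.y2, px, py);
            int sideQ = ccw(node.x1, node.y1, node.x2, node.y2, qx, qy);
            if (sideP != sideQ) return node;           // this line splits the two points
            node = (sideP > 0) ? node.left : node.right; // collinear points fall to the right here
        }
        return null; // ran off the bottom of the tree: same region
    }

    public static void main(String[] args) {
        Node root = new Node(0.5, 0.0, 0.5, 1.0);      // vertical line x = 0.5
        root.left = new Node(0.0, 0.5, 0.5, 0.5);      // horizontal line y = 0.5, left half
        System.out.println(search(root, 0.2, 0.5, 0.8, 0.5) == root);      // prints true
        System.out.println(search(root, 0.2, 0.2, 0.2, 0.8) == root.left); // prints true
        System.out.println(search(root, 0.2, 0.1, 0.3, 0.2) == null);      // prints true
    }
}
```

The loop does the same work as the recursive call described in the write-up; each step descends one level, so a query costs time proportional to the depth of the tree.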
