jschluger / orie4741-project
Using linear modeling to predict domestic flight prices in the US. Term project for Learning with Big Messy Data. Check out FinalReport.pdf for results!
This project aims to find the ideal price of US domestic flight tickets in order to maximize profit. Its dataset is the US DOT's Domestic Airline Consumer Airfare Report.
I really like how you spent some time describing why you're pursuing this project, and you have data to back it up! Additionally, I appreciated how you were very explicit about which features you were using and why you were dropping some. I was left with no questions about that. I think overall, you outlined things very clearly. You made sure to address most, if not all, of the questions the reader may have, and laid all of the information out very clearly. I said that twice for emphasis: this was an easy project report to read and understand, at least from my perspective. Also, you had a very in-depth analysis of the model's viability as a weapon of math destruction, going through each part of the WMD definition.
It would've been nice if Figure 2 were on the same page as your description of it. I had to scroll back and forth to compare what you were saying with the visualization. If I have to nitpick, I think you could've defined extrapolation vs. interpolation one more time for less technical readers, and maybe in different words. Additionally, it may have been nice to see more visuals: not just tables of information, but more charts and graphs.
Overall, great job!
This project analyzes the dataset from the US Department of Transportation’s Domestic Airline Consumer Airfare Report from 2019. The group is trying to predict the domestic flight prices assuming a normal year of travel. The data contains an origin airport and city, destination airport and city, year, time of year (quarter), average fare, average fare for the carrier with the largest market share, average fare for the lowest carrier, number of miles, passengers per day, and geocoded information.
Things I like
Areas for improvement
The main goal of the “Domestic_Flight_Predictions” project is to analyze the data from the US Department of Transportation’s Domestic Airline Consumer Airfare Report from 2019. This dataset contains information on each flight’s origin airport and city, destination airport and city, year, time of year, average fare, average fare for the carrier with the largest market share, average fare for the lowest carrier, number of miles, passengers per day, and geocoded information.
One thing I like about this project midterm report is the histograms of the data. These graphics help the reader get a sense of the information in the dataset. Another thing I like about this report is the use of k-fold cross validation to avoid overfitting. I also liked this group's process going forward.
One improvement I would suggest for this project is to include graphs for the preliminary analysis of the data. This would help the reader better understand the models described in the report. Another improvement I would suggest is to experiment with other models besides the linear regression model, because the MSE appears to be pretty high. I would also suggest you state the project objective clearly at the beginning of the report.
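For readers unfamiliar with the k-fold cross validation mentioned in this review, here is a minimal sketch in Python. It is not the group's code: the helper names (`fit_line`, `k_fold_mse`) and the single-feature setup (for example, miles predicting fare) are assumptions made for illustration only.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def mse(a, b, xs, ys):
    """Mean squared error of the line y = a + b * x on the given points."""
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds (hypothetical helper)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            mse(a, b, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return mean(scores)
```

Because each fold is scored by a model that never saw it, the averaged error estimates performance on unseen data, which is why it helps detect overfitting.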
Summary
The project is looking at the price of domestic airline tickets, aiming to come up with prices for airline companies that are both profitable and competitive in the market. The data used mainly comes from the United States Bureau of Transportation and online airline trackers.
Things I like
1. The result of the project can be applied easily, since air travel is now a very common mode of transportation and the result has great significance for users choosing the most affordable flight.
2. The objective of the project is very clear and feasible to me, since most of the variables needed for predicting a domestic flight price can be obtained.
3. Both data sets chosen are very extensive, and they will serve the project well.
Things I think could be improved
1. Using data from online airline trackers seems interesting, but I am not sure how you are going to process the data. Depending on your approach, extracting the data and then cleaning it can be time-consuming.
2. It seems to me that the data from the United States Bureau of Transportation may be excessive. If you are extracting data from online airline trackers, I do not think you still need the per-airport data for each quarter of each year that “Average Domestic Airline Itinerary Fares By Origin City” lists, since you are looking at specific flights instead of airport averages. This, of course, will depend on how much data you are able to extract from the online tracker.
3. The objective is a little contradictory, since you are trying to find the most profitable price for airline companies and, at the same time, the maximized trip for consumers within their budgets. I would suggest focusing on just one side.
This project takes the prices of domestic flights from 1993 to 2019 and predicts prices based on that. The dataset includes information on origin airport and city, destination airport and city, year, time of year (quarter), average fare, average fare for the carrier with the largest market share, average fare for the lowest carrier, number of miles, passengers per day, and geocoded information.
Things I like:
Things to improve:
This group aims to use data from the US Department of Transportation’s Domestic Airline Consumer Airfare Report to analyze the cost of domestic flights given a host of financial and geographical data on more than 200,000 flights.
Some comments:
(+) The explanation of the visualizations was great. A lot of detail and effort was put in so that the reader understood not only what the visualization meant, but why it was important
(+) The "Avoiding Overfitting" section was thoroughly detailed. It is important to state modeling assumptions and say that your approach won't be robust to outliers such as 2020.
(+) Your analysis of your least-squares model is detailed and thorough
(-) Might be better in the "large_ms" density to display actual probabilities for the y-axis so the interpretation is more straightforward for the reader (I can see the spike but what does the value of 2.5 mean?)
(-) For your modeling section, you don't really need to specify that you used the Julia backslash operator; it seems superfluous
(-) There seems to be some redundant information in your report (e.g., your definition of the "when" column and your use of k-fold cross-validation); perhaps consider removing some of it for a more streamlined final outcome
(-) What did cleaning the data look like? Did you simply remove all entries with missing data? Will this affect any future modeling decisions you make? More information on this would be good.
Overall great work! Excited for the final project
Summary
This group worked to build models interpolating and extrapolating airline ticket prices given a dataset from the US Department of Transportation. They built many different linear models with hyper-parameter tuning over different feature sets in order to accomplish this. Many of their final models performed well, with errors under one standard deviation of the dataset.
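One way to read "under one standard deviation" is as a comparison between a model's root mean squared error (RMSE) and the standard deviation of the actual fares. That reading is an assumption, not something the report is confirmed to define this way. A minimal Python sketch with hypothetical helper names:

```python
from math import sqrt
from statistics import pstdev


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted fares."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def within_one_std(actual, predicted):
    """True if the model's RMSE is below the population standard deviation
    of the actual fares (assumed interpretation of the reviewer's phrase)."""
    return rmse(actual, predicted) < pstdev(actual)
```

A model that always predicts the mean fare has an RMSE equal to the standard deviation, so this check asks whether a model beats that trivial baseline.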
What I liked
I thought it was thorough and interesting that you separated your models into the two categories of interpolation and extrapolation.
The tables comparing your models, the feature set used on each model, and the results were really well done and organized. It made interpreting the results of your project really easy.
I thought you did a good job analyzing the applications of your model for both consumers and airlines including the note of where your model would still predict well given the COVID pandemic.
Areas of improvement
Summary of Project:
The objective of the project is to identify the optimal price point of domestic flights from the perspective of an airline company. The ideal price point will be defined by the profit maximizing price that allows the airline to remain competitive. The dataset that will be used is the Average Domestic Airline Itinerary Fares By Origin City dataset from the US Bureau of Transportation as well as data from online fare trackers.
What I like:
Areas for improvement:
Overall, it’s a very interesting project and I look forward to seeing how it evolves over the course of the semester!
The group sought to predict airline fares from historical flight data in order to advise airline companies on the best pricing to be competitive and make a profit. First they fit a baseline model using basic linear regression to understand the strengths and weaknesses of more developed models on the same data. They fit over a thousand models to their data set and were able to create accurate models which produced errors largely within a standard deviation of the actual price.
Great work!
Summary
The report detailed the group's process of cleaning, feature selection, model selection + tuning, and analysis of flight pricing data. Understanding pricing and demand for flights is valuable for both consumers and airline companies, because knowing trends from seasonality or other features can help users make informed decisions. The final model was optimized over 4 different feature sets and 1280 different linear models.
What I Liked
Areas of Improvement
There was a clear methodology and the supporting tables and graphs enhanced my understanding. Great work!
This group seeks to analyze historical data so as to find the optimal price of flights for an airline company from a specific set of destinations. Their ultimate objective is to use this data analysis to help airlines with the pricing of future flights. The data sets they're using are largely historical data from the US Bureau of Transportation, and from an online airfare tracker called FareDetective.
Things I like:
Areas of Improvement:
Under the assumption that airlines aren't currently pricing flights optimally, I think historical data isn't necessarily the best indicator of the most profitable pricing points. If I understand your project correctly, without a set of data that has the profit of each flight, it would be difficult to find the best price. This feels more like an optimization problem to me rather than a simple data analysis. Though, I could be misinterpreting it.
For someone who is not interested in the airline industry, this may be a little bland. It might be good to emphasize how this project could affect the average person.
"...in addition to helping consumers decide when and where to fly to maximize a trip within their budget." This line in the proposal is a little vague - how do you maximize a trip?
Summary
The group analyzed flight pricing data in order to derive insights that could potentially benefit both customers and airline companies. They aim to provide this insight to airline companies so they can price their flights better. The group uses a dataset from the US Department of Transportation consumer airfare report.
3 Things I Liked
Potential improvements
Overall, great report!
This project is about discovering patterns to determine optimal price points for airlines to price their flights. The group would also like to determine the optimal buying price from the customer's standpoint. They will be using datasets from the United States Bureau of Transportation from 1993 to 2020 because they provide historical data on airline pricing. They will also use a website for specifics on certain flights.
The first thing I think is especially good about this proposal is that it really outlines the value of conducting this study. Including the statistic on aviation's impact on GDP definitely helps assert the value and importance of this project. Furthermore, I think that the United States Bureau of Transportation’s data is an excellent dataset for this project. It has tons of historical data and it definitely brings up more interesting questions for the group to explore (e.g., will they evaluate how the optimal price changes over time? Will they try to do predictive analytics on the optimal price?). The third thing that I liked about this proposal is that they narrowed the scope of the problem to only evaluating domestic flights. I think this was a smart move, as including international flights would perhaps make the problem a little too messy, and the group may have been unable to get meaningful results by the time the project was over. Narrowing it down to domestic flights definitely makes the project more feasible, and the time can be spent exploring different methods rather than preprocessing the data.
While this project is definitely really interesting, I think that it would be stronger if it improved in three areas. The first is establishing how the online websites would be used. Would the group have to web scrape the data or write a script that automatically puts in a bunch of destinations and starting points, or would it have to be done manually? The second aspect that they should consider is what has already been done in the field. Websites like Kayak and Expedia already have their own algorithms to determine optimal flight paths by budget. Seeing what work is already done in the field may serve as an excellent baseline to start from. Finally, I think the most concerning part of the project is the lack of definition of a specific problem and specific factors. I think looking at it from both the airline's perspective and the customer's perspective may be rather challenging to accomplish in this short period of time. Also, there are so many factors to choose from that even one of those two problems is already extremely complex. As they have not clearly defined the aspect they wish to evaluate and what the input and output will be, several questions come to mind. For example, will they factor in destination variability or just consider one destination and one starting point? If this is a predictive model, how will they handle the abnormal flight trends from COVID? Will they consider factors such as weather that may not be in the data set but are readily available? I definitely think this is what makes this project interesting, but it might be great to start thinking about these questions now.
Overall, I’m super excited to see the work to come from this project and the direction they decide to take it! Great work :)
This project looks at a dataset from the U.S. Department of Transportation's Domestic Airline Consumer report. It contains information about where flights depart from and arrive, and the time of year. The goal is to predict the fare of a flight.
3 things I like:
I like how in the midterm report you used headings to make it clear which part of the midterm report requirements you were addressing. I also like how large and robust the dataset is. I also liked the detailed description of how you guys hope to proceed in the remainder of the project, mentioning what features you want to add to your predictive analysis and how you want to use one hot encoding.
3 things for improvement:
I felt like the writeup was a little clunky and hastily put together; this could be easily fixed with some more effort spent on formatting. I also felt like more of the midterm report should have been spent discussing the model or models you've already fitted on your data. I think you guys only had a couple of lines on it. Additionally, I felt like you could have included a couple more visualizations than the ones you did. I for one would have liked to see a correlation matrix.
Domestic Flight Predictions:
Summary: This project is about predicting optimal airline prices. The objective is to predict an optimal domestic flight price that maximizes an airline company’s profit while also staying competitive within the airline market. They plan on using data from the United States Bureau of Transportation, which lists “Average Domestic Airline Itinerary Fares By Origin City” from 1993 to 2020, as well as online airfare trackers such as Fare Detective, which provides historical pricing data on flights between any two airports within the United States.
Positive Feedback:
Constructive Feedback:
Summary:
The project is trying to analyze the dataset from the US Department of Transportation’s Domestic Airline Consumer Airfare Report from 2019. The dataset contains 213175 rows and 23 columns, which after cleaning contains 201392 rows and 24 columns, describing information for airport pair markets. They have added a “when” column to the dataset that is calculated by the formula: when = year + (quarter - 1)/4. The rows contain an origin airport and city, destination airport and city, year, time of year (quarter), average fare, average fare for the carrier with the largest market share, average fare for the lowest carrier, number of miles, passengers per day, and geocoded information.
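The "when" formula quoted above maps a year and quarter to a single fractional-year value. A minimal Python sketch of that arithmetic follows; the function name `when_value` and the input validation are assumptions for illustration, not the group's code.

```python
def when_value(year: int, quarter: int) -> float:
    """Fractional year per the formula: when = year + (quarter - 1) / 4."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return year + (quarter - 1) / 4
```

For example, the third quarter of 2019 maps to 2019.5, so consecutive quarters are evenly spaced 0.25 apart on one numeric time axis.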
Things I liked:
Areas for improvement: