Lefteris Souflas's Projects
"DressMeUp" project utilizes fashion images and color combinations to achieve image classification for clothing combinations. Algorithms include SGD (SVM), Passive Aggressive Classifier, ResNet50 CNN, and EfficientNetV2-S CNN with K-Means for color analysis. Achieved accuracy exceeds 90%. Built with Python, Scikit-Learn, TensorFlow, and Streamlit.
Explores data virtualization and query performance optimization with Apache Drill, Hive, and Impala. Tasks include comparing virtualization precision, proposing solutions for a bookstore's diverse data formats, creating Impala databases, and addressing query performance issues. The report offers practical insights and commands for implementation.
Azure Stream Analytics processes ATM transaction data streams, employing Event Hub, Storage, and Stream Analytics Job. Queries include total amounts and alerts. The setup and query execution process are documented with screenshots.
Three business analytics case studies were undertaken, encompassing market basket analysis, customer segmentation, and campaign management. SAS Visual Data Mining and Machine Learning on SAS Viya was utilized to explore data and provide insights. A comprehensive report addressing both technical and business aspects was delivered.
Analyzed a call center company dataset, examining key performance indicators (KPIs) to provide actionable insights for decision-making. Dashboards and data storytelling present a detailed overview of call center performance, customer segmentation, regional trends, and operator efficiency.
Explores privacy issues and anonymization in US Census microdata. Exercise A covers quasi-identifiers, anonymization methods, identification risks, and differential privacy. Exercise B involves data loading, k-anonymity, histograms, adding noise for privacy, computing private averages, and analyzing the impact of privacy parameters.
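As an illustration of the "private averages" step, here is a minimal sketch of a differentially private mean using the Laplace mechanism. It is not taken from the project; the function names, the clipping bounds, and the sample ages are hypothetical, and the record count is assumed to be public.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    # random.expovariate takes a rate, so rate = 1 / scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng=None):
    """Return a noisy mean of `values` clipped to [lower, upper].

    Assumption: the number of records is public, so changing one record
    moves the clipped mean by at most (upper - lower) / n.
    """
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    true_mean = sum(clipped) / n
    sensitivity = (upper - lower) / n
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical ages, not real census data.
    ages = [23, 35, 47, 51, 62, 38, 29, 44]
    for eps in (0.1, 1.0, 10.0):
        print(eps, private_mean(ages, lower=18, upper=90, epsilon=eps))
```

Smaller values of epsilon give stronger privacy and noisier averages, which is the trade-off the privacy parameter analysis examines.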
SQL Server, Analysis Services, and BI tools are utilized to build a data warehouse, create OLAP reports, and visualize insights. Actions also include finding datasets, defining star/snowflake schemas, populating tables, and designing cubes. Steps, challenges, and solutions are documented, showcasing reports and visualizations for presentation.
The Streamlit application of the DressMeUp business idea.
Jupyter notebook replicating studies on social capital from the journal Nature, analyzing economic connectedness, upward income mobility, and more. Python and relevant datasets are used to recreate figures and analyses.
Creating predictive models to classify Trump's vote share and clustering counties based on demographic and economic variables. Findings are reported in a PDF with detailed methodologies, model assessments, and the project's R code.
Addressed Entity Resolution challenges. Tasks include schema-agnostic blocking, pairwise comparisons, Meta-Blocking graph construction, and Jaccard similarity computation. Deliverables include Python source code, reports, and reproducibility guidelines.
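The blocking and similarity steps can be sketched roughly as follows. This is an illustrative example, not the repository's code: it uses simple token blocking as one possible schema-agnostic scheme, and the function names and sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tokens(record):
    """Schema-agnostic tokens: lowercase words from every attribute value."""
    return {tok for value in record.values() for tok in str(value).lower().split()}


def token_blocking(records):
    """Group record ids by shared token; keep only blocks with 2+ records."""
    blocks = defaultdict(set)
    for rid, record in records.items():
        for tok in tokens(record):
            blocks[tok].add(rid)
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Distinct record pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical records with different schemas.
    records = {
        1: {"title": "Entity Resolution Survey", "venue": "VLDB"},
        2: {"name": "entity resolution survey", "year": "2020"},
        3: {"title": "Graph Databases", "venue": "SIGMOD"},
    }
    for r1, r2 in sorted(candidate_pairs(token_blocking(records))):
        print(r1, r2, jaccard(tokens(records[r1]), tokens(records[r2])))
```

Blocking ignores attribute names entirely, so records with different schemas can still land in the same block; only pairs sharing a block are compared.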
Analysis of dementia-related deaths in Europe, focusing on gender-based trends. R was used to create 10 visualizations exploring data from 2011-2020, with a specific focus on Greece.
The salary dataset contains information on 474 Midwestern bank employees. Tasks include understanding the dataset's structure, summarizing numerical variables, testing hypotheses on salary equality, gender-based differences, age group analysis, and proportion comparison.
Java desktop app for parking lot management calculates charges based on time, records entries/exits in a database, and displays status/collections. User options include entering/exiting parking, saving transactions, starting a new day, searching cars, and exiting. JTextArea and JTable components are utilized for interaction and data display.
Java Web App aiming to digitize the annual evaluation process for Army Staff. A single-page application facilitates the theoretical assessment process. Operational and non-operational requirements outline user authentication, question selection, assessment submission, scoring, data protection, system performance, and user training.
Config files for my GitHub profile.
The Ministry of Health requires a database for prescription analysis. Tasks include creating an ERD, a relational schema, and SQL queries for patient demographics, prescription details, doctor info, and drug analytics. Additionally, a Python program is developed to interact with the database and fetch prescription details.
Jupyter notebook using machine learning techniques to explore the complex drivers of modern slavery. Models from a research paper are replicated and evaluated. Actions also include filling missing data, training regression models, and analyzing feature importance.
Explored Jaccard distance, Min-Hashing, and LSH for user similarity in a movie rating dataset. Tasks involve dataset preprocessing, exact Jaccard Similarity computation, Min-Hash signatures, and LSH implementation. Results and observations are documented in code, output files, and a report.
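A minimal sketch of how Min-Hash signatures and LSH banding fit together is shown below. It is not the project's implementation: the linear hash family, the prime modulus, the function names, and the toy user data are all assumptions made for illustration, and item ids are assumed to be non-negative integers smaller than the modulus.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # 2^31 - 1; assumed larger than every item id


def make_hash_funcs(k, seed=0):
    """k random linear hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(items, hash_funcs):
    """Signature of a non-empty set: the minimum hash value per function."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_funcs]


def estimated_jaccard(sig1, sig2):
    """Fraction of signature positions on which the two signatures agree."""
    return sum(s1 == s2 for s1, s2 in zip(sig1, sig2)) / len(sig1)


def lsh_candidates(signatures, bands, rows):
    """Pairs of users whose signatures match on every row of at least one band."""
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for user, sig in signatures.items():
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(user)
        for users in buckets.values():
            candidates.update(combinations(sorted(users), 2))
    return candidates


if __name__ == "__main__":
    # Hypothetical users mapped to the ids of movies they rated.
    rated = {"u1": {1, 2, 3, 4}, "u2": {1, 2, 3, 5}, "u3": {10, 11, 12}}
    funcs = make_hash_funcs(k=40)
    sigs = {user: minhash_signature(movies, funcs) for user, movies in rated.items()}
    print(estimated_jaccard(sigs["u1"], sigs["u2"]))
    print(lsh_candidates(sigs, bands=10, rows=4))
```

More rows per band make candidate selection stricter, while more bands make it more permissive; the exact Jaccard computation can then be limited to the candidate pairs.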
Analysis of real estate sales data. Tasks include understanding dataset structure, variable conversion, descriptive analysis, pairwise comparisons, linear relationship analysis, multiple regression modeling, feature selection using stepwise methods, final model summary, assumptions checking, and LASSO variable selection. Results are documented.
Modeling a high-energy physics citation network in Neo4j, importing data from CSV files, and executing Cypher queries. Tasks include designing a property graph model, importing data, and querying the database for various insights. The deliverables comprise a report detailing the graph model, import commands, and Cypher queries with results, along with the script.
This assignment entails analyzing the 'A Song of Ice and Fire' character network using R and igraph. Tasks include graph creation, exploring properties, subgraph creation, centrality calculation, and ranking characters based on PageRank. Deliverables include a PDF report with concise answers and an R file containing the code.
Object-oriented programming Java project covering class creation and inheritance.
Website development using PHP for an attendance tracking system at a Computer Programming School. Features include role-based capabilities, timestamping status modifications, and user tracking. Preview pages display login, admin, student attendance edit, and view students functionalities.
Analyzed customer churn using transaction data. Built an ML model to predict lapses. The dataset includes customer status, collection/redemption info, and program tenure. Delivered a business presentation outlining the modeling approach, findings, and churn reduction strategies.
Analyzing classified ads data from the used motorcycles market. Tasks involve utilizing Redis Bitmaps for analytics on seller actions and MongoDB for analyzing bike listings. Includes data installation, cleaning, and analysis.
Applied SAS techniques for data analysis and machine learning in a milestone project. Base SAS Programming and SAS Viya tools were utilized for preprocessing, customer profiling, sales analysis, promotions, supplier evaluation, and customer segmentation. Results were visualized comprehensively.
Utilizing Apache Spark & PySpark to analyze a movie dataset. Tasks include data exploration, identifying top-rated movies, training a linear regression model, and experimenting with Airflow.
This assignment involves creating a temporal graph structure from Twitter data, exploring metrics over five days, identifying important nodes, and detecting communities. Deliverables include a concise report and source files with code.
This repository presents a systematic, machine-learning-based approach to winning the "Guess Who?" game, aimed at improving gameplay strategy and decision-making.