Comments (4)
Caught by @kyrylogy.
from cmsspark.
I believe the issue lies in these two lines:
tmpDirs=$(hadoop fs -ls ${hdir}/$year/$month | grep 'tmp$' | awk '{print $8}' | sed -e "s,\.tmp,,g" -e "s,${hdir},,g")
This gives us the date of the .tmp file, i.e. the file for the current day. On the 1st of the month it would give /2022/08/01.
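To see what the `tmpDirs` pipeline extracts, here it is applied to a single hypothetical `hadoop fs -ls` line. The owner, group, timestamp and the `/hdir` base directory are made up for illustration; the only assumption that matters is that the path is column 8 of the listing.

```shell
#!/usr/bin/env bash
# Illustration only: a made-up `hadoop fs -ls` line for today's in-progress
# directory (owner, group, time and /hdir are hypothetical).
hdir=/hdir
sample="drwxr-xr-x   - cmsuser supergroup          0 2022-08-01 00:10 ${hdir}/2022/08/01.tmp"

# Same steps as tmpDirs: keep *.tmp entries, take column 8 (the path),
# strip the ".tmp" suffix and the ${hdir} prefix.
tmpDirs=$(printf '%s\n' "${sample}" | grep 'tmp$' | awk '{print $8}' \
    | sed -e "s,\.tmp,,g" -e "s,${hdir},,g")
echo "${tmpDirs}"   # /2022/08/01
```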
lastSnapshot=$(hadoop fs -ls ${hdir}/$year/$month | egrep -v ${pat} | tail -1 | awk '{print $8}')
Here we get the same list of files, but excluding today's date, and then take the path of the last file. On the 1st of the month, today's file is the only one in the listing, so filtering it out leaves nothing and lastSnapshot ends up empty. Moreover, we only list the current month's directory, so there is no earlier file from the previous month to fall back on.
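A minimal sketch of one way to avoid the empty result, assuming GNU `date` and a hypothetical `list_dir` helper that prints one snapshot path per line (in the real script it could wrap `hadoop fs -ls "$dir" | awk '{print $8}'`). When nothing remains after excluding the entries matching `pat`, it looks in the previous month's directory instead. This is only an illustration, not necessarily how #109 fixes it.

```shell
#!/usr/bin/env bash
# Sketch only (not the actual fix from #109): print the latest snapshot path
# that does not match "$pat", falling back to the previous month's directory
# when the current month contains nothing but excluded entries.
#
# Assumption: list_dir is a hypothetical helper printing one path per line.
# In the real script it could be:
#   list_dir() { hadoop fs -ls "$1" | awk '{print $8}'; }

last_snapshot() {
    local hdir=$1 year=$2 month=$3 pat=$4
    local last prev
    last=$(list_dir "${hdir}/${year}/${month}" | grep -Ev "${pat}" | tail -1)
    if [ -z "${last}" ]; then
        # e.g. 2022/08 -> 2022/07 and 2022/01 -> 2021/12 (requires GNU date)
        prev=$(date -d "${year}-${month}-01 -1 month" +%Y/%m)
        last=$(list_dir "${hdir}/${prev}" | grep -Ev "${pat}" | tail -1)
    fi
    printf '%s\n' "${last}"
}
```

Computing the previous month with `date -d` rather than decrementing `$month` by hand also covers the year rollover, so a run on 2022/01/01 falls back to 2021/12.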
I will fix it.
Thanks @kyrylogy, that would be great!
Fixed by #109