Comments (14)
CNAF is also observing the same thing. Not yet clear what's causing it. There haven't been any changes in the router config recently as far as I know...
from phedex.
Looking at the routing activity table for this dataset [1], nothing is currently routed from any Tier-1.
The only destination for this dataset is currently T2_FR_GRIF_LLR, and all blocks are routed from T2 sites.
Could triple-A be causing this?
Yeah, I'm guessing PhEDEx gave up on the ones from that dataset. If I look at what is currently routed from FNAL_Buffer, it's tons of stuff that shouldn't be coming from tape [1]. Lots of 2017 data, (MINI)AOD(SIM), etc. Picking one at random, I see 3 full disk copies, and yet it is routed from FNAL_MSS [2].
[1] https://cmsweb.cern.ch/phedex/prod/Activity::Routing?tofilter=.*&fromfilter=T1_US_FNAL_Buffer&priority=any&showinvalid=on&blockfilter=&.submit=Update#
[2] https://cmsweb.cern.ch/phedex/datasvc/json/prod/subscriptions?dataset=/JetHT/Run2016G-07Aug17-v1/MINIAOD
Sid, thanks for the example. I do not think the router can decide based on the dataset name (AOD, etc.). I will see if I can work out the link weights from the router agent log.
Dave, you should be able to see from the local stager agent logs whether and when it tried to re-stage the file. By default the stager will "forget" about staged files after 8 hours; you can adjust this with the -stage-stale option:
https://github.com/dmwm/PHEDEX/blob/master/Toolkit/Transfer/FileStager#L49-L50
The dataset name should not have anything to do with what the router decides. I was trying to point out that these are data tiers that are already replicated on disk, and therefore should not be recalled from tape.
Okay, I got your point. As far as I can tell, the Router considers all available sources, including T1_*_Buffer nodes, and chooses the link with minimal cost. It simply adds a half-hour penalty for files that need staging:
https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Infrastructure/FileRouter/Agent.pm#L1044-L1051
If you want the disk-only sources to outweigh the Buffer nodes, we could try to adjust this penalty.
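To make the effect of the penalty concrete, here is a minimal sketch in Python (not the actual Perl code in FileRouter/Agent.pm). It assumes the cost of a source is the link latency plus the transfer time, with a flat penalty added when the file needs staging. The `Source` class, the function names, and the node name `T2_EXAMPLE_Disk` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass

# Half an hour in seconds, the penalty described in this thread.
HALF_HOUR_S = 30 * 60


@dataclass
class Source:
    """Hypothetical description of one candidate source for a file."""
    node: str
    link_latency_s: float      # estimated latency of the link, in seconds
    rate_bytes_per_s: float    # estimated transfer rate of the link
    needs_staging: bool        # True if the file must be recalled from tape


def transfer_cost(src, file_bytes, staging_penalty_s=HALF_HOUR_S):
    """Assumed cost model: latency + transfer time (+ penalty if staging)."""
    cost = src.link_latency_s + file_bytes / src.rate_bytes_per_s
    if src.needs_staging:
        cost += staging_penalty_s
    return cost


def pick_source(sources, file_bytes, staging_penalty_s=HALF_HOUR_S):
    """Return the candidate with the lowest cost under the assumed model."""
    return min(sources,
               key=lambda s: transfer_cost(s, file_bytes, staging_penalty_s))


# A backlogged disk source versus an idle tape buffer, for a 2 GB file.
candidates = [
    Source("T2_EXAMPLE_Disk", link_latency_s=3600,
           rate_bytes_per_s=50e6, needs_staging=False),
    Source("T1_US_FNAL_Buffer", link_latency_s=600,
           rate_bytes_per_s=50e6, needs_staging=True),
]
chosen_half_hour = pick_source(candidates, 2e9).node
chosen_one_day = pick_source(candidates, 2e9, staging_penalty_s=86400).node
```

With these made-up numbers, the half-hour penalty still leaves the tape buffer as the cheaper source, while a one-day penalty makes the disk replica win.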
+1 on making this penalty 10 thousand hours to prevent tape copies from being considered a good source
Again today I am seeing encp recalls at FNAL dominated by data that also exists on disk, even at FNAL_Disk. I would bet that fixing this goes a long way toward settling any tape recall problems CMS has. We should set the penalty high enough that all functional disk replicas are tried first, but not so high that a tape replica is excluded when the only disk replicas are at broken or very backlogged sites. Not knowing the distribution, I won't offer a number :) I would put addressing this at high priority, possibly just behind the secret 4th queue.
Actually, now that I have looked at the code @nataliaratnikova referenced: assuming half an hour for unstaged data is ridiculous! Half a day or a day is maybe as low as I would ever have thought there. Maybe the real number is something longer than 90% of the "from disk" transfer latencies? But I guess I'd need to see what that cost function looks like. Is there a data service query to see what these numbers look like?
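One way to turn the "longer than 90% of the from-disk latencies" idea into a number is a simple percentile over observed latencies. The sketch below is only an illustration: the function names are hypothetical, the sample latencies are made up, and it uses the nearest-rank percentile definition rather than anything PhEDEx itself computes.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def suggest_staging_penalty(disk_latencies_s, pct=90):
    """Hypothetical helper: propose a penalty from disk-link latencies (seconds)."""
    return percentile_nearest_rank(disk_latencies_s, pct)


# Made-up latencies in seconds, spanning 0 to 7 days.
sample_latencies_s = [0, 600, 1800, 3600, 7200,
                      14400, 43200, 86400, 172800, 604800]
suggested_s = suggest_staging_penalty(sample_latencies_s)
```

For this invented sample the suggestion is 172800 seconds (two days). A real choice would need the actual latency distribution per link.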
Hi Dave,
https://cmsweb.cern.ch/phedex/datasvc/perl/prod/routerhistory
shows the last hour's numbers for the rate and latency used in the cost calculation per link.
See https://cmsweb.cern.ch/phedex/datasvc/doc/routerhistory for more filters.
In the last hour the latency varies from 0 to 7 days...
I'll see how easy it would be to pass the staging penalty to the Router as an option, instead of a hard-coded value.
Just checking where we are on this?
I'm done with the new priority queue. This one is next on my list. If you have figured out the desired number, I can put it in right away as the new default. Since this is a trivial change, we could also ask T0 PhEDEx operators to patch the FileRouter in place to put this feature into action.