Tile 7741 from 2023-04-16 was never reduced #116

Closed
djschlegel opened this issue May 12, 2023 · 17 comments
Labels
dailyops For listing individual dailyops problems


@djschlegel

The main-survey dark tile 7741, observed in a single exposure (176606) on 2023-04-16, was never reduced. Anthony reports that the data transfer was delayed by 8 hours, so the data arrived after the daily pipeline had shut down the following morning, and the exposure was therefore never registered in exposure_table_20230416.csv. Because this exposure was never reduced, the tile is still pending and is blocking overlapping tiles at RA, Dec = 166, +8. This is probably not urgent, since this RA has just set for the season.

@djschlegel djschlegel added the dailyops For listing individual dailyops problems label May 12, 2023
@araichoor
Contributor

For what it's worth, as of today I see 23 main-survey tiles (28 exposures) possibly affected by this issue: one dark (7741), four bright (23046, 24410, 24459, 25880), and 18 backup.
The list is below.
Some may be due to exposures marked as bad and therefore not processed by the pipeline, or something similar.
I can try to follow up a bit on those cases.

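>>> import numpy as np
>>> from astropy.table import Table
>>> # survey-ops record of all exposures; keep main-survey tiles only
>>> a = Table.read("/global/cfs/cdirs/desi/survey/ops/surveyops/trunk/ops/exposures.ecsv")
>>> sel = (a["TILEID"] >= 1000) & (a["TILEID"] < 60000) & np.isin(a["PROGRAM"], ["BACKUP", "BRIGHT", "DARK"])
>>> a = a[sel]

>>> # exposures known to the daily spectroscopic pipeline
>>> b = Table.read("/global/cfs/cdirs/desi/spectro/redux/daily/exposures-daily.csv")

>>> # survey-ops exposures missing from the daily pipeline summary
>>> sel = ~np.isin(a["EXPID"], b["EXPID"])
>>> sel.sum()
28
>>> a = a[sel]
>>> a[a["TILEID"].argsort()].pprint_all()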
 NIGHT   TILEID EXPID  OBSTYPE PROGRAM  EXPTIME  EFFTIME_ETC EFFTIME_SPEC EFFTIME  GOALTIME QUALITY COMMENTS
-------- ------ ------ ------- ------- --------- ----------- ------------ -------- -------- ------- --------
20230416   7741 176606 SCIENCE    DARK  1685.884    1005.371       -1.000 1005.371     -1.0    good       --
20211220  23046 114876 SCIENCE  BRIGHT  1455.613     105.105       -1.000  105.105     -1.0    good       --
20220110  24410 117827 SCIENCE  BRIGHT   916.408      17.409       -1.000   17.409     -1.0    good       --
20220110  24410 117828 SCIENCE  BRIGHT  1228.330       7.778       -1.000    7.778     -1.0    good       --
20220110  24410 117829 SCIENCE  BRIGHT   826.652       0.469       -1.000    0.469     -1.0    good       --
20220216  24459 122527 SCIENCE  BRIGHT  1560.731      55.470       -1.000   55.470     -1.0    good       --
20220216  24459 122528 SCIENCE  BRIGHT  1330.230      66.059       -1.000   66.059     -1.0    good       --
20220216  24459 122529 SCIENCE  BRIGHT   909.493      28.064       -1.000   28.064     -1.0    good       --
20220216  24459 122530 SCIENCE  BRIGHT    42.099       1.116       -1.000    1.116     -1.0    good       --
20220221  25880 123283 SCIENCE  BRIGHT   792.406     181.242       -1.000  181.242     -1.0    good       --
20220911  40065 141865 SCIENCE  BACKUP   601.838      17.400       -1.000   17.400     -1.0    good       --
20220911  40835 141795 SCIENCE  BACKUP   137.773       0.000       -1.000    0.000     -1.0    good       --
20220911  40895 141870 SCIENCE  BACKUP   607.867      30.101       -1.000   30.101     -1.0    good       --
20220911  40896 141872 SCIENCE  BACKUP   445.460      61.653       -1.000   61.653     -1.0    good       --
20220911  40935 141871 SCIENCE  BACKUP   606.913      52.994       -1.000   52.994     -1.0    good       --
20220911  41334 141794 SCIENCE  BACKUP   600.729       4.951       -1.000    4.951     -1.0    good       --
20220911  41419 141866 SCIENCE  BACKUP   602.791      27.518       -1.000   27.518     -1.0    good       --
20220911  41421 141867 SCIENCE  BACKUP   602.802      34.934       -1.000   34.934     -1.0    good       --
20220911  41461 141869 SCIENCE  BACKUP   607.988      29.690       -1.000   29.690     -1.0    good       --
20220216  41654 122535 SCIENCE  BACKUP   116.539       0.014       -1.000    0.014     -1.0    good       --
20230503  41894 178901 SCIENCE  BACKUP   601.914       1.044       -1.000    1.044     -1.0    good       --
20220415  42416 130372 SCIENCE  BACKUP   147.499       0.065       -1.000    0.065     -1.0    good       --
20220216  42660 122532 SCIENCE  BACKUP   601.416      22.097       -1.000   22.097     -1.0    good       --
20220216  42665 122534 SCIENCE  BACKUP   601.498      15.680       -1.000   15.680     -1.0    good       --
20220216  42666 122533 SCIENCE  BACKUP   602.993      16.234       -1.000   16.234     -1.0    good       --
20220216  42669 122531 SCIENCE  BACKUP   602.955      25.048       -1.000   25.048     -1.0    good       --
20220911  42809 141868 SCIENCE  BACKUP   608.612      60.675       -1.000   60.675     -1.0    good       --
20220911  42873 141873 SCIENCE  BACKUP   156.037      14.020       -1.000   14.020     -1.0    good       --
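For readers without astropy at hand, the same set difference, plus the per-program tile tally quoted above (1 dark, 4 bright, 18 backup), can be sketched in plain Python. This is a minimal sketch, assuming both tables have already been loaded as lists of row dicts with TILEID, EXPID and PROGRAM keys; `find_unprocessed` and `tiles_by_program` are hypothetical helpers, not part of any DESI package.

```python
from collections import defaultdict

MAIN_PROGRAMS = {"BACKUP", "BRIGHT", "DARK"}


def find_unprocessed(ops_rows, daily_rows):
    """Main-survey ops exposures whose EXPID is absent from the daily summary.

    Mirrors the astropy selection: 1000 <= TILEID < 60000 and PROGRAM in
    MAIN_PROGRAMS, then an EXPID set difference, sorted by TILEID.
    """
    processed = {int(r["EXPID"]) for r in daily_rows}
    missing = [
        r for r in ops_rows
        if 1000 <= int(r["TILEID"]) < 60000
        and r["PROGRAM"] in MAIN_PROGRAMS
        and int(r["EXPID"]) not in processed
    ]
    return sorted(missing, key=lambda r: int(r["TILEID"]))


def tiles_by_program(rows):
    """Count distinct TILEIDs per PROGRAM among the given exposures."""
    tiles = defaultdict(set)
    for r in rows:
        tiles[r["PROGRAM"]].add(int(r["TILEID"]))
    return {prog: len(ids) for prog, ids in tiles.items()}


if __name__ == "__main__":
    # Toy input: three rows copied from the table above, plus a made-up
    # exposure (EXPID 999999) that the daily pipeline did process.
    ops = [
        {"TILEID": 24410, "EXPID": 117827, "PROGRAM": "BRIGHT"},
        {"TILEID": 24410, "EXPID": 117828, "PROGRAM": "BRIGHT"},
        {"TILEID": 7741, "EXPID": 176606, "PROGRAM": "DARK"},
        {"TILEID": 1000, "EXPID": 999999, "PROGRAM": "DARK"},
    ]
    daily = [{"EXPID": 999999}]
    missing = find_unprocessed(ops, daily)
    print(len(missing), tiles_by_program(missing))  # prints: 3 {'DARK': 1, 'BRIGHT': 1}
```

Counting distinct TILEIDs (rather than rows) is what turns the 28 exposures into 23 tiles, since tiles such as 24410 and 24459 contribute several exposures each.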

@araichoor
Contributor

A small follow-up on the dark and bright tiles, in case it is useful (though I suspect folks from the data team would be better at tracking the history here):

  • tileid=7741: the per-expid folder does not exist;
  • tileid=23046: the per-expid folder exists but is empty;
  • tileids=24410, 24459: the pipeline generated the sframes but not the cframes;
  • tileid=25880: all 30 cframes are there.
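The four situations in the list above can be told apart mechanically from the contents of each per-expid folder. The sketch below is hypothetical: it assumes per-camera products named `sframe-*.fits` and `cframe-*.fits` (an assumption about file naming, not taken from the pipeline code), and a complete exposure would show 30 cframes (10 petals × 3 cameras).

```python
import tempfile
from pathlib import Path


def classify_expid_folder(folder):
    """Summarise how far processing got for one exposure, from its folder contents.

    The sframe-*.fits / cframe-*.fits glob patterns are assumptions; adjust
    them to the actual pipeline product names.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return "missing folder"
    if not any(folder.iterdir()):
        return "empty folder"
    n_sframe = len(list(folder.glob("sframe-*.fits")))
    n_cframe = len(list(folder.glob("cframe-*.fits")))
    if n_cframe:
        return f"{n_cframe} cframes"
    if n_sframe:
        return "sframes only"
    return "no sframes or cframes"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exp = Path(tmp) / "00122527"  # hypothetical per-expid folder
        exp.mkdir()
        (exp / "sframe-b0-00122527.fits").touch()
        print(classify_expid_folder(exp))  # sframes only
        print(classify_expid_folder(Path(tmp) / "00176606"))  # missing folder
```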

From a quick search of the desisurveyops issues, I don't see any mention of these.
My two cents: all of these are corner cases or oversights on nights when the data transfer or processing was chaotic and needed manual intervention.

@abhi0395, could you take care of bringing those bright/dark tiles to full processing, and of having exposures-daily and tiles-daily ingest them?

I'm leaving the backup tiles aside for now; if you have time to investigate and solve those too, that'd be great, but they're much lower priority.

thanks!

@akremin
Member

akremin commented May 15, 2023 via email

@araichoor
Contributor

From my earlier post you have the (tileid, night, expid) triplets, so I guess you can easily infer the underlying exposure_table, right?

@akremin
Member

akremin commented May 15, 2023

I am happy to investigate the above. This was a general comment for the future and to advertise these important files.

My point is that we commonly cross-match several tables to generate an ASCII table like the one above, but leave out the exposure_table data. I then have to cross-match row by row against the exposure_tables by hand, which answers or resolves most of the cases. The exposure_tables should be checked just as routinely as /global/cfs/cdirs/desi/spectro/redux/daily/exposures-daily.csv, but few people are aware that they exist.
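As a rough illustration of folding the exposure_table into such a cross-match, the sketch below annotates each (NIGHT, EXPID) row with columns from that night's table. It is hypothetical throughout: it assumes one plain-CSV file per night named `exposure_table_<NIGHT>.csv` directly under a single directory (the real tables may be grouped in subdirectories), an `EXPID` column, and illustrative column names `LASTSTEP` and `COMMENTS`.

```python
import csv
import tempfile
from pathlib import Path


def attach_exposure_table(rows, table_dir, columns=("LASTSTEP", "COMMENTS")):
    """Add selected exposure_table columns to each row, matched on (NIGHT, EXPID).

    Assumed layout (hypothetical): table_dir/exposure_table_<NIGHT>.csv with an
    EXPID column. A row whose exposure is not in its night's table gets None for
    every requested column, which flags exposures that were never registered.
    """
    tables = {}  # NIGHT -> {EXPID: record}; each file is read at most once
    annotated = []
    for row in rows:
        night = str(row["NIGHT"])
        if night not in tables:
            path = Path(table_dir) / f"exposure_table_{night}.csv"
            tables[night] = {}
            if path.exists():
                with path.open(newline="") as fh:
                    for rec in csv.DictReader(fh):
                        tables[night][int(rec["EXPID"])] = rec
        rec = tables[night].get(int(row["EXPID"]))
        merged = dict(row)
        for col in columns:
            merged[col] = None if rec is None else rec.get(col)
        annotated.append(merged)
    return annotated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Toy one-night table; the LASTSTEP value is made up for illustration.
        Path(tmp, "exposure_table_20220216.csv").write_text(
            "EXPID,LASTSTEP,COMMENTS\n122527,all,\n"
        )
        rows = [{"NIGHT": 20220216, "EXPID": 122527},
                {"NIGHT": 20230416, "EXPID": 176606}]
        for r in attach_exposure_table(rows, tmp):
            print(r)
```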

@schlafly
Contributor

I promised at the meeting on Monday that I would add warning messages to afternoon planning mentioning these tiles. Here is what I've planned:

WARNING:collect_etc.py:414:update_donefrac_from_offline: Some (54) exposures are missing offline effective times.
WARNING:collect_etc.py:426:update_donefrac_from_offline: List of nights with tiles with exposures with missing times:
20211220 (1): 23046 (114876), 
20220110 (3): 24410 (117827 117828 117829), 
20220216 (9): 24459 (122527 122528 122529 122530), 41654 (122535), 42660 (122532), 42665 (122534), 42666 (122533), 42669 (122531), 
20220221 (1): 25880 (123283), 
20220415 (1): 42416 (130372), 
20220911 (10): 40065 (141865), 40895 (141870), 40896 (141872), 40935 (141871), 41334 (141794), 41419 (141866), 41421 (141867), 41461 (141869), 42809 (141868), 42873 (141873), 
20230416 (1): 7741 (176606), 
20230503 (1): 41894 (178901), 
20230521 (27): 1214 (181656), 1227 (181655), 2132 (181647), 2797 (181661), 2810 (181662), 2962 (181667), 2994 (181668), 5204 (181652), 6778 (181648 181649), 7458 (181658), 7462 (181659), 7464 (181660), 7467 (181657), 7689 (181669), 8345 (181653), 8384 (181654), 9213 (181666), 9224 (181664), 9225 (181665), 10631 (181663), 21108 (181646), 22079 (181671), 22657 (181650), 23576 (181645), 24218 (181651), 25216 (181670), 

@sbailey, @akremin, does this look like roughly the relevant information? Running this tonight, I see the expected 27 exposures from tonight, since we have the files but haven't processed them yet. I'm hoping that breaking them out by night strikes the right balance between notifying people and making it easy to see why.
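The per-night breakdown in the warning can be reproduced from a flat list of (night, tileid, expid) triples. This is a minimal sketch that approximates the layout of the log lines above (without their trailing comma); it is not the code in collect_etc.py, and `format_missing_by_night` is a hypothetical name.

```python
from collections import defaultdict


def format_missing_by_night(exposures):
    """Format (night, tileid, expid) triples as one summary line per night.

    Each line reads "NIGHT (N): TILE (EXPID EXPID), ...", where N is the number
    of exposures that night; tiles are in ascending TILEID order and expids
    ascending within each tile.
    """
    by_night = defaultdict(lambda: defaultdict(list))
    for night, tileid, expid in exposures:
        by_night[night][tileid].append(expid)
    lines = []
    for night in sorted(by_night):
        tiles = by_night[night]
        n_exp = sum(len(expids) for expids in tiles.values())
        parts = [
            f"{tileid} ({' '.join(str(e) for e in sorted(expids))})"
            for tileid, expids in sorted(tiles.items())
        ]
        lines.append(f"{night} ({n_exp}): " + ", ".join(parts))
    return lines


if __name__ == "__main__":
    missing = [(20220110, 24410, 117829), (20220110, 24410, 117827),
               (20220110, 24410, 117828), (20211220, 23046, 114876)]
    for line in format_missing_by_night(missing):
        print(line)
```

Grouping first by night and only then by tile keeps the current night's not-yet-processed exposures on a line of their own, which is what makes them easy to discount.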

@schlafly
Contributor

schlafly commented Jun 2, 2023

A reminder that we want to make sure the tiles in Anand's list above get their processing information propagated into the exposures-daily.ecsv file; that information is currently missing for them.

@schlafly
Contributor

Pinging this outstanding issue.

@schlafly
Contributor

Note that we found some portion of this again in #132.

@schlafly
Contributor

schlafly commented Sep 5, 2023

Pinging this outstanding issue; @akremin ?

@weaverba137
Member

I can add some details about the raw data transfer. For the raw data in 20230416/00176606, the file timestamps are consistent with those of the neighboring exposure IDs. However, the directory timestamp is hours later: note Apr 17 07:07 on the ./ entry in the listing below.

bweaver@login02[799]: ll 20230416/00176606
total 3296914
dr-xr-s---   2 desi desi       4096 Apr 17 07:07 ./
dr-xr-s--- 130 desi desi      16384 Apr 17 06:09 ../
-r--r-----   1 desi desi    1300611 Apr 16 21:31 centroids-00176606.json
-r--r-----   1 desi desi       1328 Apr 16 21:32 checksum-00176606.sha256sum
-r--r-----   1 desi desi    4049280 Apr 16 21:32 coordinates-00176606.fits
-r--r-----   1 desi desi  394989120 Apr 16 21:32 desi-00176606.fits.fz
-r--r-----   1 desi desi     207091 Apr 16 21:30 etc-00176606.json
-r--r-----   1 desi desi      83239 Apr 16 21:02 etc-00176606.png
-r--r-----   1 desi desi    5162309 Apr 16 21:00 fiberassign-007741.fits.gz
-r--r-----   1 desi desi  291291840 Apr 16 21:31 focus-00176606.fits.fz
-r--r-----   1 desi desi   62976960 Apr 16 21:03 fvc-00176606.fits.fz
-r--r-----   1 desi desi   13057920 Apr 16 21:02 guide-00176606-0000.fits.fz
-r--r-----   1 desi desi 2311597440 Apr 16 21:31 guide-00176606.fits.fz
-r--r-----   1 desi desi    8570880 Apr 16 21:30 guide-rois-00176606.fits.fz
-r--r-----   1 desi desi    1103040 Apr 16 21:03 pm-00176606.fits
-r--r-----   1 desi desi   40806400 Apr 16 21:03 pm-00176606-logs.tar
-r--r-----   1 desi desi       5164 Apr 16 21:00 request-00176606.json
-r--r-----   1 desi desi  240661440 Apr 16 21:31 sky-00176606.fits.fz

The most likely explanation is that the exposure was not linked into /data/dts/exposures/raw until the morning. I would need to see the symlink timestamps in /data/dts/exposures/raw/20230416, but I don't have that access right now.
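The tell-tale sign in the listing above is a directory mtime well after the newest file mtime. A minimal sketch of that check, using only `os.stat` fields via pathlib; `dir_lag_hours` is a hypothetical helper, and it cannot see symlink timestamps in another tree such as /data/dts/exposures/raw.

```python
import os
import tempfile
from pathlib import Path


def dir_lag_hours(expdir):
    """Hours by which a directory's mtime trails the newest regular file inside it.

    Only st_mtime values are compared. Returns None if the directory holds no
    regular files.
    """
    expdir = Path(expdir)
    file_mtimes = [p.stat().st_mtime for p in expdir.iterdir() if p.is_file()]
    if not file_mtimes:
        return None
    return (expdir.stat().st_mtime - max(file_mtimes)) / 3600.0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        f = Path(tmp, "desi-00176606.fits.fz")
        f.touch()
        t0 = 1_700_000_000  # arbitrary epoch seconds
        os.utime(f, (t0, t0))
        # Set the directory mtime afterwards: creating the file updated it.
        os.utime(tmp, (t0 + 9.5 * 3600, t0 + 9.5 * 3600))
        print(dir_lag_hours(tmp))
```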

@weaverba137
Member

Further details: the directory timestamp indicates that the morning "catch-up" transfer successfully copied that exposure ID. There was a one-off failure to copy it earlier in the night. This appears to have been sheer bad luck, e.g. a momentary network glitch.

Does the pipeline take into account the "catch-up" transfers that we do in the morning?

@akremin
Member

akremin commented Oct 6, 2023

@weaverba137 Apologies for the late response. The daily pipeline picks up any exposure that has both a request file and a data file in the DESI_SPECTRO_DATA directory before 7 am in the summer and 8 am in the winter, Tucson time (or later, when we relaunch it by hand because of sneakernet or known transfer issues).
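The pickup rule just described can be written as a small predicate. This is a sketch under stated assumptions: `seen_by_daily_run` is hypothetical, the caller chooses the 7 or 8 am cutoff (the summer/winter switch date is not modelled), manual relaunches are ignored, and the example times are illustrative rather than read from the listing, whose time zone is not stated.

```python
from datetime import datetime, timedelta, timezone

# Tucson (Arizona) does not observe daylight saving time: MST, UTC-7, all year.
TUCSON = timezone(timedelta(hours=-7))


def seen_by_daily_run(night, request_time, data_time, cutoff_hour):
    """Would both files have been present before the morning cutoff?

    night: observing night as a YYYYMMDD string.
    request_time, data_time: timezone-aware arrival times of the request and
        data files, or None if the file never arrived.
    cutoff_hour: Tucson hour on the following morning (7 in summer, 8 in winter).
    """
    if request_time is None or data_time is None:
        return False
    d = datetime.strptime(night, "%Y%m%d")
    cutoff = datetime(d.year, d.month, d.day, cutoff_hour, tzinfo=TUCSON) + timedelta(days=1)
    return max(request_time, data_time) < cutoff


if __name__ == "__main__":
    request = datetime(2023, 4, 16, 21, 0, tzinfo=TUCSON)  # illustrative times
    data = datetime(2023, 4, 17, 7, 7, tzinfo=TUCSON)
    print(seen_by_daily_run("20230416", request, data, cutoff_hour=7))  # False
```

Using timezone-aware datetimes means arrival times recorded in UTC (or any other zone) compare correctly against the Tucson cutoff without manual conversion.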

I agree with the other recommendations that we should have a script running roughly weekly to check whether any late exposures have since arrived. I'll open a desispec ticket for that.
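A periodic re-check could look roughly like the sketch below: scan recent nights of raw data for exposures that have both a request file and a data file, and report those not yet known to the processing. Everything here is hypothetical: `find_late_arrivals` is an illustrative name, the layout `<raw_root>/<NIGHT>/<EXPID:08d>/` with `request-<EXPID>.json` and `desi-<EXPID>.fits.fz` is inferred from the directory listing earlier in this thread, and the weekly scheduling itself is not shown.

```python
import tempfile
from pathlib import Path


def find_late_arrivals(raw_root, processed_expids, nights):
    """List (night, expid) pairs with raw request+data files but no processing.

    Assumed layout: <raw_root>/<NIGHT>/<EXPID:08d>/ containing
    request-<EXPID:08d>.json and desi-<EXPID:08d>.fits.fz.
    """
    found = []
    for night in nights:
        night_dir = Path(raw_root) / str(night)
        if not night_dir.is_dir():
            continue
        for expdir in sorted(night_dir.iterdir()):
            if not (expdir.is_dir() and expdir.name.isdigit()):
                continue
            name = expdir.name  # zero-padded, e.g. 00176606
            expid = int(name)
            has_request = (expdir / f"request-{name}.json").exists()
            has_data = (expdir / f"desi-{name}.fits.fz").exists()
            if has_request and has_data and expid not in processed_expids:
                found.append((str(night), expid))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as raw:
        expdir = Path(raw, "20230416", "00176606")
        expdir.mkdir(parents=True)
        (expdir / "request-00176606.json").touch()
        (expdir / "desi-00176606.fits.fz").touch()
        print(find_late_arrivals(raw, processed_expids=set(), nights=["20230416"]))
```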

@akremin
Member

akremin commented Oct 7, 2023

All current cases have been resolved.

I developed a script in desispec PR #2124 to help debug and understand each of these cases. The issues ranged from data arriving late, to files that appear to have been deleted manually (presumably left over from partial cleanup activities), to data that were perfectly fine but needed the tsnr afterburner to be rerun to update the entries.

Since the original posts, the number of affected exposures had increased. I used the same code snippet to identify an updated list:

 NIGHT   TILEID EXPID  OBSTYPE PROGRAM  EXPTIME  EFFTIME_ETC EFFTIME_SPEC EFFTIME  GOALTIME QUALITY COMMENTS
-------- ------ ------ ------- ------- --------- ----------- ------------ -------- -------- ------- --------
20211220  23046 114876 SCIENCE  BRIGHT  1455.613     105.105       -1.000  105.105     -1.0    good       --
20220110  24410 117827 SCIENCE  BRIGHT   916.408      17.409       -1.000   17.409     -1.0    good       --
20220110  24410 117828 SCIENCE  BRIGHT  1228.330       7.778       -1.000    7.778     -1.0    good       --
20220110  24410 117829 SCIENCE  BRIGHT   826.652       0.469       -1.000    0.469     -1.0    good       --
20220216  24459 122527 SCIENCE  BRIGHT  1560.731      55.470       -1.000   55.470     -1.0    good       --
20220216  24459 122528 SCIENCE  BRIGHT  1330.230      66.059       -1.000   66.059     -1.0    good       --
20220216  24459 122529 SCIENCE  BRIGHT   909.493      28.064       -1.000   28.064     -1.0    good       --
20220216  24459 122530 SCIENCE  BRIGHT    42.099       1.116       -1.000    1.116     -1.0    good       --
20220216  42669 122531 SCIENCE  BACKUP   602.955      25.048       -1.000   25.048     -1.0    good       --
20220216  42660 122532 SCIENCE  BACKUP   601.416      22.097       -1.000   22.097     -1.0    good       --
20220216  42666 122533 SCIENCE  BACKUP   602.993      16.234       -1.000   16.234     -1.0    good       --
20220216  42665 122534 SCIENCE  BACKUP   601.498      15.680       -1.000   15.680     -1.0    good       --
20220216  41654 122535 SCIENCE  BACKUP   116.539       0.014       -1.000    0.014     -1.0    good       --
20220221  25880 123283 SCIENCE  BRIGHT   792.406     181.242       -1.000  181.242     -1.0    good       --
20220415  42416 130372 SCIENCE  BACKUP   147.499       0.065       -1.000    0.065     -1.0    good       --
20220911  41334 141794 SCIENCE  BACKUP   600.729       4.951       -1.000    4.951     -1.0    good       --
20220911  40835 141795 SCIENCE  BACKUP   137.773       0.000       -1.000    0.000     -1.0    good       --
20220911  40065 141865 SCIENCE  BACKUP   601.838      17.400       -1.000   17.400     -1.0    good       --
20220911  41419 141866 SCIENCE  BACKUP   602.791      27.518       -1.000   27.518     -1.0    good       --
20220911  41421 141867 SCIENCE  BACKUP   602.802      34.934       -1.000   34.934     -1.0    good       --
20220911  42809 141868 SCIENCE  BACKUP   608.612      60.675       -1.000   60.675     -1.0    good       --
20220911  41461 141869 SCIENCE  BACKUP   607.988      29.690       -1.000   29.690     -1.0    good       --
20220911  40895 141870 SCIENCE  BACKUP   607.867      30.101       -1.000   30.101     -1.0    good       --
20220911  40935 141871 SCIENCE  BACKUP   606.913      52.994       -1.000   52.994     -1.0    good       --
20220911  40896 141872 SCIENCE  BACKUP   445.460      61.653       -1.000   61.653     -1.0    good       --
20220911  42873 141873 SCIENCE  BACKUP   156.037      14.020       -1.000   14.020     -1.0    good       --
20230416   7741 176606 SCIENCE    DARK  1685.884    1005.371       -1.000 1005.371     -1.0    good       --
20230503  41894 178901 SCIENCE  BACKUP   601.914       1.044       -1.000    1.044     -1.0    good       --
20230525  23891 182165 SCIENCE  BRIGHT   487.629     189.186       -1.000  189.186     -1.0    good       --
20230525   4534 182168 SCIENCE    DARK  1170.443     418.688       -1.000  418.688     -1.0    good       --
20230525   4534 182169 SCIENCE    DARK  1501.314     587.363       -1.000  587.363     -1.0    good       --
20230525   3472 182170 SCIENCE    DARK  1522.507    1002.821       -1.000 1002.821     -1.0    good       --
20230525  24723 182172 SCIENCE  BRIGHT   501.315     184.501       -1.000  184.501     -1.0    good       --
20230608  22308 184545 SCIENCE  BRIGHT   435.053     186.933       -1.000  186.933     -1.0    good       --

These are now fixed. Using the same code snippet Anand provided above:

In [2]: a = Table.read("/global/cfs/cdirs/desi/survey/ops/surveyops/trunk/ops/exposures.ecsv")
   ...: sel = (a["TILEID"] >= 1000) & (a["TILEID"] < 60000) & np.isin(a["PROGRAM"], ["BACKUP", "BRIGHT", "DARK"])
   ...: a = a[sel]
   ...: b = Table.read("/global/cfs/cdirs/desi/spectro/redux/daily/exposures-daily.csv")
   ...: sel = ~np.isin(a["EXPID"], b["EXPID"])
   ...: a = a[sel]
   ...: a[a["TILEID"].argsort()].pprint_all()

Now gives:

NIGHT TILEID EXPID OBSTYPE PROGRAM EXPTIME EFFTIME_ETC EFFTIME_SPEC EFFTIME GOALTIME QUALITY COMMENTS
----- ------ ----- ------- ------- ------- ----------- ------------ ------- -------- ------- --------

@araichoor
Contributor

I confirm that today's (20231007) AP no longer reports anything, great.
Thanks!

@schlafly
Contributor

schlafly commented Oct 9, 2023

Objections to closing this, @araichoor , @akremin ?

@araichoor
Contributor

no, looks good to me!

@schlafly schlafly closed this as completed Oct 9, 2023
5 participants