Skip to content

Firefly-1931: alpha support for asdf file format#1905

Open
robyww wants to merge 1 commit intodevfrom
FIREFLY-1931-asdf
Open

Firefly-1931: alpha support for asdf file format#1905
robyww wants to merge 1 commit intodevfrom
FIREFLY-1931-asdf

Conversation

@robyww
Copy link
Contributor

@robyww robyww commented Jan 23, 2026

Firefly-1931: asdf file alpha support

  • alpha - only data under roman node, only uses roman layout
  • supports images and tables created by astropy

Reviewer notes.

  • All of the work is done in ASDFUtil.java, the other file have only have small change to recognize the ASDF files.
  • Strategy
    • Images: convert the ASDF image data to a FITS file
      • this happens at the same place we deal with compressed files.
    • Tables: read directly from the ASDF files

Testing

@robyww robyww added this to the 2026.1 milestone Jan 23, 2026
@robyww robyww self-assigned this Jan 23, 2026
@robyww robyww added Image FITS images Table Changes to table or table model labels Jan 23, 2026
@robyww robyww force-pushed the FIREFLY-1931-asdf branch from b1c8573 to 586bcd7 Compare January 23, 2026 18:36
  - alpha - only data under "roman" node
  - uses roman layout
  - support images
  - support tables
@robyww robyww force-pushed the FIREFLY-1931-asdf branch from 586bcd7 to bce1dde Compare January 23, 2026 21:36
@robyww robyww requested a review from ejoliet January 23, 2026 21:53
@ejoliet
Copy link
Contributor

ejoliet commented Jan 23, 2026

Exciting to see this and test it.
Before i do, do you know the versions of ASDF, Roman data-models and RAD you are supporting?
https://github.com/spacetelescope/roman_datamodels

Thanks!

@robyww robyww marked this pull request as ready for review January 23, 2026 23:20
@robyww
Copy link
Contributor Author

robyww commented Jan 24, 2026

@ejoliet I can only tell you what is in the asdf file.

Here is the yaml part of one of the asdf files we are using. I see so many different versions in it!

asdf-header.yaml.gz

@robyww robyww requested a review from loitly February 2, 2026 17:08
Copy link
Contributor

@loitly loitly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this with a sample file and it worked as expected. The code looks clean. Implementing inline table ingestion would be a nice enhancement if we decide to pursue it. Otherwise, this looks good to go.

Comment on lines +110 to +113
static FileInfo ingestAsdfTable(TableServerRequest req, DbAdapter dbAdapter, String source, int tableIndex) throws IOException, DataAccessException {
var table = ASDFUtil.convertAsdfAstroPyTableToDataGroup(new File(source), req, tableIndex);
return ingestTable(dbAdapter, table, false); //logic is already in the reader
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, but it requires loading the entire table into memory during database ingestion, which demands higher deployment memory limit when processing large files.
We have a TableParseHandler.DbIngest option that supports inline ingestion and uses minimal memory, regardless of table size.

Comment on lines +79 to +81
int startingTableIdx= asdfInfo.imageNameList.size();
if (tableIdx>=startingTableIdx+asdfInfo.tables.size()) throw new IOException("table index exceeds number of tables");
String tName= asdfInfo.tableNames().get(tableIdx-startingTableIdx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So tableIdx refers to the index within the ASDF file itself, not the index of the tables, right?
Also, are images always located at the beginning of the file, with tables at the end? If not, I don't think this logic will work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, (at least for now) the processing make all the images it finds to be first then then all the tables. It is not necessarily about the structure of the file just an arbitrary way to organize it.

This is basically flattening out a very deep structure into something more simple for the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be frustrating for users who understand the file structure to have to work with a manufactured version. I am pretty sure someone will raise the issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a solution for that right now. For asdf, an image or a table does not have a numbered location. It is just at some arbitrary place. So our numbering scheme it just to help the user get a sence of it.

In my recent meeting I did talk to the ASDF folks about adding a table of contents to the file format. If that ever happens we will use that.

Also remember this is just an first step at this. I am sure we will iterate on it several times.

Copy link
Contributor

@ejoliet ejoliet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played with one of the file in the box.com shared - those are Level-2 images - i can visualize all the extensions and check tables as well. Congrats!
As i mentioned in a separate discussion, there is much more complex products in ASDF that might requires more logic to extract data that originally image extractor could do.
The other twist is making sure to highlight the compatibility with datamodel version used, which uniquely identified the content. Typically this is the version of the software, roman_datamodels and the RAD.

One thing you could always show even if you can't extract data is to show the header, which is very useful and is YAML (ASCII) so you don't need any decoder.

For images based ASDF, i think it's already a win!! I will share the tool once is released so people can start using it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Image FITS images Table Changes to table or table model

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants