
DM-45657 Add tests for consdb-hinfo and consdb-pqserver #33

Merged
8 commits merged into main
Aug 15, 2024

Conversation

bbrondel (Collaborator) commented Aug 8, 2024

More testing is still desirable for consdb, but this begins to cover hinfo.

Aside from code, I'm adding two files to the test directory: an updated cdb_latiss.sql freshly generated from felis, and a YAML file pulled from S3.

bbrondel requested review from ktlim, Vebop, and womullan on August 8, 2024 at 17:25


@pytest.fixture
def tmpdir(scope="module"):
Collaborator:

Does this get cleaned up, or should we yield tmpdir and remove it in a finally block?

Contributor:

I understand this is not cleaned up automatically
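
For reference, a minimal sketch of a self-cleaning variant (names mirror the fixture above; pytest's built-in tmp_path_factory fixture would also handle cleanup automatically):

import shutil
import tempfile
from pathlib import Path

import pytest

@pytest.fixture(scope="module")
def tmpdir():
    # Create the directory, hand it to the tests, then remove it on teardown.
    path = Path(tempfile.mkdtemp())
    yield path
    shutil.rmtree(path, ignore_errors=True)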



@pytest.fixture
def engine(tmpdir, scope="module"):
Collaborator:

Just for my own context and learning: this looks like it could be relevant in other test files where we need to create an SQLite engine. Do other test files exist yet that need it? Should we generalize this one, or just keep an eye on it as the tests develop further?
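
If that reuse materializes, one option would be to hoist the fixture into a shared conftest.py. A sketch, assuming a tests/conftest.py picked up by all test modules (the real fixture's schema setup would move along with it):

# tests/conftest.py (hypothetical shared location)
import pytest
import sqlalchemy

@pytest.fixture(scope="module")
def engine(tmpdir):
    # A throwaway SQLite engine backed by a file in the module's temp dir.
    return sqlalchemy.create_engine(f"sqlite:///{tmpdir / 'test.db'}")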

print(f"{row=}")
print(f"{row.exposure_name=}")

assert _header_lookup(header, "OBSID") == row.exposure_name
Collaborator:

The two other asserts look straightforward for names, but because I'm still fresh here, do you mind showing me how OBSID and exposure_name correlate? I can't seem to line them up by searching.

Contributor:

The metadata translator populates exposure_name with OBSID; the mapping dict is in hinfo.py.
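
For orientation, the shape of that mapping is roughly as follows. This is a hypothetical sketch, not the actual contents of hinfo.py (EXPID is an invented keyword; OBSID is the one discussed above):

# Sketch: database column -> FITS header keyword
KW_MAPPING = {
    "exposure_name": "OBSID",
    "exposure_id": "EXPID",  # hypothetical
}

def column_value(header: dict, column: str):
    # Resolve the column to its source keyword, then look it up in the header.
    return header.get(KW_MAPPING[column])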




python-version: "3.11"
cache: "pip"
- name: Build docker image
run: docker build -f Dockerfile.pytest -t pytest_image .
Contributor:

Wondering if this could be done in a container in the workflow: https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idcontainer
instead of building a new container and running in it.

The advantage would be that more of the test code is in the workflow file, where it's more visible, than in another Dockerfile.
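
A sketch of that approach, assuming a suitable pre-built image exists (the image name and steps here are placeholders, not the project's actual configuration):

jobs:
  pytest:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/example/lsst-stack:latest  # placeholder image
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: |
          source /opt/lsst/software/stack/loadLSST.bash
          pytest tests/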

RUN source loadLSST.bash && mamba install aiokafka httpx
RUN source loadLSST.bash && pip install kafkit
COPY python/lsst/consdb/hinfo.py python/lsst/consdb/utils.py ./hinfo/
RUN source loadLSST.bash && pip install kafkit aiokafka httpx
Contributor:

We generally try to avoid pip installing into conda environments whenever possible. Although it works pretty well nowadays, there's still some iffiness about the integration.

Collaborator (Author):

I tried to do this using only mamba, but it wouldn't cooperate. I think that's because we have to rely on newer versions of aiokafka that haven't made their way to conda. My understanding is that you can generally get away with mixing pip and conda as long as you never use conda again once you've used pip.
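
That ordering is the usual guidance, i.e. finish every conda/mamba install before the first pip install. A sketch using the packages from the Dockerfile above (how the packages split between the two managers is an assumption based on the comment about aiokafka versions):

# All conda/mamba installs first...
RUN source loadLSST.bash && mamba install -y httpx
# ...then pip for anything conda can't provide, with no conda/mamba steps afterwards.
RUN source loadLSST.bash && pip install kafkit aiokafka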

WORKDIR /home/lsst/
COPY --chown=lsst . ./consdb/
WORKDIR /home/lsst/consdb/
RUN source /opt/lsst/software/stack/loadLSST.bash && pip install -e .
Contributor:

I'm not sure why this is needed as well as the install above.

RUN source loadLSST.bash && pip install kafkit aiokafka httpx

WORKDIR /home/lsst/
COPY --chown=lsst . ./consdb/
Contributor:

This seems like it copies too much for the hinfo service.

@@ -442,14 +443,16 @@ def get_kafka_config() -> KafkaConfig:

logger = setup_logging("consdb.hinfo")

instrument = os.environ["INSTRUMENT"]
logger.info(f"Instrument = {instrument}")
instrument = ""
Contributor:

The idea here is to have a more specific error than the KeyError that would be raised if the instrument is not defined? I'm not sure that an "Unrecognized instrument: " error (with no instrument name) is much better.

Also, instrument = os.environ.get("INSTRUMENT", "") may be more idiomatic.
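
A sketch combining the two, i.e. the idiomatic lookup plus a specific failure message (assuming failing fast at startup with a ValueError is acceptable here):

instrument = os.environ.get("INSTRUMENT", "")
if not instrument:
    # Name the missing configuration explicitly instead of relying on a bare KeyError.
    raise ValueError("INSTRUMENT environment variable must be set")
logger.info(f"Instrument = {instrument}")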

try:
yield tmpdir
finally:
shutil.rmtree(tmpdir)
Contributor:

It can often be useful to specify ignore_errors=True when using shutil.rmtree(), especially when working on NFS filesystems that sometimes leave behind .nfsNNNN files when removing things.

In this case, since it's for temporary test files that should be on local storage, this is not an issue.
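
For reference, that variant is a one-argument change (a sketch):

shutil.rmtree(tmpdir, ignore_errors=True)  # tolerate stray .nfsNNNN leftovers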

conn.execute(f"INSERT INTO schemas VALUES ('{schema}', '{schema_path}')")
with sqlite3.connect(schema_path) as schema_conn:
schema_conn.executescript(sql.read_text())
schema_conn.close()
Contributor:

Shouldn't this be handled by the context manager?
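
(For what it's worth, sqlite3's connection context manager only commits or rolls back the enclosed transaction; it does not close the connection, so an explicit close is still needed. contextlib.closing makes that explicit; a sketch:)

import contextlib
import sqlite3

# "with connection" manages the transaction; closing() manages the connection itself.
with contextlib.closing(sqlite3.connect(schema_path)) as schema_conn:
    with schema_conn:
        schema_conn.executescript(sql.read_text())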

schema_conn.close()

os.environ["POSTGRES_URL"] = f"sqlite:///{db_path}"
hinfo.engine = utils.setup_postgres()
Contributor:

Hm. Maybe we should rename this method, since it's obviously not setting up postgres...



def _header_lookup(header, key):
for line in header:
Contributor:

This is fine for tests, but for production code it might be better to load the array of key/value pairs into a Python dict for faster lookups.
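
E.g., something along these lines; a sketch that assumes each header entry is a mapping with "keyword" and "value" fields (the real structure may differ):

def _header_index(header) -> dict:
    # One pass to build the index; every lookup afterwards is O(1).
    return {line["keyword"]: line["value"] for line in header}

index = _header_index(header)
assert index.get("OBSID") == row.exposure_name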

os.environ["INSTRUMENT"] = "LATISS"
tmpdir = Path(tempfile.mkdtemp())
try:
yield tmpdir
Contributor:

Can the yield actually ever fail? I think you could remove the try/finally (here and in the other test).
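
For what it's worth, pytest resumes a yield fixture for teardown even when the test body fails, so the simplified form would be (a sketch mirroring the fixture above):

@pytest.fixture
def tmpdir():
    os.environ["INSTRUMENT"] = "LATISS"
    path = Path(tempfile.mkdtemp())
    # Everything after the yield runs as teardown, even on test failure.
    yield path
    shutil.rmtree(path)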

@bbrondel bbrondel merged commit 04b2b2b into main Aug 15, 2024
6 checks passed