Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sum_of_base_qualities raises TypeError if the received record has no base qualities #210

Open
msto opened this issue Jan 13, 2025 · 4 comments · May be fixed by #212
Open

sum_of_base_qualities raises TypeError if the received record has no base qualities #210

msto opened this issue Jan 13, 2025 · 4 comments · May be fixed by #212

Comments

@msto
Copy link
Contributor

msto commented Jan 13, 2025

Summary

sum_of_base_qualities() accesses the input record's query_qualities without first checking if the record has base qualities.

When the record does not have base qualities, this attribute is None, and the attempted comprehension fails with a TypeError: 'NoneType' object is not iterable.

fgpyo/fgpyo/sam/__init__.py

Lines 849 to 863 in b0b4227

def sum_of_base_qualities(rec: AlignedSegment, min_quality_score: int = 15) -> int:
"""Calculate the sum of base qualities score for an alignment record.
This function is useful for calculating the "mate score" as implemented in samtools fixmate.
Args:
rec: The alignment record to calculate the sum of base qualities from.
min_quality_score: The minimum base quality score to use for summation.
See:
[`calc_sum_of_base_qualities()`](https://github.com/samtools/samtools/blob/4f3a7397a1f841020074c0048c503a01a52d5fa2/bam_mate.c#L227-L238)
[`MD_MIN_QUALITY`](https://github.com/samtools/samtools/blob/4f3a7397a1f841020074c0048c503a01a52d5fa2/bam_mate.c#L42)
"""
score: int = sum(qual for qual in rec.query_qualities if qual >= min_quality_score)
return score

Suggested solution

After empirically testing the results of samtools fixmate -m on the real read pair that originated this error, I found that samtools will assign a mate score of 0 when the read has no base qualities. (h/t @clintval for the suggestion)

I think it's therefore most appropriate if the function returns 0 in that case.


Original suggestion

Either of the following could be appropriate:

1. Changing the signature of the function to return int | None, and returning None when the input record has no base qualities
2. Raising ValueError with a more specific error message when the input record has no base qualities

@msto
Copy link
Contributor Author

msto commented Jan 13, 2025

I would be in favor of (1), because then the following works as expected and does not require additional boilerplate to guard against an exception:

record.set_tag("ms", sum_of_base_qualities(record))
  • If the record is mapped, it receives the appropriate sum in the ms tag.
  • If the record is unmapped, it does not receive the ms tag at all. (AlignedSegment.set_tag(tag, value) will remove the tag if value is None)

@msto msto changed the title sum_of_base_qualities raises TypeError if the received record is unmapped sum_of_base_qualities raises TypeError if the received record has no base qualities Jan 13, 2025
@msto
Copy link
Contributor Author

msto commented Jan 13, 2025

Note that testing this issue will be blocked by #211

@msto
Copy link
Contributor Author

msto commented Jan 13, 2025

After empirically testing the results of samtools fixmate -m on the real read pair that originated this error, I found that samtools will assign a mate score of 0 when the read has no base qualities. (h/t @clintval for the suggestion)

I think it's therefore most appropriate if the function returns 0 in that case.

@msto
Copy link
Contributor Author

msto commented Jan 13, 2025

@clintval I'm carrying over your question from the client repo about why mypy didn't catch this.

I also would have expected mypy to catch this.

Pysam types query_sequence and query_qualities as Optional[str] and Optional[array], respectively.

https://github.com/pysam-developers/pysam/blob/0eae5be21ac3ab3ac7aa770a3931e2977e37b909/pysam/libcalignedsegment.pyi#L84-L85

In my IDE (VS Code), the pylance/pyright extension correctly infers the type of query_qualities as array | None

Screenshot 2025-01-13 at 11 26 59 AM

However, when including reveal_type(rec.query_qualities) before running mypy, I find that mypy infers the type to be array[Any].

fgpyo/sam/__init__.py:867: note: Revealed type is "array.array[Any]"

It's obviously a very simple function and I don't see anything that would be narrowing the type. This might be a bug in mypy?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant