Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement queueFiles endpoint #50 #51

Merged
merged 4 commits into from
Jan 15, 2025
Merged

Conversation

patrick-austin
Copy link

@patrick-austin patrick-austin commented Dec 3, 2024

Branched from #49 due to re-used code which has been refactored here.

  • Add endpoint to UserResource to submit multiple files
  • Add function to IcatClient to getDatafiles for the above
  • Refactored code common to this and the visitId submission

Closes #50

stringBuilder.append("'" + file + "'");
});
String formattedFiles = stringBuilder.toString();
String query = "SELECT datafile.id from Datafile datafile";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this is not going to work at scale when users submit thousands of files, for example. This query is going to be sent to ICAT as a GET URL and there is normally a limit on how long GET URLs can be. It can be different from server to server but the HTTP spec recommends that at least 8KB is supported (see: https://stackoverflow.com/questions/2659952/maximum-length-of-http-get-request). So these queries to ICAT are probably going to need to be chunked to keep the query URL under that limit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have implemented this, using a similar approach to how the IdsClient goes about it (comparing the length of the URLEncoded String to (by default) 1024. The tests use an artificially low value for this to ensure that the 3 test Datafiles are split into 2, 1 to cover both branches of the logic in getDatafiles.


/**
* Utility method for submitting an unformatted query to the entityManager

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by an "unformatted" query? One that hasn't been UTF8 encoded? Maybe non/un(?) encoded would be a better description.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah looking at it I think I've used it as a synonym for encoding? Will change.

* Utility method for submitting an unformatted query to the entityManager
* endpoint, and returning the resultant JsonArray.
*
* @param query Unformatted String query to submit

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Unformatted" again

*/
private JsonArray submitQuery(String query) throws TopcatException {
try {
String encodedQuery = URLEncoder.encode(query, "UTF8");
String url = "entityManager?sessionId=" + URLEncoder.encode(sessionId, "UTF8") + "&query=" + encodedQuery;
Response response = httpClient.get(url, new HashMap<String, String>());
if (response.getCode() == 404) {
throw new NotFoundException("Could not run getEntities got a 404 response");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getEntities -> submitQuery

* Queue download of Datafiles by location, splitting into part Downloads if
* needed.
*
* @param facilityName ICAT Facility.name

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my comment on other PRs, I don't think users should have to specify the facility name, so can this be made into an optional form parameter instead?

@FormParam("sessionId") String sessionId, @FormParam("transport") String transport,
@FormParam("email") String email, @FormParam("files") List<String> files) throws TopcatException {

logger.info("queueVisitId called");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queueVisitId -> queueFiles

long downloadId;
JsonArrayBuilder jsonArrayBuilder = Json.createArrayBuilder();

long part = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my comment on another PR, I think these values should have "L" after them.

downloadId = submitDownload(idsClient, download, DownloadStatus.PAUSED);
jsonArrayBuilder.add(downloadId);

part += 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values should have "L" after them.

long part = 1;
long downloadFileCount = 0;
List<DownloadItem> downloadItems = new ArrayList<DownloadItem>();
String filename = formatQueuedFilename(facilityName, "files", part);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check if this call is affected by any changes that might happen due to my comments about this method on the other PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is branched from 48_queue_visitId so will be affected and resolved via merging/conflict resolution.

* @throws TopcatException
*/
public JsonArray getDatafiles(List<String> files) throws TopcatException {
StringBuilder stringBuilder = new StringBuilder();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section creating the comma separated list can be replaced with:
String commaSepString = String.join(",", downloadIds);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I tried this without the ' around the filename but it didn't work, as the strings needed to be single quoted for the JPQL. While using join gets the comma, it wouldn't add the single quotes. I suppose you could do:

String commaSepString = String.join("','", downloadIds);
String finalString = "'" + commaSepString + "'";

It's less lines of code but I think it makes the need to single quote less explicit and so overall maybe isn't as clear - but I can change it to that if you think it would be better than what we have now?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sorry I didn't spot that the filepaths need to be in single quotes here.
I would be happy with the whole thing on a single line eg:
String commaSepString = "'" + String.join("','", files) + "'";
Admittedly the single quotes do get lost somewhat in the double quotes but I much prefer it to the 8 lines that are currently there (which also have multiple places where single quotes are hiding in double quotes).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think this will probably be superseded by all the chunking stuff anyway.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you will still need it for the chunking. There will just be less files each list of files so that the queries are shorter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands in order to build those shorter lists I need to consider each file in turn, get its encoded length, compare against the chunk limit, decide whether to add to the current chunk or start a new one, then repeat. So I'll be iterating over the list of files to do that rather using join, but there's no point worrying about it now, can leave the details to the re-review.

Base automatically changed from 48_queue_visitId to 36_queuing January 15, 2025 16:35
@patrick-austin patrick-austin merged commit 19734e7 into 36_queuing Jan 15, 2025
1 check failed
@patrick-austin patrick-austin deleted the 50_queue_files branch January 15, 2025 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants