-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement queueFiles endpoint #50 #51
Conversation
stringBuilder.append("'" + file + "'"); | ||
}); | ||
String formattedFiles = stringBuilder.toString(); | ||
String query = "SELECT datafile.id from Datafile datafile"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this is not going to work at scale when users submit thousands of files, for example. This query is going to be sent to ICAT as a GET URL and there is normally a limit on how long GET URLs can be. It can be different from server to server but the HTTP spec recommends that at least 8KB is supported (see: https://stackoverflow.com/questions/2659952/maximum-length-of-http-get-request). So these queries to ICAT are probably going to need to be chunked to keep the query URL under that limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have implemented this, using a similar approach to how the IdsClient
goes about it (comparing the length of the URLEncoded String to (by default) 1024. The tests use an artificially low value for this to ensure that the 3 test Datafiles are split into 2, 1 to cover both branches of the logic in getDatafiles.
|
||
/** | ||
* Utility method for submitting an unformatted query to the entityManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by an "unformatted" query? One that hasn't been UTF8 encoded? Maybe non/un(?) encoded would be a better description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah looking at it I think I've used it as a synonym for encoding? Will change.
* Utility method for submitting an unformatted query to the entityManager | ||
* endpoint, and returning the resultant JsonArray. | ||
* | ||
* @param query Unformatted String query to submit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Unformatted" again
*/ | ||
private JsonArray submitQuery(String query) throws TopcatException { | ||
try { | ||
String encodedQuery = URLEncoder.encode(query, "UTF8"); | ||
String url = "entityManager?sessionId=" + URLEncoder.encode(sessionId, "UTF8") + "&query=" + encodedQuery; | ||
Response response = httpClient.get(url, new HashMap<String, String>()); | ||
if (response.getCode() == 404) { | ||
throw new NotFoundException("Could not run getEntities got a 404 response"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getEntities -> submitQuery
* Queue download of Datafiles by location, splitting into part Downloads if | ||
* needed. | ||
* | ||
* @param facilityName ICAT Facility.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my comment on other PRs, I don't think users should have to specify the facility name, so can this be made into an optional form parameter instead?
@FormParam("sessionId") String sessionId, @FormParam("transport") String transport, | ||
@FormParam("email") String email, @FormParam("files") List<String> files) throws TopcatException { | ||
|
||
logger.info("queueVisitId called"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queueVisitId -> queueFiles
long downloadId; | ||
JsonArrayBuilder jsonArrayBuilder = Json.createArrayBuilder(); | ||
|
||
long part = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my comment on another PR, I think these values should have "L" after them.
downloadId = submitDownload(idsClient, download, DownloadStatus.PAUSED); | ||
jsonArrayBuilder.add(downloadId); | ||
|
||
part += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These values should have "L" after them.
long part = 1; | ||
long downloadFileCount = 0; | ||
List<DownloadItem> downloadItems = new ArrayList<DownloadItem>(); | ||
String filename = formatQueuedFilename(facilityName, "files", part); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if this call is affected by any changes that might happen due to my comments about this method on the other PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is branched from 48_queue_visitId
so will be affected and resolved via merging/conflict resolution.
* @throws TopcatException | ||
*/ | ||
public JsonArray getDatafiles(List<String> files) throws TopcatException { | ||
StringBuilder stringBuilder = new StringBuilder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section creating the comma separated list can be replaced with:
String commaSepString = String.join(",", downloadIds);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I tried this without the '
around the filename but it didn't work, as the strings needed to be single quoted for the JPQL. While using join
gets the comma, it wouldn't add the single quotes. I suppose you could do:
String commaSepString = String.join("','", downloadIds);
String finalString = "'" + commaSepString + "'";
It's less lines of code but I think it makes the need to single quote less explicit and so overall maybe isn't as clear - but I can change it to that if you think it would be better than what we have now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sorry I didn't spot that the filepaths need to be in single quotes here.
I would be happy with the whole thing on a single line eg:
String commaSepString = "'" + String.join("','", files) + "'";
Admittedly the single quotes do get lost somewhat in the double quotes but I much prefer it to the 8 lines that are currently there (which also have multiple places where single quotes are hiding in double quotes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think this will probably be superseded by all the chunking stuff anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you will still need it for the chunking. There will just be less files each list of files so that the queries are shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it stands in order to build those shorter lists I need to consider each file in turn, get its encoded length, compare against the chunk limit, decide whether to add to the current chunk or start a new one, then repeat. So I'll be iterating over the list of files to do that rather using join, but there's no point worrying about it now, can leave the details to the re-review.
Branched from #49 due to re-used code which has been refactored here.
Closes #50