Implement queueFiles endpoint #50 #51

patrick-austin · 2024-12-03T12:29:56Z

Branched from #49 due to re-used code which has been refactored here.

Add endpoint to UserResource to submit multiple files
Add getDatafiles function to IcatClient to look up the Datafiles for the above
Refactor code common to this and the visitId submission

Closes #50

kevinphippsstfc · 2025-01-14T08:57:34Z

src/main/java/org/icatproject/topcat/IcatClient.java

+			stringBuilder.append("'" + file + "'");
+		});
+		String formattedFiles = stringBuilder.toString();
+		String query = "SELECT datafile.id from Datafile datafile";


I suspect this is not going to work at scale, for example when users submit thousands of files. This query is sent to ICAT in a GET URL, and there is normally a limit on how long GET URLs can be. The limit differs from server to server, but the HTTP spec recommends that request lines of at least 8000 octets are supported (see: https://stackoverflow.com/questions/2659952/maximum-length-of-http-get-request). So these queries to ICAT are probably going to need to be chunked to keep the query URL under that limit.

Have implemented this, using a similar approach to how the IdsClient goes about it (comparing the length of the URL-encoded String to a limit, 1024 by default). The tests use an artificially low value for this to ensure that the 3 test Datafiles are split into chunks of 2 and 1, covering both branches of the logic in getDatafiles.
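
For readers following the thread, the chunking idea can be sketched roughly as below. This is a minimal illustration, not the code merged in this PR: the class name FileChunker, the method chunkQuotedFiles, and the choice to measure only the quoted file list (rather than the whole URL) are assumptions made for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of IcatClient.
class FileChunker {

    /**
     * Groups file locations into comma-separated, single-quoted chunks whose
     * URL-encoded length stays within maxEncodedLength where possible.
     * A single location that exceeds the limit still gets a chunk of its own.
     */
    static List<String> chunkQuotedFiles(List<String> files, int maxEncodedLength) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentEncodedLength = 0;

        for (String file : files) {
            String quoted = "'" + file + "'";
            int quotedLength = encodedLength(quoted);
            int separatorLength = current.length() == 0 ? 0 : encodedLength(",");

            // Start a new chunk if adding this file would exceed the limit.
            if (current.length() > 0
                    && currentEncodedLength + separatorLength + quotedLength > maxEncodedLength) {
                chunks.add(current.toString());
                current.setLength(0);
                currentEncodedLength = 0;
                separatorLength = 0;
            }
            if (current.length() > 0) {
                current.append(",");
            }
            current.append(quoted);
            currentEncodedLength += separatorLength + quotedLength;
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    private static int encodedLength(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").length();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // An artificially low limit, in the spirit of the tests described above.
        List<String> files = Arrays.asList("a/1.dat", "a/2.dat", "a/3.dat");
        for (String chunk : chunkQuotedFiles(files, 40)) {
            System.out.println(chunk);
        }
    }
}
```

Each chunk would then be placed into its own query and the results combined. A real limit would also need to leave room for the rest of the URL (session ID and the fixed part of the query), which this sketch ignores.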

kevinphippsstfc · 2025-01-14T09:01:28Z

src/main/java/org/icatproject/topcat/IcatClient.java


+	/**
+	 * Utility method for submitting an unformatted query to the entityManager


What do you mean by an "unformatted" query? One that hasn't been URL encoded (as UTF8)? Maybe "unencoded" would be a better description.

Yeah, looking at it I think I've used "unformatted" as a synonym for "unencoded". Will change.

kevinphippsstfc · 2025-01-14T09:01:49Z

src/main/java/org/icatproject/topcat/IcatClient.java

+	 * Utility method for submitting an unformatted query to the entityManager
+	 * endpoint, and returning the resultant JsonArray.
+	 * 
+	 * @param query Unformatted String query to submit


"Unformatted" again

kevinphippsstfc · 2025-01-14T09:02:18Z

src/main/java/org/icatproject/topcat/IcatClient.java

+	 */
+	private JsonArray submitQuery(String query) throws TopcatException {
+		try {
+			String encodedQuery = URLEncoder.encode(query, "UTF8");
 			String url = "entityManager?sessionId=" + URLEncoder.encode(sessionId, "UTF8") + "&query=" + encodedQuery;
 			Response response = httpClient.get(url, new HashMap<String, String>());
 			if (response.getCode() == 404) {
 				throw new NotFoundException("Could not run getEntities got a 404 response");


getEntities -> submitQuery

kevinphippsstfc · 2025-01-14T09:07:21Z

src/main/java/org/icatproject/topcat/web/rest/UserResource.java

+	 * Queue download of Datafiles by location, splitting into part Downloads if
+	 * needed.
+	 * 
+	 * @param facilityName ICAT Facility.name


As per my comment on other PRs, I don't think users should have to specify the facility name, so can this be made into an optional form parameter instead?

kevinphippsstfc · 2025-01-14T09:07:45Z

src/main/java/org/icatproject/topcat/web/rest/UserResource.java

+			@FormParam("sessionId") String sessionId, @FormParam("transport") String transport,
+			@FormParam("email") String email, @FormParam("files") List<String> files) throws TopcatException {
+
+		logger.info("queueVisitId called");


queueVisitId -> queueFiles

kevinphippsstfc · 2025-01-14T09:08:58Z

src/main/java/org/icatproject/topcat/web/rest/UserResource.java

+		long downloadId;
+		JsonArrayBuilder jsonArrayBuilder = Json.createArrayBuilder();
+
+		long part = 1;


As per my comment on another PR, I think these values should have "L" after them.

kevinphippsstfc · 2025-01-14T09:09:24Z

src/main/java/org/icatproject/topcat/web/rest/UserResource.java

+				downloadId = submitDownload(idsClient, download, DownloadStatus.PAUSED);
+				jsonArrayBuilder.add(downloadId);
+
+				part += 1;


These values should have "L" after them.
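
As background to the "L" suffix comments: assigning an int literal to a long compiles and gives the same value either way, because the int is widened on assignment. The suffix starts to matter when arithmetic is done on int literals before the widening happens. A small sketch (the class and method names are made up for illustration):

```java
// Hypothetical demo class, not part of the PR.
class LongLiterals {

    // The int literal 1 is widened to long on return; same value as 1L.
    static long withoutSuffix() {
        return 1;
    }

    static long withSuffix() {
        return 1L;
    }

    // Both operands are ints, so the multiplication overflows in 32 bits
    // before the result is widened to long.
    static long overflowingProduct() {
        return 1_000_000 * 1_000_000;
    }

    // One operand is a long, so the multiplication is done in 64 bits.
    static long correctProduct() {
        return 1_000_000L * 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println(withoutSuffix() == withSuffix());
        System.out.println(overflowingProduct() == correctProduct());
    }
}
```

For `long part = 1` and `part += 1` the suffix is therefore a readability choice rather than a behavioural one.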

kevinphippsstfc · 2025-01-14T09:11:31Z

src/main/java/org/icatproject/topcat/web/rest/UserResource.java

+		long part = 1;
+		long downloadFileCount = 0;
+		List<DownloadItem> downloadItems = new ArrayList<DownloadItem>();
+		String filename = formatQueuedFilename(facilityName, "files", part);


Can you check whether this call is affected by any changes that might happen due to my comments about this method on the other PR?

This is branched from 48_queue_visitId so will be affected and resolved via merging/conflict resolution.

kevinphippsstfc · 2025-01-14T10:30:57Z

src/main/java/org/icatproject/topcat/IcatClient.java

+	 * @throws TopcatException
+	 */
+	public JsonArray getDatafiles(List<String> files) throws TopcatException {
+		StringBuilder stringBuilder = new StringBuilder();


I think this section creating the comma separated list can be replaced with:
String commaSepString = String.join(",", files);

I think I tried this without the ' around the filename but it didn't work, as the strings needed to be single quoted for the JPQL. While using join gets the comma, it wouldn't add the single quotes. I suppose you could do:

String commaSepString = String.join("','", files); String finalString = "'" + commaSepString + "'";

It's fewer lines of code but I think it makes the need to single quote less explicit and so overall maybe isn't as clear - but I can change it to that if you think it would be better than what we have now?

Yes sorry I didn't spot that the filepaths need to be in single quotes here.
I would be happy with the whole thing on a single line eg:
String commaSepString = "'" + String.join("','", files) + "'";
Admittedly the single quotes do get lost somewhat in the double quotes but I much prefer it to the 8 lines that are currently there (which also have multiple places where single quotes are hiding in double quotes).

Actually I think this will probably be superseded by all the chunking stuff anyway.

I think you will still need it for the chunking. There will just be fewer files in each list of files so that the queries are shorter.

As it stands, in order to build those shorter lists I need to consider each file in turn, get its encoded length, compare against the chunk limit, decide whether to add to the current chunk or start a new one, then repeat. So I'll be iterating over the list of files to do that rather than using join, but there's no point worrying about it now; we can leave the details to the re-review.
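
The two quoting styles discussed in this thread can be compared side by side. This is only a sketch; the class QuotedJoin and its method names are hypothetical and do not appear in the PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of IcatClient.
class QuotedJoin {

    // One-line form from the review: join on ',' between quotes, then wrap in quotes.
    static String joinThenWrap(List<String> files) {
        return "'" + String.join("','", files) + "'";
    }

    // Alternative that keeps the quoting of each item explicit.
    static String quoteEachThenJoin(List<String> files) {
        return files.stream()
                .map(file -> "'" + file + "'")
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("dir/a.dat", "dir/b.dat");
        System.out.println(joinThenWrap(files));      // 'dir/a.dat','dir/b.dat'
        System.out.println(quoteEachThenJoin(files)); // 'dir/a.dat','dir/b.dat'
    }
}
```

The two differ for an empty list: the first returns a pair of quotes, the second an empty string. Neither escapes a single quote inside a file name, so that would need handling separately if such names can occur.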

Implement queueFiles endpoint #50

335561d

patrick-austin mentioned this pull request Dec 6, 2024

Implement queuing priority #56 #58

Merged

kevinphippsstfc requested changes Jan 14, 2025

kevinphippsstfc reviewed Jan 14, 2025

Merge branch '48_queue_visitId' into 50_queue_files

f03c34f

patrick-austin mentioned this pull request Jan 14, 2025

Implement endpoint for queuing visits #48 #49

Merged

Implement chunking for IcatClient.getDatafiles #50

d0a7220

patrick-austin requested a review from kevinphippsstfc January 14, 2025 16:01

Replace hardcoded filenames in testQueueFiles with fetched values #50

ca41d0f

kevinphippsstfc approved these changes Jan 15, 2025

Base automatically changed from 48_queue_visitId to 36_queuing January 15, 2025 16:35

patrick-austin merged commit 19734e7 into 36_queuing Jan 15, 2025
1 check failed

patrick-austin deleted the 50_queue_files branch January 15, 2025 16:36
