Mastodon-Dynamo-App

(For better viewing, you can visit: https://github.com/aitanagoca/Mastodon-Dynamo-App)

Group Information

👥 Group: (P102, group 05)

Aitana González (U186651)

Jordi Alfonso (U111792)

Arnau Royo (U172499)

(For group mates) - How to execute

⚠️ If you have trouble running the application locally, with errors similar to "cannot find method methodName()", the cause may be jar conflicts between Spark and the dependencies of your application. Find your Spark installation, go to its jar directory (the jars directory in a downloaded Spark, the libexec/jars directory in a brew Spark, etc.) and remove the following files: gson-2.2.4.jar (or equivalent versions), okhttp-3.12.12.jar (or equivalent versions), okio-1.14.0.jar (or equivalent versions).
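A more cautious variant of this step is to move the conflicting jars into a backup folder instead of deleting them, so they can be restored later. The sketch below is only an illustration: the function name quarantine_jars, the backup folder name, and the glob patterns are assumptions and not part of this repository, and the jars directory you pass in depends on your own Spark installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: moves jars that may conflict with the application's
# dependencies out of Spark's jars directory and into a sibling backup folder.
# Usage: quarantine_jars /path/to/spark/jars
quarantine_jars() {
  local jars_dir="$1"
  local backup_dir="$jars_dir/../conflicting-jars-backup"
  local jar

  mkdir -p "$backup_dir"

  # Assumed patterns covering "gson-2.2.4.jar (or equivalent versions)" etc.
  for jar in "$jars_dir"/gson-*.jar "$jars_dir"/okhttp-*.jar "$jars_dir"/okio-*.jar; do
    # An unmatched glob stays literal, so skip anything that is not a real file.
    [ -f "$jar" ] || continue
    mv "$jar" "$backup_dir/"
    echo "moved $(basename "$jar")"
  done
}
```

For example, with a downloaded Spark and SPARK_HOME pointing at it, the call would be quarantine_jars "$SPARK_HOME/jars". Check the printed file names before running the application, since the patterns may match more jars than intended on some Spark versions.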

(PART 2) Running example application locally

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonStreamingExample target/lab3-mastodon-1.0-SNAPSHOT.jar src/main/resources/map.tsv
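Since Maven's package phase already runs the validate and compile phases, the build steps above can be collapsed into a single mvn clean package. The following is a minimal sketch of a wrapper that builds and then submits one part; run_part and the DRY_RUN variable are hypothetical names introduced here, while the class names, jar path, and arguments are the ones listed in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the project and submits one main class to Spark.
# Usage: run_part <MainClassName> <argument>
run_part() {
  local main_class="$1"
  local app_arg="$2"
  local run=""

  # With DRY_RUN set, print the commands instead of executing them.
  if [ -n "${DRY_RUN:-}" ]; then run="echo"; fi

  # "package" also runs the earlier lifecycle phases (validate, compile).
  $run mvn clean package || return 1

  $run spark-submit \
    --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties \
    --class "edu.upf.$main_class" \
    target/lab3-mastodon-1.0-SNAPSHOT.jar "$app_arg"
}
```

For this part the call would be run_part MastodonStreamingExample src/main/resources/map.tsv. The same function should work for the other parts by swapping in their class name and argument, and DRY_RUN=1 run_part ... only prints what would be executed.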

(PART 3) Stateless: joining a static RDD with a real time stream

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonStateless target/lab3-mastodon-1.0-SNAPSHOT.jar src/main/resources/map.tsv

(PART 4) Spark Stateful transformations with windows

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonWindows target/lab3-mastodon-1.0-SNAPSHOT.jar src/main/resources/map.tsv

(PART 5) Spark Stateful transformations with state variables

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonWithState target/lab3-mastodon-1.0-SNAPSHOT.jar en

(PART 6) DynamoDB

(PART 6.1) Writing to DynamoDB

⚠️ Before following these steps, remember to configure your AWS credentials! (1️⃣ aws configure; 2️⃣ aws configure set aws_session_token < your_aws_session_token >)

⚠️ Before following these steps, remember to create the DynamoDB table manually! (Table Name: LsdsTwitterHashtags - Primary Key: hashtag)
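As an alternative to creating the table by hand, it can probably be created from the AWS CLI once the credentials are configured. This is a sketch under stated assumptions: the README only gives the table name and the primary key name, so the key type (string) and the on-demand billing mode are guesses, and create_hashtag_table and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: creates the LsdsTwitterHashtags table with "hashtag"
# as its partition key, then waits until the table is available.
create_hashtag_table() {
  local table_name="LsdsTwitterHashtags"
  local run=""

  # With DRY_RUN set, print the commands instead of executing them.
  if [ -n "${DRY_RUN:-}" ]; then run="echo"; fi

  # Assumptions: the key is a string (S) and on-demand billing is acceptable.
  $run aws dynamodb create-table \
    --table-name "$table_name" \
    --attribute-definitions AttributeName=hashtag,AttributeType=S \
    --key-schema AttributeName=hashtag,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST || return 1

  # Block until the table is ready to accept writes.
  $run aws dynamodb wait table-exists --table-name "$table_name"
}
```

Running DRY_RUN=1 create_hashtag_table first shows the exact commands without touching the account. If your lab account restricts billing modes or expects a different key type, adjust those two flags.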

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonHashtags target/lab3-mastodon-1.0-SNAPSHOT.jar en

(PART 6.2) Reading from DynamoDB

⚠️ Before following these steps, remember to configure your AWS credentials! (1️⃣ aws configure; 2️⃣ aws configure set aws_session_token < your_aws_session_token >)

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonHashtagsReader target/lab3-mastodon-1.0-SNAPSHOT.jar en

Output

(PART 2) Running example application locally

From the output, we can conclude that the application is functioning correctly. It successfully captures toots and their associated data in real time. The data includes each toot's content, the user who posted it, and any hashtags used.

[Screenshot, 2024-03-09 at 17:22:14]

(PART 3) Stateless: joining a static RDD with a real time stream

From the output, we can conclude that the application is functioning correctly. It successfully captures toots and their associated languages in real time. The data includes the language of the toots and the count of toots in that language. English has the highest number of toots in both time intervals displayed.

[Screenshot, 2024-03-14 at 14:28:20] [Screenshot, 2024-03-14 at 14:28:30]

(PART 4) Spark Stateful transformations with windows

From the output, we can conclude that the application is functioning correctly. It successfully captures toots and their associated languages in real time. The data includes the language of the toots and the count of toots in that language. English has the highest number of toots in both the micro-batch and the 60-second window.

[Screenshot, 2024-03-14 at 15:22:48] [Screenshot, 2024-03-14 at 15:23:04]
[Screenshot, 2024-03-14 at 15:23:37] [Screenshot, 2024-03-14 at 15:23:50]

(PART 5) Spark Stateful transformations with state variables

From the output, we can conclude that the application is functioning correctly. It’s successfully capturing users and their associated number of toots in real-time. The data includes the user’s name and the count of toots they have made. The users are sorted by the number of toots they have made, with the user having the most toots listed first.

[Screenshot, 2024-03-14 at 14:33:09] [Screenshot, 2024-03-14 at 14:33:19]

(PART 6) DynamoDB

(PART 6.1) Writing to DynamoDB

Partial example after writing to DynamoDB table "LsdsTwitterHashtags":

[Screenshot, 2024-03-13 at 22:24:29]

From the output, we can conclude that the Spark streaming application is successfully extracting hashtags from toots and storing the data in DynamoDB. The data includes the frequency of each hashtag, the language of the toot, and the toot IDs where the hashtag appears.

(PART 6.2) Reading from DynamoDB

Example of the top 10 hashtags obtained after reading from the DynamoDB table "LsdsTwitterHashtags":

[Screenshot, 2024-03-13 at 22:27:31]

From the output, we can conclude that the MastodonHashtagsReader class is successfully retrieving the top 10 hashtags from the DynamoDB table. The hashtags are sorted in descending order based on their frequency of occurrence.