Update README instructions
szarnyasg committed Sep 12, 2022
1 parent 5721546 commit 6fab847
Showing 3 changed files with 30 additions and 8 deletions.
22 changes: 14 additions & 8 deletions README.md
@@ -58,21 +58,27 @@ Spark 3.2.x is the recommended runtime to use. The rest of the instructions are
To place Spark under `/opt/`:

```bash
-curl https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz | sudo tar -xz -C /opt/
-export SPARK_HOME="/opt/spark-3.2.2-bin-hadoop3.2"
-export PATH="${SPARK_HOME}/bin":"${PATH}"
+scripts/get-spark-to-opt.sh
```

-To place under `~/`:
+To place it under `${HOME}/`:

```bash
-curl https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz | tar -xz -C ~/
-export SPARK_HOME=~/spark-3.2.2-bin-hadoop3.2
-export PATH="${SPARK_HOME}/bin":"${PATH}"
+scripts/get-spark-to-home.sh
```
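
Note that a script invoked this way runs in a child process, so the `export` lines inside it cannot change the environment of the calling shell. A minimal sketch, assuming the archive has already been unpacked under `${HOME}` as `scripts/get-spark-to-home.sh` does, is to repeat the two exports in the current shell afterwards:

```bash
# Assumption: scripts/get-spark-to-home.sh has already unpacked Spark under ${HOME}.
# Point SPARK_HOME at the unpacked distribution in the *current* shell.
export SPARK_HOME="${HOME}/spark-3.2.2-bin-hadoop3.2"
# Put the Spark launcher scripts on the PATH.
export PATH="${SPARK_HOME}/bin:${PATH}"
```

For the `/opt/` variant, the same two lines should apply with `SPARK_HOME="/opt/spark-3.2.2-bin-hadoop3.2"`.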

+Both Java 8 and Java 11 are supported.
+
+#### Building the project
+
+Run:
+
+```bash
+scripts/build.sh
+```
+
#### Running the generator

Once you have Spark in place and built the JAR file, run the generator as follows:

```bash
@@ -90,7 +96,7 @@ The runtime configuration arguments determine the amount of memory, number of th
./tools/run.py --help
```

-To generate a single `part-*.csv` file, reduce the parallelism (number of Spark partitions) to 1.
+To generate a single `part-*` file, reduce the parallelism (number of Spark partitions) to 1.

```bash
./tools/run.py --parallelism 1 -- --format csv --scale-factor 0.003 --mode interactive
8 changes: 8 additions & 0 deletions scripts/get-spark-to-home.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+
+set -eu
+cd "$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+
+curl https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz | tar -xz -C ${HOME}/
+export SPARK_HOME="${HOME}/spark-3.2.2-bin-hadoop3.2"
+export PATH="${SPARK_HOME}/bin":"${PATH}"
8 changes: 8 additions & 0 deletions scripts/get-spark-to-opt.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+
+set -eu
+cd "$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+
+curl https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz | sudo tar -xz -C /opt/
+export SPARK_HOME="/opt/spark-3.2.2-bin-hadoop3.2"
+export PATH="${SPARK_HOME}/bin":"${PATH}"
