Due to open items like ldbc/ldbc_snb_datagen_spark#206 with dev, I am currently running the stable branch (commit hash d6620b9) on an 8-node Hadoop cluster (Hadoop 3.3.0, CentOS 7.5, Java 8). There seems to be a strange issue with the output folders: initially, the data generation appeared to succeed, but the dynamic folder was not produced. After some experimentation, I set ldbc.snb.datagen.serializer.outputDir to /ldbc_dataset/sf1; since then, the generation consistently fails in the Person Serializer job with the stack trace below. I have checked Hadoop permissions, ownership, and groups - nothing helped.
Do you see anything here I can do?
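For reference, the relevant part of my params file looks roughly like this (a sketch from memory; the scale-factor line and the exact key layout are assumptions - the outputDir line is the one setting I changed):

```
ldbc.snb.datagen.generator.scaleFactor:snb.interactive.1
ldbc.snb.datagen.serializer.outputDir:/ldbc_dataset/sf1
```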
2020-10-15 19:02:00,714 INFO mapreduce.Job: Task Id : attempt_1602777928494_0089_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.io.IOException: Mkdirs failed to create /ldbc_dataset/sf1/social_network/dynamic (exists=false, cwd=file:/mydata4/hadoop/yarn/local/usercache/centos/appcache/application_1602777928494_0089/container_1602777928494_0089_01_000004)
at ldbc.snb.datagen.hadoop.serializer.HadoopPersonSortAndSerializer$HadoopDynamicPersonSerializerReducer.setup(HadoopPersonSortAndSerializer.java:97)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.io.IOException: Mkdirs failed to create /ldbc_dataset/sf1/social_network/dynamic (exists=false, cwd=file:/mydata4/hadoop/yarn/local/usercache/centos/appcache/application_1602777928494_0089/container_1602777928494_0089_01_000004)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:473)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:458)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1164)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1144)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1103)
at ldbc.snb.datagen.hadoop.writer.HdfsWriter.<init>(HdfsWriter.java:66)
at ldbc.snb.datagen.hadoop.writer.HdfsCsvWriter.<init>(HdfsCsvWriter.java:49)
at ldbc.snb.datagen.serializer.snb.csv.CsvSerializer.initialize(CsvSerializer.java:23)
at ldbc.snb.datagen.serializer.LdbcSerializer.initialize(LdbcSerializer.java:20)
at ldbc.snb.datagen.hadoop.serializer.HadoopPersonSortAndSerializer$HadoopDynamicPersonSerializerReducer.setup(HadoopPersonSortAndSerializer.java:91)
... 8 more
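For completeness, the permission checks I mentioned were along these lines (a sketch; the centos user/group and the pre-created path mirror the error message, though the exact commands I ran may have differed):

```shell
# Inspect ownership and permissions of the output tree on HDFS
hdfs dfs -ls /ldbc_dataset
# Pre-create the directory the reducer fails to make
hdfs dfs -mkdir -p /ldbc_dataset/sf1/social_network/dynamic
# Make sure the submitting user owns the whole tree
hdfs dfs -chown -R centos:centos /ldbc_dataset
```

One detail that stands out in the trace is cwd=file:/mydata4/..., which may mean the reducer is resolving the output path against the local filesystem of the container rather than HDFS.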
Unfortunately, I was not able to reproduce or solve this issue.
A possible workaround is to run the generation on a single node (pseudo-distributed Hadoop cluster). This is very slow, as it only uses a single thread, but it does not crash.
1 TB of memory is sufficient to generate data sets up to SF3,000 (see the new data set at ldbc/ldbc_snb_docs#251).
I'm closing this issue, as it is unlikely to be resolved: this data generator no longer receives much attention.
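In case it helps, the workaround amounts to something like this (a sketch, assuming the stable branch's run.sh wrapper and a Hadoop installation already configured for pseudo-distributed operation; all paths are placeholders for your own setup):

```shell
# Hedged sketch: run datagen on a single machine with Hadoop in
# pseudo-distributed mode, via the repository's run.sh wrapper.
export HADOOP_HOME=/opt/hadoop-3.3.0                 # placeholder path
export LDBC_SNB_DATAGEN_HOME=/opt/ldbc_snb_datagen   # placeholder path
cd "$LDBC_SNB_DATAGEN_HOME"
./run.sh   # picks up the params file in the repository root
```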