Due to open items like ldbc/ldbc_snb_datagen_spark#206 with dev, I am currently running the stable branch (commit hash d6620b9) on an 8-node Hadoop cluster (Hadoop 3.3.0, CentOS 7.5, Java 8). There seems to be a strange issue with the output folders: initially, the data generation appeared to succeed, but the dynamic folder was not produced. After some experimentation, I set ldbc.snb.datagen.serializer.outputDir to /ldbc_dataset/sf1; since then, the generation consistently fails in the Person Serializer job with the stack trace below. I have checked Hadoop permissions, ownership, and groups - nothing helped.
Do you see anything here I can do?
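For reference, the relevant part of my params file looks roughly like this (a sketch from memory; the scale-factor line and the exact key layout are assumptions - the outputDir line is the one setting I changed):

```
ldbc.snb.datagen.generator.scaleFactor:snb.interactive.1
ldbc.snb.datagen.serializer.outputDir:/ldbc_dataset/sf1
```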
2020-10-15 19:02:00,714 INFO mapreduce.Job: Task Id : attempt_1602777928494_0089_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.io.IOException: Mkdirs failed to create /ldbc_dataset/sf1/social_network/dynamic (exists=false, cwd=file:/mydata4/hadoop/yarn/local/usercache/centos/appcache/application_1602777928494_0089/container_1602777928494_0089_01_000004)
at ldbc.snb.datagen.hadoop.serializer.HadoopPersonSortAndSerializer$HadoopDynamicPersonSerializerReducer.setup(HadoopPersonSortAndSerializer.java:97)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.io.IOException: Mkdirs failed to create /ldbc_dataset/sf1/social_network/dynamic (exists=false, cwd=file:/mydata4/hadoop/yarn/local/usercache/centos/appcache/application_1602777928494_0089/container_1602777928494_0089_01_000004)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:473)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:458)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1164)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1144)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1103)
at ldbc.snb.datagen.hadoop.writer.HdfsWriter.<init>(HdfsWriter.java:66)
at ldbc.snb.datagen.hadoop.writer.HdfsCsvWriter.<init>(HdfsCsvWriter.java:49)
at ldbc.snb.datagen.serializer.snb.csv.CsvSerializer.initialize(CsvSerializer.java:23)
at ldbc.snb.datagen.serializer.LdbcSerializer.initialize(LdbcSerializer.java:20)
at ldbc.snb.datagen.hadoop.serializer.HadoopPersonSortAndSerializer$HadoopDynamicPersonSerializerReducer.setup(HadoopPersonSortAndSerializer.java:91)
... 8 more
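For completeness, the permission checks I mentioned were along these lines (a sketch; the centos user/group and the pre-created path mirror the error message, though the exact commands I ran may have differed):

```shell
# Inspect ownership and permissions of the output tree on HDFS
hdfs dfs -ls /ldbc_dataset
# Pre-create the directory the reducer fails to make
hdfs dfs -mkdir -p /ldbc_dataset/sf1/social_network/dynamic
# Make sure the submitting user owns the whole tree
hdfs dfs -chown -R centos:centos /ldbc_dataset
```

One detail that stands out in the trace is cwd=file:/mydata4/..., which may mean the reducer is resolving the output path against the local filesystem of the container rather than HDFS.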
Unfortunately, I was not able to reproduce or solve this issue.
A possible workaround is to run the generation on a single node (pseudo-distributed Hadoop cluster). This is very slow, as it only uses a single thread, but it does not crash.
1 TB of memory is sufficient to generate data sets up to SF3,000 (see the new data set at ldbc/ldbc_snb_docs#251).
I'm closing this issue, as it is unlikely to be resolved: this data generator no longer receives much attention.
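In case it helps, the workaround amounts to something like this (a sketch, assuming the stable branch's run.sh wrapper and a Hadoop installation already configured for pseudo-distributed operation; all paths are placeholders for your own setup):

```shell
# Hedged sketch: run datagen on a single machine with Hadoop in
# pseudo-distributed mode, via the repository's run.sh wrapper.
export HADOOP_HOME=/opt/hadoop-3.3.0                 # placeholder path
export LDBC_SNB_DATAGEN_HOME=/opt/ldbc_snb_datagen   # placeholder path
cd "$LDBC_SNB_DATAGEN_HOME"
./run.sh   # picks up the params file in the repository root
```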