This repository contains the 'Automated Spark incremental data ingestion' project, which ingests data from a local file system into HDFS.
The inbound folder contains the input CSV files. When you trigger the Spark job, the following steps take place:
- Spark automatically picks the most recently arrived file in the inbound folder, then validates, processes, and ingests it into HDFS (see the first sketch after this list).
- During validation, if the file is found to have already been loaded into HDFS, you can request a new load through optional spark-submit parameters, which are parsed with Scala's scopt library (see the second sketch after this list). When you pass the new-load flag, the Scala script fetches a new file from an external location into the inbound folder and loads it into the HDFS table. (Since this is a POC, the external location is simulated as another directory within the same file system.)
- Once the data is read and validated, it is inserted into the given parameterized Avro table, overwriting the table if it already exists (see the last sketch after this list).
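
For illustration, here is a minimal sketch of how the latest-file pick could work, assuming the inbound folder is a plain local directory. The path, function name, and `.csv` filter are hypothetical, not taken from the actual script:

```scala
import java.io.File

// Hypothetical helper: return the most recently modified CSV file in the
// inbound folder, if any. Handles the null that listFiles() returns when
// the directory does not exist.
def latestInboundFile(inboundDir: String): Option[File] = {
  val files = Option(new File(inboundDir).listFiles())
    .getOrElse(Array.empty[File])
    .filter(f => f.isFile && f.getName.endsWith(".csv"))
  if (files.isEmpty) None
  else Some(files.maxBy(_.lastModified()))
}
```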
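The optional parameters could be declared with scopt roughly as follows. This is a sketch against the scopt 3.x API; the flag name (`--new-load`), option names, and class names are assumptions for illustration:

```scala
import scopt.OptionParser

// Hypothetical configuration carried through the job.
case class JobConfig(newLoad: Boolean = false, table: String = "")

object IngestJob {
  private val parser = new OptionParser[JobConfig]("spark-ingest") {
    // Flag to request a fresh file when the current one is already loaded.
    opt[Unit]("new-load")
      .action((_, c) => c.copy(newLoad = true))
      .text("fetch a new file from the external location into the inbound folder")
    // Target table is parameterized on the command line.
    opt[String]('t', "table")
      .required()
      .action((x, c) => c.copy(table = x))
      .text("target Avro table name")
  }

  def main(args: Array[String]): Unit =
    parser.parse(args, JobConfig()) match {
      case Some(config) =>
        println(s"new load requested: ${config.newLoad}, table: ${config.table}")
      case None =>
        sys.exit(1) // scopt has already printed the usage text
    }
}
```

A corresponding spark-submit invocation might look like this (class and jar names are illustrative):

```
spark-submit --class IngestJob ingest.jar --new-load --table sales_raw
```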
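Finally, a sketch of the read-validate-write step, assuming Spark 2.4+ with the spark-avro module on the classpath and Hive support enabled; the paths, options, and table name are placeholders:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("incremental-ingest")
  .enableHiveSupport()
  .getOrCreate()

// Read the CSV picked from the inbound folder; header/schema options
// here are illustrative, not the project's actual validation rules.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/inbound/latest_file.csv")

val tableName = "my_avro_table" // would come from the scopt parameters

// Write into the parameterized Avro table; SaveMode.Overwrite replaces
// the table's contents if it already exists.
df.write
  .format("avro")
  .mode(SaveMode.Overwrite)
  .saveAsTable(tableName)
```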