Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- Using sqoop, import products_replica table from MYSQL into hdfs such that fields are separated by a '|' and lines are separated by '\n'. Null values are represented as -1 for numbers and "NOT-AVAILABLE" for strings. Only records with product id greater than or equal to 1 and less than or equal to 1000 should be imported and use 3 mappers for importing. The destination file should be stored as a text file to directory /user/cloudera/problem5/products-text.
- Using sqoop, import products_replica table from MYSQL into hdfs such that fields are separated by a '*' and lines are separated by '\n'. Null values are represented as -1000 for numbers and "NA" for strings. Only records with product id less than or equal to 1111 should be imported and use 2 mappers for importing. The destination file should be stored as a text file to directory /user/cloudera/problem5/products-text-part1.
- Using sqoop, import products_replica table from MYSQL into hdfs such that fields are separated by a '*' and lines are separated by '\n'. Null values are represented as -1000 for numbers and "NA" for strings. Only records with product id greater than 1111 should be imported and use 5 mappers for importing. The destination file should be stored as a text file to directory /user/cloudera/problem5/products-text-part2.
- Using sqoop merge data available in /user/cloudera/problem5/products-text-part1 and /user/cloudera/problem5/products-text-part2 to produce a new set of files in /user/cloudera/problem5/products-text-both-parts
- sqoop-import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --table products_replica --fields-terminated-by "|" --lines-terminated-by "\n" --null-non-string -1 --null-string "NA" --where "product_id >=1 and product_id <=1000" -m 3 --target-dir /user/cloudera/problem5/products-text --outdir /home/cloudera/sqoop1
- sqoop-import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --table products_replica --fields-terminated-by "*" --lines-terminated-by "\n" --null-non-string -1000 --null-string "NA" --where "product_id <= 1111" -m 2 --target-dir /user/cloudera/problem5/products-text-part1 --outdir /home/cloudera/sqoop2
- sqoop-import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --table products_replica --fields-terminated-by "*" --lines-terminated-by "\n" --null-non-string -1000 --null-string "NA" --where "product_id >= 1111" -m 5 --target-dir /user/cloudera/problem5/products-text-part2 --outdir /home/cloudera/sqoop3
- sqoop-merge --class-name products_replica \
- --jar-file /tmp/sqoop-cloudera/compile/44e59c363d94f3d1c12f74bd129e206d/products_replica.jar \
- --merge-key product_id \
- --new-data /user/cloudera/problem5/products-text-part2 \
- --onto /user/cloudera/problem5/products-text-part1 \
- --target-dir /user/cloudera/problem5/products-text-both-parts
- Using sqoop do the following. Read the entire steps before you create the sqoop job.
- create a sqoop job Import Products_replica table as text file to directory /user/cloudera/problem5/products-incremental. Import all the records.
- insert three more records to Products_replica from mysql
- run the sqoop job again so that only newly added records can be pulled from mysql
- insert 2 more records to Products_replica from mysql
- run the sqoop job again so that only newly added records can be pulled from mysql
- Validate to make sure the records have not be duplicated in HDFS
- sqoop-job --create firstSqoopJob -- import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --table products_replica --as-textfile --target-dir /user/cloudera/problem5/products-incremental --incremental append --last-value 0 --check-column product_id
- sqoop-job --exec firstSqoopJob
- insert into products_replica values (1346,2,'something 1','something 2',300.00,'not avaialble',3,'STRONG');
- insert into products_replica values (1347,5,'something 787','something 2',356.00,'not avaialble',3,'STRONG');
- sqoop-job --exec firstSqoopJob
- insert into products_replica values (1376,4,'something 1376','something 2',1.00,'not avaialble',3,'WEAK');
- insert into products_replica values (1365,4,'something 1376','something 2',10.00,'not avaialble',null,'NOT APPLICABLE');
- sqoop-job --exec firstSqoopJob
- Using sqoop do the following. Read the entire steps before you create the sqoop job.
- create a hive table in database named problem5 using below command
- create table products_hive (product_id int, product_category_id int, product_name string, product_description string, product_price float, product_imaage string,product_grade int, product_sentiment string);
- create a sqoop job Import Products_replica table as hive table to database named problem5. name the table as products_hive.
- insert three more records to Products_replica from mysql
- run the sqoop job again so that only newly added records can be pulled from mysql
- insert 2 more records to Products_replica from mysql
- run the sqoop job again so that only newly added records can be pulled from mysql
- Validate to make sure the records have not been duplicated in Hive table
- sqoop-job --create sencondSqoopJob -- import --connect :jdbc:mysql:quickstart.cloudera:3306/retail_db --username retail_dba --password cloudera --table products_replica --hive-import --hive-database problem5 --hive-table products_hive --hive-overwrite --check-column product_id --incremental append --last-value 0
- sqoop-job --exec sencondSqoopJob
- insert into products_replica values (1378,4,'something 1376','something 2',10.00,'not avaialble',null,'NOT APPLICABLE');
- insert into products_replica values (1379,4,'something 1376','something 2',10.00,'not avaialble',null,'NOT APPLICABLE');
- sqoop-job --exec sencondSqoopJob
- select * from products_hive
- Using sqoop do the following. .
- insert 2 more records into products_hive table using hive.
- create table in mysql using below command
- create table products_external (product_id int(11) primary Key, product_grade int(11), product_category_id int(11), product_name varchar(100), product_description varchar(100), product_price float, product_impage varchar(500), product_sentiment varchar(100));
- export data from products_hive (hive) table to (mysql) products_external table.
- insert 2 more records to Products_hive table from hive
- export data from products_hive table to products_external table.
- Validate to make sure the records have not be duplicated in mysql table
- insert into table products_hive values (1380,4,'something 1380','something 2',8.00,'not avaialble',3,'NOT APPLICABLE');
- insert into table products_hive values (1381,4,'something 1380','something 2',8.00,'not avaialble',3,'NOT APPLICABLE');
- create table products_external (product_id int(11) primary Key, product_grade int(11), product_category_id int(11), product_name varchar(100), product_description varchar(100), product_price float, product_impage varchar(500), product_sentiment varchar(100));
- sqoop-export --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --table products_external --export-dir /user/hive/warehouse/problem5.db/products_hive --input-fields-terminated-by '\001' --input-null-non-string 0 --input-null-string 'null' --update-mode allowinsert --update-key product_id --columns "product_id,product_category_id,product_name,product_description,product_price,product_impage,product_grade,product_sentiment"
Add Comment
Please, Sign In to add comment