Apache Bigtop
Week 10: Testing
Unit Testing
• Programming in the small vs. programming in the large
• Parlante's link: codingbat.com
• Unit tests are for programming in the small
• Apache rule: before submitting a patch to a Hadoop component, run and pass all of that component's unit tests.
dougc [at] gmail25 dot com 2012 All Rights Reserved
Unit Testing
• Hadoop unit tests are installed in Bigtop
• Great reference: http://www.cloudera.com/blog/2008/12/testing-hadoop/
• Run tests on the downloaded hadoop-0.20.205.0: ant test
• Where are the Bigtop shims for hadoop-0.20.205.0/1.0/0.22? For Hive/Pig?
• Other shims are available but don't work; you have to pick one at build time. In the latest release: Hive 0.8.1, Pig 0.9.2.
Hadoop Unit Testing
Bigtop Hadoop Unit Testing
Bigtop Unit Test Symlink
• Symlink src: [email protected]:/usr/lib/hadoop$ sudo ln -s /usr/src/hadoop /usr/lib/hadoop/src
Bigtop Unit Test Permission
• sudo chmod 757 on /usr/lib/hadoop, /usr/lib/hadoop/bin, and /usr/lib/hadoop/sbin
Bigtop Unit Tests
• If running in AWS, set up screen:
– sudo apt-get install screen screen-profiles screen-profiles-extras
– Type screen and you will see a clear terminal window; start ant test, press ctrl-a ctrl-d to detach, log out, log in again, and type screen -r to reattach.
• Ron's fix: modify /etc/hostname
Standalone/Bigtop Hadoop Unit Test Results
• Standalone: logs for each test under ~/hadoop-0.20.205.0/build/test
• Bigtop: /usr/lib/hadoop/build/test
Hadoop Mods to Get Integration Tests to Run (repeat)
• Copy testConf.xml: sudo cp /usr/src/hadoop/test/org/apache/hadoop/cli/testConf.xml /home/ubuntu/bigtop-0.2.0-incubating/bigtop-tests/test-execution/smokes/hadoop/target/clitest_data/
• https://issues.cloudera.org/browse/DISTRO-44
• Add the Jackson dependency to pom.xml:
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.9.3</version>
</dependency>
Bigtop Hadoop Integration Tests
• Running a single integration test:
mvn -Dit.test=org.apache.bigtop.itest.hadooptests.CLASS verify
Example: mvn -Dit.test=org.apache.bigtop.itest.hadooptests.TestTestCLI verify
Standalone Hbase Unit Tests
• ~/hbase-0.9.2: mvn -P localTest
• Running a single unit test: mvn test -Dtest=org.apache.hadoop.hbase.TestHServerAddress
Bigtop Hbase Unit Tests
• Don't exist? Put them in /usr/src/hbase like Hadoop and use a Groovy shell to run them?
• Project: get the Hbase unit tests working in Bigtop. Partition the Hbase unit tests into categories: one approach issues requests, then looks at internal state to verify; another approach uses only public APIs, reading and writing to Hbase. Partition into these 2 categories.
• MiniHbase mock objects in a single JVM process can be used in Bigtop. Different bugs show up in distributed mode vs. MiniMR/DFSCluster. Write this up as a project.
• Pig uses the same test artifact from its unit tests for Bigtop.
• Missing pom goals.
• Use:
– org.apache.bigtop.itest.JUnitUtils.groovy, for annotation support in JUnit4/Groovy.
– org.apache.bigtop.itest.junit.OrderedParameterized.java, an extension of JUnit. JUnit assumes all tests are stateless and order doesn't matter; tests are not stateless in Bigtop, so ordering requires run stages. You specify which run stage a test belongs to with simple ints that define the ordering. Tests are in run stage 0 by default; a test case annotated with run stage -1 will execute first.
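The run-stage idea behind OrderedParameterized can be sketched in plain Java, with no JUnit or Bigtop on the classpath. All names below are hypothetical; the point is only the ordering rule: cases carry an int stage (default 0) and execute in ascending stage order, so a stage -1 case runs first.

```java
import java.util.*;

// Minimal sketch of Bigtop-style run stages: each test case carries an int
// stage (default 0) and cases execute in ascending stage order, so a case
// annotated with stage -1 runs before everything else.
public class RunStages {
    static final class TestCase {
        final String name;
        final int stage;
        TestCase(String name, int stage) { this.name = name; this.stage = stage; }
    }

    static List<String> executionOrder(List<TestCase> cases) {
        List<TestCase> sorted = new ArrayList<>(cases);
        // Stable sort: cases within the same stage keep their declared order.
        sorted.sort(Comparator.comparingInt(c -> c.stage));
        List<String> order = new ArrayList<>();
        for (TestCase c : sorted) order.add(c.name);
        return order;
    }

    public static void main(String[] args) {
        List<TestCase> cases = Arrays.asList(
            new TestCase("verifyState", 1),
            new TestCase("runWorkload", 0),    // default stage
            new TestCase("installPackage", -1) // runs first
        );
        System.out.println(executionOrder(cases)); // [installPackage, runWorkload, verifyState]
    }
}
```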
org.apache.bigtop.itest.pmanager Questions
• PackageManager is an abstract class
• What are DEBPackage.groovy, ManagedPackage.groovy, and RPMPackage.groovy for?
• Does AptCmdLinePackageManager.groovy allow apt-get commands in Groovy?
• YumCmdLinePackageManager, RPMPackage, ZypperCmdLinePackageManager
• Bigtop spends time on packaging (e.g. apt-get install). There are no existing Java APIs to do this, so Bigtop installs packages using its own Java API. This is used internally for Jenkins testing; the tests live in test-artifacts/package. It is manifest driven: XML files describe what is expected from a package, such as files with certain permissions, and the tests check and verify paths and permissions. If you are introducing a new package, you are responsible for this abstract-class testing.
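The manifest-driven checking described above can be sketched in plain Java. The manifest shape and method names here are made up for illustration (Bigtop's real package tests are driven by XML manifests); the sketch just maps each expected path to a permission string and verifies it with java.nio.file.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.*;

// Hypothetical sketch of manifest-driven package verification: for every
// path in the manifest, check that it exists and carries exactly the
// expected POSIX permissions, collecting any mismatches.
public class ManifestCheck {
    static List<String> verify(Map<String, String> manifest) throws IOException {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> e : manifest.entrySet()) {
            Path p = Paths.get(e.getKey());
            if (!Files.exists(p)) { failures.add(p + ": missing"); continue; }
            String actual = PosixFilePermissions.toString(Files.getPosixFilePermissions(p));
            if (!actual.equals(e.getValue()))
                failures.add(p + ": expected " + e.getValue() + ", got " + actual);
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an installed package file.
        Path f = Files.createTempFile("pkgfile", ".txt");
        Files.setPosixFilePermissions(f, PosixFilePermissions.fromString("rw-r--r--"));
        System.out.println(verify(Map.of(f.toString(), "rw-r--r--"))); // []
        Files.delete(f);
    }
}
```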
Bigtop Hbase Integration Tests
• bigtop-0.2.0-incubating/bigtop-tests/test-execution/smokes/hbase: mvn verify
• /home/ubuntu/bigtop-0.2.0-incubating/system/TestLoadAndVerify.java
• Constants reduced for a small test run:
// private static final long NUM_TO_WRITE_DEFAULT = 100*1000;
private static final long NUM_TO_WRITE_DEFAULT = 10;
// private static final int NUM_TASKS = 200;
// private static final int NUM_REDUCE_TASKS = 35;
private static final int NUM_TASKS = 2;
private static final int NUM_REDUCE_TASKS = 2;
Bigtop Hbase Integration Results
Pig/Hive/Mahout/Oozie/Flume Unit Tests
• ant test or mvn test
• Project: mavenize Hive (hard)
• Project: mavenize Pig (easier?)
• ~/hive-0.7.1/src/build.xml
• ~/pig-0.9.2/build.xml
• mahout-0.6-src, ~/mahout-distribution-0.6; mvn test; install core and src, 2 subdirectories with the same name: ~/mahout-distribution-0.6/mahout-distribution-0.6/pom.xml
• git clone https://github.com/yahoo/oozie.git; mvn test
• git clone https://github.com/cloudera/flume.git; mvn test
Pig-0.9.2 unit test Results
Bigtop Pig Integration Tests
• Problem with mvn artifact…
Hive-0.7.1 Notes
• Hive unit tests install their own version of Hadoop: ~/hive-0.7.1/src/build/hadoopcore/
• Remove the test TestHadoopThriftAuthBridge20S.java; it can't connect to the Thrift server (socket timeout, > 6x).
Hive-0.7.1 Unit Tests
Bigtop Hive Integration Tests
• Follow the Pig format; work with the Hive unit test authors. Hive integration project suggestion:
• Hive and Pig sit on top of M/R with a custom language. With a compiler, the unit tests have input and expected output; Hive unit tests are SQL code plus verification afterwards, which was hard to retrofit against a real cluster. The approach taken was to take the *.sql files from Hive and dump them into Bigtop, then compare actual vs. expected output. Can you reuse the same test artifacts for both unit tests and Bigtop integration tests? Convert the Hive unit tests.
Mahout Results
Flume Unit Test Results
Bigtop Flume Integration Tests
• Transitioning to Flume NG; NG lost features from Flume. Too early.
Oozie Unit Test
Bigtop Oozie Integration Test
• What to set oozie_url to? http://localhost:11000/oozie
• Start the Oozie service.
• Runs only the Oozie examples.jar. Project: create a workflow for Oozie; integration testing on a cluster is needed here. Add actions to broaden the data interfaces: an email action, a sqoop action. A good project for J2EE developers.
Bigtop Scripts mod
• Where to modify the bigtop install scripts to
fix this?
Command Line vs. Eclipse
• Create a Java project with test programs using HDFS/Hive/Pig, etc. There are 2 ways to run the files: from the command line or in Eclipse.
Command Line vs. Eclipse
• This may be important when debugging in cluster and pseudo-distributed mode.
• The cluster loads the 3 conf/ files: core-site.xml, hdfs-site.xml, mapred-site.xml. Some of the parameters are embedded.
• Java code may not properly initialize these parameters for cluster operation. Sometimes hard to debug.
Hadoop CLI
• The command line uses bin/hadoop jar jarfilename.jar ClassName args
• Did this when running Pi from hadoop-xxx-examples.jar; the test programs are inside the jar.
Hadoop CLI
• Set an absolute path for log4j.properties:
PropertyConfigurator.configure("/Users/dc/Documents/workspace/log4j.properties");
• The properties file lives outside of the jar. Web search results for adding log4j.properties to the jar are incorrect; web search results for setting the classpath are incorrect.
Eclipse Console Output 1/2
14:03:08,263 INFO TestHDFS:34 - Yo I am logger!!!
created new jobconf
finished setting jobconf parameters
generateSampeInpuIf
inputDirectory:file:/tmp/MapReduceIntroInput exists:true
14:03:08,588 INFO TestHDFS:59 - isEmptyDirectory
14:03:08,595 INFO TestHDFS:65 - num file status:4
14:03:08,596 INFO TestHDFS:75 - file:///tmp/MapReduceIntroInput is not empty
14:03:08,596 INFO TestHDFS:80 - A non empty file file:///tmp/MapReduceIntroInput/asdf.txt was found
14:03:08,597 INFO TestHDFS:46 - The inputDirectory file:/tmp/MapReduceIntroInput exists and is either a file or a non empty directory
14:03:08,598 INFO TestHDFS:111 - Generating 3 input files of random data, each record is a random number TAB the input file name
Eclipse Console Output 2/2
14:03:25,076 INFO JobClient:589 - Map input records=15
14:03:25,076 INFO JobClient:589 - Reduce shuffle bytes=0
14:03:25,076 INFO JobClient:589 - Spilled Records=30
14:03:25,077 INFO JobClient:589 - Map output bytes=303
14:03:25,077 INFO JobClient:589 - Total committed heap usage (bytes)=425000960
14:03:25,077 INFO JobClient:589 - Map input bytes=302
14:03:25,077 INFO JobClient:589 - SPLIT_RAW_BYTES=358
14:03:25,078 INFO JobClient:589 - Combine input records=0
14:03:25,078 INFO JobClient:589 - Reduce input records=15
14:03:25,078 INFO JobClient:589 - Reduce input groups=15
14:03:25,078 INFO JobClient:589 - Combine output records=0
14:03:25,079 INFO JobClient:589 - Reduce output records=15
14:03:25,079 INFO JobClient:589 - Map output records=15
14:03:25,079 INFO TestHDFS:235 - The job has completed.
14:03:25,079 INFO TestHDFS:241 - The job completed successfully.
Command Line Output 1/2
21:44:36,765 INFO TestHDFS:35 - Yo I am logger!!!
created new jobconf
finished setting jobconf parameters
generateSampeInpuIf
inputDirectory:file:/tmp/MapReduceIntroInput exists:true
21:44:36,983 INFO TestHDFS:60 - isEmptyDirectory
21:44:36,990 INFO TestHDFS:66 - num file status:4
21:44:36,991 INFO TestHDFS:76 - file:///tmp/MapReduceIntroInput is not empty
21:44:36,991 INFO TestHDFS:81 - A non empty file file:///tmp/MapReduceIntroInput/asdf.txt was found
21:44:36,992 INFO TestHDFS:47 - The inputDirectory file:/tmp/MapReduceIntroInput exists and is either a file or a non empty directory
21:44:36,992 INFO TestHDFS:112 - Generating 3 input files of random data, each record is a random number TAB the input file name
21:44:36,999 WARN NativeCodeLoader:52 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21:44:37,007 INFO TestHDFS:169 - The job output directory file:/tmp/MapReduceIntroOutput exists and is not a directory and will be removed
21:44:37,022 INFO TestHDFS:235 - Launching the job.
Command Line Output 2/2
21:44:53,523 INFO JobClient:589 - SPLIT_RAW_BYTES=358
21:44:53,524 INFO JobClient:589 - Combine input records=0
21:44:53,524 INFO JobClient:589 - Reduce input records=19
21:44:53,531 INFO JobClient:589 - Reduce input groups=19
21:44:53,531 INFO JobClient:589 - Combine output records=0
21:44:53,531 INFO JobClient:589 - Reduce output records=19
21:44:53,531 INFO JobClient:589 - Map output records=19
21:44:53,532 INFO TestHDFS:237 - The job has completed.
21:44:53,532 INFO TestHDFS:243 - The job completed successfully.
M/R Idioms
• What you get for free in M/R
• Sorting
• Duplicate detection
• Design pattern notes: object churn, thread safety in mappers
– ThreadLocal vs. atomic ivars vs. locks
M/R Idioms
• Hadoop Partitioner: multiple output files or 1 output file.
– Job.setNumReduceTasks(1) is the same as a merge sort
– Default: HashPartitioner
– Create your own for filtering, e.g. sending all keys which start with a common prefix to one specific file.
HDFS Idioms
• Serialization: the Writable interface gives an order-of-magnitude performance improvement
• HDFS block R/W
• JobTrackers/TaskTrackers/NameNodes: each file operation goes directly to the NameNode.
Integration Testing Smoke

Hadoop Component (/usr/lib) | Tests exist | Tests under bigtop-tests
Hbase | Yes | IncrementalPELoad.java, TestHBaseCompression.java, TestHBasePigSmoke.groovy, TestHBaseSmoke.java, TestHFileOutputFormat.java, TestLoadIncrementalHFiles.java
Hive | Yes | HiveBulkScriptExecutor.java, IntegrationTestHiveSmokeBulk.groovy, TestHiveSmokeBulk.groovy, TestJdbcDriver.java
Pig | No | Yes, in Hbase
Zookeeper | No | Part of components
Mahout | No |
Integration Testing Smoke

Hadoop Component | Test code exists | Program name
Whirr | No, not needed? |
SQOOP | Yes | IntegrationTestSqoopHive.groovy, IntegrationTestSqoopHbase.groovy
Flume | Yes | TestFlumeSmoke.groovy
Hadoop | Yes | TestCLI.groovy, TestHadoopSmoke, TestHadoopExamples
Package Test | Yes | PackageTestCommon.groovy
Hue | Yes, part of package test |
Oozie | Yes | TestOozieSmoke.groovy, StateVerifierZookeeper.groovy
Lab #4: Create an Integration Test
• The Groovy runtime allows shell commands, which lets you use the scripts inside the components, saving debugging time on classpaths and environment files.
• Alternatively, use the Java libraries (DFSCluster, MiniMRCluster), reverse engineer the env var settings and the sequence of commands to run, and start from an HDFS file system, then work your way up to a Bigtop component.
From Lab #3
• A working map reduce program; run it using mvn verify. You have to make sure HDFS/Hadoop is running first.
Create a Mahout artifact dir and child
pom
Groovy Test Code
• Assumes HDFS is running
• Configuration conf = new Configuration();
• conf.addResource("mapred-site.xml")
• Shell sh = new Shell("/bin/bash -s");
sh.exec("hadoop fs -mkdir /tmp/test",
        "hadoop fs -copyFromLocal one /tmp/test/one",
        "hadoop fs -cat /tmp/test/one");
Future Labs
• Integrate unit testing into Bigtop
• More integration testing
• Integrate different versions of Hadoop components (Hbase, Hive, etc.) into Bigtop
• Mavenize an ant-centric Hadoop component: Pig, Hive
• Puppet lab: Bigtop puppet code is used in CDH4 to deploy/test
• Deploying and testing in a cluster
