Malith's Perspective
Sunday, May 28, 2017
Using fabric8 docker-maven-plugin to automate docker builds
When building a Docker image for a Maven project, the compiled libraries normally have to be copied to a separate location and a docker build has to be run manually. This process is cumbersome, since the same steps have to be repeated even for a slight modification.
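As a rough illustration, the manual workflow looks something like this (the paths, artifact name, and image name here are hypothetical):
# Manual workflow repeated after every change
mvn clean package                                        # build the artifact
cp target/projectname-1.0.jar docker/projectname/        # copy it into the Docker build context
cd docker/projectname
docker build -t myregistry.malith.com:5000/projectname:latest .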
The fabric8 docker-maven-plugin is an ideal solution for this requirement. Spotify also provides a Docker Maven plugin, but fabric8 offers much more functionality; for example, with fabric8 it's easy to inject values into the Dockerfile.
fabric8 provides extensive documentation (https://dmp.fabric8.io/), although it can be challenging for a beginner. My requirement was to configure the plugin in such a way that even a user without Docker on his/her machine could still run a Maven build.
The first step was to move the configuration values into properties in the POM, so that the configuration is kept in one place and is easier to maintain.
<fabric8io.docker.version>0.21.0</fabric8io.docker.version>
<docker.registry>myregistry.malith.com:5000</docker.registry>
<docker.image.tag>latest</docker.image.tag>
<docker.repository>projectname:${docker.image.tag}</docker.repository>
<docker.fileName>Dockerfile</docker.fileName>
<docker.skip>false</docker.skip>
Next, add the docker-maven-plugin under the <build><plugins> tag.
<plugin>
  <groupId>io.fabric8</groupId>
  <artifactId>docker-maven-plugin</artifactId>
  <version>${fabric8io.docker.version}</version>
  <configuration>
    <registry>${docker.registry}</registry>
    <useColor>true</useColor>
    <verbose>true</verbose>
    <skip>${docker.skip}</skip>
    <images>
      <image>
        <name>${docker.registry}/${docker.repository}</name>
        <build>
          <dockerFileDir>${project.build.directory}/projectname-${project.version}-docker/projectname</dockerFileDir>
          <filter>@</filter>
          <dockerFile>${docker.fileName}</dockerFile>
          <tags>
            <tag>${docker.image.tag}</tag>
          </tags>
        </build>
      </image>
    </images>
  </configuration>
  <executions>
    <execution>
      <id>start</id>
      <phase>pre-integration-test</phase>
      <goals>
        <!-- "build" should be used to create the images with the artifact -->
        <goal>build</goal>
        <goal>start</goal>
      </goals>
    </execution>
    <execution>
      <id>stop</id>
      <phase>post-integration-test</phase>
      <goals>
        <goal>stop</goal>
      </goals>
    </execution>
  </executions>
</plugin>
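With this configuration, a normal mvn clean install builds the image and starts a container around the integration-test phase. The plugin's goals can also be invoked on their own; for example (a sketch, assuming the configuration above and that the artifact has already been packaged):
mvn clean package docker:build    # build the Docker image only
mvn docker:start                  # start a container from the image
mvn docker:stop                   # stop and remove that container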
Next, I needed to skip the execution of the plugin when the user does not want to build the Docker image. Therefore I created a profile that sets ${docker.skip} to true whenever the skipDocker property is supplied.
I added the following section to the POM.
<profiles>
  <profile>
    <id>docker</id>
    <properties>
      <docker.skip>true</docker.skip>
    </properties>
    <activation>
      <property>
        <name>skipDocker</name>
        <value>true</value>
      </property>
    </activation>
  </profile>
</profiles>
Based on the above snippet, if skipDocker is supplied on the Maven command line, docker.skip is set to true, effectively skipping the execution of the plugin.
A user can therefore skip building the Docker image simply by running:
mvn clean install -DskipDocker
fabric8 has a lot more support for Docker builds, and I still have a lot to explore. Hope you've found my findings useful. Don't forget to leave a comment. :-)
Saturday, May 27, 2017
Setting up Kubernetes 1.7 on a CentOS 7.1 cluster
Starting with the Kubernetes 1.7 alpha release was quite a daunting task at the beginning, because I knew that I was bound to face difficulties. I built Kubernetes from source on my Ubuntu 16.04 machine.
I downloaded the Kubernetes source (https://github.com/kubernetes/kubernetes/tree/v1.7.0-alpha.3) and CentOS 7.1 (minimal version). I set up three virtual machines to deploy my CentOS cluster. This blog post focuses on building and deploying Kubernetes.
First, I navigated to the Kubernetes source folder:
cd kubernetes/src/k8s.io/kubernetes/
Then I built the Kubernetes images with:
build/run.sh
In order to build the release binaries, execute:
build/release.sh
When I executed this, I ran out of space in my root partition. This was because Docker creates a large number of layers under /var/lib/docker, eating up disk space; it can run up to a dazzling 30 GB. So I moved the Kubernetes source to another partition and created a symlink from /var/lib/docker to /media/malith/Data/docker/, which was on the Data partition.
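A rough sketch of that workaround (the Data-partition path is specific to my machine; the Docker daemon should be stopped first):
sudo systemctl stop docker
sudo mv /var/lib/docker /media/malith/Data/docker
sudo ln -s /media/malith/Data/docker /var/lib/docker
sudo systemctl start docker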
Next, we need to download etcd, flannel, and docker. However, we do not need to download Kubernetes, as we have built it ourselves; downloading it would most probably cause a version incompatibility.
Therefore, comment out the section below.
#echo "Download kubernetes release v${K8S_VERSION} ..." #curl -L ${K8S_CLIENT_DOWNLOAD_URL} -o ${RELEASES_DIR}/kubernetes-client-linux-amd64.tar.gz #curl -L ${K8S_SERVER_DOWNLOAD_URL} -o ${RELEASES_DIR}/kubernetes-server-linux-amd64.tar.gz
In the function download-release(), also comment out:
#rm -rf ${RELEASES_DIR}
This is because we are going to move the binaries we built into the RELEASES_DIR directory ourselves, which defaults to /tmp/downloads.
Next, we have to manually copy kubernetes-client-linux-amd64.tar.gz and kubernetes-server-linux-amd64.tar.gz into the RELEASES_DIR folder. You can find them by navigating to the folder below (an example copy command follows):
cd kubernetes/src/k8s.io/kubernetes/_output/release-tars
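From there, the two tarballs can be copied into the default RELEASES_DIR, for example:
mkdir -p /tmp/downloads
cp kubernetes-client-linux-amd64.tar.gz kubernetes-server-linux-amd64.tar.gz /tmp/downloads/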
Okay, we are good to go. Now we can start a cluster, provided you have a minimum of three CentOS machines on a private network with an active internet connection.
You need to change the following values in cluster/centos/config-default.sh. My master IP address is 192.168.57.122 and the minions are 192.168.57.123 and 192.168.57.124, respectively.
export MASTER="root@192.168.57.122"
export NODES="${NODES:-"root@192.168.57.123 root@192.168.57.124"}"
Do not change the DNS Server IP as it's configured by Kubernetes itself.
To start Kubernetes, navigate to the cluster folder and run:
KUBERNETES_PROVIDER=centos CERT_GROUP=malith ./kube-up.sh
However, I kept running into an error due to a fault in Kubernetes: it could not read the value of the MASTER_ADVERTISE_ADDRESS variable.
Therefore I had to edit cluster/centos/util.sh. In the function make-ca-cert, I added the line:
MASTER_ADVERTISE_IP=192.168.57.122
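Roughly, the edit looks like the sketch below; the rest of the make-ca-cert body in cluster/centos/util.sh stays as it is and is not reproduced here.
function make-ca-cert() {
  # Hard-code the master IP as a workaround for the unreadable MASTER_ADVERTISE_ADDRESS
  MASTER_ADVERTISE_IP=192.168.57.122
  # ... original function body continues unchanged ...
}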
After that I executed the above command again, and the cluster started successfully.
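Once the cluster is up, a quick sanity check (assuming kubectl is configured to talk to the master) is:
kubectl get nodes       # both minions should eventually show up as Ready
kubectl cluster-info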
Sunday, August 21, 2016
Changepoint Detection : Theoretical Background
Introduction
Changepoints are abrupt variations in the generative parameters of sequential data. Changepoint detection is the process of identifying such abrupt variations and it has been useful in a wide range of applications such as EEG analysis, DNA segmentation, econometrics etc.
Bayesian ways of change point detection focus on segmentation and techniques to generate samples from the posterior distribution over changepoint locations.
Let's try to understand this without a lot of complex mathematical notation. When we are given a data set that varies over time (e.g., a time series), there may be sudden changes. Because of these changes, the data set can be partitioned and each partition analysed separately. If we assume that within a partition there is a relationship between the data points, we can perform a regression to estimate the regression coefficients, provided that we have an idea of where the changepoints are.
However, this does not capture the uncertainty associated with the changepoints themselves. This is where changepoint distributions come into play: following the Bayesian changepoint model and using a constrained HMM (Hidden Markov Model), we can calculate the posterior distribution over changepoint locations.
The posterior distribution can be calculated with a modified forward-backward algorithm, which can be further extended to compute the most probable changepoint configuration.
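In standard HMM notation (my own restatement, not the paper's exact notation), the forward-backward pass gives, for each position $t$ and segment index $k$,
$$P(S_t = k \mid y_{1:n}) = \frac{\alpha_t(k)\,\beta_t(k)}{\sum_{k'} \alpha_t(k')\,\beta_t(k')}, \qquad \alpha_t(k) = P(y_{1:t}, S_t = k), \quad \beta_t(k) = P(y_{t+1:n} \mid S_t = k),$$
and the posterior probability of a changepoint at position $t$ follows from the probability that the segment index increases between $t$ and $t+1$.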
The reference below provides more insight into the postCP package.
[1] Luong, T.M., Rozenholc, Y. and Nuel, G., 2013. Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model. Computational Statistics & Data Analysis, 68, pp.129-140.
My Experience with GSOC and R
It all began when I started searching for a Google Summer of Code project last year (November 2015). While searching the web, I found this page, which suggested a project idea. I didn't have a complete understanding of the problem, but I contacted the mentors and familiarized myself with R and the theoretical background of the package. In March, I submitted a project proposal and was finally selected for GSoC. (Of course, I had to submit a test to the mentors first.)
The actual challenge began only then. My mentor, Professor Gregory Nuel, sent me a paper he had presented, which included a detailed explanation of the changepoint model and some code examples. I must say that it took a while to understand the document completely, given the different notations used. I kept asking my mentors for further details whenever I had a question. A significant proportion of the project's complexity came from the theoretical background it required.
I started with simple steps. The model-specific implementation required choosing a few models and finalizing the calculations for the log evidence, which I was able to do with the help of my mentors. The core implementation of the forward-backward algorithm was done in C++. I had several issues interfacing it with the R code using Rcpp, which I was able to rectify by adding useDynLib() to the NAMESPACE.
useDynLib(postCP)
After the core forward-backward algorithm was interfaced, I could proceed with the core model part. Following the changepoint model, I wrote code to find the initial regression coefficients using standard R functions. The result was then used to calculate the model-specific log evidences, which were fed into the forward-backward algorithm to compute posterior probability distributions using the HMM. Finally, the posterior changepoint probability distribution could be obtained.
Using the changepoint probability distribution, the parameters and the log evidence were updated, finishing off the core implementation of the package.
However, around that time I realized that I hadn't been following a proper coding style guideline, so I referred to Hadley Wickham's R style guide and "R style. An Rchaeological Commentary". All of this would be an utter waste if users had no way of learning how to use the functions, so I spent time improving the function documentation as well.
To provide insight into the theoretical background behind the package, I added vignettes, the long-form documentation of R packages.
Throughout the project I learned new things, and I always tried to develop the postCP package in the most R-compliant way. All the requirements have been met and the project has been successfully completed.
Here is the source folder which contains the improvements I made to the package.
https://drive.google.com/drive/folders/0B8xO3Cc0h6rIbXNDTUM1TDlMbDQ?usp=sharing
Here are the commits done throughout the project.
https://github.com/malithj/postCP_Improvement/commits/feature-glmsyntax?author=malithj
https://github.com/malithj/postCP_Improvement/commits/feature-model?author=malithj
The final package can be found in the link below.
https://github.com/malithj/postCP_Improvement
Friday, August 19, 2016
postCP change point detection with GSOC
Introduction
The project aimed at improving the postCP package and making it available on CRAN again. The snag that prevented the package from being updated was the recent requirement that .C() calls in R code use DUP=TRUE arguments, with .Call() suggested instead of .C(). The postCP implementation had to be done in the most R-compliant way, separating the model-specific implementation from the core implementation; the core part is implemented in C++ for speed of calculation. The project page can be found here.
Implementation
To improve usability and user-friendliness, a glm-syntax-based "formula" and "family" specification was added; these commits are in the feature-glmsyntax branch. The model-specific implementation was also extended to four models (Gaussian, Poisson, Binomial, and Gamma), whereas the previous package included only three (Gaussian, Poisson, and Binomial). These commits are found here.
The model-specific part was integrated with the core part (the C++ forward-backward algorithm) to produce the required results. Following the changepoint model, a separate section was added for parameter calculations and parameter updates based on the updated log evidences. Standard error checks were included to handle incorrect user input as far as possible and to give meaningful error messages. Vignettes have been included to provide long-form documentation. The commits are found here.
The development has been completed as agreed in the project proposal. All the branches have been merged into the master branch, which now reflects the latest package.
The completed source code of the package has been included in the following public Google folder for viewing.
Run time analysis
The following table presents a run-time analysis of the postCP package, done on a 2.7 GHz machine with 4 GB RAM running Ubuntu 14.04. The results suggest linear time complexity, O(n).
CPU time (in seconds)
n (iterations)  | Mean Model | Slope Model
1000            | 0.104      | 0.056
10000           | 0.536      | 0.628
100000          | 2.636      | 2.3
1000000         | 16.076     | 13.776
The mean model and the slope model are given below.
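As a rough restatement (my own sketch, assuming Gaussian noise within each segment), the two models are of the form
$$\text{mean model: } y_i = \mu_k + \varepsilon_i, \qquad \text{slope model: } y_i = a_k + b_k x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),$$
for observations $i$ falling in segment $k$.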
Building the Package
1. Clone my repository https://github.com/malithj/postCP_Improvement
2. Navigate into the postCP folder
3. Build using R / RStudio. ( If it's RStudio use Ctrl + Shift + B )
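For example, from a terminal (a sketch assuming R is installed and that the package sources live in a postCP sub-folder of the repository, per step 2):
git clone https://github.com/malithj/postCP_Improvement
cd postCP_Improvement/postCP
R CMD build .                   # creates a postCP_<version>.tar.gz source package
R CMD INSTALL postCP_*.tar.gz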
Coding Guidelines
I referred to Hadley Wickham's R style guide and "R style. An Rchaeological Commentary" for appropriate coding style guidelines, because I had seen issues in existing R packages caused by not following best practices.
Final remarks
I would like to thank my mentors Gregory Nuel and Guillem Rigaill for all the support and guidance, and for bearing with me for the duration of the project. :D I would also like to thank Minh Luong (the original creator of the package) for the additional documentation provided while I was trying to understand the previous implementation.
Finally, thank you to GSoC for the great learning opportunity! :D
The official GSoC Project page can be found here.
Linked Blog posts
The following blog post summarizes my GSoC experience and the challenges that I had to face.
http://malithjayaweera.blogspot.com/2016/08/my-experience-with-gsoc-and-r.html
Wednesday, August 10, 2016
Conquering Raspberry Pi with Ubuntu 14.04 : Resolving issues with Partitioning
Recently, for a project, I used a Raspberry Pi. Although Raspbian is usually the preferred operating system, I chose Ubuntu because it also gives access to the Ubuntu repositories. There is a complete guide for the installation here. However, after some time I found that the Raspberry Pi was utilizing only 2 GB of its 8 GB SD card. This was quite a surprise because I had formatted the SD card before installing Ubuntu on it.
I searched for a good resource to resolve this and came across this wiki on elinux.org (http://elinux.org/RPi_Resize_Flash_Partitions). However, their raspi-config instructions were not an option for me because I was running Ubuntu 14.04.
The other option, which actually worked for me, was to manually resize the SD card on Linux.
Before you proceed with anything, I highly suggest that you make a backup copy of your SD card using HDD Raw Copy Tool.
First, to show partition information, I used the following command:
$ df -h
Look for the partition that is roughly the 2 GB size of the distribution image, then unmount it:
$ umount /dev/mmcblk0p1
Then use parted to examine the card
$ sudo parted /dev/sdd
(parted) unit chs
(parted) print
Disk /dev/sdd: 121535,3,31
Sector size (logical/physical): 512B/512B
BIOS cylinder,head,sector geometry: 121536,4,32.  Each cylinder is 65.5kB.
Partition Table: msdos
Number  Start      End         Type     File system      Flags
 1      16,0,0     1215,3,31   primary  fat32            lba
 2      1232,0,0   26671,3,31  primary  ext4
 3      26688,0,0  29743,3,31  primary  linux-swap(v1)
Notice that nothing uses the card from the end of cylinder 29743 up to 121535 (the maximum). In my case, however, there was no third (swap) partition; there were only two partitions, the boot partition and the root partition. I left the boot partition alone, since I wanted the root partition to fill the SD card. Nevertheless, if there were a swap partition, it would have to be moved to the end of the card. (You'll have to adjust the numbers so that the end of partition 3 is at the last cylinder/head/sector of the card.)
Here's the calculation.
(Maximum - (Partition 3 End - Partition 3 Start)) - 1 = Partition 3 New Start
So, in this example:
(121535 - ( 29743 - 26688)) -1 = 118479
Then move the partition. (Note that this wouldn't work on parted versions later than 2.4.)
(parted) move 3 118479,0,0
However if there was no swap like in my case, life becomes much simpler.
(parted) rm 2
(parted) mkpart primary 1232,0,0 121534,0,0
(parted) quit
Note that in my case the head and sector numbers did not exactly coincide (e.g., 3,31 at the end). Therefore, as a precaution, I counted one cylinder less than the total available cylinders, i.e. 121534,0,0.
The rest of the steps were easy.
Clean and resize the root partition.
$ sudo e2fsck -f /dev/mmcblk0p1
Allow it to add lost+found if prompted, then resize the filesystem:
$ sudo resize2fs /dev/mmcblk0p1
After these steps, the root partition occupied the full capacity of the SD card.
Wednesday, December 24, 2014
Gauge Meters; A daunting task
Since I'm on vacation, I thought of using the free time to plan my project. As the next step, I decided to develop the front end of the application. The user should be able to view the real-time temperature and the power consumption. To present this information, I thought of using simple gauge meters.
Although the idea may seem simple, I had to spend a lot of time searching for suitable Java libraries. After many hours I finally found "SteelSeries". Although the name might suggest otherwise, it's an open-source library created for the purpose of using gauge meters effectively. I managed to learn the basics of the library from these blog posts. Using Java Swing, my next attempt was to create a simple program that could display a value provided by the user. During these attempts, I found out that the JAR files I had obtained were corrupted. After searching through many resources, I found a site which provides all the libraries.
My next step was to expand the program so that it could read a value from a text file and display the temperature and the power consumption. In this implementation, I used a background thread called reader. Its task was to read the value from the text file and update the user interface as well as the internal calculator, which then computed the power consumption. After several attempts, I had a program that worked.
As the final step, since users are more interested in the time the last update was done, I included an update so that the date and time stamp would be included in a JLabel. The following video demonstrates my progress.