Sunday, August 21, 2016
My Experience with GSOC and R
It all began when I started searching for a Google Summer of Code project last year (November 2015). While searching the web, I found this page, which suggested a project idea. I didn't fully understand the problem, but I contacted the mentors and familiarized myself with R and the theoretical background of the package. In March, I submitted a project proposal and was finally selected for GSoC. (Of course, I had to submit a test to the mentors first.)
The real challenge began only then. My mentor, Professor Gregory Nuel, sent me a paper he had presented, which included a detailed explanation of the change point model and some code examples. I must say that it took a while to understand the document completely, given the different notations used. I kept bothering my mentors for further details whenever I had a question. A significant share of the project's complexity came from the theoretical background it required.
I started with simple steps. The model-specific implementation required choosing a few models and finalizing the calculations for the log evidence, which I was able to do with the help of my mentors. The core implementation of the forward-backward algorithm was done in C++. I had several issues interfacing it with the R code using Rcpp, and I was able to rectify them by adding useDynLib(postCP) to the NAMESPACE file.
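For context, a NAMESPACE file for a package with compiled code might contain something like the sketch below. The useDynLib(postCP) directive is the one from this project; the importFrom line is my assumption about what an Rcpp-based package typically needs, not a copy of the real file.

```r
# NAMESPACE (sketch)
useDynLib(postCP)             # load the shared library built from the C++ sources
importFrom(Rcpp, sourceCpp)   # assumption: makes sure Rcpp is loaded with the package
```

Without the useDynLib directive, R does not load the compiled library when the package is attached, so calls into the C++ routines fail at run time.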
After the core forward-backward algorithm was interfaced, I could proceed with the core model part. Following the change point model, I wrote code to find the initial regression coefficients using standard R functions. The result was then used to calculate model-specific log evidences, which were passed to the forward-backward algorithm to compute the posterior probability distributions using a hidden Markov model (HMM). Finally, the posterior change point probability distribution could be obtained.
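To make that pipeline a bit more concrete, here is a minimal sketch in plain R. It is not the postCP code (the real forward-backward step lives in C++), and it rests on several assumptions of mine: a Gaussian model with one mean per segment, a known starting segmentation, a uniform prior over segmentations, and hypothetical function names such as posterior_change_points.

```r
# Numerically stable log(sum(exp(x))); returns -Inf if every term is -Inf.
log_sum_exp <- function(x) {
  m <- max(x)
  if (!is.finite(m)) return(m)
  m + log(sum(exp(x - m)))
}

# Forward-backward pass for a left-to-right change point HMM.
# le[i, k] is the log evidence of observation i under segment k.
# Returns an (n - 1) x (K - 1) matrix whose entry [i, k] is the posterior
# probability that change point k falls between positions i and i + 1.
posterior_change_points <- function(le) {
  n <- nrow(le)
  K <- ncol(le)
  stopifnot(K >= 2, n >= K)
  lf <- matrix(-Inf, n, K)  # forward quantities (log scale)
  lb <- matrix(-Inf, n, K)  # backward quantities (log scale)
  lf[1, 1] <- le[1, 1]      # the sequence must start in segment 1
  for (i in 2:n) {
    for (k in seq_len(K)) {
      stay <- lf[i - 1, k]
      move <- if (k > 1) lf[i - 1, k - 1] else -Inf
      lf[i, k] <- le[i, k] + log_sum_exp(c(stay, move))
    }
  }
  lb[n, K] <- 0             # the sequence must end in segment K
  for (i in (n - 1):1) {
    for (k in seq_len(K)) {
      stay <- le[i + 1, k] + lb[i + 1, k]
      move <- if (k < K) le[i + 1, k + 1] + lb[i + 1, k + 1] else -Inf
      lb[i, k] <- log_sum_exp(c(stay, move))
    }
  }
  log_z <- lf[n, K]         # log evidence summed over all segmentations
  cp <- matrix(0, n - 1, K - 1)
  for (i in seq_len(n - 1)) {
    for (k in seq_len(K - 1)) {
      cp[i, k] <- exp(lf[i, k] + le[i + 1, k + 1] + lb[i + 1, k + 1] - log_z)
    }
  }
  cp
}

# Toy data: the mean jumps from 0 to 5 after position 10.
x <- c(rep(0, 10), rep(5, 10)) + 0.1 * sin(1:20)
seg <- rep(1:2, each = 10)            # assumed starting segmentation
fit <- lm(x ~ 0 + factor(seg))        # initial coefficients = segment means
mu <- unname(coef(fit))
s <- summary(fit)$sigma               # residual standard deviation
le <- sapply(mu, function(m) dnorm(x, mean = m, sd = s, log = TRUE))
cp <- posterior_change_points(le)
```

Each column of cp is a probability distribution over the position of one change point, so it sums to 1, and which.max(cp[, 1]) gives the most likely location of the first change.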
Using the change point probability distribution, the parameters and the log evidence were updated, which finished off the core implementation of the package.
However, at that point I found out that I hadn't followed a proper coding style guideline. Therefore, I referred to "Hadley Wickham R style" and "R style. An Rchaeological Commentary". Well, all of this would be an utter waste if users had no source from which to learn how to use the functions, so I spent time on improving the function documentation as well.
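As an illustration of the kind of function documentation I mean, here is a small sketch that assumes roxygen2-style comments. The function segment_means is a hypothetical helper made up for this example, not a function from postCP.

```r
#' Mean of each segment defined by a set of change points
#'
#' @param x Numeric vector of observations.
#' @param change_points Sorted integer vector; a new segment starts right
#'   after each of these positions.
#' @return Numeric vector with one mean per segment.
#' @examples
#' segment_means(c(1, 1, 5, 5), change_points = 2)
#' @export
segment_means <- function(x, change_points) {
  # Label every index with its segment number (1, 2, ...).
  seg <- findInterval(seq_along(x), change_points + 1) + 1
  as.vector(tapply(x, seg, mean))
}
```

With comments in this form, the help page and the export entry can be generated from the source file, so the documentation stays next to the code it describes.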
To give some insight into the theoretical background behind the package, I added vignettes, which are the long-form documentation of R packages.
Throughout the project I learned new things, and I always tried to develop the postCP package in the most R-compliant way. All the requirements have been met and the project has been completed successfully.
Here is the source folder which contains the improvements I made to the package.
https://drive.google.com/drive/folders/0B8xO3Cc0h6rIbXNDTUM1TDlMbDQ?usp=sharing
Here are the commits made throughout the project.
https://github.com/malithj/postCP_Improvement/commits/feature-glmsyntax?author=malithj
https://github.com/malithj/postCP_Improvement/commits/feature-model?author=malithj
The final package can be found at the link below.
https://github.com/malithj/postCP_Improvement
© Malith's Perspective 2016