Pinboard (rahuldave)

Pinboard (rahuldave) https://pinboard.in/u:rahuldave/public/ recent bookmarks from rahuldave Advanced Data Analysis from an Elementary Point of View 2012-05-03T20:41:03+00:00 http://www.cscs.umich.edu/~crshalizi/weblog/cat_36-402.html rahuldavedata statistics https://pinboard.in/ https://pinboard.in/u:rahuldave/b:b69e5412b4c2/ cumplyr: Extending the plyr Package to Handle Cross-Dependencies 2012-05-03T14:44:49+00:00 http://www.johnmyleswhite.com/notebook/2012/05/03/cumplyr-extending-the-plyr-package-to-handle-cross-dependencies/ rahuldave= Value 1 AND Variable 2 >= Value 2, etc. This allows us to implement the backward-moving mean described earlier. Using Norm Balls Finally, we can consider a combination of upper and lower bounds. For simplicity, we'll assume that these bounds have a fixed tightness around the "center" of each subset of our split data. To articulate this tightness formally, we look at a specific hypothetical equality constraint like Variable 1 = Value 1 and then loosen it so that norm(Variable 1 - Value 1) <= r. When r = 0, this system gives the original equality constraint. But when r > 0, we produce a "ball" of data around the constraint whose tightness is r. This lets us estimate the local means from our third example. Implementation To demo these ideas in a usable fashion, I've created a draft package for R called cumplyr. 
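To make the kinds of constraint described above concrete, here is a minimal base-R sketch on a toy data frame. It does not use cumplyr: `constrained_mean` is a hypothetical helper introduced only for illustration, and the Time/Value data are an assumed toy example.

```r
# Toy data (an assumption for illustration): Time = 1..5, Value = 1, 3, 5, 7, 9.
data <- data.frame(Time = 1:5, Value = seq(1, 9, by = 2))

# Hypothetical helper, not part of plyr or cumplyr. For every Time value t
# it keeps the rows for which keep(Time, t) is TRUE and averages their Value.
constrained_mean <- function(df, keep) {
  sapply(df$Time, function(t) mean(df$Value[keep(df$Time, t)]))
}

# Equality constraint: Time == t (an ordinary split by Time).
equality_means <- constrained_mean(data, function(time, t) time == t)

# One-sided bound: Time <= t (a running mean over everything seen so far).
running_means <- constrained_mean(data, function(time, t) time <= t)

# Norm ball of radius r = 1: |Time - t| <= 1 (a local mean around t).
ball_means <- constrained_mean(data, function(time, t) abs(time - t) <= 1)
```

With this toy data the radius-1 ball around Time = 3 covers Times 2 to 4, so its local mean is (3 + 5 + 7) / 3 = 5; shrinking the radius to 0 recovers the equality constraint. The helper is illustrative only; the post's own implementation of these ideas is cumplyr.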
Here is an extended example of its usage in solving simple variants of the problems described in this post:

    library('cumplyr')

    data <- data.frame(Time = 1:5, Value = seq(1, 9, by = 2))

    iddply(data,
           equality.variables = c('Time'),
           lower.bound.variables = c(),
           upper.bound.variables = c(),
           norm.ball.variables = list(),
           func = function (df) {with(df, mean(Value))})

    iddply(data,
           equality.variables = c(),
           lower.bound.variables = c('Time'),
           upper.bound.variables = c(),
           norm.ball.variables = list(),
           func = function (df) {with(df, mean(Value))})

    iddply(data,
           equality.variables = c(),
           lower.bound.variables = c(),
           upper.bound.variables = c('Time'),
           norm.ball.variables = list(),
           func = function (df) {with(df, mean(Value))})

    iddply(data,
           equality.variables = c(),
           lower.bound.variables = c(),
           upper.bound.variables = c(),
           norm.ball.variables = list('Time' = 1),
           func = function (df) {with(df, mean(Value))})

    iddply(data,
           equality.variables = c(),
           lower.bound.variables = c(),
           upper.bound.variables = c(),
           norm.ball.variables = list('Time' = 2),
           func = function (df) {with(df, mean(Value))})

    iddply(data,
           equality.variables = c(),
           lower.bound.variables = c(),
           upper.bound.variables = c(),
           norm.ball.variables = list('Time' = 5),
           func = function (df) {with(df, mean(Value))})

You can download this package from GitHub and play with it to see whether it helps you. Please submit feedback using GitHub if you have any comments, complaints or patches.

Comparing plyr with cumplyr

In the long run, I'm hoping to make the functions in cumplyr robust enough to submit a patch to plyr. I see these tools as one logical extension of plyr to encompass more of the framework described in Hadley's paper on the Split-Apply-Combine strategy. For the time being, I would advise any users of cumplyr to make sure that you do not use cumplyr for anything that plyr could already do.
cumplyr is very much demo software and I am certain that both its API and implementation will change. In contrast, plyr is fast and stable software that can be trusted to perform its job. But, if you have a problem that cumplyr will solve and plyr will not, I hope you'll try cumplyr out and submit patches when it breaks. Happy hacking!

Programming Statistics https://pinboard.in/u:rahuldave/b:a5482cf69a97/

Measuring time series characteristics 2012-05-02T07:51:05+00:00 http://www.r-bloggers.com/measuring-time-series-characteristics/ rahuldave

    10) # Arbitrary threshold chosen by trial and error.
      {
        period <- round(1/spec$freq[which.max(spec$spec)])
        if(period==Inf) # Find next local maximum
        {
          j <- which(diff(spec$spec)>0)
          if(length(j)>0)
          {
            nextmax <- j[1] + which.max(spec$spec[j[1]:500])
            if(nextmax <= length(spec$freq))
              period <- round(1/spec$freq[nextmax])
            else
              period <- 1
          }
          else
            period <- 1
        }
      }
      else
        period <- 1
      return(period)
    }

The function is called find.freq because time series people often call the period of seasonality the “frequency” (which is of course highly confusing).

Decomposing the data into trend and seasonal components

We needed a measure of the strength of trend and the strength of seasonality, and to do this we decomposed the data into trend, seasonal and error terms. Because not all data could be decomposed additively, we first needed to apply an automated Box-Cox transformation. We tried a range of Box-Cox parameters on a grid, and selected the one which gave the most normal errors. That worked ok, but I’ve since found some papers that provide quite good automated Box-Cox algorithms that I’ve implemented in the forecast package. So this code uses Guerrero’s (1993) method instead. For seasonal time series, we decomposed the transformed data using an stl decomposition with periodic seasonality. For non-seasonal time series, we estimated the trend of the transformed data using penalized regression splines via the mgcv package.
    decomp <- function(x,transform=TRUE)
    {
      require(forecast)
      # Transform series
      if(transform && min(x,na.rm=TRUE) >= 0)
      {
        lambda <- BoxCox.lambda(na.contiguous(x))
        x <- BoxCox(x,lambda)
      }
      else
      {
        lambda <- NULL
        transform <- FALSE
      }
      # Seasonal data
      if(frequency(x)>1)
      {
        x.stl <- stl(x,s.window="periodic",na.action=na.contiguous)
        trend <- x.stl$time.series[,2]
        season <- x.stl$time.series[,1]
        remainder <- x - trend - season
      }
      else #Nonseasonal data
      {
        require(mgcv)
        tt <- 1:length(x)
        trend <- rep(NA,length(x))
        trend[!is.na(x)] <- fitted(gam(x ~ s(tt)))
        season <- NULL
        remainder <- x - trend
      }
      return(list(x=x,trend=trend,season=season,remainder=remainder,
        transform=transform,lambda=lambda))
    }

Putting everything on a [0,1] scale

We wanted to measure a range of characteristics such as strength of seasonality, strength of trend, level of nonlinearity, skewness, kurtosis, serial correlatedness, self-similarity, level of chaoticity (is that a word?) and the periodicity of the data. But we wanted all these on the same scale which meant mapping the natural range of each measure onto [0,1]. The following two functions were used to do this.

    # f1 maps [0,infinity) to [0,1]
    f1 <- function(x,a,b)
    {
      eax <- exp(a*x)
      if (eax == Inf)
        f1eax <- 1
      else
        f1eax <- (eax-1)/(eax+b)
      return(f1eax)
    }

    # f2 maps [0,1] onto [0,1]
    f2 <- function(x,a,b)
    {
      eax <- exp(a*x)
      ea <- exp(a)
      return((eax-1)/(eax+b)*(ea+b)/(ea-1))
    }

The values of a and b in each function were chosen so the measure had a 90th percentile of 0.10 when the data were iid standard normal, and a value of 0.9 using a well-known benchmark time series.

Calculating the measures

Now we are ready to calculate the measures on the original data, as well as on the adjusted data (after removing trend and seasonality).
    measures <- function(x)
    {
      require(forecast)

      N <- length(x)
      freq <- find.freq(x)
      fx <- c(frequency=(exp((freq-1)/50)-1)/(1+exp((freq-1)/50)))
      x <- ts(x,f=freq)

      # Decomposition
      decomp.x <- decomp(x)

      # Adjust data
      if(freq > 1)
        fits <- decomp.x$trend + decomp.x$season
      else # Nonseasonal data
        fits <- decomp.x$trend
      adj.x <- decomp.x$x - fits + mean(decomp.x$trend, na.rm=TRUE)

      # Backtransformation of adjusted data
      if(decomp.x$transform)
        tadj.x <- InvBoxCox(adj.x,decomp.x$lambda)
      else
        tadj.x <- adj.x

      # Trend and seasonal measures
      v.adj <- var(adj.x, na.rm=TRUE)
      if(freq > 1)
      {
        detrend <- decomp.x$x - decomp.x$trend
        deseason <- decomp.x$x - decomp.x$season
        trend <- ifelse(var(deseason,na.rm=TRUE) < 1e-10, 0,
          max(0,min(1,1-v.adj/var(deseason,na.rm=TRUE))))
        season <- ifelse(var(detrend,na.rm=TRUE) < 1e-10, 0,
          max(0,min(1,1-v.adj/var(detrend,na.rm=TRUE))))
      }
      else #Nonseasonal data
      {
        trend <- ifelse(var(decomp.x$x,na.rm=TRUE) < 1e-10, 0,
          max(0,min(1,1-v.adj/var(decomp.x$x,na.rm=TRUE))))
        season <- 0
      }

      m <- c(fx,trend,season)

      # Measures on original data
      xbar <- mean(x,na.rm=TRUE)
      s <- sd(x,na.rm=TRUE)

      # Serial correlation
      Q <- Box.test(x,lag=10)$statistic/(N*10)
      fQ <- f2(Q,7.53,0.103)

      # Nonlinearity
      p <- terasvirta.test(na.contiguous(x))$statistic
      fp <- f1(p,0.069,2.304)

      # Skewness
      s <- abs(mean((x-xbar)^3,na.rm=TRUE)/s^3)
      fs <- f1(s,1.510,5.993)

      # Kurtosis
      k <- mean((x-xbar)^4,na.rm=TRUE)/s^4
      fk <- f1(k,2.273,11567)

      # Hurst=d+0.5 where d is fractional difference.
      H <- fracdiff(na.contiguous(x),0,0)$d + 0.5

      # Lyapunov Exponent
      if(freq > N-10)
        stop("Insufficient data")
      Ly <- numeric(N-freq)
      for(i in 1:(N-freq))
      {
        idx <- order(abs(x[i] - x))
        idx <- idx[idx < (N-freq)]
        j <- idx[2]
        Ly[i] <- log(abs((x[i+freq] - x[j+freq])/(x[i]-x[j])))/freq
        if(is.na(Ly[i]) | Ly[i]==Inf | Ly[i]==-Inf)
          Ly[i] <- NA
      }
      Lyap <- mean(Ly,na.rm=TRUE)
      fLyap <- exp(Lyap)/(1+exp(Lyap))

      m <- c(m,fQ,fp,fs,fk,H,fLyap)

      # Measures on adjusted data
      xbar <- mean(tadj.x, na.rm=TRUE)
      s <- sd(tadj.x, na.rm=TRUE)

      # Serial
      Q <- Box.test(adj.x,lag=10)$statistic/(N*10)
      fQ <- f2(Q,7.53,0.103)

      # Nonlinearity
      p <- terasvirta.test(na.contiguous(adj.x))$statistic
      fp <- f1(p,0.069,2.304)

      # Skewness
      s <- abs(mean((tadj.x-xbar)^3,na.rm=TRUE)/s^3)
      fs <- f1(s,1.510,5.993)

      # Kurtosis
      k <- mean((tadj.x-xbar)^4,na.rm=TRUE)/s^4
      fk <- f1(k,2.273,11567)

      m <- c(m,fQ,fp,fs,fk)
      names(m) <- c("frequency", "trend","seasonal",
        "autocorrelation","non-linear","skewness","kurtosis",
        "Hurst","Lyapunov",
        "dc autocorrelation","dc non-linear","dc skewness","dc kurtosis")

      return(m)
    }

Here is a quick example applied to Australian monthly gas production:

    library(forecast)
    measures(gas)
             frequency              trend           seasonal    autocorrelation
                0.1096             0.9989             0.9337             0.9985
            non-linear           skewness           kurtosis              Hurst
                0.4947             0.1282             1.0000             0.9996
              Lyapunov dc autocorrelation      dc non-linear        dc skewness
                0.5662             0.1140             0.0538             0.1743
           dc kurtosis
                1.0000

The function is far from perfect, and it is not hard to find examples where it fails. For example, it doesn’t work with multiple seasonality: try measures(taylor) and check the seasonality. Also, I’m not convinced the kurtosis provides anything useful here, or that the skewness measure is done in the best way possible. But it was really a proof of concept, so we will leave it to others to revise and improve the code.

In our papers, we took the measures obtained using R, and produced self-organizing maps using Viscovery.
There is now a som package in R for that, so it might be possible to integrate that step into R as well. The dendogram was generated in matlab, although that could now also be done in R using the ggdendro package for example. Download the code in a single file. To leave a comment for the author, please follow the link and comment on his blog: Research tips » R. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more... ]]> R_bloggers forecasting R Research_tips statistics https://pinboard.in/u:rahuldave/b:c791c19684a6/ Big data is easy 2012-05-01T14:16:12+00:00 http://www.johndcook.com/blog/2012/05/01/big-data-is-easy/ rahuldave Statistics Probability_and_Statistics https://pinboard.in/u:rahuldave/b:ebac9e0bdf19/ Cookbook for R » Cookbook for R 2012-04-30T13:48:23+00:00 http://wiki.stdout.org/rcookbook/ rahuldavestatistics https://pinboard.in/ https://pinboard.in/u:rahuldave/b:c57520f4b4c4/ Comparing Julia and R’s Vocabularies 2012-04-09T14:00:19+00:00 http://www.r-bloggers.com/comparing-julia-and-r%e2%80%99s-vocabularies/ rahuldave > Basics Comparison >= >= Basics Comparison < < Basics Comparison <= <= Basics Comparison is.na Basics Comparison is.nan Basics Comparison is.finite Basics Comparison complete.cases Basics Comparison * * Basics Basic Math + + Basics Basic Math - - Basics Basic Math / / Basics Basic Math ^ ^ Basics Basic Math %% mod (%%) Basics Basic Math %/% div Basics Basic Math abs abs Basics Basic Math sign sign Basics Basic Math acos acos Basics Basic Math acosh acosh Basics Basic Math asin asin Basics Basic Math asinh asinh Basics Basic Math atan atan Basics Basic Math atan2 atan2 Basics Basic Math atanh atanh Basics Basic Math sin sin Basics Basic Math sinh sinh Basics Basic Math cos cos Basics Basic Math cosh cosh Basics 
Basic Math tan tan Basics Basic Math tanh tanh Basics Basic Math ceiling ceil Basics Basic Math floor floor Basics Basic Math round round Basics Basic Math trunc trunc Basics Basic Math signif Basics Basic Math exp exp Basics Basic Math log log Basics Basic Math log10 log10 Basics Basic Math log1p log1p Basics Basic Math log2 log2 Basics Basic Math logb Basics Basic Math sqrt sqrt Basics Basic Math cummax Basics Basic Math cummin Basics Basic Math cumprod cumprod Basics Basic Math cumsum cumsum Basics Basic Math diff diff Basics Basic Math max max Basics Basic Math min min Basics Basic Math prod prod Basics Basic Math sum sum Basics Basic Math range Basics Basic Math mean mean Basics Basic Math median median Basics Basic Math cor cor_pearson Basics Basic Math cov cov_pearson Basics Basic Math sd std Basics Basic Math var var Basics Basic Math pmax Basics Basic Math pmin Basics Basic Math rle Basics Basic Math function function Basics Functions missing Basics Functions on.exit Basics Functions return return Basics Functions invisible Basics Functions & & Basics Logical & Set Operations | | Basics Logical & Set Operations ! ! 
Basics Logical & Set Operations xor Basics Logical & Set Operations all all Basics Logical & Set Operations any any Basics Logical & Set Operations intersect intersect Basics Logical & Set Operations union union Basics Logical & Set Operations setdiff Basics Logical & Set Operations setequal Basics Logical & Set Operations which find Basics Logical & Set Operations c [] ({}) Basics Vectors and Matrices matrix [] ({}) Basics Vectors and Matrices length size (length) Basics Vectors and Matrices dim size Basics Vectors and Matrices ncol size(x, 1) Basics Vectors and Matrices nrow size(x, 2) Basics Vectors and Matrices cbind hcat Basics Vectors and Matrices rbind vcat Basics Vectors and Matrices names Basics Vectors and Matrices colnames Basics Vectors and Matrices rownames Basics Vectors and Matrices t ‘ Basics Vectors and Matrices diag eye Basics Vectors and Matrices sweep Basics Vectors and Matrices as.matrix Basics Vectors and Matrices data.matrix Basics Vectors and Matrices c [] ({}) Basics Making Vectors rep Basics Making Vectors seq [from:by:to] Basics Making Vectors seq_along Basics Making Vectors seq_len [1:len] Basics Making Vectors rev reverse Basics Making Vectors sample Basics Making Vectors choose factorial Basics Making Vectors factorial factorial Basics Making Vectors combn Basics Making Vectors (is/as).(character/numeric/logical) Basics Making Vectors list HashTable ([]) Basics Lists & Data Frames unlist Basics Lists & Data Frames data.frame Basics Lists & Data Frames as.data.frame Basics Lists & Data Frames split Basics Lists & Data Frames expand.grid Basics Lists & Data Frames if if Basics Control Flow && && Basics Control Flow || || Basics Control Flow for for Basics Control Flow while while Basics Control Flow next continue Basics Control Flow break break Basics Control Flow switch Basics Control Flow ifelse Basics Control Flow fitted Statistics Linear Models predict Statistics Linear Models resid Statistics Linear Models rstandard Statistics 
Linear Models lm Statistics Linear Models glm Statistics Linear Models hat Statistics Linear Models influence.measures Statistics Linear Models logLik Statistics Linear Models df Statistics Linear Models deviance Statistics Linear Models formula Statistics Linear Models ~ Statistics Linear Models I Statistics Linear Models anova Statistics Linear Models coef Statistics Linear Models confint Statistics Linear Models vcov Statistics Linear Models contrasts Statistics Linear Models apropos(‘\\.test$’) Statistics Miscellaneous Statistical Tests beta beta Statistics Random Numbers binom binom Statistics Random Numbers cauchy cauchy Statistics Random Numbers chisq chisq Statistics Random Numbers exp exp Statistics Random Numbers f f Statistics Random Numbers gamma gamma Statistics Random Numbers geom geom Statistics Random Numbers hyper hyper Statistics Random Numbers lnorm lnorm Statistics Random Numbers logis logis Statistics Random Numbers multinom multinom Statistics Random Numbers nbinom nbinom Statistics Random Numbers norm norm Statistics Random Numbers pois pois Statistics Random Numbers signrank signrank Statistics Random Numbers t t Statistics Random Numbers unif unif (rand) Statistics Random Numbers weibull weibull Statistics Random Numbers wilcox wilcox Statistics Random Numbers birthday birthday Statistics Random Numbers tukey tukey Statistics Random Numbers crossprod * Statistics Matrix Algebra tcrossprod * Statistics Matrix Algebra eigen eig Statistics Matrix Algebra qr qr Statistics Matrix Algebra svd svd Statistics Matrix Algebra %*% * Statistics Matrix Algebra %o% Statistics Matrix Algebra outer Statistics Matrix Algebra rcond Statistics Matrix Algebra solve \ Statistics Matrix Algebra duplicated Statistics Ordering and Tabulating unique Statistics Ordering and Tabulating merge Statistics Ordering and Tabulating order Statistics Ordering and Tabulating rank Statistics Ordering and Tabulating quantile quantile Statistics Ordering and Tabulating sort sort 
Statistics Ordering and Tabulating table Statistics Ordering and Tabulating ftable Statistics Ordering and Tabulating ls whos Working with R Workspace exists Working with R Workspace get Working with R Workspace rm Working with R Workspace getwd getcwd Working with R Workspace setwd setcwd Working with R Workspace q Ctrl-D Working with R Workspace source load Working with R Workspace install.packages Working with R Workspace library Working with R Workspace require Working with R Workspace help help Working with R Help ? help Working with R Help help.search Working with R Help apropos Working with R Help RSiteSearch Working with R Help citation Working with R Help demo Working with R Help example Working with R Help vignette Working with R Help traceback Working with R Debugging browser Working with R Debugging recover Working with R Debugging options(error =) Working with R Debugging stop Working with R Debugging warning Working with R Debugging message Working with R Debugging tryCatch try/catch Working with R Debugging try try Working with R Debugging print print (println) I/O Output cat I/O Output message I/O Output warning I/O Output dput I/O Output format I/O Output sink I/O Output data I/O Reading and Writing Data count.fields I/O Reading and Writing Data read.csv csvread I/O Reading and Writing Data read.delim dlmread I/O Reading and Writing Data read.fwf I/O Reading and Writing Data read.table I/O Reading and Writing Data library(foreign) I/O Reading and Writing Data write.table dlmwrite I/O Reading and Writing Data readLines readlines I/O Reading and Writing Data writeLines I/O Reading and Writing Data load I/O Reading and Writing Data save I/O Reading and Writing Data readRDS I/O Reading and Writing Data saveRDS I/O Reading and Writing Data dir I/O Files and Directories basename I/O Files and Directories dirname I/O Files and Directories file.path I/O Files and Directories path.expand I/O Files and Directories file.choose I/O Files and Directories 
file.copy I/O Files and Directories file.create I/O Files and Directories file.remove I/O Files and Directories path.rename I/O Files and Directories dir.create I/O Files and Directories file.exists I/O Files and Directories tempdir I/O Files and Directories tempfile I/O Files and Directories download.file I/O Files and Directories ISOdate Special Data Date / Time ISOdatetime Special Data Date / Time strftime Special Data Date / Time strptime Special Data Date / Time date Special Data Date / Time difftime Special Data Date / Time julian Special Data Date / Time months Special Data Date / Time quarters Special Data Date / Time weekdays Special Data Date / Time library(lubridate) Special Data Date / Time grep match Special Data Character Manipulation agrep Special Data Character Manipulation gsub Special Data Character Manipulation strsplit split Special Data Character Manipulation chartr Special Data Character Manipulation nchar strlen Special Data Character Manipulation tolower Special Data Character Manipulation toupper Special Data Character Manipulation substr Special Data Character Manipulation paste join Special Data Character Manipulation library(stringr) Special Data Character Manipulation factor Special Data Factors levels Special Data Factors nlevels Special Data Factors reorder Special Data Factors relevel Special Data Factors cut Special Data Factors findInterval Special Data Factors interaction Special Data Factors options(stringsAsFactors = FALSE) Special Data Factors array [] Special Data Array Manipulation dim size Special Data Array Manipulation dimnames Special Data Array Manipulation aperm Special Data Array Manipulation library(abind) Special Data Array Manipulation I’d like to note that holes in the list of Julia functions can exist for several reasons: The language does not yet have the relevant features. This is true of things like factor() or data.frame(). 
The language has draft implementations of the relevant features, but they are not yet ready to make their way into this list. This is true of Doug Bates’ GLM code, for example.
I simply don’t know what the Julia equivalent is for an R function, but it may well exist. If you know of one, please fork the GitHub repository I’m using and revise the CSV file appropriately. I’ll integrate relevant pull requests as soon as I can find time.

In addition to explaining the presence of the many holes you can see in this list, I’d also like to note how quickly these holes are being filled in: Doug Bates already finished a wrapper for the Rmath library, which means that Julia now has tools for calculating the PDFs, CDFs, and inverse CDFs of most statistical distributions as well as the ability to draw random samples from them. That means that almost any sort of MCMC you’d like to do is already possible in Julia. (I, for one, am really interested to see if someone will use Julia’s sparse matrix support and these new Rmath functions to build MCMC code that’s easy on the eyes while also running at an appropriately fast speed on complicated, big data problems like matrix factorizations.)

On my end, I’ve been working on filling some of the missing entries in this list by adding in pieces that I think I understand well enough to implement from scratch, such as:
Optimization algorithms (optim.jl): simulated annealing, gradient descent, Newton’s method
Statistical hypothesis tests (stats.jl): t-tests
Utility functions (utils.jl): range, keys, cummax, cummin

To leave a comment for the author, please follow the link and comment on his blog: John Myles White » Statistics.
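The distribution entries in the table (norm, binom, pois and so on) are suffixes rather than complete R function names. In R each suffix is combined with a prefix d, p, q or r to give the density, the CDF, the inverse CDF and random draws, which are the four operations the Rmath wrapper is described as providing. A short R illustration for the normal distribution follows; the variable names are illustrative only.

```r
# The four R functions built on the "norm" suffix, for the standard normal.
density_at_zero <- dnorm(0)      # PDF at 0, equal to 1 / sqrt(2 * pi)
prob_below <- pnorm(1.96)        # CDF: P(Z <= 1.96)
upper_quantile <- qnorm(0.975)   # inverse CDF: the 97.5% quantile

set.seed(1)                      # make the random draws reproducible
draws <- rnorm(1000)             # 1000 random samples from N(0, 1)

# The CDF and the inverse CDF undo each other.
round_trip <- pnorm(qnorm(0.975))
```

The same prefix scheme applies to the other suffixes listed, for example dbinom, ppois or rgamma.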
R_bloggers programming statistics https://pinboard.in/u:rahuldave/b:4a08330142b5/

Simulated Annealing in Julia 2012-04-04T20:38:53+00:00 http://www.johnmyleswhite.com/notebook/2012/04/04/simulated-annealing-in-julia/ rahuldave

    rosenbrock(z[1], z[2]), [0, 0], neighbors, log_temperature, 10000, true, false)

Finding the Minima of the Himmelblau Function

    # Find a solution of the Himmelblau function using SA.

    load("simulated_annealing.jl")
    load("../rng.jl")

    function himmelbrau(x, y)
      (x^2 + y - 11)^2 + (x + y^2 - 7)^2
    end

    function neighbors(z)
      [rand_uniform(z[1] - 1, z[1] + 1), rand_uniform(z[2] - 1, z[2] + 1)]
    end

    srand(1)

    solution = simulated_annealing(z -> himmelbrau(z[1], z[2]),
                                   [0, 0],
                                   neighbors,
                                   log_temperature,
                                   10000,
                                   true,
                                   false)

Moving Forward

Now that I’ve got a form of SA working, I’m interested in coding up a suite of optimization functions for Julia so that I can start to do maximum likelihood estimation in pure Julia. Once that’s possible, I can use Julia to do real science, e.g. when I need to fit simple models for which finding the MLE is appropriate. (I will leave the development of cleaner statistical functions for special cases of maximum likelihood estimation to more capable people, like Douglas Bates, who has already produced some GLM code.)

At present my code is meant simply to demonstrate how one could write an implementation of simulated annealing in Julia. I’m sure that the code can be more efficient and I suspect that I’ve violated some of the idioms of the language. In addition, I’d much prefer that this function use default values for many of the arguments, as there is no reason that an end-user needs to be concerned with finding the best cooling schedule if SA seems to work out of the box on their problem with the cooling schedule I’ve been using.
With those disclaimers about my code in place, I’d like to think that I haven’t made any mathematical errors and that this function can be used by others. So, I’d ask that those interested please tear apart my code so that I can make it usable as a general purpose function for optimization in Julia. Alternatively, please tell me that there’s no need for a pure Julia implementation of SA because, for example, there are nice C libraries that would be much easier to link to than to re-implement.

With an implementation of SA in place, I’ll probably start working on implementing L-BFGS-B soon, which is the other optimization algorithm I use often in R. (To be honest, I use L-BFGS-B almost exclusively, but SA was much easier to implement.)

Incidentally, this code base demonstrates how I view the relationship between R and Julia: Julia is a beautiful new language that is still missing many important pieces. We can all work together to build the best pieces of R that are missing from Julia. While we’re working on improving Julia, we’ll need to keep using R to handle things like visualization of our results. For this post, I turned back to ggplot2 for all of the graphics generation.
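For comparison, both optimizers mentioned in the post are available in R through `optim` in the base stats package, as method = "SANN" (simulated annealing) and method = "L-BFGS-B". The following is a minimal sketch on the Himmelblau test function, assuming the same starting point of (0, 0) and the same budget of 10000 iterations. It uses R's built-in annealer with its default cooling schedule and default candidate generator, so it is not a port of the Julia code in the post.

```r
# Himmelblau's function; each of its four minima has value 0, e.g. at (3, 2).
himmelblau <- function(z) {
  (z[1]^2 + z[2] - 11)^2 + (z[1] + z[2]^2 - 7)^2
}

start <- c(0, 0)

# Simulated annealing via optim's built-in "SANN" method.
set.seed(1)  # SANN is stochastic, so fix the seed for reproducibility
sa_fit <- optim(start, himmelblau, method = "SANN",
                control = list(maxit = 10000))

# Quasi-Newton alternative; no box constraints are supplied here.
lbfgsb_fit <- optim(start, himmelblau, method = "L-BFGS-B")

# Inspect the parameter estimates and objective values.
sa_fit$par
sa_fit$value
lbfgsb_fit$par
lbfgsb_fit$value
```

Because SANN is stochastic, which of the four minima it ends up near depends on the seed, whereas L-BFGS-B is deterministic for a given starting point.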
]]> Statistics https://pinboard.in/u:rahuldave/b:60a23d91daaa/ Monkeying with Bayes’ theorem 2012-03-09T19:01:17+00:00 http://www.johndcook.com/blog/2012/03/09/monkeying-with-bayes-theorem/ rahuldave Statistics Bayesian Probability_and_Statistics https://pinboard.in/u:rahuldave/b:856f4bf519b8/ Data visualization 2012-03-04T22:37:45+00:00 http://www.r-bloggers.com/data-visualization/ rahuldave R_bloggers graphics R statistics Uncategorized https://pinboard.in/u:rahuldave/b:c48507b81035/ ABC in Roma [R lab #2] 2012-03-02T06:07:14+00:00 http://www.r-bloggers.com/abc-in-roma-r-lab-2/ rahuldave R_bloggers ABC La_Sapienza PhD_course R Roma statistics University_life https://pinboard.in/u:rahuldave/b:65af70b24476/ Modeling Trick: the Signed Pseudo Logarithm 2012-03-02T05:19:42+00:00 http://www.r-bloggers.com/modeling-trick-the-signed-pseudo-logarithm/ rahuldave R_bloggers arcsinh R singed_pseudo_logarithm stabilizing_transform statistics https://pinboard.in/u:rahuldave/b:d3b534855889/ Comparing R to smoking 2012-02-29T20:46:29+00:00 http://www.johndcook.com/blog/2012/02/29/comparing-r-to-smoking/ rahuldave Statistics Rstats https://pinboard.in/u:rahuldave/b:6cd625f8ee7e/ Dear statisticians: Please start using your powers for good not evil 2012-02-17T15:44:19+00:00 http://chrisblattman.com/2012/02/17/dear-statisticians-please-start-using-your-powers-for-good-not-evil/ rahuldave statistics forecasting prediction https://pinboard.in/u:rahuldave/b:a5311996a72e/ Bayesian probabilistic reasoning applied to mathematical epidemiology for predictive spatiotemporal analysis of infectious diseases 2012-02-14T19:39:02+00:00 http://dl.acm.org/citation.cfm?id=1195284 rahuldaveepidemiology statistics https://pinboard.in/ https://pinboard.in/u:rahuldave/b:0a80d6d9bc44/ [untitled] 2012-02-14T19:38:34+00:00 http://stat.asu.edu/~chavez/CCCPUB/Towards%20real%20time%20epidemiology%20data%20assimilation,%20modeling%20and%20anomaly%20detection%20of%20health%20surveillance%20data%20streams.pdf 
rahuldavestatistics epidemiology https://pinboard.in/ https://pinboard.in/u:rahuldave/b:4b66882af2ee/ [untitled] 2012-02-14T19:38:18+00:00 http://www.lancs.ac.uk/staff/diggle/ENAR_slides.pdf rahuldavestatistics epidemiology https://pinboard.in/ https://pinboard.in/u:rahuldave/b:eb2de3539c1f/ Linking correlation to causation with power laws and scale free systems 2012-02-09T19:30:19+00:00 http://feeds.arstechnica.com/~r/arstechnica/index/~3/NzKuPgcpSdY/seeing-a-power-law-in-data-doesnt-make-it-real.ars rahuldave News Science correlations powerlaw statistics https://pinboard.in/u:rahuldave/b:2b1eec98d001/ The universal solvent of statistics 2012-02-01T16:02:21+00:00 http://www.johndcook.com/blog/2012/02/01/the-universal-solvent-of-statistics/ rahuldave Statistics Uncategorized Bayesian Probability_and_Statistics https://pinboard.in/u:rahuldave/b:c7d00ee2e732/ R in Action 2012-01-02T20:16:15+00:00 http://www.johndcook.com/blog/2012/01/02/r-in-action/ rahuldave Statistics Books Probability_and_Statistics Rstats https://pinboard.in/u:rahuldave/b:8d9bdb6d3ec6/ R in Action 2012-01-02T16:32:31+00:00 http://feedproxy.google.com/~r/TheEndeavour/~3/gBBNTQY8bwk/ rahuldave Statistics Books Probability_and_Statistics Rstats https://pinboard.in/u:rahuldave/b:1e2ce682352d/ Teaching Bayesian stats backward 2011-04-20T15:04:42+00:00 http://www.johndcook.com/blog/2011/04/20/teaching-bayesian-stats-backward/ rahuldave Statistics Bayesian Education https://pinboard.in/u:rahuldave/b:941288c31c21/ Significance testing and Congress 2011-04-14T13:55:51+00:00 http://www.johndcook.com/blog/2011/04/14/significance-testing-and-congress/ rahuldave Statistics Bayesian Probability_and_Statistics Science https://pinboard.in/u:rahuldave/b:22c91fe22f79/ How insignificant is statistical significance? 
2011-04-13T20:48:56+00:00 http://www.johndcook.com/blog/2011/04/13/pericchi-statistical-significance/ rahuldave Statistics Probability_and_Statistics https://pinboard.in/u:rahuldave/b:1fdfa4260e22/ How I regard almost every empirical development or conflict paper I know 2011-04-11T13:14:05+00:00 http://chrisblattman.com/2011/04/11/how-i-regard-almost-every-empirical-development-or-conflict-paper-i-know/ rahuldave research statistics https://pinboard.in/u:rahuldave/b:149625a03fe2/ Simple approximation to normal distribution 2010-04-29T14:50:17+00:00 http://www.johndcook.com/blog/2010/04/29/simple-approximation-to-normal-distribution/ rahuldave Math Statistics Probability_and_Statistics https://pinboard.in/u:rahuldave/b:847189f70cbf/ Is R an ‘epic fail’? 2010-04-26T06:03:19+00:00 http://www.mailund.dk/index.php/2010/04/26/is-r-an-epic-fail/ rahuldave Work programming R statistics https://pinboard.in/u:rahuldave/b:a98065d24020/ Estimating the chances of something that hasn’t happened yet 2010-03-30T13:51:23+00:00 http://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/ rahuldave Statistics Bayesian Probability_and_Statistics https://pinboard.in/u:rahuldave/b:55bc56ab748b/