<?xml version="1.0" encoding="UTF-8"?>
 <rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://web.resource.org/cc/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://pinboard.in">
    <title>Pinboard (cshalizi)</title>
    <link>https://pinboard.in/u:cshalizi/public/</link>
    <description>recent bookmarks from cshalizi</description>
    <items>
      <rdf:Seq>	<rdf:li rdf:resource="https://arxiv.org/abs/2504.03503"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2502.12063"/>
	<rdf:li rdf:resource="https://www.ams.org/journals/tran/1968-133-01/S0002-9947-1968-0226281-1/S0002-9947-1968-0226281-1.pdf"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2311.03910"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1901.06230"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2305.00322"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2101.04653"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2107.07562"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2107.10885"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2107.03975"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2105.14368"/>
	<rdf:li rdf:resource="https://donskerclass.github.io/publication/automatedsolution/"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2103.03191"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2005.10224"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2101.12353"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2101.11968"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1912.06987"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1801.03437"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2101.05873"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2012.13424"/>
	<rdf:li rdf:resource="https://projecteuclid.org/euclid.ss/1608541222"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2012.08496"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1506.02785"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2011.10985"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2011.13780"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/2005.04751"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1106.6184"/>
	<rdf:li rdf:resource="https://www.sciencedirect.com/science/article/abs/pii/S1226319208000380"/>
	<rdf:li rdf:resource="https://www.tandfonline.com/doi/abs/10.1080/10485250801948625"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/math/0606591"/>
	<rdf:li rdf:resource="http://models.street-artists.org/2020/01/09/nothing-to-see-here-move-along-regression-discontinuity-edition/"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1909.05794"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1711.10414"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1906.09282"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1906.05746"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1905.12122"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1905.10409"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1904.00687"/>
	<rdf:li rdf:resource="http://proceedings.mlr.press/v80/balestriero18b.html"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1904.03673"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1904.02505"/>
	<rdf:li rdf:resource="https://sites.math.washington.edu/~rothvoss/lecturenotes/lasserresurvey.pdf"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1612.07545"/>
	<rdf:li rdf:resource="http://papers.nips.cc/paper/7945-parameters-as-interacting-particles-long-time-convergence-and-asymptotic-error-scaling-of-neural-networks"/>
	<rdf:li rdf:resource="https://papers.nips.cc/paper/3495-weighted-sums-of-random-kitchen-sinks-replacing-minimization-with-randomization-in-learning"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1806.06850"/>
	<rdf:li rdf:resource="http://www.cambridge.org/us/academic/subjects/mathematics/numerical-analysis/multivariate-approximation?format=HB&amp;WT.mc_id=LFA-MAT-CL-CMACM%2Bseries#vFK9KkeLHpmYPUkP.97"/>
	<rdf:li rdf:resource="https://colala.bcs.rochester.edu/papers/piantadosi2018one.pdf"/>
	<rdf:li rdf:resource="http://www.hup.harvard.edu/catalog.php?isbn=9780674975002"/>
	<rdf:li rdf:resource="http://link.springer.com/chapter/10.1007/978-3-319-11520-7_27"/>
	<rdf:li rdf:resource="https://arxiv.org/abs/1611.06168"/>
	<rdf:li rdf:resource="http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00892#.WDXdn3eZNR0"/>
	<rdf:li rdf:resource="http://arxiv.org/abs/1508.05906"/>
	<rdf:li rdf:resource="http://www.springer.com/us/book/9783319340715"/>
	<rdf:li rdf:resource="http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00849"/>
	<rdf:li rdf:resource="http://arxiv.org/abs/1601.00670"/>
	<rdf:li rdf:resource="http://auai.org/uai2015/proceedings/papers/143.pdf"/>
	<rdf:li rdf:resource="http://papers.nips.cc/paper/3182-random-features-for-large-scale-kernel-machines"/>
	<rdf:li rdf:resource="http://arxiv.org/abs/1404.1578"/>
	<rdf:li rdf:resource="http://projecteuclid.org/euclid.bj/1402488943"/>
	<rdf:li rdf:resource="http://www.stat.cmu.edu/~kass/papers/nonlinearFunctions.pdf"/>
	<rdf:li rdf:resource="http://arxiv.org/abs/1405.0058"/>
	<rdf:li rdf:resource="http://www.gutenberg.org/ebooks/38079"/>
	<rdf:li rdf:resource="http://cran.r-project.org/web/packages/crs/vignettes/spline_primer.pdf"/>
	<rdf:li rdf:resource="http://www.pnas.org/content/111/6/2385.abstract"/>
	<rdf:li rdf:resource="http://arxiv.org/abs/1402.1694"/>
	<rdf:li rdf:resource="http://arxiv.org/abs/1311.7286"/>
	<rdf:li rdf:resource="http://arxiv.org/abs/1310.5543"/>
	<rdf:li rdf:resource="http://arxiv.org/abs/1212.6906"/>
	<rdf:li rdf:resource="http://press.princeton.edu/titles/8624.html"/>
      </rdf:Seq>
    </items>
  </channel><item rdf:about="https://arxiv.org/abs/2504.03503">
    <title>[2504.03503] Operator Learning: A Statistical Perspective</title>
    <dc:date>2025-04-28T01:23:12+00:00</dc:date>
    <link>https://arxiv.org/abs/2504.03503</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Operator learning has emerged as a powerful tool in scientific computing for approximating mappings between infinite-dimensional function spaces. A primary application of operator learning is the development of surrogate models for the solution operators of partial differential equations (PDEs). These methods can also be used to develop black-box simulators to model system behavior from experimental data, even without a known mathematical model. In this article, we begin by formalizing operator learning as a function-to-function regression problem and review some recent developments in the field. We also discuss PDE-specific operator learning, outlining strategies for incorporating physical and mathematical constraints into architecture design and training processes. Finally, we end by highlighting key future directions such as active data collection and the development of rigorous uncertainty quantification frameworks."]]></description>
<dc:subject>to:NB approximation tewari.ambuj statistics equations_of_motion_from_a_time_series</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:ad7e303203c2/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:tewari.ambuj"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:equations_of_motion_from_a_time_series"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2502.12063">
    <title>[2502.12063] Low-Rank Thinning</title>
    <dc:date>2025-02-22T22:42:53+00:00</dc:date>
    <link>https://arxiv.org/abs/2502.12063</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time."]]></description>
<dc:subject>to:NB computational_statistics low-rank_approximation kernel_methods mackey.lester via:mraginsky approximation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:6f021998692f/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:computational_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:low-rank_approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:kernel_methods"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:mackey.lester"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:via:mraginsky"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://www.ams.org/journals/tran/1968-133-01/S0002-9947-1968-0226281-1/S0002-9947-1968-0226281-1.pdf">
    <title>LOWER BOUNDS FOR APPROXIMATION BY NONLINEAR MANIFOLDS</title>
    <dc:date>2024-12-11T16:01:58+00:00</dc:date>
    <link>https://www.ams.org/journals/tran/1968-133-01/S0002-9947-1968-0226281-1/S0002-9947-1968-0226281-1.pdf</link>
    <dc:creator>cshalizi</dc:creator><dc:subject>to:NB geometry approximation re:codename:catherine_wheel</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:f3f9aa2eb087/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:geometry"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:codename:catherine_wheel"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2311.03910">
    <title>[2311.03910] Structure of universal formulas</title>
    <dc:date>2023-12-08T19:04:39+00:00</dc:date>
    <link>https://arxiv.org/abs/2311.03910</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["By universal formulas we understand parameterized analytic expressions that have a fixed complexity, but nevertheless can approximate any continuous function on a compact set. There exist various examples of such formulas, including some in the form of neural networks. In this paper we analyze the essential structural elements of these highly expressive models. We introduce a hierarchy of expressiveness classes connecting the global approximability property to the weaker property of infinite VC dimension, and prove a series of classification results for several increasingly complex functional families. In particular, we introduce a general family of polynomially-exponentially-algebraic functions that, as we prove, is subject to polynomial constraints. As a consequence, we show that fixed-size neural networks with not more than one layer of neurons having transcendental activations (e.g., sine or standard sigmoid) cannot in general approximate functions on arbitrary finite sets. On the other hand, we give examples of functional families, including two-hidden-layer neural networks, that approximate functions on arbitrary finite sets, but fail to do that on the whole domain of definition."]]></description>
<dc:subject>to:NB learning_theory neural_networks approximation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:43e65eb23777/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:neural_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1901.06230">
    <title>[1901.06230] Computing large market equilibria using abstractions</title>
    <dc:date>2023-06-06T01:55:01+00:00</dc:date>
    <link>https://arxiv.org/abs/1901.06230</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Computing market equilibria is an important practical problem for market design, for example in fair division of items. However, computing equilibria requires large amounts of information (typically the valuation of every buyer for every item) and computing power. We consider ameliorating these issues by applying a method used for solving complex games: constructing a coarsened abstraction of a given market, solving for the equilibrium in the abstraction, and lifting the prices and allocations back to the original market. We show how to bound important quantities such as regret, envy, Nash social welfare, Pareto optimality, and maximin share/proportionality when the abstracted prices and allocations are used in place of the real equilibrium. We then study two abstraction methods of interest for practitioners: (1) filling in unknown valuations using techniques from matrix completion, (2) reducing the problem size by aggregating groups of buyers/items into smaller numbers of representative buyers/items and solving for equilibrium in this coarsened market. We find that in real data allocations/prices that are relatively close to equilibria can be computed from even very coarse abstractions."]]></description>
<dc:subject>to_read economics optimization approximation re:in_soviet_union_optimization_problem_solves_you in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:af36dc73f9c7/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:economics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:in_soviet_union_optimization_problem_solves_you"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2305.00322">
    <title>[2305.00322] Toward $L_infty$-recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields</title>
    <dc:date>2023-05-05T01:47:16+00:00</dc:date>
    <link>https://arxiv.org/abs/2305.00322</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Many machine learning applications require learning a function with a small worst-case error over the entire input domain, that is, the L∞-error, whereas most existing theoretical works only guarantee recovery in average errors such as the L2-error. L∞-recovery from polynomial samples is even impossible for seemingly simple function classes such as constant-norm infinite-width two-layer neural nets. This paper makes some initial steps beyond the impossibility results by leveraging the randomness in the ground-truth functions. We prove a polynomial sample complexity bound for random ground-truth functions drawn from Gaussian random fields. Our key technical novelty is to prove that the degree-k spherical harmonics components of a function from Gaussian random field cannot be spiky in that their L∞/L2 ratios are upperbounded by O(dlnk‾‾‾‾√) with high probability. In contrast, the worst-case L∞/L2 ratio for degree-k spherical harmonics is on the order of Ω(min{dk/2,kd/2})."

--- This sounds interesting, but it also seems to say that Gaussian random fields generate especially smooth functions (with high probability), casting doubt on their suitability as a general prior.  (Of course I think there's no such thing as a general
prior.)]]></description>
<dc:subject>approximation learning_theory random_fields in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:fb73aea45ac1/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:random_fields"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2101.04653">
    <title>[2101.04653] Benchmarking Simulation-Based Inference</title>
    <dc:date>2021-11-22T15:53:03+00:00</dc:date>
    <link>https://arxiv.org/abs/2101.04653</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods. However, a public benchmark with appropriate performance metrics for such 'likelihood-free' algorithms has been lacking. This has made it difficult to compare algorithms and identify their strengths and weaknesses. We set out to fill this gap: We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms including recent approaches employing neural networks and classical Approximate Bayesian Computation methods. We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency. Neural network-based approaches generally exhibit better performance, but there is no uniformly best algorithm. We provide practical advice and highlight the potential of the benchmark to diagnose problems and improve algorithms. The results can be explored interactively on a companion website. All code is open source, making it possible to contribute further benchmark tasks and inference algorithms."

--- Kind of embarrassing to have missed this when it came out!]]></description>
<dc:subject>to:NB approximation re:codename:catherine_wheel to_read via:rvenkat</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:a1b66545b031/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:codename:catherine_wheel"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:via:rvenkat"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2107.07562">
    <title>[2107.07562] On universal approximation and error bounds for Fourier Neural Operators</title>
    <dc:date>2021-08-14T00:48:29+00:00</dc:date>
    <link>https://arxiv.org/abs/2107.07562</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Fourier neural operators (FNOs) have recently been proposed as an effective framework for learning operators that map between infinite-dimensional spaces. We prove that FNOs are universal, in the sense that they can approximate any continuous operator to desired accuracy. Moreover, we suggest a mechanism by which FNOs can approximate operators associated with PDEs efficiently. Explicit error bounds are derived to show that the size of the FNO, approximating operators associated with a Darcy type elliptic PDE and with the incompressible Navier-Stokes equations of fluid dynamics, only increases sub (log)-linearly in terms of the reciprocal of the error. Thus, FNOs are shown to efficiently approximate operators arising in a large class of PDEs."

--- Relation to the random-feature operator approximation [https://arxiv.org/abs/2005.10224]?]]></description>
<dc:subject>neural_networks approximation fourier_analysis</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:8fec631c17c6/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:neural_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:fourier_analysis"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2107.10885">
    <title>[2107.10885] Laplace and Saddlepoint Approximations in High Dimensions</title>
    <dc:date>2021-07-26T15:00:11+00:00</dc:date>
    <link>https://arxiv.org/abs/2107.10885</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We examine the behaviour of the Laplace and saddlepoint approximations in the high-dimensional setting, where the dimension of the model is allowed to increase with the number of observations. Approximations to the joint density, the marginal posterior density and the conditional density are considered. Our results show that under the mildest assumptions on the model, the error of the joint density approximation is O(p4/n) if p=o(n1/4) for the Laplace approximation and saddlepoint approximation, with improvements being possible under additional assumptions. Stronger results are obtained for the approximation to the marginal posterior density."]]></description>
<dc:subject>high-dimensional_statistics approximation reid nancy laplace_approximation statistics in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:4441b47f2d45/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:high-dimensional_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:reid"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:nancy"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:laplace_approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2107.03975">
    <title>[2107.03975] Compressibility Analysis of Asymptotically Mean Stationary Processes</title>
    <dc:date>2021-07-11T05:55:06+00:00</dc:date>
    <link>https://arxiv.org/abs/2107.03975</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["This work provides new results for the analysis of random sequences in terms of ℓp-compressibility. The results characterize the degree in which a random sequence can be approximated by its best k-sparse version under different rates of significant coefficients (compressibility analysis). In particular, the notion of strong ℓp-characterization is introduced to denote a random sequence that has a well-defined asymptotic limit (sample-wise) of its best k-term approximation error when a fixed rate of significant coefficients is considered (fixed-rate analysis). The main theorem of this work shows that the rich family of asymptotically mean stationary (AMS) processes has a strong ℓp-characterization. Furthermore, we present results that characterize and analyze the ℓp-approximation error function for this family of processes. Adding ergodicity in the analysis of AMS processes, we introduce a theorem demonstrating that the approximation error function is constant and determined in closed-form by the stationary mean of the process. Our results and analyses contribute to the theory and understanding of discrete-time sparse processes and, on the technical side, confirm how instrumental the point-wise ergodic theorem is to determine the compressibility expression of discrete-time processes even when stationarity and ergodicity assumptions are relaxed."]]></description>
<dc:subject>to:NB stochastic_processes sparsity approximation ergodic_theory</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:485fccb044e3/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:sparsity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:ergodic_theory"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2105.14368">
    <title>[2105.14368] Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation</title>
    <dc:date>2021-06-01T13:33:06+00:00</dc:date>
    <link>https://arxiv.org/abs/2105.14368</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation, and its sibling, over-parameterization. Interpolation corresponds to fitting data, even noisy data, exactly. Over-parameterization enables interpolation and provides flexibility to select a right interpolating model.
"As we will see, just as a physical prism separates colors mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern Machine Learning. This article is written with belief and hope that clearer understanding of these issues brings us a step closer toward a general theory of deep learning and machine learning."]]></description>
<dc:subject>approximation neural_networks learning_theory belkin.mikhail statistics have_read adversarial_examples computational_statistics kernel_methods in_NB interpolation_aka_memorizing_the_training_data</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:d46689212c03/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:neural_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:belkin.mikhail"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:adversarial_examples"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:computational_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:kernel_methods"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:interpolation_aka_memorizing_the_training_data"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://donskerclass.github.io/publication/automatedsolution/">
    <title>Automated Solution of Heterogeneous Agent Models | David Childers</title>
    <dc:date>2021-05-10T19:59:53+00:00</dc:date>
    <link>https://donskerclass.github.io/publication/automatedsolution/</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["In this paper I present and analyze a new linearization based method for automated solution of heterogeneous agent models with continuously distributed heterogeneity and aggregate shocks. The approach is based on representation of the model equilibrium conditions as a system of smooth functional equations in terms of endogenously time-varying distributions and decision rules. Taking the value of these functions at a set of grid points as arguments, the equilibrium conditions can then be linearized, interpolated with respect to a set of basis functions, and solved through a procedure relying on automatic differentiation and standard discrete time linear rational expectations solution algorithms. While solution approaches based on linearization of discretized or projected models have achieved substantial popularity in recent years, it has been unclear whether such approaches generate solutions which correspond to that of the true infinite dimensional model. I characterize a broad class of models and a set of regularity conditions which ensure that this is indeed the case: the solution algorithm is guaranteed to converge to the first derivative of the true infinite dimensional solution as the discretization is refined. The key conceptual result leading to these methods is a recognition that a broad variety of heterogeneous agent models can be interpreted as infinite width deep neural networks (Guss, 2017), constructed entirely by iterated composition of pointwise nonlinearities and linear integral operators along a directed acyclic computational graph. On a theoretical level, this formulation ensures commutativity of differentiation and sampling and so permits construction of approximate functional derivatives without performing direct manual calculations in infinite dimensional space. On a practical level, this permits implementation using existing fast and scalable libraries for automatic differentiation on Euclidean space while maintaining the consistency guarantees derived for solutions based on derivatives computed directly in infinite dimensional space in Childers (2018). In addition to providing precise technical conditions under which this method yields accurate representations, I provide examples and guidelines for how to formulate models to ensure that these conditions are satisfied. These conditions are shown to hold in models which possess smooth conditional densities of idiosyncratic state variables as in the class of heterogeneous agent models formalized in Arellano et al. (2016) augmented with aggregate shocks, subject to a particular choice of representation of the model equations which can be implemented by a change of variables. Convergence rates for the approximation are derived, depending on the classes of functions defining the nodes in the network and the overall network topology for a variety of choices of interpolation method including polynomials, splines, histograms, and wavelets. As a corollary, I provide the first proof of convergence of the solution method of Reiter (2009) when applied to models satisfying our conditions.The procedure is demonstrated numerically by application to a version of the incomplete markets model of Huggett (1993) with continuously distributed idiosyncratic and aggregate income risk."]]></description>
<dc:subject>to:NB approximation agent-based_models economics numerical_analysis childers.david</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:d956218cfb06/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:agent-based_models"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:economics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:numerical_analysis"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:childers.david"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2103.03191">
    <title>[2103.03191] Generalization Bounds for Sparse Random Feature Expansions</title>
    <dc:date>2021-04-21T14:53:40+00:00</dc:date>
    <link>https://arxiv.org/abs/2103.03191</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use for data-scarce applications or problems in scientific machine learning. This paper introduces the sparse random feature expansion to obtain parsimonious random feature models. Specifically, we leverage ideas from compressive sensing to generate random feature expansions with theoretical guarantees even in the data-scarce setting. In particular, we provide uniform bounds on the approximation error and generalization bounds for functions in a certain class (that is dense in a reproducing kernel Hilbert space) depending on the number of samples and the distribution of features. The error bounds improve with additional structural conditions, such as coordinate sparsity, compact clusters of the spectrum, or rapid spectral decay. In particular, by introducing sparse features, i.e. features with random sparse weights, we provide improved bounds for low order functions. We show that the sparse random feature expansions outperforms shallow networks in several scientific machine learning tasks."]]></description>
<dc:subject>sparsity random_features approximation learning_theory to_teach:childs_garden_of_statistical_learning_theory in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:dde4a10d084c/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:sparsity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:random_features"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_teach:childs_garden_of_statistical_learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2005.10224">
    <title>[2005.10224] The Random Feature Model for Input-Output Maps between Banach Spaces</title>
    <dc:date>2021-04-20T13:21:08+00:00</dc:date>
    <link>https://arxiv.org/abs/2005.10224</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Well known to the machine learning community, the random feature model, originally introduced by Rahimi and Recht in 2008, is a parametric approximation to kernel interpolation or regression methods. It is typically used to approximate functions mapping a finite-dimensional input space to the real line. In this paper, we instead propose a methodology for use of the random feature model as a data-driven surrogate for operators that map an input Banach space to an output Banach space. Although the methodology is quite general, we consider operators defined by partial differential equations (PDEs); here, the inputs and outputs are themselves functions, with the input parameters being functions required to specify the problem, such as initial data or coefficients, and the outputs being solutions of the problem. Upon discretization, the model inherits several desirable attributes from this infinite-dimensional, function space viewpoint, including mesh-invariant approximation error with respect to the true PDE solution map and the capability to be trained at one mesh resolution and then deployed at different mesh resolutions. We view the random feature model as a non-intrusive data-driven emulator, provide a mathematical framework for its interpretation, and demonstrate its ability to efficiently and accurately approximate the nonlinear parameter-to-solution maps of two prototypical PDEs arising in physical science and engineering applications: viscous Burgers' equation and a variable coefficient elliptic equation."]]></description>
<dc:subject>random_features analysis via:rvenkat approximation in_NB have_read dynamical_systems re:codename:catherine_wheel</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:81171133ef9a/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:random_features"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:analysis"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:via:rvenkat"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:dynamical_systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:codename:catherine_wheel"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2101.12353">
    <title>[2101.12353] On the capacity of deep generative networks for approximating distributions</title>
    <dc:date>2021-02-04T17:39:08+00:00</dc:date>
    <link>https://arxiv.org/abs/2101.12353</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We study the efficacy and efficiency of deep generative networks for approximating probability distributions. We prove that neural networks can transform a one-dimensional source distribution to a distribution that is arbitrarily close to a high-dimensional target distribution in Wasserstein distances. Upper bounds of the approximation error are obtained in terms of neural networks' width and depth. It is shown that the approximation error grows at most linearly on the ambient dimension and that the approximation order only depends on the intrinsic dimension of the target distribution. On the contrary, when f-divergences are used as metrics of distributions, the approximation property is different. We prove that in order to approximate the target distribution in f-divergences, the dimension of the source distribution cannot be smaller than the intrinsic dimension of the target distribution. Therefore, f-divergences are less adequate than Waserstein distances as metrics of distributions for generating samples."

--- I don't see how the last sentence could possibly follow.]]></description>
<dc:subject>to:NB approximation probability neural_networks</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:fdece6d93814/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:probability"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:neural_networks"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2101.11968">
    <title>[2101.11968] Reproducing kernel Hilbert spaces, polynomials and the classical moment problems</title>
    <dc:date>2021-02-04T17:38:12+00:00</dc:date>
    <link>https://arxiv.org/abs/2101.11968</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We show that polynomials do not belong to the reproducing kernel Hilbert space of infinitely differentiable translation-invariant kernels whose spectral measures have moments corresponding to a determinate moment problem. Our proof is based on relating this question to the problem of best linear estimation in continuous time one-parameter regression models with a stationary error process defined by the kernel. In particular, we show that the existence of a sequence of estimators with variances converging to 0 implies that the regression function cannot be an element of the reproducing kernel Hilbert space. This question is then related to the determinacy of the Hamburger moment problem for the spectral measure corresponding to the kernel.
"In the literature it was observed that a non-vanishing constant function does not belong to the reproducing kernel Hilbert space associated with the Gaussian kernel (see Corollary 4.44 in Steinwart and Christmann, 2008). Our results provide a unifying view of this phenomenon and show that the mentioned result can be extended for arbitrary polynomials and a broad class of translation-invariant kernels."]]></description>
<dc:subject>to:NB probability approximation hilbert_space kernel_methods re:codename:catherine_wheel</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:bd951012e765/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:probability"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:hilbert_space"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:kernel_methods"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:codename:catherine_wheel"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1912.06987">
    <title>[1912.06987] The Generalization Error of the Minimum-norm Solutions for Over-parameterized Neural Networks</title>
    <dc:date>2021-02-04T15:34:29+00:00</dc:date>
    <link>https://arxiv.org/abs/1912.06987</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We study the generalization properties of minimum-norm solutions for three over-parametrized machine learning models including the random feature model, the two-layer neural network model and the residual network model. We proved that for all three models, the generalization error for the minimum-norm solution is comparable to the Monte Carlo rate, up to some logarithmic terms, as long as the models are sufficiently over-parametrized."]]></description>
<dc:subject>approximation learning_theory random_features in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:9ff53c05a88d/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:random_features"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1801.03437">
    <title>[1801.03437] Approximation beats concentration? An approximation view on inference with smooth radial kernels</title>
    <dc:date>2021-01-23T03:43:38+00:00</dc:date>
    <link>https://arxiv.org/abs/1801.03437</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Positive definite kernels and their associated Reproducing Kernel Hilbert Spaces provide a mathematically compelling and practically competitive framework for learning from data.
"In this paper we take the approximation theory point of view to explore various aspects of smooth kernels related to their inferential properties. We analyze eigenvalue decay of kernels operators and matrices, properties of eigenfunctions/eigenvectors and "Fourier" coefficients of functions in the kernel space restricted to a discrete set of data points. We also investigate the fitting capacity of kernels, giving explicit bounds on the fat shattering dimension of the balls in Reproducing Kernel Hilbert spaces. Interestingly, the same properties that make kernels very effective approximators for functions in their "native" kernel space, also limit their capacity to represent arbitrary functions. We discuss various implications, including those for gradient descent type methods.
"It is important to note that most of our bounds are measure independent. Moreover, at least in moderate dimension, the bounds for eigenvalues are much tighter than the bounds which can be obtained from the usual matrix concentration results. For example, we see that the eigenvalues of kernel matrices show nearly exponential decay with constants depending only on the kernel and the domain. We call this "approximation beats concentration" phenomenon as even when the data are sampled from a probability distribution, some of their aspects are better understood in terms of approximation theory."]]></description>
<dc:subject>to:NB kernel_methods hilbert_space approximation learning_theory belkin.mikhail to_teach:childs_garden_of_statistical_learning_theory</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:6eebabf098c0/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:kernel_methods"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:hilbert_space"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:belkin.mikhail"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_teach:childs_garden_of_statistical_learning_theory"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2101.05873">
    <title>[2101.05873] Data-driven learning for the Mori--Zwanzig formalism: a generalization of the Koopman learning framework</title>
    <dc:date>2021-01-19T20:57:23+00:00</dc:date>
    <link>https://arxiv.org/abs/2101.05873</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["A theoretical framework which unifies the conventional Mori--Zwanzig formalism and the approximate Koopman learning is presented. In this framework, the Mori--Zwanzig formalism, developed in statistical mechanics to tackle the hard problem of construction of reduced-order dynamics for high-dimensional dynamical systems, can be considered as a natural generalization of the Koopman description of the dynamical system. We next show that similar to the approximate Koopman learning methods, data-driven methods can be developed for the Mori--Zwanzig formalism with Mori's linear projection operator. We developed two algorithms to extract the key operators, the Markov and the memory kernel, using time series of a reduced set of observables in a dynamical system. We adopted the Lorenz `96 system as a test problem and solved for the operators, which exhibit complex behaviors which are unlikely to be captured by traditional modeling approaches, in Mori--Zwanzig analysis. The nontrivial Generalized Fluctuation Dissipation relationship, which relates the memory kernel with the two-time correlation statistics of the orthogonal dynamics, was numerically verified as a validation of the solved operators. We present numerical evidence that the Generalized Langevin Equation, a key construct in the Mori--Zwanzig formalism, is more advantageous in predicting the evolution of the reduced set of observables than the conventional approximate Koopman operators."]]></description>
<dc:subject>macro_from_micro stochastic_processes approximation in_NB mori-zwanzig koopman_operators</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:5c04db814345/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:macro_from_micro"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:mori-zwanzig"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:koopman_operators"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2012.13424">
    <title>[2012.13424] Homogeneous Interpretable Approximations to Heterogeneous SIR Models</title>
    <dc:date>2021-01-14T15:51:02+00:00</dc:date>
    <link>https://arxiv.org/abs/2012.13424</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["The SIR-compartment model is among the simplest models that describe the spread of a disease through a population. The model makes the unrealistic assumption that the population through which the disease is spreading is well-mixed. Although real populations have heterogeneities in contacts not represented in the SIR model, it nevertheless well fits real US state Covid-19 case data. Here we demonstrate mathematically how closely the simple continuous SIR model approximates a model which includes heterogeneous contacts, and provide insight onto how one can interpret parameters gleaned from regression in the context of heterogeneous dynamics."]]></description>
<dc:subject>to:NB epidemic_models approximation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:551b235d42e3/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:epidemic_models"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://projecteuclid.org/euclid.ss/1608541222">
    <title>Katzfuss , Guinness : A General Framework for Vecchia Approximations of Gaussian Processes</title>
    <dc:date>2020-12-21T14:09:14+00:00</dc:date>
    <link>https://projecteuclid.org/euclid.ss/1608541222</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Gaussian processes (GPs) are commonly used as models for functions, time series, and spatial fields, but they are computationally infeasible for large datasets. Focusing on the typical setting of modeling data as a GP plus an additive noise term, we propose a generalization of the Vecchia (J. Roy. Statist. Soc. Ser. B 50 (1988) 297–312) approach as a framework for GP approximations. We show that our general Vecchia approach contains many popular existing GP approximations as special cases, allowing for comparisons among the different methods within a unified framework. Representing the models by directed acyclic graphs, we determine the sparsity of the matrices necessary for inference, which leads to new insights regarding the computational properties. Based on these results, we propose a novel sparse general Vecchia approximation, which ensures computational feasibility for large spatial datasets but can lead to considerable improvements in approximation accuracy over Vecchia’s original approach. We provide several theoretical results and conduct numerical comparisons. We conclude with guidelines for the use of Vecchia approximations in spatial statistics."]]></description>
<dc:subject>to:NB approximation computational_statistics random_fields gaussian_processes stochastic_processes statistics_on_manifolds</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:21d74c2d020c/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:computational_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:random_fields"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:gaussian_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics_on_manifolds"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2012.08496">
    <title>[2012.08496] Spectral Methods for Data Science: A Statistical Perspective</title>
    <dc:date>2020-12-16T15:13:22+00:00</dc:date>
    <link>https://arxiv.org/abs/2012.08496</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Spectral methods have emerged as a simple yet surprisingly effective approach for extracting information from massive, noisy and incomplete data. In a nutshell, spectral methods refer to a collection of algorithms built upon the eigenvalues (resp. singular values) and eigenvectors (resp. singular vectors) of some properly designed matrices constructed from data. A diverse array of applications have been found in machine learning, data science, and signal processing. Due to their simplicity and effectiveness, spectral methods are not only used as a stand-alone estimator, but also frequently employed to initialize other more sophisticated algorithms to improve performance.
"While the studies of spectral methods can be traced back to classical matrix perturbation theory and methods of moments, the past decade has witnessed tremendous theoretical advances in demystifying their efficacy through the lens of statistical modeling, with the aid of non-asymptotic random matrix theory. This monograph aims to present a systematic, comprehensive, yet accessible introduction to spectral methods from a modern statistical perspective, highlighting their algorithmic implications in diverse large-scale applications. In particular, our exposition gravitates around several central questions that span various applications: how to characterize the sample efficiency of spectral methods in reaching a target level of statistical accuracy, and how to assess their stability in the face of random noise, missing data, and adversarial corruptions? In addition to conventional ℓ2 perturbation analysis, we present a systematic ℓ∞ and ℓ2,∞ perturbation theory for eigenspace and singular subspaces, which has only recently become available owing to a powerful "leave-one-out" analysis framework."]]></description>
<dc:subject>to:NB statistics spectral_methods data_mining approximation linear_algebra to_teach:data-mining</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:6c3188aa8c89/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:spectral_methods"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:data_mining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:linear_algebra"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_teach:data-mining"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1506.02785">
    <title>[1506.02785] On the Error of Random Fourier Features</title>
    <dc:date>2020-12-14T01:48:11+00:00</dc:date>
    <link>https://arxiv.org/abs/1506.02785</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Kernel methods give powerful, flexible, and theoretically grounded approaches to solving many problems in machine learning. The standard approach, however, requires pairwise evaluations of a kernel function, which can lead to scalability issues for very large datasets. Rahimi and Recht (2007) suggested a popular approach to handling this problem, known as random Fourier features. The quality of this approximation, however, is not well understood. We improve the uniform error bound of that paper, as well as giving novel understandings of the embedding's variance, approximation error, and use in some machine learning methods. We also point out that surprisingly, of the two main variants of those features, the more widely used is strictly higher-variance for the Gaussian kernel and has worse bounds."]]></description>
<dc:subject>random_features kernel_methods approximation computational_statistics concentration_of_measure two-sample_tests regression schneider.jeff have_read to_teach:childs_garden_of_statistical_learning_theory in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:bdadd4e91fb9/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:random_features"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:kernel_methods"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:computational_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:concentration_of_measure"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:two-sample_tests"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:regression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:schneider.jeff"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_teach:childs_garden_of_statistical_learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2011.10985">
    <title>[2011.10985] A universal probability approximation method: Markov process approach</title>
    <dc:date>2020-11-30T04:02:00+00:00</dc:date>
    <link>https://arxiv.org/abs/2011.10985</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We view the classical Lindeberg principle in a Markov process setting to establish a universal probability approximation framework by Itô's formula and Markov semigroup. As applications, we consider approximating a family of online stochastic gradient descents (SGDs) by a stochastic differential equation (SDE) driven by additive Brownian motion, and obtain an approximation error with explicit dependence on the dimension which makes it possible to analyse high dimensional models. We also apply our framework to study stable approximation and normal approximation and obtain their optimal convergence rates (up to a logarithmic correction for normal approximation)."]]></description>
<dc:subject>markov_models convergence_of_stochastic_processes stochastic_processes stochastic_differential_equations approximation stochastic_gradient_descent in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:c3a1ccd774e2/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:markov_models"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:convergence_of_stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_differential_equations"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_gradient_descent"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2011.13780">
    <title>[2011.13780] Rate of convergence in Trotter's approximation theorem and its applications</title>
    <dc:date>2020-11-30T03:02:09+00:00</dc:date>
    <link>https://arxiv.org/abs/2011.13780</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["The celebrated Trotter approximation theorem provides a sufficient condition for the convergence of a sequence of operator semigroups in terms of the corresponding sequence of infinitesimal generators. There exists a few results on the rate of convergence in Trotter's theorem under some constraints. In the present paper, a new rate of convergence in Trotter's theorem with full generality is given. Moreover, we see that this rate of convergence works well to obtain quantitative estimates for some limit theorems in probability theory."]]></description>
<dc:subject>to:NB markov_models convergence_of_stochastic_processes stochastic_processes re:almost_none approximation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:df4597d34e61/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:markov_models"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:convergence_of_stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:almost_none"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/2005.04751">
    <title>[2005.04751] Tractable nonlinear memory functions as a tool to capture and explain dynamical behaviours</title>
    <dc:date>2020-11-15T21:15:24+00:00</dc:date>
    <link>https://arxiv.org/abs/2005.04751</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Mathematical approaches from dynamical systems theory are used in a range of fields. This includes biology where they are used to describe processes such as protein-protein interaction and gene regulatory networks. As such networks increase in size and complexity, detailed dynamical models become cumbersome, making them difficult to explore and decipher. This necessitates the application of simplifying and coarse graining techniques in order to derive explanatory insight. Here we demonstrate that Zwanzig-Mori projection methods can be used to arbitrarily reduce the dimensionality of dynamical networks while retaining their dynamical properties. We show that a systematic expansion around the quasi-steady state approximation allows an explicit solution for memory functions without prior knowledge of the dynamics. The approach not only preserves the same steady states but also replicates the transients of the original system. The method also correctly predicts the dynamics of multistable systems as well as networks producing sustained and damped oscillations. Applying the approach to a gene regulatory network from the vertebrate neural tube, a well characterised developmental transcriptional network, identifies features of the regulatory network responsible dfor its characteristic transient behaviour. Taken together, our analysis shows that this method is broadly applicable to multistable dynamical systems and offers a powerful and efficient approach for understanding their behaviour."]]></description>
<dc:subject>to:NB dynamical_systems approximation networks biochemical_networks non-equilibrium</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:3dff9992307a/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:dynamical_systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:biochemical_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:non-equilibrium"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1106.6184">
    <title>[1106.6184] Non-perturbative heterogeneous mean-field approach to epidemic spreading in complex networks</title>
    <dc:date>2020-07-29T15:28:00+00:00</dc:date>
    <link>https://arxiv.org/abs/1106.6184</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Since roughly a decade ago, network science has focused among others on the problem of how the spreading of diseases depends on structural patterns. Here, we contribute to further advance our understanding of epidemic spreading processes by proposing a non-perturbative formulation of the heterogeneous mean field approach that has been commonly used in the physics literature to deal with this kind of spreading phenomena. The non-perturbative equations we propose have no assumption about the proximity of the system to the epidemic threshold, nor any linear approximation of the dynamics. In particular, we first develop a probabilistic description at the node level of the epidemic propagation for the so-called susceptible-infected-susceptible family of models, and after we derive the corresponding heterogeneous mean-field approach. We propose to use the full extension of the approach instead of pruning the expansion to first order, which leads to a non-perturbative formulation that can be solved by fixed point iteration, and used with reliability far away from the epidemic threshold to assess the prevalence of the epidemics. Our results are in close agreement with Monte Carlo simulations thus enhancing the predictive power of the classical heterogeneous mean field approach, while providing a more effective framework in terms of computational time."]]></description>
<dc:subject>to:NB approximation epidemics_on_networks re:do-institutions-evolve</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:a1dcc8726d6b/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:epidemics_on_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:do-institutions-evolve"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://www.sciencedirect.com/science/article/abs/pii/S1226319208000380">
    <title>Approximating data - ScienceDirect</title>
    <dc:date>2020-05-16T18:10:40+00:00</dc:date>
    <link>https://www.sciencedirect.com/science/article/abs/pii/S1226319208000380</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["There are essentially two statistical paradigms, the Bayesian and frequentist. Despite their obvious differences the two approaches have certain points in common. In particular both are density (or likelihood) based and neither has a concept of approximation. By a concept of approximation we mean some formal admission of the fact that the statistical models are not true representations of the data. We argue that the relationship between the data and the model is a fundamental one which cannot be reduced to either diagnostics or model validation. We argue further that a concept of approximation must be formulated in a weak topology different from the strong topology of densities. For this reason there can be no density or likelihood based concept of approximation. The concept of approximation we suggest goes back to [Donoho, D. L. (1988). One-sided inference about functionals of a density. Annals of Statistics, 16, 1390–1420] and [Davies, P. L. (1995). Data features. Statistica Neerlandica, 49, 185–245] and requires ‘typical’ data sets simulated under the model ‘look like’ the real data set. This idea is developed using examples from nonparametric regression."]]></description>
<dc:subject>to:NB approximation statistics foundations_of_statistics davies.p.l. color_me_skeptical</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:d16415803a00/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:foundations_of_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:davies.p.l."/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:color_me_skeptical"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://www.tandfonline.com/doi/abs/10.1080/10485250801948625">
    <title>Approximating data with weighted smoothing splines: Journal of Nonparametric Statistics: Vol 20, No 3</title>
    <dc:date>2020-05-16T18:09:43+00:00</dc:date>
    <link>https://www.tandfonline.com/doi/abs/10.1080/10485250801948625</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Given a data set (t i , y i ), i=1, ˙s, n with t i ∈[0, 1] non-parametric regression is concerned with the problem of specifying a suitable function f n :[0, 1]→ℝ such that the data can be reasonably approximated by the points (t i , f n (t i )), i=1, ˙s, n. If a data set exhibits large variations in local behaviour, for example large peaks as in spectroscopy data, then the method must be able to adapt to the local changes in smoothness. Whilst many methods are able to accomplish this, they are less successful at adapting derivatives. In this paper we showed how the goal of local adaptivity of the function and its first and second derivatives can be attained in a simple manner using weighted smoothing splines. A residual-based concept of approximation is used which forces local adaptivity of the regression function together with a global regularization which makes the function as smooth as possible subject to the approximation constraints."]]></description>
<dc:subject>to:NB smoothing approximation splines statistics davies.p.l.</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:c2207876cd87/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:smoothing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:splines"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:davies.p.l."/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/math/0606591">
    <title>[math/0606591] Approximation of stationary processes by Hidden Markov Models</title>
    <dc:date>2020-05-12T23:48:53+00:00</dc:date>
    <link>https://arxiv.org/abs/math/0606591</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We aim at the construction of a Hidden Markov Model (HMM) of assigned complexity (number of states of the underlying Markov chain) which best approximates, in Kullback-Leibler divergence rate, a given stationary process. We establish, under mild conditions, the existence of the divergence rate between a stationary process and an HMM. Since in general there is no analytic expression available for this divergence rate, we approximate it with a properly defined, and easily computable, divergence between Hankel matrices, which we use as our approximation criterion. We propose a three-step algorithm, based on the Nonnegative Matrix Factorization technique, which realizes an HMM optimal with respect to the defined approximation criterion. A full theoretical analysis of the algorithm is given in the special case of Markov approximation."]]></description>
<dc:subject>to:NB approximation stochastic_processes information_theory markov_models re:AoS_project to_read</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:7ded2a998ee1/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:information_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:markov_models"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:AoS_project"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_read"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://models.street-artists.org/2020/01/09/nothing-to-see-here-move-along-regression-discontinuity-edition/">
    <title>Nothing to see here… move along (regression discontinuity edition) | Models Of Reality</title>
    <dc:date>2020-01-12T22:06:57+00:00</dc:date>
    <link>http://models.street-artists.org/2020/01/09/nothing-to-see-here-move-along-regression-discontinuity-edition/</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA[Interesting, but surely this just speaks to the magnitude of the jump one should expect to see under the null?  I.e., if one got the null distribution sensibly, by simulation, shouldn't this effect be incorporated?]]></description>
<dc:subject>approximation causal_inference bad_data_analysis statistics via:gelman regression regression_discontinuity</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:d70c2ae82bc4/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:causal_inference"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:bad_data_analysis"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:via:gelman"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:regression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:regression_discontinuity"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1909.05794">
    <title>[1909.05794] Stationary distributions of continuous-time Markov chains: a review of theory and truncation-based approximations</title>
    <dc:date>2019-09-13T13:19:39+00:00</dc:date>
    <link>https://arxiv.org/abs/1909.05794</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Computing the stationary distributions of a continuous-time Markov chain involves solving a set of linear equations. In most cases of interest, the number of equations is infinite or too large, and cannot be solved analytically or numerically. Several approximation schemes overcome this issue by truncating the state space to a manageable size. In this review, we first give a comprehensive theoretical account of the stationary distributions and their relation to the long-term behaviour of the Markov chain, which is readily accessible to non-experts and free of irreducibility assumptions made in standard texts. We then review truncation-based approximation schemes paying particular attention to their convergence and to the errors they introduce, and we illustrate their performance with an example of a stochastic reaction network of relevance in biology and chemistry. We conclude by elaborating on computational trade-offs associated with error control and some open questions."]]></description>
<dc:subject>to:NB markov_models stochastic_processes re:almost_none approximation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:78dca7d89c12/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:markov_models"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:almost_none"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1711.10414">
    <title>[1711.10414] When are epsilon-nets small?</title>
    <dc:date>2019-08-08T13:06:47+00:00</dc:date>
    <link>https://arxiv.org/abs/1711.10414</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["In many interesting situations the size of epsilon-nets depends only on ϵ together with different complexity measures. The aim of this paper is to give a systematic treatment of such complexity measures arising in Discrete and Computational Geometry and Statistical Learning, and to bridge the gap between the results appearing in these two fields. As a byproduct, we obtain several new upper bounds on the sizes of epsilon-nets that generalize/improve the best known general guarantees. In particular, our results work with regimes when small epsilon-nets of size o(1ϵ) exist, which are not usually covered by standard upper bounds. Inspired by results in Statistical Learning we also give a short proof of the Haussler's upper bound on packing numbers."]]></description>
<dc:subject>to:NB learning_theory approximation to_teach:childs_garden_of_statistical_learning_theory</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:e712cdb4888a/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_teach:childs_garden_of_statistical_learning_theory"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1906.09282">
    <title>[1906.09282] Robustness of Dynamical Quantities of Interest via Goal-Oriented Information Theory</title>
    <dc:date>2019-08-05T13:45:12+00:00</dc:date>
    <link>https://arxiv.org/abs/1906.09282</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Variational-principle-based methods that relate expectations of a quantity of interest to information-theoretic divergences have proven to be effective tools for obtaining distributional robustness bounds under both parametric and non-parametric model-form uncertainty. Here, we extend these ideas to utilize information divergences that are specifically targeted at the quantity-of-interest being investigated. Due to their goal-oriented nature, and when combined with the data processing inequality, the resulting robustness bounds are tighter and a wider class of problems can be treated. We find the method especially useful for problems involving unbounded time-horizons, a case where established information-theoretic methods typically result in trivial (infinite) bounds. Problem types that can be treated within this framework include robustness bounds on the expected value and distribution of a stopping time, time averages, and exponentially discounted quantities. We illustrate these results with several examples, including option pricing, stochastic control, semi-Markov queueing models, and expectations and distributions of hitting times."]]></description>
<dc:subject>to:NB information_theory probability stochastic_processes approximation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:56c3e1c3e7fc/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:information_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:probability"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1906.05746">
    <title>[1906.05746] Nonlinear System Identification via Tensor Completion</title>
    <dc:date>2019-06-14T11:40:07+00:00</dc:date>
    <link>https://arxiv.org/abs/1906.05746</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Function approximation from input and output data pairs constitutes a fundamental problem in supervised learning. Deep neural networks are currently the most popular method for learning to mimic the input-output relationship of a generic nonlinear system, as they have proven to be very effective in approximating complex highly nonlinear functions. In this work, we propose low-rank tensor completion as an appealing alternative for modeling and learning complex nonlinear systems. We model the interactions between the N input variables and the scalar output of a system by a single N-way tensor, and setup a weighted low-rank tensor completion problem with smoothness regularization which we tackle using a block coordinate descent algorithm. We extend our method to the multi-output setting and the case of partially observed data, which cannot be readily handled by neural networks. Finally, we demonstrate the effectiveness of the approach using several regression tasks including some standard benchmarks and a challenging student grade prediction task."]]></description>
<dc:subject>to:NB approximation tensors regression computational_statistics statistics</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:7ff988587750/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:tensors"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:regression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:computational_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1905.12122">
    <title>[1905.12122] Deep Learning Moment Closure Approximations using Dynamic Boltzmann Distributions</title>
    <dc:date>2019-05-30T16:14:07+00:00</dc:date>
    <link>https://arxiv.org/abs/1905.12122</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["The moments of spatial probabilistic systems are often given by an infinite hierarchy of coupled differential equations. Moment closure methods are used to approximate a subset of low order moments by terminating the hierarchy at some order and replacing higher order terms with functions of lower order ones. For a given system, it is not known beforehand which closure approximation is optimal, i.e. which higher order terms are relevant in the current regime. Further, the generalization of such approximations is typically poor, as higher order corrections may become relevant over long timescales. We have developed a method to learn moment closure approximations directly from data using dynamic Boltzmann distributions (DBDs). The dynamics of the distribution are parameterized using basis functions from finite element methods, such that the approach can be applied without knowing the true dynamics of the system under consideration. We use the hierarchical architecture of deep Boltzmann machines (DBMs) with multinomial latent variables to learn closure approximations for progressively higher order spatial correlations. The learning algorithm uses a centering transformation, allowing the dynamic DBM to be trained without the need for pre-training. We demonstrate the method for a Lotka-Volterra system on a lattice, a typical example in spatial chemical reaction networks. The approach can be applied broadly to learn deep generative models in applications where infinite systems of differential equations arise."]]></description>
<dc:subject>to:NB approximation stochastic_processes mjolsness.eric moment_closures</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:84e8d18a231e/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:mjolsness.eric"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:moment_closures"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1905.10409">
    <title>[1905.10409] Greedy Shallow Networks: A New Approach for Constructing and Training Neural Networks</title>
    <dc:date>2019-05-28T17:40:53+00:00</dc:date>
    <link>https://arxiv.org/abs/1905.10409</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We present a novel greedy approach to obtain a single layer neural network approximation to a target function with the use of a ReLU activation function. In our approach we construct a shallow network by utilizing a greedy algorithm where the set of possible inner weights acts as a parametrization of the prescribed dictionary. To facilitate the greedy selection we employ an integral representation of the network, based on the ridgelet transform, that significantly reduces the cardinality of the dictionary and hence promotes feasibility of the proposed method. Our approach allows for the construction of efficient architectures which can be treated either as improved initializations to be used in place of random-based alternatives, or as fully-trained networks, thus potentially nullifying the need for training and/or calibrating based on backpropagation. Numerical experiments demonstrate the tenability of the proposed concept and its advantages compared to the classical techniques for training and constructing neural networks."

--- After reading, compare with the algorithms for shallow networks at the end of Anthony & Bartlett's book on neural network learning from the 1990s.]]></description>
<dc:subject>to:NB learning_theory approximation neural_networks your_favorite_deep_neural_network_sucks</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:99b556b19030/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:neural_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:your_favorite_deep_neural_network_sucks"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1904.00687">
    <title>[1904.00687] On the Power and Limitations of Random Features for Understanding Neural Networks</title>
    <dc:date>2019-05-10T03:24:29+00:00</dc:date>
    <link>https://arxiv.org/abs/1904.00687</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Recently, a spate of papers have provided positive theoretical results for training over-parameterized neural networks (where the network size is larger than what is needed to achieve low error). The key insight is that with sufficient over-parameterization, gradient-based methods will implicitly leave some components of the network relatively unchanged, so the optimization dynamics will behave as if those components are essentially fixed at their initial random values. In fact, fixing these explicitly leads to the well-known approach of learning with random features. In other words, these techniques imply that we can successfully learn with neural networks, whenever we can successfully learn with random features. In this paper, we first review these techniques, providing a simple and self-contained analysis for one-hidden-layer networks. We then argue that despite the impressive positive results, random feature approaches are also inherently limited in what they can explain. In particular, we rigorously show that random features cannot be used to learn even a single ReLU neuron with standard Gaussian inputs, unless the network size (or magnitude of the weights) is exponentially large. Since a single neuron is learnable with gradient-based methods, we conclude that we are still far from a satisfying general explanation for the empirical success of neural networks."]]></description>
<dc:subject>to:NB neural_networks learning_theory optimization approximation your_favorite_deep_neural_network_sucks to_teach:childs_garden_of_statistical_learning_theory random_features</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:dfb4828711a7/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:neural_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:your_favorite_deep_neural_network_sucks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_teach:childs_garden_of_statistical_learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:random_features"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://proceedings.mlr.press/v80/balestriero18b.html">
    <title>A Spline Theory of Deep Learning</title>
    <dc:date>2019-04-29T17:48:57+00:00</dc:date>
    <link>http://proceedings.mlr.press/v80/balestriero18b.html</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal with each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space opens up a new geometric avenue to study how DNs organize signals in a hierarchical fashion. As an application, we develop and validate a new distance metric for signals that quantifies the difference between their partition encodings."]]></description>
<dc:subject>to:NB to_read approximation splines neural_networks machine_learning your_favorite_deep_neural_network_sucks via:csantos</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:9b98002244e0/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:splines"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:neural_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:machine_learning"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:your_favorite_deep_neural_network_sucks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:via:csantos"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1904.03673">
    <title>[1904.03673] Every Local Minimum is a Global Minimum of an Induced Model</title>
    <dc:date>2019-04-11T00:08:43+00:00</dc:date>
    <link>https://arxiv.org/abs/1904.03673</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["For non-convex optimization in machine learning, this paper proves that every local minimum achieves the global optimality of the perturbable gradient basis model at any differentiable point. As a result, non-convex machine learning is theoretically as supported as convex machine learning with a hand-crafted basis in terms of the loss at differentiable local minima, except in the case when a preference is given to the hand-crafted basis over the perturbable gradient basis. The proofs of these results are derived under mild assumptions. Accordingly, the proven results are directly applicable to many machine learning models, including practical deep neural networks, without any modification of practical methods. Furthermore, as special cases of our general results, this paper improves or complements several state-of-the-art theoretical results in the literature with a simple and unified proof technique."]]></description>
<dc:subject>to:NB optimization approximation neural_networks kaelbling.leslie_pack</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:e8419d151be9/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:neural_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:kaelbling.leslie_pack"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1904.02505">
    <title>[1904.02505] A deterministic and computable Bernstein-von Mises theorem</title>
    <dc:date>2019-04-08T21:52:23+00:00</dc:date>
    <link>https://arxiv.org/abs/1904.02505</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Bernstein-von Mises results (BvM) establish that the Laplace approximation is asymptotically correct in the large-data limit. However, these results are inappropriate for computational purposes since they only hold over most, and not all, datasets and involve hard-to-estimate constants. In this article, I present a new BvM theorem which bounds the Kullback-Leibler (KL) divergence between a fixed log-concave density f(θ) and its Laplace approximation. The bound goes to 0 as the higher-derivatives of f(θ) tend to 0 and f(θ) becomes increasingly Gaussian. The classical BvM theorem in the IID large-data asymptote is recovered as a corollary. 
"Critically, this theorem further suggests a number of computable approximations of the KL divergence with the most promising being:
KL(gLAP,f)≈12Varθ∼g(θ)(log[f(θ)]−log[gLAP(θ)])
An empirical investigation of these bounds in the logistic classification model reveals that these approximations are great surrogates for the KL divergence. This result, and future results of a similar nature, could provide a path towards rigorously controlling the error due to the Laplace approximation and more modern approximation methods."]]></description>
<dc:subject>statistics approximation laplace_approximation probability in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:53a67d69a1cf/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:laplace_approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:probability"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://sites.math.washington.edu/~rothvoss/lecturenotes/lasserresurvey.pdf">
    <title>The Lasserre Hierarch in Approximation Algorithms</title>
    <dc:date>2019-04-08T19:15:16+00:00</dc:date>
    <link>https://sites.math.washington.edu/~rothvoss/lecturenotes/lasserresurvey.pdf</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["The Lasserre hierarchy is a systematic procedure to strengthen a relaxation for
an optimization problem by adding additional variables and SDP constraints. In
the last years this hierarchy moved into the focus of researchers in approximation algorithms as the obtain relaxations have provably nice properties. In particular on the t -th level, the relaxation can be solved in time n^O(t) and every constraint that one could derive from looking just at t variables is automatically satisfied. Additionally, it provides a vector embedding of events so that probabilities are expressable as inner products.
"The goal of these lecture notes is to give short but rigorous proofs of all key properties of the Lasserre hierarchy. In the second part we will demonstrate how the Lasserre SDP can be applied to (mostly NP-hard) optimization problems such as KNAPSACK, MATCHING, MAXCUT (in general and in dense graphs), 3-COLORING and SETCOVER."

--- I remember Cris trying to explain this to me once...]]></description>
<dc:subject>to:NB to_read approximation optimization re:in_soviet_union_optimization_problem_solves_you</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:7e7dfef7ce00/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:optimization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:in_soviet_union_optimization_problem_solves_you"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1612.07545">
    <title>[1612.07545] A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search</title>
    <dc:date>2019-01-05T15:11:58+00:00</dc:date>
    <link>https://arxiv.org/abs/1612.07545</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Approximate Nearest Neighbor Search (ANNS) is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms are proposed to solve this problem. Every proposed algorithm claims outperform other state-of-the-art hashing methods. However, the evaluation of these hashing papers was not thorough enough, and those claims should be re-examined. The ultimate goal of an ANNS method is returning the most accurate answers (nearest neighbors) in the shortest time. If implemented correctly, almost all the hashing methods will have their performance improved as the code length increases. However, many existing hashing papers only report the performance with the code length shorter than 128. In this paper, we carefully revisit the problem of search with a hash index, and analyze the pros and cons of two popular hash index search procedures. Then we proposed a very simple but effective two level index structures and make a thorough comparison of eleven popular hashing algorithms. Surprisingly, the random-projection-based Locality Sensitive Hashing (LSH) is the best performed algorithm, which is in contradiction to the claims in all the other ten hashing papers. Despite the extreme simplicity of random-projection-based LSH, our results show that the capability of this algorithm has been far underestimated. For the sake of reproducibility, all the codes used in the paper are released on GitHub, which can be used as a testing platform for a fair comparison between various hashing algorithms."]]></description>
<dc:subject>data_mining approximation nearest_neighbors locality-sensitive_hashing hashing have_read via:vaguery random_projections k-means databases in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:6e194513aac5/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:data_mining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:nearest_neighbors"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:locality-sensitive_hashing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:hashing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:via:vaguery"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:random_projections"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:k-means"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:databases"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://papers.nips.cc/paper/7945-parameters-as-interacting-particles-long-time-convergence-and-asymptotic-error-scaling-of-neural-networks">
    <title>Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks</title>
    <dc:date>2018-12-12T01:59:50+00:00</dc:date>
    <link>http://papers.nips.cc/paper/7945-parameters-as-interacting-particles-long-time-convergence-and-asymptotic-error-scaling-of-neural-networks</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters n is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as O(n−1). In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale O(n−1). These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions."

!!!]]></description>
<dc:subject>to:NB to_read neural_networks approximation learning_theory via:rvenkat</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:42b892a61018/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:neural_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:via:rvenkat"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://papers.nips.cc/paper/3495-weighted-sums-of-random-kitchen-sinks-replacing-minimization-with-randomization-in-learning">
    <title>Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning</title>
    <dc:date>2018-09-13T19:40:00+00:00</dc:date>
    <link>https://papers.nips.cc/paper/3495-weighted-sums-of-random-kitchen-sinks-replacing-minimization-with-randomization-in-learning</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Randomized neural networks are immortalized in this AI Koan: In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. What are you doing?'' asked Minsky. I am training a randomly wired neural net to play tic-tac-toe,'' Sussman replied. Why is the net wired randomly?'' asked Minsky. Sussman replied, I do not want it to have any preconceptions of how to play.'' Minsky then shut his eyes. Why do you close your eyes?'' Sussman asked his teacher. So that the room will be empty,'' replied Minsky. At that moment, Sussman was enlightened. We analyze shallow random networks with the help of concentration of measure inequalities. Specifically, we consider architectures that compute a weighted sum of their inputs after passing them through a bank of arbitrary randomized nonlinearities. We identify conditions under which these networks exhibit good classification performance, and bound their test error in terms of the size of the dataset and the number of random nonlinearities."

--- Have I never bookmarked this before?]]></description>
<dc:subject>approximation kernel_methods random_projections statistics prediction classifiers rahimi.ali recht.benjamin machine_learning have_read in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:9d11b9bca403/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:kernel_methods"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:random_projections"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:prediction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:classifiers"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:rahimi.ali"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:recht.benjamin"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:machine_learning"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1806.06850">
    <title>[1806.06850] Polynomial Regression As an Alternative to Neural Nets</title>
    <dc:date>2018-07-05T13:06:38+00:00</dc:date>
    <link>https://arxiv.org/abs/1806.06850</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Despite the success of neural networks (NNs), there is still a concern among many over their "black box" nature. Why do they work? Here we present a simple analytic argument that NNs are in fact essentially polynomial regression models. This view will have various implications for NNs, e.g. providing an explanation for why convergence problems arise in NNs, and it gives rough guidance on avoiding overfitting. In addition, we use this phenomenon to predict and confirm a multicollinearity property of NNs not previously reported in the literature. Most importantly, given this loose correspondence, one may choose to routinely use polynomial models instead of NNs, thus avoiding some major problems of the latter, such as having to set many tuning parameters and dealing with convergence issues. We present a number of empirical results; in each case, the accuracy of the polynomial approach matches or exceeds that of NN approaches. A many-featured, open-source software package, polyreg, is available."

--- Matloff is the author of my favorite "R programming for n00bs" textbook...
--- ETA after reading: the argument that multi-layer neural networks "are essentially" polynomial regression is a bit weak.  It would be true, exactly, if activation functions were exactly polynomial, which however they rarely are in practice.  If non-polynomial activations happen to be implemented in computational practice by polynomials (e.g., Taylor approximations), well, either we get different hardware or we crank up the degree of approximation as much as we like.  (Said a little differently, if you buy this line of argument, you should buy that _every_ smooth statistical model "is essentially" polynomial regression, which seems a bit much.)  It is, also, an argument about the function-approximation properties of the model classes, and not the fitting processes, despite the explicit disclaimers.]]></description>
<dc:subject>to:NB your_favorite_deep_neural_network_sucks regression neural_networks statistics matloff.norman approximation computational_statistics have_read</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:4138e391c13c/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:your_favorite_deep_neural_network_sucks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:regression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:neural_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:matloff.norman"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:computational_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_read"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.cambridge.org/us/academic/subjects/mathematics/numerical-analysis/multivariate-approximation?format=HB&amp;WT.mc_id=LFA-MAT-CL-CMACM%2Bseries#vFK9KkeLHpmYPUkP.97">
    <title>Multivariate approximation | Numerical analysis | Cambridge University Press</title>
    <dc:date>2018-06-23T16:40:46+00:00</dc:date>
    <link>http://www.cambridge.org/us/academic/subjects/mathematics/numerical-analysis/multivariate-approximation?format=HB&amp;WT.mc_id=LFA-MAT-CL-CMACM%2Bseries#vFK9KkeLHpmYPUkP.97</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["This self-contained, systematic treatment of multivariate approximation begins with classical linear approximation, and moves on to contemporary nonlinear approximation. It covers substantial new developments in the linear approximation theory of classes with mixed smoothness, and shows how it is directly related to deep problems in other areas of mathematics. For example, numerical integration of these classes is closely related to discrepancy theory and to nonlinear approximation with respect to special redundant dictionaries, and estimates of the entropy numbers of classes with mixed smoothness are closely related to (in some cases equivalent to) the Small Ball Problem from probability theory. The useful background material included in the book makes it accessible to graduate students. Researchers will find that the many open problems in the theory outlined in the book provide helpful directions and guidance for their own research in this exciting and active area."]]></description>
<dc:subject>to:NB approximation mathematics books:noted</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:14f5244484ab/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:mathematics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:books:noted"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://colala.bcs.rochester.edu/papers/piantadosi2018one.pdf">
    <title>One Parameter Is Always Enough</title>
    <dc:date>2018-05-29T15:01:35+00:00</dc:date>
    <link>https://colala.bcs.rochester.edu/papers/piantadosi2018one.pdf</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA[This is very cute, and deserves some attention.
Being me, I'll grump a bit.  As he says at the end, this is just an elaboration of the point that we get a class of _infinite_ VC dimension by thresholding sin(kx), i.e., we can correctly classify any arbitrarily large set of points by tweaking k appropriately.  Since this was known in the 1970s (proved by V&C themselves, if memory serves), it's been kind of insane that people continue to count parameters and pat themselves on the back about Occam.  (See http://bactra.org/weblog/921.html et seq. for more.)  Still, the elephant is nice.]]></description>
<dc:subject>model_selection approximation dynamical_systems chaos have_read to:blog</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:89db40e79b3b/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:model_selection"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:dynamical_systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:chaos"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:blog"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.hup.harvard.edu/catalog.php?isbn=9780674975002">
    <title>As If — Kwame Anthony Appiah | Harvard University Press</title>
    <dc:date>2017-09-07T16:03:33+00:00</dc:date>
    <link>http://www.hup.harvard.edu/catalog.php?isbn=9780674975002</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Idealization is a fundamental feature of human thought. We build simplified models in our scientific research and utopias in our political imaginations. Concepts like belief, desire, reason, and justice are bound up with idealizations and ideals. Life is a constant adjustment between the models we make and the realities we encounter. In idealizing, we proceed “as if” our representations were true, while knowing they are not. This is not a dangerous or distracting occupation, Kwame Anthony Appiah shows. Our best chance of understanding nature, society, and ourselves is to open our minds to a plurality of imperfect depictions that together allow us to manage and interpret our world.
"The philosopher Hans Vaihinger first delineated the “as if” impulse at the turn of the twentieth century, drawing on Kant, who argued that rational agency required us to act as if we were free. Appiah extends this strategy to examples across philosophy and the human and natural sciences. In a broad range of activities, we have some notion of the truth yet continue with theories that we recognize are, strictly speaking, false. From this vantage point, Appiah demonstrates that a picture one knows to be unreal can be a vehicle for accessing reality.
"As If explores how strategic untruth plays a critical role in far-flung areas of inquiry: decision theory, psychology, natural science, and political philosophy. A polymath who writes with mainstream clarity, Appiah defends the centrality of the imagination not just in the arts but in science, morality, and everyday life."]]></description>
<dc:subject>books:noted philosophy epistemology philosophy_of_science approximation modeling idealization in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:d992f3c0970c/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:books:noted"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:philosophy"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:epistemology"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:philosophy_of_science"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:modeling"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:idealization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://link.springer.com/chapter/10.1007/978-3-319-11520-7_27">
    <title>A Novel Algorithm for Coarse-Graining of Cellular Automata - Springer</title>
    <dc:date>2016-12-09T16:11:04+00:00</dc:date>
    <link>http://link.springer.com/chapter/10.1007/978-3-319-11520-7_27</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["The coarse-graining is an approximation procedure widely used for simplification of mathematical and numerical models of multiscale systems. It reduces superfluous – microscopic – degrees of freedom. Israeli and Goldenfeld demonstrated in [1,2] that the coarse-graining can be employed for elementary cellular automata (CA), producing interesting interdependences between them. However, extending their investigation on more complex CA rules appeared to be impossible due to the high computational complexity of the coarse-graining algorithm. We demonstrate here that this complexity can be substantially decreased. It allows for scrutinizing much broader class of cellular automata in terms of their coarse graining. By using our algorithm we found out that the ratio of the numbers of elementary CAs having coarse grained representation to “degenerate” – irreducible – cellular automata, strongly increases with increasing the “grain” size of the approximation procedure. This rises principal questions about the formal limits in modeling of realistic multiscale systems."]]></description>
<dc:subject>to:NB macro_from_micro approximation cellular_automata via:?</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:b8c0580f7c68/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:macro_from_micro"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:cellular_automata"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:via:?"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="https://arxiv.org/abs/1611.06168">
    <title>[1611.06168] On $p$-values</title>
    <dc:date>2016-12-04T18:43:08+00:00</dc:date>
    <link>https://arxiv.org/abs/1611.06168</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Models are consistently treated as approximations and all procedures are consistent with this. They do not treat the model as being true. In this context p-values are one measure of approximation, a small p-value indicating a poor approximation. Approximation regions are defined and distinguished from confidence regions."]]></description>
<dc:subject>to:NB statistics misspecification approximation p-values hypothesis_testing via:vaguery</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:e527ca5df465/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:misspecification"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:p-values"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:hypothesis_testing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:via:vaguery"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00892#.WDXdn3eZNR0">
    <title>A Universal Approximation Theorem for Mixture-of-Experts Models | Neural Computation | MIT Press Journals</title>
    <dc:date>2016-11-23T18:22:31+00:00</dc:date>
    <link>http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00892#.WDXdn3eZNR0</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["The mixture-of-experts (MoE) model is a popular neural network architecture for nonlinear regression and classification. The class of MoE mean functions is known to be uniformly convergent to any unknown target function, assuming that the target function is from a Sobolev space that is sufficiently differentiable and that the domain of estimation is a compact unit hypercube. We provide an alternative result, which shows that the class of MoE mean functions is dense in the class of all continuous functions over arbitrary compact domains of estimation. Our result can be viewed as a universal approximation theorem for MoE models. The theorem we present allows MoE users to be confident in applying such models for estimation when data arise from nonlinear and nondifferentiable generative processes."]]></description>
<dc:subject>to:NB ensemble_methods statistics graphical_models approximation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:9633ee3eda5c/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:ensemble_methods"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:graphical_models"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://arxiv.org/abs/1508.05906">
    <title>[1508.05906] Chaining, Interpolation, and Convexity</title>
    <dc:date>2016-09-07T14:53:37+00:00</dc:date>
    <link>http://arxiv.org/abs/1508.05906</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We show that classical chaining bounds on the suprema of random processes in terms of entropy numbers can be systematically improved when the underlying set is convex: the entropy numbers need not be computed for the entire set, but only for certain "thin" subsets. This phenomenon arises from the observation that real interpolation can be used as a natural chaining mechanism. Unlike the general form of Talagrand's generic chaining method, which is sharp but often difficult to use, the resulting bounds involve only entropy numbers but are nonetheless sharp in many situations in which classical entropy bounds are suboptimal. Such bounds are readily amenable to explicit computations in specific examples, and we discover some old and new geometric principles for the control of chaining functionals as special cases."]]></description>
<dc:subject>empirical_processes learning_theory approximation convexity functional_analysis van_handel.ramon in_NB to_teach:childs_garden_of_statistical_learning_theory</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:724678402496/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:empirical_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:convexity"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:functional_analysis"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:van_handel.ramon"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_teach:childs_garden_of_statistical_learning_theory"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.springer.com/us/book/9783319340715">
    <title>Approximation Methods in Probability Theory | Vydas Čekanavičius | Springer</title>
    <dc:date>2016-07-08T15:19:22+00:00</dc:date>
    <link>http://www.springer.com/us/book/9783319340715</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["This book presents a wide range of well-known and less common methods used for estimating the accuracy of probabilistic approximations, including the Esseen type inversion formulas, the Stein method as well as the methods of convolutions and triangle function. Emphasising the correct usage of the methods presented, each step required for the proofs is examined in detail. As a result, this textbook provides valuable tools for proving approximation theorems.
"While Approximation Methods in Probability Theory will appeal to everyone interested in limit theorems of probability theory, the book is particularly aimed at graduate students who have completed a standard intermediate course in probability theory. Furthermore, experienced researchers wanting to enlarge their toolkit will also find this book useful."]]></description>
<dc:subject>to:NB probability approximation mathematics convergence_of_stochastic_processes</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:ed9139eebdd4/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:probability"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:mathematics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:convergence_of_stochastic_processes"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00849">
    <title>A Single Hidden Layer Feedforward Network with Only One Neuron in the Hidden Layer Can Approximate Any Univariate Function</title>
    <dc:date>2016-06-28T17:54:23+00:00</dc:date>
    <link>http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00849</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["The possibility of approximating a continuous function on a compact subset of the real line by a feedforward single hidden layer neural network with a sigmoidal activation function has been studied in many papers. Such networks can approximate an arbitrary continuous function provided that an unlimited number of neurons in a hidden layer is permitted. In this note, we consider constructive approximation on any finite interval of $\mathbb{R}$ by neural networks with only one neuron in the hidden layer. We construct algorithmically a smooth, sigmoidal, almost monotone activation function $\sigma$ providing approximation to an arbitrary continuous function within any degree of accuracy. This algorithm is implemented in a computer program, which computes the value of $\sigma$ at any reasonable point of the real axis."]]></description>
<dc:subject>to:NB to_read neural_networks approximation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:ef55378a6207/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:neural_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://arxiv.org/abs/1601.00670">
    <title>[1601.00670] Variational Inference: A Review for Statisticians</title>
    <dc:date>2016-02-09T03:02:29+00:00</dc:date>
    <link>http://arxiv.org/abs/1601.00670</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["One of the core problems of modern statistics is to approximate difficult-to-compute probability distributions. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation about the posterior. In this paper, we review variational inference (VI), a method from machine learning that approximates probability distributions through optimization. VI has been used in myriad applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of distributions and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this widely-used class of algorithms."]]></description>
<dc:subject>to:NB variational_inference statistics computational_statistics approximation blei.david</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:a48bd78ff1e2/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:variational_inference"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:computational_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:blei.david"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://auai.org/uai2015/proceedings/papers/143.pdf">
    <title>Importance Sampling Over Sets: A New Probabilistic Inference Scheme</title>
    <dc:date>2015-07-15T14:15:21+00:00</dc:date>
    <link>http://auai.org/uai2015/proceedings/papers/143.pdf</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Computing expectations in high-dimensional spaces is a key challenge in probabilistic infer- ence and machine learning. Monte Carlo sam- pling, and importance sampling in particular, is one of the leading approaches. We propose a generalized importance sampling scheme based on randomly selecting (exponentially large) sub- sets of states rather than individual ones. By col- lecting a small number of extreme states in the sampled sets, we obtain estimates of statistics of interest, such as the partition function of an undirected graphical model. We incorporate this idea into a novel maximum likelihood learning algorithm based on cutting planes. We demon- strate empirically that our scheme provides accu- rate answers and scales to problems with up to a million variables."

--- Applicable to the fitness-breeding scheme?]]></description>
<dc:subject>to:NB computational_statistics monte_carlo statistics approximation re:fitness_sampling</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:da2b78d00cf4/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:computational_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:monte_carlo"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:fitness_sampling"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://papers.nips.cc/paper/3182-random-features-for-large-scale-kernel-machines">
    <title>Random Features for Large-Scale Kernel Machines</title>
    <dc:date>2015-07-15T14:07:00+00:00</dc:date>
    <link>http://papers.nips.cc/paper/3182-random-features-for-large-scale-kernel-machines</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. The features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user specified shift- invariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large-scale classification and regression tasks linear machine learning al- gorithms applied to these features outperform state-of-the-art large-scale kernel machines."]]></description>
<dc:subject>machine_learning computational_statistics approximation fourier_analysis kernel_methods statistics re:hyperbolic_networks in_NB have_read random_features</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:dfa4fe79abe7/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:machine_learning"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:computational_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:fourier_analysis"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:kernel_methods"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:hyperbolic_networks"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:random_features"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://arxiv.org/abs/1404.1578">
    <title>[1404.1578] Models as Approximations: How Random Predictors and Model Violations Invalidate Classical Inference in Regression</title>
    <dc:date>2015-02-01T13:43:35+00:00</dc:date>
    <link>http://arxiv.org/abs/1404.1578</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We review and interpret the early insights of Halbert White who over thirty years ago inaugurated a form of statistical inference for regression models that is asymptotically correct even under "model misspecification," that is, under the assumption that models are approximations rather than generative truths. This form of inference, which is pervasive in econometrics, relies on the "sandwich estimator" of standard error. Whereas linear models theory in statistics assumes models to be true and predictors to be fixed, White's theory permits models to be approximate and predictors to be random. Careful reading of his work shows that the deepest consequences for statistical inference arise from a synergy --- a "conspiracy" --- of nonlinearity and randomness of the predictors which invalidates the ancillarity argument that justifies conditioning on the predictors when they are random. Unlike the standard error of linear models theory, the sandwich estimator provides asymptotically correct inference in the presence of both nonlinearity and heteroskedasticity. An asymptotic comparison of the two types of standard error shows that discrepancies between them can be of arbitrary magnitude. If there exist discrepancies, standard errors from linear models theory are usually too liberal even though occasionally they can be too conservative as well. A valid alternative to the sandwich estimator is provided by the "pairs bootstrap"; in fact, the sandwich estimator can be shown to be a limiting case of the pairs bootstrap. We conclude by giving meaning to regression slopes when the linear model is an approximation rather than a truth. --- In this review we limit ourselves to linear least squares regression, but many qualitative insights hold for most forms of regression."

-- Very close to what I teach in my class, though I haven't really talked about sandwich variances.]]></description>
<dc:subject>have_read statistics regression linear_regression bootstrap misspecification estimation approximation in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:f929c523e284/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:regression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:linear_regression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:bootstrap"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:misspecification"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:estimation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://projecteuclid.org/euclid.bj/1402488943">
    <title>Dehling , Durieu , Tusche : Approximating class approach for empirical processes of dependent sequences indexed by functions</title>
    <dc:date>2015-01-24T14:13:31+00:00</dc:date>
    <link>http://projecteuclid.org/euclid.bj/1402488943</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We study weak convergence of empirical processes of dependent data (Xi)i≥0, indexed by classes of functions. Our results are especially suitable for data arising from dynamical systems and Markov chains, where the central limit theorem for partial sums of observables is commonly derived via the spectral gap technique. We are specifically interested in situations where the index class  is different from the class of functions f for which we have good properties of the observables (f(Xi))i≥0. We introduce a new bracketing number to measure the size of the index class  which fits this setting. Our results apply to the empirical process of data (Xi)i≥0 satisfying a multiple mixing condition. This includes dynamical systems and Markov chains, if the Perron–Frobenius operator or the Markov operator has a spectral gap, but also extends beyond this class, for example, to ergodic torus automorphisms."]]></description>
<dc:subject>empirical_processes approximation stochastic_processes markov_models dynamical_systems ergodic_theory mixing in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:b57eb1847383/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:empirical_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:markov_models"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:dynamical_systems"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:ergodic_theory"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:mixing"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.stat.cmu.edu/~kass/papers/nonlinearFunctions.pdf">
    <title>Fully Exponential Laplace Approximations to Expectations and Variances of Nonpositive Functions (Tierny, Kass and Kadane, 1989)</title>
    <dc:date>2015-01-24T00:36:04+00:00</dc:date>
    <link>http://www.stat.cmu.edu/~kass/papers/nonlinearFunctions.pdf</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA[The un-numbered equation defining $A_K$, between (2.3) and (2.4), is wrong --- the O(1/n) terms are off by various powers of $\sigma^2$.  However, both (2.3) and (2.4) are right...]]></description>
<dc:subject>in_NB have_read laplace_approximation statistics approximation kith_and_kin kass.robert re:HEAS</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:46598bd52c5d/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:laplace_approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:kith_and_kin"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:kass.robert"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:re:HEAS"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://arxiv.org/abs/1405.0058">
    <title>[1405.0058] Underestimating extreme events in power-law behavior due to machine-dependent cutoffs</title>
    <dc:date>2015-01-23T13:45:39+00:00</dc:date>
    <link>http://arxiv.org/abs/1405.0058</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Power-law distributions are typical macroscopic features occurring in almost all complex systems observable in nature. As a result, researchers in quantitative analyses must often generate random synthetic variates obeying power-law distributions. The task is usually performed through standard methods that map uniform random variates into the desired probability space. Whereas all these algorithms are theoretically solid, in this paper we show that they are subject to severe machine-dependent limitations. As a result, two dramatic consequences arise: (i) the sampling in the tail of the distribution is not random but deterministic; (ii) the moments of the sample distribution, which are theoretically expected to diverge as functions of the sample sizes, converge instead to finite values. We provide quantitative indications for the range of distribution parameters that can be safely handled by standard libraries used in computational analyses. Whereas our findings indicate possible reinterpretations of numerical results obtained through flawed sampling methodologies, they also pave the way for the search for a concrete solution to this central issue shared by all quantitative sciences dealing with complexity."]]></description>
<dc:subject>to:NB to_read heavy_tails approximation computational_statistics have_skimmed</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:3a0596bb0021/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:heavy_tails"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:computational_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_skimmed"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.gutenberg.org/ebooks/38079">
    <title>Orders of Infinity by G. H. Hardy - Free Ebook</title>
    <dc:date>2014-10-21T22:32:05+00:00</dc:date>
    <link>http://www.gutenberg.org/ebooks/38079</link>
    <dc:creator>cshalizi</dc:creator><dc:subject>to:NB books:recommended mathematics analysis asymptotics approximation hardy.g.h. have_skimmed via:teddy_s.</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:819925514c0d/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:books:recommended"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:mathematics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:analysis"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:asymptotics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:hardy.g.h."/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_skimmed"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:via:teddy_s."/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://cran.r-project.org/web/packages/crs/vignettes/spline_primer.pdf">
    <title>A Primer on Regression Splines</title>
    <dc:date>2014-05-02T15:27:58+00:00</dc:date>
    <link>http://cran.r-project.org/web/packages/crs/vignettes/spline_primer.pdf</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["B-splines constitute an appealing method for the nonparametric estimation of a range of statis- tical objects of interest. In this primer we focus our attention on the estimation of a conditional mean, i.e. the ‘regression function’."]]></description>
<dc:subject>splines nonparametrics regression approximation statistics computational_statistics racine.jeffrey_s. to_teach:statcomp to_teach:undergrad-ADA have_read in_NB</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:c6d32a51d6f0/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:splines"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:nonparametrics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:regression"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:computational_statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:racine.jeffrey_s."/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_teach:statcomp"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_teach:undergrad-ADA"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:have_read"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:in_NB"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.pnas.org/content/111/6/2385.abstract">
    <title>Distortion of genealogical properties when the sample is very large</title>
    <dc:date>2014-03-11T21:00:33+00:00</dc:date>
    <link>http://www.pnas.org/content/111/6/2385.abstract</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands, if not millions, of individuals. In addition to posing computational challenges, such large sample sizes call for carefully reexamining the theoretical foundation underlying commonly used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models, and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright–Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For recently inferred demographic scenarios, we find that there are a significant number of multiple- and simultaneous-merger events under the DTWF model, which are absent in the coalescent by construction. Furthermore, for large sample sizes, there are noticeable differences in the expected number of rare variants between the coalescent and the DTWF model. To balance the trade-off between accuracy and computational efficiency, we propose a hybrid algorithm that uses the DTWF model for the recent past and the coalescent for the more distant past. Our results demonstrate that the hybrid method with only a handful of generations of the DTWF model leads to a frequency spectrum that is quite close to the prediction of the full DTWF model."]]></description>
<dc:subject>to:NB stochastic_processes approximation historical_genetics statistics</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:06eb14984d8a/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:historical_genetics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://arxiv.org/abs/1402.1694">
    <title>[1402.1694] Asymptotically Exact MCMC Algorithms via Local Approximations of Computationally Intensive Models</title>
    <dc:date>2014-02-21T00:28:45+00:00</dc:date>
    <link>http://arxiv.org/abs/1402.1694</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We construct a new framework for accelerating MCMC algorithms for sampling from posterior distributions in the context of computationally intensive models. We proceed by constructing local surrogates of the forward model within the Metropolis-Hastings kernel, borrowing ideas from deterministic approximation theory, optimization, and experimental design. Our work departs from previous work in surrogate-based inference by exploiting useful convergence characteristics of local approximations. We prove the ergodicity of our approximate Markov chain and show that it samples asymptotically from the \emph{exact} posterior distribution of interest. We describe variations of the algorithm that construct either local polynomial approximations or Gaussian process regressors, thus spanning two important classes of surrogate models. Our theoretical results reinforce the key observation underlying this paper: when the likelihood has some \emph{local} regularity, the number of model evaluations per MCMC step can be greatly reduced, without incurring significant bias in the Monte Carlo average. Our numerical experiments demonstrate order-of-magnitude reductions in the number of forward model evaluations used in representative ODE or PDE inference problems, in both real and synthetic data examples."]]></description>
<dc:subject>to:NB monte_carlo statistics approximation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:b733c3ea2120/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:monte_carlo"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://arxiv.org/abs/1311.7286">
    <title>[1311.7286] Approximate Bayesian Computation with composite score functions</title>
    <dc:date>2013-12-16T15:22:23+00:00</dc:date>
    <link>http://arxiv.org/abs/1311.7286</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Both Approximate Bayesian Computation (ABC) and composite likelihood methods are useful for Bayesian and frequentist inference when the likelihood function is intractable. We show that composite likelihoods score functions can be fruitfully used as automatic informative summary statistics in ABC in order to obtain accurate approximations to the posterior distribution of the parameter of interest. This is formally motivated by the use of the score function of the full likelihood, and extended to general unbiased estimating functions in complex models. Examples illustrate that the proposed ABC procedure can significantly improve upon usual ABC methods based on ordinary data summaries."]]></description>
<dc:subject>to:NB approximation likelihood statistics estimation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:2557276541a9/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:likelihood"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:estimation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://arxiv.org/abs/1310.5543">
    <title>[1310.5543] Universalities of Reproducing Kernels Revisited</title>
    <dc:date>2013-10-23T14:10:33+00:00</dc:date>
    <link>http://arxiv.org/abs/1310.5543</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["Kernel methods have been widely applied to machine learning and other questions of approximating an unknown function from its finite sample data. To ensure arbitrary accuracy of such approximation, various denseness conditions are imposed on the selected kernel. This note contributes to the study of universal, characteristic, and C0-universal kernels. We first give simple and direct description of the difference and relation among these three kinds of universalities of kernels. We then focus on translation-invariant and weighted polynomial kernels. A simple and shorter proof of the known characterization of characteristic translation-invariant kernels will be presented. The main purpose of the note is to give a delicate discussion on the universalities of weighted polynomial kernels."]]></description>
<dc:subject>to:NB kernel_methods hilbert_space approximation learning_theory</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:c450e5f20ae1/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:kernel_methods"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:hilbert_space"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:learning_theory"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://arxiv.org/abs/1212.6906">
    <title>[1212.6906] Central Limit Theorems and Multiplier Bootstrap when p is much larger than n</title>
    <dc:date>2013-09-09T03:36:11+00:00</dc:date>
    <link>http://arxiv.org/abs/1212.6906</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["We derive a central limit theorem for the maximum of a sum of high dimensional random vectors. Specifically, we establish conditions under which the distribution of the maximum is approximated by that of the maximum of a sum of the Gaussian random vectors with the same covariance matrices as the original vectors. The key innovation of this result is that it applies even when the dimension of random vectors (p) is large compared to the sample size (n); in fact, p can be much larger than n. We also show that the distribution of the maximum of a sum of the random vectors with unknown covariance matrices can be consistently estimated by the distribution of the maximum of a sum of the conditional Gaussian random vectors obtained by multiplying the original vectors with i.i.d. Gaussian multipliers. This is the multiplier bootstrap procedure. Here too, p can be large or even much larger than n. These distributional approximations, either Gaussian or conditional Gaussian, yield a high-quality approximation to the distribution of the original maximum, often with approximation error decreasing polynomially in the sample size, and hence are of interest in many applications. We demonstrate how our central limit theorem and the multiplier bootstrap can be used for high dimensional estimation, multiple hypothesis testing, and adaptive specification testing. All these results contain non-asymptotic bounds on approximation errors. "

- For the reading group this week.]]></description>
<dc:subject>to:NB empirical_processes stochastic_processes approximation central_limit_theorem bootstrap statistics probability</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:038f7e73d1e5/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to:NB"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:empirical_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:stochastic_processes"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:central_limit_theorem"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:bootstrap"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:statistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:probability"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://press.princeton.edu/titles/8624.html">
    <title>Bryant, J. and Sangwin, C.: How Round Is Your Circle? Where Engineering and Mathematics Meet.</title>
    <dc:date>2013-05-10T21:29:58+00:00</dc:date>
    <link>http://press.princeton.edu/titles/8624.html</link>
    <dc:creator>cshalizi</dc:creator><description><![CDATA["How do you draw a straight line? How do you determine if a circle is really round? These may sound like simple or even trivial mathematical problems, but to an engineer the answers can mean the difference between success and failure. How Round Is Your Circle? invites readers to explore many of the same fundamental questions that working engineers deal with every day--it's challenging, hands-on, and fun.
"John Bryant and Chris Sangwin illustrate how physical models are created from abstract mathematical ones. Using elementary geometry and trigonometry, they guide readers through paper-and-pencil reconstructions of mathematical problems and show them how to construct actual physical models themselves--directions included. It's an effective and entertaining way to explain how applied mathematics and engineering work together to solve problems, everything from keeping a piston aligned in its cylinder to ensuring that automotive driveshafts rotate smoothly. Intriguingly, checking the roundness of a manufactured object is trickier than one might think. When does the width of a saw blade affect an engineer's calculations--or, for that matter, the width of a physical line? When does a measurement need to be exact and when will an approximation suffice? Bryant and Sangwin tackle questions like these and enliven their discussions with many fascinating highlights from engineering history. Generously illustrated, How Round Is Your Circle? reveals some of the hidden complexities in everyday things."]]></description>
<dc:subject>books:noted approximation geometry engineering modeling to_teach books:owned</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:cshalizi/b:9b2ded06b1c9/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:books:noted"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:approximation"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:geometry"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:engineering"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:modeling"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:to_teach"/>
	<rdf:li rdf:resource="https://pinboard.in/u:cshalizi/t:books:owned"/>
</rdf:Bag></taxo:topics>
</item>
</rdf:RDF>