<?xml version="1.0" encoding="UTF-8"?>
 <rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://web.resource.org/cc/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://pinboard.in">
    <title>Pinboard (rybesh)</title>
    <link>https://pinboard.in/u:rybesh/public/</link>
    <description>recent bookmarks from rybesh</description>
    <items>
      <rdf:Seq>	<rdf:li rdf:resource="https://github.com/overview/overview-server"/>
	<rdf:li rdf:resource="http://www.nactem.ac.uk/EventMine/"/>
	<rdf:li rdf:resource="http://blog.semantic-web.at/2012/09/12/state-of-the-art-text-mining-poolparty-extractor-2-1-1-released/"/>
	<rdf:li rdf:resource="http://trec-kba.org/corpus.shtml"/>
	<rdf:li rdf:resource="http://commoncrawl.org/twelve-steps-to-running-your-ruby-code-across-five-billion-web-pages/"/>
	<rdf:li rdf:resource="http://jmlr.csail.mit.edu/proceedings/papers/v17/sudhahar11a/sudhahar11a.pdf"/>
	<rdf:li rdf:resource="http://www.cs.princeton.edu/~blei/lda-c/"/>
	<rdf:li rdf:resource="http://www.inference.phy.cam.ac.uk/hmw26/crf/"/>
	<rdf:li rdf:resource="http://works.bepress.com/cgi/viewcontent.cgi?article=1026&amp;context=mireille_hildebrandt"/>
	<rdf:li rdf:resource="http://mininghumanities.com/2011/12/07/beautiful-in-shakespeare/"/>
	<rdf:li rdf:resource="http://dl.acm.org/citation.cfm?id=1600193.1600237"/>
	<rdf:li rdf:resource="http://lingpipe-blog.com/2011/05/27/price-is-right-binary-search-suffix-array-document/"/>
	<rdf:li rdf:resource="http://wikipedia-miner.sourceforge.net/index.htm"/>
	<rdf:li rdf:resource="http://aws.amazon.com/articles/5249664154115844"/>
	<rdf:li rdf:resource="http://tm.r-forge.r-project.org/"/>
	<rdf:li rdf:resource="http://www.crcnetbase.com/isbn/9781420059403"/>
      </rdf:Seq>
    </items>
  </channel>
<item rdf:about="https://github.com/overview/overview-server">
    <title>overview/overview-server</title>
    <dc:date>2014-07-08T22:29:26+00:00</dc:date>
    <link>https://github.com/overview/overview-server</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[The Overview Project is an open source visual document mining system. It was originally designed for investigative journalists, but is now also used for qualitative research, e-discovery, digital humanities, etc.]]></description>
<dc:subject>topicmodels infoviz textmining datamining organization inls201</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:577b2442bb84/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:topicmodels"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:infoviz"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:datamining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:organization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:inls201"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.nactem.ac.uk/EventMine/">
    <title>National Centre for Text Mining — NaCTeM — EventMine</title>
    <dc:date>2013-10-12T19:27:32+00:00</dc:date>
    <link>http://www.nactem.ac.uk/EventMine/</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[EventMine is a machine learning-based pipeline system, which extracts events from documents that already contain named entity annotations (e.g., genes/proteins, etc.). Given appropriate training data, it can be trained to extract many different types and structures of events.]]></description>
<dc:subject>textmining events</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:32c9fdf776f7/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:events"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://blog.semantic-web.at/2012/09/12/state-of-the-art-text-mining-poolparty-extractor-2-1-1-released/">
    <title>State-of-the-art Text Mining: PoolParty Extractor 2.1.1 released |The Semantic Puzzle</title>
    <dc:date>2012-09-15T20:18:57+00:00</dc:date>
    <link>http://blog.semantic-web.at/2012/09/12/state-of-the-art-text-mining-poolparty-extractor-2-1-1-released/</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[The idea behind PPX is to underpin automatic text mining algorithms with domain-specific knowledge from thesauri and linked data sources. This is the precondition to extract meaning from unstructured information more precisely and with higher performance. PoolParty Extractor supports the following application scenarios:

automatic document categorisation
named entity extraction based on concepts from thesauri or other knowledge models
text analysis to improve semantic indexing
automatic transformation of unstructured text to an RDF based linked data source
linking and enrichment of text with structured data from databases or XML-documents
extended indexing by using inflected forms of words and by splitting of compound words
generation and continuous improvement of thesauri by text corpus analysis
PoolParty Extractor can be integrated smoothly with third-party systems like CMS, DMS, communication platforms, wikis etc.]]></description>
<dc:subject>entitydetection extraction textmining linkeddata thesaurus inls520 semweb</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:9cf2a648382d/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:entitydetection"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:extraction"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:linkeddata"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:thesaurus"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:inls520"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:semweb"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://trec-kba.org/corpus.shtml">
    <title>Knowledge Base Acceleration (KBA) -- a track in NIST's TREC 2012</title>
    <dc:date>2012-03-27T13:01:23+00:00</dc:date>
    <link>http://trec-kba.org/corpus.shtml</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[The data for TREC KBA 2012 has two components: Target Entities (Filtering Queries) and Stream Corpus (Text Documents).]]></description>
<dc:subject>trec IR semweb textmining</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:b0f1b08bb5c6/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:trec"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:IR"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:semweb"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://commoncrawl.org/twelve-steps-to-running-your-ruby-code-across-five-billion-web-pages/">
    <title>Twelve steps to running your Ruby code across five billion web pages | CommonCrawl</title>
    <dc:date>2012-03-26T22:09:20+00:00</dc:date>
    <link>http://commoncrawl.org/twelve-steps-to-running-your-ruby-code-across-five-billion-web-pages/</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[A starting point to write your own Ruby algorithms to analyse the wealth of information that’s buried in the Common Crawl web archive.]]></description>
<dc:subject>ec2 hadoop web datamining textmining</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:8a34d45f4b5a/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:ec2"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:hadoop"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:web"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:datamining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://jmlr.csail.mit.edu/proceedings/papers/v17/sudhahar11a/sudhahar11a.pdf">
    <title>Automating Quantitative Narrative Analysis of News Data</title>
    <dc:date>2012-03-07T17:45:56+00:00</dc:date>
    <link>http://jmlr.csail.mit.edu/proceedings/papers/v17/sudhahar11a/sudhahar11a.pdf</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[We present a working system for large scale quantitative narrative analysis (QNA) of news corpora, which includes various recent ideas from text mining and pattern analysis in order to solve a problem arising in computational social sciences. The task is that of identifying the key actors in a body of news, and the actions they perform, so that further analysis can be carried out. This step is normally performed by hand and is very labour intensive.  We then characterise the actors by: studying their position in the overall network of actors and actions; studying the time series associated with some of their properties; generating scatter plots describing the subject/object bias of each actor; and investigating the types of actions each actor is most associated with. The system is demonstrated on a set of 100,000 articles about crime appeared on the New York Times between 1987 and 2007.  As an example, we find that Men were most commonly responsible for crimes against the person, while Women and Children were most often victims of those crimes.]]></description>
<dc:subject>textanalysis textmining events sociology news</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:cca208ccd2c0/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textanalysis"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:events"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:sociology"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:news"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.cs.princeton.edu/~blei/lda-c/">
    <title>Latent Dirichlet Allocation in C</title>
    <dc:date>2012-03-06T19:49:03+00:00</dc:date>
    <link>http://www.cs.princeton.edu/~blei/lda-c/</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[This is a C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. LDA allows you to analyze a corpus and extract the topics that combined to form its documents. For example, click here to see the topics estimated from a small corpus of Associated Press documents. LDA is fully described in Blei et al. (2003).

This code contains:

an implementation of variational inference for the per-document topic proportions and per-word topic assignments
a variational EM procedure for estimating the topics and exchangeable Dirichlet hyperparameter]]></description>
<dc:subject>lda c linguistics machinelearning textanalysis textmining</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:2469cf74384a/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:lda"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:c"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:linguistics"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:machinelearning"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textanalysis"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.inference.phy.cam.ac.uk/hmw26/crf/">
    <title>Conditional Random Fields</title>
    <dc:date>2012-02-03T15:29:07+00:00</dc:date>
    <link>http://www.inference.phy.cam.ac.uk/hmw26/crf/</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences, trees and lattices. The underlying idea is that of defining a conditional probability distribution over label sequences given a particular observation sequence, rather than a joint distribution over both label and observation sequences. The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world tasks in many fields, including bioinformatics, computational linguistics and speech recognition.]]></description>
<dc:subject>machinelearning nlp crf textmining metadata</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:25e87edcbc6e/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:machinelearning"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:nlp"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:crf"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:metadata"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://works.bepress.com/cgi/viewcontent.cgi?article=1026&amp;context=mireille_hildebrandt">
    <title>The Meaning and The Mining of Legal Texts</title>
    <dc:date>2012-01-08T23:37:36+00:00</dc:date>
    <link>http://works.bepress.com/cgi/viewcontent.cgi?article=1026&amp;context=mireille_hildebrandt</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[Positive law, inscribed in legal texts, entails an authority not inherent in literary texts, generating legal consequences that can have real effects on a person’s life and liberty. The interpretation of legal texts, necessarily a normative undertaking, resists the mechanical application of rules, though still requiring a measure of predictability, coherence with other relevant legal norms and compliance with constitutional safeguards. The present proliferation of legal texts on the internet (codes, statutes, judgments, treaties, doctrinal treatises) renders the selection of relevant texts and cases next to impossible. We may expect that systems to mine these texts to find arguments that support one’s case, as well as expert systems that support the decision-making process of courts, will end up doing much of the work.

This raises the question of the difference between human interpretation and computational pattern-recognition and the issue of whether this difference makes a difference for the meaning of law. Possibly, data mining will produce patterns that disclose habits of the minds of judges and legislators that would have otherwise gone unnoticed (reinforcing the argument of the ‘legal realists’ at the beginning of the 20th century). Also, after the data analysis it will still be up to the judge to decide how to interpret the results or up to the prosecution which patterns to engage in the construction of evidence (requiring a hermeneutics of computational patterns instead of texts). My focus in this paper regards the fact that the mining process necessarily disambiguates the legal texts in order to transform them into a machine-readable data set, while the algorithms used for the analysis embody a strategy that will co-determine the outcome of the patterns. There seems a major due process concern here to the extent that these patterns are invisible for the naked human eye and will not be contestable in a court of law, due to their hidden complexity and computational nature.

This position paper aims to explain what is at stake in the computational turn with regard to legal texts. This prepares for the question I want to put forward to those involved in distant reading and not-reading of texts: could a visualization of computational patterns constitute a new way of un-hiding the complexity involved, opening the results of computational ‘knowledge’ to citizens’ scrutiny?]]></description>
<dc:subject>textmining machinelearning visualization digitalhumanities law</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:0f19fa010aaa/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:machinelearning"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:visualization"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:digitalhumanities"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:law"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://mininghumanities.com/2011/12/07/beautiful-in-shakespeare/">
    <title>“Beautiful” in Shakespeare « Text Mining and the Digital Humanities</title>
    <dc:date>2011-12-15T19:59:07+00:00</dc:date>
    <link>http://mininghumanities.com/2011/12/07/beautiful-in-shakespeare/</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[Great, clear example of text mining using Wordseer.]]></description>
<dc:subject>digitalhumanities textmining textanalysis nlp infoviz examples</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:515bc4908744/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:digitalhumanities"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textanalysis"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:nlp"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:infoviz"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:examples"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://dl.acm.org/citation.cfm?id=1600193.1600237">
    <title>A panlingual anomalous text detector</title>
    <dc:date>2011-10-30T21:31:41+00:00</dc:date>
    <link>http://dl.acm.org/citation.cfm?id=1600193.1600237</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[In a large-scale book scanning operation, material can vary widely in language, script, genre, domain, print quality, and other factors, giving rise to a corresponding variability in the OCRed text. It is often desirable to automatically detect errorful and otherwise anomalous text segments, so that they can be filtered out or appropriately flagged, for such applications as indexing, mining, analyzing, displaying, and selectively re-processing such data. Moreover, it is advantageous to require that the automated detector be independent of the underlying OCR engine (or engines), that it work over a broad range of languages, that it seamlessly handle mixed-language material, and that it accommodate documents that contain domain-specific and otherwise rare terminology. A technique is presented that satisfies these requirements, using an adaptive mixture of character-level N-gram language models. Its design, training, implementation, and evaluation are described within the context of high-volume book scanning.]]></description>
<dc:subject>ocr textanalysis textmining evalulation</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:115cd34b297b/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:ocr"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textanalysis"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:evalulation"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://lingpipe-blog.com/2011/05/27/price-is-right-binary-search-suffix-array-document/">
    <title>Price-is-Right Binary Search (for Suffix Arrays of Documents) « LingPipe Blog</title>
    <dc:date>2011-06-01T15:53:12+00:00</dc:date>
    <link>http://lingpipe-blog.com/2011/05/27/price-is-right-binary-search-suffix-array-document/</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[Suffix arrays are useful if you’re looking for anything from plagiarized passages in a pile of writing assignments, cut-and-paste code blocks in a large project, or just commonly repeated phrases on Twitter.]]></description>
<dc:subject>search textanalysis textmining</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:0c2e4071747c/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:search"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textanalysis"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://wikipedia-miner.sourceforge.net/index.htm">
    <title>Wikipedia Miner - Home</title>
    <dc:date>2011-05-17T19:28:03+00:00</dc:date>
    <link>http://wikipedia-miner.sourceforge.net/index.htm</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[Wikipedia Miner is a toolkit for navigating and making use of the structure and content of Wikipedia. It aims to make it easy for you to integrate Wikipedia's knowledge into your own applications, by:

providing simplified, object-oriented access to Wikipedia's structure and content.
measuring how terms and concepts in Wikipedia are connected to each other.
detecting and disambiguating Wikipedia topics when they are mentioned in documents.]]></description>
<dc:subject>wikipedia textmining nlp webservices tools datamining</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:992947faf4a5/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:wikipedia"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:nlp"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:webservices"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:tools"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:datamining"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://aws.amazon.com/articles/5249664154115844">
    <title>Finding trending topics using Google Books n-grams data and Apache Hive on Elastic MapReduce : Articles &amp; Tutorials : Amazon Web Services</title>
    <dc:date>2011-02-08T17:34:08+00:00</dc:date>
    <link>http://aws.amazon.com/articles/5249664154115844</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[Finding trending topics using Google Books n-grams data and Apache Hive on Elastic MapReduce]]></description>
<dc:subject>hadoop digitalhumanities amazon cloud howto textmining tools</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:67d13d043b13/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:hadoop"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:digitalhumanities"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:amazon"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:cloud"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:howto"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:tools"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://tm.r-forge.r-project.org/">
    <title>tm - Text Mining Package</title>
    <dc:date>2010-10-19T16:33:30+00:00</dc:date>
    <link>http://tm.r-forge.r-project.org/</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[tm (shorthand for Text Mining Infrastructure in R) provides a framework for text mining applications within R.

The tm package offers functionality for managing text documents, abstracts the process of document manipulation and eases the usage of heterogeneous text formats in R. The package has integrated database backend support to minimize memory demands. An advanced meta data management is implemented for collections of text documents to alleviate the usage of large and with meta data enriched document sets.]]></description>
<dc:subject>R textmining datamining nlp tools statistics</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:9982fd6b02a5/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:R"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:datamining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:nlp"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:tools"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:statistics"/>
</rdf:Bag></taxo:topics>
</item>
<item rdf:about="http://www.crcnetbase.com/isbn/9781420059403">
    <title>CRCnetBASE - Text Mining</title>
    <dc:date>2010-09-10T18:37:51+00:00</dc:date>
    <link>http://www.crcnetbase.com/isbn/9781420059403</link>
    <dc:creator>rybesh</dc:creator><description><![CDATA[Giving a broad perspective of the field from numerous vantage points, Text Mining: Classification, Clustering, and Applications focuses on statistical methods for text mining and analysis. It examines methods to automatically cluster and classify text documents and applies these methods in a variety of areas, including adaptive information filtering, information distillation, and text search.

The book begins with chapters on the classification of documents into predefined categories. It presents state-of-the-art algorithms and their use in practice. The next chapters describe novel methods for clustering documents into groups that are not predefined. These methods seek to automatically determine topical structures that may exist in a document corpus. The book concludes by discussing various text mining applications that have significant implications for future research and industrial use.]]></description>
<dc:subject>textmining nlp</dc:subject>
<dc:source>https://pinboard.in/</dc:source>
<dc:identifier>https://pinboard.in/u:rybesh/b:972f5764d7b1/</dc:identifier>
<taxo:topics><rdf:Bag>	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:textmining"/>
	<rdf:li rdf:resource="https://pinboard.in/u:rybesh/t:nlp"/>
</rdf:Bag></taxo:topics>
</item>
</rdf:RDF>