Wednesday, July 21, 2010

A Survival Guide to Data Mining


A quick browse through the abundant resources available on Togaware reveals that there is also the Data Mining Desktop Survival Guide, a useful guide to the practical deployment, updating, and refinement of data mining projects, and to the algorithms and analytical tools available in the field. The guide is based on real examples of applying Rattle and, through it, the typical functions of the well-known open-source software R.
The author, Dr. Graham Williams, has been a data mining researcher and professor for over 15 years, with a long series of successful data mining applications in research and in public institutions. He has taught data mining for over 10 years and has published numerous papers on the subject. He currently holds the post of Principal Data Miner at the Australian Taxation Office and is responsible for technical support of the biggest data mining group in Australia.
The guide is available online at: http://datamining.togaware.com/

Friday, July 2, 2010

Rattle: User interface for data mining with R


Rattle (R Analytical Tool To Learn Easily) is a simple and logical user interface to R for data mining. It is a data mining application with a GNOME graphical interface, built on the popular open-source language R. The software runs on GNU/Linux, Macintosh OS X, and MS Windows. The interface provides practical and intuitive tools that let users easily follow every step of the fundamental data mining process, while also displaying the R code used at each step. The integrated graphical tools should be sufficient for all purposes and needs.
The latest release is available for download at: rattle.togaware.com
In this version, the user has the following sections available:
- Data: CSV import; R dataset support; ODBC
- Exploration: Summary; Correlations between variables; Hierarchical grouping of variables (dendrogram)
- Graphics: Box plots, Histograms, CDF, Benford's Law, Bar charts, Dot plots
- Cluster Analysis: KMeans; Hierarchical clustering with dendrogram
- Modeling: Decision trees (rpart), Generalized Linear Models, Boosting, Random Forests, Support Vector Machines
- Evaluation: Confusion Matrix, Risk Chart, Lift Chart, ROC curves and AUC, Accuracy, Sensitivity
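
To make the last item of the list more concrete, here is a minimal sketch of how a confusion matrix, accuracy, and sensitivity can be computed for a binary classifier. It is written in plain Python for illustration only and is not Rattle's own code (Rattle generates R code); the function names and the sample labels are assumptions made up for this example.

```python
# Illustrative sketch only: function names and data are hypothetical,
# not part of Rattle or R.

def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts

def accuracy(cm):
    """Share of all cases that were classified correctly."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0

def sensitivity(cm):
    """Share of actual positives that were detected (true positive rate)."""
    positives = cm["tp"] + cm["fn"]
    return cm["tp"] / positives if positives else 0.0

# Made-up labels: 1 = positive class, 0 = negative class.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
print(cm)               # {'tp': 3, 'fp': 1, 'tn': 3, 'fn': 1}
print(accuracy(cm))     # 0.75
print(sensitivity(cm))  # 0.75
```

Accuracy alone can be misleading when one class is rare, which is one reason to look at sensitivity, ROC curves, and lift charts alongside it.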

Ecological Ordination Methods


Anyone working in ecological data analysis should not overlook a very useful information resource provided by Oklahoma State University: the section of the university website better known to analysts as The Ordination Web Page, built entirely by Professor Michael W. Palmer.
Ordination is one of the most widely used methods for identifying relationships between ecological communities, writes Michael W. Palmer in The Ordination Web Page. The page was designed to answer some of the most frequently asked questions and is aimed particularly at students and novices. Here you can find a very large number of papers in which the different ordination techniques are described, compared, and discussed. The website covers both the simplest concepts (which, paradoxically, are also the hardest to find elsewhere) and the more complex topics: a section with general descriptions and references to further material of interest, a section dedicated exclusively to statistical methods, a section dedicated to software, and a section devoted to the processing and standardization of environmental and ecological data.

Wednesday, June 30, 2010

A book tells you how to use data mining to fight crime


Data mining can also be used to predict where and when crimes are most likely to occur in the future. This is not a line from the film Minority Report (Spielberg) but the thesis presented in the book Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis, written by Colleen McCue, senior researcher at RTI International. In her book, McCue says that predictive data mining can identify trends in crime and anticipate criminal events, allowing police forces to refine their internal decision processes. The same tools used to stock supermarket shelves can be used to make communities safer and improve civil life if adopted by police forces, McCue says, with some adjustments to the usual data mining guidelines and the use of specialized software tools.

Monday, June 28, 2010

APID: analysis of the interactions between proteins

Agile Protein Interaction DataAnalyzer (APID) is a web-based bioinformatics tool developed to let users analyze, through a unified and comparative platform, the main currently known information about protein-protein interactions identified with specific small-scale and large-scale experimental methods.
At the moment, the software integrates information from five major source datasets in a single database server, designed for the exploration of 40,000 different proteins and nearly 150,000 different interactions.
APID also includes graphical tools for interactive visualization of sub-networks and for navigating through them or along the entire interaction network.

For more information please visit http://bioinfow.dep.usal.es/apid/index.htm

Friday, June 25, 2010

Gapminder


Gapminder is a Swedish non-profit foundation that promotes sustainable development and the achievement of the Millennium Development Goals (MDGs) through the use of statistical data on economic, social, and environmental impacts, both locally and nationally. It defines itself as a "modern museum" that helps make the world understandable using the Internet. Gapminder World, available on the website, collects an impressive amount of data (population, energy, wealth, illness, work, trade, disasters ...) and lets you relate the indicators to each other on a graph or map.

R Analysis of Ecological Communities


Montana State University has published on its website a section containing extracts on the fundamental topics of its course Analysis of Ecological Communities.
This is a real laboratory that introduces analysts to the use of R and covers all the main analyses applied to this type of study (for example, Detrended Correspondence Analysis (DCA) and the tools derived from GIS). There is a series of example data files, as well as easy-to-consult structured data files of the kind traditionally used for plant species and ecological communities.
This guide lets those who are already familiar with R achieve high-quality results and, most importantly, learn how to apply these techniques in their own research.
Some of the sections it contains:
Introduction
Introduction to the use of R
Getting Data
Loading Vegetation Data and Simple Graphical Summaries
Loading Site Data and Simple Graphical Summaries
Summary tables
Modeling Species Distributions
Generalized Linear Models
Generalized additive models
Classification Trees
Ordination
Principal Component Analysis
Principal Coordinates Analysis
Multidimensional Scaling
Correspondence Analysis and Detrended Correspondence Analysis
Cluster Analysis
Discriminant Analysis with Tree Classifiers