Wednesday, May 9, 2018

Representing a dataset as a multiset

Representing a dataset as a multiset is a rather new concept, but it can reveal a lot about your data.
In this post I will present code that takes your data from a 2D representation to a 3D one, with the frequency of each record as the third dimension.
So take 2 variables (preferably dependent, so you can see their relationship better). Also make sure the data comes from the real world, so that some (x, y) pairs appear more than once:
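If you have no such data at hand, a small synthetic set is enough to try the code. The sketch below is only an illustration: the values, the sample size, and the linear relationship between `x` and `y` are assumptions, not data from this post.

```r
# Hypothetical sample data: two dependent variables with repeated values
set.seed(42)                                     # make the example reproducible
x <- sample(1:10, 200, replace = TRUE)           # discrete values, so repeats are likely
y <- 2 * x + sample(-2:2, 200, replace = TRUE)   # y depends on x, plus small integer noise

# Check that the data really is a multiset: at least one (x, y) pair repeats.
# This is TRUE here, because 200 records share at most 50 distinct pairs.
any(duplicated(data.frame(x, y)))
```

Discrete values are used on purpose: with continuous measurements, exact repeats are rare unless the values are rounded first.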


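# x and y are two numeric vectors of equal length (your two variables)
b <- data.frame(x = x, y = y)
head(b)
library(plyr)
# Count how many times each (x, y) pair occurs; the result has a "freq" column
z <- count(b, c("x", "y"))
z
# install.packages("akima")  # run once if the package is not installed
library(akima)
library(lattice)  # provides wireframe()
# Interpolate the frequencies onto a regular grid
reggrid <- interp(z$x, z$y, z$freq, linear = TRUE, extrap = FALSE)
# Pick 5 evenly spaced grid positions and label them with the real values
x.at <- round(seq(1, length(reggrid$x), length.out = 5))
y.at <- round(seq(1, length(reggrid$y), length.out = 5))
x.ticks <- round(reggrid$x[x.at], 5)
y.ticks <- round(reggrid$y[y.at], 5)
wireframe(reggrid$z, xlab = "Var_x", ylab = "Var_y", zlab = "Frequency",
  scales = list(x = list(at = x.at, labels = x.ticks),
                y = list(at = y.at, labels = y.ticks), arrows = FALSE),
  drape = TRUE, colorkey = TRUE)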


As you can see, instead of getting just some dots, you see which records are dominant in the set (the peaks of the surface). This is very relevant, as they are the major influencers on your data analysis.
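The dominant records can also be read directly from the frequency table, without the plot. The sketch below uses a hypothetical toy data frame, and base R's `aggregate` stands in for `plyr::count` so that it runs without extra packages; the name `dominant` is introduced here only for illustration.

```r
# Hypothetical toy data: the pair (2, 6) occurs three times
b <- data.frame(x = c(1, 1, 2, 2, 2, 3),
                y = c(5, 5, 6, 6, 6, 7))

# Multiplicity of each distinct (x, y) pair: add a column of ones and sum it per pair
z <- aggregate(freq ~ x + y, data = transform(b, freq = 1), FUN = sum)

# Sort so that the most frequent (dominant) records come first
dominant <- z[order(-z$freq), ]
head(dominant)
```

The first rows of `dominant` correspond to the highest peaks of the surface.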

