Anairda Learning AI: Representing a dataset as a multiset

Wednesday, 9 May 2018

Representing a dataset as a multiset

Representing a dataset as a multiset (a set in which the same element may occur more than once) is a fairly new approach, but it can reveal a lot about your data.
In this post I will present code that turns a 2D representation of your data (a scatter plot of two variables) into a 3D one, in which the third axis is how often each (x, y) pair occurs.
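In a multiset, each distinct record is kept once, together with its multiplicity (the number of times it occurs). A minimal base-R sketch of that idea, using a small made-up pair of vectors (`toy_x`, `toy_y` are illustrative names, not part of the original code):

```r
# Toy data: five (x, y) records in which the pair (1, 2) occurs three times
toy_x <- c(1, 1, 2, 1, 3)
toy_y <- c(2, 2, 5, 2, 4)

# Viewed as a multiset, the records collapse to the distinct pairs
# plus a multiplicity for each one
pairs <- paste(toy_x, toy_y, sep = ",")
multiplicity <- table(pairs)
print(multiplicity)
```

The multiplicities are exactly what the 3D plot puts on its vertical axis.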
So take two variables, x and y (preferably dependent ones, so their relationship is easier to see). Also make sure the data comes from the real world, where some (x, y) pairs appear more than once; if every pair is unique, all frequencies are 1 and the surface is flat:

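# install.packages(c("plyr", "akima", "lattice"))  # run once if needed
library(plyr)     # count()
library(akima)    # interp()
library(lattice)  # wireframe()

# x and y are your two numeric vectors of the same length
b <- data.frame(x = x, y = y)
head(b)

# Multiplicity of every distinct (x, y) pair: columns x, y, freq
z <- count(b, c("x", "y"))

# Linearly interpolate the frequencies onto a regular grid
# (grid points outside the convex hull of the data stay NA)
reggrid <- interp(z$x, z$y, z$freq, linear = TRUE, extrap = FALSE)

# Long format, so the axes show the real x and y values instead of grid indices
grid <- expand.grid(x = reggrid$x, y = reggrid$y)
grid$freq <- as.vector(reggrid$z)

wireframe(freq ~ x * y, data = grid,
          xlab = "Var_x", ylab = "Var_y", zlab = "Frequency",
          scales = list(arrows = FALSE), drape = TRUE, colorkey = TRUE)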
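If you would rather not depend on plyr, the counting step can be sketched in base R with `aggregate()`. This is an assumption-labeled alternative, not the post's method: `count_pairs` is a hypothetical helper, it expects a data frame with columns `x` and `y`, and it returns the same `x`, `y`, `freq` columns that `interp()` consumes. It also warns when no pair repeats, since the frequency surface would then be flat.

```r
# Hypothetical helper: base-R stand-in for plyr::count(b, c("x", "y"))
count_pairs <- function(b) {
  # Give every record a weight of 1, then sum the weights per distinct (x, y)
  z <- aggregate(freq ~ x + y, data = transform(b, freq = 1L), FUN = sum)
  if (all(z$freq == 1L)) {
    warning("no (x, y) pair occurs more than once; the frequency surface will be flat")
  }
  z
}

b <- data.frame(x = c(1, 1, 2, 1, 3), y = c(2, 2, 5, 2, 4))
z <- count_pairs(b)
print(z)
```

The row order differs from `plyr::count()` (base `aggregate()` groups by `y` first), which does not matter for the interpolation.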

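As you can see, instead of a cloud of dots in which repeated records overlap and look like single points, you get a surface whose peaks show which records dominate the set. This matters because the most frequent records have the greatest influence on any analysis you run on the data.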
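To list the dominant records explicitly rather than reading them off the peaks, a small base-R sketch can sort the multiplicities. `top_records` is a hypothetical helper (it assumes columns named `x` and `y`), and `share` is the fraction of all rows that each record accounts for:

```r
# Hypothetical helper: the k most frequent (x, y) records of a data frame
top_records <- function(b, k = 3) {
  counts <- aggregate(freq ~ x + y, data = transform(b, freq = 1L), FUN = sum)
  counts <- counts[order(counts$freq, decreasing = TRUE), ]
  # Fraction of the whole dataset taken up by each distinct record
  counts$share <- counts$freq / nrow(b)
  head(counts, k)
}

b <- data.frame(x = c(1, 1, 2, 1, 3, 3), y = c(2, 2, 5, 2, 4, 4))
res <- top_records(b, k = 2)
print(res)
```

Here (1, 2) covers half of the six rows, so it alone would pull any mean or regression fit strongly toward itself.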
