R - k-means clustering on a fixed subset of the variables
I am a beginner in R and data analysis. I have a data set of around 2500 rows and 7 columns. I want to cluster the data set into 15 centers on the basis of the first 2 columns, while keeping the other columns intact in the clustered data set.
I then need to display the clustered data set sorted on the third column.
Can someone help me with the required syntax? Let's say the CSV file is named locdata.csv, the first 2 columns are "lat" and "lon", and the third column is "date".
This should get you there.
First, create a dataset (alternatively, import your CSV file):
set.seed(1)
df <- data.frame(matrix(rnorm(n = 10000, mean = 10, sd = 20), ncol = 8))
names(df)[1:3] <- c("lat", "lon", "date")
# to use your own data, load the file instead: df <- read.csv("locdata.csv")

library(dplyr)
cluster.df <- select(df, lat, lon)  # select the columns to cluster on
km <- kmeans(cluster.df, centers = 15)
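If you prefer not to load dplyr, the same column selection can be done with base R indexing. The sketch below is a variant, not part of the original answer: it also sets `nstart = 25` and `iter.max = 50` (both arbitrary choices), so that `kmeans` tries several random starting configurations and keeps the one with the lowest total within-cluster sum of squares. This makes the result less dependent on the random initial centers.

```r
# Simulated data standing in for locdata.csv
# (replace with df <- read.csv("locdata.csv"))
set.seed(1)
df <- data.frame(matrix(rnorm(n = 10000, mean = 10, sd = 20), ncol = 8))
names(df)[1:3] <- c("lat", "lon", "date")

# Cluster only on the first two columns, using base R column selection
cluster.df <- df[, c("lat", "lon")]

# nstart = 25 runs 25 random initialisations and keeps the best one;
# iter.max = 50 allows more iterations per run than the default of 10
km <- kmeans(cluster.df, centers = 15, nstart = 25, iter.max = 50)
```

`km$cluster` then holds one label (1 to 15) per row of `df`, in the original row order.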
Next, you can extract the clusters, using the fact that kmeans retains the original row order:
# extract the cluster assignments and add them to the original data frame
df$cluster <- km$cluster
# sort on whichever column you prefer
df %>% arrange(date, cluster)
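In the simulated data `date` is numeric, but in a real CSV it will usually be read as text, and text dates sort alphabetically rather than chronologically. A minimal sketch, assuming the dates in locdata.csv are written as YYYY-MM-DD (change the `format` argument otherwise); the small data frame below is made up for illustration. It uses base R `order()` as the equivalent of `arrange(date, cluster)`, and also checks that the per-cluster means of lat and lon in the augmented data frame match `km$centers`, which confirms that the cluster labels line up with the original rows.

```r
# Made-up stand-in for locdata.csv: lat/lon plus a text date column
set.seed(42)
n <- 200
df <- data.frame(
  lat  = rnorm(n, mean = 45, sd = 5),
  lon  = rnorm(n, mean = 10, sd = 5),
  date = format(as.Date("2020-01-01") + sample(0:365, n, replace = TRUE)),
  stringsAsFactors = FALSE
)

# Convert the text dates to Date objects so they sort chronologically
df$date <- as.Date(df$date, format = "%Y-%m-%d")

# Cluster on lat/lon only and attach the labels to the full data frame
km <- kmeans(df[, c("lat", "lon")], centers = 15, nstart = 10)
df$cluster <- km$cluster

# Sanity check: mean lat/lon of each cluster (rows ordered by cluster 1..15)
means <- aggregate(cbind(lat, lon) ~ cluster, data = df, FUN = mean)

# Base R equivalent of df %>% arrange(date, cluster)
sorted <- df[order(df$date, df$cluster), ]
```

If the labels were misaligned with the rows, the means computed from `df` would not reproduce `km$centers`.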