I would like to use something like ggplot2's cut_number to split a column into buckets with approximately the same number of observations. My dataset is in a compact form in which each row carries a weight (the number of observations it represents).
Example data frame:
df <- data.frame(
If there were one observation of x per row, I would simply use df$bucket <- cut_number(df$x, 3) to segment x into 3 buckets with approximately the same number of observations. But how do I take into account that each row is weighted by some number of observations? I'd like to avoid expanding each row into weight rows, since the original data frame already has millions of rows.
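To make the goal concrete, here is a minimal sketch of the behaviour I am after, using only base R. The function name weighted_cut_number, the midpoint rule it uses, and the small example data frame are all assumptions made for illustration; this is not what cut_number does internally, and it returns integer bucket indices rather than interval labels.

```r
# Hypothetical example data: each row stands for `weight` observations of x.
df <- data.frame(
  x      = c(1, 2, 3, 4, 5, 6),
  weight = c(3, 1, 2, 2, 1, 3)
)

# Sketch: assign each row to one of n buckets holding roughly equal total weight,
# without expanding rows. Assumes non-negative weights with a positive sum.
weighted_cut_number <- function(x, w, n) {
  # Total weight per distinct x value, so tied x values always share a bucket
  ux <- sort(unique(x))
  wx <- as.numeric(tapply(w, factor(x, levels = ux), sum))
  # Cumulative weight at the midpoint of each value, as a fraction of the total
  mid <- (cumsum(wx) - wx / 2) / sum(wx)
  # Map that fraction to a bucket index in 1..n
  b <- pmin(floor(mid * n) + 1, n)
  # Look up the bucket of each original row
  b[match(x, ux)]
}

df$bucket <- weighted_cut_number(df$x, df$weight, 3)

# Total weight that landed in each bucket
bucket_weights <- tapply(df$weight, df$bucket, sum)
```

Because a row is never split, the buckets can only be as balanced as the weights allow; a single very heavy x value will dominate whichever bucket it falls into.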