TY - GEN
T1 - An incremental data-stream sketch using sparse random projections
AU - Menon, Aditya Krishna
AU - Pham, Gia Vinh Anh
AU - Chawla, Sanjay
AU - Viglas, Anastasios
PY - 2007
Y1 - 2007
N2 - We propose the use of random projections with a sparse matrix to maintain a sketch of a collection of high-dimensional data-streams that are updated asynchronously. This sketch allows us to estimate L2 (Euclidean) distances and dot-products with high accuracy. We verify the validity of this sketch by applying it to an online clustering problem, where we compare our results to the offline algorithm and an existing L2 sketch, and observe comparable results in terms of accuracy, and a reduced runtime cost.
AB - We propose the use of random projections with a sparse matrix to maintain a sketch of a collection of high-dimensional data-streams that are updated asynchronously. This sketch allows us to estimate L2 (Euclidean) distances and dot-products with high accuracy. We verify the validity of this sketch by applying it to an online clustering problem, where we compare our results to the offline algorithm and an existing L2 sketch, and observe comparable results in terms of accuracy, and a reduced runtime cost.
UR - http://www.scopus.com/inward/record.url?scp=70449094532&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972771.62
DO - 10.1137/1.9781611972771.62
M3 - Conference contribution
AN - SCOPUS:70449094532
SN - 9780898716306
T3 - Proceedings of the 7th SIAM International Conference on Data Mining
SP - 563
EP - 568
BT - Proceedings of the 7th SIAM International Conference on Data Mining
PB - Society for Industrial and Applied Mathematics Publications
T2 - 7th SIAM International Conference on Data Mining
Y2 - 26 April 2007 through 28 April 2007
ER -