SDM 2014 Workshop on Exploratory Data Analysis

Time and Place

April 26th, 2014, 08:30 - 17:00.

Sheraton Society Hill Hotel, Philadelphia, PA, USA


Organizers

Remco Chang, Tufts University
Jaegul Choo, Georgia Institute of Technology
Zhicheng "Leo" Liu, Adobe Research
Haesun Park, Georgia Institute of Technology


Goals and Topics of Interest

Data mining and machine learning research often tackles problems in a fully automated manner, while other research communities, such as human-computer interaction and information visualization, do not take full advantage of recent advances in state-of-the-art data mining techniques.

The primary goal of the workshop is to fill this gap by bringing together researchers from both sides. The workshop should provide an opportunity to discuss and explore ways to combine the power of data mining techniques with the human-in-the-loop, exploratory nature of data analysis.

Topics include

  • Interactive data mining algorithms. Most data mining algorithms work in a fully automated manner. How can these algorithms support user interaction so that they properly incorporate the user's intent? How can they provide meaningful interpretations of their results?
  • Visualizations for interactive data mining. One of the most effective ways to facilitate exploratory analysis is to visualize the data and the output of data mining algorithms. Data usually contain multiple types of information, and data mining techniques also produce complex outputs. How can we effectively deliver such information to humans via visualization?
  • Visual analytics systems based on data mining techniques. Integrating visualizations and various interactions that support user intentions into a mature visual analytics system is crucial. How can data mining techniques be effectively integrated into visual analytics systems?
  • Any-time data mining algorithms. To provide smooth interactions with data mining techniques, it is crucial that they return results whenever users request them. How can we accelerate data mining algorithms, and what aspects should we exploit to serve this purpose?
  • Speedup vs. accuracy tradeoff in visual analytics. How can we effectively trade a reduction in the accuracy of a visual analytics procedure for a significant computational speedup? What are the practical and theoretical aspects of this issue?
  • Fundamental limits and theory. What are the important theoretical issues in visual analytics scalability? What is the impact of the curse of dimensionality on visual analytics? Are there any fundamental laws of diminishing returns?
  • Visual analytics on limited computational platforms. What strategies and algorithms can be employed when large problems are scaled to small devices such as the iPhone, iPad, and other mobile devices? What are the strengths and weaknesses of mobile operating systems such as Windows Mobile, Android, and iPhone OS? How can we design a flexible environment that scales up to megapixel displays and down to handhelds?

Beyond these topics, the workshop will broadly cover related directions such as

  • Demonstrations of interactive data mining.
  • On-line algorithms.
  • Adaptive stream mining algorithms.
  • Active learning / mining.

Planned Activities

We intend to have four sessions in the one-day workshop. The two morning sessions will contain eight invited talks, each 25 minutes long. The third session, after a one-hour lunch break, will contain three more invited talks, each 25 minutes long, followed by quick poster presentations, each 5 minutes long. During the morning and afternoon breaks, attendees will be able to view the posters and interact with the poster presenters. All posters will be based on contributed submissions. The organizers will review the submissions and select high-quality ones for poster presentation. We plan to encourage participation by graduate students and junior researchers. We anticipate a large number of submissions from a wide range of communities, especially since we plan to solicit submissions from the NSF FODAVA teams, DARPA XDATA teams, and other related groups, such as universities affiliated with a DHS Center of Excellence.

Finally, the last session will be a panel discussion. It will focus on challenges in integrating data mining/machine learning and InfoVis/HCI for big data. The panelists will be recruited from experts in the field, including the invited speakers. One or more of the organizers will moderate the panel. Questions will be taken from the audience, and the moderator and panelists will be free to contribute additional questions.

Submission Information

We call for posters relevant to the topics described above. Submissions must be in PDF, written in English, no more than 4 pages long (2 pages or just an abstract would be fine), and formatted using the SIAM SODA macro. All submissions must be made electronically.

Key Dates

01/10/2014: Submission Due
01/31/2014: Author Notification
02/10/2014: Camera-ready

Workshop Schedule

Exploratory Data Analysis
Saturday, April 26, 2014

Each entry below gives the speakers (with affiliation) followed by the talk title.

Welcome: 8:30am

Session 1: 8:30am - 10:00am, Invited Talks I (Chair: Jaegul Choo)

  • Carlos Scheidegger (AT&T Labs): RCloud and the Gap between EDA, Collaboration, and Deployment
  • Lisa Singh (Georgetown Univ.): Improving Data Exploration through the Use of Interactive Workflows and Visual Analytics
  • John Stasko (Georgia Tech): The Value of Visualization for Exploring and Understanding Data

Break: 10:00am - 10:30am

Session 2: 10:30am - 12:30pm, Invited Talks II (Chair: TBD)

  • Jieping Ye (Arizona State Univ.): Multi-Source Feature Learning
  • Fei Sha (Univ. of Southern California): New Methods for Learning to Cluster
  • Polo Chau (Georgia Tech): Scalable, Interactive, and Comprehensible Tools for Data Analytics
  • Tamara Munzner (Univ. of British Columbia): Dimensionality Reduction from Three Angles

Lunch: 12:30pm - 2:00pm

Session 3: 2:00pm - 3:00pm, Invited Talks III (Chair: TBD)

  • Klaus Mueller (Stony Brook Univ.): Model-Driven Visual Analytics
  • Srinivasan Parthasarathy (Ohio State Univ.): On the Roles of Visualization within Data Analytics

Break: 3:00pm - 3:30pm

Session 4: 3:30pm - 4:00pm, Poster Presentation (Oral) (Chair: Ramakrishnan Kannan)

  • Adam Perer: Exploratory Data Analysis for Frequent Sequence Mining
  • Axel Soto, Ryan Kiros, Vlado Keselj and Evangelos Milios: In-depth Interactive Visual Exploration for Bridging Unstructured and Structured Document Content
  • Alexios Kotsifakos: Case Study: Model-based vs Distance-based Search in Time Series Databases
  • Di Wang, Xiaoqin Zhang, Tang Tang and Mingyu Fan: Hierarchical Mixing Linear SVMs based on Rademacher Complexity
  • Maria Barouti, Daniel Keren, Jacob Kogan and Yaakov Malinovsky: Adaptive Clustering for Monitoring Distributed Data Streams

Session 5: 4:00pm - 4:45pm, Panel Discussion (Chair: Adam Perer)

  • Fei Sha (Univ. of Southern California), Klaus Mueller (Stony Brook Univ.), Srinivasan Parthasarathy (Ohio State Univ.), and Tamara Munzner (Univ. of British Columbia): Integration of data mining with exploratory analysis: Challenges and Opportunities