In this post we highlight some of the talks to be delivered at the Southern Data Science Conference in Atlanta on April 7, 2017.
Talk Title: Community Discovery in Multi-Faceted Graphs
Speaker: Ahmed Metwally, Data Scientist at Google
A multi-faceted graph defines several facets on a set of nodes. Each facet is a set of edges that represent the relationships between the nodes in a specific context. Mining multi-faceted graphs has several applications, including finding fraudster rings that launch advertising traffic fraud attacks, tracking the IP addresses of botnets over time, and analyzing interactions on social networks and co-authorship of scientific papers. In this talk, we discuss NeSim, an efficient, distributed clustering algorithm that performs soft clustering on individual facets. NeSim employs optimizations that enhance its scalability, its efficiency, and the quality of its clusters. We also report our evaluation of NeSim on several real and synthetic datasets, where NeSim is shown to be superior to MCL, Jarvis-Patrick, and Affinity Propagation, all well-established clustering algorithms. In addition, we discuss the MuFace framework, which employs general-purpose graph-clustering algorithms in a novel way to discover communities across facets. Because of these qualities, NeSim serves as the backbone of the MuFace framework.
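To make the definition above concrete, here is a minimal Python sketch of a multi-faceted graph: one shared node set plus several named facets, each holding its own edge set. The class name, method names, and facet labels are hypothetical and chosen only for illustration; this is not NeSim or MuFace, whose internals the abstract does not describe. A per-facet clustering algorithm would work on one facet's edges at a time, which is what `neighbors` restricts to.

```python
from collections import defaultdict


class MultiFacetedGraph:
    """Hypothetical sketch: a shared node set with several named edge sets (facets)."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        # facet name -> set of undirected edges, each stored as frozenset({u, v})
        self.facets = defaultdict(set)

    def add_edge(self, facet, u, v):
        """Add an undirected edge (u, v) to a single facet."""
        if u not in self.nodes or v not in self.nodes:
            raise KeyError(f"unknown node in edge ({u!r}, {v!r})")
        self.facets[facet].add(frozenset((u, v)))

    def neighbors(self, facet, node):
        """Return the neighbors of `node` considering only edges in `facet`."""
        result = set()
        # .get avoids creating an empty facet as a side effect of a lookup
        for edge in self.facets.get(facet, ()):
            if node in edge:
                result |= edge - {node}
        return result


# Usage: the same IP addresses related in two different (hypothetical) contexts.
g = MultiFacetedGraph(["ip1", "ip2", "ip3"])
g.add_edge("same_publisher", "ip1", "ip2")
g.add_edge("same_time_window", "ip2", "ip3")
print(g.neighbors("same_publisher", "ip2"))
print(g.neighbors("same_time_window", "ip2"))
```

Keeping the facets as separate edge sets over one node set means the same pair of nodes can be related in one context and unrelated in another, which is the property that makes clustering each facet individually meaningful.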
Talk Title: Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Speaker: Trey Grainger, SVP of Engineering at Lucidworks
Search engines, recommendation systems, advertising networks, and even data analytics tools all share the same end goal: to deliver the most relevant information possible to meet a given information need (usually in real time). Perfecting these systems requires algorithms that can build a deep understanding of the domains represented by the underlying data, understand the nuanced ways in which words and phrases should be parsed and interpreted in different contexts, score the relationships between arbitrary phrases and concepts, continually learn from users' context and interactions to make the system smarter, and generate custom models of personalized tastes for each user of the system. In this talk, we'll dive into the philosophical questions associated with such systems ("How do you accurately represent and interpret the meaning of words?", "How do you prevent filter bubbles?", etc.) and look at practical examples of how these systems have been successfully implemented in production by combining a variety of commercial and open source components (inverted indexes, entity extraction, similarity scoring and machine-learned ranking, auto-generated knowledge graphs, phrase interpretation and concept expansion, etc.).
Talk Title: Data Science with Java
Speaker: Michael Brzustowicz, Machine Learning Research Scientist at Google
A good data scientist knows how to do something really well, but a great data scientist can do “something of everything”. From raw data all the way to shining in front of C-level executives, a great data scientist has the skills to architect data systems, build applications, perform modeling and machine learning, and wrap up the results in a clear presentation that tells a story. Few programming languages can handle all of those tasks. While languages like R and Python have cemented their place in the data science community, Java has yet to win the data science mindshare it deserves. In this talk we will break down some of the barriers to mastering data science with Java and demonstrate a simple approach to statistical and machine learning algorithms.