Top Performers in Data Science


Forum for Top Performance course users in Data Science
 
Forum / Topics / Posts / Last Post

Data Science Resources

 

Data Science Blogs & Websites


Collection of data science resources: blogs, websites, online books (please post links to books that are available online).
Moderator: Moderators
Topics: 4, Posts: 13
Last post: Blogs and websit..., Sat Oct 31, 2015 8:44 pm, by Ellen Koenig

Data Science Textbooks


Collection of data science textbooks.
Topics: 1, Posts: 2
Last post: Data Science Tex..., Thu Oct 29, 2015 8:03 pm, by darkruby501

Data Science Software Resources


Data science software resources. While it's important to code your own implementations to gain a deeper understanding of an algorithm, we are most likely going to use publicly available implementations that have been heavily optimized and tested. Please aggregate software resources you've found on GitHub or other websites, with a description. Please make a new post for each software resource so that potential users can ask questions.
Topics: 0, Posts: 0

Data Science Papers


There are thousands of papers in data science (a few dozen are submitted to arXiv daily), so let's try not to overload the forum. Please only post papers that are particularly meaningful, that provide good summaries of the field, that you wish to discuss, or that are relevant to your project.
Topics: 2, Posts: 2
Last post: 50 Years of Data..., Sat Nov 07, 2015 2:47 pm, by Diran

Data Sets


Data sets are critically important: probably every project we design in this course will involve applying data science algorithms to some data set. There are many publicly available data sets covering an enormous range of topics. Please post links and information about public data sets so that we can use them to create cool projects. Additionally, highlight the size of the data set in your post or title so that people can quickly determine whether they have the computational resources to handle it.

I've decided to split data sets into three groups:
Small: 0 - 32 GB (fits in RAM)
Mid: 32 GB - 4 TB (fits on local disk)
Large: 4+ TB (larger than local disk)

This assumes that most people will be using commodity hardware. For a typical computer with 32 GB of RAM and 4 TB of disk storage, these sizes mark the points at which practitioners need to completely change their algorithmic approach to analysing the data. I recognize that some people may use Amazon Web Services for their calculations or have access to commodity servers, high-end computers or small clusters. In that case these bounds aren't correct, but the division still stands: data sets that cannot fit in RAM or on local disk have to be treated very differently.

Please make a new post for each independent data set so that potential users can ask questions.
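For anyone tagging a post, the size groups above can be sketched as a small helper. This is only an illustration: the function name `size_tier`, the use of decimal units (1 GB = 10^9 bytes) and the choice to count a data set of exactly 32 GB as Small are assumptions, not something the forum defines.

```python
# Sketch of the Small / Mid / Large split described above.
# Assumption: decimal units; the description does not say GB vs GiB.
GB = 10**9
TB = 10**12

SMALL_LIMIT = 32 * GB  # upper bound for "Small" (fits in RAM)
MID_LIMIT = 4 * TB     # upper bound for "Mid" (fits on one machine's disk)


def size_tier(num_bytes: int) -> str:
    """Return the forum size group for a data set of num_bytes bytes."""
    if num_bytes < 0:
        raise ValueError("size must be non-negative")
    if num_bytes <= SMALL_LIMIT:
        return "Small"
    if num_bytes <= MID_LIMIT:
        return "Mid"
    return "Large"


if __name__ == "__main__":
    for size in (5 * GB, 100 * GB, 25 * TB):
        print(size, size_tier(size))
```

The thresholds are kept as named constants so that anyone with more RAM or disk can adjust them to their own hardware.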
Topics: 4, Posts: 5
Last post: ClueWeb09 Datase..., Fri Oct 30, 2015 11:34 pm, by Admin

Mid-Range Data Sets

Out-of-core data sets (32 GB - 20 TB). Data sets larger than 32 GB cannot fit in main memory, so we need special algorithms to deal with them. Please aggregate data sets of this size here. I'm assuming commodity hardware; if anyone has access to a server, these size limits are completely irrelevant.
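As a rough idea of what "special algorithms" can mean here, the sketch below computes a mean by streaming over the data in fixed-size chunks, so only one chunk is held in memory at a time. The helper names `chunks` and `streaming_mean` and the default chunk size are hypothetical, chosen for illustration; real out-of-core work would typically read chunks from disk rather than from a Python iterable.

```python
from typing import Iterable, Iterator, List


def chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size items from values."""
    buf: List[float] = []
    for v in values:
        buf.append(v)
        if len(buf) == chunk_size:
            yield buf
            buf = []
    if buf:  # final, possibly shorter, chunk
        yield buf


def streaming_mean(values: Iterable[float], chunk_size: int = 1_000_000) -> float:
    """Compute the mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no values to average")
    return total / count


if __name__ == "__main__":
    # A generator stands in for a file too large to load at once.
    print(streaming_mean((float(i) for i in range(10)), chunk_size=3))
```

The same pattern (keep a small running summary, discard each chunk after use) carries over to sums, counts and other statistics that can be updated incrementally.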
Topics: 0, Posts: 0

Large Data sets


Data sets larger than 20 TB will probably need multiple computers and the use of message passing interfaces. If anyone finds data sets of this size, please post them here.
Topics: 1, Posts: 1
Last post: ClueWeb09 Datase..., Fri Oct 30, 2015 11:34 pm, by Admin

Small Data Sets


This is to aggregate data sets that are small enough to fit in RAM: less than 32 GB in size.
Topics: 3, Posts: 4
Last post: NLP Data Sets?..., Thu Oct 29, 2015 12:18 am, by Admin

Misc

Miscellaneous topics, anything not covered above
Topics: 2, Posts: 21
Last post: Introduce yourse..., Wed Nov 04, 2015 4:44 pm, by justkeepswimming

Introductions


Hey everyone, I know we already posted our backgrounds in the Top Performance Facebook groups, but that thread is very long. If you have time, please introduce yourself and post a short summary of your interest in data science, what you've done and where you hope to go. We're all at different points in the field, so we might find that some people have already overcome the issues we now face or are confronted by similar problems.
Topics: 1, Posts: 18
Last post: Introduce yourse..., Wed Nov 04, 2015 4:44 pm, by justkeepswimming

Suggestions


Please post any concerns or desired changes here.
Topics: 1, Posts: 3
Last post: Thread to share ..., Wed Nov 04, 2015 10:18 am, by Yoori Choe

Coding


Coding, implementation problems, anything that's more software development than data science
Topics: 2, Posts: 3
Last post: Nvidia's Introdu..., Thu Oct 29, 2015 10:50 pm, by Yoori Choe

GPU Programming


GPU programming is increasingly important and is critical for deep learning. Please post any tutorials, resources, comments or questions about GPU programming here.
Topics: 2, Posts: 3
Last post: Nvidia's Introdu..., Thu Oct 29, 2015 10:50 pm, by Yoori Choe
Who is online?
Our users have posted a total of 46 messages
We have 28 registered users
The newest registered user is MalcolmA
In total there is 1 user online :: 0 Registered, 0 Hidden and 1 Guest
Most users ever online was 8 on Wed Nov 04, 2015 8:09 pm
