collection of some nice articles
- http://www.unofficialgoogledatascience.com/2016/10/practical-advice-for-analysis-of-large.html
- http://designingcx.com/cx-journey-mapping-toolkit
- great and cheap resources
- reproducible research: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285
- https://medium.com/data-engineering/modeling-madly-8b2c72eb52be
- https://eng.uber.com/on-call-dashboard/
- https://plot.ly/products/dash/
- http://pbpython.com/effective-matplotlib.html
- https://resourcecards.com
- https://blog.bufferapp.com/53-design-terms-explained-for-marketers
- jupyter slides http://echorand.me/presentation-slides-with-jupyter-notebook.html
- mapbox and http://leafletjs.com
- https://uber.github.io/deck.gl/#/
- comcast ML pipeline https://data-artisans.com/flink-forward/resources/embedding-flink-throughout-an-operationalized-streaming-ml-lifecycle
- CQRS and event sourcing, the DriveTribe platform on flink https://data-artisans.com/flink-forward/resources/panta-rhei-designing-distributed-applications-with-streams
- spatiotemporal event aggregation at Uber https://data-artisans.com/flink-forward/resources/scaling-ubers-realtime-optimization-with-apache-flink
- optimization of flink at netflix https://data-artisans.com/flink-forward/resources/scaling-flink-in-cloud
- pravega https://data-artisans.com/flink-forward/resources/scaling-stream-data-pipelines
- pattern detection in event streams with flink SQL https://www.slideshare.net/FlinkForward/flink-forward-berlin-2018-dawid-wysakowicz-detecting-patterns-in-event-streams-with-flink-sql
- apache nifi
- apache camel
- akka + alpakka
- https://kylo.io
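A minimal in-memory sketch of the CQRS / event-sourcing idea from the DriveTribe talk linked above: commands are validated and appended to a log as immutable events, and the read side is a projection rebuilt by folding over that log. The `Event` schema, the account/balance domain and all names are hypothetical illustrations; in a streaming setup the log would be a durable topic and the projection a stateful job, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """An immutable fact appended to the log (hypothetical schema)."""
    kind: str  # "deposited" or "withdrawn"
    account: str
    amount: int


@dataclass
class EventLog:
    """Write side: validate a command, then record it as an event."""
    events: List[Event] = field(default_factory=list)

    def deposit(self, account: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.events.append(Event("deposited", account, amount))

    def withdraw(self, account: str, amount: int) -> None:
        # Validation uses the current state, rebuilt from the log itself.
        if balances(self.events).get(account, 0) < amount:
            raise ValueError("insufficient funds")
        self.events.append(Event("withdrawn", account, amount))


def balances(events: List[Event]) -> Dict[str, int]:
    """Read side: a projection obtained by replaying (folding) the events."""
    view: Dict[str, int] = {}
    for e in events:
        sign = 1 if e.kind == "deposited" else -1
        view[e.account] = view.get(e.account, 0) + sign * e.amount
    return view
```

Because state is only ever derived from the log, replaying a prefix such as `balances(log.events[:1])` gives the view as of that point in time, and new read models can be added later without touching the write side.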
governance
- apache atlas
- https://github.com/linkedin/WhereHows
- https://thehftguy.com/2017/07/19/what-does-it-really-take-to-track-100-million-cell-phones/
- https://matthewrocklin.com/blog/work/2017/09/21/accelerating-geopandas-1
- jupyter notebook tricks
- https://michelleful.github.io/code-blog/2015/06/20/pipelines/
- xgboost lightgbm tuning https://medium.com/@Laurae2/getting-the-most-of-xgboost-and-lightgbm-speed-compiler-cpu-pinning-374c38d82b86
- tutorial python ml http://nbviewer.jupyter.org/github/mdeff/python_tour_of_data_science/blob/with_outputs/python_tour_of_data_science.ipynb
- advanced numpy tricks http://nbviewer.jupyter.org/github/vlad17/np-learn/blob/master/presentation.ipynb?flush_cache=true
- https://www.kaggle.com/learn/overview
- gbm https://medium.com/mlreview/gradient-boosting-from-scratch-1e317ae4587d
- https://eng.uber.com/michelangelo/
- fb-learner flow
- h2o.ai
- https://github.com/Verizon/trapezium
- http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
- https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
- https://loads.pickle.me.uk/2016/04/04/deploying-a-scikit-learn-classifier-to-production/
- https://news.ycombinator.com/item?id=13821217
- https://www.kdnuggets.com/2016/11/moving-machine-learning-practice-production.html
- https://github.com/lyg5623/lightgbm_predict4j
- https://www.slideshare.net/xamat/10-more-lessons-learned-from-building-machine-learning-systems/12-However_It_is_not_always
- https://github.com/opendatagroup/hadrian
- https://docs.microsoft.com/de-de/azure/machine-learning/desktop-workbench/model-management-overview
- https://github.com/marcotcr/lime
- https://github.com/datascienceinc/Skater/blob/master/README.rst
- https://github.com/slundberg/shap
- https://github.com/csinva/imodels
- https://github.com/pbiecek/DALEX/
- https://github.com/EthicalML/XAI
- https://www.youtube.com/watch?v=WKAuXlsq6xw
- https://sebastianraschka.com/blog/2016/model-evaluation-selection-part3.html
- scoring process in production https://www.youtube.com/watch?v=-rGRHrED94Y
- security e2e tutorial https://github.com/albahnsen/ML_SecurityInformatics
- scoring https://www.r-bloggers.com/a-budget-of-classifier-evaluation-measures/
- imbalanced scoring http://www.kdnuggets.com/2016/08/learning-from-imbalanced-classes.html
- http://course.fast.ai/start.html
- https://github.com/astorfi/TensorFlow-World-Resources/blob/master/README.rst
- https://medium.com/towards-data-science/building-a-real-time-object-recognition-app-with-tensorflow-and-opencv-b7a2b4ebdc32
- https://medium.com/towards-data-science/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9
- http://briansp2020.github.io/2017/11/05/fast_ai_ROCm/
- densenet https://arxiv.org/abs/1608.06993
- vision
- https://fullstackml.com/wavelet-image-hash-in-python-3504fdd282b5
- scalability
- spark
- improve spark performance
- ml & pipelines
- hypothesis checking
- etl
- streaming
- hbase
- bulk loading from spark http://www.opencore.com/blog/2016/10/efficient-bulk-load-of-hbase-using-spark/
- coprocessors
- good bad and ugly of coprocessors from bloomberg https://www.youtube.com/watch?v=9NAPLmCB2sA
- books
- covering all the basic concepts: Designing Data-Intensive Applications http://shop.oreilly.com/product/0636920032175.do
- streaming
- kafka https://www.confluent.io/wp-content/uploads/confluent-kafka-definitive-guide-complete.pdf
- security kafka https://www.confluent.io/kafka-summit-london18/kafka-as-a-service-a-tale-of-security-and-multi-tenancy
- monitoring kafka https://www.confluent.io/kafka-summit-london18/monitor-kafka-like-a-pro
- kafka for micro service communication https://github.com/confluentinc/qcon-microservices
- mastering spark for data science: end-to-end use cases https://www.youtube.com/watch?v=B6xequGNM20&list=PLYX1a6mVbBmzZTnuB4niJHiyzEEqYsGLN&index=6
- spark akka http://www.stephenzoio.com/spark-cluster-execution-with-akka/
- spark functional programming
- spark deployment
- https://www.datacamp.com/community/tutorials/lda2vec-topic-model
- https://juliasilge.com/blog/tidy-word-vectors/
- https://www.youtube.com/watch?time_continue=23&v=-lx2shfA-5s
- https://arxiv.org/abs/1508.07909
- https://streamhacker.com/2008/12/29/how-to-train-a-nltk-chunker/
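In the spirit of the "gradient boosting from scratch" link above, a minimal sketch of gradient boosting for regression with squared loss on a single numeric feature, using decision stumps as weak learners. The function names, the one-feature restriction and the default hyperparameters are illustrative assumptions, not the article's code. With loss 0.5 * (y - f)^2 the negative gradient is simply the residual y - f, so each round fits a stump to the current residuals and adds a shrunken copy of it.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares split on 1-D feature x for residuals r.

    Assumes x has at least two distinct values so both sides are non-empty.
    """
    best = None
    for t in np.unique(x)[:-1]:  # skip the max so the right side is non-empty
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return t, lv, rv


def predict_stump(stump, x):
    t, lv, rv = stump
    return np.where(x <= t, lv, rv)


def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Start from the mean, then repeatedly fit stumps to the residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f0 = float(y.mean())
    pred = np.full(y.shape, f0)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred  # negative gradient of the squared loss
        stump = fit_stump(x, residuals)
        pred += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return f0, learning_rate, stumps


def predict(model, x):
    f0, learning_rate, stumps = model
    x = np.asarray(x, dtype=float)
    pred = np.full(x.shape, f0)
    for stump in stumps:
        pred += learning_rate * predict_stump(stump, x)
    return pred
```

Usage: `model = gradient_boost(x, y, n_rounds=200)` then `predict(model, x_new)`. The small learning rate trades more rounds for a smoother fit; libraries such as xgboost and lightgbm add deeper trees, regularisation and second-order gradients on top of this basic loop.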
embeddings
- https://www.kdnuggets.com/2017/11/automated-feature-engineering-time-series-data.html
- http://www.unofficialgoogledatascience.com/2017/07/fitting-bayesian-structural-time-series.html
- https://www.kdnuggets.com/2016/11/combining-different-methods-create-advanced-time-series-prediction.html
- https://facebookincubator.github.io/prophet/
- http://www.unofficialgoogledatascience.com/2017/04/our-quest-for-robust-time-series.html
- anomaly detection https://github.com/htm-community/flink-htm
- robfilter (R package)
- https://zedoul.github.io/cbar/articles/quickstart.html
- kalman filter and big data https://www.crcpress.com/authors/news/i3194-kalman-filter-at-the-age-of-big-data-programming-in-spark-scala
- introductory book http://nbviewer.jupyter.org/github/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/table_of_contents.ipynb
- probabilistic streaming data types
- spark
- akka streams
- formatting
- https://www.kdnuggets.com/2017/11/automated-feature-engineering-time-series-data.html
- https://eng.uber.com/neural-networks/
- lambdamart / learning to rank
- compact prediction tree https://github.com/tedgueniche/IPredict
- https://www.slideshare.net/SparkSummit/finding-graph-isomorphisms-in-graphx-and-graphframes
- https://towardsdatascience.com/record-linking-with-apache-sparks-mllib-graphx-d118c5f31f83
- distributed large RDF processing http://sansa-stack.net
- c++ efficient stuff
- https://blog.insightdatascience.com/statistical-advice-for-a-b-testing-28654a24b9f0
- http://www.unofficialgoogledatascience.com/2018/01/designing-ab-tests-in-collaboration.html
- https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d
- neo4j
- http://janusgraph.org
- http://pachyderm.io
- https://github.com/vaexio/vaex
- http://dask.pydata.org
- deploy python on yarn
- google spanner DB
- accumulo cassandra hbase http://accumulosummit.com/program/talks/comparing-accumulo-cassandra-hbase/
- https://medium.com/@rukavitsya/canvas-fingerprinting-cookies-on-steroids-253f43c7e293
- https://www.blackhat.com/docs/eu-17/materials/eu-17-Shuster-Passive-Fingerprinting-Of-HTTP2-Clients-wp.pdf
- https://www.cgal.org
- geospatial algorithms https://i11www.iti.kit.edu/teaching/sommer2013/algokartografie/index
- machine learning on source code https://github.com/src-d/awesome-machine-learning-on-source-code
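A minimal sketch of CUPED, the variance-reduction technique in the booking.com A/B testing link above: the in-experiment metric y is adjusted with a pre-experiment covariate x (for example the same metric measured before the test) as y - theta * (x - mean(x)), where theta = cov(x, y) / var(x) minimises the variance of the adjusted metric. The function name and the synthetic data are assumptions for illustration. Theta is usually estimated once on the pooled data of both arms so that treatment and control are adjusted identically.

```python
import numpy as np


def cuped_adjust(y, x):
    """Return the CUPED-adjusted metric y - theta * (x - mean(x)).

    y: in-experiment metric per unit.
    x: pre-experiment covariate for the same units.
    theta = cov(x, y) / var(x), both with the same ddof.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())
```

The adjustment keeps the overall mean unchanged (the centred covariate sums to zero), so the treatment effect estimate stays unbiased while its variance shrinks roughly by a factor of 1 - corr(x, y)^2; a strongly predictive pre-period metric therefore buys the most statistical power.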