Review of Python Courses (Part 13)
Posted by Mark on December 29, 2020 at 07:48 | Last modified: February 2, 2021 11:44
In Part 12, I summarized my Datacamp courses 35-37. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #38 was Web Scraping in Python. This gets complicated with some object-oriented stuff that still throws me for a loop (no pun intended). I don’t think I will be using this anytime soon, so I skimmed it in this review:
- Web scraping with Python
- HyperText Markup Language (HTML)
- HTML tags and attributes
- Crash course X
- Off the beaten XPath
- Introduction to the scrapy Selector (from scrapy import Selector)
- “Inspecting the HTML”
- CSS locators
- Attribute and text selection
- Getting ready to crawl
- Scraping for reals
- A classy spider (from scrapy.crawler import CrawlerProcess)
- A request for service
- Move your bloomin’ parse
- Capstone
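The XPath ideas in this course can be tried without installing anything. The following is a minimal sketch, assuming a tiny well-formed snippet and using the standard library's xml.etree.ElementTree (which supports only a limited XPath subset) in place of the scrapy Selector the course uses. The markup and the class names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; real pages usually need an HTML-aware parser.
html = """
<html>
  <body>
    <div class="course"><p>Web Scraping in Python</p></div>
    <div class="other"><p>Something else</p></div>
  </body>
</html>
"""

root = ET.fromstring(html)

# XPath-style query: every <p> directly inside a <div> whose class is "course".
titles = [p.text for p in root.findall(".//div[@class='course']/p")]
print(titles)
```

The predicate in square brackets filters on an attribute, which is the same idea the course applies with full XPath and CSS locators.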
My course #39 was Working with the Class System in Python. Like #38, this gets thick. The course covers:
- Intro to Object Oriented Programming (OOP) in Python
- Introduction to NumPy internals
- Introduction to objects and classes
- Deep dive on classes
- __Init__ializing a class
- Methods in classes
- Working with a dataset to create dataframes
- Renaming columns and the five-figure summary
- OOP best practices
- Inheritance: is-a versus has-a
- Inheritance with DataShells
- Composition
- Wrapping up OOP
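The is-a versus has-a distinction can be sketched in a few lines. The classes below are hypothetical stand-ins and not the course's own DataShell code: MeanShell inherits from DataShell (is-a), while Report merely holds one (has-a).

```python
class DataShell:
    """Hypothetical base class holding a list of numbers."""

    def __init__(self, values):
        self.values = list(values)

    def summary(self):
        return {"min": min(self.values), "max": max(self.values)}


class MeanShell(DataShell):
    """Inheritance: a MeanShell is-a DataShell with one extra method."""

    def mean(self):
        return sum(self.values) / len(self.values)


class Report:
    """Composition: a Report has-a DataShell rather than being one."""

    def __init__(self, title, shell):
        self.title = title
        self.shell = shell

    def render(self):
        return f"{self.title}: {self.shell.summary()}"


shell = MeanShell([3, 1, 4, 1, 5])
report = Report("Demo", shell)
print(shell.mean())
print(report.render())
```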
My course #40 was Sentiment Analysis in Python. This course covers:
- What is sentiment analysis?
- Sentiment analysis types and approaches (from textblob import TextBlob)
- Let’s build a word cloud (from wordcloud import WordCloud)!
- Bag-of-words (from sklearn.feature_extraction.text import CountVectorizer)
- Getting granular with n-grams
- Build new features from text (from nltk import word_tokenize)
- Can you guess the language (from langdetect import detect_langs)?
- Stop words (from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS)
- Capturing a token pattern [.isalpha(), .isdigit(), .isalnum()]
- Stemming and lemmatization (from nltk.stem import PorterStemmer, WordNetLemmatizer)
- TfIdf: more ways to transform text (from sklearn.feature_extraction.text import TfidfVectorizer)
- Let’s predict the sentiment (from sklearn.linear_model import LogisticRegression)!
- Did we really predict the sentiment well (from sklearn.metrics import accuracy_score, confusion_matrix)?
- Logistic regression: revisited
- Bringing it all together
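Two of the ideas above, token patterns and n-grams, can be illustrated in plain Python. This is only a sketch: the review text is made up, and the hypothetical ngrams() helper stands in for what CountVectorizer does through its ngram_range argument.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


review = "I loved it 100 percent , not bad at all !"  # made-up text
tokens = review.lower().split()

# Keep alphabetic tokens only; .isdigit() and .isalnum() filter the same way.
words = [t for t in tokens if t.isalpha()]
bigrams = ngrams(words, 2)
print(words)
print(bigrams[:3])
```

Bigrams such as ("not", "bad") keep context that single words lose, which matters when a negation flips the sentiment.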
I will review more classes next time.
Categories: Python
Review of Python Courses (Part 12)
Posted by Mark on December 24, 2020 at 07:39 | Last modified: February 1, 2021 15:43
In Part 11, I summarized my Datacamp courses 31-34. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #35 was Intermediate Data Visualization with Seaborn. This course covers:
- Introduction to Seaborn [histogram vs. sns.distplot()]
- Using the distribution plot
- Regression plots in Seaborn [sns.regplot(), sns.lmplot()]
- Using Seaborn styles [sns.set_style(), sns.despine()]
- Colors in Seaborn
- Customizing with matplotlib (using Axes)
- Categorical plot types
- Regression plots [sns.regplot(), sns.residplot()]
- Matrix plots [sns.heatmap(pd.crosstab())]
- Using FacetGrid, factorplot, lmplot
- Using PairGrid and pairplot
- Using JointGrid and jointplot
- Selecting Seaborn plots
My course #36 was Introduction to Data Visualization with Seaborn (taking #35 before this was an oversight on my part, but everything ended up okay). This course covers:
- Introduction to Seaborn
- Using pandas with Seaborn
- Adding a third variable with hue
- Introduction to relational plots and subplots
- Customizing scatter plots
- Introduction to line plots
- Count plots and bar plots [sns.catplot()]
- Creating a box plot
- Point plots
- Changing plot style and color
- Adding titles and labels (FacetGrid vs. AxesSubplot)
My course #37 was Unsupervised Learning in Python. This course covers:
- Unsupervised learning (from sklearn.cluster import KMeans)
- Evaluating a clustering
- Transforming features for better clustering (from sklearn.preprocessing import StandardScaler)
- Visualizing hierarchies (from scipy.cluster.hierarchy import linkage, dendrogram)
- Cluster labels in hierarchical clustering
- t-SNE for 2-dimensional maps (from sklearn.manifold import TSNE)
- Visualizing the PCA transformation (from sklearn.decomposition import PCA)
- Intrinsic dimension
- Dimension reduction with PCA (from sklearn.decomposition import TruncatedSVD)
- Non-negative matrix factorization (NMF) (from sklearn.decomposition import NMF)
- NMF learns interpretable parts
- Building recommender systems using NMF (from sklearn.preprocessing import normalize)
I will review more classes next time.
Categories: Python
Review of Python Courses (Part 11)
Posted by Mark on December 21, 2020 at 07:41 | Last modified: February 1, 2021 11:34
In Part 10, I summarized my Datacamp courses 28-30. Today I will continue with the next four.
As a reminder, I introduced you to my recent work learning Python here.
My course #31 was Customer Analytics and A/B Testing in Python. This course covers:
- What is A/B testing?
- Identifying and understanding KPIs
- Exploratory analysis of KPIs
- Calculating KPIs—a practical example
- Working with time series data in pandas
- Creating time series graphs with matplotlib
- Understanding and visualizing trends in customer data
- Events and releases
- Introduction to A/B testing
- Initial A/B test design
- Preparing to run an A/B test
- Calculating sample size
- Analyzing the A/B test results
- Understanding statistical significance (get_pvalue, get_ci)
- Interpreting your test results
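Statistical significance in an A/B test can be sketched with a pooled two-proportion z-test. The function below is a hypothetical stand-in (it is not the course's get_pvalue helper), the conversion counts are made up, and it assumes samples large enough for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up counts: control converts 100/1000, variant converts 130/1000.
p_value = two_proportion_pvalue(100, 1000, 130, 1000)
print(round(p_value, 4))
```

A p-value below the chosen threshold (0.05 is common) suggests the difference in conversion rates is unlikely to be due to chance alone.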
My course #32 was Machine Learning with Tree-Based Models in Python. This course covers:
- Decision-tree for classification (from sklearn.tree import DecisionTreeClassifier)
- Classification-tree learning
- Decision-tree for regression
- Generalization error (bias-variance tradeoff)
- Diagnosing bias and variance problems
- Ensemble learning
- Bagging (from sklearn.ensemble import BaggingClassifier)
- Out of bag evaluation
- Random forests
- AdaBoost (from sklearn.ensemble import AdaBoostClassifier)
- Gradient boosting (from sklearn.ensemble import GradientBoostingRegressor)
- Stochastic gradient boosting
- Tuning a CART’s hyperparameters
- Tuning an RF’s hyperparameters
My course #33 was Introduction to PySpark. This is a data engineering course, a field I found I am not very enthusiastic about. This course covers:
- What is Spark, anyway?
- Using Spark in Python
- Using dataframes
- Joining
- Machine learning pipelines
- Data types
- Strings and factors
My course #34 was Cleaning Data with PySpark. This course covers:
- Intro to data cleaning with Apache Spark
- Immutability and lazy processing
- Understanding Parquet
- Dataframe column operations
- Conditional dataframe column operations
- User defined functions
- Partitioning and lazy processing
- Caching
- Improve import performance
- Cluster sizing tips
- Performance improvements
- Introduction to data pipelines
- Data handling techniques
- Data validation
- Final analysis and delivery
I will review more classes next time.
Categories: Python
Review of Python Courses (Part 10)
Posted by Mark on December 18, 2020 at 07:25 | Last modified: January 29, 2021 14:36
In Part 9, I summarized my Datacamp courses 25-27. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #28 was Supervised Learning with scikit-learn. This course covers:
- Supervised learning
- Exploratory data analysis [pd.plotting.scatter_matrix()]
- The classification challenge (creating arrays, from sklearn.neighbors import KNeighborsClassifier)
- Measuring model performance (from sklearn.model_selection import train_test_split, datasets)
- Introduction to regression (from sklearn.linear_model import LinearRegression)
- The basics of linear regression
- Cross-validation (from sklearn.model_selection import cross_val_score)
- Correlation
- Simple regression (from scipy.stats import linregress) and its limits
- Regularized regression (from sklearn.linear_model import Ridge, Lasso)
- How good is your model (from sklearn.metrics import classification_report, confusion_matrix)?
- Logistic regression and the ROC curve (from sklearn.metrics import roc_curve)
- Area under the ROC curve
- Hyperparameter tuning (from sklearn.model_selection import GridSearchCV)
- Hold-out set for final evaluation
- Preprocessing data [pd.get_dummies(df)]
- Handling missing data (from sklearn.preprocessing import Imputer, from sklearn.pipeline import Pipeline)
- Centering and scaling (from sklearn.preprocessing import scale, StandardScaler)
My course #29 was Introduction to Natural Language Processing in Python. This course covers:
- Introduction to regular expressions
- Introduction to tokenization (from nltk.tokenize import word_tokenize, sent_tokenize)
- Advanced tokenization with regex
- Charting word length with nltk
- Word counts with bag-of-words (from collections import Counter)
- Simple text preprocessing (from nltk.corpus import stopwords, from nltk.stem import WordNetLemmatizer)
- Introduction to gensim (from gensim.corpora.dictionary import Dictionary)
- Tf-idf with gensim (from gensim.models.tfidfmodel import TfidfModel)
- Named entity recognition
- Introduction to SpaCy
- Multilingual NER with polyglot (from polyglot.text import Text)
- Classifying fake news using supervised learning with NLP
- Building word count vectors (from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer)
- Training and testing a classification model with scikit-learn (from sklearn.naive_bayes import MultinomialNB)
- Simple NLP, complex problems
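The bag-of-words step can be sketched with the standard library alone. In this minimal example the sample text is made up and the stop-word set is a tiny hand-picked list, not the nltk corpus the course uses.

```python
import re
from collections import Counter

text = "The cat sat on the mat. The mat was flat."  # made-up sample
stop_words = {"the", "on", "was"}  # tiny hand-picked list, not nltk's

# Tokenize on runs of word characters, lowercase, then drop stop words.
tokens = re.findall(r"\w+", text.lower())
bow = Counter(t for t in tokens if t not in stop_words)
print(bow.most_common(2))
```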
My course #30 was Building Chatbots in Python. This course covers:
- Introduction to conversational software (respond function, sleep method from time module)
- Creating a personality
- Text processing with regular expressions
- Understanding intents and entities (re.compile)
- Word vectors
- Intents and classification (from sklearn.svm import SVC)
- Entity extraction
- Robust NLU with Rasa (from rasa_nlu.converters import load_data)
- Virtual assistants and accessing data
- Exploring a DB with natural language
- Incremental slot filling and negation
- Stateful bots
- Asking questions and queuing answers
- Frontiers of dialog technology
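Keyword-based intent matching with re.compile can be sketched as follows. The intents, patterns, and replies are all hypothetical, and this respond() function is only a rough analogue of the one written in the course.

```python
import re

# Hypothetical intents and keyword patterns.
patterns = {
    "greet": re.compile(r"\b(hello|hi|hey)\b", re.IGNORECASE),
    "goodbye": re.compile(r"\b(bye|farewell)\b", re.IGNORECASE),
}
responses = {
    "greet": "Hello!",
    "goodbye": "Goodbye!",
    "default": "Sorry, I did not understand.",
}


def match_intent(message):
    """Return the first intent whose pattern appears in the message."""
    for intent, pattern in patterns.items():
        if pattern.search(message):
            return intent
    return "default"


def respond(message):
    """Look up the reply for whichever intent the message matches."""
    return responses[match_intent(message)]


print(respond("Hey there"))
```

The \b word boundaries keep "hi" from matching inside a word such as "this".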
I will review more classes next time.
Categories: Python
Review of Python Courses (Part 9)
Posted by Mark on December 15, 2020 at 07:23 | Last modified: January 28, 2021 10:21
In Part 8, I summarized my Datacamp courses 22-24. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #25 was Exploratory Data Analysis in Python (Part 2). This course covers:
- Dataframes and series
- Clean and validate (inplace arg)
- Filter and visualize
- Probability mass functions
- Cumulative distribution functions (probability < x)
- Comparing and modeling distributions
- Exploring (scatter plot: transparency, marker size, jittering, zoom) and visualizing relationships (violin, box plot)
- Correlation
- Simple regression (from scipy.stats import linregress) and its limits
- Multiple regression
- Visualizing regression results
- Logistic regression
My course #26 was Regular Expressions in Python. Once into regex, this material gets very complex yet very powerful:
- Introduction to string manipulation
- String operations (selecting portions of a particular word)
- Finding and replacing
- Positional formatting (method to format percentages)
- Formatted string literal (escape sequences)
- Template method (from string import Template)
- Introduction to regular expressions
- Repetitions
- Regex metacharacters
- Greedy vs. non-greedy matching
- Alternation and non-capturing groups
- Backreferences
- Lookaround
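Several of these topics fit in one short example. The sketch below uses made-up strings to show greedy versus non-greedy matching, a backreference, a lookahead, and the Template method.

```python
import re
from string import Template

html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.+>", html)   # one match spanning the whole string
lazy = re.findall(r"<.+?>", html)    # each tag matched separately

# Backreference: \1 must repeat whatever group 1 captured.
doubled = re.findall(r"\b(\w+) \1\b", "it was was a good good day")

# Lookahead: digits only when followed by a percent sign.
percents = re.findall(r"\d+(?=%)", "up 12% on 300 trades")

msg = Template("$name finished course $num").substitute(name="Mark", num=26)
print(greedy, lazy, doubled, percents, msg, sep="\n")
```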
My course #27 was Introduction to Deep Learning in Python. This course covers:
- Introduction to deep learning
- Forward propagation
- Activation functions
- Deeper networks
- The need for optimization
- Gradient descent
- Backpropagation [in practice]
- Creating a Keras model
- Compiling and fitting a model
- Classification models
- Using models
- Understanding model optimization
- Model validation
- Thinking about model capacity
- Stepping up to images
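Forward propagation can be sketched without Keras. The network below is hypothetical: two inputs, one hidden layer of two ReLU nodes, one linear output, and made-up weights.

```python
def relu(x):
    """Rectified linear activation: negative inputs become zero."""
    return max(0.0, x)


def forward(inputs, hidden_weights, output_weights):
    """One hidden layer with ReLU, then a linear output node."""
    hidden = [relu(sum(i * w for i, w in zip(inputs, ws))) for ws in hidden_weights]
    return sum(h * w for h, w in zip(hidden, output_weights))


# Made-up weights: two inputs, two hidden nodes, one output.
inputs = [2.0, 3.0]
hidden_weights = [[1.0, 1.0], [-1.0, 1.0]]
output_weights = [2.0, -1.0]
prediction = forward(inputs, hidden_weights, output_weights)
print(prediction)
```

Here the hidden nodes evaluate to 5 and 1, so the output is 5 * 2 + 1 * (-1) = 9. Training adjusts the weights by gradient descent to reduce the error of such predictions.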
I will review more classes next time.
Categories: Python
Review of Python Courses (Part 8)
Posted by Mark on December 10, 2020 at 07:34 | Last modified: January 26, 2021 11:22
In Part 7, I summarized my Datacamp courses 19-21. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #22 was Statistical Thinking in Python (Part 2). This course covers:
- Optimal parameters [statistical inference using scipy.stats, statsmodels, or hacker stats with numpy; plt.margins() ]
- Linear regression by least squares [slope, intercept = np.polyfit() ]
- The importance of exploratory data analysis: Anscombe’s quartet (generating and plotting line of best fit)
- Generating bootstrap replicates [ecdf() written in my course (prequel) #14]
- Bootstrap confidence intervals
- Pairs bootstrap
- Formulating and simulating a hypothesis (permutation sample)
- Test statistics and p-values (permutation replicate)
- Bootstrap hypothesis tests
- A/B testing
- Test of correlation
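Bootstrap replicates and a percentile confidence interval can be sketched with the standard library (the course does this with numpy). The data below are simulated, and bootstrap_replicates() is a hypothetical helper written for this example.

```python
import random
from statistics import mean


def bootstrap_replicates(data, func, size, rng):
    """Apply func to `size` resamples drawn with replacement from data."""
    return [func(rng.choices(data, k=len(data))) for _ in range(size)]


rng = random.Random(42)                        # fixed seed for repeatability
data = [rng.gauss(10, 2) for _ in range(100)]  # simulated measurements
reps = sorted(bootstrap_replicates(data, mean, 1000, rng))

# 95% percentile interval: drop 2.5% of the replicates from each tail.
ci_low, ci_high = reps[25], reps[974]
print(round(ci_low, 2), round(ci_high, 2))
```

The same helper works for any statistic by swapping the function passed in, for example statistics.median in place of mean.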
My course #23 was Introduction to Financial Concepts in Python. This course covers:
- Fundamental financial concepts (calculating return on investment and compound interest)
- Present and future value [np.pv(), np.fv() ]
- Net present value and cash flows [np.npv(rate= , values=np.array([]) ) ]
- Common profitability analysis methods [np.npv(), np.irr(np.array([]) ) ]
- Weighted average cost of capital
- Comparing two projects of different life spans (EAA)
- Mortgage basics [np.pmt(rate, nper, pv) ]
- Amortization, principal, and interest (simulating periodic mortgage payments)
- Home ownership, equity, and forecasting (cumulative operations in numpy)
- Budgeting project proposal [constant cumulative growth with np.repeat(), calculating monthly expenses]
- Net worth and valuation in your personal financial life
- The power of time and compound interest
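The np.pv(), np.fv(), np.npv(), np.irr(), and np.pmt() functions listed above were later moved out of NumPy into the separate numpy-financial package. The formulas themselves are short, so here is a plain-Python sketch of three of them. The function names and inputs are made up, and results are kept positive for readability, whereas the NumPy versions follow a cash-flow sign convention.

```python
def future_value(pv, rate, nper):
    """Value after nper periods of compounding at `rate` per period."""
    return pv * (1 + rate) ** nper


def net_present_value(rate, cash_flows):
    """Discount cash flows, with the first one occurring at time 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def payment(rate, nper, principal):
    """Level payment that amortizes `principal` over nper periods."""
    return principal * rate / (1 - (1 + rate) ** -nper)


print(round(future_value(100, 0.05, 2), 2))
print(round(net_present_value(0.10, [-100, 60, 60]), 2))
print(round(payment(0.04 / 12, 360, 200_000), 2))  # 30-year loan, 4% annual rate
```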
My course #24 was Introduction to Portfolio Risk Management in Python. This course covers:
- Financial returns
- Mean, variance, and normal distributions (scaling volatility)
- Skewness and kurtosis (from scipy.stats import skew, kurtosis, Shapiro-Wilk test)
- Portfolio composition (calculating market-cap weights)
- Correlation and covariance (calculating portfolio volatility)
- Markowitz portfolios (MSR and GMV)
- The capital asset pricing model (calculating Beta)
- Alpha and multi-factor models (Fama-French 3-factor model)
- Expanding the 3-factor model (Fama-French 5-factor model)
- Estimating tail risk (historical drawdown, historical/conditional VaR)
- VaR extensions
- Random walks (Monte Carlo simulations)
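Two of the tail-risk measures above can be sketched in plain Python. The price path is made up, and historical_var() is a deliberately crude empirical quantile, not the course's implementation.

```python
def max_drawdown(prices):
    """Largest peak-to-trough decline, as a negative fraction."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = min(worst, p / peak - 1)
    return worst


def historical_var(returns, tail_pct):
    """Crude empirical quantile: the return at the tail_pct-th percentile."""
    ordered = sorted(returns)
    return ordered[(len(ordered) * tail_pct) // 100]


prices = [100, 110, 99, 105, 120, 90]  # made-up price path
returns = [b / a - 1 for a, b in zip(prices, prices[1:])]
print(round(max_drawdown(prices), 4))
print(round(historical_var(returns, 20), 4))
```

With only five returns the quantile is very coarse. A realistic estimate needs a long return history.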
Review of Python Courses (Part 7)
Posted by Mark on December 7, 2020 at 07:19 | Last modified: January 25, 2021 11:26
In Part 6, I summarized my Datacamp courses 16-18. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #19 was Manipulating DataFrames with pandas. This course covers:
- Indexing DataFrames (using square brackets, using .loc, using .iloc, selecting certain columns with [[ ]] )
- Slicing DataFrames (right boundary included with .loc but not .iloc; slicing with one/two brackets returns a Series/DataFrame)
- Filtering DataFrames
- Transforming DataFrames [vectorized computations in numpy without loops, .map() for index, .apply() for Series]
- Indexed objects and labeled data (name attribute for index and columns attributes)
- Hierarchical indexing (sorting MultiIndex)
- Pivoting DataFrames [.pivot(index= , columns= , values= ) ]
- Stacking and unstacking DataFrames (pivoting doesn’t work well on MultiIndex so unstack to move index to column)
- Melting DataFrames [reverses .pivot() ]
- Pivot tables
- Categoricals and groupby
- Groupby and aggregation/transformation
- Iterating over and filtering groupby object
- Understanding the column labels
- .idxmax() and .idxmin() (row/column label where max/min value located)
- .T attribute (transposes numpy array)
- Reshaping DataFrames for visualization
- Making a histogram (bins, range, normalizing)
My twentieth course was Manipulating Time Series Data in Python. This course has lots of good information for backtesting:
- How to use dates and times with pandas ( [sequences of] timestamp and period objects)
- Indexing and resampling time series [selecting missing ‘price’ values, .asfreq() ]
- Lags, changes, and returns for stock price series [.shift(), n-period % chg, .diff(), .pct_change(), stock price chg in df]
- Compare time series growth rates ( .iloc as abs ref, normalizing series, concat prices and .dropna, perf vs. benchmark)
- Changing the time series frequency: resampling
- Upsampling and interpolation with .resample()
- Downsampling and aggregation (plotting resample data with ax)
- Rolling window functions with pandas (plotting price and moving average, plotting multiple rolling metrics)
- Expanding window functions with pandas (calculating running return, running rate of return)
- Relationships between time series: correlation
- Select index components and import data
- Build a market-cap weighted index
- Evaluate index performance
- Index correlation and exporting to Excel
My course #21 was Working with Dates and Times in Python. This course covers:
- Dates in Python
- Math with dates (time delta)
- Turning dates into strings
- Adding time to the mix
- Printing and parsing datetimes (no time printed from datetime object)
- Working with durations
- UTC offsets
- Time zone database (from dateutil import tz)
- Starting Daylight Saving Time
- Ending Daylight Saving Time [ .datetime_ambiguous() and .enfold() for ambiguous times]
- Reading date and time data in Pandas (loading datetimes with parse_dates [or manually with .to_datetime() ])
- Summarizing datetime data in Pandas (alternative to for loop)
- Additional datetime methods in Pandas
- Index correlation and exporting to Excel
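Date math, parsing, printing, and UTC offsets can be sketched with the standard datetime module. The example below uses a fixed offset rather than the dateutil time zone database covered in the course, so it knows nothing about Daylight Saving Time. The dates are taken from the posts on this page purely for illustration.

```python
from datetime import date, datetime, timedelta, timezone

# Math with dates: subtracting two dates gives a timedelta.
gap = date(2020, 12, 29) - date(2020, 12, 24)

# Parsing and printing with format strings.
posted = datetime.strptime("2020-12-07 07:19", "%Y-%m-%d %H:%M")
label = posted.strftime("%b %d, %Y")

# UTC offsets: attach a fixed offset, then convert to UTC.
eastern = timezone(timedelta(hours=-5))  # fixed offset, no DST rules
posted_utc = posted.replace(tzinfo=eastern).astimezone(timezone.utc)
print(gap.days, label, posted_utc.isoformat())
```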
Review of Python Courses (Part 6)
Posted by Mark on December 4, 2020 at 06:53 | Last modified: January 21, 2021 13:14
In Part 5, I summarized my Datacamp courses 13-15. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #16 was Introduction to Data Science in Python. This course covers:
- Creating variables
- What is a function?
- What is pandas?
- Selecting columns
- Select rows with logic
- Creating line plots
- Adding labels and legends
- Adding some style (line color, width, style, markers, template)
- Making a scatter plot (marker transparency)
- Making a bar chart (horizontal, error bars, stacked)
- Making a histogram (bins, range, normalizing)
My course #17 was Joining Data with Pandas. This course covers:
- Inner join (changing df values with .loc accessor)
- One to many relationships
- Merging multiple DataFrames
- Left join (count number of rows in a column with missing data)
- Right and outer joins
- Merging a table to itself (i.e. self join)
- Merging on indexes
- Filtering joins (semi-joins, anti-joins)
- Concatenate DataFrames together vertically [.append()]
- verify_integrity=True identifies accidental duplicates while validate arg helps to identify relationship type
- Using merge_ordered() (for ordered/time-series data and to fill in missing values)
- Using merge_asof() (matches on nearest-value rather than equal-value columns)
- Selecting data with .query()
- Reshaping data with .melt()
Introduction to Linear Modeling in Python was my eighteenth course. This covers:
- Introductory concepts about models (interpolation, extrapolation)
- Visualizing linear relationships [object-oriented (OOP) approach to matplotlib]
- Quantifying linear relationships (covariance, correlation, normalization)
- What makes a model linear (Taylor series, overfitting, defining function to plot graph)
- Interpreting slope and intercept
- Model optimization (RSS: sum of squared residuals)
- Least-squares optimization (by numpy, Scipy, Statsmodels)
- Modeling real data
- The limits of prediction
- Goodness of fit (deviations, residuals, and R-squared in code)
- Standard error (RMSE measures spread of residuals whereas SE measures uncertainty in model params)
- Inferential statistics concepts
- Model estimation and likelihood
- Model uncertainty and sample distributions (bootstrap in code)
- Model errors and randomness
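Least-squares fitting and R-squared can be written out directly from their definitions. This is a sketch on made-up data. In practice the numpy, SciPy, or statsmodels routines named above do the same job.

```python
def least_squares(x, y):
    """Slope and intercept minimizing the residual sum of squares (RSS)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """1 - RSS / total sum of squares."""
    mean_y = sum(y) / len(y)
    rss = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - rss / tss


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up, roughly y = 2x
slope, intercept = least_squares(x, y)
print(round(slope, 3), round(intercept, 3), round(r_squared(x, y, slope, intercept), 4))
```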
Networking Call (Part 3)
Posted by Mark on November 27, 2020 at 06:33 | Last modified: July 2, 2021 11:29
Today I conclude my summary of my recent networking call with GP: a self-proclaimed delta-neutral trader.
I asked GP why he is not trading full-time. He said he has an excellent job.
Wondering if this was a deflection, I asked, “Would you rather be trading full-time?”
“Yes, but I am building up my account for now. I’ve had an excellent last year and I continue on with a very delta- and vega-neutral butterfly position.”
I was not intending to discuss trading strategy, but since he volunteered I felt compelled to comment. I have seen greek-neutral strategies presented over the years and I remain unconvinced because they get neutral enough to leave little profit potential. Something with little gamma is going to have little theta. Something delta neutral will have to rely on theta or vega, but with theta comes gamma, which will soon turn into delta, and it all falls apart.
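The gamma/theta link can be made concrete under Black-Scholes assumptions (an assumption of this sketch; no particular model came up on the call). For a position of value V on an underlying at price S, with volatility σ and interest rate r, the greeks are tied together by:

```latex
\Theta + \tfrac{1}{2}\sigma^{2} S^{2}\,\Gamma + r S\,\Delta - r V = 0
\quad\Longrightarrow\quad
\Theta \approx -\tfrac{1}{2}\sigma^{2} S^{2}\,\Gamma
\quad \text{when } \Delta = 0 \text{ and } rV \text{ is small.}
```

So a delta-neutral position with near-zero gamma collects near-zero theta, and any meaningful positive theta comes with negative gamma.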
In other words, it is the “no risk, no reward” conundrum. He uh-huh’d me all the way to the bank on my point that he couldn’t be making much bank doing it. How ironic.
I explained that while I fall shy of MSFE in terms of academic skill, I have 12 years of extensive trading experience that few in the MSFE program are likely to match. I think I could be dangerous with an MSFE. Even without it, though, I do analysis all day long with strategy development, investigative writing, dissection of trade presentations and webinars, etc. I wouldn’t mind getting some gig as a consultant or junior financial analyst. The trading-related expertise I have accrued to date must be of some use for some clients somewhere.
Again, GP uh-huh’d a lot but really had nothing to add. He mumbled something about just having to call around to different firms, which seems like an obvious step I have begun to take.
I talked about spending thousands of hours doing manual backtesting over my first several years as a full-time trader and how I exhausted myself doing that.
I then discussed my interest in building an automated option backtester because those I have seen on the market lack the functionality I seek (more uh-huh’s as I went on). OptionStack looked promising but I think $200/month is far overpriced for an individual retail trader. I said I am a green, newbie Python programmer, thinking he might take the bait and suggest something about building software together. That was not forthcoming.
In fact, he said absolutely nothing about any sort of collaboration and asked nothing that might help improve even his own situation. It was a very one-sided, closed conversation: closed in terms of opportunity.
When traders come together—even on a phone call like this—some effort should be made to see if we can help each other. No holy grail exists and what works one year may certainly not work the next. Even if we are performing well at the moment, our heads should always be on a swivel and we should constantly be on the prowl for something better and/or uncorrelated.
GP seemed outmatched when it wasn’t even a competition.
GP seemed quiet when I was looking for an exchange.
GP seemed humiliated, maybe, when I wasn’t posing any threat.
It’s almost hard to believe the person I saw represented on the website is the GP I spoke to over the phone. Maybe the website is stale and hardly used. The consulting services, now, seem quite hollow and bogus to me.
Onwards and upwards!
Categories: Networking
Networking Call (Part 2)
Posted by Mark on November 23, 2020 at 07:33 | Last modified: July 2, 2021 10:55
Today I continue the presentation of my networking call with GP: a self-proclaimed delta-neutral trader with a business website advertising consulting services.
After he figured out who I was, we exchanged introductory greetings and I quickly ventured into what would be the meat of the conversation: “So I browsed your website and I see that you’re a full-time trader and trader consultant…”
“Actually I’m a software engineer. Trading is more like my side hustle.”
Say what?
His story begins with professional trading experience working for a firm. He then left to start a fund of his own. He took out a bank loan, which would constitute 50% of his trading capital, and used personal savings as the other 50%. He said the pressure of having to profit on a consistent basis was too great and the fund eventually failed. He didn’t say specifically how much he was trading, how much he lost, over what period of time, or any other details—just that things didn’t work out. He then got an “offer I couldn’t refuse” and went back to work as a software engineer.
I said that I have been a full-time option trader for the last 12 years, which he called the “perfect gig.” I said it was much better than being put through the pharmacy wringer every day working 8-12-hour shifts on my feet continuously for 50-60 hours/week. I faced relentless phone calls, being constantly pulled in multiple directions at once, no breaks for bathroom or food, and limited time to exercise.
I told him while I love what I’m doing now, I still seek something more.
He mentioned starting a fund, but I am not currently thinking about going in that direction because I know little about sales or how to proceed with marketing the fund.
“Just start with friends and family,” he said. “You can raise money from them and then solicit outside investors as you grow. It’s not really worthwhile to get $5M-$10M. You really want at least $50M to start, and then you can hire a sales person and someone to help with accounting.”
He didn’t realize raising money amounts to sales. “You still have to create a pitch book, present it, and sell it,” I said.
“Oh right, righttttt…” he said, and he uh-huh’d me the rest of the way. Perhaps this is something he went through before settling on the bank loan. I don’t know. I had lots of unanswered questions and was finding little new information from him to be of any use. Some of what he did say didn’t even make complete sense for someone with his alleged experience.
I will conclude next time.
Categories: Networking