Option Fanatic

Review of Python Courses (Part 13)

Posted by Mark on December 29, 2020 at 07:48 | Last modified: February 2, 2021 11:44

In Part 12, I summarized my Datacamp courses 35-37. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #38 was Web Scraping in Python. This gets complicated with some object-oriented stuff that still throws me for a loop (no pun intended). I don’t think I will be using this anytime soon so I skimmed it in this review:

Web scraping with Python
HyperText Markup Language (HTML)
HTML tags and attributes
Crash course X
Off the beaten XPath
Introduction to the scrapy Selector (from scrapy import Selector)
“Inspecting the HTML”
CSS locators
Attribute and text selection
Getting ready to crawl
Scraping for reals
A classy spider (from scrapy.crawler import CrawlerProcess)
A request for service
Move your bloomin’ parse
Capstone

My course #39 was Working with the Class System in Python. Like #38, this gets thick. The course covers:

Intro to Object Oriented Programming (OOP) in Python
Introduction to NumPy internals
Introduction to objects and classes
Deep dive on classes
__Init__ializing a class
Methods in classes
Working with a dataset to create dataframes
Renaming columns and the five-figure summary
OOP best practices
Inheritance: is-a versus has-a
Inheritance with DataShells
Composition
Wrapping up OOP

My course #40 was Sentiment Analysis in Python. This course covers:

What is sentiment analysis?
Sentiment analysis types and approaches (from textblob import TextBlob)
Let’s build a word cloud (from wordcloud import WordCloud)!
Bag-of-words (from sklearn.feature_extraction.text import CountVectorizer)
Getting granular with n-grams
Build new features from text (from nltk import word_tokenize)
Can you guess the language (from langdetect import detect_langs)?
Stop words (from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS)
Capturing a token pattern [.isalpha(), .isdigit(), .isalnum()]
Stemming and lemmatization (from nltk.stem import PorterStemmer, WordNetLemmatizer)
TfIdf: more ways to transform text (from sklearn.feature_extraction.text import TfidfVectorizer)
Let’s predict the sentiment (from sklearn.linear_model import LogisticRegression)!
Did we really predict the sentiment well (from sklearn.metrics import accuracy_score, confusion_matrix)?
Logistic regression: revisited
Bringing it all together

I will review more classes next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 12)

Posted by Mark on December 24, 2020 at 07:39 | Last modified: February 1, 2021 15:43

In Part 11, I summarized my Datacamp courses 31-34. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #35 was Intermediate Data Visualization with Seaborn. This course covers:

Introduction to Seaborn [histogram vs. sns.distplot()]
Using the distribution plot
Regression plots in Seaborn [sns.regplot(), sns.lmplot()]
Using Seaborn styles [sns.set_style(), sns.despine()]
Colors in Seaborn
Customizing with matplotlib (using Axes)
Categorical plot types
Regression plots [sns.regplot(), sns.residplot()]
Matrix plots [sns.heatmap(pd.crosstab())]
Using FacetGrid, factorplot, lmplot
Using PairGrid and pairplot
Using JointGrid and jointplot
Selecting Seaborn plots

My course #36 was Introduction to Data Visualization with Seaborn (taking #35 before this was an oversight on my part, but everything ended up okay). This course covers:

Introduction to Seaborn
Using pandas with Seaborn
Adding a third variable with hue
Introduction to relational plots and subplots
Customizing scatter plots
Introduction to line plots
Count plots and bar plots [sns.catplot()]
Creating a box plot
Point plots
Changing plot style and color
Adding titles and labels (FacetGrid vs. AxesSubplot)

My course #37 was Unsupervised Learning in Python. This course covers:

Unsupervised learning (from sklearn.cluster import KMeans)
Evaluating a clustering
Transforming features for better clustering (from sklearn.preprocessing import StandardScaler)
Visualizing hierarchies (from scipy.cluster.hierarchy import linkage, dendrogram)
Cluster labels in hierarchical clustering
t-SNE for 2-dimensional maps (from sklearn.manifold import TSNE)
Visualizing the PCA transformation (from sklearn.decomposition import PCA)
Intrinsic dimension
Dimension reduction with PCA (from sklearn.decomposition import TruncatedSVD)
Non-negative matrix factorization (NMF) (from sklearn.decomposition import NMF)
NMF learns interpretable parts
Building recommender systems using NMF (from sklearn.preprocessing import normalize)

I will review more classes next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 11)

Posted by Mark on December 21, 2020 at 07:41 | Last modified: February 1, 2021 11:34

In Part 10, I summarized my Datacamp courses 28-30. Today I will continue with the next four.

As a reminder, I introduced you to my recent work learning Python here.

My course #31 was Customer Analytics and A/B Testing in Python. This course covers:

What is A/B testing?
Identifying and understanding KPIs
Exploratory analysis of KPIs
Calculating KPIs—a practical example
Working with time series data in pandas
Creating time series graphs with matplotlib
Understanding and visualizing trends in customer data
Events and releases
Introduction to A/B testing
Initial A/B test design
Preparing to run an A/B test
Calculating sample size
Analyzing the A/B test results
Understanding statistical significance (get_pvalue, get_ci)
Interpreting your test results

My course #32 was Machine Learning with Tree-Based Models in Python. This course covers:

Decision-tree for classification (from sklearn.tree import DecisionTreeClassifier)
Classification-tree learning
Decision-tree for regression
Generalization error (bias-variance tradeoff)
Diagnosing bias and variance problems
Ensemble learning
Bagging (from sklearn.ensemble import BaggingClassifier)
Out of bag evaluation
Random forests
AdaBoost (from sklearn.ensemble import AdaBoostClassifier)
Gradient boosting (from sklearn.ensemble import GradientBoostingRegressor)
Stochastic gradient boosting
Tuning a CART’s hyperparameters
Tuning an RF’s hyperparameters

My course #33 was Introduction to PySpark. This is a data engineering course—a field in which I found myself not very enthusiastic. This course covers:

What is Spark, anyway?
Using Spark in Python
Using dataframes
Joining
Machine learning pipelines
Data types
Strings and factors

My course #34 was Cleaning Data with PySpark. This course covers:

Intro to data cleaning with Apache Spark
Immutability and lazy processing
Understanding Parquet
Dataframe column operations
Conditional dataframe column operations
User defined functions
Partitioning and lazy processing
Caching
Improve import performance
Cluster sizing tips
Performance improvements
Introduction to data pipelines
Data handling techniques
Data validation
Final analysis and delivery

I will review more classes next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 10)

Posted by Mark on December 18, 2020 at 07:25 | Last modified: January 29, 2021 14:36

In Part 9, I summarized my Datacamp courses 25-27. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #28 was Supervised Learning with scikit-learn. This course covers:

Supervised learning
Exploratory data analysis [pd.plotting.scatter_matrix()]
The classification challenge (creating arrays, from sklearn.neighbors import KNeighborsClassifier)
Measuring model performance (from sklearn.model_selection import train_test_split, datasets)
Introduction to regression (from sklearn.linear_model import LinearRegression)
The basics of linear regression
Cross-validation (from sklearn.model_selection import cross_val_score)
Correlation
Simple regression (from scipy.stats import linregress) and its limits
Regularized regression (from sklearn.linear_model import Ridge, Lasso)
How good is your model (from sklearn.metrics import classification_report, confusion_matrix)?
Logistic regression and the ROC curve (from sklearn.metrics import roc_curve)
Area under the ROC curve
Hyperparameter tuning (from sklearn.model_selection import GridSearchCV)
Hold-out set for final evaluation
Preprocessing data [pd.get_dummies(df)]
Handling missing data (from sklearn.preprocessing import Imputer, from sklearn.pipeline import Pipeline)
Centering and scaling (from sklearn.preprocessing import scale, StandardScaler)

My course #29 was Introduction to Natural Language Processing in Python. This course covers:

Introduction to regular expressions
Introduction to tokenization (from nltk.tokenize import word_tokenize, sent_tokenize)
Advanced tokenization with regex
Charting word length with nltk
Word counts with bag-of-words (from collections import Counter)
Simple text preprocessing (from nltk.corpus import stopwords, from nltk.stem import WordNetLemmatizer)
Introduction to gensim (from gensim.corpora.dictionary import Dictionary)
Tf-idf with gensim (from gensim.models.tfidfmodel import TfidfModel)
Named entity recognition
Introduction to SpaCy
Multilingual NER with polyglot (from polyglot.text import Text)
Classifying fake news using supervised learning with NLP
Building word count vectors (from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer)
Training and testing a classification model with scikit-learn (from sklearn.naive_bayes import MultinomialNB)
Simple NLP, complex problems
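
As a rough illustration of the bag-of-words idea in this list (not code from the course), here is a minimal sketch using only the standard library. The sample sentence and the tiny stop-word set are made up for the example; the course itself points to nltk's tokenizers and stopwords corpus.

```python
import re
from collections import Counter

text = "The cat sat on the mat. The mat was flat, and the cat was fat."

# Tokenize with a regex: lowercase first, then keep runs of letters only.
tokens = re.findall(r"[a-z]+", text.lower())

# Hypothetical hand-written stop-word list, just for this example.
stop_words = {"the", "on", "and", "was"}
filtered = [t for t in tokens if t not in stop_words]

# The "bag" is simply a count of how often each remaining token appears.
bag = Counter(filtered)
top_two = bag.most_common(2)
print(top_two)
```

A Counter keeps no word order, which is the whole point of the technique: only frequencies survive.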

My course #30 was Building Chatbots in Python. This course covers:

Introduction to conversational software (respond function, sleep method from time module)
Creating a personality
Text processing with regular expressions
Understanding intents and entities (re.compile)
Word vectors
Intents and classification (from sklearn.svm import SVC)
Entity extraction
Robust NLU with Rasa (from rasa_nlu.converters import load_data)
Virtual assistants and accessing data
Exploring a DB with natural language
Incremental slot filling and negation
Stateful bots
Asking questions and queuing answers
Frontiers of dialog technology

I will review more classes next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 9)

Posted by Mark on December 15, 2020 at 07:23 | Last modified: January 28, 2021 10:21

In Part 8, I summarized my Datacamp courses 22-24. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #25 was Exploratory Data Analysis in Python (Part 2). This course covers:

Dataframes and series
Clean and validate (inplace arg)
Filter and visualize
Probability mass functions
Cumulative distribution functions (probability < x)
Comparing and modeling distributions
Exploring (scatter plot: transparency, marker size, jittering, zoom) and visualizing relationships (violin, box plot)
Correlation
Simple regression (from scipy.stats import linregress) and its limits
Multiple regression
Visualizing regression results
Logistic regression

My course #26 was Regular Expressions in Python. Once into regex, this material gets very complex yet very powerful:

Introduction to string manipulation
String operations (selecting portions of a particular word)
Finding and replacing
Positional formatting (method to format percentages)
Formatted string literal (escape sequences)
Template method (from string import Template)
Introduction to regular expressions
Repetitions
Regex metacharacters
Greedy vs. non-greedy matching
Alternation and non-capturing groups
Backreferences
Lookaround
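
A few of the regex topics above are easier to see than to describe. The following is a minimal sketch with the standard re module; the sample strings are invented for illustration and are not from the course.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as possible, so the match runs to the last '>'.
greedy = re.findall(r"<.*>", text)
# Non-greedy: .*? stops at the first '>' it can reach.
lazy = re.findall(r"<.*?>", text)

# Backreference: \1 must repeat whatever group 1 captured (a doubled word).
doubled = re.findall(r"\b(\w+) \1\b", "it was was a very very long day")

# Alternation inside a non-capturing group: (?:...) groups without capturing.
pets = re.findall(r"\d+ (?:cats|dogs)", "3 cats, 12 dogs, 5 birds")

# Lookahead: match digits only if followed by '%'; the '%' is not consumed.
percents = re.findall(r"\d+(?=%)", "up 12% on 300 shares, down 4%")

# Lookbehind: match digits only if preceded by '$'.
dollars = re.findall(r"(?<=\$)\d+", "paid $200 for 3 contracts")

print(greedy, lazy, doubled, pets, percents, dollars)
```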

My course #27 was Introduction to Deep Learning in Python. This course covers:

Introduction to deep learning
Forward propagation
Activation functions
Deeper networks
The need for optimization
Gradient descent
Backpropagation [in practice]
Creating a Keras model
Compiling and fitting a model
Classification models
Using models
Understanding model optimization
Model validation
Thinking about model capacity
Stepping up to images

I will review more classes next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 8)

Posted by Mark on December 10, 2020 at 07:34 | Last modified: January 26, 2021 11:22

In Part 7, I summarized my Datacamp courses 19-21. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #22 was Statistical Thinking in Python (Part 2). This course covers:

Optimal parameters [statistical inference using scipy.stats, statsmodels, or hacker stats with numpy; plt.margins() ]
Linear regression by least squares [slope, intercept = np.polyfit() ]
The importance of exploratory data analysis: Anscombe’s quartet (generating and plotting line of best fit)
Generating bootstrap replicates [ecdf() written in my course (prequel) #14]
Bootstrap confidence intervals
Pairs bootstrap
Formulating and simulating a hypothesis (permutation sample)
Test statistics and p-values (permutation replicate)
Bootstrap hypothesis tests
A/B testing
Test of correlation
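
To make the np.polyfit() and bootstrap items concrete, here is a small sketch on made-up data. The helper name bootstrap_replicate is hypothetical and the data are simulated; this is one common way to write the idea, not the course's own code.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data: a noisy line with true slope 2 and intercept 1.
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=x.size)

# Least-squares line: polyfit with degree 1 returns (slope, intercept).
slope, intercept = np.polyfit(x, y, 1)

def bootstrap_replicate(data, func, rng):
    """Resample data with replacement and apply func to the resample."""
    sample = rng.choice(data, size=len(data), replace=True)
    return func(sample)

# 1,000 bootstrap replicates of the mean, then a 95% confidence interval
# read off the 2.5th and 97.5th percentiles of those replicates.
replicates = np.array([bootstrap_replicate(y, np.mean, rng) for _ in range(1000)])
ci_low, ci_high = np.percentile(replicates, [2.5, 97.5])

print(slope, intercept, ci_low, ci_high)
```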

My course #23 was Introduction to Financial Concepts in Python. This course covers:

Fundamental financial concepts (calculating return on investment and compound interest)
Present and future value [np.pv(), np.fv() ]
Net present value and cash flows [np.npv(rate= , values=np.array([]) ) ]
Common profitability analysis methods [np.npv(), np.irr(np.array([]) ) ]
Weighted average cost of capital
Comparing two projects of different life spans (EAA)
Mortgage basics [np.pmt(rate, nper, pv) ]
Amortization, principal, and interest (simulating periodic mortgage payments)
Home ownership, equity, and forecasting (cumulative operations in numpy)
Budgeting project proposal [constant cumulative growth with np.repeat(), calculating monthly expenses]
Net worth and valuation in your personal financial life
The power of time and compound interest
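
The np.pv(), np.fv(), np.npv(), and np.pmt() functions named above were removed from NumPy in later releases and now live in the separate numpy-financial package. The sketch below therefore writes the underlying formulas in plain Python. The function names and example figures are made up for illustration, and the NumPy versions use a cash-flow sign convention, so their signs may differ from these.

```python
def future_value(rate, nper, pv):
    """Value after nper periods of a lump sum pv compounding at rate per period."""
    return pv * (1 + rate) ** nper

def present_value(rate, nper, fv):
    """Value today of a lump sum fv received nper periods from now."""
    return fv / (1 + rate) ** nper

def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs today and is not discounted."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def payment(rate, nper, principal):
    """Level payment per period that fully amortizes principal over nper periods."""
    if rate == 0:
        return principal / nper
    growth = (1 + rate) ** nper
    return principal * rate * growth / (growth - 1)

fv = future_value(0.05, 10, 1000)            # 1,000 at 5% for 10 years
pv = present_value(0.05, 10, fv)             # discounting brings it back to 1,000
project = npv(0.08, [-1000, 400, 400, 400])  # pay 1,000 now, receive 400 for 3 years
monthly = payment(0.04 / 12, 360, 300_000)   # 30-year loan, 4% nominal annual rate

print(round(fv, 2), round(pv, 2), round(project, 2), round(monthly, 2))
```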

My course #24 was Introduction to Portfolio Risk Management in Python. This course covers:

Financial returns
Mean, variance, and normal distributions (scaling volatility)
Skewness and kurtosis (from scipy.stats import skew, kurtosis, Shapiro-Wilk test)
Portfolio composition (calculating market-cap weights)
Correlation and covariance (calculating portfolio volatility)
Markowitz portfolios (MSR and GMV)
The capital asset pricing model (calculating Beta)
Alpha and multi-factor models (Fama-French 3-factor model)
Expanding the 3-factor model (Fama-French 5-factor model)
Estimating tail risk (historical drawdown, historical/conditional VaR)
VaR extensions
Random walks (Monte Carlo simulations)

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 7)

Posted by Mark on December 7, 2020 at 07:19 | Last modified: January 25, 2021 11:26

In Part 6, I summarized my Datacamp courses 16-18. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #19 was Manipulating DataFrames with pandas. This course covers:

Indexing DataFrames (using square brackets, using .loc, using .iloc, selecting certain columns with [[ ]] )
Slicing DataFrames (R boundary included with .loc but not .iloc, slicing with one/two brackets gets Series/df)
Filtering DataFrames
Transforming DataFrames [vectorized computations in numpy without loops, .map() for index, .apply() for Series]
Indexed objects and labeled data (name attribute for index and columns attributes)
Hierarchical indexing (sorting MultiIndex)
Pivoting DataFrames [.pivot(index= , columns= , values= ) ]
Stacking and unstacking DataFrames (pivoting doesn’t work well on MultiIndex so unstack to move index to column)
Melting DataFrames [reverses .pivot() ]
Pivot tables
Categoricals and groupby
Groupby and aggregation/transformation
Iterating over and filtering groupby object
Understanding the column labels
.idxmax() and .idxmin() (row/column label where max/min value located)
.T attribute (transposes numpy array)
Reshaping DataFrames for visualization
Making a histogram (bins, range, normalizing)

My twentieth course was Manipulating Time Series Data in Python. This course has lots of good information for backtesting:

How to use dates and times with pandas ( [sequences of] timestamp and period objects)
Indexing and resampling time series [selecting missing ‘price’ values, .asfreq() ]
Lags, changes, and returns for stock price series [.shift(), n-period % chg, .diff(), .pct_change(), stock price chg in df]
Compare time series growth rates ( .iloc as abs ref, normalizing series, concat prices and .dropna, perf vs. benchmark)
Changing the time series frequency: resampling
Upsampling and interpolation with .resample()
Downsampling and aggregation (plotting resample data with ax)
Rolling window functions with pandas (plotting price and moving average, plotting multiple rolling metrics)
Expanding window functions with pandas (calculating running return, running rate of return)
Relationships between time series: correlation
Select index components and import data
Build a market-cap weighted index
Evaluate index performance
Index correlation and exporting to Excel
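
Since this list is the one most relevant to backtesting, here is a short sketch of the lag, return, and window methods it mentions. The six prices are made up; treat it as an illustration rather than the course's examples.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=6, freq="D")
price = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 110.0], index=dates)

lagged = price.shift(1)                    # yesterday's price aligned to today
change = price.diff()                      # price minus previous price
returns = price.pct_change()               # same as price / price.shift(1) - 1

normalized = price / price.iloc[0] * 100   # rebased so the first value is 100

sma3 = price.rolling(window=3).mean()      # 3-day moving average (first two are NaN)
running_max = price.expanding().max()      # highest price seen so far

# Downsample to 2-day bins, keeping the last price in each bin.
every_other_day = price.resample("2D").last()

print(returns.round(4).tolist())
print(every_other_day.tolist())
```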

My course #21 was Working with Dates and Times in Python. This course covers:

Dates in Python
Math with dates (time delta)
Turning dates into strings
Adding time to the mix
Printing and parsing datetimes (no time printed from datetime object)
Working with durations
UTC offsets
Time zone database (from dateutil import tz)
Starting Daylight Saving Time
Ending Daylight Saving Time [ .datetime_ambiguous() and .enfold() for ambiguous times]
Reading date and time data in Pandas (loading datetimes with parse_dates [or manually with .to_datetime() ])
Summarizing datetime data in Pandas (alternative to for loop)
Additional datetime methods in Pandas
Index correlation and exporting to Excel
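
The date arithmetic, formatting, and UTC-offset topics above fit in a few lines of standard-library code. This sketch uses a fixed UTC-5 offset as a stand-in; it is not a full time zone with Daylight Saving rules, which is where the dateutil tz tools from the course would come in.

```python
from datetime import date, datetime, timedelta, timezone

d1 = date(2020, 12, 7)
d2 = date(2020, 12, 29)
gap = d2 - d1                        # subtracting dates gives a timedelta
later = d1 + timedelta(days=30)      # adding a timedelta gives a new date

iso_text = d1.isoformat()            # ISO 8601 text, YYYY-MM-DD
pretty = d1.strftime("%B %d, %Y")    # month name depends on the active locale

# Parsing: the format string must describe the text exactly.
dt = datetime.strptime("2020-12-29 07:48", "%Y-%m-%d %H:%M")

# Attach a fixed offset of UTC-5, then convert to UTC.
utc_minus_5 = timezone(timedelta(hours=-5))
aware = dt.replace(tzinfo=utc_minus_5)
in_utc = aware.astimezone(timezone.utc)

print(gap.days, later, iso_text, in_utc.isoformat())
```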

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 6)

Posted by Mark on December 4, 2020 at 06:53 | Last modified: January 21, 2021 13:14

In Part 5, I summarized my Datacamp courses 13-15. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #16 was Introduction to Data Science in Python. This course covers:

Creating variables
What is a function?
What is pandas?
Selecting columns
Select rows with logic
Creating line plots
Adding labels and legends
Adding some style (line color, width, style, markers, template)
Making a scatter plot (marker transparency)
Making a bar chart (horizontal, error bars, stacked)
Making a histogram (bins, range, normalizing)

My course #17 was Joining Data with Pandas. This course covers:

Inner join (changing df values with .loc accessor)
One to many relationships
Merging multiple DataFrames
Left join (count number of rows in a column with missing data)
Right and outer joins
Merging a table to itself (i.e. self join)
Merging on indexes
Filtering joins (semi-joins, anti-joins)
Concatenate DataFrames together vertically [.append()]
verify_integrity=True identifies accidental duplicates while validate arg helps to identify relationship type
Using merge_ordered() (for ordered/time-series data and to fill in missing values)
Using merge_asof() (matches on nearest-value rather than equal-value columns)
Selecting data with .query()
Reshaping data with .melt()

Introduction to Linear Modeling in Python was my eighteenth course. This covers:

Introductory concepts about models (interpolation, extrapolation)
Visualizing linear relationships [object-oriented (OOP) approach to matplotlib]
Quantifying linear relationships (covariance, correlation, normalization)
What makes a model linear (Taylor series, overfitting, defining function to plot graph)
Interpreting slope and intercept
Model optimization (RSS: sum of squared residuals)
Least-squares optimization (by numpy, Scipy, Statsmodels)
Modeling real data
The limits of prediction
Goodness of fit (deviations, residuals, and R-squared in code)
Standard error (RMSE measures spread of residuals whereas SE measures uncertainty in model params)
Inferential statistics concepts
Model estimation and likelihood
Model uncertainty and sample distributions (bootstrap in code)
Model errors and randomness
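
The goodness-of-fit items (residuals, deviations, R-squared, RMSE, standard error) reduce to a few lines of numpy. The data below are invented, and the slope standard error uses the usual ordinary-least-squares formula, which may or may not match the course's exact presentation.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8, 11.1])

slope, intercept = np.polyfit(x, y, 1)
y_model = slope * x + intercept

residuals = y - y_model              # how far the model misses each point
deviations = y - y.mean()            # how far each point sits from the mean

rss = np.sum(residuals ** 2)         # residual sum of squares
r_squared = 1 - rss / np.sum(deviations ** 2)
rmse = np.sqrt(np.mean(residuals ** 2))   # typical size of a residual

# Standard error of the slope: uncertainty in the fitted parameter,
# as opposed to RMSE, which describes the spread of the residuals.
n = len(x)
se_slope = np.sqrt(rss / (n - 2) / np.sum((x - x.mean()) ** 2))

print(slope, r_squared, rmse, se_slope)
```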

Categories: Python | Comments (0) | Permalink

Networking Call (Part 3)

Posted by Mark on November 27, 2020 at 06:33 | Last modified: July 2, 2021 11:29

Today I conclude the summary of my recent networking call with GP: a self-proclaimed delta-neutral trader.

I asked GP why he is not trading full-time. He said he has an excellent job.

Wondering if this was a deflection, I asked, “Would you rather be trading full-time?”

“Yes, but I am building up my account for now. I’ve had an excellent last year and I continue on with a very delta- and vega-neutral butterfly position.”

I was not intending to discuss trading strategy, but since he volunteered I felt compelled to comment. I have seen greek-neutral strategies presented over the years and I am repeatedly unconvinced because they get sufficiently neutral to have little profit potential. Something with little gamma is going to have little theta. Something delta neutral will have to rely on theta or vega but with theta comes gamma, which will soon turn into delta—and it all falls apart.

In other words, it’s the “no risk, no reward” conundrum. He uh-huh’d me all the way to the bank on my point that he couldn’t be making much bank doing it. How ironic.
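
The gamma/theta link has a textbook form. With interest rates set to zero, the Black-Scholes equation reduces to theta = -½ × sigma² × S² × gamma for any option or combination of options, so a position with almost no gamma collects almost no theta. The sketch below checks that identity numerically. It assumes plain Black-Scholes with zero rates and dividends, and the spot, strikes, and volatility are made up; it is not anything GP or I computed on the call.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def bs_gamma_theta(spot, strike, vol, years):
    """Black-Scholes gamma and theta (per year) with zero rates and dividends.

    Under those assumptions calls and puts share the same gamma and theta.
    """
    d1 = (math.log(spot / strike) + 0.5 * vol ** 2 * years) / (vol * math.sqrt(years))
    gamma = norm_pdf(d1) / (spot * vol * math.sqrt(years))
    theta = -spot * norm_pdf(d1) * vol / (2 * math.sqrt(years))
    return gamma, theta

spot, vol, years = 100.0, 0.20, 30 / 365
results = []
for strike in (100.0, 120.0):
    gamma, theta = bs_gamma_theta(spot, strike, vol, years)
    implied_theta = -0.5 * vol ** 2 * spot ** 2 * gamma
    results.append((strike, gamma, theta, implied_theta))
    print(strike, gamma, theta, implied_theta)
```

The far out-of-the-money strike shows both numbers shrinking together, which is the trade-off described above.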

I explained that while I fall shy of MSFE in terms of academic skill, I have 12 years of extensive trading experience that few in the MSFE program are likely to match. I think I could be dangerous with an MSFE. Even without it, though, I do analysis all day long with strategy development, investigative writing, dissection of trade presentations and webinars, etc. I wouldn’t mind getting some gig as a consultant or junior financial analyst. The trading-related expertise I have accrued to date must be of some use for some clients somewhere.

Again, GP uh-huh’d a lot but really had nothing to add. He mumbled something about just having to call around to different firms, which seems like an obvious step I have begun to take.

I talked about spending thousands of hours doing manual backtesting over my first several years as a full-time trader and how I exhausted myself doing that.

I then discussed how I have interest in building an automated option backtester because those I have seen on the market lack the functionality I seek (more uh-huh’s as I went on). OptionStack looked promising but I think $200/month is far overpriced for an individual, retail trader. I said I am a green, newbie Python programmer, thinking he might take the bait and suggest something about building software together. That was not forthcoming.

In fact, he said absolutely nothing about any sort of collaboration and asked nothing that might possibly better even his own situation. It was a very one-sided, closed conversation: closed in terms of opportunity.

When traders come together—even on a phone call like this—some effort should be made to see if we can help each other. No holy grail exists and what works one year may certainly not work the next. Even if we are performing well at the moment, our heads should always be on a swivel and we should constantly be on the prowl for something better and/or uncorrelated.

GP seemed outmatched when it wasn’t even a competition.

GP seemed quiet when I was looking for an exchange.

GP seemed humiliated, maybe, when I wasn’t posing any threat.

It’s almost hard to believe the person I saw represented on the website is the GP I spoke to over the phone. Maybe the website is stale and hardly used. The consulting services, now, seem quite hollow and bogus to me.

Onwards and upwards!

Categories: Networking | Comments (0) | Permalink

Networking Call (Part 2)

Posted by Mark on November 23, 2020 at 07:33 | Last modified: July 2, 2021 10:55

Today I continue the presentation of my networking call with GP: a self-proclaimed delta-neutral trader with a business website advertising consulting services.

After he figured out who I was, we offered introductory greetings and I quickly ventured into what would be the meat of the conversation: “So I browsed your website and I see that you’re a full-time trader and trader consultant…”

“Actually I’m a software engineer. Trading is more like my side hustle.”

Say what?

His story begins with professional trading experience working for a firm. He then left to start a fund of his own. He took out a bank loan, which would constitute 50% of his trading capital, and used personal savings as the other 50%. He said the pressure of having to profit on a consistent basis was too great and the fund eventually failed. He didn’t say specifically how much he was trading, how much he lost, over what period of time, or any other details—just that things didn’t work out. He then got an “offer I couldn’t refuse” and went back to work as a software engineer.

I said that I have been a full-time option trader for the last 12 years, which he called the “perfect gig.” I said it was much better than being put through the pharmacy wringer every day working 8-12-hour shifts on my feet continuously for 50-60 hours/week. I faced relentless phone calls, being constantly pulled in multiple directions at once, no breaks for bathroom or food, and limited time to exercise.

I told him while I love what I’m doing now, I still seek something more.

He mentioned starting a fund, but I am not currently thinking about going in that direction because I know little about sales or how to proceed with marketing the fund.

“Just start with friends and family,” he said. “You can raise money from them and then solicit outside investors as you grow. It’s not really worthwhile to get $5M-$10M. You really want at least $50M to start, and then you can hire a sales person and someone to help with accounting.”

He didn’t realize raising money amounts to sales. “You still have to create a pitch book, present it, and sell it,” I said.

“Oh right, righttttt…” he said, and he uh-huh’d me the rest of the way. Perhaps this is something he went through before settling on the bank loan. I don’t know. I had lots of unanswered questions and was finding little new information from him to be of any use. Some of what he did say didn’t even make complete sense for someone with his alleged experience.

I will conclude next time.

Categories: Networking | Comments (0) | Permalink
