Transparency
AI Signal Methodology
We believe AI-powered investment tools should be transparent. This page explains exactly how our machine learning models generate stock signals, what data they use, and how we measure accuracy.
Last updated: March 5, 2026
1. Data Sources
Our AI models analyze four primary data sources, all of which are publicly available:
| Source | Data | Volume / Coverage | Update Frequency |
|---|---|---|---|
| SEC EDGAR | 10-K, 10-Q, 8-K, Form 4 filings | 3,000+ filings/day | Real-time |
| Earnings Transcripts | Full call transcripts via FMP API | 4,000+ per quarter | Within hours of call |
| Financial News | Major outlets, press releases | 10,000+ articles/day | Real-time |
| Financial Data | Prices, fundamentals, estimates | All S&P 500 stocks | Real-time / daily |
2. NLP Pipeline
Unstructured text (earnings call transcripts, SEC filings, news articles) is processed through our NLP pipeline to extract structured features:
- a. Sentiment scoring: Each document receives a sentiment score from -1 (strongly negative) to +1 (strongly positive) using fine-tuned transformer models trained on financial text.
- b. Entity extraction: We identify companies, financial metrics, dates, and monetary values mentioned in the text.
- c. Tone analysis: For earnings calls, we measure management confidence level, hedging language frequency, and forward-looking statement ratio.
- d. Change detection: For SEC filings, we compare each filing against prior filings to flag material changes in risk factors, accounting policies, and financial metrics.
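As an illustration of the tone-analysis step, the sketch below computes two of the features named above: hedging language frequency and forward-looking statement ratio. It is a simplified, keyword-based approximation, not the production pipeline. The word lists, the function names, and the naive sentence splitter are all assumptions made for this example.

```python
import re

# Hypothetical word lists for illustration only; a real system would rely on
# a curated financial lexicon or a trained classifier.
HEDGING_TERMS = {"may", "might", "could", "possibly", "approximately", "uncertain"}
FORWARD_LOOKING_TERMS = {"expect", "anticipate", "will", "forecast", "guidance", "outlook"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def hedging_frequency(text: str) -> float:
    """Share of tokens that appear in the hedging word list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in HEDGING_TERMS for t in tokens) / len(tokens)


def forward_looking_ratio(text: str) -> float:
    """Share of sentences containing at least one forward-looking term."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    flagged = sum(
        any(t in FORWARD_LOOKING_TERMS for t in tokenize(s)) for s in sentences
    )
    return flagged / len(sentences)


sample = (
    "We expect revenue to grow next quarter. "
    "Margins may come under pressure. "
    "Last quarter was strong."
)
print(hedging_frequency(sample))      # 1 hedging term out of 16 tokens
print(forward_looking_ratio(sample))  # 1 of 3 sentences is forward-looking
```

Keyword counts like these are easy to audit but miss context, which is one reason a trained model is usually preferred for the final features.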
3. Machine Learning Models
We use a two-stage model architecture:
Stage 1: Individual Signal Models
Stage 1 consists of five specialized gradient-boosted decision tree models (XGBoost), each trained on features specific to its signal type. Each model outputs a score from 1 to 10 with a confidence interval.
Stage 2: Ensemble Meta-Model
A meta-model combines the individual signal scores, weighting each by its historical accuracy for the stock's sector and market-cap range. Weights update monthly based on rolling 12-month performance.
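The weighting idea can be sketched as an accuracy-weighted average of the individual scores. This is a deliberate simplification: the exact form of the meta-model is not specified here, and the function name, signal keys, scores, and accuracy values below are hypothetical.

```python
def composite_score(scores: dict[str, float], accuracy: dict[str, float]) -> float:
    """Accuracy-weighted average of per-signal scores (each on a 1-10 scale).

    `accuracy` stands in for each signal's rolling 12-month accuracy within
    the stock's sector and market-cap bucket (an assumption for this sketch).
    """
    total_weight = sum(accuracy[name] for name in scores)
    if total_weight == 0:
        raise ValueError("at least one signal needs a non-zero weight")
    weighted_sum = sum(scores[name] * accuracy[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical inputs for a single stock.
scores = {"earnings_nlp": 8.0, "filing_analysis": 6.0, "news_sentiment": 4.0}
accuracy = {"earnings_nlp": 0.75, "filing_analysis": 0.75, "news_sentiment": 0.5}

print(composite_score(scores, accuracy))  # (6.0 + 4.5 + 2.0) / 2.0 = 6.25
```

Recomputing the `accuracy` values once a month from the trailing 12 months would mirror the update schedule described above.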
4. Backtesting & Accuracy
All signal models are backtested on historical data before deployment. Our backtesting methodology:
- Walk-forward validation with no lookahead bias
- 3-year rolling window (2023-2025 historical data)
- Out-of-sample testing on 20% held-out data
- Directional accuracy as the evaluation metric (did the signal correctly predict the stock's direction over the next 30 days?)
- Results published monthly with full transparency
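The sketch below shows one way to implement two items from the list above: walk-forward splits, where every test window comes strictly after its training window, and directional accuracy. The function names, window sizes, and sample values are assumptions for illustration and do not reflect the actual backtest configuration.

```python
from typing import Iterator


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges over time-ordered samples.

    Each test range starts where its training range ends, so no future
    observation is available at training time.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test period


def directional_accuracy(predicted: list[int], realized: list[float]) -> float:
    """Share of predictions (+1 up, -1 down) matching the sign of the return.

    A realized return of exactly zero is counted as "not up" in this sketch.
    """
    if len(predicted) != len(realized):
        raise ValueError("predicted and realized must have the same length")
    if not predicted:
        return 0.0
    hits = sum((p > 0) == (r > 0) for p, r in zip(predicted, realized))
    return hits / len(predicted)


# Hypothetical example: 10 time-ordered samples, train on 4, test on 2.
for train, test in walk_forward_splits(10, train_size=4, test_size=2):
    print(list(train), list(test))

# Hypothetical signals and 30-day returns: two of four directions match.
print(directional_accuracy([1, -1, 1, -1], [0.05, -0.02, -0.01, 0.03]))
```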
| Signal Type | Directional Accuracy | Avg. Confidence |
|---|---|---|
| Earnings NLP | 78% | 72% |
| Filing Analysis | 82% | 68% |
| News Sentiment | 71% | 65% |
| Insider Activity | 74% | 70% |
| Composite Signal | 76% | 74% |
5. Limitations & Honest Disclaimers
AI signals are not investment advice. They are informational tools that should be one input among many in your research process.
Past performance does not predict future results. Backtested accuracy may not reflect real-world performance due to market regime changes.
Models have blind spots. Our AI cannot predict black swan events, regulatory changes, or management fraud that isn't reflected in public filings.
NLP has inherent limitations. Sarcasm, ambiguity, and context-dependent language can lead to misinterpretation in automated text analysis.
6. Academic References
Our approach builds on peer-reviewed research in financial NLP and machine learning:
- Loughran & McDonald (2011). "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks." Journal of Finance.
- Chen et al. (2024). "Artificial Intelligence in Financial Market Prediction." Frontiers in AI.
- Aggarwal et al. (2024). "GEO: Generative Engine Optimization." KDD 2024.