Already the case with consulting companies, have seen it myself
I do know about linear regression; I even had quite a lot of it at university.
But I still wouldn’t be able to just implement it on some data without a good couple of days to weeks of figuring things out and deciding which tools to use so I don’t implement it from scratch.
let v0 = 0
let v1 = 0.40978399*(0.616*u + 0.291*v)
let v2 = if 0 > v1 then 0 else v1
let v3 = 0
let v4 = 0.377928*(0.261*u + 0.468*v)
let v5 = if 0 > v4 then 0 else v4...
// inputs: u, v
// --- hidden layer 1 (3 neurons) ---
let v0 = 0.616*u + 0.291*v - 0.135
let v1 = if 0 > v0 then 0 else v0
let v2 = -0.482*u + 0.735*v + 0.044
let v3 = if 0 > v2 then 0 else v2
let v4 = 0.261*u - 0.553*v + 0.310
let v5 = if 0 > v4 then 0 else v4
// --- hidden layer 2 (2 neurons) ---
let v6 = 0.410*v1 - 0.378*v3 + 0.528*v5 + 0.091
let v7 = if 0 > v6 then 0 else v6
let v8 = -0.194*v1 + 0.617*v3 - 0.291*v5 - 0.058
let v9 = if 0 > v8 then 0 else v8
// --- output layer (binary classification) ---
let v10 = 0.739*v7 - 0.415*v9 + 0.022
// sigmoid squashing v10 into the range (0, 1)
let out = 1 / (1 + exp(-v10))
Is there something 'less good' about:
let v1 = if v0 < 0 then 0 else v0
Am I the only one who stutter-parses "0 > value" vs my counterexample? Is a Yoda condition somehow better?
Shouldn't we write: let v1 = max 0 v0
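For what it's worth, here is a minimal Python sketch of the listing above that tries all three spellings of the ReLU side by side. The weights are copied from the listing; the function names (`relu_yoda`, `relu_plain`, `relu_max`, `forward`) are made up for illustration.

```python
import math

def relu_yoda(x):
    # the form used in the listing: "if 0 > x then 0 else x"
    return 0.0 if 0 > x else x

def relu_plain(x):
    # the counterexample: "if x < 0 then 0 else x"
    return 0.0 if x < 0 else x

def relu_max(x):
    # the "max 0 x" spelling
    return max(0.0, x)

def forward(u, v, relu=relu_max):
    # hidden layer 1 (3 neurons)
    h1 = relu(0.616 * u + 0.291 * v - 0.135)
    h2 = relu(-0.482 * u + 0.735 * v + 0.044)
    h3 = relu(0.261 * u - 0.553 * v + 0.310)
    # hidden layer 2 (2 neurons)
    g1 = relu(0.410 * h1 - 0.378 * h2 + 0.528 * h3 + 0.091)
    g2 = relu(-0.194 * h1 + 0.617 * h2 - 0.291 * h3 - 0.058)
    # output layer: sigmoid squashes the score into (0, 1)
    z = 0.739 * g1 - 0.415 * g2 + 0.022
    return 1.0 / (1.0 + math.exp(-z))

for u, v in [(0.0, 0.0), (1.0, -1.0), (-2.0, 3.0)]:
    print(u, v, forward(u, v, relu_yoda), forward(u, v, relu_plain), forward(u, v, relu_max))
```

For ordinary finite inputs the three variants compute the same thing, so the choice is purely about readability. The one corner where they can differ is NaN: both comparisons pass a NaN through, while Python's `max(0.0, x)` returns 0.0 for it.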
From around when the term was first coined: "artificial intelligence research is concerned with constructing machines (usually programs for general-purpose computers) which exhibit behavior such that, if it were observed in human activity, we would deign to label the behavior 'intelligent.'" [1]
At some point someone will realise that backpropagation and adjoint solves are the same thing.
[1] https://archive.ics.uci.edu/ml/datasets/HIGGS
In my experiments, linear regression on extended attributes (the original attributes plus their squared values) is very competitive with the reported MLP accuracy.
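To make "extended attributes" concrete, here is a rough sketch of the idea in plain Python. It is not the HIGGS experiment: the data is a synthetic two-feature toy problem (an assumption chosen so that squares matter), the regression is ordinary least squares fitted to 0/1 labels and thresholded at 0.5, and all names (`solve`, `fit`, `plain`, `extended`, `evaluate`) are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    d = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(d)] for i in range(d)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)

def accuracy(w, X, y):
    """Threshold the regression output at 0.5 and compare with the 0/1 labels."""
    hits = sum((sum(wi * xi for wi, xi in zip(w, r)) > 0.5) == (t == 1)
               for r, t in zip(X, y))
    return hits / len(y)

def plain(p):
    return [1.0, p[0], p[1]]                      # bias + original attributes

def extended(p):
    return [1.0, p[0], p[1], p[0] ** 2, p[1] ** 2]  # ... plus their squares

# Synthetic toy data (assumption): class 1 lies inside a circle around the origin.
rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
labels = [1 if a * a + b * b < 0.5 else 0 for a, b in points]

def evaluate(features):
    X = [features(p) for p in points]
    w = fit(X[:1500], labels[:1500])               # first 1500 points for training
    return accuracy(w, X[1500:], labels[1500:])    # last 500 held out

acc_plain = evaluate(plain)
acc_extended = evaluate(extended)
print("plain:", acc_plain, "extended:", acc_extended)
```

On a toy problem like this the plain model cannot do much better than predicting the majority class, while the model with squared attributes can represent the circular boundary. Whether the same holds on real data depends on the dataset, of course.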
https://opendata-qa.cern.ch/record/93940
If you can beat it with linear regression, we'd be happy to know.
The paper [1] referenced in your link follows the legacy of the paper on the HIGGS dataset, and does not operate with quantities like accuracy and/or perplexity. The HIGGS dataset paper reported area under the ROC curve, from which one had to approximate accuracy. I used the accuracy from the ADMM paper [2] to compare my results with. As I checked later, the area under the ROC curve in [1] mostly agrees with the SGD training results on HIGGS in [2].
[1] https://arxiv.org/pdf/2505.19689
[2] https://proceedings.mlr.press/v48/taylor16.pdf
I think the perplexity measure is appropriate in [1] because we need to discern between three outcomes. This calls for softmax, with perplexity as a standard measure.
So, my questions are: 1) what perplexity should I target when dealing with the "mc-flavtag-ttbar-small" dataset? And 2) what is the train/validate/test split there?
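To pin down what is meant by perplexity here, this is a small sketch assuming the usual definition: the exponential of the mean negative log-likelihood that the softmax assigns to the true class. The logits and labels are made-up examples, not values from the dataset, and the function names are only illustrative.

```python
import math

def softmax(logits):
    # subtract the maximum for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def perplexity(logit_rows, labels):
    # exp of the average negative log-probability of the true class
    nll = 0.0
    for logits, label in zip(logit_rows, labels):
        nll -= math.log(softmax(logits)[label])
    return math.exp(nll / len(labels))

# A model that cannot tell the three classes apart scores a perplexity of 3;
# a confident and correct model approaches 1.
uninformed = perplexity([[0.0, 0.0, 0.0]] * 4, [0, 1, 2, 0])
confident = perplexity([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]], [0, 1])
print(uninformed, confident)
```

So with three outcomes, any useful target has to sit somewhere between 1 and 3.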
It’s impressive, honestly.